Decompressor Permissiveness to Invalid Data
===========================================

This document describes the behavior of the reference decompressor in cases
where it accepts formally invalid data instead of reporting an error.

While the reference decompressor *must* decode any compliant frame following
the specification, its ability to detect erroneous data is on a best-effort
basis: the decoder may accept input data that is formally invalid
when doing so poses no risk to the decoder and when detecting it would cost
too much in complexity or speed.

In practice, the vast majority of invalid data are detected, if only because
many corruption events are dangerous for the decoder process (such as
requesting an out-of-bound memory access) and many more are easy to check.

This document lists a few known cases where invalid data was formerly accepted
by the decoder, and what has changed since.


Offset == 0
-----------

**Last affected version**: v1.5.5

**Produced by the reference compressor**: No

**Example Frame**: `28b5 2ffd 0000 4500 0008 0002 002f 430b ae`

If a sequence is decoded with `literals_length = 0` and `offset_value = 3`
while `Repeated_Offset_1 = 1`, the computed offset will be `0`, which is
invalid.

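As a worked illustration of this computation, here is a minimal Python sketch of how a sequence's offset is resolved from `offset_value`, `literals_length`, and the repeated offsets, following the repeat-offset rules of the Zstandard format specification. The function name `resolve_offset` is hypothetical and not part of the zstd API, the update of the repeated-offset history is deliberately omitted, and the values `4` and `8` for the second and third repeated offsets are example values only.

```python
def resolve_offset(offset_value, literals_length, repeated_offsets):
    """Resolve the match offset of one sequence (hypothetical helper).

    repeated_offsets is (Repeated_Offset_1, Repeated_Offset_2, Repeated_Offset_3).
    offset_value is assumed to be >= 1. The history update is not modeled.
    """
    rep1, rep2, rep3 = repeated_offsets
    if offset_value > 3:
        # Values above 3 encode a plain offset, shifted by 3.
        return offset_value - 3
    if literals_length != 0:
        # Values 1..3 select a repeated offset directly.
        return (rep1, rep2, rep3)[offset_value - 1]
    # With literals_length == 0 the repeat codes are shifted by one,
    # and offset_value == 3 means Repeated_Offset_1 - 1.
    return (rep2, rep3, rep1 - 1)[offset_value - 1]


# The case described above: literals_length = 0, offset_value = 3,
# Repeated_Offset_1 = 1 (4 and 8 are example values for the other two).
print(resolve_offset(3, 0, (1, 4, 8)))  # 0, which is not a valid offset
```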
The reference decompressor up to v1.5.5 processes this case as if the computed
offset were `1`, including inserting `1` into the repeated offset list.
This prevents the output buffer from remaining uninitialized, thus denying a
potential attack vector from an untrusted source.
However, in the rare case where this scenario is the outcome of a
transmission or storage error, the decoder relies on the checksum to detect
the error.

In newer versions, this case is always detected and reported as a corruption error.

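The difference between the two behaviors can be sketched as follows. This is an illustrative model only, not the reference implementation: `CorruptionError` and `check_offset` are hypothetical names, and the `legacy` flag stands in for a decoder up to v1.5.5.

```python
class CorruptionError(Exception):
    """Hypothetical error type standing in for a corruption error."""


def check_offset(offset, legacy):
    """Return the offset to use, modeling both handling policies."""
    if offset != 0:
        return offset
    if legacy:
        # Up to v1.5.5: a computed offset of 0 is processed as if it were 1.
        return 1
    # Newer versions: the invalid offset is reported instead.
    raise CorruptionError("computed offset is 0")


print(check_offset(0, legacy=True))  # 1
try:
    check_offset(0, legacy=False)
except CorruptionError as err:
    print("rejected:", err)
```

In the legacy model, a frame damaged in this way can only be caught later by the checksum, whereas the strict model fails at the sequence itself.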

Non-zero reserved bits
----------------------

**Last affected version**: v1.5.5

**Produced by the reference compressor**: No

The Sequences section of each block has a header, and one of its elements is a
byte that describes the compression mode of each symbol type.
This byte contains 2 reserved bits, which must be set to zero.

The reference decompressor up to v1.5.5 just ignores these 2 bits.
This behavior has no consequence for the rest of the frame decoding process.

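A minimal Python sketch of parsing this byte is shown below. It assumes the layout given in the Zstandard format specification, where the two lowest bits are the reserved ones and the three 2-bit mode fields sit above them. The names `parse_symbol_compression_modes` and `CorruptionError` are hypothetical; `strict=False` models a decoder that ignores the reserved bits, and `strict=True` models one that rejects non-zero values.

```python
class CorruptionError(Exception):
    """Hypothetical error type standing in for a corruption error."""


def parse_symbol_compression_modes(byte, strict=True):
    """Split the mode byte into its three 2-bit fields (assumed layout)."""
    reserved = byte & 0b11  # bits 1-0: reserved, must be zero
    if strict and reserved != 0:
        raise CorruptionError("reserved bits are not zero")
    return {
        "literals_lengths_mode": (byte >> 6) & 0b11,  # bits 7-6
        "offsets_mode": (byte >> 4) & 0b11,           # bits 5-4
        "match_lengths_mode": (byte >> 2) & 0b11,     # bits 3-2
    }


# Same mode fields, but the second byte has a reserved bit set.
print(parse_symbol_compression_modes(0x64))
print(parse_symbol_compression_modes(0x65, strict=False))
```

Both calls return the same modes, which reflects why ignoring the reserved bits does not affect the rest of the decoding process.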
In newer versions, the 2 reserved bits are checked,
and the decoder reports a corruption error if they are not zero.