Decompressor Errata
===================

This document captures known decompressor bugs, where the decompressor rejects a valid zstd frame.
Each entry will contain:
1. The last affected decompressor version.
2. The decompressor components affected.
3. Whether the compressed frame could ever be produced by the reference compressor.
4. An example frame (hexadecimal string when it can be short enough, link to golden file otherwise).
5. A description of the bug.

The document is in reverse chronological order, with the bugs that affect the most recent zstd decompressor versions listed first.


No sequence using the 2-byte format
------------------------------------------------

**Last affected version**: v1.5.5

**Affected decompressor component(s)**: Library & CLI

**Produced by the reference compressor**: No

**Example Frame**: see zstd/tests/golden-decompression/zeroSeq_2B.zst

The zstd decoder incorrectly expected FSE tables when a block contained 0 sequences
and the value 0 was encoded using the 2-byte format.
Instead, it should immediately end the sequence section and move on to the next block.

This situation was never generated by the reference compressor,
because representing 0 sequences with the 2-byte format is inefficient
(the 1-byte format is always used in this case).

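To make the two encodings of zero concrete, here is a minimal sketch of the `Number_of_Sequences` decoding rule from the zstd format specification. The function name is illustrative and is not part of the zstd API. Both `00` and `80 00` decode to 0 sequences, and in both cases a conforming decoder should end the sequences section right after the count.

```python
def read_sequence_count(section: bytes):
    """Return (number_of_sequences, header_bytes_used) for a sequences section."""
    if not section:
        raise ValueError("empty sequences section")
    byte0 = section[0]
    if byte0 < 128:
        # 1-byte format: the value is stored directly.
        return byte0, 1
    if byte0 < 255:
        # 2-byte format: it overlaps the 1-byte range, so it can also encode 0.
        if len(section) < 2:
            raise ValueError("truncated 2-byte sequence count")
        return ((byte0 - 128) << 8) + section[1], 2
    # 3-byte format, selected by byte0 == 255.
    if len(section) < 3:
        raise ValueError("truncated 3-byte sequence count")
    return section[1] + (section[2] << 8) + 0x7F00, 3


for raw in (bytes([0x00]), bytes([0x80, 0x00])):
    count, used = read_sequence_count(raw)
    # With count == 0, no mode byte and no FSE tables follow the count.
    print(raw.hex(), "->", count, "sequences,", used, "header byte(s)")
```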

Compressed block with a size of exactly 128 KB
------------------------------------------------

**Last affected version**: v1.5.2

**Affected decompressor component(s)**: Library & CLI

**Produced by the reference compressor**: No

**Example Frame**: see zstd/tests/golden-decompression/block-128k.zst

The zstd decoder incorrectly rejected blocks of type `Compressed_Block` when their size was exactly 128 KB.
Note that `128 KB - 1` was accepted, and `128 KB + 1` is forbidden by the spec.

This type of block was never generated by the reference compressor.

These blocks were disallowed by the spec up until spec version 0.3.2, when the following restriction was lifted by [PR#1689](https://github.com/facebook/zstd/pull/1689):

> A Compressed_Block has the extra restriction that Block_Size is always strictly less than the decompressed size. If this condition cannot be respected, the block must be sent uncompressed instead (Raw_Block).

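The boundary can be illustrated with a small sketch of the 3-byte `Block_Header` layout from the zstd format specification: bit 0 is `Last_Block`, bits 1-2 are `Block_Type`, and the remaining 21 bits are `Block_Size`. The names below are illustrative and not part of the zstd API, and the size check is a simplification that ignores `Window_Size`, which can lower the limit further.

```python
BLOCK_SIZE_MAX = 128 * 1024  # 131072 bytes; assumes Window_Size >= 128 KB
BLOCK_TYPES = ("Raw_Block", "RLE_Block", "Compressed_Block", "Reserved")


def parse_block_header(header: bytes):
    """Split a 3-byte little-endian Block_Header into its three fields."""
    if len(header) != 3:
        raise ValueError("Block_Header is exactly 3 bytes")
    value = int.from_bytes(header, "little")
    last_block = bool(value & 1)
    block_type = BLOCK_TYPES[(value >> 1) & 3]
    block_size = value >> 3
    return last_block, block_type, block_size


def block_size_is_allowed(block_size: int) -> bool:
    # The bound is inclusive: exactly 128 KB is valid, 128 KB + 1 is not.
    return block_size <= BLOCK_SIZE_MAX


# Header of a last Compressed_Block whose Block_Size is exactly 128 KB.
header = ((BLOCK_SIZE_MAX << 3) | (2 << 1) | 1).to_bytes(3, "little")
print(header.hex(), parse_block_header(header))
```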

Compressed block with 0 literals and 0 sequences
------------------------------------------------

**Last affected version**: v1.5.2

**Affected decompressor component(s)**: Library & CLI

**Produced by the reference compressor**: No

**Example Frame**: `28b5 2ffd 2000 1500 0000 00`

The zstd decoder incorrectly rejected blocks of type `Compressed_Block` that encoded literals as `Raw_Literals_Block` with no literals and had no sequences.

This type of block was never generated by the reference compressor.

Additionally, these blocks were disallowed by the spec up until spec version 0.3.2, when the following restriction was lifted by [PR#1689](https://github.com/facebook/zstd/pull/1689):

> A Compressed_Block has the extra restriction that Block_Size is always strictly less than the decompressed size. If this condition cannot be respected, the block must be sent uncompressed instead (Raw_Block).

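The example frame for this entry is short enough to decode by hand. The sketch below walks its fields using the layouts from the zstd format specification. It is not a general parser: it assumes the exact shape of this frame (a `Frame_Header_Descriptor` of `0x20`, a 1-byte `Frame_Content_Size`, and a single block with a 1-byte raw literals header), and the function name is illustrative.

```python
def describe_empty_block_frame(frame: bytes) -> dict:
    """Decode the fields of a frame laid out like the example for this entry."""
    if frame[4] != 0x20:
        raise ValueError("sketch only handles Frame_Header_Descriptor 0x20")
    block_header = int.from_bytes(frame[6:9], "little")
    block_size = block_header >> 3
    block = frame[9:9 + block_size]

    literals_header = block[0]
    literals_type = literals_header & 3  # 0 means Raw_Literals_Block
    if literals_type != 0 or literals_header & 0b100:
        raise ValueError("sketch only handles a 1-byte raw literals header")
    literals_size = literals_header >> 3  # Regenerated_Size, 5 bits
    sequence_count_byte = block[1 + literals_size]

    return {
        "magic": int.from_bytes(frame[0:4], "little"),
        "frame_content_size": frame[5],
        "last_block": block_header & 1,
        "block_type": (block_header >> 1) & 3,  # 2 means Compressed_Block
        "block_size": block_size,
        "literals_size": literals_size,
        "sequence_count_byte": sequence_count_byte,  # 0 means no sequences
    }


fields = describe_empty_block_frame(bytes.fromhex("28b5 2ffd 2000 1500 0000 00"))
for name, value in fields.items():
    print(name, "=", value)
```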

First block is RLE block
------------------------

**Last affected version**: v1.4.3

**Affected decompressor component(s)**: CLI only

**Produced by the reference compressor**: No

**Example Frame**: `28b5 2ffd a001 0002 0002 0010 000b 0000 00`

The zstd CLI decompressor rejected frames in which the first block was an RLE block whose `Block_Size` is 131072 and the frame contained more than one block.
This only affected the zstd CLI, and not the library.

The example is an RLE block with 131072 bytes, followed by a second RLE block with 1 byte.

The compressor currently works around this limitation by explicitly avoiding producing RLE blocks as the first
block.

https://github.com/facebook/zstd/blob/8814aa5bfa74f05a86e55e9d508da177a893ceeb/lib/compress/zstd_compress.c#L3527-L3535

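To check what the example frame contains, the following sketch decodes frames made only of Raw and RLE blocks, following the frame and block header layouts in the zstd format specification. It is a deliberately partial decoder with an illustrative name: it does not handle `Compressed_Block`, and it skips the header fields and ignores any content checksum instead of validating them.

```python
def decode_raw_rle_frame(frame: bytes) -> bytes:
    """Decode a zstd frame that contains only Raw and RLE blocks."""
    if frame[:4] != bytes.fromhex("28b52ffd"):
        raise ValueError("missing zstd magic number")
    descriptor = frame[4]
    fcs_flag = descriptor >> 6
    single_segment = (descriptor >> 5) & 1
    dict_id_flag = descriptor & 3

    pos = 5
    if not single_segment:
        pos += 1  # Window_Descriptor
    pos += (0, 1, 2, 4)[dict_id_flag]  # Dictionary_ID
    pos += (1 if single_segment else 0, 2, 4, 8)[fcs_flag]  # Frame_Content_Size

    out = bytearray()
    while True:
        if pos + 3 > len(frame):
            raise ValueError("truncated block header")
        header = int.from_bytes(frame[pos:pos + 3], "little")
        pos += 3
        last_block = header & 1
        block_type = (header >> 1) & 3
        block_size = header >> 3
        if block_type == 0:  # Raw_Block: block_size bytes copied as-is
            out += frame[pos:pos + block_size]
            pos += block_size
        elif block_type == 1:  # RLE_Block: one byte repeated block_size times
            out += frame[pos:pos + 1] * block_size
            pos += 1
        else:
            raise NotImplementedError("only Raw and RLE blocks are handled")
        if last_block:
            return bytes(out)


example = bytes.fromhex("28b5 2ffd a001 0002 0002 0010 000b 0000 00")
print(len(decode_raw_rle_frame(example)))  # 131072 + 1 bytes
```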

Tiny FSE Table & Block
----------------------

**Last affected version**: v1.3.4

**Affected decompressor component(s)**: Library & CLI

**Produced by the reference compressor**: Possibly until version v1.3.4, but probably never

**Example Frame**: `28b5 2ffd 2027 c500 0080 f3f1 f0ec ebc6 c5c7 f09d 4300 0000 e0e0 0658 0100 603e 52`

The zstd library rejected blocks of type `Compressed_Block` in which the last table of type `FSE_Compressed_Mode` started less than 4 bytes from the end of the block.

In more depth, let `Last_Table_Offset` be the offset in the compressed block (excluding the header) at which
the last table with type `FSE_Compressed_Mode` starts. If `Block_Content - Last_Table_Offset < 4`, where
`Block_Content` is the size of the block content in bytes, then the buggy zstd decompressor would reject
the block. This occurs when the last serialized table is 2 bytes and the bitstream size is 1 byte.

For example:
* There is 1 sequence in the block
* `Literals_Lengths_Mode` is `FSE_Compressed_Mode` & the serialized table size is 2 bytes
* `Offsets_Mode` is `Predefined_Mode`
* `Match_Lengths_Mode` is `Predefined_Mode`
* The bitstream is 1 byte. E.g. there is only one sequence and it fits in 1 byte.

The total `Block_Content` is `5` bytes, and `Last_Table_Offset` is `2`.

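The rejection condition is plain arithmetic on those two numbers. As a minimal sketch, with a hypothetical helper name that does not exist in zstd:

```python
def rejected_by_buggy_decoder(block_content_size: int, last_table_offset: int) -> bool:
    """Apply the condition above: fewer than 4 bytes from the last table to the block end."""
    return block_content_size - last_table_offset < 4


# The example above: 5 - 2 = 3, which is below 4, so the block was rejected.
print(rejected_by_buggy_decoder(5, 2))
# One more byte after the table start is enough to avoid the bug: 6 - 2 = 4.
print(rejected_by_buggy_decoder(6, 2))
```
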
See the compressor workaround code:

https://github.com/facebook/zstd/blob/8814aa5bfa74f05a86e55e9d508da177a893ceeb/lib/compress/zstd_compress.c#L2667-L2682


Magicless format
----------------------

**Last affected version**: v1.5.5

**Affected decompressor component(s)**: Library

**Produced by the reference compressor**: Yes (example: https://gist.github.com/embg/9940726094f4cf2cef162cffe9319232)

**Example Frame**: `27 b5 2f fd 00 03 19 00 00 66 6f 6f 3f ba c4 59`

v1.5.6 fixes several bugs in which the magicless-format decoder rejects valid frames.
These include but are not limited to:
* Valid frames that happen to begin with a legacy magic number (little-endian)
* Valid frames that happen to begin with a skippable magic number (little-endian)

If you are affected by this issue and cannot update to v1.5.6 or later, there is a
workaround to recover affected data. Simply prepend the ZSTD magic number
`0xFD2FB528` (little-endian) to your data and decompress using the standard-format
decoder.
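
The prepend step amounts to a four-byte prefix. A minimal sketch follows, with an illustrative helper name that is not part of zstd. It only rebuilds the standard-format bytes; decompressing the result still requires a standard-format zstd decoder, which is not shown here.

```python
ZSTD_MAGIC_NUMBER = 0xFD2FB528


def add_magic_number(magicless_frame: bytes) -> bytes:
    """Turn a magicless frame into a standard-format frame."""
    # Little-endian encoding of the magic number gives the bytes 28 b5 2f fd.
    return ZSTD_MAGIC_NUMBER.to_bytes(4, "little") + magicless_frame


# The example frame for this entry, which begins with a legacy magic number.
magicless = bytes.fromhex("27 b5 2f fd 00 03 19 00 00 66 6f 6f 3f ba c4 59")
standard = add_magic_number(magicless)
print(standard.hex())
```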