Decompressor Errata
===================

This document captures known decompressor bugs, where the decompressor rejects a valid zstd frame.
Each entry will contain:
1. The last affected decompressor versions.
2. The decompressor components affected.
3. Whether the compressed frame could ever be produced by the reference compressor.
4. An example frame (hexadecimal string when it can be short enough, link to golden file otherwise)
5. A description of the bug.

The document is in reverse chronological order, with the bugs that affect the most recent zstd decompressor versions listed first.


No sequence using the 2-bytes format
------------------------------------------------

**Last affected version**: v1.5.5

**Affected decompressor component(s)**: Library & CLI

**Produced by the reference compressor**: No

**Example Frame**: see zstd/tests/golden-decompression/zeroSeq_2B.zst

The zstd decoder incorrectly expects FSE tables when there are 0 sequences present in the block
if the value 0 is encoded using the 2-bytes format.
Instead, it should immediately end the sequence section, and move on to next block.

This situation was never generated by the reference compressor,
because representing 0 sequences with the 2-bytes format is inefficient
(the 1-byte format is always used in this case).
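
As a rough illustration of the expected behavior, the sketch below parses the `Number_of_Sequences` field using the 1-, 2-, and 3-byte encodings from the zstd format specification. The function names are hypothetical and this is not the library's code. It only shows that a zero written in the 2-byte form (`80 00`) should end the sequence section just like the 1-byte zero.

```python
def parse_number_of_sequences(section: bytes) -> tuple[int, int]:
    """Return (number_of_sequences, header_size_in_bytes).

    Hypothetical helper written from the Number_of_Sequences encoding
    in the zstd format specification; not taken from the zstd sources.
    """
    if not section:
        raise ValueError("empty sequences section")
    byte0 = section[0]
    if byte0 < 128:
        # 1-byte format: the value is the byte itself (0..127).
        return byte0, 1
    if byte0 < 255:
        # 2-byte format: ((byte0 - 128) << 8) + byte1.
        if len(section) < 2:
            raise ValueError("truncated 2-byte sequence count")
        return ((byte0 - 128) << 8) + section[1], 2
    # 3-byte format: byte1 + (byte2 << 8) + 0x7F00.
    if len(section) < 3:
        raise ValueError("truncated 3-byte sequence count")
    return section[1] + (section[2] << 8) + 0x7F00, 3


def sequences_section_ends_here(section: bytes) -> bool:
    """True when the section holds 0 sequences, whichever format encoded it."""
    count, _ = parse_number_of_sequences(section)
    return count == 0
```

A decoder following this logic would check the decoded count, not the number of bytes used to encode it, before looking for FSE table descriptions.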


Compressed block with a size of exactly 128 KB
------------------------------------------------

**Last affected version**: v1.5.2

**Affected decompressor component(s)**: Library & CLI

**Produced by the reference compressor**: No

**Example Frame**: see zstd/tests/golden-decompression/block-128k.zst

The zstd decoder incorrectly rejected blocks of type `Compressed_Block` when their size was exactly 128 KB.
Note that `128 KB - 1` was accepted, and `128 KB + 1` is forbidden by the spec.

This type of block was never generated by the reference compressor.

These blocks used to be disallowed by the spec up until spec version 0.3.2 when the restriction was lifted by [PR#1689](https://github.com/facebook/zstd/pull/1689).

> A Compressed_Block has the extra restriction that Block_Size is always strictly less than the decompressed size. If this condition cannot be respected, the block must be sent uncompressed instead (Raw_Block).
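
The boundary in this entry can be written as an inclusive upper bound. The following is a minimal sketch with hypothetical names, not the decoder's actual check. It assumes a window of at least 128 KB, since the format specification caps the block size at the smaller of the window size and 128 KB.

```python
BLOCK_SIZE_MAX = 128 * 1024  # 128 KB = 131072 bytes


def within_block_size_limit(block_size: int) -> bool:
    """Hypothetical check: a block of exactly 128 KB is still valid."""
    # The bound is inclusive; only sizes above 128 KB are forbidden.
    return block_size <= BLOCK_SIZE_MAX


for size in (BLOCK_SIZE_MAX - 1, BLOCK_SIZE_MAX, BLOCK_SIZE_MAX + 1):
    print(size, within_block_size_limit(size))
```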


Compressed block with 0 literals and 0 sequences
------------------------------------------------

**Last affected version**: v1.5.2

**Affected decompressor component(s)**: Library & CLI

**Produced by the reference compressor**: No

**Example Frame**: `28b5 2ffd 2000 1500 0000 00`

The zstd decoder incorrectly rejected blocks of type `Compressed_Block` that encode literals as `Raw_Literals_Block` with no literals, and have no sequences.

This type of block was never generated by the reference compressor.

Additionally, these blocks were disallowed by the spec up until spec version 0.3.2 when the restriction was lifted by [PR#1689](https://github.com/facebook/zstd/pull/1689).

> A Compressed_Block has the extra restriction that Block_Size is always strictly less than the decompressed size. If this condition cannot be respected, the block must be sent uncompressed instead (Raw_Block).


First block is RLE block
------------------------

**Last affected version**: v1.4.3

**Affected decompressor component(s)**: CLI only

**Produced by the reference compressor**: No

**Example Frame**: `28b5 2ffd a001 0002 0002 0010 000b 0000 00`

The zstd CLI decompressor rejected cases where the first block was an RLE block whose `Block_Size` is 131072, and the frame contains more than one block.
This only affected the zstd CLI, and not the library.

The example is an RLE block with 131072 bytes, followed by a second RLE block with 1 byte.

The compressor currently works around this limitation by explicitly avoiding producing RLE blocks as the first
block.

https://github.com/facebook/zstd/blob/8814aa5bfa74f05a86e55e9d508da177a893ceeb/lib/compress/zstd_compress.c#L3527-L3535


Tiny FSE Table & Block
----------------------

**Last affected version**: v1.3.4

**Affected decompressor component(s)**: Library & CLI

**Produced by the reference compressor**: Possibly until version v1.3.4, but probably never

**Example Frame**: `28b5 2ffd 2027 c500 0080 f3f1 f0ec ebc6 c5c7 f09d 4300 0000 e0e0 0658 0100 603e 52`

The zstd library rejected blocks of type `Compressed_Block` whose offset of the last table with type `FSE_Compressed_Mode` was less than 4 bytes from the end of the block.

In more depth, let `Last_Table_Offset` be the offset in the compressed block (excluding the header) that
the last table with type `FSE_Compressed_Mode` started. If `Block_Content - Last_Table_Offset < 4` then
the buggy zstd decompressor would reject the block. This occurs when the last serialized table is 2 bytes
and the bitstream size is 1 byte.
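
The rejection condition can be expressed as a one-line predicate. This is only a sketch of the inequality stated above, with hypothetical names, and it does not reproduce the decoder's internal logic.

```python
def rejected_by_buggy_decoder(block_content_size: int, last_table_offset: int) -> bool:
    """Hypothetical predicate for the condition in this entry.

    block_content_size: size of the compressed block, excluding its header.
    last_table_offset:  offset, within the block content, at which the last
                        FSE_Compressed_Mode table starts.
    """
    # Fewer than 4 bytes between the last table's start and the end of the
    # block triggered the rejection in affected versions.
    return block_content_size - last_table_offset < 4
```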

For example:
* There is 1 sequence in the block
* `Literals_Lengths_Mode` is `FSE_Compressed_Mode` & the serialized table size is 2 bytes
* `Offsets_Mode` is `Predefined_Mode`
* `Match_Lengths_Mode` is `Predefined_Mode`
* The bitstream is 1 byte. E.g. there is only one sequence and it fits in 1 byte.

The total `Block_Content` is `5` bytes, and `Last_Table_Offset` is `2`.

See the compressor workaround code:

https://github.com/facebook/zstd/blob/8814aa5bfa74f05a86e55e9d508da177a893ceeb/lib/compress/zstd_compress.c#L2667-L2682

Magicless format
----------------------

**Last affected version**: v1.5.5

**Affected decompressor component(s)**: Library

**Produced by the reference compressor**: Yes (example: https://gist.github.com/embg/9940726094f4cf2cef162cffe9319232)

**Example Frame**: `27 b5 2f fd 00 03 19 00 00 66 6f 6f 3f ba c4 59`

v1.5.6 fixes several bugs in which the magicless-format decoder rejects valid frames.
These include but are not limited to:
* Valid frames that happen to begin with a legacy magic number (little-endian)
* Valid frames that happen to begin with a skippable magic number (little-endian)

If you are affected by this issue and cannot update to v1.5.6 or later, there is a
workaround to recover affected data.
Simply prepend the ZSTD magic number
`0xFD2FB528` (little-endian) to your data and decompress using the standard-format
decoder.
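
A minimal sketch of the workaround, using only the Python standard library. The helper name `add_zstd_magic` is hypothetical. The code only builds the standard-format byte string and leaves the decompression step to whichever standard-format decoder is available.

```python
import struct

ZSTD_MAGIC_NUMBER = 0xFD2FB528


def add_zstd_magic(magicless_frame: bytes) -> bytes:
    """Prepend the 4-byte zstd magic number, serialized little-endian."""
    # "<I" packs an unsigned 32-bit integer in little-endian byte order,
    # giving the bytes 28 b5 2f fd.
    return struct.pack("<I", ZSTD_MAGIC_NUMBER) + magicless_frame


# The example frame from this entry, as raw bytes.
example = bytes.fromhex("27 b5 2f fd 00 03 19 00 00 66 6f 6f 3f ba c4 59")
standard_frame = add_zstd_magic(example)
print(standard_frame.hex(" "))
```

The resulting bytes can then be written to a file and passed to a standard-format decoder such as `zstd -d`.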