<p align="center"><img src="https://raw.githubusercontent.com/facebook/zstd/dev/doc/images/zstd_logo86.png" alt="Zstandard"></p>

__Zstandard__, or `zstd` for short, is a fast lossless compression algorithm,
targeting real-time compression scenarios at zlib-level and better compression ratios.
It's backed by a very fast entropy stage, provided by the [Huff0 and FSE library](https://github.com/Cyan4973/FiniteStateEntropy).

Zstandard's format is stable and documented in [RFC8878](https://datatracker.ietf.org/doc/html/rfc8878). Multiple independent implementations are already available.
This repository represents the reference implementation, provided as an open-source dual [BSD](LICENSE) OR [GPLv2](COPYING) licensed **C** library,
and a command line utility producing and decoding `.zst`, `.gz`, `.xz` and `.lz4` files.
Should your project require another programming language,
a list of known ports and bindings is provided on the [Zstandard homepage](https://facebook.github.io/zstd/#other-languages).

**Development branch status:**

[![Build Status][travisDevBadge]][travisLink]
[![Build status][CircleDevBadge]][CircleLink]
[![Build status][CirrusDevBadge]][CirrusLink]
[![Fuzzing Status][OSSFuzzBadge]][OSSFuzzLink]

[travisDevBadge]: https://api.travis-ci.com/facebook/zstd.svg?branch=dev "Continuous Integration test suite"
[travisLink]: https://travis-ci.com/facebook/zstd
[CircleDevBadge]: https://circleci.com/gh/facebook/zstd/tree/dev.svg?style=shield "Short test suite"
[CircleLink]: https://circleci.com/gh/facebook/zstd
[CirrusDevBadge]: https://api.cirrus-ci.com/github/facebook/zstd.svg?branch=dev
[CirrusLink]: https://cirrus-ci.com/github/facebook/zstd
[OSSFuzzBadge]: https://oss-fuzz-build-logs.storage.googleapis.com/badges/zstd.svg
[OSSFuzzLink]: https://bugs.chromium.org/p/oss-fuzz/issues/list?sort=-opened&can=1&q=proj:zstd

## Benchmarks

For reference, several fast compression algorithms were tested and compared
on a desktop running Ubuntu 20.04 (`Linux 5.11.0-41-generic`),
with a Core i7-9700K CPU @ 4.9GHz,
using [lzbench], an open-source in-memory benchmark by @inikep
compiled with [gcc] 9.3.0,
on the [Silesia compression corpus].

[lzbench]: https://github.com/inikep/lzbench
[Silesia compression corpus]: https://sun.aei.polsl.pl//~sdeor/index.php?page=silesia
[gcc]: https://gcc.gnu.org/

| Compressor name         | Ratio | Compression| Decompress.|
| ---------------         | ------| -----------| ---------- |
| **zstd 1.5.1 -1**       | 2.887 |   530 MB/s |  1700 MB/s |
| [zlib] 1.2.11 -1        | 2.743 |    95 MB/s |   400 MB/s |
| brotli 1.0.9 -0         | 2.702 |   395 MB/s |   450 MB/s |
| **zstd 1.5.1 --fast=1** | 2.437 |   600 MB/s |  2150 MB/s |
| **zstd 1.5.1 --fast=3** | 2.239 |   670 MB/s |  2250 MB/s |
| quicklz 1.5.0 -1        | 2.238 |   540 MB/s |   760 MB/s |
| **zstd 1.5.1 --fast=4** | 2.148 |   710 MB/s |  2300 MB/s |
| lzo1x 2.10 -1           | 2.106 |   660 MB/s |   845 MB/s |
| [lz4] 1.9.3             | 2.101 |   740 MB/s |  4500 MB/s |
| lzf 3.6 -1              | 2.077 |   410 MB/s |   830 MB/s |
| snappy 1.1.9            | 2.073 |   550 MB/s |  1750 MB/s |

[zlib]: https://www.zlib.net/
[lz4]: https://lz4.github.io/lz4/

The negative compression levels, specified with `--fast=#`,
offer faster compression and decompression speed
at the cost of compression ratio (compared to level 1).

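To make the table figures concrete, the sketch below converts a ratio and a speed into an expected size and duration. It assumes that `Ratio` is the original size divided by the compressed size; the helper names `size_after` and `seconds_for` are hypothetical and exist only for this illustration.

```bash
# size_after ORIGINAL_MB RATIO: estimated compressed size in MB (one decimal).
# Assumes ratio = original size / compressed size.
size_after() {
    LC_ALL=C awk -v orig="$1" -v ratio="$2" 'BEGIN { printf "%.1f\n", orig / ratio }'
}

# seconds_for SIZE_MB SPEED_MBPS: time needed to process SIZE_MB (two decimals).
seconds_for() {
    LC_ALL=C awk -v size="$1" -v speed="$2" 'BEGIN { printf "%.2f\n", size / speed }'
}

size_after 100 2.887    # zstd -1 row: prints 34.6
size_after 100 2.101    # lz4 row:     prints 47.6
seconds_for 100 530     # zstd -1 compression speed: prints 0.19
```

Real results depend on the data and the machine, so these numbers only restate the benchmark above in different units.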
Zstd can also offer stronger compression ratios at the cost of compression speed.
Speed vs Compression trade-off is configurable by small increments.
Decompression speed is preserved and remains roughly the same at all settings,
a property shared by most LZ compression algorithms, such as [zlib] or lzma.

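One way to observe this trade-off on your own data is to compress the same file at several settings and compare the output sizes. The following is a minimal sketch: the function name `sweep_levels` and the choice of levels are arbitrary, while `-c` (write to standard output), `-q` (quiet), `-#` and `--fast=#` are regular `zstd` command line options.

```bash
# sweep_levels FILE: print the compressed size of FILE at a few settings.
sweep_levels() {
    file=$1
    for level in --fast=3 -1 -3 -19; do
        # -c streams the compressed data to stdout, wc -c counts the bytes
        size=$(zstd "$level" -q -c "$file" | wc -c)
        printf '%-9s %s bytes\n' "$level" "$((size))"
    done
}
```

Calling `sweep_levels some_file` prints one line per setting. The actual sizes depend entirely on the input.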
The following tests were run
on a server running Linux Debian (`Linux version 4.14.0-3-amd64`)
with a Core i7-6700K CPU @ 4.0GHz,
using [lzbench], an open-source in-memory benchmark by @inikep
compiled with [gcc] 7.3.0,
on the [Silesia compression corpus].

Compression Speed vs Ratio | Decompression Speed
---------------------------|--------------------
![Compression Speed vs Ratio](doc/images/CSpeed2.png "Compression Speed vs Ratio") | ![Decompression Speed](doc/images/DSpeed3.png "Decompression Speed")

A few other algorithms can produce higher compression ratios at slower speeds, falling outside of the graph.
For a larger picture including slow modes, [click on this link](doc/images/DCspeed5.png).


## The case for Small Data compression

Previous charts provide results applicable to typical file and stream scenarios (several MB). Small data comes with different perspectives.

The smaller the amount of data to compress, the more difficult it is to compress. This problem is common to all compression algorithms. The reason is that compression algorithms learn from past data how to compress future data, but at the beginning of a new data set, there is no "past" to build upon.

To solve this situation, Zstd offers a __training mode__, which can be used to tune the algorithm for a selected type of data.
Training Zstandard is achieved by providing it with a few samples (one file per sample). The result of this training is stored in a file called "dictionary", which must be loaded before compression and decompression.
Using this dictionary, the compression ratio achievable on small data improves dramatically.

The following example uses the `github-users` [sample set](https://github.com/facebook/zstd/releases/tag/v1.1.3), created from [github public API](https://developer.github.com/v3/users/#get-all-users).
It consists of roughly 10K records weighing about 1KB each.

Compression Ratio | Compression Speed | Decompression Speed
------------------|-------------------|--------------------
![Compression Ratio](doc/images/dict-cr.png "Compression Ratio") | ![Compression Speed](doc/images/dict-cs.png "Compression Speed") | ![Decompression Speed](doc/images/dict-ds.png "Decompression Speed")


These compression gains are achieved while simultaneously providing _faster_ compression and decompression speeds.

Training works if there is some correlation in a family of small data samples. The more data-specific a dictionary is, the more efficient it is (there is no _universal dictionary_).
Hence, deploying one dictionary per type of data will provide the greatest benefits.
Dictionary gains are mostly effective in the first few KB. Then, the compression algorithm will gradually use previously decoded content to better compress the rest of the file.

### Dictionary compression How To:

1. Create the dictionary

   `zstd --train FullPathToTrainingSet/* -o dictionaryName`

2. Compress with dictionary

   `zstd -D dictionaryName FILE`

3. Decompress with dictionary

   `zstd -D dictionaryName --decompress FILE.zst`


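The three steps above can be chained into a single round-trip check. This is only a sketch: the function name `dict_roundtrip` and the `.out` suffix are hypothetical, and `-f` (overwrite existing output) and `-o` (explicit output name) are added so the function can be re-run safely.

```bash
# dict_roundtrip SAMPLE_DIR FILE DICT
# Train DICT from the files in SAMPLE_DIR, compress FILE with it,
# decompress the result, and verify that the content is unchanged.
dict_roundtrip() {
    samples=$1
    file=$2
    dict=$3
    zstd --train "$samples"/* -o "$dict" || return 1
    zstd -D "$dict" -f "$file" -o "$file.zst" || return 1
    zstd -D "$dict" --decompress -f "$file.zst" -o "$file.out" || return 1
    # cmp -s is silent and returns 0 only when both files are identical
    cmp -s "$file" "$file.out"
}
```

Note that training needs a reasonably large and varied sample set; with too few or too small samples, the `--train` step may fail and the function returns early.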
## Build instructions

`make` is the officially maintained build system of this project.
All other build systems are "compatible" and 3rd-party maintained;
they may feature small differences in advanced options.
When your system allows it, prefer using `make` to build `zstd` and `libzstd`.

### Makefile

If your system is compatible with standard `make` (or `gmake`),
invoking `make` in the root directory will generate the `zstd` cli in the root directory.
It will also create `libzstd` in `lib/`.

Other available options include:
- `make install` : create and install zstd cli, library and man pages
- `make check` : create and run `zstd`, test its behavior on local platform

The `Makefile` follows the [GNU Standard Makefile conventions](https://www.gnu.org/prep/standards/html_node/Makefile-Conventions.html),
allowing staged install, standard flags, directory variables and command variables.

For advanced use cases, specialized compilation flags which control binary generation
are documented in [`lib/README.md`](lib/README.md#modular-build) for the `libzstd` library
and in [`programs/README.md`](programs/README.md#compilation-variables) for the `zstd` CLI.

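As an illustration of a staged install, the sketch below builds, runs the smoke tests, then installs into a staging directory instead of the live system. `DESTDIR` and `prefix` are the variable names defined by the GNU conventions referenced above; the function name `stage_install` and the default prefix are assumptions made for this example.

```bash
# stage_install STAGE_DIR [PREFIX]
# Build, test, then install under STAGE_DIR/PREFIX without touching
# the real system directories. Run from the root of the source tree.
stage_install() {
    stage=$1
    prefix=${2:-/usr/local}
    make &&
    make check &&
    make install DESTDIR="$stage" prefix="$prefix"
}
```

For example, `stage_install /tmp/zstd-stage /usr` would be expected to place the files under `/tmp/zstd-stage/usr`, which is convenient when preparing a package.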
### cmake

A `cmake` project generator is provided within `build/cmake`.
It can generate Makefiles or other build scripts
to create `zstd` binary, and `libzstd` dynamic and static libraries.

By default, `CMAKE_BUILD_TYPE` is set to `Release`.

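A typical out-of-source build could look like the sketch below. The function name `cmake_build` and the default directory name `build-cmake` are hypothetical; the `-S` and `-B` options require CMake 3.13 or newer, and the function is meant to be run from the root of the repository.

```bash
# cmake_build [BUILD_DIR] [BUILD_TYPE]
# Configure the project from build/cmake into BUILD_DIR, then build it
# with whichever generator cmake selected.
cmake_build() {
    build_dir=${1:-build-cmake}
    build_type=${2:-Release}
    cmake -S build/cmake -B "$build_dir" -DCMAKE_BUILD_TYPE="$build_type" &&
    cmake --build "$build_dir"
}
```

Passing `Debug` as the second argument overrides the default build type, for example `cmake_build build-cmake-debug Debug`.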
#### Support for Fat (Universal2) Output

`zstd` can be built and installed with support for both Apple Silicon (M1/M2) and Intel by using CMake's Universal2 support.
To perform a Fat/Universal2 build and install, use the following commands:

```bash
cmake -B build-cmake-debug -S build/cmake -G Ninja -DCMAKE_OSX_ARCHITECTURES="x86_64;x86_64h;arm64"
cd build-cmake-debug
ninja
sudo ninja install
```

### Meson

A Meson project is provided within [`build/meson`](build/meson). Follow
build instructions in that directory.

You can also take a look at the [`.travis.yml`](.travis.yml) file for an
example of how Meson is used to build this project.

Note that the default build type is **release**.

### VCPKG
You can build and install zstd with the [vcpkg](https://github.com/Microsoft/vcpkg/) dependency manager:

    git clone https://github.com/Microsoft/vcpkg.git
    cd vcpkg
    ./bootstrap-vcpkg.sh
    ./vcpkg integrate install
    ./vcpkg install zstd

The zstd port in vcpkg is kept up to date by Microsoft team members and community contributors.
If the version is out of date, please [create an issue or pull request](https://github.com/Microsoft/vcpkg) on the vcpkg repository.

### Visual Studio (Windows)

Going into the `build` directory, you will find additional possibilities:
- Projects for Visual Studio 2005, 2008 and 2010.
  + VS2010 project is compatible with VS2012, VS2013, VS2015 and VS2017.
- Automated build scripts for Visual compiler by [@KrzysFR](https://github.com/KrzysFR), in `build/VS_scripts`,
  which will build `zstd` cli and `libzstd` library without any need to open Visual Studio solution.

### Buck

You can build the zstd binary via buck by executing: `buck build programs:zstd` from the root of the repo.
The output binary will be in `buck-out/gen/programs/`.

### Bazel

You can easily integrate zstd into your Bazel project by using the module hosted on the [Bazel Central Repository](https://registry.bazel.build/modules/zstd).

## Testing

You can run quick local smoke tests by running `make check`.
If you can't use `make`, execute the `playTest.sh` script from the `src/tests` directory.
Two env variables, `$ZSTD_BIN` and `$DATAGEN_BIN`, are needed for the test script to locate the `zstd` and `datagen` binaries.
For information on CI testing, please refer to `TESTING.md`.

## Status

Zstandard is currently deployed within Facebook and many other large cloud infrastructures.
It is run continuously to compress large amounts of data in multiple formats and use cases.
Zstandard is considered safe for production environments.

## License

Zstandard is dual-licensed under [BSD](LICENSE) OR [GPLv2](COPYING).

## Contributing

The `dev` branch is the one where all contributions are merged before reaching `release`.
If you plan to propose a patch, please commit into the `dev` branch, or its own feature branch.
Direct commits to `release` are not permitted.
For more information, please read [CONTRIBUTING](CONTRIBUTING.md).