Description

Clzip is a C language version of lzip, fully compatible with lzip-1.4 or
newer. As clzip is written in C, it may be easier to integrate into
applications like package managers, embedded devices, or systems lacking
a C++ compiler.

Lzip is a lossless data compressor with a user interface similar to that
of gzip or bzip2. Lzip can compress about as fast as gzip (lzip -0), or
compress most files more than bzip2 (lzip -9). Decompression speed is
intermediate between gzip and bzip2. Lzip is better than gzip and bzip2
from a data recovery perspective.

The lzip file format is designed for data sharing and long-term
archiving, taking into account both data integrity and decoder
availability:

   * The lzip format provides very safe integrity checking and some data
     recovery means. The lziprecover program can repair bit-flip errors
     (one of the most common forms of data corruption) in lzip files,
     and provides data recovery capabilities, including error-checked
     merging of damaged copies of a file.

   * The lzip format is as simple as possible (but not simpler). The
     lzip manual provides the source code of a simple decompressor along
     with a detailed explanation of how it works, so that, with only the
     help of the lzip manual, a digital archaeologist could extract the
     data from a lzip file long after quantum computers eventually render
     LZMA obsolete.

   * Additionally, the lzip reference implementation is copylefted, which
     guarantees that it will remain free forever.
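
The integrity checking mentioned in the first point relies, according to
the lzip manual, on a CRC32 of the uncompressed data stored in each
member's trailer (the same CRC32 used by gzip). The following is a minimal
table-free sketch of that checksum; the function name crc32_update is
hypothetical and this is not clzip's own code.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, as in gzip and
   IEEE 802.3). Pass crc = 0 for the first chunk and the previous
   result for later chunks. Table-free, so slow; a production version
   would use a 256-entry lookup table. */
uint32_t crc32_update(uint32_t crc, const unsigned char *data, size_t len)
{
    crc = ~crc;                                   /* undo the final inversion */
    for (size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit)         /* one shift per input bit */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;                                  /* final inversion */
}
```

The standard check value applies: the CRC32 of the ASCII string
"123456789" is 0xCBF43926. A CRC32 detects every single-bit error, so a
bit flip in the data always changes the stored checksum.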

A nice feature of the lzip format is that a corrupt byte is easier to
repair the nearer it is to the beginning of the file. Therefore, with
the help of lziprecover, losing an entire archive just because of a
corrupt byte near the beginning is a thing of the past.

Clzip uses the same well-defined exit status values used by lzip and
bzip2, which makes it safer than compressors returning ambiguous warning
values (like gzip) when it is used as a back end for other programs like
tar or zutils.
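
As a sketch of how a calling program might interpret these values, the
following assumes the status meanings documented in the lzip manual: 0
for a normal exit, 1 for environmental problems (file not found, invalid
flags, I/O errors, etc.), 2 for a corrupt or invalid input file, and 3
for an internal consistency error (a bug). The helper names are
hypothetical; system() and the WIFEXITED/WEXITSTATUS macros are POSIX.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Meaning of the exit status used by lzip, clzip and bzip2
   (as assumed from the lzip manual). */
const char *lzip_status_meaning(int status)
{
    switch (status) {
    case 0:  return "normal exit";
    case 1:  return "environmental problem (file not found, invalid flags, I/O error)";
    case 2:  return "corrupt or invalid input file";
    case 3:  return "internal consistency error (bug)";
    default: return "unexpected exit status";
    }
}

/* Run a command through the shell (e.g. "clzip -t archive.tar.lz") and
   report what its exit status means. Returns the status, or -1 if the
   command could not be run or did not exit normally. */
int run_and_report(const char *command)
{
    int raw = system(command);
    if (raw == -1 || !WIFEXITED(raw)) {
        fprintf(stderr, "%s: did not run or was killed\n", command);
        return -1;
    }
    int status = WEXITSTATUS(raw);
    fprintf(stderr, "%s: %s\n", command, lzip_status_meaning(status));
    return status;
}
```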

Clzip will automatically use the smallest possible dictionary size for
each file without exceeding the given limit. Keep in mind that the
decompression memory requirement is affected at compression time by the
choice of dictionary size limit.
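
The "smallest possible dictionary size" is limited by how lzip stores it.
According to the lzip manual, the header's DS byte holds the base-2
logarithm of a base size (12 to 29) in bits 4-0 and, in bits 7-5, a
numerator from 0 to 7 of a fraction base/16 to subtract from it, so valid
sizes run from 4 KiB to 512 MiB. The sketch below assumes that encoding;
choose_dictionary_size and the other names are hypothetical, and clzip's
actual selection code may differ.

```c
#include <stdint.h>

enum { MIN_DICT_SIZE = 1 << 12 };        /* 4 KiB */

/* Number of bits needed to represent value. */
static int real_bits(uint32_t value)
{
    int bits = 0;
    while (value > 0) { value >>= 1; ++bits; }
    return bits;
}

/* Encode a size (must be 4 KiB..512 MiB) as a DS byte, rounding up to
   the nearest storable size. */
unsigned char encode_ds(uint32_t size)
{
    unsigned char ds = (unsigned char)real_bits(size - 1);  /* ceil(log2(size)) */
    if (size > MIN_DICT_SIZE) {
        uint32_t base = 1u << ds, fraction = base / 16;
        for (unsigned i = 7; i >= 1; --i)          /* largest fraction that still fits */
            if (base - i * fraction >= size) { ds |= (unsigned char)(i << 5); break; }
    }
    return ds;
}

/* Decode a DS byte back into a dictionary size in bytes. */
uint32_t decode_ds(unsigned char ds)
{
    uint32_t size = 1u << (ds & 0x1F);
    if (size > MIN_DICT_SIZE)
        size -= (size / 16) * ((ds >> 5) & 7);
    return size;
}

/* Hypothetical selection rule: never use a window larger than the file,
   never smaller than 4 KiB, then round up to a storable size (a limit
   that is not itself storable may therefore be exceeded slightly). */
uint32_t choose_dictionary_size(uint64_t file_size, uint32_t limit)
{
    uint32_t size = limit;                         /* assumed 4 KiB..512 MiB */
    if (file_size < size)
        size = file_size < MIN_DICT_SIZE ? MIN_DICT_SIZE : (uint32_t)file_size;
    return decode_ds(encode_ds(size));
}
```

For a 300000-byte file and an 8 MiB limit, this gives 327680 bytes
(512 KiB minus 6/16 of it), so decompression needs far less memory than
the limit alone would suggest.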

The amount of memory required for compression is about 1 or 2 times the
dictionary size limit (1 if the input file size is less than the
dictionary size limit, else 2) plus 9 times the dictionary size really
used. The option '-0' is special and only requires about 1.5 MiB at most.
The amount of memory required for decompression is about 46 kB larger
than the dictionary size really used.
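
These rules of thumb can be written down directly. A minimal sketch,
assuming sizes in bytes and reading "kB" as 1000 bytes (SI prefix); it
ignores option '-0', which has its own, smaller bound. The function names
are hypothetical.

```c
#include <stdint.h>

/* Approximate compression memory: 1 or 2 times the dictionary size
   limit, plus 9 times the dictionary size really used. */
uint64_t compress_memory_estimate(uint64_t input_size, uint64_t dict_limit,
                                  uint64_t dict_used)
{
    uint64_t factor = input_size < dict_limit ? 1 : 2;
    return factor * dict_limit + 9 * dict_used;
}

/* Approximate decompression memory: dictionary really used + ~46 kB. */
uint64_t decompress_memory_estimate(uint64_t dict_used)
{
    return dict_used + 46 * 1000;      /* kB assumed to mean 1000 bytes */
}
```

For example, compressing a large file with a 32 MiB dictionary needs about
2 x 32 + 9 x 32 = 352 MiB, while decompressing it needs only about 32 MiB
plus 46 kB.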

When compressing, clzip replaces every file given on the command line
with a compressed version of itself, with the name "original_name.lz".
When decompressing, clzip attempts to guess the name for the decompressed
file from that of the compressed file as follows:

filename.lz    becomes   filename
filename.tlz   becomes   filename.tar
anyothername   becomes   anyothername.out
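
The name mapping in the table above can be sketched as a suffix check.
The function name and its buffer-based interface are hypothetical, and
this sketch treats a name consisting only of the suffix (e.g. ".lz") as
"any other name", which may not match clzip exactly.

```c
#include <stdio.h>
#include <string.h>

/* Derive the decompressed file name from a compressed one.
   Writes into out (out_size bytes); returns 0, or -1 if it does not fit. */
int decompressed_name(const char *name, char *out, size_t out_size)
{
    static const struct { const char *from, *to; } known[] = {
        { ".lz", "" }, { ".tlz", ".tar" }
    };
    size_t len = strlen(name);
    for (size_t i = 0; i < sizeof known / sizeof known[0]; ++i) {
        size_t ext = strlen(known[i].from);
        if (len > ext && strcmp(name + len - ext, known[i].from) == 0) {
            /* copy the name without the suffix, then append the replacement */
            int n = snprintf(out, out_size, "%.*s%s", (int)(len - ext), name, known[i].to);
            return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
        }
    }
    int n = snprintf(out, out_size, "%s.out", name);   /* unknown suffix */
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```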

(De)compressing a file is much like copying or moving it; therefore clzip
preserves the access and modification dates, permissions, and, when
possible, ownership of the file just as "cp -p" does. (If the user ID or
the group ID can't be duplicated, the file permission bits S_ISUID and
S_ISGID are cleared.)
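
On a POSIX system the attribute copying described above could look like
the following sketch. The function name is hypothetical, and it uses the
older utime() call for brevity, which only keeps whole seconds; it is not
clzip's own code.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <utime.h>

/* Give out_name the ownership, permissions and access/modification
   times recorded in in_stats, roughly as "cp -p" does. Returns 0 on
   success, -1 if the permissions or times could not be set. */
int copy_file_attributes(const struct stat *in_stats, const char *out_name)
{
    mode_t mode = in_stats->st_mode & 07777;     /* permissions plus set-id/sticky bits */

    /* chown first: changing the owner can clear set-id bits again. */
    if (chown(out_name, in_stats->st_uid, in_stats->st_gid) != 0)
        mode &= ~(mode_t)(S_ISUID | S_ISGID);    /* owner not duplicated: drop set-id bits */

    if (chmod(out_name, mode) != 0) {
        perror(out_name);
        return -1;
    }

    struct utimbuf times;
    times.actime = in_stats->st_atime;           /* access date */
    times.modtime = in_stats->st_mtime;          /* modification date */
    if (utime(out_name, &times) != 0) {
        perror(out_name);
        return -1;
    }
    return 0;
}
```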

Clzip is able to read from some types of non-regular files if the
"--stdout" option is specified.

If no file names are specified, clzip compresses (or decompresses) from
standard input to standard output. In this case, clzip will decline to
write compressed output to a terminal, as this would be entirely
incomprehensible and therefore pointless.
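
A terminal check like this is typically done with POSIX isatty(). A
minimal sketch; the function name and the message wording are
hypothetical, not clzip's.

```c
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Return true if compressed data may be written to fd; refuse (with a
   message on stderr) when fd refers to a terminal. */
bool compressed_output_allowed(int fd)
{
    if (isatty(fd)) {
        fputs("refusing to write compressed data to a terminal\n", stderr);
        return false;
    }
    return true;
}
```

A caller would check compressed_output_allowed(STDOUT_FILENO) before
compressing from standard input to standard output.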

Clzip will correctly decompress a file which is the concatenation of two
or more compressed files. The result is the concatenation of the
corresponding uncompressed files. Integrity testing of concatenated
compressed files is also supported.
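
Concatenation works because each lzip member is self-delimiting. According
to the lzip manual, a member starts with a 6-byte header (the magic
"LZIP", a version byte and the coded dictionary size) and ends with a
20-byte trailer holding, little-endian, the CRC32 of the uncompressed data
(4 bytes), the uncompressed data size (8 bytes) and the member size
(8 bytes). The sketch below, with hypothetical names, only walks member
boundaries backwards from the end of an in-memory file using the
member-size field; it neither decompresses nor verifies the CRC.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { HEADER_SIZE = 6, TRAILER_SIZE = 20 };

/* Read an unsigned little-endian integer of n bytes. */
static uint64_t get_le(const unsigned char *p, int n)
{
    uint64_t v = 0;
    for (int i = n - 1; i >= 0; --i) v = (v << 8) | p[i];
    return v;
}

/* Count the members of a multimember lzip file held in memory by
   following each trailer's member-size field backwards from the end.
   Returns the number of members, or -1 if the chain is broken. */
long count_members(const unsigned char *buf, size_t size)
{
    long members = 0;
    size_t pos = size;                         /* end of the member being examined */
    while (pos > 0) {
        if (pos < HEADER_SIZE + TRAILER_SIZE) return -1;
        uint64_t member_size = get_le(buf + pos - 8, 8);
        if (member_size < HEADER_SIZE + TRAILER_SIZE || member_size > pos) return -1;
        if (memcmp(buf + pos - member_size, "LZIP", 4) != 0) return -1;  /* magic */
        pos -= (size_t)member_size;
        ++members;
    }
    return members;
}
```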

Clzip can produce multimember files, and lziprecover can safely recover
the undamaged members in case of file damage. Clzip can also split the
compressed output into volumes of a given size, even when reading from
standard input. This allows the direct creation of multivolume
compressed tar archives.

Clzip is able to compress and decompress streams of unlimited size by
automatically creating multimember output. The members so created are
large, about 2 PiB each.

In spite of its name (Lempel-Ziv-Markov chain-Algorithm), LZMA is not a
concrete algorithm; it is more like "any algorithm using the LZMA coding
scheme". For example, the option '-0' of lzip uses the scheme in almost
the simplest way possible: issuing the longest match it can find, or a
literal byte if it can't find a match. Conversely, a much more elaborate
way of finding coding sequences of minimum size than the one currently
used by lzip could be developed, and the resulting sequence could also
be coded using the LZMA coding scheme.
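
The greedy strategy described for '-0' (longest match, else a literal)
can be sketched as follows. This is a brute-force illustration with
hypothetical names, not clzip's fast encoder, which searches with hashing
and then range-codes the tokens; the limits 2 and 273 are assumed to be
LZMA's minimum and maximum match lengths.

```c
#include <stddef.h>

enum { MIN_MATCH = 2, MAX_MATCH = 273 };

/* A literal byte (length == 0) or a match copying 'length' bytes from
   'distance' bytes back. */
typedef struct { size_t distance, length; unsigned char literal; } Token;

/* Greedy parse: at every position take the longest match in the data
   already seen, or a literal if no match reaches MIN_MATCH. Brute-force
   search, so slow on large inputs. 'out' needs room for 'size' tokens. */
size_t greedy_parse(const unsigned char *data, size_t size, Token *out)
{
    size_t pos = 0, count = 0;
    while (pos < size) {
        size_t best_len = 0, best_dist = 0;
        for (size_t dist = 1; dist <= pos; ++dist) {   /* nearest candidates first */
            size_t len = 0;
            /* the match may overlap the bytes being encoded (e.g. runs) */
            while (pos + len < size && len < MAX_MATCH &&
                   data[pos - dist + len] == data[pos + len])
                ++len;
            if (len > best_len) { best_len = len; best_dist = dist; }
        }
        if (best_len >= MIN_MATCH) {
            out[count++] = (Token){ best_dist, best_len, 0 };
            pos += best_len;
        } else {
            out[count++] = (Token){ 0, 0, data[pos] };
            pos += 1;
        }
    }
    return count;
}

/* Rebuild the data from tokens; returns the number of bytes written. */
size_t expand_tokens(const Token *tokens, size_t count, unsigned char *out)
{
    size_t pos = 0;
    for (size_t i = 0; i < count; ++i) {
        if (tokens[i].length == 0)
            out[pos++] = tokens[i].literal;
        else                                   /* byte by byte, so overlaps work */
            for (size_t k = 0; k < tokens[i].length; ++k, ++pos)
                out[pos] = out[pos - tokens[i].distance];
    }
    return pos;
}
```

For "abcabcabc" this yields three literals followed by one match of
length 6 at distance 3; the match overlaps the bytes it produces.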

Clzip currently implements two variants of the LZMA algorithm: fast
(used by option '-0') and normal (used by all other compression levels).

The high compression of LZMA comes from combining two basic, well-proven
compression ideas: sliding dictionaries (LZ77/78) and Markov models (the
thing used by every compression algorithm that uses a range encoder or
similar order-0 entropy coder as its last stage) with segregation of
contexts according to what the bits are used for.
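
The range coder with segregated contexts can be illustrated with a binary
adaptive coder of the kind LZMA uses: 11-bit probabilities, each moved
1/32 of the way toward every bit seen. This is a sketch modelled on the
range coder of the LZMA SDK and the decoder listed in the lzip manual; the
names are hypothetical, and the single context used here (the previous
bit) stands in for the many contexts LZMA keeps for literals, match
flags, lengths and distances.

```c
#include <stddef.h>
#include <stdint.h>

/* Probabilities are 11-bit estimates that the next bit is 0. */
enum { PROB_BITS = 11, PROB_INIT = 1 << (PROB_BITS - 1), MOVE_BITS = 5 };
#define TOP_VALUE (1u << 24)

typedef struct {
    uint64_t low;
    uint32_t range;
    uint8_t cache;            /* byte held back in case a carry arrives */
    uint64_t cache_size;      /* pending bytes: cache plus any 0xFF bytes */
    unsigned char *out;
    size_t pos;
} Encoder;

typedef struct { uint32_t code, range; const unsigned char *in; size_t pos; } Decoder;

static void shift_low(Encoder *e)
{
    if ((uint32_t)e->low < 0xFF000000u || (e->low >> 32) != 0) {
        uint8_t carry = (uint8_t)(e->low >> 32);
        uint8_t temp = e->cache;
        do {                                   /* flush pending bytes, adding the carry */
            e->out[e->pos++] = (uint8_t)(temp + carry);
            temp = 0xFF;
        } while (--e->cache_size != 0);
        e->cache = (uint8_t)(e->low >> 24);
    }
    ++e->cache_size;
    e->low = (e->low & 0x00FFFFFFu) << 8;
}

static void encode_bit(Encoder *e, uint16_t *prob, int bit)
{
    uint32_t bound = (e->range >> PROB_BITS) * *prob;
    if (bit == 0) { e->range = bound; *prob += ((1u << PROB_BITS) - *prob) >> MOVE_BITS; }
    else { e->low += bound; e->range -= bound; *prob -= *prob >> MOVE_BITS; }
    while (e->range < TOP_VALUE) { e->range <<= 8; shift_low(e); }
}

static int decode_bit(Decoder *d, uint16_t *prob)
{
    uint32_t bound = (d->range >> PROB_BITS) * *prob;
    int bit;
    if (d->code < bound) { d->range = bound; *prob += ((1u << PROB_BITS) - *prob) >> MOVE_BITS; bit = 0; }
    else { d->code -= bound; d->range -= bound; *prob -= *prob >> MOVE_BITS; bit = 1; }
    while (d->range < TOP_VALUE) { d->range <<= 8; d->code = (d->code << 8) | d->in[d->pos++]; }
    return bit;
}

/* Encode n bits (each 0 or 1) with one probability per context; the
   context is the previous bit, a tiny order-1 Markov model. 'out' needs
   room for n + 16 bytes. Returns the number of bytes written. */
size_t encode_bits(const int *bits, size_t n, unsigned char *out)
{
    Encoder e = { 0, 0xFFFFFFFFu, 0, 1, out, 0 };
    uint16_t prob[2] = { PROB_INIT, PROB_INIT };
    int prev = 0;
    for (size_t i = 0; i < n; ++i) { encode_bit(&e, &prob[prev], bits[i]); prev = bits[i]; }
    for (int i = 0; i < 5; ++i) shift_low(&e);        /* flush */
    return e.pos;
}

void decode_bits(const unsigned char *in, size_t n, int *bits)
{
    Decoder d = { 0, 0xFFFFFFFFu, in, 0 };
    for (int i = 0; i < 5; ++i) d.code = (d.code << 8) | d.in[d.pos++];
    uint16_t prob[2] = { PROB_INIT, PROB_INIT };
    int prev = 0;
    for (size_t i = 0; i < n; ++i) { bits[i] = decode_bit(&d, &prob[prev]); prev = bits[i]; }
}
```

On 1000 alternating bits the two contexts quickly learn opposite
predictions, so the output stays well under 40 bytes, whereas a single
shared probability would hover near 1/2 and save almost nothing.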

The ideas embodied in clzip are due to (at least) the following people:
Abraham Lempel and Jacob Ziv (for the LZ algorithm), Andrey Markov (for
the definition of Markov chains), G.N.N. Martin (for the definition of
range encoding), Igor Pavlov (for putting all the above together in
LZMA), and Julian Seward (for bzip2's CLI).


Copyright (C) 2010-2017 Antonio Diaz Diaz.

This file is free documentation: you have unlimited permission to copy,
distribute and modify it.

The file Makefile.in is a data file used by configure to produce the
Makefile. It has the same copyright owner and permissions as configure
itself.