.\"	$NetBSD: bzip2.1,v 1.3 2012/05/07 00:45:47 wiz Exp $
.\"
.Dd May 14, 2010
.Dt BZIP2 1
.Os
.Sh NAME
.Nm bzip2 ,
.Nm bunzip2 ,
.Nm bzcat ,
.Nm bzip2recover
.Nd block-sorting file compressor
.Sh SYNOPSIS
.Nm bzip2
.Op Fl 123456789cdfkLqstVvz
.Op Ar filename ...
.Pp
.Nm bunzip2
.Op Fl fkLVvs
.Op Ar filename ...
.Pp
.Nm bzcat
.Op Fl s
.Op Ar filename ...
.Pp
.Nm bzip2recover
.Ar filename
.Sh DESCRIPTION
.Nm bzip2
compresses files using the Burrows-Wheeler block sorting
text compression algorithm, and Huffman coding.
Compression is generally considerably better than that achieved by
more conventional LZ77/LZ78-based compressors, and approaches the
performance of the PPM family of statistical compressors.
.Pp
.Nm bzcat
decompresses files to stdout, and
.Nm bzip2recover
recovers data from damaged bzip2 files.
.Pp
The command-line options are deliberately very similar to
those of
.Xr gzip 1 ,
but they are not identical.
.Pp
.Nm bzip2
expects a list of file names to accompany the command-line flags.
Each file is replaced by a compressed version of
itself, with the name
.Dq Pa original_name.bz2 .
Each compressed file has the same modification date, permissions, and,
when possible, ownership as the corresponding original, so that these
properties can be correctly restored at decompression time.
File name handling is naive in the sense that there is no mechanism
for preserving original file names, permissions, ownerships or dates
in filesystems which lack these concepts, or have serious file name
length restrictions, such as
.Tn MS-DOS .
.Nm bzip2
and
.Nm bunzip2
will by default not overwrite existing files.
If you want this to happen, specify the
.Fl f
flag.
.Pp
If no file names are specified,
.Nm bzip2
compresses from standard input to standard output.
In this case,
.Nm bzip2
will decline to write compressed output to a terminal, as this would
be entirely incomprehensible and therefore pointless.
.Pp
.Nm bunzip2
(or
.Nm bzip2 Fl d )
decompresses all specified files.
Files which were not created by
.Nm bzip2
will be detected and ignored, and a warning issued.
.Nm bzip2
attempts to guess the filename for the decompressed file
from that of the compressed file as follows:
.Bl -column "filename.tbz2" "becomes" -offset indent
.It Pa filename.bz2  Ta becomes Ta Pa filename
.It Pa filename.bz   Ta becomes Ta Pa filename
.It Pa filename.tbz2 Ta becomes Ta Pa filename.tar
.It Pa filename.tbz  Ta becomes Ta Pa filename.tar
.It Pa anyothername  Ta becomes Ta Pa anyothername.out
.El
.Pp
If the file does not end in one of the recognised endings,
.Pa .bz2 ,
.Pa .bz ,
.Pa .tbz2 ,
or
.Pa .tbz ,
.Nm bzip2
complains that it cannot guess the name of the original file, and uses
the original name with
.Pa .out
appended.
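.Pp
The renaming rules above can be sketched in a few lines of Python.
This is only an illustration of the table, not the code used by
.Nm bzip2 ;
the names SUFFIX_RULES and guess_output_name are hypothetical.
.Bd -literal -offset indent
```python
# Suffix rules transcribed from the table above; first match wins.
SUFFIX_RULES = [
    (".bz2", ""),
    (".bz", ""),
    (".tbz2", ".tar"),
    (".tbz", ".tar"),
]


def guess_output_name(name):
    """Return the decompressed file name implied by the table above."""
    for suffix, replacement in SUFFIX_RULES:
        if name.endswith(suffix):
            return name[: -len(suffix)] + replacement
    # Unrecognised ending: keep the name and append ".out".
    return name + ".out"
```
.Ed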
.Pp
As with compression, supplying no filenames causes decompression from
standard input to standard output.
.Pp
.Nm bunzip2
will correctly decompress a file which is the concatenation of two or
more compressed files.
The result is the concatenation of the corresponding uncompressed
files.
Integrity testing
.Pq Fl t
of concatenated compressed files is also supported.
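.Pp
The concatenation property can be checked with a short Python sketch.
It assumes Python's standard bz2 module, which handles the same stream
format, rather than the
.Nm bunzip2
command itself; the variable names are illustrative.
.Bd -literal -offset indent
```python
import bz2

first = bz2.compress(b"first file contents ")
second = bz2.compress(b"second file contents")

# Joining two complete compressed streams gives another valid input.
combined = first + second

# bz2.decompress() reads every stream in its input, so the result is
# the two original payloads back to back.
restored = bz2.decompress(combined)
```
.Ed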
.Pp
You can also compress or decompress files to the standard output by
giving the
.Fl c
flag.
Multiple files may be compressed and decompressed like this.
The resulting outputs are fed sequentially to stdout.
Compression of multiple files in this manner generates a stream
containing multiple compressed file representations.
Such a stream can be decompressed correctly only by
.Nm bzip2
version 0.9.0 or later.
Earlier versions of
.Nm bzip2
will stop after decompressing
the first file in the stream.
.Pp
.Nm bzcat
(or
.Nm bzip2 Fl dc )
decompresses all specified files to the standard output.
.Pp
Compression is always performed, even if the compressed file is
slightly larger than the original.
Files of less than about one hundred bytes tend to get larger, since
the compression mechanism has a constant overhead in the region of 50
bytes.
Random data (including the output of most file compressors) is coded
at about 8.05 bits per byte, giving an expansion of around 0.5%.
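.Pp
The growth of tiny and incompressible inputs is easy to observe.
The following sketch assumes Python's standard bz2 module as a stand-in
for the command-line tool; exact sizes depend on the input, so it only
records whether the output grew.
.Bd -literal -offset indent
```python
import bz2
import os

tiny = b"hello"
tiny_compressed = bz2.compress(tiny)

# 100,000 random bytes are effectively incompressible.
noise = os.urandom(100_000)
noise_compressed = bz2.compress(noise)

# Positive values mean the compressed form is larger than the input.
tiny_growth = len(tiny_compressed) - len(tiny)
noise_growth = len(noise_compressed) - len(noise)
```
.Ed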
.Pp
As a self-check for your protection,
.Nm bzip2
uses 32-bit CRCs to make sure that the decompressed version of a file
is identical to the original.
This guards against corruption of the compressed data, and against
undetected bugs in
.Nm bzip2
(hopefully very unlikely).
The chance of data corruption going undetected is microscopic, about
one chance in four billion for each file processed.
Be aware, though, that the check occurs upon decompression, so it can
only tell you that something is wrong.
It can't help you recover the original uncompressed data.
You can use
.Nm bzip2recover
to try to recover data from
damaged files.
.Sh OPTIONS
.Bl -tag -width "XXrepetitiveXfastXX"
.It Fl Fl
Treats all subsequent arguments as file names, even if they start with
a dash.
This is so you can handle files with names beginning with a dash, for
example:
.Dl bzip2 -- -myfilename
.It Fl 1 , Fl Fl fast
to
.It Fl 9 , Fl Fl best
Set the block size to 100 k, 200 k ... 900 k when compressing.
Has no effect when decompressing.
See
.Sx MEMORY MANAGEMENT
below.
The
.Fl Fl fast
and
.Fl Fl best
aliases are primarily for GNU
.Xr gzip 1
compatibility.
In particular,
.Fl Fl fast
doesn't make things significantly faster, and
.Fl Fl best
merely selects the default behaviour.
.It Fl c , Fl Fl stdout
Compress or decompress to standard output.
.It Fl d , Fl Fl decompress
Force decompression.
.Nm bzip2 ,
.Nm bunzip2 ,
and
.Nm bzcat
are really the same program, and the decision about what actions to
take is made on the basis of which name is used.
This flag overrides that mechanism, and forces
.Nm bzip2
to decompress.
.It Fl f , Fl Fl force
Force overwrite of output files.
Normally,
.Nm bzip2
will not overwrite existing output files.
Also forces
.Nm bzip2
to break hard links
to files, which it otherwise wouldn't do.
.Pp
.Nm bzip2
normally declines to decompress files which don't have the correct
magic header bytes.
If forced
.Pq Fl f ,
however, it will pass such files through unmodified.
This is how GNU
.Xr gzip 1
behaves.
.It Fl k , Fl Fl keep
Keep (don't delete) input files during compression
or decompression.
.It Fl L , Fl Fl license
Display the license terms and conditions.
.It Fl q , Fl Fl quiet
Suppress non-essential warning messages.
Messages pertaining to I/O errors and other critical events will not
be suppressed.
.It Fl Fl repetitive-fast
.It Fl Fl repetitive-best
These flags are redundant in versions 0.9.5 and above.
They provided some coarse control over the behaviour of the sorting
algorithm in earlier versions, which was sometimes useful.
0.9.5 and above have an improved algorithm which renders these flags
irrelevant.
.It Fl s , Fl Fl small
Reduce memory usage, for compression, decompression and testing.
Files are decompressed and tested using a modified algorithm which
only requires 2.5 bytes per block byte.
This means any file can be decompressed in 2300k of memory, albeit at
about half the normal speed.
During compression,
.Fl s
selects a block size of 200k, which limits memory use to around the
same figure, at the expense of your compression ratio.
In short, if your machine is low on memory (8 megabytes or less), use
.Fl s
for everything.
See
.Sx MEMORY MANAGEMENT
below.
.It Fl t , Fl Fl test
Check integrity of the specified file(s), but don't decompress them.
This really performs a trial decompression and throws away the result.
.It Fl V , Fl Fl version
Display the software version.
.It Fl v , Fl Fl verbose
Verbose mode: show the compression ratio for each file processed.
Further
.Fl v Ap s
increase the verbosity level, spewing out lots of information which is
primarily of interest for diagnostic purposes.
.It Fl z , Fl Fl compress
The complement to
.Fl d :
forces compression, regardless of the invocation name.
.El
.Ss MEMORY MANAGEMENT
.Nm bzip2
compresses large files in blocks.
The block size affects both the compression ratio achieved, and the
amount of memory needed for compression and decompression.
The flags
.Fl 1
through
.Fl 9
specify the block size to be 100,000 bytes through 900,000 bytes (the
default) respectively.
At decompression time, the block size used for compression is read
from the header of the compressed file, and
.Nm bunzip2
then allocates itself just enough memory to decompress the file.
Since block sizes are stored in compressed files, it follows that the
flags
.Fl 1
to
.Fl 9
are irrelevant to and so ignored during decompression.
.Pp
Compression and decompression memory requirements, in bytes, can be
estimated as:
.Bl -tag -width "Decompression:" -offset indent
.It Compression :
400k + ( 8 x block size )
.It Decompression :
100k + ( 4 x block size ), or 100k + ( 2.5 x block size ) with
.Fl s
.El
Larger block sizes give rapidly diminishing marginal returns.
Most of the compression comes from the first two or three hundred k of
block size, a fact worth bearing in mind when using
.Nm bzip2
on small machines.
It is also important to appreciate that the decompression memory
requirement is set at compression time by the choice of block size.
.Pp
For files compressed with the default 900k block size,
.Nm bunzip2
will require about 3700 kbytes to decompress.
To support decompression of any file on a 4 megabyte machine,
.Nm bunzip2
has an option to decompress using approximately half this amount of
memory, about 2300 kbytes.
Decompression speed is also halved, so you should use this option only
where necessary.
The relevant flag is
.Fl s .
.Pp
In general, try to use the largest block size memory constraints
allow, since that maximises the compression achieved.
Compression and decompression speed are virtually unaffected by block
size.
.Pp
Another significant point applies to files which fit in a single block
-- that means most files you'd encounter using a large block size.
The amount of real memory touched is proportional to the size of the
file, since the file is smaller than a block.
For example, compressing a file 20,000 bytes long with the flag
.Fl 9
will cause the compressor to allocate around 7600k of memory, but only
touch 400k + 20000 * 8 = 560 kbytes of it.
Similarly, the decompressor will allocate 3700k but only touch 100k +
20000 * 4 = 180 kbytes.
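.Pp
The estimates in this section can be reproduced with a little
arithmetic.
The Python sketch below assumes that
.Dq k
means 1000 bytes, as the worked example above implies; the function
names are hypothetical and the results are estimates, not measurements.
.Bd -literal -offset indent
```python
K = 1000  # the page's "k" is read as 1000 bytes


def block_size(level):
    """Block size in bytes for flags -1 through -9."""
    if not 1 <= level <= 9:
        raise ValueError("level must be between 1 and 9")
    return level * 100 * K


def compress_bytes(data_per_block):
    """Estimated compression memory: 400k + 8 x block size."""
    return 400 * K + 8 * data_per_block


def decompress_bytes(data_per_block, small=False):
    """Estimated decompression memory; small=True models -s."""
    if small:
        # 2.5 bytes per block byte, kept in integer arithmetic.
        return 100 * K + data_per_block * 5 // 2
    return 100 * K + 4 * data_per_block


allocated = compress_bytes(block_size(9))  # memory allocated with -9
touched = compress_bytes(20_000)           # memory touched, 20,000 bytes
```
.Ed
With these definitions, allocated is 7,600,000 and touched is 560,000,
matching the figures quoted above.
For
.Fl 7
the compression formula gives 6000k, slightly below the 6100k listed in
the table that follows.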
.Pp
Here is a table which summarises the maximum memory usage for different
block sizes.
Also recorded is the total compressed size for 14 files of the Calgary
Text Compression Corpus totalling 3,141,622 bytes.
This column gives some feel for how compression varies with block size.
These figures tend to understate the advantage of larger block sizes
for larger files, since the Corpus is dominated by smaller files.
.Bl -column "Flag" "Compression" "Decompression" "DecompressionXXs" "Corpus size"
.It Sy Flag Ta Sy Compression Ta Sy Decompression Ta Sy Decompression Fl s Ta Sy Corpus size
.It -1 Ta 1200k Ta  500k Ta  350k Ta 914704
.It -2 Ta 2000k Ta  900k Ta  600k Ta 877703
.It -3 Ta 2800k Ta 1300k Ta  850k Ta 860338
.It -4 Ta 3600k Ta 1700k Ta 1100k Ta 846899
.It -5 Ta 4400k Ta 2100k Ta 1350k Ta 845160
.It -6 Ta 5200k Ta 2500k Ta 1600k Ta 838626
.It -7 Ta 6100k Ta 2900k Ta 1850k Ta 834096
.It -8 Ta 6800k Ta 3300k Ta 2100k Ta 828642
.It -9 Ta 7600k Ta 3700k Ta 2350k Ta 828642
.El
.Ss RECOVERING DATA FROM DAMAGED FILES
.Nm bzip2
compresses files in blocks, usually 900 kbytes long.
Each block is handled independently.
If a media or transmission error causes a multi-block
.Pa .bz2
file to become damaged, it may be possible to recover data from the
undamaged blocks in the file.
.Pp
The compressed representation of each block is delimited by a 48-bit
pattern, which makes it possible to find the block boundaries with
reasonable certainty.
Each block also carries its own 32-bit CRC, so damaged blocks can be
distinguished from undamaged ones.
.Pp
.Nm bzip2recover
is a simple program whose purpose is to search for blocks in
.Pa .bz2
files, and write each block out into its own
.Pa .bz2
file.
You can then use
.Nm bzip2
.Fl t
to test the integrity of the resulting files, and decompress those
which are undamaged.
.Pp
.Nm bzip2recover
takes a single argument, the name of the damaged file, and writes a
number of files
.Dq Pa rec00001file.bz2 ,
.Dq Pa rec00002file.bz2 ,
etc., containing the extracted blocks.
The output filenames are designed so that the use of wildcards in
subsequent processing -- for example,
.Dl bzip2 -dc rec*file.bz2 \*[Gt] recovered_data
-- processes the files in the correct order.
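.Pp
The test-then-decompress workflow described above might look like this
in Python.
The sketch assumes the recovered block files have already been read
into memory and uses the standard bz2 module in place of
.Nm bzip2
.Fl t ;
the function name salvage is hypothetical.
.Bd -literal -offset indent
```python
import bz2


def salvage(blocks):
    """Join the payloads of intact block files, in file-name order.

    blocks maps names such as rec00001file.bz2 to their raw bytes.
    """
    recovered = []
    # Zero-padded numbers make plain string sorting match block order.
    for name in sorted(blocks):
        try:
            recovered.append(bz2.decompress(blocks[name]))
        except (OSError, ValueError, EOFError):
            # Damaged block: skip it, as a failed integrity test would.
            continue
    return b"".join(recovered)
```
.Ed
The zero-padded block numbers are what allow a plain sort, like a shell
wildcard expansion, to visit the files in block order.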
.Pp
.Nm bzip2recover
is most useful when dealing with large
.Pa .bz2
files, as these will contain many blocks.
It is clearly futile to use it on damaged single-block files, since a
damaged block cannot be recovered.
If you wish to minimise any potential data loss through media or
transmission errors, you might consider compressing with a smaller
block size.
.Ss PERFORMANCE NOTES
The sorting phase of compression gathers together similar strings in
the file.
Because of this, files containing very long runs of repeated
symbols, like
.Dq aabaabaabaab...
(repeated several hundred times) may compress more slowly than normal.
Versions 0.9.5 and above fare much better than previous versions in
this respect.
The ratio between worst-case and average-case compression time is in
the region of 10:1.
For previous versions, this figure was more like 100:1.
You can use the
.Fl vvvv
option to monitor progress in great detail, if you want.
.Pp
Decompression speed is unaffected by these phenomena.
.Pp
.Nm bzip2
usually allocates several megabytes of memory to operate in, and then
charges all over it in a fairly random fashion.
This means that performance, both for compressing and decompressing,
is largely determined by the speed at which your machine can service
cache misses.
Because of this, small changes to the code to reduce the miss rate
have been observed to give disproportionately large performance
improvements.
I imagine
.Nm bzip2
will perform best on machines with very large caches.
.Sh ENVIRONMENT
.Nm bzip2
will read arguments from the environment variables
.Ev BZIP2
and
.Ev BZIP ,
in that order, and will process them before any arguments read from
the command line.
This gives a convenient way to supply default arguments.
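.Pp
The processing order can be pictured with a small Python sketch.
It assumes that each variable is split on whitespace with no quoting
rules, which this page does not specify; the function name
effective_args is hypothetical.
.Bd -literal -offset indent
```python
def effective_args(environ, argv):
    """Arguments in processing order: BZIP2, then BZIP, then argv."""
    args = []
    for variable in ("BZIP2", "BZIP"):
        # Assumption: values are split on whitespace, with no quoting.
        args.extend(environ.get(variable, "").split())
    args.extend(argv)
    return args


order = effective_args({"BZIP2": "-9 -k", "BZIP": "-v"}, ["notes.txt"])
```
.Ed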
.Sh EXIT STATUS
0 for a normal exit, 1 for environmental problems (file not found,
invalid flags, I/O errors, etc.), 2 to indicate a corrupt compressed
file, 3 for an internal consistency error (e.g., bug) which caused
.Nm bzip2
to panic.
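.Pp
A script that wraps
.Nm bzip2
could name these codes as follows.
This Python sketch only restates the list above; the identifiers are
hypothetical and are not defined by
.Nm bzip2 .
.Bd -literal -offset indent
```python
from enum import IntEnum


class ExitStatus(IntEnum):
    OK = 0              # normal exit
    ENVIRONMENT = 1     # file not found, invalid flags, I/O errors
    CORRUPT_FILE = 2    # corrupt compressed file
    INTERNAL_ERROR = 3  # internal consistency error (a bug)


def describe(code):
    """Return a short label for an exit status, or 'unknown'."""
    try:
        return ExitStatus(code).name
    except ValueError:
        return "unknown"
```
.Ed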
.Sh AUTHORS
.An -nosplit
.An Julian Seward
.Aq jseward@bzip.org
.Pp
.Pa http://www.bzip.org
.Pp
The ideas embodied in
.Nm bzip2
are due to (at least) the following people:
.An Michael Burrows
and
.An David Wheeler
(for the block sorting transformation),
.An David Wheeler
(again, for the Huffman coder),
.An Peter Fenwick
(for the structured coding model in the original
.Nm bzip ,
and many refinements), and
.An Alistair Moffat ,
.An Radford Neal ,
and
.An Ian Witten
(for the arithmetic coder in the original
.Nm bzip ) .
I am much indebted for their help, support and advice.
See the manual in the source distribution for pointers to sources of
documentation.
Christian von Roques encouraged me to look for faster sorting
algorithms, so as to speed up compression.
Bela Lubkin encouraged me to improve the worst-case compression
performance.
Donna Robinson XMLised the documentation.
The bz* scripts are derived from those of GNU gzip.
Many people sent patches, helped with portability problems, lent
machines, gave advice and were generally helpful.
.Sh CAVEATS
I/O error messages are not as helpful as they could be.
.Nm bzip2
tries hard to detect I/O errors and exit cleanly, but the details of
what the problem is sometimes seem rather misleading.
.Pp
This manual page pertains to version 1.0.6 of
.Nm bzip2 .
Compressed data created by this version is entirely forwards and
backwards compatible with the previous public releases, versions
0.1pl2, 0.9.0, 0.9.5, 1.0.0, 1.0.1, 1.0.2 and above, but with the
following exception: 0.9.0 and above can correctly decompress multiple
concatenated compressed files.
0.1pl2 cannot do this; it will stop after decompressing just the first
file in the stream.
.Pp
.Nm bzip2recover
versions prior to 1.0.2 used 32-bit integers to represent bit
positions in compressed files, so they could not handle compressed
files more than 512 megabytes long.
Versions 1.0.2 and above use 64-bit ints on some platforms which
support them (GNU supported targets, and Windows).
To establish whether or not
.Nm bzip2recover
was built with such a limitation, run it without arguments.
In any event you can build yourself an unlimited version if you can
recompile it with MaybeUInt64 set to be an unsigned 64-bit integer.
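.Pp
The 512 megabyte figure follows from counting bit positions.
The Python sketch below assumes an unsigned 32-bit counter and a
megabyte of 1024 * 1024 bytes; the variable names are illustrative.
.Bd -literal -offset indent
```python
BITS_PER_BYTE = 8

# Number of distinct bit positions an unsigned 32-bit counter can hold.
addressable_bits = 2 ** 32

# Largest file size, in bytes, whose bit positions all fit.
limit_bytes = addressable_bits // BITS_PER_BYTE
limit_mebibytes = limit_bytes // (1024 * 1024)
```
.Ed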