.\" $NetBSD: bzip2.1,v 1.3 2012/05/07 00:45:47 wiz Exp $
.\"
.Dd May 14, 2010
.Dt BZIP2 1
.Os
.Sh NAME
.Nm bzip2 ,
.Nm bunzip2 ,
.Nm bzcat ,
.Nm bzip2recover
.Nd block-sorting file compressor
.Sh SYNOPSIS
.Nm bzip2
.Op Fl 123456789cdfkLqstVvz
.Op Ar filename ...
.Pp
.Nm bunzip2
.Op Fl fkLVvs
.Op Ar filename ...
.Pp
.Nm bzcat
.Op Fl s
.Op Ar filename ...
.Pp
.Nm bzip2recover
.Ar filename
.Sh DESCRIPTION
.Nm bzip2
compresses files using the Burrows-Wheeler block sorting
text compression algorithm, and Huffman coding.
Compression is generally considerably better than that achieved by
more conventional LZ77/LZ78-based compressors, and approaches the
performance of the PPM family of statistical compressors.
.Pp
.Nm bzcat
decompresses files to stdout, and
.Nm bzip2recover
recovers data from damaged bzip2 files.
.Pp
The command-line options are deliberately very similar to
those of
.Xr gzip 1 ,
but they are not identical.
.Pp
.Nm bzip2
expects a list of file names to accompany the command-line flags.
Each file is replaced by a compressed version of
itself, with the name
.Dq Pa original_name.bz2 .
Each compressed file has the same modification date, permissions, and,
when possible, ownership as the corresponding original, so that these
properties can be correctly restored at decompression time.
File name handling is naive in the sense that there is no mechanism
for preserving original file names, permissions, ownerships or dates
in filesystems which lack these concepts, or have serious file name
length restrictions, such as
.Tn MS-DOS .
.Nm bzip2
and
.Nm bunzip2
will by default not overwrite existing files.
If you want this to happen, specify the
.Fl f
flag.
.Pp
If no file names are specified,
.Nm bzip2
compresses from standard input to standard output.
In this case,
.Nm bzip2
will decline to write compressed output to a terminal, as this would
be entirely incomprehensible and therefore pointless.
.Pp
.Nm bunzip2
(or
.Nm bzip2 Fl d )
decompresses all specified files.
Files which were not created by
.Nm bzip2
will be detected and ignored, and a warning issued.
.Nm bzip2
attempts to guess the filename for the decompressed file
from that of the compressed file as follows:
.Bl -column "filename.tbz2" "becomes" -offset indent
.It Pa filename.bz2 Ta becomes Ta Pa filename
.It Pa filename.bz Ta becomes Ta Pa filename
.It Pa filename.tbz2 Ta becomes Ta Pa filename.tar
.It Pa filename.tbz Ta becomes Ta Pa filename.tar
.It Pa anyothername Ta becomes Ta Pa anyothername.out
.El
.Pp
If the file does not end in one of the recognised endings,
.Pa .bz2 ,
.Pa .bz ,
.Pa .tbz2 ,
or
.Pa .tbz ,
.Nm bzip2
complains that it cannot guess the name of the original file, and uses
the original name with
.Pa .out
appended.
.Pp
As with compression, supplying no filenames causes decompression from
standard input to standard output.
.Pp
.Nm bunzip2
will correctly decompress a file which is the concatenation of two or
more compressed files.
The result is the concatenation of the corresponding uncompressed
files.
Integrity testing
.Pq Fl t
of concatenated compressed files is also supported.
.Pp
You can also compress or decompress files to the standard output by
giving the
.Fl c
flag.
Multiple files may be compressed and decompressed like this.
The resulting outputs are fed sequentially to stdout.
Compression of multiple files in this manner generates a stream
containing multiple compressed file representations.
Such a stream can be decompressed correctly only by
.Nm bzip2
version 0.9.0 or later.
Earlier versions of
.Nm bzip2
will stop after decompressing
the first file in the stream.
.Pp
.Nm bzcat
(or
.Nm bzip2 Fl dc )
decompresses all specified files to the standard output.
.Pp
Compression is always performed, even if the compressed file is
slightly larger than the original.
Files of less than about one hundred bytes tend to get larger, since
the compression mechanism has a constant overhead in the region of 50
bytes.
Random data (including the output of most file compressors) is coded
at about 8.05 bits per byte, giving an expansion of around 0.5%.
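.Pp
Two of the points above, the fixed overhead on tiny inputs and the
handling of concatenated streams, can be observed with a short Python
sketch.
It is illustrative only: it uses Python's standard
.Li bz2
module, which wraps libbz2, rather than the
.Nm bzip2
command itself, and the helper names
.Li expansion
and
.Li concat_roundtrip
are hypothetical.
.Bd -literal -offset indent
```python
import bz2


def expansion(data: bytes, level: int = 9) -> int:
    """Return compressed size minus original size (positive means growth)."""
    return len(bz2.compress(data, compresslevel=level)) - len(data)


def concat_roundtrip(parts):
    """Compress each part separately, join the streams, decompress once."""
    stream = b"".join(bz2.compress(part) for part in parts)
    return bz2.decompress(stream)


if __name__ == "__main__":
    # A 10-byte input is far below the size at which compression pays off.
    print("10-byte input grows by", expansion(b"0123456789"), "bytes")
    # Two independent streams decode to the concatenation of their contents.
    print(concat_roundtrip([b"first file, ", b"second file"]))
```
.Ed
.Pp
The exact growth reported for a tiny input depends on the library
version, so the sketch prints it rather than assuming a value.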
.Pp
As a self-check for your protection,
.Nm bzip2
uses 32-bit CRCs to make sure that the decompressed version of a file
is identical to the original.
This guards against corruption of the compressed data, and against
undetected bugs in
.Nm bzip2
(hopefully very unlikely).
The chance of data corruption going undetected is microscopic, about
one in four billion for each file processed.
Be aware, though, that the check occurs upon decompression, so it can
only tell you that something is wrong.
It can't help you recover the original uncompressed data.
You can use
.Nm bzip2recover
to try to recover data from
damaged files.
.Sh OPTIONS
.Bl -tag -width "XXrepetitiveXfastXX"
.It Fl Fl
Treats all subsequent arguments as file names, even if they start with
a dash.
This is so you can handle files with names beginning with a dash, for
example:
.Dl bzip2 -- -myfilename .
.It Fl 1 , Fl Fl fast
to
.It Fl 9 , Fl Fl best
Set the block size to 100 k, 200 k ... 900 k when compressing.
Has no effect when decompressing.
See
.Sx MEMORY MANAGEMENT
below.
The
.Fl Fl fast
and
.Fl Fl best
aliases are primarily for GNU
.Xr gzip 1
compatibility.
In particular,
.Fl Fl fast
doesn't make things significantly faster, and
.Fl Fl best
merely selects the default behaviour.
.It Fl c , Fl Fl stdout
Compress or decompress to standard output.
.It Fl d , Fl Fl decompress
Force decompression.
.Nm bzip2 ,
.Nm bunzip2 ,
and
.Nm bzcat
are really the same program, and the decision about what actions to
take is made on the basis of which name is used.
This flag overrides that mechanism, and forces
.Nm bzip2
to decompress.
.It Fl f , Fl Fl force
Force overwrite of output files.
Normally,
.Nm bzip2
will not overwrite existing output files.
Also forces
.Nm bzip2
to break hard links
to files, which it otherwise wouldn't do.
.Pp
.Nm bzip2
normally declines to decompress files which don't have the correct
magic header bytes.
If forced
.Pq Fl f ,
however, it will pass such files through unmodified.
This is how GNU
.Xr gzip 1
behaves.
.It Fl k , Fl Fl keep
Keep (don't delete) input files during compression
or decompression.
.It Fl L , Fl Fl license
Display the license terms and conditions.
.It Fl q , Fl Fl quiet
Suppress non-essential warning messages.
Messages pertaining to I/O errors and other critical events will not
be suppressed.
.It Fl Fl repetitive-fast
.It Fl Fl repetitive-best
These flags are redundant in versions 0.9.5 and above.
They provided some coarse control over the behaviour of the sorting
algorithm in earlier versions, which was sometimes useful.
0.9.5 and above have an improved algorithm which renders these flags
irrelevant.
.It Fl s , Fl Fl small
Reduce memory usage, for compression, decompression and testing.
Files are decompressed and tested using a modified algorithm which
only requires 2.5 bytes per block byte.
This means any file can be decompressed in 2300k of memory, albeit at
about half the normal speed.
During compression,
.Fl s
selects a block size of 200k, which limits memory use to around the
same figure, at the expense of your compression ratio.
In short, if your machine is low on memory (8 megabytes or less), use
.Fl s
for everything.
See
.Sx MEMORY MANAGEMENT
below.
.It Fl t , Fl Fl test
Check integrity of the specified file(s), but don't decompress them.
This really performs a trial decompression and throws away the result.
.It Fl V , Fl Fl version
Display the software version.
.It Fl v , Fl Fl verbose
Verbose mode: show the compression ratio for each file processed.
Further
.Fl v Ap s
increase the verbosity level, spewing out lots of information which is
primarily of interest for diagnostic purposes.
.It Fl z , Fl Fl compress
The complement to
.Fl d :
forces compression, regardless of the invocation name.
.El
.Ss MEMORY MANAGEMENT
.Nm bzip2
compresses large files in blocks.
The block size affects both the compression ratio achieved, and the
amount of memory needed for compression and decompression.
The flags
.Fl 1
through
.Fl 9
specify the block size to be 100,000 bytes through 900,000 bytes (the
default) respectively.
At decompression time, the block size used for compression is read
from the header of the compressed file, and
.Nm bunzip2
then allocates itself just enough memory to decompress the file.
Since block sizes are stored in compressed files, it follows that the
flags
.Fl 1
to
.Fl 9
are irrelevant to, and so ignored during, decompression.
.Pp
Compression and decompression requirements, in bytes, can be estimated
as:
.Bl -tag -width "Decompression:" -offset indent
.It Compression :
400k + ( 8 x block size )
.It Decompression :
100k + ( 4 x block size ), or 100k + ( 2.5 x block size )
.El
Larger block sizes give rapidly diminishing marginal returns.
Most of the compression comes from the first two or three hundred k of
block size, a fact worth bearing in mind when using
.Nm bzip2
on small machines.
It is also important to appreciate that the decompression memory
requirement is set at compression time by the choice of block size.
.Pp
For files compressed with the default 900k block size,
.Nm bunzip2
will require about 3700 kbytes to decompress.
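.Pp
The estimates above are simple enough to evaluate mechanically.
The following Python sketch does so under two assumptions that are not
stated in this page: that
.Dq k
means 1000 bytes, and that the 2.5 multiplier is the one that applies
with
.Fl s .
The function names are hypothetical.
.Bd -literal -offset indent
```python
def compress_memory(block_size: int) -> int:
    """Estimated bytes needed to compress: 400k + 8 x block size."""
    return 400_000 + 8 * block_size


def decompress_memory(block_size: int, small: bool = False) -> int:
    """Estimated bytes needed to decompress.

    Uses 100k + 4 x block size, or 100k + 2.5 x block size when the
    reduced-memory mode is assumed (small=True).
    """
    factor = 2.5 if small else 4
    return int(100_000 + factor * block_size)


if __name__ == "__main__":
    # Flags -1 through -9 select block sizes of 100,000 to 900,000 bytes.
    for flag in range(1, 10):
        block = flag * 100_000
        print(flag, compress_memory(block), decompress_memory(block),
              decompress_memory(block, small=True))
```
.Ed
.Pp
For the default 900,000 byte block this gives 7,600,000 bytes for
compression and 3,700,000 bytes for decompression.
These are estimates only, and measured figures may differ slightly.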
.Pp
To support decompression of any file on a 4 megabyte machine,
.Nm bunzip2
has an option to decompress using approximately half the memory
otherwise needed, about 2300 kbytes.
Decompression speed is also halved, so you should use this option only
where necessary.
The relevant flag is
.Fl s .
.Pp
In general, try to use the largest block size memory constraints
allow, since that maximises the compression achieved.
Compression and decompression speed are virtually unaffected by block
size.
.Pp
Another significant point applies to files which fit in a single block
-- that means most files you'd encounter using a large block size.
The amount of real memory touched is proportional to the size of the
file, since the file is smaller than a block.
For example, compressing a file 20,000 bytes long with the flag
.Fl 9
will cause the compressor to allocate around 7600k of memory, but only
touch 400k + 20000 * 8 = 560 kbytes of it.
Similarly, the decompressor will allocate 3700k but only touch 100k +
20000 * 4 = 180 kbytes.
.Pp
Here is a table which summarises the maximum memory usage for different
block sizes.
Also recorded is the total compressed size for 14 files of the Calgary
Text Compression Corpus totalling 3,141,622 bytes.
This column gives some feel for how compression varies with block size.
These figures tend to understate the advantage of larger block sizes
for larger files, since the Corpus is dominated by smaller files.
.Bl -column "Flag" "Compression" "Decompression" "DecompressionXXs" "Corpus size"
.It Sy Flag Ta Sy Compression Ta Sy Decompression Ta Sy Decompression Fl s Ta Sy Corpus size
.It -1 Ta 1200k Ta 500k Ta 350k Ta 914704
.It -2 Ta 2000k Ta 900k Ta 600k Ta 877703
.It -3 Ta 2800k Ta 1300k Ta 850k Ta 860338
.It -4 Ta 3600k Ta 1700k Ta 1100k Ta 846899
.It -5 Ta 4400k Ta 2100k Ta 1350k Ta 845160
.It -6 Ta 5200k Ta 2500k Ta 1600k Ta 838626
.It -7 Ta 6100k Ta 2900k Ta 1850k Ta 834096
.It -8 Ta 6800k Ta 3300k Ta 2100k Ta 828642
.It -9 Ta 7600k Ta 3700k Ta 2350k Ta 828642
.El
.Ss RECOVERING DATA FROM DAMAGED FILES
.Nm bzip2
compresses files in blocks, usually 900kbytes long.
Each block is handled independently.
If a media or transmission error causes a multi-block
.Pa .bz2
file to become damaged, it may be possible to recover data from the
undamaged blocks in the file.
.Pp
The compressed representation of each block is delimited by a 48-bit
pattern, which makes it possible to find the block boundaries with
reasonable certainty.
Each block also carries its own 32-bit CRC, so damaged blocks can be
distinguished from undamaged ones.
.Pp
.Nm bzip2recover
is a simple program whose purpose is to search for blocks in
.Pa .bz2
files, and write each block out into its own
.Pa .bz2
file.
You can then use
.Nm bzip2
.Fl t
to test the integrity of the resulting files, and decompress those
which are undamaged.
.Pp
.Nm bzip2recover
takes a single argument, the name of the damaged file, and writes a
number of files
.Dq Pa rec00001file.bz2 ,
.Dq Pa rec00002file.bz2 ,
etc., containing the extracted blocks.
The output filenames are designed so that the use of wildcards in
subsequent processing -- for example,
.Dl bzip2 -dc rec*file.bz2 \*[Gt] recovered_data
-- processes the files in the correct order.
.Pp
.Nm bzip2recover
should be of most use dealing with large
.Pa .bz2
files, as these will contain many blocks.
It is clearly futile to use it on damaged single-block files, since a
damaged block cannot be recovered.
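.Pp
The search for block boundaries described above can be sketched in
Python.
This is an illustration of the idea, not a description of how
.Nm bzip2recover
is written.
The marker value 0x314159265359 comes from the bzip2 file format rather
than from this page, and the function name
.Li find_block_offsets
is hypothetical.
Because blocks are not aligned to byte boundaries, the scan works one
bit at a time.
.Bd -literal -offset indent
```python
import bz2

# 48-bit pattern that starts each compressed block in the bzip2 format.
BLOCK_MAGIC = 0x314159265359


def find_block_offsets(data: bytes):
    """Return the bit offsets at which the 48-bit block marker occurs."""
    mask = (1 << 48) - 1
    window = 0          # the most recent 48 bits read
    offsets = []
    for index, byte in enumerate(data):
        for shift in range(7, -1, -1):      # most significant bit first
            window = ((window << 1) | ((byte >> shift) & 1)) & mask
            bits_seen = index * 8 + (8 - shift)
            if bits_seen >= 48 and window == BLOCK_MAGIC:
                offsets.append(bits_seen - 48)
    return offsets


if __name__ == "__main__":
    # Level 1 uses 100,000 byte blocks, so this input spans several blocks.
    sample = bz2.compress(bytes(range(256)) * 1000, 1)
    print(find_block_offsets(sample))
```
.Ed
.Pp
A match is only a candidate boundary: the same 48 bits could in
principle occur by chance inside compressed data, which is why the
per-block CRC is still needed to confirm that a block is intact.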
.Pp
If you wish to minimise any potential data loss through media or
transmission errors, you might consider compressing with a smaller
block size.
.Ss PERFORMANCE NOTES
The sorting phase of compression gathers together similar strings in
the file.
Because of this, files containing very long runs of repeated
symbols, like
.Dq aabaabaabaab...
(repeated several hundred times) may compress more slowly than normal.
Versions 0.9.5 and above fare much better than previous versions in
this respect.
The ratio between worst-case and average-case compression time is in
the region of 10:1.
For previous versions, this figure was more like 100:1.
You can use the
.Fl vvvv
option to monitor progress in great detail, if you want.
.Pp
Decompression speed is unaffected by these phenomena.
.Pp
.Nm bzip2
usually allocates several megabytes of memory to operate in, and then
charges all over it in a fairly random fashion.
This means that performance, both for compressing and decompressing,
is largely determined by the speed at which your machine can service
cache misses.
Because of this, small changes to the code to reduce the miss rate
have been observed to give disproportionately large performance
improvements.
I imagine
.Nm bzip2
will perform best on machines with very large caches.
.Sh ENVIRONMENT
.Nm bzip2
will read arguments from the environment variables
.Ev BZIP2
and
.Ev BZIP ,
in that order, and will process them before any arguments read from
the command line.
This gives a convenient way to supply default arguments.
.Sh EXIT STATUS
0 for a normal exit, 1 for environmental problems (file not found,
invalid flags, I/O errors, etc.), 2 to indicate a corrupt compressed
file, 3 for an internal consistency error (e.g., bug) which caused
.Nm bzip2
to panic.
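.Pp
A script that calls
.Nm bzip2
can branch on these values.
The Python sketch below is one possible way to do so; the names
.Li EXIT_MEANINGS ,
.Li describe_exit_status
and
.Li check_archive
are hypothetical, and
.Li check_archive
assumes that a
.Nm bzip2
executable is available on the search path.
.Bd -literal -offset indent
```python
import subprocess

# Meanings of the exit statuses documented in this section.
EXIT_MEANINGS = {
    0: "normal exit",
    1: "environmental problem (file not found, invalid flags, I/O error)",
    2: "corrupt compressed file",
    3: "internal consistency error",
}


def describe_exit_status(code: int) -> str:
    """Translate an exit status into the description given above."""
    return EXIT_MEANINGS.get(code, "undocumented status %d" % code)


def check_archive(path: str) -> str:
    """Run an integrity test (-t) on path and describe the outcome."""
    result = subprocess.run(["bzip2", "-t", path])
    return describe_exit_status(result.returncode)
```
.Ed
.Pp
Statuses outside the documented range are reported as undocumented
rather than guessed at.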
.Sh AUTHORS
.An -nosplit
.An Julian Seward
.Aq jseward@bzip.org
.Pp
.Pa http://www.bzip.org
.Pp
The ideas embodied in
.Nm bzip2
are due to (at least) the following people:
.An Michael Burrows
and
.An David Wheeler
(for the block sorting transformation),
.An David Wheeler
(again, for the Huffman coder),
.An Peter Fenwick
(for the structured coding model in the original
.Nm bzip ,
and many refinements), and
.An Alistair Moffat ,
.An Radford Neal ,
and
.An Ian Witten
(for the arithmetic coder in the original
.Nm bzip ) .
I am much indebted for their help, support and advice.
See the manual in the source distribution for pointers to sources of
documentation.
Christian von Roques encouraged me to look for faster sorting
algorithms, so as to speed up compression.
Bela Lubkin encouraged me to improve the worst-case compression
performance.
Donna Robinson XMLised the documentation.
The bz* scripts are derived from those of GNU gzip.
Many people sent patches, helped with portability problems, lent
machines, gave advice and were generally helpful.
.Sh CAVEATS
I/O error messages are not as helpful as they could be.
.Nm bzip2
tries hard to detect I/O errors and exit cleanly, but the details of
what the problem is sometimes seem rather misleading.
.Pp
This manual page pertains to version 1.0.6 of
.Nm bzip2 .
Compressed data created by this version is entirely forwards and
backwards compatible with the previous public releases, versions
0.1pl2, 0.9.0, 0.9.5, 1.0.0, 1.0.1, 1.0.2 and above, but with the
following exception: 0.9.0 and above can correctly decompress multiple
concatenated compressed files.
0.1pl2 cannot do this; it will stop after decompressing just the first
file in the stream.
.Pp
.Nm bzip2recover
versions prior to 1.0.2 used 32-bit integers to represent bit
positions in compressed files, so they could not handle compressed
files more than 512 megabytes long (2^32 bits).
Versions 1.0.2 and above use 64-bit ints on some platforms which
support them (GNU supported targets, and Windows).
To establish whether or not
.Nm bzip2recover
was built with such a limitation, run it without arguments.
In any event you can build yourself an unlimited version if you can
recompile it with MaybeUInt64 set to be an unsigned 64-bit integer.