=head1 NAME

IO::Compress::FAQ -- Frequently Asked Questions about IO::Compress

=head1 DESCRIPTION

Common questions answered.

=head1 GENERAL

=head2 Compatibility with Unix compress/uncompress.

Although C<Compress::Zlib> has a pair of functions called C<compress> and
C<uncompress>, they are I<not> related to the Unix programs of the same
name. The C<Compress::Zlib> module is not compatible with Unix
C<compress>.

If you have the C<uncompress> program available, you can use this to read
compressed files:

    open F, "uncompress -c $filename |";
    while (<F>)
    {
        ...
    }

Alternatively, if you have the C<gunzip> program available (it can also
read files created by Unix C<compress>), you can use this to read
compressed files:

    open F, "gunzip -c $filename |";
    while (<F>)
    {
        ...
    }

and this to write compressed files, if you have the C<compress> program
available:

    # compress reads from the pipe and writes to $filename
    open F, "| compress -c >$filename";
    print F "data";
    ...
    close F ;

=head2 Accessing .tar.Z files

The C<Archive::Tar> module can optionally use C<Compress::Zlib> (via the
C<IO::Zlib> module) to access tar files that have been compressed with
C<gzip>. Unfortunately tar files compressed with the Unix C<compress>
utility cannot be read by C<Compress::Zlib> and so cannot be directly
accessed by C<Archive::Tar>.

If the C<uncompress> or C<gunzip> programs are available, you can use one
of these workarounds to read C<.tar.Z> files from C<Archive::Tar>.

Firstly with C<uncompress>:

    use strict;
    use warnings;
    use Archive::Tar;

    open F, "uncompress -c $filename |";
    my $tar = Archive::Tar->new(*F);
    ...

and this with C<gunzip>:

    use strict;
    use warnings;
    use Archive::Tar;

    open F, "gunzip -c $filename |";
    my $tar = Archive::Tar->new(*F);
    ...
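
The workarounds above interpolate C<$filename> into a shell command. As a
minimal sketch (not part of the original FAQ), the same read pipe can be
opened in list form with a lexical filehandle, which bypasses the shell.
The helper name C<open_decompressed> is hypothetical, and the sketch
assumes a C<gunzip> program is on the PATH.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of any module): open a read pipe from
# "gunzip -c FILE" using the list form of open, so the filename is passed
# to gunzip directly and is never interpreted by a shell.
sub open_decompressed {
    my ($filename) = @_;

    open my $fh, '-|', 'gunzip', '-c', $filename
        or die "Cannot run gunzip on '$filename': $!\n";

    return $fh;
}

# Possible usage with Archive::Tar (assumes archive.tar.Z exists):
#   my $tar = Archive::Tar->new(open_decompressed("archive.tar.Z"));
```

The list form of a pipe open may not be available on every platform or
Perl version, in which case the string form shown earlier is still needed.
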

Similarly, if the C<compress> program is available, you can use this to
write a C<.tar.Z> file:

    use strict;
    use warnings;
    use Archive::Tar;
    use IO::File;

    my $fh = new IO::File "| compress -c >$filename";
    my $tar = Archive::Tar->new();
    ...
    $tar->write($fh);
    $fh->close ;

=head2 How do I recompress using a different compression?

This is easier than you might expect if you realise that all the
C<IO::Compress::*> objects are derived from C<IO::File> and that all the
C<IO::Uncompress::*> modules can read from an C<IO::File> filehandle.

So, for example, say you have a file compressed with gzip that you want to
recompress with bzip2. Here is all that is needed to carry out the
recompression.

    use IO::Uncompress::Gunzip ':all';
    use IO::Compress::Bzip2 ':all';

    my $gzipFile = "somefile.gz";
    my $bzipFile = "somefile.bz2";

    my $gunzip = new IO::Uncompress::Gunzip $gzipFile
        or die "Cannot gunzip $gzipFile: $GunzipError\n" ;

    bzip2 $gunzip => $bzipFile
        or die "Cannot bzip2 to $bzipFile: $Bzip2Error\n" ;

Note that this technique has a limitation. Some compression file
formats store extra information along with the compressed data payload. For
example, gzip can optionally store the original filename and Zip stores a
lot of information about the original file. If the original compressed file
contains any of this extra information, it will not be transferred to the
new compressed file using the technique above.

=head1 ZIP

=head2 What Compression Types do IO::Compress::Zip & IO::Uncompress::Unzip support?

The following compression formats are supported by C<IO::Compress::Zip> and
C<IO::Uncompress::Unzip>

=over 5

=item * Store (method 0)

No compression at all.

=item * Deflate (method 8)

This is the default compression used when creating a zip file with
C<IO::Compress::Zip>.

=item * Bzip2 (method 12)

Only supported if the C<IO::Compress::Bzip2> module is installed.

=item * Lzma (method 14)

Only supported if the C<IO::Compress::Lzma> module is installed.

=back

=head2 Can I Read/Write Zip files larger than 4 Gig?

Yes, both the C<IO::Compress::Zip> and C<IO::Uncompress::Unzip> modules
support the zip feature called I<Zip64>. That allows them to read/write
files/buffers larger than 4 Gig.

If you are creating a Zip file using the one-shot interface, and any of the
input files is greater than 4 Gig, a Zip64-compliant zip file will be
created.

    zip "really-large-file" => "my.zip";

Similarly with the one-shot interface, if the input is a buffer larger than
4 Gig, a Zip64-compliant zip file will be created.

    zip \$really_large_buffer => "my.zip";

The one-shot interface allows you to force the creation of a Zip64 zip file
by including the C<Zip64> option.

    zip $filehandle => "my.zip", Zip64 => 1;

If you want to create a Zip64 zip file with the OO interface you must
specify the C<Zip64> option.

    my $zip = new IO::Compress::Zip "whatever", Zip64 => 1;

When uncompressing with C<IO::Uncompress::Unzip>, it will automatically
detect if the zip file is Zip64.

If you intend to manipulate the Zip64 zip files created with
C<IO::Compress::Zip> using an external zip/unzip, make sure that it supports
Zip64.

In particular, if you are using Info-Zip you need to have zip version 3.x
or better to update a Zip64 archive and unzip version 6.x to read a Zip64
archive.

=head2 Can I write more than 64K entries in a Zip file?

Yes. Zip64 allows this. See previous question.

=head2 Zip Resources

The primary reference for zip files is the "appnote" document available at
L<http://www.pkware.com/documents/casestudies/APPNOTE.TXT>

An alternative is the Info-Zip appnote.
This is available from
L<ftp://ftp.info-zip.org/pub/infozip/doc/>

=head1 GZIP

=head2 Gzip Resources

The primary reference for gzip files is RFC 1952
L<http://www.faqs.org/rfcs/rfc1952.html>

The primary site for gzip is F<http://www.gzip.org>.

=head2 Dealing with Concatenated gzip files

If the gunzip program encounters a file containing multiple gzip files
concatenated together it will automatically uncompress them all.
The example below illustrates this behaviour

    $ echo abc | gzip -c >x.gz
    $ echo def | gzip -c >>x.gz
    $ gunzip -c x.gz
    abc
    def

By default C<IO::Uncompress::Gunzip> will I<not> behave like the gunzip
program. It will only uncompress the first gzip data stream in the file, as
shown below

    $ perl -MIO::Uncompress::Gunzip=:all -e 'gunzip "x.gz" => \*STDOUT'
    abc

To force C<IO::Uncompress::Gunzip> to uncompress all the gzip data streams,
include the C<MultiStream> option, as shown below

    $ perl -MIO::Uncompress::Gunzip=:all -e 'gunzip "x.gz" => \*STDOUT, MultiStream => 1'
    abc
    def

=head1 ZLIB

=head2 Zlib Resources

The primary site for the I<zlib> compression library is
F<http://www.zlib.org>.

=head1 Bzip2

=head2 Bzip2 Resources

The primary site for bzip2 is F<http://www.bzip.org>.

=head2 Dealing with Concatenated bzip2 files

If the bunzip2 program encounters a file containing multiple bzip2 files
concatenated together it will automatically uncompress them all.
The example below illustrates this behaviour

    $ echo abc | bzip2 -c >x.bz2
    $ echo def | bzip2 -c >>x.bz2
    $ bunzip2 -c x.bz2
    abc
    def

By default C<IO::Uncompress::Bunzip2> will I<not> behave like the bunzip2
program.
It will only uncompress the first bzip2 data stream in the file, as
shown below

    $ perl -MIO::Uncompress::Bunzip2=:all -e 'bunzip2 "x.bz2" => \*STDOUT'
    abc

To force C<IO::Uncompress::Bunzip2> to uncompress all the bzip2 data streams,
include the C<MultiStream> option, as shown below

    $ perl -MIO::Uncompress::Bunzip2=:all -e 'bunzip2 "x.bz2" => \*STDOUT, MultiStream => 1'
    abc
    def

=head2 Interoperating with Pbzip2

Pbzip2 (L<http://compression.ca/pbzip2/>) is a parallel implementation of
bzip2. The output from pbzip2 consists of a series of concatenated bzip2
data streams.

By default C<IO::Uncompress::Bunzip2> will only uncompress the first bzip2
data stream in a pbzip2 file. To uncompress the complete pbzip2 file you
must include the C<MultiStream> option, like this:

    bunzip2 $input => \$output, MultiStream => 1
        or die "bunzip2 failed: $Bunzip2Error\n";

=head1 HTTP & NETWORK

=head2 Apache::GZip Revisited

Below is a mod_perl Apache compression module, called C<Apache::GZip>,
taken from
F<http://perl.apache.org/docs/tutorials/tips/mod_perl_tricks/mod_perl_tricks.html#On_the_Fly_Compression>

    package Apache::GZip;
    #File: Apache::GZip.pm

    use strict vars;
    use Apache::Constants ':common';
    use Compress::Zlib;
    use IO::File;
    use constant GZIP_MAGIC => 0x1f8b;
    use constant OS_MAGIC => 0x03;

    sub handler {
        my $r = shift;
        my ($fh,$gz);
        my $file = $r->filename;
        return DECLINED unless $fh=IO::File->new($file);
        $r->header_out('Content-Encoding'=>'gzip');
        $r->send_http_header;
        return OK if $r->header_only;

        tie *STDOUT,'Apache::GZip',$r;
        print($_) while <$fh>;
        untie *STDOUT;
        return OK;
    }

    sub TIEHANDLE {
        my($class,$r) = @_;
        # initialize a deflation stream
        my $d = deflateInit(-WindowBits=>-MAX_WBITS()) || return undef;

        # gzip header -- don't ask how I found out
        $r->print(pack("nccVcc",GZIP_MAGIC,Z_DEFLATED,0,time(),0,OS_MAGIC));

        return bless { r   => $r,
                       crc =>  crc32(undef),
                       d   => $d,
                       l   =>  0
                     },$class;
    }

    sub PRINT {
        my $self = shift;
        foreach (@_) {
            # deflate the data
            my $data = $self->{d}->deflate($_);
            $self->{r}->print($data);
            # keep track of its length and crc
            $self->{l} += length($_);
            $self->{crc} = crc32($_,$self->{crc});
        }
    }

    sub DESTROY {
        my $self = shift;

        # flush the output buffers
        my $data = $self->{d}->flush;
        $self->{r}->print($data);

        # print the CRC and the total length (uncompressed)
        $self->{r}->print(pack("LL",@{$self}{qw/crc l/}));
    }

    1;

Here's the Apache configuration entry you'll need to make use of it. Once
set, everything in the /compressed directory will be compressed
automagically.

    <Location /compressed>
        SetHandler  perl-script
        PerlHandler Apache::GZip
    </Location>

Although at first sight there seems to be quite a lot going on in
C<Apache::GZip>, you could sum up what the code was doing as follows --
read the contents of the file in C<< $r->filename >>, compress it and write
the compressed data to standard output. That's all.

This code has to jump through a few hoops to achieve this because:

=over

=item 1.

The gzip support in C<Compress::Zlib> version 1.x can only work with a real
filesystem filehandle. The filehandles used by Apache modules are not
associated with the filesystem.

=item 2.

That means all the gzip support has to be done by hand - in this case by
creating a tied filehandle to deal with creating the gzip header and
trailer.

=back

C<IO::Compress::Gzip> doesn't have that filehandle limitation (this was one
of the reasons for writing it in the first place).
So if
C<IO::Compress::Gzip> is used instead of C<Compress::Zlib> the whole tied
filehandle code can be removed. Here is the rewritten code.

    package Apache::GZip;

    use strict vars;
    use Apache::Constants ':common';
    use IO::Compress::Gzip;
    use IO::File;

    sub handler {
        my $r = shift;
        my $fh;
        my $file = $r->filename;
        return DECLINED unless $fh=IO::File->new($file);
        $r->header_out('Content-Encoding'=>'gzip');
        $r->send_http_header;
        return OK if $r->header_only;

        # '-' tells IO::Compress::Gzip to write to standard output
        my $gz = new IO::Compress::Gzip '-', Minimal => 1
            or return DECLINED ;

        print $gz $_ while <$fh>;

        return OK;
    }

    1;

or even more succinctly, like this, using a one-shot gzip

    package Apache::GZip;

    use strict vars;
    use Apache::Constants ':common';
    use IO::Compress::Gzip qw(gzip);

    sub handler {
        my $r = shift;
        $r->header_out('Content-Encoding'=>'gzip');
        $r->send_http_header;
        return OK if $r->header_only;

        gzip $r->filename => '-', Minimal => 1
            or return DECLINED ;

        return OK;
    }

    1;

The use of one-shot C<gzip> above just reads from C<< $r->filename >> and
writes the compressed data to standard output.

Note the use of the C<Minimal> option in the code above. When using gzip
for Content-Encoding you should I<always> use this option. In the example
above it will prevent the filename being included in the gzip header and
make the gzip data stream slightly smaller.

=head2 Compressed files and Net::FTP

The C<Net::FTP> module provides two low-level methods called C<stor> and
C<retr> that both return filehandles. These filehandles can be used with the
C<IO::Compress/Uncompress> modules to compress or uncompress files read
from or written to an FTP Server on the fly, without having to create a
temporary file.

Firstly, here is code that uses C<retr> to uncompress a file as it is
read from the FTP Server.

    use Net::FTP;
    use IO::Uncompress::Gunzip qw(:all);

    my $ftp = new Net::FTP ...

    my $retr_fh = $ftp->retr($compressed_filename);
    gunzip $retr_fh => $outFilename, AutoClose => 1
        or die "Cannot uncompress '$compressed_filename': $GunzipError\n";

and this to compress a file as it is written to the FTP Server

    use Net::FTP;
    use IO::Compress::Gzip qw(:all);

    my $stor_fh = $ftp->stor($filename);
    gzip $filename => $stor_fh, AutoClose => 1
        or die "Cannot compress '$filename': $GzipError\n";

=head1 MISC

=head2 Using C<InputLength> to uncompress data embedded in a larger file/buffer.

A fairly common use-case is where compressed data is embedded in a larger
file/buffer and you want to read both.

As an example consider the structure of a zip file. This is a well-defined
file format that mixes both compressed and uncompressed sections of data in
a single file.

For the purposes of this discussion you can think of a zip file as a sequence
of compressed data streams, each of which is prefixed by an uncompressed
local header. The local header contains information about the compressed
data stream, including the name of the compressed file and, in particular,
the length of the compressed data stream.

To illustrate how to use C<InputLength> here is a script that walks a zip
file and prints out how many lines are in each compressed file (if you
intend to write code to walk through a zip file for real, see
L<IO::Uncompress::Unzip/"Walking through a zip file">). Also, although
this example uses the zlib-based compression, the technique can be used by
the other C<IO::Uncompress::*> modules.

    use strict;
    use warnings;

    use IO::File;
    use IO::Uncompress::RawInflate qw(:all);

    use constant ZIP_LOCAL_HDR_SIG    => 0x04034b50;
    use constant ZIP_LOCAL_HDR_LENGTH => 30;

    my $file = $ARGV[0] ;

    my $fh = new IO::File "<$file"
                or die "Cannot open '$file': $!\n";

    while (1)
    {
        my $sig;
        my $buffer;

        my $x ;
        ($x = $fh->read($buffer, ZIP_LOCAL_HDR_LENGTH)) == ZIP_LOCAL_HDR_LENGTH
            or die "Truncated file: $!\n";

        my $signature = unpack ("V", substr($buffer, 0, 4));

        last unless $signature == ZIP_LOCAL_HDR_SIG;

        # Read Local Header
        my $gpFlag             = unpack ("v", substr($buffer, 6, 2));
        my $compressedMethod   = unpack ("v", substr($buffer, 8, 2));
        my $compressedLength   = unpack ("V", substr($buffer, 18, 4));
        my $uncompressedLength = unpack ("V", substr($buffer, 22, 4));
        my $filename_length    = unpack ("v", substr($buffer, 26, 2));
        my $extra_length       = unpack ("v", substr($buffer, 28, 2));

        my $filename ;
        $fh->read($filename, $filename_length) == $filename_length
            or die "Truncated file\n";

        $fh->read($buffer, $extra_length) == $extra_length
            or die "Truncated file\n";

        if ($compressedMethod != 8 && $compressedMethod != 0)
        {
            warn "Skipping file '$filename' - not deflated $compressedMethod\n";
            $fh->read($buffer, $compressedLength) == $compressedLength
                or die "Truncated file\n";
            next;
        }

        # Parentheses are needed: == binds more tightly than &
        if ($compressedMethod == 0 && ($gpFlag & 8) == 8)
        {
            die "Streamed Stored not supported for '$filename'\n";
        }

        next if $compressedLength == 0;

        # Done reading the Local Header

        my $inf = new IO::Uncompress::RawInflate $fh,
                            Transparent => 1,
                            InputLength => $compressedLength
          or die "Cannot uncompress $file [$filename]: $RawInflateError\n"  ;

        my $line_count = 0;

        while (<$inf>)
        {
            ++ $line_count;
        }

        print "$filename: $line_count\n";
    }

The majority of the code above is concerned with reading the zip local
header data. The code that I want to focus on is at the bottom.

    while (1) {

        # read local zip header data
        # get $filename
        # get $compressedLength

        my $inf = new IO::Uncompress::RawInflate $fh,
                            Transparent => 1,
                            InputLength => $compressedLength
          or die "Cannot uncompress $file [$filename]: $RawInflateError\n"  ;

        my $line_count = 0;

        while (<$inf>)
        {
            ++ $line_count;
        }

        print "$filename: $line_count\n";
    }

The call to C<IO::Uncompress::RawInflate> creates a new filehandle C<$inf>
that can be used to read from the parent filehandle C<$fh>, uncompressing
it as it goes. The use of the C<InputLength> option will guarantee that
I<at most> C<$compressedLength> bytes of compressed data will be read from
the C<$fh> filehandle (the only exception is an error case such as a
truncated file or a corrupt data stream).

This means that once RawInflate is finished C<$fh> will be left at the
byte directly after the compressed data stream.

Now consider what the code looks like without C<InputLength>

    while (1) {

        # read local zip header data
        # get $filename
        # get $compressedLength

        # read all the compressed data into $data
        my $data;
        read($fh, $data, $compressedLength);

        my $inf = new IO::Uncompress::RawInflate \$data,
                            Transparent => 1
          or die "Cannot uncompress $file [$filename]: $RawInflateError\n"  ;

        my $line_count = 0;

        while (<$inf>)
        {
            ++ $line_count;
        }

        print "$filename: $line_count\n";
    }

The difference here is the addition of the temporary variable C<$data>.
This is used to store a copy of the compressed data while it is being
uncompressed.

If you know that C<$compressedLength> isn't that big then using temporary
storage won't be a problem.
But if C<$compressedLength> is very large or
you are writing an application that other people will use, and so have no
idea how big C<$compressedLength> will be, it could be an issue.

Using C<InputLength> avoids the use of temporary storage and means the
application can cope with large compressed data streams.

One final point -- obviously C<InputLength> can only be used when you
know the length of the compressed data beforehand, like here with a zip
file.

=head1 SEE ALSO

L<Compress::Zlib>, L<IO::Compress::Gzip>, L<IO::Uncompress::Gunzip>, L<IO::Compress::Deflate>, L<IO::Uncompress::Inflate>, L<IO::Compress::RawDeflate>, L<IO::Uncompress::RawInflate>, L<IO::Compress::Bzip2>, L<IO::Uncompress::Bunzip2>, L<IO::Compress::Lzma>, L<IO::Uncompress::UnLzma>, L<IO::Compress::Xz>, L<IO::Uncompress::UnXz>, L<IO::Compress::Lzop>, L<IO::Uncompress::UnLzop>, L<IO::Compress::Lzf>, L<IO::Uncompress::UnLzf>, L<IO::Uncompress::AnyInflate>, L<IO::Uncompress::AnyUncompress>

L<IO::Compress::FAQ|IO::Compress::FAQ>

L<File::GlobMapper|File::GlobMapper>, L<Archive::Zip|Archive::Zip>,
L<Archive::Tar|Archive::Tar>,
L<IO::Zlib|IO::Zlib>

=head1 AUTHOR

This module was written by Paul Marquess, F<pmqs@cpan.org>.

=head1 MODIFICATION HISTORY

See the Changes file.

=head1 COPYRIGHT AND LICENSE

Copyright (c) 2005-2014 Paul Marquess. All rights reserved.

This program is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.
