        @(#)README      8.1 (Berkeley) 6/9/93

Compress version 4.0 improvements over 3.0:
        o compress() speedup (10-50%) by changing division hash to xor
        o decompress() speedup (5-10%)
        o Memory requirements reduced (3-30%)
        o Stack requirements reduced to less than 4kb
        o Removed 'Big+Fast' compress code (FBITS) because of compress speedup
        o Portability mods for Z8000 and PC/XT (but not zeus 3.2)
        o Default to 'quiet' mode
        o Unification of 'force' flags
        o Manual page overhaul
        o Portability enhancement for M_XENIX
        o Removed text on #else and #endif
        o Added "-V" switch to print version and options
        o Added #defines for SIGNED_COMPARE_SLOW
        o Added Makefile and "usermem" program
        o Removed all floating point computations
        o New programs:  [deleted]

The "usermem" script attempts to determine the maximum process size.  Some
editing of the script may be necessary (see the comments).  [It should work
fine on 4.3 bsd.]  If you can't get it to work at all, just create file
"USERMEM" containing the maximum process size in decimal.

The following preprocessor symbols control the compilation of "compress.c":

        o USERMEM               Maximum process memory on the system
        o SACREDMEM             Amount to reserve for other processes
        o SIGNED_COMPARE_SLOW   Unsigned compare instructions are faster
        o NO_UCHAR              Don't use "unsigned char" types
        o BITS                  Overrules default set by USERMEM-SACREDMEM
        o vax                   Generate inline assembler
        o interdata             Defines SIGNED_COMPARE_SLOW
        o M_XENIX               Makes arrays < 65536 bytes each
        o pdp11                 BITS=12, NO_UCHAR
        o z8000                 BITS=12
        o pcxt                  BITS=12
        o BSD4_2                Allow long filenames ( > 14 characters) &
                                Call setlinebuf(stderr)

The difference "usermem-sacredmem" determines the maximum BITS that can be
specified with the "-b" flag.

memory: at least        BITS
------  -- -----        ----
     433,484             16
     229,600             15
     127,536             14
      73,464             13
           0             12

The default is BITS=16.

The maximum bits can be overruled by specifying "-DBITS=bits" at
compilation time.

WARNING: files compressed on a large machine with more bits than allowed by
a version of compress on a smaller machine cannot be decompressed!  Use the
"-b12" flag to generate a file on a large machine that can be uncompressed
on a 16-bit machine.

The output of compress 4.0 is fully compatible with that of compress 3.0.
In other words, the output of compress 4.0 may be fed into uncompress 3.0 or
the output of compress 3.0 may be fed into uncompress 4.0.
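
As an illustration only, the version 4.0 memory table above can be read as a
simple threshold lookup.  The sketch below is not taken from compress.c; the
function name max_bits_for_memory() and its two parameters are hypothetical,
and only the thresholds come from the table.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of compress.c): return the largest BITS
 * value that the version 4.0 table permits for usermem - sacredmem bytes.
 */
int max_bits_for_memory(long usermem, long sacredmem)
{
    /* Thresholds copied from the "memory: at least / BITS" table. */
    static const struct {
        long at_least;
        int  bits;
    } table[] = {
        { 433484L, 16 },
        { 229600L, 15 },
        { 127536L, 14 },
        {  73464L, 13 },
        {      0L, 12 },
    };
    long avail = usermem - sacredmem;
    size_t i;

    /* Assumption: a negative difference is treated as "no memory". */
    if (avail < 0)
        avail = 0;

    /* The table is sorted from largest to smallest threshold. */
    for (i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (avail >= table[i].at_least)
            return table[i].bits;
    }
    return 12;
}
```

For example, max_bits_for_memory(750000L, 400000L) leaves 350,000 bytes,
which falls in the 229,600 row and so yields BITS=15.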

The output of compress 4.0 is not compatible with that of
compress 2.0.  However, compress 4.0 still accepts the output of
compress 2.0.  To generate output that is compatible with compress
2.0, use the undocumented "-C" flag.

        -from mod.sources, submitted by vax135!petsd!joe (Joe Orost), 8/1/85
--------------------------------

Enclosed is compress version 3.0 with the following changes:

1.      "Block" compression is performed.  After the BITS run out, the
        compression ratio is checked every so often.  If it is decreasing,
        the table is cleared and a new set of substrings is generated.

        This makes the output of compress 3.0 not compatible with that of
        compress 2.0.  However, compress 3.0 still accepts the output of
        compress 2.0.  To generate output that is compatible with compress
        2.0, use the undocumented "-C" flag.

2.      A quiet "-q" flag has been added for use by the news system.

3.      The character chaining has been deleted and the program now uses
        hashing.  This improves the speed of the program, especially
        during decompression.  Other speed improvements have been made,
        such as using putc() instead of fwrite().

4.      A large table is used on large machines when a relatively small
        number of bits is specified.  This saves much time when compressing
        for a 16-bit machine on a 32-bit virtual machine.  Note that the
        speed improvement only occurs when the input file is > 30000
        characters, and the -b BITS is less than or equal to the cutoff
        described below.
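
The ratio check behind "block" compression (item 1 above) might look roughly
like the sketch below.  It is a guess at the idea, not the code in
compress.c: the struct, the function name should_clear_table(), the
fixed-point scale of 256, and the choice to compare against the previous
checkpoint are all assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical state for this sketch; the names are not from compress.c. */
struct ratio_state {
    uint64_t last_ratio;    /* in/out ratio at the last checkpoint, x256 */
};

/*
 * Called at each checkpoint once the code table is full.  Returns 1 if
 * the compression ratio has dropped since the previous checkpoint (the
 * caller would then clear the table), 0 otherwise.
 */
int should_clear_table(struct ratio_state *st,
                       uint64_t bytes_in, uint64_t bytes_out)
{
    uint64_t ratio;

    if (bytes_out == 0)
        return 0;               /* nothing written yet, nothing to judge */

    /* Integer fixed point (scaled by 256) avoids floating point. */
    ratio = (bytes_in * 256) / bytes_out;

    if (ratio >= st->last_ratio) {
        st->last_ratio = ratio; /* still holding or improving */
        return 0;
    }

    st->last_ratio = 0;         /* start over after the table is cleared */
    return 1;
}
```

With a zero-initialized state, checkpoints of (1000 in, 500 out) and
(2000 in, 900 out) keep the table, while a following (3000 in, 1600 out)
shows a lower ratio and asks for a clear.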

Most of these changes were made by James A. Woods (ames!jaw).  Thank you
James!

To compile compress:

        cc -O -DUSERMEM=usermem -o compress compress.c

Where "usermem" is the amount of physical user memory available (in bytes).
If any physical memory is to be reserved for other processes, put in
"-DSACREDMEM=sacredmem", where "sacredmem" is the amount to be reserved.

The difference "usermem-sacredmem" determines the maximum BITS that can be
specified, and the cutoff bits where the large+fast table is used.

memory: at least        BITS            cutoff
------  -- -----        ----            ------
   4,718,592             16               13
   2,621,440             16               12
   1,572,864             16               11
   1,048,576             16               10
     631,808             16               --
     329,728             15               --
     178,176             14               --
      99,328             13               --
           0             12               --

The default memory size is 750,000 which gives a maximum BITS=16 and no
large+fast table.

The maximum bits can be overruled by specifying "-DBITS=bits" at
compilation time.

If your machine doesn't support unsigned characters, define "NO_UCHAR"
when compiling.

If your machine has "int" as 16-bits, define "SHORT_INT" when compiling.

After compilation, move "compress" to a standard executable location, such
as /usr/local.  Then:
        cd /usr/local
        ln compress uncompress
        ln compress zcat

On machines that have a fixed stack size (such as Perkin-Elmer), set the
stack to at least 12kb.
("setstack compress 12" on Perkin-Elmer).

Next, install the manual (compress.l).
        cp compress.l /usr/man/manl
        cd /usr/man/manl
        ln compress.l uncompress.l
        ln compress.l zcat.l

                - or -

        cp compress.l /usr/man/man1/compress.1
        cd /usr/man/man1
        ln compress.1 uncompress.1
        ln compress.1 zcat.1

                                        regards,
                                        petsd!joe

Here is a note from the net:

>From hplabs!pesnta!amd!turtlevax!ken Sat Jan  5 03:35:20 1985
Path: ames!hplabs!pesnta!amd!turtlevax!ken
From: ken@turtlevax.UUCP (Ken Turkowski)
Newsgroups: net.sources
Subject: Re: Compress release 3.0 : sample Makefile
Organization: CADLINC, Inc. @ Menlo Park, CA

In the compress 3.0 source recently posted to mod.sources, there is a
#define variable which can be set for optimum performance on a machine
with a large amount of memory.  A program (usermem) to calculate the
useable amount of physical user memory is enclosed, as well as a sample
4.2bsd Vax Makefile for compress.

Here is the README file from the previous version of compress (2.0):

>Enclosed is compress.c version 2.0 with the following bugs fixed:
>
>1.  The packed files produced by compress are different on different
>    machines and dependent on the vax sysgen option.
>       The bug was in the different byte/bit ordering on the
>       various machines.  This has been fixed.
>
>       This version is NOT compatible with the original vax posting
>       unless the '-DCOMPATIBLE' option is specified to the C
>       compiler.  The original posting has a bug which I fixed,
>       causing incompatible files.  I recommend you NOT to use this
>       option unless you already have a lot of packed files from
>       the original posting by thomas.
>2.  The exit status is not well defined (on some machines) causing the
>    scripts to fail.
>       The exit status is now 0,1 or 2 and is documented in
>       compress.l.
>3.  The function getopt() is not available in all C libraries.
>       The function getopt() is no longer referenced by the
>       program.
>4.  Error status is not being checked on the fwrite() and fflush() calls.
>       Fixed.
>
>The following enhancements have been made:
>
>1.  Added facilities of "compact" into the compress program.  "Pack",
>    "Unpack", and "Pcat" are no longer required (no longer supplied).
>2.  Installed work around for C compiler bug with "-O".
>3.  Added a magic number header (\037\235).  Put the bits specified
>    in the file.
>4.  Added "-f" flag to force overwrite of output file.
>5.  Added "-c" flag and "zcat" program.  'ln compress zcat' after you
>    compile.
>6.  The 'uncompress' script has been deleted; simply
>    'ln compress uncompress' after you compile and it will work.
>7.  Removed extra bit masking for machines that support unsigned
>    characters.  If your machine doesn't support unsigned characters,
>    define "NO_UCHAR" when compiling.
>
>Compile "compress.c" with "-O -o compress" flags.  Move "compress" to a
>standard executable location, such as /usr/local.  Then:
>       cd /usr/local
>       ln compress uncompress
>       ln compress zcat
>
>On machines that have a fixed stack size (such as Perkin-Elmer), set the
>stack to at least 12kb.  ("setstack compress 12" on Perkin-Elmer).
>
>Next, install the manual (compress.l).
>       cp compress.l /usr/man/manl             - or -
>       cp compress.l /usr/man/man1/compress.1
>
>Here is the README that I sent with my first posting:
>
>>Enclosed is a modified version of compress.c, along with scripts to make it
>>run identically to pack(1), unpack(1), and pcat(1).  Here is what I
>>(petsd!joe) and a colleague (petsd!peora!srd) did:
>>
>>1. Removed VAX dependencies.
>>2. Changed the struct to separate arrays; saves mucho memory.
>>3. Did comparisons in unsigned, where possible.  (Faster on Perkin-Elmer.)
>>4. Sorted the character next chain and changed the search to stop
>>prematurely.  This saves a lot on the execution time when compressing.
>>
>>This version is totally compatible with the original version.  Even though
>>lint(1) -p has no complaints about compress.c, it won't run on a 16-bit
>>machine, due to the size of the arrays.
>>
>>Here is the README file from the original author:
>>
>>>Well, with all this discussion about file compression (for news batching
>>>in particular) going around, I decided to implement the text compression
>>>algorithm described in the June Computer magazine.  The author claimed
>>>blinding speed and good compression ratios.  It's certainly faster than
>>>compact (but, then, what wouldn't be), but it's also the same speed as
>>>pack, and gets better compression than both of them.  On 350K bytes of
>>>unix-wizards, compact took about 8 minutes of CPU, pack took about 80
>>>seconds, and compress (herein) also took 80 seconds.  But, compact and
>>>pack got about 30% compression, whereas compress got over 50%.  So, I
>>>decided I had something, and that others might be interested, too.
>>>
>>>As is probably true of compact and pack (although I haven't checked),
>>>the byte order within a word is probably relevant here, but as long as
>>>you stay on a single machine type, you should be ok.  (Can anybody
>>>elucidate on this?)  There are a couple of asm's in the code (extv and
>>>insv instructions), so anyone porting it to another machine will have to
>>>deal with this anyway (and could probably make it compatible with Vax
>>>byte order at the same time).  Anyway, I've linted the code (both with
>>>and without -p), so it should run elsewhere.  Note the longs in the
>>>code, you can take these out if you reduce BITS to <= 15.
>>>
>>>Have fun, and as always, if you make good enhancements, or bug fixes,
>>>I'd like to see them.
>>>
>>>=Spencer   (thomas@utah-20, {harpo,hplabs,arizona}!utah-cs!thomas)
>>
>>                                      regards,
>>                                      joe
>>
>>--
>>Full-Name:  Joseph M. Orost
>>UUCP:       ..!{decvax,ucbvax,ihnp4}!vax135!petsd!joe
>>US Mail:    MS 313; Perkin-Elmer; 106 Apple St; Tinton Falls, NJ 07724
>>Phone:      (201) 870-5844