@(#)README      8.1 (Berkeley) 06/09/93

Compress version 4.0 improvements over 3.0:
        o compress() speedup (10-50%) by changing division hash to xor
        o decompress() speedup (5-10%)
        o Memory requirements reduced (3-30%)
        o Stack requirements reduced to less than 4kb
        o Removed 'Big+Fast' compress code (FBITS) because of compress speedup
        o Portability mods for Z8000 and PC/XT (but not zeus 3.2)
        o Default to 'quiet' mode
        o Unification of 'force' flags
        o Manual page overhaul
        o Portability enhancement for M_XENIX
        o Removed text on #else and #endif
        o Added "-V" switch to print version and options
        o Added #defines for SIGNED_COMPARE_SLOW
        o Added Makefile and "usermem" program
        o Removed all floating point computations
        o New programs: [deleted]

The "usermem" script attempts to determine the maximum process size.  Some
editing of the script may be necessary (see the comments).  [It should work
fine on 4.3 bsd.]  If you can't get it to work at all, just create file
"USERMEM" containing the maximum process size in decimal.
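
As an illustration of the first improvement listed above (division hash
changed to xor), here is a minimal sketch in C.  It assumes that "division
hash" means a remainder modulo the table size and that the xor hash
combines a shifted input character with the current prefix code.  The
names hash_shift(), hash_div() and hash_xor() are hypothetical and are not
taken from compress.c.

```c
#include <assert.h>

/*
 * Illustrative only: these three functions are hypothetical helpers,
 * not code copied from compress.c.
 *
 * c     = next input character (0..255)
 * ent   = current prefix code, assumed to be smaller than the largest
 *         power of two that does not exceed hsize
 * hsize = number of slots in the hash table (assumed > 256)
 */

/* Choose a shift so that (c << shift) ^ ent always lands below hsize. */
int hash_shift(long hsize)
{
    int shift = 0;
    long n;

    assert(hsize > 256);
    for (n = hsize; n < 65536L; n *= 2L)
        shift++;
    return 8 - shift;
}

/* Division hash: one remainder operation per probe. */
long hash_div(int c, long ent, int maxbits, long hsize)
{
    long fcode = ((long)c << maxbits) + ent;

    return fcode % hsize;
}

/* Xor hash: a shift and an xor, with no division. */
long hash_xor(int c, long ent, int shift)
{
    return ((long)c << shift) ^ ent;
}
```

On machines where integer division is slow, the xor form avoids a divide
on every table probe, which is one plausible source of the speedup quoted
in the list.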

The following preprocessor symbols control the compilation of "compress.c":

        o USERMEM               Maximum process memory on the system
        o SACREDMEM             Amount to reserve for other processes
        o SIGNED_COMPARE_SLOW   Unsigned compare instructions are faster
        o NO_UCHAR              Don't use "unsigned char" types
        o BITS                  Overrules default set by USERMEM-SACREDMEM
        o vax                   Generate inline assembler
        o interdata             Defines SIGNED_COMPARE_SLOW
        o M_XENIX               Makes arrays < 65536 bytes each
        o pdp11                 BITS=12, NO_UCHAR
        o z8000                 BITS=12
        o pcxt                  BITS=12
        o BSD4_2                Allow long filenames ( > 14 characters) &
                                Call setlinebuf(stderr)

The difference "usermem-sacredmem" determines the maximum BITS that can be
specified with the "-b" flag.

memory: at least        BITS
------  -- -----        ----
     433,484             16
     229,600             15
     127,536             14
      73,464             13
           0             12

The default is BITS=16.

The maximum bits can be overruled by specifying "-DBITS=bits" at
compilation time.

WARNING: files compressed on a large machine with more bits than allowed by
a version of compress on a smaller machine cannot be decompressed!  Use the
"-b12" flag to generate a file on a large machine that can be uncompressed
on a 16-bit machine.

The output of compress 4.0 is fully compatible with that of compress 3.0.
In other words, the output of compress 4.0 may be fed into uncompress 3.0 or
the output of compress 3.0 may be fed into uncompress 4.0.
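
The memory table above can be read as a simple threshold lookup.  The
sketch below expresses it as a C function for illustration only.  The
README describes this choice as being made at compile time through
preprocessor symbols, so the run-time helper max_bits() is a hypothetical
name, not part of compress.c.

```c
#include <assert.h>

/*
 * Hypothetical helper: return the largest BITS value permitted for a
 * given "usermem - sacredmem" difference, using the thresholds from
 * the compress 4.0 table.  Both arguments are in bytes.
 */
int max_bits(long usermem, long sacredmem)
{
    long avail;

    assert(usermem >= 0 && sacredmem >= 0);
    avail = usermem - sacredmem;

    if (avail >= 433484L)
        return 16;
    if (avail >= 229600L)
        return 15;
    if (avail >= 127536L)
        return 14;
    if (avail >= 73464L)
        return 13;
    return 12;          /* any smaller amount still allows 12 bits */
}
```

For example, a machine with 450,000 bytes of process memory and 100,000
bytes reserved leaves 350,000 bytes, which falls in the 15-bit row.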

The output of compress 4.0 is not compatible with that of
compress 2.0.  However, compress 4.0 still accepts the output of
compress 2.0.  To generate output that is compatible with compress
2.0, use the undocumented "-C" flag.

        -from mod.sources, submitted by vax135!petsd!joe (Joe Orost), 8/1/85
--------------------------------

Enclosed is compress version 3.0 with the following changes:

1.      "Block" compression is performed.  After the BITS run out, the
        compression ratio is checked every so often.  If it is decreasing,
        the table is cleared and a new set of substrings is generated.

        This makes the output of compress 3.0 not compatible with that of
        compress 2.0.  However, compress 3.0 still accepts the output of
        compress 2.0.  To generate output that is compatible with compress
        2.0, use the undocumented "-C" flag.

2.      A quiet "-q" flag has been added for use by the news system.

3.      The character chaining has been deleted and the program now uses
        hashing.  This improves the speed of the program, especially
        during decompression.  Other speed improvements have been made,
        such as using putc() instead of fwrite().

4.      A large table is used on large machines when a relatively small
        number of bits is specified.  This saves much time when compressing
        for a 16-bit machine on a 32-bit virtual machine.  Note that the
        speed improvement only occurs when the input file is > 30000
        characters, and the -b BITS is less than or equal to the cutoff
        described below.
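
The ratio check described in item 1 can be sketched roughly as follows.
This is an illustration under stated assumptions, not the code in
compress.c: struct ratio_state and ratio_dropped() are hypothetical names,
the ratio is kept as a fixed-point value with 8 fractional bits, and the
caller is assumed to invoke the check at whatever interval it chooses.

```c
#include <assert.h>

/* Hypothetical bookkeeping: best ratio seen since the last table clear. */
struct ratio_state {
    long long best;     /* (bytes in / bytes out) scaled by 256 */
};

/*
 * Returns 1 when the compression ratio has decreased, meaning the
 * string table should be cleared and rebuilt; returns 0 otherwise.
 */
int ratio_dropped(struct ratio_state *st, long bytes_in, long bytes_out)
{
    long long rat;

    assert(st != 0 && bytes_in >= 0 && bytes_out > 0);
    rat = ((long long)bytes_in * 256) / bytes_out;

    if (rat >= st->best) {
        st->best = rat;         /* still holding or improving */
        return 0;
    }
    st->best = 0;               /* start fresh after the clear */
    return 1;
}
```

A caller would start with best set to 0 and, whenever ratio_dropped()
returns 1, reset its table before continuing with the rest of the input.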

Most of these changes were made by James A. Woods (ames!jaw).  Thank you
James!

To compile compress:

        cc -O -DUSERMEM=usermem -o compress compress.c

Where "usermem" is the amount of physical user memory available (in bytes).
If any physical memory is to be reserved for other processes, put in
"-DSACREDMEM=sacredmem", where "sacredmem" is the amount to be reserved.

The difference "usermem-sacredmem" determines the maximum BITS that can be
specified, and the cutoff bits where the large+fast table is used.

memory: at least        BITS            cutoff
------  -- -----        ----            ------
   4,718,592             16               13
   2,621,440             16               12
   1,572,864             16               11
   1,048,576             16               10
     631,808             16               --
     329,728             15               --
     178,176             14               --
      99,328             13               --
           0             12               --

The default memory size is 750,000 which gives a maximum BITS=16 and no
large+fast table.

The maximum bits can be overruled by specifying "-DBITS=bits" at
compilation time.

If your machine doesn't support unsigned characters, define "NO_UCHAR"
when compiling.

If your machine has "int" as 16-bits, define "SHORT_INT" when compiling.

After compilation, move "compress" to a standard executable location, such
as /usr/local.
Then:
        cd /usr/local
        ln compress uncompress
        ln compress zcat

On machines that have a fixed stack size (such as Perkin-Elmer), set the
stack to at least 12kb.  ("setstack compress 12" on Perkin-Elmer).

Next, install the manual (compress.l).
        cp compress.l /usr/man/manl
        cd /usr/man/manl
        ln compress.l uncompress.l
        ln compress.l zcat.l

                - or -

        cp compress.l /usr/man/man1/compress.1
        cd /usr/man/man1
        ln compress.1 uncompress.1
        ln compress.1 zcat.1

                                        regards,
                                        petsd!joe

Here is a note from the net:

>From hplabs!pesnta!amd!turtlevax!ken Sat Jan  5 03:35:20 1985
Path: ames!hplabs!pesnta!amd!turtlevax!ken
From: ken@turtlevax.UUCP (Ken Turkowski)
Newsgroups: net.sources
Subject: Re: Compress release 3.0 : sample Makefile
Organization: CADLINC, Inc. @ Menlo Park, CA

In the compress 3.0 source recently posted to mod.sources, there is a
#define variable which can be set for optimum performance on a machine
with a large amount of memory.  A program (usermem) to calculate the
useable amount of physical user memory is enclosed, as well as a sample
4.2bsd Vax Makefile for compress.

Here is the README file from the previous version of compress (2.0):

>Enclosed is compress.c version 2.0 with the following bugs fixed:
>
>1.  The packed files produced by compress are different on different
>    machines and dependent on the vax sysgen option.
>       The bug was in the different byte/bit ordering on the
>       various machines.  This has been fixed.
>
>       This version is NOT compatible with the original vax posting
>       unless the '-DCOMPATIBLE' option is specified to the C
>       compiler.  The original posting has a bug which I fixed,
>       causing incompatible files.  I recommend you NOT to use this
>       option unless you already have a lot of packed files from
>       the original posting by thomas.
>2.  The exit status is not well defined (on some machines) causing the
>    scripts to fail.
>       The exit status is now 0,1 or 2 and is documented in
>       compress.l.
>3.  The function getopt() is not available in all C libraries.
>       The function getopt() is no longer referenced by the
>       program.
>4.  Error status is not being checked on the fwrite() and fflush() calls.
>       Fixed.
>
>The following enhancements have been made:
>
>1.  Added facilities of "compact" into the compress program.  "Pack",
>    "Unpack", and "Pcat" are no longer required (no longer supplied).
>2.  Installed work around for C compiler bug with "-O".
>3.  Added a magic number header (\037\235).  Put the bits specified
>    in the file.
>4.  Added "-f" flag to force overwrite of output file.
>5.  Added "-c" flag and "zcat" program.  'ln compress zcat' after you
>    compile.
>6.  The 'uncompress' script has been deleted; simply
>    'ln compress uncompress' after you compile and it will work.
>7.  Removed extra bit masking for machines that support unsigned
>    characters.  If your machine doesn't support unsigned characters,
>    define "NO_UCHAR" when compiling.
>
>Compile "compress.c" with "-O -o compress" flags.  Move "compress" to a
>standard executable location, such as /usr/local.  Then:
>       cd /usr/local
>       ln compress uncompress
>       ln compress zcat
>
>On machines that have a fixed stack size (such as Perkin-Elmer), set the
>stack to at least 12kb.  ("setstack compress 12" on Perkin-Elmer).
>
>Next, install the manual (compress.l).
>       cp compress.l /usr/man/manl             - or -
>       cp compress.l /usr/man/man1/compress.1
>
>Here is the README that I sent with my first posting:
>
>>Enclosed is a modified version of compress.c, along with scripts to make it
>>run identically to pack(1), unpack(1), and pcat(1).  Here is what I
>>(petsd!joe) and a colleague (petsd!peora!srd) did:
>>
>>1. Removed VAX dependencies.
>>2. Changed the struct to separate arrays; saves mucho memory.
>>3. Did comparisons in unsigned, where possible.  (Faster on Perkin-Elmer.)
>>4. Sorted the character next chain and changed the search to stop
>>prematurely.  This saves a lot on the execution time when compressing.
>>
>>This version is totally compatible with the original version.
>>Even though lint(1) -p has no complaints about compress.c, it won't run
>>on a 16-bit machine, due to the size of the arrays.
>>
>>Here is the README file from the original author:
>>
>>>Well, with all this discussion about file compression (for news batching
>>>in particular) going around, I decided to implement the text compression
>>>algorithm described in the June Computer magazine.  The author claimed
>>>blinding speed and good compression ratios.  It's certainly faster than
>>>compact (but, then, what wouldn't be), but it's also the same speed as
>>>pack, and gets better compression than both of them.  On 350K bytes of
>>>unix-wizards, compact took about 8 minutes of CPU, pack took about 80
>>>seconds, and compress (herein) also took 80 seconds.  But, compact and
>>>pack got about 30% compression, whereas compress got over 50%.  So, I
>>>decided I had something, and that others might be interested, too.
>>>
>>>As is probably true of compact and pack (although I haven't checked),
>>>the byte order within a word is probably relevant here, but as long as
>>>you stay on a single machine type, you should be ok.  (Can anybody
>>>elucidate on this?)  There are a couple of asm's in the code (extv and
>>>insv instructions), so anyone porting it to another machine will have to
>>>deal with this anyway (and could probably make it compatible with Vax
>>>byte order at the same time).  Anyway, I've linted the code (both with
>>>and without -p), so it should run elsewhere.  Note the longs in the
>>>code, you can take these out if you reduce BITS to <= 15.
>>>
>>>Have fun, and as always, if you make good enhancements, or bug fixes,
>>>I'd like to see them.
>>>
>>>=Spencer   (thomas@utah-20, {harpo,hplabs,arizona}!utah-cs!thomas)
>>
>>                                      regards,
>>                                      joe
>>
>>--
>>Full-Name:  Joseph M. Orost
>>UUCP:       ..!{decvax,ucbvax,ihnp4}!vax135!petsd!joe
>>US Mail:    MS 313; Perkin-Elmer; 106 Apple St; Tinton Falls, NJ 07724
>>Phone:      (201) 870-5844