        @(#)README      5.3 (Berkeley) 09/17/85

Compress version 4.0 improvements over 3.0:
 o compress() speedup (10-50%) by changing division hash to xor
 o decompress() speedup (5-10%)
 o Memory requirements reduced (3-30%)
 o Stack requirements reduced to less than 4kb
 o Removed 'Big+Fast' compress code (FBITS) because of compress speedup
 o Portability mods for Z8000 and PC/XT (but not zeus 3.2)
 o Default to 'quiet' mode
 o Unification of 'force' flags
 o Manual page overhaul
 o Portability enhancement for M_XENIX
 o Removed text on #else and #endif
 o Added "-V" switch to print version and options
 o Added #defines for SIGNED_COMPARE_SLOW
 o Added Makefile and "usermem" program
 o Removed all floating point computations
 o New programs: [deleted]

The "usermem" script attempts to determine the maximum process size. Some
editing of the script may be necessary (see the comments). [It should work
fine on 4.3 bsd.] If you can't get it to work at all, just create file
"USERMEM" containing the maximum process size in decimal.
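
The first item in the improvement list above (replacing the division hash
with an xor hash) can be illustrated roughly as follows. This is only a
sketch of the general idea, not the code in compress.c: the names hash_div
and hash_xor, the table size HSIZE, and the shift HSHIFT are assumptions
chosen for the illustration and are not stated in this README.

```c
/* Illustrative values only; this README does not state them. */
#define HSIZE  69001L   /* assumed hash-table size */
#define HSHIFT 8        /* assumed shift; keeps the xor result below 65536 */

/*
 * Division hashing: fold the (character, prefix code) pair into one key
 * and reduce it with a remainder, which costs a divide per lookup.
 */
long hash_div(int c, long ent, int maxbits)
{
    long fcode = ((long)c << maxbits) + ent;

    return fcode % HSIZE;
}

/*
 * Xor hashing: one shift and one xor, no divide.  For c in 0..255 and
 * ent in 0..65535 the result stays in 0..65535, which is below HSIZE.
 */
long hash_xor(int c, long ent)
{
    return ((long)c << HSHIFT) ^ ent;
}
```

Both functions map a (character, code) pair to a table slot; the xor form
avoids the divide, which is the kind of saving the changelog entry describes.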

The following preprocessor symbols control the compilation of "compress.c":

 o USERMEM              Maximum process memory on the system
 o SACREDMEM            Amount to reserve for other processes
 o SIGNED_COMPARE_SLOW  Unsigned compare instructions are faster
 o NO_UCHAR             Don't use "unsigned char" types
 o BITS                 Overrules default set by USERMEM-SACREDMEM
 o vax                  Generate inline assembler
 o interdata            Defines SIGNED_COMPARE_SLOW
 o M_XENIX              Makes arrays < 65536 bytes each
 o pdp11                BITS=12, NO_UCHAR
 o z8000                BITS=12
 o pcxt                 BITS=12
 o BSD4_2               Allow long filenames ( > 14 characters) &
                        Call setlinebuf(stderr)

The difference "usermem-sacredmem" determines the maximum BITS that can be
specified with the "-b" flag.

memory: at least        BITS
------  -- -----        ----
     433,484             16
     229,600             15
     127,536             14
      73,464             13
           0             12

The default is BITS=16.

The maximum bits can be overruled by specifying "-DBITS=bits" at
compilation time.

WARNING: files compressed on a large machine with more bits than allowed by
a version of compress on a smaller machine cannot be decompressed! Use the
"-b12" flag to generate a file on a large machine that can be uncompressed
on a 16-bit machine.

The output of compress 4.0 is fully compatible with that of compress 3.0.
In other words, the output of compress 4.0 may be fed into uncompress 3.0 or
the output of compress 3.0 may be fed into uncompress 4.0.

The output of compress 4.0 is not compatible with that of
compress 2.0. However, compress 4.0 still accepts the output of
compress 2.0. To generate output that is compatible with compress
2.0, use the undocumented "-C" flag.

        -from mod.sources, submitted by vax135!petsd!joe (Joe Orost), 8/1/85
--------------------------------

Enclosed is compress version 3.0 with the following changes:

1.      "Block" compression is performed. After the BITS run out, the
        compression ratio is checked every so often. If it is decreasing,
        the table is cleared and a new set of substrings is generated.

        This makes the output of compress 3.0 not compatible with that of
        compress 2.0. However, compress 3.0 still accepts the output of
        compress 2.0. To generate output that is compatible with compress
        2.0, use the undocumented "-C" flag.

2.      A quiet "-q" flag has been added for use by the news system.

3.      The character chaining has been deleted and the program now uses
        hashing. This improves the speed of the program, especially
        during decompression. Other speed improvements have been made,
        such as using putc() instead of fwrite().

4.      A large table is used on large machines when a relatively small
        number of bits is specified. This saves much time when compressing
        for a 16-bit machine on a 32-bit virtual machine.
        Note that the speed improvement only occurs when the input file
        is > 30000 characters, and the -b BITS is less than or equal to
        the cutoff described below.

Most of these changes were made by James A. Woods (ames!jaw). Thank you
James!

To compile compress:

        cc -O -DUSERMEM=usermem -o compress compress.c

Where "usermem" is the amount of physical user memory available (in bytes).
If any physical memory is to be reserved for other processes, put in
"-DSACREDMEM=sacredmem", where "sacredmem" is the amount to be reserved.

The difference "usermem-sacredmem" determines the maximum BITS that can be
specified, and the cutoff bits where the large+fast table is used.

memory: at least        BITS            cutoff
------  -- -----        ----            ------
   4,718,592             16               13
   2,621,440             16               12
   1,572,864             16               11
   1,048,576             16               10
     631,808             16               --
     329,728             15               --
     178,176             14               --
      99,328             13               --
           0             12               --

The default memory size is 750,000 which gives a maximum BITS=16 and no
large+fast table.

The maximum bits can be overruled by specifying "-DBITS=bits" at
compilation time.

If your machine doesn't support unsigned characters, define "NO_UCHAR"
when compiling.

If your machine has a 16-bit "int", define "SHORT_INT" when compiling.
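
The way "usermem-sacredmem" selects the maximum BITS can be sketched as a
simple table lookup. The thresholds below are copied from the 3.0 table
above; the names bits_row, bits_table and max_bits are hypothetical and are
not taken from compress.c, and the cutoff column is left out to keep the
sketch short.

```c
#include <stddef.h>

/* One row of the "memory: at least / BITS" table from the 3.0 notes. */
struct bits_row {
    long min_memory;    /* usermem - sacredmem must be at least this */
    int  bits;          /* maximum BITS allowed at that size */
};

static const struct bits_row bits_table[] = {
    { 631808L, 16 },
    { 329728L, 15 },
    { 178176L, 14 },
    {  99328L, 13 },
    {      0L, 12 },
};

/* Return the largest BITS whose memory threshold is met. */
int max_bits(long usermem, long sacredmem)
{
    long avail = usermem - sacredmem;
    size_t i;

    for (i = 0; i < sizeof bits_table / sizeof bits_table[0]; i++)
        if (avail >= bits_table[i].min_memory)
            return bits_table[i].bits;
    return 12;          /* negative difference: fall back to the minimum */
}
```

With the default memory size of 750,000 and nothing reserved, max_bits()
returns 16, which matches the default described above.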

After compilation, move "compress" to a standard executable location, such
as /usr/local. Then:
        cd /usr/local
        ln compress uncompress
        ln compress zcat

On machines that have a fixed stack size (such as Perkin-Elmer), set the
stack to at least 12kb. ("setstack compress 12" on Perkin-Elmer).

Next, install the manual (compress.l).
        cp compress.l /usr/man/manl
        cd /usr/man/manl
        ln compress.l uncompress.l
        ln compress.l zcat.l

                - or -

        cp compress.l /usr/man/man1/compress.1
        cd /usr/man/man1
        ln compress.1 uncompress.1
        ln compress.1 zcat.1

                                        regards,
                                        petsd!joe

Here is a note from the net:

>From hplabs!pesnta!amd!turtlevax!ken Sat Jan  5 03:35:20 1985
Path: ames!hplabs!pesnta!amd!turtlevax!ken
From: ken@turtlevax.UUCP (Ken Turkowski)
Newsgroups: net.sources
Subject: Re: Compress release 3.0 : sample Makefile
Organization: CADLINC, Inc. @ Menlo Park, CA

In the compress 3.0 source recently posted to mod.sources, there is a
#define variable which can be set for optimum performance on a machine
with a large amount of memory. A program (usermem) to calculate the
useable amount of physical user memory is enclosed, as well as a sample
4.2bsd Vax Makefile for compress.
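
The "ln compress uncompress" and "ln compress zcat" steps in the install
instructions above imply that one binary behaves differently depending on
the name it is invoked under. A minimal sketch of that kind of dispatch
follows; the enum and the function mode_from_name are hypothetical names
used for illustration and are not taken from compress.c.

```c
#include <string.h>

enum mode { MODE_COMPRESS, MODE_UNCOMPRESS, MODE_ZCAT };

/*
 * Pick an operating mode from argv[0].  Any leading directory part is
 * ignored, so "/usr/local/zcat" and "zcat" are treated the same.
 */
enum mode mode_from_name(const char *argv0)
{
    const char *base = strrchr(argv0, '/');

    base = base ? base + 1 : argv0;
    if (strcmp(base, "uncompress") == 0)
        return MODE_UNCOMPRESS;
    if (strcmp(base, "zcat") == 0)
        return MODE_ZCAT;
    return MODE_COMPRESS;       /* any other name: compress */
}
```

A main() would typically call mode_from_name(argv[0]) once before parsing
its flags. Because the links are hard links, all three names share a single
copy of the executable on disk.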

Here is the README file from the previous version of compress (2.0):

>Enclosed is compress.c version 2.0 with the following bugs fixed:
>
>1.     The packed files produced by compress are different on different
>       machines and dependent on the vax sysgen option.
>               The bug was in the different byte/bit ordering on the
>               various machines. This has been fixed.
>
>               This version is NOT compatible with the original vax posting
>               unless the '-DCOMPATIBLE' option is specified to the C
>               compiler. The original posting has a bug which I fixed,
>               causing incompatible files. I recommend you NOT to use this
>               option unless you already have a lot of packed files from
>               the original posting by thomas.
>2.     The exit status is not well defined (on some machines) causing the
>       scripts to fail.
>               The exit status is now 0,1 or 2 and is documented in
>               compress.l.
>3.     The function getopt() is not available in all C libraries.
>               The function getopt() is no longer referenced by the
>               program.
>4.     Error status is not being checked on the fwrite() and fflush() calls.
>               Fixed.
>
>The following enhancements have been made:
>
>1.     Added facilities of "compact" into the compress program. "Pack",
>       "Unpack", and "Pcat" are no longer required (no longer supplied).
>2.     Installed work around for C compiler bug with "-O".
>3.     Added a magic number header (\037\235). Put the bits specified
>       in the file.
>4.     Added "-f" flag to force overwrite of output file.
>5.     Added "-c" flag and "zcat" program. 'ln compress zcat' after you
>       compile.
>6.     The 'uncompress' script has been deleted; simply
>       'ln compress uncompress' after you compile and it will work.
>7.     Removed extra bit masking for machines that support unsigned
>       characters. If your machine doesn't support unsigned characters,
>       define "NO_UCHAR" when compiling.
>
>Compile "compress.c" with "-O -o compress" flags. Move "compress" to a
>standard executable location, such as /usr/local. Then:
>       cd /usr/local
>       ln compress uncompress
>       ln compress zcat
>
>On machines that have a fixed stack size (such as Perkin-Elmer), set the
>stack to at least 12kb. ("setstack compress 12" on Perkin-Elmer).
>
>Next, install the manual (compress.l).
>       cp compress.l /usr/man/manl             - or -
>       cp compress.l /usr/man/man1/compress.1
>
>Here is the README that I sent with my first posting:
>
>>Enclosed is a modified version of compress.c, along with scripts to make it
>>run identically to pack(1), unpack(1), and pcat(1). Here is what I
>>(petsd!joe) and a colleague (petsd!peora!srd) did:
>>
>>1. Removed VAX dependencies.
>>2. Changed the struct to separate arrays; saves mucho memory.
>>3. Did comparisons in unsigned, where possible. (Faster on Perkin-Elmer.)
>>4. Sorted the character next chain and changed the search to stop
>>prematurely. This saves a lot on the execution time when compressing.
>>
>>This version is totally compatible with the original version. Even though
>>lint(1) -p has no complaints about compress.c, it won't run on a 16-bit
>>machine, due to the size of the arrays.
>>
>>Here is the README file from the original author:
>>
>>>Well, with all this discussion about file compression (for news batching
>>>in particular) going around, I decided to implement the text compression
>>>algorithm described in the June Computer magazine. The author claimed
>>>blinding speed and good compression ratios. It's certainly faster than
>>>compact (but, then, what wouldn't be), but it's also the same speed as
>>>pack, and gets better compression than both of them. On 350K bytes of
>>>unix-wizards, compact took about 8 minutes of CPU, pack took about 80
>>>seconds, and compress (herein) also took 80 seconds. But, compact and
>>>pack got about 30% compression, whereas compress got over 50%. So, I
>>>decided I had something, and that others might be interested, too.
>>>
>>>As is probably true of compact and pack (although I haven't checked),
>>>the byte order within a word is probably relevant here, but as long as
>>>you stay on a single machine type, you should be ok. (Can anybody
>>>elucidate on this?) There are a couple of asm's in the code (extv and
>>>insv instructions), so anyone porting it to another machine will have to
>>>deal with this anyway (and could probably make it compatible with Vax
>>>byte order at the same time). Anyway, I've linted the code (both with
>>>and without -p), so it should run elsewhere.
>>>Note the longs in the code, you can take these out if you reduce BITS
>>>to <= 15.
>>>
>>>Have fun, and as always, if you make good enhancements, or bug fixes,
>>>I'd like to see them.
>>>
>>>=Spencer   (thomas@utah-20, {harpo,hplabs,arizona}!utah-cs!thomas)
>>
>>                                      regards,
>>                                      joe
>>
>>--
>>Full-Name:  Joseph M. Orost
>>UUCP:       ..!{decvax,ucbvax,ihnp4}!vax135!petsd!joe
>>US Mail:    MS 313; Perkin-Elmer; 106 Apple St; Tinton Falls, NJ 07724
>>Phone:      (201) 870-5844
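
The 2.0 notes quoted above mention a magic number header (\037\235) at the
start of each compressed file. A reader of such files might check for it
along the following lines. This is only a sketch: has_compress_magic is a
hypothetical helper, not a function from compress.c, and it looks only at
the two magic bytes, not at the bits setting stored after them.

```c
#include <stddef.h>

/*
 * Return nonzero if the buffer starts with the two-byte magic number
 * \037\235 (octal) described in the 2.0 notes.
 */
int has_compress_magic(const unsigned char *buf, size_t len)
{
    return len >= 2 && buf[0] == 0037 && buf[1] == 0235;
}
```

A caller would read the first few bytes of a file into a buffer and refuse
to decompress it when this check fails.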