	@(#)README	8.1 (Berkeley) 6/9/93

Compress version 4.0 improvements over 3.0:
	o compress() speedup (10-50%) by changing division hash to xor
	o decompress() speedup (5-10%)
	o Memory requirements reduced (3-30%)
	o Stack requirements reduced to less than 4kb
	o Removed 'Big+Fast' compress code (FBITS) because of compress speedup
	o Portability mods for Z8000 and PC/XT (but not zeus 3.2)
	o Default to 'quiet' mode
	o Unification of 'force' flags
	o Manual page overhaul
	o Portability enhancement for M_XENIX
	o Removed text on #else and #endif
	o Added "-V" switch to print version and options
	o Added #defines for SIGNED_COMPARE_SLOW
	o Added Makefile and "usermem" program
	o Removed all floating point computations
	o New programs: [deleted]

The "usermem" script attempts to determine the maximum process size. Some
editing of the script may be necessary (see the comments). [It should work
fine on 4.3BSD.] If you can't get it to work at all, just create a file
"USERMEM" containing the maximum process size in decimal.
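
The xor hash mentioned in the first 4.0 item can be illustrated with a small
string table. This is a minimal sketch, not the code from compress.c: the
table size (69001, a prime that is about 95% full when 65536 codes are in
use), the secondary-probe rule, and the names table_clear(), table_find()
and table_insert() are assumptions made for this example. The key is the
pair (prefix code, next character); the first probe index is
(c << 8) ^ prefix, which costs a shift and an xor instead of a division.

```c
/*
 * Sketch: LZW string table with an xor hash and open addressing.
 * HSIZE, HSHIFT and the function names are assumptions of this sketch.
 */
#include <assert.h>
#include <stdint.h>

#define HSIZE  69001L   /* prime; about 95% full with 65536 codes */
#define HSHIFT 8        /* c << 8 spreads the character over the index */

static int32_t  htab[HSIZE];     /* key (c << 16) + prefix, or -1 if empty */
static uint16_t codetab[HSIZE];  /* code assigned to that key */

/* Mark every slot empty, e.g. at start-up or after a table clear. */
static void table_clear(void)
{
    for (long i = 0; i < HSIZE; i++)
        htab[i] = -1;
}

/*
 * Look up the string "prefix code followed by character c".
 * Returns its code, or -1 after storing a free slot index in *slot.
 */
static long table_find(unsigned prefix, unsigned c, long *slot)
{
    int32_t key;
    long i, disp;

    assert(prefix < 65536u && c < 256u);
    key = ((int32_t)c << 16) + (int32_t)prefix;
    i = ((long)c << HSHIFT) ^ (long)prefix;   /* xor hash: no division */
    disp = (i == 0) ? 1 : HSIZE - i;          /* secondary probe step */

    while (htab[i] >= 0) {
        if (htab[i] == key)
            return codetab[i];
        i -= disp;                            /* probe the next slot */
        if (i < 0)
            i += HSIZE;
    }
    *slot = i;
    return -1;
}

/* Record a new string in the slot returned by a failed table_find(). */
static void table_insert(long slot, unsigned prefix, unsigned c, unsigned code)
{
    htab[slot] = ((int32_t)c << 16) + (int32_t)prefix;
    codetab[slot] = (uint16_t)code;
}
```

Because the first index is always below 65536, it fits in the table, and
because the table size is prime and the probe step is never zero, each probe
sequence can reach every slot, so a lookup terminates while a free slot
remains.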

The following preprocessor symbols control the compilation of "compress.c":

	o USERMEM              Maximum process memory on the system
	o SACREDMEM            Amount to reserve for other processes
	o SIGNED_COMPARE_SLOW  Unsigned compare instructions are faster
	o NO_UCHAR             Don't use "unsigned char" types
	o BITS                 Overrules default set by USERMEM-SACREDMEM
	o vax                  Generate inline assembler
	o interdata            Defines SIGNED_COMPARE_SLOW
	o M_XENIX              Makes arrays < 65536 bytes each
	o pdp11                BITS=12, NO_UCHAR
	o z8000                BITS=12
	o pcxt                 BITS=12
	o BSD4_2               Allow long filenames (> 14 characters) &
	                       call setlinebuf(stderr)

The difference "usermem-sacredmem" determines the maximum BITS that can be
specified with the "-b" flag.

memory: at least    BITS
------- -- -----    ----
         433,484     16
         229,600     15
         127,536     14
          73,464     13
               0     12

The default is BITS=16.

The maximum bits can be overruled by specifying "-DBITS=bits" at
compilation time.

WARNING: files compressed on a large machine with more bits than allowed by
a version of compress on a smaller machine cannot be decompressed! Use the
"-b12" flag to generate a file on a large machine that can be uncompressed
on a 16-bit machine.
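
The memory table above can be expressed directly with the preprocessor, and
the WARNING can be checked at run time. The following is a hedged sketch:
the #if chain reproduces the thresholds from the table, the default USERMEM
of 450000 is an assumption chosen so that the sketch yields the documented
default of BITS=16, and header_bits_check() is a hypothetical helper. It
relies on the magic number header (\037\235) and stored bit count described
in the 2.0 notes; the low five bits of the third byte hold the code width.

```c
/*
 * Choose BITS from USERMEM - SACREDMEM using the thresholds in the
 * memory table, unless -DBITS=n was given.  USERMEM's default here is
 * an assumption for this sketch.
 */
#include <assert.h>
#include <stddef.h>

#ifndef USERMEM
#define USERMEM 450000L          /* assumed default; gives BITS=16 */
#endif
#ifndef SACREDMEM
#define SACREDMEM 0L             /* nothing reserved for other processes */
#endif

#if (USERMEM - SACREDMEM) >= 433484L
#define PBITS 16
#elif (USERMEM - SACREDMEM) >= 229600L
#define PBITS 15
#elif (USERMEM - SACREDMEM) >= 127536L
#define PBITS 14
#elif (USERMEM - SACREDMEM) >= 73464L
#define PBITS 13
#else
#define PBITS 12
#endif

#ifndef BITS
#define BITS PBITS               /* -DBITS=bits overrules the table */
#endif

/*
 * Hypothetical helper: inspect the first bytes of a compressed file.
 * Returns -1 if the magic number \037\235 is missing, 0 if the file
 * uses more bits than this build supports, and 1 if it can be decoded.
 */
static int header_bits_check(const unsigned char *hdr, size_t len)
{
    assert(hdr != NULL || len == 0);
    if (len < 3 || hdr[0] != 037 || hdr[1] != 0235)
        return -1;
    return (hdr[2] & 0x1f) <= BITS ? 1 : 0;   /* low five bits: max width */
}
```

For example, a build compiled with "-DBITS=12" returns 0 for a header whose
low five bits say 16, which is exactly the case the WARNING describes.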

The output of compress 4.0 is fully compatible with that of compress 3.0.
In other words, the output of compress 4.0 may be fed into uncompress 3.0 or
the output of compress 3.0 may be fed into uncompress 4.0.

The output of compress 4.0 is not compatible with that of
compress 2.0. However, compress 4.0 still accepts the output of
compress 2.0. To generate output that is compatible with compress
2.0, use the undocumented "-C" flag.

	-from mod.sources, submitted by vax135!petsd!joe (Joe Orost), 8/1/85
--------------------------------

Enclosed is compress version 3.0 with the following changes:

1. "Block" compression is performed. After the BITS run out, the
   compression ratio is checked every so often. If it is decreasing,
   the table is cleared and a new set of substrings is generated.

   This makes the output of compress 3.0 incompatible with that of
   compress 2.0. However, compress 3.0 still accepts the output of
   compress 2.0. To generate output that is compatible with compress
   2.0, use the undocumented "-C" flag.

2. A quiet "-q" flag has been added for use by the news system.

3. The character chaining has been deleted and the program now uses
   hashing. This improves the speed of the program, especially
   during decompression. Other speed improvements have been made,
   such as using putc() instead of fwrite().

4. A large table is used on large machines when a relatively small
   number of bits is specified. This saves much time when compressing
   for a 16-bit machine on a 32-bit virtual machine. Note that the
   speed improvement only occurs when the input file is > 30000
   characters, and the -b BITS is less than or equal to the cutoff
   described below.

Most of these changes were made by James A. Woods (ames!jaw). Thank you,
James!

To compile compress:

	cc -O -DUSERMEM=usermem -o compress compress.c

Here, "usermem" is the amount of physical user memory available (in bytes).
If any physical memory is to be reserved for other processes, add
"-DSACREDMEM=sacredmem", where "sacredmem" is the amount to be reserved.

The difference "usermem-sacredmem" determines the maximum BITS that can be
specified, and the cutoff bits where the large+fast table is used.

memory: at least    BITS    cutoff
------- -- -----    ----    ------
       4,718,592     16       13
       2,621,440     16       12
       1,572,864     16       11
       1,048,576     16       10
         631,808     16       --
         329,728     15       --
         178,176     14       --
          99,328     13       --
               0     12       --

The default memory size is 750,000, which gives a maximum BITS=16 and no
large+fast table.
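
The "block" compression described in item 1 of the 3.0 changes amounts to a
periodic ratio check. The sketch below is illustrative only: CHECK_GAP (here
10000 input bytes), struct block_state and should_clear() are assumptions,
and the caller is assumed to invoke should_clear() only after the code table
is full and in_count has reached checkpoint. The ratio is kept as
input/output bytes with 8 fractional bits, so no floating point is needed,
in line with the 4.0 change that removed floating point computations.

```c
/*
 * Sketch of the block-compression check.  CHECK_GAP, struct block_state
 * and should_clear() are assumptions made for this illustration.
 */
#include <assert.h>

#define CHECK_GAP 10000L   /* input bytes between ratio checks */

struct block_state {
    long in_count;    /* input bytes consumed so far */
    long bytes_out;   /* output bytes produced so far */
    long checkpoint;  /* in_count at which the next check is due */
    long ratio;       /* best in/out ratio so far, scaled by 256 */
};

/*
 * Called once the code table is full and in_count >= checkpoint.
 * Returns 1 if the table should be cleared, 0 to keep using it.
 */
static int should_clear(struct block_state *s)
{
    long rat;

    assert(s->bytes_out > 0);
    s->checkpoint = s->in_count + CHECK_GAP;
    if (s->in_count > 0x007fffffL) {          /* in_count << 8 may overflow */
        rat = s->bytes_out >> 8;
        rat = (rat == 0) ? 0x7fffffffL : s->in_count / rat;
    } else {
        rat = (s->in_count << 8) / s->bytes_out;  /* 8 fractional bits */
    }
    if (rat > s->ratio) {                     /* still improving */
        s->ratio = rat;
        return 0;
    }
    s->ratio = 0;                             /* worse: start a new table */
    return 1;
}
```

When should_clear() returns 1, an encoder following this scheme would empty
its string table, restart code assignment, and tell the decoder to do the
same; compress.c does this by emitting a special CLEAR code.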

The maximum bits can be overruled by specifying "-DBITS=bits" at
compilation time.

If your machine doesn't support unsigned characters, define "NO_UCHAR"
when compiling.

If "int" on your machine is 16 bits, define "SHORT_INT" when compiling.

After compilation, move "compress" to a standard executable location, such
as /usr/local. Then:
	cd /usr/local
	ln compress uncompress
	ln compress zcat

On machines that have a fixed stack size (such as Perkin-Elmer), set the
stack to at least 12kb. ("setstack compress 12" on Perkin-Elmer).

Next, install the manual (compress.l).
	cp compress.l /usr/man/manl
	cd /usr/man/manl
	ln compress.l uncompress.l
	ln compress.l zcat.l

		- or -

	cp compress.l /usr/man/man1/compress.1
	cd /usr/man/man1
	ln compress.1 uncompress.1
	ln compress.1 zcat.1

					regards,
					petsd!joe

Here is a note from the net:

>From hplabs!pesnta!amd!turtlevax!ken Sat Jan 5 03:35:20 1985
Path: ames!hplabs!pesnta!amd!turtlevax!ken
From: ken@turtlevax.UUCP (Ken Turkowski)
Newsgroups: net.sources
Subject: Re: Compress release 3.0 : sample Makefile
Organization: CADLINC, Inc. @ Menlo Park, CA

In the compress 3.0 source recently posted to mod.sources, there is a
#define variable which can be set for optimum performance on a machine
with a large amount of memory. A program (usermem) to calculate the
useable amount of physical user memory is enclosed, as well as a sample
4.2bsd Vax Makefile for compress.

Here is the README file from the previous version of compress (2.0):

>Enclosed is compress.c version 2.0 with the following bugs fixed:
>
>1. The packed files produced by compress are different on different
>	machines and dependent on the vax sysgen option.
>	The bug was in the different byte/bit ordering on the
>	various machines. This has been fixed.
>
>	This version is NOT compatible with the original vax posting
>	unless the '-DCOMPATIBLE' option is specified to the C
>	compiler. The original posting has a bug which I fixed,
>	causing incompatible files. I recommend that you NOT use this
>	option unless you already have a lot of packed files from
>	the original posting by thomas.
>2. The exit status is not well defined (on some machines), causing the
>	scripts to fail.
>	The exit status is now 0, 1, or 2 and is documented in
>	compress.l.
>3. The function getopt() is not available in all C libraries.
>	The function getopt() is no longer referenced by the
>	program.
>4. Error status is not being checked on the fwrite() and fflush() calls.
>	Fixed.
>
>The following enhancements have been made:
>
>1. Added facilities of "compact" into the compress program. "Pack",
>	"Unpack", and "Pcat" are no longer required (no longer supplied).
>2. Installed a workaround for a C compiler bug with "-O".
>3. Added a magic number header (\037\235). Put the bits specified
>	in the file.
>4. Added "-f" flag to force overwrite of output file.
>5. Added "-c" flag and "zcat" program. 'ln compress zcat' after you
>	compile.
>6. The 'uncompress' script has been deleted; simply
>	'ln compress uncompress' after you compile and it will work.
>7. Removed extra bit masking for machines that support unsigned
>	characters. If your machine doesn't support unsigned characters,
>	define "NO_UCHAR" when compiling.
>
>Compile "compress.c" with "-O -o compress" flags. Move "compress" to a
>standard executable location, such as /usr/local. Then:
>	cd /usr/local
>	ln compress uncompress
>	ln compress zcat
>
>On machines that have a fixed stack size (such as Perkin-Elmer), set the
>stack to at least 12kb. ("setstack compress 12" on Perkin-Elmer).
>
>Next, install the manual (compress.l).
>	cp compress.l /usr/man/manl		- or -
>	cp compress.l /usr/man/man1/compress.1
>
>Here is the README that I sent with my first posting:
>
>>Enclosed is a modified version of compress.c, along with scripts to make it
>>run identically to pack(1), unpack(1), and pcat(1). Here is what I
>>(petsd!joe) and a colleague (petsd!peora!srd) did:
>>
>>1. Removed VAX dependencies.
>>2. Changed the struct to separate arrays; saves mucho memory.
>>3. Did comparisons in unsigned, where possible. (Faster on Perkin-Elmer.)
>>4. Sorted the character next chain and changed the search to stop
>>prematurely. This saves a lot on the execution time when compressing.
>>
>>This version is totally compatible with the original version. Even though
>>lint(1) -p has no complaints about compress.c, it won't run on a 16-bit
>>machine, due to the size of the arrays.
>>
>>Here is the README file from the original author:
>>
>>>Well, with all this discussion about file compression (for news batching
>>>in particular) going around, I decided to implement the text compression
>>>algorithm described in the June Computer magazine. The author claimed
>>>blinding speed and good compression ratios. It's certainly faster than
>>>compact (but, then, what wouldn't be), but it's also the same speed as
>>>pack, and gets better compression than both of them. On 350K bytes of
>>>unix-wizards, compact took about 8 minutes of CPU, pack took about 80
>>>seconds, and compress (herein) also took 80 seconds. But, compact and
>>>pack got about 30% compression, whereas compress got over 50%. So, I
>>>decided I had something, and that others might be interested, too.
>>>
>>>As is probably true of compact and pack (although I haven't checked),
>>>the byte order within a word is probably relevant here, but as long as
>>>you stay on a single machine type, you should be ok. (Can anybody
>>>elucidate on this?) There are a couple of asm's in the code (extv and
>>>insv instructions), so anyone porting it to another machine will have to
>>>deal with this anyway (and could probably make it compatible with Vax
>>>byte order at the same time). Anyway, I've linted the code (both with
>>>and without -p), so it should run elsewhere. Note the longs in the
>>>code; you can take these out if you reduce BITS to <= 15.
>>>
>>>Have fun, and as always, if you make good enhancements, or bug fixes,
>>>I'd like to see them.
>>>
>>>=Spencer (thomas@utah-20, {harpo,hplabs,arizona}!utah-cs!thomas)
>>
>>					regards,
>>					joe
>>
>>--
>>Full-Name:  Joseph M. Orost
>>UUCP:       ..!{decvax,ucbvax,ihnp4}!vax135!petsd!joe
>>US Mail:    MS 313; Perkin-Elmer; 106 Apple St; Tinton Falls, NJ 07724
>>Phone:      (201) 870-5844
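
The original author mentions VAX extv and insv instructions for handling the
variable-width codes, and the 2.0 notes describe the byte/bit-ordering bug
that made packed files machine dependent. A portable replacement can fix the
bit order explicitly in C. This is a minimal, bit-at-a-time sketch for
clarity, not the loop compress.c uses: put_code() and get_code() are
hypothetical names, bit offset n is assumed to live in byte n/8 at bit n%8
(the VAX's little-endian bit numbering), and the 9 to 16 bit range reflects
compress's starting code width of 9 bits and its 16-bit maximum.

```c
/*
 * Portable sketch of packing and unpacking variable-width codes, the job
 * done with insv/extv in the original VAX version.  Codes are stored
 * least significant bit first; bit offset n is bit n%8 of byte n/8.
 * put_code() and get_code() are hypothetical names.
 */
#include <assert.h>

/* Store the low n_bits of code at bit offset *off in buf, then advance. */
static void put_code(unsigned char *buf, long *off, unsigned code, int n_bits)
{
    assert(n_bits >= 9 && n_bits <= 16);
    for (int i = 0; i < n_bits; i++, (*off)++) {
        unsigned char mask = (unsigned char)(1u << (*off & 7));
        if (code & (1u << i))
            buf[*off >> 3] |= mask;                   /* set this bit */
        else
            buf[*off >> 3] &= (unsigned char)~mask;   /* clear this bit */
    }
}

/* Read an n_bits-wide code starting at bit offset *off, then advance. */
static unsigned get_code(const unsigned char *buf, long *off, int n_bits)
{
    unsigned code = 0;

    assert(n_bits >= 9 && n_bits <= 16);
    for (int i = 0; i < n_bits; i++, (*off)++)
        if (buf[*off >> 3] & (1u << (*off & 7)))
            code |= 1u << i;
    return code;
}
```

Because the bit order is spelled out in the code rather than inherited from
the host's word layout, every machine produces the same byte stream, which
is the property the 2.0 fix was after.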