@(#)README	8.1 (Berkeley) 06/09/93

Compress version 4.0 improvements over 3.0:
	o compress() speedup (10-50%) by changing division hash to xor
	o decompress() speedup (5-10%)
	o Memory requirements reduced (3-30%)
	o Stack requirements reduced to less than 4kb
	o Removed 'Big+Fast' compress code (FBITS) because of compress speedup
	o Portability mods for Z8000 and PC/XT (but not zeus 3.2)
	o Default to 'quiet' mode
	o Unification of 'force' flags
	o Manual page overhaul
	o Portability enhancement for M_XENIX
	o Removed text on #else and #endif
	o Added "-V" switch to print version and options
	o Added #defines for SIGNED_COMPARE_SLOW
	o Added Makefile and "usermem" program
	o Removed all floating point computations
	o New programs: [deleted]
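
The first item in the list above replaces a division-based hash with an xor.
The sketch below only illustrates the difference between the two styles.  It
assumes a table of 69001 entries and a shift of 8 for 16-bit codes; the names
hash_div and hash_xor are hypothetical and are not taken from compress.c.

```c
#include <assert.h>

#define HSIZE 69001L	/* assumed hash table size for BITS=16 */

/* Division hash: pack the character and the prefix code into one key,
 * then reduce it modulo the table size (one divide per probe). */
static long hash_div(int c, long ent, int maxbits)
{
	long fcode = ((long)c << maxbits) + ent;
	return fcode % HSIZE;
}

/* Xor hash: shift the character and xor it with the prefix code.
 * No divide is needed, but the caller must choose hshift so that the
 * result stays below HSIZE. */
static long hash_xor(int c, long ent, int hshift)
{
	long h = ((long)c << hshift) ^ ent;
	assert(h >= 0 && h < HSIZE);
	return h;
}
```

Both functions return a first probe position in the same table; collisions
still have to be resolved by whatever secondary probing the program uses.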

The "usermem" script attempts to determine the maximum process size.  Some
editing of the script may be necessary (see the comments).  [It should work
fine on 4.3BSD.]  If you can't get it to work at all, just create a file
"USERMEM" containing the maximum process size in decimal.

The following preprocessor symbols control the compilation of "compress.c":

	o USERMEM		Maximum process memory on the system
	o SACREDMEM		Amount to reserve for other processes
	o SIGNED_COMPARE_SLOW	Unsigned compare instructions are faster
	o NO_UCHAR		Don't use "unsigned char" types
	o BITS			Overrules default set by USERMEM-SACREDMEM
	o vax			Generate inline assembler
	o interdata		Defines SIGNED_COMPARE_SLOW
	o M_XENIX		Makes arrays < 65536 bytes each
	o pdp11			BITS=12, NO_UCHAR
	o z8000			BITS=12
	o pcxt			BITS=12
	o BSD4_2		Allow long filenames ( > 14 characters) &
				Call setlinebuf(stderr)

The difference "usermem-sacredmem" determines the maximum BITS that can be
specified with the "-b" flag.

memory: at least		BITS
------  -- -----		----
     433,484			 16
     229,600			 15
     127,536			 14
      73,464			 13
           0			 12

The default is BITS=16.
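
The table above can be read as a simple threshold lookup.  Because USERMEM and
SACREDMEM are preprocessor symbols, the real selection happens at compile
time; the run-time function max_bits below is a hypothetical stand-in that
only restates the table.

```c
#include <assert.h>

/* Largest code width allowed for a given amount of free memory,
 * following the thresholds in the 4.0 table. */
static int max_bits(long usermem, long sacredmem)
{
	long avail;

	assert(usermem >= sacredmem);	/* nothing left otherwise */
	avail = usermem - sacredmem;

	if (avail >= 433484L) return 16;
	if (avail >= 229600L) return 15;
	if (avail >= 127536L) return 14;
	if (avail >= 73464L) return 13;
	return 12;
}
```

For example, 750,000 bytes of user memory with 400,000 reserved leaves
350,000 bytes, which falls in the BITS=15 row.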

The maximum bits can be overruled by specifying "-DBITS=bits" at
compilation time.

WARNING: files compressed on a large machine with more bits than allowed by
a version of compress on a smaller machine cannot be decompressed!  Use the
"-b12" flag to generate a file on a large machine that can be uncompressed
on a 16-bit machine.
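
The 2.0 notes further down say that the file starts with a magic number
(\037\235) followed by the bits setting.  A decompressor can use that header
to detect the situation described in the warning before reading any data.
The sketch below assumes that the bits value sits in the low five bits of the
third byte; that layout, the name check_header and its return codes are
assumptions made for illustration, not something this README states.

```c
#include <assert.h>
#include <stddef.h>

#define MAGIC_1		0x1f	/* \037 */
#define MAGIC_2		0x9d	/* \235 */
#define BIT_MASK	0x1f	/* assumed: low 5 bits of byte 3 hold the bits */

/* Returns 0 if the file can be decompressed locally, -1 if the header
 * is not a compress header, and -2 if the file was written with more
 * bits than local_bits allows. */
static int check_header(const unsigned char *hdr, size_t len, int local_bits)
{
	int file_bits;

	assert(hdr != NULL);
	if (len < 3 || hdr[0] != MAGIC_1 || hdr[1] != MAGIC_2)
		return -1;
	file_bits = hdr[2] & BIT_MASK;
	return file_bits > local_bits ? -2 : 0;
}
```

Under these assumptions a file written with 16 bits is rejected by a build
limited to 12 bits, while a file written with "-b12" is accepted by both.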

The output of compress 4.0 is fully compatible with that of compress 3.0.
In other words, the output of compress 4.0 may be fed into uncompress 3.0 or
the output of compress 3.0 may be fed into uncompress 4.0.

The output of compress 4.0 is not compatible with that of
compress 2.0.  However, compress 4.0 still accepts the output of
compress 2.0.  To generate output that is compatible with compress
2.0, use the undocumented "-C" flag.

	-from mod.sources, submitted by vax135!petsd!joe (Joe Orost), 8/1/85
--------------------------------

Enclosed is compress version 3.0 with the following changes:

1.	"Block" compression is performed.  After the BITS run out, the
	compression ratio is checked every so often.  If it is decreasing,
	the table is cleared and a new set of substrings is generated.

	This makes the output of compress 3.0 not compatible with that of
	compress 2.0.  However, compress 3.0 still accepts the output of
	compress 2.0.  To generate output that is compatible with compress
	2.0, use the undocumented "-C" flag.

2.	A quiet "-q" flag has been added for use by the news system.

3.	The character chaining has been deleted and the program now uses
	hashing.  This improves the speed of the program, especially
	during decompression.  Other speed improvements have been made,
	such as using putc() instead of fwrite().

4.	A large table is used on large machines when a relatively small
	number of bits is specified.  This saves much time when compressing
	for a 16-bit machine on a 32-bit virtual machine.  Note that the
	speed improvement only occurs when the input file is > 30000
	characters, and the -b BITS is less than or equal to the cutoff
	described below.

Most of these changes were made by James A. Woods (ames!jaw).  Thank you,
James!

To compile compress:

	cc -O -DUSERMEM=usermem -o compress compress.c

Where "usermem" is the amount of physical user memory available (in bytes).
If any physical memory is to be reserved for other processes, put in
"-DSACREDMEM=sacredmem", where "sacredmem" is the amount to be reserved.

The difference "usermem-sacredmem" determines the maximum BITS that can be
specified, and the cutoff bits where the large+fast table is used.

memory: at least		BITS		cutoff
------  -- -----		----		------
   4,718,592			 16		  13
   2,621,440			 16		  12
   1,572,864			 16		  11
   1,048,576			 16		  10
     631,808			 16		  --
     329,728			 15		  --
     178,176			 14		  --
      99,328			 13		  --
           0			 12		  --

The default memory size is 750,000 which gives a maximum BITS=16 and no
large+fast table.

The maximum bits can be overruled by specifying "-DBITS=bits" at
compilation time.

If your machine doesn't support unsigned characters, define "NO_UCHAR"
when compiling.

If your machine has "int" as 16-bits, define "SHORT_INT" when compiling.

After compilation, move "compress" to a standard executable location, such
as /usr/local.  Then:
	cd /usr/local
	ln compress uncompress
	ln compress zcat
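
The links work because one binary can inspect the name it was invoked under.
A minimal sketch of that dispatch, assuming it keys on the last path component
of argv[0]; mode_from_name and the enum are hypothetical names, not taken from
compress.c.

```c
#include <assert.h>
#include <string.h>

enum mode { MODE_COMPRESS, MODE_UNCOMPRESS, MODE_ZCAT };

/* Pick the operating mode from the name the program was invoked as. */
static enum mode mode_from_name(const char *argv0)
{
	const char *base;

	assert(argv0 != NULL);
	base = strrchr(argv0, '/');		/* strip any leading path */
	base = base ? base + 1 : argv0;

	if (strcmp(base, "uncompress") == 0)
		return MODE_UNCOMPRESS;
	if (strcmp(base, "zcat") == 0)
		return MODE_ZCAT;
	return MODE_COMPRESS;			/* any other name compresses */
}
```

A main() would call mode_from_name(argv[0]) once at startup and set its
decompress and write-to-stdout behavior from the result.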

On machines that have a fixed stack size (such as Perkin-Elmer), set the
stack to at least 12kb.  ("setstack compress 12" on Perkin-Elmer).

Next, install the manual (compress.l).
	cp compress.l /usr/man/manl
	cd /usr/man/manl
	ln compress.l uncompress.l
	ln compress.l zcat.l

		- or -

	cp compress.l /usr/man/man1/compress.1
	cd /usr/man/man1
	ln compress.1 uncompress.1
	ln compress.1 zcat.1

					regards,
					petsd!joe

Here is a note from the net:

>From hplabs!pesnta!amd!turtlevax!ken Sat Jan  5 03:35:20 1985
Path: ames!hplabs!pesnta!amd!turtlevax!ken
From: ken@turtlevax.UUCP (Ken Turkowski)
Newsgroups: net.sources
Subject: Re: Compress release 3.0 : sample Makefile
Organization: CADLINC, Inc. @ Menlo Park, CA

In the compress 3.0 source recently posted to mod.sources, there is a
#define variable which can be set for optimum performance on a machine
with a large amount of memory.  A program (usermem) to calculate the
useable amount of physical user memory is enclosed, as well as a sample
4.2bsd Vax Makefile for compress.

Here is the README file from the previous version of compress (2.0):

>Enclosed is compress.c version 2.0 with the following bugs fixed:
>
>1.	The packed files produced by compress are different on different
>	machines and dependent on the vax sysgen option.
>		The bug was in the different byte/bit ordering on the
>		various machines.  This has been fixed.
>
>		This version is NOT compatible with the original vax posting
>		unless the '-DCOMPATIBLE' option is specified to the C
>		compiler.  The original posting has a bug which I fixed,
>		causing incompatible files.  I recommend that you NOT use this
>		option unless you already have a lot of packed files from
>		the original posting by thomas.
>2.	The exit status is not well defined (on some machines) causing the
>	scripts to fail.
>		The exit status is now 0, 1 or 2 and is documented in
>		compress.l.
>3.	The function getopt() is not available in all C libraries.
>		The function getopt() is no longer referenced by the
>		program.
>4.	Error status is not being checked on the fwrite() and fflush() calls.
>		Fixed.
>
>The following enhancements have been made:
>
>1.	Added facilities of "compact" into the compress program.  "Pack",
>	"Unpack", and "Pcat" are no longer required (no longer supplied).
>2.	Installed work around for C compiler bug with "-O".
>3.	Added a magic number header (\037\235).  Put the bits specified
>	in the file.
>4.	Added "-f" flag to force overwrite of output file.
>5.	Added "-c" flag and "zcat" program.  'ln compress zcat' after you
>	compile.
>6.	The 'uncompress' script has been deleted; simply
>	'ln compress uncompress' after you compile and it will work.
>7.	Removed extra bit masking for machines that support unsigned
>	characters.  If your machine doesn't support unsigned characters,
>	define "NO_UCHAR" when compiling.
>
>Compile "compress.c" with "-O -o compress" flags.  Move "compress" to a
>standard executable location, such as /usr/local.  Then:
>	cd /usr/local
>	ln compress uncompress
>	ln compress zcat
>
>On machines that have a fixed stack size (such as Perkin-Elmer), set the
>stack to at least 12kb.  ("setstack compress 12" on Perkin-Elmer).
>
>Next, install the manual (compress.l).
>	cp compress.l /usr/man/manl		- or -
>	cp compress.l /usr/man/man1/compress.1
>
>Here is the README that I sent with my first posting:
>
>>Enclosed is a modified version of compress.c, along with scripts to make it
>>run identically to pack(1), unpack(1), and pcat(1).  Here is what I
>>(petsd!joe) and a colleague (petsd!peora!srd) did:
>>
>>1. Removed VAX dependencies.
>>2. Changed the struct to separate arrays; saves mucho memory.
>>3. Did comparisons in unsigned, where possible.  (Faster on Perkin-Elmer.)
>>4. Sorted the character next chain and changed the search to stop
>>prematurely.  This saves a lot on the execution time when compressing.
>>
>>This version is totally compatible with the original version.  Even though
>>lint(1) -p has no complaints about compress.c, it won't run on a 16-bit
>>machine, due to the size of the arrays.
>>
>>Here is the README file from the original author:
>>
>>>Well, with all this discussion about file compression (for news batching
>>>in particular) going around, I decided to implement the text compression
>>>algorithm described in the June Computer magazine.  The author claimed
>>>blinding speed and good compression ratios.  It's certainly faster than
>>>compact (but, then, what wouldn't be), but it's also the same speed as
>>>pack, and gets better compression than both of them.  On 350K bytes of
>>>unix-wizards, compact took about 8 minutes of CPU, pack took about 80
>>>seconds, and compress (herein) also took 80 seconds.  But, compact and
>>>pack got about 30% compression, whereas compress got over 50%.  So, I
>>>decided I had something, and that others might be interested, too.
>>>
>>>As is probably true of compact and pack (although I haven't checked),
>>>the byte order within a word is probably relevant here, but as long as
>>>you stay on a single machine type, you should be ok.  (Can anybody
>>>elucidate on this?)  There are a couple of asm's in the code (extv and
>>>insv instructions), so anyone porting it to another machine will have to
>>>deal with this anyway (and could probably make it compatible with Vax
>>>byte order at the same time).  Anyway, I've linted the code (both with
>>>and without -p), so it should run elsewhere.  Note the longs in the
>>>code, you can take these out if you reduce BITS to <= 15.
>>>
>>>Have fun, and as always, if you make good enhancements, or bug fixes,
>>>I'd like to see them.
>>>
>>>=Spencer (thomas@utah-20, {harpo,hplabs,arizona}!utah-cs!thomas)
>>
>>					regards,
>>					joe
>>
>>--
>>Full-Name:  Joseph M. Orost
>>UUCP:       ..!{decvax,ucbvax,ihnp4}!vax135!petsd!joe
>>US Mail:    MS 313; Perkin-Elmer; 106 Apple St; Tinton Falls, NJ 07724
>>Phone:      (201) 870-5844