	@(#)README	5.3 (Berkeley) 09/17/85

Compress version 4.0 improvements over 3.0:
	o compress() speedup (10-50%) by changing division hash to xor
	o decompress() speedup (5-10%)
	o Memory requirements reduced (3-30%)
	o Stack requirements reduced to less than 4kb
	o Removed 'Big+Fast' compress code (FBITS) because of compress speedup
	o Portability mods for Z8000 and PC/XT (but not zeus 3.2)
	o Default to 'quiet' mode
	o Unification of 'force' flags
	o Manual page overhaul
	o Portability enhancement for M_XENIX
	o Removed text on #else and #endif
	o Added "-V" switch to print version and options
	o Added #defines for SIGNED_COMPARE_SLOW
	o Added Makefile and "usermem" program
	o Removed all floating point computations
	o New programs: [deleted]
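
A minimal sketch of the first item above, for readers who have not seen the
two hashing styles side by side.  The table is probed with a key built from
the next input character and the current prefix code.  The names hash_div
and hash_xor, the table size 69001 and the shift of 8 are assumptions made
for illustration; check compress.c for the values it really uses.

```c
#include <assert.h>

#define HSIZE  69001L	/* assumed table size (a prime above 2^16) */
#define HSHIFT 8	/* assumed shift used by the xor hash */

/* Division hash: build a combined key, then reduce it with a modulo. */
long hash_div(int c, long ent, int maxbits)
{
	long fcode = ((long)c << maxbits) + ent;

	return fcode % HSIZE;
}

/* Xor hash: a shift and an xor only, no divide instruction needed. */
long hash_xor(int c, long ent)
{
	assert(c >= 0 && c < 256);
	assert(ent >= 0 && ent < 65536L);
	return ((long)c << HSHIFT) ^ ent;
}
```

With these assumed constants the xor result is always below 65536, so it can
index a table of HSIZE entries without any further reduction.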

The "usermem" script attempts to determine the maximum process size.  Some
editing of the script may be necessary (see the comments).  [It should work
fine on 4.3 bsd.] If you can't get it to work at all, just create a file
"USERMEM" containing the maximum process size in decimal.

The following preprocessor symbols control the compilation of "compress.c":

	o USERMEM		Maximum process memory on the system
	o SACREDMEM		Amount to reserve for other processes
	o SIGNED_COMPARE_SLOW	Unsigned compare instructions are faster
	o NO_UCHAR		Don't use "unsigned char" types
	o BITS			Overrules default set by USERMEM-SACREDMEM
	o vax			Generate inline assembler
	o interdata		Defines SIGNED_COMPARE_SLOW
	o M_XENIX		Makes arrays < 65536 bytes each
	o pdp11			BITS=12, NO_UCHAR
	o z8000			BITS=12
	o pcxt			BITS=12
	o BSD4_2		Allow long filenames ( > 14 characters) &
				Call setlinebuf(stderr)

The difference "usermem-sacredmem" determines the maximum BITS that can be
specified with the "-b" flag.

memory: at least		BITS
------  -- -----                ----
     433,484			 16
     229,600			 15
     127,536			 14
      73,464			 13
           0			 12

The default is BITS=16.
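
The table above can be read as a simple threshold lookup.  compress.c makes
this choice at compile time from USERMEM and SACREDMEM; the run-time
function below (max_bits is a hypothetical name) is only a sketch that
reproduces the same thresholds.

```c
#include <assert.h>

/* Largest BITS permitted for a memory budget, following the table above. */
int max_bits(long usermem, long sacredmem)
{
	long avail;

	assert(sacredmem >= 0);
	avail = usermem - sacredmem;	/* memory left for compress itself */

	if (avail >= 433484L)
		return 16;
	if (avail >= 229600L)
		return 15;
	if (avail >= 127536L)
		return 14;
	if (avail >= 73464L)
		return 13;
	return 12;			/* the floor: 12 bits always fits */
}
```

For example, max_bits(300000L, 100000L) leaves 200,000 bytes and so yields
14, while any budget of 433,484 bytes or more yields the full 16.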

The maximum bits can be overruled by specifying "-DBITS=bits" at
compilation time.

WARNING: files compressed on a large machine with more bits than allowed by
a version of compress on a smaller machine cannot be decompressed!  Use the
"-b12" flag to generate a file on a large machine that can be uncompressed
on a 16-bit machine.
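
A receiving machine can detect this situation from the file header.  The
2.0 notes further down say that a magic number (\037\235) and the bits
specified are stored in the file.  The sketch below assumes the bits live
in the low five bits of the third byte (mask 0x1f); that layout and the
name check_header are assumptions here, so confirm them against compress.c
before relying on them.

```c
#include <assert.h>
#include <stddef.h>

#define MAGIC_0  0x1f	/* \037 */
#define MAGIC_1  0x9d	/* \235 */
#define BIT_MASK 0x1f	/* assumed: low five bits of byte 3 hold BITS */

/*
 * Returns  1 if the file can be decompressed with my_max_bits,
 *          0 if it was written with more bits than we support,
 *         -1 if the header is too short or the magic number is wrong.
 */
int check_header(const unsigned char *hdr, size_t len, int my_max_bits)
{
	assert(hdr != NULL);
	if (len < 3 || hdr[0] != MAGIC_0 || hdr[1] != MAGIC_1)
		return -1;
	return (hdr[2] & BIT_MASK) <= my_max_bits ? 1 : 0;
}
```

A file written with 16 bits would be rejected by a build limited to 12,
which is the case the warning above describes.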

The output of compress 4.0 is fully compatible with that of compress 3.0.
In other words, the output of compress 4.0 may be fed into uncompress 3.0 or
the output of compress 3.0 may be fed into uncompress 4.0.

The output of compress 4.0 is not compatible with that of
compress 2.0.  However, compress 4.0 still accepts the output of
compress 2.0.  To generate output that is compatible with compress
2.0, use the undocumented "-C" flag.

	-from mod.sources, submitted by vax135!petsd!joe (Joe Orost), 8/1/85
--------------------------------

Enclosed is compress version 3.0 with the following changes:

1.	"Block" compression is performed.  After the BITS run out, the
	compression ratio is checked every so often.  If it is decreasing,
	the table is cleared and a new set of substrings is generated.

	This makes the output of compress 3.0 not compatible with that of
	compress 2.0.  However, compress 3.0 still accepts the output of
	compress 2.0.  To generate output that is compatible with compress
	2.0, use the undocumented "-C" flag.

2.	A quiet "-q" flag has been added for use by the news system.

3.	The character chaining has been deleted and the program now uses
	hashing.  This improves the speed of the program, especially
	during decompression.  Other speed improvements have been made,
	such as using putc() instead of fwrite().

4.	A large table is used on large machines when a relatively small
	number of bits is specified.  This saves much time when compressing
	for a 16-bit machine on a 32-bit virtual machine.  Note that the
	speed improvement only occurs when the input file is > 30000
	characters, and the -b BITS is less than or equal to the cutoff
	described below.
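
The ratio test in item 1 can be sketched as follows.  This is an
illustration only: the structure, the name ratio_checkpoint and the
fixed-point scale of 256 are assumptions, and the caller decides how often
"every so often" is.  Integer arithmetic is used throughout, so no floating
point is needed.

```c
#include <assert.h>

struct ratio_state {
	long best;	/* ratio at the previous checkpoint, scaled by 256 */
};

/*
 * Call at each checkpoint once the code table is full.
 * Returns 1 if the ratio has dropped and the table should be cleared,
 * 0 if compression is holding steady or still improving.
 */
int ratio_checkpoint(struct ratio_state *s, long bytes_in, long bytes_out)
{
	long ratio;

	assert(s != 0 && bytes_out > 0);
	/* Keep the shift below from overflowing a 32-bit long. */
	assert(bytes_in >= 0 && bytes_in <= 0x007fffffL);

	ratio = (bytes_in << 8) / bytes_out;	/* input/output, times 256 */
	if (ratio >= s->best) {
		s->best = ratio;
		return 0;
	}
	s->best = 0;				/* start over after a clear */
	return 1;
}
```

A caller would start with best set to 0, invoke this periodically, and on a
return of 1 reset its string table and tell the decompressor to do the same.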

Most of these changes were made by James A. Woods (ames!jaw).  Thank you
James!

To compile compress:

	cc -O -DUSERMEM=usermem -o compress compress.c

Where "usermem" is the amount of physical user memory available (in bytes).
If any physical memory is to be reserved for other processes, put in
"-DSACREDMEM=sacredmem", where "sacredmem" is the amount to be reserved.

The difference "usermem-sacredmem" determines the maximum BITS that can be
specified, and the cutoff bits where the large+fast table is used.

memory: at least		BITS		cutoff
------  -- -----                ----            ------
   4,718,592			 16		  13
   2,621,440			 16		  12
   1,572,864			 16		  11
   1,048,576			 16		  10
     631,808			 16		  --
     329,728			 15		  --
     178,176			 14		  --
      99,328			 13		  --
           0			 12		  --

The default memory size is 750,000 which gives a maximum BITS=16 and no
large+fast table.

The maximum bits can be overruled by specifying "-DBITS=bits" at
compilation time.

If your machine doesn't support unsigned characters, define "NO_UCHAR"
when compiling.
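
To show why NO_UCHAR matters: when bytes are held in a plain "char" that
may be signed, each one has to be masked before it is used as a table
index.  The sketch below is an assumption about the general technique, not
a copy of compress.c; the names char_type, BYTE_VAL and byte_index are
chosen for illustration.

```c
#include <assert.h>

#ifdef NO_UCHAR
typedef char char_type;			/* plain char, possibly signed */
# define BYTE_VAL(c) ((int)(c) & 0xff)	/* extra masking is required */
#else
typedef unsigned char char_type;
# define BYTE_VAL(c) ((int)(c))		/* already in the range 0..255 */
#endif

/* Value of a stored byte as a table index in the range 0..255. */
int byte_index(char_type c)
{
	int v = BYTE_VAL(c);

	assert(v >= 0 && v <= 255);
	return v;
}
```

Compiling with -DNO_UCHAR selects the masked variant; both variants return
the same index for the same stored byte.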

If your machine has a 16-bit "int", define "SHORT_INT" when compiling.

After compilation, move "compress" to a standard executable location, such
as /usr/local.  Then:
	cd /usr/local
	ln compress uncompress
	ln compress zcat
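
The links work because a single binary can look at the name it was invoked
under and pick its behaviour from that.  A minimal sketch of the idea
follows; the enum and the name mode_from_name are hypothetical and are not
taken from compress.c.

```c
#include <assert.h>
#include <string.h>

enum mode { MODE_COMPRESS, MODE_UNCOMPRESS, MODE_ZCAT };

/* Choose a mode from argv[0], ignoring any leading directory part. */
enum mode mode_from_name(const char *argv0)
{
	const char *base;

	assert(argv0 != NULL);
	base = strrchr(argv0, '/');
	base = base ? base + 1 : argv0;

	if (strcmp(base, "uncompress") == 0)
		return MODE_UNCOMPRESS;
	if (strcmp(base, "zcat") == 0)
		return MODE_ZCAT;
	return MODE_COMPRESS;		/* any other name compresses */
}
```

A main() would call mode_from_name(argv[0]) before parsing its flags, so
that running the program through the "zcat" link writes to standard output.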

On machines that have a fixed stack size (such as Perkin-Elmer), set the
stack to at least 12kb.  ("setstack compress 12" on Perkin-Elmer).

Next, install the manual (compress.l).
	cp compress.l /usr/man/manl
	cd /usr/man/manl
	ln compress.l uncompress.l
	ln compress.l zcat.l

		- or -

	cp compress.l /usr/man/man1/compress.1
	cd /usr/man/man1
	ln compress.1 uncompress.1
	ln compress.1 zcat.1

					regards,
					petsd!joe

Here is a note from the net:

>From hplabs!pesnta!amd!turtlevax!ken Sat Jan  5 03:35:20 1985
Path: ames!hplabs!pesnta!amd!turtlevax!ken
From: ken@turtlevax.UUCP (Ken Turkowski)
Newsgroups: net.sources
Subject: Re: Compress release 3.0 : sample Makefile
Organization: CADLINC, Inc. @ Menlo Park, CA

In the compress 3.0 source recently posted to mod.sources, there is a
#define variable which can be set for optimum performance on a machine
with a large amount of memory.  A program (usermem) to calculate the
useable amount of physical user memory is enclosed, as well as a sample
4.2bsd Vax Makefile for compress.

Here is the README file from the previous version of compress (2.0):

>Enclosed is compress.c version 2.0 with the following bugs fixed:
>
>1.	The packed files produced by compress are different on different
>	machines and dependent on the vax sysgen option.
>		The bug was in the different byte/bit ordering on the
>		various machines.  This has been fixed.
>
>		This version is NOT compatible with the original vax posting
>		unless the '-DCOMPATIBLE' option is specified to the C
>		compiler.  The original posting has a bug which I fixed,
>		causing incompatible files.  I recommend that you NOT use
>		this option unless you already have a lot of packed files
>		from the original posting by thomas.
>2.	The exit status is not well defined (on some machines) causing the
>	scripts to fail.
>		The exit status is now 0, 1 or 2 and is documented in
>		compress.l.
>3.	The function getopt() is not available in all C libraries.
>		The function getopt() is no longer referenced by the
>		program.
>4.	Error status is not being checked on the fwrite() and fflush() calls.
>		Fixed.
>
>The following enhancements have been made:
>
>1.	Added facilities of "compact" into the compress program.  "Pack",
>	"Unpack", and "Pcat" are no longer required (no longer supplied).
>2.	Installed work around for C compiler bug with "-O".
>3.	Added a magic number header (\037\235).  Put the bits specified
>	in the file.
>4.	Added "-f" flag to force overwrite of output file.
>5.	Added "-c" flag and "zcat" program.  'ln compress zcat' after you
>	compile.
>6.	The 'uncompress' script has been deleted; simply
>	'ln compress uncompress' after you compile and it will work.
>7.	Removed extra bit masking for machines that support unsigned
>	characters.  If your machine doesn't support unsigned characters,
>	define "NO_UCHAR" when compiling.
>
>Compile "compress.c" with "-O -o compress" flags.  Move "compress" to a
>standard executable location, such as /usr/local.  Then:
>	cd /usr/local
>	ln compress uncompress
>	ln compress zcat
>
>On machines that have a fixed stack size (such as Perkin-Elmer), set the
>stack to at least 12kb.  ("setstack compress 12" on Perkin-Elmer).
>
>Next, install the manual (compress.l).
>	cp compress.l /usr/man/manl		- or -
>	cp compress.l /usr/man/man1/compress.1
>
>Here is the README that I sent with my first posting:
>
>>Enclosed is a modified version of compress.c, along with scripts to make it
>>run identically to pack(1), unpack(1), and pcat(1).  Here is what I
>>(petsd!joe) and a colleague (petsd!peora!srd) did:
>>
>>1. Removed VAX dependencies.
>>2. Changed the struct to separate arrays; saves mucho memory.
>>3. Did comparisons in unsigned, where possible.  (Faster on Perkin-Elmer.)
>>4. Sorted the character next chain and changed the search to stop
>>prematurely.  This saves a lot on the execution time when compressing.
>>
>>This version is totally compatible with the original version.  Even though
>>lint(1) -p has no complaints about compress.c, it won't run on a 16-bit
>>machine, due to the size of the arrays.
>>
>>Here is the README file from the original author:
>>
>>>Well, with all this discussion about file compression (for news batching
>>>in particular) going around, I decided to implement the text compression
>>>algorithm described in the June Computer magazine.  The author claimed
>>>blinding speed and good compression ratios.  It's certainly faster than
>>>compact (but, then, what wouldn't be), but it's also the same speed as
>>>pack, and gets better compression than both of them.  On 350K bytes of
>>>unix-wizards, compact took about 8 minutes of CPU, pack took about 80
>>>seconds, and compress (herein) also took 80 seconds.  But, compact and
>>>pack got about 30% compression, whereas compress got over 50%.  So, I
>>>decided I had something, and that others might be interested, too.
>>>
>>>As is probably true of compact and pack (although I haven't checked),
>>>the byte order within a word is probably relevant here, but as long as
>>>you stay on a single machine type, you should be ok.  (Can anybody
>>>elucidate on this?)  There are a couple of asm's in the code (extv and
>>>insv instructions), so anyone porting it to another machine will have to
>>>deal with this anyway (and could probably make it compatible with Vax
>>>byte order at the same time).  Anyway, I've linted the code (both with
>>>and without -p), so it should run elsewhere.  Note the longs in the
>>>code, you can take these out if you reduce BITS to <= 15.
>>>
>>>Have fun, and as always, if you make good enhancements, or bug fixes,
>>>I'd like to see them.
>>>
>>>=Spencer (thomas@utah-20, {harpo,hplabs,arizona}!utah-cs!thomas)
>>
>>					regards,
>>					joe
>>
>>--
>>Full-Name:  Joseph M. Orost
>>UUCP:       ..!{decvax,ucbvax,ihnp4}!vax135!petsd!joe
>>US Mail:    MS 313; Perkin-Elmer; 106 Apple St; Tinton Falls, NJ 07724
>>Phone:      (201) 870-5844