	@(#)README	5.2 (Berkeley) 06/07/85

Compress version 3.2 enhancements:

	(a) portability mods for Z8000 and PC/XT
	(b) default to 'quiet' mode
	(c) unification of 'force' flags
	(d) multi-file bug fix for USERMEM code
	(e) decompress() speedup (5-10%)
	(f) manual page overhaul

This is the baseline for both BSD 4.3 and netnews 2.10.3 from Rick Adams.

	--jaw  (June 7, 1985)
-----
Enclosed is compress version 3.0 with the following changes:

1.	"Block" compression is performed.  After the BITS run out (that is,
	once the code table is full), the compression ratio is checked
	every so often.  If it is decreasing, the table is cleared and a
	new set of substrings is generated.

	This makes the output of compress 3.0 incompatible with that of
	compress 2.0.  However, compress 3.0 still accepts the output of
	compress 2.0.  To generate output that is compatible with compress
	2.0, use the undocumented "-C" flag.

2.	A quiet "-q" flag has been added for use by the news system.

3.	The character chaining has been deleted and the program now uses
	hashing.  This boosts speed, especially during compression of
	large files.  Other speed improvements have been made, such as
	using putc() instead of fwrite().

4.	A large table is used on large machines when a relatively small
	number of bits is specified.  This saves much time when compressing
	for a 16-bit machine on a 32-bit virtual machine.
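
As a rough illustration of the block compression check described in item 1
above, the sketch below tracks a scaled compression ratio and reports when
it starts to fall.  The names (ratio_monitor, should_clear_table), the use
of running byte totals, and the scale factor of 256 are assumptions made
for this example; they are not taken from compress.c.

```c
#include <stdbool.h>

/* Hypothetical state for this sketch; not the variables used by compress.c. */
struct ratio_monitor {
    long long best_ratio;   /* highest scaled ratio seen since the last clear */
};

/*
 * Decide whether the string table should be cleared.
 * bytes_in and bytes_out are running totals of input read and output written.
 * The ratio is scaled by 256 so integer division keeps some precision.
 * Returns true only when the ratio has dropped below the best value seen.
 */
bool should_clear_table(struct ratio_monitor *m,
                        long long bytes_in, long long bytes_out)
{
    long long ratio;

    if (bytes_out <= 0)
        return false;               /* nothing written yet, nothing to judge */

    ratio = (bytes_in * 256) / bytes_out;
    if (ratio >= m->best_ratio) {
        m->best_ratio = ratio;      /* still holding or improving */
        return false;
    }

    m->best_ratio = 0;              /* start over after the table is cleared */
    return true;
}
```

A caller would invoke should_clear_table() at intervals once the table is
full, and on a true result reset the table and tell the decompressor to do
the same.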

Most of these changes were made by James A. Woods (ames!jaw).  Thank you
James!
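
The move from character chaining to hashing (item 3 above) can be pictured
with a small open-addressing table keyed on a (prefix code, next character)
pair.  This is only a sketch: the table size, the key packing, the linear
probing, and all function names are assumptions chosen for clarity, and the
probing scheme in compress.c may well differ.

```c
#define HSIZE 5003          /* prime table size; an assumption for this sketch */
#define EMPTY (-1L)

static long keys[HSIZE];    /* packed (prefix, ch) keys, or EMPTY */
static int  codes[HSIZE];   /* code assigned to the key in the same slot */

/* Mark every slot unused.  Must be called before the first lookup. */
void table_clear(void)
{
    int i;
    for (i = 0; i < HSIZE; i++)
        keys[i] = EMPTY;
}

/* Combine a non-negative prefix code and a byte into one key. */
static long pack_key(int prefix, unsigned char ch)
{
    return ((long)prefix << 8) | ch;
}

/* Find the slot holding key, or the empty slot where it belongs.
 * Returns -1 if the table is full and the key is absent. */
static int find_slot(long key)
{
    int i = (int)(key % HSIZE);
    int n;
    for (n = 0; n < HSIZE; n++) {
        if (keys[i] == EMPTY || keys[i] == key)
            return i;
        i = (i + 1) % HSIZE;        /* linear probing */
    }
    return -1;
}

/* Return the code for (prefix, ch), or -1 if the pair is not in the table. */
int table_lookup(int prefix, unsigned char ch)
{
    int slot = find_slot(pack_key(prefix, ch));
    if (slot < 0 || keys[slot] == EMPTY)
        return -1;
    return codes[slot];
}

/* Record code for (prefix, ch).  Returns 0 on success, -1 if the table is full. */
int table_insert(int prefix, unsigned char ch, int code)
{
    long key = pack_key(prefix, ch);
    int slot = find_slot(key);
    if (slot < 0)
        return -1;
    keys[slot] = key;
    codes[slot] = code;
    return 0;
}
```

Each lookup touches only a few slots on average, whereas a chained search
walks a list of successors for the current prefix, which is one plausible
reason hashing helps most on large files.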

To compile compress:

	cc -O -DUSERMEM=usermem -o compress compress.c

Here "usermem" is the amount of physical user memory available (in bytes).
If any physical memory is to be reserved for other processes, put in
"-DSACREDMEM=sacredmem", where "sacredmem" is the amount to be reserved.

The difference "usermem-sacredmem" determines the maximum BITS that can be
specified, and the cutoff, i.e. the BITS setting up to which the large+fast
table is used.

memory: at least    BITS    cutoff
----------------    ----    ------
       4,718,592      16      13
       2,621,440      16      12
       1,572,864      16      11
         631,808      16      --
         329,728      15      --
         178,176      14      --
          99,328      13      --
               0      12      --

The default memory size is 750,000 bytes, which gives a maximum BITS=16 and
no large+fast table.
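
The table above can be read as a simple lookup: take the first row whose
memory threshold does not exceed "usermem-sacredmem".  The sketch below
encodes exactly those rows; the struct and function names are hypothetical,
and compress.c itself makes this choice at compile time with the
preprocessor rather than with a run-time function.

```c
#include <stddef.h>

/* Result of the lookup; names are assumptions for this sketch. */
struct mem_limits {
    int max_bits;   /* maximum BITS allowed */
    int cutoff;     /* large+fast table cutoff, or 0 for "--" (no such table) */
};

/* Map available memory in bytes (usermem - sacredmem) to the table's limits. */
struct mem_limits limits_for_memory(long avail)
{
    static const struct {
        long at_least;
        int  max_bits;
        int  cutoff;
    } rows[] = {
        { 4718592L, 16, 13 },
        { 2621440L, 16, 12 },
        { 1572864L, 16, 11 },
        {  631808L, 16,  0 },
        {  329728L, 15,  0 },
        {  178176L, 14,  0 },
        {   99328L, 13,  0 },
        {       0L, 12,  0 },
    };
    const size_t n = sizeof rows / sizeof rows[0];
    struct mem_limits r = { 12, 0 };    /* fallback, also used if avail < 0 */
    size_t i;

    for (i = 0; i < n; i++) {
        if (avail >= rows[i].at_least) {
            r.max_bits = rows[i].max_bits;
            r.cutoff = rows[i].cutoff;
            break;
        }
    }
    return r;
}
```

For the default of 750,000 bytes this yields max_bits 16 and cutoff 0,
matching the statement above that no large+fast table is used.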

The maximum bits can be overridden by specifying "-DBITS=bits" at
compilation time.

If your machine doesn't support unsigned characters, define "NO_UCHAR"
when compiling.

After compilation, move "compress" to a standard executable location, such
as /usr/local.  Then:
	cd /usr/local
	ln compress uncompress
	ln compress zcat
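
The links work because a single binary can choose its behavior from the
name it was invoked under.  A minimal sketch of that idea follows; the enum
and the function name mode_from_name are assumptions for this example, not
the code in compress.c.

```c
#include <string.h>

/* Hypothetical modes for this sketch. */
enum mode { MODE_COMPRESS, MODE_UNCOMPRESS, MODE_ZCAT };

/* Pick a mode from argv[0], ignoring any leading directory components. */
enum mode mode_from_name(const char *argv0)
{
    const char *base = strrchr(argv0, '/');

    base = (base != NULL) ? base + 1 : argv0;

    if (strcmp(base, "uncompress") == 0)
        return MODE_UNCOMPRESS;
    if (strcmp(base, "zcat") == 0)
        return MODE_ZCAT;
    return MODE_COMPRESS;           /* any other name compresses */
}
```

A main() built on this would call mode_from_name(argv[0]) before parsing
its flags.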

On machines that have a fixed stack size (such as Perkin-Elmer), set the
stack to at least 12 KB.  ("setstack compress 12" on Perkin-Elmer.)

Next, install the manual (compress.l).
	cp compress.l /usr/man/manl
	cd /usr/man/manl
	ln compress.l uncompress.l
	ln compress.l zcat.l

		- or -

	cp compress.l /usr/man/man1/compress.1
	cd /usr/man/man1
	ln compress.1 uncompress.1
	ln compress.1 zcat.1

					regards,
					petsd!joe

Here is a note from the net:

>From hplabs!pesnta!amd!turtlevax!ken Sat Jan  5 03:35:20 1985
Path: ames!hplabs!pesnta!amd!turtlevax!ken
From: ken@turtlevax.UUCP (Ken Turkowski)
Newsgroups: net.sources
Subject: Re: Compress release 3.0 : sample Makefile
Organization: CADLINC, Inc. @ Menlo Park, CA

In the compress 3.0 source recently posted to mod.sources, there is a
#define variable which can be set for optimum performance on a machine
with a large amount of memory.  A program (usermem) to calculate the
useable amount of physical user memory is enclosed, as well as a sample
4.2bsd Vax Makefile for compress.

Here is the README file from the previous version of compress (2.0):

>Enclosed is compress.c version 2.0 with the following bugs fixed:
>
>1.	The packed files produced by compress are different on different
>	machines and dependent on the vax sysgen option.
>		The bug was in the different byte/bit ordering on the
>		various machines.  This has been fixed.
>
>		This version is NOT compatible with the original vax posting
>		unless the '-DCOMPATIBLE' option is specified to the C
>		compiler.  The original posting has a bug which I fixed,
>		causing incompatible files.  I recommend that you NOT use
>		this option unless you already have a lot of packed files
>		from the original posting by thomas.
>2.	The exit status is not well defined (on some machines) causing the
>	scripts to fail.
>		The exit status is now 0, 1, or 2 and is documented in
>		compress.l.
>3.	The function getopt() is not available in all C libraries.
>		The function getopt() is no longer referenced by the
>		program.
>4.	Error status is not being checked on the fwrite() and fflush() calls.
>		Fixed.
>
>The following enhancements have been made:
>
>1.	Added facilities of "compact" into the compress program.  "Pack",
>	"Unpack", and "Pcat" are no longer required (no longer supplied).
>2.	Installed workaround for C compiler bug with "-O".
>3.	Added a magic number header (\037\235).  Put the bits specified
>	in the file.
>4.	Added "-f" flag to force overwrite of output file.
>5.	Added "-c" flag and "zcat" program.  'ln compress zcat' after you
>	compile.
>6.	The 'uncompress' script has been deleted; simply
>	'ln compress uncompress' after you compile and it will work.
>7.	Removed extra bit masking for machines that support unsigned
>	characters.  If your machine doesn't support unsigned characters,
>	define "NO_UCHAR" when compiling.
>
>Compile "compress.c" with "-O -o compress" flags.  Move "compress" to a
>standard executable location, such as /usr/local.  Then:
>	cd /usr/local
>	ln compress uncompress
>	ln compress zcat
>
>On machines that have a fixed stack size (such as Perkin-Elmer), set the
>stack to at least 12kb.  ("setstack compress 12" on Perkin-Elmer).
>
>Next, install the manual (compress.l).
>	cp compress.l /usr/man/manl		- or -
>	cp compress.l /usr/man/man1/compress.1
>
>Here is the README that I sent with my first posting:
>
>>Enclosed is a modified version of compress.c, along with scripts to make it
>>run identically to pack(1), unpack(1), and pcat(1).  Here is what I
>>(petsd!joe) and a colleague (petsd!peora!srd) did:
>>
>>1. Removed VAX dependencies.
>>2. Changed the struct to separate arrays; saves mucho memory.
>>3. Did comparisons in unsigned, where possible.  (Faster on Perkin-Elmer.)
>>4. Sorted the character next chain and changed the search to stop
>>prematurely.  This saves a lot on the execution time when compressing.
>>
>>This version is totally compatible with the original version.  Even though
>>lint(1) -p has no complaints about compress.c, it won't run on a 16-bit
>>machine, due to the size of the arrays.
>>
>>Here is the README file from the original author:
>>
>>>Well, with all this discussion about file compression (for news batching
>>>in particular) going around, I decided to implement the text compression
>>>algorithm described in the June Computer magazine.  The author claimed
>>>blinding speed and good compression ratios.  It's certainly faster than
>>>compact (but, then, what wouldn't be), but it's also the same speed as
>>>pack, and gets better compression than both of them.  On 350K bytes of
>>>unix-wizards, compact took about 8 minutes of CPU, pack took about 80
>>>seconds, and compress (herein) also took 80 seconds.  But, compact and
>>>pack got about 30% compression, whereas compress got over 50%.  So, I
>>>decided I had something, and that others might be interested, too.
>>>
>>>As is probably true of compact and pack (although I haven't checked),
>>>the byte order within a word is probably relevant here, but as long as
>>>you stay on a single machine type, you should be ok.  (Can anybody
>>>elucidate on this?)  There are a couple of asm's in the code (extv and
>>>insv instructions), so anyone porting it to another machine will have to
>>>deal with this anyway (and could probably make it compatible with Vax
>>>byte order at the same time).  Anyway, I've linted the code (both with
>>>and without -p), so it should run elsewhere.  Note the longs in the
>>>code, you can take these out if you reduce BITS to <= 15.
>>>
>>>Have fun, and as always, if you make good enhancements, or bug fixes,
>>>I'd like to see them.
>>>
>>>=Spencer (thomas@utah-20, {harpo,hplabs,arizona}!utah-cs!thomas)
>>
>>					regards,
>>					joe
>>
>>--
>>Full-Name:  Joseph M. Orost
>>UUCP:       ..!{decvax,ucbvax,ihnp4}!vax135!petsd!joe
>>US Mail:    MS 313; Perkin-Elmer; 106 Apple St; Tinton Falls, NJ 07724
>>Phone:      (201) 870-5844