
	@(#)README	8.1 (Berkeley) 6/9/93

Compress version 4.0 improvements over 3.0:
	o compress() speedup (10-50%) by changing division hash to xor
	o decompress() speedup (5-10%)
	o Memory requirements reduced (3-30%)
	o Stack requirements reduced to less than 4kb
	o Removed 'Big+Fast' compress code (FBITS) because of compress speedup
	o Portability mods for Z8000 and PC/XT (but not zeus 3.2)
	o Default to 'quiet' mode
	o Unification of 'force' flags
	o Manual page overhaul
	o Portability enhancement for M_XENIX
	o Removed text on #else and #endif
	o Added "-V" switch to print version and options
	o Added #defines for SIGNED_COMPARE_SLOW
	o Added Makefile and "usermem" program
	o Removed all floating point computations
	o New programs: [deleted]
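
The first 4.0 item replaces a division (modulo) hash with an xor hash. The C
sketch below illustrates the idea under stated assumptions: the slot for
"prefix code followed by byte c" is computed as (c << hshift) ^ prefix, which
needs no division, and a collision is resolved by stepping backwards through a
prime-sized table by a fixed displacement. The table size (5003, for 12-bit
codes), the -1 empty marker, and the names clear_table(), lookup() and
insert() are assumptions of this sketch, not a copy of compress.c.

```c
#include <assert.h>

/* Sketch only: sizes and names are assumptions, not taken from compress.c. */
#define MAXBITS 12
#define HSIZE   5003            /* prime table size for 12-bit codes */

static long htab[HSIZE];        /* key (c << MAXBITS) + prefix, or -1 if empty */
static int codetab[HSIZE];      /* code assigned to the string in that slot */
static int hshift;              /* keeps (c << hshift) ^ prefix below HSIZE */

/* Empty the table and compute the xor-hash shift for HSIZE.
 * Must be called before the first lookup(). */
static void clear_table(void)
{
    int shift = 0;

    for (long f = HSIZE; f < 65536L; f *= 2)
        shift++;
    hshift = 8 - shift;
    for (int i = 0; i < HSIZE; i++)
        htab[i] = -1;
}

/* Find the string "prefix followed by byte c".  Returns its slot and sets
 * *found to 1, or returns the empty slot where it belongs and sets *found
 * to 0.  The caller must never fill the table completely. */
static int lookup(int prefix, int c, int *found)
{
    long key = ((long)c << MAXBITS) + prefix;
    int i = (c << hshift) ^ prefix;             /* xor hash: no division */
    int disp = (i == 0) ? 1 : HSIZE - i;        /* secondary probe step */

    assert(c >= 0 && c < 256 && prefix >= 0 && prefix < (1 << MAXBITS));
    while (htab[i] != -1) {
        if (htab[i] == key) {
            *found = 1;
            return i;
        }
        if ((i -= disp) < 0)                    /* step back, wrapping */
            i += HSIZE;
    }
    *found = 0;
    return i;
}

/* Record a new string in an empty slot returned by lookup(). */
static void insert(int slot, int prefix, int c, int code)
{
    htab[slot] = ((long)c << MAXBITS) + prefix;
    codetab[slot] = code;
}
```

Because the table size is prime and the step is derived from the first slot,
the probe sequence eventually visits every slot; the sketch relies on the
table never filling, which holds here because at most 4096 codes go into 5003
slots.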

The "usermem" script attempts to determine the maximum process size.  Some
editing of the script may be necessary (see the comments).  [It should work
fine on 4.3BSD.]  If you can't get it to work at all, just create a file
named "USERMEM" containing the maximum process size in decimal.

The following preprocessor symbols control the compilation of "compress.c":

	o USERMEM		Maximum process memory on the system
	o SACREDMEM		Amount to reserve for other processes
	o SIGNED_COMPARE_SLOW	Unsigned compare instructions are faster
	o NO_UCHAR		Don't use "unsigned char" types
	o BITS			Overrules default set by USERMEM-SACREDMEM
	o vax			Generate inline assembler
	o interdata		Defines SIGNED_COMPARE_SLOW
	o M_XENIX		Makes arrays < 65536 bytes each
	o pdp11			BITS=12, NO_UCHAR
	o z8000			BITS=12
	o pcxt			BITS=12
	o BSD4_2		Allow long filenames (> 14 characters) and
				call setlinebuf(stderr)

The difference "usermem-sacredmem" determines the maximum BITS that can be
specified with the "-b" flag.

memory: at least		BITS
------  -- -----                ----
     433,484			 16
     229,600			 15
     127,536			 14
      73,464			 13
           0			 12

The default is BITS=16.

The maximum bits can be overruled by specifying "-DBITS=bits" at
compilation time.
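
The table above can be read as a simple threshold function on
USERMEM-SACREDMEM. compress.c makes this choice at compile time through the
preprocessor; the hypothetical max_bits_for_memory() below restates the same
thresholds at run time purely to show how the two symbols combine.

```c
#include <assert.h>

/* Hypothetical helper: restates the compress 4.0 memory table.
 * usermem and sacredmem are in bytes, as with -DUSERMEM and -DSACREDMEM. */
static int max_bits_for_memory(long usermem, long sacredmem)
{
    long avail = usermem - sacredmem;

    assert(usermem >= 0 && sacredmem >= 0);
    if (avail >= 433484L)
        return 16;
    if (avail >= 229600L)
        return 15;
    if (avail >= 127536L)
        return 14;
    if (avail >= 73464L)
        return 13;
    return 12;                  /* the table's floor: no minimum memory */
}
```

A "-DBITS=bits" given at compilation time would take precedence over this
value.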

WARNING: files compressed on a large machine with more bits than allowed by
a version of compress on a smaller machine cannot be decompressed!  Use the
"-b12" flag to generate a file on a large machine that can be uncompressed
on a 16-bit machine.
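
A decompressor can tell from the first three bytes of a file whether it can
handle it. The sketch below assumes the header added in compress 2.0 (the
\037\235 magic number, then one byte recording the bits): the low five bits
of that byte hold the maximum code width, and the high bit (0x80) marks block
compression. The function names, and the rule that widths below 9 are invalid
because codes start at 9 bits, are assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define Z_MAGIC1   0x1f         /* '\037' */
#define Z_MAGIC2   0x9d         /* '\235' */
#define BIT_MASK   0x1f         /* byte 3, low five bits: maximum code width */
#define BLOCK_MASK 0x80         /* byte 3, high bit: block compression used */

/* Returns 1 if hdr starts with the compress magic number. */
static int z_has_magic(const unsigned char *hdr, size_t len)
{
    return len >= 3 && hdr[0] == Z_MAGIC1 && hdr[1] == Z_MAGIC2;
}

/* Maximum code width recorded in the header, or -1 if not a compress file. */
static int z_max_bits(const unsigned char *hdr, size_t len)
{
    return z_has_magic(hdr, len) ? (hdr[2] & BIT_MASK) : -1;
}

/* 1 if the block-compression flag is set, 0 if clear, -1 if not a compress file. */
static int z_block_mode(const unsigned char *hdr, size_t len)
{
    if (!z_has_magic(hdr, len))
        return -1;
    return (hdr[2] & BLOCK_MASK) != 0;
}

/* A decompressor built with local_bits can only read files whose recorded
 * width lies between 9 (the starting code width) and local_bits. */
static int can_decompress(const unsigned char *hdr, size_t len, int local_bits)
{
    int bits = z_max_bits(hdr, len);

    assert(local_bits >= 9 && local_bits <= 16);
    return bits >= 9 && bits <= local_bits;
}
```

For example, a header byte of 0x90 (16 bits, block mode) is rejected by a
12-bit decompressor, while 0x8c, as written with "-b12", is accepted.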

The output of compress 4.0 is fully compatible with that of compress 3.0.
In other words, the output of compress 4.0 may be fed into uncompress 3.0 or
the output of compress 3.0 may be fed into uncompress 4.0.

The output of compress 4.0 is not compatible with that of
compress 2.0.  However, compress 4.0 still accepts the output of
compress 2.0.  To generate output that is compatible with compress
2.0, use the undocumented "-C" flag.

	-from mod.sources, submitted by vax135!petsd!joe (Joe Orost), 8/1/85
--------------------------------

Enclosed is compress version 3.0 with the following changes:

1.	"Block" compression is performed.  After the BITS run out (the code
	table is full), the compression ratio is checked every so often.  If
	it is decreasing, the table is cleared and a new set of substrings is
	generated.

	This makes the output of compress 3.0 not compatible with that of
	compress 2.0.  However, compress 3.0 still accepts the output of
	compress 2.0.  To generate output that is compatible with compress
	2.0, use the undocumented "-C" flag.

2.	A quiet "-q" flag has been added for use by the news system.

3.	The character chaining has been deleted and the program now uses
	hashing.  This improves the speed of the program, especially
	during decompression.  Other speed improvements have been made,
	such as using putc() instead of fwrite().

4.	A large table is used on large machines when a relatively small
	number of bits is specified.  This saves much time when compressing
	for a 16-bit machine on a 32-bit virtual machine.  Note that the
	speed improvement only occurs when the input file is > 30000
	characters, and the -b BITS is less than or equal to the cutoff
	described below.
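
Item 1 above can be sketched as a periodic ratio check. In this sketch the
ratio of input to output bytes is kept as an integer with 8 fractional bits,
so no floating point is needed, and a table clear is requested when the ratio
has not improved since the previous check (a slightly stricter reading of
"decreasing"). CHECK_GAP, struct block_state and should_clear() are
assumptions for illustration; the caller is expected to reset the string
table and emit a clear code so that the decompressor resets at the same point.

```c
#include <assert.h>

#define CHECK_GAP 10000L        /* assumed: input bytes between ratio checks */

struct block_state {
    long in_count;              /* input bytes consumed so far */
    long bytes_out;             /* output bytes written so far */
    long checkpoint;            /* in_count at which the next check is due */
    long ratio;                 /* best in/out ratio so far, 8 fractional bits */
};

/* Call when the code table is full and in_count >= checkpoint.
 * Returns 1 when the caller should clear the table and emit a clear code,
 * 0 when the current table is still paying off. */
static int should_clear(struct block_state *s)
{
    long rat;

    s->checkpoint = s->in_count + CHECK_GAP;
    if (s->bytes_out <= 0) {
        rat = 0x7fffffffL;                      /* nothing written yet */
    } else if (s->in_count > 0x007fffffL) {     /* << 8 could overflow 32 bits */
        long out = s->bytes_out >> 8;
        rat = (out == 0) ? 0x7fffffffL : s->in_count / out;
    } else {
        rat = (s->in_count << 8) / s->bytes_out;  /* fixed point, no floats */
    }
    if (rat > s->ratio) {
        s->ratio = rat;                         /* still improving: keep table */
        return 0;
    }
    s->ratio = 0;                               /* not improving: start over */
    return 1;
}
```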

Most of these changes were made by James A. Woods (ames!jaw).  Thank you
James!

To compile compress:

	cc -O -DUSERMEM=usermem -o compress compress.c

Where "usermem" is the amount of physical user memory available (in bytes).
If any physical memory is to be reserved for other processes, add
"-DSACREDMEM=sacredmem", where "sacredmem" is the amount to be reserved.

The difference "usermem-sacredmem" determines the maximum BITS that can be
specified, and the cutoff bits where the large+fast table is used.

memory: at least		BITS		cutoff
------  -- -----                ----            ------
   4,718,592			 16		  13
   2,621,440			 16		  12
   1,572,864			 16		  11
   1,048,576			 16		  10
     631,808			 16		  --
     329,728			 15		  --
     178,176			 14		  --
      99,328			 13		  --
           0			 12		  --

The default memory size is 750,000 which gives a maximum BITS=16 and no
large+fast table.

The maximum bits can be overruled by specifying "-DBITS=bits" at
compilation time.

If your machine doesn't support unsigned characters, define "NO_UCHAR"
when compiling.

If your machine has 16-bit "int"s, define "SHORT_INT" when compiling.
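
NO_UCHAR matters because of sign extension: where plain char is signed, a
byte such as 0xFF reads back as -1 unless it is masked with 0xFF. The C
sketch below shows one way to switch on it. The names char_type, BYTE_VALUE,
byte_at and code_int, and the default BITS of 16, are assumptions for
illustration. It also shows why a 16-bit int is a problem: a variable that
must hold every 16-bit code (0 through 65535) and also -1 as a "no code"
marker needs more than 16 bits, so a long is chosen when BITS exceeds 15. How
SHORT_INT itself is handled in compress.c is not shown here.

```c
#include <assert.h>

#ifndef BITS
#define BITS 16                 /* assumed default */
#endif

#ifdef NO_UCHAR
typedef char char_type;                         /* plain char may be signed */
#define BYTE_VALUE(c) ((int)(c) & 0xFF)         /* undo sign extension */
#else
typedef unsigned char char_type;
#define BYTE_VALUE(c) ((int)(c))                /* already 0..255 */
#endif

/* A code variable must hold 2**BITS codes and also -1, which a 16-bit
 * int cannot do once BITS reaches 16. */
#if BITS > 15
typedef long code_int;
#else
typedef int code_int;
#endif

/* Read byte i of a buffer as a value in 0..255 on either kind of machine. */
static int byte_at(const char_type *buf, int i)
{
    assert(i >= 0);
    return BYTE_VALUE(buf[i]);
}
```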

After compilation, move "compress" to a standard executable location, such
as /usr/local.  Then:
	cd /usr/local
	ln compress uncompress
	ln compress zcat
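
Linking the one binary under three names works because the program decides
what to do from the name it was started under. A minimal sketch of that
dispatch, assuming a hypothetical enum run_mode and function
mode_from_name(), compares only the last path component of argv[0]:

```c
#include <assert.h>
#include <string.h>

enum run_mode { MODE_COMPRESS, MODE_UNCOMPRESS, MODE_ZCAT };

/* Choose behaviour from the name the program was run as, ignoring any
 * directory prefix, so /usr/local/zcat and zcat behave the same. */
static enum run_mode mode_from_name(const char *argv0)
{
    const char *name;

    assert(argv0 != NULL);
    name = strrchr(argv0, '/');
    name = (name != NULL) ? name + 1 : argv0;
    if (strcmp(name, "uncompress") == 0)
        return MODE_UNCOMPRESS;              /* decompress the named files */
    if (strcmp(name, "zcat") == 0)
        return MODE_ZCAT;                    /* decompress to standard output */
    return MODE_COMPRESS;                    /* any other name compresses */
}
```

A real main() would call mode_from_name(argv[0]) before parsing its flags.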

On machines that have a fixed stack size (such as Perkin-Elmer), set the
stack to at least 12kb.  ("setstack compress 12" on Perkin-Elmer).

Next, install the manual (compress.l).
	cp compress.l /usr/man/manl
	cd /usr/man/manl
	ln compress.l uncompress.l
	ln compress.l zcat.l

		- or -

	cp compress.l /usr/man/man1/compress.1
	cd /usr/man/man1
	ln compress.1 uncompress.1
	ln compress.1 zcat.1

					regards,
					petsd!joe

Here is a note from the net:

>From hplabs!pesnta!amd!turtlevax!ken Sat Jan  5 03:35:20 1985
Path: ames!hplabs!pesnta!amd!turtlevax!ken
From: ken@turtlevax.UUCP (Ken Turkowski)
Newsgroups: net.sources
Subject: Re: Compress release 3.0 : sample Makefile
Organization: CADLINC, Inc. @ Menlo Park, CA

In the compress 3.0 source recently posted to mod.sources, there is a
#define variable which can be set for optimum performance on a machine
with a large amount of memory.  A program (usermem) to calculate the
usable amount of physical user memory is enclosed, as well as a sample
4.2bsd Vax Makefile for compress.

Here is the README file from the previous version of compress (2.0):

>Enclosed is compress.c version 2.0 with the following bugs fixed:
>
>1.	The packed files produced by compress are different on different
>	machines and dependent on the vax sysgen option.
>		The bug was in the different byte/bit ordering on the
>		various machines.  This has been fixed.
>
>		This version is NOT compatible with the original vax posting
>		unless the '-DCOMPATIBLE' option is specified to the C
>		compiler.  The original posting has a bug which I fixed,
>		causing incompatible files.  I recommend that you NOT use this
>		option unless you already have a lot of packed files from
>		the original posting by thomas.
>2.	The exit status is not well defined (on some machines) causing the
>	scripts to fail.
>		The exit status is now 0,1 or 2 and is documented in
>		compress.l.
>3.	The function getopt() is not available in all C libraries.
>		The function getopt() is no longer referenced by the
>		program.
>4.	Error status is not being checked on the fwrite() and fflush() calls.
>		Fixed.
>
>The following enhancements have been made:
>
>1.	Added facilities of "compact" into the compress program.  "Pack",
>	"Unpack", and "Pcat" are no longer required (no longer supplied).
>2.	Installed work around for C compiler bug with "-O".
>3.	Added a magic number header (\037\235).  Put the bits specified
>	in the file.
>4.	Added "-f" flag to force overwrite of output file.
>5.	Added "-c" flag and "zcat" program.  'ln compress zcat' after you
>	compile.
>6.	The 'uncompress' script has been deleted; simply
>	'ln compress uncompress' after you compile and it will work.
>7.	Removed extra bit masking for machines that support unsigned
>	characters.  If your machine doesn't support unsigned characters,
>	define "NO_UCHAR" when compiling.
>
>Compile "compress.c" with "-O -o compress" flags.  Move "compress" to a
>standard executable location, such as /usr/local.  Then:
>	cd /usr/local
>	ln compress uncompress
>	ln compress zcat
>
>On machines that have a fixed stack size (such as Perkin-Elmer), set the
>stack to at least 12kb.  ("setstack compress 12" on Perkin-Elmer).
>
>Next, install the manual (compress.l).
>	cp compress.l /usr/man/manl		- or -
>	cp compress.l /usr/man/man1/compress.1
>
>Here is the README that I sent with my first posting:
>
>>Enclosed is a modified version of compress.c, along with scripts to make it
>>run identically to pack(1), unpack(1), and pcat(1).  Here is what I
>>(petsd!joe) and a colleague (petsd!peora!srd) did:
>>
>>1. Removed VAX dependencies.
>>2. Changed the struct to separate arrays; saves mucho memory.
>>3. Did comparisons in unsigned, where possible.  (Faster on Perkin-Elmer.)
>>4. Sorted the character next chain and changed the search to stop
>>prematurely.  This saves a lot on the execution time when compressing.
>>
>>This version is totally compatible with the original version.  Even though
>>lint(1) -p has no complaints about compress.c, it won't run on a 16-bit
>>machine, due to the size of the arrays.
>>
>>Here is the README file from the original author:
>>
>>>Well, with all this discussion about file compression (for news batching
>>>in particular) going around, I decided to implement the text compression
>>>algorithm described in the June Computer magazine.  The author claimed
>>>blinding speed and good compression ratios.  It's certainly faster than
>>>compact (but, then, what wouldn't be), but it's also the same speed as
>>>pack, and gets better compression than both of them.  On 350K bytes of
>>>unix-wizards, compact took about 8 minutes of CPU, pack took about 80
>>>seconds, and compress (herein) also took 80 seconds.  But, compact and
>>>pack got about 30% compression, whereas compress got over 50%.  So, I
>>>decided I had something, and that others might be interested, too.
>>>
>>>As is probably true of compact and pack (although I haven't checked),
>>>the byte order within a word is probably relevant here, but as long as
>>>you stay on a single machine type, you should be ok.  (Can anybody
>>>elucidate on this?)  There are a couple of asm's in the code (extv and
>>>insv instructions), so anyone porting it to another machine will have to
>>>deal with this anyway (and could probably make it compatible with Vax
>>>byte order at the same time).  Anyway, I've linted the code (both with
>>>and without -p), so it should run elsewhere.  Note the longs in the
>>>code, you can take these out if you reduce BITS to <= 15.
>>>
>>>Have fun, and as always, if you make good enhancements, or bug fixes,
>>>I'd like to see them.
>>>
>>>=Spencer (thomas@utah-20, {harpo,hplabs,arizona}!utah-cs!thomas)
>>
>>					regards,
>>					joe
>>
>>--
>>Full-Name:  Joseph M. Orost
>>UUCP:       ..!{decvax,ucbvax,ihnp4}!vax135!petsd!joe
>>US Mail:    MS 313; Perkin-Elmer; 106 Apple St; Tinton Falls, NJ 07724
>>Phone:      (201) 870-5844
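
The byte/bit-ordering bug described in the 2.0 notes above, and the extv/insv
instructions mentioned by the original author, both come down to how
variable-width codes are packed into bytes. A machine-independent way,
sketched below, is to place each code least significant bit first, so the
byte stream depends only on the code values and not on the host's word order.
This LSB-first layout is assumed to match compress's; the function names
put_code() and get_code() are hypothetical, and the sketch ignores any
padding compress may add when the code width changes.

```c
#include <assert.h>
#include <stddef.h>

/* Append an n_bits-wide code at bit offset *bitpos, least significant bit
 * first.  buf must be zero-filled and large enough. */
static void put_code(unsigned char *buf, size_t *bitpos, unsigned code, int n_bits)
{
    assert(n_bits > 0 && n_bits <= 16);
    for (int i = 0; i < n_bits; i++, (*bitpos)++)
        if (code & (1u << i))
            buf[*bitpos / 8] |= (unsigned char)(1u << (*bitpos % 8));
}

/* Read back an n_bits-wide code from bit offset *bitpos. */
static unsigned get_code(const unsigned char *buf, size_t *bitpos, int n_bits)
{
    unsigned code = 0;

    assert(n_bits > 0 && n_bits <= 16);
    for (int i = 0; i < n_bits; i++, (*bitpos)++)
        if (buf[*bitpos / 8] & (1u << (*bitpos % 8)))
            code |= 1u << i;
    return code;
}
```

Because the packing is done one bit at a time with shifts and masks, two
9-bit codes 0x141 and 0x0ab always become the bytes 0x41 0x57 0x01,
whatever the host's byte order.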