	@(#)README	8.1 (Berkeley) 6/9/93

Compress version 4.0 improvements over 3.0:
	o compress() speedup (10-50%) by changing division hash to xor
	o decompress() speedup (5-10%)
	o Memory requirements reduced (3-30%)
	o Stack requirements reduced to less than 4kb
	o Removed 'Big+Fast' compress code (FBITS) because of compress speedup
	o Portability mods for Z8000 and PC/XT (but not zeus 3.2)
	o Default to 'quiet' mode
	o Unification of 'force' flags
	o Manual page overhaul
	o Portability enhancement for M_XENIX
	o Removed text on #else and #endif
	o Added "-V" switch to print version and options
	o Added #defines for SIGNED_COMPARE_SLOW
	o Added Makefile and "usermem" program
	o Removed all floating point computations
	o New programs: [deleted]

The "usermem" script attempts to determine the maximum process size.  Some
editing of the script may be necessary (see the comments).  [It should work
fine on 4.3 bsd.] If you can't get it to work at all, just create file
"USERMEM" containing the maximum process size in decimal.

The following preprocessor symbols control the compilation of "compress.c":

	o USERMEM		Maximum process memory on the system
	o SACREDMEM		Amount to reserve for other processes
	o SIGNED_COMPARE_SLOW	Unsigned compare instructions are faster
	o NO_UCHAR		Don't use "unsigned char" types
	o BITS			Overrules default set by USERMEM-SACREDMEM
	o vax			Generate inline assembler
	o interdata		Defines SIGNED_COMPARE_SLOW
	o M_XENIX		Makes arrays < 65536 bytes each
	o pdp11			BITS=12, NO_UCHAR
	o z8000			BITS=12
	o pcxt			BITS=12
	o BSD4_2		Allow long filenames ( > 14 characters) &
				Call setlinebuf(stderr)

The difference "usermem-sacredmem" determines the maximum BITS that can be
specified with the "-b" flag.

memory: at least		BITS
------  -- -----                ----
     433,484			 16
     229,600			 15
     127,536			 14
      73,464			 13
           0			 12

The default is BITS=16.

The maximum bits can be overruled by specifying "-DBITS=bits" at
compilation time.
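
As an illustration, the 4.0 memory table above can be read as a simple
lookup.  This is only a sketch: the thresholds are copied from the table,
but the function name and its parameters are made up for the example and
are not taken from compress.c.

```c
/* Thresholds copied from the 4.0 table; names are illustrative only. */
int max_bits_for_memory(long usermem, long sacredmem)
{
    long avail = usermem - sacredmem;   /* memory left for compress */

    if (avail >= 433484L) return 16;
    if (avail >= 229600L) return 15;
    if (avail >= 127536L) return 14;
    if (avail >= 73464L)  return 13;
    return 12;                          /* 12 bits is the floor */
}
```

For example, 450,000 bytes with nothing reserved allows BITS=16, while
reserving 250,000 of those bytes leaves 200,000 and allows only BITS=14.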

WARNING: files compressed on a large machine with more bits than allowed by
a version of compress on a smaller machine cannot be decompressed!  Use the
"-b12" flag to generate a file on a large machine that can be uncompressed
on a 16-bit machine.
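
One way to tell in advance whether a file will decompress on a smaller
machine is to look at its header.  The sketch below assumes the layout used
by the commonly distributed compress.c: the magic number \037\235, then one
byte whose low five bits give the maximum code size and whose high bit
marks block compression.  This README states only that a magic number and
the bits are stored in the file, so treat the exact byte layout as an
assumption; the struct and function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical container for the fields of interest. */
struct z_header {
    int max_bits;     /* maximum code size the file was written with */
    int block_mode;   /* nonzero if block compression was used */
};

/* Returns 0 on success, -1 if buf is too short or lacks the magic number. */
int parse_z_header(const unsigned char *buf, size_t len, struct z_header *out)
{
    if (len < 3 || buf[0] != 0x1f || buf[1] != 0x9d)   /* \037 \235 */
        return -1;
    out->max_bits = buf[2] & 0x1f;           /* assumed: low five bits */
    out->block_mode = (buf[2] & 0x80) != 0;  /* assumed: high bit */
    return 0;
}

/* A build limited to local_max_bits can only read files at or below it. */
int can_decompress(const struct z_header *h, int local_max_bits)
{
    return h->max_bits <= local_max_bits;
}
```

Under those assumptions, a file written with 16 bits would be rejected by
can_decompress() for a build limited to BITS=12.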

The output of compress 4.0 is fully compatible with that of compress 3.0.
In other words, the output of compress 4.0 may be fed into uncompress 3.0 or
the output of compress 3.0 may be fed into uncompress 4.0.

The output of compress 4.0 is not compatible with that of
compress 2.0.  However, compress 4.0 still accepts the output of
compress 2.0.  To generate output that is compatible with compress
2.0, use the undocumented "-C" flag.

	-from mod.sources, submitted by vax135!petsd!joe (Joe Orost), 8/1/85
--------------------------------

Enclosed is compress version 3.0 with the following changes:

1.	"Block" compression is performed.  After the BITS run out, the
	compression ratio is checked every so often.  If it is decreasing,
	the table is cleared and a new set of substrings are generated.

	This makes the output of compress 3.0 not compatible with that of
	compress 2.0.  However, compress 3.0 still accepts the output of
	compress 2.0.  To generate output that is compatible with compress
	2.0, use the undocumented "-C" flag.

2.	A quiet "-q" flag has been added for use by the news system.

3.	The character chaining has been deleted and the program now uses
	hashing.  This improves the speed of the program, especially
	during decompression.  Other speed improvements have been made,
	such as using putc() instead of fwrite().

4.	A large table is used on large machines when a relatively small
	number of bits is specified.  This saves much time when compressing
	for a 16-bit machine on a 32-bit virtual machine.  Note that the
	speed improvement only occurs when the input file is > 30000
	characters, and the -b BITS is less than or equal to the cutoff
	described below.
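
The ratio check described in item 1 can be sketched roughly as follows.
This is not the code from compress.c: the names are hypothetical, and the
sketch covers only the "is the ratio decreasing?" decision, not how often
the check runs or how the table is rebuilt.  It uses integer arithmetic
only, scaling the ratio by 256.

```c
#include <limits.h>

/* Hypothetical state: best scaled ratio seen since the last clear. */
struct ratio_state {
    unsigned long best;
};

/* Returns 1 if the table should be cleared, 0 otherwise. */
int should_clear_table(struct ratio_state *st,
                       unsigned long bytes_in, unsigned long bytes_out)
{
    unsigned long ratio;   /* input bytes per output byte, times 256 */

    if (bytes_out == 0)
        return 0;          /* nothing written yet, nothing to judge */

    if (bytes_in > ULONG_MAX / 256UL) {
        /* Scaling the numerator would overflow; shrink the divisor. */
        unsigned long scaled_out = bytes_out / 256UL;
        ratio = (scaled_out == 0) ? ULONG_MAX : bytes_in / scaled_out;
    } else {
        ratio = (bytes_in * 256UL) / bytes_out;
    }

    if (ratio < st->best) {
        st->best = 0;      /* ratio is decreasing: start over */
        return 1;
    }
    st->best = ratio;      /* holding steady or still improving */
    return 0;
}
```

A caller would invoke this at each checkpoint once the code space is full
and, when it returns 1, clear the table and resume building substrings.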

Most of these changes were made by James A. Woods (ames!jaw).  Thank you
James!

To compile compress:

	cc -O -DUSERMEM=usermem -o compress compress.c

Where "usermem" is the amount of physical user memory available (in bytes).
If any physical memory is to be reserved for other processes, put in
"-DSACREDMEM=sacredmem", where "sacredmem" is the amount to be reserved.

The difference "usermem-sacredmem" determines the maximum BITS that can be
specified, and the cutoff bits where the large+fast table is used.

memory: at least		BITS		cutoff
------  -- -----                ----            ------
   4,718,592			 16		  13
   2,621,440			 16		  12
   1,572,864			 16		  11
   1,048,576			 16		  10
     631,808			 16		  --
     329,728			 15		  --
     178,176			 14		  --
      99,328			 13		  --
           0			 12		  --

The default memory size is 750,000 which gives a maximum BITS=16 and no
large+fast table.

The maximum bits can be overruled by specifying "-DBITS=bits" at
compilation time.

If your machine doesn't support unsigned characters, define "NO_UCHAR"
when compiling.

If your machine has "int" as 16-bits, define "SHORT_INT" when compiling.

After compilation, move "compress" to a standard executable location, such
as /usr/local.  Then:
	cd /usr/local
	ln compress uncompress
	ln compress zcat

On machines that have a fixed stack size (such as Perkin-Elmer), set the
stack to at least 12kb.  ("setstack compress 12" on Perkin-Elmer).

Next, install the manual (compress.l).
	cp compress.l /usr/man/manl
	cd /usr/man/manl
	ln compress.l uncompress.l
	ln compress.l zcat.l

		- or -

	cp compress.l /usr/man/man1/compress.1
	cd /usr/man/man1
	ln compress.1 uncompress.1
	ln compress.1 zcat.1

					regards,
					petsd!joe

Here is a note from the net:

>From hplabs!pesnta!amd!turtlevax!ken Sat Jan  5 03:35:20 1985
Path: ames!hplabs!pesnta!amd!turtlevax!ken
From: ken@turtlevax.UUCP (Ken Turkowski)
Newsgroups: net.sources
Subject: Re: Compress release 3.0 : sample Makefile
Organization: CADLINC, Inc. @ Menlo Park, CA

In the compress 3.0 source recently posted to mod.sources, there is a
#define variable which can be set for optimum performance on a machine
with a large amount of memory.  A program (usermem) to calculate the
useable amount of physical user memory is enclosed, as well as a sample
4.2bsd Vax Makefile for compress.

Here is the README file from the previous version of compress (2.0):

>Enclosed is compress.c version 2.0 with the following bugs fixed:
>
>1.	The packed files produced by compress are different on different
>	machines and dependent on the vax sysgen option.
>		The bug was in the different byte/bit ordering on the
>		various machines.  This has been fixed.
>
>		This version is NOT compatible with the original vax posting
>		unless the '-DCOMPATIBLE' option is specified to the C
>		compiler.  The original posting has a bug which I fixed,
>		causing incompatible files.  I recommend you NOT to use this
>		option unless you already have a lot of packed files from
>		the original posting by thomas.
>2.	The exit status is not well defined (on some machines) causing the
>	scripts to fail.
>		The exit status is now 0,1 or 2 and is documented in
>		compress.l.
>3.	The function getopt() is not available in all C libraries.
>		The function getopt() is no longer referenced by the
>		program.
>4.	Error status is not being checked on the fwrite() and fflush() calls.
>		Fixed.
>
>The following enhancements have been made:
>
>1.	Added facilities of "compact" into the compress program.  "Pack",
>	"Unpack", and "Pcat" are no longer required (no longer supplied).
>2.	Installed work around for C compiler bug with "-O".
>3.	Added a magic number header (\037\235).  Put the bits specified
>	in the file.
>4.	Added "-f" flag to force overwrite of output file.
>5.	Added "-c" flag and "zcat" program.  'ln compress zcat' after you
>	compile.
>6.	The 'uncompress' script has been deleted; simply
>	'ln compress uncompress' after you compile and it will work.
>7.	Removed extra bit masking for machines that support unsigned
>	characters.  If your machine doesn't support unsigned characters,
>	define "NO_UCHAR" when compiling.
>
>Compile "compress.c" with "-O -o compress" flags.  Move "compress" to a
>standard executable location, such as /usr/local.  Then:
>	cd /usr/local
>	ln compress uncompress
>	ln compress zcat
>
>On machines that have a fixed stack size (such as Perkin-Elmer), set the
>stack to at least 12kb.  ("setstack compress 12" on Perkin-Elmer).
>
>Next, install the manual (compress.l).
>	cp compress.l /usr/man/manl		- or -
>	cp compress.l /usr/man/man1/compress.1
>
>Here is the README that I sent with my first posting:
>
>>Enclosed is a modified version of compress.c, along with scripts to make it
>>run identically to pack(1), unpack(1), and pcat(1).  Here is what I
>>(petsd!joe) and a colleague (petsd!peora!srd) did:
>>
>>1. Removed VAX dependencies.
>>2. Changed the struct to separate arrays; saves mucho memory.
>>3. Did comparisons in unsigned, where possible.  (Faster on Perkin-Elmer.)
>>4. Sorted the character next chain and changed the search to stop
>>prematurely.  This saves a lot on the execution time when compressing.
>>
>>This version is totally compatible with the original version.  Even though
>>lint(1) -p has no complaints about compress.c, it won't run on a 16-bit
>>machine, due to the size of the arrays.
>>
>>Here is the README file from the original author:
>>
>>>Well, with all this discussion about file compression (for news batching
>>>in particular) going around, I decided to implement the text compression
>>>algorithm described in the June Computer magazine.  The author claimed
>>>blinding speed and good compression ratios.  It's certainly faster than
>>>compact (but, then, what wouldn't be), but it's also the same speed as
>>>pack, and gets better compression than both of them.  On 350K bytes of
>>>unix-wizards, compact took about 8 minutes of CPU, pack took about 80
>>>seconds, and compress (herein) also took 80 seconds.  But, compact and
>>>pack got about 30% compression, whereas compress got over 50%.  So, I
>>>decided I had something, and that others might be interested, too.
>>>
>>>As is probably true of compact and pack (although I haven't checked),
>>>the byte order within a word is probably relevant here, but as long as
>>>you stay on a single machine type, you should be ok.  (Can anybody
>>>elucidate on this?)  There are a couple of asm's in the code (extv and
>>>insv instructions), so anyone porting it to another machine will have to
>>>deal with this anyway (and could probably make it compatible with Vax
>>>byte order at the same time).  Anyway, I've linted the code (both with
>>>and without -p), so it should run elsewhere.  Note the longs in the
>>>code, you can take these out if you reduce BITS to <= 15.
>>>
>>>Have fun, and as always, if you make good enhancements, or bug fixes,
>>>I'd like to see them.
>>>
>>>=Spencer (thomas@utah-20, {harpo,hplabs,arizona}!utah-cs!thomas)
>>
>>					regards,
>>					joe
>>
>>--
>>Full-Name:  Joseph M. Orost
>>UUCP:       ..!{decvax,ucbvax,ihnp4}!vax135!petsd!joe
>>US Mail:    MS 313; Perkin-Elmer; 106 Apple St; Tinton Falls, NJ 07724
>>Phone:      (201) 870-5844