
	@(#)README	5.1 (Berkeley) 06/06/85

Enclosed is compress version 3.0 with the following changes:

1.	"Block" compression is performed.  After the BITS run out, the
	compression ratio is checked every so often.  If it is decreasing,
	the table is cleared and a new set of substrings is generated.

	This makes the output of compress 3.0 incompatible with that of
	compress 2.0.  However, compress 3.0 still accepts the output of
	compress 2.0.  To generate output that is compatible with compress
	2.0, use the undocumented "-C" flag.

2.	A quiet "-q" flag has been added for use by the news system.

3.	The character chaining has been deleted and the program now uses
	hashing.  This boosts speed, especially during compression of
	large files.  Other speed improvements have been made, such as
	using putc() instead of fwrite().

4.	A large table is used on large machines when a relatively small
	number of bits is specified.  This saves much time when compressing
	for a 16-bit machine on a 32-bit virtual machine.
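
The ratio check described in item 1 can be sketched roughly as follows.
This is an illustration only, not the code in compress.c: the struct, the
function name and the fixed-point scaling are all assumptions made for
the example.

```c
#include <assert.h>

/* Hypothetical state for this sketch; names are not taken from compress.c. */
struct ratio_state {
	long best_ratio;	/* best scaled in/out ratio since the last clear */
};

/*
 * Called at each checkpoint once the string table is full.
 * Returns 1 if the table should be cleared, 0 if it should be kept.
 * Assumes bytes_in is small enough that bytes_in * 256 fits in a long.
 */
static int
should_clear(struct ratio_state *st, long bytes_in, long bytes_out)
{
	long ratio;

	assert(bytes_in >= 0 && bytes_out > 0);
	ratio = (bytes_in * 256) / bytes_out;	/* fixed point, 8 fraction bits */
	if (ratio >= st->best_ratio) {
		st->best_ratio = ratio;		/* not decreasing: keep the table */
		return 0;
	}
	st->best_ratio = 0;			/* ratio fell: start a new block */
	return 1;
}
```

A caller would invoke should_clear() every so many input bytes and, on a
return of 1, reset its table and tell the decompressor to do the same.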

Most of these changes were made by James A. Woods (ames!jaw).  Thank you,
James!

To compile compress:

	cc -O -DUSERMEM=usermem -o compress compress.c

Where "usermem" is the amount of physical user memory available (in bytes).
If any physical memory is to be reserved for other processes, put in
"-DSACREDMEM=sacredmem", where "sacredmem" is the amount to be reserved.

The difference "usermem-sacredmem" determines the maximum BITS that can be
specified, and the cutoff bits where the large+fast table is used.

memory (bytes, at least)    BITS    cutoff
------------------------    ----    ------
               4,718,592      16        13
               2,621,440      16        12
               1,572,864      16        11
                 631,808      16        --
                 329,728      15        --
                 178,176      14        --
                  99,328      13        --
                       0      12        --

The default memory size is 750,000 bytes, which gives a maximum BITS=16 and
no large+fast table.
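
To make the table concrete, here is a small sketch that looks up the row
for a given "usermem-sacredmem".  The thresholds are copied from the table
above; the struct, the lookup_mem() helper and the use of 0 for "--" are
assumptions of this example.  The program itself presumably makes this
choice at compile time from the -D values, not with a run-time table.

```c
#include <assert.h>
#include <stddef.h>

/* One row of the README table. */
struct mem_row {
	long min_mem;	/* "memory: at least", in bytes */
	int  max_bits;	/* maximum BITS */
	int  cutoff;	/* large+fast table cutoff, 0 where the table says "--" */
};

static const struct mem_row mem_table[] = {
	{ 4718592L, 16, 13 },
	{ 2621440L, 16, 12 },
	{ 1572864L, 16, 11 },
	{  631808L, 16,  0 },
	{  329728L, 15,  0 },
	{  178176L, 14,  0 },
	{   99328L, 13,  0 },
	{       0L, 12,  0 },
};

/* Hypothetical helper: return the first row whose threshold is met. */
static const struct mem_row *
lookup_mem(long usermem, long sacredmem)
{
	const size_t nrows = sizeof mem_table / sizeof mem_table[0];
	long avail = usermem - sacredmem;
	size_t i;

	assert(usermem >= 0 && sacredmem >= 0);
	if (avail < 0)
		avail = 0;		/* nothing left: fall to the last row */
	for (i = 0; i < nrows; i++)
		if (avail >= mem_table[i].min_mem)
			return &mem_table[i];
	return &mem_table[nrows - 1];	/* not reached; last threshold is 0 */
}
```

With the default of 750,000 bytes and nothing reserved, lookup_mem() lands
on the 631,808 row: BITS 16 and no large+fast table, as stated above.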

The maximum bits can be overridden by specifying "-DBITS=bits" at
compilation time.

If your machine doesn't support unsigned characters, define "NO_UCHAR"
when compiling.
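
As an illustration of what a switch like NO_UCHAR typically controls, the
sketch below masks each byte when only plain (possibly signed) chars are
available.  The macro BYTE_VALUE and the helper byte_index() are invented
for this example and may not match what compress.c actually does.

```c
#include <assert.h>

#ifdef NO_UCHAR
typedef char char_type;			/* plain char, possibly signed */
#define BYTE_VALUE(c)	((c) & 0xff)	/* mask off any sign extension */
#else
typedef unsigned char char_type;
#define BYTE_VALUE(c)	(c)		/* already in the range 0..255 */
#endif

/* Hypothetical helper: widen one input byte to a non-negative index. */
static int
byte_index(char_type c)
{
	int v = BYTE_VALUE(c);

	assert(v >= 0 && v <= 255);
	return v;
}
```

Either way byte_index() yields a value from 0 to 255; without NO_UCHAR the
extra masking is simply not needed.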

After compilation, move "compress" to a standard executable location, such
as /usr/local.  Then:
	cd /usr/local
	ln compress uncompress
	ln compress zcat
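
The links work because a program can inspect the name it was invoked
under.  A minimal sketch of that idea, assuming the decision is made on
the last path component of argv[0]; the enum and mode_from_name() are
names made up for this example, not taken from compress.c:

```c
#include <assert.h>
#include <string.h>

enum mode { MODE_COMPRESS, MODE_UNCOMPRESS, MODE_ZCAT };

/* Hypothetical helper: choose behavior from the invoked program name. */
static enum mode
mode_from_name(const char *argv0)
{
	const char *base;

	assert(argv0 != NULL);
	base = strrchr(argv0, '/');		/* find the last path separator */
	base = (base != NULL) ? base + 1 : argv0;

	if (strcmp(base, "uncompress") == 0)
		return MODE_UNCOMPRESS;
	if (strcmp(base, "zcat") == 0)
		return MODE_ZCAT;
	return MODE_COMPRESS;			/* any other name compresses */
}
```

A main() would call mode_from_name(argv[0]) before parsing its flags, so
one binary with three links serves all three commands.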

On machines that have a fixed stack size (such as Perkin-Elmer), set the
stack to at least 12kb.  ("setstack compress 12" on Perkin-Elmer).

Next, install the manual (compress.l).
	cp compress.l /usr/man/manl
	cd /usr/man/manl
	ln compress.l uncompress.l
	ln compress.l zcat.l

		- or -

	cp compress.l /usr/man/man1/compress.1
	cd /usr/man/man1
	ln compress.1 uncompress.1
	ln compress.1 zcat.1

					regards,
					petsd!joe

Here is a note from the net:

>From hplabs!pesnta!amd!turtlevax!ken Sat Jan  5 03:35:20 1985
Path: ames!hplabs!pesnta!amd!turtlevax!ken
From: ken@turtlevax.UUCP (Ken Turkowski)
Newsgroups: net.sources
Subject: Re: Compress release 3.0 : sample Makefile
Organization: CADLINC, Inc. @ Menlo Park, CA

In the compress 3.0 source recently posted to mod.sources, there is a
#define variable which can be set for optimum performance on a machine
with a large amount of memory.  A program (usermem) to calculate the
useable amount of physical user memory is enclosed, as well as a sample
4.2bsd Vax Makefile for compress.

Here is the README file from the previous version of compress (2.0):

>Enclosed is compress.c version 2.0 with the following bugs fixed:
>
>1.	The packed files produced by compress are different on different
>	machines and dependent on the vax sysgen option.
>		The bug was in the different byte/bit ordering on the
>		various machines.  This has been fixed.
>
>		This version is NOT compatible with the original vax posting
>		unless the '-DCOMPATIBLE' option is specified to the C
>		compiler.  The original posting has a bug which I fixed,
>		causing incompatible files.  I recommend you NOT to use this
>		option unless you already have a lot of packed files from
>		the original posting by thomas.
>2.	The exit status is not well defined (on some machines) causing the
>	scripts to fail.
>		The exit status is now 0,1 or 2 and is documented in
>		compress.l.
>3.	The function getopt() is not available in all C libraries.
>		The function getopt() is no longer referenced by the
>		program.
>4.	Error status is not being checked on the fwrite() and fflush() calls.
>		Fixed.
>
>The following enhancements have been made:
>
>1.	Added facilities of "compact" into the compress program.  "Pack",
>	"Unpack", and "Pcat" are no longer required (no longer supplied).
>2.	Installed work around for C compiler bug with "-O".
>3.	Added a magic number header (\037\235).  Put the bits specified
>	in the file.
>4.	Added "-f" flag to force overwrite of output file.
>5.	Added "-c" flag and "zcat" program.  'ln compress zcat' after you
>	compile.
>6.	The 'uncompress' script has been deleted; simply
>	'ln compress uncompress' after you compile and it will work.
>7.	Removed extra bit masking for machines that support unsigned
>	characters.  If your machine doesn't support unsigned characters,
>	define "NO_UCHAR" when compiling.
>
>Compile "compress.c" with "-O -o compress" flags.  Move "compress" to a
>standard executable location, such as /usr/local.  Then:
>	cd /usr/local
>	ln compress uncompress
>	ln compress zcat
>
>On machines that have a fixed stack size (such as Perkin-Elmer), set the
>stack to at least 12kb.  ("setstack compress 12" on Perkin-Elmer).
>
>Next, install the manual (compress.l).
>	cp compress.l /usr/man/manl		- or -
>	cp compress.l /usr/man/man1/compress.1
>
>Here is the README that I sent with my first posting:
>
>>Enclosed is a modified version of compress.c, along with scripts to make it
>>run identically to pack(1), unpack(1), and pcat(1).  Here is what I
>>(petsd!joe) and a colleague (petsd!peora!srd) did:
>>
>>1. Removed VAX dependencies.
>>2. Changed the struct to separate arrays; saves mucho memory.
>>3. Did comparisons in unsigned, where possible.  (Faster on Perkin-Elmer.)
>>4. Sorted the character next chain and changed the search to stop
>>prematurely.  This saves a lot on the execution time when compressing.
>>
>>This version is totally compatible with the original version.  Even though
>>lint(1) -p has no complaints about compress.c, it won't run on a 16-bit
>>machine, due to the size of the arrays.
>>
>>Here is the README file from the original author:
>>
>>>Well, with all this discussion about file compression (for news batching
>>>in particular) going around, I decided to implement the text compression
>>>algorithm described in the June Computer magazine.  The author claimed
>>>blinding speed and good compression ratios.  It's certainly faster than
>>>compact (but, then, what wouldn't be), but it's also the same speed as
>>>pack, and gets better compression than both of them.  On 350K bytes of
>>>unix-wizards, compact took about 8 minutes of CPU, pack took about 80
>>>seconds, and compress (herein) also took 80 seconds.  But, compact and
>>>pack got about 30% compression, whereas compress got over 50%.  So, I
>>>decided I had something, and that others might be interested, too.
>>>
>>>As is probably true of compact and pack (although I haven't checked),
>>>the byte order within a word is probably relevant here, but as long as
>>>you stay on a single machine type, you should be ok.  (Can anybody
>>>elucidate on this?)  There are a couple of asm's in the code (extv and
>>>insv instructions), so anyone porting it to another machine will have to
>>>deal with this anyway (and could probably make it compatible with Vax
>>>byte order at the same time).  Anyway, I've linted the code (both with
>>>and without -p), so it should run elsewhere.  Note the longs in the
>>>code, you can take these out if you reduce BITS to <= 15.
>>>
>>>Have fun, and as always, if you make good enhancements, or bug fixes,
>>>I'd like to see them.
>>>
>>>=Spencer (thomas@utah-20, {harpo,hplabs,arizona}!utah-cs!thomas)
>>
>>					regards,
>>					joe
>>
>>--
>>Full-Name:  Joseph M. Orost
>>UUCP:       ..!{decvax,ucbvax,ihnp4}!vax135!petsd!joe
>>US Mail:    MS 313; Perkin-Elmer; 106 Apple St; Tinton Falls, NJ 07724
>>Phone:      (201) 870-5844