xref: /plan9/sys/doc/compiler.ms (revision 219b2ee8daee37f4aad58d63f21287faa8e4ffdc)
1*219b2ee8SDavid du Colombier.TL
2*219b2ee8SDavid du ColombierPlan 9 C Compilers
3*219b2ee8SDavid du Colombier.AU
4*219b2ee8SDavid du ColombierKen Thompson
5*219b2ee8SDavid du Colombierken@plan9.att.com
6*219b2ee8SDavid du Colombier.AB
7*219b2ee8SDavid du Colombier.FS
8*219b2ee8SDavid du ColombierOriginally appeared, in a different form, in
9*219b2ee8SDavid du Colombier.I
10*219b2ee8SDavid du ColombierProceedings of the Summer 1990 UKUUG Conference,
11*219b2ee8SDavid du Colombier.R
12*219b2ee8SDavid du Colombierpp. 41-51,
13*219b2ee8SDavid du ColombierLondon, 1990.
14*219b2ee8SDavid du Colombier.FE
15*219b2ee8SDavid du ColombierThis paper describes the overall structure and function of the Plan 9 C compilers.
16*219b2ee8SDavid du ColombierA more detailed implementation document
17*219b2ee8SDavid du Colombierfor any one of the compilers
18*219b2ee8SDavid du Colombieris yet to be written.
19*219b2ee8SDavid du Colombier.AE
20*219b2ee8SDavid du Colombier.NH
21*219b2ee8SDavid du ColombierIntroduction
22*219b2ee8SDavid du Colombier.LP
23*219b2ee8SDavid du ColombierThere are currently six compilers in the series.
24*219b2ee8SDavid du ColombierFour of the compilers (MIPS 3000, SPARC, Intel 386, and Motorola 68020)
25*219b2ee8SDavid du Colombierare considered active and are used to compile
26*219b2ee8SDavid du Colombiercurrent versions of Plan 9.
27*219b2ee8SDavid du ColombierThe other two (AT&T 3210, Intel 960) have been used
28*219b2ee8SDavid du Colombieronly to program peripherals.
29*219b2ee8SDavid du Colombier.NH
30*219b2ee8SDavid du ColombierStructure
31*219b2ee8SDavid du Colombier.LP
32*219b2ee8SDavid du ColombierThe compiler is a single program that produces an
33*219b2ee8SDavid du Colombierobject file.
34*219b2ee8SDavid du ColombierCombined in the compiler are the traditional
35*219b2ee8SDavid du Colombierroles of preprocessor, lexical analyzer, parser, code generator,
36*219b2ee8SDavid du Colombierlocal optimizer,
37*219b2ee8SDavid du Colombierand first half of the assembler.
38*219b2ee8SDavid du ColombierThe object files are binary forms of assembly
39*219b2ee8SDavid du Colombierlanguage,
40*219b2ee8SDavid du Colombiersimilar to what might be passed between
41*219b2ee8SDavid du Colombierthe first and second passes of an assembler.
42*219b2ee8SDavid du Colombier.LP
43*219b2ee8SDavid du ColombierObject files and libraries
44*219b2ee8SDavid du Colombierare combined by a loader
45*219b2ee8SDavid du Colombierprogram to produce the executable binary.
46*219b2ee8SDavid du ColombierThe loader combines the roles of second half
47*219b2ee8SDavid du Colombierof the assembler, global optimizer, and loader.
48*219b2ee8SDavid du ColombierThe names of the compliers, loaders, and assemblers
49*219b2ee8SDavid du Colombierare as follows:
50*219b2ee8SDavid du Colombier.DS
51*219b2ee8SDavid du Colombier.ta 1.5i
52*219b2ee8SDavid du Colombier.de Ta
53*219b2ee8SDavid du Colombier\\$1	\f(CW\\$2\fP  \f(CW\\$3\fP  \f(CW\\$4\fP
54*219b2ee8SDavid du Colombier..
55*219b2ee8SDavid du Colombier.Ta MIPS vc vl va
56*219b2ee8SDavid du Colombier.Ta SPARC kc kl ka
57*219b2ee8SDavid du Colombier.Ta Intel\ 386 8c 8l 8a
58*219b2ee8SDavid du Colombier.Ta Motorola\ 68020 2c 2l 2a
59*219b2ee8SDavid du Colombier.Ta AT&T\ 3210 xc xl xa
60*219b2ee8SDavid du Colombier.Ta Intel\ 960 9c 9l 9a
61*219b2ee8SDavid du Colombier.DE
62*219b2ee8SDavid du ColombierThere is a further breakdown
63*219b2ee8SDavid du Colombierin the source of the compilers into
64*219b2ee8SDavid du Colombierobject-independent and
65*219b2ee8SDavid du Colombierobject-dependent
66*219b2ee8SDavid du Colombierparts.
67*219b2ee8SDavid du ColombierAll of the object-independent parts
68*219b2ee8SDavid du Colombierare combined into source files in the
69*219b2ee8SDavid du Colombierdirectory
70*219b2ee8SDavid du Colombier.CW /sys/src/cmd/cc .
71*219b2ee8SDavid du ColombierThe object-dependent parts are collected
72*219b2ee8SDavid du Colombierin a separate directory for each compiler,
73*219b2ee8SDavid du Colombierfor example
74*219b2ee8SDavid du Colombier.CW /sys/src/cmd/vc .
75*219b2ee8SDavid du ColombierAll of the code,
76*219b2ee8SDavid du Colombierboth object-independent and
77*219b2ee8SDavid du Colombierobject-dependent,
78*219b2ee8SDavid du Colombieris machine-independent
79*219b2ee8SDavid du Colombierand may be cross-compiled and executed on any
80*219b2ee8SDavid du Colombierof the architectures.
81*219b2ee8SDavid du Colombier.NH
82*219b2ee8SDavid du ColombierThe Language
83*219b2ee8SDavid du Colombier.LP
84*219b2ee8SDavid du ColombierThe compiler implements ANSI C with some
85*219b2ee8SDavid du Colombierrestrictions and extensions
86*219b2ee8SDavid du Colombier[ANSI90].
87*219b2ee8SDavid du ColombierMost of the restrictions are due to
88*219b2ee8SDavid du Colombierpersonal preference, while
89*219b2ee8SDavid du Colombiermost of the extensions were to help in
90*219b2ee8SDavid du Colombierthe implementation of Plan 9.
91*219b2ee8SDavid du ColombierThere are other departures from the standard,
92*219b2ee8SDavid du Colombierparticularly in the libraries,
93*219b2ee8SDavid du Colombierthat are beyond the scope of this
94*219b2ee8SDavid du Colombierpaper.
95*219b2ee8SDavid du Colombier.NH 2
96*219b2ee8SDavid du ColombierRegister, volatile, const
97*219b2ee8SDavid du Colombier.LP
98*219b2ee8SDavid du ColombierThe keywords
99*219b2ee8SDavid du Colombier.CW register ,
100*219b2ee8SDavid du Colombier.CW volatile ,
101*219b2ee8SDavid du Colombierand
102*219b2ee8SDavid du Colombier.CW const
103*219b2ee8SDavid du Colombierare recognized syntactically
104*219b2ee8SDavid du Colombierbut are semantically ignored.
105*219b2ee8SDavid du ColombierThus
106*219b2ee8SDavid du Colombierassigning to a
107*219b2ee8SDavid du Colombier.CW const
108*219b2ee8SDavid du Colombiervariable is not diagnosed, nor is
109*219b2ee8SDavid du Colombiertaking the address of a
110*219b2ee8SDavid du Colombier.CW register .
111*219b2ee8SDavid du Colombier.NH 2
112*219b2ee8SDavid du ColombierThe preprocessor
113*219b2ee8SDavid du Colombier.LP
114*219b2ee8SDavid du ColombierThe C preprocessor is probably the
115*219b2ee8SDavid du Colombierbiggest departure from the ANSI standard.
116*219b2ee8SDavid du Colombier.LP
117*219b2ee8SDavid du ColombierThe preprocessor built into the Plan 9 compilers does not support
118*219b2ee8SDavid du Colombier.CW #if ,
119*219b2ee8SDavid du Colombieralthough it does handle
120*219b2ee8SDavid du Colombier.CW #ifdef
121*219b2ee8SDavid du Colombierand
122*219b2ee8SDavid du Colombier.CW #include .
123*219b2ee8SDavid du ColombierIf it is necessary to be more standard,
124*219b2ee8SDavid du Colombierthe source text can first be run through the separate ANSI C
125*219b2ee8SDavid du Colombierpreprocessor,
126*219b2ee8SDavid du Colombier.CW cpp .
127*219b2ee8SDavid du Colombier.NH 2
128*219b2ee8SDavid du ColombierUnnamed substructures
129*219b2ee8SDavid du Colombier.LP
130*219b2ee8SDavid du ColombierThe most important and most heavily used of the
131*219b2ee8SDavid du Colombierextensions is the declaration of an
132*219b2ee8SDavid du Colombierunnamed substructure or subunion.
133*219b2ee8SDavid du ColombierFor example:
134*219b2ee8SDavid du Colombier.DS
135*219b2ee8SDavid du Colombier.CW
136*219b2ee8SDavid du Colombier.ta .1i .6i 1.1i 1.6i
137*219b2ee8SDavid du Colombier	typedef
138*219b2ee8SDavid du Colombier	struct	lock
139*219b2ee8SDavid du Colombier	{
140*219b2ee8SDavid du Colombier		int    locked;
141*219b2ee8SDavid du Colombier	} Lock;
142*219b2ee8SDavid du Colombier
143*219b2ee8SDavid du Colombier	typedef
144*219b2ee8SDavid du Colombier	struct	node
145*219b2ee8SDavid du Colombier	{
146*219b2ee8SDavid du Colombier		int	type;
147*219b2ee8SDavid du Colombier		union
148*219b2ee8SDavid du Colombier		{
149*219b2ee8SDavid du Colombier			double dval;
150*219b2ee8SDavid du Colombier			float  fval;
151*219b2ee8SDavid du Colombier			long   lval;
152*219b2ee8SDavid du Colombier		};
153*219b2ee8SDavid du Colombier		Lock;
154*219b2ee8SDavid du Colombier	} Node;
155*219b2ee8SDavid du Colombier
156*219b2ee8SDavid du Colombier	Lock*	lock;
157*219b2ee8SDavid du Colombier	Node*	node;
158*219b2ee8SDavid du Colombier.DE
159*219b2ee8SDavid du ColombierThe declaration of
160*219b2ee8SDavid du Colombier.CW Node
161*219b2ee8SDavid du Colombierhas an unnamed substructure of type
162*219b2ee8SDavid du Colombier.CW Lock
163*219b2ee8SDavid du Colombierand an unnamed subunion.
164*219b2ee8SDavid du ColombierOne use of this feature allows references to elements of the
165*219b2ee8SDavid du Colombiersubunit to be accessed as if they were in
166*219b2ee8SDavid du Colombierthe outer structure.
167*219b2ee8SDavid du ColombierThus
168*219b2ee8SDavid du Colombier.CW node->dval
169*219b2ee8SDavid du Colombierand
170*219b2ee8SDavid du Colombier.CW node->locked
171*219b2ee8SDavid du Colombierare legitimate references.
172*219b2ee8SDavid du Colombier.LP
173*219b2ee8SDavid du ColombierWhen an outer structure is used
174*219b2ee8SDavid du Colombierin a context that is only legal for
175*219b2ee8SDavid du Colombieran unnamed substructure,
176*219b2ee8SDavid du Colombierthe compiler promotes the reference to the
177*219b2ee8SDavid du Colombierunnamed substructure.
178*219b2ee8SDavid du ColombierThis is true for references to structures and
179*219b2ee8SDavid du Colombierto references to pointers to structures.
180*219b2ee8SDavid du ColombierThis happens in assignment statements and
181*219b2ee8SDavid du Colombierin argument passing where prototypes have been
182*219b2ee8SDavid du Colombierdeclared.
183*219b2ee8SDavid du ColombierThus, continuing with the example,
184*219b2ee8SDavid du Colombier.DS
185*219b2ee8SDavid du Colombier.CW
186*219b2ee8SDavid du Colombier.ta .1i .6i 1.1i 1.6i
187*219b2ee8SDavid du Colombier	lock = node;
188*219b2ee8SDavid du Colombier.DE
189*219b2ee8SDavid du Colombierwould assign a pointer to the unnamed
190*219b2ee8SDavid du Colombier.CW Lock
191*219b2ee8SDavid du Colombierin
192*219b2ee8SDavid du Colombierthe
193*219b2ee8SDavid du Colombier.CW Node
194*219b2ee8SDavid du Colombierto the variable
195*219b2ee8SDavid du Colombier.CW lock .
196*219b2ee8SDavid du ColombierAnother example,
197*219b2ee8SDavid du Colombier.DS
198*219b2ee8SDavid du Colombier.CW
199*219b2ee8SDavid du Colombier.ta .1i .6i 1.1i 1.6i
200*219b2ee8SDavid du Colombier	extern void lock(Lock*);
201*219b2ee8SDavid du Colombier	func(...)
202*219b2ee8SDavid du Colombier	{
203*219b2ee8SDavid du Colombier		...
204*219b2ee8SDavid du Colombier		lock(node);
205*219b2ee8SDavid du Colombier		...
206*219b2ee8SDavid du Colombier	}
207*219b2ee8SDavid du Colombier.DE
208*219b2ee8SDavid du Colombierwill pass a pointer to the
209*219b2ee8SDavid du Colombier.CW Lock
210*219b2ee8SDavid du Colombiersubstructure.
211*219b2ee8SDavid du Colombier.NH 2
212*219b2ee8SDavid du ColombierStructure displays
213*219b2ee8SDavid du Colombier.LP
214*219b2ee8SDavid du ColombierA structure cast followed by a list of expressions in braces is
215*219b2ee8SDavid du Colombieran expression with the type of the structure and elements assigned from
216*219b2ee8SDavid du Colombierthe corresponding list.
217*219b2ee8SDavid du ColombierStructures are now almost first-class citizens of the language.
218*219b2ee8SDavid du ColombierIt is common to see code like this:
219*219b2ee8SDavid du Colombier.DS
220*219b2ee8SDavid du Colombier.CW
221*219b2ee8SDavid du Colombier.ta .1i
222*219b2ee8SDavid du Colombier	r = (Rectangle){point1, (Point){x,y+2}};
223*219b2ee8SDavid du Colombier.DE
224*219b2ee8SDavid du Colombier.NH 2
225*219b2ee8SDavid du ColombierInitialization indexes
226*219b2ee8SDavid du Colombier.LP
227*219b2ee8SDavid du ColombierIn initializers of arrays,
228*219b2ee8SDavid du Colombierone may place a constant expression
229*219b2ee8SDavid du Colombierin square brackets before an initializer.
230*219b2ee8SDavid du ColombierThis causes the next initializer to assign
231*219b2ee8SDavid du Colombierthe indicated element.
232*219b2ee8SDavid du ColombierFor example:
233*219b2ee8SDavid du Colombier.DS
234*219b2ee8SDavid du Colombier.CW
235*219b2ee8SDavid du Colombier.ta .1i .6i 1.6i
236*219b2ee8SDavid du Colombier	enum	errors
237*219b2ee8SDavid du Colombier	{
238*219b2ee8SDavid du Colombier		Etoobig,
239*219b2ee8SDavid du Colombier		Ealarm,
240*219b2ee8SDavid du Colombier		Egreg
241*219b2ee8SDavid du Colombier	};
242*219b2ee8SDavid du Colombier	char* errstrings[] =
243*219b2ee8SDavid du Colombier	{
244*219b2ee8SDavid du Colombier		[Ealarm]	"Alarm call",
245*219b2ee8SDavid du Colombier		[Egreg]	"Panic: out of mbufs",
246*219b2ee8SDavid du Colombier		[Etoobig]	"Arg list too long",
247*219b2ee8SDavid du Colombier	};
248*219b2ee8SDavid du Colombier.DE
249*219b2ee8SDavid du ColombierIn the same way,
250*219b2ee8SDavid du Colombierindividual structures members may
251*219b2ee8SDavid du Colombierbe initialized in any order by preceding the initialization with
252*219b2ee8SDavid du Colombier.CW .tagname .
253*219b2ee8SDavid du ColombierBoth forms allow an optional
254*219b2ee8SDavid du Colombier.CW = ,
255*219b2ee8SDavid du Colombierto be compatible with a proposed
256*219b2ee8SDavid du Colombierextension to ANSI C.
257*219b2ee8SDavid du Colombier.NH 2
258*219b2ee8SDavid du ColombierExternal register
259*219b2ee8SDavid du Colombier.LP
260*219b2ee8SDavid du ColombierThe declaration
261*219b2ee8SDavid du Colombier.CW extern
262*219b2ee8SDavid du Colombier.CW register
263*219b2ee8SDavid du Colombierwill dedicate a register to
264*219b2ee8SDavid du Colombiera variable on a global basis.
265*219b2ee8SDavid du ColombierIt can be used only under special circumstances.
266*219b2ee8SDavid du ColombierExternal register variables must be identically
267*219b2ee8SDavid du Colombierdeclared in all modules and
268*219b2ee8SDavid du Colombierlibraries.
269*219b2ee8SDavid du ColombierThe feature is not intended for efficiency,
270*219b2ee8SDavid du Colombieralthough it can produce efficient code;
271*219b2ee8SDavid du Colombierrather it represents a unique storage class that
272*219b2ee8SDavid du Colombierwould be hard to get any other way.
273*219b2ee8SDavid du ColombierOn a shared-memory multi-processor,
274*219b2ee8SDavid du Colombieran external register is
275*219b2ee8SDavid du Colombierone-per-processor and neither one-per-procedure (automatic)
276*219b2ee8SDavid du Colombieror one-per-system (external).
277*219b2ee8SDavid du ColombierIt is used for two variables in the Plan 9 kernel,
278*219b2ee8SDavid du Colombier.CW u
279*219b2ee8SDavid du Colombierand
280*219b2ee8SDavid du Colombier.CW m .
281*219b2ee8SDavid du Colombier.CW U
282*219b2ee8SDavid du Colombieris a pointer to the structure representing the currently running process
283*219b2ee8SDavid du Colombierand
284*219b2ee8SDavid du Colombier.CW m
285*219b2ee8SDavid du Colombieris a pointer to the per-machine data structure.
286*219b2ee8SDavid du Colombier.NH 2
287*219b2ee8SDavid du ColombierLong long
288*219b2ee8SDavid du Colombier.LP
289*219b2ee8SDavid du ColombierThe compilers accept
290*219b2ee8SDavid du Colombier.CW long
291*219b2ee8SDavid du Colombier.CW long
292*219b2ee8SDavid du Colombieras a basic type meaning 64-bit integer.
293*219b2ee8SDavid du ColombierOn all of the machines
294*219b2ee8SDavid du Colombierthis type is synthesized from 32-bit instructions.
295*219b2ee8SDavid du Colombier.NH 2
296*219b2ee8SDavid du ColombierPragma
297*219b2ee8SDavid du Colombier.LP
298*219b2ee8SDavid du ColombierThe compilers accept
299*219b2ee8SDavid du Colombier.CW #pragma
300*219b2ee8SDavid du Colombier.CW lib
301*219b2ee8SDavid du Colombier.I libname
302*219b2ee8SDavid du Colombierand pass the
303*219b2ee8SDavid du Colombierlibrary name string uninterpreted
304*219b2ee8SDavid du Colombierto the loader.
305*219b2ee8SDavid du ColombierThe loader uses the library name to
306*219b2ee8SDavid du Colombierfind libraries to load.
307*219b2ee8SDavid du ColombierIf the name contains
308*219b2ee8SDavid du Colombier.CW %O ,
309*219b2ee8SDavid du Colombierit is replaced with
310*219b2ee8SDavid du Colombierthe single character object type of the compiler
311*219b2ee8SDavid du Colombier(e.g.,
312*219b2ee8SDavid du Colombier.CW v
313*219b2ee8SDavid du Colombierfor the MIPS).
314*219b2ee8SDavid du ColombierIf the name contains
315*219b2ee8SDavid du Colombier.CW %M ,
316*219b2ee8SDavid du Colombierit is replaced with
317*219b2ee8SDavid du Colombierthe architecture type for the compiler
318*219b2ee8SDavid du Colombier(e.g.,
319*219b2ee8SDavid du Colombier.CW mips
320*219b2ee8SDavid du Colombierfor the MIPS).
321*219b2ee8SDavid du ColombierIf the name starts with
322*219b2ee8SDavid du Colombier.CW /
323*219b2ee8SDavid du Colombierit is an absolute pathname;
324*219b2ee8SDavid du Colombierif it starts with
325*219b2ee8SDavid du Colombier.CW .
326*219b2ee8SDavid du Colombierthen it is searched for in the loader's current directory.
327*219b2ee8SDavid du ColombierOtherwise, the name is searched from
328*219b2ee8SDavid du Colombier.CW /%M/lib .
329*219b2ee8SDavid du ColombierSuch
330*219b2ee8SDavid du Colombier.CW #pragma
331*219b2ee8SDavid du Colombierstatements in header files guarantee that the correct
332*219b2ee8SDavid du Colombierlibraries are always linked with a program without the
333*219b2ee8SDavid du Colombierneed to specify them explicitly at link time.
334*219b2ee8SDavid du Colombier.NH
335*219b2ee8SDavid du ColombierObject module conventions
336*219b2ee8SDavid du Colombier.LP
337*219b2ee8SDavid du ColombierThe overall conventions of the runtime environment
338*219b2ee8SDavid du Colombierare important
339*219b2ee8SDavid du Colombierto runtime efficiency.
340*219b2ee8SDavid du ColombierIn this section,
341*219b2ee8SDavid du Colombierseveral of these conventions are discussed.
342*219b2ee8SDavid du Colombier.NH 2
343*219b2ee8SDavid du ColombierRegister saving
344*219b2ee8SDavid du Colombier.LP
345*219b2ee8SDavid du ColombierIn the Plan 9 compilers,
346*219b2ee8SDavid du Colombierthe caller of a procedure saves the registers.
347*219b2ee8SDavid du ColombierWith caller-saves,
348*219b2ee8SDavid du Colombierthe leaf procedures can use all the
349*219b2ee8SDavid du Colombierregisters and never save them.
350*219b2ee8SDavid du ColombierIf you spend a lot of time at the leaves,
351*219b2ee8SDavid du Colombierthis seems preferable.
352*219b2ee8SDavid du ColombierWith callee-saves,
353*219b2ee8SDavid du Colombierthe saving of the registers is done
354*219b2ee8SDavid du Colombierin the single point of entry and return.
355*219b2ee8SDavid du ColombierIf you are interested in space,
356*219b2ee8SDavid du Colombierthis seems preferable.
357*219b2ee8SDavid du ColombierIn both,
358*219b2ee8SDavid du Colombierthere is a degree of uncertainty
359*219b2ee8SDavid du Colombierabout what registers need to be saved.
360*219b2ee8SDavid du ColombierCallee-saved registers make it difficult to
361*219b2ee8SDavid du Colombierfind variables in registers in debuggers.
362*219b2ee8SDavid du ColombierCallee-saved registers also complicate
363*219b2ee8SDavid du Colombierthe implementation of
364*219b2ee8SDavid du Colombier.CW longjmp .
365*219b2ee8SDavid du ColombierThe convincing argument is
366*219b2ee8SDavid du Colombierthat with caller-saves,
367*219b2ee8SDavid du Colombierthe decision to registerize a variable
368*219b2ee8SDavid du Colombiercan include the cost of saving the register
369*219b2ee8SDavid du Colombieracross calls.
370*219b2ee8SDavid du ColombierFor a further discussion of caller- vs. callee-saves,
371*219b2ee8SDavid du Colombiersee the paper by Davidson and Whalley [Dav91].
372*219b2ee8SDavid du Colombier.LP
373*219b2ee8SDavid du ColombierIn the Plan 9 operating system,
374*219b2ee8SDavid du Colombiercalls to the kernel look like normal procedure
375*219b2ee8SDavid du Colombiercalls, which means
376*219b2ee8SDavid du Colombierthe caller
377*219b2ee8SDavid du Colombierhas saved the registers and the system
378*219b2ee8SDavid du Colombierentry does not have to.
379*219b2ee8SDavid du ColombierThis makes system calls considerably faster.
380*219b2ee8SDavid du ColombierSince this is a potential security hole,
381*219b2ee8SDavid du Colombierand can lead to non-determinism,
382*219b2ee8SDavid du Colombierthe system may eventually save the registers
383*219b2ee8SDavid du Colombieron entry,
384*219b2ee8SDavid du Colombieror more likely clear the registers on return.
385*219b2ee8SDavid du Colombier.NH 2
386*219b2ee8SDavid du ColombierCalling convention
387*219b2ee8SDavid du Colombier.LP
388*219b2ee8SDavid du ColombierOlder C compilers maintain a frame pointer, which is at a known constant
389*219b2ee8SDavid du Colombieroffset from the stack pointer within each function.
390*219b2ee8SDavid du ColombierFor machines where the stack grows towards zero,
391*219b2ee8SDavid du Colombierthe argument pointer is at a known constant offset
392*219b2ee8SDavid du Colombierfrom the frame pointer.
393*219b2ee8SDavid du ColombierSince the stack grows down in Plan 9,
394*219b2ee8SDavid du Colombierthe Plan 9 compilers
395*219b2ee8SDavid du Colombierkeep neither an
396*219b2ee8SDavid du Colombierexplicit frame pointer nor
397*219b2ee8SDavid du Colombieran explicit argument pointer;
398*219b2ee8SDavid du Colombierinstead they generate addresses relative to the stack pointer.
399*219b2ee8SDavid du Colombier.LP
400*219b2ee8SDavid du ColombierOn some architectures, the first argument to a subroutine is passed in a register.
401*219b2ee8SDavid du Colombier.NH 2
402*219b2ee8SDavid du ColombierFunctions returning structures
403*219b2ee8SDavid du Colombier.LP
404*219b2ee8SDavid du ColombierStructures longer than one word are awkward to implement
405*219b2ee8SDavid du Colombiersince they do not fit in registers and must
406*219b2ee8SDavid du Colombierbe passed around in memory.
407*219b2ee8SDavid du ColombierFunctions that return structures
408*219b2ee8SDavid du Colombierare particularly clumsy.
409*219b2ee8SDavid du ColombierThe Plan 9 compilers pass the return address of
410*219b2ee8SDavid du Colombiera structure as the first argument of a
411*219b2ee8SDavid du Colombierfunction that has a structure return value.
412*219b2ee8SDavid du ColombierThus
413*219b2ee8SDavid du Colombier.DS
414*219b2ee8SDavid du Colombier.CW
415*219b2ee8SDavid du Colombier.ta .1i .6i 1.1i 1.6i
416*219b2ee8SDavid du Colombier	x = f(...)
417*219b2ee8SDavid du Colombier.DE
418*219b2ee8SDavid du Colombieris rewritten as
419*219b2ee8SDavid du Colombier.DS
420*219b2ee8SDavid du Colombier.CW
421*219b2ee8SDavid du Colombier.ta .1i .6i 1.1i 1.6i
422*219b2ee8SDavid du Colombier	f(&x, ...)\f1.
423*219b2ee8SDavid du Colombier.DE
424*219b2ee8SDavid du ColombierThis saves a copy and makes the compilation
425*219b2ee8SDavid du Colombiermuch less clumsy.
426*219b2ee8SDavid du ColombierA disadvantage is that if you call this
427*219b2ee8SDavid du Colombierfunction without an assignment,
428*219b2ee8SDavid du Colombiera dummy location must be invented.
429*219b2ee8SDavid du Colombier.LP
430*219b2ee8SDavid du ColombierThere is also a danger of calling a function
431*219b2ee8SDavid du Colombierthat returns a structure without declaring
432*219b2ee8SDavid du Colombierit as such.
433*219b2ee8SDavid du ColombierWith ANSI C function prototypes,
434*219b2ee8SDavid du Colombierthis error need never occur.
435*219b2ee8SDavid du Colombier.NH
436*219b2ee8SDavid du ColombierImplementation
437*219b2ee8SDavid du Colombier.LP
438*219b2ee8SDavid du ColombierThe compiler is divided internally into
439*219b2ee8SDavid du Colombierfour machine-independent passes,
440*219b2ee8SDavid du Colombierfour machine-dependent passes,
441*219b2ee8SDavid du Colombierand an output pass.
442*219b2ee8SDavid du ColombierThe next nine sections describe each pass in order.
443*219b2ee8SDavid du Colombier.NH 2
444*219b2ee8SDavid du ColombierParsing
445*219b2ee8SDavid du Colombier.LP
446*219b2ee8SDavid du ColombierThe first pass is a YACC-based parser
447*219b2ee8SDavid du Colombier[Joh79].
448*219b2ee8SDavid du ColombierDeclarations are interpreted immediately,
449*219b2ee8SDavid du Colombierbuilding a block structured symbol table.
450*219b2ee8SDavid du ColombierExecutable statements are put into a parse tree
451*219b2ee8SDavid du Colombierand collected,
452*219b2ee8SDavid du Colombierwithout interpretation.
453*219b2ee8SDavid du ColombierAt the end of each procedure,
454*219b2ee8SDavid du Colombierthe parse tree for the function is
455*219b2ee8SDavid du Colombierexamined by the other passes of the compiler.
456*219b2ee8SDavid du Colombier.LP
457*219b2ee8SDavid du ColombierThe input stream of the parser is
458*219b2ee8SDavid du Colombiera pushdown list of input activations.
459*219b2ee8SDavid du ColombierThe preprocessor
460*219b2ee8SDavid du Colombierexpansions of
461*219b2ee8SDavid du Colombiermacros
462*219b2ee8SDavid du Colombierand
463*219b2ee8SDavid du Colombier.CW #include
464*219b2ee8SDavid du Colombierare implemented as pushdowns.
465*219b2ee8SDavid du ColombierThus there is no separate
466*219b2ee8SDavid du Colombierpass for preprocessing.
467*219b2ee8SDavid du Colombier.NH 2
468*219b2ee8SDavid du ColombierTyping
469*219b2ee8SDavid du Colombier.LP
470*219b2ee8SDavid du ColombierThe next pass distributes typing information
471*219b2ee8SDavid du Colombierto every node of the tree.
472*219b2ee8SDavid du ColombierImplicit operations on the tree are added,
473*219b2ee8SDavid du Colombiersuch as type promotions and taking the
474*219b2ee8SDavid du Colombieraddress of arrays and functions.
475*219b2ee8SDavid du Colombier.NH 2
476*219b2ee8SDavid du ColombierMachine-independent optimization
477*219b2ee8SDavid du Colombier.LP
478*219b2ee8SDavid du ColombierThe next pass performs optimizations
479*219b2ee8SDavid du Colombierand transformations of the tree, such as converting
480*219b2ee8SDavid du Colombier.CW &*x
481*219b2ee8SDavid du Colombierand
482*219b2ee8SDavid du Colombier.CW *&x
483*219b2ee8SDavid du Colombierinto
484*219b2ee8SDavid du Colombier.CW x .
485*219b2ee8SDavid du ColombierConstant expressions are converted to constants in this pass.
486*219b2ee8SDavid du Colombier.NH 2
487*219b2ee8SDavid du ColombierArithmetic rewrites
488*219b2ee8SDavid du Colombier.LP
489*219b2ee8SDavid du ColombierThis is another machine-independent optimization.
490*219b2ee8SDavid du ColombierSubtrees of add, subtract, and multiply of integers are
491*219b2ee8SDavid du Colombierrewritten for easier compilation.
492*219b2ee8SDavid du ColombierThe major transformation is factoring:
493*219b2ee8SDavid du Colombier.CW 4+8*a+16*b+5
494*219b2ee8SDavid du Colombieris transformed into
495*219b2ee8SDavid du Colombier.CW 9+8*(a+2*b) .
496*219b2ee8SDavid du ColombierSuch expressions arise from address
497*219b2ee8SDavid du Colombiermanipulation and array indexing.
498*219b2ee8SDavid du Colombier.NH 2
499*219b2ee8SDavid du ColombierAddressability
500*219b2ee8SDavid du Colombier.LP
501*219b2ee8SDavid du ColombierThis is the first of the machine-dependent passes.
502*219b2ee8SDavid du ColombierThe addressability of a processor is defined as the set of
503*219b2ee8SDavid du Colombierexpressions that is legal in the address field
504*219b2ee8SDavid du Colombierof a machine language instruction.
505*219b2ee8SDavid du ColombierThe addressability of different processors varies widely.
506*219b2ee8SDavid du ColombierAt one end of the spectrum are the 68020 and VAX,
507*219b2ee8SDavid du Colombierwhich allow a complex mix of incrementing,
508*219b2ee8SDavid du Colombierdecrementing,
509*219b2ee8SDavid du Colombierindexing, and relative addressing.
510*219b2ee8SDavid du ColombierAt the other end is the MIPS,
511*219b2ee8SDavid du Colombierwhich allows only registers and constant offsets from the
512*219b2ee8SDavid du Colombiercontents of a register.
513*219b2ee8SDavid du ColombierThe addressability can be different for different instructions
514*219b2ee8SDavid du Colombierwithin the same processor.
515*219b2ee8SDavid du Colombier.LP
516*219b2ee8SDavid du ColombierIt is important to the code generator to know when a
517*219b2ee8SDavid du Colombiersubtree represents an address of a particular type.
518*219b2ee8SDavid du ColombierThis is done with a bottom-up walk of the tree.
519*219b2ee8SDavid du ColombierIn this pass, the leaves are labeled with small integers.
520*219b2ee8SDavid du ColombierWhen an internal node is encountered,
521*219b2ee8SDavid du Colombierit is labeled by consulting a table indexed by the
522*219b2ee8SDavid du Colombierlabels on the left and right subtrees.
523*219b2ee8SDavid du ColombierFor example,
524*219b2ee8SDavid du Colombieron the 68020 processor,
525*219b2ee8SDavid du Colombierit is possible to address an
526*219b2ee8SDavid du Colombieroffset from a named location.
527*219b2ee8SDavid du ColombierIn C, this is represented by the expression
528*219b2ee8SDavid du Colombier.CW *(&name+constant) .
529*219b2ee8SDavid du ColombierThis is marked addressable by the following table.
530*219b2ee8SDavid du ColombierIn the table,
531*219b2ee8SDavid du Colombiera node represented by the left column is marked
532*219b2ee8SDavid du Colombierwith a small integer from the right column.
533*219b2ee8SDavid du ColombierMarks of the form
534*219b2ee8SDavid du Colombier.CW A\s-2\di\u\s0
535*219b2ee8SDavid du Colombierare addressable while
536*219b2ee8SDavid du Colombiermarks of the form
537*219b2ee8SDavid du Colombier.CW N\s-2\di\u\s0
538*219b2ee8SDavid du Colombierare not addressable.
539*219b2ee8SDavid du Colombier.DS
540*219b2ee8SDavid du Colombier.B
541*219b2ee8SDavid du Colombier.ta .1i 1.1i
542*219b2ee8SDavid du Colombier	Node	Marked
543*219b2ee8SDavid du Colombier.CW
544*219b2ee8SDavid du Colombier	name	A\s-2\d1\u\s0
545*219b2ee8SDavid du Colombier	const	A\s-2\d2\u\s0
546*219b2ee8SDavid du Colombier	&A\s-2\d1\u\s0	A\s-2\d3\u\s0
547*219b2ee8SDavid du Colombier	A\s-2\d3\u\s0+A\s-2\d1\u\s0	N\s-2\d1\u\s0 \fR(note that this is not addressable)\fP
548*219b2ee8SDavid du Colombier	*N\s-2\d1\u\s0	A\s-2\d4\u\s0
549*219b2ee8SDavid du Colombier.DE
550*219b2ee8SDavid du ColombierHere there is a distinction between
551*219b2ee8SDavid du Colombiera node marked
552*219b2ee8SDavid du Colombier.CW A\s-2\d1\u\s0
553*219b2ee8SDavid du Colombierand a node marked
554*219b2ee8SDavid du Colombier.CW A\s-2\d4\u\s0
555*219b2ee8SDavid du Colombierbecause the address operator of an
556*219b2ee8SDavid du Colombier.CW A\s-2\d4\u\s0
557*219b2ee8SDavid du Colombiernode is not addressable.
558*219b2ee8SDavid du ColombierSo to extend the table:
559*219b2ee8SDavid du Colombier.DS
560*219b2ee8SDavid du Colombier.B
561*219b2ee8SDavid du Colombier.ta .1i 1.1i
562*219b2ee8SDavid du Colombier	Node	Marked
563*219b2ee8SDavid du Colombier.CW
564*219b2ee8SDavid du Colombier	&A\s-2\d4\u\s0	N\s-2\d2\u\s0
565*219b2ee8SDavid du Colombier	N\s-2\d2\u\s0+N\s-2\d1\u\s0	N\s-2\d1\u\s0
566*219b2ee8SDavid du Colombier.DE
567*219b2ee8SDavid du ColombierThe full addressability of the 68020 is expressed
568*219b2ee8SDavid du Colombierin 18 rules like this,
569*219b2ee8SDavid du Colombierwhile the addressability of the MIPS is expressed
570*219b2ee8SDavid du Colombierin 11 rules.
571*219b2ee8SDavid du ColombierWhen one ports the compiler,
572*219b2ee8SDavid du Colombierthis table is usually initialized
573*219b2ee8SDavid du Colombierso that leaves are labeled as addressable and nothing else.
574*219b2ee8SDavid du ColombierThe code produced is poor,
575*219b2ee8SDavid du Colombierbut porting is easy.
576*219b2ee8SDavid du ColombierThe table can be extended later.
577*219b2ee8SDavid du Colombier.LP
578*219b2ee8SDavid du ColombierThis pass also rewrites some complex operators
579*219b2ee8SDavid du Colombierinto procedure calls.
580*219b2ee8SDavid du ColombierExamples include 64-bit multiply and divide.
581*219b2ee8SDavid du Colombier.LP
582*219b2ee8SDavid du ColombierIn the same bottom-up pass of the tree,
583*219b2ee8SDavid du Colombierthe nodes are labeled with a Sethi-Ullman complexity
584*219b2ee8SDavid du Colombier[Set70].
585*219b2ee8SDavid du ColombierThis number is roughly the number of registers required
586*219b2ee8SDavid du Colombierto compile the tree on an ideal machine.
587*219b2ee8SDavid du ColombierAn addressable node is marked 0.
588*219b2ee8SDavid du ColombierA function call is marked infinite.
589*219b2ee8SDavid du ColombierA unary operator is marked as the
590*219b2ee8SDavid du Colombiermaximum of 1 and the mark of its subtree.
591*219b2ee8SDavid du ColombierA binary operator with equal marks on its subtrees is
592*219b2ee8SDavid du Colombiermarked with a subtree mark plus 1.
593*219b2ee8SDavid du ColombierA binary operator with unequal marks on its subtrees is
594*219b2ee8SDavid du Colombiermarked with the maximum mark of its subtrees.
595*219b2ee8SDavid du ColombierThe actual values of the marks are not too important,
596*219b2ee8SDavid du Colombierbut the relative values are.
597*219b2ee8SDavid du ColombierThe goal is to compile the harder
598*219b2ee8SDavid du Colombier(larger mark)
599*219b2ee8SDavid du Colombiersubtree first.
600*219b2ee8SDavid du Colombier.NH 2
601*219b2ee8SDavid du ColombierCode generation
602*219b2ee8SDavid du Colombier.LP
603*219b2ee8SDavid du ColombierCode is generated by recursive
604*219b2ee8SDavid du Colombierdescent.
605*219b2ee8SDavid du ColombierThe Sethi-Ullman complexity completely guides the
606*219b2ee8SDavid du Colombierorder.
607*219b2ee8SDavid du ColombierThe addressability defines the leaves.
608*219b2ee8SDavid du ColombierThe only difficult part is compiling a tree
609*219b2ee8SDavid du Colombierthat has two infinite (function call)
610*219b2ee8SDavid du Colombiersubtrees.
611*219b2ee8SDavid du ColombierIn this case,
612*219b2ee8SDavid du Colombierone subtree is compiled into the return register
613*219b2ee8SDavid du Colombier(usually the most convenient place for a function call)
614*219b2ee8SDavid du Colombierand then stored on the stack.
615*219b2ee8SDavid du ColombierThe other subtree is compiled into the return register
616*219b2ee8SDavid du Colombierand then the operation is compiled with
617*219b2ee8SDavid du Colombieroperands from the stack and the return register.
618*219b2ee8SDavid du Colombier.LP
619*219b2ee8SDavid du ColombierThere is a separate boolean code generator that compiles
620*219b2ee8SDavid du Colombierconditional expressions.
621*219b2ee8SDavid du ColombierThis is fundamentally different from compiling an arithmetic expression.
622*219b2ee8SDavid du ColombierThe result of the boolean code generator is the
623*219b2ee8SDavid du Colombierposition of the program counter and not an expression.
624*219b2ee8SDavid du ColombierThe boolean code generator makes extensive use of De Morgan's rule.
625*219b2ee8SDavid du ColombierThe boolean code generator is an expanded version of that described
626*219b2ee8SDavid du Colombierin chapter 8 of Aho, Sethi, and Ullman
627*219b2ee8SDavid du Colombier[Aho87].
628*219b2ee8SDavid du Colombier.LP
629*219b2ee8SDavid du ColombierThere is a considerable amount of talk in the literature
630*219b2ee8SDavid du Colombierabout automating this part of a compiler with a machine
631*219b2ee8SDavid du Colombierdescription.
632*219b2ee8SDavid du ColombierSince this code generator is so small
633*219b2ee8SDavid du Colombier(less than 500 lines of C)
634*219b2ee8SDavid du Colombierand easy,
635*219b2ee8SDavid du Colombierit hardly seems worth the effort.
636*219b2ee8SDavid du Colombier.NH 2
637*219b2ee8SDavid du ColombierRegisterization
638*219b2ee8SDavid du Colombier.LP
639*219b2ee8SDavid du ColombierUp to now,
640*219b2ee8SDavid du Colombierthe compiler has operated on syntax trees
641*219b2ee8SDavid du Colombierthat are roughly equivalent to the original source language.
642*219b2ee8SDavid du ColombierThe previous pass has produced machine language in an internal
643*219b2ee8SDavid du Colombierformat.
644*219b2ee8SDavid du ColombierThe next two passes operate on the internal machine language
645*219b2ee8SDavid du Colombierstructures.
646*219b2ee8SDavid du ColombierThe purpose of the next pass is to reintroduce
647*219b2ee8SDavid du Colombierregisters for heavily used variables.
648*219b2ee8SDavid du Colombier.LP
649*219b2ee8SDavid du ColombierAll of the variables that can be
650*219b2ee8SDavid du Colombierpotentially registerized within a procedure are
651*219b2ee8SDavid du Colombierplaced in a table.
652*219b2ee8SDavid du Colombier(Suitable variables are any automatic or external
653*219b2ee8SDavid du Colombierscalars that do not have their addresses extracted.
654*219b2ee8SDavid du ColombierSome constants that are hard to reference are also
655*219b2ee8SDavid du Colombierconsidered for registerization.)
656*219b2ee8SDavid du ColombierFour separate data flow equations are evaluated
657*219b2ee8SDavid du Colombierover the procedure on all of these variables.
658*219b2ee8SDavid du ColombierTwo of the equations are the normal set-behind
659*219b2ee8SDavid du Colombierand used-ahead
660*219b2ee8SDavid du Colombierbits that define the life of a variable.
661*219b2ee8SDavid du ColombierThe two new bits tell if a variable life
662*219b2ee8SDavid du Colombiercrosses a function call ahead or behind.
663*219b2ee8SDavid du ColombierBy examining a variable over its lifetime,
664*219b2ee8SDavid du Colombierit is possible to get a cost
665*219b2ee8SDavid du Colombierfor registerizing.
666*219b2ee8SDavid du ColombierLoops are detected and the costs are multiplied
667*219b2ee8SDavid du Colombierby three for every level of loop nesting.
668*219b2ee8SDavid du ColombierCosts are sorted and the variables
669*219b2ee8SDavid du Colombierare replaced by available registers on a greedy basis.
670*219b2ee8SDavid du Colombier.LP
671*219b2ee8SDavid du ColombierThe 68020 has two different
672*219b2ee8SDavid du Colombiertypes of registers.
673*219b2ee8SDavid du ColombierFor the 68020,
674*219b2ee8SDavid du Colombiertwo different costs are calculated for
675*219b2ee8SDavid du Colombiereach variable life and the register type that
676*219b2ee8SDavid du Colombieraffords the better cost is used.
677*219b2ee8SDavid du ColombierTies are broken by counting the number of available
678*219b2ee8SDavid du Colombierregisters of each type.
679*219b2ee8SDavid du Colombier.LP
680*219b2ee8SDavid du ColombierNote that externals are registerized together with automatics.
681*219b2ee8SDavid du ColombierThis is done by evaluating the semantics of a ``call'' instruction
682*219b2ee8SDavid du Colombierdifferently for externals and automatics.
683*219b2ee8SDavid du ColombierSince a call goes outside the local procedure,
684*219b2ee8SDavid du Colombierit is assumed that a call references all externals.
685*219b2ee8SDavid du ColombierSimilarly,
686*219b2ee8SDavid du Colombierexternals are assumed to be set before an ``entry'' instruction
687*219b2ee8SDavid du Colombierand assumed to be referenced after a ``return'' instruction.
688*219b2ee8SDavid du ColombierThis makes sure that externals are in memory across calls.
689*219b2ee8SDavid du Colombier.LP
690*219b2ee8SDavid du ColombierThe overall results are satisfactory.
691*219b2ee8SDavid du ColombierIt would be nice to be able to do this processing in
692*219b2ee8SDavid du Colombiera machine-independent way,
693*219b2ee8SDavid du Colombierbut it is impossible to get all of the costs and
694*219b2ee8SDavid du Colombierside effects of different choices by examining the parse tree.
695*219b2ee8SDavid du Colombier.LP
696*219b2ee8SDavid du ColombierMost of the code in the registerization pass is machine-independent.
697*219b2ee8SDavid du ColombierThe major machine-dependency is in
698*219b2ee8SDavid du Colombierexamining a machine instruction to ask if it sets or references
699*219b2ee8SDavid du Colombiera variable.
700*219b2ee8SDavid du Colombier.NH 2
701*219b2ee8SDavid du ColombierMachine code optimization
702*219b2ee8SDavid du Colombier.LP
703*219b2ee8SDavid du ColombierThe next pass walks the machine code
704*219b2ee8SDavid du Colombierfor opportunistic optimizations.
705*219b2ee8SDavid du ColombierFor the most part,
706*219b2ee8SDavid du Colombierthis is highly specific to a particular
707*219b2ee8SDavid du Colombierprocessor.
708*219b2ee8SDavid du ColombierOne optimization that is performed
709*219b2ee8SDavid du Colombieron all of the processors is the
710*219b2ee8SDavid du Colombierremoval of unnecessary ``move''
711*219b2ee8SDavid du Colombierinstructions.
712*219b2ee8SDavid du ColombierIronically,
713*219b2ee8SDavid du Colombiermost of these instructions were inserted by
714*219b2ee8SDavid du Colombierthe previous pass.
715*219b2ee8SDavid du ColombierThere are two patterns that are repetitively
716*219b2ee8SDavid du Colombiermatched and replaced until no more matches are
717*219b2ee8SDavid du Colombierfound.
718*219b2ee8SDavid du ColombierThe first tries to remove ``move'' instructions
719*219b2ee8SDavid du Colombierby relabeling variables.
720*219b2ee8SDavid du Colombier.LP
721*219b2ee8SDavid du ColombierWhen a ``move'' instruction is encountered,
722*219b2ee8SDavid du Colombierif the destination variable is set before the
723*219b2ee8SDavid du Colombiersource variable is referenced,
724*219b2ee8SDavid du Colombierthen all of the references to the destination
725*219b2ee8SDavid du Colombiervariable can be renamed to the source and the ``move''
726*219b2ee8SDavid du Colombiercan be deleted.
727*219b2ee8SDavid du ColombierThis transformation uses the reverse data flow
728*219b2ee8SDavid du Colombierset up in the previous pass.
729*219b2ee8SDavid du Colombier.LP
730*219b2ee8SDavid du ColombierAn example of this pattern is depicted in the following
731*219b2ee8SDavid du Colombiertable.
732*219b2ee8SDavid du ColombierThe pattern is in the left column and the
733*219b2ee8SDavid du Colombierreplacement action is in the right column.
734*219b2ee8SDavid du Colombier.DS
735*219b2ee8SDavid du Colombier.CW
736*219b2ee8SDavid du Colombier.ta .1i .6i 1.6i 2.1i 2.6i
737*219b2ee8SDavid du Colombier	MOVE	a->b		\fR(remove)\fP
738*219b2ee8SDavid du Colombier.R
739*219b2ee8SDavid du Colombier	(sequence with no mention of \f(CWa\fP)
740*219b2ee8SDavid du Colombier.CW
741*219b2ee8SDavid du Colombier	USE	b		USE	a
742*219b2ee8SDavid du Colombier.R
743*219b2ee8SDavid du Colombier	(sequence with no mention of \f(CWa\fP)
744*219b2ee8SDavid du Colombier.CW
745*219b2ee8SDavid du Colombier	SET	b		SET	b
746*219b2ee8SDavid du Colombier.DE
747*219b2ee8SDavid du Colombier.LP
748*219b2ee8SDavid du ColombierExperiments have shown that it is marginally
749*219b2ee8SDavid du Colombierworthwhile to rename uses of the destination variable
750*219b2ee8SDavid du Colombierwith uses of the source variable up to
751*219b2ee8SDavid du Colombierthe first use of the source variable.
752*219b2ee8SDavid du Colombier.LP
753*219b2ee8SDavid du ColombierThe second transform will do relabeling
754*219b2ee8SDavid du Colombierwithout deleting instructions.
755*219b2ee8SDavid du ColombierWhen a ``move'' instruction is encountered,
756*219b2ee8SDavid du Colombierif the source variable has been set prior
757*219b2ee8SDavid du Colombierto the use of the destination variable
758*219b2ee8SDavid du Colombierthen all of the references to the source
759*219b2ee8SDavid du Colombiervariable are replaced by the destination and
760*219b2ee8SDavid du Colombierthe ``move'' is inverted.
761*219b2ee8SDavid du ColombierTypically,
762*219b2ee8SDavid du Colombierthis transformation will alter two ``move''
763*219b2ee8SDavid du Colombierinstructions and allow the first transformation
764*219b2ee8SDavid du Colombieranother chance to remove code.
765*219b2ee8SDavid du ColombierThis transformation uses the forward data flow
766*219b2ee8SDavid du Colombierset up in the previous pass.
767*219b2ee8SDavid du Colombier.LP
768*219b2ee8SDavid du ColombierAgain,
769*219b2ee8SDavid du Colombierthe following is a depiction of the transformation where
770*219b2ee8SDavid du Colombierthe pattern is in the left column and the
771*219b2ee8SDavid du Colombierrewrite is in the right column.
772*219b2ee8SDavid du Colombier.DS
773*219b2ee8SDavid du Colombier.CW
774*219b2ee8SDavid du Colombier.ta .1i .6i 1.6i 2.1i 2.6i
775*219b2ee8SDavid du Colombier	SET	a		SET	b
776*219b2ee8SDavid du Colombier.R
777*219b2ee8SDavid du Colombier	(sequence with no use of \f(CWb\fP)
778*219b2ee8SDavid du Colombier.CW
779*219b2ee8SDavid du Colombier	USE	a		USE	b
780*219b2ee8SDavid du Colombier.R
781*219b2ee8SDavid du Colombier	(sequence with no use of \f(CWb\fP)
782*219b2ee8SDavid du Colombier.CW
783*219b2ee8SDavid du Colombier	MOVE	a->b		MOVE	b->a
784*219b2ee8SDavid du Colombier.DE
785*219b2ee8SDavid du ColombierIterating these transformations
786*219b2ee8SDavid du Colombierwill usually get rid of all redundant ``move'' instructions.
787*219b2ee8SDavid du Colombier.LP
788*219b2ee8SDavid du ColombierA problem with this organization is that the costs
789*219b2ee8SDavid du Colombierof registerization calculated in the previous pass
790*219b2ee8SDavid du Colombiermust depend on how well this pass can detect and remove
791*219b2ee8SDavid du Colombierredundant instructions.
792*219b2ee8SDavid du ColombierOften,
793*219b2ee8SDavid du Colombiera fine candidate for registerization is rejected
794*219b2ee8SDavid du Colombierbecause of the cost of instructions that are later
795*219b2ee8SDavid du Colombierremoved.
796*219b2ee8SDavid du Colombier.NH 2
797*219b2ee8SDavid du ColombierWriting the object file
798*219b2ee8SDavid du Colombier.LP
799*219b2ee8SDavid du ColombierThe last pass walks the internal assembly language
800*219b2ee8SDavid du Colombierand writes the object file.
801*219b2ee8SDavid du ColombierThe object file is reduced in size by about a factor
802*219b2ee8SDavid du Colombierof three with simple compression
803*219b2ee8SDavid du Colombiertechniques.
804*219b2ee8SDavid du ColombierThe most important aspect of the object file
805*219b2ee8SDavid du Colombierformat is that it is independent of the compiling machine.
806*219b2ee8SDavid du ColombierAll integer and floating numbers in the object
807*219b2ee8SDavid du Colombiercode are converted to known formats and byte
808*219b2ee8SDavid du Colombierorders.
809*219b2ee8SDavid du Colombier.NH
810*219b2ee8SDavid du ColombierThe loader
811*219b2ee8SDavid du Colombier.LP
812*219b2ee8SDavid du ColombierThe loader is a multiple pass program that
813*219b2ee8SDavid du Colombierreads object files and libraries and produces
814*219b2ee8SDavid du Colombieran executable binary.
815*219b2ee8SDavid du ColombierThe loader also does some minimal
816*219b2ee8SDavid du Colombieroptimizations and code rewriting.
817*219b2ee8SDavid du ColombierMany of the operations performed by the
818*219b2ee8SDavid du Colombierloader are machine-dependent.
819*219b2ee8SDavid du Colombier.LP
820*219b2ee8SDavid du ColombierThe first pass of the loader reads the
821*219b2ee8SDavid du Colombierobject modules into an internal data
822*219b2ee8SDavid du Colombierstructure that looks like binary assembly language.
823*219b2ee8SDavid du ColombierAs the instructions are read,
824*219b2ee8SDavid du Colombiercode is reordered to remove
825*219b2ee8SDavid du Colombierunconditional branch instructions.
826*219b2ee8SDavid du ColombierConditional branch instructions are inverted
827*219b2ee8SDavid du Colombierto prevent the insertion of unconditional branches.
828*219b2ee8SDavid du ColombierThe loader will also make a copy of a few instructions
829*219b2ee8SDavid du Colombierto remove an unconditional branch.
830*219b2ee8SDavid du Colombier.LP
831*219b2ee8SDavid du ColombierThe next pass allocates addresses for
832*219b2ee8SDavid du Colombierall external data.
833*219b2ee8SDavid du ColombierTypical of processors is the MIPS,
834*219b2ee8SDavid du Colombierwhich can reference ±32K bytes from a
835*219b2ee8SDavid du Colombierregister.
836*219b2ee8SDavid du ColombierThe loader allocates the register
837*219b2ee8SDavid du Colombier.CW R30
838*219b2ee8SDavid du Colombieras the static pointer.
839*219b2ee8SDavid du ColombierThe value placed in
840*219b2ee8SDavid du Colombier.CW R30
841*219b2ee8SDavid du Colombieris the base of the data segment plus 32K.
842*219b2ee8SDavid du ColombierIt is then cheap to reference all data in the
843*219b2ee8SDavid du Colombierfirst 64K of the data segment.
844*219b2ee8SDavid du ColombierExternal variables are allocated to
845*219b2ee8SDavid du Colombierthe data segment
846*219b2ee8SDavid du Colombierwith the smallest variables allocated first.
847*219b2ee8SDavid du ColombierIf all of the data cannot fit into the first
848*219b2ee8SDavid du Colombier64K of the data segment,
849*219b2ee8SDavid du Colombierthen usually only a few large arrays
850*219b2ee8SDavid du Colombierneed more expensive addressing modes.
851*219b2ee8SDavid du Colombier.LP
852*219b2ee8SDavid du ColombierFor the MIPS processor,
853*219b2ee8SDavid du Colombierthe loader makes a pass over the internal
854*219b2ee8SDavid du Colombierstructures,
855*219b2ee8SDavid du Colombierexchanging instructions to try
856*219b2ee8SDavid du Colombierto fill ``delay slots'' with useful work.
857*219b2ee8SDavid du ColombierIf a useful instruction cannot be found
858*219b2ee8SDavid du Colombierto fill a delay slot,
859*219b2ee8SDavid du Colombierthe loader will insert
860*219b2ee8SDavid du Colombier``noop''
861*219b2ee8SDavid du Colombierinstructions.
862*219b2ee8SDavid du ColombierThis pass is very expensive and does not
863*219b2ee8SDavid du Colombierdo a good job.
864*219b2ee8SDavid du ColombierAbout 40% of all instructions are in
865*219b2ee8SDavid du Colombierdelay slots.
866*219b2ee8SDavid du ColombierAbout 65% of these are useful instructions and
867*219b2ee8SDavid du Colombier35% are ``noops.''
868*219b2ee8SDavid du ColombierThe vendor-supplied assembler does this job
869*219b2ee8SDavid du Colombiermore effectively,
870*219b2ee8SDavid du Colombierfilling about 80%
871*219b2ee8SDavid du Colombierof the delay slots with useful instructions.
872*219b2ee8SDavid du Colombier.LP
873*219b2ee8SDavid du ColombierOn the 68020 processor,
874*219b2ee8SDavid du Colombierbranch instructions come in a variety of
875*219b2ee8SDavid du Colombiersizes depending on the relative distance
876*219b2ee8SDavid du Colombierof the branch.
877*219b2ee8SDavid du ColombierThus the size of branch instructions
878*219b2ee8SDavid du Colombiercan be mutually dependent.
879*219b2ee8SDavid du ColombierThe loader uses a multiple pass algorithm
880*219b2ee8SDavid du Colombierto resolve the branch lengths
881*219b2ee8SDavid du Colombier[Szy78].
882*219b2ee8SDavid du ColombierInitially, all branches are assumed minimal length.
883*219b2ee8SDavid du ColombierOn each subsequent pass,
884*219b2ee8SDavid du Colombierthe branches are reassessed
885*219b2ee8SDavid du Colombierand expanded if necessary.
886*219b2ee8SDavid du ColombierWhen no more expansions occur,
887*219b2ee8SDavid du Colombierthe locations of the instructions in
888*219b2ee8SDavid du Colombierthe text segment are known.
889*219b2ee8SDavid du Colombier.LP
890*219b2ee8SDavid du ColombierOn the MIPS processor,
891*219b2ee8SDavid du Colombierall instructions are one size.
892*219b2ee8SDavid du ColombierA single pass over the instructions will
893*219b2ee8SDavid du Colombierdetermine the locations of all addresses
894*219b2ee8SDavid du Colombierin the text segment.
895*219b2ee8SDavid du Colombier.LP
896*219b2ee8SDavid du ColombierThe last pass of the loader produces the
897*219b2ee8SDavid du Colombierexecutable binary.
898*219b2ee8SDavid du ColombierA symbol table and other tables are
899*219b2ee8SDavid du Colombierproduced to help the debugger to
900*219b2ee8SDavid du Colombierinterpret the binary symbolically.
901*219b2ee8SDavid du Colombier.LP
902*219b2ee8SDavid du ColombierThe loader places absolute source line numbers in the symbol table.
903*219b2ee8SDavid du ColombierThe name and absolute line number of all
904*219b2ee8SDavid du Colombier.CW #include
905*219b2ee8SDavid du Colombierfiles is also placed in the
906*219b2ee8SDavid du Colombiersymbol table so that the debuggers can
907*219b2ee8SDavid du Colombierassociate object code to source files.
908*219b2ee8SDavid du Colombier.NH
909*219b2ee8SDavid du ColombierPerformance
910*219b2ee8SDavid du Colombier.LP
911*219b2ee8SDavid du ColombierThe following is a table of the source size of the MIPS
912*219b2ee8SDavid du Colombiercompiler.
913*219b2ee8SDavid du Colombier.DS
914*219b2ee8SDavid du Colombier.ta .1i .6i
915*219b2ee8SDavid du Colombier	lines	module
916*219b2ee8SDavid du Colombier	\0509	machine-independent headers
917*219b2ee8SDavid du Colombier	1070	machine-independent YACC source
918*219b2ee8SDavid du Colombier	6090	machine-independent C source
919*219b2ee8SDavid du Colombier
920*219b2ee8SDavid du Colombier	\0545	machine-dependent headers
921*219b2ee8SDavid du Colombier	6532	machine-dependent C source
922*219b2ee8SDavid du Colombier
923*219b2ee8SDavid du Colombier	\0298	loader headers
924*219b2ee8SDavid du Colombier	5215	loader C source
925*219b2ee8SDavid du Colombier.DE
926*219b2ee8SDavid du Colombier.LP
927*219b2ee8SDavid du ColombierThe following table shows timing
928*219b2ee8SDavid du Colombierof a test program
929*219b2ee8SDavid du Colombierthat plays checkers, running on a MIPS R4000.
930*219b2ee8SDavid du ColombierThe test program is 26 files totaling 12600 lines of C.
931*219b2ee8SDavid du ColombierThe execution time does not significantly
932*219b2ee8SDavid du Colombierdepend on library implementation.
933*219b2ee8SDavid du ColombierSince no other compiler runs on Plan 9,
934*219b2ee8SDavid du Colombierthe Plan 9 tests were done with the Plan 9 operating system;
935*219b2ee8SDavid du Colombierthe other tests were done on the vendor's operating system.
936*219b2ee8SDavid du ColombierThe hardware was identical in both cases.
937*219b2ee8SDavid du ColombierThe optimizer in the vendor's compiler
938*219b2ee8SDavid du Colombieris reputed to be extremely good.
939*219b2ee8SDavid du Colombier.DS
940*219b2ee8SDavid du Colombier.ta .1i .9i
941*219b2ee8SDavid du Colombier	\0\04.49s	Plan 9 \f(CWvc\fP \f(CW-N\fP compile time (opposite of \f(CW-O\fP)
942*219b2ee8SDavid du Colombier	\0\01.72s	Plan 9 \f(CWvc\fP \f(CW-N\fP load time
943*219b2ee8SDavid du Colombier	148.69s	Plan 9 \f(CWvc\fP \f(CW-N\fP run time
944*219b2ee8SDavid du Colombier
945*219b2ee8SDavid du Colombier	\015.07s	Plan 9 \f(CWvc\fP compile time (\f(CW-O\fP implicit)
946*219b2ee8SDavid du Colombier	\0\01.66s	Plan 9 \f(CWvc\fP load time
947*219b2ee8SDavid du Colombier	\089.96s	Plan 9 \f(CWvc\fP run time
948*219b2ee8SDavid du Colombier
949*219b2ee8SDavid du Colombier	\014.83s	vendor \f(CWcc\fP compile time
950*219b2ee8SDavid du Colombier	\0\00.38s	vendor \f(CWcc\fP load time
951*219b2ee8SDavid du Colombier	104.75s	vendor \f(CWcc\fP run time
952*219b2ee8SDavid du Colombier
953*219b2ee8SDavid du Colombier	\043.59s	vendor \f(CWcc\fP \f(CW-O\fP compile time
954*219b2ee8SDavid du Colombier	\0\00.38s	vendor \f(CWcc\fP \f(CW-O\fP load time
955*219b2ee8SDavid du Colombier	\076.19s	vendor \f(CWcc\fP \f(CW-O\fP run time
956*219b2ee8SDavid du Colombier
957*219b2ee8SDavid du Colombier	\0\08.19s	vendor \f(CWcc\fP \f(CW-O3\fP compile time
958*219b2ee8SDavid du Colombier	\035.97s	vendor \f(CWcc\fP \f(CW-O3\fP load time
959*219b2ee8SDavid du Colombier	\071.16s	vendor \f(CWcc\fP \f(CW-O3\fP run time
960*219b2ee8SDavid du Colombier.DE
961*219b2ee8SDavid du Colombier.LP
962*219b2ee8SDavid du ColombierTo compare the Intel compiler,
963*219b2ee8SDavid du Colombiera program that is about 40% bit manipulation and
964*219b2ee8SDavid du Colombierabout 60% single precision floating point was
965*219b2ee8SDavid du Colombierrun on the same 33 MHz 486, once under Windows
966*219b2ee8SDavid du Colombiercompiled with the Watcom compiler, version 10.0,
967*219b2ee8SDavid du Colombierin 16-bit mode and once under
968*219b2ee8SDavid du ColombierPlan 9 in 32-bit mode.
969*219b2ee8SDavid du ColombierThe Plan 9 execution time was 27 sec while the Windows
970*219b2ee8SDavid du Colombierexecution time was 31 sec.
971*219b2ee8SDavid du Colombier.NH
972*219b2ee8SDavid du ColombierConclusions
973*219b2ee8SDavid du Colombier.LP
974*219b2ee8SDavid du ColombierThe new compilers compile
975*219b2ee8SDavid du Colombierquickly,
976*219b2ee8SDavid du Colombierload slowly,
977*219b2ee8SDavid du Colombierand produce
978*219b2ee8SDavid du Colombiermedium quality
979*219b2ee8SDavid du Colombierobject code.
980*219b2ee8SDavid du ColombierThe compilers are relatively
981*219b2ee8SDavid du Colombierportable,
982*219b2ee8SDavid du Colombierrequiring but a couple of weeks' work to
983*219b2ee8SDavid du Colombierproduce a compiler for a different computer.
984*219b2ee8SDavid du ColombierFor Plan 9,
985*219b2ee8SDavid du Colombierwhere we needed several compilers
986*219b2ee8SDavid du Colombierwith specialized features and
987*219b2ee8SDavid du Colombierour own object formats,
988*219b2ee8SDavid du Colombierthis project was indispensable.
989*219b2ee8SDavid du ColombierIt is also necessary for us to
990*219b2ee8SDavid du Colombierbe able to freely distribute our compilers
991*219b2ee8SDavid du Colombierwith the Plan 9 distribution.
992*219b2ee8SDavid du Colombier.LP
993*219b2ee8SDavid du ColombierTwo problems have come up in retrospect.
994*219b2ee8SDavid du ColombierThe first has to do with the
995*219b2ee8SDavid du Colombierdivision of labor between compiler and loader.
996*219b2ee8SDavid du ColombierPlan 9 runs on multi-processors and as such
997*219b2ee8SDavid du Colombiercompilations are often done in parallel.
998*219b2ee8SDavid du ColombierUnfortunately,
999*219b2ee8SDavid du Colombierall compilations must be complete before loading
1000*219b2ee8SDavid du Colombiercan begin.
1001*219b2ee8SDavid du ColombierThe load is then single-threaded.
1002*219b2ee8SDavid du ColombierWith this model,
1003*219b2ee8SDavid du Colombierany shift of work from compile to load
1004*219b2ee8SDavid du Colombierresults in a significant increase in real time.
1005*219b2ee8SDavid du ColombierThe same is true of libraries that are compiled
1006*219b2ee8SDavid du Colombierinfrequently and loaded often.
1007*219b2ee8SDavid du ColombierIn the future,
1008*219b2ee8SDavid du Colombierwe may try to put some of the loader work
1009*219b2ee8SDavid du Colombierback into the compiler.
1010*219b2ee8SDavid du Colombier.LP
1011*219b2ee8SDavid du ColombierThe second problem comes from
1012*219b2ee8SDavid du Colombierthe various optimizations performed over several
1013*219b2ee8SDavid du Colombierpasses.
1014*219b2ee8SDavid du ColombierOften optimizations in different passes depend
1015*219b2ee8SDavid du Colombieron each other.
1016*219b2ee8SDavid du ColombierIterating the passes could compromise efficiency,
1017*219b2ee8SDavid du Colombieror even loop.
1018*219b2ee8SDavid du ColombierWe see no real solution to this problem.
1019*219b2ee8SDavid du Colombier.NH
1020*219b2ee8SDavid du ColombierReferences
1021*219b2ee8SDavid du Colombier.LP
1022*219b2ee8SDavid du Colombier[Aho87] A. V. Aho, R. Sethi, and J. D. Ullman,
1023*219b2ee8SDavid du Colombier.I
1024*219b2ee8SDavid du ColombierCompilers \- Principles, Techniques, and Tools,
1025*219b2ee8SDavid du Colombier.R
1026*219b2ee8SDavid du ColombierAddison Wesley,
1027*219b2ee8SDavid du ColombierReading, MA,
1028*219b2ee8SDavid du Colombier1987.
1029*219b2ee8SDavid du Colombier.LP
1030*219b2ee8SDavid du Colombier[ANSI90] \f2American National Standard for Information Systems \-
1031*219b2ee8SDavid du ColombierProgramming Language C\f1, American National Standards Institute, Inc.,
1032*219b2ee8SDavid du ColombierNew York, 1990.
1033*219b2ee8SDavid du Colombier.LP
1034*219b2ee8SDavid du Colombier[Dav91] J. W. Davidson and D. B. Whalley,
1035*219b2ee8SDavid du Colombier``Methods for Saving and Restoring Register Values across Function Calls'',
1036*219b2ee8SDavid du Colombier.I
1037*219b2ee8SDavid du ColombierSoftware\-Practice and Experience,
1038*219b2ee8SDavid du Colombier.R
1039*219b2ee8SDavid du ColombierVol 21(2), pp. 149-165, February 1991.
1040*219b2ee8SDavid du Colombier.LP
1041*219b2ee8SDavid du Colombier[Joh79] S. C. Johnson,
1042*219b2ee8SDavid du Colombier``YACC \- Yet Another Compiler Compiler'',
1043*219b2ee8SDavid du Colombier.I
1044*219b2ee8SDavid du ColombierUNIX Programmer's Manual, Seventh Ed., Vol. 2A,
1045*219b2ee8SDavid du Colombier.R
1046*219b2ee8SDavid du ColombierAT&T Bell Laboratories,
1047*219b2ee8SDavid du ColombierMurray Hill, NJ,
1048*219b2ee8SDavid du Colombier1979.
1049*219b2ee8SDavid du Colombier.LP
1050*219b2ee8SDavid du Colombier[Set70] R. Sethi and J. D. Ullman,
1051*219b2ee8SDavid du Colombier``The Generation of Optimal Code for Arithmetic Expressions'',
1052*219b2ee8SDavid du Colombier.I
1053*219b2ee8SDavid du ColombierJournal of the ACM,
1054*219b2ee8SDavid du Colombier.R
1055*219b2ee8SDavid du ColombierVol 17(4), pp. 715-728, 1970.
1056*219b2ee8SDavid du Colombier.LP
1057*219b2ee8SDavid du Colombier[Szy78] T. G. Szymanski,
1058*219b2ee8SDavid du Colombier``Assembling Code for Machines with Span-dependent Instructions'',
1059*219b2ee8SDavid du Colombier.I
1060*219b2ee8SDavid du ColombierCommunications of the ACM,
1061*219b2ee8SDavid du Colombier.R
1062*219b2ee8SDavid du ColombierVol 21(4), pp. 300-308, 1978.
1063