xref: /inferno-os/doc/compiler.ms (revision 46439007cf417cbd9ac8049bb4122c890097a0fa)
1*46439007SCharles.Forsyth.TL
2*46439007SCharles.ForsythPlan 9 C Compilers \(dg
3*46439007SCharles.Forsyth.AU
4*46439007SCharles.Forsyth.I "Ken Thompson"
5*46439007SCharles.Forsyth.AI
6*46439007SCharles.Forsythken@plan9.bell-labs.com
7*46439007SCharles.Forsyth.AB
8*46439007SCharles.Forsyth.FS
9*46439007SCharles.Forsyth\l'1i'
10*46439007SCharles.Forsyth.br
11*46439007SCharles.Forsyth\(dg Originally appeared, in a different form, in
12*46439007SCharles.Forsyth.I
13*46439007SCharles.ForsythProceedings of the Summer 1990 UKUUG Conference,
14*46439007SCharles.Forsyth.R
15*46439007SCharles.Forsythpp. 41-51,
16*46439007SCharles.ForsythLondon, 1990.
17*46439007SCharles.ForsythThis version first appeared in
18*46439007SCharles.Forsyth.I "Plan 9 Programmer's Manual, Volume 2 (Second Edition)" .
19*46439007SCharles.ForsythThe Plan 9 compiler suite forms the basis for the portable Inferno compiler suite,
20*46439007SCharles.Forsythmaking this paper still relevant.
21*46439007SCharles.Forsyth.FE
22*46439007SCharles.ForsythThis paper describes the overall structure and function of the Plan 9 C compilers.
23*46439007SCharles.ForsythA more detailed implementation document
24*46439007SCharles.Forsythfor any one of the compilers
25*46439007SCharles.Forsythis yet to be written.
26*46439007SCharles.Forsyth.AE
27*46439007SCharles.Forsyth.NH
28*46439007SCharles.ForsythIntroduction
29*46439007SCharles.Forsyth.LP
30*46439007SCharles.ForsythThere are many compilers in the series.
31*46439007SCharles.ForsythEight of the compilers (MIPS 3000, SPARC, Intel 386, AMD64, Power PC, ARM, DEC Alpha, and Motorola 68020)
32*46439007SCharles.Forsythare considered active and are used to compile
33*46439007SCharles.Forsythcurrent versions of Plan 9 or Inferno.
34*46439007SCharles.ForsythSeveral others (Motorola 68000, Intel 960, AMD 29000) have had only limited use, such as
35*46439007SCharles.Forsythto program peripherals or experimental devices.
36*46439007SCharles.Forsyth.NH
37*46439007SCharles.ForsythStructure
38*46439007SCharles.Forsyth.LP
39*46439007SCharles.ForsythThe compiler is a single program that produces an
40*46439007SCharles.Forsythobject file.
41*46439007SCharles.ForsythCombined in the compiler are the traditional
42*46439007SCharles.Forsythroles of preprocessor, lexical analyzer, parser, code generator,
43*46439007SCharles.Forsythlocal optimizer,
44*46439007SCharles.Forsythand first half of the assembler.
45*46439007SCharles.ForsythThe object files are binary forms of assembly
46*46439007SCharles.Forsythlanguage,
47*46439007SCharles.Forsythsimilar to what might be passed between
48*46439007SCharles.Forsyththe first and second passes of an assembler.
49*46439007SCharles.Forsyth.LP
50*46439007SCharles.ForsythObject files and libraries
51*46439007SCharles.Forsythare combined by a loader
52*46439007SCharles.Forsythprogram to produce the executable binary.
53*46439007SCharles.ForsythThe loader combines the roles of second half
54*46439007SCharles.Forsythof the assembler, global optimizer, and loader.
55*46439007SCharles.ForsythThe names of the compliers, loaders, and assemblers
56*46439007SCharles.Forsythare as follows:
57*46439007SCharles.Forsyth.DS
58*46439007SCharles.Forsyth.ta 1.5i
59*46439007SCharles.Forsyth.de Ta
60*46439007SCharles.Forsyth\\$1	\f(CW\\$2\fP  \f(CW\\$3\fP  \f(CW\\$4\fP
61*46439007SCharles.Forsyth..
62*46439007SCharles.Forsyth.Ta SPARC kc kl ka
63*46439007SCharles.Forsyth.Ta PowerPC qc ql qa
64*46439007SCharles.Forsyth.Ta MIPS vc vl va
65*46439007SCharles.Forsyth.Ta Motorola\ 68000 1c 1l 1a
66*46439007SCharles.Forsyth.Ta Motorola\ 68020 2c 2l 2a
67*46439007SCharles.Forsyth.Ta ARM 5c 5l 5a
68*46439007SCharles.Forsyth.Ta AMD64 6c 6l 6a
69*46439007SCharles.Forsyth.Ta DEC\ Alpha 7c 7l 7a
70*46439007SCharles.Forsyth.Ta Intel\ 386 8c 8l 8a
71*46439007SCharles.Forsyth.Ta AMD\ 29000 9c 9l 9a
72*46439007SCharles.Forsyth.DE
73*46439007SCharles.ForsythThere is a further breakdown
74*46439007SCharles.Forsythin the source of the compilers into
75*46439007SCharles.Forsythobject-independent and
76*46439007SCharles.Forsythobject-dependent
77*46439007SCharles.Forsythparts.
78*46439007SCharles.ForsythAll of the object-independent parts
79*46439007SCharles.Forsythare combined into source files in the
80*46439007SCharles.Forsythdirectory
81*46439007SCharles.Forsyth.CW /sys/src/cmd/cc .
82*46439007SCharles.ForsythThe object-dependent parts are collected
83*46439007SCharles.Forsythin a separate directory for each compiler,
84*46439007SCharles.Forsythfor example
85*46439007SCharles.Forsyth.CW /sys/src/cmd/vc .
86*46439007SCharles.ForsythAll of the code,
87*46439007SCharles.Forsythboth object-independent and
88*46439007SCharles.Forsythobject-dependent,
89*46439007SCharles.Forsythis machine-independent
90*46439007SCharles.Forsythand may be cross-compiled and executed on any
91*46439007SCharles.Forsythof the architectures.
92*46439007SCharles.Forsyth.NH
93*46439007SCharles.ForsythThe Language
94*46439007SCharles.Forsyth.LP
95*46439007SCharles.ForsythThe compiler implements ANSI C with some
96*46439007SCharles.Forsythrestrictions and extensions
97*46439007SCharles.Forsyth[ANSI90].
98*46439007SCharles.ForsythMost of the restrictions are due to
99*46439007SCharles.Forsythpersonal preference, while
100*46439007SCharles.Forsythmost of the extensions were to help in
101*46439007SCharles.Forsyththe implementation of Plan 9.
102*46439007SCharles.ForsythThere are other departures from the standard,
103*46439007SCharles.Forsythparticularly in the libraries,
104*46439007SCharles.Forsyththat are beyond the scope of this
105*46439007SCharles.Forsythpaper.
106*46439007SCharles.Forsyth.NH 2
107*46439007SCharles.ForsythRegister, volatile, const
108*46439007SCharles.Forsyth.LP
109*46439007SCharles.ForsythThe keyword
110*46439007SCharles.Forsyth.CW register
111*46439007SCharles.Forsythis recognized syntactically
112*46439007SCharles.Forsythbut is semantically ignored.
113*46439007SCharles.ForsythThus taking the address of a
114*46439007SCharles.Forsyth.CW register
115*46439007SCharles.Forsythvariable is not diagnosed.
116*46439007SCharles.ForsythThe keyword
117*46439007SCharles.Forsyth.CW volatile
118*46439007SCharles.Forsythdisables all optimizations, in particular registerization, of the corresponding variable.
119*46439007SCharles.ForsythThe keyword
120*46439007SCharles.Forsyth.CW const
121*46439007SCharles.Forsythgenerates warnings (if warnings are enabled by the compiler's
122*46439007SCharles.Forsyth.CW -w
123*46439007SCharles.Forsythoption) of non-constant use of the variable,
124*46439007SCharles.Forsythbut does not affect the generated code.
125*46439007SCharles.Forsyth.NH 2
126*46439007SCharles.ForsythThe preprocessor
127*46439007SCharles.Forsyth.LP
128*46439007SCharles.ForsythThe C preprocessor is probably the
129*46439007SCharles.Forsythbiggest departure from the ANSI standard.
130*46439007SCharles.Forsyth.LP
131*46439007SCharles.ForsythThe preprocessor built into the Plan 9 compilers does not support
132*46439007SCharles.Forsyth.CW #if ,
133*46439007SCharles.Forsythalthough it does handle
134*46439007SCharles.Forsyth.CW #ifdef
135*46439007SCharles.Forsythand
136*46439007SCharles.Forsyth.CW #include .
137*46439007SCharles.ForsythIf it is necessary to be more standard,
138*46439007SCharles.Forsyththe source text can first be run through the separate ANSI C
139*46439007SCharles.Forsythpreprocessor,
140*46439007SCharles.Forsyth.CW cpp .
141*46439007SCharles.Forsyth.NH 2
142*46439007SCharles.ForsythUnnamed substructures
143*46439007SCharles.Forsyth.LP
144*46439007SCharles.ForsythThe most important and most heavily used of the
145*46439007SCharles.Forsythextensions is the declaration of an
146*46439007SCharles.Forsythunnamed substructure or subunion.
147*46439007SCharles.ForsythFor example:
148*46439007SCharles.Forsyth.DS
149*46439007SCharles.Forsyth.CW
150*46439007SCharles.Forsyth.ta .1i .6i 1.1i 1.6i
151*46439007SCharles.Forsyth	typedef
152*46439007SCharles.Forsyth	struct	lock
153*46439007SCharles.Forsyth	{
154*46439007SCharles.Forsyth		int    locked;
155*46439007SCharles.Forsyth	} Lock;
156*46439007SCharles.Forsyth
157*46439007SCharles.Forsyth	typedef
158*46439007SCharles.Forsyth	struct	node
159*46439007SCharles.Forsyth	{
160*46439007SCharles.Forsyth		int	type;
161*46439007SCharles.Forsyth		union
162*46439007SCharles.Forsyth		{
163*46439007SCharles.Forsyth			double dval;
164*46439007SCharles.Forsyth			float  fval;
165*46439007SCharles.Forsyth			long   lval;
166*46439007SCharles.Forsyth		};
167*46439007SCharles.Forsyth		Lock;
168*46439007SCharles.Forsyth	} Node;
169*46439007SCharles.Forsyth
170*46439007SCharles.Forsyth	Lock*	lock;
171*46439007SCharles.Forsyth	Node*	node;
172*46439007SCharles.Forsyth.R
173*46439007SCharles.Forsyth.DE
174*46439007SCharles.ForsythThe declaration of
175*46439007SCharles.Forsyth.CW Node
176*46439007SCharles.Forsythhas an unnamed substructure of type
177*46439007SCharles.Forsyth.CW Lock
178*46439007SCharles.Forsythand an unnamed subunion.
179*46439007SCharles.ForsythOne use of this feature allows references to elements of the
180*46439007SCharles.Forsythsubunit to be accessed as if they were in
181*46439007SCharles.Forsyththe outer structure.
182*46439007SCharles.ForsythThus
183*46439007SCharles.Forsyth.CW node->dval
184*46439007SCharles.Forsythand
185*46439007SCharles.Forsyth.CW node->locked
186*46439007SCharles.Forsythare legitimate references.
187*46439007SCharles.Forsyth.LP
188*46439007SCharles.ForsythWhen an outer structure is used
189*46439007SCharles.Forsythin a context that is only legal for
190*46439007SCharles.Forsythan unnamed substructure,
191*46439007SCharles.Forsyththe compiler promotes the reference to the
192*46439007SCharles.Forsythunnamed substructure.
193*46439007SCharles.ForsythThis is true for references to structures and
194*46439007SCharles.Forsythto references to pointers to structures.
195*46439007SCharles.ForsythThis happens in assignment statements and
196*46439007SCharles.Forsythin argument passing where prototypes have been
197*46439007SCharles.Forsythdeclared.
198*46439007SCharles.ForsythThus, continuing with the example,
199*46439007SCharles.Forsyth.DS
200*46439007SCharles.Forsyth.CW
201*46439007SCharles.Forsyth.ta .1i .6i 1.1i 1.6i
202*46439007SCharles.Forsyth	lock = node;
203*46439007SCharles.Forsyth.R
204*46439007SCharles.Forsyth.DE
205*46439007SCharles.Forsythwould assign a pointer to the unnamed
206*46439007SCharles.Forsyth.CW Lock
207*46439007SCharles.Forsythin
208*46439007SCharles.Forsyththe
209*46439007SCharles.Forsyth.CW Node
210*46439007SCharles.Forsythto the variable
211*46439007SCharles.Forsyth.CW lock .
212*46439007SCharles.ForsythAnother example,
213*46439007SCharles.Forsyth.DS
214*46439007SCharles.Forsyth.CW
215*46439007SCharles.Forsyth.ta .1i .6i 1.1i 1.6i
216*46439007SCharles.Forsyth	extern void lock(Lock*);
217*46439007SCharles.Forsyth	func(...)
218*46439007SCharles.Forsyth	{
219*46439007SCharles.Forsyth		...
220*46439007SCharles.Forsyth		lock(node);
221*46439007SCharles.Forsyth		...
222*46439007SCharles.Forsyth	}
223*46439007SCharles.Forsyth.R
224*46439007SCharles.Forsyth.DE
225*46439007SCharles.Forsythwill pass a pointer to the
226*46439007SCharles.Forsyth.CW Lock
227*46439007SCharles.Forsythsubstructure.
228*46439007SCharles.Forsyth.LP
229*46439007SCharles.ForsythFinally, in places where context is insufficient to identify the unnamed structure,
230*46439007SCharles.Forsyththe type name (it must be a
231*46439007SCharles.Forsyth.CW typedef )
232*46439007SCharles.Forsythof the unnamed structure can be used as an identifier.
233*46439007SCharles.ForsythIn our example,
234*46439007SCharles.Forsyth.CW &node->Lock
235*46439007SCharles.Forsythgives the address of the anonymous
236*46439007SCharles.Forsyth.CW Lock
237*46439007SCharles.Forsythstructure.
238*46439007SCharles.Forsyth.NH 2
239*46439007SCharles.ForsythStructure displays
240*46439007SCharles.Forsyth.LP
241*46439007SCharles.ForsythA structure cast followed by a list of expressions in braces is
242*46439007SCharles.Forsythan expression with the type of the structure and elements assigned from
243*46439007SCharles.Forsyththe corresponding list.
244*46439007SCharles.ForsythStructures are now almost first-class citizens of the language.
245*46439007SCharles.ForsythIt is common to see code like this:
246*46439007SCharles.Forsyth.DS
247*46439007SCharles.Forsyth.CW
248*46439007SCharles.Forsyth.ta .1i
249*46439007SCharles.Forsyth	r = (Rectangle){point1, (Point){x,y+2}};
250*46439007SCharles.Forsyth.R
251*46439007SCharles.Forsyth.DE
252*46439007SCharles.Forsyth.NH 2
253*46439007SCharles.ForsythInitialization indexes
254*46439007SCharles.Forsyth.LP
255*46439007SCharles.ForsythIn initializers of arrays,
256*46439007SCharles.Forsythone may place a constant expression
257*46439007SCharles.Forsythin square brackets before an initializer.
258*46439007SCharles.ForsythThis causes the next initializer to assign
259*46439007SCharles.Forsyththe indicated element.
260*46439007SCharles.ForsythFor example:
261*46439007SCharles.Forsyth.DS
262*46439007SCharles.Forsyth.CW
263*46439007SCharles.Forsyth.ta .1i .6i 1.6i
264*46439007SCharles.Forsyth	enum	errors
265*46439007SCharles.Forsyth	{
266*46439007SCharles.Forsyth		Etoobig,
267*46439007SCharles.Forsyth		Ealarm,
268*46439007SCharles.Forsyth		Egreg
269*46439007SCharles.Forsyth	};
270*46439007SCharles.Forsyth	char* errstrings[] =
271*46439007SCharles.Forsyth	{
272*46439007SCharles.Forsyth		[Ealarm]	"Alarm call",
273*46439007SCharles.Forsyth		[Egreg]	"Panic: out of mbufs",
274*46439007SCharles.Forsyth		[Etoobig]	"Arg list too long",
275*46439007SCharles.Forsyth	};
276*46439007SCharles.Forsyth.R
277*46439007SCharles.Forsyth.DE
278*46439007SCharles.ForsythIn the same way,
279*46439007SCharles.Forsythindividual structures members may
280*46439007SCharles.Forsythbe initialized in any order by preceding the initialization with
281*46439007SCharles.Forsyth.CW .tagname .
282*46439007SCharles.ForsythBoth forms allow an optional
283*46439007SCharles.Forsyth.CW = ,
284*46439007SCharles.Forsythto be compatible with a proposed
285*46439007SCharles.Forsythextension to ANSI C.
286*46439007SCharles.Forsyth.NH 2
287*46439007SCharles.ForsythExternal register
288*46439007SCharles.Forsyth.LP
289*46439007SCharles.ForsythThe declaration
290*46439007SCharles.Forsyth.CW extern
291*46439007SCharles.Forsyth.CW register
292*46439007SCharles.Forsythwill dedicate a register to
293*46439007SCharles.Forsytha variable on a global basis.
294*46439007SCharles.ForsythIt can be used only under special circumstances.
295*46439007SCharles.ForsythExternal register variables must be identically
296*46439007SCharles.Forsythdeclared in all modules and
297*46439007SCharles.Forsythlibraries.
298*46439007SCharles.ForsythThe feature is not intended for efficiency,
299*46439007SCharles.Forsythalthough it can produce efficient code;
300*46439007SCharles.Forsythrather it represents a unique storage class that
301*46439007SCharles.Forsythwould be hard to get any other way.
302*46439007SCharles.ForsythOn a shared-memory multi-processor,
303*46439007SCharles.Forsythan external register is
304*46439007SCharles.Forsythone-per-processor and neither one-per-procedure (automatic)
305*46439007SCharles.Forsythor one-per-system (external).
306*46439007SCharles.ForsythIt is used for two variables in the Plan 9 kernel,
307*46439007SCharles.Forsyth.CW u
308*46439007SCharles.Forsythand
309*46439007SCharles.Forsyth.CW m .
310*46439007SCharles.Forsyth.CW U
311*46439007SCharles.Forsythis a pointer to the structure representing the currently running process
312*46439007SCharles.Forsythand
313*46439007SCharles.Forsyth.CW m
314*46439007SCharles.Forsythis a pointer to the per-machine data structure.
315*46439007SCharles.Forsyth.NH 2
316*46439007SCharles.ForsythLong long
317*46439007SCharles.Forsyth.LP
318*46439007SCharles.ForsythThe compilers accept
319*46439007SCharles.Forsyth.CW long
320*46439007SCharles.Forsyth.CW long
321*46439007SCharles.Forsythas a basic type meaning 64-bit integer.
322*46439007SCharles.ForsythOn all of the machines
323*46439007SCharles.Forsyththis type is synthesized from 32-bit instructions.
324*46439007SCharles.Forsyth.NH 2
325*46439007SCharles.ForsythPragma
326*46439007SCharles.Forsyth.LP
327*46439007SCharles.ForsythThe compilers accept
328*46439007SCharles.Forsyth.CW #pragma
329*46439007SCharles.Forsyth.CW lib
330*46439007SCharles.Forsyth.I libname
331*46439007SCharles.Forsythand pass the
332*46439007SCharles.Forsythlibrary name string uninterpreted
333*46439007SCharles.Forsythto the loader.
334*46439007SCharles.ForsythThe loader uses the library name to
335*46439007SCharles.Forsythfind libraries to load.
336*46439007SCharles.ForsythIf the name contains
337*46439007SCharles.Forsyth.CW $O ,
338*46439007SCharles.Forsythit is replaced with
339*46439007SCharles.Forsyththe single character object type of the compiler
340*46439007SCharles.Forsyth(e.g.,
341*46439007SCharles.Forsyth.CW v
342*46439007SCharles.Forsythfor the MIPS).
343*46439007SCharles.ForsythIf the name contains
344*46439007SCharles.Forsyth.CW $M ,
345*46439007SCharles.Forsythit is replaced with
346*46439007SCharles.Forsyththe architecture type for the compiler
347*46439007SCharles.Forsyth(e.g.,
348*46439007SCharles.Forsyth.CW mips
349*46439007SCharles.Forsythfor the MIPS).
350*46439007SCharles.ForsythIf the name starts with
351*46439007SCharles.Forsyth.CW /
352*46439007SCharles.Forsythit is an absolute pathname;
353*46439007SCharles.Forsythif it starts with
354*46439007SCharles.Forsyth.CW .
355*46439007SCharles.Forsyththen it is searched for in the loader's current directory.
356*46439007SCharles.ForsythOtherwise, the name is searched from
357*46439007SCharles.Forsyth.CW /$M/lib .
358*46439007SCharles.ForsythSuch
359*46439007SCharles.Forsyth.CW #pragma
360*46439007SCharles.Forsythstatements in header files guarantee that the correct
361*46439007SCharles.Forsythlibraries are always linked with a program without the
362*46439007SCharles.Forsythneed to specify them explicitly at link time.
363*46439007SCharles.Forsyth.LP
364*46439007SCharles.ForsythThey also accept
365*46439007SCharles.Forsyth.CW #pragma
366*46439007SCharles.Forsyth.CW packed
367*46439007SCharles.Forsyth.CW on
368*46439007SCharles.Forsyth(or
369*46439007SCharles.Forsyth.CW yes
370*46439007SCharles.Forsythor
371*46439007SCharles.Forsyth.CW 1 )
372*46439007SCharles.Forsythto cause subsequently declared data, until
373*46439007SCharles.Forsyth.CW #pragma
374*46439007SCharles.Forsyth.CW packed
375*46439007SCharles.Forsyth.CW off
376*46439007SCharles.Forsyth(or
377*46439007SCharles.Forsyth.CW no
378*46439007SCharles.Forsythor
379*46439007SCharles.Forsyth.CW 0 ),
380*46439007SCharles.Forsythto be laid out in memory tightly packed in successive bytes, disregarding
381*46439007SCharles.Forsyththe usual alignment rules.
382*46439007SCharles.ForsythAccessing such data can cause faults.
383*46439007SCharles.Forsyth.LP
384*46439007SCharles.ForsythSimilarly,
385*46439007SCharles.Forsyth.CW #pragma
386*46439007SCharles.Forsyth.CW profile
387*46439007SCharles.Forsyth.CW off
388*46439007SCharles.Forsyth(or
389*46439007SCharles.Forsyth.CW no
390*46439007SCharles.Forsythor
391*46439007SCharles.Forsyth.CW 0 )
392*46439007SCharles.Forsythcauses subsequently declared functions, until
393*46439007SCharles.Forsyth.CW #pragma
394*46439007SCharles.Forsyth.CW profile
395*46439007SCharles.Forsyth.CW on
396*46439007SCharles.Forsyth(or
397*46439007SCharles.Forsyth.CW yes
398*46439007SCharles.Forsythor
399*46439007SCharles.Forsyth.CW 1 ),
400*46439007SCharles.Forsythto be marked as unprofiled.
401*46439007SCharles.ForsythSuch functions will not be profiled when
402*46439007SCharles.Forsythprofiling is enabled for the rest of the program.
403*46439007SCharles.Forsyth.LP
404*46439007SCharles.ForsythTwo
405*46439007SCharles.Forsyth.CW #pragma
406*46439007SCharles.Forsythstatements allow type-checking of
407*46439007SCharles.Forsyth.CW print -like
408*46439007SCharles.Forsythfunctions.
409*46439007SCharles.ForsythThe first, of the form
410*46439007SCharles.Forsyth.P1
411*46439007SCharles.Forsyth#pragma varargck argpos error 2
412*46439007SCharles.Forsyth.P2
413*46439007SCharles.Forsythtells the compiler that the second argument to
414*46439007SCharles.Forsyth.CW error
415*46439007SCharles.Forsythis a
416*46439007SCharles.Forsyth.CW print
417*46439007SCharles.Forsythformat string (see the manual page
418*46439007SCharles.Forsyth.I print (2))
419*46439007SCharles.Forsyththat specifies how to format
420*46439007SCharles.Forsyth.CW error 's
421*46439007SCharles.Forsythsubsequent arguments.
422*46439007SCharles.ForsythThe second, of the form
423*46439007SCharles.Forsyth.P1
424*46439007SCharles.Forsyth#pragma varargck type "s" char*
425*46439007SCharles.Forsyth.P2
426*46439007SCharles.Forsythsays that the
427*46439007SCharles.Forsyth.CW print
428*46439007SCharles.Forsythformat verb
429*46439007SCharles.Forsyth.CW s
430*46439007SCharles.Forsythprocesses an argument of
431*46439007SCharles.Forsythtype
432*46439007SCharles.Forsyth.CW char* .
433*46439007SCharles.ForsythIf the compiler's
434*46439007SCharles.Forsyth.CW -F
435*46439007SCharles.Forsythoption is enabled, the compiler will use this information
436*46439007SCharles.Forsythto report type violations in the arguments to
437*46439007SCharles.Forsyth.CW print ,
438*46439007SCharles.Forsyth.CW error ,
439*46439007SCharles.Forsythand similar routines.
440*46439007SCharles.Forsyth.NH
441*46439007SCharles.ForsythObject module conventions
442*46439007SCharles.Forsyth.LP
443*46439007SCharles.ForsythThe overall conventions of the runtime environment
444*46439007SCharles.Forsythare important
445*46439007SCharles.Forsythto runtime efficiency.
446*46439007SCharles.ForsythIn this section,
447*46439007SCharles.Forsythseveral of these conventions are discussed.
448*46439007SCharles.Forsyth.NH 2
449*46439007SCharles.ForsythRegister saving
450*46439007SCharles.Forsyth.LP
451*46439007SCharles.ForsythIn the Plan 9 compilers,
452*46439007SCharles.Forsyththe caller of a procedure saves the registers.
453*46439007SCharles.ForsythWith caller-saves,
454*46439007SCharles.Forsyththe leaf procedures can use all the
455*46439007SCharles.Forsythregisters and never save them.
456*46439007SCharles.ForsythIf you spend a lot of time at the leaves,
457*46439007SCharles.Forsyththis seems preferable.
458*46439007SCharles.ForsythWith callee-saves,
459*46439007SCharles.Forsyththe saving of the registers is done
460*46439007SCharles.Forsythin the single point of entry and return.
461*46439007SCharles.ForsythIf you are interested in space,
462*46439007SCharles.Forsyththis seems preferable.
463*46439007SCharles.ForsythIn both,
464*46439007SCharles.Forsyththere is a degree of uncertainty
465*46439007SCharles.Forsythabout what registers need to be saved.
466*46439007SCharles.ForsythCallee-saved registers make it difficult to
467*46439007SCharles.Forsythfind variables in registers in debuggers.
468*46439007SCharles.ForsythCallee-saved registers also complicate
469*46439007SCharles.Forsyththe implementation of
470*46439007SCharles.Forsyth.CW longjmp .
471*46439007SCharles.ForsythThe convincing argument is
472*46439007SCharles.Forsyththat with caller-saves,
473*46439007SCharles.Forsyththe decision to registerize a variable
474*46439007SCharles.Forsythcan include the cost of saving the register
475*46439007SCharles.Forsythacross calls.
476*46439007SCharles.ForsythFor a further discussion of caller- vs. callee-saves,
477*46439007SCharles.Forsythsee the paper by Davidson and Whalley [Dav91].
478*46439007SCharles.Forsyth.LP
479*46439007SCharles.ForsythIn the Plan 9 operating system,
480*46439007SCharles.Forsythcalls to the kernel look like normal procedure
481*46439007SCharles.Forsythcalls, which means
482*46439007SCharles.Forsyththe caller
483*46439007SCharles.Forsythhas saved the registers and the system
484*46439007SCharles.Forsythentry does not have to.
485*46439007SCharles.ForsythThis makes system calls considerably faster.
486*46439007SCharles.ForsythSince this is a potential security hole,
487*46439007SCharles.Forsythand can lead to non-determinism,
488*46439007SCharles.Forsyththe system may eventually save the registers
489*46439007SCharles.Forsython entry,
490*46439007SCharles.Forsythor more likely clear the registers on return.
491*46439007SCharles.Forsyth.NH 2
492*46439007SCharles.ForsythCalling convention
493*46439007SCharles.Forsyth.LP
494*46439007SCharles.ForsythOlder C compilers maintain a frame pointer, which is at a known constant
495*46439007SCharles.Forsythoffset from the stack pointer within each function.
496*46439007SCharles.ForsythFor machines where the stack grows towards zero,
497*46439007SCharles.Forsyththe argument pointer is at a known constant offset
498*46439007SCharles.Forsythfrom the frame pointer.
499*46439007SCharles.ForsythSince the stack grows down in Plan 9,
500*46439007SCharles.Forsyththe Plan 9 compilers
501*46439007SCharles.Forsythkeep neither an
502*46439007SCharles.Forsythexplicit frame pointer nor
503*46439007SCharles.Forsythan explicit argument pointer;
504*46439007SCharles.Forsythinstead they generate addresses relative to the stack pointer.
505*46439007SCharles.Forsyth.LP
506*46439007SCharles.ForsythOn some architectures, the first argument to a subroutine is passed in a register.
507*46439007SCharles.Forsyth.NH 2
508*46439007SCharles.ForsythFunctions returning structures
509*46439007SCharles.Forsyth.LP
510*46439007SCharles.ForsythStructures longer than one word are awkward to implement
511*46439007SCharles.Forsythsince they do not fit in registers and must
512*46439007SCharles.Forsythbe passed around in memory.
513*46439007SCharles.ForsythFunctions that return structures
514*46439007SCharles.Forsythare particularly clumsy.
515*46439007SCharles.ForsythThe Plan 9 compilers pass the return address of
516*46439007SCharles.Forsytha structure as the first argument of a
517*46439007SCharles.Forsythfunction that has a structure return value.
518*46439007SCharles.ForsythThus
519*46439007SCharles.Forsyth.DS
520*46439007SCharles.Forsyth.CW
521*46439007SCharles.Forsyth.ta .1i .6i 1.1i 1.6i
522*46439007SCharles.Forsyth	x = f(...)
523*46439007SCharles.Forsyth.R
524*46439007SCharles.Forsyth.DE
525*46439007SCharles.Forsythis rewritten as
526*46439007SCharles.Forsyth.DS
527*46439007SCharles.Forsyth.CW
528*46439007SCharles.Forsyth.ta .1i .6i 1.1i 1.6i
529*46439007SCharles.Forsyth	f(&x, ...)\f1.
530*46439007SCharles.Forsyth.R
531*46439007SCharles.Forsyth.DE
532*46439007SCharles.ForsythThis saves a copy and makes the compilation
533*46439007SCharles.Forsythmuch less clumsy.
534*46439007SCharles.ForsythA disadvantage is that if you call this
535*46439007SCharles.Forsythfunction without an assignment,
536*46439007SCharles.Forsytha dummy location must be invented.
537*46439007SCharles.Forsyth.LP
538*46439007SCharles.ForsythThere is also a danger of calling a function
539*46439007SCharles.Forsyththat returns a structure without declaring
540*46439007SCharles.Forsythit as such.
541*46439007SCharles.ForsythWith ANSI C function prototypes,
542*46439007SCharles.Forsyththis error need never occur.
543*46439007SCharles.Forsyth.NH
544*46439007SCharles.ForsythImplementation
545*46439007SCharles.Forsyth.LP
546*46439007SCharles.ForsythThe compiler is divided internally into
547*46439007SCharles.Forsythfour machine-independent passes,
548*46439007SCharles.Forsythfour machine-dependent passes,
549*46439007SCharles.Forsythand an output pass.
550*46439007SCharles.ForsythThe next nine sections describe each pass in order.
551*46439007SCharles.Forsyth.NH 2
552*46439007SCharles.ForsythParsing
553*46439007SCharles.Forsyth.LP
554*46439007SCharles.ForsythThe first pass is a YACC-based parser
555*46439007SCharles.Forsyth[Joh79].
556*46439007SCharles.ForsythDeclarations are interpreted immediately,
557*46439007SCharles.Forsythbuilding a block structured symbol table.
558*46439007SCharles.ForsythExecutable statements are put into a parse tree
559*46439007SCharles.Forsythand collected,
560*46439007SCharles.Forsythwithout interpretation.
561*46439007SCharles.ForsythAt the end of each procedure,
562*46439007SCharles.Forsyththe parse tree for the function is
563*46439007SCharles.Forsythexamined by the other passes of the compiler.
564*46439007SCharles.Forsyth.LP
565*46439007SCharles.ForsythThe input stream of the parser is
566*46439007SCharles.Forsytha pushdown list of input activations.
567*46439007SCharles.ForsythThe preprocessor
568*46439007SCharles.Forsythexpansions of
569*46439007SCharles.Forsythmacros
570*46439007SCharles.Forsythand
571*46439007SCharles.Forsyth.CW #include
572*46439007SCharles.Forsythare implemented as pushdowns.
573*46439007SCharles.ForsythThus there is no separate
574*46439007SCharles.Forsythpass for preprocessing.
575*46439007SCharles.Forsyth.NH 2
576*46439007SCharles.ForsythTyping
577*46439007SCharles.Forsyth.LP
578*46439007SCharles.ForsythThe next pass distributes typing information
579*46439007SCharles.Forsythto every node of the tree.
580*46439007SCharles.ForsythImplicit operations on the tree are added,
581*46439007SCharles.Forsythsuch as type promotions and taking the
582*46439007SCharles.Forsythaddress of arrays and functions.
583*46439007SCharles.Forsyth.NH 2
584*46439007SCharles.ForsythMachine-independent optimization
585*46439007SCharles.Forsyth.LP
586*46439007SCharles.ForsythThe next pass performs optimizations
587*46439007SCharles.Forsythand transformations of the tree, such as converting
588*46439007SCharles.Forsyth.CW &*x
589*46439007SCharles.Forsythand
590*46439007SCharles.Forsyth.CW *&x
591*46439007SCharles.Forsythinto
592*46439007SCharles.Forsyth.CW x .
593*46439007SCharles.ForsythConstant expressions are converted to constants in this pass.
594*46439007SCharles.Forsyth.NH 2
595*46439007SCharles.ForsythArithmetic rewrites
596*46439007SCharles.Forsyth.LP
597*46439007SCharles.ForsythThis is another machine-independent optimization.
598*46439007SCharles.ForsythSubtrees of add, subtract, and multiply of integers are
599*46439007SCharles.Forsythrewritten for easier compilation.
600*46439007SCharles.ForsythThe major transformation is factoring:
601*46439007SCharles.Forsyth.CW 4+8*a+16*b+5
602*46439007SCharles.Forsythis transformed into
603*46439007SCharles.Forsyth.CW 9+8*(a+2*b) .
604*46439007SCharles.ForsythSuch expressions arise from address
605*46439007SCharles.Forsythmanipulation and array indexing.
606*46439007SCharles.Forsyth.NH 2
607*46439007SCharles.ForsythAddressability
608*46439007SCharles.Forsyth.LP
609*46439007SCharles.ForsythThis is the first of the machine-dependent passes.
610*46439007SCharles.ForsythThe addressability of a processor is defined as the set of
611*46439007SCharles.Forsythexpressions that is legal in the address field
612*46439007SCharles.Forsythof a machine language instruction.
613*46439007SCharles.ForsythThe addressability of different processors varies widely.
614*46439007SCharles.ForsythAt one end of the spectrum are the 68020 and VAX,
615*46439007SCharles.Forsythwhich allow a complex mix of incrementing,
616*46439007SCharles.Forsythdecrementing,
617*46439007SCharles.Forsythindexing, and relative addressing.
618*46439007SCharles.ForsythAt the other end is the MIPS,
619*46439007SCharles.Forsythwhich allows only registers and constant offsets from the
620*46439007SCharles.Forsythcontents of a register.
621*46439007SCharles.ForsythThe addressability can be different for different instructions
622*46439007SCharles.Forsythwithin the same processor.
623*46439007SCharles.Forsyth.LP
624*46439007SCharles.ForsythIt is important to the code generator to know when a
625*46439007SCharles.Forsythsubtree represents an address of a particular type.
626*46439007SCharles.ForsythThis is done with a bottom-up walk of the tree.
627*46439007SCharles.ForsythIn this pass, the leaves are labeled with small integers.
628*46439007SCharles.ForsythWhen an internal node is encountered,
629*46439007SCharles.Forsythit is labeled by consulting a table indexed by the
630*46439007SCharles.Forsythlabels on the left and right subtrees.
631*46439007SCharles.ForsythFor example,
632*46439007SCharles.Forsython the 68020 processor,
633*46439007SCharles.Forsythit is possible to address an
634*46439007SCharles.Forsythoffset from a named location.
635*46439007SCharles.ForsythIn C, this is represented by the expression
636*46439007SCharles.Forsyth.CW *(&name+constant) .
637*46439007SCharles.ForsythThis is marked addressable by the following table.
638*46439007SCharles.ForsythIn the table,
639*46439007SCharles.Forsytha node represented by the left column is marked
640*46439007SCharles.Forsythwith a small integer from the right column.
641*46439007SCharles.ForsythMarks of the form
642*46439007SCharles.Forsyth.CW A\s-2\di\u\s0
643*46439007SCharles.Forsythare addressable while
644*46439007SCharles.Forsythmarks of the form
645*46439007SCharles.Forsyth.CW N\s-2\di\u\s0
646*46439007SCharles.Forsythare not addressable.
647*46439007SCharles.Forsyth.DS
648*46439007SCharles.Forsyth.B
649*46439007SCharles.Forsyth.ta .1i 1.1i
650*46439007SCharles.Forsyth	Node	Marked
651*46439007SCharles.Forsyth.CW
652*46439007SCharles.Forsyth	name	A\s-2\d1\u\s0
653*46439007SCharles.Forsyth	const	A\s-2\d2\u\s0
654*46439007SCharles.Forsyth	&A\s-2\d1\u\s0	A\s-2\d3\u\s0
655*46439007SCharles.Forsyth	A\s-2\d3\u\s0+A\s-2\d1\u\s0	N\s-2\d1\u\s0 \fR(note that this is not addressable)\fP
656*46439007SCharles.Forsyth	*N\s-2\d1\u\s0	A\s-2\d4\u\s0
657*46439007SCharles.Forsyth.R
658*46439007SCharles.Forsyth.DE
659*46439007SCharles.ForsythHere there is a distinction between
660*46439007SCharles.Forsytha node marked
661*46439007SCharles.Forsyth.CW A\s-2\d1\u\s0
662*46439007SCharles.Forsythand a node marked
663*46439007SCharles.Forsyth.CW A\s-2\d4\u\s0
664*46439007SCharles.Forsythbecause the address operator of an
665*46439007SCharles.Forsyth.CW A\s-2\d4\u\s0
666*46439007SCharles.Forsythnode is not addressable.
667*46439007SCharles.ForsythSo to extend the table:
668*46439007SCharles.Forsyth.DS
669*46439007SCharles.Forsyth.B
670*46439007SCharles.Forsyth.ta .1i 1.1i
671*46439007SCharles.Forsyth	Node	Marked
672*46439007SCharles.Forsyth.CW
673*46439007SCharles.Forsyth	&A\s-2\d4\u\s0	N\s-2\d2\u\s0
674*46439007SCharles.Forsyth	N\s-2\d2\u\s0+N\s-2\d1\u\s0	N\s-2\d1\u\s0
675*46439007SCharles.Forsyth.R
676*46439007SCharles.Forsyth.DE
677*46439007SCharles.ForsythThe full addressability of the 68020 is expressed
678*46439007SCharles.Forsythin 18 rules like this,
679*46439007SCharles.Forsythwhile the addressability of the MIPS is expressed
680*46439007SCharles.Forsythin 11 rules.
681*46439007SCharles.ForsythWhen one ports the compiler,
682*46439007SCharles.Forsyththis table is usually initialized
683*46439007SCharles.Forsythso that leaves are labeled as addressable and nothing else.
684*46439007SCharles.ForsythThe code produced is poor,
685*46439007SCharles.Forsythbut porting is easy.
686*46439007SCharles.ForsythThe table can be extended later.
687*46439007SCharles.Forsyth.LP
688*46439007SCharles.ForsythThis pass also rewrites some complex operators
689*46439007SCharles.Forsythinto procedure calls.
690*46439007SCharles.ForsythExamples include 64-bit multiply and divide.
691*46439007SCharles.Forsyth.LP
692*46439007SCharles.ForsythIn the same bottom-up pass of the tree,
693*46439007SCharles.Forsyththe nodes are labeled with a Sethi-Ullman complexity
694*46439007SCharles.Forsyth[Set70].
695*46439007SCharles.ForsythThis number is roughly the number of registers required
696*46439007SCharles.Forsythto compile the tree on an ideal machine.
697*46439007SCharles.ForsythAn addressable node is marked 0.
698*46439007SCharles.ForsythA function call is marked infinite.
699*46439007SCharles.ForsythA unary operator is marked as the
700*46439007SCharles.Forsythmaximum of 1 and the mark of its subtree.
701*46439007SCharles.ForsythA binary operator with equal marks on its subtrees is
702*46439007SCharles.Forsythmarked with a subtree mark plus 1.
703*46439007SCharles.ForsythA binary operator with unequal marks on its subtrees is
704*46439007SCharles.Forsythmarked with the maximum mark of its subtrees.
705*46439007SCharles.ForsythThe actual values of the marks are not too important,
706*46439007SCharles.Forsythbut the relative values are.
707*46439007SCharles.ForsythThe goal is to compile the harder
708*46439007SCharles.Forsyth(larger mark)
709*46439007SCharles.Forsythsubtree first.
710*46439007SCharles.Forsyth.NH 2
711*46439007SCharles.ForsythCode generation
712*46439007SCharles.Forsyth.LP
713*46439007SCharles.ForsythCode is generated by recursive
714*46439007SCharles.Forsythdescent.
715*46439007SCharles.ForsythThe Sethi-Ullman complexity completely guides the
716*46439007SCharles.Forsythorder.
717*46439007SCharles.ForsythThe addressability defines the leaves.
718*46439007SCharles.ForsythThe only difficult part is compiling a tree
719*46439007SCharles.Forsyththat has two infinite (function call)
720*46439007SCharles.Forsythsubtrees.
721*46439007SCharles.ForsythIn this case,
722*46439007SCharles.Forsythone subtree is compiled into the return register
723*46439007SCharles.Forsyth(usually the most convenient place for a function call)
724*46439007SCharles.Forsythand then stored on the stack.
725*46439007SCharles.ForsythThe other subtree is compiled into the return register
726*46439007SCharles.Forsythand then the operation is compiled with
727*46439007SCharles.Forsythoperands from the stack and the return register.
728*46439007SCharles.Forsyth.LP
729*46439007SCharles.ForsythThere is a separate boolean code generator that compiles
730*46439007SCharles.Forsythconditional expressions.
731*46439007SCharles.ForsythThis is fundamentally different from compiling an arithmetic expression.
732*46439007SCharles.ForsythThe result of the boolean code generator is the
733*46439007SCharles.Forsythposition of the program counter and not an expression.
734*46439007SCharles.ForsythThe boolean code generator makes extensive use of De Morgan's rule.
735*46439007SCharles.ForsythThe boolean code generator is an expanded version of that described
736*46439007SCharles.Forsythin chapter 8 of Aho, Sethi, and Ullman
737*46439007SCharles.Forsyth[Aho87].
738*46439007SCharles.Forsyth.LP
739*46439007SCharles.ForsythThere is a considerable amount of talk in the literature
740*46439007SCharles.Forsythabout automating this part of a compiler with a machine
741*46439007SCharles.Forsythdescription.
742*46439007SCharles.ForsythSince this code generator is so small
743*46439007SCharles.Forsyth(less than 500 lines of C)
744*46439007SCharles.Forsythand easy,
745*46439007SCharles.Forsythit hardly seems worth the effort.
746*46439007SCharles.Forsyth.NH 2
747*46439007SCharles.ForsythRegisterization
748*46439007SCharles.Forsyth.LP
749*46439007SCharles.ForsythUp to now,
750*46439007SCharles.Forsyththe compiler has operated on syntax trees
751*46439007SCharles.Forsyththat are roughly equivalent to the original source language.
752*46439007SCharles.ForsythThe previous pass has produced machine language in an internal
753*46439007SCharles.Forsythformat.
754*46439007SCharles.ForsythThe next two passes operate on the internal machine language
755*46439007SCharles.Forsythstructures.
756*46439007SCharles.ForsythThe purpose of the next pass is to reintroduce
757*46439007SCharles.Forsythregisters for heavily used variables.
758*46439007SCharles.Forsyth.LP
759*46439007SCharles.ForsythAll of the variables that can be
760*46439007SCharles.Forsythpotentially registerized within a procedure are
761*46439007SCharles.Forsythplaced in a table.
762*46439007SCharles.Forsyth(Suitable variables are any automatic or external
763*46439007SCharles.Forsythscalars that do not have their addresses extracted.
764*46439007SCharles.ForsythSome constants that are hard to reference are also
765*46439007SCharles.Forsythconsidered for registerization.)
766*46439007SCharles.ForsythFour separate data flow equations are evaluated
767*46439007SCharles.Forsythover the procedure on all of these variables.
768*46439007SCharles.ForsythTwo of the equations are the normal set-behind
769*46439007SCharles.Forsythand used-ahead
770*46439007SCharles.Forsythbits that define the life of a variable.
771*46439007SCharles.ForsythThe two new bits tell if a variable life
772*46439007SCharles.Forsythcrosses a function call ahead or behind.
773*46439007SCharles.ForsythBy examining a variable over its lifetime,
774*46439007SCharles.Forsythit is possible to get a cost
775*46439007SCharles.Forsythfor registerizing.
776*46439007SCharles.ForsythLoops are detected and the costs are multiplied
777*46439007SCharles.Forsythby three for every level of loop nesting.
778*46439007SCharles.ForsythCosts are sorted and the variables
779*46439007SCharles.Forsythare replaced by available registers on a greedy basis.
780*46439007SCharles.Forsyth.LP
781*46439007SCharles.ForsythThe 68020 has two different
782*46439007SCharles.Forsythtypes of registers.
783*46439007SCharles.ForsythFor the 68020,
784*46439007SCharles.Forsythtwo different costs are calculated for
785*46439007SCharles.Forsytheach variable life and the register type that
786*46439007SCharles.Forsythaffords the better cost is used.
787*46439007SCharles.ForsythTies are broken by counting the number of available
788*46439007SCharles.Forsythregisters of each type.
789*46439007SCharles.Forsyth.LP
790*46439007SCharles.ForsythNote that externals are registerized together with automatics.
791*46439007SCharles.ForsythThis is done by evaluating the semantics of a ``call'' instruction
792*46439007SCharles.Forsythdifferently for externals and automatics.
793*46439007SCharles.ForsythSince a call goes outside the local procedure,
794*46439007SCharles.Forsythit is assumed that a call references all externals.
795*46439007SCharles.ForsythSimilarly,
796*46439007SCharles.Forsythexternals are assumed to be set before an ``entry'' instruction
797*46439007SCharles.Forsythand assumed to be referenced after a ``return'' instruction.
798*46439007SCharles.ForsythThis makes sure that externals are in memory across calls.
799*46439007SCharles.Forsyth.LP
800*46439007SCharles.ForsythThe overall results are satisfactory.
801*46439007SCharles.ForsythIt would be nice to be able to do this processing in
802*46439007SCharles.Forsytha machine-independent way,
803*46439007SCharles.Forsythbut it is impossible to get all of the costs and
804*46439007SCharles.Forsythside effects of different choices by examining the parse tree.
805*46439007SCharles.Forsyth.LP
806*46439007SCharles.ForsythMost of the code in the registerization pass is machine-independent.
807*46439007SCharles.ForsythThe major machine-dependency is in
808*46439007SCharles.Forsythexamining a machine instruction to ask if it sets or references
809*46439007SCharles.Forsytha variable.
810*46439007SCharles.Forsyth.NH 2
811*46439007SCharles.ForsythMachine code optimization
812*46439007SCharles.Forsyth.LP
813*46439007SCharles.ForsythThe next pass walks the machine code
814*46439007SCharles.Forsythfor opportunistic optimizations.
815*46439007SCharles.ForsythFor the most part,
816*46439007SCharles.Forsyththis is highly specific to a particular
817*46439007SCharles.Forsythprocessor.
818*46439007SCharles.ForsythOne optimization that is performed
819*46439007SCharles.Forsython all of the processors is the
820*46439007SCharles.Forsythremoval of unnecessary ``move''
821*46439007SCharles.Forsythinstructions.
822*46439007SCharles.ForsythIronically,
823*46439007SCharles.Forsythmost of these instructions were inserted by
824*46439007SCharles.Forsyththe previous pass.
825*46439007SCharles.ForsythThere are two patterns that are repetitively
826*46439007SCharles.Forsythmatched and replaced until no more matches are
827*46439007SCharles.Forsythfound.
828*46439007SCharles.ForsythThe first tries to remove ``move'' instructions
829*46439007SCharles.Forsythby relabeling variables.
830*46439007SCharles.Forsyth.LP
831*46439007SCharles.ForsythWhen a ``move'' instruction is encountered,
832*46439007SCharles.Forsythif the destination variable is set before the
833*46439007SCharles.Forsythsource variable is referenced,
834*46439007SCharles.Forsyththen all of the references to the destination
835*46439007SCharles.Forsythvariable can be renamed to the source and the ``move''
836*46439007SCharles.Forsythcan be deleted.
837*46439007SCharles.ForsythThis transformation uses the reverse data flow
838*46439007SCharles.Forsythset up in the previous pass.
839*46439007SCharles.Forsyth.LP
840*46439007SCharles.ForsythAn example of this pattern is depicted in the following
841*46439007SCharles.Forsythtable.
842*46439007SCharles.ForsythThe pattern is in the left column and the
843*46439007SCharles.Forsythreplacement action is in the right column.
844*46439007SCharles.Forsyth.DS
845*46439007SCharles.Forsyth.CW
846*46439007SCharles.Forsyth.ta .1i .6i 1.6i 2.1i 2.6i
847*46439007SCharles.Forsyth	MOVE	a->b		\fR(remove)\fP
848*46439007SCharles.Forsyth.R
849*46439007SCharles.Forsyth	(sequence with no mention of \f(CWa\fP)
850*46439007SCharles.Forsyth.CW
851*46439007SCharles.Forsyth	USE	b		USE	a
852*46439007SCharles.Forsyth.R
853*46439007SCharles.Forsyth	(sequence with no mention of \f(CWa\fP)
854*46439007SCharles.Forsyth.CW
855*46439007SCharles.Forsyth	SET	b		SET	b
856*46439007SCharles.Forsyth.R
857*46439007SCharles.Forsyth.DE
858*46439007SCharles.Forsyth.LP
859*46439007SCharles.ForsythExperiments have shown that it is marginally
860*46439007SCharles.Forsythworthwhile to rename uses of the destination variable
861*46439007SCharles.Forsythwith uses of the source variable up to
862*46439007SCharles.Forsyththe first use of the source variable.
863*46439007SCharles.Forsyth.LP
864*46439007SCharles.ForsythThe second transform will do relabeling
865*46439007SCharles.Forsythwithout deleting instructions.
866*46439007SCharles.ForsythWhen a ``move'' instruction is encountered,
867*46439007SCharles.Forsythif the source variable has been set prior
868*46439007SCharles.Forsythto the use of the destination variable
869*46439007SCharles.Forsyththen all of the references to the source
870*46439007SCharles.Forsythvariable are replaced by the destination and
871*46439007SCharles.Forsyththe ``move'' is inverted.
872*46439007SCharles.ForsythTypically,
873*46439007SCharles.Forsyththis transformation will alter two ``move''
874*46439007SCharles.Forsythinstructions and allow the first transformation
875*46439007SCharles.Forsythanother chance to remove code.
876*46439007SCharles.ForsythThis transformation uses the forward data flow
877*46439007SCharles.Forsythset up in the previous pass.
878*46439007SCharles.Forsyth.LP
879*46439007SCharles.ForsythAgain,
880*46439007SCharles.Forsyththe following is a depiction of the transformation where
881*46439007SCharles.Forsyththe pattern is in the left column and the
882*46439007SCharles.Forsythrewrite is in the right column.
883*46439007SCharles.Forsyth.DS
884*46439007SCharles.Forsyth.CW
885*46439007SCharles.Forsyth.ta .1i .6i 1.6i 2.1i 2.6i
886*46439007SCharles.Forsyth	SET	a		SET	b
887*46439007SCharles.Forsyth.R
888*46439007SCharles.Forsyth	(sequence with no use of \f(CWb\fP)
889*46439007SCharles.Forsyth.CW
890*46439007SCharles.Forsyth	USE	a		USE	b
891*46439007SCharles.Forsyth.R
892*46439007SCharles.Forsyth	(sequence with no use of \f(CWb\fP)
893*46439007SCharles.Forsyth.CW
894*46439007SCharles.Forsyth	MOVE	a->b		MOVE	b->a
895*46439007SCharles.Forsyth.R
896*46439007SCharles.Forsyth.DE
897*46439007SCharles.ForsythIterating these transformations
898*46439007SCharles.Forsythwill usually get rid of all redundant ``move'' instructions.
899*46439007SCharles.Forsyth.LP
900*46439007SCharles.ForsythA problem with this organization is that the costs
901*46439007SCharles.Forsythof registerization calculated in the previous pass
902*46439007SCharles.Forsythmust depend on how well this pass can detect and remove
903*46439007SCharles.Forsythredundant instructions.
904*46439007SCharles.ForsythOften,
905*46439007SCharles.Forsytha fine candidate for registerization is rejected
906*46439007SCharles.Forsythbecause of the cost of instructions that are later
907*46439007SCharles.Forsythremoved.
908*46439007SCharles.Forsyth.NH 2
909*46439007SCharles.ForsythWriting the object file
910*46439007SCharles.Forsyth.LP
911*46439007SCharles.ForsythThe last pass walks the internal assembly language
912*46439007SCharles.Forsythand writes the object file.
913*46439007SCharles.ForsythThe object file is reduced in size by about a factor
914*46439007SCharles.Forsythof three with simple compression
915*46439007SCharles.Forsythtechniques.
916*46439007SCharles.ForsythThe most important aspect of the object file
917*46439007SCharles.Forsythformat is that it is independent of the compiling machine.
918*46439007SCharles.ForsythAll integer and floating numbers in the object
919*46439007SCharles.Forsythcode are converted to known formats and byte
920*46439007SCharles.Forsythorders.
921*46439007SCharles.Forsyth.NH
922*46439007SCharles.ForsythThe loader
923*46439007SCharles.Forsyth.LP
924*46439007SCharles.ForsythThe loader is a multiple pass program that
925*46439007SCharles.Forsythreads object files and libraries and produces
926*46439007SCharles.Forsythan executable binary.
927*46439007SCharles.ForsythThe loader also does some minimal
928*46439007SCharles.Forsythoptimizations and code rewriting.
929*46439007SCharles.ForsythMany of the operations performed by the
930*46439007SCharles.Forsythloader are machine-dependent.
931*46439007SCharles.Forsyth.LP
932*46439007SCharles.ForsythThe first pass of the loader reads the
933*46439007SCharles.Forsythobject modules into an internal data
934*46439007SCharles.Forsythstructure that looks like binary assembly language.
935*46439007SCharles.ForsythAs the instructions are read,
936*46439007SCharles.Forsythcode is reordered to remove
937*46439007SCharles.Forsythunconditional branch instructions.
938*46439007SCharles.ForsythConditional branch instructions are inverted
939*46439007SCharles.Forsythto prevent the insertion of unconditional branches.
940*46439007SCharles.ForsythThe loader will also make a copy of a few instructions
941*46439007SCharles.Forsythto remove an unconditional branch.
942*46439007SCharles.Forsyth.LP
943*46439007SCharles.ForsythThe next pass allocates addresses for
944*46439007SCharles.Forsythall external data.
945*46439007SCharles.ForsythTypical of processors is the MIPS,
946*46439007SCharles.Forsythwhich can reference ±32K bytes from a
947*46439007SCharles.Forsythregister.
948*46439007SCharles.ForsythThe loader allocates the register
949*46439007SCharles.Forsyth.CW R30
950*46439007SCharles.Forsythas the static pointer.
951*46439007SCharles.ForsythThe value placed in
952*46439007SCharles.Forsyth.CW R30
953*46439007SCharles.Forsythis the base of the data segment plus 32K.
954*46439007SCharles.ForsythIt is then cheap to reference all data in the
955*46439007SCharles.Forsythfirst 64K of the data segment.
956*46439007SCharles.ForsythExternal variables are allocated to
957*46439007SCharles.Forsyththe data segment
958*46439007SCharles.Forsythwith the smallest variables allocated first.
959*46439007SCharles.ForsythIf all of the data cannot fit into the first
960*46439007SCharles.Forsyth64K of the data segment,
961*46439007SCharles.Forsyththen usually only a few large arrays
962*46439007SCharles.Forsythneed more expensive addressing modes.
963*46439007SCharles.Forsyth.LP
964*46439007SCharles.ForsythFor the MIPS processor,
965*46439007SCharles.Forsyththe loader makes a pass over the internal
966*46439007SCharles.Forsythstructures,
967*46439007SCharles.Forsythexchanging instructions to try
968*46439007SCharles.Forsythto fill ``delay slots'' with useful work.
969*46439007SCharles.ForsythIf a useful instruction cannot be found
970*46439007SCharles.Forsythto fill a delay slot,
971*46439007SCharles.Forsyththe loader will insert
972*46439007SCharles.Forsyth``noop''
973*46439007SCharles.Forsythinstructions.
974*46439007SCharles.ForsythThis pass is very expensive and does not
975*46439007SCharles.Forsythdo a good job.
976*46439007SCharles.ForsythAbout 40% of all instructions are in
977*46439007SCharles.Forsythdelay slots.
978*46439007SCharles.ForsythAbout 65% of these are useful instructions and
979*46439007SCharles.Forsyth35% are ``noops.''
980*46439007SCharles.ForsythThe vendor-supplied assembler does this job
981*46439007SCharles.Forsythmore effectively,
982*46439007SCharles.Forsythfilling about 80%
983*46439007SCharles.Forsythof the delay slots with useful instructions.
984*46439007SCharles.Forsyth.LP
985*46439007SCharles.ForsythOn the 68020 processor,
986*46439007SCharles.Forsythbranch instructions come in a variety of
987*46439007SCharles.Forsythsizes depending on the relative distance
988*46439007SCharles.Forsythof the branch.
989*46439007SCharles.ForsythThus the size of branch instructions
990*46439007SCharles.Forsythcan be mutually dependent.
991*46439007SCharles.ForsythThe loader uses a multiple pass algorithm
992*46439007SCharles.Forsythto resolve the branch lengths
993*46439007SCharles.Forsyth[Szy78].
994*46439007SCharles.ForsythInitially, all branches are assumed minimal length.
995*46439007SCharles.ForsythOn each subsequent pass,
996*46439007SCharles.Forsyththe branches are reassessed
997*46439007SCharles.Forsythand expanded if necessary.
998*46439007SCharles.ForsythWhen no more expansions occur,
999*46439007SCharles.Forsyththe locations of the instructions in
1000*46439007SCharles.Forsyththe text segment are known.
1001*46439007SCharles.Forsyth.LP
1002*46439007SCharles.ForsythOn the MIPS processor,
1003*46439007SCharles.Forsythall instructions are one size.
1004*46439007SCharles.ForsythA single pass over the instructions will
1005*46439007SCharles.Forsythdetermine the locations of all addresses
1006*46439007SCharles.Forsythin the text segment.
1007*46439007SCharles.Forsyth.LP
1008*46439007SCharles.ForsythThe last pass of the loader produces the
1009*46439007SCharles.Forsythexecutable binary.
1010*46439007SCharles.ForsythA symbol table and other tables are
1011*46439007SCharles.Forsythproduced to help the debugger to
1012*46439007SCharles.Forsythinterpret the binary symbolically.
1013*46439007SCharles.Forsyth.LP
1014*46439007SCharles.ForsythThe loader places absolute source line numbers in the symbol table.
1015*46439007SCharles.ForsythThe name and absolute line number of all
1016*46439007SCharles.Forsyth.CW #include
1017*46439007SCharles.Forsythfiles is also placed in the
1018*46439007SCharles.Forsythsymbol table so that the debuggers can
1019*46439007SCharles.Forsythassociate object code to source files.
1020*46439007SCharles.Forsyth.NH
1021*46439007SCharles.ForsythPerformance
1022*46439007SCharles.Forsyth.LP
1023*46439007SCharles.ForsythThe following is a table of the source size of the MIPS
1024*46439007SCharles.Forsythcompiler.
1025*46439007SCharles.Forsyth.DS
1026*46439007SCharles.Forsyth.ta .1i .6i
1027*46439007SCharles.Forsyth	lines	module
1028*46439007SCharles.Forsyth	\0509	machine-independent headers
1029*46439007SCharles.Forsyth	1070	machine-independent YACC source
1030*46439007SCharles.Forsyth	6090	machine-independent C source
1031*46439007SCharles.Forsyth
1032*46439007SCharles.Forsyth	\0545	machine-dependent headers
1033*46439007SCharles.Forsyth	6532	machine-dependent C source
1034*46439007SCharles.Forsyth
1035*46439007SCharles.Forsyth	\0298	loader headers
1036*46439007SCharles.Forsyth	5215	loader C source
1037*46439007SCharles.Forsyth.DE
1038*46439007SCharles.Forsyth.LP
1039*46439007SCharles.ForsythThe following table shows timing
1040*46439007SCharles.Forsythof a test program
1041*46439007SCharles.Forsyththat plays checkers, running on a MIPS R4000.
1042*46439007SCharles.ForsythThe test program is 26 files totaling 12600 lines of C.
1043*46439007SCharles.ForsythThe execution time does not significantly
1044*46439007SCharles.Forsythdepend on library implementation.
1045*46439007SCharles.ForsythSince no other compiler runs on Plan 9,
1046*46439007SCharles.Forsyththe Plan 9 tests were done with the Plan 9 operating system;
1047*46439007SCharles.Forsyththe other tests were done on the vendor's operating system.
1048*46439007SCharles.ForsythThe hardware was identical in both cases.
1049*46439007SCharles.ForsythThe optimizer in the vendor's compiler
1050*46439007SCharles.Forsythis reputed to be extremely good.
1051*46439007SCharles.Forsyth.DS
1052*46439007SCharles.Forsyth.ta .1i .9i
1053*46439007SCharles.Forsyth	\0\04.49s	Plan 9 \f(CWvc\fP \f(CW-N\fP compile time (opposite of \f(CW-O\fP)
1054*46439007SCharles.Forsyth	\0\01.72s	Plan 9 \f(CWvc\fP \f(CW-N\fP load time
1055*46439007SCharles.Forsyth	148.69s	Plan 9 \f(CWvc\fP \f(CW-N\fP run time
1056*46439007SCharles.Forsyth
1057*46439007SCharles.Forsyth	\015.07s	Plan 9 \f(CWvc\fP compile time (\f(CW-O\fP implicit)
1058*46439007SCharles.Forsyth	\0\01.66s	Plan 9 \f(CWvc\fP load time
1059*46439007SCharles.Forsyth	\089.96s	Plan 9 \f(CWvc\fP run time
1060*46439007SCharles.Forsyth
1061*46439007SCharles.Forsyth	\014.83s	vendor \f(CWcc\fP compile time
1062*46439007SCharles.Forsyth	\0\00.38s	vendor \f(CWcc\fP load time
1063*46439007SCharles.Forsyth	104.75s	vendor \f(CWcc\fP run time
1064*46439007SCharles.Forsyth
1065*46439007SCharles.Forsyth	\043.59s	vendor \f(CWcc\fP \f(CW-O\fP compile time
1066*46439007SCharles.Forsyth	\0\00.38s	vendor \f(CWcc\fP \f(CW-O\fP load time
1067*46439007SCharles.Forsyth	\076.19s	vendor \f(CWcc\fP \f(CW-O\fP run time
1068*46439007SCharles.Forsyth
1069*46439007SCharles.Forsyth	\0\08.19s	vendor \f(CWcc\fP \f(CW-O3\fP compile time
1070*46439007SCharles.Forsyth	\035.97s	vendor \f(CWcc\fP \f(CW-O3\fP load time
1071*46439007SCharles.Forsyth	\071.16s	vendor \f(CWcc\fP \f(CW-O3\fP run time
1072*46439007SCharles.Forsyth.DE
1073*46439007SCharles.Forsyth.LP
1074*46439007SCharles.ForsythTo compare the Intel compiler,
1075*46439007SCharles.Forsytha program that is about 40% bit manipulation and
1076*46439007SCharles.Forsythabout 60% single precision floating point was
1077*46439007SCharles.Forsythrun on the same 33 MHz 486, once under Windows
1078*46439007SCharles.Forsythcompiled with the Watcom compiler, version 10.0,
1079*46439007SCharles.Forsythin 16-bit mode and once under
1080*46439007SCharles.ForsythPlan 9 in 32-bit mode.
1081*46439007SCharles.ForsythThe Plan 9 execution time was 27 sec while the Windows
1082*46439007SCharles.Forsythexecution time was 31 sec.
1083*46439007SCharles.Forsyth.NH
1084*46439007SCharles.ForsythConclusions
1085*46439007SCharles.Forsyth.LP
1086*46439007SCharles.ForsythThe new compilers compile
1087*46439007SCharles.Forsythquickly,
1088*46439007SCharles.Forsythload slowly,
1089*46439007SCharles.Forsythand produce
1090*46439007SCharles.Forsythmedium quality
1091*46439007SCharles.Forsythobject code.
1092*46439007SCharles.ForsythThe compilers are relatively
1093*46439007SCharles.Forsythportable,
1094*46439007SCharles.Forsythrequiring but a couple of weeks' work to
1095*46439007SCharles.Forsythproduce a compiler for a different computer.
1096*46439007SCharles.ForsythFor Plan 9,
1097*46439007SCharles.Forsythwhere we needed several compilers
1098*46439007SCharles.Forsythwith specialized features and
1099*46439007SCharles.Forsythour own object formats,
1100*46439007SCharles.Forsyththis project was indispensable.
1101*46439007SCharles.ForsythIt is also necessary for us to
1102*46439007SCharles.Forsythbe able to freely distribute our compilers
1103*46439007SCharles.Forsythwith the Plan 9 distribution.
1104*46439007SCharles.Forsyth.LP
1105*46439007SCharles.ForsythTwo problems have come up in retrospect.
1106*46439007SCharles.ForsythThe first has to do with the
1107*46439007SCharles.Forsythdivision of labor between compiler and loader.
1108*46439007SCharles.ForsythPlan 9 runs on multi-processors and as such
1109*46439007SCharles.Forsythcompilations are often done in parallel.
1110*46439007SCharles.ForsythUnfortunately,
1111*46439007SCharles.Forsythall compilations must be complete before loading
1112*46439007SCharles.Forsythcan begin.
1113*46439007SCharles.ForsythThe load is then single-threaded.
1114*46439007SCharles.ForsythWith this model,
1115*46439007SCharles.Forsythany shift of work from compile to load
1116*46439007SCharles.Forsythresults in a significant increase in real time.
1117*46439007SCharles.ForsythThe same is true of libraries that are compiled
1118*46439007SCharles.Forsythinfrequently and loaded often.
1119*46439007SCharles.ForsythIn the future,
1120*46439007SCharles.Forsythwe may try to put some of the loader work
1121*46439007SCharles.Forsythback into the compiler.
1122*46439007SCharles.Forsyth.LP
1123*46439007SCharles.ForsythThe second problem comes from
1124*46439007SCharles.Forsyththe various optimizations performed over several
1125*46439007SCharles.Forsythpasses.
1126*46439007SCharles.ForsythOften optimizations in different passes depend
1127*46439007SCharles.Forsython each other.
1128*46439007SCharles.ForsythIterating the passes could compromise efficiency,
1129*46439007SCharles.Forsythor even loop.
1130*46439007SCharles.ForsythWe see no real solution to this problem.
1131*46439007SCharles.Forsyth.NH
1132*46439007SCharles.ForsythReferences
1133*46439007SCharles.Forsyth.LP
1134*46439007SCharles.Forsyth[Aho87] A. V. Aho, R. Sethi, and J. D. Ullman,
1135*46439007SCharles.Forsyth.I
1136*46439007SCharles.ForsythCompilers \- Principles, Techniques, and Tools,
1137*46439007SCharles.Forsyth.R
1138*46439007SCharles.ForsythAddison Wesley,
1139*46439007SCharles.ForsythReading, MA,
1140*46439007SCharles.Forsyth1987.
1141*46439007SCharles.Forsyth.LP
1142*46439007SCharles.Forsyth[ANSI90] \f2American National Standard for Information Systems \-
1143*46439007SCharles.ForsythProgramming Language C\f1, American National Standards Institute, Inc.,
1144*46439007SCharles.ForsythNew York, 1990.
1145*46439007SCharles.Forsyth.LP
1146*46439007SCharles.Forsyth[Dav91] J. W. Davidson and D. B. Whalley,
1147*46439007SCharles.Forsyth``Methods for Saving and Restoring Register Values across Function Calls'',
1148*46439007SCharles.Forsyth.I
1149*46439007SCharles.ForsythSoftware\-Practice and Experience,
1150*46439007SCharles.Forsyth.R
1151*46439007SCharles.ForsythVol 21(2), pp. 149-165, February 1991.
1152*46439007SCharles.Forsyth.LP
1153*46439007SCharles.Forsyth[Joh79] S. C. Johnson,
1154*46439007SCharles.Forsyth``YACC \- Yet Another Compiler Compiler'',
1155*46439007SCharles.Forsyth.I
1156*46439007SCharles.ForsythUNIX Programmer's Manual, Seventh Ed., Vol. 2A,
1157*46439007SCharles.Forsyth.R
1158*46439007SCharles.ForsythAT&T Bell Laboratories,
1159*46439007SCharles.ForsythMurray Hill, NJ,
1160*46439007SCharles.Forsyth1979.
1161*46439007SCharles.Forsyth.LP
1162*46439007SCharles.Forsyth[Set70] R. Sethi and J. D. Ullman,
1163*46439007SCharles.Forsyth``The Generation of Optimal Code for Arithmetic Expressions'',
1164*46439007SCharles.Forsyth.I
1165*46439007SCharles.ForsythJournal of the ACM,
1166*46439007SCharles.Forsyth.R
1167*46439007SCharles.ForsythVol 17(4), pp. 715-728, 1970.
1168*46439007SCharles.Forsyth.LP
1169*46439007SCharles.Forsyth[Szy78] T. G. Szymanski,
1170*46439007SCharles.Forsyth``Assembling Code for Machines with Span-dependent Instructions'',
1171*46439007SCharles.Forsyth.I
1172*46439007SCharles.ForsythCommunications of the ACM,
1173*46439007SCharles.Forsyth.R
1174*46439007SCharles.ForsythVol 21(4), pp. 300-308, 1978.
1175