1*46439007SCharles.Forsyth.TL 2*46439007SCharles.ForsythPlan 9 C Compilers \(dg 3*46439007SCharles.Forsyth.AU 4*46439007SCharles.Forsyth.I "Ken Thompson" 5*46439007SCharles.Forsyth.AI 6*46439007SCharles.Forsythken@plan9.bell-labs.com 7*46439007SCharles.Forsyth.AB 8*46439007SCharles.Forsyth.FS 9*46439007SCharles.Forsyth\l'1i' 10*46439007SCharles.Forsyth.br 11*46439007SCharles.Forsyth\(dg Originally appeared, in a different form, in 12*46439007SCharles.Forsyth.I 13*46439007SCharles.ForsythProceedings of the Summer 1990 UKUUG Conference, 14*46439007SCharles.Forsyth.R 15*46439007SCharles.Forsythpp. 41-51, 16*46439007SCharles.ForsythLondon, 1990. 17*46439007SCharles.ForsythThis version first appeared in 18*46439007SCharles.Forsyth.I "Plan 9 Programmer's Manual, Volume 2 (Second Edition)" . 19*46439007SCharles.ForsythThe Plan 9 compiler suite forms the basis for the portable Inferno compiler suite, 20*46439007SCharles.Forsythmaking this paper still relevant. 21*46439007SCharles.Forsyth.FE 22*46439007SCharles.ForsythThis paper describes the overall structure and function of the Plan 9 C compilers. 23*46439007SCharles.ForsythA more detailed implementation document 24*46439007SCharles.Forsythfor any one of the compilers 25*46439007SCharles.Forsythis yet to be written. 26*46439007SCharles.Forsyth.AE 27*46439007SCharles.Forsyth.NH 28*46439007SCharles.ForsythIntroduction 29*46439007SCharles.Forsyth.LP 30*46439007SCharles.ForsythThere are many compilers in the series. 31*46439007SCharles.ForsythEight of the compilers (MIPS 3000, SPARC, Intel 386, AMD64, Power PC, ARM, DEC Alpha, and Motorola 68020) 32*46439007SCharles.Forsythare considered active and are used to compile 33*46439007SCharles.Forsythcurrent versions of Plan 9 or Inferno. 34*46439007SCharles.ForsythSeveral others (Motorola 68000, Intel 960, AMD 29000) have had only limited use, such as 35*46439007SCharles.Forsythto program peripherals or experimental devices. 36*46439007SCharles.Forsyth.NH 37*46439007SCharles.ForsythStructure 38*46439007SCharles.Forsyth.LP 39*46439007SCharles.ForsythThe compiler is a single program that produces an 40*46439007SCharles.Forsythobject file. 41*46439007SCharles.ForsythCombined in the compiler are the traditional 42*46439007SCharles.Forsythroles of preprocessor, lexical analyzer, parser, code generator, 43*46439007SCharles.Forsythlocal optimizer, 44*46439007SCharles.Forsythand first half of the assembler. 45*46439007SCharles.ForsythThe object files are binary forms of assembly 46*46439007SCharles.Forsythlanguage, 47*46439007SCharles.Forsythsimilar to what might be passed between 48*46439007SCharles.Forsyththe first and second passes of an assembler. 49*46439007SCharles.Forsyth.LP 50*46439007SCharles.ForsythObject files and libraries 51*46439007SCharles.Forsythare combined by a loader 52*46439007SCharles.Forsythprogram to produce the executable binary. 53*46439007SCharles.ForsythThe loader combines the roles of second half 54*46439007SCharles.Forsythof the assembler, global optimizer, and loader. 55*46439007SCharles.ForsythThe names of the compliers, loaders, and assemblers 56*46439007SCharles.Forsythare as follows: 57*46439007SCharles.Forsyth.DS 58*46439007SCharles.Forsyth.ta 1.5i 59*46439007SCharles.Forsyth.de Ta 60*46439007SCharles.Forsyth\\$1 \f(CW\\$2\fP \f(CW\\$3\fP \f(CW\\$4\fP 61*46439007SCharles.Forsyth.. 62*46439007SCharles.Forsyth.Ta SPARC kc kl ka 63*46439007SCharles.Forsyth.Ta PowerPC qc ql qa 64*46439007SCharles.Forsyth.Ta MIPS vc vl va 65*46439007SCharles.Forsyth.Ta Motorola\ 68000 1c 1l 1a 66*46439007SCharles.Forsyth.Ta Motorola\ 68020 2c 2l 2a 67*46439007SCharles.Forsyth.Ta ARM 5c 5l 5a 68*46439007SCharles.Forsyth.Ta AMD64 6c 6l 6a 69*46439007SCharles.Forsyth.Ta DEC\ Alpha 7c 7l 7a 70*46439007SCharles.Forsyth.Ta Intel\ 386 8c 8l 8a 71*46439007SCharles.Forsyth.Ta AMD\ 29000 9c 9l 9a 72*46439007SCharles.Forsyth.DE 73*46439007SCharles.ForsythThere is a further breakdown 74*46439007SCharles.Forsythin the source of the compilers into 75*46439007SCharles.Forsythobject-independent and 76*46439007SCharles.Forsythobject-dependent 77*46439007SCharles.Forsythparts. 78*46439007SCharles.ForsythAll of the object-independent parts 79*46439007SCharles.Forsythare combined into source files in the 80*46439007SCharles.Forsythdirectory 81*46439007SCharles.Forsyth.CW /sys/src/cmd/cc . 82*46439007SCharles.ForsythThe object-dependent parts are collected 83*46439007SCharles.Forsythin a separate directory for each compiler, 84*46439007SCharles.Forsythfor example 85*46439007SCharles.Forsyth.CW /sys/src/cmd/vc . 86*46439007SCharles.ForsythAll of the code, 87*46439007SCharles.Forsythboth object-independent and 88*46439007SCharles.Forsythobject-dependent, 89*46439007SCharles.Forsythis machine-independent 90*46439007SCharles.Forsythand may be cross-compiled and executed on any 91*46439007SCharles.Forsythof the architectures. 92*46439007SCharles.Forsyth.NH 93*46439007SCharles.ForsythThe Language 94*46439007SCharles.Forsyth.LP 95*46439007SCharles.ForsythThe compiler implements ANSI C with some 96*46439007SCharles.Forsythrestrictions and extensions 97*46439007SCharles.Forsyth[ANSI90]. 98*46439007SCharles.ForsythMost of the restrictions are due to 99*46439007SCharles.Forsythpersonal preference, while 100*46439007SCharles.Forsythmost of the extensions were to help in 101*46439007SCharles.Forsyththe implementation of Plan 9. 102*46439007SCharles.ForsythThere are other departures from the standard, 103*46439007SCharles.Forsythparticularly in the libraries, 104*46439007SCharles.Forsyththat are beyond the scope of this 105*46439007SCharles.Forsythpaper. 106*46439007SCharles.Forsyth.NH 2 107*46439007SCharles.ForsythRegister, volatile, const 108*46439007SCharles.Forsyth.LP 109*46439007SCharles.ForsythThe keyword 110*46439007SCharles.Forsyth.CW register 111*46439007SCharles.Forsythis recognized syntactically 112*46439007SCharles.Forsythbut is semantically ignored. 113*46439007SCharles.ForsythThus taking the address of a 114*46439007SCharles.Forsyth.CW register 115*46439007SCharles.Forsythvariable is not diagnosed. 116*46439007SCharles.ForsythThe keyword 117*46439007SCharles.Forsyth.CW volatile 118*46439007SCharles.Forsythdisables all optimizations, in particular registerization, of the corresponding variable. 119*46439007SCharles.ForsythThe keyword 120*46439007SCharles.Forsyth.CW const 121*46439007SCharles.Forsythgenerates warnings (if warnings are enabled by the compiler's 122*46439007SCharles.Forsyth.CW -w 123*46439007SCharles.Forsythoption) of non-constant use of the variable, 124*46439007SCharles.Forsythbut does not affect the generated code. 125*46439007SCharles.Forsyth.NH 2 126*46439007SCharles.ForsythThe preprocessor 127*46439007SCharles.Forsyth.LP 128*46439007SCharles.ForsythThe C preprocessor is probably the 129*46439007SCharles.Forsythbiggest departure from the ANSI standard. 130*46439007SCharles.Forsyth.LP 131*46439007SCharles.ForsythThe preprocessor built into the Plan 9 compilers does not support 132*46439007SCharles.Forsyth.CW #if , 133*46439007SCharles.Forsythalthough it does handle 134*46439007SCharles.Forsyth.CW #ifdef 135*46439007SCharles.Forsythand 136*46439007SCharles.Forsyth.CW #include . 137*46439007SCharles.ForsythIf it is necessary to be more standard, 138*46439007SCharles.Forsyththe source text can first be run through the separate ANSI C 139*46439007SCharles.Forsythpreprocessor, 140*46439007SCharles.Forsyth.CW cpp . 141*46439007SCharles.Forsyth.NH 2 142*46439007SCharles.ForsythUnnamed substructures 143*46439007SCharles.Forsyth.LP 144*46439007SCharles.ForsythThe most important and most heavily used of the 145*46439007SCharles.Forsythextensions is the declaration of an 146*46439007SCharles.Forsythunnamed substructure or subunion. 147*46439007SCharles.ForsythFor example: 148*46439007SCharles.Forsyth.DS 149*46439007SCharles.Forsyth.CW 150*46439007SCharles.Forsyth.ta .1i .6i 1.1i 1.6i 151*46439007SCharles.Forsyth typedef 152*46439007SCharles.Forsyth struct lock 153*46439007SCharles.Forsyth { 154*46439007SCharles.Forsyth int locked; 155*46439007SCharles.Forsyth } Lock; 156*46439007SCharles.Forsyth 157*46439007SCharles.Forsyth typedef 158*46439007SCharles.Forsyth struct node 159*46439007SCharles.Forsyth { 160*46439007SCharles.Forsyth int type; 161*46439007SCharles.Forsyth union 162*46439007SCharles.Forsyth { 163*46439007SCharles.Forsyth double dval; 164*46439007SCharles.Forsyth float fval; 165*46439007SCharles.Forsyth long lval; 166*46439007SCharles.Forsyth }; 167*46439007SCharles.Forsyth Lock; 168*46439007SCharles.Forsyth } Node; 169*46439007SCharles.Forsyth 170*46439007SCharles.Forsyth Lock* lock; 171*46439007SCharles.Forsyth Node* node; 172*46439007SCharles.Forsyth.R 173*46439007SCharles.Forsyth.DE 174*46439007SCharles.ForsythThe declaration of 175*46439007SCharles.Forsyth.CW Node 176*46439007SCharles.Forsythhas an unnamed substructure of type 177*46439007SCharles.Forsyth.CW Lock 178*46439007SCharles.Forsythand an unnamed subunion. 179*46439007SCharles.ForsythOne use of this feature allows references to elements of the 180*46439007SCharles.Forsythsubunit to be accessed as if they were in 181*46439007SCharles.Forsyththe outer structure. 182*46439007SCharles.ForsythThus 183*46439007SCharles.Forsyth.CW node->dval 184*46439007SCharles.Forsythand 185*46439007SCharles.Forsyth.CW node->locked 186*46439007SCharles.Forsythare legitimate references. 187*46439007SCharles.Forsyth.LP 188*46439007SCharles.ForsythWhen an outer structure is used 189*46439007SCharles.Forsythin a context that is only legal for 190*46439007SCharles.Forsythan unnamed substructure, 191*46439007SCharles.Forsyththe compiler promotes the reference to the 192*46439007SCharles.Forsythunnamed substructure. 193*46439007SCharles.ForsythThis is true for references to structures and 194*46439007SCharles.Forsythto references to pointers to structures. 195*46439007SCharles.ForsythThis happens in assignment statements and 196*46439007SCharles.Forsythin argument passing where prototypes have been 197*46439007SCharles.Forsythdeclared. 198*46439007SCharles.ForsythThus, continuing with the example, 199*46439007SCharles.Forsyth.DS 200*46439007SCharles.Forsyth.CW 201*46439007SCharles.Forsyth.ta .1i .6i 1.1i 1.6i 202*46439007SCharles.Forsyth lock = node; 203*46439007SCharles.Forsyth.R 204*46439007SCharles.Forsyth.DE 205*46439007SCharles.Forsythwould assign a pointer to the unnamed 206*46439007SCharles.Forsyth.CW Lock 207*46439007SCharles.Forsythin 208*46439007SCharles.Forsyththe 209*46439007SCharles.Forsyth.CW Node 210*46439007SCharles.Forsythto the variable 211*46439007SCharles.Forsyth.CW lock . 212*46439007SCharles.ForsythAnother example, 213*46439007SCharles.Forsyth.DS 214*46439007SCharles.Forsyth.CW 215*46439007SCharles.Forsyth.ta .1i .6i 1.1i 1.6i 216*46439007SCharles.Forsyth extern void lock(Lock*); 217*46439007SCharles.Forsyth func(...) 218*46439007SCharles.Forsyth { 219*46439007SCharles.Forsyth ... 220*46439007SCharles.Forsyth lock(node); 221*46439007SCharles.Forsyth ... 222*46439007SCharles.Forsyth } 223*46439007SCharles.Forsyth.R 224*46439007SCharles.Forsyth.DE 225*46439007SCharles.Forsythwill pass a pointer to the 226*46439007SCharles.Forsyth.CW Lock 227*46439007SCharles.Forsythsubstructure. 228*46439007SCharles.Forsyth.LP 229*46439007SCharles.ForsythFinally, in places where context is insufficient to identify the unnamed structure, 230*46439007SCharles.Forsyththe type name (it must be a 231*46439007SCharles.Forsyth.CW typedef ) 232*46439007SCharles.Forsythof the unnamed structure can be used as an identifier. 233*46439007SCharles.ForsythIn our example, 234*46439007SCharles.Forsyth.CW &node->Lock 235*46439007SCharles.Forsythgives the address of the anonymous 236*46439007SCharles.Forsyth.CW Lock 237*46439007SCharles.Forsythstructure. 238*46439007SCharles.Forsyth.NH 2 239*46439007SCharles.ForsythStructure displays 240*46439007SCharles.Forsyth.LP 241*46439007SCharles.ForsythA structure cast followed by a list of expressions in braces is 242*46439007SCharles.Forsythan expression with the type of the structure and elements assigned from 243*46439007SCharles.Forsyththe corresponding list. 244*46439007SCharles.ForsythStructures are now almost first-class citizens of the language. 245*46439007SCharles.ForsythIt is common to see code like this: 246*46439007SCharles.Forsyth.DS 247*46439007SCharles.Forsyth.CW 248*46439007SCharles.Forsyth.ta .1i 249*46439007SCharles.Forsyth r = (Rectangle){point1, (Point){x,y+2}}; 250*46439007SCharles.Forsyth.R 251*46439007SCharles.Forsyth.DE 252*46439007SCharles.Forsyth.NH 2 253*46439007SCharles.ForsythInitialization indexes 254*46439007SCharles.Forsyth.LP 255*46439007SCharles.ForsythIn initializers of arrays, 256*46439007SCharles.Forsythone may place a constant expression 257*46439007SCharles.Forsythin square brackets before an initializer. 258*46439007SCharles.ForsythThis causes the next initializer to assign 259*46439007SCharles.Forsyththe indicated element. 260*46439007SCharles.ForsythFor example: 261*46439007SCharles.Forsyth.DS 262*46439007SCharles.Forsyth.CW 263*46439007SCharles.Forsyth.ta .1i .6i 1.6i 264*46439007SCharles.Forsyth enum errors 265*46439007SCharles.Forsyth { 266*46439007SCharles.Forsyth Etoobig, 267*46439007SCharles.Forsyth Ealarm, 268*46439007SCharles.Forsyth Egreg 269*46439007SCharles.Forsyth }; 270*46439007SCharles.Forsyth char* errstrings[] = 271*46439007SCharles.Forsyth { 272*46439007SCharles.Forsyth [Ealarm] "Alarm call", 273*46439007SCharles.Forsyth [Egreg] "Panic: out of mbufs", 274*46439007SCharles.Forsyth [Etoobig] "Arg list too long", 275*46439007SCharles.Forsyth }; 276*46439007SCharles.Forsyth.R 277*46439007SCharles.Forsyth.DE 278*46439007SCharles.ForsythIn the same way, 279*46439007SCharles.Forsythindividual structures members may 280*46439007SCharles.Forsythbe initialized in any order by preceding the initialization with 281*46439007SCharles.Forsyth.CW .tagname . 282*46439007SCharles.ForsythBoth forms allow an optional 283*46439007SCharles.Forsyth.CW = , 284*46439007SCharles.Forsythto be compatible with a proposed 285*46439007SCharles.Forsythextension to ANSI C. 286*46439007SCharles.Forsyth.NH 2 287*46439007SCharles.ForsythExternal register 288*46439007SCharles.Forsyth.LP 289*46439007SCharles.ForsythThe declaration 290*46439007SCharles.Forsyth.CW extern 291*46439007SCharles.Forsyth.CW register 292*46439007SCharles.Forsythwill dedicate a register to 293*46439007SCharles.Forsytha variable on a global basis. 294*46439007SCharles.ForsythIt can be used only under special circumstances. 295*46439007SCharles.ForsythExternal register variables must be identically 296*46439007SCharles.Forsythdeclared in all modules and 297*46439007SCharles.Forsythlibraries. 298*46439007SCharles.ForsythThe feature is not intended for efficiency, 299*46439007SCharles.Forsythalthough it can produce efficient code; 300*46439007SCharles.Forsythrather it represents a unique storage class that 301*46439007SCharles.Forsythwould be hard to get any other way. 302*46439007SCharles.ForsythOn a shared-memory multi-processor, 303*46439007SCharles.Forsythan external register is 304*46439007SCharles.Forsythone-per-processor and neither one-per-procedure (automatic) 305*46439007SCharles.Forsythor one-per-system (external). 306*46439007SCharles.ForsythIt is used for two variables in the Plan 9 kernel, 307*46439007SCharles.Forsyth.CW u 308*46439007SCharles.Forsythand 309*46439007SCharles.Forsyth.CW m . 310*46439007SCharles.Forsyth.CW U 311*46439007SCharles.Forsythis a pointer to the structure representing the currently running process 312*46439007SCharles.Forsythand 313*46439007SCharles.Forsyth.CW m 314*46439007SCharles.Forsythis a pointer to the per-machine data structure. 315*46439007SCharles.Forsyth.NH 2 316*46439007SCharles.ForsythLong long 317*46439007SCharles.Forsyth.LP 318*46439007SCharles.ForsythThe compilers accept 319*46439007SCharles.Forsyth.CW long 320*46439007SCharles.Forsyth.CW long 321*46439007SCharles.Forsythas a basic type meaning 64-bit integer. 322*46439007SCharles.ForsythOn all of the machines 323*46439007SCharles.Forsyththis type is synthesized from 32-bit instructions. 324*46439007SCharles.Forsyth.NH 2 325*46439007SCharles.ForsythPragma 326*46439007SCharles.Forsyth.LP 327*46439007SCharles.ForsythThe compilers accept 328*46439007SCharles.Forsyth.CW #pragma 329*46439007SCharles.Forsyth.CW lib 330*46439007SCharles.Forsyth.I libname 331*46439007SCharles.Forsythand pass the 332*46439007SCharles.Forsythlibrary name string uninterpreted 333*46439007SCharles.Forsythto the loader. 334*46439007SCharles.ForsythThe loader uses the library name to 335*46439007SCharles.Forsythfind libraries to load. 336*46439007SCharles.ForsythIf the name contains 337*46439007SCharles.Forsyth.CW $O , 338*46439007SCharles.Forsythit is replaced with 339*46439007SCharles.Forsyththe single character object type of the compiler 340*46439007SCharles.Forsyth(e.g., 341*46439007SCharles.Forsyth.CW v 342*46439007SCharles.Forsythfor the MIPS). 343*46439007SCharles.ForsythIf the name contains 344*46439007SCharles.Forsyth.CW $M , 345*46439007SCharles.Forsythit is replaced with 346*46439007SCharles.Forsyththe architecture type for the compiler 347*46439007SCharles.Forsyth(e.g., 348*46439007SCharles.Forsyth.CW mips 349*46439007SCharles.Forsythfor the MIPS). 350*46439007SCharles.ForsythIf the name starts with 351*46439007SCharles.Forsyth.CW / 352*46439007SCharles.Forsythit is an absolute pathname; 353*46439007SCharles.Forsythif it starts with 354*46439007SCharles.Forsyth.CW . 355*46439007SCharles.Forsyththen it is searched for in the loader's current directory. 356*46439007SCharles.ForsythOtherwise, the name is searched from 357*46439007SCharles.Forsyth.CW /$M/lib . 358*46439007SCharles.ForsythSuch 359*46439007SCharles.Forsyth.CW #pragma 360*46439007SCharles.Forsythstatements in header files guarantee that the correct 361*46439007SCharles.Forsythlibraries are always linked with a program without the 362*46439007SCharles.Forsythneed to specify them explicitly at link time. 363*46439007SCharles.Forsyth.LP 364*46439007SCharles.ForsythThey also accept 365*46439007SCharles.Forsyth.CW #pragma 366*46439007SCharles.Forsyth.CW packed 367*46439007SCharles.Forsyth.CW on 368*46439007SCharles.Forsyth(or 369*46439007SCharles.Forsyth.CW yes 370*46439007SCharles.Forsythor 371*46439007SCharles.Forsyth.CW 1 ) 372*46439007SCharles.Forsythto cause subsequently declared data, until 373*46439007SCharles.Forsyth.CW #pragma 374*46439007SCharles.Forsyth.CW packed 375*46439007SCharles.Forsyth.CW off 376*46439007SCharles.Forsyth(or 377*46439007SCharles.Forsyth.CW no 378*46439007SCharles.Forsythor 379*46439007SCharles.Forsyth.CW 0 ), 380*46439007SCharles.Forsythto be laid out in memory tightly packed in successive bytes, disregarding 381*46439007SCharles.Forsyththe usual alignment rules. 382*46439007SCharles.ForsythAccessing such data can cause faults. 383*46439007SCharles.Forsyth.LP 384*46439007SCharles.ForsythSimilarly, 385*46439007SCharles.Forsyth.CW #pragma 386*46439007SCharles.Forsyth.CW profile 387*46439007SCharles.Forsyth.CW off 388*46439007SCharles.Forsyth(or 389*46439007SCharles.Forsyth.CW no 390*46439007SCharles.Forsythor 391*46439007SCharles.Forsyth.CW 0 ) 392*46439007SCharles.Forsythcauses subsequently declared functions, until 393*46439007SCharles.Forsyth.CW #pragma 394*46439007SCharles.Forsyth.CW profile 395*46439007SCharles.Forsyth.CW on 396*46439007SCharles.Forsyth(or 397*46439007SCharles.Forsyth.CW yes 398*46439007SCharles.Forsythor 399*46439007SCharles.Forsyth.CW 1 ), 400*46439007SCharles.Forsythto be marked as unprofiled. 401*46439007SCharles.ForsythSuch functions will not be profiled when 402*46439007SCharles.Forsythprofiling is enabled for the rest of the program. 403*46439007SCharles.Forsyth.LP 404*46439007SCharles.ForsythTwo 405*46439007SCharles.Forsyth.CW #pragma 406*46439007SCharles.Forsythstatements allow type-checking of 407*46439007SCharles.Forsyth.CW print -like 408*46439007SCharles.Forsythfunctions. 409*46439007SCharles.ForsythThe first, of the form 410*46439007SCharles.Forsyth.P1 411*46439007SCharles.Forsyth#pragma varargck argpos error 2 412*46439007SCharles.Forsyth.P2 413*46439007SCharles.Forsythtells the compiler that the second argument to 414*46439007SCharles.Forsyth.CW error 415*46439007SCharles.Forsythis a 416*46439007SCharles.Forsyth.CW print 417*46439007SCharles.Forsythformat string (see the manual page 418*46439007SCharles.Forsyth.I print (2)) 419*46439007SCharles.Forsyththat specifies how to format 420*46439007SCharles.Forsyth.CW error 's 421*46439007SCharles.Forsythsubsequent arguments. 422*46439007SCharles.ForsythThe second, of the form 423*46439007SCharles.Forsyth.P1 424*46439007SCharles.Forsyth#pragma varargck type "s" char* 425*46439007SCharles.Forsyth.P2 426*46439007SCharles.Forsythsays that the 427*46439007SCharles.Forsyth.CW print 428*46439007SCharles.Forsythformat verb 429*46439007SCharles.Forsyth.CW s 430*46439007SCharles.Forsythprocesses an argument of 431*46439007SCharles.Forsythtype 432*46439007SCharles.Forsyth.CW char* . 433*46439007SCharles.ForsythIf the compiler's 434*46439007SCharles.Forsyth.CW -F 435*46439007SCharles.Forsythoption is enabled, the compiler will use this information 436*46439007SCharles.Forsythto report type violations in the arguments to 437*46439007SCharles.Forsyth.CW print , 438*46439007SCharles.Forsyth.CW error , 439*46439007SCharles.Forsythand similar routines. 440*46439007SCharles.Forsyth.NH 441*46439007SCharles.ForsythObject module conventions 442*46439007SCharles.Forsyth.LP 443*46439007SCharles.ForsythThe overall conventions of the runtime environment 444*46439007SCharles.Forsythare important 445*46439007SCharles.Forsythto runtime efficiency. 446*46439007SCharles.ForsythIn this section, 447*46439007SCharles.Forsythseveral of these conventions are discussed. 448*46439007SCharles.Forsyth.NH 2 449*46439007SCharles.ForsythRegister saving 450*46439007SCharles.Forsyth.LP 451*46439007SCharles.ForsythIn the Plan 9 compilers, 452*46439007SCharles.Forsyththe caller of a procedure saves the registers. 453*46439007SCharles.ForsythWith caller-saves, 454*46439007SCharles.Forsyththe leaf procedures can use all the 455*46439007SCharles.Forsythregisters and never save them. 456*46439007SCharles.ForsythIf you spend a lot of time at the leaves, 457*46439007SCharles.Forsyththis seems preferable. 458*46439007SCharles.ForsythWith callee-saves, 459*46439007SCharles.Forsyththe saving of the registers is done 460*46439007SCharles.Forsythin the single point of entry and return. 461*46439007SCharles.ForsythIf you are interested in space, 462*46439007SCharles.Forsyththis seems preferable. 463*46439007SCharles.ForsythIn both, 464*46439007SCharles.Forsyththere is a degree of uncertainty 465*46439007SCharles.Forsythabout what registers need to be saved. 466*46439007SCharles.ForsythCallee-saved registers make it difficult to 467*46439007SCharles.Forsythfind variables in registers in debuggers. 468*46439007SCharles.ForsythCallee-saved registers also complicate 469*46439007SCharles.Forsyththe implementation of 470*46439007SCharles.Forsyth.CW longjmp . 471*46439007SCharles.ForsythThe convincing argument is 472*46439007SCharles.Forsyththat with caller-saves, 473*46439007SCharles.Forsyththe decision to registerize a variable 474*46439007SCharles.Forsythcan include the cost of saving the register 475*46439007SCharles.Forsythacross calls. 476*46439007SCharles.ForsythFor a further discussion of caller- vs. callee-saves, 477*46439007SCharles.Forsythsee the paper by Davidson and Whalley [Dav91]. 478*46439007SCharles.Forsyth.LP 479*46439007SCharles.ForsythIn the Plan 9 operating system, 480*46439007SCharles.Forsythcalls to the kernel look like normal procedure 481*46439007SCharles.Forsythcalls, which means 482*46439007SCharles.Forsyththe caller 483*46439007SCharles.Forsythhas saved the registers and the system 484*46439007SCharles.Forsythentry does not have to. 485*46439007SCharles.ForsythThis makes system calls considerably faster. 486*46439007SCharles.ForsythSince this is a potential security hole, 487*46439007SCharles.Forsythand can lead to non-determinism, 488*46439007SCharles.Forsyththe system may eventually save the registers 489*46439007SCharles.Forsython entry, 490*46439007SCharles.Forsythor more likely clear the registers on return. 491*46439007SCharles.Forsyth.NH 2 492*46439007SCharles.ForsythCalling convention 493*46439007SCharles.Forsyth.LP 494*46439007SCharles.ForsythOlder C compilers maintain a frame pointer, which is at a known constant 495*46439007SCharles.Forsythoffset from the stack pointer within each function. 496*46439007SCharles.ForsythFor machines where the stack grows towards zero, 497*46439007SCharles.Forsyththe argument pointer is at a known constant offset 498*46439007SCharles.Forsythfrom the frame pointer. 499*46439007SCharles.ForsythSince the stack grows down in Plan 9, 500*46439007SCharles.Forsyththe Plan 9 compilers 501*46439007SCharles.Forsythkeep neither an 502*46439007SCharles.Forsythexplicit frame pointer nor 503*46439007SCharles.Forsythan explicit argument pointer; 504*46439007SCharles.Forsythinstead they generate addresses relative to the stack pointer. 505*46439007SCharles.Forsyth.LP 506*46439007SCharles.ForsythOn some architectures, the first argument to a subroutine is passed in a register. 507*46439007SCharles.Forsyth.NH 2 508*46439007SCharles.ForsythFunctions returning structures 509*46439007SCharles.Forsyth.LP 510*46439007SCharles.ForsythStructures longer than one word are awkward to implement 511*46439007SCharles.Forsythsince they do not fit in registers and must 512*46439007SCharles.Forsythbe passed around in memory. 513*46439007SCharles.ForsythFunctions that return structures 514*46439007SCharles.Forsythare particularly clumsy. 515*46439007SCharles.ForsythThe Plan 9 compilers pass the return address of 516*46439007SCharles.Forsytha structure as the first argument of a 517*46439007SCharles.Forsythfunction that has a structure return value. 518*46439007SCharles.ForsythThus 519*46439007SCharles.Forsyth.DS 520*46439007SCharles.Forsyth.CW 521*46439007SCharles.Forsyth.ta .1i .6i 1.1i 1.6i 522*46439007SCharles.Forsyth x = f(...) 523*46439007SCharles.Forsyth.R 524*46439007SCharles.Forsyth.DE 525*46439007SCharles.Forsythis rewritten as 526*46439007SCharles.Forsyth.DS 527*46439007SCharles.Forsyth.CW 528*46439007SCharles.Forsyth.ta .1i .6i 1.1i 1.6i 529*46439007SCharles.Forsyth f(&x, ...)\f1. 530*46439007SCharles.Forsyth.R 531*46439007SCharles.Forsyth.DE 532*46439007SCharles.ForsythThis saves a copy and makes the compilation 533*46439007SCharles.Forsythmuch less clumsy. 534*46439007SCharles.ForsythA disadvantage is that if you call this 535*46439007SCharles.Forsythfunction without an assignment, 536*46439007SCharles.Forsytha dummy location must be invented. 537*46439007SCharles.Forsyth.LP 538*46439007SCharles.ForsythThere is also a danger of calling a function 539*46439007SCharles.Forsyththat returns a structure without declaring 540*46439007SCharles.Forsythit as such. 541*46439007SCharles.ForsythWith ANSI C function prototypes, 542*46439007SCharles.Forsyththis error need never occur. 543*46439007SCharles.Forsyth.NH 544*46439007SCharles.ForsythImplementation 545*46439007SCharles.Forsyth.LP 546*46439007SCharles.ForsythThe compiler is divided internally into 547*46439007SCharles.Forsythfour machine-independent passes, 548*46439007SCharles.Forsythfour machine-dependent passes, 549*46439007SCharles.Forsythand an output pass. 550*46439007SCharles.ForsythThe next nine sections describe each pass in order. 551*46439007SCharles.Forsyth.NH 2 552*46439007SCharles.ForsythParsing 553*46439007SCharles.Forsyth.LP 554*46439007SCharles.ForsythThe first pass is a YACC-based parser 555*46439007SCharles.Forsyth[Joh79]. 556*46439007SCharles.ForsythDeclarations are interpreted immediately, 557*46439007SCharles.Forsythbuilding a block structured symbol table. 558*46439007SCharles.ForsythExecutable statements are put into a parse tree 559*46439007SCharles.Forsythand collected, 560*46439007SCharles.Forsythwithout interpretation. 561*46439007SCharles.ForsythAt the end of each procedure, 562*46439007SCharles.Forsyththe parse tree for the function is 563*46439007SCharles.Forsythexamined by the other passes of the compiler. 564*46439007SCharles.Forsyth.LP 565*46439007SCharles.ForsythThe input stream of the parser is 566*46439007SCharles.Forsytha pushdown list of input activations. 567*46439007SCharles.ForsythThe preprocessor 568*46439007SCharles.Forsythexpansions of 569*46439007SCharles.Forsythmacros 570*46439007SCharles.Forsythand 571*46439007SCharles.Forsyth.CW #include 572*46439007SCharles.Forsythare implemented as pushdowns. 573*46439007SCharles.ForsythThus there is no separate 574*46439007SCharles.Forsythpass for preprocessing. 575*46439007SCharles.Forsyth.NH 2 576*46439007SCharles.ForsythTyping 577*46439007SCharles.Forsyth.LP 578*46439007SCharles.ForsythThe next pass distributes typing information 579*46439007SCharles.Forsythto every node of the tree. 580*46439007SCharles.ForsythImplicit operations on the tree are added, 581*46439007SCharles.Forsythsuch as type promotions and taking the 582*46439007SCharles.Forsythaddress of arrays and functions. 583*46439007SCharles.Forsyth.NH 2 584*46439007SCharles.ForsythMachine-independent optimization 585*46439007SCharles.Forsyth.LP 586*46439007SCharles.ForsythThe next pass performs optimizations 587*46439007SCharles.Forsythand transformations of the tree, such as converting 588*46439007SCharles.Forsyth.CW &*x 589*46439007SCharles.Forsythand 590*46439007SCharles.Forsyth.CW *&x 591*46439007SCharles.Forsythinto 592*46439007SCharles.Forsyth.CW x . 593*46439007SCharles.ForsythConstant expressions are converted to constants in this pass. 594*46439007SCharles.Forsyth.NH 2 595*46439007SCharles.ForsythArithmetic rewrites 596*46439007SCharles.Forsyth.LP 597*46439007SCharles.ForsythThis is another machine-independent optimization. 598*46439007SCharles.ForsythSubtrees of add, subtract, and multiply of integers are 599*46439007SCharles.Forsythrewritten for easier compilation. 600*46439007SCharles.ForsythThe major transformation is factoring: 601*46439007SCharles.Forsyth.CW 4+8*a+16*b+5 602*46439007SCharles.Forsythis transformed into 603*46439007SCharles.Forsyth.CW 9+8*(a+2*b) . 604*46439007SCharles.ForsythSuch expressions arise from address 605*46439007SCharles.Forsythmanipulation and array indexing. 606*46439007SCharles.Forsyth.NH 2 607*46439007SCharles.ForsythAddressability 608*46439007SCharles.Forsyth.LP 609*46439007SCharles.ForsythThis is the first of the machine-dependent passes. 610*46439007SCharles.ForsythThe addressability of a processor is defined as the set of 611*46439007SCharles.Forsythexpressions that is legal in the address field 612*46439007SCharles.Forsythof a machine language instruction. 613*46439007SCharles.ForsythThe addressability of different processors varies widely. 614*46439007SCharles.ForsythAt one end of the spectrum are the 68020 and VAX, 615*46439007SCharles.Forsythwhich allow a complex mix of incrementing, 616*46439007SCharles.Forsythdecrementing, 617*46439007SCharles.Forsythindexing, and relative addressing. 618*46439007SCharles.ForsythAt the other end is the MIPS, 619*46439007SCharles.Forsythwhich allows only registers and constant offsets from the 620*46439007SCharles.Forsythcontents of a register. 621*46439007SCharles.ForsythThe addressability can be different for different instructions 622*46439007SCharles.Forsythwithin the same processor. 623*46439007SCharles.Forsyth.LP 624*46439007SCharles.ForsythIt is important to the code generator to know when a 625*46439007SCharles.Forsythsubtree represents an address of a particular type. 626*46439007SCharles.ForsythThis is done with a bottom-up walk of the tree. 627*46439007SCharles.ForsythIn this pass, the leaves are labeled with small integers. 628*46439007SCharles.ForsythWhen an internal node is encountered, 629*46439007SCharles.Forsythit is labeled by consulting a table indexed by the 630*46439007SCharles.Forsythlabels on the left and right subtrees. 631*46439007SCharles.ForsythFor example, 632*46439007SCharles.Forsython the 68020 processor, 633*46439007SCharles.Forsythit is possible to address an 634*46439007SCharles.Forsythoffset from a named location. 635*46439007SCharles.ForsythIn C, this is represented by the expression 636*46439007SCharles.Forsyth.CW *(&name+constant) . 637*46439007SCharles.ForsythThis is marked addressable by the following table. 638*46439007SCharles.ForsythIn the table, 639*46439007SCharles.Forsytha node represented by the left column is marked 640*46439007SCharles.Forsythwith a small integer from the right column. 641*46439007SCharles.ForsythMarks of the form 642*46439007SCharles.Forsyth.CW A\s-2\di\u\s0 643*46439007SCharles.Forsythare addressable while 644*46439007SCharles.Forsythmarks of the form 645*46439007SCharles.Forsyth.CW N\s-2\di\u\s0 646*46439007SCharles.Forsythare not addressable. 647*46439007SCharles.Forsyth.DS 648*46439007SCharles.Forsyth.B 649*46439007SCharles.Forsyth.ta .1i 1.1i 650*46439007SCharles.Forsyth Node Marked 651*46439007SCharles.Forsyth.CW 652*46439007SCharles.Forsyth name A\s-2\d1\u\s0 653*46439007SCharles.Forsyth const A\s-2\d2\u\s0 654*46439007SCharles.Forsyth &A\s-2\d1\u\s0 A\s-2\d3\u\s0 655*46439007SCharles.Forsyth A\s-2\d3\u\s0+A\s-2\d1\u\s0 N\s-2\d1\u\s0 \fR(note that this is not addressable)\fP 656*46439007SCharles.Forsyth *N\s-2\d1\u\s0 A\s-2\d4\u\s0 657*46439007SCharles.Forsyth.R 658*46439007SCharles.Forsyth.DE 659*46439007SCharles.ForsythHere there is a distinction between 660*46439007SCharles.Forsytha node marked 661*46439007SCharles.Forsyth.CW A\s-2\d1\u\s0 662*46439007SCharles.Forsythand a node marked 663*46439007SCharles.Forsyth.CW A\s-2\d4\u\s0 664*46439007SCharles.Forsythbecause the address operator of an 665*46439007SCharles.Forsyth.CW A\s-2\d4\u\s0 666*46439007SCharles.Forsythnode is not addressable. 667*46439007SCharles.ForsythSo to extend the table: 668*46439007SCharles.Forsyth.DS 669*46439007SCharles.Forsyth.B 670*46439007SCharles.Forsyth.ta .1i 1.1i 671*46439007SCharles.Forsyth Node Marked 672*46439007SCharles.Forsyth.CW 673*46439007SCharles.Forsyth &A\s-2\d4\u\s0 N\s-2\d2\u\s0 674*46439007SCharles.Forsyth N\s-2\d2\u\s0+N\s-2\d1\u\s0 N\s-2\d1\u\s0 675*46439007SCharles.Forsyth.R 676*46439007SCharles.Forsyth.DE 677*46439007SCharles.ForsythThe full addressability of the 68020 is expressed 678*46439007SCharles.Forsythin 18 rules like this, 679*46439007SCharles.Forsythwhile the addressability of the MIPS is expressed 680*46439007SCharles.Forsythin 11 rules. 681*46439007SCharles.ForsythWhen one ports the compiler, 682*46439007SCharles.Forsyththis table is usually initialized 683*46439007SCharles.Forsythso that leaves are labeled as addressable and nothing else. 684*46439007SCharles.ForsythThe code produced is poor, 685*46439007SCharles.Forsythbut porting is easy. 686*46439007SCharles.ForsythThe table can be extended later. 687*46439007SCharles.Forsyth.LP 688*46439007SCharles.ForsythThis pass also rewrites some complex operators 689*46439007SCharles.Forsythinto procedure calls. 690*46439007SCharles.ForsythExamples include 64-bit multiply and divide. 691*46439007SCharles.Forsyth.LP 692*46439007SCharles.ForsythIn the same bottom-up pass of the tree, 693*46439007SCharles.Forsyththe nodes are labeled with a Sethi-Ullman complexity 694*46439007SCharles.Forsyth[Set70]. 695*46439007SCharles.ForsythThis number is roughly the number of registers required 696*46439007SCharles.Forsythto compile the tree on an ideal machine. 697*46439007SCharles.ForsythAn addressable node is marked 0. 698*46439007SCharles.ForsythA function call is marked infinite. 699*46439007SCharles.ForsythA unary operator is marked as the 700*46439007SCharles.Forsythmaximum of 1 and the mark of its subtree. 701*46439007SCharles.ForsythA binary operator with equal marks on its subtrees is 702*46439007SCharles.Forsythmarked with a subtree mark plus 1. 703*46439007SCharles.ForsythA binary operator with unequal marks on its subtrees is 704*46439007SCharles.Forsythmarked with the maximum mark of its subtrees. 705*46439007SCharles.ForsythThe actual values of the marks are not too important, 706*46439007SCharles.Forsythbut the relative values are. 707*46439007SCharles.ForsythThe goal is to compile the harder 708*46439007SCharles.Forsyth(larger mark) 709*46439007SCharles.Forsythsubtree first. 710*46439007SCharles.Forsyth.NH 2 711*46439007SCharles.ForsythCode generation 712*46439007SCharles.Forsyth.LP 713*46439007SCharles.ForsythCode is generated by recursive 714*46439007SCharles.Forsythdescent. 715*46439007SCharles.ForsythThe Sethi-Ullman complexity completely guides the 716*46439007SCharles.Forsythorder. 717*46439007SCharles.ForsythThe addressability defines the leaves. 718*46439007SCharles.ForsythThe only difficult part is compiling a tree 719*46439007SCharles.Forsyththat has two infinite (function call) 720*46439007SCharles.Forsythsubtrees. 721*46439007SCharles.ForsythIn this case, 722*46439007SCharles.Forsythone subtree is compiled into the return register 723*46439007SCharles.Forsyth(usually the most convenient place for a function call) 724*46439007SCharles.Forsythand then stored on the stack. 725*46439007SCharles.ForsythThe other subtree is compiled into the return register 726*46439007SCharles.Forsythand then the operation is compiled with 727*46439007SCharles.Forsythoperands from the stack and the return register. 728*46439007SCharles.Forsyth.LP 729*46439007SCharles.ForsythThere is a separate boolean code generator that compiles 730*46439007SCharles.Forsythconditional expressions. 731*46439007SCharles.ForsythThis is fundamentally different from compiling an arithmetic expression. 732*46439007SCharles.ForsythThe result of the boolean code generator is the 733*46439007SCharles.Forsythposition of the program counter and not an expression. 734*46439007SCharles.ForsythThe boolean code generator makes extensive use of De Morgan's rule. 735*46439007SCharles.ForsythThe boolean code generator is an expanded version of that described 736*46439007SCharles.Forsythin chapter 8 of Aho, Sethi, and Ullman 737*46439007SCharles.Forsyth[Aho87]. 738*46439007SCharles.Forsyth.LP 739*46439007SCharles.ForsythThere is a considerable amount of talk in the literature 740*46439007SCharles.Forsythabout automating this part of a compiler with a machine 741*46439007SCharles.Forsythdescription. 742*46439007SCharles.ForsythSince this code generator is so small 743*46439007SCharles.Forsyth(less than 500 lines of C) 744*46439007SCharles.Forsythand easy, 745*46439007SCharles.Forsythit hardly seems worth the effort. 746*46439007SCharles.Forsyth.NH 2 747*46439007SCharles.ForsythRegisterization 748*46439007SCharles.Forsyth.LP 749*46439007SCharles.ForsythUp to now, 750*46439007SCharles.Forsyththe compiler has operated on syntax trees 751*46439007SCharles.Forsyththat are roughly equivalent to the original source language. 752*46439007SCharles.ForsythThe previous pass has produced machine language in an internal 753*46439007SCharles.Forsythformat. 754*46439007SCharles.ForsythThe next two passes operate on the internal machine language 755*46439007SCharles.Forsythstructures. 756*46439007SCharles.ForsythThe purpose of the next pass is to reintroduce 757*46439007SCharles.Forsythregisters for heavily used variables. 758*46439007SCharles.Forsyth.LP 759*46439007SCharles.ForsythAll of the variables that can be 760*46439007SCharles.Forsythpotentially registerized within a procedure are 761*46439007SCharles.Forsythplaced in a table. 762*46439007SCharles.Forsyth(Suitable variables are any automatic or external 763*46439007SCharles.Forsythscalars that do not have their addresses extracted. 764*46439007SCharles.ForsythSome constants that are hard to reference are also 765*46439007SCharles.Forsythconsidered for registerization.) 766*46439007SCharles.ForsythFour separate data flow equations are evaluated 767*46439007SCharles.Forsythover the procedure on all of these variables. 768*46439007SCharles.ForsythTwo of the equations are the normal set-behind 769*46439007SCharles.Forsythand used-ahead 770*46439007SCharles.Forsythbits that define the life of a variable. 771*46439007SCharles.ForsythThe two new bits tell if a variable life 772*46439007SCharles.Forsythcrosses a function call ahead or behind. 773*46439007SCharles.ForsythBy examining a variable over its lifetime, 774*46439007SCharles.Forsythit is possible to get a cost 775*46439007SCharles.Forsythfor registerizing. 776*46439007SCharles.ForsythLoops are detected and the costs are multiplied 777*46439007SCharles.Forsythby three for every level of loop nesting. 778*46439007SCharles.ForsythCosts are sorted and the variables 779*46439007SCharles.Forsythare replaced by available registers on a greedy basis. 780*46439007SCharles.Forsyth.LP 781*46439007SCharles.ForsythThe 68020 has two different 782*46439007SCharles.Forsythtypes of registers. 783*46439007SCharles.ForsythFor the 68020, 784*46439007SCharles.Forsythtwo different costs are calculated for 785*46439007SCharles.Forsytheach variable life and the register type that 786*46439007SCharles.Forsythaffords the better cost is used. 787*46439007SCharles.ForsythTies are broken by counting the number of available 788*46439007SCharles.Forsythregisters of each type. 789*46439007SCharles.Forsyth.LP 790*46439007SCharles.ForsythNote that externals are registerized together with automatics. 791*46439007SCharles.ForsythThis is done by evaluating the semantics of a ``call'' instruction 792*46439007SCharles.Forsythdifferently for externals and automatics. 793*46439007SCharles.ForsythSince a call goes outside the local procedure, 794*46439007SCharles.Forsythit is assumed that a call references all externals. 795*46439007SCharles.ForsythSimilarly, 796*46439007SCharles.Forsythexternals are assumed to be set before an ``entry'' instruction 797*46439007SCharles.Forsythand assumed to be referenced after a ``return'' instruction. 798*46439007SCharles.ForsythThis makes sure that externals are in memory across calls. 799*46439007SCharles.Forsyth.LP 800*46439007SCharles.ForsythThe overall results are satisfactory. 801*46439007SCharles.ForsythIt would be nice to be able to do this processing in 802*46439007SCharles.Forsytha machine-independent way, 803*46439007SCharles.Forsythbut it is impossible to get all of the costs and 804*46439007SCharles.Forsythside effects of different choices by examining the parse tree. 805*46439007SCharles.Forsyth.LP 806*46439007SCharles.ForsythMost of the code in the registerization pass is machine-independent. 807*46439007SCharles.ForsythThe major machine-dependency is in 808*46439007SCharles.Forsythexamining a machine instruction to ask if it sets or references 809*46439007SCharles.Forsytha variable. 810*46439007SCharles.Forsyth.NH 2 811*46439007SCharles.ForsythMachine code optimization 812*46439007SCharles.Forsyth.LP 813*46439007SCharles.ForsythThe next pass walks the machine code 814*46439007SCharles.Forsythfor opportunistic optimizations. 815*46439007SCharles.ForsythFor the most part, 816*46439007SCharles.Forsyththis is highly specific to a particular 817*46439007SCharles.Forsythprocessor. 818*46439007SCharles.ForsythOne optimization that is performed 819*46439007SCharles.Forsython all of the processors is the 820*46439007SCharles.Forsythremoval of unnecessary ``move'' 821*46439007SCharles.Forsythinstructions. 822*46439007SCharles.ForsythIronically, 823*46439007SCharles.Forsythmost of these instructions were inserted by 824*46439007SCharles.Forsyththe previous pass. 825*46439007SCharles.ForsythThere are two patterns that are repetitively 826*46439007SCharles.Forsythmatched and replaced until no more matches are 827*46439007SCharles.Forsythfound. 828*46439007SCharles.ForsythThe first tries to remove ``move'' instructions 829*46439007SCharles.Forsythby relabeling variables. 830*46439007SCharles.Forsyth.LP 831*46439007SCharles.ForsythWhen a ``move'' instruction is encountered, 832*46439007SCharles.Forsythif the destination variable is set before the 833*46439007SCharles.Forsythsource variable is referenced, 834*46439007SCharles.Forsyththen all of the references to the destination 835*46439007SCharles.Forsythvariable can be renamed to the source and the ``move'' 836*46439007SCharles.Forsythcan be deleted. 837*46439007SCharles.ForsythThis transformation uses the reverse data flow 838*46439007SCharles.Forsythset up in the previous pass. 839*46439007SCharles.Forsyth.LP 840*46439007SCharles.ForsythAn example of this pattern is depicted in the following 841*46439007SCharles.Forsythtable. 842*46439007SCharles.ForsythThe pattern is in the left column and the 843*46439007SCharles.Forsythreplacement action is in the right column. 844*46439007SCharles.Forsyth.DS 845*46439007SCharles.Forsyth.CW 846*46439007SCharles.Forsyth.ta .1i .6i 1.6i 2.1i 2.6i 847*46439007SCharles.Forsyth MOVE a->b \fR(remove)\fP 848*46439007SCharles.Forsyth.R 849*46439007SCharles.Forsyth (sequence with no mention of \f(CWa\fP) 850*46439007SCharles.Forsyth.CW 851*46439007SCharles.Forsyth USE b USE a 852*46439007SCharles.Forsyth.R 853*46439007SCharles.Forsyth (sequence with no mention of \f(CWa\fP) 854*46439007SCharles.Forsyth.CW 855*46439007SCharles.Forsyth SET b SET b 856*46439007SCharles.Forsyth.R 857*46439007SCharles.Forsyth.DE 858*46439007SCharles.Forsyth.LP 859*46439007SCharles.ForsythExperiments have shown that it is marginally 860*46439007SCharles.Forsythworthwhile to rename uses of the destination variable 861*46439007SCharles.Forsythwith uses of the source variable up to 862*46439007SCharles.Forsyththe first use of the source variable. 863*46439007SCharles.Forsyth.LP 864*46439007SCharles.ForsythThe second transform will do relabeling 865*46439007SCharles.Forsythwithout deleting instructions. 866*46439007SCharles.ForsythWhen a ``move'' instruction is encountered, 867*46439007SCharles.Forsythif the source variable has been set prior 868*46439007SCharles.Forsythto the use of the destination variable 869*46439007SCharles.Forsyththen all of the references to the source 870*46439007SCharles.Forsythvariable are replaced by the destination and 871*46439007SCharles.Forsyththe ``move'' is inverted. 872*46439007SCharles.ForsythTypically, 873*46439007SCharles.Forsyththis transformation will alter two ``move'' 874*46439007SCharles.Forsythinstructions and allow the first transformation 875*46439007SCharles.Forsythanother chance to remove code. 876*46439007SCharles.ForsythThis transformation uses the forward data flow 877*46439007SCharles.Forsythset up in the previous pass. 878*46439007SCharles.Forsyth.LP 879*46439007SCharles.ForsythAgain, 880*46439007SCharles.Forsyththe following is a depiction of the transformation where 881*46439007SCharles.Forsyththe pattern is in the left column and the 882*46439007SCharles.Forsythrewrite is in the right column. 883*46439007SCharles.Forsyth.DS 884*46439007SCharles.Forsyth.CW 885*46439007SCharles.Forsyth.ta .1i .6i 1.6i 2.1i 2.6i 886*46439007SCharles.Forsyth SET a SET b 887*46439007SCharles.Forsyth.R 888*46439007SCharles.Forsyth (sequence with no use of \f(CWb\fP) 889*46439007SCharles.Forsyth.CW 890*46439007SCharles.Forsyth USE a USE b 891*46439007SCharles.Forsyth.R 892*46439007SCharles.Forsyth (sequence with no use of \f(CWb\fP) 893*46439007SCharles.Forsyth.CW 894*46439007SCharles.Forsyth MOVE a->b MOVE b->a 895*46439007SCharles.Forsyth.R 896*46439007SCharles.Forsyth.DE 897*46439007SCharles.ForsythIterating these transformations 898*46439007SCharles.Forsythwill usually get rid of all redundant ``move'' instructions. 899*46439007SCharles.Forsyth.LP 900*46439007SCharles.ForsythA problem with this organization is that the costs 901*46439007SCharles.Forsythof registerization calculated in the previous pass 902*46439007SCharles.Forsythmust depend on how well this pass can detect and remove 903*46439007SCharles.Forsythredundant instructions. 904*46439007SCharles.ForsythOften, 905*46439007SCharles.Forsytha fine candidate for registerization is rejected 906*46439007SCharles.Forsythbecause of the cost of instructions that are later 907*46439007SCharles.Forsythremoved. 908*46439007SCharles.Forsyth.NH 2 909*46439007SCharles.ForsythWriting the object file 910*46439007SCharles.Forsyth.LP 911*46439007SCharles.ForsythThe last pass walks the internal assembly language 912*46439007SCharles.Forsythand writes the object file. 913*46439007SCharles.ForsythThe object file is reduced in size by about a factor 914*46439007SCharles.Forsythof three with simple compression 915*46439007SCharles.Forsythtechniques. 916*46439007SCharles.ForsythThe most important aspect of the object file 917*46439007SCharles.Forsythformat is that it is independent of the compiling machine. 918*46439007SCharles.ForsythAll integer and floating numbers in the object 919*46439007SCharles.Forsythcode are converted to known formats and byte 920*46439007SCharles.Forsythorders. 921*46439007SCharles.Forsyth.NH 922*46439007SCharles.ForsythThe loader 923*46439007SCharles.Forsyth.LP 924*46439007SCharles.ForsythThe loader is a multiple pass program that 925*46439007SCharles.Forsythreads object files and libraries and produces 926*46439007SCharles.Forsythan executable binary. 927*46439007SCharles.ForsythThe loader also does some minimal 928*46439007SCharles.Forsythoptimizations and code rewriting. 929*46439007SCharles.ForsythMany of the operations performed by the 930*46439007SCharles.Forsythloader are machine-dependent. 931*46439007SCharles.Forsyth.LP 932*46439007SCharles.ForsythThe first pass of the loader reads the 933*46439007SCharles.Forsythobject modules into an internal data 934*46439007SCharles.Forsythstructure that looks like binary assembly language. 935*46439007SCharles.ForsythAs the instructions are read, 936*46439007SCharles.Forsythcode is reordered to remove 937*46439007SCharles.Forsythunconditional branch instructions. 938*46439007SCharles.ForsythConditional branch instructions are inverted 939*46439007SCharles.Forsythto prevent the insertion of unconditional branches. 940*46439007SCharles.ForsythThe loader will also make a copy of a few instructions 941*46439007SCharles.Forsythto remove an unconditional branch. 942*46439007SCharles.Forsyth.LP 943*46439007SCharles.ForsythThe next pass allocates addresses for 944*46439007SCharles.Forsythall external data. 945*46439007SCharles.ForsythTypical of processors is the MIPS, 946*46439007SCharles.Forsythwhich can reference ±32K bytes from a 947*46439007SCharles.Forsythregister. 948*46439007SCharles.ForsythThe loader allocates the register 949*46439007SCharles.Forsyth.CW R30 950*46439007SCharles.Forsythas the static pointer. 951*46439007SCharles.ForsythThe value placed in 952*46439007SCharles.Forsyth.CW R30 953*46439007SCharles.Forsythis the base of the data segment plus 32K. 954*46439007SCharles.ForsythIt is then cheap to reference all data in the 955*46439007SCharles.Forsythfirst 64K of the data segment. 956*46439007SCharles.ForsythExternal variables are allocated to 957*46439007SCharles.Forsyththe data segment 958*46439007SCharles.Forsythwith the smallest variables allocated first. 959*46439007SCharles.ForsythIf all of the data cannot fit into the first 960*46439007SCharles.Forsyth64K of the data segment, 961*46439007SCharles.Forsyththen usually only a few large arrays 962*46439007SCharles.Forsythneed more expensive addressing modes. 963*46439007SCharles.Forsyth.LP 964*46439007SCharles.ForsythFor the MIPS processor, 965*46439007SCharles.Forsyththe loader makes a pass over the internal 966*46439007SCharles.Forsythstructures, 967*46439007SCharles.Forsythexchanging instructions to try 968*46439007SCharles.Forsythto fill ``delay slots'' with useful work. 969*46439007SCharles.ForsythIf a useful instruction cannot be found 970*46439007SCharles.Forsythto fill a delay slot, 971*46439007SCharles.Forsyththe loader will insert 972*46439007SCharles.Forsyth``noop'' 973*46439007SCharles.Forsythinstructions. 974*46439007SCharles.ForsythThis pass is very expensive and does not 975*46439007SCharles.Forsythdo a good job. 976*46439007SCharles.ForsythAbout 40% of all instructions are in 977*46439007SCharles.Forsythdelay slots. 978*46439007SCharles.ForsythAbout 65% of these are useful instructions and 979*46439007SCharles.Forsyth35% are ``noops.'' 980*46439007SCharles.ForsythThe vendor-supplied assembler does this job 981*46439007SCharles.Forsythmore effectively, 982*46439007SCharles.Forsythfilling about 80% 983*46439007SCharles.Forsythof the delay slots with useful instructions. 984*46439007SCharles.Forsyth.LP 985*46439007SCharles.ForsythOn the 68020 processor, 986*46439007SCharles.Forsythbranch instructions come in a variety of 987*46439007SCharles.Forsythsizes depending on the relative distance 988*46439007SCharles.Forsythof the branch. 989*46439007SCharles.ForsythThus the size of branch instructions 990*46439007SCharles.Forsythcan be mutually dependent. 991*46439007SCharles.ForsythThe loader uses a multiple pass algorithm 992*46439007SCharles.Forsythto resolve the branch lengths 993*46439007SCharles.Forsyth[Szy78]. 994*46439007SCharles.ForsythInitially, all branches are assumed minimal length. 995*46439007SCharles.ForsythOn each subsequent pass, 996*46439007SCharles.Forsyththe branches are reassessed 997*46439007SCharles.Forsythand expanded if necessary. 998*46439007SCharles.ForsythWhen no more expansions occur, 999*46439007SCharles.Forsyththe locations of the instructions in 1000*46439007SCharles.Forsyththe text segment are known. 1001*46439007SCharles.Forsyth.LP 1002*46439007SCharles.ForsythOn the MIPS processor, 1003*46439007SCharles.Forsythall instructions are one size. 1004*46439007SCharles.ForsythA single pass over the instructions will 1005*46439007SCharles.Forsythdetermine the locations of all addresses 1006*46439007SCharles.Forsythin the text segment. 1007*46439007SCharles.Forsyth.LP 1008*46439007SCharles.ForsythThe last pass of the loader produces the 1009*46439007SCharles.Forsythexecutable binary. 1010*46439007SCharles.ForsythA symbol table and other tables are 1011*46439007SCharles.Forsythproduced to help the debugger to 1012*46439007SCharles.Forsythinterpret the binary symbolically. 1013*46439007SCharles.Forsyth.LP 1014*46439007SCharles.ForsythThe loader places absolute source line numbers in the symbol table. 1015*46439007SCharles.ForsythThe name and absolute line number of all 1016*46439007SCharles.Forsyth.CW #include 1017*46439007SCharles.Forsythfiles is also placed in the 1018*46439007SCharles.Forsythsymbol table so that the debuggers can 1019*46439007SCharles.Forsythassociate object code to source files. 1020*46439007SCharles.Forsyth.NH 1021*46439007SCharles.ForsythPerformance 1022*46439007SCharles.Forsyth.LP 1023*46439007SCharles.ForsythThe following is a table of the source size of the MIPS 1024*46439007SCharles.Forsythcompiler. 1025*46439007SCharles.Forsyth.DS 1026*46439007SCharles.Forsyth.ta .1i .6i 1027*46439007SCharles.Forsyth lines module 1028*46439007SCharles.Forsyth \0509 machine-independent headers 1029*46439007SCharles.Forsyth 1070 machine-independent YACC source 1030*46439007SCharles.Forsyth 6090 machine-independent C source 1031*46439007SCharles.Forsyth 1032*46439007SCharles.Forsyth \0545 machine-dependent headers 1033*46439007SCharles.Forsyth 6532 machine-dependent C source 1034*46439007SCharles.Forsyth 1035*46439007SCharles.Forsyth \0298 loader headers 1036*46439007SCharles.Forsyth 5215 loader C source 1037*46439007SCharles.Forsyth.DE 1038*46439007SCharles.Forsyth.LP 1039*46439007SCharles.ForsythThe following table shows timing 1040*46439007SCharles.Forsythof a test program 1041*46439007SCharles.Forsyththat plays checkers, running on a MIPS R4000. 1042*46439007SCharles.ForsythThe test program is 26 files totaling 12600 lines of C. 1043*46439007SCharles.ForsythThe execution time does not significantly 1044*46439007SCharles.Forsythdepend on library implementation. 1045*46439007SCharles.ForsythSince no other compiler runs on Plan 9, 1046*46439007SCharles.Forsyththe Plan 9 tests were done with the Plan 9 operating system; 1047*46439007SCharles.Forsyththe other tests were done on the vendor's operating system. 1048*46439007SCharles.ForsythThe hardware was identical in both cases. 1049*46439007SCharles.ForsythThe optimizer in the vendor's compiler 1050*46439007SCharles.Forsythis reputed to be extremely good. 1051*46439007SCharles.Forsyth.DS 1052*46439007SCharles.Forsyth.ta .1i .9i 1053*46439007SCharles.Forsyth \0\04.49s Plan 9 \f(CWvc\fP \f(CW-N\fP compile time (opposite of \f(CW-O\fP) 1054*46439007SCharles.Forsyth \0\01.72s Plan 9 \f(CWvc\fP \f(CW-N\fP load time 1055*46439007SCharles.Forsyth 148.69s Plan 9 \f(CWvc\fP \f(CW-N\fP run time 1056*46439007SCharles.Forsyth 1057*46439007SCharles.Forsyth \015.07s Plan 9 \f(CWvc\fP compile time (\f(CW-O\fP implicit) 1058*46439007SCharles.Forsyth \0\01.66s Plan 9 \f(CWvc\fP load time 1059*46439007SCharles.Forsyth \089.96s Plan 9 \f(CWvc\fP run time 1060*46439007SCharles.Forsyth 1061*46439007SCharles.Forsyth \014.83s vendor \f(CWcc\fP compile time 1062*46439007SCharles.Forsyth \0\00.38s vendor \f(CWcc\fP load time 1063*46439007SCharles.Forsyth 104.75s vendor \f(CWcc\fP run time 1064*46439007SCharles.Forsyth 1065*46439007SCharles.Forsyth \043.59s vendor \f(CWcc\fP \f(CW-O\fP compile time 1066*46439007SCharles.Forsyth \0\00.38s vendor \f(CWcc\fP \f(CW-O\fP load time 1067*46439007SCharles.Forsyth \076.19s vendor \f(CWcc\fP \f(CW-O\fP run time 1068*46439007SCharles.Forsyth 1069*46439007SCharles.Forsyth \0\08.19s vendor \f(CWcc\fP \f(CW-O3\fP compile time 1070*46439007SCharles.Forsyth \035.97s vendor \f(CWcc\fP \f(CW-O3\fP load time 1071*46439007SCharles.Forsyth \071.16s vendor \f(CWcc\fP \f(CW-O3\fP run time 1072*46439007SCharles.Forsyth.DE 1073*46439007SCharles.Forsyth.LP 1074*46439007SCharles.ForsythTo compare the Intel compiler, 1075*46439007SCharles.Forsytha program that is about 40% bit manipulation and 1076*46439007SCharles.Forsythabout 60% single precision floating point was 1077*46439007SCharles.Forsythrun on the same 33 MHz 486, once under Windows 1078*46439007SCharles.Forsythcompiled with the Watcom compiler, version 10.0, 1079*46439007SCharles.Forsythin 16-bit mode and once under 1080*46439007SCharles.ForsythPlan 9 in 32-bit mode. 1081*46439007SCharles.ForsythThe Plan 9 execution time was 27 sec while the Windows 1082*46439007SCharles.Forsythexecution time was 31 sec. 1083*46439007SCharles.Forsyth.NH 1084*46439007SCharles.ForsythConclusions 1085*46439007SCharles.Forsyth.LP 1086*46439007SCharles.ForsythThe new compilers compile 1087*46439007SCharles.Forsythquickly, 1088*46439007SCharles.Forsythload slowly, 1089*46439007SCharles.Forsythand produce 1090*46439007SCharles.Forsythmedium quality 1091*46439007SCharles.Forsythobject code. 1092*46439007SCharles.ForsythThe compilers are relatively 1093*46439007SCharles.Forsythportable, 1094*46439007SCharles.Forsythrequiring but a couple of weeks' work to 1095*46439007SCharles.Forsythproduce a compiler for a different computer. 1096*46439007SCharles.ForsythFor Plan 9, 1097*46439007SCharles.Forsythwhere we needed several compilers 1098*46439007SCharles.Forsythwith specialized features and 1099*46439007SCharles.Forsythour own object formats, 1100*46439007SCharles.Forsyththis project was indispensable. 1101*46439007SCharles.ForsythIt is also necessary for us to 1102*46439007SCharles.Forsythbe able to freely distribute our compilers 1103*46439007SCharles.Forsythwith the Plan 9 distribution. 1104*46439007SCharles.Forsyth.LP 1105*46439007SCharles.ForsythTwo problems have come up in retrospect. 1106*46439007SCharles.ForsythThe first has to do with the 1107*46439007SCharles.Forsythdivision of labor between compiler and loader. 1108*46439007SCharles.ForsythPlan 9 runs on multi-processors and as such 1109*46439007SCharles.Forsythcompilations are often done in parallel. 1110*46439007SCharles.ForsythUnfortunately, 1111*46439007SCharles.Forsythall compilations must be complete before loading 1112*46439007SCharles.Forsythcan begin. 1113*46439007SCharles.ForsythThe load is then single-threaded. 1114*46439007SCharles.ForsythWith this model, 1115*46439007SCharles.Forsythany shift of work from compile to load 1116*46439007SCharles.Forsythresults in a significant increase in real time. 1117*46439007SCharles.ForsythThe same is true of libraries that are compiled 1118*46439007SCharles.Forsythinfrequently and loaded often. 1119*46439007SCharles.ForsythIn the future, 1120*46439007SCharles.Forsythwe may try to put some of the loader work 1121*46439007SCharles.Forsythback into the compiler. 1122*46439007SCharles.Forsyth.LP 1123*46439007SCharles.ForsythThe second problem comes from 1124*46439007SCharles.Forsyththe various optimizations performed over several 1125*46439007SCharles.Forsythpasses. 1126*46439007SCharles.ForsythOften optimizations in different passes depend 1127*46439007SCharles.Forsython each other. 1128*46439007SCharles.ForsythIterating the passes could compromise efficiency, 1129*46439007SCharles.Forsythor even loop. 1130*46439007SCharles.ForsythWe see no real solution to this problem. 1131*46439007SCharles.Forsyth.NH 1132*46439007SCharles.ForsythReferences 1133*46439007SCharles.Forsyth.LP 1134*46439007SCharles.Forsyth[Aho87] A. V. Aho, R. Sethi, and J. D. Ullman, 1135*46439007SCharles.Forsyth.I 1136*46439007SCharles.ForsythCompilers \- Principles, Techniques, and Tools, 1137*46439007SCharles.Forsyth.R 1138*46439007SCharles.ForsythAddison Wesley, 1139*46439007SCharles.ForsythReading, MA, 1140*46439007SCharles.Forsyth1987. 1141*46439007SCharles.Forsyth.LP 1142*46439007SCharles.Forsyth[ANSI90] \f2American National Standard for Information Systems \- 1143*46439007SCharles.ForsythProgramming Language C\f1, American National Standards Institute, Inc., 1144*46439007SCharles.ForsythNew York, 1990. 1145*46439007SCharles.Forsyth.LP 1146*46439007SCharles.Forsyth[Dav91] J. W. Davidson and D. B. Whalley, 1147*46439007SCharles.Forsyth``Methods for Saving and Restoring Register Values across Function Calls'', 1148*46439007SCharles.Forsyth.I 1149*46439007SCharles.ForsythSoftware\-Practice and Experience, 1150*46439007SCharles.Forsyth.R 1151*46439007SCharles.ForsythVol 21(2), pp. 149-165, February 1991. 1152*46439007SCharles.Forsyth.LP 1153*46439007SCharles.Forsyth[Joh79] S. C. Johnson, 1154*46439007SCharles.Forsyth``YACC \- Yet Another Compiler Compiler'', 1155*46439007SCharles.Forsyth.I 1156*46439007SCharles.ForsythUNIX Programmer's Manual, Seventh Ed., Vol. 2A, 1157*46439007SCharles.Forsyth.R 1158*46439007SCharles.ForsythAT&T Bell Laboratories, 1159*46439007SCharles.ForsythMurray Hill, NJ, 1160*46439007SCharles.Forsyth1979. 1161*46439007SCharles.Forsyth.LP 1162*46439007SCharles.Forsyth[Set70] R. Sethi and J. D. Ullman, 1163*46439007SCharles.Forsyth``The Generation of Optimal Code for Arithmetic Expressions'', 1164*46439007SCharles.Forsyth.I 1165*46439007SCharles.ForsythJournal of the ACM, 1166*46439007SCharles.Forsyth.R 1167*46439007SCharles.ForsythVol 17(4), pp. 715-728, 1970. 1168*46439007SCharles.Forsyth.LP 1169*46439007SCharles.Forsyth[Szy78] T. G. Szymanski, 1170*46439007SCharles.Forsyth``Assembling Code for Machines with Span-dependent Instructions'', 1171*46439007SCharles.Forsyth.I 1172*46439007SCharles.ForsythCommunications of the ACM, 1173*46439007SCharles.Forsyth.R 1174*46439007SCharles.ForsythVol 21(4), pp. 300-308, 1978. 1175