1*219b2ee8SDavid du Colombier.TL 2*219b2ee8SDavid du ColombierPlan 9 C Compilers 3*219b2ee8SDavid du Colombier.AU 4*219b2ee8SDavid du ColombierKen Thompson 5*219b2ee8SDavid du Colombierken@plan9.att.com 6*219b2ee8SDavid du Colombier.AB 7*219b2ee8SDavid du Colombier.FS 8*219b2ee8SDavid du ColombierOriginally appeared, in a different form, in 9*219b2ee8SDavid du Colombier.I 10*219b2ee8SDavid du ColombierProceedings of the Summer 1990 UKUUG Conference, 11*219b2ee8SDavid du Colombier.R 12*219b2ee8SDavid du Colombierpp. 41-51, 13*219b2ee8SDavid du ColombierLondon, 1990. 14*219b2ee8SDavid du Colombier.FE 15*219b2ee8SDavid du ColombierThis paper describes the overall structure and function of the Plan 9 C compilers. 16*219b2ee8SDavid du ColombierA more detailed implementation document 17*219b2ee8SDavid du Colombierfor any one of the compilers 18*219b2ee8SDavid du Colombieris yet to be written. 19*219b2ee8SDavid du Colombier.AE 20*219b2ee8SDavid du Colombier.NH 21*219b2ee8SDavid du ColombierIntroduction 22*219b2ee8SDavid du Colombier.LP 23*219b2ee8SDavid du ColombierThere are currently six compilers in the series. 24*219b2ee8SDavid du ColombierFour of the compilers (MIPS 3000, SPARC, Intel 386, and Motorola 68020) 25*219b2ee8SDavid du Colombierare considered active and are used to compile 26*219b2ee8SDavid du Colombiercurrent versions of Plan 9. 27*219b2ee8SDavid du ColombierThe other two (AT&T 3210, Intel 960) have been used 28*219b2ee8SDavid du Colombieronly to program peripherals. 29*219b2ee8SDavid du Colombier.NH 30*219b2ee8SDavid du ColombierStructure 31*219b2ee8SDavid du Colombier.LP 32*219b2ee8SDavid du ColombierThe compiler is a single program that produces an 33*219b2ee8SDavid du Colombierobject file. 34*219b2ee8SDavid du ColombierCombined in the compiler are the traditional 35*219b2ee8SDavid du Colombierroles of preprocessor, lexical analyzer, parser, code generator, 36*219b2ee8SDavid du Colombierlocal optimizer, 37*219b2ee8SDavid du Colombierand first half of the assembler. 38*219b2ee8SDavid du ColombierThe object files are binary forms of assembly 39*219b2ee8SDavid du Colombierlanguage, 40*219b2ee8SDavid du Colombiersimilar to what might be passed between 41*219b2ee8SDavid du Colombierthe first and second passes of an assembler. 42*219b2ee8SDavid du Colombier.LP 43*219b2ee8SDavid du ColombierObject files and libraries 44*219b2ee8SDavid du Colombierare combined by a loader 45*219b2ee8SDavid du Colombierprogram to produce the executable binary. 46*219b2ee8SDavid du ColombierThe loader combines the roles of second half 47*219b2ee8SDavid du Colombierof the assembler, global optimizer, and loader. 48*219b2ee8SDavid du ColombierThe names of the compliers, loaders, and assemblers 49*219b2ee8SDavid du Colombierare as follows: 50*219b2ee8SDavid du Colombier.DS 51*219b2ee8SDavid du Colombier.ta 1.5i 52*219b2ee8SDavid du Colombier.de Ta 53*219b2ee8SDavid du Colombier\\$1 \f(CW\\$2\fP \f(CW\\$3\fP \f(CW\\$4\fP 54*219b2ee8SDavid du Colombier.. 55*219b2ee8SDavid du Colombier.Ta MIPS vc vl va 56*219b2ee8SDavid du Colombier.Ta SPARC kc kl ka 57*219b2ee8SDavid du Colombier.Ta Intel\ 386 8c 8l 8a 58*219b2ee8SDavid du Colombier.Ta Motorola\ 68020 2c 2l 2a 59*219b2ee8SDavid du Colombier.Ta AT&T\ 3210 xc xl xa 60*219b2ee8SDavid du Colombier.Ta Intel\ 960 9c 9l 9a 61*219b2ee8SDavid du Colombier.DE 62*219b2ee8SDavid du ColombierThere is a further breakdown 63*219b2ee8SDavid du Colombierin the source of the compilers into 64*219b2ee8SDavid du Colombierobject-independent and 65*219b2ee8SDavid du Colombierobject-dependent 66*219b2ee8SDavid du Colombierparts. 67*219b2ee8SDavid du ColombierAll of the object-independent parts 68*219b2ee8SDavid du Colombierare combined into source files in the 69*219b2ee8SDavid du Colombierdirectory 70*219b2ee8SDavid du Colombier.CW /sys/src/cmd/cc . 71*219b2ee8SDavid du ColombierThe object-dependent parts are collected 72*219b2ee8SDavid du Colombierin a separate directory for each compiler, 73*219b2ee8SDavid du Colombierfor example 74*219b2ee8SDavid du Colombier.CW /sys/src/cmd/vc . 75*219b2ee8SDavid du ColombierAll of the code, 76*219b2ee8SDavid du Colombierboth object-independent and 77*219b2ee8SDavid du Colombierobject-dependent, 78*219b2ee8SDavid du Colombieris machine-independent 79*219b2ee8SDavid du Colombierand may be cross-compiled and executed on any 80*219b2ee8SDavid du Colombierof the architectures. 81*219b2ee8SDavid du Colombier.NH 82*219b2ee8SDavid du ColombierThe Language 83*219b2ee8SDavid du Colombier.LP 84*219b2ee8SDavid du ColombierThe compiler implements ANSI C with some 85*219b2ee8SDavid du Colombierrestrictions and extensions 86*219b2ee8SDavid du Colombier[ANSI90]. 87*219b2ee8SDavid du ColombierMost of the restrictions are due to 88*219b2ee8SDavid du Colombierpersonal preference, while 89*219b2ee8SDavid du Colombiermost of the extensions were to help in 90*219b2ee8SDavid du Colombierthe implementation of Plan 9. 91*219b2ee8SDavid du ColombierThere are other departures from the standard, 92*219b2ee8SDavid du Colombierparticularly in the libraries, 93*219b2ee8SDavid du Colombierthat are beyond the scope of this 94*219b2ee8SDavid du Colombierpaper. 95*219b2ee8SDavid du Colombier.NH 2 96*219b2ee8SDavid du ColombierRegister, volatile, const 97*219b2ee8SDavid du Colombier.LP 98*219b2ee8SDavid du ColombierThe keywords 99*219b2ee8SDavid du Colombier.CW register , 100*219b2ee8SDavid du Colombier.CW volatile , 101*219b2ee8SDavid du Colombierand 102*219b2ee8SDavid du Colombier.CW const 103*219b2ee8SDavid du Colombierare recognized syntactically 104*219b2ee8SDavid du Colombierbut are semantically ignored. 105*219b2ee8SDavid du ColombierThus 106*219b2ee8SDavid du Colombierassigning to a 107*219b2ee8SDavid du Colombier.CW const 108*219b2ee8SDavid du Colombiervariable is not diagnosed, nor is 109*219b2ee8SDavid du Colombiertaking the address of a 110*219b2ee8SDavid du Colombier.CW register . 111*219b2ee8SDavid du Colombier.NH 2 112*219b2ee8SDavid du ColombierThe preprocessor 113*219b2ee8SDavid du Colombier.LP 114*219b2ee8SDavid du ColombierThe C preprocessor is probably the 115*219b2ee8SDavid du Colombierbiggest departure from the ANSI standard. 116*219b2ee8SDavid du Colombier.LP 117*219b2ee8SDavid du ColombierThe preprocessor built into the Plan 9 compilers does not support 118*219b2ee8SDavid du Colombier.CW #if , 119*219b2ee8SDavid du Colombieralthough it does handle 120*219b2ee8SDavid du Colombier.CW #ifdef 121*219b2ee8SDavid du Colombierand 122*219b2ee8SDavid du Colombier.CW #include . 123*219b2ee8SDavid du ColombierIf it is necessary to be more standard, 124*219b2ee8SDavid du Colombierthe source text can first be run through the separate ANSI C 125*219b2ee8SDavid du Colombierpreprocessor, 126*219b2ee8SDavid du Colombier.CW cpp . 127*219b2ee8SDavid du Colombier.NH 2 128*219b2ee8SDavid du ColombierUnnamed substructures 129*219b2ee8SDavid du Colombier.LP 130*219b2ee8SDavid du ColombierThe most important and most heavily used of the 131*219b2ee8SDavid du Colombierextensions is the declaration of an 132*219b2ee8SDavid du Colombierunnamed substructure or subunion. 133*219b2ee8SDavid du ColombierFor example: 134*219b2ee8SDavid du Colombier.DS 135*219b2ee8SDavid du Colombier.CW 136*219b2ee8SDavid du Colombier.ta .1i .6i 1.1i 1.6i 137*219b2ee8SDavid du Colombier typedef 138*219b2ee8SDavid du Colombier struct lock 139*219b2ee8SDavid du Colombier { 140*219b2ee8SDavid du Colombier int locked; 141*219b2ee8SDavid du Colombier } Lock; 142*219b2ee8SDavid du Colombier 143*219b2ee8SDavid du Colombier typedef 144*219b2ee8SDavid du Colombier struct node 145*219b2ee8SDavid du Colombier { 146*219b2ee8SDavid du Colombier int type; 147*219b2ee8SDavid du Colombier union 148*219b2ee8SDavid du Colombier { 149*219b2ee8SDavid du Colombier double dval; 150*219b2ee8SDavid du Colombier float fval; 151*219b2ee8SDavid du Colombier long lval; 152*219b2ee8SDavid du Colombier }; 153*219b2ee8SDavid du Colombier Lock; 154*219b2ee8SDavid du Colombier } Node; 155*219b2ee8SDavid du Colombier 156*219b2ee8SDavid du Colombier Lock* lock; 157*219b2ee8SDavid du Colombier Node* node; 158*219b2ee8SDavid du Colombier.DE 159*219b2ee8SDavid du ColombierThe declaration of 160*219b2ee8SDavid du Colombier.CW Node 161*219b2ee8SDavid du Colombierhas an unnamed substructure of type 162*219b2ee8SDavid du Colombier.CW Lock 163*219b2ee8SDavid du Colombierand an unnamed subunion. 164*219b2ee8SDavid du ColombierOne use of this feature allows references to elements of the 165*219b2ee8SDavid du Colombiersubunit to be accessed as if they were in 166*219b2ee8SDavid du Colombierthe outer structure. 167*219b2ee8SDavid du ColombierThus 168*219b2ee8SDavid du Colombier.CW node->dval 169*219b2ee8SDavid du Colombierand 170*219b2ee8SDavid du Colombier.CW node->locked 171*219b2ee8SDavid du Colombierare legitimate references. 172*219b2ee8SDavid du Colombier.LP 173*219b2ee8SDavid du ColombierWhen an outer structure is used 174*219b2ee8SDavid du Colombierin a context that is only legal for 175*219b2ee8SDavid du Colombieran unnamed substructure, 176*219b2ee8SDavid du Colombierthe compiler promotes the reference to the 177*219b2ee8SDavid du Colombierunnamed substructure. 178*219b2ee8SDavid du ColombierThis is true for references to structures and 179*219b2ee8SDavid du Colombierto references to pointers to structures. 180*219b2ee8SDavid du ColombierThis happens in assignment statements and 181*219b2ee8SDavid du Colombierin argument passing where prototypes have been 182*219b2ee8SDavid du Colombierdeclared. 183*219b2ee8SDavid du ColombierThus, continuing with the example, 184*219b2ee8SDavid du Colombier.DS 185*219b2ee8SDavid du Colombier.CW 186*219b2ee8SDavid du Colombier.ta .1i .6i 1.1i 1.6i 187*219b2ee8SDavid du Colombier lock = node; 188*219b2ee8SDavid du Colombier.DE 189*219b2ee8SDavid du Colombierwould assign a pointer to the unnamed 190*219b2ee8SDavid du Colombier.CW Lock 191*219b2ee8SDavid du Colombierin 192*219b2ee8SDavid du Colombierthe 193*219b2ee8SDavid du Colombier.CW Node 194*219b2ee8SDavid du Colombierto the variable 195*219b2ee8SDavid du Colombier.CW lock . 196*219b2ee8SDavid du ColombierAnother example, 197*219b2ee8SDavid du Colombier.DS 198*219b2ee8SDavid du Colombier.CW 199*219b2ee8SDavid du Colombier.ta .1i .6i 1.1i 1.6i 200*219b2ee8SDavid du Colombier extern void lock(Lock*); 201*219b2ee8SDavid du Colombier func(...) 202*219b2ee8SDavid du Colombier { 203*219b2ee8SDavid du Colombier ... 204*219b2ee8SDavid du Colombier lock(node); 205*219b2ee8SDavid du Colombier ... 206*219b2ee8SDavid du Colombier } 207*219b2ee8SDavid du Colombier.DE 208*219b2ee8SDavid du Colombierwill pass a pointer to the 209*219b2ee8SDavid du Colombier.CW Lock 210*219b2ee8SDavid du Colombiersubstructure. 211*219b2ee8SDavid du Colombier.NH 2 212*219b2ee8SDavid du ColombierStructure displays 213*219b2ee8SDavid du Colombier.LP 214*219b2ee8SDavid du ColombierA structure cast followed by a list of expressions in braces is 215*219b2ee8SDavid du Colombieran expression with the type of the structure and elements assigned from 216*219b2ee8SDavid du Colombierthe corresponding list. 217*219b2ee8SDavid du ColombierStructures are now almost first-class citizens of the language. 218*219b2ee8SDavid du ColombierIt is common to see code like this: 219*219b2ee8SDavid du Colombier.DS 220*219b2ee8SDavid du Colombier.CW 221*219b2ee8SDavid du Colombier.ta .1i 222*219b2ee8SDavid du Colombier r = (Rectangle){point1, (Point){x,y+2}}; 223*219b2ee8SDavid du Colombier.DE 224*219b2ee8SDavid du Colombier.NH 2 225*219b2ee8SDavid du ColombierInitialization indexes 226*219b2ee8SDavid du Colombier.LP 227*219b2ee8SDavid du ColombierIn initializers of arrays, 228*219b2ee8SDavid du Colombierone may place a constant expression 229*219b2ee8SDavid du Colombierin square brackets before an initializer. 230*219b2ee8SDavid du ColombierThis causes the next initializer to assign 231*219b2ee8SDavid du Colombierthe indicated element. 232*219b2ee8SDavid du ColombierFor example: 233*219b2ee8SDavid du Colombier.DS 234*219b2ee8SDavid du Colombier.CW 235*219b2ee8SDavid du Colombier.ta .1i .6i 1.6i 236*219b2ee8SDavid du Colombier enum errors 237*219b2ee8SDavid du Colombier { 238*219b2ee8SDavid du Colombier Etoobig, 239*219b2ee8SDavid du Colombier Ealarm, 240*219b2ee8SDavid du Colombier Egreg 241*219b2ee8SDavid du Colombier }; 242*219b2ee8SDavid du Colombier char* errstrings[] = 243*219b2ee8SDavid du Colombier { 244*219b2ee8SDavid du Colombier [Ealarm] "Alarm call", 245*219b2ee8SDavid du Colombier [Egreg] "Panic: out of mbufs", 246*219b2ee8SDavid du Colombier [Etoobig] "Arg list too long", 247*219b2ee8SDavid du Colombier }; 248*219b2ee8SDavid du Colombier.DE 249*219b2ee8SDavid du ColombierIn the same way, 250*219b2ee8SDavid du Colombierindividual structures members may 251*219b2ee8SDavid du Colombierbe initialized in any order by preceding the initialization with 252*219b2ee8SDavid du Colombier.CW .tagname . 253*219b2ee8SDavid du ColombierBoth forms allow an optional 254*219b2ee8SDavid du Colombier.CW = , 255*219b2ee8SDavid du Colombierto be compatible with a proposed 256*219b2ee8SDavid du Colombierextension to ANSI C. 257*219b2ee8SDavid du Colombier.NH 2 258*219b2ee8SDavid du ColombierExternal register 259*219b2ee8SDavid du Colombier.LP 260*219b2ee8SDavid du ColombierThe declaration 261*219b2ee8SDavid du Colombier.CW extern 262*219b2ee8SDavid du Colombier.CW register 263*219b2ee8SDavid du Colombierwill dedicate a register to 264*219b2ee8SDavid du Colombiera variable on a global basis. 265*219b2ee8SDavid du ColombierIt can be used only under special circumstances. 266*219b2ee8SDavid du ColombierExternal register variables must be identically 267*219b2ee8SDavid du Colombierdeclared in all modules and 268*219b2ee8SDavid du Colombierlibraries. 269*219b2ee8SDavid du ColombierThe feature is not intended for efficiency, 270*219b2ee8SDavid du Colombieralthough it can produce efficient code; 271*219b2ee8SDavid du Colombierrather it represents a unique storage class that 272*219b2ee8SDavid du Colombierwould be hard to get any other way. 273*219b2ee8SDavid du ColombierOn a shared-memory multi-processor, 274*219b2ee8SDavid du Colombieran external register is 275*219b2ee8SDavid du Colombierone-per-processor and neither one-per-procedure (automatic) 276*219b2ee8SDavid du Colombieror one-per-system (external). 277*219b2ee8SDavid du ColombierIt is used for two variables in the Plan 9 kernel, 278*219b2ee8SDavid du Colombier.CW u 279*219b2ee8SDavid du Colombierand 280*219b2ee8SDavid du Colombier.CW m . 281*219b2ee8SDavid du Colombier.CW U 282*219b2ee8SDavid du Colombieris a pointer to the structure representing the currently running process 283*219b2ee8SDavid du Colombierand 284*219b2ee8SDavid du Colombier.CW m 285*219b2ee8SDavid du Colombieris a pointer to the per-machine data structure. 286*219b2ee8SDavid du Colombier.NH 2 287*219b2ee8SDavid du ColombierLong long 288*219b2ee8SDavid du Colombier.LP 289*219b2ee8SDavid du ColombierThe compilers accept 290*219b2ee8SDavid du Colombier.CW long 291*219b2ee8SDavid du Colombier.CW long 292*219b2ee8SDavid du Colombieras a basic type meaning 64-bit integer. 293*219b2ee8SDavid du ColombierOn all of the machines 294*219b2ee8SDavid du Colombierthis type is synthesized from 32-bit instructions. 295*219b2ee8SDavid du Colombier.NH 2 296*219b2ee8SDavid du ColombierPragma 297*219b2ee8SDavid du Colombier.LP 298*219b2ee8SDavid du ColombierThe compilers accept 299*219b2ee8SDavid du Colombier.CW #pragma 300*219b2ee8SDavid du Colombier.CW lib 301*219b2ee8SDavid du Colombier.I libname 302*219b2ee8SDavid du Colombierand pass the 303*219b2ee8SDavid du Colombierlibrary name string uninterpreted 304*219b2ee8SDavid du Colombierto the loader. 305*219b2ee8SDavid du ColombierThe loader uses the library name to 306*219b2ee8SDavid du Colombierfind libraries to load. 307*219b2ee8SDavid du ColombierIf the name contains 308*219b2ee8SDavid du Colombier.CW %O , 309*219b2ee8SDavid du Colombierit is replaced with 310*219b2ee8SDavid du Colombierthe single character object type of the compiler 311*219b2ee8SDavid du Colombier(e.g., 312*219b2ee8SDavid du Colombier.CW v 313*219b2ee8SDavid du Colombierfor the MIPS). 314*219b2ee8SDavid du ColombierIf the name contains 315*219b2ee8SDavid du Colombier.CW %M , 316*219b2ee8SDavid du Colombierit is replaced with 317*219b2ee8SDavid du Colombierthe architecture type for the compiler 318*219b2ee8SDavid du Colombier(e.g., 319*219b2ee8SDavid du Colombier.CW mips 320*219b2ee8SDavid du Colombierfor the MIPS). 321*219b2ee8SDavid du ColombierIf the name starts with 322*219b2ee8SDavid du Colombier.CW / 323*219b2ee8SDavid du Colombierit is an absolute pathname; 324*219b2ee8SDavid du Colombierif it starts with 325*219b2ee8SDavid du Colombier.CW . 326*219b2ee8SDavid du Colombierthen it is searched for in the loader's current directory. 327*219b2ee8SDavid du ColombierOtherwise, the name is searched from 328*219b2ee8SDavid du Colombier.CW /%M/lib . 329*219b2ee8SDavid du ColombierSuch 330*219b2ee8SDavid du Colombier.CW #pragma 331*219b2ee8SDavid du Colombierstatements in header files guarantee that the correct 332*219b2ee8SDavid du Colombierlibraries are always linked with a program without the 333*219b2ee8SDavid du Colombierneed to specify them explicitly at link time. 334*219b2ee8SDavid du Colombier.NH 335*219b2ee8SDavid du ColombierObject module conventions 336*219b2ee8SDavid du Colombier.LP 337*219b2ee8SDavid du ColombierThe overall conventions of the runtime environment 338*219b2ee8SDavid du Colombierare important 339*219b2ee8SDavid du Colombierto runtime efficiency. 340*219b2ee8SDavid du ColombierIn this section, 341*219b2ee8SDavid du Colombierseveral of these conventions are discussed. 342*219b2ee8SDavid du Colombier.NH 2 343*219b2ee8SDavid du ColombierRegister saving 344*219b2ee8SDavid du Colombier.LP 345*219b2ee8SDavid du ColombierIn the Plan 9 compilers, 346*219b2ee8SDavid du Colombierthe caller of a procedure saves the registers. 347*219b2ee8SDavid du ColombierWith caller-saves, 348*219b2ee8SDavid du Colombierthe leaf procedures can use all the 349*219b2ee8SDavid du Colombierregisters and never save them. 350*219b2ee8SDavid du ColombierIf you spend a lot of time at the leaves, 351*219b2ee8SDavid du Colombierthis seems preferable. 352*219b2ee8SDavid du ColombierWith callee-saves, 353*219b2ee8SDavid du Colombierthe saving of the registers is done 354*219b2ee8SDavid du Colombierin the single point of entry and return. 355*219b2ee8SDavid du ColombierIf you are interested in space, 356*219b2ee8SDavid du Colombierthis seems preferable. 357*219b2ee8SDavid du ColombierIn both, 358*219b2ee8SDavid du Colombierthere is a degree of uncertainty 359*219b2ee8SDavid du Colombierabout what registers need to be saved. 360*219b2ee8SDavid du ColombierCallee-saved registers make it difficult to 361*219b2ee8SDavid du Colombierfind variables in registers in debuggers. 362*219b2ee8SDavid du ColombierCallee-saved registers also complicate 363*219b2ee8SDavid du Colombierthe implementation of 364*219b2ee8SDavid du Colombier.CW longjmp . 365*219b2ee8SDavid du ColombierThe convincing argument is 366*219b2ee8SDavid du Colombierthat with caller-saves, 367*219b2ee8SDavid du Colombierthe decision to registerize a variable 368*219b2ee8SDavid du Colombiercan include the cost of saving the register 369*219b2ee8SDavid du Colombieracross calls. 370*219b2ee8SDavid du ColombierFor a further discussion of caller- vs. callee-saves, 371*219b2ee8SDavid du Colombiersee the paper by Davidson and Whalley [Dav91]. 372*219b2ee8SDavid du Colombier.LP 373*219b2ee8SDavid du ColombierIn the Plan 9 operating system, 374*219b2ee8SDavid du Colombiercalls to the kernel look like normal procedure 375*219b2ee8SDavid du Colombiercalls, which means 376*219b2ee8SDavid du Colombierthe caller 377*219b2ee8SDavid du Colombierhas saved the registers and the system 378*219b2ee8SDavid du Colombierentry does not have to. 379*219b2ee8SDavid du ColombierThis makes system calls considerably faster. 380*219b2ee8SDavid du ColombierSince this is a potential security hole, 381*219b2ee8SDavid du Colombierand can lead to non-determinism, 382*219b2ee8SDavid du Colombierthe system may eventually save the registers 383*219b2ee8SDavid du Colombieron entry, 384*219b2ee8SDavid du Colombieror more likely clear the registers on return. 385*219b2ee8SDavid du Colombier.NH 2 386*219b2ee8SDavid du ColombierCalling convention 387*219b2ee8SDavid du Colombier.LP 388*219b2ee8SDavid du ColombierOlder C compilers maintain a frame pointer, which is at a known constant 389*219b2ee8SDavid du Colombieroffset from the stack pointer within each function. 390*219b2ee8SDavid du ColombierFor machines where the stack grows towards zero, 391*219b2ee8SDavid du Colombierthe argument pointer is at a known constant offset 392*219b2ee8SDavid du Colombierfrom the frame pointer. 393*219b2ee8SDavid du ColombierSince the stack grows down in Plan 9, 394*219b2ee8SDavid du Colombierthe Plan 9 compilers 395*219b2ee8SDavid du Colombierkeep neither an 396*219b2ee8SDavid du Colombierexplicit frame pointer nor 397*219b2ee8SDavid du Colombieran explicit argument pointer; 398*219b2ee8SDavid du Colombierinstead they generate addresses relative to the stack pointer. 399*219b2ee8SDavid du Colombier.LP 400*219b2ee8SDavid du ColombierOn some architectures, the first argument to a subroutine is passed in a register. 401*219b2ee8SDavid du Colombier.NH 2 402*219b2ee8SDavid du ColombierFunctions returning structures 403*219b2ee8SDavid du Colombier.LP 404*219b2ee8SDavid du ColombierStructures longer than one word are awkward to implement 405*219b2ee8SDavid du Colombiersince they do not fit in registers and must 406*219b2ee8SDavid du Colombierbe passed around in memory. 407*219b2ee8SDavid du ColombierFunctions that return structures 408*219b2ee8SDavid du Colombierare particularly clumsy. 409*219b2ee8SDavid du ColombierThe Plan 9 compilers pass the return address of 410*219b2ee8SDavid du Colombiera structure as the first argument of a 411*219b2ee8SDavid du Colombierfunction that has a structure return value. 412*219b2ee8SDavid du ColombierThus 413*219b2ee8SDavid du Colombier.DS 414*219b2ee8SDavid du Colombier.CW 415*219b2ee8SDavid du Colombier.ta .1i .6i 1.1i 1.6i 416*219b2ee8SDavid du Colombier x = f(...) 417*219b2ee8SDavid du Colombier.DE 418*219b2ee8SDavid du Colombieris rewritten as 419*219b2ee8SDavid du Colombier.DS 420*219b2ee8SDavid du Colombier.CW 421*219b2ee8SDavid du Colombier.ta .1i .6i 1.1i 1.6i 422*219b2ee8SDavid du Colombier f(&x, ...)\f1. 423*219b2ee8SDavid du Colombier.DE 424*219b2ee8SDavid du ColombierThis saves a copy and makes the compilation 425*219b2ee8SDavid du Colombiermuch less clumsy. 426*219b2ee8SDavid du ColombierA disadvantage is that if you call this 427*219b2ee8SDavid du Colombierfunction without an assignment, 428*219b2ee8SDavid du Colombiera dummy location must be invented. 429*219b2ee8SDavid du Colombier.LP 430*219b2ee8SDavid du ColombierThere is also a danger of calling a function 431*219b2ee8SDavid du Colombierthat returns a structure without declaring 432*219b2ee8SDavid du Colombierit as such. 433*219b2ee8SDavid du ColombierWith ANSI C function prototypes, 434*219b2ee8SDavid du Colombierthis error need never occur. 435*219b2ee8SDavid du Colombier.NH 436*219b2ee8SDavid du ColombierImplementation 437*219b2ee8SDavid du Colombier.LP 438*219b2ee8SDavid du ColombierThe compiler is divided internally into 439*219b2ee8SDavid du Colombierfour machine-independent passes, 440*219b2ee8SDavid du Colombierfour machine-dependent passes, 441*219b2ee8SDavid du Colombierand an output pass. 442*219b2ee8SDavid du ColombierThe next nine sections describe each pass in order. 443*219b2ee8SDavid du Colombier.NH 2 444*219b2ee8SDavid du ColombierParsing 445*219b2ee8SDavid du Colombier.LP 446*219b2ee8SDavid du ColombierThe first pass is a YACC-based parser 447*219b2ee8SDavid du Colombier[Joh79]. 448*219b2ee8SDavid du ColombierDeclarations are interpreted immediately, 449*219b2ee8SDavid du Colombierbuilding a block structured symbol table. 450*219b2ee8SDavid du ColombierExecutable statements are put into a parse tree 451*219b2ee8SDavid du Colombierand collected, 452*219b2ee8SDavid du Colombierwithout interpretation. 453*219b2ee8SDavid du ColombierAt the end of each procedure, 454*219b2ee8SDavid du Colombierthe parse tree for the function is 455*219b2ee8SDavid du Colombierexamined by the other passes of the compiler. 456*219b2ee8SDavid du Colombier.LP 457*219b2ee8SDavid du ColombierThe input stream of the parser is 458*219b2ee8SDavid du Colombiera pushdown list of input activations. 459*219b2ee8SDavid du ColombierThe preprocessor 460*219b2ee8SDavid du Colombierexpansions of 461*219b2ee8SDavid du Colombiermacros 462*219b2ee8SDavid du Colombierand 463*219b2ee8SDavid du Colombier.CW #include 464*219b2ee8SDavid du Colombierare implemented as pushdowns. 465*219b2ee8SDavid du ColombierThus there is no separate 466*219b2ee8SDavid du Colombierpass for preprocessing. 467*219b2ee8SDavid du Colombier.NH 2 468*219b2ee8SDavid du ColombierTyping 469*219b2ee8SDavid du Colombier.LP 470*219b2ee8SDavid du ColombierThe next pass distributes typing information 471*219b2ee8SDavid du Colombierto every node of the tree. 472*219b2ee8SDavid du ColombierImplicit operations on the tree are added, 473*219b2ee8SDavid du Colombiersuch as type promotions and taking the 474*219b2ee8SDavid du Colombieraddress of arrays and functions. 475*219b2ee8SDavid du Colombier.NH 2 476*219b2ee8SDavid du ColombierMachine-independent optimization 477*219b2ee8SDavid du Colombier.LP 478*219b2ee8SDavid du ColombierThe next pass performs optimizations 479*219b2ee8SDavid du Colombierand transformations of the tree, such as converting 480*219b2ee8SDavid du Colombier.CW &*x 481*219b2ee8SDavid du Colombierand 482*219b2ee8SDavid du Colombier.CW *&x 483*219b2ee8SDavid du Colombierinto 484*219b2ee8SDavid du Colombier.CW x . 485*219b2ee8SDavid du ColombierConstant expressions are converted to constants in this pass. 486*219b2ee8SDavid du Colombier.NH 2 487*219b2ee8SDavid du ColombierArithmetic rewrites 488*219b2ee8SDavid du Colombier.LP 489*219b2ee8SDavid du ColombierThis is another machine-independent optimization. 490*219b2ee8SDavid du ColombierSubtrees of add, subtract, and multiply of integers are 491*219b2ee8SDavid du Colombierrewritten for easier compilation. 492*219b2ee8SDavid du ColombierThe major transformation is factoring: 493*219b2ee8SDavid du Colombier.CW 4+8*a+16*b+5 494*219b2ee8SDavid du Colombieris transformed into 495*219b2ee8SDavid du Colombier.CW 9+8*(a+2*b) . 496*219b2ee8SDavid du ColombierSuch expressions arise from address 497*219b2ee8SDavid du Colombiermanipulation and array indexing. 498*219b2ee8SDavid du Colombier.NH 2 499*219b2ee8SDavid du ColombierAddressability 500*219b2ee8SDavid du Colombier.LP 501*219b2ee8SDavid du ColombierThis is the first of the machine-dependent passes. 502*219b2ee8SDavid du ColombierThe addressability of a processor is defined as the set of 503*219b2ee8SDavid du Colombierexpressions that is legal in the address field 504*219b2ee8SDavid du Colombierof a machine language instruction. 505*219b2ee8SDavid du ColombierThe addressability of different processors varies widely. 506*219b2ee8SDavid du ColombierAt one end of the spectrum are the 68020 and VAX, 507*219b2ee8SDavid du Colombierwhich allow a complex mix of incrementing, 508*219b2ee8SDavid du Colombierdecrementing, 509*219b2ee8SDavid du Colombierindexing, and relative addressing. 510*219b2ee8SDavid du ColombierAt the other end is the MIPS, 511*219b2ee8SDavid du Colombierwhich allows only registers and constant offsets from the 512*219b2ee8SDavid du Colombiercontents of a register. 513*219b2ee8SDavid du ColombierThe addressability can be different for different instructions 514*219b2ee8SDavid du Colombierwithin the same processor. 515*219b2ee8SDavid du Colombier.LP 516*219b2ee8SDavid du ColombierIt is important to the code generator to know when a 517*219b2ee8SDavid du Colombiersubtree represents an address of a particular type. 518*219b2ee8SDavid du ColombierThis is done with a bottom-up walk of the tree. 519*219b2ee8SDavid du ColombierIn this pass, the leaves are labeled with small integers. 520*219b2ee8SDavid du ColombierWhen an internal node is encountered, 521*219b2ee8SDavid du Colombierit is labeled by consulting a table indexed by the 522*219b2ee8SDavid du Colombierlabels on the left and right subtrees. 523*219b2ee8SDavid du ColombierFor example, 524*219b2ee8SDavid du Colombieron the 68020 processor, 525*219b2ee8SDavid du Colombierit is possible to address an 526*219b2ee8SDavid du Colombieroffset from a named location. 527*219b2ee8SDavid du ColombierIn C, this is represented by the expression 528*219b2ee8SDavid du Colombier.CW *(&name+constant) . 529*219b2ee8SDavid du ColombierThis is marked addressable by the following table. 530*219b2ee8SDavid du ColombierIn the table, 531*219b2ee8SDavid du Colombiera node represented by the left column is marked 532*219b2ee8SDavid du Colombierwith a small integer from the right column. 533*219b2ee8SDavid du ColombierMarks of the form 534*219b2ee8SDavid du Colombier.CW A\s-2\di\u\s0 535*219b2ee8SDavid du Colombierare addressable while 536*219b2ee8SDavid du Colombiermarks of the form 537*219b2ee8SDavid du Colombier.CW N\s-2\di\u\s0 538*219b2ee8SDavid du Colombierare not addressable. 539*219b2ee8SDavid du Colombier.DS 540*219b2ee8SDavid du Colombier.B 541*219b2ee8SDavid du Colombier.ta .1i 1.1i 542*219b2ee8SDavid du Colombier Node Marked 543*219b2ee8SDavid du Colombier.CW 544*219b2ee8SDavid du Colombier name A\s-2\d1\u\s0 545*219b2ee8SDavid du Colombier const A\s-2\d2\u\s0 546*219b2ee8SDavid du Colombier &A\s-2\d1\u\s0 A\s-2\d3\u\s0 547*219b2ee8SDavid du Colombier A\s-2\d3\u\s0+A\s-2\d1\u\s0 N\s-2\d1\u\s0 \fR(note that this is not addressable)\fP 548*219b2ee8SDavid du Colombier *N\s-2\d1\u\s0 A\s-2\d4\u\s0 549*219b2ee8SDavid du Colombier.DE 550*219b2ee8SDavid du ColombierHere there is a distinction between 551*219b2ee8SDavid du Colombiera node marked 552*219b2ee8SDavid du Colombier.CW A\s-2\d1\u\s0 553*219b2ee8SDavid du Colombierand a node marked 554*219b2ee8SDavid du Colombier.CW A\s-2\d4\u\s0 555*219b2ee8SDavid du Colombierbecause the address operator of an 556*219b2ee8SDavid du Colombier.CW A\s-2\d4\u\s0 557*219b2ee8SDavid du Colombiernode is not addressable. 558*219b2ee8SDavid du ColombierSo to extend the table: 559*219b2ee8SDavid du Colombier.DS 560*219b2ee8SDavid du Colombier.B 561*219b2ee8SDavid du Colombier.ta .1i 1.1i 562*219b2ee8SDavid du Colombier Node Marked 563*219b2ee8SDavid du Colombier.CW 564*219b2ee8SDavid du Colombier &A\s-2\d4\u\s0 N\s-2\d2\u\s0 565*219b2ee8SDavid du Colombier N\s-2\d2\u\s0+N\s-2\d1\u\s0 N\s-2\d1\u\s0 566*219b2ee8SDavid du Colombier.DE 567*219b2ee8SDavid du ColombierThe full addressability of the 68020 is expressed 568*219b2ee8SDavid du Colombierin 18 rules like this, 569*219b2ee8SDavid du Colombierwhile the addressability of the MIPS is expressed 570*219b2ee8SDavid du Colombierin 11 rules. 571*219b2ee8SDavid du ColombierWhen one ports the compiler, 572*219b2ee8SDavid du Colombierthis table is usually initialized 573*219b2ee8SDavid du Colombierso that leaves are labeled as addressable and nothing else. 574*219b2ee8SDavid du ColombierThe code produced is poor, 575*219b2ee8SDavid du Colombierbut porting is easy. 576*219b2ee8SDavid du ColombierThe table can be extended later. 577*219b2ee8SDavid du Colombier.LP 578*219b2ee8SDavid du ColombierThis pass also rewrites some complex operators 579*219b2ee8SDavid du Colombierinto procedure calls. 580*219b2ee8SDavid du ColombierExamples include 64-bit multiply and divide. 581*219b2ee8SDavid du Colombier.LP 582*219b2ee8SDavid du ColombierIn the same bottom-up pass of the tree, 583*219b2ee8SDavid du Colombierthe nodes are labeled with a Sethi-Ullman complexity 584*219b2ee8SDavid du Colombier[Set70]. 585*219b2ee8SDavid du ColombierThis number is roughly the number of registers required 586*219b2ee8SDavid du Colombierto compile the tree on an ideal machine. 587*219b2ee8SDavid du ColombierAn addressable node is marked 0. 588*219b2ee8SDavid du ColombierA function call is marked infinite. 589*219b2ee8SDavid du ColombierA unary operator is marked as the 590*219b2ee8SDavid du Colombiermaximum of 1 and the mark of its subtree. 591*219b2ee8SDavid du ColombierA binary operator with equal marks on its subtrees is 592*219b2ee8SDavid du Colombiermarked with a subtree mark plus 1. 593*219b2ee8SDavid du ColombierA binary operator with unequal marks on its subtrees is 594*219b2ee8SDavid du Colombiermarked with the maximum mark of its subtrees. 595*219b2ee8SDavid du ColombierThe actual values of the marks are not too important, 596*219b2ee8SDavid du Colombierbut the relative values are. 597*219b2ee8SDavid du ColombierThe goal is to compile the harder 598*219b2ee8SDavid du Colombier(larger mark) 599*219b2ee8SDavid du Colombiersubtree first. 600*219b2ee8SDavid du Colombier.NH 2 601*219b2ee8SDavid du ColombierCode generation 602*219b2ee8SDavid du Colombier.LP 603*219b2ee8SDavid du ColombierCode is generated by recursive 604*219b2ee8SDavid du Colombierdescent. 605*219b2ee8SDavid du ColombierThe Sethi-Ullman complexity completely guides the 606*219b2ee8SDavid du Colombierorder. 607*219b2ee8SDavid du ColombierThe addressability defines the leaves. 608*219b2ee8SDavid du ColombierThe only difficult part is compiling a tree 609*219b2ee8SDavid du Colombierthat has two infinite (function call) 610*219b2ee8SDavid du Colombiersubtrees. 611*219b2ee8SDavid du ColombierIn this case, 612*219b2ee8SDavid du Colombierone subtree is compiled into the return register 613*219b2ee8SDavid du Colombier(usually the most convenient place for a function call) 614*219b2ee8SDavid du Colombierand then stored on the stack. 615*219b2ee8SDavid du ColombierThe other subtree is compiled into the return register 616*219b2ee8SDavid du Colombierand then the operation is compiled with 617*219b2ee8SDavid du Colombieroperands from the stack and the return register. 618*219b2ee8SDavid du Colombier.LP 619*219b2ee8SDavid du ColombierThere is a separate boolean code generator that compiles 620*219b2ee8SDavid du Colombierconditional expressions. 621*219b2ee8SDavid du ColombierThis is fundamentally different from compiling an arithmetic expression. 622*219b2ee8SDavid du ColombierThe result of the boolean code generator is the 623*219b2ee8SDavid du Colombierposition of the program counter and not an expression. 624*219b2ee8SDavid du ColombierThe boolean code generator makes extensive use of De Morgan's rule. 625*219b2ee8SDavid du ColombierThe boolean code generator is an expanded version of that described 626*219b2ee8SDavid du Colombierin chapter 8 of Aho, Sethi, and Ullman 627*219b2ee8SDavid du Colombier[Aho87]. 628*219b2ee8SDavid du Colombier.LP 629*219b2ee8SDavid du ColombierThere is a considerable amount of talk in the literature 630*219b2ee8SDavid du Colombierabout automating this part of a compiler with a machine 631*219b2ee8SDavid du Colombierdescription. 632*219b2ee8SDavid du ColombierSince this code generator is so small 633*219b2ee8SDavid du Colombier(less than 500 lines of C) 634*219b2ee8SDavid du Colombierand easy, 635*219b2ee8SDavid du Colombierit hardly seems worth the effort. 636*219b2ee8SDavid du Colombier.NH 2 637*219b2ee8SDavid du ColombierRegisterization 638*219b2ee8SDavid du Colombier.LP 639*219b2ee8SDavid du ColombierUp to now, 640*219b2ee8SDavid du Colombierthe compiler has operated on syntax trees 641*219b2ee8SDavid du Colombierthat are roughly equivalent to the original source language. 642*219b2ee8SDavid du ColombierThe previous pass has produced machine language in an internal 643*219b2ee8SDavid du Colombierformat. 644*219b2ee8SDavid du ColombierThe next two passes operate on the internal machine language 645*219b2ee8SDavid du Colombierstructures. 646*219b2ee8SDavid du ColombierThe purpose of the next pass is to reintroduce 647*219b2ee8SDavid du Colombierregisters for heavily used variables. 648*219b2ee8SDavid du Colombier.LP 649*219b2ee8SDavid du ColombierAll of the variables that can be 650*219b2ee8SDavid du Colombierpotentially registerized within a procedure are 651*219b2ee8SDavid du Colombierplaced in a table. 652*219b2ee8SDavid du Colombier(Suitable variables are any automatic or external 653*219b2ee8SDavid du Colombierscalars that do not have their addresses extracted. 654*219b2ee8SDavid du ColombierSome constants that are hard to reference are also 655*219b2ee8SDavid du Colombierconsidered for registerization.) 656*219b2ee8SDavid du ColombierFour separate data flow equations are evaluated 657*219b2ee8SDavid du Colombierover the procedure on all of these variables. 658*219b2ee8SDavid du ColombierTwo of the equations are the normal set-behind 659*219b2ee8SDavid du Colombierand used-ahead 660*219b2ee8SDavid du Colombierbits that define the life of a variable. 661*219b2ee8SDavid du ColombierThe two new bits tell if a variable life 662*219b2ee8SDavid du Colombiercrosses a function call ahead or behind. 663*219b2ee8SDavid du ColombierBy examining a variable over its lifetime, 664*219b2ee8SDavid du Colombierit is possible to get a cost 665*219b2ee8SDavid du Colombierfor registerizing. 666*219b2ee8SDavid du ColombierLoops are detected and the costs are multiplied 667*219b2ee8SDavid du Colombierby three for every level of loop nesting. 668*219b2ee8SDavid du ColombierCosts are sorted and the variables 669*219b2ee8SDavid du Colombierare replaced by available registers on a greedy basis. 670*219b2ee8SDavid du Colombier.LP 671*219b2ee8SDavid du ColombierThe 68020 has two different 672*219b2ee8SDavid du Colombiertypes of registers. 673*219b2ee8SDavid du ColombierFor the 68020, 674*219b2ee8SDavid du Colombiertwo different costs are calculated for 675*219b2ee8SDavid du Colombiereach variable life and the register type that 676*219b2ee8SDavid du Colombieraffords the better cost is used. 677*219b2ee8SDavid du ColombierTies are broken by counting the number of available 678*219b2ee8SDavid du Colombierregisters of each type. 679*219b2ee8SDavid du Colombier.LP 680*219b2ee8SDavid du ColombierNote that externals are registerized together with automatics. 681*219b2ee8SDavid du ColombierThis is done by evaluating the semantics of a ``call'' instruction 682*219b2ee8SDavid du Colombierdifferently for externals and automatics. 683*219b2ee8SDavid du ColombierSince a call goes outside the local procedure, 684*219b2ee8SDavid du Colombierit is assumed that a call references all externals. 685*219b2ee8SDavid du ColombierSimilarly, 686*219b2ee8SDavid du Colombierexternals are assumed to be set before an ``entry'' instruction 687*219b2ee8SDavid du Colombierand assumed to be referenced after a ``return'' instruction. 688*219b2ee8SDavid du ColombierThis makes sure that externals are in memory across calls. 689*219b2ee8SDavid du Colombier.LP 690*219b2ee8SDavid du ColombierThe overall results are satisfactory. 691*219b2ee8SDavid du ColombierIt would be nice to be able to do this processing in 692*219b2ee8SDavid du Colombiera machine-independent way, 693*219b2ee8SDavid du Colombierbut it is impossible to get all of the costs and 694*219b2ee8SDavid du Colombierside effects of different choices by examining the parse tree. 695*219b2ee8SDavid du Colombier.LP 696*219b2ee8SDavid du ColombierMost of the code in the registerization pass is machine-independent. 697*219b2ee8SDavid du ColombierThe major machine-dependency is in 698*219b2ee8SDavid du Colombierexamining a machine instruction to ask if it sets or references 699*219b2ee8SDavid du Colombiera variable. 700*219b2ee8SDavid du Colombier.NH 2 701*219b2ee8SDavid du ColombierMachine code optimization 702*219b2ee8SDavid du Colombier.LP 703*219b2ee8SDavid du ColombierThe next pass walks the machine code 704*219b2ee8SDavid du Colombierfor opportunistic optimizations. 705*219b2ee8SDavid du ColombierFor the most part, 706*219b2ee8SDavid du Colombierthis is highly specific to a particular 707*219b2ee8SDavid du Colombierprocessor. 708*219b2ee8SDavid du ColombierOne optimization that is performed 709*219b2ee8SDavid du Colombieron all of the processors is the 710*219b2ee8SDavid du Colombierremoval of unnecessary ``move'' 711*219b2ee8SDavid du Colombierinstructions. 712*219b2ee8SDavid du ColombierIronically, 713*219b2ee8SDavid du Colombiermost of these instructions were inserted by 714*219b2ee8SDavid du Colombierthe previous pass. 715*219b2ee8SDavid du ColombierThere are two patterns that are repetitively 716*219b2ee8SDavid du Colombiermatched and replaced until no more matches are 717*219b2ee8SDavid du Colombierfound. 718*219b2ee8SDavid du ColombierThe first tries to remove ``move'' instructions 719*219b2ee8SDavid du Colombierby relabeling variables. 720*219b2ee8SDavid du Colombier.LP 721*219b2ee8SDavid du ColombierWhen a ``move'' instruction is encountered, 722*219b2ee8SDavid du Colombierif the destination variable is set before the 723*219b2ee8SDavid du Colombiersource variable is referenced, 724*219b2ee8SDavid du Colombierthen all of the references to the destination 725*219b2ee8SDavid du Colombiervariable can be renamed to the source and the ``move'' 726*219b2ee8SDavid du Colombiercan be deleted. 727*219b2ee8SDavid du ColombierThis transformation uses the reverse data flow 728*219b2ee8SDavid du Colombierset up in the previous pass. 729*219b2ee8SDavid du Colombier.LP 730*219b2ee8SDavid du ColombierAn example of this pattern is depicted in the following 731*219b2ee8SDavid du Colombiertable. 732*219b2ee8SDavid du ColombierThe pattern is in the left column and the 733*219b2ee8SDavid du Colombierreplacement action is in the right column. 734*219b2ee8SDavid du Colombier.DS 735*219b2ee8SDavid du Colombier.CW 736*219b2ee8SDavid du Colombier.ta .1i .6i 1.6i 2.1i 2.6i 737*219b2ee8SDavid du Colombier MOVE a->b \fR(remove)\fP 738*219b2ee8SDavid du Colombier.R 739*219b2ee8SDavid du Colombier (sequence with no mention of \f(CWa\fP) 740*219b2ee8SDavid du Colombier.CW 741*219b2ee8SDavid du Colombier USE b USE a 742*219b2ee8SDavid du Colombier.R 743*219b2ee8SDavid du Colombier (sequence with no mention of \f(CWa\fP) 744*219b2ee8SDavid du Colombier.CW 745*219b2ee8SDavid du Colombier SET b SET b 746*219b2ee8SDavid du Colombier.DE 747*219b2ee8SDavid du Colombier.LP 748*219b2ee8SDavid du ColombierExperiments have shown that it is marginally 749*219b2ee8SDavid du Colombierworthwhile to rename uses of the destination variable 750*219b2ee8SDavid du Colombierwith uses of the source variable up to 751*219b2ee8SDavid du Colombierthe first use of the source variable. 752*219b2ee8SDavid du Colombier.LP 753*219b2ee8SDavid du ColombierThe second transform will do relabeling 754*219b2ee8SDavid du Colombierwithout deleting instructions. 755*219b2ee8SDavid du ColombierWhen a ``move'' instruction is encountered, 756*219b2ee8SDavid du Colombierif the source variable has been set prior 757*219b2ee8SDavid du Colombierto the use of the destination variable 758*219b2ee8SDavid du Colombierthen all of the references to the source 759*219b2ee8SDavid du Colombiervariable are replaced by the destination and 760*219b2ee8SDavid du Colombierthe ``move'' is inverted. 761*219b2ee8SDavid du ColombierTypically, 762*219b2ee8SDavid du Colombierthis transformation will alter two ``move'' 763*219b2ee8SDavid du Colombierinstructions and allow the first transformation 764*219b2ee8SDavid du Colombieranother chance to remove code. 765*219b2ee8SDavid du ColombierThis transformation uses the forward data flow 766*219b2ee8SDavid du Colombierset up in the previous pass. 767*219b2ee8SDavid du Colombier.LP 768*219b2ee8SDavid du ColombierAgain, 769*219b2ee8SDavid du Colombierthe following is a depiction of the transformation where 770*219b2ee8SDavid du Colombierthe pattern is in the left column and the 771*219b2ee8SDavid du Colombierrewrite is in the right column. 772*219b2ee8SDavid du Colombier.DS 773*219b2ee8SDavid du Colombier.CW 774*219b2ee8SDavid du Colombier.ta .1i .6i 1.6i 2.1i 2.6i 775*219b2ee8SDavid du Colombier SET a SET b 776*219b2ee8SDavid du Colombier.R 777*219b2ee8SDavid du Colombier (sequence with no use of \f(CWb\fP) 778*219b2ee8SDavid du Colombier.CW 779*219b2ee8SDavid du Colombier USE a USE b 780*219b2ee8SDavid du Colombier.R 781*219b2ee8SDavid du Colombier (sequence with no use of \f(CWb\fP) 782*219b2ee8SDavid du Colombier.CW 783*219b2ee8SDavid du Colombier MOVE a->b MOVE b->a 784*219b2ee8SDavid du Colombier.DE 785*219b2ee8SDavid du ColombierIterating these transformations 786*219b2ee8SDavid du Colombierwill usually get rid of all redundant ``move'' instructions. 787*219b2ee8SDavid du Colombier.LP 788*219b2ee8SDavid du ColombierA problem with this organization is that the costs 789*219b2ee8SDavid du Colombierof registerization calculated in the previous pass 790*219b2ee8SDavid du Colombiermust depend on how well this pass can detect and remove 791*219b2ee8SDavid du Colombierredundant instructions. 792*219b2ee8SDavid du ColombierOften, 793*219b2ee8SDavid du Colombiera fine candidate for registerization is rejected 794*219b2ee8SDavid du Colombierbecause of the cost of instructions that are later 795*219b2ee8SDavid du Colombierremoved. 796*219b2ee8SDavid du Colombier.NH 2 797*219b2ee8SDavid du ColombierWriting the object file 798*219b2ee8SDavid du Colombier.LP 799*219b2ee8SDavid du ColombierThe last pass walks the internal assembly language 800*219b2ee8SDavid du Colombierand writes the object file. 801*219b2ee8SDavid du ColombierThe object file is reduced in size by about a factor 802*219b2ee8SDavid du Colombierof three with simple compression 803*219b2ee8SDavid du Colombiertechniques. 804*219b2ee8SDavid du ColombierThe most important aspect of the object file 805*219b2ee8SDavid du Colombierformat is that it is independent of the compiling machine. 806*219b2ee8SDavid du ColombierAll integer and floating numbers in the object 807*219b2ee8SDavid du Colombiercode are converted to known formats and byte 808*219b2ee8SDavid du Colombierorders. 809*219b2ee8SDavid du Colombier.NH 810*219b2ee8SDavid du ColombierThe loader 811*219b2ee8SDavid du Colombier.LP 812*219b2ee8SDavid du ColombierThe loader is a multiple pass program that 813*219b2ee8SDavid du Colombierreads object files and libraries and produces 814*219b2ee8SDavid du Colombieran executable binary. 815*219b2ee8SDavid du ColombierThe loader also does some minimal 816*219b2ee8SDavid du Colombieroptimizations and code rewriting. 817*219b2ee8SDavid du ColombierMany of the operations performed by the 818*219b2ee8SDavid du Colombierloader are machine-dependent. 819*219b2ee8SDavid du Colombier.LP 820*219b2ee8SDavid du ColombierThe first pass of the loader reads the 821*219b2ee8SDavid du Colombierobject modules into an internal data 822*219b2ee8SDavid du Colombierstructure that looks like binary assembly language. 823*219b2ee8SDavid du ColombierAs the instructions are read, 824*219b2ee8SDavid du Colombiercode is reordered to remove 825*219b2ee8SDavid du Colombierunconditional branch instructions. 826*219b2ee8SDavid du ColombierConditional branch instructions are inverted 827*219b2ee8SDavid du Colombierto prevent the insertion of unconditional branches. 828*219b2ee8SDavid du ColombierThe loader will also make a copy of a few instructions 829*219b2ee8SDavid du Colombierto remove an unconditional branch. 830*219b2ee8SDavid du Colombier.LP 831*219b2ee8SDavid du ColombierThe next pass allocates addresses for 832*219b2ee8SDavid du Colombierall external data. 833*219b2ee8SDavid du ColombierTypical of processors is the MIPS, 834*219b2ee8SDavid du Colombierwhich can reference ±32K bytes from a 835*219b2ee8SDavid du Colombierregister. 836*219b2ee8SDavid du ColombierThe loader allocates the register 837*219b2ee8SDavid du Colombier.CW R30 838*219b2ee8SDavid du Colombieras the static pointer. 839*219b2ee8SDavid du ColombierThe value placed in 840*219b2ee8SDavid du Colombier.CW R30 841*219b2ee8SDavid du Colombieris the base of the data segment plus 32K. 842*219b2ee8SDavid du ColombierIt is then cheap to reference all data in the 843*219b2ee8SDavid du Colombierfirst 64K of the data segment. 844*219b2ee8SDavid du ColombierExternal variables are allocated to 845*219b2ee8SDavid du Colombierthe data segment 846*219b2ee8SDavid du Colombierwith the smallest variables allocated first. 847*219b2ee8SDavid du ColombierIf all of the data cannot fit into the first 848*219b2ee8SDavid du Colombier64K of the data segment, 849*219b2ee8SDavid du Colombierthen usually only a few large arrays 850*219b2ee8SDavid du Colombierneed more expensive addressing modes. 851*219b2ee8SDavid du Colombier.LP 852*219b2ee8SDavid du ColombierFor the MIPS processor, 853*219b2ee8SDavid du Colombierthe loader makes a pass over the internal 854*219b2ee8SDavid du Colombierstructures, 855*219b2ee8SDavid du Colombierexchanging instructions to try 856*219b2ee8SDavid du Colombierto fill ``delay slots'' with useful work. 857*219b2ee8SDavid du ColombierIf a useful instruction cannot be found 858*219b2ee8SDavid du Colombierto fill a delay slot, 859*219b2ee8SDavid du Colombierthe loader will insert 860*219b2ee8SDavid du Colombier``noop'' 861*219b2ee8SDavid du Colombierinstructions. 862*219b2ee8SDavid du ColombierThis pass is very expensive and does not 863*219b2ee8SDavid du Colombierdo a good job. 864*219b2ee8SDavid du ColombierAbout 40% of all instructions are in 865*219b2ee8SDavid du Colombierdelay slots. 866*219b2ee8SDavid du ColombierAbout 65% of these are useful instructions and 867*219b2ee8SDavid du Colombier35% are ``noops.'' 868*219b2ee8SDavid du ColombierThe vendor-supplied assembler does this job 869*219b2ee8SDavid du Colombiermore effectively, 870*219b2ee8SDavid du Colombierfilling about 80% 871*219b2ee8SDavid du Colombierof the delay slots with useful instructions. 872*219b2ee8SDavid du Colombier.LP 873*219b2ee8SDavid du ColombierOn the 68020 processor, 874*219b2ee8SDavid du Colombierbranch instructions come in a variety of 875*219b2ee8SDavid du Colombiersizes depending on the relative distance 876*219b2ee8SDavid du Colombierof the branch. 877*219b2ee8SDavid du ColombierThus the size of branch instructions 878*219b2ee8SDavid du Colombiercan be mutually dependent. 879*219b2ee8SDavid du ColombierThe loader uses a multiple pass algorithm 880*219b2ee8SDavid du Colombierto resolve the branch lengths 881*219b2ee8SDavid du Colombier[Szy78]. 882*219b2ee8SDavid du ColombierInitially, all branches are assumed minimal length. 883*219b2ee8SDavid du ColombierOn each subsequent pass, 884*219b2ee8SDavid du Colombierthe branches are reassessed 885*219b2ee8SDavid du Colombierand expanded if necessary. 886*219b2ee8SDavid du ColombierWhen no more expansions occur, 887*219b2ee8SDavid du Colombierthe locations of the instructions in 888*219b2ee8SDavid du Colombierthe text segment are known. 889*219b2ee8SDavid du Colombier.LP 890*219b2ee8SDavid du ColombierOn the MIPS processor, 891*219b2ee8SDavid du Colombierall instructions are one size. 892*219b2ee8SDavid du ColombierA single pass over the instructions will 893*219b2ee8SDavid du Colombierdetermine the locations of all addresses 894*219b2ee8SDavid du Colombierin the text segment. 895*219b2ee8SDavid du Colombier.LP 896*219b2ee8SDavid du ColombierThe last pass of the loader produces the 897*219b2ee8SDavid du Colombierexecutable binary. 898*219b2ee8SDavid du ColombierA symbol table and other tables are 899*219b2ee8SDavid du Colombierproduced to help the debugger to 900*219b2ee8SDavid du Colombierinterpret the binary symbolically. 901*219b2ee8SDavid du Colombier.LP 902*219b2ee8SDavid du ColombierThe loader places absolute source line numbers in the symbol table. 903*219b2ee8SDavid du ColombierThe name and absolute line number of all 904*219b2ee8SDavid du Colombier.CW #include 905*219b2ee8SDavid du Colombierfiles is also placed in the 906*219b2ee8SDavid du Colombiersymbol table so that the debuggers can 907*219b2ee8SDavid du Colombierassociate object code to source files. 908*219b2ee8SDavid du Colombier.NH 909*219b2ee8SDavid du ColombierPerformance 910*219b2ee8SDavid du Colombier.LP 911*219b2ee8SDavid du ColombierThe following is a table of the source size of the MIPS 912*219b2ee8SDavid du Colombiercompiler. 913*219b2ee8SDavid du Colombier.DS 914*219b2ee8SDavid du Colombier.ta .1i .6i 915*219b2ee8SDavid du Colombier lines module 916*219b2ee8SDavid du Colombier \0509 machine-independent headers 917*219b2ee8SDavid du Colombier 1070 machine-independent YACC source 918*219b2ee8SDavid du Colombier 6090 machine-independent C source 919*219b2ee8SDavid du Colombier 920*219b2ee8SDavid du Colombier \0545 machine-dependent headers 921*219b2ee8SDavid du Colombier 6532 machine-dependent C source 922*219b2ee8SDavid du Colombier 923*219b2ee8SDavid du Colombier \0298 loader headers 924*219b2ee8SDavid du Colombier 5215 loader C source 925*219b2ee8SDavid du Colombier.DE 926*219b2ee8SDavid du Colombier.LP 927*219b2ee8SDavid du ColombierThe following table shows timing 928*219b2ee8SDavid du Colombierof a test program 929*219b2ee8SDavid du Colombierthat plays checkers, running on a MIPS R4000. 930*219b2ee8SDavid du ColombierThe test program is 26 files totaling 12600 lines of C. 931*219b2ee8SDavid du ColombierThe execution time does not significantly 932*219b2ee8SDavid du Colombierdepend on library implementation. 933*219b2ee8SDavid du ColombierSince no other compiler runs on Plan 9, 934*219b2ee8SDavid du Colombierthe Plan 9 tests were done with the Plan 9 operating system; 935*219b2ee8SDavid du Colombierthe other tests were done on the vendor's operating system. 936*219b2ee8SDavid du ColombierThe hardware was identical in both cases. 937*219b2ee8SDavid du ColombierThe optimizer in the vendor's compiler 938*219b2ee8SDavid du Colombieris reputed to be extremely good. 939*219b2ee8SDavid du Colombier.DS 940*219b2ee8SDavid du Colombier.ta .1i .9i 941*219b2ee8SDavid du Colombier \0\04.49s Plan 9 \f(CWvc\fP \f(CW-N\fP compile time (opposite of \f(CW-O\fP) 942*219b2ee8SDavid du Colombier \0\01.72s Plan 9 \f(CWvc\fP \f(CW-N\fP load time 943*219b2ee8SDavid du Colombier 148.69s Plan 9 \f(CWvc\fP \f(CW-N\fP run time 944*219b2ee8SDavid du Colombier 945*219b2ee8SDavid du Colombier \015.07s Plan 9 \f(CWvc\fP compile time (\f(CW-O\fP implicit) 946*219b2ee8SDavid du Colombier \0\01.66s Plan 9 \f(CWvc\fP load time 947*219b2ee8SDavid du Colombier \089.96s Plan 9 \f(CWvc\fP run time 948*219b2ee8SDavid du Colombier 949*219b2ee8SDavid du Colombier \014.83s vendor \f(CWcc\fP compile time 950*219b2ee8SDavid du Colombier \0\00.38s vendor \f(CWcc\fP load time 951*219b2ee8SDavid du Colombier 104.75s vendor \f(CWcc\fP run time 952*219b2ee8SDavid du Colombier 953*219b2ee8SDavid du Colombier \043.59s vendor \f(CWcc\fP \f(CW-O\fP compile time 954*219b2ee8SDavid du Colombier \0\00.38s vendor \f(CWcc\fP \f(CW-O\fP load time 955*219b2ee8SDavid du Colombier \076.19s vendor \f(CWcc\fP \f(CW-O\fP run time 956*219b2ee8SDavid du Colombier 957*219b2ee8SDavid du Colombier \0\08.19s vendor \f(CWcc\fP \f(CW-O3\fP compile time 958*219b2ee8SDavid du Colombier \035.97s vendor \f(CWcc\fP \f(CW-O3\fP load time 959*219b2ee8SDavid du Colombier \071.16s vendor \f(CWcc\fP \f(CW-O3\fP run time 960*219b2ee8SDavid du Colombier.DE 961*219b2ee8SDavid du Colombier.LP 962*219b2ee8SDavid du ColombierTo compare the Intel compiler, 963*219b2ee8SDavid du Colombiera program that is about 40% bit manipulation and 964*219b2ee8SDavid du Colombierabout 60% single precision floating point was 965*219b2ee8SDavid du Colombierrun on the same 33 MHz 486, once under Windows 966*219b2ee8SDavid du Colombiercompiled with the Watcom compiler, version 10.0, 967*219b2ee8SDavid du Colombierin 16-bit mode and once under 968*219b2ee8SDavid du ColombierPlan 9 in 32-bit mode. 969*219b2ee8SDavid du ColombierThe Plan 9 execution time was 27 sec while the Windows 970*219b2ee8SDavid du Colombierexecution time was 31 sec. 971*219b2ee8SDavid du Colombier.NH 972*219b2ee8SDavid du ColombierConclusions 973*219b2ee8SDavid du Colombier.LP 974*219b2ee8SDavid du ColombierThe new compilers compile 975*219b2ee8SDavid du Colombierquickly, 976*219b2ee8SDavid du Colombierload slowly, 977*219b2ee8SDavid du Colombierand produce 978*219b2ee8SDavid du Colombiermedium quality 979*219b2ee8SDavid du Colombierobject code. 980*219b2ee8SDavid du ColombierThe compilers are relatively 981*219b2ee8SDavid du Colombierportable, 982*219b2ee8SDavid du Colombierrequiring but a couple of weeks' work to 983*219b2ee8SDavid du Colombierproduce a compiler for a different computer. 984*219b2ee8SDavid du ColombierFor Plan 9, 985*219b2ee8SDavid du Colombierwhere we needed several compilers 986*219b2ee8SDavid du Colombierwith specialized features and 987*219b2ee8SDavid du Colombierour own object formats, 988*219b2ee8SDavid du Colombierthis project was indispensable. 989*219b2ee8SDavid du ColombierIt is also necessary for us to 990*219b2ee8SDavid du Colombierbe able to freely distribute our compilers 991*219b2ee8SDavid du Colombierwith the Plan 9 distribution. 992*219b2ee8SDavid du Colombier.LP 993*219b2ee8SDavid du ColombierTwo problems have come up in retrospect. 994*219b2ee8SDavid du ColombierThe first has to do with the 995*219b2ee8SDavid du Colombierdivision of labor between compiler and loader. 996*219b2ee8SDavid du ColombierPlan 9 runs on multi-processors and as such 997*219b2ee8SDavid du Colombiercompilations are often done in parallel. 998*219b2ee8SDavid du ColombierUnfortunately, 999*219b2ee8SDavid du Colombierall compilations must be complete before loading 1000*219b2ee8SDavid du Colombiercan begin. 1001*219b2ee8SDavid du ColombierThe load is then single-threaded. 1002*219b2ee8SDavid du ColombierWith this model, 1003*219b2ee8SDavid du Colombierany shift of work from compile to load 1004*219b2ee8SDavid du Colombierresults in a significant increase in real time. 1005*219b2ee8SDavid du ColombierThe same is true of libraries that are compiled 1006*219b2ee8SDavid du Colombierinfrequently and loaded often. 1007*219b2ee8SDavid du ColombierIn the future, 1008*219b2ee8SDavid du Colombierwe may try to put some of the loader work 1009*219b2ee8SDavid du Colombierback into the compiler. 1010*219b2ee8SDavid du Colombier.LP 1011*219b2ee8SDavid du ColombierThe second problem comes from 1012*219b2ee8SDavid du Colombierthe various optimizations performed over several 1013*219b2ee8SDavid du Colombierpasses. 1014*219b2ee8SDavid du ColombierOften optimizations in different passes depend 1015*219b2ee8SDavid du Colombieron each other. 1016*219b2ee8SDavid du ColombierIterating the passes could compromise efficiency, 1017*219b2ee8SDavid du Colombieror even loop. 1018*219b2ee8SDavid du ColombierWe see no real solution to this problem. 1019*219b2ee8SDavid du Colombier.NH 1020*219b2ee8SDavid du ColombierReferences 1021*219b2ee8SDavid du Colombier.LP 1022*219b2ee8SDavid du Colombier[Aho87] A. V. Aho, R. Sethi, and J. D. Ullman, 1023*219b2ee8SDavid du Colombier.I 1024*219b2ee8SDavid du ColombierCompilers \- Principles, Techniques, and Tools, 1025*219b2ee8SDavid du Colombier.R 1026*219b2ee8SDavid du ColombierAddison Wesley, 1027*219b2ee8SDavid du ColombierReading, MA, 1028*219b2ee8SDavid du Colombier1987. 1029*219b2ee8SDavid du Colombier.LP 1030*219b2ee8SDavid du Colombier[ANSI90] \f2American National Standard for Information Systems \- 1031*219b2ee8SDavid du ColombierProgramming Language C\f1, American National Standards Institute, Inc., 1032*219b2ee8SDavid du ColombierNew York, 1990. 1033*219b2ee8SDavid du Colombier.LP 1034*219b2ee8SDavid du Colombier[Dav91] J. W. Davidson and D. B. Whalley, 1035*219b2ee8SDavid du Colombier``Methods for Saving and Restoring Register Values across Function Calls'', 1036*219b2ee8SDavid du Colombier.I 1037*219b2ee8SDavid du ColombierSoftware\-Practice and Experience, 1038*219b2ee8SDavid du Colombier.R 1039*219b2ee8SDavid du ColombierVol 21(2), pp. 149-165, February 1991. 1040*219b2ee8SDavid du Colombier.LP 1041*219b2ee8SDavid du Colombier[Joh79] S. C. Johnson, 1042*219b2ee8SDavid du Colombier``YACC \- Yet Another Compiler Compiler'', 1043*219b2ee8SDavid du Colombier.I 1044*219b2ee8SDavid du ColombierUNIX Programmer's Manual, Seventh Ed., Vol. 2A, 1045*219b2ee8SDavid du Colombier.R 1046*219b2ee8SDavid du ColombierAT&T Bell Laboratories, 1047*219b2ee8SDavid du ColombierMurray Hill, NJ, 1048*219b2ee8SDavid du Colombier1979. 1049*219b2ee8SDavid du Colombier.LP 1050*219b2ee8SDavid du Colombier[Set70] R. Sethi and J. D. Ullman, 1051*219b2ee8SDavid du Colombier``The Generation of Optimal Code for Arithmetic Expressions'', 1052*219b2ee8SDavid du Colombier.I 1053*219b2ee8SDavid du ColombierJournal of the ACM, 1054*219b2ee8SDavid du Colombier.R 1055*219b2ee8SDavid du ColombierVol 17(4), pp. 715-728, 1970. 1056*219b2ee8SDavid du Colombier.LP 1057*219b2ee8SDavid du Colombier[Szy78] T. G. Szymanski, 1058*219b2ee8SDavid du Colombier``Assembling Code for Machines with Span-dependent Instructions'', 1059*219b2ee8SDavid du Colombier.I 1060*219b2ee8SDavid du ColombierCommunications of the ACM, 1061*219b2ee8SDavid du Colombier.R 1062*219b2ee8SDavid du ColombierVol 21(4), pp. 300-308, 1978. 1063