.TL
Plan 9 C Compilers
.AU
Ken Thompson
ken@plan9.att.com
.AB
.FS
Originally appeared, in a different form, in
.I
Proceedings of the Summer 1990 UKUUG Conference,
.R
pp. 41-51,
London, 1990.
.FE
This paper describes the overall structure and function of the Plan 9 C compilers.
A more detailed implementation document
for any one of the compilers
is yet to be written.
.AE
.NH
Introduction
.LP
There are currently six compilers in the series.
Four of the compilers (MIPS 3000, SPARC, Intel 386, and Motorola 68020)
are considered active and are used to compile
current versions of Plan 9.
The other two (AT&T 3210, Intel 960) have been used
only to program peripherals.
.NH
Structure
.LP
The compiler is a single program that produces an
object file.
Combined in the compiler are the traditional
roles of preprocessor, lexical analyzer, parser, code generator,
local optimizer,
and first half of the assembler.
The object files are binary forms of assembly
language,
similar to what might be passed between
the first and second passes of an assembler.
.LP
Object files and libraries
are combined by a loader
program to produce the executable binary.
The loader combines the roles of second half
of the assembler, global optimizer, and loader.
The names of the compilers, loaders, and assemblers
are as follows:
.DS
.ta 1.5i
.de Ta
\\$1	\f(CW\\$2\fP	\f(CW\\$3\fP	\f(CW\\$4\fP
..
.Ta MIPS vc vl va
.Ta SPARC kc kl ka
.Ta Intel\ 386 8c 8l 8a
.Ta Motorola\ 68020 2c 2l 2a
.Ta AT&T\ 3210 xc xl xa
.Ta Intel\ 960 9c 9l 9a
.DE
There is a further breakdown
in the source of the compilers into
object-independent and
object-dependent
parts.
All of the object-independent parts
are combined into source files in the
directory
.CW /sys/src/cmd/cc .
The object-dependent parts are collected
in a separate directory for each compiler,
for example
.CW /sys/src/cmd/vc .
All of the code,
both object-independent and
object-dependent,
is machine-independent
and may be cross-compiled and executed on any
of the architectures.
.NH
The Language
.LP
The compiler implements ANSI C with some
restrictions and extensions
[ANSI90].
Most of the restrictions are due to
personal preference, while
most of the extensions were to help in
the implementation of Plan 9.
There are other departures from the standard,
particularly in the libraries,
that are beyond the scope of this
paper.
.NH 2
Register, volatile, const
.LP
The keywords
.CW register ,
.CW volatile ,
and
.CW const
are recognized syntactically
but are semantically ignored.
Thus
assigning to a
.CW const
variable is not diagnosed, nor is
taking the address of a
.CW register .
.NH 2
The preprocessor
.LP
The C preprocessor is probably the
biggest departure from the ANSI standard.
.LP
The preprocessor built into the Plan 9 compilers does not support
.CW #if ,
although it does handle
.CW #ifdef
and
.CW #include .
If it is necessary to be more standard,
the source text can first be run through the separate ANSI C
preprocessor,
.CW cpp .
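.LP
Many simple uses of
.CW #if
can be recast for the built-in preprocessor.
A small illustration
(the macro name is hypothetical):
.DS
.CW
.ta .1i .6i 1.1i 1.6i
	/* instead of:  #if defined(SPARC) */
	#ifdef SPARC
		/* SPARC-specific declarations */
	#endif
.DE
Tests on the value of a macro,
such as comparing it to a constant,
have no such direct rewriting and are the main reason
to fall back on
.CW cpp .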
.NH 2
Unnamed substructures
.LP
The most important and most heavily used of the
extensions is the declaration of an
unnamed substructure or subunion.
For example:
.DS
.CW
.ta .1i .6i 1.1i 1.6i
	typedef
	struct	lock
	{
		int	locked;
	} Lock;

	typedef
	struct	node
	{
		int	type;
		union
		{
			double	dval;
			float	fval;
			long	lval;
		};
		Lock;
	} Node;

	Lock*	lock;
	Node*	node;
.DE
The declaration of
.CW Node
has an unnamed substructure of type
.CW Lock
and an unnamed subunion.
One use of this feature is that the elements of an
unnamed substructure or subunion can be referenced
as if they were members of the outer structure.
Thus
.CW node->dval
and
.CW node->locked
are legitimate references.
.LP
When an outer structure is used
in a context that is only legal for
an unnamed substructure,
the compiler promotes the reference to the
unnamed substructure.
This is true for references to structures and
for references to pointers to structures.
This happens in assignment statements and
in argument passing where prototypes have been
declared.
Thus, continuing with the example,
.DS
.CW
.ta .1i .6i 1.1i 1.6i
	lock = node;
.DE
would assign a pointer to the unnamed
.CW Lock
in
the
.CW Node
to the variable
.CW lock .
Another example,
.DS
.CW
.ta .1i .6i 1.1i 1.6i
	extern void lock(Lock*);
	func(...)
	{
		...
		lock(node);
		...
	}
.DE
will pass a pointer to the
.CW Lock
substructure.
.NH 2
Structure displays
.LP
A structure cast followed by a list of expressions in braces is
an expression with the type of the structure and elements assigned from
the corresponding list.
Structures are now almost first-class citizens of the language.
It is common to see code like this:
.DS
.CW
.ta .1i
	r = (Rectangle){point1, (Point){x,y+2}};
.DE
.NH 2
Initialization indexes
.LP
In initializers of arrays,
one may place a constant expression
in square brackets before an initializer.
This causes the next initializer to assign
the indicated element.
For example:
.DS
.CW
.ta .1i .6i 1.6i
	enum	errors
	{
		Etoobig,
		Ealarm,
		Egreg
	};
	char*	errstrings[] =
	{
		[Ealarm]	"Alarm call",
		[Egreg]	"Panic: out of mbufs",
		[Etoobig]	"Arg list too long",
	};
.DE
In the same way,
individual structure members may
be initialized in any order by preceding the initialization with
.CW .tagname .
Both forms allow an optional
.CW = ,
to be compatible with a proposed
extension to ANSI C.
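.LP
For instance, a structure initializer selecting members by name
might look like this
(the type and member names are illustrative):
.DS
.CW
.ta .1i .6i 1.6i
	Point p =
	{
		.y	7,
		.x	3,
	};
.DE
Here
.CW p.x
is initialized to 3 and
.CW p.y
to 7,
regardless of the order in which the members were declared.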
.NH 2
External register
.LP
The declaration
.CW extern
.CW register
will dedicate a register to
a variable on a global basis.
It can be used only under special circumstances.
External register variables must be identically
declared in all modules and
libraries.
The feature is not intended for efficiency,
although it can produce efficient code;
rather it represents a unique storage class that
would be hard to get any other way.
On a shared-memory multi-processor,
an external register is
one-per-processor and neither one-per-procedure (automatic)
nor one-per-system (external).
It is used for two variables in the Plan 9 kernel,
.CW u
and
.CW m .
The variable
.CW u
is a pointer to the structure representing the currently running process
and
.CW m
is a pointer to the per-machine data structure.
.NH 2
Long long
.LP
The compilers accept
.CW long
.CW long
as a basic type meaning 64-bit integer.
On all of the machines
this type is synthesized from 32-bit instructions.
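.LP
As a rough illustration of what this synthesis involves,
a 64-bit add can be built from two 32-bit adds and a carry test.
The representation and names below are invented for the example;
they are not the compiler's internal form:
.DS
.CW
.ta .1i .6i 1.1i 1.6i
	typedef struct { unsigned long lo, hi; } Int64;

	Int64
	add64(Int64 a, Int64 b)
	{
		Int64 r;

		r.lo = a.lo + b.lo;
		r.hi = a.hi + b.hi;
		if(r.lo < a.lo)	/* carry out of the low word */
			r.hi++;
		return r;
	}
.DE
Multiply and divide are harder still;
as noted later,
the compilers rewrite 64-bit multiply and divide
into procedure calls.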
.NH 2
Pragma
.LP
The compilers accept
.CW #pragma
.CW lib
.I libname
and pass the
library name string uninterpreted
to the loader.
The loader uses the library name to
find libraries to load.
If the name contains
.CW %O ,
it is replaced with
the single character object type of the compiler
(e.g.,
.CW v
for the MIPS).
If the name contains
.CW %M ,
it is replaced with
the architecture type for the compiler
(e.g.,
.CW mips
for the MIPS).
If the name starts with
.CW /
it is an absolute pathname;
if it starts with
.CW .
then it is searched for in the loader's current directory.
Otherwise, the name is searched from
.CW /%M/lib .
Such
.CW #pragma
statements in header files guarantee that the correct
libraries are always linked with a program without the
need to specify them explicitly at link time.
.NH
Object module conventions
.LP
The overall conventions of the runtime environment
are important
to runtime efficiency.
In this section,
several of these conventions are discussed.
.NH 2
Register saving
.LP
In the Plan 9 compilers,
the caller of a procedure saves the registers.
With caller-saves,
the leaf procedures can use all the
registers and never save them.
If you spend a lot of time at the leaves,
this seems preferable.
With callee-saves,
the saving of the registers is done
in the single point of entry and return.
If you are interested in space,
this seems preferable.
In both,
there is a degree of uncertainty
about what registers need to be saved.
Callee-saved registers make it difficult to
find variables in registers in debuggers.
Callee-saved registers also complicate
the implementation of
.CW longjmp .
The convincing argument is
that with caller-saves,
the decision to registerize a variable
can include the cost of saving the register
across calls.
For a further discussion of caller- vs. callee-saves,
see the paper by Davidson and Whalley [Dav91].
.LP
In the Plan 9 operating system,
calls to the kernel look like normal procedure
calls, which means
the caller
has saved the registers and the system
entry does not have to.
This makes system calls considerably faster.
Since this is a potential security hole,
and can lead to non-determinism,
the system may eventually save the registers
on entry,
or more likely clear the registers on return.
.NH 2
Calling convention
.LP
Older C compilers maintain a frame pointer, which is at a known constant
offset from the stack pointer within each function.
For machines where the stack grows towards zero,
the argument pointer is at a known constant offset
from the frame pointer.
Since the stack grows down in Plan 9,
the Plan 9 compilers
keep neither an
explicit frame pointer nor
an explicit argument pointer;
instead they generate addresses relative to the stack pointer.
.LP
On some architectures, the first argument to a subroutine is passed in a register.
.NH 2
Functions returning structures
.LP
Structures longer than one word are awkward to implement
since they do not fit in registers and must
be passed around in memory.
Functions that return structures
are particularly clumsy.
The Plan 9 compilers pass the address of
the returned structure as the first argument to a
function that has a structure return value.
Thus
.DS
.CW
.ta .1i .6i 1.1i 1.6i
	x = f(...)
.DE
is rewritten as
.DS
.CW
.ta .1i .6i 1.1i 1.6i
	f(&x, ...)\f1.
.DE
This saves a copy and makes the compilation
much less clumsy.
A disadvantage is that if you call this
function without an assignment,
a dummy location must be invented.
.LP
There is also a danger of calling a function
that returns a structure without declaring
it as such.
With ANSI C function prototypes,
this error need never occur.
.NH
Implementation
.LP
The compiler is divided internally into
four machine-independent passes,
four machine-dependent passes,
and an output pass.
The next nine sections describe each pass in order.
.NH 2
Parsing
.LP
The first pass is a YACC-based parser
[Joh79].
Declarations are interpreted immediately,
building a block structured symbol table.
Executable statements are put into a parse tree
and collected,
without interpretation.
At the end of each procedure,
the parse tree for the function is
examined by the other passes of the compiler.
.LP
The input stream of the parser is
a pushdown list of input activations.
The preprocessor
expansions of
macros
and
.CW #include
are implemented as pushdowns.
Thus there is no separate
pass for preprocessing.
.NH 2
Typing
.LP
The next pass distributes typing information
to every node of the tree.
Implicit operations on the tree are added,
such as type promotions and taking the
address of arrays and functions.
.NH 2
Machine-independent optimization
.LP
The next pass performs optimizations
and transformations of the tree, such as converting
.CW &*x
and
.CW *&x
into
.CW x .
Constant expressions are converted to constants in this pass.
.NH 2
Arithmetic rewrites
.LP
This is another machine-independent optimization.
Subtrees of add, subtract, and multiply of integers are
rewritten for easier compilation.
The major transformation is factoring:
.CW 4+8*a+16*b+5
is transformed into
.CW 9+8*(a+2*b) .
Such expressions arise from address
manipulation and array indexing.
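.LP
As a concrete instance of where such subtrees come from,
consider indexing a two-dimensional array
(the declaration is illustrative and 4-byte integers are assumed):
.DS
.CW
.ta .1i .6i 1.1i 1.6i
	int a[10][5];
	... a[i][j+1] ...
.DE
The address of
.CW a[i][j+1]
is
.CW a+20*i+4*j+4 ,
which the rewrite factors into
.CW a+4+4*(5*i+j) ,
leaving a single constant multiplication
inside the parentheses.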
.NH 2
Addressability
.LP
This is the first of the machine-dependent passes.
The addressability of a processor is defined as the set of
expressions that are legal in the address field
of a machine language instruction.
The addressability of different processors varies widely.
At one end of the spectrum are the 68020 and VAX,
which allow a complex mix of incrementing,
decrementing,
indexing, and relative addressing.
At the other end is the MIPS,
which allows only registers and constant offsets from the
contents of a register.
The addressability can be different for different instructions
within the same processor.
.LP
It is important to the code generator to know when a
subtree represents an address of a particular type.
This is done with a bottom-up walk of the tree.
In this pass, the leaves are labeled with small integers.
When an internal node is encountered,
it is labeled by consulting a table indexed by the
labels on the left and right subtrees.
For example,
on the 68020 processor,
it is possible to address an
offset from a named location.
In C, this is represented by the expression
.CW *(&name+constant) .
This is marked addressable by the following table.
In the table,
a node represented by the left column is marked
with a small integer from the right column.
Marks of the form
.CW A\s-2\di\u\s0
are addressable while
marks of the form
.CW N\s-2\di\u\s0
are not addressable.
.DS
.B
.ta .1i 1.1i
	Node	Marked
.CW
	name	A\s-2\d1\u\s0
	const	A\s-2\d2\u\s0
	&A\s-2\d1\u\s0	A\s-2\d3\u\s0
	A\s-2\d3\u\s0+A\s-2\d1\u\s0	N\s-2\d1\u\s0 \fR(note that this is not addressable)\fP
	*N\s-2\d1\u\s0	A\s-2\d4\u\s0
.DE
Here there is a distinction between
a node marked
.CW A\s-2\d1\u\s0
and a node marked
.CW A\s-2\d4\u\s0
because the address operator of an
.CW A\s-2\d4\u\s0
node is not addressable.
So to extend the table:
.DS
.B
.ta .1i 1.1i
	Node	Marked
.CW
	&A\s-2\d4\u\s0	N\s-2\d2\u\s0
	N\s-2\d2\u\s0+N\s-2\d1\u\s0	N\s-2\d1\u\s0
.DE
The full addressability of the 68020 is expressed
in 18 rules like this,
while the addressability of the MIPS is expressed
in 11 rules.
When one ports the compiler,
this table is usually initialized
so that leaves are labeled as addressable and nothing else.
The code produced is poor,
but porting is easy.
The table can be extended later.
.LP
This pass also rewrites some complex operators
into procedure calls.
Examples include 64-bit multiply and divide.
.LP
In the same bottom-up pass of the tree,
the nodes are labeled with a Sethi-Ullman complexity
[Set70].
This number is roughly the number of registers required
to compile the tree on an ideal machine.
An addressable node is marked 0.
A function call is marked infinite.
A unary operator is marked as the
maximum of 1 and the mark of its subtree.
A binary operator with equal marks on its subtrees is
marked with a subtree mark plus 1.
A binary operator with unequal marks on its subtrees is
marked with the maximum mark of its subtrees.
The actual values of the marks are not too important,
but the relative values are.
The goal is to compile the harder
(larger mark)
subtree first.
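.LP
A small worked example may help.
Applying these rules to the expression
.CW (a+b)*f(c) ,
where
.CW a ,
.CW b ,
and
.CW c
are addressable leaves, gives:
.DS
.B
.ta .1i 1.1i
	Node	Mark
.CW
	a, b, c	0
	a+b	1
	f(c)	\(if
	(a+b)*f(c)	\(if
.DE
The add has equal subtree marks and so is marked 1;
the multiply has unequal subtree marks and inherits
the infinite mark of the call.
The code generator therefore compiles
.CW f(c)
first.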
.NH 2
Code generation
.LP
Code is generated by recursive
descent.
The Sethi-Ullman complexity completely guides the
order.
The addressability defines the leaves.
The only difficult part is compiling a tree
that has two infinite (function call)
subtrees.
In this case,
one subtree is compiled into the return register
(usually the most convenient place for a function call)
and then stored on the stack.
The other subtree is compiled into the return register
and then the operation is compiled with
operands from the stack and the return register.
.LP
There is a separate boolean code generator that compiles
conditional expressions.
This is fundamentally different from compiling an arithmetic expression.
The result of the boolean code generator is the
position of the program counter and not an expression.
The boolean code generator makes extensive use of De Morgan's rules.
It is an expanded version of the one described
in chapter 8 of Aho, Sethi, and Ullman
[Aho87].
.LP
There is a considerable amount of talk in the literature
about automating this part of a compiler with a machine
description.
Since this code generator is so small
(less than 500 lines of C)
and easy,
it hardly seems worth the effort.
.NH 2
Registerization
.LP
Up to now,
the compiler has operated on syntax trees
that are roughly equivalent to the original source language.
The previous pass has produced machine language in an internal
format.
The next two passes operate on the internal machine language
structures.
The purpose of the next pass is to reintroduce
registers for heavily used variables.
.LP
All of the variables that can be
potentially registerized within a procedure are
placed in a table.
(Suitable variables are any automatic or external
scalars that do not have their addresses extracted.
Some constants that are hard to reference are also
considered for registerization.)
Four separate data flow equations are evaluated
over the procedure on all of these variables.
Two of the equations are the normal set-behind
and used-ahead
bits that define the life of a variable.
The two new bits tell if a variable life
crosses a function call ahead or behind.
By examining a variable over its lifetime,
it is possible to get a cost
for registerizing.
Loops are detected and the costs are multiplied
by three for every level of loop nesting.
Costs are sorted and the variables
are replaced by available registers on a greedy basis.
.LP
The 68020 has two different
types of registers (data and address).
For the 68020,
two different costs are calculated for
each variable life and the register type that
affords the better cost is used.
Ties are broken by counting the number of available
registers of each type.
.LP
Note that externals are registerized together with automatics.
This is done by evaluating the semantics of a ``call'' instruction
differently for externals and automatics.
Since a call goes outside the local procedure,
it is assumed that a call references all externals.
Similarly,
externals are assumed to be set before an ``entry'' instruction
and assumed to be referenced after a ``return'' instruction.
This makes sure that externals are in memory across calls.
.LP
The overall results are satisfactory.
It would be nice to be able to do this processing in
a machine-independent way,
but it is impossible to get all of the costs and
side effects of different choices by examining the parse tree.
.LP
Most of the code in the registerization pass is machine-independent.
The major machine-dependency is in
examining a machine instruction to ask if it sets or references
a variable.
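.LP
As an illustration of how the costs interact,
consider this contrived function:
.DS
.CW
.ta .1i .6i 1.1i 1.6i
	extern void g(int);

	int
	sum(int *a, int n)
	{
		int i, s;

		s = 0;
		for(i = 0; i < n; i++)	/* loop: costs of i and s tripled */
			s += a[i];
		g(s);	/* s lives across a call */
		return s;
	}
.DE
The references to
.CW i
and
.CW s
inside the loop are multiplied by three,
making both strong candidates for registers;
the call to
.CW g
adds the cost of preserving
.CW s
across the call to its total.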
.NH 2
Machine code optimization
.LP
The next pass walks the machine code
for opportunistic optimizations.
For the most part,
this is highly specific to a particular
processor.
One optimization that is performed
on all of the processors is the
removal of unnecessary ``move''
instructions.
Ironically,
most of these instructions were inserted by
the previous pass.
There are two patterns that are repetitively
matched and replaced until no more matches are
found.
The first tries to remove ``move'' instructions
by relabeling variables.
.LP
When a ``move'' instruction is encountered,
if the destination variable is set before the
source variable is referenced,
then all of the references to the destination
variable can be renamed to the source and the ``move''
can be deleted.
This transformation uses the reverse data flow
set up in the previous pass.
.LP
An example of this pattern is depicted in the following
table.
The pattern is in the left column and the
replacement action is in the right column.
.DS
.CW
.ta .1i .6i 1.6i 2.1i 2.6i
	MOVE	a->b		\fR(remove)\fP
.R
	(sequence with no mention of \f(CWa\fP)
.CW
	USE	b		USE	a
.R
	(sequence with no mention of \f(CWa\fP)
.CW
	SET	b		SET	b
.DE
.LP
Experiments have shown that it is marginally
worthwhile to rename uses of the destination variable
with uses of the source variable up to
the first use of the source variable.
.LP
The second transform will do relabeling
without deleting instructions.
When a ``move'' instruction is encountered,
if the source variable has been set prior
to the use of the destination variable
then all of the references to the source
variable are replaced by the destination and
the ``move'' is inverted.
Typically,
this transformation will alter two ``move''
instructions and allow the first transformation
another chance to remove code.
This transformation uses the forward data flow
set up in the previous pass.
.LP
Again,
the following is a depiction of the transformation where
the pattern is in the left column and the
rewrite is in the right column.
.DS
.CW
.ta .1i .6i 1.6i 2.1i 2.6i
	SET	a		SET	b
.R
	(sequence with no use of \f(CWb\fP)
.CW
	USE	a		USE	b
.R
	(sequence with no use of \f(CWb\fP)
.CW
	MOVE	a->b		MOVE	b->a
.DE
Iterating these transformations
will usually get rid of all redundant ``move'' instructions.
.LP
A problem with this organization is that the costs
of registerization calculated in the previous pass
must depend on how well this pass can detect and remove
redundant instructions.
Often,
a fine candidate for registerization is rejected
because of the cost of instructions that are later
removed.
.NH 2
Writing the object file
.LP
The last pass walks the internal assembly language
and writes the object file.
The object file is reduced in size by about a factor
of three with simple compression
techniques.
The most important aspect of the object file
format is that it is independent of the compiling machine.
All integer and floating numbers in the object
code are converted to known formats and byte
orders.
.NH
The loader
.LP
The loader is a multiple pass program that
reads object files and libraries and produces
an executable binary.
The loader also does some minimal
optimizations and code rewriting.
Many of the operations performed by the
loader are machine-dependent.
.LP
The first pass of the loader reads the
object modules into an internal data
structure that looks like binary assembly language.
As the instructions are read,
code is reordered to remove
unconditional branch instructions.
Conditional branch instructions are inverted
to prevent the insertion of unconditional branches.
The loader will also make a copy of a few instructions
to remove an unconditional branch.
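.LP
Schematically,
in an informal notation
(the opcodes and label are illustrative),
the inversion looks like this,
with the original sequence in the left column and
the rewrite in the right column:
.DS
.CW
.ta .1i 1.1i
	BEQ L1	BNE L2
	JMP L2	L1: ...
	L1: ...
.DE
The sequence on the left takes two branch instructions
to reach the code at
.CW L1 ;
on the right the test is inverted,
the unconditional branch disappears,
and the common case falls through.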
.LP
The next pass allocates addresses for
all external data.
Typical of processors is the MIPS,
which can reference ±32K bytes from a
register.
The loader allocates the register
.CW R30
as the static pointer.
The value placed in
.CW R30
is the base of the data segment plus 32K.
It is then cheap to reference all data in the
first 64K of the data segment.
External variables are allocated to
the data segment
with the smallest variables allocated first.
If all of the data cannot fit into the first
64K of the data segment,
then usually only a few large arrays
need more expensive addressing modes.
.LP
For the MIPS processor,
the loader makes a pass over the internal
structures,
exchanging instructions to try
to fill ``delay slots'' with useful work.
If a useful instruction cannot be found
to fill a delay slot,
the loader will insert
``noop''
instructions.
This pass is very expensive and does not
do a good job.
About 40% of all instructions are in
delay slots.
About 65% of these are useful instructions and
35% are ``noops.''
The vendor-supplied assembler does this job
more effectively,
filling about 80%
of the delay slots with useful instructions.
.LP
On the 68020 processor,
branch instructions come in a variety of
sizes depending on the relative distance
of the branch.
Thus the size of branch instructions
can be mutually dependent.
The loader uses a multiple pass algorithm
to resolve the branch lengths
[Szy78].
Initially, all branches are assumed minimal length.
On each subsequent pass,
the branches are reassessed
and expanded if necessary.
When no more expansions occur,
the locations of the instructions in
the text segment are known.
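.LP
A minimal sketch of this fixed-point iteration,
with an invented
.CW Inst
structure and helper standing in for the loader's internal form:
.DS
.CW
.ta .1i .6i 1.1i 1.6i
	/* sketch: sizes only grow, so iterate to a fixed point */
	int i, expanded;
	long pc;

	do {
		expanded = 0;
		pc = 0;
		for(i = 0; i < ninst; i++) {	/* assign tentative addresses */
			inst[i].pc = pc;
			pc += inst[i].size;
		}
		for(i = 0; i < ninst; i++)
			if(inst[i].branch && inst[i].size == Short
			&& !inrange(inst[i].pc, inst[i].target->pc)) {
				inst[i].size = Long;	/* widen the branch */
				expanded = 1;
			}
	} while(expanded);
.DE
Because each pass can only push targets further away,
a branch once expanded never needs to shrink again,
and the iteration terminates.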
.LP
On the MIPS processor,
all instructions are one size.
A single pass over the instructions will
determine the locations of all addresses
in the text segment.
.LP
The last pass of the loader produces the
executable binary.
A symbol table and other tables are
produced to help the debugger to
interpret the binary symbolically.
.LP
The loader places absolute source line numbers in the symbol table.
The names and absolute line numbers of all
.CW #include
files are also placed in the
symbol table so that the debuggers can
associate object code to source files.
.NH
Performance
.LP
The following is a table of the source size of the MIPS
compiler.
.DS
.ta .1i .6i
	lines	module
	\0509	machine-independent headers
	1070	machine-independent YACC source
	6090	machine-independent C source

	\0545	machine-dependent headers
	6532	machine-dependent C source

	\0298	loader headers
	5215	loader C source
.DE
.LP
The following table shows timing
of a test program
that plays checkers, running on a MIPS R4000.
The test program is 26 files totaling 12600 lines of C.
The execution time does not significantly
depend on library implementation.
Since no other compiler runs on Plan 9,
the Plan 9 tests were done with the Plan 9 operating system;
the other tests were done on the vendor's operating system.
The hardware was identical in both cases.
The optimizer in the vendor's compiler
is reputed to be extremely good.
.DS
.ta .1i .9i
	\0\04.49s	Plan 9 \f(CWvc\fP \f(CW-N\fP compile time (opposite of \f(CW-O\fP)
	\0\01.72s	Plan 9 \f(CWvc\fP \f(CW-N\fP load time
	148.69s	Plan 9 \f(CWvc\fP \f(CW-N\fP run time

	\015.07s	Plan 9 \f(CWvc\fP compile time (\f(CW-O\fP implicit)
	\0\01.66s	Plan 9 \f(CWvc\fP load time
	\089.96s	Plan 9 \f(CWvc\fP run time

	\014.83s	vendor \f(CWcc\fP compile time
	\0\00.38s	vendor \f(CWcc\fP load time
	104.75s	vendor \f(CWcc\fP run time

	\043.59s	vendor \f(CWcc\fP \f(CW-O\fP compile time
	\0\00.38s	vendor \f(CWcc\fP \f(CW-O\fP load time
	\076.19s	vendor \f(CWcc\fP \f(CW-O\fP run time

	\0\08.19s	vendor \f(CWcc\fP \f(CW-O3\fP compile time
	\035.97s	vendor \f(CWcc\fP \f(CW-O3\fP load time
	\071.16s	vendor \f(CWcc\fP \f(CW-O3\fP run time
.DE
.LP
To compare the Intel compiler,
a program that is about 40% bit manipulation and
about 60% single precision floating point was
run on the same 33 MHz 486:
once under Windows,
compiled with the Watcom compiler, version 10.0,
in 16-bit mode;
and once under
Plan 9 in 32-bit mode.
The Plan 9 execution time was 27 sec while the Windows
execution time was 31 sec.
.NH
Conclusions
.LP
The new compilers compile
quickly,
load slowly,
and produce
medium quality
object code.
The compilers are relatively
portable,
requiring but a couple of weeks' work to
produce a compiler for a different computer.
For Plan 9,
where we needed several compilers
with specialized features and
our own object formats,
this project was indispensable.
It is also necessary for us to
be able to freely distribute our compilers
with the Plan 9 distribution.
.LP
Two problems have come up in retrospect.
The first has to do with the
division of labor between compiler and loader.
Plan 9 runs on multi-processors and as such
compilations are often done in parallel.
Unfortunately,
all compilations must be complete before loading
can begin.
The load is then single-threaded.
With this model,
any shift of work from compile to load
results in a significant increase in real time.
The same is true of libraries that are compiled
infrequently and loaded often.
In the future,
we may try to put some of the loader work
back into the compiler.
.LP
The second problem comes from
the various optimizations performed over several
passes.
Often optimizations in different passes depend
on each other.
Iterating the passes could compromise efficiency,
or even loop.
We see no real solution to this problem.
.NH
References
.LP
[Aho87] A. V. Aho, R. Sethi, and J. D. Ullman,
.I
Compilers \- Principles, Techniques, and Tools,
.R
Addison-Wesley,
Reading, MA,
1987.
.LP
[ANSI90] \f2American National Standard for Information Systems \-
Programming Language C\f1, American National Standards Institute, Inc.,
New York, 1990.
.LP
[Dav91] J. W. Davidson and D. B. Whalley,
``Methods for Saving and Restoring Register Values across Function Calls'',
.I
Software\-Practice and Experience,
.R
Vol 21(2), pp. 149-165, February 1991.
.LP
[Joh79] S. C. Johnson,
``YACC \- Yet Another Compiler Compiler'',
.I
UNIX Programmer's Manual, Seventh Ed., Vol. 2A,
.R
AT&T Bell Laboratories,
Murray Hill, NJ,
1979.
.LP
[Set70] R. Sethi and J. D. Ullman,
``The Generation of Optimal Code for Arithmetic Expressions'',
.I
Journal of the ACM,
.R
Vol 17(4), pp. 715-728, 1970.
.LP
[Szy78] T. G. Szymanski,
``Assembling Code for Machines with Span-dependent Instructions'',
.I
Communications of the ACM,
.R
Vol 21(4), pp. 300-308, 1978.