.HTML "How to Use the Plan 9 C Compiler
.TL
How to Use the Plan 9 C Compiler
.AU
Rob Pike
rob@plan9.bell-labs.com
.SH
Introduction
.PP
The C compiler on Plan 9 is a wholly new program; in fact
it was the first piece of software written for what would
eventually become Plan 9 from Bell Labs.
Programmers familiar with existing C compilers will find
a number of differences in both the language the Plan 9 compiler
accepts and in how the compiler is used.
.PP
The compiler is really a set of compilers, one for each
architecture \(em MIPS, SPARC, Motorola 68020, Intel 386, etc. \(em
that accept a dialect of ANSI C and efficiently produce
fairly good code for the target machine.
There is a packaging of the compiler that accepts strict ANSI C for
a POSIX environment, but this document focuses on the
native Plan 9 environment, that in which all the system source and
almost all the utilities are written.
.SH
Source
.PP
The language accepted by the compilers is the core ANSI C language
with some modest extensions,
a greatly simplified preprocessor,
a smaller library that includes system calls and related facilities,
and a completely different structure for include files.
.PP
Official ANSI C accepts the old (K&R) style of declarations for
functions; the Plan 9 compilers
are more demanding.
Without an explicit run-time flag
.CW -B ) (
whose use is discouraged, the compilers insist
on new-style function declarations, that is, prototypes for
function arguments.
The function declarations in the libraries' include files are
all in the new style so the interfaces are checked at compile time.
For C programmers who have not yet switched to function prototypes
the clumsy syntax may seem repellent but the payoff in stronger typing
is substantial.
Those who wish to import existing software to Plan 9 are urged
to use the opportunity to update their code.
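.PP
The difference between the two declaration styles is easiest to see side by side.
The following is a minimal sketch in portable C; the function name
scale is invented for illustration and does not come from any Plan 9 library.

```c
/*
 * Old (K&R) style, shown only in a comment because the Plan 9
 * compilers reject it unless -B is given:
 *
 *	long scale(v, k)
 *		long v;
 *		int k;
 *	{ ... }
 *
 * With this form the compiler knows nothing about the argument
 * types at a call site and cannot check them.
 */

/* New style: the prototype carries the argument types. */
long scale(long v, int k);

long
scale(long v, int k)
{
	return v * k;
}
```

With the prototype in scope, a call such as scale(7, 3) has its first
argument converted to long, and a call with the wrong number of
arguments is diagnosed at compile time.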
.PP
The compilers include an integrated preprocessor that accepts the familiar
.CW #include ,
.CW #define
for macros both with and without arguments,
.CW #undef ,
.CW #line ,
.CW #ifdef ,
.CW #ifndef ,
and
.CW #endif .
It
supports neither
.CW #if
nor
.CW ## ,
although it does
honor a few
.CW #pragmas .
The
.CW #if
directive was omitted because it greatly complicates the
preprocessor, is never necessary, and is usually abused.
Conditional compilation in general makes code hard to understand;
the Plan 9 source uses it sparingly.
Also, because the compilers remove dead code, regular
.CW if
statements with constant conditions are more readable equivalents to many
.CW #ifs .
To compile imported code ineluctably fouled by
.CW #if
there is a separate command,
.CW /bin/cpp ,
that implements the complete ANSI C preprocessor specification.
.PP
Include files fall into two groups: machine-dependent and machine-independent.
The machine-independent files occupy the directory
.CW /sys/include ;
the others are placed in a directory appropriate to the machine, such as
.CW /mips/include .
The compiler searches for include files
first in the machine-dependent directory and then
in the machine-independent directory.
At the time of writing there are thirty-one machine-independent include
files and two (per machine) machine-dependent ones:
.CW <ureg.h>
and
.CW <u.h> .
The first describes the layout of registers on the system stack,
for use by the debugger.
The second defines some
architecture-dependent types such as
.CW jmp_buf
for
.CW setjmp
and the
.CW va_arg
and
.CW va_list
macros for handling arguments to variadic functions,
as well as a set of
.CW typedef
abbreviations for
.CW unsigned
.CW short
and so on.
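.PP
The earlier remark about replacing #if with an ordinary if on a
constant condition can be made concrete.
This is a sketch in portable C, so it uses Standard I/O for the trace;
the names Debug and sum are hypothetical and chosen only for illustration.

```c
#include <stdio.h>

enum
{
	Debug = 0,	/* hypothetical switch: set to 1 to enable tracing */
};

/* Return the sum of a[0..n-1], tracing each step when Debug is set. */
long
sum(int *a, int n)
{
	long s;
	int i;

	s = 0;
	for(i = 0; i < n; i++){
		s += a[i];
		if(Debug)	/* constant condition: dead code when Debug is 0 */
			fprintf(stderr, "sum: i=%d s=%ld\n", i, s);
	}
	return s;
}
```

Unlike text hidden by #if, the tracing statement is still parsed and
type-checked when Debug is 0, so it cannot silently rot.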
.PP
Here is an excerpt from
.CW /68020/include/u.h :
.P1
#define nil ((void*)0)
typedef unsigned short ushort;
typedef unsigned char uchar;
typedef unsigned long ulong;
typedef unsigned int uint;
typedef signed char schar;
typedef long long vlong;

typedef long jmp_buf[2];
#define JMPBUFSP 0
#define JMPBUFPC 1
#define JMPBUFDPC 0
.P2
Plan 9 programs use
.CW nil
for the name of the zero-valued pointer.
The type
.CW vlong
is the largest integer type available; on most architectures it
is a 64-bit value.
A couple of other types in
.CW <u.h>
are
.CW u32int ,
which is guaranteed to have exactly 32 bits (a possibility on all the supported architectures) and
.CW mpdigit ,
which is used by the multiprecision math package
.CW <mp.h> .
The
.CW #define
constants permit an architecture-independent (but compiler-dependent)
implementation of stack-switching using
.CW setjmp
and
.CW longjmp .
.PP
Every Plan 9 C program begins
.P1
#include <u.h>
.P2
because all the other installed header files use the
.CW typedefs
declared in
.CW <u.h> .
.PP
In strict ANSI C, include files are grouped to collect related functions
in a single file: one for string functions, one for memory functions,
one for I/O, and none for system calls.
Each include file is protected by an
.CW #ifdef
to guarantee its contents are seen by the compiler only once.
Plan 9 takes a different approach. Other than a few include
files that define external formats such as archives, the files in
.CW /sys/include
correspond to
.I libraries.
If a program is using a library, it includes the corresponding header.
The default C library comprises string functions, memory functions, and
so on, largely as in ANSI C, some formatted I/O routines,
plus all the system calls and related functions.
To use these functions, one must
.CW #include
the file
.CW <libc.h> ,
which in turn must follow
.CW <u.h> ,
to define their prototypes for the compiler.
Here is the complete source to the traditional first C program:
.P1
#include <u.h>
#include <libc.h>

void
main(void)
{
	print("hello world\en");
	exits(0);
}
.P2
The
.CW print
routine and its relatives
.CW fprint
and
.CW sprint
resemble the similarly-named functions in Standard I/O but are not
attached to a specific I/O library.
In Plan 9
.CW main
is not integer-valued; it should call
.CW exits ,
which takes a string argument (or null; here ANSI C promotes the 0 to a
.CW char* ).
All these functions are, of course, documented in the Programmer's Manual.
.PP
To use
.CW printf ,
.CW <stdio.h>
must be included to define the function prototype for
.CW printf :
.P1
#include <u.h>
#include <libc.h>
#include <stdio.h>

void
main(int argc, char *argv[])
{
	printf("%s: hello world; argc = %d\en", argv[0], argc);
	exits(0);
}
.P2
In practice, Standard I/O is not used much in Plan 9. I/O libraries are
discussed in a later section of this document.
.PP
There are libraries for handling regular expressions, raster graphics,
windows, and so on, and each has an associated include file.
The manual for each library states which include files are needed.
The files are not protected against multiple inclusion and themselves
contain no nested
.CW #includes .
Instead the
programmer is expected to sort out the requirements
and to
.CW #include
the necessary files once at the top of each source file. In practice this is
trivial: this way of handling include files is so straightforward
that it is rare for a source file to contain more than half a dozen
.CW #includes .
.PP
The compilers do their own register allocation so the
.CW register
keyword is ignored.
For different reasons,
.CW volatile
and
.CW const
are also ignored.
.PP
To make it easier to share code with other systems, Plan 9 has a version
of the compiler,
.CW pcc ,
that provides the standard ANSI C preprocessor, headers, and libraries
with POSIX extensions.
.CW Pcc
is recommended only
when broad external portability is mandated. It compiles slower,
produces slower code (it takes extra work to simulate POSIX on Plan 9),
eliminates those parts of the Plan 9 interface
not related to POSIX, and illustrates the clumsiness of an environment
designed by committee.
.CW Pcc
is described in more detail in
.I
APE\(emThe ANSI/POSIX Environment,
.R
by Howard Trickey.
.SH
Process
.PP
Each CPU architecture supported by Plan 9 is identified by a single,
arbitrary, alphanumeric character:
.CW k
for SPARC,
.CW q
for Motorola Power PC 630 and 640,
.CW v
for MIPS,
.CW 1
for Motorola 68000,
.CW 2
for Motorola 68020 and 68040,
.CW 5
for Acorn ARM 7500,
.CW 6
for Intel 960,
.CW 7
for DEC Alpha,
.CW 8
for Intel 386, and
.CW 9
for AMD 29000.
The character labels the support tools and files for that architecture.
For instance, for the 68020 the compiler is
.CW 2c ,
the assembler is
.CW 2a ,
the link editor/loader is
.CW 2l ,
the object files are suffixed
.CW \&.2 ,
and the default name for an executable file is
.CW 2.out .
Before we can use the compiler we therefore need to know which
machine we are compiling for.
The next section explains how this decision is made; for the moment
assume we are building 68020 binaries and make the mental substitution for
.CW 2
appropriate to the machine you are actually using.
.PP
To convert source to an executable binary is a two-step process.
First run the compiler,
.CW 2c ,
on the source, say
.CW file.c ,
to generate an object file
.CW file.2 .
Then run the loader,
.CW 2l ,
to generate an executable
.CW 2.out
that may be run (on a 680X0 machine):
.P1
2c file.c
2l file.2
2.out
.P2
The loader automatically links with whatever libraries the program
needs, usually including the standard C library as defined by
.CW <libc.h> .
Of course the compiler and loader have lots of options, both familiar and new;
see the manual for details.
The compiler does not generate an executable automatically;
the output of the compiler must be given to the loader.
Since most compilation is done under the control of
.CW mk
(see below), this is rarely an inconvenience.
.PP
The distribution of work between the compiler and loader is unusual.
The compiler integrates preprocessing, parsing, register allocation,
code generation and some assembly.
Combining these tasks in a single program is part of the reason for
the compiler's efficiency.
The loader does instruction selection, branch folding,
instruction scheduling,
and writes the final executable.
There is no separate C preprocessor and no assembler in the usual pipeline.
Instead the intermediate object file
(here a
.CW \&.2
file) is a type of binary assembly language.
The instructions in the intermediate format are not exactly those in
the machine. For example, on the 68020 the object file may specify
a MOVE instruction but the loader will decide just which variant of
the MOVE instruction \(em MOVE immediate, MOVE quick, MOVE address,
etc. \(em is most efficient.
.PP
The assembler,
.CW 2a ,
is just a translator between the textual and binary
representations of the object file format.
It is not an assembler in the traditional sense.
It has limited
macro capabilities (the same as the integral C preprocessor in the compiler),
clumsy syntax, and minimal error checking. For instance, the assembler
will accept an instruction (such as memory-to-memory MOVE on the MIPS) that the
machine does not actually support; only when the output of the assembler
is passed to the loader will the error be discovered.
The assembler is intended only for writing things that need access to instructions
invisible from C,
such as the machine-dependent
part of an operating system;
very little code in Plan 9 is in assembly language.
.PP
The compilers take an option
.CW -S
that causes them to print on their standard output the generated code
in a format acceptable as input to the assemblers.
This is of course merely a formatting of the
data in the object file; therefore the assembler is just
an
ASCII-to-binary converter for this format.
Other than the specific instructions, the input to the assemblers
is largely architecture-independent; see
``A Manual for the Plan 9 Assembler'',
by Rob Pike,
for more information.
.PP
The loader is an integral part of the compilation process.
Each library header file contains a
.CW #pragma
that tells the loader the name of the associated archive; it is
not necessary to tell the loader which libraries a program uses.
The C run-time startup is found, by default, in the C library.
The loader starts with an undefined
symbol,
.CW _main ,
that is resolved by pulling in the run-time startup code from the library.
(The loader undefines
.CW _mainp
when profiling is enabled, to force loading of the profiling start-up
instead.)
.PP
Unlike its counterpart on other systems, the Plan 9 loader rearranges
data to optimize access. This means the order of variables in the
loaded program is unrelated to its order in the source.
Most programs don't care, but some assume that, for example, the
variables declared by
.P1
int a;
int b;
.P2
will appear at adjacent addresses in memory. On Plan 9, they won't.
.SH
Heterogeneity
.PP
When the system starts or a user logs in the environment is configured
so the appropriate binaries are available in
.CW /bin .
The configuration process is controlled by an environment variable,
.CW $cputype ,
with a value such as
.CW mips ,
.CW 68020 ,
.CW 386 ,
or
.CW sparc .
For each architecture there is a directory in the root,
with the appropriate name,
that holds the binary and library files for that architecture.
Thus
.CW /mips/lib
contains the object code libraries for MIPS programs,
.CW /mips/include
holds MIPS-specific include files, and
.CW /mips/bin
has the MIPS binaries.
These binaries are attached to
.CW /bin
at boot time by binding
.CW /$cputype/bin
to
.CW /bin ,
so
.CW /bin
always contains the correct files.
.PP
The MIPS compiler,
.CW vc ,
by definition
produces object files for the MIPS architecture,
regardless of the architecture of the machine on which the compiler is running.
There is a version of
.CW vc
compiled for each architecture:
.CW /mips/bin/vc ,
.CW /68020/bin/vc ,
.CW /sparc/bin/vc ,
and so on,
each capable of producing MIPS object files regardless of the native
instruction set.
If one is running on a SPARC,
.CW /sparc/bin/vc
will compile programs for the MIPS;
if one is running on machine
.CW $cputype ,
.CW /$cputype/bin/vc
will compile programs for the MIPS.
.PP
Because of the bindings that assemble
.CW /bin ,
the shell always looks for a command, say
.CW date ,
in
.CW /bin
and automatically finds the file
.CW /$cputype/bin/date .
Therefore the MIPS compiler is known as just
.CW vc ;
the shell will invoke
.CW /bin/vc
and that is guaranteed to be the version of the MIPS compiler
appropriate for the machine running the command.
Regardless of the architecture of the compiling machine,
.CW /bin/vc
is
.I always
the MIPS compiler.
.PP
Also, the output of
.CW vc
and
.CW vl
is completely independent of the machine type on which they are executed:
.CW \&.v
files compiled (with
.CW vc )
on a SPARC may be linked (with
.CW vl )
on a 386.
(The resulting
.CW v.out
will run, of course, only on a MIPS.)
Similarly, the MIPS libraries in
.CW /mips/lib
are suitable for loading with
.CW vl
on any machine; there is only one set of MIPS libraries, not one
set for each architecture that supports the MIPS compiler.
.SH
Heterogeneity and \f(CWmk\fP
.PP
Most software on Plan 9 is compiled under the control of
.CW mk ,
a descendant of
.CW make
that is documented in the Programmer's Manual.
A convention used throughout the
.CW mkfiles
makes it easy to compile the source into binary suitable for any architecture.
.PP
The variable
.CW $cputype
is advisory: it reports the architecture of the current environment, and should
not be modified. A second variable,
.CW $objtype ,
is used to set which architecture is being
.I compiled
for.
The value of
.CW $objtype
can be used by a
.CW mkfile
to configure the compilation environment.
.PP
In each machine's root directory there is a short
.CW mkfile
that defines a set of macros for the compiler, loader, etc.
Here is
.CW /mips/mkfile :
.P1
</sys/src/mkfile.proto

CC=vc
LD=vl
O=v
AS=va
.P2
The line
.P1
</sys/src/mkfile.proto
.P2
causes
.CW mk
to include the file
.CW /sys/src/mkfile.proto ,
which contains general definitions:
.P1
#
# common mkfile parameters shared by all architectures
#

OS=v486xq7
CPUS=mips 386 power alpha
CFLAGS=-FVw
LEX=lex
YACC=yacc
MK=/bin/mk
.P2
.CW CC
is obviously the compiler,
.CW AS
the assembler, and
.CW LD
the loader.
.CW O
is the suffix for the object files and
.CW CPUS
and
.CW OS
are used in special rules described below.
.PP
Here is a
.CW mkfile
to build the installed source for
.CW sam :
.P1
</$objtype/mkfile
OBJ=sam.$O address.$O buffer.$O cmd.$O disc.$O error.$O \e
	file.$O io.$O list.$O mesg.$O moveto.$O multi.$O \e
	plan9.$O rasp.$O regexp.$O string.$O sys.$O xec.$O

$O.out: $OBJ
	$LD $OBJ

install: $O.out
	cp $O.out /$objtype/bin/sam

installall:
	for(objtype in $CPUS) mk install

%.$O: %.c
	$CC $CFLAGS $stem.c

$OBJ: sam.h errors.h mesg.h
address.$O cmd.$O parse.$O xec.$O unix.$O: parse.h

clean:V:
	rm -f [$OS].out *.[$OS] y.tab.?
.P2
(The actual
.CW mkfile
imports most of its rules from other secondary files, but
this example works and is not misleading.)
The first line causes
.CW mk
to include the contents of
.CW /$objtype/mkfile
in the current
.CW mkfile .
If
.CW $objtype
is
.CW mips ,
this inserts the MIPS macro definitions into the
.CW mkfile .
In this case the rule for
.CW $O.out
uses the MIPS tools to build
.CW v.out .
The
.CW %.$O
rule in the file uses
.CW mk 's
pattern matching facilities to convert the source files to the object
files through the compiler.
(The text of the rules is passed directly to the shell,
.CW rc ,
without further translation.
See the
.CW mk
manual if any of this is unfamiliar.)
Because the default rule builds
.CW $O.out
rather than
.CW sam ,
it is possible to maintain binaries for multiple machines in the
same source directory without conflict.
This is also, of course, why the output files from the various
compilers and loaders
have distinct names.
.PP
The rest of the
.CW mkfile
should be easy to follow; notice how the rules for
.CW clean
and
.CW installall
(that is, install versions for all architectures) use other macros
defined in
.CW /$objtype/mkfile .
In Plan 9,
.CW mkfiles
for commands conventionally contain rules to
.CW install
(compile and install the version for
.CW $objtype ),
.CW installall
(compile and install for all
.CW $objtypes ),
and
.CW clean
(remove all object files, binaries, etc.).
.PP
The
.CW mkfile
is easy to use. To build a MIPS binary,
.CW v.out :
.P1
% objtype=mips
% mk
.P2
To build and install a MIPS binary:
.P1
% objtype=mips
% mk install
.P2
To build and install all versions:
.P1
% mk installall
.P2
These conventions make cross-compilation as easy to manage
as traditional native compilation.
Plan 9 programs compile and run without change on machines from
large multiprocessors to laptops. For more information about this process, see
``Plan 9 Mkfiles'',
by Bob Flandrena.
.SH
Portability
.PP
Within Plan 9, it is painless to write portable programs, programs whose
source is independent of the machine on which they execute.
The operating system is fixed and the compiler, headers and libraries
are constant so most of the stumbling blocks to portability are removed.
Attention to a few details can avoid those that remain.
.PP
Plan 9 is a heterogeneous environment, so programs must
.I expect
that external files will be written by programs on machines of different
architectures.
The compilers, for instance, must handle without confusion
object files written by other machines.
The traditional approach to this problem is to pepper the source with
.CW #ifdefs
to turn byte-swapping on and off.
Plan 9 takes a different approach: of the handful of machine-dependent
.CW #ifdefs
in all the source, almost all are deep in the libraries.
Instead programs read and write files in a defined format,
either (for low volume applications) as formatted text, or
(for high volume applications) as binary in a known byte order.
If the external data were written with the most significant
byte first, the following code reads a 4-byte integer correctly
regardless of the architecture of the executing machine (assuming
an unsigned long holds 4 bytes):
.P1
ulong
getlong(void)
{
	ulong l;

	l = (getchar()&0xFF)<<24;
	l |= (getchar()&0xFF)<<16;
	l |= (getchar()&0xFF)<<8;
	l |= (getchar()&0xFF)<<0;
	return l;
}
.P2
Note that this code does not `swap' the bytes; instead it just reads
them in the correct order.
Variations of this code will handle any binary format
and also avoid problems
involving how structures are padded, how words are aligned,
and other impediments to portability.
Be aware, though, that extra care is needed to handle floating point data.
.PP
Efficiency hounds will argue that this method is unnecessarily slow and clumsy
when the executing machine has the same byte order (and padding and alignment)
as the data.
The CPU cost of I/O processing
is rarely the bottleneck for an application, however,
and the gain in simplicity of porting and maintaining the code greatly outweighs
the minor speed loss from handling data in this general way.
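.PP
The writing side can be handled the same way.
What follows is a minimal sketch in portable C that packs and unpacks
a 4-byte value in a byte buffer, most significant byte first;
the names putlong and getlongbuf and their signatures are hypothetical,
not library routines.

```c
/* Hypothetical helpers; names and signatures are illustrative only. */

/* Store the low 32 bits of v in buf, most significant byte first. */
void
putlong(unsigned char *buf, unsigned long v)
{
	buf[0] = (v >> 24) & 0xFF;
	buf[1] = (v >> 16) & 0xFF;
	buf[2] = (v >> 8) & 0xFF;
	buf[3] = v & 0xFF;
}

/* Read 4 bytes from buf, most significant byte first. */
unsigned long
getlongbuf(unsigned char *buf)
{
	unsigned long l;

	l = (unsigned long)buf[0] << 24;
	l |= (unsigned long)buf[1] << 16;
	l |= (unsigned long)buf[2] << 8;
	l |= (unsigned long)buf[3];
	return l;
}
```

Neither routine inspects the byte order of the machine it runs on;
the shifts alone define the external format.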
This method is how the Plan 9 compilers, the window system, and even the file
servers transmit data between programs.
.PP
To port programs beyond Plan 9, where the system interface is more variable,
it is probably necessary to use
.CW pcc
and hope that the target machine supports ANSI C and POSIX.
.SH
I/O
.PP
The default C library, defined by the include file
.CW <libc.h> ,
contains no buffered I/O package.
It does have several entry points for printing formatted text:
.CW print
outputs text to the standard output,
.CW fprint
outputs text to a specified integer file descriptor, and
.CW sprint
places text in a character array.
To access library routines for buffered I/O, a program must
explicitly include the header file associated with an appropriate library.
.PP
The recommended I/O library, used by most Plan 9 utilities, is
.CW bio
(buffered I/O), defined by
.CW <bio.h> .
There also exists an implementation of ANSI Standard I/O,
.CW stdio .
.PP
.CW Bio
is small and efficient, particularly for buffer-at-a-time or
line-at-a-time I/O.
Even for character-at-a-time I/O, however, it is significantly faster than
the Standard I/O library,
.CW stdio .
Its interface is compact and regular, although it lacks a few conveniences.
The most noticeable is that one must explicitly define buffers for standard
input and output;
.CW bio
does not predefine them.
Here is a program to copy input to output a byte
at a time using
.CW bio :
.P1
#include <u.h>
#include <libc.h>
#include <bio.h>

Biobuf bin;
Biobuf bout;

void
main(void)
{
	int c;

	Binit(&bin, 0, OREAD);
	Binit(&bout, 1, OWRITE);

	while((c=Bgetc(&bin)) != Beof)
		Bputc(&bout, c);
	exits(0);
}
.P2
For peak performance, we could replace
.CW Bgetc
and
.CW Bputc
by their equivalent in-line macros
.CW BGETC
and
.CW BPUTC
but
the performance gain would be modest.
For more information on
.CW bio ,
see the Programmer's Manual.
.PP
Perhaps the most dramatic difference in the I/O interface of Plan 9 from other
systems' is that text is not ASCII.
The format for
text in Plan 9 is a byte-stream encoding of 16-bit characters.
The character set is based on the Unicode Standard and is backward compatible with
ASCII:
characters with value 0 through 127 are the same in both sets.
The 16-bit characters, called
.I runes
in Plan 9, are encoded using a representation called
UTF,
an encoding that is becoming accepted as a standard.
(ISO calls it UTF-8;
throughout Plan 9 it's just called
UTF.)
UTF
defines multibyte sequences to
represent character values from 0 to 65535.
In
UTF,
character values up to 127 decimal, 7F hexadecimal, represent themselves,
so straight
ASCII
files are also valid
UTF.
Also,
UTF
guarantees that bytes with values 0 to 127 (NUL to DEL, inclusive)
will appear only when they represent themselves, so programs that read bytes
looking for plain ASCII characters will continue to work.
Any program that expects a one-to-one correspondence between bytes and
characters will, however, need to be modified.
An example is parsing file names.
File names, like all text, are in
UTF,
so it is incorrect to search for a character in a string by
.CW strchr(filename,
.CW c)
because the character might have a multi-byte encoding.
The correct method is to call
.CW utfrune(filename,
.CW c) ,
defined in
.I rune (2),
which interprets the file name as a sequence of encoded characters
rather than bytes.
In fact, even when you know the character is a single byte
that can represent only itself,
it is safer to use
.CW utfrune
because that assumes nothing about the character set
and its representation.
.PP
The library defines several symbols relevant to the representation of characters.
Any byte with unsigned value less than
.CW Runesync
will not appear in any multi-byte encoding of a character.
.CW Utfrune
compares the character being searched against
.CW Runesync
to see if it is sufficient to call
.CW strchr
or if the byte stream must be interpreted.
Any byte with unsigned value less than
.CW Runeself
is represented by a single byte with the same value.
Finally, when errors are encountered converting
to runes from a byte stream, the library returns the rune value
.CW Runeerror
and advances a single byte. This permits programs to find runes
embedded in binary data.
.PP
.CW Bio
includes routines
.CW Bgetrune
and
.CW Bputrune
to transform the external byte stream
UTF
format to and from
internal 16-bit runes.
Also, the
.CW %s
format to
.CW print
accepts
UTF;
.CW %c
prints a character after narrowing it to 8 bits.
The
.CW %S
format prints a null-terminated sequence of runes;
.CW %C
prints a character after narrowing it to 16 bits.
For more information, see the Programmer's Manual, in particular
.I utf (6)
and
.I rune (2),
and the paper,
``Hello world, or
Καλημέρα κόσμε, or\ 
\f(Jpこんにちは 世界\f1'',
by Rob Pike and
Ken Thompson;
there is not room for the full story here.
.PP
These issues affect the compiler in several ways.
First, the C source is in
UTF.
ANSI says C variables are formed from
ASCII
alphanumerics, but comments and literal strings may contain any characters
encoded in the native encoding, here
UTF.
The declaration
.P1
char *cp = "abcÿ";
.P2
initializes the variable
.CW cp
to point to an array of bytes holding the
UTF
representation of the characters
.CW abcÿ .
The type
.CW Rune
is defined in
.CW <u.h>
to be
.CW ushort ,
which is also the `wide character' type in the compiler.
Therefore the declaration
.P1
Rune *rp = L"abcÿ";
.P2
initializes the variable
.CW rp
to point to an array of unsigned short integers holding the 16-bit
values of the characters
.CW abcÿ .
Note that in both these declarations the characters in the source
that represent
.CW "abcÿ"
are the same; what changes is how those characters are represented
in memory in the program.
The following two lines:
.P1
print("%s\en", "abcÿ");
print("%S\en", L"abcÿ");
.P2
produce the same
UTF
string on their output, the first by copying the bytes, the second
by converting from runes to bytes.
.PP
In C, character constants are integers but narrowed through the
.CW char
type.
The Unicode character
.CW ÿ
has value 255, so if the
.CW char
type is signed,
the constant
.CW 'ÿ'
has value \-1 (which is equal to EOF).
On the other hand,
.CW L'ÿ'
narrows through the wide character type,
.CW ushort ,
and therefore has value 255.
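.PP
The relation between a 16-bit rune and its UTF bytes can be sketched
in a few lines of portable C.
This is an illustration only, not the library's code:
the name runetoutf is hypothetical, and the real conversion routines
are those described in rune(2).

```c
/*
 * Encode one 16-bit character value as UTF bytes.
 * Returns the number of bytes stored in s (1, 2, or 3).
 * Hypothetical helper; s must have room for 3 bytes.
 */
int
runetoutf(unsigned char *s, unsigned short r)
{
	if(r < 0x80){			/* 0xxxxxxx: ASCII represents itself */
		s[0] = r;
		return 1;
	}
	if(r < 0x800){			/* 110xxxxx 10xxxxxx */
		s[0] = 0xC0 | (r >> 6);
		s[1] = 0x80 | (r & 0x3F);
		return 2;
	}
	/* 1110xxxx 10xxxxxx 10xxxxxx */
	s[0] = 0xE0 | (r >> 12);
	s[1] = 0x80 | ((r >> 6) & 0x3F);
	s[2] = 0x80 | (r & 0x3F);
	return 3;
}
```

For the rune with value 255 (ÿ) this stores the two bytes C3 BF
hexadecimal, which is why the byte string for abcÿ holds five bytes
before its terminating NUL while the rune string holds four 16-bit values.
Every byte of a multi-byte sequence has its high bit set, so none can
be mistaken for an ASCII character.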
.PP
Finally, although it's not ANSI C, the Plan 9 C compilers
assume any character with value above
.CW Runeself
is an alphanumeric,
so α is a legal, if non-portable, variable name.
.SH
Arguments
.PP
Some macros are defined
in
.CW <libc.h>
for parsing the arguments to
.CW main() .
They are described in
.I ARG (2)
but are fairly self-explanatory.
There are four macros:
.CW ARGBEGIN
and
.CW ARGEND
are used to bracket a hidden
.CW switch
statement within which
.CW ARGC
returns the current option character (rune) being processed and
.CW ARGF
returns the argument to the option, as in the loader option
.CW -o
.CW file .
Here, for example, is the code at the beginning of
.CW main()
in
.CW ramfs.c
(see
.I ramfs (1))
that cracks its arguments:
.P1
void
main(int argc, char *argv[])
{
	char *defmnt;
	int p[2];
	int mfd[2];
	int stdio = 0;

	defmnt = "/tmp";
	ARGBEGIN{
	case 'i':
		defmnt = 0;
		stdio = 1;
		mfd[0] = 0;
		mfd[1] = 1;
		break;
	case 's':
		defmnt = 0;
		break;
	case 'm':
		defmnt = ARGF();
		break;
	default:
		usage();
	}ARGEND
.P2
.SH
Extensions
.PP
The compiler has several extensions to ANSI C, all of which are used
extensively in the system source.
First,
.I structure
.I displays
permit
.CW struct
expressions to be formed dynamically.
Given these declarations:
.P1
typedef struct Point Point;
typedef struct Rectangle Rectangle;

struct Point
{
	int x, y;
};

struct Rectangle
{
	Point min, max;
};

Point p, q, add(Point, Point);
Rectangle r;
int x, y;
.P2
this assignment may appear anywhere an assignment is legal:
.P1
r = (Rectangle){add(p, q), (Point){x, y+3}};
.P2
The syntax is the same as for initializing a structure but with
a leading cast.
.PP
If an
.I anonymous
.I structure
or
.I union
is declared within another structure or union, the members of the internal
structure or union are addressable without prefix in the outer structure.
This feature eliminates the clumsy naming of nested structures and,
particularly, unions.
For example, after these declarations,
.P1
struct Lock
{
	int locked;
};

struct Node
{
	int type;
	union{
		double dval;
		double fval;
		long lval;
	}; /* anonymous union */
	struct Lock; /* anonymous structure */
} *node;

void lock(struct Lock*);
.P2
one may refer to
.CW node->type ,
.CW node->dval ,
.CW node->fval ,
.CW node->lval ,
and
.CW node->locked .
Moreover, the address of a
.CW struct
.CW Node
may be used without a cast anywhere that the address of a
.CW struct
.CW Lock
is used, such as in argument lists.
The compiler automatically promotes the type and adjusts the address.
Thus one may invoke
.CW lock(node) .
.PP
Anonymous structures and unions may be accessed by type name
if (and only if) they are declared using a
.CW typedef
name.
For example, using the above declaration for
.CW Point ,
one may declare
.P1
struct
{
	int type;
	Point;
} p;
.P2
and refer to
.CW p.Point .
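.PP
Restricted forms of two of these extensions later appeared in standard C:
compound literals in C99 and untagged anonymous structures and unions in C11.
The following sketch uses only those standard forms, so it can be tried
with a compiler other than the Plan 9 one.
The tagged anonymous member (struct Lock;), access by typedef name,
and the automatic pointer promotion described above are not part of
the standard and are not shown.
The names Value and mkrect are invented for illustration.

```c
typedef struct Point Point;
typedef struct Rectangle Rectangle;

struct Point
{
	int x, y;
};

struct Rectangle
{
	Point min, max;
};

/* Illustrative only: a tagged value holding an untagged anonymous union. */
struct Value
{
	int type;
	union{
		double dval;
		long lval;
	};	/* members are reached directly as v.dval and v.lval */
};

Point
add(Point p, Point q)
{
	return (Point){p.x+q.x, p.y+q.y};
}

/* Build a Rectangle with nested displays, as in the assignment above. */
Rectangle
mkrect(Point p, Point q, int x, int y)
{
	return (Rectangle){add(p, q), (Point){x, y+3}};
}
```

The display inside mkrect is the same expression as the assignment to r
shown earlier, used here as a return value.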
.PP
In the initialization of arrays, a number in square brackets before an
element sets the index for the initialization. For example, to initialize
some elements in
a table of function pointers indexed by
ASCII
character,
.P1
void percent(void), slash(void);

void (*func[128])(void) =
{
	['%'] percent,
	['/'] slash,
};
.P2
.LP
A similar syntax allows one to initialize structure elements:
.P1
Point p =
{
	.y 100,
	.x 200
};
.P2
These initialization syntaxes were later added to ANSI C, with the addition of an
equals sign between the index or tag and the value.
The Plan 9 compiler accepts either form.
.PP
Finally, the declaration
.P1
extern register reg;
.P2
.I this "" (
appearance of the register keyword is not ignored)
allocates a global register to hold the variable
.CW reg .
External registers must be used carefully: they need to be declared in
.I all
source files and libraries in the program to guarantee the register
is not allocated temporarily for other purposes.
Especially on machines with few registers, such as the i386,
it is easy to link accidentally with code that has already usurped
the global registers and there is no diagnostic when this happens.
Used wisely, though, external registers are powerful.
The Plan 9 operating system uses them to access per-process and
per-machine data structures on a multiprocessor. The storage class they provide
is hard to create in other ways.
.SH
The compile-time environment
.PP
The code generated by the compilers is `optimized' by default:
variables are placed in registers and peephole optimizations are
performed.
The compiler flag
.CW -N
disables these optimizations.
Registerization is done locally rather than throughout a function:
whether a variable occupies a register or
the memory location identified in the symbol
table depends on the activity of the variable and may change
throughout the life of the variable.
The
.CW -N
flag is rarely needed;
its main use is to simplify debugging.
There is no information in the symbol table to identify the
registerization of a variable, so
.CW -N
guarantees the variable is always where the symbol table says it is.
.PP
Another flag,
.CW -w ,
turns
.I on
warnings about portability and problems detected in flow analysis.
Most code in Plan 9 is compiled with warnings enabled;
these warnings plus the type checking offered by function prototypes
provide most of the support of the Unix tool
.CW lint
more accurately and with less chatter.
Two of the warnings,
`used and not set' and `set and not used', are almost always accurate but
may be triggered spuriously by code with invisible control flow,
such as in routines that call
.CW longjmp .
The compiler statements
.P1
SET(v1);
USED(v2);
.P2
decorate the flow graph to silence the compiler.
Either statement accepts a comma-separated list of variables.
Use them carefully: they may silence real errors.
For the common case of unused parameters to a function,
leaving the name off the declaration silences the warnings.
That is, listing the type of a parameter but giving it no
associated variable name does the trick.
.SH
Debugging
.PP
There are two debuggers available on Plan 9.
The first, and older, is
.CW db ,
a revision of Unix
.CW adb .
The other,
.CW acid ,
is a source-level debugger whose commands are statements in
a true programming language.
.CW Acid
is the preferred debugger, but since it
borrows some elements of
.CW db ,
notably the formats for displaying values, it is worth knowing a little bit about
.CW db .
.PP
Both debuggers support multiple architectures in a single program; that is,
the programs are
.CW db
and
.CW acid ,
not for example
.CW vdb
and
.CW vacid .
They also support cross-architecture debugging comfortably:
one may debug a 68020 binary on a MIPS.
.PP
Imagine a program has crashed mysteriously:
.P1
% X11/X
Fatal server bug!
failed to create default stipple
X 106: suicide: sys: trap: fault read addr=0x0 pc=0x00105fb8
%
.P2
When a process dies on Plan 9 it hangs in the `broken' state
for debugging.
Attach a debugger to the process by naming its process id:
.P1
% acid 106
/proc/106/text:mips plan 9 executable

/sys/lib/acid/port
/sys/lib/acid/mips
acid:
.P2
The
.CW acid
function
.CW stk()
reports the stack traceback:
.P1
acid: stk()
At pc:0x105fb8:abort+0x24 /sys/src/ape/lib/ap/stdio/abort.c:6
abort() /sys/src/ape/lib/ap/stdio/abort.c:4
	called from FatalError+#4e
		/sys/src/X/mit/server/dix/misc.c:421
FatalError(s9=#e02, s8=#4901d200, s7=#2, s6=#72701, s5=#1,
	s4=#7270d, s3=#6, s2=#12, s1=#ff37f1c, s0=#6, f=#7270f)
	/sys/src/X/mit/server/dix/misc.c:416
	called from gnotscreeninit+#4ce
		/sys/src/X/mit/server/ddx/gnot/gnot.c:792
gnotscreeninit(snum=#0, sc=#80db0)
	/sys/src/X/mit/server/ddx/gnot/gnot.c:766
	called from AddScreen+#16e
		/n/bootes/sys/src/X/mit/server/dix/main.c:610
AddScreen(pfnInit=0x0000129c,argc=0x00000001,argv=0x7fffffe4)
	/sys/src/X/mit/server/dix/main.c:530
	called from InitOutput+0x80
		/sys/src/X/mit/server/ddx/brazil/brddx.c:522
InitOutput(argc=0x00000001,argv=0x7fffffe4)
	/sys/src/X/mit/server/ddx/brazil/brddx.c:511
	called from main+0x294
		/sys/src/X/mit/server/dix/main.c:225
main(argc=0x00000001,argv=0x7fffffe4)
	/sys/src/X/mit/server/dix/main.c:136
	called from _main+0x24
		/sys/src/ape/lib/ap/mips/main9.s:8
.P2
The function
.CW lstk()
is similar but
also reports the values of local variables.
Note that the traceback includes full file names; this is a boon to debugging,
although it makes the output much noisier.
.PP
To use
.CW acid
well you will need to learn its input language; see the
``Acid Manual'',
by Phil Winterbottom,
for details. For simple debugging, however, the information in the manual page is
sufficient. In particular, it describes the most useful functions
for examining a process.
.PP
The compiler does not place
information describing the types of variables in the executable,
but a compile-time flag provides crude support for symbolic debugging.
The
.CW -a
flag to the compiler suppresses code generation
and instead emits source text in the
.CW acid
language to format and display data structure types defined in the program.
The easiest way to use this feature is to put a rule in the
.CW mkfile :
.P1
syms: main.$O
	$CC -a main.c > syms
.P2
Then from within
.CW acid ,
.P1
acid: include("sourcedirectory/syms")
.P2
to read in the relevant definitions.
(For multi-file source, you need to be a little fancier;
see
.I 2c (1)).
This text includes, for each defined compound
type, a function with that name that may be called with the address of a structure
of that type to display its contents.
For example, if
.CW rect
is a global variable of type
.CW Rectangle ,
one may execute
.P1
Rectangle(*rect)
.P2
to display it.
The
.CW *
(indirection) operator is necessary because
of the way
.CW acid
works: each global symbol in the program is defined as a variable by
.CW acid ,
with value equal to the
.I address
of the symbol.
.PP
Another common technique is to write by hand special
.CW acid
code to define functions to aid debugging, initialize the debugger, and so on.
Conventionally, this is placed in a file called
.CW acid
in the source directory; it has a line
.P1
include("sourcedirectory/syms");
.P2
to load the compiler-produced symbols. One may edit the compiler output directly but
it is wiser to keep the hand-generated
.CW acid
separate from the machine-generated.
.PP
To make things simple, the default rules in the system
.CW mkfiles
include entries to make
.CW foo.acid
from
.CW foo.c ,
so one may use
.CW mk
to automate the production of
.CW acid
definitions for a given C source file.
.PP
There is much more to say here. See the
.CW acid
manual page, the reference manual, or the paper
``Acid: A Debugger Built From A Language'',
also by Phil Winterbottom.