.HTML "How to Use the Plan 9 C Compiler
.TL
How to Use the Plan 9 C Compiler
.AU
Rob Pike
rob@plan9.bell-labs.com
.SH
Introduction
.PP
The C compiler on Plan 9 is a wholly new program; in fact
it was the first piece of software written for what would
eventually become Plan 9 from Bell Labs.
Programmers familiar with existing C compilers will find
a number of differences in both the language the Plan 9 compiler
accepts and in how the compiler is used.
.PP
The compiler is really a set of compilers, one for each
architecture \(em MIPS, SPARC, Motorola 68020, Intel 386, etc. \(em
that accept a dialect of ANSI C and efficiently produce
fairly good code for the target machine.
There is a packaging of the compiler that accepts strict ANSI C for
a POSIX environment, but this document focuses on the
native Plan 9 environment, that in which all the system source and
almost all the utilities are written.
.SH
Source
.PP
The language accepted by the compilers is the core ANSI C language
with some modest extensions,
a greatly simplified preprocessor,
a smaller library that includes system calls and related facilities,
and a completely different structure for include files.
.PP
Official ANSI C accepts the old (K&R) style of declarations for
functions; the Plan 9 compilers
are more demanding.
Without an explicit run-time flag
.CW -B ) (
whose use is discouraged, the compilers insist
on new-style function declarations, that is, prototypes for
function arguments.
The function declarations in the libraries' include files are
all in the new style so the interfaces are checked at compile time.
For C programmers who have not yet switched to function prototypes
the clumsy syntax may seem repellent but the payoff in stronger typing
is substantial.
Those who wish to import existing software to Plan 9 are urged
to use the opportunity to update their code.
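.PP
As a small illustration of the difference, the sketch below declares and defines a function in the new style.
The function
.CW scale
is a hypothetical example, not part of any Plan 9 library, and the old-style form appears only in a comment for comparison.
The code is plain standard C so that it can also be built on other systems.

```c
/* New style: the prototype states the parameter types. */
double scale(double value, int factor);

/*
 * Old (K&R) style, shown for comparison only:
 *
 *	double scale(value, factor)
 *	double value;
 *	int factor;
 *	{ ... }
 *
 * Nothing in that form lets the compiler check a caller's arguments.
 */

double
scale(double value, int factor)
{
	return value * factor;
}
```

With the prototype in scope, a call such as
.CW "scale(2, 3)"
has its first argument converted to
.CW double
before the call, and a call with the wrong number of arguments is rejected at compile time.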
.PP
The compilers include an integrated preprocessor that accepts the familiar
.CW #include ,
.CW #define
for macros both with and without arguments,
.CW #undef ,
.CW #line ,
.CW #ifdef ,
.CW #ifndef ,
and
.CW #endif .
It
supports neither
.CW #if
nor
.CW ## ,
although it does
honor a few
.CW #pragmas .
The
.CW #if
directive was omitted because it greatly complicates the
preprocessor, is never necessary, and is usually abused.
Conditional compilation in general makes code hard to understand;
the Plan 9 source uses it sparingly.
Also, because the compilers remove dead code, regular
.CW if
statements with constant conditions are more readable equivalents to many
.CW #ifs .
To compile imported code ineluctably fouled by
.CW #if
there is a separate command,
.CW /bin/cpp ,
that implements the complete ANSI C preprocessor specification.
.PP
Include files fall into two groups: machine-dependent and machine-independent.
The machine-independent files occupy the directory
.CW /sys/include ;
the others are placed in a directory appropriate to the machine, such as
.CW /mips/include .
The compiler searches for include files
first in the machine-dependent directory and then
in the machine-independent directory.
At the time of writing there are thirty-one machine-independent include
files and two (per machine) machine-dependent ones:
.CW <ureg.h>
and
.CW <u.h> .
The first describes the layout of registers on the system stack,
for use by the debugger.
The second defines some
architecture-dependent types such as
.CW jmp_buf
for
.CW setjmp
and the
.CW va_arg
and
.CW va_list
macros for handling arguments to variadic functions,
as well as a set of
.CW typedef
abbreviations for
.CW unsigned
.CW short
and so on.
.PP
Here is an excerpt from
.CW /68020/include/u.h :
.P1
#define nil ((void*)0)
typedef unsigned short ushort;
typedef unsigned char uchar;
typedef unsigned long ulong;
typedef unsigned int uint;
typedef signed char schar;
typedef long long vlong;

typedef long jmp_buf[2];
#define JMPBUFSP 0
#define JMPBUFPC 1
#define JMPBUFDPC 0
.P2
Plan 9 programs use
.CW nil
for the name of the zero-valued pointer.
The type
.CW vlong
is the largest integer type available; on most architectures it
is a 64-bit value.
A couple of other types in
.CW <u.h>
are
.CW u32int ,
which is guaranteed to have exactly 32 bits (a possibility on all the supported architectures) and
.CW mpdigit ,
which is used by the multiprecision math package
.CW <mp.h> .
The
.CW #define
constants permit an architecture-independent (but compiler-dependent)
implementation of stack-switching using
.CW setjmp
and
.CW longjmp .
.PP
Every Plan 9 C program begins
.P1
#include <u.h>
.P2
because all the other installed header files use the
.CW typedefs
declared in
.CW <u.h> .
.PP
In strict ANSI C, include files are grouped to collect related functions
in a single file: one for string functions, one for memory functions,
one for I/O, and none for system calls.
Each include file is protected by an
.CW #ifdef
to guarantee its contents are seen by the compiler only once.
Plan 9 takes a different approach. Other than a few include
files that define external formats such as archives, the files in
.CW /sys/include
correspond to
.I libraries.
If a program is using a library, it includes the corresponding header.
The default C library comprises string functions, memory functions, and
so on, largely as in ANSI C, some formatted I/O routines,
plus all the system calls and related functions.
To use these functions, one must
.CW #include
the file
.CW <libc.h> ,
which in turn must follow
.CW <u.h> ,
to define their prototypes for the compiler.
Here is the complete source to the traditional first C program:
.P1
#include <u.h>
#include <libc.h>

void
main(void)
{
	print("hello world\en");
	exits(0);
}
.P2
The
.CW print
routine and its relatives
.CW fprint
and
.CW sprint
resemble the similarly-named functions in Standard I/O but are not
attached to a specific I/O library.
In Plan 9
.CW main
is not integer-valued; it should call
.CW exits ,
which takes a string argument (or null; here ANSI C promotes the 0 to a
.CW char* ).
All these functions are, of course, documented in the Programmer's Manual.
.PP
To use
.CW printf ,
.CW <stdio.h>
must be included to define the function prototype for
.CW printf :
.P1
#include <u.h>
#include <libc.h>
#include <stdio.h>

void
main(int argc, char *argv[])
{
	printf("%s: hello world; argc = %d\en", argv[0], argc);
	exits(0);
}
.P2
In practice, Standard I/O is not used much in Plan 9. I/O libraries are
discussed in a later section of this document.
.PP
There are libraries for handling regular expressions, raster graphics,
windows, and so on, and each has an associated include file.
The manual for each library states which include files are needed.
The files are not protected against multiple inclusion and themselves
contain no nested
.CW #includes .
Instead the
programmer is expected to sort out the requirements
and to
.CW #include
the necessary files once at the top of each source file. In practice this is
trivial: this way of handling include files is so straightforward
that it is rare for a source file to contain more than half a dozen
.CW #includes .
.PP
The compilers do their own register allocation so the
.CW register
keyword is ignored.
For different reasons,
.CW volatile
and
.CW const
are also ignored.
.PP
To make it easier to share code with other systems, Plan 9 has a version
of the compiler,
.CW pcc ,
that provides the standard ANSI C preprocessor, headers, and libraries
with POSIX extensions.
.CW Pcc
is recommended only
when broad external portability is mandated. It compiles slower,
produces slower code (it takes extra work to simulate POSIX on Plan 9),
eliminates those parts of the Plan 9 interface
not related to POSIX, and illustrates the clumsiness of an environment
designed by committee.
.CW Pcc
is described in more detail in
.I
APE\(emThe ANSI/POSIX Environment,
.R
by Howard Trickey.
.SH
Process
.PP
Each CPU architecture supported by Plan 9 is identified by a single,
arbitrary, alphanumeric character:
.CW k
for SPARC,
.CW q
for Motorola Power PC 630 and 640,
.CW v
for MIPS,
.CW 0
for little-endian MIPS,
.CW 1
for Motorola 68000,
.CW 2
for Motorola 68020 and 68040,
.CW 5
for Acorn ARM 7500,
.CW 6
for AMD 64,
.CW 7
for DEC Alpha,
.CW 8
for Intel 386, and
.CW 9
for AMD 29000.
The character labels the support tools and files for that architecture.
For instance, for the 68020 the compiler is
.CW 2c ,
the assembler is
.CW 2a ,
the link editor/loader is
.CW 2l ,
the object files are suffixed
.CW \&.2 ,
and the default name for an executable file is
.CW 2.out .
Before we can use the compiler we therefore need to know which
machine we are compiling for.
The next section explains how this decision is made; for the moment
assume we are building 68020 binaries and make the mental substitution for
.CW 2
appropriate to the machine you are actually using.
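.PP
The naming rule just described is mechanical enough to write down as code.
The sketch below is only an illustration in standard C: the type
.CW Tools
and the function
.CW toolnames
are hypothetical and do not exist in Plan 9.
Given an architecture character and a source file stem, it forms the names listed above.

```c
#include <stdio.h>

/* Hypothetical record of the names derived from one architecture character. */
typedef struct Tools Tools;
struct Tools
{
	char cc[8];	/* compiler, such as 2c */
	char as[8];	/* assembler, such as 2a */
	char ld[8];	/* loader, such as 2l */
	char obj[64];	/* object file for the stem, such as file.2 */
	char out[8];	/* default executable, such as 2.out */
};

/* Fill in t for architecture character arch and source stem (no suffix). */
void
toolnames(Tools *t, char arch, const char *stem)
{
	snprintf(t->cc, sizeof t->cc, "%cc", arch);
	snprintf(t->as, sizeof t->as, "%ca", arch);
	snprintf(t->ld, sizeof t->ld, "%cl", arch);
	snprintf(t->obj, sizeof t->obj, "%s.%c", stem, arch);
	snprintf(t->out, sizeof t->out, "%c.out", arch);
}
```

Calling
.CW toolnames
with the character
.CW 2
and the stem
.CW file
gives the 68020 names used in the rest of this section; with
.CW v
it gives the MIPS ones.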
.PP
To convert source to an executable binary is a two-step process.
First run the compiler,
.CW 2c ,
on the source, say
.CW file.c ,
to generate an object file
.CW file.2 .
Then run the loader,
.CW 2l ,
to generate an executable
.CW 2.out
that may be run (on a 680X0 machine):
.P1
2c file.c
2l file.2
2.out
.P2
The loader automatically links with whatever libraries the program
needs, usually including the standard C library as defined by
.CW <libc.h> .
Of course the compiler and loader have lots of options, both familiar and new;
see the manual for details.
The compiler does not generate an executable automatically;
the output of the compiler must be given to the loader.
Since most compilation is done under the control of
.CW mk
(see below), this is rarely an inconvenience.
.PP
The distribution of work between the compiler and loader is unusual.
The compiler integrates preprocessing, parsing, register allocation,
code generation and some assembly.
Combining these tasks in a single program is part of the reason for
the compiler's efficiency.
The loader does instruction selection, branch folding,
instruction scheduling,
and writes the final executable.
There is no separate C preprocessor and no assembler in the usual pipeline.
Instead the intermediate object file
(here a
.CW \&.2
file) is a type of binary assembly language.
The instructions in the intermediate format are not exactly those in
the machine. For example, on the 68020 the object file may specify
a MOVE instruction but the loader will decide just which variant of
the MOVE instruction \(em MOVE immediate, MOVE quick, MOVE address,
etc. \(em is most efficient.
.PP
The assembler,
.CW 2a ,
is just a translator between the textual and binary
representations of the object file format.
It is not an assembler in the traditional sense.
It has limited
macro capabilities (the same as the integral C preprocessor in the compiler),
clumsy syntax, and minimal error checking. For instance, the assembler
will accept an instruction (such as memory-to-memory MOVE on the MIPS) that the
machine does not actually support; only when the output of the assembler
is passed to the loader will the error be discovered.
The assembler is intended only for writing things that need access to instructions
invisible from C,
such as the machine-dependent
part of an operating system;
very little code in Plan 9 is in assembly language.
.PP
The compilers take an option
.CW -S
that causes them to print on their standard output the generated code
in a format acceptable as input to the assemblers.
This is of course merely a formatting of the
data in the object file; therefore the assembler is just
an
ASCII-to-binary converter for this format.
Other than the specific instructions, the input to the assemblers
is largely architecture-independent; see
``A Manual for the Plan 9 Assembler'',
by Rob Pike,
for more information.
.PP
The loader is an integral part of the compilation process.
Each library header file contains a
.CW #pragma
that tells the loader the name of the associated archive; it is
not necessary to tell the loader which libraries a program uses.
The C run-time startup is found, by default, in the C library.
The loader starts with an undefined
symbol,
.CW _main ,
that is resolved by pulling in the run-time startup code from the library.
(The loader undefines
.CW _mainp
when profiling is enabled, to force loading of the profiling start-up
instead.)
.PP
Unlike its counterpart on other systems, the Plan 9 loader rearranges
data to optimize access. This means the order of variables in the
loaded program is unrelated to its order in the source.
Most programs don't care, but some assume that, for example, the
variables declared by
.P1
int a;
int b;
.P2
will appear at adjacent addresses in memory. On Plan 9, they won't.
.SH
Heterogeneity
.PP
When the system starts or a user logs in, the environment is configured
so the appropriate binaries are available in
.CW /bin .
The configuration process is controlled by an environment variable,
.CW $cputype ,
with a value such as
.CW mips ,
.CW 68020 ,
.CW 386 ,
or
.CW sparc .
For each architecture there is a directory in the root,
with the appropriate name,
that holds the binary and library files for that architecture.
Thus
.CW /mips/lib
contains the object code libraries for MIPS programs,
.CW /mips/include
holds MIPS-specific include files, and
.CW /mips/bin
has the MIPS binaries.
These binaries are attached to
.CW /bin
at boot time by binding
.CW /$cputype/bin
to
.CW /bin ,
so
.CW /bin
always contains the correct files.
.PP
The MIPS compiler,
.CW vc ,
by definition
produces object files for the MIPS architecture,
regardless of the architecture of the machine on which the compiler is running.
There is a version of
.CW vc
compiled for each architecture:
.CW /mips/bin/vc ,
.CW /68020/bin/vc ,
.CW /sparc/bin/vc ,
and so on,
each capable of producing MIPS object files regardless of the native
instruction set.
If one is running on a SPARC,
.CW /sparc/bin/vc
will compile programs for the MIPS;
if one is running on machine
.CW $cputype ,
.CW /$cputype/bin/vc
will compile programs for the MIPS.
.PP
Because of the bindings that assemble
.CW /bin ,
the shell always looks for a command, say
.CW date ,
in
.CW /bin
and automatically finds the file
.CW /$cputype/bin/date .
Therefore the MIPS compiler is known as just
.CW vc ;
the shell will invoke
.CW /bin/vc
and that is guaranteed to be the version of the MIPS compiler
appropriate for the machine running the command.
Regardless of the architecture of the compiling machine,
.CW /bin/vc
is
.I always
the MIPS compiler.
.PP
Also, the output of
.CW vc
and
.CW vl
is completely independent of the machine type on which they are executed:
.CW \&.v
files compiled (with
.CW vc )
on a SPARC may be linked (with
.CW vl )
on a 386.
(The resulting
.CW v.out
will run, of course, only on a MIPS.)
Similarly, the MIPS libraries in
.CW /mips/lib
are suitable for loading with
.CW vl
on any machine; there is only one set of MIPS libraries, not one
set for each architecture that supports the MIPS compiler.
.SH
Heterogeneity and \f(CWmk\fP
.PP
Most software on Plan 9 is compiled under the control of
.CW mk ,
a descendant of
.CW make
that is documented in the Programmer's Manual.
A convention used throughout the
.CW mkfiles
makes it easy to compile the source into binary suitable for any architecture.
.PP
The variable
.CW $cputype
is advisory: it reports the architecture of the current environment, and should
not be modified. A second variable,
.CW $objtype ,
is used to set which architecture is being
.I compiled
for.
The value of
.CW $objtype
can be used by a
.CW mkfile
to configure the compilation environment.
.PP
In each machine's root directory there is a short
.CW mkfile
that defines a set of macros for the compiler, loader, etc.
Here is
.CW /mips/mkfile :
.P1
</sys/src/mkfile.proto

CC=vc
LD=vl
O=v
AS=va
.P2
The line
.P1
</sys/src/mkfile.proto
.P2
causes
.CW mk
to include the file
.CW /sys/src/mkfile.proto ,
which contains general definitions:
.P1
#
# common mkfile parameters shared by all architectures
#

OS=v486xq7
CPUS=mips 386 power alpha
CFLAGS=-FVw
LEX=lex
YACC=yacc
MK=/bin/mk
.P2
.CW CC
is obviously the compiler,
.CW AS
the assembler, and
.CW LD
the loader.
.CW O
is the suffix for the object files and
.CW CPUS
and
.CW OS
are used in special rules described below.
.PP
Here is a
.CW mkfile
to build the installed source for
.CW sam :
.P1
</$objtype/mkfile
OBJ=sam.$O address.$O buffer.$O cmd.$O disc.$O error.$O \e
	file.$O io.$O list.$O mesg.$O moveto.$O multi.$O \e
	plan9.$O rasp.$O regexp.$O string.$O sys.$O xec.$O

$O.out: $OBJ
	$LD $OBJ

install: $O.out
	cp $O.out /$objtype/bin/sam

installall:
	for(objtype in $CPUS) mk install

%.$O: %.c
	$CC $CFLAGS $stem.c

$OBJ: sam.h errors.h mesg.h
address.$O cmd.$O parse.$O xec.$O unix.$O: parse.h

clean:V:
	rm -f [$OS].out *.[$OS] y.tab.?
.P2
(The actual
.CW mkfile
imports most of its rules from other secondary files, but
this example works and is not misleading.)
The first line causes
.CW mk
to include the contents of
.CW /$objtype/mkfile
in the current
.CW mkfile .
If
.CW $objtype
is
.CW mips ,
this inserts the MIPS macro definitions into the
.CW mkfile .
In this case the rule for
.CW $O.out
uses the MIPS tools to build
.CW v.out .
The
.CW %.$O
rule in the file uses
.CW mk 's
pattern matching facilities to convert the source files to the object
files through the compiler.
(The text of the rules is passed directly to the shell,
.CW rc ,
without further translation.
See the
.CW mk
manual if any of this is unfamiliar.)
Because the default rule builds
.CW $O.out
rather than
.CW sam ,
it is possible to maintain binaries for multiple machines in the
same source directory without conflict.
This is also, of course, why the output files from the various
compilers and loaders
have distinct names.
.PP
The rest of the
.CW mkfile
should be easy to follow; notice how the rules for
.CW clean
and
.CW installall
(that is, install versions for all architectures) use other macros
defined in
.CW /$objtype/mkfile .
In Plan 9,
.CW mkfiles
for commands conventionally contain rules to
.CW install
(compile and install the version for
.CW $objtype ),
.CW installall
(compile and install for all
.CW $objtypes ),
and
.CW clean
(remove all object files, binaries, etc.).
.PP
The
.CW mkfile
is easy to use. To build a MIPS binary,
.CW v.out :
.P1
% objtype=mips
% mk
.P2
To build and install a MIPS binary:
.P1
% objtype=mips
% mk install
.P2
To build and install all versions:
.P1
% mk installall
.P2
These conventions make cross-compilation as easy to manage
as traditional native compilation.
Plan 9 programs compile and run without change on machines from
large multiprocessors to laptops. For more information about this process, see
``Plan 9 Mkfiles'',
by Bob Flandrena.
.SH
Portability
.PP
Within Plan 9, it is painless to write portable programs, programs whose
source is independent of the machine on which they execute.
The operating system is fixed and the compiler, headers and libraries
are constant so most of the stumbling blocks to portability are removed.
Attention to a few details can avoid those that remain.
.PP
Plan 9 is a heterogeneous environment, so programs must
.I expect
that external files will be written by programs on machines of different
architectures.
The compilers, for instance, must handle without confusion
object files written by other machines.
The traditional approach to this problem is to pepper the source with
.CW #ifdefs
to turn byte-swapping on and off.
Plan 9 takes a different approach: of the handful of machine-dependent
.CW #ifdefs
in all the source, almost all are deep in the libraries.
Instead programs read and write files in a defined format,
either (for low volume applications) as formatted text, or
(for high volume applications) as binary in a known byte order.
If the external data were written with the most significant
byte first, the following code reads a 4-byte integer correctly
regardless of the architecture of the executing machine (assuming
an unsigned long holds 4 bytes):
.P1
ulong
getlong(void)
{
	ulong l;

	l = (getchar()&0xFF)<<24;
	l |= (getchar()&0xFF)<<16;
	l |= (getchar()&0xFF)<<8;
	l |= (getchar()&0xFF)<<0;
	return l;
}
.P2
Note that this code does not `swap' the bytes; instead it just reads
them in the correct order.
Variations of this code will handle any binary format
and also avoid problems
involving how structures are padded, how words are aligned,
and other impediments to portability.
Be aware, though, that extra care is needed to handle floating point data.
.PP
Efficiency hounds will argue that this method is unnecessarily slow and clumsy
when the executing machine has the same byte order (and padding and alignment)
as the data.
The CPU cost of I/O processing
is rarely the bottleneck for an application, however,
and the gain in simplicity of porting and maintaining the code greatly outweighs
the minor speed loss from handling data in this general way.
This method is how the Plan 9 compilers, the window system, and even the file
servers transmit data between programs.
.PP
To port programs beyond Plan 9, where the system interface is more variable,
it is probably necessary to use
.CW pcc
and hope that the target machine supports ANSI C and POSIX.
.SH
I/O
.PP
The default C library, defined by the include file
.CW <libc.h> ,
contains no buffered I/O package.
It does have several entry points for printing formatted text:
.CW print
outputs text to the standard output,
.CW fprint
outputs text to a specified integer file descriptor, and
.CW sprint
places text in a character array.
To access library routines for buffered I/O, a program must
explicitly include the header file associated with an appropriate library.
.PP
The recommended I/O library, used by most Plan 9 utilities, is
.CW bio
(buffered I/O), defined by
.CW <bio.h> .
There also exists an implementation of ANSI Standard I/O,
.CW stdio .
.PP
.CW Bio
is small and efficient, particularly for buffer-at-a-time or
line-at-a-time I/O.
Even for character-at-a-time I/O, however, it is significantly faster than
the Standard I/O library,
.CW stdio .
Its interface is compact and regular, although it lacks a few conveniences.
The most noticeable is that one must explicitly define buffers for standard
input and output;
.CW bio
does not predefine them.
Here is a program to copy input to output a byte
at a time using
.CW bio :
.P1
#include <u.h>
#include <libc.h>
#include <bio.h>

Biobuf bin;
Biobuf bout;

void
main(void)
{
	int c;

	Binit(&bin, 0, OREAD);
	Binit(&bout, 1, OWRITE);

	while((c=Bgetc(&bin)) != Beof)
		Bputc(&bout, c);
	exits(0);
}
.P2
For peak performance, we could replace
.CW Bgetc
and
.CW Bputc
by their equivalent in-line macros
.CW BGETC
and
.CW BPUTC
but
the performance gain would be modest.
For more information on
.CW bio ,
see the Programmer's Manual.
.PP
Perhaps the most dramatic difference in the I/O interface of Plan 9 from other
systems' is that text is not ASCII.
The format for
text in Plan 9 is a byte-stream encoding of 16-bit characters.
The character set is based on the Unicode Standard and is backward compatible with
ASCII:
characters with value 0 through 127 are the same in both sets.
The 16-bit characters, called
.I runes
in Plan 9, are encoded using a representation called
UTF,
an encoding that is becoming accepted as a standard.
(ISO calls it UTF-8;
throughout Plan 9 it's just called
UTF.)
UTF
defines multibyte sequences to
represent character values from 0 to 65535.
In
UTF,
character values up to 127 decimal, 7F hexadecimal, represent themselves,
so straight
ASCII
files are also valid
UTF.
Also,
UTF
guarantees that bytes with values 0 to 127 (NUL to DEL, inclusive)
will appear only when they represent themselves, so programs that read bytes
looking for plain ASCII characters will continue to work.
Any program that expects a one-to-one correspondence between bytes and
characters will, however, need to be modified.
An example is parsing file names.
File names, like all text, are in
UTF,
so it is incorrect to search for a character in a string by
.CW strchr(filename,
.CW c)
because the character might have a multi-byte encoding.
The correct method is to call
.CW utfrune(filename,
.CW c) ,
defined in
.I rune (2),
which interprets the file name as a sequence of encoded characters
rather than bytes.
In fact, even when you know the character is a single byte
that can represent only itself,
it is safer to use
.CW utfrune
because that assumes nothing about the character set
and its representation.
.PP
The library defines several symbols relevant to the representation of characters.
Any byte with unsigned value less than
.CW Runesync
will not appear in any multi-byte encoding of a character.
.CW Utfrune
compares the character being searched against
.CW Runesync
to see if it is sufficient to call
.CW strchr
or if the byte stream must be interpreted.
Any character with value less than
.CW Runeself
is represented by a single byte with the same value.
Finally, when errors are encountered converting
to runes from a byte stream, the library returns the rune value
.CW Runeerror
and advances a single byte. This permits programs to find runes
embedded in binary data.
.PP
.CW Bio
includes routines
.CW Bgetrune
and
.CW Bputrune
to transform the external byte stream
UTF
format to and from
internal 16-bit runes.
Also, the
.CW %s
format to
.CW print
accepts
UTF;
.CW %c
prints a character after narrowing it to 8 bits.
The
.CW %S
format prints a null-terminated sequence of runes;
.CW %C
prints a character after narrowing it to 16 bits.
For more information, see the Programmer's Manual, in particular
.I utf (6)
and
.I rune (2),
and the paper,
``Hello world, or
Καλημέρα κόσμε, or\ 
\f(Jpこんにちは 世界\f1'',
by Rob Pike and
Ken Thompson;
there is not room for the full story here.
.PP
These issues affect the compiler in several ways.
First, the C source is in
UTF.
ANSI says C variables are formed from
ASCII
alphanumerics, but comments and literal strings may contain any characters
encoded in the native encoding, here
UTF.
The declaration
.P1
char *cp = "abcÿ";
.P2
initializes the variable
.CW cp
to point to an array of bytes holding the
UTF
representation of the characters
.CW abcÿ .
The type
.CW Rune
is defined in
.CW <u.h>
to be
.CW ushort ,
which is also the `wide character' type in the compiler.
Therefore the declaration
.P1
Rune *rp = L"abcÿ";
.P2
initializes the variable
.CW rp
to point to an array of unsigned short integers holding the 16-bit
values of the characters
.CW abcÿ .
Note that in both these declarations the characters in the source
that represent
.CW "abcÿ"
are the same; what changes is how those characters are represented
in memory in the program.
The following two lines:
.P1
print("%s\en", "abcÿ");
print("%S\en", L"abcÿ");
.P2
produce the same
UTF
string on their output, the first by copying the bytes, the second
by converting from runes to bytes.
.PP
In C, character constants are integers but narrowed through the
.CW char
type.
The Unicode character
.CW ÿ
has value 255, so if the
.CW char
type is signed,
the constant
.CW 'ÿ'
has value \-1 (which is equal to EOF).
On the other hand,
.CW L'ÿ'
narrows through the wide character type,
.CW ushort ,
and therefore has value 255.
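.PP
To make the relation between 16-bit character values and the bytes of the encoding concrete, here is a minimal sketch of the encoding step in standard C.
It is not the library's conversion routine: the names
.CW Rune16
and
.CW encodeutf
are hypothetical, and the code only illustrates the UTF-8 bit layout for values from 0 to 65535.

```c
typedef unsigned short Rune16;	/* hypothetical stand-in for a 16-bit rune */

/*
 * Encode one 16-bit character value into buf, which must hold
 * at least 3 bytes, and return the number of bytes written.
 */
int
encodeutf(unsigned char *buf, Rune16 r)
{
	if(r < 0x80){		/* 0xxxxxxx: ASCII represents itself */
		buf[0] = r;
		return 1;
	}
	if(r < 0x800){		/* 110xxxxx 10xxxxxx */
		buf[0] = 0xC0 | (r >> 6);
		buf[1] = 0x80 | (r & 0x3F);
		return 2;
	}
	/* 1110xxxx 10xxxxxx 10xxxxxx */
	buf[0] = 0xE0 | (r >> 12);
	buf[1] = 0x80 | ((r >> 6) & 0x3F);
	buf[2] = 0x80 | (r & 0x3F);
	return 3;
}
```

Under this layout the character
.CW ÿ ,
with value 255, becomes the two bytes C3 BF hexadecimal, so the string
.CW "abcÿ"
in the declarations above occupies five bytes plus the terminating NUL, while the wide string holds four 16-bit values plus its terminator.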
.PP
Finally, although it's not ANSI C, the Plan 9 C compilers
assume any character with value above
.CW Runeself
is an alphanumeric,
so α is a legal, if non-portable, variable name.
.SH
Arguments
.PP
Some macros are defined
in
.CW <libc.h>
for parsing the arguments to
.CW main() .
They are described in
.I ARG (2)
but are fairly self-explanatory.
There are four macros:
.CW ARGBEGIN
and
.CW ARGEND
are used to bracket a hidden
.CW switch
statement within which
.CW ARGC
returns the current option character (rune) being processed and
.CW ARGF
returns the argument to the option, as in the loader option
.CW -o
.CW file .
Here, for example, is the code at the beginning of
.CW main()
in
.CW ramfs.c
(see
.I ramfs (1))
that cracks its arguments:
.P1
void
main(int argc, char *argv[])
{
	char *defmnt;
	int p[2];
	int mfd[2];
	int stdio = 0;

	defmnt = "/tmp";
	ARGBEGIN{
	case 'i':
		defmnt = 0;
		stdio = 1;
		mfd[0] = 0;
		mfd[1] = 1;
		break;
	case 's':
		defmnt = 0;
		break;
	case 'm':
		defmnt = ARGF();
		break;
	default:
		usage();
	}ARGEND
.P2
.SH
Extensions
.PP
The compiler has several extensions to ANSI C, all of which are used
extensively in the system source.
First,
.I structure
.I displays
permit
.CW struct
expressions to be formed dynamically.
Given these declarations:
.P1
typedef struct Point Point;
typedef struct Rectangle Rectangle;

struct Point
{
	int x, y;
};

struct Rectangle
{
	Point min, max;
};

Point p, q, add(Point, Point);
Rectangle r;
int x, y;
.P2
this assignment may appear anywhere an assignment is legal:
.P1
r = (Rectangle){add(p, q), (Point){x, y+3}};
.P2
The syntax is the same as for initializing a structure but with
a leading cast.
.PP
If an
.I anonymous
.I structure
or
.I union
is declared within another structure or union, the members of the internal
structure or union are addressable without prefix in the outer structure.
This feature eliminates the clumsy naming of nested structures and,
particularly, unions.
For example, after these declarations,
.P1
struct Lock
{
	int locked;
};

struct Node
{
	int type;
	union{
		double dval;
		float fval;
		long lval;
	};		/* anonymous union */
	struct Lock;	/* anonymous structure */
} *node;

void lock(struct Lock*);
.P2
one may refer to
.CW node->type ,
.CW node->dval ,
.CW node->fval ,
.CW node->lval ,
and
.CW node->locked .
Moreover, the address of a
.CW struct
.CW Node
may be used without a cast anywhere that the address of a
.CW struct
.CW Lock
is used, such as in argument lists.
The compiler automatically promotes the type and adjusts the address.
Thus one may invoke
.CW lock(node) .
.PP
Anonymous structures and unions may be accessed by type name
if (and only if) they are declared using a
.CW typedef
name.
For example, using the above declaration for
.CW Point ,
one may declare
.P1
struct
{
	int type;
	Point;
} p;
.P2
and refer to
.CW p.Point .
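.PP
Readers working on other systems may find it useful that close relatives of two of these extensions later entered ISO C: C99 compound literals correspond to structure displays, and C11 allows untagged anonymous structures and unions.
The sketch below uses only those standard forms, so it should build with a C11 compiler.
The function
.CW mkrect
and the body of
.CW add
are hypothetical additions for illustration.
The tagged form
.CW "struct Lock;"
inside another structure, and the automatic pointer promotion that goes with it, remain Plan 9 extensions and are left out.

```c
typedef struct Point Point;
typedef struct Rectangle Rectangle;
typedef struct Node Node;

struct Point
{
	int x, y;
};

struct Rectangle
{
	Point min, max;
};

struct Node
{
	int type;
	union{
		double dval;
		long lval;
	};	/* anonymous union: members are named directly */
};

/* Hypothetical body for add: component-wise sum. */
Point
add(Point a, Point b)
{
	return (Point){a.x + b.x, a.y + b.y};
}

/* Form a Rectangle from expressions, as in the display above. */
Rectangle
mkrect(Point p, Point q, int x, int y)
{
	Rectangle r;

	r = (Rectangle){add(p, q), (Point){x, y+3}};
	return r;
}
```

Given a
.CW Node
.CW n ,
the union members are reached as
.CW n.dval
and
.CW n.lval ,
with no intermediate member name.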
.PP
In the initialization of arrays, a number in square brackets before an
element sets the index for the initialization. For example, to initialize
some elements in
a table of function pointers indexed by
ASCII
character,
.P1
void percent(void), slash(void);

void (*func[128])(void) =
{
	['%'] percent,
	['/'] slash,
};
.P2
.LP
A similar syntax allows one to initialize structure elements:
.P1
Point p =
{
	.y 100,
	.x 200
};
.P2
These initialization syntaxes were later added to ANSI C, with the addition of an
equals sign between the index or tag and the value.
The Plan 9 compiler accepts either form.
.PP
Finally, the declaration
.P1
extern register reg;
.P2
.I this "" (
appearance of the register keyword is not ignored)
allocates a global register to hold the variable
.CW reg .
External registers must be used carefully: they need to be declared in
.I all
source files and libraries in the program to guarantee the register
is not allocated temporarily for other purposes.
Especially on machines with few registers, such as the i386,
it is easy to link accidentally with code that has already usurped
the global registers and there is no diagnostic when this happens.
Used wisely, though, external registers are powerful.
The Plan 9 operating system uses them to access per-process and
per-machine data structures on a multiprocessor. The storage class they provide
is hard to create in other ways.
.SH
The compile-time environment
.PP
The code generated by the compilers is `optimized' by default:
variables are placed in registers and peephole optimizations are
performed.
The compiler flag
.CW -N
disables these optimizations.
Registerization is done locally rather than throughout a function:
whether a variable occupies a register or
the memory location identified in the symbol
table depends on the activity of the variable and may change
throughout the life of the variable.
The
.CW -N
flag is rarely needed;
its main use is to simplify debugging.
There is no information in the symbol table to identify the
registerization of a variable, so
.CW -N
guarantees the variable is always where the symbol table says it is.
.PP
Another flag,
.CW -w ,
turns
.I on
warnings about portability and problems detected in flow analysis.
Most code in Plan 9 is compiled with warnings enabled;
these warnings plus the type checking offered by function prototypes
provide most of the support of the Unix tool
.CW lint
more accurately and with less chatter.
Two of the warnings,
`used and not set' and `set and not used', are almost always accurate but
may be triggered spuriously by code with invisible control flow,
such as in routines that call
.CW longjmp .
The compiler statements
.P1
SET(v1);
USED(v2);
.P2
decorate the flow graph to silence the compiler.
Either statement accepts a comma-separated list of variables.
Use them carefully: they may silence real errors.
For the common case of unused parameters to a function,
leaving the name off the declaration silences the warnings.
That is, listing the type of a parameter but giving it no
associated variable name does the trick.
.SH
Debugging
.PP
There are two debuggers available on Plan 9.
The first, and older, is
.CW db ,
a revision of Unix
.CW adb .
The other,
.CW acid ,
is a source-level debugger whose commands are statements in
a true programming language.
.CW Acid
is the preferred debugger, but since it
borrows some elements of
.CW db ,
notably the formats for displaying values, it is worth knowing a little bit about
.CW db .
.PP
Both debuggers support multiple architectures in a single program; that is,
the programs are
.CW db
and
.CW acid ,
not for example
.CW vdb
and
.CW vacid .
They also support cross-architecture debugging comfortably:
one may debug a 68020 binary on a MIPS.
.PP
Imagine a program has crashed mysteriously:
.P1
% X11/X
Fatal server bug!
failed to create default stipple
X 106: suicide: sys: trap: fault read addr=0x0 pc=0x00105fb8
%
.P2
When a process dies on Plan 9 it hangs in the `broken' state
for debugging.
Attach a debugger to the process by naming its process id:
.P1
% acid 106
/proc/106/text:mips plan 9 executable

/sys/lib/acid/port
/sys/lib/acid/mips
acid:
.P2
The
.CW acid
function
.CW stk()
reports the stack traceback:
.P1
acid: stk()
At pc:0x105fb8:abort+0x24 /sys/src/ape/lib/ap/stdio/abort.c:6
abort() /sys/src/ape/lib/ap/stdio/abort.c:4
	called from FatalError+#4e
		/sys/src/X/mit/server/dix/misc.c:421
FatalError(s9=#e02, s8=#4901d200, s7=#2, s6=#72701, s5=#1,
	s4=#7270d, s3=#6, s2=#12, s1=#ff37f1c, s0=#6, f=#7270f)
	/sys/src/X/mit/server/dix/misc.c:416
	called from gnotscreeninit+#4ce
		/sys/src/X/mit/server/ddx/gnot/gnot.c:792
gnotscreeninit(snum=#0, sc=#80db0)
	/sys/src/X/mit/server/ddx/gnot/gnot.c:766
	called from AddScreen+#16e
		/n/bootes/sys/src/X/mit/server/dix/main.c:610
AddScreen(pfnInit=0x0000129c,argc=0x00000001,argv=0x7fffffe4)
	/sys/src/X/mit/server/dix/main.c:530
	called from InitOutput+0x80
		/sys/src/X/mit/server/ddx/brazil/brddx.c:522
InitOutput(argc=0x00000001,argv=0x7fffffe4)
	/sys/src/X/mit/server/ddx/brazil/brddx.c:511
	called from main+0x294
		/sys/src/X/mit/server/dix/main.c:225
main(argc=0x00000001,argv=0x7fffffe4)
	/sys/src/X/mit/server/dix/main.c:136
	called from _main+0x24
		/sys/src/ape/lib/ap/mips/main9.s:8
.P2
The function
.CW lstk()
is similar but
also reports the values of local variables.
Note that the traceback includes full file names; this is a boon to debugging,
although it makes the output much noisier.
.PP
To use
.CW acid
well you will need to learn its input language; see the
``Acid Manual'',
by Phil Winterbottom,
for details. For simple debugging, however, the information in the manual page is
sufficient. In particular, it describes the most useful functions
for examining a process.
.PP
The compiler does not place
information describing the types of variables in the executable,
but a compile-time flag provides crude support for symbolic debugging.
The
.CW -a
flag to the compiler suppresses code generation
and instead emits source text in the
.CW acid
language to format and display data structure types defined in the program.
The easiest way to use this feature is to put a rule in the
.CW mkfile :
.P1
syms: main.$O
	$CC -a main.c > syms
.P2
Then from within
.CW acid ,
.P1
acid: include("sourcedirectory/syms")
.P2
to read in the relevant definitions.
(For multi-file source, you need to be a little fancier;
see
.I 2c (1)).
This text includes, for each defined compound
type, a function with that name that may be called with the address of a structure
of that type to display its contents.
For example, if
.CW rect
is a global variable of type
.CW Rectangle ,
one may execute
.P1
Rectangle(*rect)
.P2
to display it.
The
.CW *
(indirection) operator is necessary because
of the way
.CW acid
works: each global symbol in the program is defined as a variable by
.CW acid ,
with value equal to the
.I address
of the symbol.
.PP
Another common technique is to write by hand special
.CW acid
code to define functions to aid debugging, initialize the debugger, and so on.
Conventionally, this is placed in a file called
.CW acid
in the source directory; it has a line
.P1
include("sourcedirectory/syms");
.P2
to load the compiler-produced symbols. One may edit the compiler output directly but
it is wiser to keep the hand-generated
.CW acid
separate from the machine-generated.
.PP
To make things simple, the default rules in the system
.CW mkfiles
include entries to make
.CW foo.acid
from
.CW foo.c ,
so one may use
.CW mk
to automate the production of
.CW acid
definitions for a given C source file.
.PP
There is much more to say here. See the
.CW acid
manual page, the reference manual, or the paper
``Acid: A Debugger Built From A Language'',
also by Phil Winterbottom.