.HTML "How to Use the Plan 9 C Compiler
.TL
How to Use the Plan 9 C Compiler
.AU
Rob Pike
rob@plan9.bell-labs.com
.SH
Introduction
.PP
The C compiler on Plan 9 is a wholly new program; in fact
it was the first piece of software written for what would
eventually become Plan 9 from Bell Labs.
Programmers familiar with existing C compilers will find
a number of differences in both the language the Plan 9 compiler
accepts and in how the compiler is used.
.PP
The compiler is really a set of compilers, one for each
architecture \(em MIPS, SPARC, Motorola 68020, Intel 386, etc. \(em
that accept a dialect of ANSI C and efficiently produce
fairly good code for the target machine.
There is a packaging of the compiler that accepts strict ANSI C for
a POSIX environment, but this document focuses on the
native Plan 9 environment, that in which all the system source and
almost all the utilities are written.
.SH
Source
.PP
The language accepted by the compilers is the core 1989 ANSI C language
with some modest extensions,
a greatly simplified preprocessor,
a smaller library that includes system calls and related facilities,
and a completely different structure for include files.
.PP
Official ANSI C accepts the old (K&R) style of declarations for
functions; the Plan 9 compilers
are more demanding.
Without an explicit run-time flag
.CW -B ) (
whose use is discouraged, the compilers insist
on new-style function declarations, that is, prototypes for
function arguments.
The function declarations in the libraries' include files are
all in the new style so the interfaces are checked at compile time.
For C programmers who have not yet switched to function prototypes
the clumsy syntax may seem repellent but the payoff in stronger typing
is substantial.
Those who wish to import existing software to Plan 9 are urged
to use the opportunity to update their code.
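.PP
As a small illustration of what the prototype buys, here is a sketch in
portable ANSI C rather than Plan 9 C; the function names
scale and demo are hypothetical and chosen only for this example.
With the prototype in scope the compiler knows the parameter types,
so the integer argument in the call is converted to
double; an old-style declaration gives the compiler nothing to check
or convert against.

```c
/* New-style declaration: the prototype lists the parameter types. */
double scale(double x, int factor);

/*
 * New-style definition.  The old (K&R) form would be written
 *	double scale(x, factor) double x; int factor; { ... }
 * and would leave calls to the function unchecked.
 */
double
scale(double x, int factor)
{
	return x * factor;
}

/* Because the prototype is visible, the int 3 is converted to double. */
double
demo(void)
{
	return scale(3, 2);
}
```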
.PP
The compilers include an integrated preprocessor that accepts the familiar
.CW #include ,
.CW #define
for macros both with and without arguments,
.CW #undef ,
.CW #line ,
.CW #ifdef ,
.CW #ifndef ,
and
.CW #endif .
It
supports neither
.CW #if
nor
.CW ## ,
although it does
honor a few
.CW #pragmas .
The
.CW #if
directive was omitted because it greatly complicates the
preprocessor, is never necessary, and is usually abused.
Conditional compilation in general makes code hard to understand;
the Plan 9 source uses it sparingly.
Also, because the compilers remove dead code, regular
.CW if
statements with constant conditions are more readable equivalents to many
.CW #ifs .
To compile imported code ineluctably fouled by
.CW #if
there is a separate command,
.CW /bin/cpp ,
that implements the complete ANSI C preprocessor specification.
.PP
Include files fall into two groups: machine-dependent and machine-independent.
The machine-independent files occupy the directory
.CW /sys/include ;
the others are placed in a directory appropriate to the machine, such as
.CW /mips/include .
The compiler searches for include files
first in the machine-dependent directory and then
in the machine-independent directory.
At the time of writing there are thirty-one machine-independent include
files and two (per machine) machine-dependent ones:
.CW <ureg.h>
and
.CW <u.h> .
The first describes the layout of registers on the system stack,
for use by the debugger.
The second defines some
architecture-dependent types such as
.CW jmp_buf
for
.CW setjmp
and the
.CW va_arg
and
.CW va_list
macros for handling arguments to variadic functions,
as well as a set of
.CW typedef
abbreviations for
.CW unsigned
.CW short
and so on.
.PP
Here is an excerpt from
.CW /68020/include/u.h :
.P1
#define nil ((void*)0)
typedef unsigned short ushort;
typedef unsigned char uchar;
typedef unsigned long ulong;
typedef unsigned int uint;
typedef signed char schar;
typedef long long vlong;

typedef long jmp_buf[2];
#define JMPBUFSP 0
#define JMPBUFPC 1
#define JMPBUFDPC 0
.P2
Plan 9 programs use
.CW nil
for the name of the zero-valued pointer.
The type
.CW vlong
is the largest integer type available; on most architectures it
is a 64-bit value.
A couple of other types in
.CW <u.h>
are
.CW u32int ,
which is guaranteed to have exactly 32 bits (a possibility on all the supported architectures) and
.CW mpdigit ,
which is used by the multiprecision math package
.CW <mp.h> .
The
.CW #define
constants permit an architecture-independent (but compiler-dependent)
implementation of stack-switching using
.CW setjmp
and
.CW longjmp .
.PP
Every Plan 9 C program begins
.P1
#include <u.h>
.P2
because all the other installed header files use the
.CW typedefs
declared in
.CW <u.h> .
.PP
In strict ANSI C, include files are grouped to collect related functions
in a single file: one for string functions, one for memory functions,
one for I/O, and none for system calls.
Each include file is protected by an
.CW #ifdef
to guarantee its contents are seen by the compiler only once.
Plan 9 takes a different approach. Other than a few include
files that define external formats such as archives, the files in
.CW /sys/include
correspond to
.I libraries.
If a program is using a library, it includes the corresponding header.
The default C library comprises string functions, memory functions, and
so on, largely as in ANSI C, some formatted I/O routines,
plus all the system calls and related functions.
To use these functions, one must
.CW #include
the file
.CW <libc.h> ,
which in turn must follow
.CW <u.h> ,
to define their prototypes for the compiler.
Here is the complete source to the traditional first C program:
.P1
#include <u.h>
#include <libc.h>

void
main(void)
{
	print("hello world\en");
	exits(0);
}
.P2
The
.CW print
routine and its relatives
.CW fprint
and
.CW sprint
resemble the similarly-named functions in Standard I/O but are not
attached to a specific I/O library.
In Plan 9
.CW main
is not integer-valued; it should call
.CW exits ,
which takes a string argument (or null; here ANSI C promotes the 0 to a
.CW char* ).
All these functions are, of course, documented in the Programmer's Manual.
.PP
To use
.CW printf ,
.CW <stdio.h>
must be included to define the function prototype for
.CW printf :
.P1
#include <u.h>
#include <libc.h>
#include <stdio.h>

void
main(int argc, char *argv[])
{
	printf("%s: hello world; argc = %d\en", argv[0], argc);
	exits(0);
}
.P2
In practice, Standard I/O is not used much in Plan 9. I/O libraries are
discussed in a later section of this document.
.PP
There are libraries for handling regular expressions, raster graphics,
windows, and so on, and each has an associated include file.
The manual for each library states which include files are needed.
The files are not protected against multiple inclusion and themselves
contain no nested
.CW #includes .
Instead the
programmer is expected to sort out the requirements
and to
.CW #include
the necessary files once at the top of each source file. In practice this is
trivial: this way of handling include files is so straightforward
that it is rare for a source file to contain more than half a dozen
.CW #includes .
.PP
The compilers do their own register allocation so the
.CW register
keyword is ignored.
For different reasons,
.CW volatile
and
.CW const
are also ignored.
.PP
To make it easier to share code with other systems, Plan 9 has a version
of the compiler,
.CW pcc ,
that provides the standard ANSI C preprocessor, headers, and libraries
with POSIX extensions.
.CW Pcc
is recommended only
when broad external portability is mandated. It compiles slower,
produces slower code (it takes extra work to simulate POSIX on Plan 9),
eliminates those parts of the Plan 9 interface
not related to POSIX, and illustrates the clumsiness of an environment
designed by committee.
.CW Pcc
is described in more detail in
.I
APE\(emThe ANSI/POSIX Environment,
.R
by Howard Trickey.
.SH
Process
.PP
Each CPU architecture supported by Plan 9 is identified by a single,
arbitrary, alphanumeric character:
.CW k
for SPARC,
.CW q
for Motorola Power PC 630 and 640,
.CW v
for MIPS,
.CW 0
for little-endian MIPS,
.CW 1
for Motorola 68000,
.CW 2
for Motorola 68020 and 68040,
.CW 5
for Acorn ARM 7500,
.CW 6
for AMD 64,
.CW 7
for DEC Alpha,
.CW 8
for Intel 386, and
.CW 9
for AMD 29000.
The character labels the support tools and files for that architecture.
For instance, for the 68020 the compiler is
.CW 2c ,
the assembler is
.CW 2a ,
the link editor/loader is
.CW 2l ,
the object files are suffixed
.CW \&.2 ,
and the default name for an executable file is
.CW 2.out .
Before we can use the compiler we therefore need to know which
machine we are compiling for.
The next section explains how this decision is made; for the moment
assume we are building 68020 binaries and make the mental substitution for
.CW 2
appropriate to the machine you are actually using.
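.PP
The naming convention is mechanical enough to capture in a few lines.
The following is only a sketch in portable C; the
Tools structure and the toolnames function are hypothetical names
introduced for illustration and are not part of any Plan 9 library.

```c
#include <stdio.h>

/* Hypothetical record of the names derived from an architecture character. */
typedef struct Tools Tools;
struct Tools
{
	char cc[3];		/* compiler, e.g. "2c" */
	char as[3];		/* assembler, e.g. "2a" */
	char ld[3];		/* loader, e.g. "2l" */
	char suffix[3];		/* object file suffix, e.g. ".2" */
	char out[6];		/* default executable name, e.g. "2.out" */
};

/* Fill in t for the architecture character o, such as '2' for the 68020. */
void
toolnames(Tools *t, char o)
{
	sprintf(t->cc, "%cc", o);
	sprintf(t->as, "%ca", o);
	sprintf(t->ld, "%cl", o);
	sprintf(t->suffix, ".%c", o);
	sprintf(t->out, "%c.out", o);
}
```

Calling toolnames with the character for MIPS fills in the names
of the MIPS compiler, assembler and loader in the same way.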
.PP
To convert source to an executable binary is a two-step process.
First run the compiler,
.CW 2c ,
on the source, say
.CW file.c ,
to generate an object file
.CW file.2 .
Then run the loader,
.CW 2l ,
to generate an executable
.CW 2.out
that may be run (on a 680X0 machine):
.P1
2c file.c
2l file.2
2.out
.P2
The loader automatically links with whatever libraries the program
needs, usually including the standard C library as defined by
.CW <libc.h> .
Of course the compiler and loader have lots of options, both familiar and new;
see the manual for details.
The compiler does not generate an executable automatically;
the output of the compiler must be given to the loader.
Since most compilation is done under the control of
.CW mk
(see below), this is rarely an inconvenience.
.PP
The distribution of work between the compiler and loader is unusual.
The compiler integrates preprocessing, parsing, register allocation,
code generation and some assembly.
Combining these tasks in a single program is part of the reason for
the compiler's efficiency.
The loader does instruction selection, branch folding,
instruction scheduling,
and writes the final executable.
There is no separate C preprocessor and no assembler in the usual pipeline.
Instead the intermediate object file
(here a
.CW \&.2
file) is a type of binary assembly language.
The instructions in the intermediate format are not exactly those in
the machine. For example, on the 68020 the object file may specify
a MOVE instruction but the loader will decide just which variant of
the MOVE instruction \(em MOVE immediate, MOVE quick, MOVE address,
etc. \(em is most efficient.
.PP
The assembler,
.CW 2a ,
is just a translator between the textual and binary
representations of the object file format.
It is not an assembler in the traditional sense.
It has limited
macro capabilities (the same as the integral C preprocessor in the compiler),
clumsy syntax, and minimal error checking. For instance, the assembler
will accept an instruction (such as memory-to-memory MOVE on the MIPS) that the
machine does not actually support; only when the output of the assembler
is passed to the loader will the error be discovered.
The assembler is intended only for writing things that need access to instructions
invisible from C,
such as the machine-dependent
part of an operating system;
very little code in Plan 9 is in assembly language.
.PP
The compilers take an option
.CW -S
that causes them to print on their standard output the generated code
in a format acceptable as input to the assemblers.
This is of course merely a formatting of the
data in the object file; therefore the assembler is just
an
ASCII-to-binary converter for this format.
Other than the specific instructions, the input to the assemblers
is largely architecture-independent; see
``A Manual for the Plan 9 Assembler'',
by Rob Pike,
for more information.
.PP
The loader is an integral part of the compilation process.
Each library header file contains a
.CW #pragma
that tells the loader the name of the associated archive; it is
not necessary to tell the loader which libraries a program uses.
The C run-time startup is found, by default, in the C library.
The loader starts with an undefined
symbol,
.CW _main ,
that is resolved by pulling in the run-time startup code from the library.
(The loader undefines
.CW _mainp
when profiling is enabled, to force loading of the profiling start-up
instead.)
.PP
Unlike its counterpart on other systems, the Plan 9 loader rearranges
data to optimize access. This means the order of variables in the
loaded program is unrelated to its order in the source.
Most programs don't care, but some assume that, for example, the
variables declared by
.P1
int a;
int b;
.P2
will appear at adjacent addresses in memory. On Plan 9, they won't.
.SH
Heterogeneity
.PP
When the system starts or a user logs in the environment is configured
so the appropriate binaries are available in
.CW /bin .
The configuration process is controlled by an environment variable,
.CW $cputype ,
with a value such as
.CW mips ,
.CW 68020 ,
.CW 386 ,
or
.CW sparc .
For each architecture there is a directory in the root,
with the appropriate name,
that holds the binary and library files for that architecture.
Thus
.CW /mips/lib
contains the object code libraries for MIPS programs,
.CW /mips/include
holds MIPS-specific include files, and
.CW /mips/bin
has the MIPS binaries.
These binaries are attached to
.CW /bin
at boot time by binding
.CW /$cputype/bin
to
.CW /bin ,
so
.CW /bin
always contains the correct files.
.PP
The MIPS compiler,
.CW vc ,
by definition
produces object files for the MIPS architecture,
regardless of the architecture of the machine on which the compiler is running.
There is a version of
.CW vc
compiled for each architecture:
.CW /mips/bin/vc ,
.CW /68020/bin/vc ,
.CW /sparc/bin/vc ,
and so on,
each capable of producing MIPS object files regardless of the native
instruction set.
If one is running on a SPARC,
.CW /sparc/bin/vc
will compile programs for the MIPS;
if one is running on a machine of type
.CW $cputype ,
.CW /$cputype/bin/vc
will compile programs for the MIPS.
.PP
Because of the bindings that assemble
.CW /bin ,
the shell always looks for a command, say
.CW date ,
in
.CW /bin
and automatically finds the file
.CW /$cputype/bin/date .
Therefore the MIPS compiler is known as just
.CW vc ;
the shell will invoke
.CW /bin/vc
and that is guaranteed to be the version of the MIPS compiler
appropriate for the machine running the command.
Regardless of the architecture of the compiling machine,
.CW /bin/vc
is
.I always
the MIPS compiler.
.PP
Also, the output of
.CW vc
and
.CW vl
is completely independent of the machine type on which they are executed:
.CW \&.v
files compiled (with
.CW vc )
on a SPARC may be linked (with
.CW vl )
on a 386.
(The resulting
.CW v.out
will run, of course, only on a MIPS.)
Similarly, the MIPS libraries in
.CW /mips/lib
are suitable for loading with
.CW vl
on any machine; there is only one set of MIPS libraries, not one
set for each architecture that supports the MIPS compiler.
.SH
Heterogeneity and \f(CWmk\fP
.PP
Most software on Plan 9 is compiled under the control of
.CW mk ,
a descendant of
.CW make
that is documented in the Programmer's Manual.
A convention used throughout the
.CW mkfiles
makes it easy to compile the source into binary suitable for any architecture.
.PP
The variable
.CW $cputype
is advisory: it reports the architecture of the current environment, and should
not be modified. A second variable,
.CW $objtype ,
is used to set which architecture is being
.I compiled
for.
The value of
.CW $objtype
can be used by a
.CW mkfile
to configure the compilation environment.
.PP
In each machine's root directory there is a short
.CW mkfile
that defines a set of macros for the compiler, loader, etc.
Here is
.CW /mips/mkfile :
.P1
</sys/src/mkfile.proto

CC=vc
LD=vl
O=v
AS=va
.P2
The line
.P1
</sys/src/mkfile.proto
.P2
causes
.CW mk
to include the file
.CW /sys/src/mkfile.proto ,
which contains general definitions:
.P1
#
# common mkfile parameters shared by all architectures
#

OS=v486xq7
CPUS=mips 386 power alpha
CFLAGS=-FVw
LEX=lex
YACC=yacc
MK=/bin/mk
.P2
.CW CC
is obviously the compiler,
.CW AS
the assembler, and
.CW LD
the loader.
.CW O
is the suffix for the object files and
.CW CPUS
and
.CW OS
are used in special rules described below.
.PP
Here is a
.CW mkfile
to build the installed source for
.CW sam :
.P1
</$objtype/mkfile
OBJ=sam.$O address.$O buffer.$O cmd.$O disc.$O error.$O \e
	file.$O io.$O list.$O mesg.$O moveto.$O multi.$O \e
	plan9.$O rasp.$O regexp.$O string.$O sys.$O xec.$O

$O.out: $OBJ
	$LD $OBJ

install: $O.out
	cp $O.out /$objtype/bin/sam

installall:
	for(objtype in $CPUS) mk install

%.$O: %.c
	$CC $CFLAGS $stem.c

$OBJ: sam.h errors.h mesg.h
address.$O cmd.$O parse.$O xec.$O unix.$O: parse.h

clean:V:
	rm -f [$OS].out *.[$OS] y.tab.?
.P2
(The actual
.CW mkfile
imports most of its rules from other secondary files, but
this example works and is not misleading.)
The first line causes
.CW mk
to include the contents of
.CW /$objtype/mkfile
in the current
.CW mkfile .
If
.CW $objtype
is
.CW mips ,
this inserts the MIPS macro definitions into the
.CW mkfile .
In this case the rule for
.CW $O.out
uses the MIPS tools to build
.CW v.out .
The
.CW %.$O
rule in the file uses
.CW mk 's
pattern matching facilities to convert the source files to the object
files through the compiler.
(The text of the rules is passed directly to the shell,
.CW rc ,
without further translation.
See the
.CW mk
manual if any of this is unfamiliar.)
Because the default rule builds
.CW $O.out
rather than
.CW sam ,
it is possible to maintain binaries for multiple machines in the
same source directory without conflict.
This is also, of course, why the output files from the various
compilers and loaders
have distinct names.
.PP
The rest of the
.CW mkfile
should be easy to follow; notice how the rules for
.CW clean
and
.CW installall
(that is, install versions for all architectures) use other macros
defined in
.CW /$objtype/mkfile .
In Plan 9,
.CW mkfiles
for commands conventionally contain rules to
.CW install
(compile and install the version for
.CW $objtype ),
.CW installall
(compile and install for all
.CW $objtypes ),
and
.CW clean
(remove all object files, binaries, etc.).
.PP
The
.CW mkfile
is easy to use. To build a MIPS binary,
.CW v.out :
.P1
% objtype=mips
% mk
.P2
To build and install a MIPS binary:
.P1
% objtype=mips
% mk install
.P2
To build and install all versions:
.P1
% mk installall
.P2
These conventions make cross-compilation as easy to manage
as traditional native compilation.
Plan 9 programs compile and run without change on machines from
large multiprocessors to laptops. For more information about this process, see
``Plan 9 Mkfiles'',
by Bob Flandrena.
.SH
Portability
.PP
Within Plan 9, it is painless to write portable programs, programs whose
source is independent of the machine on which they execute.
The operating system is fixed and the compiler, headers and libraries
are constant so most of the stumbling blocks to portability are removed.
Attention to a few details can avoid those that remain.
.PP
Plan 9 is a heterogeneous environment, so programs must
.I expect
that external files will be written by programs on machines of different
architectures.
The compilers, for instance, must handle without confusion
object files written by other machines.
The traditional approach to this problem is to pepper the source with
.CW #ifdefs
to turn byte-swapping on and off.
Plan 9 takes a different approach: of the handful of machine-dependent
.CW #ifdefs
in all the source, almost all are deep in the libraries.
Instead programs read and write files in a defined format,
either (for low volume applications) as formatted text, or
(for high volume applications) as binary in a known byte order.
If the external data were written with the most significant
byte first, the following code reads a 4-byte integer correctly
regardless of the architecture of the executing machine (assuming
an unsigned long holds 4 bytes):
.P1
ulong
getlong(void)
{
	ulong l;

	l = (getchar()&0xFF)<<24;
	l |= (getchar()&0xFF)<<16;
	l |= (getchar()&0xFF)<<8;
	l |= (getchar()&0xFF)<<0;
	return l;
}
.P2
Note that this code does not `swap' the bytes; instead it just reads
them in the correct order.
Variations of this code will handle any binary format
and also avoid problems
involving how structures are padded, how words are aligned,
and other impediments to portability.
Be aware, though, that extra care is needed to handle floating point data.
.PP
Efficiency hounds will argue that this method is unnecessarily slow and clumsy
when the executing machine has the same byte order (and padding and alignment)
as the data.
The CPU cost of I/O processing
is rarely the bottleneck for an application, however,
and the gain in simplicity of porting and maintaining the code greatly outweighs
the minor speed loss from handling data in this general way.
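.PP
The same idea serves for writing as well as reading.
The following is a minimal sketch in portable C of a matched pair that
stores and fetches a 4-byte integer in a byte buffer, most significant
byte first; the names put4be and get4be are hypothetical, and the
casts to unsigned long are there only to keep the shifts well defined
under strict ANSI C.

```c
/* Store the low 32 bits of l in p[0..3], most significant byte first. */
void
put4be(unsigned char *p, unsigned long l)
{
	p[0] = (l>>24) & 0xFF;
	p[1] = (l>>16) & 0xFF;
	p[2] = (l>>8) & 0xFF;
	p[3] = l & 0xFF;
}

/* Fetch the 4-byte value stored at p[0..3] by put4be. */
unsigned long
get4be(const unsigned char *p)
{
	unsigned long l;

	l = (unsigned long)p[0]<<24;
	l |= (unsigned long)p[1]<<16;
	l |= (unsigned long)p[2]<<8;
	l |= (unsigned long)p[3];
	return l;
}
```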
This method is how the Plan 9 compilers, the window system, and even the file
servers transmit data between programs.
.PP
To port programs beyond Plan 9, where the system interface is more variable,
it is probably necessary to use
.CW pcc
and hope that the target machine supports ANSI C and POSIX.
.SH
I/O
.PP
The default C library, defined by the include file
.CW <libc.h> ,
contains no buffered I/O package.
It does have several entry points for printing formatted text:
.CW print
outputs text to the standard output,
.CW fprint
outputs text to a specified integer file descriptor, and
.CW sprint
places text in a character array.
To access library routines for buffered I/O, a program must
explicitly include the header file associated with an appropriate library.
.PP
The recommended I/O library, used by most Plan 9 utilities, is
.CW bio
(buffered I/O), defined by
.CW <bio.h> .
There also exists an implementation of ANSI Standard I/O,
.CW stdio .
.PP
.CW Bio
is small and efficient, particularly for buffer-at-a-time or
line-at-a-time I/O.
Even for character-at-a-time I/O, however, it is significantly faster than
the Standard I/O library,
.CW stdio .
Its interface is compact and regular, although it lacks a few conveniences.
The most noticeable is that one must explicitly define buffers for standard
input and output;
.CW bio
does not predefine them.
Here is a program to copy input to output a byte
at a time using
.CW bio :
.P1
#include <u.h>
#include <libc.h>
#include <bio.h>

Biobuf bin;
Biobuf bout;

void
main(void)
{
	int c;

	Binit(&bin, 0, OREAD);
	Binit(&bout, 1, OWRITE);

	while((c=Bgetc(&bin)) != Beof)
		Bputc(&bout, c);
	exits(0);
}
.P2
For peak performance, we could replace
.CW Bgetc
and
.CW Bputc
by their equivalent in-line macros
.CW BGETC
and
.CW BPUTC
but
the performance gain would be modest.
For more information on
.CW bio ,
see the Programmer's Manual.
.PP
Perhaps the most dramatic difference in the I/O interface of Plan 9 from other
systems' is that text is not ASCII.
The format for
text in Plan 9 is a byte-stream encoding of 16-bit characters.
The character set is based on the Unicode Standard and is backward compatible with
ASCII:
characters with value 0 through 127 are the same in both sets.
The 16-bit characters, called
.I runes
in Plan 9, are encoded using a representation called
UTF,
an encoding that is becoming accepted as a standard.
(ISO calls it UTF-8;
throughout Plan 9 it's just called
UTF.)
UTF
defines multibyte sequences to
represent character values from 0 to 65535.
In
UTF,
character values up to 127 decimal, 7F hexadecimal, represent themselves,
so straight
ASCII
files are also valid
UTF.
Also,
UTF
guarantees that bytes with values 0 to 127 (NUL to DEL, inclusive)
will appear only when they represent themselves, so programs that read bytes
looking for plain ASCII characters will continue to work.
Any program that expects a one-to-one correspondence between bytes and
characters will, however, need to be modified.
An example is parsing file names.
File names, like all text, are in
UTF,
so it is incorrect to search for a character in a string by
.CW strchr(filename,
.CW c)
because the character might have a multi-byte encoding.
The correct method is to call
.CW utfrune(filename,
.CW c) ,
defined in
.I rune (2),
which interprets the file name as a sequence of encoded characters
rather than bytes.
In fact, even when you know the character is a single byte
that can represent only itself,
it is safer to use
.CW utfrune
because that assumes nothing about the character set
and its representation.
.PP
The library defines several symbols relevant to the representation of characters.
Any byte with unsigned value less than
.CW Runesync
will not appear in any multi-byte encoding of a character.
.CW Utfrune
compares the character being searched against
.CW Runesync
to see if it is sufficient to call
.CW strchr
or if the byte stream must be interpreted.
Any byte with unsigned value less than
.CW Runeself
is represented by a single byte with the same value.
Finally, when errors are encountered converting
to runes from a byte stream, the library returns the rune value
.CW Runeerror
and advances a single byte. This permits programs to find runes
embedded in binary data.
.PP
.CW Bio
includes routines
.CW Bgetrune
and
.CW Bputrune
to transform the external byte stream
UTF
format to and from
internal 16-bit runes.
Also, the
.CW %s
format to
.CW print
accepts
UTF;
.CW %c
prints a character after narrowing it to 8 bits.
The
.CW %S
format prints a null-terminated sequence of runes;
.CW %C
prints a character after narrowing it to 16 bits.
For more information, see the Programmer's Manual, in particular
.I utf (6)
and
.I rune (2),
and the paper,
``Hello world, or
Καλημέρα κόσμε, or\
\f(Jpこんにちは 世界\f1'',
by Rob Pike and
Ken Thompson;
there is not room for the full story here.
.PP
These issues affect the compiler in several ways.
First, the C source is in
UTF.
ANSI says C variables are formed from
ASCII
alphanumerics, but comments and literal strings may contain any characters
encoded in the native encoding, here
UTF.
The declaration
.P1
char *cp = "abcÿ";
.P2
initializes the variable
.CW cp
to point to an array of bytes holding the
UTF
representation of the characters
.CW abcÿ .
The type
.CW Rune
is defined in
.CW <u.h>
to be
.CW ushort ,
which is also the `wide character' type in the compiler.
Therefore the declaration
.P1
Rune *rp = L"abcÿ";
.P2
initializes the variable
.CW rp
to point to an array of unsigned short integers holding the 16-bit
values of the characters
.CW abcÿ .
Note that in both these declarations the characters in the source
that represent
.CW "abcÿ"
are the same; what changes is how those characters are represented
in memory in the program.
The following two lines:
.P1
print("%s\en", "abcÿ");
print("%S\en", L"abcÿ");
.P2
produce the same
UTF
string on their output, the first by copying the bytes, the second
by converting from runes to bytes.
.PP
In C, character constants are integers but narrowed through the
.CW char
type.
The Unicode character
.CW ÿ
has value 255, so if the
.CW char
type is signed,
the constant
.CW 'ÿ'
has value \-1 (which is equal to EOF).
On the other hand,
.CW L'ÿ'
narrows through the wide character type,
.CW ushort ,
and therefore has value 255.
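.PP
To make the byte representation concrete, here is a sketch in portable C
of how a 16-bit rune value maps to its UTF (UTF-8) bytes.
It is an illustration only, not the library's implementation; the
function name runetoutf is hypothetical, and the conversion routines
actually provided are those documented in
.I rune (2).
Under this encoding the rune ÿ, value 255, occupies two bytes, which is
why the byte array and the rune array above differ in memory.

```c
/*
 * Encode the 16-bit value r into buf as UTF and return the
 * number of bytes written: 1, 2 or 3.  buf must have room for 3 bytes.
 */
int
runetoutf(unsigned char *buf, unsigned int r)
{
	r &= 0xFFFF;			/* runes here are 16-bit values */
	if(r < 0x80){			/* 0 through 127 represent themselves */
		buf[0] = r;
		return 1;
	}
	if(r < 0x800){			/* 11 bits: 110xxxxx 10xxxxxx */
		buf[0] = 0xC0 | (r>>6);
		buf[1] = 0x80 | (r&0x3F);
		return 2;
	}
	/* 16 bits: 1110xxxx 10xxxxxx 10xxxxxx */
	buf[0] = 0xE0 | (r>>12);
	buf[1] = 0x80 | ((r>>6)&0x3F);
	buf[2] = 0x80 | (r&0x3F);
	return 3;
}
```

Every byte of a multi-byte sequence has its high bit set, which is the
property that lets byte-oriented searches for plain ASCII characters
continue to work.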
.PP
Finally, although it's not ANSI C, the Plan 9 C compilers
assume any character with value above
.CW Runeself
is an alphanumeric,
so α is a legal, if non-portable, variable name.
.SH
Arguments
.PP
Some macros are defined
in
.CW <libc.h>
for parsing the arguments to
.CW main() .
They are described in
.I ARG (2)
but are fairly self-explanatory.
There are four macros:
.CW ARGBEGIN
and
.CW ARGEND
are used to bracket a hidden
.CW switch
statement within which
.CW ARGC
returns the current option character (rune) being processed and
.CW ARGF
returns the argument to the option, as in the loader option
.CW -o
.CW file .
Here, for example, is the code at the beginning of
.CW main()
in
.CW ramfs.c
(see
.I ramfs (1))
that cracks its arguments:
.P1
void
main(int argc, char *argv[])
{
	char *defmnt;
	int p[2];
	int mfd[2];
	int stdio = 0;

	defmnt = "/tmp";
	ARGBEGIN{
	case 'i':
		defmnt = 0;
		stdio = 1;
		mfd[0] = 0;
		mfd[1] = 1;
		break;
	case 's':
		defmnt = 0;
		break;
	case 'm':
		defmnt = ARGF();
		break;
	default:
		usage();
	}ARGEND
.P2
.SH
Extensions
.PP
The compiler has several extensions to 1989 ANSI C, all of which are used
extensively in the system source.
Some of these have been adopted in later ANSI C standards.
First,
.I structure
.I displays
permit
.CW struct
expressions to be formed dynamically.
Given these declarations:
.P1
typedef struct Point Point;
typedef struct Rectangle Rectangle;

struct Point
{
	int x, y;
};

struct Rectangle
{
	Point min, max;
};

Point p, q, add(Point, Point);
Rectangle r;
int x, y;
.P2
this assignment may appear anywhere an assignment is legal:
.P1
r = (Rectangle){add(p, q), (Point){x, y+3}};
.P2
The syntax is the same as for initializing a structure but with
a leading cast.
.PP
If an
.I anonymous
.I structure
or
.I union
is declared within another structure or union, the members of the internal
structure or union are addressable without prefix in the outer structure.
This feature eliminates the clumsy naming of nested structures and,
particularly, unions.
For example, after these declarations,
.P1
struct Lock
{
	int locked;
};

struct Node
{
	int type;
	union{
		double dval;
		double fval;
		long lval;
	}; /* anonymous union */
	struct Lock; /* anonymous structure */
} *node;

void lock(struct Lock*);
.P2
one may refer to
.CW node->type ,
.CW node->dval ,
.CW node->fval ,
.CW node->lval ,
and
.CW node->locked .
Moreover, the address of a
.CW struct
.CW Node
may be used without a cast anywhere that the address of a
.CW struct
.CW Lock
is used, such as in argument lists.
The compiler automatically promotes the type and adjusts the address.
Thus one may invoke
.CW lock(node) .
.PP
Anonymous structures and unions may be accessed by type name
if (and only if) they are declared using a
.CW typedef
name.
For example, using the above declaration for
.CW Point ,
one may declare
.P1
struct
{
	int type;
	Point;
} p;
.P2
and refer to
.CW p.Point .
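.PP
For readers working outside Plan 9, the following sketch shows the
parts of these extensions that a standard compiler accepts, assuming
one that implements C11: structure displays correspond to C99 compound
literals, and untagged anonymous unions are standard in C11.
The tagged form
.CW "struct Lock;" ,
the automatic pointer promotion, and access by type name remain
Plan 9 extensions and are not shown.
The function box is a hypothetical wrapper added only so the display
has somewhere to live.

```c
typedef struct Point Point;
typedef struct Rectangle Rectangle;
typedef struct Node Node;

struct Point
{
	int x, y;
};

struct Rectangle
{
	Point min, max;
};

struct Node
{
	int type;
	union{
		double dval;
		long lval;
	};	/* anonymous union: members are reached as n.dval, n.lval */
};

/* Return the component-wise sum, built with a structure display. */
Point
add(Point a, Point b)
{
	return (Point){a.x+b.x, a.y+b.y};
}

/* The assignment from the text, wrapped in a function. */
Rectangle
box(Point p, Point q, int x, int y)
{
	Rectangle r;

	r = (Rectangle){add(p, q), (Point){x, y+3}};
	return r;
}
```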
.PP
In the initialization of arrays, a number in square brackets before an
element sets the index for the initialization.  For example, to initialize
some elements in
a table of function pointers indexed by
ASCII
character,
.P1
void percent(void), slash(void);

void (*func[128])(void) =
{
	['%'] percent,
	['/'] slash,
};
.P2
.LP
A similar syntax allows one to initialize structure elements:
.P1
Point p =
{
	.y 100,
	.x 200
};
.P2
These initialization syntaxes were later added to ANSI C, with the addition of an
equals sign between the index or tag and the value.
The Plan 9 compiler accepts either form.
.PP
Finally, the declaration
.P1
extern register reg;
.P2
.I this "" (
appearance of the register keyword is not ignored)
allocates a global register to hold the variable
.CW reg .
External registers must be used carefully: they need to be declared in
.I all
source files and libraries in the program to guarantee the register
is not allocated temporarily for other purposes.
Especially on machines with few registers, such as the i386,
it is easy to link accidentally with code that has already usurped
the global registers and there is no diagnostic when this happens.
Used wisely, though, external registers are powerful.
The Plan 9 operating system uses them to access per-process and
per-machine data structures on a multiprocessor.  The storage class they provide
is hard to create in other ways.
.SH
The compile-time environment
.PP
The code generated by the compilers is `optimized' by default:
variables are placed in registers and peephole optimizations are
performed.
The compiler flag
.CW -N
disables these optimizations.
Registerization is done locally rather than throughout a function:
whether a variable occupies a register or
the memory location identified in the symbol
table depends on the activity of the variable and may change
throughout the life of the variable.
The
.CW -N
flag is rarely needed;
its main use is to simplify debugging.
There is no information in the symbol table to identify the
registerization of a variable, so
.CW -N
guarantees the variable is always where the symbol table says it is.
.PP
Another flag,
.CW -w ,
turns
.I on
warnings about portability and problems detected in flow analysis.
Most code in Plan 9 is compiled with warnings enabled;
these warnings plus the type checking offered by function prototypes
provide most of the support of the Unix tool
.CW lint
more accurately and with less chatter.
Two of the warnings,
`used and not set' and `set and not used', are almost always accurate but
may be triggered spuriously by code with invisible control flow,
such as in routines that call
.CW longjmp .
The compiler statements
.P1
SET(v1);
USED(v2);
.P2
decorate the flow graph to silence the compiler.
Either statement accepts a comma-separated list of variables.
Use them carefully: they may silence real errors.
For the common case of unused parameters to a function,
leaving the name off the declaration silences the warnings.
That is, listing the type of a parameter but giving it no
associated variable name does the trick.
.SH
Debugging
.PP
There are two debuggers available on Plan 9.
The first, and older, is
.CW db ,
a revision of Unix
.CW adb .
The other,
.CW acid ,
is a source-level debugger whose commands are statements in
a true programming language.
.CW Acid
is the preferred debugger, but since it
borrows some elements of
.CW db ,
notably the formats for displaying values, it is worth knowing a little bit about
.CW db .
.PP
Both debuggers support multiple architectures in a single program; that is,
the programs are
.CW db
and
.CW acid ,
not for example
.CW vdb
and
.CW vacid .
They also support cross-architecture debugging comfortably:
one may debug a 68020 binary on a MIPS.
.PP
Imagine a program has crashed mysteriously:
.P1
% X11/X
Fatal server bug!
failed to create default stipple
X 106: suicide: sys: trap: fault read addr=0x0 pc=0x00105fb8
%
.P2
When a process dies on Plan 9 it hangs in the `broken' state
for debugging.
Attach a debugger to the process by naming its process id:
.P1
% acid 106
/proc/106/text:mips plan 9 executable

/sys/lib/acid/port
/sys/lib/acid/mips
acid:
.P2
The
.CW acid
function
.CW stk()
reports the stack traceback:
.P1
acid: stk()
At pc:0x105fb8:abort+0x24 /sys/src/ape/lib/ap/stdio/abort.c:6
abort() /sys/src/ape/lib/ap/stdio/abort.c:4
	called from FatalError+#4e
	/sys/src/X/mit/server/dix/misc.c:421
FatalError(s9=#e02, s8=#4901d200, s7=#2, s6=#72701, s5=#1,
	s4=#7270d, s3=#6, s2=#12, s1=#ff37f1c, s0=#6, f=#7270f)
	/sys/src/X/mit/server/dix/misc.c:416
	called from gnotscreeninit+#4ce
	/sys/src/X/mit/server/ddx/gnot/gnot.c:792
gnotscreeninit(snum=#0, sc=#80db0)
	/sys/src/X/mit/server/ddx/gnot/gnot.c:766
	called from AddScreen+#16e
	/n/bootes/sys/src/X/mit/server/dix/main.c:610
AddScreen(pfnInit=0x0000129c,argc=0x00000001,argv=0x7fffffe4)
	/sys/src/X/mit/server/dix/main.c:530
	called from InitOutput+0x80
	/sys/src/X/mit/server/ddx/brazil/brddx.c:522
InitOutput(argc=0x00000001,argv=0x7fffffe4)
	/sys/src/X/mit/server/ddx/brazil/brddx.c:511
	called from main+0x294
	/sys/src/X/mit/server/dix/main.c:225
main(argc=0x00000001,argv=0x7fffffe4)
	/sys/src/X/mit/server/dix/main.c:136
	called from _main+0x24
	/sys/src/ape/lib/ap/mips/main9.s:8
.P2
The function
.CW lstk()
is similar but
also reports the values of local variables.
Note that the traceback includes full file names; this is a boon to debugging,
although it makes the output much noisier.
.PP
To use
.CW acid
well you will need to learn its input language; see the
``Acid Manual'',
by Phil Winterbottom,
for details.  For simple debugging, however, the information in the manual page is
sufficient.  In particular, it describes the most useful functions
for examining a process.
.PP
The compiler does not place
information describing the types of variables in the executable,
but a compile-time flag provides crude support for symbolic debugging.
The
.CW -a
flag to the compiler suppresses code generation
and instead emits source text in the
.CW acid
language to format and display data structure types defined in the program.
The easiest way to use this feature is to put a rule in the
.CW mkfile :
.P1
syms: main.$O
	$CC -a main.c > syms
.P2
Then from within
.CW acid ,
.P1
acid: include("sourcedirectory/syms")
.P2
to read in the relevant definitions.
(For multi-file source, you need to be a little fancier;
see
.I 2c (1)).
This text includes, for each defined compound
type, a function with that name that may be called with the address of a structure
of that type to display its contents.
For example, if
.CW rect
is a global variable of type
.CW Rectangle ,
one may execute
.P1
Rectangle(*rect)
.P2
to display it.
The
.CW *
(indirection) operator is necessary because
of the way
.CW acid
works: each global symbol in the program is defined as a variable by
.CW acid ,
with value equal to the
.I address
of the symbol.
.PP
Another common technique is to write by hand special
.CW acid
code to define functions to aid debugging, initialize the debugger, and so on.
Conventionally, this is placed in a file called
.CW acid
in the source directory; it has a line
.P1
include("sourcedirectory/syms");
.P2
to load the compiler-produced symbols.  One may edit the compiler output directly but
it is wiser to keep the hand-generated
.CW acid
separate from the machine-generated.
.PP
To make things simple, the default rules in the system
.CW mkfiles
include entries to make
.CW foo.acid
from
.CW foo.c ,
so one may use
.CW mk
to automate the production of
.CW acid
definitions for a given C source file.
.PP
There is much more to say here.  See the
.CW acid
manual page, the reference manual, or the paper
``Acid: A Debugger Built From A Language'',
also by Phil Winterbottom.