.HTML "How to Use the Plan 9 C Compiler
.TL
How to Use the Plan 9 C Compiler
.AU
Rob Pike
rob@plan9.bell-labs.com
.SH
Introduction
.PP
The C compiler on Plan 9 is a wholly new program; in fact
it was the first piece of software written for what would
eventually become Plan 9 from Bell Labs.
Programmers familiar with existing C compilers will find
a number of differences both in the language the Plan 9 compiler
accepts and in how the compiler is used.
.PP
The compiler is really a set of compilers, one for each
architecture \(em MIPS, SPARC, Intel 386, Power PC, ARM, etc. \(em
that accept a dialect of ANSI C and efficiently produce
fairly good code for the target machine.
There is a packaging of the compiler that accepts strict ANSI C for
a POSIX environment, but this document focuses on the
native Plan 9 environment, that in which all the system source and
almost all the utilities are written.
.SH
Source
.PP
The language accepted by the compilers is the core 1989 ANSI C language
with some modest extensions,
a greatly simplified preprocessor,
a smaller library that includes system calls and related facilities,
and a completely different structure for include files.
.PP
Official ANSI C accepts the old (K&R) style of declarations for
functions; the Plan 9 compilers
are more demanding.
Without an explicit compiler flag
.CW -B ) (
whose use is discouraged, the compilers insist
on new-style function declarations, that is, prototypes for
function arguments.
The function declarations in the libraries' include files are
all in the new style so the interfaces are checked at compile time.
For C programmers who have not yet switched to function prototypes
the clumsy syntax may seem repellent but the payoff in stronger typing
is substantial.
Those who wish to import existing software to Plan 9 are urged
to use the opportunity to update their code.
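.PP
The difference between the two styles can be sketched in portable C.
This is only an illustration: the names
.CW scale
and
.CW triple
are invented here, and the old-style form appears in a comment so that
the block compiles with any prototype-checking compiler.

```c
/*
 * Old (K&R) style, which the Plan 9 compilers refuse without -B:
 *
 *	double scale(x, factor)
 *	double x;
 *	int factor;
 *	{
 *		return x * factor;
 *	}
 *
 * New style: the prototype states the parameter types, so every
 * call is checked, and its arguments converted, at compile time.
 */
double scale(double x, int factor);

double
scale(double x, int factor)
{
	return x * factor;
}

/* The int constant 3 is converted to double because the prototype is in scope. */
double
triple(void)
{
	return scale(3, 3);
}
```

.LP
With only an old-style declaration in scope the first argument of
.CW scale(3,
.CW 3)
would be passed as an
.CW int ,
and the error would go unreported.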
.PP
The compilers include an integrated preprocessor that accepts the familiar
.CW #include ,
.CW #define
for macros both with and without arguments,
.CW #undef ,
.CW #line ,
.CW #ifdef ,
.CW #ifndef ,
and
.CW #endif .
It
supports neither
.CW #if
nor
.CW ## ,
although it does
honor a few
.CW #pragmas .
The
.CW #if
directive was omitted because it greatly complicates the
preprocessor, is never necessary, and is usually abused.
Conditional compilation in general makes code hard to understand;
the Plan 9 source uses it sparingly.
Also, because the compilers remove dead code, regular
.CW if
statements with constant conditions are more readable equivalents to many
.CW #ifs .
To compile imported code ineluctably fouled by
.CW #if
there is a separate command,
.CW /bin/cpp ,
that implements the complete ANSI C preprocessor specification.
.PP
Include files fall into two groups: machine-dependent and machine-independent.
The machine-independent files occupy the directory
.CW /sys/include ;
the others are placed in a directory appropriate to the machine, such as
.CW /mips/include .
The compiler searches for include files
first in the machine-dependent directory and then
in the machine-independent directory.
At the time of writing there are thirty-one machine-independent include
files and two (per machine) machine-dependent ones:
.CW <ureg.h>
and
.CW <u.h> .
The first describes the layout of registers on the system stack,
for use by the debugger.
The second defines some
architecture-dependent types such as
.CW jmp_buf
for
.CW setjmp
and the
.CW va_arg
and
.CW va_list
macros for handling arguments to variadic functions,
as well as a set of
.CW typedef
abbreviations for
.CW unsigned
.CW short
and so on.
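.PP
To return briefly to conditional compilation: the substitution of an ordinary
.CW if
for
.CW #if
might look like the following sketch.
It is written in portable C with Standard I/O so that it can be tried anywhere;
the names
.CW Debug
and
.CW process
are invented for the example.

```c
#include <stdio.h>

enum { Debug = 0 };	/* hypothetical compile-time switch */

/* Returns the number of diagnostic lines printed (always 0 while Debug is 0). */
int
process(int n)
{
	int printed = 0;

	if(Debug){
		/* still parsed and type-checked, unlike text hidden by #if 0 */
		printf("process(%d)\n", n);
		printed++;
	}
	return printed;
}
```

.LP
Because the condition is a constant, a compiler that removes dead code
emits nothing for the body, yet the body cannot silently rot
the way text excluded by the preprocessor can.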
.PP
Here is an excerpt from
.CW /386/include/u.h :
.P1
#define nil ((void*)0)
typedef unsigned short ushort;
typedef unsigned char uchar;
typedef unsigned long ulong;
typedef unsigned int uint;
typedef signed char schar;
typedef long long vlong;

typedef long jmp_buf[2];
#define JMPBUFSP 0
#define JMPBUFPC 1
#define JMPBUFDPC 0
.P2
Plan 9 programs use
.CW nil
for the name of the zero-valued pointer.
The type
.CW vlong
is the largest integer type available; on most architectures it
is a 64-bit value.
A couple of other types in
.CW <u.h>
are
.CW u32int ,
which is guaranteed to have exactly 32 bits (a possibility on all the supported architectures) and
.CW mpdigit ,
which is used by the multiprecision math package
.CW <mp.h> .
The
.CW #define
constants permit an architecture-independent (but compiler-dependent)
implementation of stack-switching using
.CW setjmp
and
.CW longjmp .
.PP
Every Plan 9 C program begins
.P1
#include <u.h>
.P2
because all the other installed header files use the
.CW typedefs
declared in
.CW <u.h> .
.PP
In strict ANSI C, include files are grouped to collect related functions
in a single file: one for string functions, one for memory functions,
one for I/O, and none for system calls.
Each include file is protected by an
.CW #ifdef
to guarantee its contents are seen by the compiler only once.
Plan 9 takes a different approach. Other than a few include
files that define external formats such as archives, the files in
.CW /sys/include
correspond to
.I libraries.
If a program is using a library, it includes the corresponding header.
The default C library comprises string functions, memory functions, and
so on, largely as in ANSI C, some formatted I/O routines,
plus all the system calls and related functions.
To use these functions, one must
.CW #include
the file
.CW <libc.h> ,
which in turn must follow
.CW <u.h> ,
to define their prototypes for the compiler.
Here is the complete source to the traditional first C program:
.P1
#include <u.h>
#include <libc.h>

void
main(void)
{
	print("hello world\en");
	exits(0);
}
.P2
The
.CW print
routine and its relatives
.CW fprint
and
.CW sprint
resemble the similarly-named functions in Standard I/O but are not
attached to a specific I/O library.
In Plan 9
.CW main
is not integer-valued; it should call
.CW exits ,
which takes a string argument (or null; here ANSI C promotes the 0 to a
.CW char* ).
All these functions are, of course, documented in the Programmer's Manual.
.PP
To use
.CW printf ,
.CW <stdio.h>
must be included to define the function prototype for
.CW printf :
.P1
#include <u.h>
#include <libc.h>
#include <stdio.h>

void
main(int argc, char *argv[])
{
	printf("%s: hello world; argc = %d\en", argv[0], argc);
	exits(0);
}
.P2
In practice, Standard I/O is not used much in Plan 9. I/O libraries are
discussed in a later section of this document.
.PP
There are libraries for handling regular expressions, raster graphics,
windows, and so on, and each has an associated include file.
The manual for each library states which include files are needed.
The files are not protected against multiple inclusion and themselves
contain no nested
.CW #includes .
Instead the
programmer is expected to sort out the requirements
and to
.CW #include
the necessary files once at the top of each source file. In practice this is
trivial: this way of handling include files is so straightforward
that it is rare for a source file to contain more than half a dozen
.CW #includes .
.PP
The compilers do their own register allocation so the
.CW register
keyword is ignored.
For different reasons,
.CW volatile
and
.CW const
are also ignored.
.PP
To make it easier to share code with other systems, Plan 9 has a version
of the compiler,
.CW pcc ,
that provides the standard ANSI C preprocessor, headers, and libraries
with POSIX extensions.
.CW Pcc
is recommended only
when broad external portability is mandated. It compiles more slowly,
produces slower code (it takes extra work to simulate POSIX on Plan 9),
eliminates those parts of the Plan 9 interface
not related to POSIX, and illustrates the clumsiness of an environment
designed by committee.
.CW Pcc
is described in more detail in
.I
APE\(emThe ANSI/POSIX Environment,
.R
by Howard Trickey.
.SH
Process
.PP
Each CPU architecture supported by Plan 9 is identified by a single,
arbitrary, alphanumeric character:
.CW k
for SPARC,
.CW q
for 32-bit Power PC,
.CW v
for MIPS,
.CW 0
for little-endian MIPS,
.CW 5
for ARM v5 and later 32-bit architectures,
.CW 6
for AMD64,
.CW 8
for Intel 386, and
.CW 9
for 64-bit Power PC.
The character labels the support tools and files for that architecture.
For instance, for the 386 the compiler is
.CW 8c ,
the assembler is
.CW 8a ,
the link editor/loader is
.CW 8l ,
the object files are suffixed
.CW \&.8 ,
and the default name for an executable file is
.CW 8.out .
Before we can use the compiler we therefore need to know which
machine we are compiling for.
The next section explains how this decision is made; for the moment
assume we are building 386 binaries and make the mental substitution for
.CW 8
appropriate to the machine you are actually using.
.PP
To convert source to an executable binary is a two-step process.
First run the compiler,
.CW 8c ,
on the source, say
.CW file.c ,
to generate an object file
.CW file.8 .
Then run the loader,
.CW 8l ,
to generate an executable
.CW 8.out
that may be run (on a 386 machine):
.P1
8c file.c
8l file.8
8.out
.P2
The loader automatically links with whatever libraries the program
needs, usually including the standard C library as defined by
.CW <libc.h> .
Of course the compiler and loader have lots of options, both familiar and new;
see the manual for details.
The compiler does not generate an executable automatically;
the output of the compiler must be given to the loader.
Since most compilation is done under the control of
.CW mk
(see below), this is rarely an inconvenience.
.PP
The distribution of work between the compiler and loader is unusual.
The compiler integrates preprocessing, parsing, register allocation,
code generation and some assembly.
Combining these tasks in a single program is part of the reason for
the compiler's efficiency.
The loader does instruction selection, branch folding,
instruction scheduling,
and writes the final executable.
There is no separate C preprocessor and no assembler in the usual pipeline.
Instead the intermediate object file
(here a
.CW \&.8
file) is a type of binary assembly language.
The instructions in the intermediate format are not exactly those in
the machine. For example, on the 68020 the object file may specify
a MOVE instruction but the loader will decide just which variant of
the MOVE instruction \(em MOVE immediate, MOVE quick, MOVE address,
etc. \(em is most efficient.
.PP
The assembler,
.CW 8a ,
is just a translator between the textual and binary
representations of the object file format.
It is not an assembler in the traditional sense.
It has limited
macro capabilities (the same as the integral C preprocessor in the compiler),
clumsy syntax, and minimal error checking. For instance, the assembler
will accept an instruction (such as memory-to-memory MOVE on the MIPS) that the
machine does not actually support; only when the output of the assembler
is passed to the loader will the error be discovered.
The assembler is intended only for writing things that need access to instructions
invisible from C,
such as the machine-dependent
part of an operating system;
very little code in Plan 9 is in assembly language.
.PP
The compilers take an option
.CW -S
that causes them to print on their standard output the generated code
in a format acceptable as input to the assemblers.
This is of course merely a formatting of the
data in the object file; therefore the assembler is just
an
ASCII-to-binary converter for this format.
Other than the specific instructions, the input to the assemblers
is largely architecture-independent; see
``A Manual for the Plan 9 Assembler'',
by Rob Pike,
for more information.
.PP
The loader is an integral part of the compilation process.
Each library header file contains a
.CW #pragma
that tells the loader the name of the associated archive; it is
not necessary to tell the loader which libraries a program uses.
The C run-time startup is found, by default, in the C library.
The loader starts with an undefined
symbol,
.CW _main ,
that is resolved by pulling in the run-time startup code from the library.
(The loader undefines
.CW _mainp
when profiling is enabled, to force loading of the profiling start-up
instead.)
.PP
Unlike its counterpart on other systems, the Plan 9 loader rearranges
data to optimize access. This means the order of variables in the
loaded program is unrelated to their order in the source.
Most programs don't care, but some assume that, for example, the
variables declared by
.P1
int a;
int b;
.P2
will appear at adjacent addresses in memory. On Plan 9, they won't.
.SH
Heterogeneity
.PP
When the system starts or a user logs in the environment is configured
so the appropriate binaries are available in
.CW /bin .
The configuration process is controlled by an environment variable,
.CW $cputype ,
with a value such as
.CW mips ,
.CW 386 ,
.CW arm ,
or
.CW sparc .
For each architecture there is a directory in the root,
with the appropriate name,
that holds the binary and library files for that architecture.
Thus
.CW /mips/lib
contains the object code libraries for MIPS programs,
.CW /mips/include
holds MIPS-specific include files, and
.CW /mips/bin
has the MIPS binaries.
These binaries are attached to
.CW /bin
at boot time by binding
.CW /$cputype/bin
to
.CW /bin ,
so
.CW /bin
always contains the correct files.
.PP
The MIPS compiler,
.CW vc ,
by definition
produces object files for the MIPS architecture,
regardless of the architecture of the machine on which the compiler is running.
There is a version of
.CW vc
compiled for each architecture:
.CW /mips/bin/vc ,
.CW /arm/bin/vc ,
.CW /sparc/bin/vc ,
and so on,
each capable of producing MIPS object files regardless of the native
instruction set.
If one is running on a SPARC,
.CW /sparc/bin/vc
will compile programs for the MIPS;
if one is running on machine
.CW $cputype ,
.CW /$cputype/bin/vc
will compile programs for the MIPS.
.PP
Because of the bindings that assemble
.CW /bin ,
the shell always looks for a command, say
.CW date ,
in
.CW /bin
and automatically finds the file
.CW /$cputype/bin/date .
Therefore the MIPS compiler is known as just
.CW vc ;
the shell will invoke
.CW /bin/vc
and that is guaranteed to be the version of the MIPS compiler
appropriate for the machine running the command.
Regardless of the architecture of the compiling machine,
.CW /bin/vc
is
.I always
the MIPS compiler.
.PP
Also, the output of
.CW vc
and
.CW vl
is completely independent of the machine type on which they are executed:
.CW \&.v
files compiled (with
.CW vc )
on a SPARC may be linked (with
.CW vl )
on a 386.
(The resulting
.CW v.out
will run, of course, only on a MIPS.)
Similarly, the MIPS libraries in
.CW /mips/lib
are suitable for loading with
.CW vl
on any machine; there is only one set of MIPS libraries, not one
set for each architecture that supports the MIPS compiler.
.SH
Heterogeneity and \f(CWmk\fP
.PP
Most software on Plan 9 is compiled under the control of
.CW mk ,
a descendant of
.CW make
that is documented in the Programmer's Manual.
A convention used throughout the
.CW mkfiles
makes it easy to compile the source into binary suitable for any architecture.
.PP
The variable
.CW $cputype
is advisory: it reports the architecture of the current environment, and should
not be modified. A second variable,
.CW $objtype ,
is used to set which architecture is being
.I compiled
for.
The value of
.CW $objtype
can be used by a
.CW mkfile
to configure the compilation environment.
.PP
In each machine's root directory there is a short
.CW mkfile
that defines a set of macros for the compiler, loader, etc.
Here is
.CW /mips/mkfile :
.P1
</sys/src/mkfile.proto

CC=vc
LD=vl
O=v
AS=va
.P2
The line
.P1
</sys/src/mkfile.proto
.P2
causes
.CW mk
to include the file
.CW /sys/src/mkfile.proto ,
which contains general definitions:
.P1
#
# common mkfile parameters shared by all architectures
#

OS=5689qv
CPUS=arm amd64 386 power mips
CFLAGS=-FTVw
LEX=lex
YACC=yacc
MK=/bin/mk
.P2
.CW CC
is obviously the compiler,
.CW AS
the assembler, and
.CW LD
the loader.
.CW O
is the suffix for the object files and
.CW CPUS
and
.CW OS
are used in special rules described below.
.PP
Here is a
.CW mkfile
to build the installed source for
.CW sam :
.P1
</$objtype/mkfile
OBJ=sam.$O address.$O buffer.$O cmd.$O disc.$O error.$O \e
	file.$O io.$O list.$O mesg.$O moveto.$O multi.$O \e
	plan9.$O rasp.$O regexp.$O string.$O sys.$O xec.$O

$O.out: $OBJ
	$LD $OBJ

install: $O.out
	cp $O.out /$objtype/bin/sam

installall:
	for(objtype in $CPUS) mk install

%.$O: %.c
	$CC $CFLAGS $stem.c

$OBJ: sam.h errors.h mesg.h
address.$O cmd.$O parse.$O xec.$O unix.$O: parse.h

clean:V:
	rm -f [$OS].out *.[$OS] y.tab.?
.P2
(The actual
.CW mkfile
imports most of its rules from other secondary files, but
this example works and is not misleading.)
The first line causes
.CW mk
to include the contents of
.CW /$objtype/mkfile
in the current
.CW mkfile .
If
.CW $objtype
is
.CW mips ,
this inserts the MIPS macro definitions into the
.CW mkfile .
In this case the rule for
.CW $O.out
uses the MIPS tools to build
.CW v.out .
The
.CW %.$O
rule in the file uses
.CW mk 's
pattern matching facilities to convert the source files to the object
files through the compiler.
(The text of the rules is passed directly to the shell,
.CW rc ,
without further translation.
See the
.CW mk
manual if any of this is unfamiliar.)
Because the default rule builds
.CW $O.out
rather than
.CW sam ,
it is possible to maintain binaries for multiple machines in the
same source directory without conflict.
This is also, of course, why the output files from the various
compilers and loaders
have distinct names.
.PP
The rest of the
.CW mkfile
should be easy to follow; notice how the rules for
.CW clean
and
.CW installall
(that is, install versions for all architectures) use other macros
defined in
.CW /$objtype/mkfile .
In Plan 9,
.CW mkfiles
for commands conventionally contain rules to
.CW install
(compile and install the version for
.CW $objtype ),
.CW installall
(compile and install for all
.CW $objtypes ),
and
.CW clean
(remove all object files, binaries, etc.).
.PP
The
.CW mkfile
is easy to use. To build a MIPS binary,
.CW v.out :
.P1
% objtype=mips
% mk
.P2
To build and install a MIPS binary:
.P1
% objtype=mips
% mk install
.P2
To build and install all versions:
.P1
% mk installall
.P2
These conventions make cross-compilation as easy to manage
as traditional native compilation.
Plan 9 programs compile and run without change on machines from
large multiprocessors to laptops. For more information about this process, see
``Plan 9 Mkfiles'',
by Bob Flandrena.
.SH
Portability
.PP
Within Plan 9, it is painless to write portable programs, programs whose
source is independent of the machine on which they execute.
The operating system is fixed and the compiler, headers and libraries
are constant so most of the stumbling blocks to portability are removed.
Attention to a few details can avoid those that remain.
.PP
Plan 9 is a heterogeneous environment, so programs must
.I expect
that external files will be written by programs on machines of different
architectures.
The compilers, for instance, must handle without confusion
object files written by other machines.
The traditional approach to this problem is to pepper the source with
.CW #ifdefs
to turn byte-swapping on and off.
Plan 9 takes a different approach: of the handful of machine-dependent
.CW #ifdefs
in all the source, almost all are deep in the libraries.
Instead programs read and write files in a defined format,
either (for low volume applications) as formatted text, or
(for high volume applications) as binary in a known byte order.
If the external data were written with the most significant
byte first, the following code reads a 4-byte integer correctly
regardless of the architecture of the executing machine (assuming
an unsigned long holds 4 bytes):
.P1
ulong
getlong(void)
{
	ulong l;

	l = (getchar()&0xFF)<<24;
	l |= (getchar()&0xFF)<<16;
	l |= (getchar()&0xFF)<<8;
	l |= (getchar()&0xFF)<<0;
	return l;
}
.P2
Note that this code does not `swap' the bytes; instead it just reads
them in the correct order.
Variations of this code will handle any binary format
and also avoid problems
involving how structures are padded, how words are aligned,
and other impediments to portability.
Be aware, though, that extra care is needed to handle floating point data.
.PP
Efficiency hounds will argue that this method is unnecessarily slow and clumsy
when the executing machine has the same byte order (and padding and alignment)
as the data.
The CPU cost of I/O processing
is rarely the bottleneck for an application, however,
and the gain in simplicity of porting and maintaining the code greatly outweighs
the minor speed loss from handling data in this general way.
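.PP
The writing side follows the same pattern.
Here is a sketch of a matching pair of routines that work on a byte buffer
instead of standard input; the names
.CW putlong
and
.CW getlongbuf
are invented for this example, and plain C types are used so the code
compiles without
.CW <u.h> .

```c
/* Store the low 32 bits of l in p[0..3], most significant byte first. */
void
putlong(unsigned char *p, unsigned long l)
{
	p[0] = (l>>24) & 0xFF;
	p[1] = (l>>16) & 0xFF;
	p[2] = (l>>8) & 0xFF;
	p[3] = l & 0xFF;
}

/* Read back a value stored by putlong, whatever the host byte order. */
unsigned long
getlongbuf(unsigned char *p)
{
	unsigned long l;

	/* widen each byte before shifting so no bit is lost in an int */
	l = (unsigned long)p[0]<<24;
	l |= (unsigned long)p[1]<<16;
	l |= (unsigned long)p[2]<<8;
	l |= (unsigned long)p[3];
	return l;
}
```

.LP
Both routines touch the data one byte at a time, so neither depends on
the byte order, padding, or alignment of the machine running them.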
This method is how the Plan 9 compilers, the window system, and even the file
servers transmit data between programs.
.PP
To port programs beyond Plan 9, where the system interface is more variable,
it is probably necessary to use
.CW pcc
and hope that the target machine supports ANSI C and POSIX.
.SH
I/O
.PP
The default C library, defined by the include file
.CW <libc.h> ,
contains no buffered I/O package.
It does have several entry points for printing formatted text:
.CW print
outputs text to the standard output,
.CW fprint
outputs text to a specified integer file descriptor, and
.CW sprint
places text in a character array.
To access library routines for buffered I/O, a program must
explicitly include the header file associated with an appropriate library.
.PP
The recommended I/O library, used by most Plan 9 utilities, is
.CW bio
(buffered I/O), defined by
.CW <bio.h> .
There also exists an implementation of ANSI Standard I/O,
.CW stdio .
.PP
.CW Bio
is small and efficient, particularly for buffer-at-a-time or
line-at-a-time I/O.
Even for character-at-a-time I/O, however, it is significantly faster than
the Standard I/O library,
.CW stdio .
Its interface is compact and regular, although it lacks a few conveniences.
The most noticeable is that one must explicitly define buffers for standard
input and output;
.CW bio
does not predefine them.
Here is a program to copy input to output a byte
at a time using
.CW bio :
.P1
#include <u.h>
#include <libc.h>
#include <bio.h>

Biobuf bin;
Biobuf bout;

void
main(void)
{
	int c;

	Binit(&bin, 0, OREAD);
	Binit(&bout, 1, OWRITE);

	while((c=Bgetc(&bin)) != Beof)
		Bputc(&bout, c);
	exits(0);
}
.P2
For peak performance, we could replace
.CW Bgetc
and
.CW Bputc
by their equivalent in-line macros
.CW BGETC
and
.CW BPUTC
but
the performance gain would be modest.
For more information on
.CW bio ,
see the Programmer's Manual.
.PP
Perhaps the most dramatic difference in the I/O interface of Plan 9 from other
systems' is that text is not ASCII.
The format for
text in Plan 9 is a byte-stream encoding of 16-bit characters.
The character set is based on the Unicode Standard and is backward compatible with
ASCII:
characters with value 0 through 127 are the same in both sets.
The 16-bit characters, called
.I runes
in Plan 9, are encoded using a representation called
UTF,
an encoding that is becoming accepted as a standard.
(ISO calls it UTF-8;
throughout Plan 9 it's just called
UTF.)
UTF
defines multibyte sequences to
represent character values from 0 to 65535.
In
UTF,
character values up to 127 decimal, 7F hexadecimal, represent themselves,
so straight
ASCII
files are also valid
UTF.
Also,
UTF
guarantees that bytes with values 0 to 127 (NUL to DEL, inclusive)
will appear only when they represent themselves, so programs that read bytes
looking for plain ASCII characters will continue to work.
Any program that expects a one-to-one correspondence between bytes and
characters will, however, need to be modified.
An example is parsing file names.
File names, like all text, are in
UTF,
so it is incorrect to search for a character in a string by
.CW strchr(filename,
.CW c)
because the character might have a multi-byte encoding.
The correct method is to call
.CW utfrune(filename,
.CW c) ,
defined in
.I rune (2),
which interprets the file name as a sequence of encoded characters
rather than bytes.
In fact, even when you know the character is a single byte
that can represent only itself,
it is safer to use
.CW utfrune
because that assumes nothing about the character set
and its representation.
.PP
The library defines several symbols relevant to the representation of characters.
Any byte with unsigned value less than
.CW Runesync
will not appear in any multi-byte encoding of a character.
.CW Utfrune
compares the character being searched against
.CW Runesync
to see if it is sufficient to call
.CW strchr
or if the byte stream must be interpreted.
Any byte with unsigned value less than
.CW Runeself
is represented by a single byte with the same value.
Finally, when errors are encountered converting
to runes from a byte stream, the library returns the rune value
.CW Runeerror
and advances a single byte. This permits programs to find runes
embedded in binary data.
.PP
.CW Bio
includes routines
.CW Bgetrune
and
.CW Bputrune
to transform the external byte stream
UTF
format to and from
internal 16-bit runes.
Also, the
.CW %s
format to
.CW print
accepts
UTF;
.CW %c
prints a character after narrowing it to 8 bits.
The
.CW %S
format prints a null-terminated sequence of runes;
.CW %C
prints a character after narrowing it to 16 bits.
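.PP
To make the encoding concrete, here is a sketch of how a 16-bit character
value maps to bytes.
It is an illustration written for this discussion, not the library's code
(the library routine for the job is
.CW runetochar );
the function name
.CW encode
is invented, and the constant
.CW Runeself
is given the value 0x80 to match the description above.

```c
enum {
	Runeself = 0x80	/* values below this encode as themselves */
};

/*
 * Encode the 16-bit character value r into buf as UTF.
 * Returns the number of bytes written: 1, 2, or 3.
 */
int
encode(unsigned char *buf, unsigned int r)
{
	r &= 0xFFFF;
	if(r < Runeself){
		buf[0] = r;
		return 1;
	}
	if(r < 0x800){
		buf[0] = 0xC0 | (r>>6);
		buf[1] = 0x80 | (r & 0x3F);
		return 2;
	}
	buf[0] = 0xE0 | (r>>12);
	buf[1] = 0x80 | ((r>>6) & 0x3F);
	buf[2] = 0x80 | (r & 0x3F);
	return 3;
}
```

.LP
Every byte of a multi-byte sequence has its high bit set, which is why
a byte below 0x80 can only stand for itself and why a byte-oriented
.CW strchr
fails for any character above that range.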
For more information, see the Programmer's Manual, in particular
.I utf (6)
and
.I rune (2),
and the paper,
``Hello world, or
Καλημέρα κόσμε, or\
\f(Jpこんにちは 世界\f1'',
by Rob Pike and
Ken Thompson;
there is not room for the full story here.
.PP
These issues affect the compiler in several ways.
First, the C source is in
UTF.
ANSI says C variables are formed from
ASCII
alphanumerics, but comments and literal strings may contain any characters
encoded in the native encoding, here
UTF.
The declaration
.P1
char *cp = "abcÿ";
.P2
initializes the variable
.CW cp
to point to an array of bytes holding the
UTF
representation of the characters
.CW abcÿ .
The type
.CW Rune
is defined in
.CW <u.h>
to be
.CW ushort ,
which is also the `wide character' type in the compiler.
Therefore the declaration
.P1
Rune *rp = L"abcÿ";
.P2
initializes the variable
.CW rp
to point to an array of unsigned short integers holding the 16-bit
values of the characters
.CW abcÿ .
Note that in both these declarations the characters in the source
that represent
.CW "abcÿ"
are the same; what changes is how those characters are represented
in memory in the program.
The following two lines:
.P1
print("%s\en", "abcÿ");
print("%S\en", L"abcÿ");
.P2
produce the same
UTF
string on their output, the first by copying the bytes, the second
by converting from runes to bytes.
.PP
In C, character constants are integers but narrowed through the
.CW char
type.
The Unicode character
.CW ÿ
has value 255, so if the
.CW char
type is signed,
the constant
.CW 'ÿ'
has value \-1 (which is equal to EOF).
On the other hand,
.CW L'ÿ'
narrows through the wide character type,
.CW ushort ,
and therefore has value 255.
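.PP
The arithmetic behind those two values can be checked with a small sketch.
It assumes an 8-bit two's complement
.CW signed
.CW char
and a 16-bit
.CW unsigned
.CW short ,
as on the machines discussed here; the function names are invented for
the illustration.

```c
/* Value of a character constant narrowed through a signed 8-bit char. */
int
narrowchar(int c)
{
	/* the result for values above 127 is implementation-defined in ANSI C */
	return (signed char)c;
}

/* Value of a wide character constant narrowed through unsigned short. */
int
narrowwide(int c)
{
	return (unsigned short)c;
}
```

.LP
Under those assumptions
.CW narrowchar(255)
yields \-1 while
.CW narrowwide(255)
yields 255, mirroring
.CW 'ÿ'
and
.CW L'ÿ' .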
.PP
Finally, although it's not ANSI C, the Plan 9 C compilers
assume any character with value above
.CW Runeself
is an alphanumeric,
so α is a legal, if non-portable, variable name.
.SH
Arguments
.PP
Some macros are defined
in
.CW <libc.h>
for parsing the arguments to
.CW main() .
They are described in
.I ARG (2)
but are fairly self-explanatory.
There are four macros:
.CW ARGBEGIN
and
.CW ARGEND
are used to bracket a hidden
.CW switch
statement within which
.CW ARGC
returns the current option character (rune) being processed and
.CW ARGF
returns the argument to the option, as in the loader option
.CW -o
.CW file .
Here, for example, is the code at the beginning of
.CW main()
in
.CW ramfs.c
(see
.I ramfs (1))
that cracks its arguments:
.P1
void
main(int argc, char *argv[])
{
	char *defmnt;
	int p[2];
	int mfd[2];
	int stdio = 0;

	defmnt = "/tmp";
	ARGBEGIN{
	case 'i':
		defmnt = 0;
		stdio = 1;
		mfd[0] = 0;
		mfd[1] = 1;
		break;
	case 's':
		defmnt = 0;
		break;
	case 'm':
		defmnt = ARGF();
		break;
	default:
		usage();
	}ARGEND
.P2
.SH
Extensions
.PP
The compiler has several extensions to 1989 ANSI C, all of which are used
extensively in the system source.
Some of these have been adopted in later ANSI C standards.
First,
.I structure
.I displays
permit
.CW struct
expressions to be formed dynamically.
Given these declarations:
.P1
typedef struct Point Point;
typedef struct Rectangle Rectangle;

struct Point
{
	int x, y;
};

struct Rectangle
{
	Point min, max;
};

Point p, q, add(Point, Point);
Rectangle r;
int x, y;
.P2
this assignment may appear anywhere an assignment is legal:
.P1
r = (Rectangle){add(p, q), (Point){x, y+3}};
.P2
The syntax is the same as for initializing a structure but with
a leading cast.
.PP
If an
.I anonymous
.I structure
or
.I union
is declared within another structure or union, the members of the internal
structure or union are addressable without prefix in the outer structure.
This feature eliminates the clumsy naming of nested structures and,
particularly, unions.
For example, after these declarations,
.P1
struct Lock
{
	int locked;
};

struct Node
{
	int type;
	union{
		double dval;
		double fval;
		long lval;
	};	/* anonymous union */
	struct Lock;	/* anonymous structure */
} *node;

void lock(struct Lock*);
.P2
one may refer to
.CW node->type ,
.CW node->dval ,
.CW node->fval ,
.CW node->lval ,
and
.CW node->locked .
Moreover, the address of a
.CW struct
.CW Node
may be used without a cast anywhere that the address of a
.CW struct
.CW Lock
is used, such as in argument lists.
The compiler automatically promotes the type and adjusts the address.
Thus one may invoke
.CW lock(node) .
.PP
Anonymous structures and unions may be accessed by type name
if (and only if) they are declared using a
.CW typedef
name.
For example, using the above declaration for
.CW Point ,
one may declare
.P1
struct
{
	int type;
	Point;
} p;
.P2
and refer to
.CW p.Point .
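.PP
Two of these extensions have close relatives in later standards:
C99 calls structure displays compound literals, and C11 accepts untagged
anonymous unions.
The sketch below uses only those standard forms so that it compiles with a
C11 compiler; the helper names
.CW mkrect
and
.CW intnode
are invented, and the tagged
.CW struct
.CW Lock;
member and access by
.CW typedef
name are left out because they are assumed here to remain particular to
the Plan 9 dialect.

```c
typedef struct Point Point;
typedef struct Rectangle Rectangle;

struct Point
{
	int x, y;
};

struct Rectangle
{
	Point min, max;
};

struct Node
{
	int type;
	union{
		double dval;
		long lval;
	};	/* anonymous union: members are named directly */
};

Point
add(Point a, Point b)
{
	return (Point){a.x+b.x, a.y+b.y};
}

/* Build a Rectangle in a single expression, as in the assignment to r above. */
Rectangle
mkrect(Point p, Point q, int x, int y)
{
	return (Rectangle){add(p, q), (Point){x, y+3}};
}

/* Store an integer in a Node without naming the union. */
struct Node
intnode(long v)
{
	struct Node n;

	n.type = 1;	/* hypothetical tag meaning "lval is valid" */
	n.lval = v;
	return n;
}
```

.LP
The display is an ordinary expression, so it can be returned, assigned,
or passed as an argument without a named temporary.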
.PP
In the initialization of arrays, a number in square brackets before an
element sets the index for the initialization. For example, to initialize
some elements in
a table of function pointers indexed by
ASCII
character,
.P1
void percent(void), slash(void);

void (*func[128])(void) =
{
	['%'] percent,
	['/'] slash,
};
.P2
.LP
A similar syntax allows one to initialize structure elements:
.P1
Point p =
{
	.y 100,
	.x 200
};
.P2
These initialization syntaxes were later added to ANSI C, with the addition of an
equals sign between the index or tag and the value.
The Plan 9 compiler accepts either form.
.PP
Finally, the declaration
.P1
extern register reg;
.P2
.I this "" (
appearance of the register keyword is not ignored)
allocates a global register to hold the variable
.CW reg .
External registers must be used carefully: they need to be declared in
.I all
source files and libraries in the program to guarantee the register
is not allocated temporarily for other purposes.
Especially on machines with few registers, such as the i386,
it is easy to link accidentally with code that has already usurped
the global registers and there is no diagnostic when this happens.
Used wisely, though, external registers are powerful.
The Plan 9 operating system uses them to access per-process and
per-machine data structures on a multiprocessor. The storage class they provide
is hard to create in other ways.
.SH
The compile-time environment
.PP
The code generated by the compilers is `optimized' by default:
variables are placed in registers and peephole optimizations are
performed.
The compiler flag
.CW -N
disables these optimizations.
Registerization is done locally rather than throughout a function:
whether a variable occupies a register or
the memory location identified in the symbol
table depends on the activity of the variable and may change
throughout the life of the variable.
The
.CW -N
flag is rarely needed;
its main use is to simplify debugging.
There is no information in the symbol table to identify the
registerization of a variable, so
.CW -N
guarantees the variable is always where the symbol table says it is.
.PP
Another flag,
.CW -w ,
turns
.I on
warnings about portability and problems detected in flow analysis.
Most code in Plan 9 is compiled with warnings enabled;
these warnings plus the type checking offered by function prototypes
provide most of the support of the Unix tool
.CW lint
more accurately and with less chatter.
Two of the warnings,
`used and not set' and `set and not used', are almost always accurate but
may be triggered spuriously by code with invisible control flow,
such as in routines that call
.CW longjmp .
The compiler statements
.P1
SET(v1);
USED(v2);
.P2
decorate the flow graph to silence the compiler.
Either statement accepts a comma-separated list of variables.
Use them carefully: they may silence real errors.
For the common case of unused parameters to a function,
leaving the name off the declaration silences the warnings.
That is, listing the type of a parameter but giving it no
associated variable name does the trick.
.SH
Debugging
.PP
There are two debuggers available on Plan 9.
The first, and older, is
.CW db ,
a revision of Unix
.CW adb .
The other,
.CW acid ,
is a source-level debugger whose commands are statements in
a true programming language.
.CW Acid
is the preferred debugger, but since it
borrows some elements of
.CW db ,
notably the formats for displaying values, it is worth knowing a little bit about
.CW db .
.PP
Both debuggers support multiple architectures in a single program; that is,
the programs are
.CW db
and
.CW acid ,
not for example
.CW vdb
and
.CW vacid .
They also support cross-architecture debugging comfortably:
one may debug a 386 binary on a MIPS.
.PP
Imagine a program has crashed mysteriously:
.P1
% X11/X
Fatal server bug!
failed to create default stipple
X 106: suicide: sys: trap: fault read addr=0x0 pc=0x00105fb8
%
.P2
When a process dies on Plan 9 it hangs in the `broken' state
for debugging.
Attach a debugger to the process by naming its process id:
.P1
% acid 106
/proc/106/text:mips plan 9 executable

/sys/lib/acid/port
/sys/lib/acid/mips
acid:
.P2
The
.CW acid
function
.CW stk()
reports the stack traceback:
.P1
acid: stk()
At pc:0x105fb8:abort+0x24 /sys/src/ape/lib/ap/stdio/abort.c:6
abort() /sys/src/ape/lib/ap/stdio/abort.c:4
	called from FatalError+#4e
		/sys/src/X/mit/server/dix/misc.c:421
FatalError(s9=#e02, s8=#4901d200, s7=#2, s6=#72701, s5=#1,
	s4=#7270d, s3=#6, s2=#12, s1=#ff37f1c, s0=#6, f=#7270f)
	/sys/src/X/mit/server/dix/misc.c:416
	called from gnotscreeninit+#4ce
		/sys/src/X/mit/server/ddx/gnot/gnot.c:792
gnotscreeninit(snum=#0, sc=#80db0)
	/sys/src/X/mit/server/ddx/gnot/gnot.c:766
	called from AddScreen+#16e
		/n/bootes/sys/src/X/mit/server/dix/main.c:610
AddScreen(pfnInit=0x0000129c,argc=0x00000001,argv=0x7fffffe4)
	/sys/src/X/mit/server/dix/main.c:530
	called from InitOutput+0x80
		/sys/src/X/mit/server/ddx/brazil/brddx.c:522
InitOutput(argc=0x00000001,argv=0x7fffffe4)
	/sys/src/X/mit/server/ddx/brazil/brddx.c:511
	called from main+0x294
		/sys/src/X/mit/server/dix/main.c:225
main(argc=0x00000001,argv=0x7fffffe4)
	/sys/src/X/mit/server/dix/main.c:136
	called from _main+0x24
		/sys/src/ape/lib/ap/mips/main9.s:8
.P2
The function
.CW lstk()
is similar but
also reports the values of local variables.
Note that the traceback includes full file names; this is a boon to debugging,
although it makes the output much noisier.
.PP
To use
.CW acid
well you will need to learn its input language; see the
``Acid Manual'',
by Phil Winterbottom,
for details. For simple debugging, however, the information in the manual page is
sufficient. In particular, it describes the most useful functions
for examining a process.
.PP
The compiler does not place
information describing the types of variables in the executable,
but a compile-time flag provides crude support for symbolic debugging.
The
.CW -a
flag to the compiler suppresses code generation
and instead emits source text in the
.CW acid
language to format and display data structure types defined in the program.
The easiest way to use this feature is to put a rule in the
.CW mkfile :
.P1
syms: main.$O
	$CC -a main.c > syms
.P2
Then from within
.CW acid ,
.P1
acid: include("sourcedirectory/syms")
.P2
to read in the relevant definitions.
(For multi-file source, you need to be a little fancier;
see
.I 8c (1)).
This text includes, for each defined compound
type, a function with that name that may be called with the address of a structure
of that type to display its contents.
For example, if
.CW rect
is a global variable of type
.CW Rectangle ,
one may execute
.P1
Rectangle(*rect)
.P2
to display it.
The
.CW *
(indirection) operator is necessary because
of the way
.CW acid
works: each global symbol in the program is defined as a variable by
.CW acid ,
with value equal to the
.I address
of the symbol.
.PP
Another common technique is to write by hand special
.CW acid
code to define functions to aid debugging, initialize the debugger, and so on.
Conventionally, this is placed in a file called
.CW acid
in the source directory; it has a line
.P1
include("sourcedirectory/syms");
.P2
to load the compiler-produced symbols. One may edit the compiler output directly but
it is wiser to keep the hand-generated
.CW acid
separate from the machine-generated.
.PP
To make things simple, the default rules in the system
.CW mkfiles
include entries to make
.CW foo.acid
from
.CW foo.c ,
so one may use
.CW mk
to automate the production of
.CW acid
definitions for a given C source file.
.PP
There is much more to say here. See the
.CW acid
manual page, the reference manual, or the paper
``Acid: A Debugger Built From A Language'',
also by Phil Winterbottom.