.TL
How to Use the Plan 9 C Compiler
.AU
Rob Pike
rob@plan9.att.com
.SH
Introduction
.PP
The C compiler on Plan 9 is a wholly new program; in fact
it was the first piece of software written for what would
eventually become Plan 9 from Bell Labs.
Programmers familiar with existing C compilers will find
a number of differences in both the language the Plan 9 compiler
accepts and in how the compiler is used.
.PP
The compiler is really a set of compilers, one for each
architecture \(em MIPS, SPARC, Motorola 68020, Intel 386, etc. \(em
that accept a dialect of ANSI C and efficiently produce
fairly good code for the target machine.
There is a packaging of the compiler that accepts strict ANSI C for
a POSIX environment, but this document focuses on the
native Plan 9 environment, that in which all the system source and
almost all the utilities are written.
.SH
Source
.PP
The language accepted by the compilers is the core ANSI C language
with some modest extensions,
a greatly simplified preprocessor,
a smaller library that includes system calls and related facilities,
and a completely different structure for include files.
.PP
Official ANSI C accepts the old (K&R) style of declarations for
functions; the Plan 9 compilers
are more demanding.
Without an explicit run-time flag
.CW -B ) (
whose use is discouraged, the compilers insist
on new-style function declarations, that is, prototypes for
function arguments.
The function declarations in the libraries' include files are
all in the new style so the interfaces are checked at compile time.
For C programmers who have not yet switched to function prototypes
the clumsy syntax may seem repellent but the payoff in stronger typing
is substantial.
Those who wish to import existing software to Plan 9 are urged
to use the opportunity to update their code.
.PP
The compilers include an integrated preprocessor that accepts the familiar
.CW #include ,
.CW #define
for macros both with and without arguments,
.CW #undef ,
.CW #line ,
.CW #ifdef ,
.CW #ifndef ,
and
.CW #endif .
It
supports neither
.CW #if
nor
.CW ##
and honors a single
.CW #pragma .
The
.CW #if
directive was omitted because it greatly complicates the
preprocessor, is never necessary, and is usually abused.
Conditional compilation in general makes code hard to understand;
the Plan 9 source uses it sparingly.
Also, because the compilers remove dead code, regular
.CW if
statements with constant conditions are more readable equivalents to many
.CW #ifs .
To compile imported code ineluctably fouled by
.CW #if
there is a separate command,
.CW /bin/cpp ,
that implements the complete ANSI C preprocessor specification.
.PP
Include files fall into two groups: machine-dependent and machine-independent.
The machine-independent files occupy the directory
.CW /sys/include ;
the others are placed in a directory appropriate to the machine, such as
.CW /mips/include .
The compiler searches for include files
first in the machine-dependent directory and then
in the machine-independent directory.
At the time of writing there are twenty-two machine-independent include
files and three (per machine) machine-dependent ones:
.CW <ureg.h> ,
.CW <stdarg.h> ,
and
.CW <u.h> .
The first describes the layout of registers on the system stack,
for use by the debugger;
the second, as in ANSI C, defines a portable way to declare variadic
functions.
The third defines some
architecture-dependent types such as
.CW jmp_buf
for
.CW setjmp
and
also a set of
.CW typedef
abbreviations for
.CW unsigned
.CW short
and so on.
.PP
Here is an excerpt from
.CW /68020/include/u.h :
.P1
typedef unsigned short ushort;
typedef unsigned char uchar;
typedef unsigned long ulong;
typedef unsigned int uint;
typedef signed char schar;
typedef long vlong;

typedef long jmp_buf[2];
#define JMPBUFSP 0
#define JMPBUFPC 1
#define JMPBUFDPC 0
.P2
The type
.CW vlong
is the largest integer type available; on some architectures it
is a 64-bit value.
The
.CW #define
constants permit an architecture-independent (but compiler-dependent)
implementation of stack-switching using
.CW setjmp
and
.CW longjmp .
.PP
Every Plan 9 C program begins
.P1
#include <u.h>
.P2
because all the other installed header files use the
.CW typedefs
declared in
.CW <u.h> .
.PP
In strict ANSI C, include files are grouped to collect related functions
in a single file: one for string functions, one for memory functions,
one for I/O, and none for system calls.
Each include file is protected by an
.CW #ifdef
to guarantee its contents are seen by the compiler only once.
Plan 9 takes a different approach. Other than a few include
files that define external formats such as archives, the files in
.CW /sys/include
correspond to
.I libraries.
If a program is using a library, it includes the corresponding header.
The default C library comprises string functions, memory functions, and
so on, largely as in ANSI C, some formatted I/O routines,
plus all the system calls and related functions.
To use these functions, one must
.CW #include
the file
.CW <libc.h> ,
which in turn must follow
.CW <u.h> ,
to define their prototypes for the compiler.
Here is the complete source to the traditional first C program:
.P1
#include <u.h>
#include <libc.h>

void
main(void)
{
	print("hello world\en");
	exits(0);
}
.P2
The
.CW print
routine and its relatives
.CW fprint
and
.CW sprint
resemble the similarly-named functions in Standard I/O but are not
attached to a specific I/O library.
In Plan 9
.CW main
is not integer-valued; it should call
.CW exits ,
which takes a string argument (or null; here ANSI C promotes the 0 to a
.CW char* ).
All these functions are, of course, documented in the Programmer's Manual.
.PP
To use
.CW printf ,
.CW <stdio.h>
must be included to define the function prototype for
.CW printf :
.P1
#include <u.h>
#include <libc.h>
#include <stdio.h>

void
main(int argc, char *argv[])
{
	printf("%s: hello world; argc = %d\en", argv[0], argc);
	exits(0);
}
.P2
In practice, Standard I/O is not used much in Plan 9. I/O libraries are
discussed in a later section of this document.
.PP
There are libraries for handling regular expressions, bitmap graphics,
windows, and so on, and each has an associated include file.
The manual for each library states which include files are needed.
The files are not protected against multiple inclusion and themselves
contain no nested
.CW #includes .
Instead the
programmer is expected to sort out the requirements
and to
.CW #include
the necessary files once at the top of each source file. In practice this is
trivial: this way of handling include files is so straightforward
that it is rare for a source file to contain more than half a dozen
.CW #includes .
.PP
The compilers do their own register allocation so the
.CW register
keyword is ignored.
For different reasons,
.CW volatile
and
.CW const
are also ignored.
.PP
To make it easier to share code with other systems, Plan 9 has a version
of the compiler,
.CW pcc ,
that provides the standard ANSI C preprocessor, headers, and libraries
with POSIX extensions.
.CW Pcc
is recommended only
when broad external portability is mandated. It compiles slower,
produces slower code (it takes extra work to simulate POSIX on Plan 9),
eliminates those parts of the Plan 9 interface
not related to POSIX, and illustrates the clumsiness of an environment
designed by committee.
.CW Pcc
is described in more detail in
.I
APE\(emThe ANSI/POSIX Environment,
.R
by Howard Trickey.
.SH
Process
.PP
Each CPU architecture supported by Plan 9 is identified by a single,
arbitrary, alphanumeric character:
.CW v
for MIPS,
.CW k
for SPARC,
.CW x
for AT&T DSP3210,
.CW 2
for Motorola 68020 and 68040,
.CW 8
for Intel 386, and
.CW 6
for Intel 960.
The character labels the support tools and files for that architecture.
For instance, for the 68020 the compiler is
.CW 2c ,
the assembler is
.CW 2a ,
the link editor/loader is
.CW 2l ,
the object files are suffixed
.CW \&.2 ,
and the default name for an executable file is
.CW 2.out .
Before we can use the compiler we therefore need to know which
machine we are compiling for.
The next section explains how this decision is made; for the moment
assume we are building 68020 binaries and make the mental substitution for
.CW 2
appropriate to the machine you are actually using.
.PP
To convert source to an executable binary is a two-step process.
First run the compiler,
.CW 2c ,
on the source, say
.CW file.c ,
to generate an object file
.CW file.2 .
Then run the loader,
.CW 2l ,
to generate an executable
.CW 2.out
that may be run (on a 680X0 machine):
.P1
2c file.c
2l file.2
2.out
.P2
The loader automatically links with whatever libraries the program
needs, usually including the standard C library as defined by
.CW <libc.h> .
Of course the compiler and loader have lots of options, both familiar and new;
see the manual for details.
The compiler does not generate an executable automatically;
the output of the compiler must be given to the loader.
Since most compilation is done under the control of
.CW mk
(see below), this is rarely an inconvenience.
.PP
The distribution of work between the compiler and loader is unusual.
The compiler integrates preprocessing, parsing, register allocation,
code generation and some assembly.
Combining these tasks in a single program is part of the reason for
the compiler's efficiency.
The loader does instruction selection, branch folding,
instruction scheduling,
and writes the final executable.
There is no separate C preprocessor and no assembler in the usual pipeline.
Instead the intermediate object file
(here a
.CW \&.2
file) is a type of binary assembly language.
The instructions in the intermediate format are not exactly those in
the machine. For example, on the 68020 the object file may specify
a MOVE instruction but the loader will decide just which variant of
the MOVE instruction \(em MOVE immediate, MOVE quick, MOVE address,
etc. \(em is most efficient.
.PP
The assembler,
.CW 2a ,
is just a translator between the textual and binary
representations of the object file format.
It is not an assembler in the traditional sense. It has limited
macro capabilities (the same as the integral C preprocessor in the compiler),
clumsy syntax, and minimal error checking.
For instance, the assembler
will accept an instruction (such as memory-to-memory MOVE on the MIPS) that the
machine does not actually support; only when the output of the assembler
is passed to the loader will the error be discovered.
The assembler is intended only for writing things that need access to instructions
invisible from C,
such as the machine-dependent
part of an operating system;
very little code in Plan 9 is in assembly language.
.PP
The compilers take an option
.CW -S
that causes them to print on their standard output the generated code
in a format acceptable as input to the assemblers.
This is of course merely a formatting of the
data in the object file; therefore the assembler is just
an
ASCII-to-binary converter for this format.
Other than the specific instructions, the input to the assemblers
is largely architecture-independent; see
``A Manual for the Plan 9 Assembler'',
by Rob Pike,
for more information.
.PP
The loader is an integral part of the compilation process.
Each library header file contains a
.CW #pragma
that tells the loader the name of the associated archive; it is
not necessary to tell the loader which libraries a program uses.
The C run-time startup is found, by default, in the C library.
The loader starts with an undefined
symbol,
.CW _main ,
that is resolved by pulling in the run-time startup code from the library.
(The loader undefines
.CW _mainp
when profiling is enabled, to force loading of the profiling start-up
instead.)
.PP
Unlike its counterpart on other systems, the Plan 9 loader rearranges
data to optimize access. This means the order of variables in the
loaded program is unrelated to its order in the source.
Most programs don't care, but some assume that, for example, the
variables declared by
.P1
int a;
int b;
.P2
will appear at adjacent addresses in memory.
On Plan 9, they won't.
.SH
Heterogeneity
.PP
When the system starts or a user logs in, the environment is configured
so the appropriate binaries are available in
.CW /bin .
The configuration process is controlled by an environment variable,
.CW $cputype ,
with a value such as
.CW mips ,
.CW 68020 ,
or
.CW sparc .
For each architecture there is a directory in the root,
with the appropriate name,
that holds the binary and library files for that architecture.
Thus
.CW /mips/lib
contains the object code libraries for MIPS programs,
.CW /mips/include
holds MIPS-specific include files, and
.CW /mips/bin
has the MIPS binaries.
These binaries are attached to
.CW /bin
at boot time by binding
.CW /$cputype/bin
to
.CW /bin ,
so
.CW /bin
always contains the correct files.
.PP
The MIPS compiler,
.CW vc ,
by definition
produces object files for the MIPS architecture,
regardless of the architecture of the machine on which the compiler is running.
There is a version of
.CW vc
compiled for each architecture:
.CW /mips/bin/vc ,
.CW /68020/bin/vc ,
.CW /sparc/bin/vc ,
and so on,
each capable of producing MIPS object files regardless of the native
instruction set.
If one is running on a SPARC,
.CW /sparc/bin/vc
will compile programs for the MIPS;
if one is running on machine
.CW $cputype ,
.CW /$cputype/bin/vc
will compile programs for the MIPS.
.PP
Because of the bindings that assemble
.CW /bin ,
the shell always looks for a command, say
.CW date ,
in
.CW /bin
and automatically finds the file
.CW /$cputype/bin/date .
Therefore the MIPS compiler is known as just
.CW vc ;
the shell will invoke
.CW /bin/vc
and that is guaranteed to be the version of the MIPS compiler
appropriate for the machine running the command.
Regardless of the architecture of the compiling machine,
.CW /bin/vc
is
.I always
the MIPS compiler.
.PP
Also, the output of
.CW vc
and
.CW vl
is completely independent of the machine type on which they are executed:
.CW \&.v
files compiled (with
.CW vc )
on a SPARC may be linked (with
.CW vl )
on a 386.
(The resulting
.CW v.out
will run, of course, only on a MIPS.)
Similarly, the MIPS libraries in
.CW /mips/lib
are suitable for loading with
.CW vl
on any machine; there is only one set of MIPS libraries, not one
set for each architecture that supports the MIPS compiler.
.SH
Heterogeneity and \f(CWmk\fP
.PP
Most software on Plan 9 is compiled under the control of
.CW mk ,
a descendant of
.CW make
that is documented in the Programmer's Manual.
A convention used throughout the
.CW mkfiles
makes it easy to compile the source into binary suitable for any architecture.
.PP
The variable
.CW $cputype
is advisory: it reports the architecture of the current environment, and should
not be modified. A second variable,
.CW $objtype ,
is used to set which architecture is being
.I compiled
for.
The value of
.CW $objtype
can be used by a
.CW mkfile
to configure the compilation environment.
.PP
In each machine's root directory there is a short
.CW mkfile
that defines a set of macros for the compiler, loader, etc.
Here is
.CW /mips/mkfile :
.P1
CC=vc
ALEF=val
LD=vl
O=v
AS=va
OS=2kv86x
CPUS=mips 68020 sparc 386
CFLAGS=
LEX=lex
YACC=yacc
MK=/bin/mk
.P2
.CW CC
is obviously the compiler,
.CW AS
the assembler, and
.CW LD
the loader.
.CW ALEF
identifies the Alef compiler, described below.
.CW O
is the suffix for the object files and
.CW CPUS
and
.CW OS
are used in special rules described below.
.PP
Here is a
.CW mkfile
to build the installed source for
.CW sam :
.P1
</$objtype/mkfile
OBJ=sam.$O address.$O buffer.$O cmd.$O disc.$O error.$O \e
	file.$O io.$O list.$O mesg.$O moveto.$O multi.$O \e
	plan9.$O rasp.$O regexp.$O string.$O sys.$O xec.$O

$O.out: $OBJ
	$LD $OBJ

install: $O.out
	cp $O.out /$objtype/bin/sam

installall:
	for(objtype in $CPUS) mk install

%.$O: %.c
	$CC $CFLAGS $stem.c

$OBJ: sam.h errors.h mesg.h
address.$O cmd.$O parse.$O xec.$O unix.$O: parse.h

clean:V:
	rm -f [$OS].out *.[$OS] y.tab.?
.P2
(The actual
.CW mkfile
imports most of its rules from other secondary files, but
this example works and is not misleading.)
The first line causes
.CW mk
to include the contents of
.CW /$objtype/mkfile
in the current
.CW mkfile .
If
.CW $objtype
is
.CW mips ,
this inserts the MIPS macro definitions into the
.CW mkfile .
In this case the rule for
.CW $O.out
uses the MIPS tools to build
.CW v.out .
The
.CW %.$O
rule in the file uses
.CW mk 's
pattern matching facilities to convert the source files to the object
files through the compiler.
(The text of the rules is passed directly to the shell,
.CW rc ,
without further translation.
See the
.CW mk
manual if any of this is unfamiliar.)
Because the default rule builds
.CW $O.out
rather than
.CW sam ,
it is possible to maintain binaries for multiple machines in the
same source directory without conflict.
This is also, of course, why the output files from the various
compilers and loaders
have distinct names.
.PP
The rest of the
.CW mkfile
should be easy to follow; notice how the rules for
.CW clean
and
.CW installall
(that is, install versions for all architectures) use other macros
defined in
.CW /$objtype/mkfile .
In Plan 9,
.CW mkfiles
for commands conventionally contain rules to
.CW install
(compile and install the version for
.CW $objtype ),
.CW installall
(compile and install for all
.CW $objtypes ),
and
.CW clean
(remove all object files, binaries, etc.).
.PP
The
.CW mkfile
is easy to use. To build a MIPS binary,
.CW v.out :
.P1
% objtype=mips
% mk
.P2
To build and install a MIPS binary:
.P1
% objtype=mips
% mk install
.P2
To build and install all versions:
.P1
% mk installall
.P2
These conventions make cross-compilation as easy to manage
as traditional native compilation.
Plan 9 programs compile and run without change on machines from
large multiprocessors to laptops. For more information about this process, see
``Plan 9 Mkfiles'',
by Bob Flandrena.
.SH
Portability
.PP
Within Plan 9, it is painless to write portable programs, programs whose
source is independent of the machine on which they execute.
The operating system is fixed and the compiler, headers and libraries
are constant so most of the stumbling blocks to portability are removed.
Attention to a few details can avoid those that remain.
.PP
Plan 9 is a heterogeneous environment, so programs must
.I expect
that external files will be written by programs on machines of different
architectures.
The compilers, for instance, must handle without confusion
object files written by other machines.
The traditional approach to this problem is to pepper the source with
.CW #ifdefs
to turn byte-swapping on and off.
Plan 9 takes a different approach: of the handful of machine-dependent
.CW #ifdefs
in all the source, almost all are deep in the libraries.
Instead programs read and write files in a defined format,
either (for low volume applications) as formatted text, or
(for high volume applications) as binary in a known byte order.
If the external data were written with the most significant
byte first, the following code reads a 4-byte integer correctly
regardless of the architecture of the executing machine (assuming
an unsigned long holds 4 bytes):
.P1
ulong
getlong(void)
{
	ulong l;

	l = (getchar()&0xFF)<<24;
	l |= (getchar()&0xFF)<<16;
	l |= (getchar()&0xFF)<<8;
	l |= (getchar()&0xFF)<<0;
	return l;
}
.P2
Note that this code does not `swap' the bytes; instead it just reads
them in the correct order.
Variations of this code will handle any binary format
and also avoid problems
involving how structures are padded, how words are aligned,
and other impediments to portability.
Be aware, though, that extra care is needed to handle floating point data.
.PP
Efficiency hounds will argue that this method is unnecessarily slow and clumsy
when the executing machine has the same byte order (and padding and alignment)
as the data.
I/O speed is rarely the bottleneck for an application, however,
and the gain in simplicity of porting and maintaining the code greatly outweighs
the minor speed loss from handling data in this general way.
This method is how the Plan 9 compilers, the window system, and even the file
servers transmit data between programs.
.PP
To port programs beyond Plan 9, where the system interface is more variable,
it is probably necessary to use
.CW pcc
and hope that the target machine supports ANSI C and POSIX.
.SH
I/O
.PP
The default C library, defined by the include file
.CW <libc.h> ,
contains no buffered I/O package.
It does have several entry points for printing formatted text:
.CW print
outputs text to the standard output,
.CW fprint
outputs text to a specified integer file descriptor, and
.CW sprint
places text in a character array.
To access library routines for buffered I/O, a program must
explicitly include the header file associated with an appropriate library.
.PP
The recommended I/O library, used by most Plan 9 utilities, is
.CW bio
(buffered I/O), defined by
.CW <bio.h> .
There also exists an implementation of ANSI Standard I/O,
.CW stdio .
.PP
.CW Bio
is small and efficient, particularly for buffer-at-a-time or
line-at-a-time I/O.
Even for character-at-a-time I/O, however, it is significantly faster than
the Standard I/O library,
.CW stdio .
Its interface is compact and regular, although it lacks a few conveniences.
The most noticeable is that one must explicitly define buffers for standard
input and output;
.CW bio
does not predefine them. Here is a program to copy input to output a character
at a time using
.CW bio :
.P1
#include <u.h>
#include <libc.h>
#include <bio.h>

Biobuf bin;
Biobuf bout;

void
main(void)
{
	int c;

	Binit(&bin, 0, OREAD);
	Binit(&bout, 1, OWRITE);

	while((c=Bgetc(&bin)) != Beof)
		Bputc(&bout, c);
	exits(0);
}
.P2
For peak performance, we could replace
.CW Bgetc
and
.CW Bputc
by their equivalent in-line macros
.CW BGETC
and
.CW BPUTC
but
the performance gain would be modest.
For more information on
.CW bio ,
see the Programmer's Manual.
.PP
Perhaps the most dramatic difference in the I/O interface of Plan 9 from other
systems' is that text is not ASCII.
The format for
text in Plan 9 is a byte-stream encoding of 16-bit characters.
The character set is based on the Unicode Standard and is backward compatible with
ASCII:
characters with value 0 through 127 are the same in both sets.
The 16-bit characters, called
.I runes
in Plan 9, are encoded using a representation called
UTF,
an encoding that is becoming accepted as a standard.
(ISO calls it UTF-8;
throughout Plan 9 it's just called
UTF.)
UTF
defines multibyte sequences to
represent character values from 0 to 65535.
In
UTF,
character values up to 127 decimal, 7F hexadecimal, represent themselves,
so straight
ASCII
files are also valid
UTF.
Also,
UTF
guarantees that bytes with values 0 to 127 (NUL to DEL, inclusive)
will appear only when they represent themselves, so programs that read bytes
looking for plain ASCII characters will continue to work.
Any program that expects a one-to-one correspondence between bytes and
characters will, however, need to be modified.
An example is parsing file names.
File names, like all text, are in
UTF,
so it is incorrect to search for a character in a string by
.CW strchr(filename,
.CW c)
because the character might have a multi-byte encoding.
The correct method is to call
.CW utfrune(filename,
.CW c) ,
defined in
.I rune (2),
which interprets the file name as a sequence of encoded characters
rather than bytes.
In fact, even when you know the character is a single byte
that can represent only itself,
it is safer to use
.CW utfrune
because that assumes nothing about the character set
and its representation.
.PP
The library defines several symbols relevant to the representation of characters.
Any byte with unsigned value less than
.CW Runesync
will not appear in any multi-byte encoding of a character.
.CW Utfrune
compares the character being searched against
.CW Runesync
to see if it is sufficient to call
.CW strchr
or if the byte stream must be interpreted.
Any character with value less than
.CW Runeself
is represented by a single byte with the same value.
Finally, when errors are encountered converting
to runes from a byte stream, the library returns the rune value
.CW Runeerror
and advances a single byte.
This permits programs to find runes
embedded in binary data.
.PP
.CW Bio
includes routines
.CW Bgetrune
and
.CW Bputrune
to transform the external byte stream
UTF
format to and from
internal 16-bit runes.
Also, the
.CW %s
format to
.CW print
accepts
UTF;
.CW %c
prints a character after narrowing it to 8 bits.
The
.CW %S
format prints a null-terminated sequence of runes;
.CW %C
prints a character after narrowing it to 16 bits.
For more information, see the Programmer's Manual, in particular
.I utf (6)
and
.I rune (2),
and the paper,
``Hello world, or
Καλημέρα κόσμε, or\
\f(Jpこんにちは 世界\f1'',
by Rob Pike and
Ken Thompson;
there is not room for the full story here.
.PP
These issues affect the compiler in several ways.
First, the C source is in
UTF.
ANSI says C variables are formed from
ASCII
alphanumerics, but comments and literal strings may contain any characters
encoded in the native encoding, here
UTF.
The declaration
.P1
char *cp = "abcÿ";
.P2
initializes the variable
.CW cp
to point to an array of bytes holding the
UTF
representation of the characters
.CW abcÿ .
The type
.CW Rune
is defined in
.CW <u.h>
to be
.CW ushort ,
which is also the `wide character' type in the compiler.
Therefore the declaration
.P1
Rune *rp = L"abcÿ";
.P2
initializes the variable
.CW rp
to point to an array of unsigned short integers holding the 16-bit
values of the characters
.CW abcÿ .
Note that in both these declarations the characters in the source
that represent
.CW "abcÿ"
are the same; what changes is how those characters are represented
in memory in the program.
The following two lines:
.P1
print("%s\en", "abcÿ");
print("%S\en", L"abcÿ");
.P2
produce the same
UTF
string on their output, the first by copying the bytes, the second
by converting from runes to bytes.
.PP
In C, character constants are integers but narrowed through the
.CW char
type.
The Unicode character
.CW ÿ
has value 255, so if the
.CW char
type is signed,
the constant
.CW 'ÿ'
has value \-1 (which is equal to EOF).
On the other hand,
.CW L'ÿ'
narrows through the wide character type,
.CW ushort ,
and therefore has value 255.
.PP
Finally, although it's not ANSI C, the Plan 9 C compilers
assume any character with value above
.CW Runeself
is an alphanumeric,
so α is a legal, if non-portable, variable name.
.SH
Arguments
.PP
Some macros are defined
in
.CW <libc.h>
for parsing the arguments to
.CW main() .
They are described in
.I ARG (2)
but are fairly self-explanatory.
There are four macros:
.CW ARGBEGIN
and
.CW ARGEND
are used to bracket a hidden
.CW switch
statement within which
.CW ARGC
returns the current option character (rune) being processed and
.CW ARGF
returns the argument to the option, as in the loader option
.CW -o
.CW file .
Here, for example, is the code at the beginning of
.CW main()
in
.CW ramfs.c
(see
.I ramfs (1))
that cracks its arguments:
.P1
void
main(int argc, char *argv[])
{
	char *defmnt;
	int p[2];
	int mfd[2];
	int stdio = 0;

	defmnt = "/tmp";
	ARGBEGIN{
	case 'i':
		defmnt = 0;
		stdio = 1;
		mfd[0] = 0;
		mfd[1] = 1;
		break;
	case 's':
		defmnt = 0;
		break;
	case 'm':
		defmnt = ARGF();
		break;
	default:
		usage();
	}ARGEND
.P2
.SH
Extensions
.PP
The compiler has several extensions to ANSI C, all of which are used
extensively in the system source.
First,
.I structure
.I displays
permit
.CW struct
expressions to be formed dynamically.
Given these declarations:
.P1
typedef struct Point Point;
typedef struct Rectangle Rectangle;

struct Point
{
	int x, y;
};

struct Rectangle
{
	Point min, max;
};

Point p, q, add(Point, Point);
Rectangle r;
int x, y;
.P2
this assignment may appear anywhere an assignment is legal:
.P1
r = (Rectangle){add(p, q), (Point){x, y+3}};
.P2
The syntax is the same as for initializing a structure but with
a leading cast.
.PP
If an
.I anonymous
.I structure
or
.I union
is declared within another structure or union, the members of the internal
structure or union are addressable without prefix in the outer structure.
This feature eliminates the clumsy naming of nested structures and,
particularly, unions.
For example, after these declarations,
.P1
struct Lock
{
	int locked;
};

struct Node
{
	int type;
	union{
		double dval;
		double fval;
		long lval;
	}; /* anonymous union */
	struct Lock; /* anonymous structure */
} *node;

void lock(struct Lock*);
.P2
one may refer to
.CW node->type ,
.CW node->dval ,
.CW node->fval ,
.CW node->lval ,
and
.CW node->locked .
Moreover, the address of a
.CW struct
.CW Node
may be used without a cast anywhere that the address of a
.CW struct
.CW Lock
is used, such as in argument lists.
The compiler automatically promotes the type and adjusts the address.
Thus one may invoke
.CW lock(node) .
.PP
Anonymous structures and unions may be accessed by type name
if (and only if) they are declared using a
.CW typedef
name.
For example, using the above declaration for
.CW Point ,
one may declare
.P1
struct
{
	int type;
	Point;
} p;
.P2
and refer to
.CW p.Point .
.PP
In the initialization of arrays, a number in square brackets before an
element sets the index for the initialization. For example, to initialize
some elements in
a table of function pointers indexed by
ASCII
character,
.P1
void percent(void), slash(void);

void (*func[128])(void) =
{
	['%'] percent,
	['/'] slash,
};
.P2
.PP
Finally, the declaration
.P1
extern register reg;
.P2
.I this "" (
appearance of the register keyword is not ignored)
allocates a global register to hold the variable
.CW reg .
External registers must be used carefully: they need to be declared in
.I all
source files and libraries in the program to guarantee the register
is not allocated temporarily for other purposes.
Especially on machines with few registers, such as the i386,
it is easy to link accidentally with code that has already usurped
the global registers and there is no diagnostic when this happens.
Used wisely, though, external registers are powerful.
The Plan 9 operating system uses them to access per-process and
per-machine data structures on a multiprocessor. The storage class they provide
is hard to create in other ways.
.SH
The compile-time environment
.PP
The code generated by the compilers is `optimized' by default:
variables are placed in registers and peephole optimizations are
performed.
The compiler flag
.CW -N
disables these optimizations.
Registerization is done locally rather than throughout a function:
whether a variable occupies a register or
the memory location identified in the symbol
table depends on the activity of the variable and may change
throughout the life of the variable.
The
.CW -N
flag is rarely needed;
its main use is to simplify debugging.
There is no information in the symbol table to identify the
registerization of a variable, so
.CW -N
guarantees the variable is always where the symbol table says it is.
.PP
Another flag,
.CW -w ,
turns
.I on
warnings about portability and problems detected in flow analysis.
Most code in Plan 9 is compiled with warnings enabled;
these warnings plus the type checking offered by function prototypes
provide most of the support of the Unix tool
.CW lint
more accurately and with less chatter.
Two of the warnings,
`used and not set' and `set and not used', are almost always accurate but
may be triggered spuriously by code with invisible control flow,
such as in routines that call
.CW longjmp .
The compiler statements
.P1
SET(v1);
USED(v2);
.P2
decorate the flow graph to silence the compiler.
Either statement accepts a comma-separated list of variables.
Use them carefully: they may silence real errors.
For the common case of unused parameters to a function,
leaving the name off the declaration silences the warnings.
That is, listing the type of a parameter but giving it no
associated variable name does the trick.
.SH
Debugging
.PP
There are two debuggers available on Plan 9.
The first, and older, is
.CW db ,
a revision of Unix
.CW adb .
The other,
.CW acid ,
is a source-level debugger whose commands are statements in
a true programming language.
.CW Acid
is the preferred debugger, but since it
borrows some elements of
.CW db ,
notably the formats for displaying values, it is worth knowing a little bit about
.CW db .
.PP
Both debuggers support multiple architectures in a single program; that is,
the programs are
.CW db
and
.CW acid ,
not for example
.CW vdb
and
.CW vacid .
They also support cross-architecture debugging comfortably:
one may debug a 68020 binary on a MIPS.
.PP
Imagine a program has crashed mysteriously:
.P1
% X11/X
Fatal server bug!
failed to create default stipple
X 106: suicide: sys: trap: fault read addr=0x0 pc=0x00105fb8
%
.P2
When a process dies on Plan 9 it hangs in the `broken' state
for debugging.
Attach a debugger to the process by naming its process id:
.P1
% acid 106
/proc/106/text:mips plan 9 executable

/sys/lib/acid/port
/sys/lib/acid/mips
acid:
.P2
The
.CW acid
function
.CW stk()
reports the stack traceback:
.P1
acid: stk()
At pc:0x105fb8:abort+0x24 /sys/src/ape/lib/ap/stdio/abort.c:6
abort() /sys/src/ape/lib/ap/stdio/abort.c:4
	called from FatalError+#4e
		/sys/src/X/mit/server/dix/misc.c:421
FatalError(s9=#e02, s8=#4901d200, s7=#2, s6=#72701, s5=#1,
    s4=#7270d, s3=#6, s2=#12, s1=#ff37f1c, s0=#6, f=#7270f)
    /sys/src/X/mit/server/dix/misc.c:416
	called from gnotscreeninit+#4ce
		/sys/src/X/mit/server/ddx/gnot/gnot.c:792
gnotscreeninit(snum=#0, sc=#80db0)
    /sys/src/X/mit/server/ddx/gnot/gnot.c:766
	called from AddScreen+#16e
		/n/bootes/sys/src/X/mit/server/dix/main.c:610
AddScreen(pfnInit=0x0000129c,argc=0x00000001,argv=0x7fffffe4)
    /sys/src/X/mit/server/dix/main.c:530
	called from InitOutput+0x80
		/sys/src/X/mit/server/ddx/brazil/brddx.c:522
InitOutput(argc=0x00000001,argv=0x7fffffe4)
    /sys/src/X/mit/server/ddx/brazil/brddx.c:511
	called from main+0x294
		/sys/src/X/mit/server/dix/main.c:225
main(argc=0x00000001,argv=0x7fffffe4)
    /sys/src/X/mit/server/dix/main.c:136
	called from _main+0x24
		/sys/src/ape/lib/ap/mips/main9.s:8
.P2
The function
.CW lstk()
is similar but
also reports the values of local variables.
Note that the traceback includes full file names; this is a boon to debugging,
although it makes the output much noisier.
.PP
To use
.CW acid
well you will need to learn its input language; see the
``Acid Manual'',
by Phil Winterbottom,
for details. For simple debugging, however, the information in the manual page is
sufficient. In particular, it describes the most useful functions
for examining a process.
.PP
The compiler does not place
information describing the types of variables in the executable,
but a compile-time flag provides crude support for symbolic debugging.
The
.CW -a
flag to the compiler suppresses code generation
and instead emits source text in the
.CW acid
language to format and display data structure types defined in the program.
The easiest way to use this feature is to put a rule in the
.CW mkfile :
.P1
syms: main.$O
	$CC -a main.c > syms
.P2
Then from within
.CW acid ,
.P1
acid: include("sourcedirectory/syms")
.P2
to read in the relevant definitions.
(For multi-file source, you need to be a little fancier;
see
.I 2c (1)).
This text includes, for each defined compound
type, a function with that name that may be called with the address of a structure
of that type to display its contents.
For example, if
.CW rect
is a global variable of type
.CW Rectangle ,
one may execute
.P1
Rectangle(*rect)
.P2
to display it.
The
.CW *
(indirection) operator is necessary because
of the way
.CW acid
works: each global symbol in the program is defined as a variable by
.CW acid ,
with value equal to the
.I address
of the symbol.
.PP
Another common technique is to write by hand special
.CW acid
code to define functions to aid debugging, initialize the debugger, and so on.
Conventionally, this is placed in a file called
.CW acid
in the source directory; it has a line
.P1
include("sourcedirectory/syms");
.P2
to load the compiler-produced symbols. One may edit the compiler output directly but
it is wiser to keep the hand-generated
.CW acid
separate from the machine-generated.
.PP
There is much more to say here.
See the
.CW acid
manual page, the reference manual, or the paper
``Acid: A Debugger Built From A Language'',
also by Phil Winterbottom.
.SH
Alef
.PP
With minor substitutions, most of this document applies to Alef.
The compilers are
.CW val ,
.CW kal ,
and
.CW 8al ;
they work with the usual assemblers and loaders.
There is no Alef compiler for the 68020.
The directory of machine-independent include files is
.CW /sys/include/alef ;
there are no machine-dependent Alef include files.
The libraries are in
.CW /$objtype/lib/alef .
Alef uses
.CW /bin/cpp ,
which is a full ANSI C preprocessor.
Our style of use, however, is the same as in Plan 9 C.
.PP
The Alef compilers don't have the
.CW USED(v)
and
.CW SET(v)
operators; instead say something like
.P1
if(v);
.P2
for
.CW USED
and just set the variable to something benign to silence `used and not set' warnings.
The compilers also permit leaving unused parameters unnamed.
.PP
The compilers support UTF,
although variable names must be plain alphanumeric.
UTF
strings have syntax
.CW $"string"
rather than
.CW L"string" .
.PP
Finally, when debugging, some helpful
.CW acid
may be loaded by supplying the flag
.CW -lalef
when starting
.CW acid .
This code defines
functions to help analyze the state of the run-time system.
For example,
.CW pchan(c)
reports the state of a channel.
Because Alef programs are multi-threaded, they have multiple stacks.
To print the stack trace for a
.CW proc ,
do
.P1
setproc(pid);
stk();
.P2
where
.CW pid
is the Plan 9 process id of the
.CW proc .
To print the stack trace for a task is clumsier.
In the program, get the `task id'
by calling the run-time function
.CW ALEF_tid
in each task and recording it in a global:
.P1
taskid = ALEF_tid();
.P2
When the program is debugged, the task id
may be passed to an
.CW acid
function to print the stack:
.P1
labstk(*taskid);
.P2
This is of course best done in the private, program-specific
.CW acid
code.