.HTML "The Use of Name Spaces in Plan 9
.TL
The Use of Name Spaces in Plan 9
.AU
Rob Pike
Dave Presotto
Ken Thompson
Howard Trickey
Phil Winterbottom
.AI
.MH
USA
.AB
.FS
Appeared in
.I
Operating Systems Review,
.R
Vol. 27, #2, April 1993, pp. 72-76
(reprinted from
.I
Proceedings of the 5th ACM SIGOPS European Workshop,
.R
Mont Saint-Michel, 1992, Paper nº 34).
.FE
Plan 9 is a distributed system built at the Computing Sciences Research
Center of AT&T Bell Laboratories (now Lucent Technologies, Bell Labs)
over the last few years.
Its goal is to provide a production-quality system for software
development and general computation using heterogeneous hardware
and minimal software.
A Plan 9 system comprises CPU and file
servers in a central location connected together by fast networks.
Slower networks fan out to workstation-class machines that serve as
user terminals.
Plan 9 argues that given a few carefully implemented abstractions
it is possible to
produce a small operating system that provides support for the largest systems
on a variety of architectures and networks.
The foundations of the system are
built on two ideas: a per-process name space and a simple message-oriented
file system protocol.
.AE
.PP
The operating system for the CPU servers and terminals is
structured as a traditional kernel: a single compiled image
containing code for resource management, process control,
user processes,
virtual memory, and I/O.
Because the file server is a separate
machine, the file system is not compiled in, although the management
of the name space, a per-process attribute, is.
The entire kernel for the multiprocessor SGI Power Series machine
is 25000 lines of C,
the largest part of which is code for four networks, including the
Ethernet with the Internet protocol suite.
Fewer than 1500 lines are machine-specific, and a
functional kernel with minimal I/O can be put together from
source files totaling 6000 lines [Pike90].
.PP
The system is relatively small for several reasons.
First, it is all new: it has not had time to accrete as many fixes
and features as other systems.
Also, other than the network protocol, it adheres to no
external interface; in particular, it is not Unix-compatible.
Economy stems from careful selection of services and interfaces.
Finally, wherever possible the system is built around
two simple ideas:
every resource in the system, either local or remote,
is represented by a hierarchical file system; and
a user or process
assembles a private view of the system by constructing a file
.I
name space
.R
that connects these resources [Needham].
.SH
File Protocol
.PP
All resources in Plan 9 look like file systems.
That does not mean that they are repositories for
permanent files on disk, but that the interface to them
is file-oriented: finding files (resources) in a hierarchical
name tree, attaching to them by name, and accessing their contents
by read and write calls.
There are dozens of file system types in Plan 9, but only a few
represent traditional files.
At this level of abstraction, files in Plan 9 are similar
to objects, except that files are already provided with naming,
access, and protection methods that must be created afresh for
objects.
Object-oriented readers may approach the rest of this
paper as a study in how to make objects look like files.
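.PP
In code, this pattern is just the ordinary one for files.
The following minimal sketch, in Plan 9 C, is illustrative only:
the path is a stand-in for any resource, local or remote, and
error handling is abbreviated.
.P1
#include <u.h>
#include <libc.h>

void
main(void)
{
	char buf[128];
	long n;
	int fd;

	/* attach to a resource by name; the path is a placeholder */
	fd = open("/path/to/resource", OREAD);
	if(fd < 0)
		sysfatal("open: %r");
	/* access its contents with ordinary reads */
	while((n = read(fd, buf, sizeof buf)) > 0)
		write(1, buf, n);	/* 1 is standard output */
	close(fd);
	exits(nil);
}
.P2
Nothing in the client changes when the file is served by a device
driver, a user-level server, or a machine across the network.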
.PP
The interface to file systems is defined by a protocol, called 9P,
analogous but not very similar to the NFS protocol.
The protocol talks about files, not blocks; given a connection to the root
directory of a file server,
the 9P messages navigate the file hierarchy, open files for I/O,
and read or write arbitrary bytes in the files.
9P contains 17 message types: three for
initializing and
authenticating a connection and fourteen for manipulating objects.
The messages are generated by the kernel in response to user- or
kernel-level I/O requests.
Here is a quick tour of the major message types.
The
.CW auth
and
.CW attach
messages authenticate a connection, established by means outside 9P,
and validate its user.
The result is an authenticated
.I channel
that points to the root of the
server.
The
.CW clone
message makes a new channel identical to an existing channel;
the new channel may then be moved to a file on the server using a
.CW walk
message to descend each level in the hierarchy.
The
.CW stat
and
.CW wstat
messages read and write the attributes of the file pointed to by a channel.
The
.CW open
message prepares a channel for subsequent
.CW read
and
.CW write
messages to access the contents of the file, while
.CW create
and
.CW remove
perform the actions on files implied by their names.
The
.CW clunk
message discards a channel without affecting the file.
None of the 9P messages consider caching; file caches are provided,
when needed, either within the server (centralized caching)
or by implementing the cache as a transparent file system between the
client and the 9P connection to the server (client caching).
.PP
For efficiency, the connection to local
kernel-resident file systems, misleadingly called
.I devices ,
is by regular rather than remote procedure calls.
The procedures map one-to-one with 9P message types.
Locally, each channel has an associated data structure
that holds a type field used to index
a table of procedure calls, one set per file system type,
analogous to selecting the method set for an object.
One kernel-resident file system, the
.I
mount device,
.R
translates the local 9P procedure calls into RPC messages to
remote services over a separately provided transport protocol
such as TCP or IL, a new reliable datagram protocol, or over a pipe to
a user process.
Write and read calls transmit the messages over the transport layer.
The mount device is the sole bridge between the procedural
interface seen by user programs and remote and user-level services.
It does all associated marshaling, buffer
management, and multiplexing, and is
the only integral RPC mechanism in Plan 9.
The mount device is in effect a proxy object.
There is no RPC stub compiler; instead the mount driver and
all servers just share a library that packs and unpacks 9P messages.
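.PP
The dispatch through the per-type table of procedures can be pictured
as follows.
This is a minimal sketch, not the kernel's actual declarations:
the structure and function names are illustrative, and only a few of
the procedures are shown.
.P1
typedef struct Chan Chan;
typedef struct Dev Dev;

struct Chan {
	int	type;		/* index into devtab */
	/* ... position, qid, flags, and so on ... */
};

/* one set of procedures per file system type */
struct Dev {
	char	*name;			/* e.g. "mnt", "proc", "cons" */
	Chan*	(*attach)(char*);
	long	(*read)(Chan*, void*, long);
	long	(*write)(Chan*, void*, long);
	void	(*clunk)(Chan*);
};

extern Dev *devtab[];	/* indexed by a channel's type field */

long
devread(Chan *c, void *buf, long n)
{
	/* like selecting the method set for an object */
	return devtab[c->type]->read(c, buf, n);
}
.P2
The mount device is simply the entry in this table whose procedures
marshal their arguments into 9P messages instead of acting locally.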
.SH
Examples
.PP
One file system type serves
permanent files from the main file server,
a stand-alone multiprocessor system with a
350-gigabyte
optical WORM jukebox that holds the data, fronted by a two-level
block cache comprising 7 gigabytes of
magnetic disk and 128 megabytes of RAM.
Clients connect to the file server using any of a variety of
networks and protocols and access files using 9P.
The file server runs a distinct operating system and has no
support for user processes; other than a restricted set of commands
available on the console, all it does is answer 9P messages from clients.
.PP
Once a day, at 5:00 AM,
the file server sweeps through the cache blocks and marks dirty blocks
copy-on-write.
It creates a copy of the root directory
and labels it with the current date, for example
.CW 1995/0314 .
It then starts a background process to copy the dirty blocks to the WORM.
The result is that the server retains an image of the file system as it was
early each morning.
The set of old root directories is accessible using 9P, so a client
may examine backup files using ordinary commands.
Several advantages stem from having the backup service implemented
as a plain file system.
Most obviously, ordinary commands can access the backup files.
For example, to see when a bug was fixed:
.P1
grep 'mouse bug fix' 1995/*/sys/src/cmd/8½/file.c
.P2
The owner, access times, permissions, and other properties of the
files are also backed up.
Because the backup is a file system, it
still has protections;
it is not possible to subvert security by looking at the backup.
.PP
The file server is only one type of file system.
A number of unusual services are provided within the kernel as
local file systems.
These services are not limited to I/O devices such
as disks.
They include network devices and their associated protocols,
the bitmap display and mouse,
a representation of processes similar to
.CW /proc
[Killian], the name/value pairs that form the `environment'
passed to a new process, profiling services,
and other resources.
Each of these is represented as a file system \(em
directories containing sets of files \(em
but the constituent files do not represent permanent storage on disk.
Instead, they are closer in properties to UNIX device files.
.PP
For example, the
.I console
device contains the file
.CW /dev/cons ,
similar to the UNIX file
.CW /dev/console :
when written,
.CW /dev/cons
appends to the console typescript; when read,
it returns characters typed on the keyboard.
Other files in the console device include
.CW /dev/time ,
the number of seconds since the epoch,
.CW /dev/cputime ,
the computation time used by the process reading the device,
.CW /dev/pid ,
the process id of the process reading the device, and
.CW /dev/user ,
the login name of the user accessing the device.
All these files contain text, not binary numbers,
so their use is free of byte-order problems.
Their contents are synthesized on demand when read; when written,
they cause modifications to kernel data structures.
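.PP
Because the contents are text, a client parses them with ordinary
string routines rather than a special system call.
A minimal sketch, assuming the textual format described above and
abbreviating error handling:
.P1
#include <u.h>
#include <libc.h>

void
main(void)
{
	char buf[32];
	long n, secs;
	int fd;

	/* the device file is opened and read like any other file */
	fd = open("/dev/time", OREAD);
	if(fd < 0)
		sysfatal("open /dev/time: %r");
	n = read(fd, buf, sizeof buf - 1);
	if(n <= 0)
		sysfatal("read: %r");
	buf[n] = 0;
	secs = atol(buf);	/* text, so no byte-order problems */
	print("%ld seconds since the epoch", secs);
	exits(nil);
}
.P2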
.PP
The
.I process
device contains one directory per live local process, named by its numeric
process id:
.CW /proc/1 ,
.CW /proc/2 ,
etc.
Each directory contains a set of files that access the process.
For example, in each directory the file
.CW mem
is an image of the virtual memory of the process that may be read or
written for debugging.
The
.CW text
file is a sort of link to the file from which the process was executed;
it may be opened to read the symbol tables for the process.
The
.CW ctl
file may be written textual messages such as
.CW stop
or
.CW kill
to control the execution of the process.
The
.CW status
file contains a fixed-format line of text containing information about
the process: its name, owner, state, and so on.
Text strings written to the
.CW note
file are delivered to the process as
.I notes ,
analogous to UNIX signals.
By providing these services as textual I/O on files rather
than as system calls (such as
.CW kill )
or special-purpose operations (such as
.CW ptrace ),
the Plan 9 process device simplifies the implementation of
debuggers and related programs.
For example, the command
.P1
cat /proc/*/status
.P2
is a crude form of the
.CW ps
command; the actual
.CW ps
merely reformats the data so obtained.
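.PP
Process control is equally direct.
The sketch below stops a process by writing text to its
.CW ctl
file; the process id is illustrative and error handling is abbreviated.
.P1
#include <u.h>
#include <libc.h>

void
main(void)
{
	int fd;

	/* no ptrace or kill system call is involved:
	   just textual I/O on the process's ctl file */
	fd = open("/proc/27/ctl", OWRITE);
	if(fd < 0)
		sysfatal("open: %r");
	if(write(fd, "stop", 4) != 4)
		sysfatal("write: %r");
	close(fd);
	exits(nil);
}
.P2
Because the interface is file I/O, the same program works unchanged
on a process imported from another machine, as described below.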
.PP
The
.I bitmap
device contains three files,
.CW /dev/mouse ,
.CW /dev/screen ,
and
.CW /dev/bitblt ,
that provide an interface to the local bitmap display (if any) and pointing device.
The
.CW mouse
file returns a fixed-format record containing
1 byte of button state and 4 bytes each of
.I x
and
.I y
position of the mouse.
If the mouse has not moved since the file was last read, a subsequent read will
block.
The
.CW screen
file contains a memory image of the contents of the display;
the
.CW bitblt
file provides a procedural interface.
Calls to the graphics library are translated into messages that are written
to the
.CW bitblt
file to perform bitmap graphics operations.
(This is essentially a nested RPC protocol.)
.PP
The various services being used by a process are gathered together into the
process's
.I
name space,
.R
a single rooted hierarchy of file names.
When a process forks, the child process shares the name space with the parent.
Several system calls manipulate name spaces.
Given a file descriptor
.CW fd
that holds an open communications channel to a service,
the call
.P1
mount(int fd, char *old, int flags)
.P2
authenticates the user and attaches the file tree of the service to
the directory named by
.CW old .
The
.CW flags
specify how the tree is to be attached to
.CW old :
replacing the current contents or appearing before or after the
current contents of the directory.
A directory with several services mounted is called a
.I union
directory and is searched in the specified order.
The call
.P1
bind(char *new, char *old, int flags)
.P2
takes the portion of the existing name space visible at
.CW new ,
either a file or a directory, and makes it also visible at
.CW old .
For example,
.P1
bind("1995/0301/sys/include", "/sys/include", REPLACE)
.P2
causes the directory of include files to be overlaid with its
contents from the dump on March first.
.PP
A process is created by the
.CW rfork
system call, which takes as argument a bit vector defining which
attributes of the process are to be shared between parent
and child instead of copied.
One of the attributes is the name space: when shared, changes
made by either process are visible in the other; when copied,
changes are independent.
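.PP
Taken together, these calls let a process build a private view of the
system before running a program.
The sketch below uses the flag names of the Plan 9 C library,
.CW MREPL
and
.CW MAFTER
(the example above spells the first
.CW REPLACE );
the dump is assumed to be mounted at
.CW /n/dump ,
and the directory of private binaries is hypothetical.
.P1
#include <u.h>
#include <libc.h>

void
main(void)
{
	char *argv[] = { "mk", nil };

	/* copy the name space so the changes stay private */
	if(rfork(RFNAMEG) < 0)
		sysfatal("rfork: %r");

	/* overlay the include files with last month's dump */
	if(bind("/n/dump/1995/0301/sys/include",
	    "/sys/include", MREPL) < 0)
		sysfatal("bind: %r");

	/* union a directory of private binaries after /bin */
	if(bind("/usr/rob/bin", "/bin", MAFTER) < 0)
		sysfatal("bind: %r");

	/* the program sees the assembled view */
	exec("/bin/mk", argv);
	sysfatal("exec: %r");
}
.P2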
.PP
Although there is no global name space,
for a process to function sensibly the local name spaces must adhere
to global conventions.
Nonetheless, the use of local name spaces is critical to the system.
Both these ideas are illustrated by the use of the name space to
handle heterogeneity.
The binaries for a given architecture are contained in a directory
named by the architecture, for example
.CW /mips/bin ;
in use, that directory is bound to the conventional location
.CW /bin .
Programs such as shell scripts need not know the CPU type they are
executing on to find binaries to run.
A directory of private binaries
is usually unioned with
.CW /bin .
(Compare this to the
.I
ad hoc
.R
and special-purpose idea of the
.CW PATH
variable, which is not used in the Plan 9 shell.)
Local bindings are also helpful for debugging, for example by binding
an old library to the standard place and linking a program to see
if recent changes to the library are responsible for a bug in the program.
.PP
The window system,
.CW 8½
[Pike91], is a server for files such as
.CW /dev/cons
and
.CW /dev/bitblt .
Each client sees a distinct copy of these files in its local
name space: there are many instances of
.CW /dev/cons ,
each served by
.CW 8½
to the local name space of a window.
Again,
.CW 8½
implements services using
local name spaces plus I/O to conventionally named files.
Each client just connects its standard input, output, and error files
to
.CW /dev/cons ,
with analogous operations to access bitmap graphics.
Compare this to the implementation of
.CW /dev/tty
on UNIX, which is done by special code in the kernel
that overloads the file, when opened,
with the standard input or output of the process.
A UNIX window system must make special arrangements for
.CW /dev/tty
to behave as expected;
.CW 8½
instead makes the provision of the corresponding file its
central idea, an approach that depends critically on local name spaces to succeed.
.PP
The environment
.CW 8½
provides its clients is exactly the environment under which it is implemented:
a conventional set of files in
.CW /dev .
This permits the window system to be run recursively in one of its own
windows, which is handy for debugging.
It also means that if the files are exported to another machine,
as described below, the window system or client applications may be
run transparently on remote machines, even ones without graphics hardware.
This mechanism is used for Plan 9's implementation of the X window
system: X is run as a client of
.CW 8½ ,
often on a remote machine with lots of memory.
In this configuration, using Ethernet to connect
MIPS machines, we measure only a 10% degradation in graphics
performance relative to running X on
a bare Plan 9 machine.
.PP
An unusual application of these ideas is a statistics-gathering
file system implemented by a command called
.CW iostats .
The command encapsulates a process in a local name space, monitoring 9P
requests from the process to the outside world \(em the name space in which
.CW iostats
is itself running.
When the command completes,
.CW iostats
reports usage and performance figures for file activity.
For example,
.P1
iostats 8½
.P2
can be used to discover how much I/O the window system
does to the bitmap device, font files, and so on.
.PP
The
.CW import
command connects a piece of name space from a remote system
to the local name space.
Its implementation is to dial the remote machine and start
a process there that serves the remote name space using 9P.
It then calls
.CW mount
to attach the connection to the name space and finally dies;
the remote process continues to serve the files.
One use is to access devices not available
locally.
For example, to write a floppy one may say
.P1
import lab.pc /a: /n/dos
cp foo /n/dos/bar
.P2
The call to
.CW import
connects the file tree from
.CW /a:
on the machine
.CW lab.pc
(which must support 9P) to the local directory
.CW /n/dos .
Then the file
.CW foo
can be written to the floppy just by copying it across.
.PP
Another application is remote debugging:
.P1
import helix /proc
.P2
makes the process file system on machine
.CW helix
available locally; commands such as
.CW ps
then see
.CW helix 's
processes instead of the local ones.
The debugger may then look at a remote process:
.P1
db /proc/27/text /proc/27/mem
.P2
allows breakpoint debugging of the remote process.
Since
.CW db
infers the CPU type of the process from the executable header on
the text file, it supports
cross-architecture debugging, too.
Care is taken within
.CW db
to handle issues of byte order and floating point; it is possible to
breakpoint debug a big-endian MIPS process from a little-endian i386.
.PP
Network interfaces are also implemented as file systems [Presotto].
For example,
.CW /net/tcp
is a directory somewhat like
.CW /proc :
it contains a set of numbered directories, one per connection,
each of which contains files to control and communicate on the connection.
A process allocates a new connection by accessing
.CW /net/tcp/clone ,
which evaluates to the directory of an unused connection.
To make a call, the process writes a textual message such as
.CW 'connect
.CW 135.104.53.2!512'
to the
.CW ctl
file and then reads and writes the
.CW data
file.
An
.CW rlogin
service can be implemented in a few lines of shell code.
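.PP
The client side of this interface is plain file I/O.
The following is a sketch only, with error handling abbreviated;
the address is the example above, and real programs use a dial
library that hides these details.
.P1
#include <u.h>
#include <libc.h>

void
main(void)
{
	char dir[16], path[40];
	int cfd, dfd;
	long n;

	/* reading the clone file reserves an unused connection
	   directory and yields its number, e.g. "6" */
	cfd = open("/net/tcp/clone", ORDWR);
	if(cfd < 0)
		sysfatal("open clone: %r");
	n = read(cfd, dir, sizeof dir - 1);
	if(n <= 0)
		sysfatal("read clone: %r");
	dir[n] = 0;

	/* the open clone file acts as the connection's ctl file */
	if(fprint(cfd, "connect 135.104.53.2!512") < 0)
		sysfatal("connect: %r");

	/* converse by reading and writing the data file */
	snprint(path, sizeof path, "/net/tcp/%s/data", dir);
	dfd = open(path, ORDWR);
	if(dfd < 0)
		sysfatal("open data: %r");
	write(dfd, "hello", 5);
	exits(nil);
}
.P2
Because the conversation is textual file I/O, it can equally be
carried out from the shell, which is why a service like
.CW rlogin
needs only a few lines of script.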
.PP
This structure makes network gatewaying easy to provide.
We have machines with Datakit interfaces but no Internet interface.
On such a machine one may type
.P1
import helix /net
telnet tcp!ai.mit.edu
.P2
The
.CW import
uses Datakit to pull in the TCP interface from
.CW helix ,
which can then be used directly; the
.CW tcp!
notation is necessary because we routinely use multiple networks
and protocols on Plan 9\(emit identifies the network in which
.CW ai.mit.edu
is a valid name.
.PP
In practice we do not use
.CW rlogin
or
.CW telnet
between Plan 9 machines.
Instead a command called
.CW cpu
in effect replaces the CPU in a window with that
of another machine, typically a fast multiprocessor CPU server.
The implementation is to recreate the
name space on the remote machine, using the equivalent of
.CW import
to connect pieces of the terminal's name space to that of
the process (shell) on the CPU server, making the terminal
a file server for the CPU.
CPU-local devices such as fast file system connections
are still local; only terminal-resident devices are
imported.
The result is unlike UNIX
.CW rlogin ,
which moves into a distinct name space on the remote machine,
or file sharing with
.CW NFS ,
which keeps the name space the same but forces processes to execute
locally.
Bindings in
.CW /bin
may change because of a change in CPU architecture, and
the networks involved may be different because of differing hardware,
but the effect feels like simply speeding up the processor in the
current name space.
.SH
Position
.PP
These examples illustrate how the ideas of representing resources
as file systems and per-process name spaces can be used to solve
problems often left to more exotic mechanisms.
Nonetheless, there are some operations in Plan 9 that are not
mapped into file I/O.
An example is process creation.
We could imagine a message to a control file in
.CW /proc
that creates a process, but the details of
constructing the environment of the new process \(em its open files,
name space, memory image, etc. \(em are too intricate to
be described easily in a simple I/O operation.
Therefore new processes on Plan 9 are created by fairly conventional
.CW rfork
and
.CW exec
system calls;
.CW /proc
is used only to represent and control existing processes.
.PP
Plan 9 does not attempt to map network name spaces into the file
system name space, for several reasons.
The different addressing rules for various networks and protocols
cannot be mapped uniformly into a hierarchical file name space.
Even if they could be,
the various mechanisms to authenticate,
select a service,
and control the connection would not map consistently into
operations on a file.
.PP
Shared memory is another resource not adequately represented by a
file name space.
Plan 9 takes care to provide mechanisms
to allow groups of local processes to share and map memory.
Memory is controlled
by system calls rather than special files, however,
since a representation in the file system would imply that memory could
be imported from remote machines.
.PP
Despite these limitations, file systems and name spaces offer an effective
model around which to build a distributed system.
Used well, they can provide a uniform, familiar, transparent
interface to a diverse set of distributed resources.
They carry well-understood properties of access, protection,
and naming.
The integration of devices into the hierarchical file system
was the best idea in UNIX.
Plan 9 pushes the concepts much further and shows that
file systems, when used inventively, have plenty of scope
for productive research.
.SH
References
.LP
[Killian] T. Killian, ``Processes as Files'', USENIX Summer Conf. Proc.,
Salt Lake City, 1984.
.br
[Needham] R. Needham, ``Names'', in
.I
Distributed Systems,
.R
S. Mullender, ed., Addison-Wesley, 1989.
.br
[Pike90] R. Pike, D. Presotto, K. Thompson, H. Trickey,
``Plan 9 from Bell Labs'',
UKUUG Proc. of the Summer 1990 Conf.,
London, England, 1990.
.br
[Pike91] R. Pike, ``8½, The Plan 9 Window System'', USENIX Summer
Conf. Proc., Nashville, 1991.
.br
[Presotto] D. Presotto, ``Multiprocessor Streams for Plan 9'',
UKUUG Proc. of the Summer 1990 Conf.,
London, England, 1990.