.HTML "The Use of Name Spaces in Plan 9
.TL
The Use of Name Spaces in Plan 9
.AU
Rob Pike
Dave Presotto
Ken Thompson
Howard Trickey
Phil Winterbottom
.AI
.MH
USA
.AB
.FS
Appeared in
.I
Operating Systems Review,
.R
Vol. 27, #2, April 1993, pp. 72-76
(reprinted from
.I
Proceedings of the 5th ACM SIGOPS European Workshop,
.R
Mont Saint-Michel, 1992, Paper nº 34).
.FE
Plan 9 is a distributed system built at the Computing Sciences Research
Center of AT&T Bell Laboratories (now Lucent Technologies, Bell Labs) over the last few years.
Its goal is to provide a production-quality system for software
development and general computation using heterogeneous hardware
and minimal software.  A Plan 9 system comprises CPU and file
servers in a central location connected together by fast networks.
Slower networks fan out to workstation-class machines that serve as
user terminals.  Plan 9 argues that given a few carefully
implemented abstractions
it is possible to
produce a small operating system that provides support for the largest systems
on a variety of architectures and networks. The foundations of the system are
built on two ideas: a per-process name space and a simple message-oriented
file system protocol.
.AE
.PP
The operating system for the CPU servers and terminals is
structured as a traditional kernel: a single compiled image
containing code for resource management, process control,
user processes,
virtual memory, and I/O.  Because the file server is a separate
machine, the file system is not compiled in, although the management
of the name space, a per-process attribute, is.
The entire kernel for the multiprocessor SGI Power Series machine
is 25000 lines of C,
the largest part of which is code for four networks including the
Ethernet with the Internet protocol suite.
Fewer than 1500 lines are machine-specific, and a
functional kernel with minimal I/O can be put together from
source files totaling 6000 lines. [Pike90]
.PP
The system is relatively small for several reasons.
First, it is all new: it has not had time to accrete as many fixes
and features as other systems.
Also, other than the network protocol, it adheres to no
external interface; in particular, it is not Unix-compatible.
Economy stems from careful selection of services and interfaces.
Finally, wherever possible the system is built around
two simple ideas:
every resource in the system, either local or remote,
is represented by a hierarchical file system; and
a user or process
assembles a private view of the system by constructing a file
.I
name space
.R
that connects these resources. [Needham]
.SH
File Protocol
.PP
All resources in Plan 9 look like file systems.
That does not mean that they are repositories for
permanent files on disk, but that the interface to them
is file-oriented: finding files (resources) in a hierarchical
name tree, attaching to them by name, and accessing their contents
by read and write calls.
There are dozens of file system types in Plan 9, but only a few
represent traditional files.
At this level of abstraction, files in Plan 9 are similar
to objects, except that files are already provided with naming,
access, and protection methods that must be created afresh for
objects.  Object-oriented readers may approach the rest of this
paper as a study in how to make objects look like files.
.PP
The interface to file systems is defined by a protocol, called 9P,
analogous but not very similar to the NFS protocol.
The protocol talks about files, not blocks; given a connection to the root
directory of a file server,
the 9P messages navigate the file hierarchy, open files for I/O,
and read or write arbitrary bytes in the files.
9P contains 17 message types: three for
initializing and
authenticating a connection and fourteen for manipulating objects.
The messages are generated by the kernel in response to user- or
kernel-level I/O requests.
Here is a quick tour of the major message types.
The
.CW auth
and
.CW attach
messages authenticate a connection, established by means outside 9P,
and validate its user.
The result is an authenticated
.I channel
that points to the root of the
server.
The
.CW clone
message makes a new channel identical to an existing channel,
which may be moved to a file on the server using a
.CW walk
message to descend each level in the hierarchy.
The
.CW stat
and
.CW wstat
messages read and write the attributes of the file pointed to by a channel.
The
.CW open
message prepares a channel for subsequent
.CW read
and
.CW write
messages to access the contents of the file, while
.CW create
and
.CW remove
perform, on the files, the actions implied by their names.
The
.CW clunk
message discards a channel without affecting the file.
None of the 9P messages consider caching; file caches are provided,
when needed, either within the server (centralized caching)
or by implementing the cache as a transparent file system between the
client and the 9P connection to the server (client caching).
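.PP
As a concrete illustration, here is a plausible sequence of client
requests to open and read a file (each request elicits a reply, not
shown); the fid numbering, the path walked, and the layout are
illustrative assumptions, not the wire format.
.P1
attach  fid=0  uname=rob        authenticated channel to the root
clone   fid=0  newfid=1         duplicate the root channel
walk    fid=1  name=usr         descend one level per walk
walk    fid=1  name=rob
walk    fid=1  name=lib
walk    fid=1  name=profile
open    fid=1  mode=read
read    fid=1  offset=0  count=8192
clunk   fid=1                   discard the channel
.P2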
.PP
For efficiency, the connection to local
kernel-resident file systems, misleadingly called
.I devices,
is by regular rather than remote procedure calls.
The procedures map one-to-one with 9P message types.
Locally each channel has an associated data structure
that holds a type field used to index
a table of procedure calls, one set per file system type,
analogous to selecting the method set for an object.
One kernel-resident file system, the
.I
mount device,
.R
translates the local 9P procedure calls into RPC messages to
remote services over a separately provided transport protocol
such as TCP or IL, a new reliable datagram protocol, or over a pipe to
a user process.
Write and read calls transmit the messages over the transport layer.
The mount device is the sole bridge between the procedural
interface seen by user programs and remote and user-level services.
It does all associated marshaling, buffer
management, and multiplexing and is
the only integral RPC mechanism in Plan 9.
The mount device is in effect a proxy object.
There is no RPC stub compiler; instead the mount driver and
all servers just share a library that packs and unpacks 9P messages.
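.PP
The dispatch through a channel's type field can be sketched in C
roughly as follows; the structure and names here are simplified
assumptions, not the kernel's actual declarations.
.P1
typedef struct Chan Chan;
typedef struct Dev Dev;

struct Chan {
	int	type;		/* index into devtab */
	/* ... qid, offset, etc. ... */
};

struct Dev {			/* one procedure set per file system type */
	char	*name;
	Chan*	(*attach)(char*);
	int	(*walk)(Chan*, char*);
	Chan*	(*open)(Chan*, int);
	long	(*read)(Chan*, void*, long);
	long	(*write)(Chan*, void*, long);
	void	(*clunk)(Chan*);
};

extern Dev *devtab[];

long
devread(Chan *c, void *buf, long n)
{
	/* select the method set, then the method */
	return devtab[c->type]->read(c, buf, n);
}
.P2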
.SH
Examples
.PP
One file system type serves
permanent files from the main file server,
a stand-alone multiprocessor system with a
350-gigabyte
optical WORM jukebox that holds the data, fronted by a two-level
block cache comprising 7 gigabytes of
magnetic disk and 128 megabytes of RAM.
Clients connect to the file server using any of a variety of
networks and protocols and access files using 9P.
The file server runs a distinct operating system and has no
support for user processes; other than a restricted set of commands
available on the console, all it does is answer 9P messages from clients.
.PP
Once a day, at 5:00 AM,
the file server sweeps through the cache blocks and marks dirty blocks
copy-on-write.
It creates a copy of the root directory
and labels it with the current date, for example
.CW 1995/0314 .
It then starts a background process to copy the dirty blocks to the WORM.
The result is that the server retains an image of the file system as it was
early each morning.
The set of old root directories is accessible using 9P, so a client
may examine backup files using ordinary commands.
Several advantages stem from having the backup service implemented
as a plain file system.
Most obviously, ordinary commands can access the backups.
For example, to see when a bug was fixed:
.P1
grep 'mouse bug fix' 1995/*/sys/src/cmd/8½/file.c
.P2
The owner, access times, permissions, and other properties of the
files are also backed up.
Because it is a file system, the backup
still has protections;
it is not possible to subvert security by looking at the backup.
.PP
The file server is only one type of file system.
A number of unusual services are provided within the kernel as
local file systems.
These services are not limited to I/O devices such
as disks.  They include network devices and their associated protocols,
the bitmap display and mouse,
a representation of processes similar to
.CW /proc
[Killian], the name/value pairs that form the `environment'
passed to a new process, profiling services,
and other resources.
Each of these is represented as a file system \(em
directories containing sets of files \(em
but the constituent files do not represent permanent storage on disk.
Instead, they are closer in properties to UNIX device files.
.PP
For example, the
.I console
device contains the file
.CW /dev/cons ,
similar to the UNIX file
.CW /dev/console :
when written,
.CW /dev/cons
appends to the console typescript; when read,
it returns characters typed on the keyboard.
Other files in the console device include
.CW /dev/time ,
the number of seconds since the epoch,
.CW /dev/cputime ,
the computation time used by the process reading the device,
.CW /dev/pid ,
the process id of the process reading the device, and
.CW /dev/user ,
the login name of the user accessing the device.
All these files contain text, not binary numbers,
so their use is free of byte-order problems.
Their contents are synthesized on demand when read; when written,
they cause modifications to kernel data structures.
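.PP
Access to such a file is ordinary file I/O.
A minimal sketch in Plan 9 C that returns the login name by reading
.CW /dev/user ;
the buffer size is an arbitrary assumption.
.P1
#include <u.h>
#include <libc.h>

char*
username(void)
{
	static char buf[64];
	int fd, n;

	fd = open("/dev/user", OREAD);
	if(fd < 0)
		return nil;
	n = read(fd, buf, sizeof buf - 1);
	close(fd);
	if(n <= 0)
		return nil;
	buf[n] = '\0';	/* text, not a binary number */
	return buf;
}
.P2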
.PP
The
.I process
device contains one directory per live local process, named by its numeric
process id:
.CW /proc/1 ,
.CW /proc/2 ,
etc.
Each directory contains a set of files that access the process.
For example, in each directory the file
.CW mem
is an image of the virtual memory of the process that may be read or
written for debugging.
The
.CW text
file is a sort of link to the file from which the process was executed;
it may be opened to read the symbol tables for the process.
The
.CW ctl
file may be written textual messages such as
.CW stop
or
.CW kill
to control the execution of the process.
The
.CW status
file contains a fixed-format line of text containing information about
the process: its name, owner, state, and so on.
Text strings written to the
.CW note
file are delivered to the process as
.I notes,
analogous to UNIX signals.
By providing these services as textual I/O on files rather
than as system calls (such as
.CW kill )
or special-purpose operations (such as
.CW ptrace ),
the Plan 9 process device simplifies the implementation of
debuggers and related programs.
For example, the command
.P1
cat /proc/*/status
.P2
is a crude form of the
.CW ps
command; the actual
.CW ps
merely reformats the data so obtained.
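.PP
Process control is equally direct: a debugger stops a process by
writing to its
.CW ctl
file.
A sketch, assuming only the textual interface described above, with
the path-buffer size an arbitrary choice:
.P1
int
stopproc(int pid)
{
	char path[40];
	int fd;

	snprint(path, sizeof path, "/proc/%d/ctl", pid);
	fd = open(path, OWRITE);
	if(fd < 0)
		return -1;
	if(write(fd, "stop", 4) != 4){
		close(fd);
		return -1;
	}
	close(fd);
	return 0;
}
.P2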
.PP
The
.I bitmap
device contains three files,
.CW /dev/mouse ,
.CW /dev/screen ,
and
.CW /dev/bitblt ,
that provide an interface to the local bitmap display (if any) and pointing device.
The
.CW mouse
file returns a fixed-format record containing
1 byte of button state and 4 bytes each of
.I x
and
.I y
position of the mouse.
If the mouse has not moved since the file was last read, a subsequent read will
block.
The
.CW screen
file contains a memory image of the contents of the display;
the
.CW bitblt
file provides a procedural interface.
Calls to the graphics library are translated into messages that are written
to the
.CW bitblt
file to perform bitmap graphics operations.  (This is essentially a nested
RPC protocol.)
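.PP
A client might unpack a
.CW /dev/mouse
record as follows; the little-endian byte order assumed in this
sketch is an illustration, not a specification.
.P1
typedef struct Mouse { int buttons, x, y; } Mouse;

int
readmouse(int fd, Mouse *m)
{
	uchar b[9];	/* 1 byte of buttons, 4 each of x and y */

	/* blocks until the mouse state changes */
	if(read(fd, b, sizeof b) != sizeof b)
		return -1;
	m->buttons = b[0];
	m->x = b[1] | b[2]<<8 | b[3]<<16 | b[4]<<24;
	m->y = b[5] | b[6]<<8 | b[7]<<16 | b[8]<<24;
	return 0;
}
.P2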
.PP
The various services being used by a process are gathered together into the
process's
.I
name space,
.R
a single rooted hierarchy of file names.
When a process forks, the child process shares the name space with the parent.
Several system calls manipulate name spaces.
Given a file descriptor
.CW fd
that holds an open communications channel to a service,
the call
.P1
mount(int fd, char *old, int flags)
.P2
authenticates the user and attaches the file tree of the service to
the directory named by
.CW old .
The
.CW flags
specify how the tree is to be attached to
.CW old :
replacing the current contents or appearing before or after the
current contents of the directory.
A directory with several services mounted is called a
.I union
directory and is searched in the specified order.
The call
.P1
bind(char *new, char *old, int flags)
.P2
takes the portion of the existing name space visible at
.CW new ,
either a file or a directory, and makes it also visible at
.CW old .
For example,
.P1
bind("1995/0301/sys/include", "/sys/include", REPLACE)
.P2
causes the directory of include files to be overlaid with its
contents from the dump on March first.
.PP
A process is created by the
.CW rfork
system call, which takes as argument a bit vector defining which
attributes of the process are to be shared between parent
and child instead of copied.
One of the attributes is the name space: when shared, changes
made by either process are visible in the other; when copied,
changes are independent.
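.PP
For example, a program that wants private bindings might copy the
name space at fork time.
The sketch below assumes the
.CW rfork
flag names of Plan 9's
.CW <libc.h>
and reuses the
.CW REPLACE
flag from the
.CW bind
example above; the command run in the child is arbitrary.
.P1
switch(rfork(RFPROC|RFFDG|RFNAMEG)){	/* RFNAMEG: copy, don't share */
case -1:
	sysfatal("rfork: %r");
case 0:
	/* child: this binding is invisible to the parent */
	bind("1995/0301/sys/include", "/sys/include", REPLACE);
	execl("/bin/date", "date", nil);
default:
	break;
}
.P2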
.PP
Although there is no global name space,
for a process to function sensibly the local name spaces must adhere
to global conventions.
Nonetheless, the use of local name spaces is critical to the system.
Both these ideas are illustrated by the use of the name space to
handle heterogeneity.
The binaries for a given architecture are contained in a directory
named by the architecture, for example
.CW /mips/bin ;
in use, that directory is bound to the conventional location
.CW /bin .
Programs such as shell scripts need not know the CPU type they are
executing on to find binaries to run.
A directory of private binaries
is usually unioned with
.CW /bin .
(Compare this to the
.I
ad hoc
.R
and special-purpose idea of the
.CW PATH
variable, which is not used in the Plan 9 shell.)
Local bindings are also helpful for debugging, for example by binding
an old library to the standard place and linking a program to see
if recent changes to the library are responsible for a bug in the program.
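.PP
In terms of the calls above, this conventional arrangement might be
set up as follows; the
.CW AFTER
flag name and the particular paths are assumptions for illustration.
.P1
bind("/mips/bin", "/bin", REPLACE);	/* binaries for this CPU type */
bind("/usr/rob/bin", "/bin", AFTER);	/* private binaries, searched later */
/* debugging: overlay the standard place with an old library */
bind("/usr/rob/old/libc.a", "/mips/lib/libc.a", REPLACE);
.P2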
.PP
The window system,
.CW 8½
[Pike91], is a server for files such as
.CW /dev/cons
and
.CW /dev/bitblt .
Each client sees a distinct copy of these files in its local
name space: there are many instances of
.CW /dev/cons ,
each served by
.CW 8½
to the local name space of a window.
Again,
.CW 8½
implements services using
local name spaces plus the use
of I/O to conventionally named files.
Each client just connects its standard input, output, and error files
to
.CW /dev/cons ,
with analogous operations to access bitmap graphics.
Compare this to the implementation of
.CW /dev/tty
on UNIX, which is done by special code in the kernel
that overloads the file, when opened,
with the standard input or output of the process.
Special arrangement must be made by a UNIX window system for
.CW /dev/tty
to behave as expected;
.CW 8½
instead uses the provision of the corresponding file as its
central idea, which to succeed depends critically on local name spaces.
.PP
The environment
.CW 8½
provides its clients is exactly the environment under which it is implemented:
a conventional set of files in
.CW /dev .
This permits the window system to be run recursively in one of its own
windows, which is handy for debugging.
It also means that if the files are exported to another machine,
as described below, the window system or client applications may be
run transparently on remote machines, even ones without graphics hardware.
This mechanism is used for Plan 9's implementation of the X window
system: X is run as a client of
.CW 8½ ,
often on a remote machine with lots of memory.
In this configuration, using Ethernet to connect
MIPS machines, we measure only a 10% degradation in graphics
performance relative to running X on
a bare Plan 9 machine.
.PP
An unusual application of these ideas is a statistics-gathering
file system implemented by a command called
.CW iostats .
The command encapsulates a process in a local name space, monitoring 9P
requests from the process to the outside world \(em the name space in which
.CW iostats
is itself running.  When the command completes,
.CW iostats
reports usage and performance figures for file activity.
For example
.P1
iostats 8½
.P2
can be used to discover how much I/O the window system
does to the bitmap device, font files, and so on.
.PP
The
.CW import
command connects a piece of name space from a remote system
to the local name space.
Its implementation is to dial the remote machine and start
a process there that serves the remote name space using 9P.
It then calls
.CW mount
to attach the connection to the name space and finally dies;
the remote process continues to serve the files.
One use is to access devices not available
locally.  For example, to write a floppy one may say
.P1
import lab.pc /a: /n/dos
cp foo /n/dos/bar
.P2
The call to
.CW import
connects the file tree from
.CW /a:
on the machine
.CW lab.pc
(which must support 9P) to the local directory
.CW /n/dos .
Then the file
.CW foo
can be written to the floppy just by copying it across.
.PP
Another application is remote debugging:
.P1
import helix /proc
.P2
makes the process file system on machine
.CW helix
available locally; commands such as
.CW ps
then see
.CW helix 's
processes instead of the local ones.
The debugger may then look at a remote process:
.P1
db /proc/27/text /proc/27/mem
.P2
allows breakpoint debugging of the remote process.
Since
.CW db
infers the CPU type of the process from the executable header on
the text file, it supports
cross-architecture debugging, too.
Care is taken within
.CW db
to handle issues of byte order and floating point; it is possible to
breakpoint debug a big-endian MIPS process from a little-endian i386.
.PP
Network interfaces are also implemented as file systems [Presotto].
For example,
.CW /net/tcp
is a directory somewhat like
.CW /proc :
it contains a set of numbered directories, one per connection,
each of which contains files to control and communicate on the connection.
A process allocates a new connection by accessing
.CW /net/tcp/clone ,
which evaluates to the directory of an unused connection.
To make a call, the process writes a textual message such as
.CW 'connect
.CW 135.104.53.2!512'
to the
.CW ctl
file and then reads and writes the
.CW data
file.
An
.CW rlogin
service can be implemented in a few lines of shell code.
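.PP
The same sequence is brief in C.
A sketch of placing a call, with error handling elided (in practice
the
.CW dial
library routine wraps this up); the assumption that the
.CW clone
file descriptor doubles as the connection's
.CW ctl
file is marked below.
.P1
int
callremote(char *addr)	/* e.g. "135.104.53.2!512" */
{
	char dir[40], data[64];
	int cfd, n;

	cfd = open("/net/tcp/clone", ORDWR);
	n = read(cfd, dir, sizeof dir - 1);	/* connection number */
	dir[n] = '\0';
	fprint(cfd, "connect %s", addr);	/* assumed: cfd acts as ctl */
	snprint(data, sizeof data, "/net/tcp/%s/data", dir);
	return open(data, ORDWR);
}
.P2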
.PP
This structure makes network gatewaying easy to provide.
We have machines with Datakit interfaces but no Internet interface.
On such a machine one may type
.P1
import helix /net
telnet tcp!ai.mit.edu
.P2
The
.CW import
uses Datakit to pull in the TCP interface from
.CW helix ,
which can then be used directly; the
.CW tcp!
notation is necessary because we routinely use multiple networks
and protocols on Plan 9\(emit identifies the network in which
.CW ai.mit.edu
is a valid name.
.PP
In practice we do not use
.CW rlogin
or
.CW telnet
between Plan 9 machines.  Instead a command called
.CW cpu
in effect replaces the CPU in a window with that
on another machine, typically a fast multiprocessor CPU server.
The implementation is to recreate the
name space on the remote machine, using the equivalent of
.CW import
to connect pieces of the terminal's name space to that of
the process (shell) on the CPU server, making the terminal
a file server for the CPU.
CPU-local devices such as fast file system connections
are still local; only terminal-resident devices are
imported.
The result is unlike UNIX
.CW rlogin ,
which moves into a distinct name space on the remote machine,
or file sharing with
.CW NFS ,
which keeps the name space the same but forces processes to execute
locally.
Bindings in
.CW /bin
may change because of a change in CPU architecture, and
the networks involved may be different because of differing hardware,
but the effect feels like simply speeding up the processor in the
current name space.
.SH
Position
.PP
These examples illustrate how the ideas of representing resources
as file systems and per-process name spaces can be used to solve
problems often left to more exotic mechanisms.
Nonetheless there are some operations in Plan 9 that are not
mapped into file I/O.
An example is process creation.
We could imagine a message to a control file in
.CW /proc
that creates a process, but the details of
constructing the environment of the new process \(em its open files,
name space, memory image, etc. \(em are too intricate to
be described easily in a simple I/O operation.
Therefore new processes on Plan 9 are created by fairly conventional
.CW rfork
and
.CW exec
system calls;
.CW /proc
is used only to represent and control existing processes.
.PP
Plan 9 does not attempt to map network name spaces into the file
system name space, for several reasons.
The different addressing rules for various networks and protocols
cannot be mapped uniformly into a hierarchical file name space.
Even if they could be,
the various mechanisms to authenticate,
select a service,
and control the connection would not map consistently into
operations on a file.
.PP
Shared memory is another resource not adequately represented by a
file name space.
Plan 9 takes care to provide mechanisms
to allow groups of local processes to share and map memory.
Memory is controlled
by system calls rather than special files, however,
since a representation in the file system would imply that memory could
be imported from remote machines.
.PP
Despite these limitations, file systems and name spaces offer an effective
model around which to build a distributed system.
Used well, they can provide a uniform, familiar, transparent
interface to a diverse set of distributed resources.
They carry well-understood properties of access, protection,
and naming.
The integration of devices into the hierarchical file system
was the best idea in UNIX.
Plan 9 pushes the concepts much further and shows that
file systems, when used inventively, have plenty of scope
for productive research.
.SH
References
.LP
[Killian] T. Killian, ``Processes as Files'', USENIX Summer Conf. Proc., Salt Lake City, 1984
.br
[Needham] R. Needham, ``Names'', in
.I
Distributed systems,
.R
S. Mullender, ed.,
Addison Wesley, 1989
.br
[Pike90] R. Pike, D. Presotto, K. Thompson, H. Trickey,
``Plan 9 from Bell Labs'',
UKUUG Proc. of the Summer 1990 Conf.,
London, England,
1990
.br
[Presotto] D. Presotto, ``Multiprocessor Streams for Plan 9'',
UKUUG Proc. of the Summer 1990 Conf.,
London, England,
1990
.br
[Pike91] R. Pike, ``8½, The Plan 9 Window System'', USENIX Summer
Conf. Proc., Nashville, 1991
675