.ds TM \u\s-2TM\s+2\d
.nr dT 6
.nr XT 6
.TL
The Styx Architecture for Distributed Systems
.AU
Rob Pike
Dennis M. Ritchie
.AI
Computing Science Research Center
Lucent Technologies, Bell Labs
Murray Hill, New Jersey
USA
.FS
.FA
Originally appeared in
.I "Bell Labs Technical Journal" ,
Vol. 4,
No. 2,
April-June 1999,
pp. 146-152.
.br
Copyright © 1999 Lucent Technologies Inc. All rights reserved.
.FE
.AB
A distributed system is constructed from a set of relatively
independent components that form a unified, but geographically and
functionally diverse entity. Examples include networked operating
systems, Internet services, the national telephone
switching system, and in general
all the technology using today's diverse digital
networks. Nevertheless, distributed systems remain difficult
to design, build, and maintain, primarily because of the lack
of a clean, perspicuous interconnection model for the
components.
.LP
Our experience with two distributed operating systems,
Plan 9 and Inferno, encourages us to propose such a model.
These systems depend on, advocate, and generally push to the
limit a fruitful idea: to present their
resources as files in a hierarchical name space.
The objects appearing as files may represent stored data, but may
also be devices, dynamic information sources, interfaces to services,
and control points. The approach unifies and provides basic naming,
structuring, and access control mechanisms for all system resources.
A simple underlying network protocol, Styx, forms
the core of the architecture by presenting a common
language for communication within the system.
.LP
Even within non-distributed systems, the presentation of services
as files advantageously extends a familiar scheme for naming, classifying,
and connecting to system resources.
More important, the approach provides a natural way to build
distributed systems, by using well-known technology for attaching
remote file systems.
If resources are represented as files,
and there are remote file systems, one has
a distributed system: resources available in one place
are usable from another.
.AE
.SH
Introduction
.LP
The Styx protocol is a variant of a protocol called
.I 9P
that
was developed for the Plan 9 operating system[9man].
For simplicity, we will use the name
Styx throughout this paper; the difference concerns only the initialization of
a connection.
.LP
The original idea behind Styx was to encode file operations between
client programs and the file system,
to be translated into messages for transmission on a computer network.
Using this technology,
Plan 9 separates the file server\(ema central repository for
permanent file storage\(emboth from the CPU server\(ema large
shared-memory multiprocessor\(emand from the user terminals.
This physical separation of function was central to the original
design of the system;
what was unexpected was how well the model could be used to
solve a wide variety of problems not usually thought of as
file system issues.
.LP
The breakthrough was to realize that by representing
a computing resource as a form of file system,
many of the difficulties of making that resource available
across the network would disappear naturally, because
Styx could export the resource transparently.
For example,
the Plan 9 window system,
.CW 8½
[Pike91],
is implemented as a dynamic file server that publishes
files with names like
.CW /dev/mouse
and
.CW /dev/screen
to provide access to the local hardware.
The
.CW /dev/mouse
file, for instance,
may be opened and read like a regular file, in the manner of UNIX\*(TM device
files, but under
.CW 8½
it is multiplexed: each client program has a private
.CW /dev/mouse
file that returns mouse events only when the client's window
is the active one on the display.
This design provides a clean, simple mechanism for controlling
access to the mouse.
Its real strength, though, is that the representation of the window system's
resources as files allows Styx to make those resources available across the
network.
For example, an interactive graphics program may be run on a CPU server
simply by having
.CW 8½
serve the appropriate files to that machine.
.LP
Note that although the resources published by Styx behave like files\(emthey
have file names, file permissions, and file access methods\(emthey do not
need to exist as standard files on disk.
The
.CW /dev/mouse
file is accessed by standard file I/O mechanisms but is nonetheless a
transient object fabricated dynamically by a running program;
it has no permanent existence.
.LP
By following this approach throughout the system, Plan 9 achieves
a remarkable degree of transparency in the distribution of resources[PPTTW93].
Besides interactive graphics, services such as debugging, maintenance,
file backup, and even access to the underlying network hardware
can be made available across the network using Styx, permitting
the construction of distributed applications and services
using nothing more sophisticated than file I/O.
.SH
The Styx protocol
.LP
Styx's place in the world is analogous to
Sun NFS[RFC][NFS] or Microsoft CIFS[CIFS], although it is simpler and easier to implement
[Welc94].
Furthermore, NFS and CIFS are designed for sharing regular disk files; NFS in particular
is intimately tied to the implementation and caching strategy
of the underlying UNIX file system.
Unlike Styx, NFS and CIFS are clumsy at exporting dynamic device-like
files such as
.CW /dev/mouse .
.LP
Styx provides a view of a hierarchical, tree-shaped
file system name space[Nee89], together with access information about
the files (permissions, sizes, dates) and the means to read and write
the files.
Its users (that is, the people who write application programs)
don't see the protocol itself; instead they see files that they
read and write, and that provide information or change information.
.LP
In use, a Styx
.I client
is an entity on one machine that establishes communication with
another entity, the
.I server ,
on the same or another machine.
The client mechanisms may be built into the operating system, as they
are in Plan 9 or Inferno[INF1][INF2], or into application libraries;
a server may be part of the operating system, or just as often
may be application code on a separate server machine. In any case, the
client and server entities
communicate by exchanging messages, and the effect is that the client
sees a hierarchical file system that exists on the server.
The Styx protocol is the specification of the messages that are exchanged.
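.LP
As a rough illustration of the application's view described above, the following C sketch
writes a string to a named file and reads the file back using only ordinary POSIX file calls.
It is a sketch under stated assumptions, not part of Styx: the function name
.CW file_exchange
is hypothetical, and the path passed to it names an ordinary local file that merely
stands in for a resource a server might publish.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `request` to the file at `path`, then read
 * the file back into `reply` (at most cap-1 bytes, NUL-terminated).
 * Returns the number of bytes read, or -1 on error.
 *
 * ASSUMPTION: `path` is an ordinary local file used as a stand-in for a
 * served resource, so the "reply" is simply the bytes just written.
 */
long file_exchange(const char *path, const char *request,
                   char *reply, size_t cap)
{
    assert(path != NULL && request != NULL && reply != NULL && cap > 0);

    /* The application only names and opens a file. */
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* Send the request with an ordinary write. */
    size_t len = strlen(request);
    if (write(fd, request, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }

    /* Rewind so the read starts at the beginning of the file. */
    if (lseek(fd, 0, SEEK_SET) < 0) {
        close(fd);
        return -1;
    }

    /* Collect the reply with an ordinary read. */
    ssize_t n = read(fd, reply, cap - 1);
    close(fd);
    if (n < 0)
        return -1;

    reply[n] = '\0';
    return (long)n;
}
```

.LP
With a real served resource the reply would be produced by the server rather than
echoed from disk, but the calling code would contain the same open, write, and read calls.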
.LP
At one level, Styx consists of messages of 13 types for
.RS
.IP \(bu
Starting communication (attaching to a file system);
.IP \(bu
Navigating the file system (that is, specifying and
gaining a handle for a named file);
.IP \(bu
Reading and writing a file; and
.IP \(bu
Performing file status inquiries and changes
.RE
.LP
However, application writers simply code requests to open, read, or write
files; a library or the operating system translates the requests
into the necessary byte sequences transmitted over a communication
channel. The Styx protocol proper specifies the interpretation of these
byte sequences. It fits, approximately, at the OSI Session Layer level
of the ISO standard classification.
Its specification is independent of most details of machine architecture
and it has been successfully used among machines of varying instruction
sets and data layout.
The protocol is summarized in Table 1.
.KF
.TS
center box;
l l
--
lfCW l.
Name	Description
attach	Authenticate user of connection; return FID
clone	Duplicate FID
walk	Advance FID one level of name hierarchy
open	Check permissions for file I/O
create	Create new file
read	Read contents of file
write	Write contents of file
close	Discard FID
remove	Remove file
stat	Report file state: permissions, etc.
wstat	Modify file state
error	Return error condition for failed operation
flush	Disregard outstanding I/O requests
.TE
.ce 100
.ps -1
Table 1. Summary of Styx messages.
.ps
.ce 0
.KE
.LP
In use, an operation such as
.P1
open("/usr/rob/.profile", O_READ);
.P2
is translated by the underlying system into a sequence of Styx messages.
After establishing the initial connection to the
file server, an
.CW attach
message authenticates the user (the person or agent accessing the files) and
returns an object called a
.CW FID
(file ID) that represents the root of the hierarchy on the server.
When the
.CW open()
operation is executed, it proceeds as follows.
.RS
.IP \(bu
A
.CW clone
message duplicates the root
.CW FID ,
returning a new
.CW FID
that can navigate the hierarchy without losing the connection to the root.
.IP \(bu
The new
.CW FID
is then moved to the file
.CW /usr/rob/.profile
by a sequence of
.CW walk
messages that step along, one path component at a time
.CW usr , (
.CW rob ,
.CW .profile ).
.IP \(bu
Finally, an
.CW open
message checks that the user has permission to read the file,
permitting subsequent
.CW read
and
.CW write
operations (messages) on the
.CW FID .
.IP \(bu
Once I/O is completed, the
.CW close
message will release the
.CW FID .
.RE
.LP
At a lower level, implementations of Styx depend only on a reliable,
byte-stream Transport communications layer. For example, it runs over either
TCP/IP, the standard transmission control protocol
and Internet protocol,
or Internet link (IL), which is a sequenced, reliable datagram protocol
using IP packets.
It is worth emphasizing, though, that the model does not require the
existence of a network to join the components; Styx runs fine
over a Unix pipe or even using shared memory.
The strength of the approach is not so much how it works over a network
as that its behavior over a network is identical to its behavior locally.
.SH
Architectural approach
.LP
Styx, as a file system protocol, is merely a component in a
more encompassing approach
to system design: the presentation of resources as files.
This approach will be discussed using a sequence of examples.
.SH
.I "Example: networking
.LP
As an example, access to a TCP/IP network in Inferno and Plan 9 systems
appears as a piece of a file system, with (abbreviated) structure
as follows[PrWi93]:
.P1
/net/
    dns/
    tcp/
        clone
        stats
        0/
            ctl
            status
            data
            listen
        1/
            ...
        ...
    ether0/
        0/
            ctl
            status
            ...
        1/
            ...
        ...
.P2
This represents a file system structure in which one can name, read, and write `files' with
names like
.CW /net/dns ,
.CW /net/tcp/clone ,
.CW /net/tcp/0/ctl
and so on;
there are directories of files
.CW /net/tcp
and
.CW /net/ether0 .
On the machine that actually has the network interface, all of these
things that look like files are constructed by the kernel drivers that maintain
the TCP/IP stack; they are not real files on a disk.
Operations on the `files' turn into operations sent to the device drivers.
.LP
Suppose an application wishes to establish a connection over TCP/IP to
.CW www.bell-labs.com .
The first task is to translate the domain name
.CW www.bell-labs.com
to a numerical internet address; this is a complicated process, generally
involving communicating with local and remote Domain Name Servers.
In the Styx model, this is done by opening the file
.CW /net/dns
and writing the literal string
.CW www.bell-labs.com
on the file; then the same file is read.
It will return the string
.CW 204.178.16.5
as a sequence of 12 characters.
.LP
Once the numerical Internet address is acquired, the connection must be established;
this is done by opening
.CW /net/tcp/clone
and reading from it a string that specifies a directory like
.CW /net/tcp/43 ,
which represents a new, unique TCP/IP channel.
To establish the connection,
write a message like
.CW "connect 204.178.16.5
on the control file for that connection,
.CW /net/tcp/43/ctl .
Subsequently, communication with
.CW www.bell-labs.com
is done by reading and
writing on the file
.CW /net/tcp/43/data .
.LP
There are several things to note about this approach.
.RS
.IP \(bu
All the interface points look like files, and are
accessed by the same I/O mechanisms already available in
programming languages like C, C++, or Java. However, they do not
correspond to ordinary data files on disk, but instead are creations
of a middleware code layer.
.IP \(bu
Communication across the interface, by convention, uses printable character strings where
feasible instead of binary information. This means that the syntax
of communication does not depend on CPU architecture or language details.
.IP \(bu
Because the interface, as in this example with
.CW /net
as the interface with networking facilities, looks like a piece of a
hierarchical file system, it can easily and nearly automatically
be exported to a remote machine and used from afar.
.RE
.LP
In particular, the Styx implementation encourages a natural way of providing
controlled access to networks.
Lucent, like many organizations, has an internal network not
accessible to the international Internet, and has a few
gateways between the inside and outside networks.
Only the gateway machines are connected to both, and they implement
the administrative controls for safety and security.
The advantage of the Styx model is the ease with which
the outside Internet can be used from inside.
If the
.CW /net
file tree described above is provided on a gateway machine,
it can be used as a remote file system from machines on the
inside.
This is safe, because this connection is one-way:
inside machines can see the external network interfaces,
but outside machines cannot see the inside.
.SH
.I "Example: debugging
.LP
A similar approach, borrowed and generalized from the UNIX
system [Kill84], is useful for controlling and discovering the status
of the running processes in the operating system.
Here a directory
.CW /proc
contains a subdirectory for each process running on the
system; the names of the subdirectories correspond to
process IDs:
.P1
/proc/
    1/
        status
        ctl
        fd
        text
        mem
        ...
    2/
        status
        ctl
        ...
    ...
.P2
The file names in the process directories refer to various aspects
of the corresponding process:
.CW status
contains information about the state of the process;
.CW ctl ,
when written, performs operations like pausing, restarting,
or killing the process;
.CW fd
names and describes the files open in the process;
.CW text
and
.CW mem
represent the program code and the data respectively.
.LP
Where possible, the information and control are again
represented as text strings. For example, one line
from the
.CW status
file of a typical process might be
.DS
.CW "samterm dmr Read 0 20 2478910 0 0 ...
.DE
which shows the name of the program, the owner, its state, and several numbers
representing CPU time in various categories.
.LP
Once again, the approach provides several payoffs.
Because process information is represented in file form,
remote debugging (debugging programs on another machine)
is possible immediately by remote-mounting the
.CW /proc
tree on another machine.
The machine-independent representation of information means
that most operations work properly even if the remote machine
uses a different CPU architecture from the one doing the
debugging.
Most of the programs that deal
with status and control contain no machine-dependent parts
and are completely portable.
(A few are not, however: no attempt is made to render the
memory data or instructions in machine-independent form.)
.SH
.I "Example: PathStar\*(TM Access Server
.LP
The data shelf of Lucent's PathStar Access Server[PATH] uses Styx to connect
the line cards and other devices on the shelf to the control computer.
In fact, Styx is the protocol for high-level communication on the backplane.
.LP
The file system hierarchy served by the control computer includes a structure
like this:
.P1
/trip/
    config
    admin/
        ospfctl
        ...
    boot/
        0/
            ctl
            eeprom
            memory
            msg
            pack
            alarm
            ...
        1/
            ...
/net/
    ...
.P2
The directories under
.CW /net
are similar to those in Plan 9 or Inferno; they form the interface to the
external IP network.
The
.CW /trip
hierarchy represents the control structure of the shelf.
.LP
The subdirectories under
.CW /trip/boot
each provide access to one of the line cards or other devices in the shelf.
For example, to initialize a card one writes the text string
.CW reset
to the
.CW ctl
file of the card, while bootstrapping is done by copying the control
software for the card into the
.CW memory
file and writing a
.CW reset
message to
.CW ctl .
Once the line card is running,
the other files present an interface to the higher-level structure of the device:
.CW pack
is the port through which IP packets are transferred to and from the card,
.CW alarm
may be read to discover outstanding conditions on the card, and so on.
.LP
All this structure is exported from the shelf using Styx.
The external element management software (EMS) controls and monitors the
shelf using Styx operations.
For example, the EMS may read
.CW /trip/boot/7/alarm
and discover a diagnostic condition.
By reading and writing the other files under
.CW /trip/boot/7/ ,
the card may be taken off line, diagnosed, and perhaps reset or substituted,
all from the system running the EMS, which may be elsewhere in the network.
.LP
Another example is the implementation of SNMP in the PathStar Access Server.
The functionality of SNMP is usually distributed through the various components
of a network, but here it is a straightforward adaptation process,
running anywhere in the network, that translates SNMP requests to Styx
operations in the network element.
Besides dramatically simplifying the implementation, the natural
ability for aggregation permits
a single process to provide SNMP access to an arbitrarily complex network subsystem.
Yet the structure is secure: the file-oriented nature of the operations makes it
easy to establish standard authentication and security controls to guarantee
that only trusted parties have access to the SNMP operations.
.LP
There are local benefits to this architecture, as well.
Styx provides a single point in the design where control can be separated
from the details of the underlying fabric, isolating both from changes in the
other. Components become more adaptable: software can be upgraded
without worrying about hidden dependencies on the hardware,
and new hardware may be installed without updating the control
software above.
.SH
Security issues
.LP
Styx provides several security mechanisms for
discouraging hostile or accidental actions that injure the integrity
of a system.
.LP
The underlying file-communication protocol includes
user and group identifiers that a server may check against
other authentication.
For example, a server may check, on a request to open a file,
that the user ID associated with the request is permitted to
perform the operation.
This mechanism is familiar from general-purpose operating
systems, and its use is well-known.
It depends on passwords or stronger mechanisms for authenticating
the identity of clients.
.LP
The Styx approach of providing remote resources
as file systems over a network encourages genuinely secure access
to the resources in a way transparent to applications, so that
authentication transactions need not be provided as part of each.
For example, in Inferno, the negotiation of an initial connection
between client and server may include installation of any of
several encrypting or message-digesting protocols that
supervise the channel.
All application use of the resources provided by the server
is then protected against interference, and the server
has strong assurance that its facilities are being used in
an authorized way.
This is relevant for general-purpose file servers
and, in the telephony field, is especially useful for safe
remote administration.
.SH
Summary
.LP
Presentation of resources as a piece of a possibly remote file system
is an attractive way of creating distributed systems that treads a
path between two extremes:
.IP 1
All communication with other parts of the system is by
explicit messages sent between components.
This communication differs in style from applications' use
of local resources.
.IP 2
All communication is by means of
closely shared resources: the CPU-addressable memory in
various parts is made directly available across a big network;
applications can read and write far-away objects exactly as
they do those on the same motherboard as their own CPU.
.LP
Something like the first of these extremes is usually more evident
in today's systems, although either the operating system or software
layered upon it usually papers over some of the rough spots.
The second remains more difficult to approach, because
networks (especially big ones like the Internet) are not very
reliable, and because
the machines on them are diverse in processor architecture
and in installed software.
.LP
The design plan described and advocated in this paper
lies between the two extremes.
It has these advantages:
.IP \(bu
.I "A simple, familiar programming model for reading and writing named files" .
File systems have well-defined naming, access, and permissions structures.
.IP \(bu
.I "Platform and language independence" .
Underlying access to resources is
at the file level, which is provided nearly everywhere, instead
of depending on facilities available only with particular languages
or operating systems.
C++ or Java classes, and C libraries can be constructed
to access the facilities.
.IP \(bu
.I "A hierarchical naming and access control structure" .
This encourages clean
and well-structured design of resource naming and access.
.IP \(bu
.I "Easy testing and debugging" .
By using well-specified, narrow interfaces
at the file level, it is straightforward to observe the communication
between distributed entities.
.IP \(bu
.I "Low cost" .
Support software, at both client and server,
can be written in a few thousand lines
of code, and will occupy little space in products.
.LP
This approach to building systems is successful in the general-purpose
systems Plan 9 and Inferno;
it has also been used to construct systems specialized for telephony, such
as Mantra[MAN] and the PathStar Access Server.
It supplies a coherent, extensible structure both to the internal communication
within a single system and to the external communication between heterogeneous
components of a large digital network.
.LP
.SH
References
.nr PS -1
.nr VS -1
.IP [NFS] 11
R. Sandberg, D. Goldberg, S. Kleiman, D. Walsh, and
B. Lyon,
``Design and Implementation of the Sun Network File System'',
.I "Proc. Summer 1985 USENIX Conf." ,
Portland, Oregon, June 1985,
pp. 119-130.
.IP [RFC] 11
Internet RFC 1094.
.IP [9man] 11
.I "Plan 9 Programmer's Manual" ,
Second Edition,
Vol. 1 and 2,
Bell Laboratories,
Murray Hill, N.J.,
1995.
.IP [Kill84] 11
T. J. Killian,
``Processes as Files'',
.I "Proc. Summer 1984 USENIX Conf." ,
Salt Lake City, Utah, June 1984, pp. 203-207.
.IP [Pike91] 11
R. Pike,
``8½, the Plan 9 Window System'',
.I "Proc. Summer 1991 USENIX Conf." ,
Nashville TN, June 1991, pp. 257-265.
.IP "[PPTTW93] " 11
R. Pike, D.L. Presotto, K. Thompson, H. Trickey, and P. Winterbottom, ``The Use of Name Spaces in Plan 9'',
.I "Op. Sys. Rev." ,
Vol. 27, No. 2, April 1993, pp. 72-76.
.IP [PrWi93] 11
D. L. Presotto and P. Winterbottom,
``The Organization of Networks in Plan 9'',
.I "Proc. Winter 1993 USENIX Conf." ,
San Diego, Calif., Jan. 1993, pp. 43-50.
.IP [Nee89] 11
R. Needham, ``Names'', in
.I "Distributed systems" ,
edited by S. Mullender,
Addison-Wesley,
Reading, Mass., 1989, pp. 89-101.
.IP [CIFS]
Paul Leach and Dan Perry, ``CIFS: A Common Internet File System'', Nov. 1996,
.I "http://www.microsoft.com/mind/1196/cifs.htm" .
.IP [INF1]
.I "Inferno Programmer's Manual" ,
Third Edition,
Vol. 1 and 2, Vita Nuova Holdings Limited, York, England, 2000.
.IP [INF2]
S.M. Dorward, R. Pike, D. L. Presotto, D. M. Ritchie, H. Trickey,
and P.
Winterbottom, ``The Inferno Operating System'',
.I "Bell Labs Technical Journal" ,
Vol. 2,
No. 1,
Winter 1997.
.IP [MAN]
R. A. Lakshmi-Ratan,
``The Lucent Technologies Softswitch\-Realizing the Promise of Convergence'',
.I "Bell Labs Technical Journal" ,
Vol. 4,
No. 2,
April-June 1999,
pp. 174-196.
.IP [PATH]
J. M. Fossaceca, J. D. Sandoz, and P. Winterbottom,
``The PathStar Access Server: Facilitating Carrier-Scale Packet Telephony'',
.I "Bell Labs Technical Journal" ,
Vol. 3,
No. 4,
October-December 1998,
pp. 86-102.
.IP [Welc94]
B. Welch,
``A Comparison of Three Distributed File System Architectures: Vnode, Sprite, and Plan 9'',
.I "Computing Systems" ,
Vol. 7, No. 2, pp. 175-199 (1994).
.nr PS +1
.nr VS +1