.SH
Block Devices
.PP
The block device I/O system is like a
protocol stack of filters.
There is a set of pseudo-devices that call
recursively to other pseudo-devices and real devices.
The protocol stack is compiled from a configuration
string that specifies the order of pseudo-devices and devices.
Each pseudo-device and device has a set of entry points
that corresponds to the operations that the file system
requires of a device.
The most notable operations are
.CW read ,
.CW write ,
and
.CW size .
.PP
The device stack is best described by
the syntax of the configuration string
that specifies it.
Configuration strings are used
during the setup of the file system.
For a description, see
.I fsconfig (8).
In the following recursive definition,
.I D
represents a
string that specifies a block device.
.IP "\fID\fP = (\fIDD\fP...)"
.br
This is a set of devices that
are concatenated to form a single device.
The size of the concatenated device is the
sum of the sizes of the sub-devices.
.IP "\fID\fP = [\fIDD\fP...]"
.br
This is the interleaving of the
individual devices.
If there are N devices in the list,
then the pseudo-device is the N-way block
interleaving of the sub-devices.
The size of the interleaved device is
N times the size of the smallest sub-device.
.IP "\fID\fP = {\fIDD\fP...}"
.br
This is a set of devices that
constitute a `mirror' of the first sub-device, and form a single device.
A write to the device is performed,
at the same block address,
on each sub-device, in right-to-left order.
A read from the device is attempted on each sub-device,
in left-to-right order, until a read succeeds without error
or the set is exhausted.
One can think of this as a poor man's RAID 1.
The size of the device is the size of the smallest sub-device.
.IP "\fID\fP = \f(CWp\fP\fIDN1.N2\fP"
.br
This is a partition of a sub-device.
The sub-device is divided into 100 equal pieces.
If the size of the sub-device is not divisible by 100,
the remainder is discarded at the top.
The pseudo-device starts at the N1-th piece and
continues for N2 pieces.
Thus
.CW p\fID\fP67.33
will be the
last third of the device
.I D .
.IP "\fID\fP = \f(CWf\fP\fID\fP"
.br
This is a fake write-once-read-many device simulated by a
second read-write device.
This second device is partitioned
into a set of block flags and a set of blocks.
The flags are used to generate errors if a
block is ever written twice or read without being written first.
.IP "\fID\fP = \f(CWx\fP\fID\fP"
.br
This is a byte-swapped version of the file system on D.
Since the file server currently writes integers in metadata to disk
in native byte order, moving a file system to a machine of the other
major byte order (e.g., MIPS to Pentium)
requires the use of
.CW x .
It knows the sizes of the various integer fields in the file system metadata.
Ideally, the file server would follow the Plan 9 religion and write a consistent
byte order on disk, regardless of processor.
In the meantime, it should be possible to automatically determine the need
for byte-swapping by examining data in the super-block of each file system,
though this has not been implemented yet.
.IP "\fID\fP = \f(CWc\fP\fIDD\fP"
.br
This is the cache/WORM device made up of a cache (read-write)
device and a WORM (write-once-read-many) device.
More on this later.
.IP "\fID\fP = \f(CWo\fP"
.br
This is the dump file system that is the
two-level hierarchy of all dumps ever taken on a cache/WORM.
The read-only root of the cache/WORM file system
(on the dump taken Feb 18, 1995) can
be referenced as
.CW /1995/0218
in this pseudo-device.
The second dump taken that day will be
.CW /1995/02181 .
.IP "\fID\fP = \f(CWw\fP\fIN1.N2.N3\fP"
.br
This is a SCSI disk on controller N1, target N2 and logical unit number N3.
.IP "\fID\fP = \f(CWh\fP\fIN1.N2.0\fP"
.br
This is an (E)IDE or *ATA disk on controller N1, target N2
(target 0 is the IDE master, 1 the slave device).
These disks are currently run via programmed I/O, not DMA,
so they tend to be slower to access than SCSI disks.
.IP "\fID\fP = \f(CWr\fP\fIN1\fP"
.br
This is the same as
.CW w ,
but refers to a side of a WORM disc.
See the
.I j
device.
.IP "\fID\fP = \f(CWl\fP\fIN1\fP"
.br
This is the same as
.CW r ,
but one block from the SCSI disk is removed for labeling.
.IP "\fID\fP = \f(CWj(\fP\fID\d\s-2\&1\s+2\u\fID\d\s-2\&2\s+2\u\f(CW*)\fID\d\s-2\&3\s+2\u\f1"
.br
.I D\d\s-2\&1\s+2\u
is the jukebox SCSI interface.
The
.I D\d\s-2\&2\s+2\u 's
are the SCSI drives in the jukebox
and the
.I D\d\s-2\&3\s+2\u 's
are the demountable platters in the jukebox.
.I D\d\s-2\&1\s+2\u
and
.I D\d\s-2\&2\s+2\u
must be
.CW w .
.I D\d\s-2\&3\s+2\u
must be pseudo-devices made up of
.CW w ,
.CW r ,
or
.CW l
devices.
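.PP
The size rules given above for the concatenated, interleaved, mirrored,
and partitioned pseudo-devices can be sketched in a few lines of Python.
This is only an illustration, not the file server's code:
the function names are hypothetical, sizes are assumed to be counted in blocks,
partition pieces are assumed to be numbered from 0
(consistent with the 67.33 example covering the last third),
and the round-robin block mapping for interleaving is an assumption
about what `N-way block interleaving' means.

```python
def concat_size(sizes):
    """(DD...): the sum of the sub-device sizes."""
    return sum(sizes)


def interleave_size(sizes):
    """[DD...]: N times the size of the smallest sub-device."""
    return len(sizes) * min(sizes)


def mirror_size(sizes):
    """{DD...}: the size of the smallest sub-device."""
    return min(sizes)


def interleave_map(block, n):
    """Assumed round-robin layout: (sub-device index, block within it)."""
    return block % n, block // n


def partition(size, n1, n2):
    """pDN1.N2: return (first block, length in blocks).

    The sub-device is cut into 100 equal pieces; any remainder
    ("slop") at the top is not part of any piece.
    """
    piece = size // 100
    return n1 * piece, n2 * piece


if __name__ == "__main__":
    sizes = [1000, 1200, 900]
    print(concat_size(sizes), interleave_size(sizes), mirror_size(sizes))
    print(partition(1050, 67, 33))
```

.PP
With sub-devices of 1000, 1200, and 900 blocks, the three composite sizes
are 3100, 2700, and 900 blocks.
A 1050-block device has 10-block pieces, so
.CW p\fID\fP67.33
would start at block 670 and run for 330 blocks,
leaving the top 50 blocks unused.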
.PP
For
.CW w ,
.CW h ,
.CW l ,
and
.CW r
devices, any of the configuration numbers
can be replaced by an iterator of the form
.CW <\fIN1-N2\fP> .
N1 can be greater than N2, indicating a descending sequence.
Thus
.Ex
	[w0.<2-6>]
.Ee
is the interleaving of the SCSI disks on SCSI targets
2 through 6 of SCSI controller 0.
The main file system on
Emelie
is defined by the configuration string
.Ex
	c[w1.<0-5>.0]j(w6w5w4w3w2)(l<0-236>l<238-474>)
.Ee
This is a cache/WORM driver.
The cache is six interleaved disks on SCSI controller 1,
targets 0, 1, 2, 3, 4, and 5.
The WORM half of the cache/WORM
is 474 jukebox disk sides.
Another file server,
.I choline ,
has a main file system defined by
.Ex
	c[w<1-3>]j(w1.<6-0>.0)(l<0-124>l<128-252>)
.Ee
The order of
.CW w1.<6-0>.0
matters here, since the optical jukebox's WORM drives'
SCSI target ids,
as delivered,
run in descending order relative to the numbers of the drives
in SCSI commands
(e.g., the jukebox controller is SCSI target 6,
drive #1 is SCSI target 5,
and drive #6 is SCSI target 0).
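.PP
To make the iterator notation concrete, the following Python sketch
expands the iterators in a single device specification.
It is an illustration only, not the file server's parser:
the name
.CW expand
is hypothetical, it handles one device specification at a time
rather than a whole configuration string,
and the ordering when a specification contains more than one iterator
(leftmost varying slowest) is an assumption.

```python
import re

# Matches one iterator of the form <N1-N2>.
ITER = re.compile(r"<(\d+)-(\d+)>")


def expand(spec):
    """Expand every <N1-N2> iterator in one device specification.

    N1 may be greater than N2, giving a descending sequence.
    """
    m = ITER.search(spec)
    if m is None:
        return [spec]
    first, last = int(m.group(1)), int(m.group(2))
    step = 1 if first <= last else -1
    out = []
    for n in range(first, last + step, step):
        # Substitute this value, then expand any remaining iterators.
        out.extend(expand(spec[:m.start()] + str(n) + spec[m.end():]))
    return out


if __name__ == "__main__":
    print(expand("w0.<2-6>"))
    devs = expand("w1.<6-0>.0")
    print("interface:", devs[0])
    print("drives:", devs[1:])
```

.PP
Under these assumptions,
.CW w1.<6-0>.0
expands to seven devices, from
.CW w1.6.0
down to
.CW w1.0.0 :
the first is the jukebox interface on target 6,
and the remaining six are drives #1 through #6 on targets 5 through 0.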
195