xref: /plan9-contrib/sys/doc/fs/p6 (revision 219b2ee8daee37f4aad58d63f21287faa8e4ffdc)
1.SH
2Cache/WORM Driver
3.PP
4The cache/WORM (cw) driver is by far the
5largest and most complicated device driver in the file server.
6There are four devices involved in the cw driver.
7It implements a read/write pseudo-device (the cw-device)
8and a read-only pseudo-device (the dump device)
9by performing operations on its two constituent devices
10the read-write c-device and the write-once-read-many
11w-device.
12The block numbers on the four devices are distinct,
13although the cw addresses,
14dump addresses,
15and the w addresses are
16highly correlated.
17.PP
18The cw-driver uses the w-device as the
19stable storage of the file system at the time of the
20last dump.
21All newly written and a large number of recently used
22exact copies of blocks of the w-device are kept on the c-device.
23The c-device is much smaller than the w-device and
24so the subset of w-blocks that are kept on the c-device are
25mapped through a hash table kept on a partition of the c-device.
26.PP
27The map portion of the c-device consists of blocks of buckets of entries.
28The declarations follow.
29.P1
30    enum
31    {
32        BKPERBLK = 10,
33        CEPERBK  = (BUFSIZE - BKPERBLK*sizeof(long)) /
34                   (sizeof(Centry)*BKPERBLK),
35    };
36.P2
37.P1
38    typedef
39    struct
40    {
41        ushort   age;
42        short    state;
43        long     waddr;
44    } Centry;
45.P2
46.P1
47    typedef
48    struct
49    {
50        long     agegen;
51        Centry   entry[CEPERBK];
52    } Bucket;
53.P2
54.P1
55    Bucket   bucket[BKPERBLK];
56.P2
57There is exactly one entry structure for each block in the
58data partition of the c-device.
59A bucket contains all of the w-addresses that have
60the same hash code.
61There are as many buckets as will fit
62in a block and enough blocks to have the required
63number of entries.
64The entries in the bucket are maintained
65in FIFO order with an age variable and an incrementing age generator.
66When the age generator is about to overflow,
67all of the ages in the bucket are rescaled
68from zero.
69.PP
70The following steps go into converting a w-address into a c-address.
71The bucket is found by
72.P1
73    bucket_number = w-address % total_buckets
74    getbuf(c-device, bucket_offset + bucket_number/BKPERBLK);
75.P2
76After the desired bucket is found,
77the desired entry is found by a linear search within the bucket for the
78entry with the desired
79.CW waddr .
80.PP
81The state variable in the entry is
82one of the following.
83.P1
84    enum
85    {
86        Cnone    = 0,
87        Cdirty,
88        Cdump,
89        Cread,
90        Cwrite,
91        Cdump1,
92    };
93.P2
94Every w-address has a state.
95Blocks that are not in the
96c-device have the implied
97state
98.CW Cnone .
99The
100.CW Cread
101state is for blocks that have the
102same data as the corresponding block in
103the w-device.
104Since the c-device is much faster than the
105w-device,
106.CW Cread
107blocks are kept as long as possible and
108used in preference to reading the w-device.
109.CW Cread
110blocks may be discarded from the c-device
111when the space is needed for newer data.
112The
113.CW Cwrite
114state is when the c-device contains newer data
115than the corresponding block on the w-device.
116This happens when a
117.CW Cnone ,
118.CW Cread ,
119or
120.CW Cwrite
121block is written.
122The
123.CW Cdirty
124state
125is when the c-device contains
126new data and the corresponding block
127on the w-device has never been written.
128This happens when a new block has been
129allocated from the free space on the w-device.
130.PP
131The
132.CW Cwrite
133and
134.CW Cdirty
135blocks are created and never removed.
136Unless something is done to
137convert these blocks,
138the c-device will gradually
139fill up and stop functioning.
140Once a day,
141or by command,
142a
143.I dump
144of the cw-device
145is taken.
146The purpose of
147a dump is to queue the writes that
148have been shunted to the c-device
149to be written to the w-device.
150Since the w-device is a WORM,
151blocks cannot be rewritten.
152Blocks that have already been written to the WORM must be
153relocated to the unused portion of the w-device.
154These are precisely the
155blocks with
156.CW Cwrite
157state.
158.PP
159The dump algorithm is as follows:
160a) The tree on the cw-device is walked
161as long as the blocks visited have been
162modified since the last dump.
163These are the blocks with state
164.CW Cwrite
165and
166.CW Cdirty .
167It is possible to restrict the search
168to within these blocks
169since the directory containing a modified
170file must have been accessed to modify the
171file and accessing a directory will set its
172modified time thus causing the block containing it
173to be written.
174The directory containing that directory must be
175modified for the same reason.
176The tree walk is thus drastically restrained and the
177tree walk does not take much time.
178b) All
179.CW Cwrite
180blocks found in the tree search
181are relocated to new blank blocks on the w-device
182and converted to
183.CW Cdump
184state.
185All
186.CW Cdirty
187blocks are converted to
188.CW Cdump
189state without relocation.
190At this point,
191all modified blocks in the cw-device
192have w-addresses that point to unwritten
193WORM blocks.
194These blocks are marked for later
195writing to the w-device
196with the state
197.CW Cdump .
198c) All open files that were pointing to modified
199blocks are reopened to point at the corresponding
200reallocated blocks.
201This causes the directories leading to the
202open files to be modified.
203Thus the invariant discussed in a) is maintained.
204d) The background dumping process will slowly
205go through the map of the c-device and write out
206all blocks with
207.CW Cdump
208state.
209.PP
210The dump takes a few minutes to walk the tree
211and mark the blocks.
212It then takes hours to write the marked blocks
213to the WORM.
214If a marked block is rewritten before the old
215copy has been written to the WORM,
216it must be forced to the WORM before it is rewritten.
217There is no problem if another dump is taken before the first one
218is finished.
219The newly marked blocks are just added to the marked blocks
220left from the first dump.
221.PP
222If there is an error writing a marked block
223to the WORM
224then the
225.CW dump
226state is converted to
227.CW Cdump1
228and manual intervention is needed.
229(See the
230.CW cwcmd
231.CW mvstate
232command in
233.I fs (8)).
234These blocks can be disposed of by converting
235their state back to
236.CW Cdump
237so that they will be written again.
238They can also be converted to
239.CW Cwrite
240state so that they will be allocated new
241addresses at the next dump.
242In most other respects,
243a
244.CW Cdump1
245block behaves like a
246.CW Cwrite
247block.
248