xref: /plan9/sys/doc/fs/p6 (revision 6aeb1f0c04d990a2469d1ed5385e6b7f18c098d6)
1.SH
2Cache/WORM Driver
3.PP
4The cache/WORM (cw) driver is by far the
5largest and most complicated device driver in the file server.
6There are four devices involved in the cw driver.
7It implements a read/write pseudo-device (the cw-device)
8and a read-only pseudo-device (the dump device)
9by performing operations on its two constituent devices
10the read-write c-device and the write-once-read-many
11w-device.
12The block numbers on the four devices are distinct,
13although the
14.I cw
15addresses,
16dump addresses,
17and the
18.I w
19addresses are
20highly correlated.
21.PP
22The cw-driver uses the w-device as the
23stable storage of the file system at the time of the
24last dump.
25All newly written and a large number of recently used
26exact copies of blocks of the w-device are kept on the c-device.
27The c-device is much smaller than the w-device and
28so the subset of w-blocks that are kept on the c-device are
29mapped through a hash table kept on a partition of the c-device.
30.PP
31The map portion of the c-device consists of blocks of buckets of entries.
32The declarations follow.
33.Ex
34	enum
35	{
36		BKPERBLK = 10,
37		CEPERBK	= (BUFSIZE - BKPERBLK*sizeof(Off)) /
38				(sizeof(Centry)*BKPERBLK),
39	};
40.Ee
41.Ex
42	typedef
43	struct
44	{
45		ushort	age;
46		short	state;
47		Off	waddr;
48	} Centry;
49.Ee
50.Ex
51	typedef
52	struct
53	{
54		long	agegen;
55		Centry	entry[CEPERBK];
56	} Bucket;
57.Ee
58.Ex
59	Bucket	bucket[BKPERBLK];
60.Ee
61There is exactly one entry structure for each block in the
62data partition of the c-device.
63A bucket contains all of the w-addresses that have
64the same hash code.
65There are as many buckets as will fit
66in a block and enough blocks to have the required
67number of entries.
68The entries in the bucket are maintained
69in FIFO order with an age variable and an incrementing age generator.
70When the age generator is about to overflow,
71all of the ages in the bucket are rescaled
72from zero.
73.PP
74The following steps go into converting a w-address into a c-address.
75The bucket is found by
76.Ex
77	bucket_number = w-address % total_buckets;
78	getbuf(c-device, bucket_offset + bucket_number/BKPERBLK);
79.Ee
80After the desired bucket is found,
81the desired entry is found by a linear search within the bucket for the
82entry with the desired
83.CW waddr .
84.PP
85The state variable in the entry is
86one of the following.
87.Ex
88	enum
89	{
90		Cnone	= 0,
91		Cdirty,
92		Cdump,
93		Cread,
94		Cwrite,
95		Cdump1,
96	};
97.Ee
98Every w-address has a state.
99Blocks that are not in the
100c-device have the implied
101state
102.CW Cnone .
103The
104.CW Cread
105state is for blocks that have the
106same data as the corresponding block in
107the w-device.
108Since the c-device is much faster than the
109w-device,
110.CW Cread
111blocks are kept as long as possible and
112used in preference to reading the w-device.
113.CW Cread
114blocks may be discarded from the c-device
115when the space is needed for newer data.
116The
117.CW Cwrite
118state is when the c-device contains newer data
119than the corresponding block on the w-device.
120This happens when a
121.CW Cnone ,
122.CW Cread ,
123or
124.CW Cwrite
125block is written.
126The
127.CW Cdirty
128state
129is when the c-device contains
130new data and the corresponding block
131on the w-device has never been written.
132This happens when a new block has been
133allocated from the free space on the w-device.
134.PP
135The
136.CW Cwrite
137and
138.CW Cdirty
139blocks are created and never removed.
140Unless something is done to
141convert these blocks,
142the c-device will gradually
143fill up and stop functioning.
144Once a day,
145or by command,
146a
147.I dump
148of the cw-device
149is taken.
150The purpose of
151a dump is to queue the writes that
152have been shunted to the c-device
153to be written to the w-device.
154Since the w-device is a WORM,
155blocks cannot be rewritten.
156Blocks that have already been written to the WORM must be
157relocated to the unused portion of the w-device.
158These are precisely the
159blocks with
160.CW Cwrite
161state.
162.PP
163The dump algorithm is as follows:
164.IP a)
165The tree on the cw-device is walked
166as long as the blocks visited have been
167modified since the last dump.
168These are the blocks with state
169.CW Cwrite
170and
171.CW Cdirty .
172It is possible to restrict the search
173to within these blocks
174since the directory containing a modified
175file must have been accessed to modify the
176file and accessing a directory will set its
177modified time thus causing the block containing it
178to be written.
179The directory containing that directory must be
180modified for the same reason.
181The tree walk is thus drastically restrained and the
182tree walk does not take much time.
183.IP b)
184All
185.CW Cwrite
186blocks found in the tree search
187are relocated to new blank blocks on the w-device
188and converted to
189.CW Cdump
190state.
191All
192.CW Cdirty
193blocks are converted to
194.CW Cdump
195state without relocation.
196At this point,
197all modified blocks in the cw-device
198have w-addresses that point to unwritten
199WORM blocks.
200These blocks are marked for later
201writing to the w-device
202with the state
203.CW Cdump .
204.IP c)
205All open files that were pointing to modified
206blocks are reopened to point at the corresponding
207reallocated blocks.
208This causes the directories leading to the
209open files to be modified.
210Thus the invariant discussed in a) is maintained.
211.IP d)
212The background dumping process will slowly
213go through the map of the c-device and write out
214all blocks with
215.CW Cdump
216state.
217.PP
218The dump takes a few minutes to walk the tree
219and mark the blocks.
220It can take hours to write the marked blocks
221to the WORM.
222If a marked block is rewritten before the old
223copy has been written to the WORM,
224it must be forced to the WORM before it is rewritten.
225There is no problem if another dump is taken before the first one
226is finished.
227The newly marked blocks are just added to the marked blocks
228left from the first dump.
229.PP
230If there is an error writing a marked block
231to the WORM
232then the
233.CW dump
234state is converted to
235.CW Cdump1
236and manual intervention is needed.
237(See the
238.CW cwcmd
239.CW mvstate
240command in
241.I fs (8)).
242These blocks can be disposed of by converting
243their state back to
244.CW Cdump
245so that they will be written again.
246They can also be converted to
247.CW Cwrite
248state so that they will be allocated new
249addresses at the next dump.
250In most other respects,
251a
252.CW Cdump1
253block behaves like a
254.CW Cwrite
255block.
256