xref: /plan9/sys/doc/fs/p6 (revision 6aeb1f0c04d990a2469d1ed5385e6b7f18c098d6)
1219b2ee8SDavid du Colombier.SH
2219b2ee8SDavid du ColombierCache/WORM Driver
3219b2ee8SDavid du Colombier.PP
4219b2ee8SDavid du ColombierThe cache/WORM (cw) driver is by far the
5219b2ee8SDavid du Colombierlargest and most complicated device driver in the file server.
6219b2ee8SDavid du ColombierThere are four devices involved in the cw driver.
7219b2ee8SDavid du ColombierIt implements a read/write pseudo-device (the cw-device)
8219b2ee8SDavid du Colombierand a read-only pseudo-device (the dump device)
9219b2ee8SDavid du Colombierby performing operations on its two constituent devices
10219b2ee8SDavid du Colombierthe read-write c-device and the write-once-read-many
11219b2ee8SDavid du Colombierw-device.
12219b2ee8SDavid du ColombierThe block numbers on the four devices are distinct,
13*6aeb1f0cSDavid du Colombieralthough the
14*6aeb1f0cSDavid du Colombier.I cw
15*6aeb1f0cSDavid du Colombieraddresses,
16219b2ee8SDavid du Colombierdump addresses,
17*6aeb1f0cSDavid du Colombierand the
18*6aeb1f0cSDavid du Colombier.I w
19*6aeb1f0cSDavid du Colombieraddresses are
20219b2ee8SDavid du Colombierhighly correlated.
21219b2ee8SDavid du Colombier.PP
22219b2ee8SDavid du ColombierThe cw-driver uses the w-device as the
23219b2ee8SDavid du Colombierstable storage of the file system at the time of the
24219b2ee8SDavid du Colombierlast dump.
25219b2ee8SDavid du ColombierAll newly written and a large number of recently used
26219b2ee8SDavid du Colombierexact copies of blocks of the w-device are kept on the c-device.
27219b2ee8SDavid du ColombierThe c-device is much smaller than the w-device and
28219b2ee8SDavid du Colombierso the subset of w-blocks that are kept on the c-device are
29219b2ee8SDavid du Colombiermapped through a hash table kept on a partition of the c-device.
30219b2ee8SDavid du Colombier.PP
31219b2ee8SDavid du ColombierThe map portion of the c-device consists of blocks of buckets of entries.
32219b2ee8SDavid du ColombierThe declarations follow.
33*6aeb1f0cSDavid du Colombier.Ex
34219b2ee8SDavid du Colombier	enum
35219b2ee8SDavid du Colombier	{
36219b2ee8SDavid du Colombier		BKPERBLK = 10,
37*6aeb1f0cSDavid du Colombier		CEPERBK	= (BUFSIZE - BKPERBLK*sizeof(Off)) /
38219b2ee8SDavid du Colombier				(sizeof(Centry)*BKPERBLK),
39219b2ee8SDavid du Colombier	};
40*6aeb1f0cSDavid du Colombier.Ee
41*6aeb1f0cSDavid du Colombier.Ex
42219b2ee8SDavid du Colombier	typedef
43219b2ee8SDavid du Colombier	struct
44219b2ee8SDavid du Colombier	{
45219b2ee8SDavid du Colombier		ushort	age;
46219b2ee8SDavid du Colombier		short	state;
47*6aeb1f0cSDavid du Colombier		Off	waddr;
48219b2ee8SDavid du Colombier	} Centry;
49*6aeb1f0cSDavid du Colombier.Ee
50*6aeb1f0cSDavid du Colombier.Ex
51219b2ee8SDavid du Colombier	typedef
52219b2ee8SDavid du Colombier	struct
53219b2ee8SDavid du Colombier	{
54219b2ee8SDavid du Colombier		long	agegen;
55219b2ee8SDavid du Colombier		Centry	entry[CEPERBK];
56219b2ee8SDavid du Colombier	} Bucket;
57*6aeb1f0cSDavid du Colombier.Ee
58*6aeb1f0cSDavid du Colombier.Ex
59219b2ee8SDavid du Colombier	Bucket	bucket[BKPERBLK];
60*6aeb1f0cSDavid du Colombier.Ee
61219b2ee8SDavid du ColombierThere is exactly one entry structure for each block in the
62219b2ee8SDavid du Colombierdata partition of the c-device.
63219b2ee8SDavid du ColombierA bucket contains all of the w-addresses that have
64219b2ee8SDavid du Colombierthe same hash code.
65219b2ee8SDavid du ColombierThere are as many buckets as will fit
66219b2ee8SDavid du Colombierin a block and enough blocks to have the required
67219b2ee8SDavid du Colombiernumber of entries.
68219b2ee8SDavid du ColombierThe entries in the bucket are maintained
69219b2ee8SDavid du Colombierin FIFO order with an age variable and an incrementing age generator.
70219b2ee8SDavid du ColombierWhen the age generator is about to overflow,
71219b2ee8SDavid du Colombierall of the ages in the bucket are rescaled
72219b2ee8SDavid du Colombierfrom zero.
73219b2ee8SDavid du Colombier.PP
74219b2ee8SDavid du ColombierThe following steps go into converting a w-address into a c-address.
75219b2ee8SDavid du ColombierThe bucket is found by
76*6aeb1f0cSDavid du Colombier.Ex
77*6aeb1f0cSDavid du Colombier	bucket_number = w-address % total_buckets;
78219b2ee8SDavid du Colombier	getbuf(c-device, bucket_offset + bucket_number/BKPERBLK);
79*6aeb1f0cSDavid du Colombier.Ee
80219b2ee8SDavid du ColombierAfter the desired bucket is found,
81219b2ee8SDavid du Colombierthe desired entry is found by a linear search within the bucket for the
82219b2ee8SDavid du Colombierentry with the desired
83219b2ee8SDavid du Colombier.CW waddr .
84219b2ee8SDavid du Colombier.PP
85219b2ee8SDavid du ColombierThe state variable in the entry is
86219b2ee8SDavid du Colombierone of the following.
87*6aeb1f0cSDavid du Colombier.Ex
88219b2ee8SDavid du Colombier	enum
89219b2ee8SDavid du Colombier	{
90219b2ee8SDavid du Colombier		Cnone	= 0,
91219b2ee8SDavid du Colombier		Cdirty,
92219b2ee8SDavid du Colombier		Cdump,
93219b2ee8SDavid du Colombier		Cread,
94219b2ee8SDavid du Colombier		Cwrite,
95219b2ee8SDavid du Colombier		Cdump1,
96219b2ee8SDavid du Colombier	};
97*6aeb1f0cSDavid du Colombier.Ee
98219b2ee8SDavid du ColombierEvery w-address has a state.
99219b2ee8SDavid du ColombierBlocks that are not in the
100219b2ee8SDavid du Colombierc-device have the implied
101219b2ee8SDavid du Colombierstate
102219b2ee8SDavid du Colombier.CW Cnone .
103219b2ee8SDavid du ColombierThe
104219b2ee8SDavid du Colombier.CW Cread
105219b2ee8SDavid du Colombierstate is for blocks that have the
106219b2ee8SDavid du Colombiersame data as the corresponding block in
107219b2ee8SDavid du Colombierthe w-device.
108219b2ee8SDavid du ColombierSince the c-device is much faster than the
109219b2ee8SDavid du Colombierw-device,
110219b2ee8SDavid du Colombier.CW Cread
111219b2ee8SDavid du Colombierblocks are kept as long as possible and
112219b2ee8SDavid du Colombierused in preference to reading the w-device.
113219b2ee8SDavid du Colombier.CW Cread
114219b2ee8SDavid du Colombierblocks may be discarded from the c-device
115219b2ee8SDavid du Colombierwhen the space is needed for newer data.
116219b2ee8SDavid du ColombierThe
117219b2ee8SDavid du Colombier.CW Cwrite
118219b2ee8SDavid du Colombierstate is when the c-device contains newer data
119219b2ee8SDavid du Colombierthan the corresponding block on the w-device.
120219b2ee8SDavid du ColombierThis happens when a
121219b2ee8SDavid du Colombier.CW Cnone ,
122219b2ee8SDavid du Colombier.CW Cread ,
123219b2ee8SDavid du Colombieror
124219b2ee8SDavid du Colombier.CW Cwrite
125219b2ee8SDavid du Colombierblock is written.
126219b2ee8SDavid du ColombierThe
127219b2ee8SDavid du Colombier.CW Cdirty
128219b2ee8SDavid du Colombierstate
129219b2ee8SDavid du Colombieris when the c-device contains
130219b2ee8SDavid du Colombiernew data and the corresponding block
131219b2ee8SDavid du Colombieron the w-device has never been written.
132219b2ee8SDavid du ColombierThis happens when a new block has been
133219b2ee8SDavid du Colombierallocated from the free space on the w-device.
134219b2ee8SDavid du Colombier.PP
135219b2ee8SDavid du ColombierThe
136219b2ee8SDavid du Colombier.CW Cwrite
137219b2ee8SDavid du Colombierand
138219b2ee8SDavid du Colombier.CW Cdirty
139219b2ee8SDavid du Colombierblocks are created and never removed.
140219b2ee8SDavid du ColombierUnless something is done to
141219b2ee8SDavid du Colombierconvert these blocks,
142219b2ee8SDavid du Colombierthe c-device will gradually
143219b2ee8SDavid du Colombierfill up and stop functioning.
144219b2ee8SDavid du ColombierOnce a day,
145219b2ee8SDavid du Colombieror by command,
146219b2ee8SDavid du Colombiera
147219b2ee8SDavid du Colombier.I dump
148219b2ee8SDavid du Colombierof the cw-device
149219b2ee8SDavid du Colombieris taken.
150219b2ee8SDavid du ColombierThe purpose of
151219b2ee8SDavid du Colombiera dump is to queue the writes that
152219b2ee8SDavid du Colombierhave been shunted to the c-device
153219b2ee8SDavid du Colombierto be written to the w-device.
154219b2ee8SDavid du ColombierSince the w-device is a WORM,
155219b2ee8SDavid du Colombierblocks cannot be rewritten.
156219b2ee8SDavid du ColombierBlocks that have already been written to the WORM must be
157219b2ee8SDavid du Colombierrelocated to the unused portion of the w-device.
158219b2ee8SDavid du ColombierThese are precisely the
159219b2ee8SDavid du Colombierblocks with
160219b2ee8SDavid du Colombier.CW Cwrite
161219b2ee8SDavid du Colombierstate.
162219b2ee8SDavid du Colombier.PP
163219b2ee8SDavid du ColombierThe dump algorithm is as follows:
164*6aeb1f0cSDavid du Colombier.IP a)
165*6aeb1f0cSDavid du ColombierThe tree on the cw-device is walked
166219b2ee8SDavid du Colombieras long as the blocks visited have been
167219b2ee8SDavid du Colombiermodified since the last dump.
168219b2ee8SDavid du ColombierThese are the blocks with state
169219b2ee8SDavid du Colombier.CW Cwrite
170219b2ee8SDavid du Colombierand
171219b2ee8SDavid du Colombier.CW Cdirty .
172219b2ee8SDavid du ColombierIt is possible to restrict the search
173219b2ee8SDavid du Colombierto within these blocks
174219b2ee8SDavid du Colombiersince the directory containing a modified
175219b2ee8SDavid du Colombierfile must have been accessed to modify the
176219b2ee8SDavid du Colombierfile and accessing a directory will set its
177219b2ee8SDavid du Colombiermodified time thus causing the block containing it
178219b2ee8SDavid du Colombierto be written.
179219b2ee8SDavid du ColombierThe directory containing that directory must be
180219b2ee8SDavid du Colombiermodified for the same reason.
181219b2ee8SDavid du ColombierThe tree walk is thus drastically restrained and the
182219b2ee8SDavid du Colombiertree walk does not take much time.
183*6aeb1f0cSDavid du Colombier.IP b)
184*6aeb1f0cSDavid du ColombierAll
185219b2ee8SDavid du Colombier.CW Cwrite
186219b2ee8SDavid du Colombierblocks found in the tree search
187219b2ee8SDavid du Colombierare relocated to new blank blocks on the w-device
188219b2ee8SDavid du Colombierand converted to
189219b2ee8SDavid du Colombier.CW Cdump
190219b2ee8SDavid du Colombierstate.
191219b2ee8SDavid du ColombierAll
192219b2ee8SDavid du Colombier.CW Cdirty
193219b2ee8SDavid du Colombierblocks are converted to
194219b2ee8SDavid du Colombier.CW Cdump
195219b2ee8SDavid du Colombierstate without relocation.
196219b2ee8SDavid du ColombierAt this point,
197219b2ee8SDavid du Colombierall modified blocks in the cw-device
198219b2ee8SDavid du Colombierhave w-addresses that point to unwritten
199219b2ee8SDavid du ColombierWORM blocks.
200219b2ee8SDavid du ColombierThese blocks are marked for later
201219b2ee8SDavid du Colombierwriting to the w-device
202219b2ee8SDavid du Colombierwith the state
203219b2ee8SDavid du Colombier.CW Cdump .
204*6aeb1f0cSDavid du Colombier.IP c)
205*6aeb1f0cSDavid du ColombierAll open files that were pointing to modified
206219b2ee8SDavid du Colombierblocks are reopened to point at the corresponding
207219b2ee8SDavid du Colombierreallocated blocks.
208219b2ee8SDavid du ColombierThis causes the directories leading to the
209219b2ee8SDavid du Colombieropen files to be modified.
210219b2ee8SDavid du ColombierThus the invariant discussed in a) is maintained.
211*6aeb1f0cSDavid du Colombier.IP d)
212*6aeb1f0cSDavid du ColombierThe background dumping process will slowly
213219b2ee8SDavid du Colombiergo through the map of the c-device and write out
214219b2ee8SDavid du Colombierall blocks with
215219b2ee8SDavid du Colombier.CW Cdump
216219b2ee8SDavid du Colombierstate.
217219b2ee8SDavid du Colombier.PP
218219b2ee8SDavid du ColombierThe dump takes a few minutes to walk the tree
219219b2ee8SDavid du Colombierand mark the blocks.
2207dd7cddfSDavid du ColombierIt can take hours to write the marked blocks
221219b2ee8SDavid du Colombierto the WORM.
222219b2ee8SDavid du ColombierIf a marked block is rewritten before the old
223219b2ee8SDavid du Colombiercopy has been written to the WORM,
224219b2ee8SDavid du Colombierit must be forced to the WORM before it is rewritten.
225219b2ee8SDavid du ColombierThere is no problem if another dump is taken before the first one
226219b2ee8SDavid du Colombieris finished.
227219b2ee8SDavid du ColombierThe newly marked blocks are just added to the marked blocks
228219b2ee8SDavid du Colombierleft from the first dump.
229219b2ee8SDavid du Colombier.PP
230219b2ee8SDavid du ColombierIf there is an error writing a marked block
231219b2ee8SDavid du Colombierto the WORM
232219b2ee8SDavid du Colombierthen the
233219b2ee8SDavid du Colombier.CW dump
234219b2ee8SDavid du Colombierstate is converted to
235219b2ee8SDavid du Colombier.CW Cdump1
236219b2ee8SDavid du Colombierand manual intervention is needed.
237219b2ee8SDavid du Colombier(See the
238219b2ee8SDavid du Colombier.CW cwcmd
239219b2ee8SDavid du Colombier.CW mvstate
240219b2ee8SDavid du Colombiercommand in
241219b2ee8SDavid du Colombier.I fs (8)).
242219b2ee8SDavid du ColombierThese blocks can be disposed of by converting
243219b2ee8SDavid du Colombiertheir state back to
244219b2ee8SDavid du Colombier.CW Cdump
245219b2ee8SDavid du Colombierso that they will be written again.
246219b2ee8SDavid du ColombierThey can also be converted to
247219b2ee8SDavid du Colombier.CW Cwrite
248219b2ee8SDavid du Colombierstate so that they will be allocated new
249219b2ee8SDavid du Colombieraddresses at the next dump.
250219b2ee8SDavid du ColombierIn most other respects,
251219b2ee8SDavid du Colombiera
252219b2ee8SDavid du Colombier.CW Cdump1
253219b2ee8SDavid du Colombierblock behaves like a
254219b2ee8SDavid du Colombier.CW Cwrite
255219b2ee8SDavid du Colombierblock.
256