xref: /plan9/sys/doc/fs/p6 (revision 219b2ee8daee37f4aad58d63f21287faa8e4ffdc)
1*219b2ee8SDavid du Colombier.SH
2*219b2ee8SDavid du ColombierCache/WORM Driver
3*219b2ee8SDavid du Colombier.PP
4*219b2ee8SDavid du ColombierThe cache/WORM (cw) driver is by far the
5*219b2ee8SDavid du Colombierlargest and most complicated device driver in the file server.
6*219b2ee8SDavid du ColombierThere are four devices involved in the cw driver.
7*219b2ee8SDavid du ColombierIt implements a read/write pseudo-device (the cw-device)
8*219b2ee8SDavid du Colombierand a read-only pseudo-device (the dump device)
9*219b2ee8SDavid du Colombierby performing operations on its two constituent devices
10*219b2ee8SDavid du Colombierthe read-write c-device and the write-once-read-many
11*219b2ee8SDavid du Colombierw-device.
12*219b2ee8SDavid du ColombierThe block numbers on the four devices are distinct,
13*219b2ee8SDavid du Colombieralthough the cw addresses,
14*219b2ee8SDavid du Colombierdump addresses,
15*219b2ee8SDavid du Colombierand the w addresses are
16*219b2ee8SDavid du Colombierhighly correlated.
17*219b2ee8SDavid du Colombier.PP
18*219b2ee8SDavid du ColombierThe cw-driver uses the w-device as the
19*219b2ee8SDavid du Colombierstable storage of the file system at the time of the
20*219b2ee8SDavid du Colombierlast dump.
21*219b2ee8SDavid du ColombierAll newly written and a large number of recently used
22*219b2ee8SDavid du Colombierexact copies of blocks of the w-device are kept on the c-device.
23*219b2ee8SDavid du ColombierThe c-device is much smaller than the w-device and
24*219b2ee8SDavid du Colombierso the subset of w-blocks that are kept on the c-device are
25*219b2ee8SDavid du Colombiermapped through a hash table kept on a partition of the c-device.
26*219b2ee8SDavid du Colombier.PP
27*219b2ee8SDavid du ColombierThe map portion of the c-device consists of blocks of buckets of entries.
28*219b2ee8SDavid du ColombierThe declarations follow.
29*219b2ee8SDavid du Colombier.P1
30*219b2ee8SDavid du Colombier    enum
31*219b2ee8SDavid du Colombier    {
32*219b2ee8SDavid du Colombier        BKPERBLK = 10,
33*219b2ee8SDavid du Colombier        CEPERBK  = (BUFSIZE - BKPERBLK*sizeof(long)) /
34*219b2ee8SDavid du Colombier                   (sizeof(Centry)*BKPERBLK),
35*219b2ee8SDavid du Colombier    };
36*219b2ee8SDavid du Colombier.P2
37*219b2ee8SDavid du Colombier.P1
38*219b2ee8SDavid du Colombier    typedef
39*219b2ee8SDavid du Colombier    struct
40*219b2ee8SDavid du Colombier    {
41*219b2ee8SDavid du Colombier        ushort   age;
42*219b2ee8SDavid du Colombier        short    state;
43*219b2ee8SDavid du Colombier        long     waddr;
44*219b2ee8SDavid du Colombier    } Centry;
45*219b2ee8SDavid du Colombier.P2
46*219b2ee8SDavid du Colombier.P1
47*219b2ee8SDavid du Colombier    typedef
48*219b2ee8SDavid du Colombier    struct
49*219b2ee8SDavid du Colombier    {
50*219b2ee8SDavid du Colombier        long     agegen;
51*219b2ee8SDavid du Colombier        Centry   entry[CEPERBK];
52*219b2ee8SDavid du Colombier    } Bucket;
53*219b2ee8SDavid du Colombier.P2
54*219b2ee8SDavid du Colombier.P1
55*219b2ee8SDavid du Colombier    Bucket   bucket[BKPERBLK];
56*219b2ee8SDavid du Colombier.P2
57*219b2ee8SDavid du ColombierThere is exactly one entry structure for each block in the
58*219b2ee8SDavid du Colombierdata partition of the c-device.
59*219b2ee8SDavid du ColombierA bucket contains all of the w-addresses that have
60*219b2ee8SDavid du Colombierthe same hash code.
61*219b2ee8SDavid du ColombierThere are as many buckets as will fit
62*219b2ee8SDavid du Colombierin a block and enough blocks to have the required
63*219b2ee8SDavid du Colombiernumber of entries.
64*219b2ee8SDavid du ColombierThe entries in the bucket are maintained
65*219b2ee8SDavid du Colombierin FIFO order with an age variable and an incrementing age generator.
66*219b2ee8SDavid du ColombierWhen the age generator is about to overflow,
67*219b2ee8SDavid du Colombierall of the ages in the bucket are rescaled
68*219b2ee8SDavid du Colombierfrom zero.
69*219b2ee8SDavid du Colombier.PP
70*219b2ee8SDavid du ColombierThe following steps go into converting a w-address into a c-address.
71*219b2ee8SDavid du ColombierThe bucket is found by
72*219b2ee8SDavid du Colombier.P1
73*219b2ee8SDavid du Colombier    bucket_number = w-address % total_buckets
74*219b2ee8SDavid du Colombier    getbuf(c-device, bucket_offset + bucket_number/BKPERBLK);
75*219b2ee8SDavid du Colombier.P2
76*219b2ee8SDavid du ColombierAfter the desired bucket is found,
77*219b2ee8SDavid du Colombierthe desired entry is found by a linear search within the bucket for the
78*219b2ee8SDavid du Colombierentry with the desired
79*219b2ee8SDavid du Colombier.CW waddr .
80*219b2ee8SDavid du Colombier.PP
81*219b2ee8SDavid du ColombierThe state variable in the entry is
82*219b2ee8SDavid du Colombierone of the following.
83*219b2ee8SDavid du Colombier.P1
84*219b2ee8SDavid du Colombier    enum
85*219b2ee8SDavid du Colombier    {
86*219b2ee8SDavid du Colombier        Cnone    = 0,
87*219b2ee8SDavid du Colombier        Cdirty,
88*219b2ee8SDavid du Colombier        Cdump,
89*219b2ee8SDavid du Colombier        Cread,
90*219b2ee8SDavid du Colombier        Cwrite,
91*219b2ee8SDavid du Colombier        Cdump1,
92*219b2ee8SDavid du Colombier    };
93*219b2ee8SDavid du Colombier.P2
94*219b2ee8SDavid du ColombierEvery w-address has a state.
95*219b2ee8SDavid du ColombierBlocks that are not in the
96*219b2ee8SDavid du Colombierc-device have the implied
97*219b2ee8SDavid du Colombierstate
98*219b2ee8SDavid du Colombier.CW Cnone .
99*219b2ee8SDavid du ColombierThe
100*219b2ee8SDavid du Colombier.CW Cread
101*219b2ee8SDavid du Colombierstate is for blocks that have the
102*219b2ee8SDavid du Colombiersame data as the corresponding block in
103*219b2ee8SDavid du Colombierthe w-device.
104*219b2ee8SDavid du ColombierSince the c-device is much faster than the
105*219b2ee8SDavid du Colombierw-device,
106*219b2ee8SDavid du Colombier.CW Cread
107*219b2ee8SDavid du Colombierblocks are kept as long as possible and
108*219b2ee8SDavid du Colombierused in preference to reading the w-device.
109*219b2ee8SDavid du Colombier.CW Cread
110*219b2ee8SDavid du Colombierblocks may be discarded from the c-device
111*219b2ee8SDavid du Colombierwhen the space is needed for newer data.
112*219b2ee8SDavid du ColombierThe
113*219b2ee8SDavid du Colombier.CW Cwrite
114*219b2ee8SDavid du Colombierstate is when the c-device contains newer data
115*219b2ee8SDavid du Colombierthan the corresponding block on the w-device.
116*219b2ee8SDavid du ColombierThis happens when a
117*219b2ee8SDavid du Colombier.CW Cnone ,
118*219b2ee8SDavid du Colombier.CW Cread ,
119*219b2ee8SDavid du Colombieror
120*219b2ee8SDavid du Colombier.CW Cwrite
121*219b2ee8SDavid du Colombierblock is written.
122*219b2ee8SDavid du ColombierThe
123*219b2ee8SDavid du Colombier.CW Cdirty
124*219b2ee8SDavid du Colombierstate
125*219b2ee8SDavid du Colombieris when the c-device contains
126*219b2ee8SDavid du Colombiernew data and the corresponding block
127*219b2ee8SDavid du Colombieron the w-device has never been written.
128*219b2ee8SDavid du ColombierThis happens when a new block has been
129*219b2ee8SDavid du Colombierallocated from the free space on the w-device.
130*219b2ee8SDavid du Colombier.PP
131*219b2ee8SDavid du ColombierThe
132*219b2ee8SDavid du Colombier.CW Cwrite
133*219b2ee8SDavid du Colombierand
134*219b2ee8SDavid du Colombier.CW Cdirty
135*219b2ee8SDavid du Colombierblocks are created and never removed.
136*219b2ee8SDavid du ColombierUnless something is done to
137*219b2ee8SDavid du Colombierconvert these blocks,
138*219b2ee8SDavid du Colombierthe c-device will gradually
139*219b2ee8SDavid du Colombierfill up and stop functioning.
140*219b2ee8SDavid du ColombierOnce a day,
141*219b2ee8SDavid du Colombieror by command,
142*219b2ee8SDavid du Colombiera
143*219b2ee8SDavid du Colombier.I dump
144*219b2ee8SDavid du Colombierof the cw-device
145*219b2ee8SDavid du Colombieris taken.
146*219b2ee8SDavid du ColombierThe purpose of
147*219b2ee8SDavid du Colombiera dump is to queue the writes that
148*219b2ee8SDavid du Colombierhave been shunted to the c-device
149*219b2ee8SDavid du Colombierto be written to the w-device.
150*219b2ee8SDavid du ColombierSince the w-device is a WORM,
151*219b2ee8SDavid du Colombierblocks cannot be rewritten.
152*219b2ee8SDavid du ColombierBlocks that have already been written to the WORM must be
153*219b2ee8SDavid du Colombierrelocated to the unused portion of the w-device.
154*219b2ee8SDavid du ColombierThese are precisely the
155*219b2ee8SDavid du Colombierblocks with
156*219b2ee8SDavid du Colombier.CW Cwrite
157*219b2ee8SDavid du Colombierstate.
158*219b2ee8SDavid du Colombier.PP
159*219b2ee8SDavid du ColombierThe dump algorithm is as follows:
160*219b2ee8SDavid du Colombiera) The tree on the cw-device is walked
161*219b2ee8SDavid du Colombieras long as the blocks visited have been
162*219b2ee8SDavid du Colombiermodified since the last dump.
163*219b2ee8SDavid du ColombierThese are the blocks with state
164*219b2ee8SDavid du Colombier.CW Cwrite
165*219b2ee8SDavid du Colombierand
166*219b2ee8SDavid du Colombier.CW Cdirty .
167*219b2ee8SDavid du ColombierIt is possible to restrict the search
168*219b2ee8SDavid du Colombierto within these blocks
169*219b2ee8SDavid du Colombiersince the directory containing a modified
170*219b2ee8SDavid du Colombierfile must have been accessed to modify the
171*219b2ee8SDavid du Colombierfile and accessing a directory will set its
172*219b2ee8SDavid du Colombiermodified time thus causing the block containing it
173*219b2ee8SDavid du Colombierto be written.
174*219b2ee8SDavid du ColombierThe directory containing that directory must be
175*219b2ee8SDavid du Colombiermodified for the same reason.
176*219b2ee8SDavid du ColombierThe tree walk is thus drastically restrained and the
177*219b2ee8SDavid du Colombiertree walk does not take much time.
178*219b2ee8SDavid du Colombierb) All
179*219b2ee8SDavid du Colombier.CW Cwrite
180*219b2ee8SDavid du Colombierblocks found in the tree search
181*219b2ee8SDavid du Colombierare relocated to new blank blocks on the w-device
182*219b2ee8SDavid du Colombierand converted to
183*219b2ee8SDavid du Colombier.CW Cdump
184*219b2ee8SDavid du Colombierstate.
185*219b2ee8SDavid du ColombierAll
186*219b2ee8SDavid du Colombier.CW Cdirty
187*219b2ee8SDavid du Colombierblocks are converted to
188*219b2ee8SDavid du Colombier.CW Cdump
189*219b2ee8SDavid du Colombierstate without relocation.
190*219b2ee8SDavid du ColombierAt this point,
191*219b2ee8SDavid du Colombierall modified blocks in the cw-device
192*219b2ee8SDavid du Colombierhave w-addresses that point to unwritten
193*219b2ee8SDavid du ColombierWORM blocks.
194*219b2ee8SDavid du ColombierThese blocks are marked for later
195*219b2ee8SDavid du Colombierwriting to the w-device
196*219b2ee8SDavid du Colombierwith the state
197*219b2ee8SDavid du Colombier.CW Cdump .
198*219b2ee8SDavid du Colombierc) All open files that were pointing to modified
199*219b2ee8SDavid du Colombierblocks are reopened to point at the corresponding
200*219b2ee8SDavid du Colombierreallocated blocks.
201*219b2ee8SDavid du ColombierThis causes the directories leading to the
202*219b2ee8SDavid du Colombieropen files to be modified.
203*219b2ee8SDavid du ColombierThus the invariant discussed in a) is maintained.
204*219b2ee8SDavid du Colombierd) The background dumping process will slowly
205*219b2ee8SDavid du Colombiergo through the map of the c-device and write out
206*219b2ee8SDavid du Colombierall blocks with
207*219b2ee8SDavid du Colombier.CW Cdump
208*219b2ee8SDavid du Colombierstate.
209*219b2ee8SDavid du Colombier.PP
210*219b2ee8SDavid du ColombierThe dump takes a few minutes to walk the tree
211*219b2ee8SDavid du Colombierand mark the blocks.
212*219b2ee8SDavid du ColombierIt then takes hours to write the marked blocks
213*219b2ee8SDavid du Colombierto the WORM.
214*219b2ee8SDavid du ColombierIf a marked block is rewritten before the old
215*219b2ee8SDavid du Colombiercopy has been written to the WORM,
216*219b2ee8SDavid du Colombierit must be forced to the WORM before it is rewritten.
217*219b2ee8SDavid du ColombierThere is no problem if another dump is taken before the first one
218*219b2ee8SDavid du Colombieris finished.
219*219b2ee8SDavid du ColombierThe newly marked blocks are just added to the marked blocks
220*219b2ee8SDavid du Colombierleft from the first dump.
221*219b2ee8SDavid du Colombier.PP
222*219b2ee8SDavid du ColombierIf there is an error writing a marked block
223*219b2ee8SDavid du Colombierto the WORM
224*219b2ee8SDavid du Colombierthen the
225*219b2ee8SDavid du Colombier.CW dump
226*219b2ee8SDavid du Colombierstate is converted to
227*219b2ee8SDavid du Colombier.CW Cdump1
228*219b2ee8SDavid du Colombierand manual intervention is needed.
229*219b2ee8SDavid du Colombier(See the
230*219b2ee8SDavid du Colombier.CW cwcmd
231*219b2ee8SDavid du Colombier.CW mvstate
232*219b2ee8SDavid du Colombiercommand in
233*219b2ee8SDavid du Colombier.I fs (8)).
234*219b2ee8SDavid du ColombierThese blocks can be disposed of by converting
235*219b2ee8SDavid du Colombiertheir state back to
236*219b2ee8SDavid du Colombier.CW Cdump
237*219b2ee8SDavid du Colombierso that they will be written again.
238*219b2ee8SDavid du ColombierThey can also be converted to
239*219b2ee8SDavid du Colombier.CW Cwrite
240*219b2ee8SDavid du Colombierstate so that they will be allocated new
241*219b2ee8SDavid du Colombieraddresses at the next dump.
242*219b2ee8SDavid du ColombierIn most other respects,
243*219b2ee8SDavid du Colombiera
244*219b2ee8SDavid du Colombier.CW Cdump1
245*219b2ee8SDavid du Colombierblock behaves like a
246*219b2ee8SDavid du Colombier.CW Cwrite
247*219b2ee8SDavid du Colombierblock.
248