1*219b2ee8SDavid du Colombier.SH 2*219b2ee8SDavid du ColombierCache/WORM Driver 3*219b2ee8SDavid du Colombier.PP 4*219b2ee8SDavid du ColombierThe cache/WORM (cw) driver is by far the 5*219b2ee8SDavid du Colombierlargest and most complicated device driver in the file server. 6*219b2ee8SDavid du ColombierThere are four devices involved in the cw driver. 7*219b2ee8SDavid du ColombierIt implements a read/write pseudo-device (the cw-device) 8*219b2ee8SDavid du Colombierand a read-only pseudo-device (the dump device) 9*219b2ee8SDavid du Colombierby performing operations on its two constituent devices 10*219b2ee8SDavid du Colombierthe read-write c-device and the write-once-read-many 11*219b2ee8SDavid du Colombierw-device. 12*219b2ee8SDavid du ColombierThe block numbers on the four devices are distinct, 13*219b2ee8SDavid du Colombieralthough the cw addresses, 14*219b2ee8SDavid du Colombierdump addresses, 15*219b2ee8SDavid du Colombierand the w addresses are 16*219b2ee8SDavid du Colombierhighly correlated. 17*219b2ee8SDavid du Colombier.PP 18*219b2ee8SDavid du ColombierThe cw-driver uses the w-device as the 19*219b2ee8SDavid du Colombierstable storage of the file system at the time of the 20*219b2ee8SDavid du Colombierlast dump. 21*219b2ee8SDavid du ColombierAll newly written and a large number of recently used 22*219b2ee8SDavid du Colombierexact copies of blocks of the w-device are kept on the c-device. 23*219b2ee8SDavid du ColombierThe c-device is much smaller than the w-device and 24*219b2ee8SDavid du Colombierso the subset of w-blocks that are kept on the c-device are 25*219b2ee8SDavid du Colombiermapped through a hash table kept on a partition of the c-device. 26*219b2ee8SDavid du Colombier.PP 27*219b2ee8SDavid du ColombierThe map portion of the c-device consists of blocks of buckets of entries. 28*219b2ee8SDavid du ColombierThe declarations follow. 29*219b2ee8SDavid du Colombier.P1 30*219b2ee8SDavid du Colombier enum 31*219b2ee8SDavid du Colombier { 32*219b2ee8SDavid du Colombier BKPERBLK = 10, 33*219b2ee8SDavid du Colombier CEPERBK = (BUFSIZE - BKPERBLK*sizeof(long)) / 34*219b2ee8SDavid du Colombier (sizeof(Centry)*BKPERBLK), 35*219b2ee8SDavid du Colombier }; 36*219b2ee8SDavid du Colombier.P2 37*219b2ee8SDavid du Colombier.P1 38*219b2ee8SDavid du Colombier typedef 39*219b2ee8SDavid du Colombier struct 40*219b2ee8SDavid du Colombier { 41*219b2ee8SDavid du Colombier ushort age; 42*219b2ee8SDavid du Colombier short state; 43*219b2ee8SDavid du Colombier long waddr; 44*219b2ee8SDavid du Colombier } Centry; 45*219b2ee8SDavid du Colombier.P2 46*219b2ee8SDavid du Colombier.P1 47*219b2ee8SDavid du Colombier typedef 48*219b2ee8SDavid du Colombier struct 49*219b2ee8SDavid du Colombier { 50*219b2ee8SDavid du Colombier long agegen; 51*219b2ee8SDavid du Colombier Centry entry[CEPERBK]; 52*219b2ee8SDavid du Colombier } Bucket; 53*219b2ee8SDavid du Colombier.P2 54*219b2ee8SDavid du Colombier.P1 55*219b2ee8SDavid du Colombier Bucket bucket[BKPERBLK]; 56*219b2ee8SDavid du Colombier.P2 57*219b2ee8SDavid du ColombierThere is exactly one entry structure for each block in the 58*219b2ee8SDavid du Colombierdata partition of the c-device. 59*219b2ee8SDavid du ColombierA bucket contains all of the w-addresses that have 60*219b2ee8SDavid du Colombierthe same hash code. 61*219b2ee8SDavid du ColombierThere are as many buckets as will fit 62*219b2ee8SDavid du Colombierin a block and enough blocks to have the required 63*219b2ee8SDavid du Colombiernumber of entries. 64*219b2ee8SDavid du ColombierThe entries in the bucket are maintained 65*219b2ee8SDavid du Colombierin FIFO order with an age variable and an incrementing age generator. 66*219b2ee8SDavid du ColombierWhen the age generator is about to overflow, 67*219b2ee8SDavid du Colombierall of the ages in the bucket are rescaled 68*219b2ee8SDavid du Colombierfrom zero. 69*219b2ee8SDavid du Colombier.PP 70*219b2ee8SDavid du ColombierThe following steps go into converting a w-address into a c-address. 71*219b2ee8SDavid du ColombierThe bucket is found by 72*219b2ee8SDavid du Colombier.P1 73*219b2ee8SDavid du Colombier bucket_number = w-address % total_buckets 74*219b2ee8SDavid du Colombier getbuf(c-device, bucket_offset + bucket_number/BKPERBLK); 75*219b2ee8SDavid du Colombier.P2 76*219b2ee8SDavid du ColombierAfter the desired bucket is found, 77*219b2ee8SDavid du Colombierthe desired entry is found by a linear search within the bucket for the 78*219b2ee8SDavid du Colombierentry with the desired 79*219b2ee8SDavid du Colombier.CW waddr . 80*219b2ee8SDavid du Colombier.PP 81*219b2ee8SDavid du ColombierThe state variable in the entry is 82*219b2ee8SDavid du Colombierone of the following. 83*219b2ee8SDavid du Colombier.P1 84*219b2ee8SDavid du Colombier enum 85*219b2ee8SDavid du Colombier { 86*219b2ee8SDavid du Colombier Cnone = 0, 87*219b2ee8SDavid du Colombier Cdirty, 88*219b2ee8SDavid du Colombier Cdump, 89*219b2ee8SDavid du Colombier Cread, 90*219b2ee8SDavid du Colombier Cwrite, 91*219b2ee8SDavid du Colombier Cdump1, 92*219b2ee8SDavid du Colombier }; 93*219b2ee8SDavid du Colombier.P2 94*219b2ee8SDavid du ColombierEvery w-address has a state. 95*219b2ee8SDavid du ColombierBlocks that are not in the 96*219b2ee8SDavid du Colombierc-device have the implied 97*219b2ee8SDavid du Colombierstate 98*219b2ee8SDavid du Colombier.CW Cnone . 99*219b2ee8SDavid du ColombierThe 100*219b2ee8SDavid du Colombier.CW Cread 101*219b2ee8SDavid du Colombierstate is for blocks that have the 102*219b2ee8SDavid du Colombiersame data as the corresponding block in 103*219b2ee8SDavid du Colombierthe w-device. 104*219b2ee8SDavid du ColombierSince the c-device is much faster than the 105*219b2ee8SDavid du Colombierw-device, 106*219b2ee8SDavid du Colombier.CW Cread 107*219b2ee8SDavid du Colombierblocks are kept as long as possible and 108*219b2ee8SDavid du Colombierused in preference to reading the w-device. 109*219b2ee8SDavid du Colombier.CW Cread 110*219b2ee8SDavid du Colombierblocks may be discarded from the c-device 111*219b2ee8SDavid du Colombierwhen the space is needed for newer data. 112*219b2ee8SDavid du ColombierThe 113*219b2ee8SDavid du Colombier.CW Cwrite 114*219b2ee8SDavid du Colombierstate is when the c-device contains newer data 115*219b2ee8SDavid du Colombierthan the corresponding block on the w-device. 116*219b2ee8SDavid du ColombierThis happens when a 117*219b2ee8SDavid du Colombier.CW Cnone , 118*219b2ee8SDavid du Colombier.CW Cread , 119*219b2ee8SDavid du Colombieror 120*219b2ee8SDavid du Colombier.CW Cwrite 121*219b2ee8SDavid du Colombierblock is written. 122*219b2ee8SDavid du ColombierThe 123*219b2ee8SDavid du Colombier.CW Cdirty 124*219b2ee8SDavid du Colombierstate 125*219b2ee8SDavid du Colombieris when the c-device contains 126*219b2ee8SDavid du Colombiernew data and the corresponding block 127*219b2ee8SDavid du Colombieron the w-device has never been written. 128*219b2ee8SDavid du ColombierThis happens when a new block has been 129*219b2ee8SDavid du Colombierallocated from the free space on the w-device. 130*219b2ee8SDavid du Colombier.PP 131*219b2ee8SDavid du ColombierThe 132*219b2ee8SDavid du Colombier.CW Cwrite 133*219b2ee8SDavid du Colombierand 134*219b2ee8SDavid du Colombier.CW Cdirty 135*219b2ee8SDavid du Colombierblocks are created and never removed. 136*219b2ee8SDavid du ColombierUnless something is done to 137*219b2ee8SDavid du Colombierconvert these blocks, 138*219b2ee8SDavid du Colombierthe c-device will gradually 139*219b2ee8SDavid du Colombierfill up and stop functioning. 140*219b2ee8SDavid du ColombierOnce a day, 141*219b2ee8SDavid du Colombieror by command, 142*219b2ee8SDavid du Colombiera 143*219b2ee8SDavid du Colombier.I dump 144*219b2ee8SDavid du Colombierof the cw-device 145*219b2ee8SDavid du Colombieris taken. 146*219b2ee8SDavid du ColombierThe purpose of 147*219b2ee8SDavid du Colombiera dump is to queue the writes that 148*219b2ee8SDavid du Colombierhave been shunted to the c-device 149*219b2ee8SDavid du Colombierto be written to the w-device. 150*219b2ee8SDavid du ColombierSince the w-device is a WORM, 151*219b2ee8SDavid du Colombierblocks cannot be rewritten. 152*219b2ee8SDavid du ColombierBlocks that have already been written to the WORM must be 153*219b2ee8SDavid du Colombierrelocated to the unused portion of the w-device. 154*219b2ee8SDavid du ColombierThese are precisely the 155*219b2ee8SDavid du Colombierblocks with 156*219b2ee8SDavid du Colombier.CW Cwrite 157*219b2ee8SDavid du Colombierstate. 158*219b2ee8SDavid du Colombier.PP 159*219b2ee8SDavid du ColombierThe dump algorithm is as follows: 160*219b2ee8SDavid du Colombiera) The tree on the cw-device is walked 161*219b2ee8SDavid du Colombieras long as the blocks visited have been 162*219b2ee8SDavid du Colombiermodified since the last dump. 163*219b2ee8SDavid du ColombierThese are the blocks with state 164*219b2ee8SDavid du Colombier.CW Cwrite 165*219b2ee8SDavid du Colombierand 166*219b2ee8SDavid du Colombier.CW Cdirty . 167*219b2ee8SDavid du ColombierIt is possible to restrict the search 168*219b2ee8SDavid du Colombierto within these blocks 169*219b2ee8SDavid du Colombiersince the directory containing a modified 170*219b2ee8SDavid du Colombierfile must have been accessed to modify the 171*219b2ee8SDavid du Colombierfile and accessing a directory will set its 172*219b2ee8SDavid du Colombiermodified time thus causing the block containing it 173*219b2ee8SDavid du Colombierto be written. 174*219b2ee8SDavid du ColombierThe directory containing that directory must be 175*219b2ee8SDavid du Colombiermodified for the same reason. 176*219b2ee8SDavid du ColombierThe tree walk is thus drastically restrained and the 177*219b2ee8SDavid du Colombiertree walk does not take much time. 178*219b2ee8SDavid du Colombierb) All 179*219b2ee8SDavid du Colombier.CW Cwrite 180*219b2ee8SDavid du Colombierblocks found in the tree search 181*219b2ee8SDavid du Colombierare relocated to new blank blocks on the w-device 182*219b2ee8SDavid du Colombierand converted to 183*219b2ee8SDavid du Colombier.CW Cdump 184*219b2ee8SDavid du Colombierstate. 185*219b2ee8SDavid du ColombierAll 186*219b2ee8SDavid du Colombier.CW Cdirty 187*219b2ee8SDavid du Colombierblocks are converted to 188*219b2ee8SDavid du Colombier.CW Cdump 189*219b2ee8SDavid du Colombierstate without relocation. 190*219b2ee8SDavid du ColombierAt this point, 191*219b2ee8SDavid du Colombierall modified blocks in the cw-device 192*219b2ee8SDavid du Colombierhave w-addresses that point to unwritten 193*219b2ee8SDavid du ColombierWORM blocks. 194*219b2ee8SDavid du ColombierThese blocks are marked for later 195*219b2ee8SDavid du Colombierwriting to the w-device 196*219b2ee8SDavid du Colombierwith the state 197*219b2ee8SDavid du Colombier.CW Cdump . 198*219b2ee8SDavid du Colombierc) All open files that were pointing to modified 199*219b2ee8SDavid du Colombierblocks are reopened to point at the corresponding 200*219b2ee8SDavid du Colombierreallocated blocks. 201*219b2ee8SDavid du ColombierThis causes the directories leading to the 202*219b2ee8SDavid du Colombieropen files to be modified. 203*219b2ee8SDavid du ColombierThus the invariant discussed in a) is maintained. 204*219b2ee8SDavid du Colombierd) The background dumping process will slowly 205*219b2ee8SDavid du Colombiergo through the map of the c-device and write out 206*219b2ee8SDavid du Colombierall blocks with 207*219b2ee8SDavid du Colombier.CW Cdump 208*219b2ee8SDavid du Colombierstate. 209*219b2ee8SDavid du Colombier.PP 210*219b2ee8SDavid du ColombierThe dump takes a few minutes to walk the tree 211*219b2ee8SDavid du Colombierand mark the blocks. 212*219b2ee8SDavid du ColombierIt then takes hours to write the marked blocks 213*219b2ee8SDavid du Colombierto the WORM. 214*219b2ee8SDavid du ColombierIf a marked block is rewritten before the old 215*219b2ee8SDavid du Colombiercopy has been written to the WORM, 216*219b2ee8SDavid du Colombierit must be forced to the WORM before it is rewritten. 217*219b2ee8SDavid du ColombierThere is no problem if another dump is taken before the first one 218*219b2ee8SDavid du Colombieris finished. 219*219b2ee8SDavid du ColombierThe newly marked blocks are just added to the marked blocks 220*219b2ee8SDavid du Colombierleft from the first dump. 221*219b2ee8SDavid du Colombier.PP 222*219b2ee8SDavid du ColombierIf there is an error writing a marked block 223*219b2ee8SDavid du Colombierto the WORM 224*219b2ee8SDavid du Colombierthen the 225*219b2ee8SDavid du Colombier.CW dump 226*219b2ee8SDavid du Colombierstate is converted to 227*219b2ee8SDavid du Colombier.CW Cdump1 228*219b2ee8SDavid du Colombierand manual intervention is needed. 229*219b2ee8SDavid du Colombier(See the 230*219b2ee8SDavid du Colombier.CW cwcmd 231*219b2ee8SDavid du Colombier.CW mvstate 232*219b2ee8SDavid du Colombiercommand in 233*219b2ee8SDavid du Colombier.I fs (8)). 234*219b2ee8SDavid du ColombierThese blocks can be disposed of by converting 235*219b2ee8SDavid du Colombiertheir state back to 236*219b2ee8SDavid du Colombier.CW Cdump 237*219b2ee8SDavid du Colombierso that they will be written again. 238*219b2ee8SDavid du ColombierThey can also be converted to 239*219b2ee8SDavid du Colombier.CW Cwrite 240*219b2ee8SDavid du Colombierstate so that they will be allocated new 241*219b2ee8SDavid du Colombieraddresses at the next dump. 242*219b2ee8SDavid du ColombierIn most other respects, 243*219b2ee8SDavid du Colombiera 244*219b2ee8SDavid du Colombier.CW Cdump1 245*219b2ee8SDavid du Colombierblock behaves like a 246*219b2ee8SDavid du Colombier.CW Cwrite 247*219b2ee8SDavid du Colombierblock. 248