xref: /plan9/sys/src/cmd/venti/words/notes (revision 42d82305fbcd839fd6b81940f0fae5a107fcfc96)
1all data is big-endian on disk.
2
3arena layout:
4
5ArenaPart (first at offset PartBlank = 256kB in the disk file)
6	magic[4] 0xA9E4A5E7
7	version[4] 3
8	blockSize[4]
9	arenaBase[4] offset of first ArenaHead structure in the disk file
10
11the ArenaMap starts at the first block at offset >= PartBlank+512 bytes.
12it is a sequence of text lines
13/*
14 * amap: n '\n' amapelem * n
15 * n: u32int
16 * amapelem: name '\t' astart '\t' asize '\n'
17 * astart, asize: u64int
18 */
19
20the astart and astop are byte offsets in the disk file.
21they are the offsets to the ArenaHead and the end of the Arena block.
22
23ArenaHead
24[base points here in the C code]
25size bytes
26	Clumps
27	ClumpInfo blocks
28Arena
29
30Arena
31	magic[4] 0xF2A14EAD
32	version[4] 4
33	name[64]
34	clumps[4]
35	cclumps[4]
36	ctime[4]
37	wtime[4]
38	used[8]
39	uncsize[8]
40	sealed[1]
41	optional score[20]
42
43once sealed, the sha1 hash of every block from the
44ArenaHead to the Arena is checksummed, as though
45the final score in Arena were the zeroScore.  strangely,
46the tail of the Arena block (the last one) is not included in the checksum
47(i.e., the unused data after the score).
48
49clumpMax = blocksize/ClumpInfoSize = blocksize/25
50dirsize = ((clumps/clumpMax)+1) * blocksize
51want used+dirsize <= size
52want cclumps <= clumps
53want uncsize+clumps*ClumpSize+blocksize < used
54want ctime <= wtime
55
56clump info is stored packed into blocks in order.
57clump info moves forward through a block but the
58blocks themselves move backwards.  so if cm=clumpMax
59and there are two blocks worth of clumpinfo, the blocks
60look like;
61
62	[cm..2*cm-1] [0..cm-1] [Arena]
63
64with the blocks pushed right up against the Arena trailer.
65
66ArenaHead
67	magic[4] 0xD15C4EAD
68	version[4] = Arena.version
69	name[64]
70	blockSize[4]
71	size[8]
72
73Clump
74	magic[4] 0xD15CB10C (0 for an unused clump)
75	type[1]
76	size[2]
77	uncsize[2]
78	score[20]
79	encoding[1] raw=1, compress=2
80	creator[4]
81	time[4]
82
83ClumpInfo
84	type[1]
85	size[2]
86	uncsize[2]
87	score[20]
88
89the arenas are mapped into a single address space corresponding
90to the index that brings them together.  if each arena has 100M bytes
91excluding the headers and there are 4 arenas, then there's 400M of
92index address space between them.  index address space starts at 1M
93instead of 0, so the index addresses assigned to the first arena are
941M up to 101M, then 101M to 201M, etc.
95
96of course, the assignment of addresses has nothing to do with the index,
97but that's what they're called.
98
99
100the index is split into index sections, which are put on different disks
101to get parallelism of disk heads.  each index section holds some number
102of hash buckets, each in its own disk block.  collectively the index sections
103hold ix->buckets between them.
104
105the top 32-bits of the score is used to assign scores to buckets.
106div = ceil(2³² / ix->buckets) is the amount of 32-bit score space per bucket.
107
108to look up a block, take the top 32 bits of score and divide by div
109to get the bucket number.  then look through the index section headers
110to figure out which index section has that bucket.
111
112then load that block from the index section.  it's an IBucket.
113
114the IBucket has ib.n IEntry structures in it, sorted by score and then by type.
115do the lookup and get an IEntry.  the ia.addr will be a logical address
116that you then use to get the
117
118ISect
119	magic[4] 0xD15C5EC7
120	version[4]
121	name[64]
122	index[64]
123	blockSize[4]
124	blockBase[4]	address in partition where bucket blocks start
125	blocks[4]
126	start[4]
127	stop[4]	stop - start <= blocks, but not necessarily ==
128
129IEntry
130	score[20]
131	wtime[4]
132	train[2]
133	ia.addr[8]		index address (see note above)
134	ia.size[2]		size of uncompressed block data
135	ia.type[1]
136	ia.blocks[1]	number of blocks of clump on disk
137
138IBucket
139	n[2]
140	next[4]	not sure; either 0 or inside [start,stop) for the ISect
141	data[n*IEntrySize]
142
143final piece: all the disk partitions start with PartBlank=256kB of unused disk
144(presumably to avoid problems with boot sectors and layout tables
145and the like).
146
147actually the last 8k of the 256k (that is, at offset 248kB) can hold
148a venti config file to help during bootstrap of the venti file server.
149
150