1*63373Sbostic# @(#)README 8.1 (Berkeley) 06/11/93 251844Sbostic 363372SbosticThe file system is reasonably stable, but incomplete. There are 463372Sbosticplaces where cleaning performance can be improved dramatically (see 563372Sbosticcomments in lfs_syscalls.c). For details on the implementation, 663372Sbosticperformance and why garbage collection always wins, see Dr. Margo 763372SbosticSeltzer's thesis available for anonymous ftp from toe.cs.berkeley.edu, 863372Sbosticin the directory pub/personal/margo/thesis.ps.Z, or the January 1993 963372SbosticUSENIX paper. 1055802Sbostic 1155802SbosticMissing Functionality: 1263372Sbostic Multiple block sizes and/or fragments are not yet implemented. 1355802Sbostic 1455802Sbostic---------- 1551849SbosticThe disk is laid out in segments. The first segment starts 8K into the 1651849Sbosticdisk (the first 8K is used for boot information). Each segment is composed 1751849Sbosticof the following: 1851844Sbostic 1951849Sbostic An optional super block 2051849Sbostic One or more groups of: 2151849Sbostic segment summary 2251849Sbostic 0 or more data blocks 2351849Sbostic 0 or more inode blocks 2451844Sbostic 2551849SbosticThe segment summary and inode/data blocks start after the super block (if 2651849Sbosticpresent), and grow toward the end of the segment. 2751844Sbostic 2851849Sbostic _______________________________________________ 2951849Sbostic | | | | | 3051849Sbostic | summary | data/inode | summary | data/inode | 3151849Sbostic | block | blocks | block | blocks | ... 3251849Sbostic |_________|____________|_________|____________| 3351844Sbostic 3451849SbosticThe data/inode blocks following a summary block are described by the 3551849Sbosticsummary block. In order to permit the segment to be written in any order 3651849Sbosticand in a forward direction only, a checksum is calculated across the 3751849Sbosticblocks described by the summary. Additionally, the summary is checksummed 3851849Sbosticand timestamped. Both of these are intended for recovery; the former is 3951849Sbosticto make it easy to determine that it *is* a summary block and the latter 4051849Sbosticis to make it easy to determine when recovery is finished for partially 4155802Sbosticwritten segments. These checksums are also used by the cleaner. 4251844Sbostic 4351849Sbostic Summary block (detail) 4451849Sbostic ________________ 4551849Sbostic | sum cksum | 4651849Sbostic | data cksum | 4751849Sbostic | next segment | 4851849Sbostic | timestamp | 4951849Sbostic | FINFO count | 5051849Sbostic | inode count | 5155802Sbostic | flags | 5251849Sbostic |______________| 5351849Sbostic | FINFO-1 | 0 or more file info structures, identifying the 5451849Sbostic | . | blocks in the segment. 5551849Sbostic | . | 5651849Sbostic | . | 5751849Sbostic | FINFO-N | 5851849Sbostic | inode-N | 5951849Sbostic | . | 6051849Sbostic | . | 6151849Sbostic | . | 0 or more inode daddr_t's, identifying the inode 6251849Sbostic | inode-1 | blocks in the segment. 6351849Sbostic |______________| 6451844Sbostic 6551849SbosticInode blocks are blocks of on-disk inodes in the same format as those in 6655802Sbosticthe FFS. However, spare[0] contains the inode number of the inode so we 6755802Sbosticcan find a particular inode on a page. They are packed page_size / 6855802Sbosticsizeof(inode) to a block. Data blocks are exactly as in the FFS. Both 6955802Sbosticinodes and data blocks move around the file system at will. 7051844Sbostic 7151849SbosticThe file system is described by a super-block which is replicated and 7251849Sbosticoccurs as the first block of the first and other segments. (The maximum 7351849Sbosticnumber of super-blocks is MAXNUMSB). Each super-block maintains a list 7451849Sbosticof the disk addresses of all the super-blocks. The super-block maintains 7551849Sbostica small amount of checkpoint information, essentially just enough to find 7655802Sbosticthe inode for the IFILE (fs->lfs_idaddr). 7751844Sbostic 7851849SbosticThe IFILE is visible in the file system, as inode number IFILE_INUM. It 7951849Sbosticcontains information shared between the kernel and various user processes. 8051844Sbostic 8151849Sbostic Ifile (detail) 8251849Sbostic ________________ 8351849Sbostic | cleaner info | Cleaner information per file system. (Page 8451849Sbostic | | granularity.) 8551849Sbostic |______________| 8651849Sbostic | segment | Space available and last modified times per 8751849Sbostic | usage table | segment. (Page granularity.) 8851849Sbostic |______________| 8951849Sbostic | IFILE-1 | Per inode status information: current version #, 9051849Sbostic | . | if currently allocated, last access time and 9151849Sbostic | . | current disk address of containing inode block. 9251849Sbostic | . | If current disk address is LFS_UNUSED_DADDR, the 9351849Sbostic | IFILE-N | inode is not in use, and it's on the free list. 9451849Sbostic |______________| 9551844Sbostic 9651844Sbostic 9751849SbosticFirst Segment at Creation Time: 9851849Sbostic_____________________________________________________________ 9951849Sbostic| | | | | | | | 10051849Sbostic| 8K pad | Super | summary | inode | ifile | root | l + f | 10151849Sbostic| | block | | block | | dir | dir | 10251849Sbostic|________|_______|_________|_______|_______|_______|_______| 10351849Sbostic ^ 10451849Sbostic Segment starts here. 10551849Sbostic 10651844SbosticSome differences from the Sprite LFS implementation. 10751844Sbostic 10851844Sbostic1. The LFS implementation placed the ifile metadata and the super block 10951844Sbostic at fixed locations. This implementation replicates the super block 11051844Sbostic and puts each at a fixed location. The checkpoint data is divided into 11151844Sbostic two parts -- just enough information to find the IFILE is stored in 11251849Sbostic two of the super blocks, although it is not toggled between them as in 11351849Sbostic the Sprite implementation. (This was deliberate, to avoid a single 11451849Sbostic point of failure.) The remaining checkpoint information is treated as 11551849Sbostic a regular file, which means that the cleaner info, the segment usage 11651849Sbostic table and the ifile meta-data are stored in normal log segments. 11751849Sbostic (Tastes great, less filling...) 11851844Sbostic 11951849Sbostic2. The segment layout is radically different in Sprite; this implementation 12051849Sbostic uses something a lot like network framing, where data/inode blocks are 12151849Sbostic written asynchronously, and a checksum is used to validate any set of 12251849Sbostic summary and data/inode blocks. Sprite writes summary blocks synchronously 12351849Sbostic after the data/inode blocks have been written and the existence of the 12451849Sbostic summary block validates the data/inode blocks. This permits us to write 12551849Sbostic everything contiguously, even partial segments and their summaries, whereas 12651849Sbostic Sprite is forced to seek (from the end of the data inode to the summary 12751849Sbostic which lives at the end of the segment). Additionally, writing the summary 12851849Sbostic synchronously should cost about 1/2 a rotation per summary. 12951844Sbostic 13051849Sbostic3. Sprite LFS distinguishes between different types of blocks in the segment. 13151849Sbostic Other than inode blocks and data blocks, we don't. 13251844Sbostic 13351849Sbostic4. Sprite LFS traverses the IFILE looking for free blocks. We maintain a 13451849Sbostic free list threaded through the IFILE entries. 13551849Sbostic 13651849Sbostic5. The cleaner runs in user space, as opposed to kernel space. It shares 13751849Sbostic information with the kernel by reading/writing the IFILE and through 13851849Sbostic cleaner specific system calls. 13951849Sbostic 140