xref: /csrg-svn/sys/ufs/lfs/README (revision 63373)
1*63373Sbostic#	@(#)README	8.1 (Berkeley) 06/11/93
251844Sbostic
363372SbosticThe file system is reasonably stable, but incomplete.  There are
463372Sbosticplaces where cleaning performance can be improved dramatically (see
563372Sbosticcomments in lfs_syscalls.c).  For details on the implementation,
663372Sbosticperformance and why garbage collection always wins, see Dr. Margo
763372SbosticSeltzer's thesis available for anonymous ftp from toe.cs.berkeley.edu,
863372Sbosticin the directory pub/personal/margo/thesis.ps.Z, or the January 1993
963372SbosticUSENIX paper.
1055802Sbostic
1155802SbosticMissing Functionality:
1263372Sbostic	Multiple block sizes and/or fragments are not yet implemented.
1355802Sbostic
1455802Sbostic----------
1551849SbosticThe disk is laid out in segments.  The first segment starts 8K into the
1651849Sbosticdisk (the first 8K is used for boot information).  Each segment is composed
1751849Sbosticof the following:
1851844Sbostic
1951849Sbostic	An optional super block
2051849Sbostic	One or more groups of:
2151849Sbostic		segment summary
2251849Sbostic		0 or more data blocks
2351849Sbostic		0 or more inode blocks
2451844Sbostic
2551849SbosticThe segment summary and inode/data blocks start after the super block (if
2651849Sbosticpresent), and grow toward the end of the segment.
2751844Sbostic
2851849Sbostic	_______________________________________________
2951849Sbostic	|         |            |         |            |
3051849Sbostic	| summary | data/inode | summary | data/inode |
3151849Sbostic	|  block  |   blocks   |  block  |   blocks   | ...
3251849Sbostic	|_________|____________|_________|____________|
3351844Sbostic
3451849SbosticThe data/inode blocks following a summary block are described by the
3551849Sbosticsummary block.  In order to permit the segment to be written in any order
3651849Sbosticand in a forward direction only, a checksum is calculated across the
3751849Sbosticblocks described by the summary.  Additionally, the summary is checksummed
3851849Sbosticand timestamped.  Both of these are intended for recovery; the former is
3951849Sbosticto make it easy to determine that it *is* a summary block and the latter
4051849Sbosticis to make it easy to determine when recovery is finished for partially
4155802Sbosticwritten segments.  These checksums are also used by the cleaner.
4251844Sbostic
4351849Sbostic	Summary block (detail)
4451849Sbostic	________________
4551849Sbostic	| sum cksum    |
4651849Sbostic	| data cksum   |
4751849Sbostic	| next segment |
4851849Sbostic	| timestamp    |
4951849Sbostic	| FINFO count  |
5051849Sbostic	| inode count  |
5155802Sbostic	| flags        |
5251849Sbostic	|______________|
5351849Sbostic	|   FINFO-1    | 0 or more file info structures, identifying the
5451849Sbostic	|     .        | blocks in the segment.
5551849Sbostic	|     .        |
5651849Sbostic	|     .        |
5751849Sbostic	|   FINFO-N    |
5851849Sbostic	|   inode-N    |
5951849Sbostic	|     .        |
6051849Sbostic	|     .        |
6151849Sbostic	|     .        | 0 or more inode daddr_t's, identifying the inode
6251849Sbostic	|   inode-1    | blocks in the segment.
6351849Sbostic	|______________|
6451844Sbostic
6551849SbosticInode blocks are blocks of on-disk inodes in the same format as those in
6655802Sbosticthe FFS.  However, spare[0] contains the inode number of the inode so we
6755802Sbosticcan find a particular inode on a page.  They are packed page_size /
6855802Sbosticsizeof(inode) to a block.  Data blocks are exactly as in the FFS.  Both
6955802Sbosticinodes and data blocks move around the file system at will.
7051844Sbostic
7151849SbosticThe file system is described by a super-block which is replicated and
7251849Sbosticoccurs as the first block of the first and other segments.  (The maximum
7351849Sbosticnumber of super-blocks is MAXNUMSB).  Each super-block maintains a list
7451849Sbosticof the disk addresses of all the super-blocks.  The super-block maintains
7551849Sbostica small amount of checkpoint information, essentially just enough to find
7655802Sbosticthe inode for the IFILE (fs->lfs_idaddr).
7751844Sbostic
7851849SbosticThe IFILE is visible in the file system, as inode number IFILE_INUM.  It
7951849Sbosticcontains information shared between the kernel and various user processes.
8051844Sbostic
8151849Sbostic	Ifile (detail)
8251849Sbostic	________________
8351849Sbostic	| cleaner info | Cleaner information per file system.  (Page
8451849Sbostic	|              | granularity.)
8551849Sbostic	|______________|
8651849Sbostic	| segment      | Space available and last modified times per
8751849Sbostic	| usage table  | segment.  (Page granularity.)
8851849Sbostic	|______________|
8951849Sbostic	|   IFILE-1    | Per inode status information: current version #,
9051849Sbostic	|     .        | if currently allocated, last access time and
9151849Sbostic	|     .        | current disk address of containing inode block.
9251849Sbostic	|     .        | If current disk address is LFS_UNUSED_DADDR, the
9351849Sbostic	|   IFILE-N    | inode is not in use, and it's on the free list.
9451849Sbostic	|______________|
9551844Sbostic
9651844Sbostic
9751849SbosticFirst Segment at Creation Time:
9851849Sbostic_____________________________________________________________
9951849Sbostic|        |       |         |       |       |       |       |
10051849Sbostic| 8K pad | Super | summary | inode | ifile | root  | l + f |
10151849Sbostic|        | block |         | block |       | dir   | dir   |
10251849Sbostic|________|_______|_________|_______|_______|_______|_______|
10351849Sbostic	  ^
10451849Sbostic           Segment starts here.
10551849Sbostic
10651844SbosticSome differences from the Sprite LFS implementation.
10751844Sbostic
10851844Sbostic1. The LFS implementation placed the ifile metadata and the super block
10951844Sbostic   at fixed locations.  This implementation replicates the super block
11051844Sbostic   and puts each at a fixed location.  The checkpoint data is divided into
11151844Sbostic   two parts -- just enough information to find the IFILE is stored in
11251849Sbostic   two of the super blocks, although it is not toggled between them as in
11351849Sbostic   the Sprite implementation.  (This was deliberate, to avoid a single
11451849Sbostic   point of failure.)  The remaining checkpoint information is treated as
11551849Sbostic   a regular file, which means that the cleaner info, the segment usage
11651849Sbostic   table and the ifile meta-data are stored in normal log segments.
11751849Sbostic   (Tastes great, less filling...)
11851844Sbostic
11951849Sbostic2. The segment layout is radically different in Sprite; this implementation
12051849Sbostic   uses something a lot like network framing, where data/inode blocks are
12151849Sbostic   written asynchronously, and a checksum is used to validate any set of
12251849Sbostic   summary and data/inode blocks.  Sprite writes summary blocks synchronously
12351849Sbostic   after the data/inode blocks have been written and the existence of the
12451849Sbostic   summary block validates the data/inode blocks.  This permits us to write
12551849Sbostic   everything contiguously, even partial segments and their summaries, whereas
12651849Sbostic   Sprite is forced to seek (from the end of the data inode to the summary
12751849Sbostic   which lives at the end of the segment).  Additionally, writing the summary
12851849Sbostic   synchronously should cost about 1/2 a rotation per summary.
12951844Sbostic
13051849Sbostic3. Sprite LFS distinguishes between different types of blocks in the segment.
13151849Sbostic   Other than inode blocks and data blocks, we don't.
13251844Sbostic
13351849Sbostic4. Sprite LFS traverses the IFILE looking for free blocks.  We maintain a
13451849Sbostic   free list threaded through the IFILE entries.
13551849Sbostic
13651849Sbostic5. The cleaner runs in user space, as opposed to kernel space.  It shares
13751849Sbostic   information with the kernel by reading/writing the IFILE and through
13851849Sbostic   cleaner specific system calls.
13951849Sbostic
140