#
73896937 |
| 12-Jan-2010 |
Matthew Dillon <dillon@apollo.backplane.com> |
HAMMER VFS - REDO implementation base code part 3/many
* Track the oldest potentially uncommitted UNDO/REDO FIFO offset on an inode-by-inode basis and use a red-black tree to find the aggregate oldest offset.
* If REDOs are present generate a REDO SYNC entry in the UNDO/REDO FIFO within the recovery span which indicates to the recovery code how far out of the span it must go to process REDOs.
* Fix a bug in hammer_generate_redo() where the REDO would not be generated if the data length was 0 (SYNC records use a data length of 0 as a degenerate case).
* Print the REDO SYNC entries on the console if bit 2 is set in vfs.hammer.debug_io (0x04).
* NOTE: The recovery code does not yet process REDOs.
|
#
6048b411 |
| 10-Jan-2010 |
Matthew Dillon <dillon@apollo.backplane.com> |
HAMMER VFS - REDO/fsync precursor work
* Adjust hammer_fifo_redo structure (not yet used), add a mtime field so the mtime can be restored from the REDO records.
* Move the undo buffer flush code into its own procedure, hammer_flusher_flush_undos().
* Implement hammer_generate_redo() to generate file write operation REDOs.
* Implement sysctl statistics and limits for REDO: vfs.hammer.limit_redo and vfs.hammer.stats_redo.
|
#
d978e7cf |
| 05-Jan-2010 |
Matthew Dillon <dillon@apollo.backplane.com> |
HAMMER VFS - Fix volume ref count leak in fhtovp code.
* The transaction is left dangling open if the inode could not be found in a fhtovp operation, leaking refs on the root volume. Fix by properly closing the transaction.
Reported-by: Jan Lentfer <Jan.Lentfer@web.de>
|
#
2247fe02 |
| 28-Dec-2009 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - fine-grained namecache and partial vnode MPSAFE work
Namecache subsystem
* All vnode->v_flag modifications now use vsetflags() and vclrflags(). Because some flags are set and cleared by vhold()/vdrop() which do not require any locks to be held, all modifications must use atomic ops.
* Clean up and revamp the namecache MPSAFE work. Namecache operations now use a fine-grained MPSAFE locking model which loosely follows these rules:
- lock ordering is child to parent. e.g. lock file, then lock parent directory. This allows resolver recursions up the parent directory chain.
- Downward-traversing namecache invalidations and path lookups will unlock the parent (but leave it referenced) before attempting to lock the child.
- Namecache hash table lookups utilize a per-bucket spinlock.
- vnode locks may be acquired while holding namecache locks but not vice versa. Vnodes are not destroyed until all namecache references go away, but can enter reclamation. Namecache lookups detect the case and re-resolve to overcome the race. Namecache entries are not destroyed while referenced.
* Remove vfs_token, the namecache MPSAFE model is now totally fine-grained.
* Revamp namecache locking primitives (cache_lock/cache_unlock and friends). Use atomic ops and nc_exlocks instead of nc_locktd and build in a request flag. This solves busy/tsleep races between lock holder and lock requester.
* Revamp namecache parent/child linkages. Instead of using vfs_token to lock such operations we simply lock both child and parent namecache entries. Hash table operations are also fully integrated with the parent/child linking operations.
* The vnode->v_namecache list is locked via vnode->v_spinlock, which is actually vnode->v_lock.lk_spinlock.
* Revamp cache_vref() and cache_vget(). The passed namecache entry must be referenced and locked. Internals are simplified.
* Fix a deadlock by moving the call to _cache_hysteresis() to a place where the current thread otherwise does not hold any locked ncp's.
* Revamp nlookup() to follow the new namecache locking rules.
* Fix a number of places, e.g. in vfs/nfs/nfs_subs.c, where ncp->nc_parent or ncp->nc_vp was being accessed with an unlocked ncp. nc_parent and nc_vp accesses are only valid if the ncp is locked.
* Add the vfs.cache_mpsafe sysctl, which defaults to 0. This may be set to 1 to enable MPSAFE namecache operations for [l,f]stat() and open() system calls (for the moment).
VFS/VNODE subsystem
* Use a global spinlock for now called vfs_spin to manage vnode_free_list. Use vnode->v_spinlock (and vfs_spin) to manage vhold/vdrop ops and to interlock v_auxrefs tests against vnode terminations.
* Integrate per-mount mnt_token and (for now) the MP lock into VOP_*() and VFS_*() operations. This allows the MP lock to be shifted further inward from the system calls, but we don't do it quite yet.
* HAMMER: VOP_GETATTR, VOP_READ, and VOP_INACTIVE are now MPSAFE. The corresponding sysctls have been removed.
* FIFOFS: Needed some MPSAFE work in order to allow HAMMER to make things MPSAFE above, since HAMMER forwards vops for in-filesystem fifos to fifofs.
* Add some debugging kprintf()s when certain MP races are averted, for testing only.
MISC
* Add some assertions to the VM system.
* Document existing and newly MPSAFE code.
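The atomic flag updates described in the Namecache subsystem notes can be sketched roughly as follows. This is a minimal userland illustration using C11 atomics, not the kernel code: struct vnode_sketch, the VS_FLAG_* values, and the sketch_* function names are hypothetical stand-ins for struct vnode, its flag bits, and vsetflags()/vclrflags().

```c
#include <stdatomic.h>

/* Hypothetical stand-in for struct vnode; only the flag word is modelled. */
struct vnode_sketch {
    atomic_uint v_flag;
};

/* Hypothetical flag bits, for illustration only. */
#define VS_FLAG_A 0x0001u
#define VS_FLAG_B 0x0002u

/* Set bits with an atomic read-modify-write so that a concurrent
 * update of other bits by a lock-free caller is never lost. */
void sketch_vsetflags(struct vnode_sketch *vp, unsigned int flags)
{
    atomic_fetch_or(&vp->v_flag, flags);
}

/* Clear bits atomically for the same reason. */
void sketch_vclrflags(struct vnode_sketch *vp, unsigned int flags)
{
    atomic_fetch_and(&vp->v_flag, ~flags);
}
```

A plain `vp->v_flag |= flags` is a separate load and store, so two unlocked callers could overwrite each other's bits; the atomic forms avoid that without taking a lock.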
|
#
52e547e3 |
| 27-Dec-2009 |
Michael Neumann <mneumann@ntecs.de> |
hammer volume - Serialize volume operations
Only one hammer volume-add or hammer volume-del operation is allowed at the same time per mount.
While for volume-add operations it is not strictly needed, it is absolutely required for the reblocking phase of the volume-del operation.
|
#
aac0aabd |
| 27-Dec-2009 |
Matthew Dillon <dillon@apollo.backplane.com> |
kernel - Add per-mount token to replace mplock.
* Fix issues with dupfdopen() not being MPSAFE.
* Implement a dummy mount structure for devfs-synthesized vnodes prior to the root mount.
* Wrap all VFS_*() calls, including vfs_init() and vfs_uninit(), to acquire the per-mount token if not flagged as being MPSAFE.
* Wrap all VOP_*() calls to acquire the per-mount token if not flagged as being MPSAFE.
* Move VOP_READ/VOP_WRITE MPSAFE flags to the mount structure.
* Make fifoops MPSAFE (so HAMMER can flag read & write as being MPSAFE generally).
* The VFS code currently also acquires the MP lock when not MPSAFE (there are things called by VFSes which are not yet MPSAFE), except for read() and write().
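The wrapping of VOP_*() calls described above might look roughly like the sketch below. It only illustrates the control flow: token_sketch, mount_sketch, vop_call_wrapped(), and vop_probe() are hypothetical names, and the token is modelled as a simple flag with a counter rather than a real kernel token.

```c
/* Hypothetical token: a flag plus a counter, standing in for the
 * per-mount token. */
struct token_sketch {
    int held;
    int acquisitions;
};

/* Hypothetical mount structure carrying the token and an MPSAFE flag. */
struct mount_sketch {
    struct token_sketch mnt_token;
    int mpsafe;
};

typedef int (*vop_fn)(struct mount_sketch *mp, void *arg);

static void tok_get(struct token_sketch *t) { t->held = 1; t->acquisitions++; }
static void tok_rel(struct token_sketch *t) { t->held = 0; }

/* Run the operation directly if the mount is flagged MPSAFE,
 * otherwise hold the per-mount token around the call. */
int vop_call_wrapped(struct mount_sketch *mp, vop_fn op, void *arg)
{
    int error;

    if (mp->mpsafe)
        return op(mp, arg);
    tok_get(&mp->mnt_token);
    error = op(mp, arg);
    tok_rel(&mp->mnt_token);
    return error;
}

/* Example operation: reports whether the token is held during the call. */
int vop_probe(struct mount_sketch *mp, void *arg)
{
    (void)arg;
    return mp->mnt_token.held;
}
```

Calling vop_call_wrapped() with vop_probe on a mount whose mpsafe field is 0 reports the token as held during the call; with mpsafe set to 1 the token is never touched.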
|
#
a407819f |
| 16-Dec-2009 |
Michael Neumann <mneumann@ntecs.de> |
Unbreak HAMMER root mounts
Absolute device names in vfs.root.mountfrom were handled incorrectly.
Reported-by: aggelos
|
#
865c9609 |
| 11-Dec-2009 |
Michael Neumann <mneumann@ntecs.de> |
HAMMER - Implement experimental volume removal
A volume other than the root volume can be removed with:
hammer volume-del device filesystem
WARNING: Experimental!
|
#
104cb849 |
| 07-Dec-2009 |
Michael Neumann <mneumann@ntecs.de> |
HAMMER - Implement multi-volume root mounts
|
#
c0763659 |
| 03-Dec-2009 |
Matthew Dillon <dillon@apollo.backplane.com> |
HAMMER VFS - Fix bug in statvfs()
* statvfs() had a coding error which caused it to return 0 free space until the first statfs() call was made.
Reported-by: Johannes Hofmann <johannes.hofmann@gmx.de>
|
#
88c39f64 |
| 11-Nov-2009 |
Thomas Nikolajsen <thomas@dragonflybsd.org> |
hammer: hammer_recover_stage2() may only be called for read-write mounts
This fixes a panic when doing a read-only mount.
|
#
fc73edd8 |
| 02-Nov-2009 |
Matthew Dillon <dillon@apollo.backplane.com> |
HAMMER VFS - Add sysctl vfs.hammer.debug_critical
* Add sysctl vfs.hammer.debug_critical. If set to non-zero a critical CRC or media error will cause the kernel debugger to be entered.
|
#
ff003b11 |
| 02-Nov-2009 |
Matthew Dillon <dillon@apollo.backplane.com> |
HAMMER VFS - The backend flusher now sorts inodes
* Change the tailq of inodes in a flush group to a red-black tree. The flusher now processes inodes in sorted order and breaks them up into larger sets for concurrent flushing. The flusher threads are thus more likely to concurrently process inodes which are fairly far apart in the B-Tree.
This greatly reduces lock interference between flusher threads. However, B-Tree deadlocks are still an issue between inodes undergoing flushes and front-end access operations. This can be observed by noting periods of low dev-write activity in 'hammer iostats 1' output during a blogbench test. The hammer-S* kernel threads will likely be in a 'hmrdlk' state at the same time.
* Add sysctl vfs.hammer.limit_reclaim to set the maximum number of inodes with no vnode associations, default 4000.
NOTE: For debugging only, setting this value too high will blow out the kmalloc pool.
|
#
02428fb6 |
| 02-Nov-2009 |
Matthew Dillon <dillon@apollo.backplane.com> |
HAMMER VFS - Version 4 part 1/many - UNDO FIFO layout work.
These changes only apply to HAMMER version 4+ filesystems. HAMMER versions less than 4 only implement some of these changes and do not use the new features during crash recovery.
* Add a sequence number to the UNDO FIFO media record format. The field already existed for just this purpose so no media structures changed size.
* Change the alignment boundary for HAMMER UNDO records from 16K to 512 bytes. This coupled with the sequence number virtually guarantees that the recovery code can detect uninterrupted sequences of UNDO records without having to rely on the FIFO last_offset field in the volume header.
This isn't as bad as it sounds. It just means that large UNDO blocks are broken up into smaller on-media structures in order to ensure a record header occurs on every 512 byte boundary.
* Add HAMMER_HEAD_TYPE_DUMMY and HAMMER_HEAD_TYPE_REDO (Redo is not yet used). The DUMMY type is a dummy record used solely to identify a sequence number. PAD records cannot have sequence numbers so we need a DUMMY record for it.
Remove unused UNDO FIFO record types.
* Adjust the version upgrade code to completely reinitialize the UNDO FIFO space when moving from version < 4 to version >= 4. This puts all blocks in the UNDO FIFO in a deterministic state with deterministic sequence numbers on 512 byte boundaries.
* Refactor the flush code. In versions less than 4 the flush code had to flush dirty UNDO buffers, synchronize disk, then flush the volume header and synchronize disk again, then flush the meta data. For HAMMER versions >= 4 the flush code removes the second disk synchronization operation.
* Refactor the crash recovery code. For versions < 4 the crash recovery code relied on the UNDO FIFO first_offset and next_offset indexes in the volume header to calculate the UNDO space that needed to be run. For versions >= 4 the crash recovery code uses first_offset for the beginning of the UNDO space and proactively scans the UNDO FIFO to find the end of the space. This takes longer but allows HAMMER to remove one of the two disk sync operations in the flush code.
* Split the crash recovery code into stage 1 and stage 2. Stage 2 will be used to run REDO operations (REDO is not yet implemented).
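The idea of scanning the FIFO for its end can be illustrated with a small sketch. It assumes, purely for illustration, that each 512-byte block carries one sequence number and that consecutive records differ by exactly 1; the real on-media layout and increment are not given in this log. The function name fifo_scan_end() and the array representation are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * seq[i] is the (assumed) sequence number found at the start of the
 * i-th 512-byte block of a circular FIFO with nblocks blocks.
 * Starting at block 'first', follow the run of consecutive sequence
 * numbers, wrapping around the end of the FIFO, and return the index
 * of the last block that continues the run.
 */
size_t fifo_scan_end(const uint32_t *seq, size_t nblocks, size_t first)
{
    size_t cur = first;
    size_t visited = 1;          /* 'first' itself counts as visited */

    while (visited < nblocks) {
        size_t next = (cur + 1) % nblocks;

        /* A break in the sequence marks the end of the valid span. */
        if (seq[next] != (uint32_t)(seq[cur] + 1u))
            break;
        cur = next;
        visited++;
    }
    return cur;
}
```

Because the end is discovered from the records themselves, the scan does not depend on an end offset having been written to the volume header, which is the property that lets the flush code drop one disk sync.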
|
#
83f2a3aa |
| 14-Oct-2009 |
Matthew Dillon <dillon@apollo.backplane.com> |
HAMMER - Add version 3 meta-data features
* These features are available for filesystem version 3. Version 2 may be upgraded to version 3 in-place. These features are not usable until you upgrade.
* Definitively store snapshots in filesystem meta-data. Softlinks still work. The new snapshot directives (snap, snaplo, snapq, etc) also allow you to specify up to a 64-character note for each snapshot you create. The snapls directive may be used to list all snapshots stored in meta-data.
'hammer cleanup' will move all softlink-based snapshots residing in the <fs>/snapshots directory to meta-data when it next snapshots the filesystem (within a day of upgrading, usually). The snapshot softlinks are left intact.
Storing snapshot information in meta-data means that accidental wipes of your <fs>/snapshots directory will NOT cause later hammer cleanup runs to destroy your snapshots! The meta-data snapshots are also removed if you do a prune-everything, or through normal pruning expirations, and thus 'hammer snapls' will definitively list your valid snapshots.
This feature also means that you can obtain a definitive list of snapshots available on mirroring slaves.
* Definitively store the hammer cleanup configuration file in filesystem meta-data. This meta-data is not mirrored. 'hammer cleanup' will move <fs>/snapshots/config to the new meta-data config and delete <fs>/snapshots/config after you've upgraded the filesystem. You can edit the configuration with the 'viconfig' directive.
* The HAMMER utility has new directives: snap, snaplo, snapq, snaprm, snapls, config, and viconfig.
* WARNING! Filesystems mounted 'nohistory' and files chflagged similarly do not have snapshots, but the hammer utility still allows the directives to be run. This is a bug that needs to be fixed.
|
#
89e744ce |
| 03-Sep-2009 |
Matthew Dillon <dillon@apollo.backplane.com> |
HAMMER - Add vfs.hammer.stats_undo
* Statistics on number of bytes of undo space written.
|
#
c9ce54d6 |
| 03-Sep-2009 |
Matthew Dillon <dillon@apollo.backplane.com> |
HAMMER - Fix lost inode issue (primarily with nohistory mounts)
* When a HAMMER cursor is unlocked it becomes tracked and unrelated B-Tree operations will cause the tracked cursor's nodes and indices to be updated. The cursor structure also has a leaf element pointer which was not being properly updated. This could lead to panics and lost inodes.
Properly adjust the leaf element pointer in tracked cursors.
* The bug primarily occurs with nohistory mounts or nohistory sub-trees due to the larger number of physical deletions made to the B-Tree, but could also occur (rarely) with normal mounts.
* Add additional assertions to catch any further occurrences (though I think all the cases have been covered now).
* Add a new sysctl vfs.hammer.error_panic which can be set to e.g. 9 to cause critical errors to panic immediately instead of returning through the call stack, making debugging possible.
Reported-by: Numerous people
|
#
6f3d87c0 |
| 24-Aug-2009 |
Matthew Dillon <dillon@apollo.backplane.com> |
fsync - Add sysctl to relax fsync requirements.
* sysctl vfs.hammer.fsync_mode (defaults to 0 == full fsync semantics).
0 - full fsync semantics
1 - asynchronous
2 - synchronous fsync on close if fsync called prior to close
3 - asynchronous fsync on close if fsync called prior to close
4 - ignore fsync (30-second system sync takes care of it)
* This is likely a temporary measure until HAMMER gets a REDO log. It is mainly to facilitate testing and to reduce the pounding disks take from pkgsrc bulk builds (pkg_add seems to insist on calling fsync() a lot for no reason).
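For reference, the five mode values can be written down as a C enumeration. The numeric values and descriptions come from the list above; the enumerator names and the fsync_mode_describe() helper are hypothetical and do not appear in the commit.

```c
#include <stddef.h>

/* Values of vfs.hammer.fsync_mode as listed in the commit message.
 * The enumerator names are illustrative only. */
enum fsync_mode_sketch {
    FSYNC_MODE_FULL           = 0,
    FSYNC_MODE_ASYNC          = 1,
    FSYNC_MODE_SYNC_ON_CLOSE  = 2,
    FSYNC_MODE_ASYNC_ON_CLOSE = 3,
    FSYNC_MODE_IGNORE         = 4
};

/* Return the documented description for a mode, or NULL if the
 * value is outside the documented range. */
const char *fsync_mode_describe(int mode)
{
    switch (mode) {
    case FSYNC_MODE_FULL:
        return "full fsync semantics";
    case FSYNC_MODE_ASYNC:
        return "asynchronous";
    case FSYNC_MODE_SYNC_ON_CLOSE:
        return "synchronous fsync on close if fsync called prior to close";
    case FSYNC_MODE_ASYNC_ON_CLOSE:
        return "asynchronous fsync on close if fsync called prior to close";
    case FSYNC_MODE_IGNORE:
        return "ignore fsync (30-second system sync takes care of it)";
    default:
        return NULL;
    }
}
```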
|
#
de996e86 |
| 19-Aug-2009 |
Matthew Dillon <dillon@apollo.backplane.com> |
HAMMER - Rework write pipelining
* Rework write pipelining so it is based on pending direct writes on an inode-by-inode basis. ip->rsv_recs and hmp->rsv_recs are now decremented after the direct write has completed rather than when the sync code has processed the record.
This fixes serious buffer cache overloading when doing linear writes.
* Implement write clustering or bawrite() calls based on a filesystem block getting filled up instead of relying on the buffer cache's bdwrite() to keep ahead of the mark.
* vfs.hammer.cluster_enable now affects both read and write clustering.
|
#
3e583440 |
| 16-Aug-2009 |
Matthew Dillon <dillon@apollo.backplane.com> |
HAMMER - Add vfs.hammer.yield_check, yield on cpu intensive loops
* When running in the kernel HAMMER can wind up cpu-bound. This code allows it to yield to other processes during these periods. This is a bit of a hack and may undergo further development.
* Default check for yield every 16 B-tree iterations.
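A periodic yield check of this kind can be sketched in userland C as below. The default of 16 comes from the commit message; everything else (the iter_state structure, iter_maybe_yield(), and the use of POSIX sched_yield() in place of a kernel yield) is an assumption made for illustration.

```c
#include <sched.h>

#define YIELD_CHECK_DEFAULT 16   /* default interval from the commit message */

/* Hypothetical per-loop state. */
struct iter_state {
    int yield_check;   /* iterations between yields; 0 disables the check */
    int count;         /* iterations since the last yield */
    int yields;        /* number of yields performed, for observation */
};

/* Call once per iteration of a CPU-intensive loop.
 * Returns 1 if the caller yielded on this iteration, else 0. */
int iter_maybe_yield(struct iter_state *st)
{
    if (st->yield_check > 0 && ++st->count >= st->yield_check) {
        st->count = 0;
        sched_yield();           /* userland stand-in for a kernel yield */
        st->yields++;
        return 1;
    }
    return 0;
}
```

With the default interval, a loop of 64 iterations yields four times, on iterations 16, 32, 48, and 64.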
|
#
b9b0a6d0 |
| 23-Jul-2009 |
Matthew Dillon <dillon@apollo.backplane.com> |
HAMMER / VFS_VGET - Add optional dvp argument to VFS_VGET(). Fix readdirplus
* VGET is used by NFS to acquire a vnode given an inode number. HAMMER requires additional information to determine the PFS the inode is being acquired from.
Add an optional directory vnode argument to the VGET. If non-NULL, HAMMER will extract the PFS information from this vnode.
* Adjust NFS to pass the dvp to VGET when doing a readdirplus.
Note that the PFS is already encoded in file handles, but readdirplus acquires the attributes for each directory entry it scans (readdir does not). This fixes readdirplus for NFS served HAMMER PFS exports.
|
#
31a56ce2 |
| 30-Jun-2009 |
Matthew Dillon <dillon@apollo.backplane.com> |
HAMMER VFS - Make the same changes to statfs() that were made to statvfs().
|
#
5987cc42 |
| 28-Jun-2009 |
Matthew Dillon <dillon@apollo.backplane.com> |
HAMMER VFS - Export highest supported production version with sysctl.
* Implement vfs.hammer.support_version to return the highest supported HAMMER FS version to userland. Used by newfs_hammer.
|
#
0f65be10 |
| 25-Jun-2009 |
Matthew Dillon <dillon@apollo.backplane.com> |
HAMMER VFS - Take reserved space into account when reporting statvfs data
Adjust statvfs data so reserved space is taken into account, so the filesystem starts failing modifying operations closer to when 'df' would otherwise say that 0 free space remains.
Submitted-by: Antonio Huete Jimenez <tuxillo@quantumachine.net> (with modification)
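One plausible reading of this adjustment is that reserved blocks are subtracted from the free count before it is reported, clamping at zero. The sketch below shows only that arithmetic; the function name and the exact rule are assumptions, since the log does not give the actual computation.

```c
#include <stdint.h>

/* Report free blocks net of reserved space, never going below zero.
 * Assumed rule for illustration; not taken from the HAMMER source. */
int64_t reported_free_blocks(int64_t free_blocks, int64_t reserved_blocks)
{
    int64_t avail = free_blocks - reserved_blocks;

    return avail > 0 ? avail : 0;
}
```

Under this rule a filesystem with 1000 free blocks and 200 reserved reports 800, and one with 100 free and 200 reserved reports 0 rather than a negative number.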
|
#
973c11b9 |
| 24-Jun-2009 |
Matthew Dillon <dillon@apollo.backplane.com> |
AMD64 - Fix many compile-time warnings. int/ptr type mismatches, %llx, etc.
|