c82af904 | 26-Jun-2008 | Matthew Dillon <dillon@dragonflybsd.org>
HAMMER 59A/Many: Mirroring related work (and one bug fix).
* BUG FIX: Fix a bug in directory hashkey generation. The iterator could sometimes conflict with a key already on-disk and interfere with a pending deletion. The chance of this occurring was minuscule but not 0. Now fixed.
The fix also revamps the directory iterator code, moving it all to one place and removing it from two other places.
* PRUNING CHANGE: The pruning code no longer shifts the create_tid and delete_tid of adjacent records to fill gaps. This means that historical queries must either use snapshot softlinks or use a fine-grained transaction id greater than the most recent snapshot softlink.
Fine-grained historical access still works up to the first snapshot softlink.
* Clean up the cursor code responsible for acquiring the parent node.
* Add the core mirror ioctl read/write infrastructure. This work is still in progress.
  - ioctl commands
  - pseudofs enhancements, including st_dev munging
  - mount options
  - transaction id and object id conflictless allocation
  - initial mirror_tid recursion up the B-Tree (not finished)
  - B-Tree mirror scan optimizations to skip sub-hierarchies that do not need to be scanned (requires mirror_tid recursion to be 100% working)
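
A rough sketch of the hashkey-iterator idea from the bug fix above (illustrative only; the hash function, names, and collision callback are stand-ins, not the actual HAMMER code):

    #include <stdint.h>
    #include <stddef.h>

    /* hypothetical collision check supplied by the B-Tree lookup layer */
    typedef int (*key_in_use_t)(uint64_t key, void *arg);

    static uint64_t
    dirhash_base(const char *name, size_t len)
    {
        uint64_t hash = 5381;               /* djb2 as a stand-in for the real hash */

        while (len--)
            hash = hash * 33 + (uint8_t)*name++;
        return (hash & ~(uint64_t)0xFF);    /* reserve the low bits for iteration */
    }

    /*
     * Iterate the low bits until the key collides with neither an on-disk
     * record nor a record still pending deletion.
     */
    static uint64_t
    dirhash_alloc(const char *name, size_t len, key_in_use_t in_use, void *arg)
    {
        uint64_t key = dirhash_base(name, len);
        int iter;

        for (iter = 0; iter < 256; ++iter) {
            if (!in_use(key + iter, arg))
                return (key + iter);
        }
        return (0);                         /* caller must handle exhaustion */
    }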

dd94f1b1 | 24-Jun-2008 | Matthew Dillon <dillon@dragonflybsd.org>
HAMMER 58B/Many: Revamp ioctls, add non-monotonic timestamps, mirroring
* Revamp most of HAMMER's ioctl structures with an eye towards future enhancements.
* Adjust on-media structures to include non-monotonic creation and deletion timestamps. Since the transaction id no longer translates to a timestamp, adding explicit timestamps allows the 'hammer history' and 'undo' utilities to still display timestamps for the change history.
* Start working on the mirroring support ioctls.

5de0c0e5 | 23-Jun-2008 | Matthew Dillon <dillon@dragonflybsd.org>
HAMMER 58A/Many: Mirroring support part 1
* Implement mastership domains for transaction ids to support multi-master mirroring and implement mastership selection at mount time.
Mastership domains work by having the low 4 bits of the transaction id specify the mastership id (0-15). This allows the mirroring code to distinguish between changes originating on a particular node and changes mirrored from another node.
This also ensures that filesystem objects can be created on the mirrors in parallel without resulting in conflicting object ids.
* Eliminate time-based TID generation. Just increment the TID as appropriate.
NOTE: Portions of this change may be reverted at a later time depending on how the mirroring implementation proceeds.
* Minor code cleanups.
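
A minimal sketch of the low-4-bit mastership encoding described above (names are illustrative, not the HAMMER API):

    #include <stdint.h>

    #define TID_MASTER_MASK 0x000000000000000FULL

    /* generate the next TID for this node, stamping its mastership id */
    static inline uint64_t
    tid_next(uint64_t last_tid, int master_id)
    {
        uint64_t tid = (last_tid + 0x10) & ~TID_MASTER_MASK;

        return (tid | (uint64_t)(master_id & 15));
    }

    /* which node originated the change that carries this TID? */
    static inline int
    tid_master(uint64_t tid)
    {
        return ((int)(tid & TID_MASTER_MASK));
    }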

ddfdf542 | 20-Jun-2008 | Matthew Dillon <dillon@dragonflybsd.org>
HAMMER 56D/Many: Media structure finalization, atime/mtime, etc.
* Move the atime and mtime fields to the end of struct hammer_inode_data. CRCs on hammer_inode_data no longer include atime or mtime, allowing them to be updated without having to update the related B-Tree node.
* Change the time format for ctime, atime, and mtime. These 64 bit fields now store microseconds in real time instead of transaction ids.
* atime is now updated asynchronously, and mtime is now updated with UNDO records only. Split the ITIMES flag into ATIME and MTIME and no longer set the DDIRTY (inode_data generally dirty) flag when the mtime changes.
* Finish on-media structural components for pseudo-fs support inside a HAMMER filesystem.
* Finish on-media structural components for adding a serial number to the B-Tree element structures, for mirroring support.
* Make fsync() wait for the flush to complete, issue extra flushes as needed to take the UNDO FIFO's start position past the fsync'd data so a crash does not undo it.
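
A sketch of why moving atime/mtime to the end of the inode data lets the CRC skip them (field names and the checksum are illustrative, not the on-media layout):

    #include <stdint.h>
    #include <stddef.h>

    struct inode_data_sketch {
        uint64_t obj_id;
        uint64_t size;
        uint64_t nlinks;
        uint64_t ctime;                 /* microseconds in real time */
        /* --- everything below is excluded from the CRC --- */
        uint64_t atime;                 /* microseconds in real time */
        uint64_t mtime;                 /* microseconds in real time */
    };

    /* stand-in for the real CRC routine */
    static uint32_t
    simple_sum(const void *buf, size_t len)
    {
        const uint8_t *p = buf;
        uint32_t sum = 0;

        while (len--)
            sum = sum * 31 + *p++;
        return (sum);
    }

    /* the CRC covers only the leading, "stable" portion of the inode data */
    static uint32_t
    inode_data_crc(const struct inode_data_sketch *ip)
    {
        return (simple_sum(ip, offsetof(struct inode_data_sketch, atime)));
    }

With this layout, atime and mtime can be rewritten without touching the CRC stored with the related B-Tree node.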

4a2796f3 | 20-Jun-2008 | Matthew Dillon <dillon@dragonflybsd.org>
HAMMER 56C/Many: Performance tuning - MEDIA STRUCTURES CHANGED!
* Implement variable block sizes. Records at offsets < 1MB use 16K buffers while records at offsets >= 1MB use 64K buffers. This improves performance primarily by reducing the number of B-Tree elements we have to stuff.
* Mess around with the deadlock handling code a bit. It still needs a re-think but it works. Implement low-priority shared locks. A low priority shared lock can only be acquired if no other locks are held by the thread.
* Implement slow-down code for the record backlog to the flusher and reimplement the slow-down code that deals with reclaimed inodes queued to the flusher. This should hopefully fix the kernel memory exhaustion issues for M_HAMMER.
* Update layer2->append_off. It isn't implemented yet but doing this now will prevent media incompatibilities later on when it does get implemented.
* Split hammer_blockmap_free() into hammer_blockmap_free() and hammer_blockmap_finalize().
* Fix a bug in the delayed-CRC handling related to reblocking. When throwing away a modified buffer, pending CRC requests must also be thrown away.
* Fix a bug in the record overlap compare code. If we cannot return 0 due to an overlap because the record has been deleted, we must still return an appropriate formal code so the scan progresses in the correct direction down the red-black tree.
* Make data in the meta-data zone a meta-data buffer structure type so it gets synced to disk at the appropriate time. This may be temporary; it's needed to deal with the atime/mtime code, but another commit may soon make it moot.
* Bump the seqcount so cluster_read() does the right thing when reading into a large UIO just after opening a file.
* Do a better job calculating vap->va_bytes. It's still fake, but it's a more accurate fake.
* Fix an issue in the BMAP code related to ranges that do not cover the requested logical offset.
* Fix a bug in the blockmap code. If a reservation is released without finalizing any allocations within that big-block, another zone can steal it out from under the current zone's next_offset, resulting in a zone mismatch panic.
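
The offset-based buffer size selection from the first bullet above, as a trivial sketch (constants per the description; the function name is made up):

    #include <stdint.h>

    #define SMALL_BUF_SIZE  (16 * 1024)
    #define LARGE_BUF_SIZE  (64 * 1024)
    #define SIZE_SWITCH     (1024 * 1024)   /* records at >= 1MB use 64K buffers */

    static int
    buffer_size_for_offset(int64_t file_offset)
    {
        return (file_offset < SIZE_SWITCH ? SMALL_BUF_SIZE : LARGE_BUF_SIZE);
    }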

bcac4bbb | 18-Jun-2008 | Matthew Dillon <dillon@dragonflybsd.org>
HAMMER 56B/Many: Performance tuning - MEDIA STRUCTURES CHANGED!
* MEDIA CHANGE: The atime has been moved back into the inode data proper. The nlinks field has also been moved.
* PERFORMANCE: The CRC for cached B-Tree nodes was being run on every access instead of just the first time. This was the cause of HAMMER's poor directory scanning performance and cpu-intensive write flushes.
Adjusted to only check the CRC on the initial load into the buffer cache.
* PERFORMANCE: The CRC for modified B-Tree nodes was being regenerated every time the node was modified, so a large number of insertions or deletions modifying the same B-Tree node needlessly regenerated the CRC each time.
Adjusted to delay generation of the CRC until just before the buffer is flushed to the physical media.
Just for the record, B-Tree nodes are 4K and it takes ~25uS to run a CRC on them. Needless to say, removing the unnecessary calls solved a lot of performance issues.
* PERFORMANCE: Removed limitations in the node caching algorithms. Now more than one inode can cache pointers to the same B-Tree node.
* PERFORMANCE: When calculating the parent B-Tree node we have to scan the element array to locate the index that points back to the child. Use a power-of-2 algorithm instead of a linear scan.
* PERFORMANCE: Clean up the selection of ip->cache[0] or ip->cache[1] based on whether we are trying to cache the location of the inode or the location of the file object's data.
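
A sketch of the two CRC changes above: verify once on initial load, then defer regeneration for modified nodes until just before the buffer is written out (flags, names, and the checksum are stand-ins):

    #include <stdint.h>
    #include <stddef.h>
    #include <stdbool.h>

    #define NODE_SIZE       4096
    #define FLAG_CRC_GOOD   0x01        /* CRC verified on initial load */
    #define FLAG_NEEDS_CRC  0x02        /* modified; CRC stale until flush */

    struct cached_node {
        uint8_t  data[NODE_SIZE];
        uint32_t crc;
        int      flags;
    };

    /* stand-in for the real CRC routine */
    static uint32_t
    simple_sum(const void *buf, size_t len)
    {
        const uint8_t *p = buf;
        uint32_t sum = 0;

        while (len--)
            sum = sum * 31 + *p++;
        return (sum);
    }

    static bool
    node_load_check(struct cached_node *node)
    {
        if ((node->flags & FLAG_CRC_GOOD) == 0) {
            if (simple_sum(node->data, NODE_SIZE) != node->crc)
                return (false);             /* corrupt on media */
            node->flags |= FLAG_CRC_GOOD;   /* never re-checked while cached */
        }
        return (true);
    }

    static void
    node_modify(struct cached_node *node)
    {
        node->flags |= FLAG_NEEDS_CRC;      /* defer the CRC to flush time */
    }

    static void
    node_pre_flush(struct cached_node *node)
    {
        if (node->flags & FLAG_NEEDS_CRC) {
            node->crc = simple_sum(node->data, NODE_SIZE);
            node->flags &= ~FLAG_NEEDS_CRC;
        }
    }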

cb51be26 | 17-Jun-2008 | Matthew Dillon <dillon@dragonflybsd.org>
HAMMER 56A/Many: Performance tuning - MEDIA STRUCTURES CHANGED!
* MEDIA CHANGE: The blockmaps have been folded into the freemap. Allocations are now made directly out of the freemap. More work is expected here.
The blockmaps are still used to sequence allocations, but no block number translation is required any more. This didn't improve performance much but it will make it easier for future optimizations to localize allocations.
* PERFORMANCE: Removed the holes recording code. Another commit will soon take over the functionality.
* PERFORMANCE: The flusher's slave threads now collect a number of inodes into a batch before starting their work, in an attempt to reduce deadlocks between slave threads from adjacent inodes.
* PERFORMANCE: B-Tree positional caching now works much better, greatly reducing the cpu overhead when accessing the filesystem.
* PERFORMANCE: Added a write-append optimization. Do not do a lookup/iteration to locate records being overwritten when no such records should exist. This cuts the cpu overhead of write-append flushes in half.
* PERFORMANCE: Add a vfs.hammer.write_mode sysctl feature to test out two different ways of queueing write I/O's.
* Added B-Tree statistics (hammer bstats 1).

7bc5b8c2 | 13-Jun-2008 | Matthew Dillon <dillon@dragonflybsd.org>
HAMMER 54D/Many: Performance tuning.
* Remove major barriers to write performance and fix hiccups revealed by blogbench.
Change the HAMMER reclaim-delay algorithm to operate like a FIFO instead of as a free-for-all. The idea of introducing a dynamic delay helped some, but the addition of the wakeup FIFO allows burst completions by the flusher to immediately wakeup processes that were waiting for the reclaim count to drain. The result is far, far smoother operation.
* Remove a major blocking conflict between the buffer cache daemon and HAMMER. The buffer cache was getting stuck on trying to overwrite dirty records that had already been queued to the flusher. The flusher might not act on the record(s) for a long period of time, causing the buffer cache daemon to stall.
Fix the problem by properly using the HAMMER_RECF_INTERLOCK_BE flag, which stays on only for a very short period of time, instead of testing the record's flush state (record->flush_state), which can stay in the HAMMER_FST_FLUSH state for a very long time.
* The parent B-Tree node does not need to be locked when inserting into the child.
* Use the new B_AGE semantics to keep meta-data intact longer. This results in another big improvement in random read and write performance.
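
A sketch of the reclaim-delay FIFO described above: each blocked process enqueues a ticket recording how far the reclaim count must drain, and the flusher wakes tickets in FIFO order as it retires inodes (names and the wait/wake mechanics are stand-ins for the kernel primitives):

    #include <sys/queue.h>
    #include <stdbool.h>

    struct reclaim_ticket {
        TAILQ_ENTRY(reclaim_ticket) entry;
        int  target;                    /* wake when reclaim count <= target */
        bool satisfied;                 /* real code would sleep/wakeup here */
    };

    TAILQ_HEAD(ticket_list, reclaim_ticket);

    struct reclaim_state {
        struct ticket_list waiters;     /* FIFO of blocked processes */
        int count;                      /* inodes queued for reclaim */
    };

    /* flusher side: called each time an inode is retired */
    static void
    reclaim_retire_one(struct reclaim_state *rs)
    {
        struct reclaim_ticket *t;

        --rs->count;
        while ((t = TAILQ_FIRST(&rs->waiters)) != NULL &&
               rs->count <= t->target) {
            TAILQ_REMOVE(&rs->waiters, t, entry);
            t->satisfied = true;        /* burst completions wake waiters at once */
        }
    }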

d99d6bf5 | 12-Jun-2008 | Matthew Dillon <dillon@dragonflybsd.org>
HAMMER 54B/Many: Performance tuning.
* Fix a major performance issue in the UNDO generation code. The code was calling hammer_bread() instead of hammer_bnew() for 'new' undo buffers, meaning it was doing a read-modify-write on the disk instead of just a write.
This fix results in a MAJOR improvement in performance across the board.
* Replace the only lockmgr lock in the module with a hammer_lock.
* Tweak hammer_inode_waitreclaims(). This will probably need even more tweaking as time passes.

a99b9ea2 | 11-Jun-2008 | Matthew Dillon <dillon@dragonflybsd.org>
HAMMER 54/Many: Performance tuning
* Implement hammer_vop_bmap().
* Implement cluster_read() support. This should jump up linear read performance almost to the platter speed. I get 100 MB/sec testing vs 35 MB/sec previously.
* Do a better job kicking an inode into the flusher when writing sequentially. This boosts the write rate by at least 50%. It isn't quite able to run at the platter speed due to B-Tree overheads which will be addressed in a later patch.
* Do not create data fragments at the ends of files greater than 16K; use a full 16K block. The reason is that fragments in HAMMER are allocated out of a wholly different zone and we do not want to lose the chance of making the tail end of the file contiguous.
Files less than 16K still use data fragments.
* Fix a machine lockup related to an interrupt race with biodone() and insertions and deletions from hmp->lose_list.
* Fix a memory exhaustion issue.
Reported-by: Francois Tigeot <ftigeot@wolfpond.org> (machine lockup)
Credit-also: Jonathan Stuart on the 0 byte sized file bug fix.
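
The tail-block rule from the fragment bullet above, as a small sketch (the helper is hypothetical):

    #include <stdint.h>

    #define FULL_BLOCK 16384

    /* how much space to allocate for the last partial block of a file */
    static int64_t
    tail_alloc_size(int64_t file_size)
    {
        int64_t tail = file_size % FULL_BLOCK;

        if (tail == 0)
            return (0);                 /* no partial tail */
        if (file_size < FULL_BLOCK)
            return (tail);              /* small file: a fragment is fine */
        return (FULL_BLOCK);            /* large file: keep the tail contiguous */
    }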

af209b0f | 10-Jun-2008 | Matthew Dillon <dillon@dragonflybsd.org>
HAMMER 53H/Many: Performance tuning, bug fixes
* CHANGE THE ON-MEDIA B-TREE STRUCTURE. The number of elements per node has been increased from 16 to 64. The intent is to reduce the number of seeks required in a heavy random-access loading situation.
* Add a shortcut to the B-Tree node scanning code (requires more testing). Instead of scanning linearly we do a power-of-2 narrowing search.
* Only do clustered reads for DATA types. Do not cluster meta-data (aka B-Tree) I/O. Note that the inode data structure is considered to be a DATA type. Reduce the cluster read size from 256K to 64K to avoid blowing out the buffer cache.
* Augment hammer locks so one can discern between a normal lock blockage and one that is recovering from a deadlock.
* Change the slave work threads for the flusher to pull their work off a single queue. This fixes an issue where one slave work thread would sometimes get a disproportionate percentage of the work and the master thread then had to wait for it to finish while the other work threads were twiddling their thumbs.
* Adjust the wait reclaims code to solve a long standing performance issue. The flusher could get so far behind that the system's buffer cache buffers would no longer have any locality of reference to what was being flushed, causing a massive drop in performance.
* Do not queue a dirty inode to the flusher unconditionally in the strategy write code. Only do it if system resources appear to be stressed. The inode will get flushed when the filesystem syncs.
* Code cleanup.
* Fix a bug reported by Antonio Huete Jimenez related to 0-length writes not working properly.
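
A sketch of the power-of-2 narrowing search mentioned above, replacing a linear scan of a node's sorted element array (element type and comparison are illustrative):

    #include <stdint.h>

    /* return the index of the first element whose key is >= skey */
    static int
    narrow_search(const int64_t *keys, int count, int64_t skey)
    {
        int lo = 0;
        int hi = count;

        while (lo < hi) {
            int mid = lo + ((hi - lo) >> 1);

            if (keys[mid] < skey)
                lo = mid + 1;
            else
                hi = mid;
        }
        return (lo);                    /* == count if every key is < skey */
    }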

3897d7e9 | 10-Jun-2008 | Matthew Dillon <dillon@dragonflybsd.org>
HAMMER 53E/Many: Performance tuning
* Change the code which waits for reclaims to drain to be more in line with the new bwillwrite(). Impose a dynamic delay instead of blocking outright.
* Move the hammer_inode_waitreclaims() call from hammer_vop_open() to hammer_get_inode(), and only call it when we would otherwise have to create a new inode.
* Sort HAMMER's file list in conf/files.

cebe9493 | 10-Jun-2008 | Matthew Dillon <dillon@dragonflybsd.org>
HAMMER 53D/Many: Stabilization
* Fix an overwrite bug with direct write which could result in file corruption.
* Reserve just-freed big blocks for two flush cycles to prevent HAMMER from overwriting destroyed data so it does not become corrupt if the system crashes. This is needed because the recover code does not record UNDOs for data (nor do we want it to).
* More I/O subsystem work. There may still be an elusive panic related to calls to regetblk().
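
A sketch of the two-flush-cycle hold on just-freed big-blocks (field names and the sequence counter are assumptions):

    #include <stdint.h>

    struct freed_block {
        int64_t  zone_offset;           /* big-block that was just freed */
        uint64_t freed_at_flush;        /* flush sequence number at free time */
    };

    /*
     * Data has no UNDO coverage, so a freed block may not be overwritten
     * until two full flush cycles have completed after the free.
     */
    static int
    block_may_be_reused(const struct freed_block *fb, uint64_t current_flush)
    {
        return (current_flush >= fb->freed_at_flush + 2);
    }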

9f5097dc | 09-Jun-2008 | Matthew Dillon <dillon@dragonflybsd.org>
HAMMER 53C/Many: Stabilization
* HAMMER queues dirty inodes reclaimed by the kernel to the backend for their final sync. Programs like blogbench can overload the backend and generate more new inodes than the backend can dispose of, running M_HAMMER out of memory.
Add code to stall on vop_open() when this condition is detected to give the backend a chance to catch up (see NOTE 1 below).
* HAMMER could build up too many meta-data buffers and cause the system to deadlock in newbuf. Recode the flusher to allow a block of UNDOs, the volume header, and all related meta-data buffers to be flushed piecemeal, and then continue the flush loop without closing out the transaction. If a crash occurs the recovery code will undo the partial flushes.
* Fix an issue located by FSX under load. The in-memory/on-disk record merging code was not dealing with in-memory data records properly. The key field for data records is (base_offset + data_len), not just (base_off), so a 'match' between an in-memory data record and an on-disk data record requires a special case test. This is the case where the in-memory record is intended to overwrite the on-disk record, so the in-memory record must be chosen and the on-disk record discarded for the purposes of read().
* Fix a bug in hammer_io.c related to the handling of B_LOCKED buffers that resulted in an assertion at umount time. Buffer cache buffers were not being properly disassociated from their hammer_buffer counterparts in the direct-write case.
* The frontend's direct-write capability for truncated buffers (such as used with small files) was causing an assertion to occur on the backend. Add an interlock on the related hammer_buffer to prevent the frontend from attempting to modify the buffer while the backend is trying to write it to the media.
* Dynamically size the dirty buffer limit. This still needs some work.
(NOTE 1): On read/write performance issues. Currently HAMMER's frontend VOPs are massively disassociated from modifying B-Tree updates. Even though a direct-write capability now exists, it applies only to bulk data writes to disk and NOT to B-Tree updates. Each direct write creates a record which must be queued to the backend to do the B-Tree update on the media. The flusher is currently single-threaded and when HAMMER gets too far behind doing these updates the current safeties will cause performance to degrade drastically. This is a known issue that will be addressed.
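
A sketch of the FSX fix above: a data record's key is its ending offset, so the special case detects an in-memory record that exactly shadows an on-disk one and prefers the in-memory copy (types and fields are illustrative):

    #include <stdint.h>
    #include <stdbool.h>

    struct data_rec {
        int64_t base_off;               /* logical file offset of the data */
        int64_t data_len;
    };

    /* the B-Tree key for a data record is (base_offset + data_len) */
    static inline int64_t
    rec_key(const struct data_rec *rec)
    {
        return (rec->base_off + rec->data_len);
    }

    /* true if the in-memory record overwrites the on-disk record for read() */
    static bool
    mem_rec_overrides(const struct data_rec *mem, const struct data_rec *disk)
    {
        return (rec_key(mem) == rec_key(disk));
    }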

0832c9bb | 08-Jun-2008 | Matthew Dillon <dillon@dragonflybsd.org>
HAMMER 53B/Many: Complete overhaul of strategy code, reservations, etc
* Completely overhaul the strategy code. Implement direct reads and writes for all cases. REMOVE THE BACKEND BIO QUEUE. BIOs are no longer queued to the flusher under any circumstances.
Remove numerous hacks that were previously emplaced to deal with BIO's being queued to the flusher.
* Add a mechanism to invalidate buffer cache buffers that might be shadowed by direct I/O. e.g. if a strategy write uses the vnode's bio directly there may be a shadow hammer_buffer that will then become stale and must be invalidated.
* Implement a reservation tracking structure (hammer_reserve) to track storage reservations made by the frontend. The backend will not attempt to free or reuse reserved space if it encounters it.
Use reservations to back cached holes (struct hammer_hole) for the same reason.
* Index hammer_buffer on the zone-X offset instead of the zone-2 offset. Base the RB tree in the hammer_mount instead of (zone-2) hammer_volume. This removes nearly all blockmap lookup operations from the critical path.
* Do a much better job tracking cached dirty data for the purposes of calculating whether the filesystem will become full or not.
* Fix a critical bug in the CRC generation of short data buffers.
* Fix a VM deadlock.
* Use 16-byte alignment for all on-disk data instead of 8-byte alignment.
* Major code cleanup.
As-of this commit write performance is now extremely good.
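
A sketch of the reservation tracking idea (the real hammer_reserve structure differs, and a real implementation would index reservations rather than walk a list):

    #include <stdint.h>
    #include <sys/queue.h>

    struct reserve_sketch {
        LIST_ENTRY(reserve_sketch) entry;
        int64_t zone_offset;            /* start of the reserved storage */
        int32_t bytes;                  /* reserved but not yet committed */
        int     refs;
    };

    LIST_HEAD(reserve_list, reserve_sketch);

    /* backend side: never free or reuse storage the frontend has reserved */
    static int
    storage_is_reserved(const struct reserve_list *list, int64_t zone_offset)
    {
        const struct reserve_sketch *res;

        LIST_FOREACH(res, list, entry) {
            if (res->zone_offset == zone_offset && res->refs > 0)
                return (1);
        }
        return (0);
    }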

47637bff | 07-Jun-2008 | Matthew Dillon <dillon@dragonflybsd.org>
HAMMER 53A/Many: Read and write performance enhancements, etc.
* Add hammer_io_direct_read(). For full-block reads this code allows a high-level frontend buffer cache buffer associated with the regular file vnode to directly access the underlying storage, instead of loading that storage via a hammer_buffer and bcopy()ing it.
* Add a write bypass, allowing the frontend to bypass the flusher and write full-blocks directly to the underlying storage, greatly improving frontend write performance. Caveat: See note at bottom.
The write bypass is implemented by adding a feature whereby the frontend can soft-reserve unused disk space on the physical media without having to interact (much) with on-disk meta-data structures. This allows the frontend to flush high-level buffer cache buffers directly to disk and release the buffer for reuse by the system, resulting in very high write performance.
To properly associate the reserved space with the filesystem so it can be accessed in later reads, an in-memory hammer_record is created referencing it. This record is queued to the backend flusher for final disposition. The backend disposes of the record by inserting the appropriate B-Tree element and marking the storage as allocated. At that point the storage becomes official.
* Clean up numerous procedures to support the above new features. In particular, do a major cleanup of the cached truncation offset code (this is the code which allows HAMMER to implement wholly asynchronous truncate()/ftruncate() support).
Also clean up the flusher triggering code, removing numerous hacks that had been in place to deal with the lack of a direct-write mechanism.
* Start working on statistics gathering to track record and B-Tree operations.
* CAVEAT: The backend flusher creates a significant cpu burden when flushing a large number of in-memory data records. Even though the data itself has already been written to disk, there is currently a great deal of overhead involved in manipulating the B-Tree to insert the new records. Overall write performance will only be modestly improved until these code paths are optimized.

51c35492 | 03-Jun-2008 | Matthew Dillon <dillon@dragonflybsd.org>
HAMMER 52/Many: Read-only mounts and mount upgrades/downgrades.
* Finish implementing MNT_UPDATE, allowing a HAMMER mount to be upgraded or downgraded.
* Adjust the recovery code to not flush buffers dirtied by recovery operations (running the UNDOs) when the mount is read-only. The buffers will be flushed when the mount is updated to read-write.
* Improve recovery performance by not flushing dirty buffers until the end (if a read-write mount).
* A crash which occurs during recovery might cause the next recovery to fail. Delay writing out the recovered volume header until all the other buffers have been written out to fix the problem.

e63644f0 | 02-Jun-2008 | Matthew Dillon <dillon@dragonflybsd.org>
HAMMER 51/Many: Filesystem full casework, nohistory flag.
* Track the amount of unsynced information and return ENOSPC if the filesystem would become full. The idea here is to detect that the filesystem is full and yet still give the flusher enough runway to flush cached dirty data and inodes.
* Implement the NOHISTORY flag. Implement inheritance of NOHISTORY and NODUMP.
The NOHISTORY flag tells HAMMER not to retain historical information on a filesystem object. If set on a directory any objects created in that directory will also inherit the flag. For example, it could be set on /usr/obj.
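
A sketch of the "leave the flusher some runway" ENOSPC check described above (all fields and the reserve calculation are made-up placeholders):

    #include <stdint.h>
    #include <errno.h>

    struct space_state {
        int64_t free_bytes;             /* free space remaining on media */
        int64_t flush_reserve;          /* space the flusher may still need */
        int64_t unsynced_bytes;         /* cached dirty data and inodes */
    };

    /* refuse new work while space remains so cached dirty data can be flushed */
    static int
    check_space(const struct space_state *st, int64_t new_bytes)
    {
        if (st->unsynced_bytes + new_bytes + st->flush_reserve > st->free_bytes)
            return (ENOSPC);
        return (0);
    }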

6f97fce3 | 01-Jun-2008 | Matthew Dillon <dillon@dragonflybsd.org>
HAMMER 50/Many: VFS_STATVFS() support, stabilization.
* Add support for VFS_STATVFS(), returning 64 bit quantities for available space, etc.
* When freeing a big-block any holes cached for that block must be cleaned out.
* Fix a conditional testing whether a layer2 big-block must be allocated in layer1. The bug could only occur if a layer2 big-block gets freed in layer1, and we currently never do this.
* Clean-up comments related to freeing blocks.

2f85fa4d | 18-May-2008 | Matthew Dillon <dillon@dragonflybsd.org>
HAMMER 46/Many: Performance pass, media changes, bug fixes.
* Add a localization field to the B-Tree element which has sorting priority over the object id.
Use the localization field to separate inode entries from file data. This allows the reblocker to cluster inode information together and greatly improves directory/stat performance.
* Enhance the reblocker to reblock internal B-Tree nodes as well as leaves.
* Enhance the reblocker by adding 'reblock-inodes' in addition to 'reblock-data' and 'reblock-btree', allowing individual types of meta-data to be independently reblocked.
* Fix a bug in hammer_bread(). The buffer's zoneX_offset field was sometimes not being properly masked, resulting in unnecessary blockmap lookups. Also add hammer_clrxlate_buffer() to clear the translation cache for a hammer_buffer.
* Fix numerous issues with hmp->sync_lock.
* Fix a buffer exhaustion issue in the pruner and reblocker due to not counting I/O's in progress as being dirty.
* Enhance the symlink implementation. Take advantage of the extra 24 bytes of space in the inode data to directly store symlinks <= 24 bytes.
* Use cluster_read() to gang read I/O's into 64KB chunks. Rely on localization and the reblocker and pruner to make doing the larger I/O's worthwhile.
These changes reduce ls -lR overhead on 43383 files (half created with cpdup, half totally randomly created with blogbench). Overhead went from 35 seconds after reblocking, before the changes, to 5 seconds after reblocking, after the changes.
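
A sketch of the short-symlink optimization above (field names are illustrative, not the on-media inode layout):

    #include <stddef.h>
    #include <string.h>

    #define INLINE_SYMLINK_MAX 24

    struct inode_ext_sketch {
        union {
            char symlink[INLINE_SYMLINK_MAX];  /* short target stored inline */
            /* other object-type-specific fields share this space */
        } ext;
    };

    /* returns 1 if the target fit inline, 0 if a separate data record is needed */
    static int
    store_symlink_inline(struct inode_ext_sketch *ip, const char *target, size_t len)
    {
        if (len > INLINE_SYMLINK_MAX)
            return (0);
        memset(ip->ext.symlink, 0, sizeof(ip->ext.symlink));
        memcpy(ip->ext.symlink, target, len);
        return (1);
    }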

77062c8a | 06-May-2008 | Matthew Dillon <dillon@dragonflybsd.org>
HAMMER 41B/Many: Cleanup.
* Disable (most) debugging kprintfs unless a hammer debug sysctl is set.
* Do not allow buffers to be synced on panic.

c9b9e29d | 04-May-2008 | Matthew Dillon <dillon@dragonflybsd.org>
HAMMER 40F/Many: UNDO cleanup & stabilization.
* Properly classify UNDO zone buffers so they are flushed at the correct point in time.
* Minor rewrite of the code tracking the UNDO demark for the next flush.
* Introduce a considerably better backend flushing activation algorithm to avoid single-buffer flushes.
* Put a lock around the freemap allocator.

e8599db1 | 03-May-2008 | Matthew Dillon <dillon@dragonflybsd.org>
HAMMER 40F/Many: Inode/link-count sequencer cleanup pass, UNDO cache.
* Implement an UNDO cache. If we have already laid down an UNDO in the current flush cycle we do not have to lay down another one for the same address. This greatly reduces the number of UNDOs we generate during a flush.
* Properly get the vnode in order to be able to issue vfsync()'s from the backend. We may also have to acquire the vnode when doing an unload check for a file deletion.
* Properly generate UNDO records for the volume header. During crash recovery we have to UNDO the volume header along with any partially written meta-data, because the volume header refers to the meta-data.
* Add another record type, GENERAL, representing inode or softlink records.
* Move the setting of HAMMER_INODE_WRITE_ALT to the backend, allowing the kernel to flush buffers up to the point where the backend syncs the inode.
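
A sketch of the UNDO cache described above: remember which media offsets already have an UNDO in the current flush cycle and skip duplicates (a real implementation would use an RB tree or hash rather than a linear array):

    #include <stdint.h>
    #include <stdbool.h>

    #define UNDO_CACHE_SLOTS 128

    struct undo_cache {
        int64_t  offsets[UNDO_CACHE_SLOTS];
        int      count;
        uint64_t flush_seq;             /* cache is valid for one flush cycle */
    };

    /* returns true if an UNDO record must still be generated for this offset */
    static bool
    undo_needed(struct undo_cache *uc, uint64_t flush_seq, int64_t offset)
    {
        int i;

        if (uc->flush_seq != flush_seq) {   /* new flush cycle: reset the cache */
            uc->flush_seq = flush_seq;
            uc->count = 0;
        }
        for (i = 0; i < uc->count; ++i) {
            if (uc->offsets[i] == offset)
                return (false);             /* already covered this cycle */
        }
        if (uc->count < UNDO_CACHE_SLOTS)
            uc->offsets[uc->count++] = offset;
        return (true);
    }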

1f07f686 | 02-May-2008 | Matthew Dillon <dillon@dragonflybsd.org>
HAMMER 40A/Many: Inode/link-count sequencer.
* Remove the hammer_depend structure and build the dependencies directly into the hammer_record structure.
* Attempt to implement layout rules to ensure connectivity is maintained. This means, for example, that before HAMMER can flush a newly created file it will make sure the file has namespace connectivity to the directory it was created in, recursively to the root.
NOTE: 40A destabilizes the filesystem a bit; it's going to take a few passes to get everything working properly. There are numerous issues with this commit.

0729c8c8 | 29-Apr-2008 | Matthew Dillon <dillon@dragonflybsd.org>
HAMMER 39/Many: Parallel operations optimizations
* Implement a per-directory cache of new object IDs. Up to 128 directories will be managed in LRU fashion. The cache provides a pool of object IDs to better localize the object ids of files created in a directory, so parallel operations on the filesystem do not create a fragmented object id space.
* Cache numerous fields in the root volume's header to avoid creating undo records for them, greatly improving
(ultimately we can sync an undo space representing the volume header using a direct comparison mechanic but for now we assume the write of the volume header to be atomic).
* Implement a zone limit for the blockmap which newfs_hammer can install. The blockmap zones have an ultimate limit of 2^60 bytes, or around one million terabytes. If you create a 100G filesystem there is no reason to let the blockmap iterate over its entire range as that would result in a lot of fragmentation and blockmap overhead. By default newfs_hammer sets the zone limit to 100x the size of the filesystem.
* Fix a bug in the crash recovery code. Do not sync newly added inodes once the flusher is running, otherwise the volume header can get out of sync. Just create a dummy marker structure and move it to the tail of the inode flush_list when the flush starts, and stop when we hit it.
* Adjust hammer_vfs_sync() to sync twice. The second sync is needed to update the volume header's undo fifo indices, otherwise HAMMER will believe that it must undo the last fully synchronized flush.
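
A sketch of the per-directory object id cache: each directory holds a private chunk of ids so files created in parallel in different directories get clustered, non-interleaved ids (chunk size and names are made up; the 128-directory LRU is not shown):

    #include <stdint.h>

    #define OBJID_CHUNK 128             /* ids handed to a directory at a time */

    struct objid_cache {
        uint64_t next_id;               /* next id to hand out from the chunk */
        uint64_t end_id;                /* end of the chunk (exclusive) */
    };

    static uint64_t
    alloc_obj_id(struct objid_cache *oc, uint64_t *global_next_id)
    {
        if (oc->next_id >= oc->end_id) {
            /* refill from the global allocator in one bump */
            oc->next_id = *global_next_id;
            oc->end_id = oc->next_id + OBJID_CHUNK;
            *global_next_id += OBJID_CHUNK;
        }
        return (oc->next_id++);
    }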