hammer_reblock.c - OpenGrok history log for /dflybsd-src/sys/vfs/hammer/hammer

Revision	Date	Author	Comments
# 745703c7	07-Jul-2015	Tomohiro Kusumi <kusumi.tomohiro@gmail.com>	hammer: Remove trailing whitespaces - (Non-functional commits could make it difficult to git-blame the history if there are too many of those)
# a981af19	02-Jul-2015	Tomohiro Kusumi <kusumi.tomohiro@gmail.com>	sys/vfs/hammer: Change "bigblock" to "big-block" - There are (or were) several terms for the 8MB chunk, for example "big-block", "bigblock", "big block", "large-block", etc., but "big-block" seems to be the canonical term. - Changes are mostly comments and some in printf and hammer(8). Variable names (e.g. xxx_bigblock_xxx) remain unchanged. - The official design document as well as much of the existing code (excluding variable and macro names) use "big-block". https://www.dragonflybsd.org/hammer/hammer.pdf - Also see e04ee2de and the previous commit.
# d165c90a	02-Jul-2015	Tomohiro Kusumi <kusumi.tomohiro@gmail.com>	sys/vfs/hammer: Change "big block" to "big-block" - This word refers to 8MB chunk in hammer's blockmap layers, not literally "big" "block". - Changes are mostly comments and some in printf and hammer(8). - The official design document as well as much of the existing code (excluding variable and macro names) use "big-block". https://www.dragonflybsd.org/hammer/hammer.pdf - Also see e04ee2de.
# f1c0ae53	30-Apr-2015	Tomohiro Kusumi <kusumi.tomohiro@gmail.com>	sys/vfs/hammer: Add inline functions hammer_modify_buffer|volume_noundo() - Add noundo wrappers hammer_modify_buffer|volume_noundo() similar to the existing inline function hammer_modify_node_noundo() for better readability. - A pair of args (NULL, 0) indicating that it's not generating undo is a bit unclear (and there are even comments for them). - (The compiler doesn't actually inline hammer_modify_node_noundo() in my environment, but these one-line wrappers are inlined)
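The noundo wrapper described in f1c0ae53 can be illustrated with the minimal sketch below. The two structs and the body of hammer_modify_buffer() are simplified stand-ins assumed for the example (the real kernel types and routine do much more); only the convention that a (NULL, 0) base/length pair means "generate no UNDO" is taken from the commit message.

```c
#include <stddef.h>

/* Simplified stand-in types (assumption); the real kernel structures differ. */
struct hammer_transaction { int id; };
struct hammer_buffer { int undo_bytes; int modified; };
typedef struct hammer_transaction *hammer_transaction_t;
typedef struct hammer_buffer *hammer_buffer_t;

/*
 * Hypothetical stand-in for the kernel routine: marks the buffer
 * modified and, only when base is non-NULL, records len bytes of UNDO.
 */
static void
hammer_modify_buffer(hammer_transaction_t trans, hammer_buffer_t buffer,
                     void *base, int len)
{
    (void)trans;
    buffer->modified = 1;
    if (base != NULL)
        buffer->undo_bytes += len;
}

/* The wrapper names the intent instead of passing a bare (NULL, 0). */
static inline void
hammer_modify_buffer_noundo(hammer_transaction_t trans,
                            hammer_buffer_t buffer)
{
    hammer_modify_buffer(trans, buffer, NULL, 0);
}
```

Call sites then read as hammer_modify_buffer_noundo(trans, buffer) rather than needing a comment to explain the (NULL, 0) pair.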
# 5e1e1454	24-Apr-2015	Tomohiro Kusumi <kusumi.tomohiro@gmail.com>	sys/vfs/hammer: Add -A option to reblock|rebalance all pfs - The -A option makes certain per-pfs hammer commands operate on all pfs of the filesystem that the [filesystem] arg belongs to. Currently the hammer reblock and rebalance commands support this. It does nothing for other commands. - With the -A option, the above hammer commands use a range of 0 to 0xFFFF for the pfs id (upper 16 bits) of the cursor localization. This makes them iterate over all pfs in the filesystem. - This difference in localization range means the btree iteration applies to a larger range of nodes in terms of pfs id, since the pfs id is used as the top-priority key that localizes each pfs within the btree. There is no logical difference other than the range itself. So performing these commands on all pfs is as simple as changing the localization range (unless other keys are involved as additional parameters, as with hammer prune).
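A minimal sketch of the localization range logic from 5e1e1454, assuming a 32-bit localization key whose upper 16 bits hold the pfs id and whose lower 16 bits hold the localization type, as the commit describes. make_localization() and localization_range() are hypothetical helpers, not the kernel's functions.

```c
#include <stdint.h>

/* Assumed layout: pfs id in bits 31..16, localization type in bits 15..0. */
#define PFS_SHIFT  16
#define PFS_ID_MAX 0xFFFFu

/* Hypothetical helper: combine a pfs id and a localization type. */
static uint32_t
make_localization(uint32_t pfs_id, uint32_t loc_type)
{
    return (pfs_id << PFS_SHIFT) | (loc_type & 0xFFFFu);
}

/*
 * Hypothetical helper: compute the [beg, end] localization range for a
 * single pfs, or for every pfs when all_pfs is non-zero (the -A case).
 */
static void
localization_range(int all_pfs, uint32_t pfs_id,
                   uint32_t type_beg, uint32_t type_end,
                   uint32_t *beg, uint32_t *end)
{
    uint32_t lo = all_pfs ? 0 : pfs_id;
    uint32_t hi = all_pfs ? PFS_ID_MAX : pfs_id;

    *beg = make_localization(lo, type_beg);
    *end = make_localization(hi, type_end);
}
```

With all_pfs set, the range spans pfs ids 0 through 0xFFFF, so the same B-Tree scan simply covers every pfs; nothing else in the per-pfs logic has to change.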
# 6540d157	21-Apr-2015	Tomohiro Kusumi <kusumi.tomohiro@gmail.com>	sys/vfs/hammer: Cleanup cursor initialization code on reblock - Just make things a bit more clear. - The rule is the ioctl caller sets localization type to reblock, and the ioctl code adds up ip localization to initialize cursor.
# 558a44e2	21-Apr-2015	Tomohiro Kusumi <kusumi.tomohiro@gmail.com>	sys/vfs/hammer: Fix comment - Sync a comment with what's written in reblock_usage().
# 4fa5fb92	21-Apr-2015	Tomohiro Kusumi <kusumi.tomohiro@gmail.com>	sys/vfs/hammer: Cleanup sanity checks - Move sanity checks to the beginning of the function. - Check 'free_level > HAMMER_BIGBLOCK_SIZE'. free_level is somewhere between 0 and 8MB (inclusive).
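The sanity check from 4fa5fb92 amounts to a bounds test on free_level. In this sketch check_free_level() is hypothetical, free_level is assumed to be a signed 64-bit value so that negative input can be rejected explicitly, and BIGBLOCK_SIZE stands in for HAMMER_BIGBLOCK_SIZE (8MB, per the commit messages in this log).

```c
#include <errno.h>
#include <stdint.h>

/* Stand-in for HAMMER_BIGBLOCK_SIZE: one 8MB big-block. */
#define BIGBLOCK_SIZE (8 * 1024 * 1024)

/*
 * Hypothetical up-front validation: free_level must lie in
 * [0, BIGBLOCK_SIZE], both ends inclusive.
 */
static int
check_free_level(int64_t free_level)
{
    if (free_level < 0 || free_level > BIGBLOCK_SIZE)
        return EINVAL;
    return 0;
}
```

Doing this before any cursor is set up means a bad argument fails fast without touching the B-Tree.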
# e04ee2de	31-Jan-2015	Tomohiro Kusumi <kusumi.tomohiro@gmail.com>	hammer: fix terminology of "large block" - This cleanup patch changes terminology "large block" to "big block". - Both "large block" and "big block" are widely used in hammer source from kernel to userspace; however, these two refer to the same data structure, which is an 8MB chunk within the low-level blockmapped storage layer. - The original design document https://www.dragonflybsd.org/hammer/hammer.pdf uses big block for this data structure. Having two expressions in its implementation is confusing and makes grep difficult. Closes: #2782
# f31f6d84	07-Jan-2013	Sascha Wildner <saw@online.de>	kernel/hammer: Remove unused variables and add __debugvar.
# 55b50bd5	16-Jan-2012	Matthew Dillon <dillon@apollo.backplane.com>	kernel - Fix 3:00 a.m. crashes (deadlocks) related to HAMMER VM use. When memory is low and the pageout daemon needs to write things out we still need to have at least some reserve to perform the supporting operations for the pageout. HAMMER is particularly memory intensive and could get into a situation where insufficient reserve memory was available, deadlocking the system. With these changes DragonFly should run stably on systems with as little as 256M of RAM, and possibly a bit lower. * The getblk/bread/bwrite/etc brelse/bqrelse sequence used to manage buffers had several bugs in it that prevented the low memory handling code from operating properly. The b[q]relse() sequence was not properly detecting the low memory condition and freeing or caching the underlying VM pages (when possible). * Also change the low memory test used by the buffer cache from 'severe' to 'min' in kern/vfs_bio.c. We may be able to change this back to 'severe' at a later date with further testing. These tests are in brelse(), bqrelse(), and vfs_vmio_release(). * Rewrite bio_page_alloc(). It effectively does the same thing that it did before but should operate more smoothly. We also no longer try to recover pages from unrelated buffer cache buffers from this function, which could lead to deadlocks. The warning kprintf is now also rate-limited. * Add a buffer overload test in the hammer dedup ioctl. A hammer dedup could cause a buffer cache deadlock by allowing too many dirty buffers to build up. * Add a VM memory test to the core hammer flusher code that was previously only checking for the UNDO meta-data and buffer overload limits. This is now done on a per-record basis and should prevent HAMMER from allocating too much memory during a flusher operation when the VM system is already too low on memory. * Add some vm_wait_nominal() calls in critical I/O paths, but make sure we do not use these calls in any I/O path used by the HAMMER pageout code. Probably the most important path is the vm_object_page_clean() code path, effectively called via either msync() or via the 30-60 second system sync. Properly bawrite() a buffer in hammer_vop_write() when IO_ASYNC is set (which is used by the pageout daemon), otherwise the pageout daemon will not be able to directly recover memory in low memory situations when paging to a HAMMER file mapped SHARED/RW. Testing-by: tuxillo, lentferj, ftigeot, dillon
# e86903d8	11-Apr-2011	Matthew Dillon <dillon@apollo.backplane.com>	HAMMER VFS - Fix degenerate stall condition in flusher during unmount * Fix a case where the flusher can stall during an unmount. * Rework the flusher sequence numbers to always allocate a sequence number when a flush is requested, remove the flusher.act field, and rejigger the code a bit. * This also cleans up an edge case when a full sync is inserted (when taking snapshots, filesystem sync, etc), by inserting several sequence numbers to completely flush the UNDO/REDO FIFO before moving on to the next active flush group. Reported-by: Sepherosa Ziehau <sepherosa@gmail.com>, Francois Tigeot <ftigeot@wolfpond.org>, numerous others.
# 18bee4a2	03-Apr-2011	Matthew Dillon <dillon@apollo.backplane.com>	HAMMER VFS - Implement swapcache for HAMMER data in double_buffer mode * Support swapcache data caching when HAMMER's double_buffer mode is enabled. Typically set the following sysctls: vfs.hammer.double_buffer=1 vm.swapcache.read_enable=1 vm.swapcache.data_enable=1 vm.swapcache.meta_enable=1 (optional) vm.swapcache.use_chflags=0 (optional - see man swapcache) * This causes swapcache to attempt to cache file data from HAMMER filesystems stored via the block device instead of the individual file vnodes. * This allows swapcache to more efficiently cache file data without vnode recycling from a limited kern.maxvnodes value getting in the way. If you have a large dataset spread across many smaller files which would normally overwhelm maxvnodes, and even on large systems handling very large data sets where you wish to cache the file data for some of the files (using use_chflags=1 mode), this makes it possible to cache ALL the file data AND meta-data on the SSD even though the related vnodes cached by the kernel get recycled. * Whereas it may have been inefficient to turn on vm.swapcache.data_enable before, due to filesystem scans and such, it may now be possible to turn this feature on with double buffering also enabled. Note that you must still be cognizant of the aggregate amount of file data being accessed by your system if you have set use_chflags to 0; you simply no longer need to worry about how many files that data belongs to. * Enabling HAMMER's double_buffer mode will reduce performance somewhat for the normal best-case file caching, but it will also greatly improve performance once you start blowing out your memory caches.
# b4f86ea3	12-Jan-2011	Matthew Dillon <dillon@apollo.backplane.com>	HAMMER VFS - Remove B-Tree allocation hints, add double_buffer option. * Remove the allocation hints when allocating b-tree nodes and remove over-full test in the blockmap allocator for b-tree and meta-data elements. The hinting and leaving some space unused in the big-blocks did not improve performance. Write performance is actually slightly better when new allocations are made linearly. * Either way we have to depend on the reblocker to reorganize the B-Tree. * Add a sysctl vfs.hammer.double_buffer, defaulting to off. This is currently used for debugging and testing live-dedup. Normally only small-data blocks are run through the device vnode's buffer cache (allowing us to consolidate many small data blocks within the device vnode's buffer cache), and large data blocks are read directly into the file vnode's buffer. Turning on double_buffer causes ALL file data to run through the device vnode's buffer cache, resulting in double data caching which is not normally useful, so leave this off for now. It will not improve performance.
# b9107f58	17-Aug-2010	Matthew Dillon <dillon@apollo.backplane.com>	HAMMER Utility - Add catastrophic recovery feature * hammer -f <devices> recover <empty_target_dir> * Add a catastrophic recovery feature. A HAMMER filesystem image is scanned (using the -f <blockdevs> specification). Any buffer which looks like a B-Tree node is then sub-scanned for inode, directory, and data records and the filesystem is reconstructed in the specified target directory. * The files and directories are initially named after the object id and are renamed and moved as directory entries are found to resolve the fragmentary information. * File writes strip trailing 0's (data records are not limited to the file EOF), but will properly truncate the file if/when the related inode record is found. * Currently no attempt is made to restore owner, group, file modes, softlinks, or hardlinks (only one link will be restored). TODO: Currently a valid volume header is required, but the only thing we actually need from it is the vol_buf_beg field. This field could be guessed or passed in on the command line in a future update to the recovery code.
# 07ed04b5	19-Apr-2010	Matthew Dillon <dillon@apollo.backplane.com>	HAMMER VFS - Fix probable corruption case when filesystem becomes nearly full * The reblocking code was incorrectly assuming the cursor would be pointing at a valid node element after an unlock/relock sequence, when it could actually be pointing at the EOF of a node. This case can occur when the filesystem is nearly full (possibly due to the reblocking operation itself), when the filesystem is also under load from unrelated operations. * This can result in the creation of a corrupted B-Tree leaf node or data record. * Corruption can be checked with hammer checkmap and hammer show (as of this rev): hammer -f device checkmap (should output no B-Tree node records or free space mismatches; you will still get the initial volume summary) and hammer -f device show | egrep '^B' | egrep -v '^BM' (should output no records). * Currently the only recourse if corruption is found is to copy off the filesystem, newfs_hammer, and copy it back. Full history and snapshots can be retained by using 'hammer -B mirror-read' to copy off the filesystem and mirror-write to copy it back. However, please remember you must do this for each PFS individually. Make sure you have a viable backup before newfsing anything. Reported-by: Francois Tigeot <ftigeot@wolfpond.org>, Jan Lentfer <Jan.Lentfer@web.de>
# ebbcfba9	01-Apr-2010	Matthew Dillon <dillon@apollo.backplane.com>	HAMMER VFS - Fix insufficient cursor change test * The reblocking code tests whether a cursor has changed after being unlocked. This test was insufficient and resulted in an assertion panic. Beef up the test. Reported-by: Jan Lentfer <Jan.Lentfer@web.de>
# 24cf83d2	20-Mar-2010	Matthew Dillon <dillon@apollo.backplane.com>	HAMMER VFS - frontload kmalloc()'s when rebalancing * The rebalancing code must allocate upwards of 16MB of memory to hold copies of B-Tree nodes (~64*64*4K). This is enough to blow out the emergency memory reserve used by the pageout daemon and deadlock the system in low memory situations. * Refactor the allocations. Allocate all the memory up-front so no major allocations occur while nodes in the B-Tree are held locked. * There are probably other cases where this may become a problem. With UFS it wasn't an issue because flushing a file was fairly unsophisticated. But with HAMMER certain aspects of the flush require B-Tree lookups and can't be dumbed down to a simple raw disk write. The rebalancing code was the most egregious abuser of kernel memory though and that should now be fixed. Reported-by: Francois Tigeot <ftigeot@wolfpond.org>
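The frontloading pattern from 24cf83d2 (allocate everything before any B-Tree node is locked) can be sketched in user-space C as follows. struct node_copy, its 4K size (an assumption consistent with the commit's 16MB estimate), and both helper functions are hypothetical; the kernel uses kmalloc() rather than malloc().

```c
#include <stdlib.h>

/* Hypothetical 4K buffer holding a copy of one B-Tree node. */
struct node_copy { char data[4096]; };

/*
 * Allocate all node copies up front, before any lock is taken.
 * On failure everything already allocated is released and NULL is
 * returned, so the caller never fails while holding node locks.
 */
static struct node_copy **
prealloc_node_copies(int count)
{
    struct node_copy **copies = calloc((size_t)count, sizeof(*copies));
    int i;

    if (copies == NULL)
        return NULL;
    for (i = 0; i < count; ++i) {
        copies[i] = malloc(sizeof(**copies));
        if (copies[i] == NULL) {
            while (--i >= 0)        /* undo the partial allocation */
                free(copies[i]);
            free(copies);
            return NULL;
        }
    }
    return copies;
}

/* Release the buffers once the nodes have been unlocked. */
static void
free_node_copies(struct node_copy **copies, int count)
{
    int i;

    for (i = 0; i < count; ++i)
        free(copies[i]);
    free(copies);
}
```

A caller would preallocate, then lock the nodes, copy them into the buffers, unlock, and finally free; any allocation failure surfaces before a single lock is held.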
# b8a41159	12-Feb-2010	Matthew Dillon <dillon@apollo.backplane.com>	kernel - SWAP CACHE part 19/many - distinguish bulk data in HAMMER block dev * Add buf->flags/B_NOTMETA, vm_page->flags/PG_NOTMETA. If set the pages underlying the buffer will not be considered meta-data from the point of view of the swapcache. * HAMMER must sometimes access bulk data via the block device instead of via a file vnode. For example, the reblocking and mirroring code. We do not want this data to be misinterpreted as meta-data when the meta-data-only swapcache is turned on, otherwise it will blow out the actual meta-data in the swapcache. HAMMER_RECTYPE_DATA and HAMMER_RECTYPE_DB are considered normal data. All other record types (e.g. direntry, inode, etc) are meta-data.
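The classification rule in b8a41159 reduces to a simple predicate. The enum below is a hypothetical stand-in for the HAMMER_RECTYPE_* constants (the real values differ); a true result corresponds to a buffer that should carry the not-meta-data flag so swapcache does not treat it as meta-data.

```c
#include <stdbool.h>

/* Hypothetical record types; the real HAMMER_RECTYPE_* values differ. */
enum rectype {
    RECTYPE_INODE,
    RECTYPE_DIRENTRY,
    RECTYPE_DATA,
    RECTYPE_DB,
    RECTYPE_EXT,
};

/*
 * Per the commit text: DATA and DB records are bulk data; every other
 * record type is meta-data.
 */
static bool
rectype_is_bulk_data(enum rectype t)
{
    return t == RECTYPE_DATA || t == RECTYPE_DB;
}
```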
# 83f2a3aa	14-Oct-2009	Matthew Dillon <dillon@apollo.backplane.com>	HAMMER - Add version 3 meta-data features * These features are available for filesystem version 3. Version 2 may be upgraded to version 3 in-place. These features are not usable until you upgrade. * Definitively store snapshots in filesystem meta-data. Softlinks still work. The new snapshot directives (snap, snaplo, snapq, etc) also allow you to specify up to a 64-character note for each snapshot you create. The snapls directive may be used to list all snapshots stored in meta-data. 'hammer cleanup' will move all softlink-based snapshots residing in the <fs>/snapshots directory to meta-data when it next snapshots the filesystem (within a day of upgrading, usually). The snapshot softlinks are left intact. Storing snapshot information in meta-data means that accidental wipes of your <fs>/snapshots directory will NOT cause later hammer cleanup runs to destroy your snapshots! The meta-data snapshots are also removed if you do a prune-everything, or through normal pruning expirations, and thus 'hammer snapls' will definitively list your valid snapshots. This feature also means that you can obtain a definitive list of snapshots available on mirroring slaves. * Definitively store the hammer cleanup configuration file in filesystem meta-data. This meta-data is not mirrored. 'hammer cleanup' will move <fs>/snapshots/config to the new meta-data config and delete <fs>/snapshots/config after you've upgraded the filesystem. You can edit the configuration with the 'viconfig' directive. * The HAMMER utility has new directives: snap, snaplo, snapq, snaprm, snapls, config, and viconfig. * WARNING! Filesystems mounted 'nohistory' and files chflagged similarly do not have snapshots, but the hammer utility still allows the directives to be run. This is a bug that needs to be fixed.
# c9ce54d6	03-Sep-2009	Matthew Dillon <dillon@apollo.backplane.com>	HAMMER - Fix lost inode issue (primarily with nohistory mounts) * When a HAMMER cursor is unlocked it becomes tracked and unrelated B-Tree operations will cause the tracked cursor's nodes and indices to be updated. The cursor structure also has a leaf element pointer which was not being properly updated. This could lead to panics and lost inodes. Properly adjust the leaf element pointer in tracked cursors. * The bug primarily occurs with nohistory mounts or nohistory sub-trees due to the larger number of physical deletions made to the B-Tree, but could also occur (rarely) with normal mounts. * Add additional assertions to catch any further occurrences (though I think all the cases have been covered now). * Add a new sysctl vfs.hammer.error_panic which can be set to e.g. 9 to cause critical errors to panic immediately instead of returning through the call stack, making debugging possible. Reported-by: Numerous people
# 973c11b9	24-Jun-2009	Matthew Dillon <dillon@apollo.backplane.com>	AMD64 - Fix many compile-time warnings. int/ptr type mismatches, %llx, etc.
# df2ccbac	20-Jun-2009	Matthew Dillon <dillon@apollo.backplane.com>	HAMMER VFS - Add hinting capability to block allocator, hint B-Tree * A hammer_off_t can now be supplied to the blockmap allocator as a hint. * Use the hinting mechanism to better-localize B-Tree node allocations and meta-data updates.
# 1775b6a0	15-Mar-2009	Matthew Dillon <dillon@apollo.backplane.com>	HAMMER VFS - Add a B-Tree rebalancing feature. This is the initial commit of B-Tree rebalancing support for HAMMER. The rebalancer may be run using the 'hammer rebalance' utility directive. The leaves in a HAMMER B-Tree all reside at the same depth. Insertions and deletions only collapse the B-Tree when a leaf node becomes empty and then only if any necessary recursion (possibly reaching the root node) succeeds. No balancing occurs during normal operation and B-Tree nodes can wind up with wildly different element counts which bloats the tree and makes searches less efficient. The rebalancer effectively does a depth-first traversal of the B-Tree, visiting leaf nodes first and parent nodes as a trailing function on the way back up the tree. For any given internal node the sum total of elements contained in its children is divided by the number of children. The effective number of children is reduced as is practical to obtain a 75% fill level. The elements are then packed into the children and any wholly empty children left over are deleted. The rebalancer does not create new B-Tree nodes. Element packing is fairly complex, requiring tracked cursors, on-media parent pointers, mirror TIDs, and boundary elements to be updated. The rebalancer must hold a large number of B-Tree nodes exclusively locked while running.
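The fill-level step described in 1775b6a0 can be modeled on element counts alone. This is a simplified sketch, not the actual rebalancer: rebalance_counts() is hypothetical, the per-node capacity is a parameter rather than HAMMER's real constant, and the tracked-cursor, parent-pointer, mirror TID, and boundary-element updates are all omitted. It aims for roughly 75% fill, never uses more children than already exist, spreads the elements evenly, and zeroes the leftovers (the nodes the real code would delete).

```c
/*
 * counts[] holds the element count of each of n (n >= 1) children of
 * one internal node; cap is the per-node capacity. Returns the number
 * of children kept; counts[k..n-1] are set to 0 (deleted nodes).
 */
static int
rebalance_counts(int *counts, int n, int cap)
{
    int total = 0;
    int target = cap * 3 / 4;       /* desired elements per child */
    int k, base, extra, i;

    if (target < 1)
        target = 1;
    for (i = 0; i < n; ++i)
        total += counts[i];

    k = (total + target - 1) / target;  /* ceil(total / target) */
    if (k < 1)
        k = 1;
    if (k > n)
        k = n;                      /* never create new nodes */

    /* Spread elements evenly; the first 'extra' children get one more. */
    base = total / k;
    extra = total % k;
    for (i = 0; i < n; ++i)
        counts[i] = (i < k) ? base + (i < extra ? 1 : 0) : 0;
    return k;
}
```

For four children holding 60, 60, 10, and 10 elements with a capacity of 64, the 140 elements fit into three children of 47, 47, and 46, and the fourth child would be deleted.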
# 982be4bf	24-Jan-2009	Matthew Dillon <dillon@apollo.backplane.com>	HAMMER VFS - Remove the unused also_ip argument from the cursor API