# ff66f880 | 07-Dec-2015 | Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
sys/vfs/hammer: Remove union hammer_io_structure
Remove union hammer_io_structure, which was added in an early stage of hammer development. It has been used in certain parts of hammer_io.c, but the code is clearer without it.
Using the existing HAMMER_ITOB() as well as a newly added HAMMER_ITOV() makes the code less complicated than using this union.

# 195f6076 | 03-Nov-2015 | Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
sys/vfs/hammer: Add HAMMER_ITOB() macro
Hide the explicit cast from C code, so that the impact of a possible data structure change stays small. No change in binary.

# 33234d14 | 12-Sep-2015 | Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
sys/vfs/hammer: Adjust raw kprintfs using hkprintf variants
This is part 3 of the hkprintf-related work, following these two commits: "sys/vfs/hammer: Change hkprintf() to macro and add variants [2/2]" and "sys/vfs/hammer: Change hkprintf() to macro and add variants [1/2]".
The above two commits replaced the existing kprintf calls that used a "HAMMER:", "HAMMER(label)", or function-name prefix with hkprintf and the newly added variants, which basically didn't change actual output other than fixing wrong function names, etc.
This commit continues replacing the remaining kprintfs to make output more understandable than raw kprintf calls that give no clue that they are hammer related.
For example, an error message like "BIGBLOCK UNDERFLOW\n" or a debug message like "rt %3u, xt %3u, tt %3u\n" becomes more understandable with a "HAMMER:" prefix or the name of the function.
This commit is based on the following rules.
1. Use hdkprintf, the hkprintf variant with a __func__ prefix, if the kprintf call fires only when vfs.hammer.debug_xxx is enabled. This implies the message is only for debugging, and such messages are usually better and more understandable with a function name prefix, as mentioned above. This is also what is mostly done in the existing hammer code.
2. Use hkprintf, which has a "HAMMER:" prefix, if the kprintf call is a regular hammer message that appears in regular filesystem operations, such as "Formatting of valid HAMMER volume %s denied. Erase with dd!\n".
3. Use h[vm]kprintf, the hkprintf variants with the hammer label prefix "HAMMER(label)", if the kprintf can safely access the label via a vol or hmp pointer. Some kprintfs in hammer do this rather than just "HAMMER:", and it seems better; however, this commit doesn't go so far as to aggressively replace the existing ones with it, because whether a caller can safely dereference hmp or vol is a different topic from mere replacement.

# 903fdd05 | 11-Sep-2015 | Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
sys/vfs/hammer: Add hpanic() [1/2]
This commit does the following.
1. Add a macro hpanic(), a wrapper for panic() that embeds the "function_name: ..." prefix.
2. Replace raw panic() calls that have the above prefix with hpanic().
3. Fix some wrong function name literals using __func__.

# 11605a5c | 15-Sep-2015 | Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
sys/vfs/hammer: Change hkprintf() to macro and add variants [2/2]
This commit does the following.
1. Add macros hdkprintf() and hdkrateprintf() that embed the "function_name: ..." prefix.
2. Replace raw kprintf() calls that have the above prefix with the newly added macros.
3. Fix some wrong function name literals using __func__.

# 8fc055b2 | 06-Sep-2015 | Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
sys/vfs/hammer: Remove unnecessary header includes

# 97fb61c0 | 06-Sep-2015 | Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
sys/vfs/hammer: Remove header includes from hammer.h
Remove the following includes from sys/vfs/hammer/hammer.h: #include <sys/buf2.h>, #include <sys/mountctl.h>, #include <sys/globaldata.h>, #include <vm/vm_page2.h>.

# 7bb4ec32 | 06-Sep-2015 | Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
sys/vfs/hammer: Add header includes to hammer.h
Add the following includes to sys/vfs/hammer/hammer.h: #include <sys/fcntl.h>, #include <sys/dirent.h>, #include <sys/sysctl.h>, #include <sys/event.h>, #include <sys/file.h>, #include <vm/swap_pager.h>.

# 653fa4cd | 14-Aug-2015 | Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
hammer: Cleanups
Unfold lines (that aren't intentionally separated into different lines) that fit in 80 chars.
Fold lines that are way too long.

# b45803e3 | 12-Aug-2015 | Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
hammer: Conform to style(9)
Also:
* Remove header includes that are already included by common hammer headers (of either userspace or kernel).
* Add "#include <sys/vnode.h>" to sys/vfs/fifofs/fifo.h.

# 745703c7 | 07-Jul-2015 | Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
hammer: Remove trailing whitespaces
- (Non-functional commits could make it difficult to git-blame the history if there are too many of those)

# a981af19 | 02-Jul-2015 | Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
sys/vfs/hammer: Change "bigblock" to "big-block"
- There are (or were) several terms for the 8MB chunk, for example "big-block", "bigblock", "big block", "large-block", etc., but "big-block" seems to be the canonical term.
- Changes are mostly comments and some in printf and hammer(8). Variable names (e.g. xxx_bigblock_xxx) remain unchanged.
- The official design document as well as much of the existing code (excluding variable and macro names) use "big-block". https://www.dragonflybsd.org/hammer/hammer.pdf
- Also see e04ee2de and the previous commit.

# 32fcc103 | 01-Nov-2013 | Matthew Dillon <dillon@apollo.backplane.com>
hammer1 - cleanup, minor bug fixes
* Cleanup pass, remove some dead code
* Minor bug fixes, add tokens around some paths that need them.
* Remove use of the master token in several paths that don't need it, improving concurrency.

# f31f6d84 | 07-Jan-2013 | Sascha Wildner <saw@online.de>
kernel/hammer: Remove unused variables and add __debugvar.

# 41a8e517 | 01-Apr-2012 | Matthew Dillon <dillon@apollo.backplane.com>
HAMMER VFS - Fix assertion with multi-volume setup
* The RB compare code for hammer_io was extracting the volume number from the wrong place, creating a situation where duplicate hammer_io's would sometimes be inserted in the RB tree (causing an assertion + panic).
* Pull the volume number from a different field.
Reported-by: Mark Saad <nonesuch@longcount.org>

# 9de13b88 | 22-Mar-2012 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Cluster fixes + Enable clustering for HAMMER1
* Add cluster_awrite(), which replaces vfs_bio_awrite() and has the same basic semantics as bawrite().
* Remove vfs_bio_awrite(), which had an odd API that required the buffer to be locked but not removed from its queues.
* Make cluster operations work on disk device buffers as well as on regular files.
* Add a blkflags argument to getcacheblk(), allowing GETBLK_NOWAIT to be passed to it.
* Enhance cluster_wbuild() to support cluster_awrite() by having it take an optional bp to incorporate into the cluster. The caller disposes of the bp by calling bawrite() if the cluster_wbuild() code could not use it.
* Adjust cluster_write() and related code that checks against the file EOF so that it does not break when variable block sizes are used.
* Fix a bug in calls made to buf_checkwrite(). The caller is required to initiate the I/O if the function returns good (0). HAMMER1 uses this to save side effects and blows up if the I/O is then not initiated.
* Enable clustering in HAMMER1 for both data and meta-data.

# 54341a3b | 15-Nov-2011 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Greatly improve shared memory fault rate concurrency / shared tokens
This commit rolls up a lot of work to improve postgres database operations and the system in general. With these changes, pgbench -j 8 -c 40 on our 48-core opteron monster reaches 140000+ tps, and the shm vm_fault rate hits 3.1M pps.
* Implement shared tokens. They work as advertised, with some caveats.
It is acceptable to acquire a shared token while you already hold the same token exclusively, but you will deadlock if you acquire an exclusive token while you hold the same token shared.
Currently exclusive tokens are not given priority over shared tokens so starvation is possible under certain circumstances.
* Create a critical code path in vm_fault() using the new shared token feature to quickly fault-in pages which already exist in the VM cache. pmap_object_init_pt() also uses the new feature.
This increases fault-in concurrency by a ridiculously huge amount, particularly on SHM segments (say when you have a large number of postgres clients). Scaling for large numbers of clients on large numbers of cores is significantly improved.
This also increases fault-in concurrency for MAP_SHARED file maps.
* Expand the breadn() and cluster_read() APIs. Implement breadnx() and cluster_readx() which allows a getblk()'d bp to be passed. If *bpp is not NULL a bp is being passed in, otherwise the routines call getblk().
* Modify the HAMMER read path to use the new API. Instead of calling getcacheblk() HAMMER now calls getblk() and checks the B_CACHE flag. This gives getblk() a chance to regenerate a fully cached buffer from VM backing store without having to acquire any hammer-related locks, resulting in even faster operation.
* If kern.ipc.shm_use_phys is set to 2 the VM pages will be pre-allocated. This can take quite a while for a large map and also lock the machine up for a few seconds. Defaults to off.
* Reorder the smp_invltlb()/cpu_invltlb() combos in a few places, running cpu_invltlb() last.
* An invalidation interlock might be needed in pmap_enter() under certain circumstances, enable the code for now.
* vm_object_backing_scan_callback() was failing to properly check the validity of a vm_object after acquiring its token. Add the required check + some debugging.
* Make vm_object_set_writeable_dirty() a bit more cache friendly.
* The vmstats sysctl was scanning every process's vm_map (requiring a vm_map read lock to do so), which can stall for long periods of time when the system is paging heavily. Change the mechanic to a LWP flag which can be tested with minimal locking.
* Have the phys_pager mark the page as dirty too, to make sure nothing tries to free it.
* Remove the spinlock in pmap_prefault_ok(), since we do not delete page table pages it shouldn't be needed.
* Add a required cpu_ccfence() in pmap_inval.c. The code generated prior to this fix was still correct, and this makes sure it stays that way.
* Replace several manual wiring cases with calls to vm_page_wire().

# 3583bbb4 | 12-Nov-2011 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Upgrade buffer space tracking variables from int to long
* Several bufspace-related buffer cache parameters can now overflow a 32 bit integer on machines with large amounts (~64G+) of memory. Change these to long.
bufspace, maxbufspace, maxbufmallocspace, lobufspace, hibufspace, lorunningspace, hirunningspace, dirtybufspace, dirtybufspacehw, runningbufspace, lodirtybufspace, hidirtybufspace.
* This also requires an API change to libkcore/libkinfo which affects top.

# 3038a8ca | 11-Nov-2011 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Performance improvements during heavy memory/IO use
* Remove the vm.vm_load logic, it was breaking things worse and fixing things not so much.
* Fix a bug in the pageout algorithm that was causing the PQ_ACTIVE queue to drain excessively, messing up the LRU/activity algorithm.
* Rip out hammer_limit_running_io and instead just call waitrunningbufspace().
* Change the waitrunningbufspace() logic to add a bit of hysteresis and to fairly block everyone doing write I/O; otherwise some threads may be blocked while other threads are allowed to proceed while the buf_daemon is trying to flush stuff out.

# 9a98f3cc | 10-Apr-2011 | Matthew Dillon <dillon@apollo.backplane.com>
HAMMER VFS - Implement async I/O for double-buffer strategy case
* When vfs.hammer.double_buffer is enabled the HAMMER strategy code was running synchronously. This creates numerous problems including extra stalls when read-ahead is issued.
* Use the new breadcb() function to allow nominal double_buffer strategy operations to run asynchronously. Essentially the original buffer and CRC is recorded in the device bio and the copyback is made in the callback.
* This improves performance when vfs.hammer.double_buffer is enabled.

# 1afb73cf | 11-Jan-2011 | Matthew Dillon <dillon@apollo.backplane.com>
HAMMER VFS - Improve saturated write performance (2).
* Change the dirty io buffer lists from TAILQs to Red-Black trees.
* The dirty io buffers are sorted by disk address on a flush-group by flush-group basis and I/O writes are initiated in sorted order.
This significantly improves write I/O throughput to normal HDs. Essentially, the sheer number of unsorted buffers was overwhelming the HD's own caches. Having HAMMER pre-sort the buffers, of which there can be upwards of 100MBs worth, allows the HD to write more optimally.

# 3b98d912 | 11-Nov-2010 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Close possible hole in hammer inval code
* Do not use FINDBLK_TEST, instead have findblk() return a locked buffer cache buffer and deal with it from there. While the original code should have been ok (it would getblk() the buffer cache in either case), it depended on certain MP race characteristics that might not hold so don't take any chances.
* This does not fix any known issues but removes some uncertainty.

# 77912481 | 27-Aug-2010 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Major MPSAFE Infrastructure
* vm_page_lookup() now requires the vm_token to be held on call instead of the MP lock. And fix the few places where the routine was being called without the vm_token.
Various situations where a vm_page_lookup() is performed followed by vm_page_wire(), without busying the page, and other similar situations, require the vm_token to be held across the whole block of code.
* bio_done callbacks are now MPSAFE but some drivers (ata, ccd, vinum, aio, nfs) are not MPSAFE yet so get the mplock for those. They will be converted to a generic driver-wide token later.
* Remove critical sections that used to protect VM system related interrupts, replace with the vm_token.
* Spinlocks now bump thread->td_critcount in addition to mycpu->gd_spinlock*. Note the ordering is important. Then remove gd_spinlock* checks elsewhere that are covered by td_critcount and replace with assertions.
Also use td_critcount in the kern_mutex.c code instead of gd_spinlock*.
This fixes situations where the last crit_exit() would call splx() without checking for spinlocks. Adding the additional checks would have made the crit_*() inlines too complex so instead we just fold it into td_critcount.
* lwkt_yield() no longer guarantees that lwkt_switch() will be called so call lwkt_switch() instead in places where a switch is required. For example, to unwind a preemption. Otherwise the kernel could end up live-locking trying to yield because the new switch code does not necessarily schedule a different kernel thread.
* Add the sysctl user_pri_sched (default 0). Setting this will make the LWKT scheduler more aggressively schedule user threads when runnable kernel threads are unable to gain token/mplock resources. For debugging only.
* Change the bufspin spinlock to bufqspin and bufcspin, and generally rework vfs_bio.c to lock numerous fields with bufcspin. Also use bufcspin to interlock waitrunningbufspace() and friends.
Remove several mplocks in vfs_bio.c that are no longer needed.
Protect the page manipulation code in vfs_bio.c with vm_token instead of the mplock.
* Fix a deadlock with the FINDBLK_TEST/BUF_LOCK sequence which can occur due to the fact that the buffer may change its (vp,loffset) during the BUF_LOCK call. Even though the code checks for this after the lock succeeds, there is still the problem of the locking operation itself potentially creating a deadlock between two threads by locking an unexpected buffer when the caller is already holding other buffers locked.
We do this by adding an interlock refcounter, b_refs. getnewbuf() will avoid reusing such buffers.
* The syncer_token was not protecting all accesses to the syncer list. Fix that.
* Make HAMMER MPSAFE. All major entry points now use a per-mount token, hmp->fs_token. Backend callbacks (bioops, bio_done) use hmp->io_token. The cached case for the read and getattr paths requires no tokens at all (as before).
The bitfield flags had to be separated into two groups to deal with SMP cache coherency races.
Certain flags in the hammer_record structure had to be separated for the same reason.
Certain interactions between the frontend and the backend must use the hmp->io_token.
It is important to note that for any given buffer there are two locking entities: (1) The hammer structure and (2) The buffer cache buffer. These interactions are very fragile.
Do not allow the kernel to flush a dirty buffer if we are unable to obtain a norefs-interlock on the buffer, which fixes numerous frontend/backend MP races on the io structure.
Add a write interlock in one of the recover_flush_buffer cases.

# 9c90dba2 | 25-Aug-2010 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - document and clarify FINDBLK_TEST
* Clarify operation of FINDBLK_TEST

# b0aab9b9 | 24-Aug-2010 | Matthew Dillon <dillon@apollo.backplane.com>
HAMMER VFS - Make all entry points MPSAFE, remove giant & critical sections
* All VFS, VOP, ioops, and bio_done entry points are now mpsafe and no longer use giant.
* Implement hmp->fs_token and hmp->io_token for each HAMMER mount.
All operations that previously needed the MP lock now use hmp->fs_token. All operations that interact with BIO callbacks now use hmp->io_token. All critical sections now use io_token (these previously interlocked against IO callbacks).
NOTE: read (for cached data) and getattr were MPSAFE before and continue to be MPSAFE.