History log of /dflybsd-src/sys/vfs/hammer/hammer_vfsops.c (Results 126 – 150 of 179)
# c82af904 26-Jun-2008 Matthew Dillon <dillon@dragonflybsd.org>

HAMMER 59A/Many: Mirroring related work (and one bug fix).

* BUG FIX: Fix a bug in directory hashkey generation. The iterator could
sometimes conflict with a key already on-disk and interfere with a pending
deletion. The chance of this occurring was minuscule but not 0. Now fixed.

The fix also revamps the directory iterator code, moving it all to one
place and removing it from two other places.

* PRUNING CHANGE: The pruning code no longer shifts the create_tid and
delete_tid of adjacent records to fill gaps. This means that historical
queries must either use snapshot softlinks or use a fine-grained
transaction id greater than the most recent snapshot softlink.

Fine-grained historical access still works up to the first snapshot
softlink.

* Clean up the cursor code responsible for acquiring the parent node.

* Add the core mirror ioctl read/write infrastructure. This work is still
in progress.

- ioctl commands
- pseudofs enhancements, including st_dev munging.
- mount options
- transaction id and object id conflictless allocation
- initial mirror_tid recursion up the B-Tree (not finished)
- B-Tree mirror scan optimizations to skip sub-hierarchies that do not
need to be scanned (requires mirror_tid recursion to be 100% working).


# dd94f1b1 24-Jun-2008 Matthew Dillon <dillon@dragonflybsd.org>

HAMMER 58B/Many: Revamp ioctls, add non-monotonic timestamps, mirroring

* Revamp most of HAMMER's ioctl structures with an eye towards future
enhancements.

* Adjust on-media structures to include non-monotonic creation and
deletion timestamps. Since the transaction id no longer translates
to a timestamp, adding explicit timestamps allows the 'hammer history'
and 'undo' utilities to still display timestamps for the change history.

* Start working on the mirroring support ioctls.


# 5de0c0e5 23-Jun-2008 Matthew Dillon <dillon@dragonflybsd.org>

HAMMER 58A/Many: Mirroring support part 1

* Implement mastership domains for transaction ids to support multi-master
mirroring and implement mastership selection at mount time.

Mastership domains work by having the low 4 bits of the transaction id
specify the mastership id (0-15). This allows the mirroring code to
distinguish between changes originating on a particular node and changes
mirrored from another node.

This also ensures that filesystem objects can be created on the mirrors
in parallel without resulting in conflicting object ids. (A sketch of the
TID layout follows at the end of this entry.)

* Eliminate time-based TID generation. Just increment the TID as
appropriate.

NOTE: Portions of this change may be reverted at a later time depending
on how the mirroring implementation proceeds.

* Minor code cleanups.
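
The mastership-domain scheme above boils down to reserving the low 4 bits of
every 64-bit transaction id for a master id. A minimal user-space sketch,
assuming illustrative names (tid_alloc, next_tid) rather than HAMMER's actual
functions:

    /*
     * Illustrative sketch of the mastership-domain TID scheme: the low
     * 4 bits of each transaction id carry the master id (0-15), so two
     * masters can allocate TIDs concurrently without collision.
     */
    #include <stdint.h>
    #include <stdio.h>

    #define TID_MASTER_BITS 4
    #define TID_MASTER_MASK ((1ULL << TID_MASTER_BITS) - 1)   /* 0x0F */

    static uint64_t next_tid = 0x1000;   /* monotonically increasing base */

    /* Allocate the next TID for the given master (0-15). */
    static uint64_t
    tid_alloc(unsigned master_id)
    {
        /* advance by one full master-id stride so the low bits stay free */
        next_tid += (1ULL << TID_MASTER_BITS);
        return (next_tid & ~TID_MASTER_MASK) | (master_id & TID_MASTER_MASK);
    }

    int
    main(void)
    {
        printf("master 0: %016jx\n", (uintmax_t)tid_alloc(0));
        printf("master 3: %016jx\n", (uintmax_t)tid_alloc(3));
        /* A mirror can recover the originating master from any TID: */
        printf("origin of 0x2003: %ju\n", (uintmax_t)(0x2003ULL & TID_MASTER_MASK));
        return 0;
    }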


# ddfdf542 20-Jun-2008 Matthew Dillon <dillon@dragonflybsd.org>

HAMMER 56D/Many: Media structure finalization, atime/mtime, etc.

* Move the atime and mtime fields to the end of struct hammer_inode_data.
CRCs on hammer_inode_data no longer include atime or mtime, allowing
them to be updated without having to update the related B-Tree node.
(The layout is sketched at the end of this entry.)

* Change the time format for ctime, atime, and mtime. These 64 bit fields
now store microseconds in real time instead of transaction ids.

* atime is now updated asynchronously, and mtime is now updated with
UNDO records only. Split the ITIMES flag into ATIME and MTIME and
no longer set the DDIRTY (inode_data generally dirty) flag when the
mtime changes.

* Finish on-media structural components for pseudo-fs support inside a
HAMMER filesystem.

* Finish on-media structural components for adding a serial number to
the B-Tree element structures, for mirroring support.

* Make fsync() wait for the flush to complete, issue extra flushes as
needed to take the UNDO FIFO's start position past the fsync'd data
so a crash does not undo it.
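
To see why moving atime/mtime to the end of the inode data matters, here is a
minimal sketch in which the checksum covers only the fields that precede them;
the struct layout and the sum32() routine are illustrative stand-ins, not
HAMMER's real hammer_inode_data or CRC code:

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>
    #include <stdio.h>

    struct demo_inode_data {
        uint64_t obj_id;
        uint64_t size;
        uint64_t ctime;      /* covered by the CRC */
        uint64_t mtime;      /* NOT covered: updated with UNDO records only */
        uint64_t atime;      /* NOT covered: updated asynchronously */
    };

    /* placeholder checksum standing in for HAMMER's CRC */
    static uint32_t
    sum32(const void *base, size_t len)
    {
        const uint8_t *p = base;
        uint32_t sum = 0;
        while (len--)
            sum = sum * 31 + *p++;
        return sum;
    }

    int
    main(void)
    {
        struct demo_inode_data d;
        memset(&d, 0, sizeof(d));
        d.obj_id = 0x100;

        /* CRC covers only the fields preceding mtime/atime */
        uint32_t crc = sum32(&d, offsetof(struct demo_inode_data, mtime));

        d.atime = 12345;     /* asynchronous atime update ... */
        d.mtime = 67890;     /* ... and mtime update ... */

        /* ... neither forces the CRC (or the B-Tree node) to be rewritten */
        printf("crc unchanged: %s\n",
            crc == sum32(&d, offsetof(struct demo_inode_data, mtime)) ? "yes" : "no");
        return 0;
    }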


# 4a2796f3 20-Jun-2008 Matthew Dillon <dillon@dragonflybsd.org>

HAMMER 56C/Many: Performance tuning - MEDIA STRUCTURES CHANGED!

* Implement variable block sizes. Records at offsets < 1MB use 16K buffers
while records at offsets >= 1MB use 64K buffers. This improves performance
primarily by reducing the number of B-Tree elements we have to stuff.
(The size rule is sketched at the end of this entry.)

* Mess around with the deadlock handling code a bit. It still needs
a re-think but it works. Implement low-priority shared locks.
A low priority shared lock can only be acquired if no other locks are
held by the thread.

* Implement slow-down code for the record backlog to the flusher and
reimplement the slow-down code that deals with reclaimed inodes queued
to the flusher. This should hopefully fix the kernel memory exhaustion
issues for M_HAMMER.

* Update layer2->append_off. It isn't implemented yet but doing this now
will prevent media incompatibilities later on when it does get implemented.

* Split hammer_blockmap_free() into hammer_blockmap_free() and
hammer_blockmap_finalize().

* Fix a bug in the delayed-CRC handling related to reblocking. When
throwing away a modified buffer, pending CRC requests must also be
thrown away.

* Fix a bug in the record overlap compare code. If we cannot return 0
due to an overlap because the record has been deleted, we must still
return an appropriate formal code so the scan progresses in the
correct direction down the red-black tree.

* Make data in the meta-data zone a meta-data buffer structure type so
it gets synced to disk at the appropriate time. This may be temporary,
it's needed to deal with the atime/mtime code but another commit may
soon make it moot.

* Bump the seqcount so cluster_read() does the right thing when reading
into a large UIO just after opening a file.

* Do a better job calculating vap->va_bytes. It's still fake, but it's a
more accurate fake.

* Fix an issue in the BMAP code related to ranges that do not cover the
requested logical offset.

* Fix a bug in the blockmap code. If a reservation is released without
finalizing any allocations within that big-block, another zone can steal
it out from under the current zone's next_offset, resulting in a zone
mismatch panic.
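
The block-size rule in the first bullet reduces to a simple offset test. A
minimal sketch with illustrative names and constants (HAMMER's own helper may
differ in detail):

    #include <stdint.h>
    #include <stdio.h>

    #define SMALL_BUF_SIZE   (16 * 1024)
    #define LARGE_BUF_SIZE   (64 * 1024)
    #define LARGE_BUF_LIMIT  (1024 * 1024)   /* 1MB threshold */

    /* pick the buffer size used for data at a given file offset */
    static int
    demo_blocksize(int64_t file_offset)
    {
        if (file_offset < LARGE_BUF_LIMIT)
            return SMALL_BUF_SIZE;
        return LARGE_BUF_SIZE;
    }

    int
    main(void)
    {
        printf("offset 0    -> %d\n", demo_blocksize(0));
        printf("offset 512K -> %d\n", demo_blocksize(512 * 1024));
        printf("offset 4M   -> %d\n", demo_blocksize(4LL * 1024 * 1024));
        return 0;
    }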


# bcac4bbb 18-Jun-2008 Matthew Dillon <dillon@dragonflybsd.org>

HAMMER 56B/Many: Performance tuning - MEDIA STRUCTURES CHANGED!

* MEDIA CHANGE: The atime has been moved back into the inode data proper.
The nlinks field has also been moved.

* PERFORMANCE: The CRC for cached B-Tree nodes was being run on every access
instead of just the first time. This was the cause of HAMMER's poor
directory scanning performance and cpu-intensive write flushes.

Adjusted to only check the CRC on the initial load into the buffer cache.

* PERFORMANCE: The CRC for modified B-Tree nodes was being regenerated every
time the node was modified, so a large number of insertions or deletions
modifying the same B-Tree node needlessly regenerated the CRC each time.

Adjusted to delay generation of the CRC until just before the buffer is
flushed to the physical media.

Just for the record, B-Tree nodes are 4K and it takes ~25uS to run a CRC
on them. Needless to say removing the unnecessary calls solved a lot of
performance issues.

* PERFORMANCE: Removed limitations in the node caching algorithms. Now more
than one inode can cache pointers to the same B-Tree node.

* PERFORMANCE: When calculating the parent B-Tree node we have to scan the
element array to locate the index that points back to the child. Use a
power-of-2 algorithm instead of a linear scan (sketched at the end of this
entry).

* PERFORMANCE: Clean up the selection of ip->cache[0] or ip->cache[1] based
on whether we are trying to cache the location of the inode or the
location of the file object's data.
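
A minimal sketch of replacing the linear scan with a power-of-2 (binary)
narrowing search, assuming the parent's element array is sorted by base key;
demo_elm and find_child_index are illustrative names, not HAMMER's on-media
structures:

    #include <stdint.h>
    #include <stdio.h>

    struct demo_elm {
        uint64_t base_key;        /* left boundary of the subtree */
        uint64_t subtree_offset;  /* "pointer" to the child node */
    };

    /*
     * Return the index of the last element whose base_key is <= key,
     * i.e. the element whose subtree covers 'key'.
     */
    static int
    find_child_index(const struct demo_elm *elms, int count, uint64_t key)
    {
        int lo = 0, hi = count - 1, mid, best = 0;

        while (lo <= hi) {
            mid = (lo + hi) / 2;
            if (elms[mid].base_key <= key) {
                best = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return best;
    }

    int
    main(void)
    {
        struct demo_elm parent[4] = {
            { 0, 0x1000 }, { 100, 0x2000 }, { 200, 0x3000 }, { 300, 0x4000 }
        };
        /* a child whose leftmost key is 200 hangs off parent[2] */
        printf("index = %d\n", find_child_index(parent, 4, 200));
        return 0;
    }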


# cb51be26 17-Jun-2008 Matthew Dillon <dillon@dragonflybsd.org>

HAMMER 56A/Many: Performance tuning - MEDIA STRUCTURES CHANGED!

* MEDIA CHANGE: The blockmaps have been folded into the freemap. Allocations
are now made directly out of the freemap. More work is expected here.

The blockmaps are still used to sequence allocations, but no block
number translation is required any more. This didn't improve performance
much but it will make it easier for future optimizations to localize
allocations.

* PERFORMANCE: Removed the holes recording code. Another commit will
soon take over the functionality.

* PERFORMANCE: The flusher's slave threads now collect a number of inodes
into a batch before starting their work, in an attempt to reduce
deadlocks between slave threads from adjacent inodes.

* PERFORMANCE: B-Tree positional caching now works much better, greatly
reducing the cpu overhead when accessing the filesystem.

* PERFORMANCE: Added a write-append optimization. Do not do a
lookup/iteration to locate records being overwritten when no such
records should exist. This cuts the cpu overhead of write-append
flushes in half.

* PERFORMANCE: Add a vfs.hammer.write_mode sysctl feature to test out
two different ways of queueing write I/O's.

* Added B-Tree statistics (hammer bstats 1).


# 7bc5b8c2 13-Jun-2008 Matthew Dillon <dillon@dragonflybsd.org>

HAMMER 54D/Many: Performance tuning.

* Remove major barriers to write performance and fix hiccups revealed by
blogbench.

Change the HAMMER reclaim-delay algorithm to operate like a FIFO instead
of as a free-for-all. The idea of introducing a dynamic delay helped some,
but the addition of the wakeup FIFO allows burst completions by the flusher
to immediately wakeup processes that were waiting for the reclaim count to
drain. The result is far, far smoother operation.

* Remove a major blocking conflict between the buffer cache daemon and
HAMMER. The buffer cache was getting stuck on trying to overwrite dirty
records that had already been queued to the flusher. The flusher might
not act on the record(s) for a long period of time, causing the buffer
cache daemon to stall.

Fix the problem by properly using the HAMMER_RECF_INTERLOCK_BE flag,
which stays on only for a very short period of time, instead of testing
the record's flush state (record->flush_state), which can stay in
the HAMMER_FST_FLUSH state for a very long time.

* The parent B-Tree node does not need to be locked when inserting
into the child.

* Use the new B_AGE semantics to keep meta-data intact longer. This results
in another big improvement in random read and write performance.


# d99d6bf5 12-Jun-2008 Matthew Dillon <dillon@dragonflybsd.org>

HAMMER 54B/Many: Performance tuning.

* Fix a major performance issue in the UNDO generation code. The code
was calling hammer_bread() instead of hammer_bnew() for 'new' undo buffers,
meaning it was doing a read-modify-write on the disk instead of just a
write.

This fix results in a MAJOR improvement in performance across the board.

* Replace the only lockmgr lock in the module with a hammer_lock.

* Tweak hammer_inode_waitreclaims(). This will probably need even more
tweaking as time passes.


# a99b9ea2 11-Jun-2008 Matthew Dillon <dillon@dragonflybsd.org>

HAMMER 54/Many: Performance tuning

* Implement hammer_vop_bmap().

* Implement cluster_read() support. This should jump up linear read
performance almost to the platter speed. I get 100 MB/sec testing
vs 35 MB/sec previously.

* Do a better job kicking an inode into the flusher when writing
sequentially. This hops up write rate at least +50%. It isn't
quite able to run at the platter speed due to B-Tree overheads
which will be addressed in a later patch.

* Do not create data fragments at the ends of files greater than 16K, use
a full 16K block. The reason is that fragments in HAMMER are allocated
out of a wholly different zone and we do not want to lose the chance of
making the tail end of the file contiguous.

Files less than 16K still use data fragments.

* Fix a machine lockup related to an interrupt race with biodone() and
insertions and deletions from hmp->lose_list.

* Fix a memory exhaustion issue.

Reported-by: Francois Tigeot <ftigeot@wolfpond.org> (machine lockup)
Credit-also: Jonathan Stuart on the 0 byte sized file bug fix.


# af209b0f 10-Jun-2008 Matthew Dillon <dillon@dragonflybsd.org>

HAMMER 53H/Many: Performance tuning, bug fixes

* CHANGE THE ON-MEDIA B-TREE STRUCTURE. The number of elements per node has
been increased from 16 to 64. The intent is to reduce the number of seeks
required in a heavy random-access loading situation.

* Add a shortcut to the B-Tree node scanning code (requires more testing).
Instead of scanning linearly we do a power-of-2 narrowing search.

* Only do clustered reads for DATA types. Do not cluster meta-data (aka
B-Tree) I/O. Note that the inode data structure is considered to be
a DATA type. Reduce the cluster read size from 256K to 64K to avoid
blowing out the buffer cache.

* Augment hammer locks so one can discern between a normal lock blockage
and one that is recovering from a deadlock.

* Change the slave work threads for the flusher to pull their work off a
single queue. This fixes an issue where one slave work thread would
sometimes get a disproportionate percentage of the work and the
master thread then had to wait for it to finish while the other work
threads were twiddling their thumbs.

* Adjust the wait reclaims code to solve a long standing performance issue.
The flusher could get so far behind that the system's buffer cache buffers
would no longer have any locality of reference to what was being flushed,
causing a massive drop in performance.

* Do not queue a dirty inode to the flusher unconditionally in the strategy
write code. Only do it if system resources appear to be stressed.
The inode will get flushed when the filesystem syncs.

* Code cleanup.

* Fix a bug reported by Antonio Huete Jimenez related to 0-length writes
not working properly.


# 3897d7e9 10-Jun-2008 Matthew Dillon <dillon@dragonflybsd.org>

HAMMER 53E/Many: Performance tuning

* Change the code which waits for reclaims to drain to be more in line with
the new bwillwrite(). Impose a dynamic delay instead of blocking outright.

* Move the hammer_inode_waitreclaims() call from hammer_vop_open() to
hammer_get_inode(), and only call it when we would otherwise have to
create a new inode.

* Sort HAMMER's file list in conf/files.


# cebe9493 10-Jun-2008 Matthew Dillon <dillon@dragonflybsd.org>

HAMMER 53D/Many: Stabilization

* Fix an overwrite bug with direct write which could result in file
corruption.

* Reserve just-freed big blocks for two flush cycles to prevent HAMMER from
overwriting destroyed data so it does not become corrupt if the system
crashes. This is needed because the recover code does not record UNDOs
for data (nor do we want it to).

* More I/O subsystem work. There may still be an elusive panic related
to calls to regetblk().


# 9f5097dc 09-Jun-2008 Matthew Dillon <dillon@dragonflybsd.org>

HAMMER 53C/Many: Stabilization

* HAMMER queues dirty inodes reclaimed by the kernel to the backend for
their final sync. Programs like blogbench can overload the backend and
generate more new inodes than the backend can dispose of, running
M_HAMMER out of memory.

Add code to stall on vop_open() when this condition is detected to
give the backend a chance to catch-up. (see NOTE 1 below).

* HAMMER could build up too many meta-data buffers and cause the system
to deadlock in newbuf. Recode the flusher to allow a block of UNDOs,
the volume header, and all related meta-data buffers to be flushed
piecemeal, and then continue the flush loop without closing out the
transaction. If a crash occurs the recovery code will undo the partial
flushes.

* Fix an issue located by FSX under load. The in-memory/on-disk record
merging code was not dealing with in-memory data records properly.
The key field for data records is (base_offset + data_len), not just
(base_off), so a 'match' between an in-memory data record and an on-disk
data record requires a special case test. This is the case where the
in-memory record is intended to overwrite the on-disk record, so the
in-memory record must be chosen and the on-disk record discarded for
the purposes of read(). (The overlap test is sketched at the end of this
entry.)

* Fix a bug in hammer_io.c related to the handling of B_LOCKED buffers
that resulted in an assertion at umount time. Buffer cache buffers
were not being properly disassociated from their hammer_buffer counterparts
in the direct-write case.

* The frontend's direct-write capability for truncated buffers (such as
used with small files) was causing an assertion to occur on the backend.
Add an interlock on the related hammer_buffer to prevent the frontend
from attempting to modify the buffer while the backend is trying to
write it to the media.

* Dynamically size the dirty buffer limit. This still needs some work.

(NOTE 1): On read/write performance issues. Currently HAMMER's frontend
VOPs are massively disassociated from modifying B-Tree updates. Even though
a direct-write capability now exists, it applies only to bulk data writes
to disk and NOT to B-Tree updates. Each direct write creates a record
which must be queued to the backend to do the B-Tree update on the
media. The flusher is currently single-threaded and when HAMMER gets
too far behind doing these updates the current safeties will cause
performance to degrade drastically. This is a known issue that
will be addressed.
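
A minimal sketch of the special-case match test, assuming only what the FSX
bullet above states: the B-Tree key of a data record is the end of its byte
range (base_offset + data_len). The struct and function names are
illustrative, not HAMMER's:

    #include <stdint.h>
    #include <stdio.h>

    struct demo_data_rec {
        int64_t key;       /* base_offset + data_len (end of range) */
        int64_t data_len;
    };

    /* Do the byte ranges covered by two data records overlap? */
    static int
    data_records_overlap(const struct demo_data_rec *a,
                         const struct demo_data_rec *b)
    {
        int64_t a_beg = a->key - a->data_len;
        int64_t b_beg = b->key - b->data_len;

        return (a_beg < b->key && b_beg < a->key);
    }

    int
    main(void)
    {
        /* on-disk: bytes [0, 16384), in-memory rewrite: bytes [0, 4096) */
        struct demo_data_rec ondisk = { 16384, 16384 };
        struct demo_data_rec inmem  = {  4096,  4096 };

        /* read() must prefer the in-memory record for the overlapping bytes */
        printf("overlap: %d\n", data_records_overlap(&ondisk, &inmem));
        return 0;
    }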


# 0832c9bb 08-Jun-2008 Matthew Dillon <dillon@dragonflybsd.org>

HAMMER 53B/Many: Complete overhaul of strategy code, reservations, etc

* Completely overhaul the strategy code. Implement direct reads and writes
for all cases. REMOVE THE BACKEND BIO QUEUE. BIOs are no longer queued
to the flusher under any circumstances.

Remove numerous hacks that were previously emplaced to deal with BIO's
being queued to the flusher.

* Add a mechanism to invalidate buffer cache buffers that might be shadowed
by direct I/O. e.g. if a strategy write uses the vnode's bio directly
there may be a shadow hammer_buffer that will then become stale and must
be invalidated.

* Implement a reservation tracking structure (hammer_reserve) to track
storage reservations made by the frontend. The backend will not attempt
to free or reuse reserved space if it encounters it.

Use reservations to back cached holes (struct hammer_hole) for the
same reason.

* Index hammer_buffer on the zone-X offset instead of the zone-2 offset.
Base the RB tree in the hammer_mount instead of (zone-2) hammer_volume.
This removes nearly all blockmap lookup operations from the critical path.

* Do a much better job tracking cached dirty data for the purposes of
calculating whether the filesystem will become full or not.

* Fix a critical bug in the CRC generation of short data buffers.

* Fix a VM deadlock.

* Use 16-byte alignment for all on-disk data instead of 8-byte alignment.

* Major code cleanup.

As of this commit, write performance is now extremely good.


# 47637bff 07-Jun-2008 Matthew Dillon <dillon@dragonflybsd.org>

HAMMER 53A/Many: Read and write performance enhancements, etc.

* Add hammer_io_direct_read(). For full-block reads this code allows
a high-level frontend buffer cache buffer associated with the
regular file vnode to directly access the underlying storage,
instead of loading that storage via a hammer_buffer and bcopy()ing it.

* Add a write bypass, allowing the frontend to bypass the flusher and
write full-blocks directly to the underlying storage, greatly improving
frontend write performance. Caveat: See note at bottom.

The write bypass is implemented by adding a feature whereby the frontend
can soft-reserve unused disk space on the physical media without having
to interact (much) with on-disk meta-data structures. This allows the
frontend to flush high-level buffer cache buffers directly to disk
and release the buffer for reuse by the system, resulting in very high
write performance.

To properly associate the reserved space with the filesystem so it can be
accessed in later reads, an in-memory hammer_record is created referencing
it. This record is queued to the backend flusher for final disposition.
The backend disposes of the record by inserting the appropriate B-Tree
element and marking the storage as allocated. At that point the storage
becomes official.

* Clean up numerous procedures to support the above new features. In
particular, do a major cleanup of the cached truncation offset code
(this is the code which allows HAMMER to implement wholly asynchronous
truncate()/ftruncate() support).

Also clean up the flusher triggering code, removing numerous hacks that
had been in place to deal with the lack of a direct-write mechanism.

* Start working on statistics gathering to track record and B-Tree
operations.

* CAVEAT: The backend flusher creates a significant cpu burden when flushing
a large number of in-memory data records. Even though the data itself
has already been written to disk, there is currently a great deal of
overhead involved in manipulating the B-Tree to insert the new records.
Overall write performance will only be modestly improved until these
code paths are optimized.


# 51c35492 03-Jun-2008 Matthew Dillon <dillon@dragonflybsd.org>

HAMMER 52/Many: Read-only mounts and mount upgrades/downgrades.

* Finish implementing MNT_UPDATE, allowing a HAMMER mount to be upgraded
or downgraded.

* Adjust the recovery code to not flush buffers dirtied by recovery
operations (running the UNDOs) when the mount is read-only. The
buffers will be flushed when the mount is updated to read-write.

* Improve recovery performance by not flushing dirty buffers until the
end (if a read-write mount).

* A crash which occurs during recovery might cause the next recovery to
fail. Delay writing out the recovered volume header until all the other
buffers have been written out to fix the problem.


# e63644f0 02-Jun-2008 Matthew Dillon <dillon@dragonflybsd.org>

HAMMER 51/Many: Filesystem full casework, nohistory flag.

* Track the amount of unsynced information and return ENOSPC if the
filesystem would become full. The idea here is to detect that the
filesystem is full and yet still give the flusher enough runway to
flush cached dirty data and inodes.

* Implement the NOHISTORY flag. Implement inheritance of NOHISTORY and
NODUMP.

The NOHISTORY flag tells HAMMER not to retain historical information on
a filesystem object. If set on a directory any objects created in that
directory will also inherit the flag. For example, it could be set
on /usr/obj.
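
A minimal sketch of the inheritance rule, with illustrative flag names and
values (not HAMMER's actual flag bits):

    #include <stdint.h>
    #include <stdio.h>

    #define DEMO_NOHISTORY  0x0001
    #define DEMO_NODUMP     0x0002
    #define DEMO_INHERIT    (DEMO_NOHISTORY | DEMO_NODUMP)

    /* compute the inherited flags for a new object created in a directory */
    static uint32_t
    demo_inherit_flags(uint32_t dir_flags)
    {
        return dir_flags & DEMO_INHERIT;
    }

    int
    main(void)
    {
        uint32_t usr_obj = DEMO_NOHISTORY;   /* e.g. a directory like /usr/obj */
        printf("new file flags: 0x%x\n", demo_inherit_flags(usr_obj));
        return 0;
    }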


# 6f97fce3 01-Jun-2008 Matthew Dillon <dillon@dragonflybsd.org>

HAMMER 50/Many: VFS_STATVFS() support, stabilization.

* Add support for VFS_STATVFS(), returning 64 bit quantities for available
space, etc.

* When freeing a big-block any holes cached for that block must be
cleaned out.

* Fix a conditional testing whether a layer2 big-block must be allocated in
layer1. The bug could only occur if a layer2 big-block gets freed in
layer1, and we currently never do this.

* Clean-up comments related to freeing blocks.


# 2f85fa4d 18-May-2008 Matthew Dillon <dillon@dragonflybsd.org>

HAMMER 46/Many: Performance pass, media changes, bug fixes.

* Add a localization field to the B-Tree element which has sorting priority
over the object id.

Use the localization field to separate inode entries from file data. This
allows the reblocker to cluster inode information together and greatly
improves directory/stat performance. (The key ordering is sketched at the
end of this entry.)

* Enhance the reblocker to reblock internal B-Tree nodes as well as leaves.

* Enhance the reblocker by adding 'reblock-inodes' in addition to
'reblock-data' and 'reblock-btree', allowing individual types of
meta-data to be independently reblocked.

* Fix a bug in hammer_bread(). The buffer's zoneX_offset field was
sometimes not being properly masked, resulting in unnecessary blockmap
lookups. Also add hammer_clrxlate_buffer() to clear the translation
cache for a hammer_buffer.

* Fix numerous issues with hmp->sync_lock.

* Fix a buffer exhaustion issue in the pruner and reblocker due to not
counting I/O's in progress as being dirty.

* Enhance the symlink implementation. Take advantage of the extra 24 bytes
of space in the inode data to directly store symlinks <= 24 bytes.

* Use cluster_read() to gang read I/O's into 64KB chunks. Rely on
localization and the reblocker and pruner to make doing the larger
I/O's worthwhile.

These changes reduce ls -lR overhead on 43383 files (half created with cpdup,
half totally randomly created with blogbench). Overhead went from 35 seconds
after reblocking, before the changes, to 5 seconds after reblocking,
after the changes.
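
A minimal sketch of the resulting key ordering, assuming a two-field key in
which localization compares before the object id; the struct and function
names are illustrative, not hammer_base_elm itself:

    #include <stdint.h>
    #include <stdio.h>

    struct demo_key {
        uint32_t localization;   /* compared first */
        uint64_t obj_id;         /* compared second */
    };

    /* return <0, 0, >0, like a comparison routine used for B-Tree ordering */
    static int
    demo_key_cmp(const struct demo_key *a, const struct demo_key *b)
    {
        if (a->localization < b->localization) return -1;
        if (a->localization > b->localization) return  1;
        if (a->obj_id < b->obj_id) return -1;
        if (a->obj_id > b->obj_id) return  1;
        return 0;
    }

    int
    main(void)
    {
        struct demo_key inode_a = { 1, 500 };   /* inode localization */
        struct demo_key data_b  = { 2, 100 };   /* data localization  */

        /* inode elements sort before data elements despite the larger obj_id */
        printf("%d\n", demo_key_cmp(&inode_a, &data_b));
        return 0;
    }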


# 77062c8a 06-May-2008 Matthew Dillon <dillon@dragonflybsd.org>

HAMMER 41B/Many: Cleanup.

* Disable (most) debugging kprintfs unless a hammer debug sysctl is set.

* Do not allow buffers to be synced on panic.


# c9b9e29d 04-May-2008 Matthew Dillon <dillon@dragonflybsd.org>

HAMMER 40F/Many: UNDO cleanup & stabilization.

* Properly classify UNDO zone buffers so they are flushed at the correct
point in time.

* Minor rewrite of the code tracking the UNDO demark for the next flush.

* Introduce a considerably better backend flushing activation algorithm
to avoid single-buffer flushes.

* Put a lock around the freemap allocator.


# e8599db1 03-May-2008 Matthew Dillon <dillon@dragonflybsd.org>

HAMMER 40F/Many: Inode/link-count sequencer cleanup pass, UNDO cache.

* Implement an UNDO cache. If we have already laid down an UNDO in the
current flush cycle we do not have to lay down another one for the same
address. This greatly reduces the number of UNDOs we generate during
a flush. (The idea is sketched at the end of this entry.)

* Properly get the vnode in order to be able to issue vfsync()'s from the
backend. We may also have to acquire the vnode when doing an unload
check for a file deletion.

* Properly generate UNDO records for the volume header. During crash recovery
we have to UNDO the volume header along with any partially written
meta-data, because the volume header refers to the meta-data.

* Add another record type, GENERAL, representing inode or softlink records.

* Move the setting of HAMMER_INODE_WRITE_ALT to the backend, allowing
the kernel to flush buffers up to the point where the backend syncs
the inode.
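
A minimal sketch of the UNDO-cache idea: remember which media offsets already
had an UNDO generated during the current flush cycle and skip duplicates. A
real implementation would use a hash or tree rather than this linear table;
all names here are illustrative:

    #include <stdint.h>
    #include <stdio.h>

    #define UNDO_CACHE_SIZE 128

    static int64_t undo_cache[UNDO_CACHE_SIZE];
    static int undo_cache_count;

    /* start of a new flush cycle: forget everything */
    static void
    undo_cache_reset(void)
    {
        undo_cache_count = 0;
    }

    /* return 1 if an UNDO must be generated for this offset, 0 if cached */
    static int
    undo_needed(int64_t offset)
    {
        int i;

        for (i = 0; i < undo_cache_count; ++i)
            if (undo_cache[i] == offset)
                return 0;               /* already undone this cycle */
        if (undo_cache_count < UNDO_CACHE_SIZE)
            undo_cache[undo_cache_count++] = offset;
        return 1;
    }

    int
    main(void)
    {
        undo_cache_reset();
        printf("%d\n", undo_needed(0x8000));   /* 1: lay down an UNDO */
        printf("%d\n", undo_needed(0x8000));   /* 0: same address, skip */
        return 0;
    }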


# 1f07f686 02-May-2008 Matthew Dillon <dillon@dragonflybsd.org>

HAMMER 40A/Many: Inode/link-count sequencer.

* Remove the hammer_depend structure and build the dependencies directly
into the hammer_record structure.

* Attempt to implement layout rules to ensure connectivity is maintained.
This means, for example, that before HAMMER can flush a newly created
file it will make sure the file has namespace connectivity to the
directory it was created in, recursively to the root.

NOTE: 40A destabilizes the filesystem a bit, it's going to take a few
passes to get everything working properly. There are numerous issues
with this commit.


# 0729c8c8 29-Apr-2008 Matthew Dillon <dillon@dragonflybsd.org>

HAMMER 39/Many: Parallel operations optimizations

* Implement a per-directory cache of new object IDs. Up to 128 directories
will be managed in LRU fashion. The cache provides a pool of object
IDs to better localize the object ids of files created in a directory,
so parallel operations on the filesystem do not create a fragmented
object id space. (The cache is sketched at the end of this entry.)

* Cache numerous fields in the root volume's header to avoid creating
undo records for them, greatly improving performance.

(ultimately we can sync an undo space representing the volume header
using a direct comparison mechanic but for now we assume the write of
the volume header to be atomic).

* Implement a zone limit for the blockmap which newfs_hammer can install.
The blockmap zones have an ultimate limit of 2^60 bytes, or around
one million terabytes. If you create a 100G filesystem there is no
reason to let the blockmap iterate over its entire range as that would
result in a lot of fragmentation and blockmap overhead. By default
newfs_hammer sets the zone limit to 100x the size of the filesystem.

* Fix a bug in the crash recovery code. Do not sync newly added inodes
once the flusher is running, otherwise the volume header can get out
of sync. Just create a dummy marker structure and move it to the tail
of the inode flush_list when the flush starts, and stop when we hit it.

* Adjust hammer_vfs_sync() to sync twice. The second sync is needed to
update the volume header's undo fifo indices, otherwise HAMMER will
believe that it must undo the last fully synchronized flush.
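
A minimal sketch of the per-directory object-id cache, handing out ids in
chunks from a global counter so files created in the same directory get
nearby ids even under parallel load. The LRU eviction and locking of the real
code are omitted and all names are illustrative:

    #include <stdint.h>
    #include <stdio.h>

    #define OBJID_CACHE_SIZE  128     /* directories tracked */
    #define OBJID_CHUNK       100     /* ids handed to a directory at a time */

    struct objid_cache_entry {
        uint64_t dir_obj_id;      /* directory this range belongs to */
        uint64_t next_obj_id;     /* next id to hand out */
        int      remaining;       /* ids left in the cached chunk */
    };

    static struct objid_cache_entry cache[OBJID_CACHE_SIZE];
    static uint64_t global_next_obj_id = 0x10000;

    static uint64_t
    alloc_obj_id(uint64_t dir_obj_id)
    {
        struct objid_cache_entry *e = &cache[dir_obj_id % OBJID_CACHE_SIZE];

        /* refill (or take over the slot) when empty or owned by another dir */
        if (e->remaining == 0 || e->dir_obj_id != dir_obj_id) {
            e->dir_obj_id  = dir_obj_id;
            e->next_obj_id = global_next_obj_id;
            e->remaining   = OBJID_CHUNK;
            global_next_obj_id += OBJID_CHUNK;
        }
        e->remaining--;
        return e->next_obj_id++;
    }

    int
    main(void)
    {
        /* files created in directory 7 get contiguous ids ... */
        uint64_t a = alloc_obj_id(7);
        uint64_t b = alloc_obj_id(7);
        printf("%jx %jx\n", (uintmax_t)a, (uintmax_t)b);
        /* ... while directory 8 draws from its own, separate chunk */
        printf("%jx\n", (uintmax_t)alloc_obj_id(8));
        return 0;
    }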
