History log of /dflybsd-src/sys/vm/vm_object.h (Results 26 – 50 of 74)
Revision Date Author Comments
# 7b00fbb4 25-Oct-2013 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Replace global vmobj_token with vmobj_tokens[] array

* Remove one of the two remaining major bottlenecks in the system, the
global vmobj_token which is used to manage access to the vm_object_list.
All VM object creation and deletion would get thrown into this list.

* Replace it with an array of 64 tokens and an array of 64 lists,
vmobj_tokens[] and vm_object_lists[]. Use a simple right-shift
hash code to index the arrays (see the sketch below).

* This reduces contention by a factor of 64 or so, which makes a big
difference on multi-chip cpu systems. It won't be as noticeable on
single-chip (e.g. 4-core/8-thread) systems.

* Rip-out some of the linux vmstats compat functions which were iterating
the object list and replace with the pcpu accumulator scan that was
recently implemented for dragonfly vmstats.

* TODO: proc_token.
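
As a rough illustration, the hashed arrangement described above could
look like the following minimal sketch. vmobj_tokens[] and
vm_object_lists[] come from the commit text; VMOBJ_HSIZE, the exact
shift amount, the helper name, and the list-entry field are assumptions:

    #include <sys/queue.h>

    #define VMOBJ_HSIZE     64
    /* simple right-shift hash; the shift amount here is a guess */
    #define VMOBJ_HASH(obj) (((intptr_t)(obj) >> 8) & (VMOBJ_HSIZE - 1))

    struct lwkt_token vmobj_tokens[VMOBJ_HSIZE];
    TAILQ_HEAD(vm_object_hlist, vm_object) vm_object_lists[VMOBJ_HSIZE];

    /* insert a freshly created VM object into its hashed list */
    static void
    vm_object_list_add(vm_object_t obj)
    {
            int n = VMOBJ_HASH(obj);

            lwkt_gettoken(&vmobj_tokens[n]);
            TAILQ_INSERT_TAIL(&vm_object_lists[n], obj, object_list);
            lwkt_reltoken(&vmobj_tokens[n]);
    }

Two objects only contend when they hash to the same bucket, which is
what reduces contention by roughly the array size.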



# 2734d278 24-Oct-2013 Matthew Dillon <dillon@apollo.backplane.com>

kernel - more SMP optimizations in the VM system

* imgact_elf - drop the vm_object a little earlier in load_section(),
and use a shared object lock when iterating ELF segments.

* When starting a vforked process use a shared process token to
interlock the wait loop instead of an exclusive token. Also don't
bother with the token if there's nothing to wait for.

* When forking, pre-assign lp2 thread's td_ucred.

* Remove the vp->v_object load check loop. It should not be possible
for vp->v_object to change after being assigned as long as the vp
is referenced.

* Replace most OBJ_DEAD tests with assertions that the flag is not set.

* Remove the VOLOCK/VOWANT vnode interlock. It shouldn't be possible
for the vnode's object to change while the vnode is ref'd. This was
a leftover from a long-ago time when vnodes were more persistent and
could be recycled and race accessors.

This also removes vm_object_dead_sleep/wait and related code.

* When memory mapping a vnode object there is no need to formally
hold and chain_wait the object. We can simply add a ref to it,
because vnode objects cannot have backing chains.

* When deallocating a vm_object we can shortcut ref counts greater than 1
for OBJT_VNODE objects instead of counts greater than 3 (see the sketch
below).

* Optimize vnode_pager_alloc(), avoiding unnecessary locks. Keep the
temporary vnode token for the moment.

* Optimize vnode_pager_reference(), removing all locks from the path.
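
A hedged sketch of the OBJT_VNODE ref-count shortcut mentioned above.
The function name is illustrative, not the actual commit code; only the
cutoffs (>1 for vnode objects, >3 otherwise) come from the commit text:

    /*
     * Try to drop a reference without taking the object lock.
     * Vnode objects have no backing chains, so any count > 1 can be
     * decremented atomically; other types use the conservative > 3
     * cutoff.  Returns non-zero on success.
     */
    static int
    vm_object_deallocate_quick(vm_object_t object)
    {
            int count;

            for (;;) {
                    count = object->ref_count;
                    if (object->type == OBJT_VNODE ? count <= 1 : count <= 3)
                            return (0);     /* take the locked slow path */
                    if (atomic_cmpset_int(&object->ref_count, count, count - 1))
                            return (1);     /* ref dropped, no lock taken */
                    /* raced another cpu; reload and retry */
            }
    }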



# 501747bf 12-Oct-2013 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Greatly improve concurrent forks and concurrent execs

* Rewrite all the vm_fault*() API functions to use a two-stage methodology
which keeps track of whether a shared or exclusive lock is being used
on fs.first_object and fs.object. For most VM faults a shared lock is
sufficient, particularly under fork and exec circumstances.

If the shared lock is not sufficient the functions will back down to an
exclusive lock on either or both elements (see the sketch after this
list).

* Implement shared chain locks for use by the above.

* kern_exec - exec_map_page() now attempts to access the page with a
shared lock first, and backs down to an exclusive lock if the page
is not conveniently available.

* vm_object ref-counting now uses atomic ops across the board. The
acquisition call can operate with a shared object lock. The deallocate
call will optimize decrementing ref_count for values above 3 using
an atomic op without needing any lock at all.

* vm_map_split() and vm_object_collapse() and associated functions are now
smart about handling terminal (e.g. OBJT_VNODE) VM objects and will use
a shared lock when possible.

* When creating new shadow chains in front of an OBJT_VNODE object, we no
longer enter those objects onto the OBJT_VNODE object's shadow_head.
That is, only DEFAULT and SWAP objects need to track who might be shadowing
them. TODO: This code needs to be cleaned up a bit though.

This removes another exclusive object lock from the critical path.

* vm_page_grab() will use a shared object lock when possible.
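
A simplified sketch of the two-stage shared/exclusive back-down from
the first bullet. fault_requires_exclusive() is a hypothetical
predicate standing in for the real fault logic; the hold/drop calls
are the existing DragonFly API:

    int
    fault_object_twostage(vm_object_t first_object)
    {
            int shared = 1;         /* stage one: try the shared lock */

            for (;;) {
                    if (shared)
                            vm_object_hold_shared(first_object);
                    else
                            vm_object_hold(first_object);

                    /* fault_requires_exclusive() is a stand-in predicate */
                    if (shared && fault_requires_exclusive(first_object)) {
                            vm_object_drop(first_object);
                            shared = 0;     /* stage two: back down */
                            continue;
                    }
                    break;
            }
            /* ... resolve the page under the lock chosen above ... */
            vm_object_drop(first_object);
            return (KERN_SUCCESS);
    }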



# f2c2051e 29-Jul-2013 Johannes Hofmann <johannes.hofmann@gmx.de>

kernel: Port new device_pager interface from FreeBSD

Some parts implemented by François Tigeot and Matthew Dillon


# 6ed30774 18-Jul-2013 François Tigeot <ftigeot@wolfpond.org>

pat: Make the API more compatible with FreeBSD


# b524ca76 18-Jul-2013 Matthew Dillon <dillon@apollo.backplane.com>

PAT work, mapdev_attr, kmem_alloc_attr

Partially based on work by
Aggelos Economopoulos <aoiko@cc.ece.ntua.gr>


# adddfd62 05-Jul-2013 Sascha Wildner <saw@online.de>

kernel: Remove some #include duplicates in vfs/ and vm/


# b004e484 24-Feb-2013 Sascha Wildner <saw@online.de>

kernel/vm_object: Add debugvm_object_hold_maybe_shared() prototype.


# ce94514e 23-Feb-2013 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Implement much deeper use of shared VM object locks

* Use a shared VM object lock on terminal (and likely highly shared)
OBJT_VNODE objects. For example, binaries in the system such as
/bin/sh or /usr/bin/make.

This greatly improves fork/exec and related VM faults on concurrently
executing binaries. Most commonly, parallel builds often exec
hundreds of thousands of sh's and make's.

This yields a nominal +50% to +100% performance improvement under these
conditions, and a +200% to +300% improvement in poudriere performance
during the depend stage.

* Formalize the shared VM object lock with a new API function,
vm_object_lock_maybe_shared(), which determines whether a VM
object meets the requirements for obtaining a shared lock (see the
sketch after this list).

* Adjust the vm_fault*() APIs to track whether the VM object is
locked shared or exclusive on entry.

* Clarify that OBJ_ONEMAPPING is only applicable to OBJT_DEFAULT
and OBJT_SWAP objects.

* Heavy work on the exec path. Somewhat lighter work on the exit
path. Tons more work could be done.
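
The decision helper might reduce to something like this sketch. Only
the function name comes from the commit; the body is an assumption
based on the surrounding text:

    void
    vm_object_lock_maybe_shared(vm_object_t obj)
    {
            /* terminal, highly shared vnode objects qualify */
            if (obj->type == OBJT_VNODE && (obj->flags & OBJ_DEAD) == 0)
                    vm_object_lock_shared(obj);
            else
                    vm_object_lock(obj);
    }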



# 9a0c03af 06-Aug-2012 François Tigeot <ftigeot@wolfpond.org>

kernel: add VM_OBJECT_LOCK/UNLOCK macros
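
These compat macros plausibly layer FreeBSD-style names over the
existing hold/drop API, along these lines (the exact expansion is an
assumption):

    /* FreeBSD-style names layered over the existing hold/drop calls */
    #define VM_OBJECT_LOCK(object)          vm_object_hold(object)
    #define VM_OBJECT_UNLOCK(object)        vm_object_drop(object)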


# 921c891e 13-Sep-2012 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Implement segment pmap optimizations for x86-64

* Implement 2MB segment optimizations for x86-64. Any shared read-only
or read-write VM object mapped into memory, including physical objects
(so both sysv_shm and mmap), which is a multiple of the segment size
and segment-aligned, can be optimized (see the sketch at the end of
this entry).

* Enable with sysctl machdep.pmap_mmu_optimize=1

Default is off for now. This is an experimental feature.

* It works as follows: A VM object which is large enough will, when VM
faults are generated, store a truncated pmap (PD, PT, and PTEs) in the
VM object itself.

VM faults whose vm_map_entry's can be optimized will cause the PTE, PT,
and also the PD (for now) to be stored in a pmap embedded in the VM_OBJECT,
instead of in the process pmap.

The process pmap then creates a PT entry in its PD page table that points
to the PT page table page stored in the VM_OBJECT's pmap.

* This removes nearly all page table overhead from fork()'d processes or
even unrelated processes which massively share data via mmap() or sysv_shm.
We still recommend using sysctl kern.ipc.shm_use_phys=1 (which is now
the default), which also removes the PV entries associated with the
shared pmap. However, with this optimization PV entries are no longer
a big issue since they will not be replicated in each process, only in
the common pmap stored in the VM_OBJECT.

* Features of this optimization:

* Number of PV entries is reduced to approximately the number of live
pages and no longer multiplied by the number of processes separately
mapping the shared memory.

* One process faulting in a page naturally makes the PTE available to
all other processes mapping the same shared memory. The other processes
do not have to fault that same page in.

* Page tables survive process exit and restart.

* Once page tables are populated and cached, any new process that maps
the shared memory will take far fewer faults because each fault will
bring in an ENTIRE page table. With Postgres and 64 clients, the VM
fault rate was observed to drop from 1M faults/sec to less than 500 at
startup, and during the run the fault rate fell almost instantly to
virtually zero instead of slowly declining through the hundreds of
thousands.

* We no longer have to depend on sysv_shm to optimize the MMU.

* CPU caches will do a better job caching page tables since most of
them are now themselves shared. Even when we invltlb, more of the
page tables will be in the L1, L2, and L3 caches.

* EXPERIMENTAL!!!!!
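
The eligibility test implied by the first bullet reduces to a simple
alignment check, roughly as sketched here (the function name is
illustrative; SEG_SIZE is the 2MB span covered by one x86-64 PD entry):

    #define SEG_SIZE        (2UL * 1024 * 1024)     /* one x86-64 PD entry */

    /* a mapping qualifies only if both ends are 2MB segment aligned */
    static int
    pmap_mmu_optimizable(vm_offset_t start, vm_offset_t end)
    {
            return ((start & (SEG_SIZE - 1)) == 0 &&
                    (end & (SEG_SIZE - 1)) == 0);
    }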



# ad23467e 23-May-2012 Sascha Wildner <saw@online.de>

kernel: Remove some bogus casts to the own type.


# a2ee730d 02-Dec-2011 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Refactor the vmspace locking code and several use cases

* Reorder the vnode ref/rele sequence in the exec path so p_textvp is
left in a more valid state while being initialized.

* Remove the vm_exitingcnt test in exec_new_vmspace(). Release
various resources unconditionally on the last exiting thread regardless
of the state of exitingcnt. This just moves some of the resource
releases out of the wait*() system call path and back into the exit*()
path.

* Implement a hold/drop mechanic for vmspaces and use them in procfs_rwmem(),
vmspace_anonymous_count(), and vmspace_swap_count(), and various other
places.

This does a better job protecting the vmspace from deletion while various
unrelated third parties might be trying to access it.

* Implement vmspace_free() for other code to call instead of them trying
to call sysref_put() directly. Interlock with a vmspace_hold() so
final termination processing always keys off the vm_holdcount.

* Implement vm_object_allocate_hold() and use it in a few places in order
to allow OBJT_SWAP objects to be allocated atomically, so other third
parties (like the swapcache cleaning code) can't wiggle their way in
and access a partially initialized object (see the sketch after this
list).

* Reorder the vmspace_terminate() code and introduce some flags to ensure
that resources are terminated at the proper time and in the proper order.
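
A sketch of the atomic-allocation helper named above. The allocation
path, the M_VMOBJ malloc type, and the _vm_object_allocate() init
helper are all assumptions; the point is only that the object is
returned already held:

    vm_object_t
    vm_object_allocate_hold(objtype_t type, vm_pindex_t size)
    {
            vm_object_t obj;

            obj = kmalloc(sizeof(*obj), M_VMOBJ, M_WAITOK | M_ZERO);
            _vm_object_allocate(type, size, obj);   /* hypothetical init */
            vm_object_hold(obj);    /* held before anyone else can see it */
            return (obj);
    }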



# 609c9aae 20-Nov-2011 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Fix vm_object token deadlock (2)

* Files missed in original commit.


# c9958a5a 16-Nov-2011 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Move VM objects from pool tokens to per-vm-object tokens

* Move VM objects from pool tokens to per-vm-object tokens.

* This fixes booting issues on i386 with vm.shared_fault=1 (pool
tokens would sometimes coincide with the token used for kernel_object
which causes problems on i386 due to the pmap code's use of
kernel_map/kernel_object).



# 54341a3b 15-Nov-2011 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Greatly improve shared memory fault rate concurrency / shared tokens

This commit rolls up a lot of work to improve postgres database operations
and the system in general. With these changes we can pgbench -j 8 -c 40 on
our 48-core opteron monster at 140000+ tps, and the shm vm_fault rate
hits 3.1M pps.

* Implement shared tokens. They work as advertised, with some caveats
(see the usage sketch after this list).

It is acceptable to acquire a shared token while you already hold the same
token exclusively, but you will deadlock if you acquire an exclusive token
while you hold the same token shared.

Currently exclusive tokens are not given priority over shared tokens so
starvation is possible under certain circumstances.

* Create a critical code path in vm_fault() using the new shared token
feature to quickly fault-in pages which already exist in the VM cache.
pmap_object_init_pt() also uses the new feature.

This increases fault-in concurrency by a ridiculously huge amount,
particularly on SHM segments (say when you have a large number of postgres
clients). Scaling for large numbers of clients on large numbers of
cores is significantly improved.

This also increases fault-in concurrency for MAP_SHARED file maps.

* Expand the breadn() and cluster_read() APIs. Implement breadnx() and
cluster_readx(), which allow a getblk()'d bp to be passed. If *bpp is not
NULL a bp is being passed in, otherwise the routines call getblk().

* Modify the HAMMER read path to use the new API. Instead of calling
getcacheblk() HAMMER now calls getblk() and checks the B_CACHE flag.
This gives getblk() a chance to regenerate a fully cached buffer from
VM backing store without having to acquire any hammer-related locks,
resulting in even faster operation.

* If kern.ipc.shm_use_phys is set to 2 the VM pages will be pre-allocated.
This can take quite a while for a large map and also lock the machine
up for a few seconds. Defaults to off.

* Reorder the smp_invltlb()/cpu_invltlb() combos in a few places, running
cpu_invltlb() last.

* An invalidation interlock might be needed in pmap_enter() under certain
circumstances; enable the code for now.

* vm_object_backing_scan_callback() was failing to properly check the
validity of a vm_object after acquiring its token. Add the required
check + some debugging.

* Make vm_object_set_writeable_dirty() a bit more cache friendly.

* The vmstats sysctl was scanning every process's vm_map (requiring a
vm_map read lock to do so), which can stall for long periods of time
when the system is paging heavily. Change the mechanic to a LWP flag
which can be tested with minimal locking.

* Have the phys_pager mark the page as dirty too, to make sure nothing
tries to free it.

* Remove the spinlock in pmap_prefault_ok(), since we do not delete page
table pages it shouldn't be needed.

* Add a required cpu_ccfence() in pmap_inval.c. The code generated prior
to this fix was still correct, and this makes sure it stays that way.

* Replace several manual wiring cases with calls to vm_page_wire().
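
A usage sketch of the shared-token rules stated in the first bullet.
lwkt_gettoken_shared() is the entry point this commit introduces; the
token and function here are just examples:

    struct lwkt_token demo_tok;     /* initialized via lwkt_token_init() */

    void
    token_order_demo(void)
    {
            /* legal: acquiring shared while already holding it exclusively */
            lwkt_gettoken(&demo_tok);
            lwkt_gettoken_shared(&demo_tok);
            lwkt_reltoken(&demo_tok);
            lwkt_reltoken(&demo_tok);

            /* illegal: acquiring exclusive while holding it shared */
            lwkt_gettoken_shared(&demo_tok);
            /* lwkt_gettoken(&demo_tok);   <<< would deadlock here */
            lwkt_reltoken(&demo_tok);
    }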



# e806bedd 27-Oct-2011 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Fix deep recursion in vm_object_collapse()

* vm_object_collapse() will loop, but its backing_object sometimes needs
to be deallocated as well, and this can trigger another collapse against
a different parent object.

* Introduce vm_object_dealloc_list and friends to collect a list of objects
requiring deallocation so the caller can run the list in a way that avoids
deep recursion (as sketched below).
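
The deferred-deallocation idea looks roughly like this sketch. The
vm_object_dealloc_list name is from the commit; the struct layout and
the kfree pool are assumptions:

    struct vm_object_dealloc_list {
            struct vm_object_dealloc_list *next;
            vm_object_t object;
    };

    /*
     * Drain the list iteratively.  Deallocating one object may append
     * more entries, but the stack depth stays constant, avoiding the
     * deep recursion described above.
     */
    static void
    vm_object_deallocate_list(struct vm_object_dealloc_list **dlistp)
    {
            struct vm_object_dealloc_list *dlist;

            while ((dlist = *dlistp) != NULL) {
                    *dlistp = dlist->next;
                    vm_object_deallocate(dlist->object);
                    kfree(dlist, M_TEMP);
            }
    }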

Reported-by: juanfra



# b12defdc 18-Oct-2011 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Major SMP performance patch / VM system, bus-fault/seg-fault fixes

This is a very large patch which reworks locking in the entire VM subsystem,
concentrated on VM objects and the x86-64 pmap code. These fixes remove
nearly all the spin lock contention for non-threaded VM faults and narrows
contention for threaded VM faults to just the threads sharing the pmap.

Multi-socket many-core machines will see a 30-50% improvement in parallel
build performance (tested on a 48-core opteron), depending on how well
the build parallelizes.

As part of this work a long-standing problem on 64-bit systems where programs
would occasionally seg-fault or bus-fault for no reason has been fixed. The
problem was related to races between vm_fault, the vm_object collapse code,
and the vm_map splitting code.

* Most uses of vm_token have been removed. All uses of vm_spin have been
removed. These have been replaced with per-object tokens and per-queue
(vm_page_queues[]) spin locks.

Note in particular that since we still have the page coloring code the
PQ_FREE and PQ_CACHE queues are actually many queues, individually
spin-locked, resulting in excellent MP page allocation and freeing
performance.

* Reworked vm_page_lookup() and vm_object->rb_memq. All (object,pindex)
lookup operations are now covered by the vm_object hold/drop system,
which utilizes pool tokens on vm_objects. Calls now require that the
VM object be held in order to ensure a stable outcome (see the usage
sketch after this list).

Also added vm_page_lookup_busy_wait(), vm_page_lookup_busy_try(),
vm_page_busy_wait(), vm_page_busy_try(), and other API functions
which integrate the PG_BUSY handling.

* Added OBJ_CHAINLOCK. Most vm_object operations are protected by
the vm_object_hold/drop() facility which is token-based. Certain
critical functions which must traverse backing_object chains use
a hard-locking flag and lock almost the entire chain as it is traversed
to prevent races against object deallocation, collapses, and splits.

The last object in the chain (typically a vnode) is NOT locked in
this manner, so concurrent faults which terminate at the same vnode will
still have good performance. This is important e.g. for parallel compiles
which might be running dozens of the same compiler binary concurrently.

* Created a per vm_map token and removed most uses of vmspace_token.

* Removed the mp_lock in sys_execve(). It has not been needed in a while.

* Add kmem_lim_size() which returns approximate available memory (reduced
by available KVM), in megabytes. This is now used to scale up the
slab allocator cache and the pipe buffer caches to reduce unnecessary
global kmem operations.

* Rewrote vm_page_alloc(), various bits in vm/vm_contig.c, the swapcache
scan code, and the pageout scan code. These routines were rewritten
to use the per-queue spin locks.

* Replaced the exponential backoff in the spinlock code with something
a bit less complex and cleaned it up.

* Restructured the IPIQ func/arg1/arg2 array for better cache locality.
Removed the per-queue ip_npoll and replaced it with a per-cpu gd_npoll,
which is used by other cores to determine if they need to issue an
actual hardware IPI or not. This reduces hardware IPI issuance
considerably (and the removal of the decontention code reduced it even
more).

* Temporarily removed the lwkt thread fairq code and disabled a number of
features. These will be worked back in once we track down some of the
remaining performance issues.

Temporarily removed the lwkt thread resequencer for tokens for the same
reason. This might wind up being permanent.

Added splz_check()s in a few critical places.

* Increased the number of pool tokens from 1024 to 4001 and went to a
prime-number mod algorithm to reduce overlaps.

* Removed the token decontention code. This was a bit of an eyesore and
while it did its job when we had global locks it just gets in the way now
that most of the global locks are gone.

Replaced the decontention code with a fall back which acquires the
tokens in sorted order, to guarantee that deadlocks will always be
resolved eventually in the scheduler.

* Introduced a simplified spin-for-a-little-while function
_lwkt_trytoken_spin() that the token code now uses rather than giving
up immediately.

* The vfs_bio subsystem no longer uses vm_token and now uses the
vm_object_hold/drop API for buffer cache operations, resulting
in very good concurrency.

* Gave the vnode its own spinlock instead of sharing vp->v_lock.lk_spinlock,
which fixes a deadlock.

* Adjusted all platform pmap.c's to handle the new main kernel APIs. The
i386 pmap.c is still a bit out of date but should be compatible.

* Completely rewrote very large chunks of the x86-64 pmap.c code. The
critical path no longer needs pmap_spin but pmap_spin itself is still
used heavily, particularly in the pv_entry handling code.

A per-pmap token and per-pmap object are now used to serialize pmap
access and vm_page lookup operations when needed.

The x86-64 pmap.c code now uses only vm_page->crit_count instead of
both crit_count and hold_count, which fixes races against other parts of
the kernel that use vm_page_hold().

_pmap_allocpte() mechanics have been completely rewritten to remove
potential races. Much of pmap_enter() and pmap_enter_quick() has also
been rewritten.

Many other changes.

* The following subsystems (and probably more) no longer use the vm_token
or vmobj_token in critical paths:

x The swap_pager now uses the vm_object_hold/drop API instead of vm_token.

x mmap() and vm_map/vm_mmap in general now use the vm_object_hold/drop API
instead of vm_token.

x vnode_pager

x zalloc

x vm_page handling

x vfs_bio

x umtx system calls

x vm_fault and friends

* Minor fixes to fill_kinfo_proc() to deal with process scan panics (ps)
revealed by recent global lock removals.

* lockmgr() locks no longer support LK_NOSPINWAIT. Spin locks are
unconditionally acquired.

* Replaced netif/e1000's spinlocks with lockmgr locks. The spinlocks
were not appropriate owing to the large context they were covering.

* Misc atomic ops added
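
A usage sketch of the hold/lookup discipline from the vm_page_lookup()
bullet above. The wrapper function is illustrative; the calls inside
it are the APIs named in the commit:

    /*
     * Look up and busy a page with a stable result: the object must
     * be held across the lookup, and PG_BUSY is handled by the new
     * vm_page_busy_wait() API.
     */
    vm_page_t
    lookup_page_held(vm_object_t object, vm_pindex_t pindex)
    {
            vm_page_t m;

            vm_object_hold(object);
            m = vm_page_lookup(object, pindex);
            if (m)
                    vm_page_busy_wait(m, FALSE, "pglkp");
            vm_object_drop(object);
            return (m);
    }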



# a31129d8 14-Jul-2011 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Add debugging and attempt to fix vm_prefault issue

* Add debugging assertions and attempt to fix a race in the vm_prefault
code when running through backing_object chains.

* The fix may be incomplete; we really need a way to determine whether any
chain element has changed state during the scan. The generation count
may be excessive as it also covers vm_page insertions.

Reported-by: Peter Avalos <peter@theshell.com>



# 00db03f1 15-Jun-2011 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Adjust vm_object->paging_in_progress to use refcount API

* Adjust vm_object->paging_in_progress to use refcount API

* Fixes races related to release / wait which could stall a process.
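
The paging_in_progress conversion presumably wraps the kernel refcount
API along these lines. The wrapper bodies are assumptions; the point
is that release and wait go through refcount primitives so they cannot
race each other:

    static __inline void
    vm_object_pip_add(vm_object_t object)
    {
            refcount_acquire(&object->paging_in_progress);
    }

    /* one paging operation leaves; wakes any waiter without racing it */
    static __inline void
    vm_object_pip_wakeup(vm_object_t object)
    {
            refcount_release_wakeup(&object->paging_in_progress);
    }

    /* wait for all paging operations to drain */
    static __inline void
    vm_object_pip_wait(vm_object_t object, const char *waitid)
    {
            refcount_wait(&object->paging_in_progress, waitid);
    }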


# 18a4c8dc 14-Jun-2011 Venkatesh Srinivas <me@endeavour.zapto.org>

kernel -- vm_object DEBUG_LOCKS: Record file/line of vm_object holds.


# e42208e6 07-Jun-2011 Sascha Wildner <saw@online.de>

<vm/vm_object.h>: Some little style cleanup.


# b4460ab3 07-Jun-2011 Venkatesh Srinivas <me@endeavour.zapto.org>

kernel -- vm_object locking: Interlock vm_object work in vm_fault.c
and vm_map.c with per-object token. Handle NULL objects for _hold and _drop.
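
The NULL handling mentioned here plausibly amounts to an early-out,
along these lines (the bodies are assumptions, and the per-object
token field name is a guess):

    void
    vm_object_hold(vm_object_t obj)
    {
            if (obj == NULL)
                    return;         /* callers may legitimately pass NULL */
            lwkt_gettoken(&obj->token);     /* per-object token interlock */
    }

    void
    vm_object_drop(vm_object_t obj)
    {
            if (obj == NULL)
                    return;
            lwkt_reltoken(&obj->token);
    }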


# feea37dc 29-Mar-2011 Venkatesh Srinivas <me@endeavour.zapto.org>

kernel -- vm_object hold debugging should not panic if the debug array overflows

If the debug array overflows, we lose the ability to test for object drops
when we never established a hold. However, the system keeps running.

Suggested-by: dillon



# cb443cbb 27-Mar-2011 Venkatesh Srinivas <me@endeavour.zapto.org>

kernel -- vm_object locking: DEBUG_LOCKS check for hold_wait vs hold deadlock

If a thread has a hold on a vm_object and enters hold_wait (via either
vm_object_terminate or vm_object_collapse), it will wait forever for the hold
count to hit 0. Record the threads holding an object in a per-object array.


