297fb598 | 27-Feb-2023 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Reduce spin-lock contention issues in page allocator

* The primary source of cpu-localized exhaustion of VM page queues is bio_page_alloc(), mostly because the related pages tend to be very long-lived. When this occurs, multiple cpus wind up being funneled to alternative page queues and wind up with lock contention.

* As a first-order solution, use cpuid + ticks to rotate through cpu-localized page queues when allocating BIO pages (see the sketch below).

  Note that this is not really NUMA friendly, but the kernel has a hard time determining which BIO pages might be useful when NUMA-localized and which might not be. We might need to make adjustments in the future to retain some localization.

* Significantly reduces vm_page_alloc() contention on heavily loaded systems.
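
A minimal user-space sketch of the rotation idea; the queue count, structure, and function names here are assumptions for illustration, not the actual bio_page_alloc() code:

    #include <stddef.h>

    #define NQUEUES 1024                    /* cpu-localized free-page queue count (assumed) */

    struct page_queue { int free_count; }; /* stand-in for a vm_page_queues[] entry */

    static struct page_queue queues[NQUEUES];

    /*
     * Rotate the starting queue with cpuid + ticks so long-lived BIO pages
     * do not drain a single cpu's queue, then probe forward until a queue
     * with free pages is found.  Returns the chosen queue index, or -1.
     */
    static int
    bio_pick_queue(int cpuid, int ticks)
    {
        size_t start = (size_t)(unsigned)(cpuid + ticks) % NQUEUES;

        for (size_t i = 0; i < NQUEUES; ++i) {
            size_t q = (start + i) % NQUEUES;
            if (queues[q].free_count > 0)
                return (int)q;
        }
        return -1;                          /* all queues exhausted */
    }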

Revision tags: v6.4.0, v6.4.0rc1, v6.5.0, v6.2.2, v6.2.1, v6.3.0, v6.0.1

712b6620 | 21-May-2021 | Aaron LI <aly@aaronly.me>

vm: Change 'kernel_object' global to pointer type

Following the previous commits, this commit changes 'kernel_object' to a pointer of type 'struct vm_object *'. This makes it align better with 'kernel_map' and simplifies the code a bit.

No functional changes.

5936d3e8 | 20-May-2021 | Aaron LI <aly@aaronly.me>

vm: Change {buffer,clean,pager}_map globals to pointer type

Similar to the previous commit that changed the global 'kernel_map' to 'struct vm_map *', change the related globals 'buffer_map', 'clean_map' and 'pager_map' to pointer type, i.e., 'struct vm_map *'.

No functional changes.

1eeaf6b2 | 20-May-2021 | Aaron LI <aly@aaronly.me>

vm: Change 'kernel_map' global to type of 'struct vm_map *'

Change the global variable 'kernel_map' from type 'struct vm_map' to a pointer to this struct. This simplifies the code a bit since all invocations take its address. This change also aligns with NetBSD's 'kernel_map', which is also a pointer, and helps the porting of NVMM. A sketch of the conversion pattern follows below.

No functional changes.
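
The pattern behind these three pointer-type conversions, as a self-contained sketch (the struct body and the vm_map_lock() stub are placeholders, not the real kernel code):

    #include <stdio.h>

    struct vm_map { int nentries; };            /* trivial stand-in for the real struct */

    static struct vm_map kernel_map_store;      /* backing storage, initialized at boot */
    struct vm_map *kernel_map = &kernel_map_store;

    static void
    vm_map_lock(struct vm_map *map)             /* placeholder; locking elided */
    {
        (void)map;
    }

    int
    main(void)
    {
        /*
         * Callers used to write vm_map_lock(&kernel_map); with the
         * pointer-typed global every call site passes it directly.
         */
        vm_map_lock(kernel_map);
        printf("entries: %d\n", kernel_map->nentries);
        return 0;
    }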

Revision tags: v6.0.0, v6.0.0rc1, v6.1.0

36abb8ba | 29-Mar-2021 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Use kmalloc_obj for M_VM_OBJECT

* Use the kmalloc_obj API for struct vm_object management. Further reduces kernel memory fragmentation.

4d4f84f5 | 07-Jan-2021 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Remove MAP_VPAGETABLE

* This will break vkernel support for now, but after a lot of mulling there's just no other way forward. MAP_VPAGETABLE was basically a software page-table feature for mmap()s that allowed the vkernel to implement page tables without needing hardware virtualization support.

* The basic problem is that the VM system is moving to an extent-based mechanism for tracking VM pages entered into PMAPs and is no longer indexing individual terminal PTEs with pv_entry's.

  This means that the VM system is no longer able to get an exact list of PTEs in PMAPs that a particular vm_page is using. It just has a flag 'this page is in at least one pmap' or 'this page is not in any pmaps'. To track down the PTEs, the VM system must run through the extents via the vm_map_backing structures hanging off the related VM object.

  This mechanism does not work with MAP_VPAGETABLE. Short of scanning the entire real pmap, the kernel has no way to reverse-index a page that might be indirected through MAP_VPAGETABLE.

* We will need actual hardware mmu virtualization to get the vkernel working again.

0fa99166 | 07-Nov-2020 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Change pager interface to pass page index 3/*

* Don't shortcut vm_object_page_remove() with a resident_page_count test for MGTDEVICE objects. These objects are not required to have their VM pages entered into them.

* In vm_object_page_remove(), change pmap_remove_pages() to pmap_remove(). The former is meant to be used only in the exit code and does not bother with TLB synchronization. pmap_remove() properly handles any TLB synchronization.

5ebb17ad | 04-Nov-2020 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Change pager interface to pass page index 1/2

* Change the *getpage() API to include the page index as an argument (see the sketch below). This allows us to avoid passing any vm_page_t for OBJT_MGTDEVICE VM pages.

  By removing this requirement, the VM system no longer has to pre-allocate a placemarker page for DRM faults, and the DRM system can directly install the page in the pmap without tracking it via a vm_page_t.
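
A sketch of the shape of this API change; the signatures below are hypothetical simplifications, not the exact DragonFly pager prototypes:

    #include <stdint.h>

    typedef uint64_t vm_pindex_t;

    struct vm_object;
    struct vm_page;

    /*
     * Old shape: the pager received a (possibly pre-allocated placemarker)
     * vm_page and derived the page index from it.
     */
    int pager_getpage_old(struct vm_object *object, struct vm_page **mpp,
                          int seqaccess);

    /*
     * New shape: the index travels as an explicit argument, so an
     * OBJT_MGTDEVICE pager (e.g. DRM) can resolve a fault and install the
     * mapping without any vm_page_t being passed in at all.
     */
    int pager_getpage_new(struct vm_object *object, vm_pindex_t pindex,
                          struct vm_page **mpp, int seqaccess);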

Revision tags: v5.8.3, v5.8.2

b9a6fe08 | 22-Aug-2020 | Sascha Wildner <saw@online.de>

kernel/pmap: Remove code under !defined(PMAP_ADVANCED).

We've been running with PMAP_ADVANCED by default since February 27. Remove the old, inactive code.

Approved-by: dillon

cdf89dcf | 05-May-2020 | Sascha Wildner <saw@online.de>

kernel/vm: Rename VM_PAGER_PUT_* to OBJPC_*.

While here, rename the rest of the VM_PAGER_* flags too.

Suggested-by: dillon

Revision tags: v5.8.1, v5.8.0

c2830aa6 | 27-Feb-2020 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Continue pmap work

* Conditionalize this work on PMAP_ADVANCED, default enabled.

* Remove md_page.pmap_count and md_page.writeable_count; no longer track these counts, which cause tons of cache line interactions.

  However, there are still a few stubborn hold-overs:

  * The vm_page still needs to be soft-busied in the page fault path.

  * For now we need to have an md_page.interlock_count to flag pages being replaced by pmap_enter() (e.g. COW faults) in order to be able to safely dispose of the page without busying it.

  This need will eventually go away, hopefully just leaving us with the soft-busy-count issue.

a7c16d7a | 25-Feb-2020 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Simple cache line optimizations

* Reorder struct vm_page, struct vnode, and struct vm_object a bit to improve cache-line locality.

* Use atomic_fcmpset_*() instead of atomic_cmpset_*() in several places to reduce the inter-cpu cache coherency load a bit (see the sketch below).
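
The difference between the two idioms, expressed with portable C11 atomics as assumed stand-ins for the kernel's atomic_cmpset_int()/atomic_fcmpset_int(), for illustration only:

    #include <stdatomic.h>

    /*
     * cmpset-style: the primitive only reports success or failure, so the
     * loop must re-load the variable on every retry, touching a possibly
     * contended cache line one extra time per iteration.
     */
    static void
    counter_add_cmpset(_Atomic int *p, int n)
    {
        for (;;) {
            int old = atomic_load(p);
            int expected = old;
            if (atomic_compare_exchange_strong(p, &expected, old + n))
                break;
        }
    }

    /*
     * fcmpset-style: a failed compare-exchange hands back the freshly
     * observed value in 'old', so no separate re-load is needed.
     */
    static void
    counter_add_fcmpset(_Atomic int *p, int n)
    {
        int old = atomic_load(p);
        while (!atomic_compare_exchange_weak(p, &old, old + n))
            ;   /* 'old' now holds the current value; just retry */
    }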

Revision tags: v5.9.0, v5.8.0rc1, v5.6.3

e2164e29 | 18-Oct-2019 | zrj <rimvydas.jasinskas@gmail.com>

<sys/slaballoc.h>: Switch to lighter <sys/_malloc.h> header.

The <sys/globaldata.h> embeds SLGlobalData that in turn embeds the "struct malloc_type". Adjust several kernel sources for missing includes where memory allocation is performed. Try to use alphabetical include order.

Now (in most cases) <sys/malloc.h> is included after <sys/objcache.h>. Once it gets cleaned up, the <sys/malloc.h> inclusion could be moved out of <sys/idr.h> to the drm Linux compat layer's linux/slab.h without side effects.

bce6845a | 18-Oct-2019 | zrj <rimvydas.jasinskas@gmail.com>

kernel: Minor whitespace cleanup in few sources.

Revision tags: v5.6.2, v5.6.1, v5.6.0, v5.6.0rc1, v5.7.0

831a8507 | 20-May-2019 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - VM rework part 15 - Core pmap work, refactor PG_*

* Augment PG_FICTITIOUS. This takes over some of PG_UNMANAGED's previous capabilities. In addition, the pmap_*() API will work with fictitious pages, making mmap() operation (e.g. of the GPU) more consistent.

* Add PG_UNQUEUED. This prevents a vm_page from being manipulated in the vm_page_queues[] in any way (illustrated below). This takes over another feature of the old PG_UNMANAGED flag.

* Remove PG_UNMANAGED.

* Remove PG_DEVICE_IDX. This is no longer relevant. We use PG_FICTITIOUS for all device pages.

* Refactor vm_contig_pg_alloc(), vm_contig_pg_free(), vm_page_alloc_contig(), and vm_page_free_contig().

  These functions now set PG_FICTITIOUS | PG_UNQUEUED on the returned pages, and properly clear the bits upon free or if/when a regular (but special contig-managed) page is handed over to the normal paging system.

  This, combined with making the pmap_*() functions work better with PG_FICTITIOUS, is the primary 'fix' for some of DRM's hacks.
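
An illustrative reading of the two flags; the bit values and the helper below are invented for the sketch, not the kernel's definitions:

    #include <stdbool.h>
    #include <stdint.h>

    #define PG_FICTITIOUS  0x0004u   /* device/fake page (illustrative value) */
    #define PG_UNQUEUED    0x0008u   /* never placed on vm_page_queues[] (illustrative value) */

    struct vm_page { uint32_t flags; };

    /*
     * A page marked PG_UNQUEUED must never be moved onto a paging queue;
     * contig/device allocations set PG_FICTITIOUS | PG_UNQUEUED and only
     * clear the bits when the page is handed back to the normal paging
     * system.
     */
    static bool
    page_may_be_queued(const struct vm_page *m)
    {
        return (m->flags & PG_UNQUEUED) == 0;
    }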

e3c330f0 | 19-May-2019 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - VM rework part 12 - Core pmap work, stabilize & optimize

* Add tracking for the number of PTEs mapped writeable in md_page. Change how PG_WRITEABLE and PG_MAPPED are cleared in the vm_page to avoid clear/set races. This problem occurs because we would have otherwise tried to clear the bits without hard-busying the page. This allows the bits to be set with only an atomic op.

  Procedures which test these bits universally do so while holding the page hard-busied, and now call pmap_mapped_sync() beforehand to properly synchronize the bits.

* Fix bugs related to various counters: pm_stats.resident_count, wiring counts, vm_page->md.writeable_count, and vm_page->md.pmap_count.

* Fix bugs related to synchronizing removed pte's with the vm_page. Fix one case where we were improperly updating (m)'s state based on a lost race against a pte swap-to-0 (pulling the pte).

* Fix a bug related to the page soft-busying code when the m->object/m->pindex race is lost.

* Implement a heuristic version of vm_page_active() which just updates act_count unlocked if the page is already in the PQ_ACTIVE queue, or if it is fictitious.

* Allow races against the backing scan for pmap_remove_all() and pmap_page_protect(VM_PROT_READ). Callers of these routines for these cases expect full synchronization of the page dirty state. We can identify when a page has not been fully cleaned out by checking vm_page->md.pmap_count and vm_page->md.writeable_count. In the rare situation where this happens, simply retry.

* Assert that the PTE pindex is properly interlocked in pmap_enter(). We still allow PTEs to be pulled by other routines without the interlock, but multiple pmap_enter()s of the same page will be interlocked.

* Assert additional wiring count failure cases.

* (UNTESTED) Flag DEVICE pages (dev_pager_getfake()) as being PG_UNMANAGED. This essentially prevents all the various reference counters (e.g. vm_page->md.pmap_count and vm_page->md.writeable_count), PG_M, PG_A, etc. from being updated.

  The vm_page's aren't tracked in the pmap at all because there is no way to find them; they are 'fake', so without a pv_entry we can't track them. Instead we simply rely on the vm_map_backing scan to manipulate the PTEs.

* Optimize the new vm_map_entry_shadow() to use a shared object token instead of an exclusive one. OBJ_ONEMAPPING will be cleared with the shared token.

* Optimize single-threaded access to pmaps to avoid pmap_inval_*() complexities.

* Optimize __read_mostly for more globals.

* Optimize pmap_testbit(), pmap_clearbit(), pmap_page_protect(). Pre-check vm_page->md.writeable_count and vm_page->md.pmap_count for an easy degenerate return before doing real work.

* Optimize pmap_inval_smp() and pmap_inval_smp_cmpset() for the single-threaded pmap case, when called on the same CPU the pmap is associated with. This allows us to use simple atomics and cpu_*() instructions and avoid the complexities of the pmap_inval_*() infrastructure.

* Randomize the page queue used in bio_page_alloc(). This does not appear to hurt performance (e.g. heavy tmpfs use) on large many-core NUMA machines and it makes vm_page_alloc()'s job easier.

  This change might have a downside for temporary files, but for more long-lasting files there's no point allocating pages localized to a particular cpu.

* Optimize vm_page_alloc().

  (1) Refactor the _vm_page_list_find*() routines to avoid re-scanning the same array indices over and over again when trying to find a page.

  (2) Add a heuristic, vpq.lastq, for each queue, which we set if a _vm_page_list_find*() operation had to go far afield to find its page. Subsequent finds will skip to the far-afield position until the current CPU's queues have pages again.

  (3) Reduce PQ_L2_SIZE from an extravagant 2048 entries per queue down to 1024. The original 2048 was meant to provide 8-way set-associativity for 256 cores but wound up reducing performance due to longer index iterations.

* Refactor the vm_page_hash[] array. This array is used to shortcut vm_object locks and locate VM pages more quickly, without locks. The new code limits the size of the array to something more reasonable, implements a 4-way set-associative replacement policy using 'ticks', and rewrites the hashing math (see the sketch below).

* Effectively remove pmap_object_init_pt() for now. In current tests it does not actually improve performance, probably because it may map pages that are not actually used by the program.

* Remove vm_map_backing->refs. This field is no longer used.

* Remove more of the old now-stale code related to use of pv_entry's for terminal PTEs.

* Remove more of the old shared page-table-page code. This worked but could never be fully validated and was prone to bugs. So remove it. In the future we will likely use larger 2MB and 1GB pages anyway.

* Remove pmap_softwait()/pmap_softhold()/pmap_softdone().

* Remove more #if 0'd code.
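
A minimal sketch of a 4-way set-associative table with a 'ticks'-based replacement policy, in the spirit of the vm_page_hash[] rework described above; the sizes, hash math, and field names are assumptions, not the kernel's:

    #include <stddef.h>
    #include <stdint.h>

    #define HASH_SIZE  4096            /* illustrative; a multiple of the set size */
    #define SET_SIZE   4               /* 4-way set-associative */

    struct vm_page;                    /* opaque in this sketch */

    struct hash_slot {
        struct vm_page *m;
        int             ticks;         /* time of last use; drives replacement */
    };

    static struct hash_slot vm_page_hash[HASH_SIZE];

    /* Mix object pointer and page index into a set base index. */
    static size_t
    hash_index(uintptr_t object, uintptr_t pindex)
    {
        uintptr_t h = (object >> 6) ^ (pindex * 0x9e3779b97f4a7c15ULL);

        return (size_t)(h % (HASH_SIZE / SET_SIZE)) * SET_SIZE;
    }

    /*
     * Insert or replace within the 4-slot set, evicting the least recently
     * used entry as judged by the stored ticks value (wraparound ignored
     * for brevity).
     */
    static void
    hash_enter(uintptr_t object, uintptr_t pindex, struct vm_page *m, int ticks)
    {
        size_t base = hash_index(object, pindex);
        size_t victim = base;

        for (size_t i = 0; i < SET_SIZE; ++i) {
            if (vm_page_hash[base + i].m == NULL) {
                victim = base + i;
                break;
            }
            if (vm_page_hash[base + i].ticks < vm_page_hash[victim].ticks)
                victim = base + i;
        }
        vm_page_hash[victim].m = m;
        vm_page_hash[victim].ticks = ticks;
    }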

567a6398 | 18-May-2019 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - VM rework part 11 - Core pmap work to remove terminal PVs

* Remove pv_entry_t belonging to terminal PTEs. The pv_entry's for PT, PD, PDP, and PML4 remain. This reduces kernel memory use for pv_entry's by 99%.

  The pmap code now iterates vm_object->backing_list (of vm_map_backing structures) to run down pages for various operations (see the sketch below).

* Remove vm_page->pv_list. This was one of the biggest sources of contention for shared faults. However, in this first attempt I am leaving all sorts of ref-counting intact so the contention has not been entirely removed yet.

* Current hacks:

  - Dynamic page table page removal is currently disabled because the vm_map_backing scan needs to be able to deterministically run down PTE pointers. Removal only occurs at program exit.

  - PG_DEVICE_IDX probably isn't being handled properly yet.

  - Shared page faults not yet optimized.

* So far minor improvements in performance across the board. This is relatively unoptimized. The buildkernel test improves by 2% and the zero-fill fault test improves by around 10%.

  Kernel memory use is improved (reduced) enormously.
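
A sketch of the extent-scan idea described above: rather than walking a per-page pv list, visit the vm_map_backing structures hanging off the object to find which mappings may contain the page. Field names are illustrative, not the kernel's:

    #include <sys/queue.h>
    #include <stdint.h>

    typedef uint64_t vm_pindex_t;

    struct vm_map_backing {
        TAILQ_ENTRY(vm_map_backing) entry;   /* linkage on the object's backing_list */
        vm_pindex_t offset;                  /* first object page covered by this extent */
        vm_pindex_t count;                   /* number of pages covered */
        void       *pmap;                    /* owning pmap (opaque in this sketch) */
    };

    struct vm_object {
        TAILQ_HEAD(, vm_map_backing) backing_list;
    };

    /*
     * Visit every extent that could map 'pindex' and let the caller probe
     * that pmap for the PTE, instead of consulting a per-page pv_entry list.
     */
    static void
    page_rundown(struct vm_object *obj, vm_pindex_t pindex,
                 void (*visit)(struct vm_map_backing *ba, vm_pindex_t pindex))
    {
        struct vm_map_backing *ba;

        TAILQ_FOREACH(ba, &obj->backing_list, entry) {
            if (pindex >= ba->offset && pindex < ba->offset + ba->count)
                visit(ba, pindex);
        }
    }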

530e94fc | 17-May-2019 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - VM rework part 9 - Precursor work for terminal pv_entry removal

* Cleanup the API a bit.

* Get rid of pmap_enter_quick().

* Remove unused procedures.

* Document that vm_page_protect() (and thus the related pmap_page_protect()) must be called with a hard-busied page (sketched below). This ensures that the operation does not race a new pmap_enter() of the page.
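
A toy illustration of the documented calling convention; the busy/unbusy helpers here are stand-ins for the kernel's hard-busy primitives, not their real implementation:

    #include <stdatomic.h>
    #include <stdbool.h>

    struct vm_page {
        atomic_bool busy;          /* stand-in for the real hard-busy bit */
    };

    /* Hypothetical helpers; the kernel uses its own busy/wakeup primitives. */
    static void
    page_hard_busy(struct vm_page *m)
    {
        bool expected = false;

        while (!atomic_compare_exchange_weak(&m->busy, &expected, true))
            expected = false;      /* spin until we own the busy bit */
    }

    static void
    page_unbusy(struct vm_page *m)
    {
        atomic_store(&m->busy, false);
    }

    static void
    page_protect(struct vm_page *m, int prot)
    {
        (void)m; (void)prot;       /* PTE downgrade elided in this sketch */
    }

    /*
     * The convention from the commit: protection changes are made only
     * while the page is hard-busied, so a concurrent pmap_enter() of the
     * same page cannot race the downgrade.
     */
    static void
    downgrade_page(struct vm_page *m, int prot)
    {
        page_hard_busy(m);
        page_protect(m, prot);
        page_unbusy(m);
    }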

Revision tags: v5.4.3

5b329e62 | 11-May-2019 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - VM rework part 7 - Initial vm_map_backing index

* Implement a TAILQ and hang vm_map_backing structures off of the related object. This feature is still in progress and will eventually be used to allow pmaps to manipulate vm_page's without pv_entry's.

  At the same time, remove all sharing of vm_map_backing. For example, clips no longer share the vm_map_backing (see the sketch below). We can't share the structures if they are being used to itemize areas for pmap management.

  TODO - reoptimize this at some point.

  TODO - not yet quite deterministic enough for pmap searches (due to clips).

* Refactor vm_object_reference_quick() to again allow operation on any vm_object whose ref_count is already at least 1, or which belongs to a vnode. The ref_count is no longer being used for complex vm_object collapse, shadowing, or migration code.

  This allows us to avoid a number of unnecessary token grabs on objects during clips, shadowing, and forks.

* Cleanup a few fields in vm_object. Name TAILQ_ENTRY() elements blahblah_entry instead of blahblah_list.

* Fix an issue with a.out binaries (that are still supported but nobody uses) where the object refs on the binaries were not being properly accounted for.
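
A sketch of "clips no longer share the vm_map_backing": splitting an extent allocates a second structure and keeps both indexed on the object. The fields are hypothetical and a plain malloc stands in for the kernel allocator:

    #include <sys/queue.h>
    #include <stdint.h>
    #include <stdlib.h>

    typedef uint64_t vm_pindex_t;

    struct vm_object;

    struct vm_map_backing {
        TAILQ_ENTRY(vm_map_backing) entry;   /* per-object index of mappings */
        struct vm_object *object;
        vm_pindex_t offset;                  /* starting page within the object */
        vm_pindex_t count;                   /* pages covered */
    };

    struct vm_object {
        TAILQ_HEAD(, vm_map_backing) backing_list;
    };

    /*
     * Clip 'ba' at 'split' pages in (0 < split < ba->count assumed).  The
     * second half gets its *own* vm_map_backing (no sharing), and both
     * remain indexed on the object so the pmap code can itemize exactly
     * which ranges map its pages.
     */
    static struct vm_map_backing *
    backing_clip(struct vm_map_backing *ba, vm_pindex_t split)
    {
        struct vm_map_backing *nba = malloc(sizeof(*nba));

        if (nba == NULL)
            return NULL;
        nba->object = ba->object;
        nba->offset = ba->offset + split;
        nba->count  = ba->count - split;
        ba->count   = split;
        TAILQ_INSERT_TAIL(&ba->object->backing_list, nba, entry);
        return nba;
    }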

1dcf1bc7 | 11-May-2019 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - VM rework (fix introduced bug)

* Fix a null-pointer dereferencing bug in vm_object_madvise() introduced in recent commits.

44293a80 | 09-May-2019 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - VM rework part 3 - Cleanup pass

* Cleanup various structures and code

9de48ead | 09-May-2019 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - VM rework part 2 - Replace backing_object with backing_ba

* Remove the vm_object based backing_object chains and all related chaining code.

  This removes an enormous number of locks from the VM system and also removes object-to-object dependencies that required careful traversal code. A great deal of complex code has been removed and replaced with far simpler code.

  Ultimately the intention will be to support removal of pv_entry tracking from vm_pages to gain lockless shared faults, but that is far in the future. It will require hanging vm_map_backing structures off of a list based in the object.

* Implement the vm_map_backing structure, which is embedded in the vm_map_entry and then links to additional dynamically allocated vm_map_backing structures via entry->ba.backing_ba (see the sketch below). This structure contains the object and offset and essentially takes over the functionality that object->backing_object used to have.

  Backing objects are now handled via vm_map_backing. In this commit, fork operations create a fan-in tree to shared subsets of backings via vm_map_backing. In this particular commit, these subsets are not collapsed in any way.

* Remove all the vm_map_split and collapse code. Every last line is gone. It will be reimplemented using vm_map_backing in a later commit.

  This means that as of this commit both recursive forks and parent-to-multiple-children forks cause an accumulation of inefficient lists of backing objects to occur in the parent and children. This will begin to get addressed in part 3.

* The code no longer releases the vm_map lock (typically shared) across (get_pages) I/O. There are no longer any chaining locks to get in the way (hopefully). This means that the code does not have to re-check as carefully as it did before. However, some complexity will have to be added back in once we begin to address the accumulation of vm_map_backing structures.

* Paging performance improved by 30-40%.
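
A structural sketch of how entry->ba.backing_ba takes over from object->backing_object chains; the layout is simplified and the field names are assumptions:

    #include <stdint.h>

    typedef uint64_t vm_ooffset_t;

    struct vm_object;

    /*
     * Embedded in every vm_map_entry; deeper (copy-on-write) layers are
     * reached through ba.backing_ba instead of object->backing_object.
     */
    struct vm_map_backing {
        struct vm_map_backing *backing_ba;   /* next (deeper) backing layer, or NULL */
        struct vm_object      *object;
        vm_ooffset_t           offset;       /* offset of the entry within the object */
    };

    struct vm_map_entry {
        struct vm_map_backing ba;            /* first layer is embedded in the entry */
        /* start, end, protection, ... elided */
    };

    /* Walk the backing chain front to back, the way a fault would. */
    static struct vm_object *
    deepest_backing_object(struct vm_map_entry *entry, int *depth_out)
    {
        struct vm_map_backing *ba = &entry->ba;
        int depth = 0;

        while (ba->backing_ba != NULL) {
            ba = ba->backing_ba;
            ++depth;
        }
        *depth_out = depth;
        return ba->object;                   /* deepest (original) object */
    }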

6f76a56d | 07-May-2019 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - VM rework part 1 - Remove shadow_list

* Remove shadow_head, shadow_list, shadow_count.

* This leaves the kernel operational but without collapse optimizations on 'other' processes when a program exits.

Revision tags: v5.4.2

0ca81fbe | 27-Mar-2019 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Fix more wiring / fictitious bugs w/recent VM work

* Fictitious pages cannot be placed on any VM paging queue; assert this and fix a few places where it could occur.

* This will remove most of the 'Encountered wired page %p on queue' warnings reported. This kprintf() is still in place for the moment because generally speaking pages should be unwired before VM objects are destroyed.

  But other than VM object destruction, it's actually OK for wired pages to be on the paging queues now.

47ec0953 | 23-Mar-2019 | Matthew Dillon <dillon@apollo.backplane.com>

kernel - Refactor vm_map structure 1/2

* Remove the embedded vm_map_entry 'header' from vm_map.

* Remove the prev and next fields from vm_map_entry.

* Refactor the code to iterate only via the RB tree (see the sketch below). This is not as optimal as the prev/next fields were, but we can improve the RB tree code later to recover the performance.
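
A sketch of iterating map entries purely through an RB tree, using the stock BSD <sys/tree.h> macros as a stand-in for the kernel's own tree code; the names are illustrative:

    #include <sys/tree.h>
    #include <stdint.h>
    #include <stdio.h>

    struct vm_map_entry {
        RB_ENTRY(vm_map_entry) rb_entry;     /* replaces the old prev/next fields */
        uint64_t start;
        uint64_t end;
    };

    static int
    entry_cmp(struct vm_map_entry *a, struct vm_map_entry *b)
    {
        if (a->start < b->start)
            return -1;
        return a->start > b->start;
    }

    RB_HEAD(vm_map_rb_tree, vm_map_entry);
    RB_PROTOTYPE(vm_map_rb_tree, vm_map_entry, rb_entry, entry_cmp)
    RB_GENERATE(vm_map_rb_tree, vm_map_entry, rb_entry, entry_cmp)

    struct vm_map {
        struct vm_map_rb_tree rb_root;       /* no embedded 'header' entry any more */
    };

    /* Walk all entries in address order via the tree alone. */
    static void
    dump_map(struct vm_map *map)
    {
        struct vm_map_entry *entry;

        RB_FOREACH(entry, vm_map_rb_tree, &map->rb_root)
            printf("%#llx-%#llx\n",
                   (unsigned long long)entry->start,
                   (unsigned long long)entry->end);
    }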