# 712b6620 | 21-May-2021 | Aaron LI <aly@aaronly.me>
vm: Change 'kernel_object' global to pointer type
Following the previous commits, this commit changes 'kernel_object' to a pointer of type 'struct vm_object *'. This aligns it better with 'kernel_map' and simplifies the code a bit.
No functional changes.
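A minimal sketch of the shape of this change (not the commit's actual diff; the bootstrap function and backing-store names below are invented for illustration):
```c
struct vm_object { int placeholder; };  /* stand-in for the real object */

/*
 * Before: a global struct referenced as &kernel_object throughout the
 * tree.  After: a global pointer, matching how kernel_map is exposed,
 * so call sites pass kernel_object directly.
 */
struct vm_object *kernel_object;

/* Hypothetical backing storage and bootstrap hook. */
static struct vm_object kernel_object_store;

void
vm_object_bootstrap_sketch(void)
{
        kernel_object = &kernel_object_store;
}
```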
# cdf89dcf | 05-May-2020 | Sascha Wildner <saw@online.de>
kernel/vm: Rename VM_PAGER_PUT_* to OBJPC_*.
While here, rename the rest of the VM_PAGER_* flags too.
Suggested-by: dillon
# a7c16d7a | 25-Feb-2020 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Simple cache line optimizations
* Reorder struct vm_page, struct vnode, and struct vm_object a bit to improve cache-line locality.
* Use atomic_fcmpset_*() instead of atomic_cmpset_*() in several places to reduce the inter-cpu cache coherency load a bit.
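The cmpset/fcmpset distinction can be modeled with C11 atomics, whose compare-exchange writes the observed value back into 'expected' on failure; that is the property the *_fcmpset_*() primitives exploit to avoid an extra load (and cache-line touch) on every retry. A standalone sketch, not kernel code:
```c
#include <stdatomic.h>

/* cmpset-style loop: reloads the target on every retry. */
static void
set_flag_cmpset(atomic_uint *p, unsigned flag)
{
        unsigned old;

        do {
                old = atomic_load(p);           /* extra load per retry */
        } while (!atomic_compare_exchange_strong(p, &old, old | flag));
}

/*
 * fcmpset-style loop: a failed exchange already left the current value
 * in 'old', so the retry does not touch the cache line again.
 */
static void
set_flag_fcmpset(atomic_uint *p, unsigned flag)
{
        unsigned old = atomic_load(p);          /* one load up front */

        while (!atomic_compare_exchange_weak(p, &old, old | flag))
                ;                               /* 'old' refreshed by the failure */
}
```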
# 567a6398 | 18-May-2019 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - VM rework part 11 - Core pmap work to remove terminal PVs
* Remove pv_entry_t belonging to terminal PTEs. The pv_entry's for PT, PD, PDP, and PML4 remain. This reduces kernel memory use for pv_entry's by 99%.
The pmap code now iterates vm_object->backing_list (of vm_map_backing structures) to run-down pages for various operations.
* Remove vm_page->pv_list. This was one of the biggest sources of contention for shared faults. However, in this first attempt I am leaving all sorts of ref-counting intact so the contention has not been entirely removed yet.
* Current hacks:
- Dynamic page table page removal currently disabled because the vm_map_backing scan needs to be able to deterministically run-down PTE pointers. Removal only occurs at program exit.
- PG_DEVICE_IDX probably isn't being handled properly yet.
- Shared page faults not yet optimized.
* So far minor improvements in performance across the board. This is relatively unoptimized. The buildkernel test improves by 2% and the zero-fill fault test improves by around 10%.
Kernel memory use is improved (reduced) enormously.
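A simplified model (not the real DragonFly code; structure layout and the pmap helper are assumptions) of the backing_list walk that replaces the per-page pv_list:
```c
#include <sys/queue.h>
#include <stdint.h>

#define PAGE_SHIFT      12

struct pmap;                                    /* opaque here */

struct vm_map_backing {
        TAILQ_ENTRY(vm_map_backing) entry;      /* object->backing_list link */
        struct pmap     *pmap;                  /* back-pointer (added in part 8) */
        uintptr_t       start, end;             /* mapped VA range */
        uint64_t        offset;                 /* absolute offset into the object */
};

struct vm_object {
        TAILQ_HEAD(, vm_map_backing) backing_list;
};

/* Hypothetical helper: drop one PTE in one pmap. */
void pmap_remove_one(struct pmap *pmap, uintptr_t va);

/* Remove every PTE mapping page index 'pindex' of 'obj'.  The real
 * code holds the object spinlock around a walk like this. */
void
object_page_remove_all(struct vm_object *obj, uint64_t pindex)
{
        struct vm_map_backing *ba;
        uint64_t poff = pindex << PAGE_SHIFT;

        TAILQ_FOREACH(ba, &obj->backing_list, entry) {
                if (poff < ba->offset)
                        continue;               /* page below this mapping */
                uintptr_t va = (uintptr_t)(ba->start + (poff - ba->offset));
                if (va < ba->end)
                        pmap_remove_one(ba->pmap, va);
        }
}
```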
# 530e94fc | 17-May-2019 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - VM rework part 9 - Precursor work for terminal pv_entry removal
* Cleanup the API a bit
* Get rid of pmap_enter_quick()
* Remove unused procedures.
* Document that vm_page_protect() (and thus the related pmap_page_protect()) must be called with a hard-busied page. This ensures that the operation does not race a new pmap_enter() of the page.
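The hard-busy requirement documented above corresponds to a calling pattern roughly like the one below; the function names follow the DragonFly page API as I recall it, so treat the exact signatures as assumptions rather than gospel:
```c
#include <sys/param.h>
#include <vm/vm.h>
#include <vm/vm_page.h>

/* Downgrade one page to read-only without racing a new pmap_enter(). */
static void
page_make_readonly(vm_page_t m)
{
        vm_page_busy_wait(m, FALSE, "pgpro");   /* hard-busy the page first */
        vm_page_protect(m, VM_PROT_READ);       /* safe: no concurrent pmap_enter() */
        vm_page_wakeup(m);                      /* drop the busy state */
}
```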
# 67e7cb85 | 14-May-2019 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - VM rework part 8 - Precursor work for terminal pv_entry removal
* Adjust structures so the pmap code can iterate backing_ba's with just the vm_object spinlock.
Add a ba.pmap back-pointer.
Move entry->start and entry->end into the ba (ba.start, ba.end). This is replicative of the base entry->ba.start and entry->ba.end, but local modifications are locked by individual objects to allow pmap ops to just look at backing ba's iterated via the object.
Remove the entry->map back-pointer.
Remove the ba.entry_base back-pointer.
* ba.offset is now an absolute offset and not additive. Adjust all code that calculates and uses ba.offset (fortunately it is all concentrated in vm_map.c and vm_fault.c).
* Refactor ba.start/offset/end modifications to be atomic with the necessary spin-locks to allow the pmap code to safely iterate the vm_map_backing list for a vm_object.
* Test VM system with full synth run.
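The practical effect of making ba.offset absolute is in the address arithmetic; a simplified model (not the actual code):
```c
#include <stdint.h>

struct ba_model {
        uintptr_t start, end;   /* VA range, copied into the ba itself */
        uint64_t  offset;       /* ABSOLUTE offset into this layer's object */
};

/*
 * With an absolute offset, a single ba translates a VA into an object
 * offset.  The old additive scheme required walking the chain and
 * summing per-layer deltas, which the pmap code could not do safely
 * while holding only the object spinlock.
 */
static uint64_t
va_to_object_offset(const struct ba_model *ba, uintptr_t va)
{
        return (uint64_t)(va - ba->start) + ba->offset;
}
```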
# 5b329e62 | 11-May-2019 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - VM rework part 7 - Initial vm_map_backing index
* Implement a TAILQ and hang vm_map_backing structures off of the related object. This feature is still in progress and will eventually be used to allow pmaps to manipulate vm_page's without pv_entry's.
At the same time, remove all sharing of vm_map_backing. For example, clips no longer share the vm_map_backing. We can't share the structures if they are being used to itemize areas for pmap management.
TODO - reoptimize this at some point.
TODO - not yet quite deterministic enough for pmap searches (due to clips).
* Refactor vm_object_reference_quick() to again allow operation on any vm_object whose ref_count is already at least 1, or which belongs to a vnode. The ref_count is no longer being used for complex vm_object collapse, shadowing, or migration code.
This allows us to avoid a number of unnecessary token grabs on objects during clips, shadowing, and forks.
* Cleanup a few fields in vm_object. Name TAILQ_ENTRY() elements blahblah_entry instead of blahblah_list.
* Fix an issue with a.out binaries (that are still supported but nobody uses) where the object refs on the binaries were not being properly accounted for.
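The TAILQ index from the first bullet can be modeled with the standard <sys/queue.h> macros; field and type names here are illustrative, not the real declarations:
```c
#include <sys/queue.h>

struct vm_object_model;                          /* forward declaration */

struct vm_map_backing_model {
        TAILQ_ENTRY(vm_map_backing_model) entry; /* "blahblah_entry" naming */
        struct vm_object_model *object;
};

struct vm_object_model {
        int lock;                                /* stand-in for the spin lock */
        TAILQ_HEAD(, vm_map_backing_model) backing_list;
};

/*
 * Hang a backing on its object (backing_list assumed TAILQ_INIT'ed).
 * The real code does this under the object lock so the pmap can later
 * walk the list safely.
 */
static void
backing_attach(struct vm_object_model *obj, struct vm_map_backing_model *ba)
{
        ba->object = obj;
        TAILQ_INSERT_TAIL(&obj->backing_list, ba, entry);
}
```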
# 8492a2fe | 10-May-2019 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - VM rework part 5 - Cleanup
* Cleanup vm_map_entry_shadow()
* Remove (unused) vmspace_president_count(). Remove (barely used) struct lwkt_token typedef.
* Cleanup the vm_map_aux, vm_map_entry, vm_map, and vm_object structures
* Adjustments to in-code documentation
# 1c024bc6 | 10-May-2019 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - VM rework part 4 - Implement vm_fault_collapse()
* Add the function vm_fault_collapse(). This function simulates faults to copy all pages from backing objects into the front object, allowing the backing objects to be disconnected from the map entry.
This function is called under certain conditions from the vmspace_fork*() code prior to a fork to potentially collapse the entry's backing objects into the front object. The caller then disconnects the backing objects, truncating the list to a single object (the front object).
This optimization is necessary to prevent the backing_ba list from growing in an unbounded fashion. In addition, being able to disconnect the graph allows redundant backing store to be freed more quickly, reducing memory use.
* Add sysctl vm.map_backing_shadow_test (default enabled). The vmspace_fork*() code now does a quick all-shadowed test on the first backing object and calls vm_fault_collapse() if it comes back true, regardless of the chain length.
* Add sysctl vm.map_backing_limit (default 5). The vmspace_fork*() code calls vm_fault_collapse() when the ba.backing_ba list exceeds the specified number of entries.
* Performance is a tad faster than the original collapse code.
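A sketch of how the two sysctls might gate the fork-time collapse; the helper and field names are hypothetical, standing in for vm_fault_collapse() and the real entry layout:
```c
/* Tunables corresponding to the sysctls described above. */
static int map_backing_shadow_test = 1;         /* vm.map_backing_shadow_test */
static int map_backing_limit = 5;               /* vm.map_backing_limit */

struct entry_model {
        int backing_count;      /* length of the ba.backing_ba list */
        int first_all_shadowed; /* quick test on the first backing object */
};

/* Hypothetical stand-in for vm_fault_collapse(). */
void collapse_into_front_object(struct entry_model *entry);

/* Called from the vmspace_fork*() path before copying the entry. */
static void
maybe_collapse(struct entry_model *entry)
{
        if (map_backing_shadow_test && entry->first_all_shadowed) {
                /* Front object already shadows everything: collapse
                 * regardless of chain length. */
                collapse_into_front_object(entry);
        } else if (entry->backing_count > map_backing_limit) {
                /* Chain too long: simulate faults to pull pages into
                 * the front object, then truncate the list. */
                collapse_into_front_object(entry);
        }
}
```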
# 9de48ead | 09-May-2019 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - VM rework part 2 - Replace backing_object with backing_ba
* Remove the vm_object based backing_object chains and all related chaining code.
This removes an enormous number of locks from the VM system and also removes object-to-object dependencies which required careful traversal code. A great deal of complex code has been removed and replaced with far simpler code.
Ultimately the intention will be to support removal of pv_entry tracking from vm_pages to gain lockless shared faults, but that is far in the future. It will require hanging vm_map_backing structures off of a list based in the object.
* Implement the vm_map_backing structure which is embedded in the vm_map_entry and then links to additional dynamically allocated vm_map_backing structures via entry->ba.backing_ba. This structure contains the object and offset and essentially takes over the functionality that object->backing_object used to have.
Backing objects are now handled via vm_map_backing. In this commit, fork operations create a fan-in tree to share subsets of backings via vm_map_backing. In this particular commit, these subsets are not collapsed in any way.
* Remove all the vm_map_split and collapse code. Every last line is gone. It will be reimplemented using vm_map_backing in a later commit.
This means that as-of this commit both recursive forks and parent-to-multiple-children forks cause an accumulation of inefficient lists of backing objects to occur in the parent and children. This will begin to get addressed in part 3.
* The code no longer releases the vm_map lock (typically shared) across (get_pages) I/O. There are no longer any chaining locks to get in the way (hopefully). This means that the code does not have to re-check as carefully as it did before. However, some complexity will have to be added back in once we begin to address the accumulation of vm_map_backing structures.
* Paging performance improved by 30-40%
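A simplified picture of the structural change (not the real declarations): the backing chain now hangs off the map entry instead of forming object-to-object links.
```c
#include <stdint.h>

struct vm_object;                               /* opaque here */

/*
 * Replaces object->backing_object / backing_object_offset.  The entry
 * embeds the front layer; deeper layers are dynamically allocated and
 * linked through backing_ba.
 */
struct vm_map_backing {
        struct vm_map_backing *backing_ba;      /* next (deeper) layer or NULL */
        struct vm_object      *object;          /* object for this layer */
        uint64_t               offset;          /* offset into that object */
};

struct vm_map_entry {
        uintptr_t             start, end;
        struct vm_map_backing ba;               /* embedded front layer */
};
```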
# 6f76a56d | 07-May-2019 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - VM rework part 1 - Remove shadow_list
* Remove shadow_head, shadow_list, shadow_count.
* This leaves the kernel operational but without collapse optimizations on 'other' processes when a program exits.
# fcf6efef | 02-Mar-2019 | Sascha Wildner <saw@online.de>
kernel: Remove numerous #include <sys/thread2.h>.
Most of them were added when we converted spl*() calls to crit_enter()/crit_exit(), almost 14 years ago. We can now remove a good chunk of them again for where crit_*() are no longer used.
I had to adjust some files that relied on thread2.h, or on headers it includes, coming in indirectly via other headers from which it was removed.
# 562ffbba | 20-Apr-2018 | Matthew Dillon <dillon@backplane.com>
kernel - Increase vm_object hash table
* Increase table from 64 to 256 entries.
* Improve the hash algorithm considerably for better coverage.
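A toy illustration of the kind of change described, not the kernel's actual hash: more buckets plus a stronger mix so that aligned, pointer-like keys spread across all of them.
```c
#include <stdint.h>
#include <stdio.h>

#define OBJ_HASH_SIZE   256     /* was 64 */
#define OBJ_HASH_MASK   (OBJ_HASH_SIZE - 1)

/*
 * A bare "shift and mask" hash clusters aligned addresses into a few
 * buckets; multiplying by a large odd constant and taking high bits
 * spreads them far better.
 */
static unsigned
obj_hash(uint64_t key)
{
        key *= 0x9e3779b97f4a7c15ULL;           /* golden-ratio multiplier */
        return (unsigned)(key >> 56) & OBJ_HASH_MASK;
}

int
main(void)
{
        for (uint64_t i = 0; i < 8; i++) {
                uint64_t key = 0xffff800000100000ULL + i * 0x180;
                printf("key %#llx -> bucket %u\n",
                    (unsigned long long)key, obj_hash(key));
        }
        return 0;
}
```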
# 641f3b0a | 02-Nov-2017 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Refactor vm_fault and vm_map a bit.
* Allow the virtual copy feature to be disabled via a sysctl. Default enabled.
* Fix a bug in the virtual copy test. Multiple elements were not being retested after reacquiring the map lock.
* Change the auto-partitioning of vm_map_entry structures from 16MB to 32MB. Add a sysctl to allow the feature to be disabled. Default enabled.
* Cleanup map->timestamp bumps. Basically we bump it in vm_map_lock(), and also fix a bug where it was not being bumped after relocking the map in the virtual copy feature.
* Fix an incorrect assertion in vm_map_split(). Refactor tests in vm_map_split(). Also, acquire the chain lock for the VM object in the caller to vm_map_split() instead of in vm_map_split() itself, allowing us to include the pmap adjustment within the locked area.
* Make sure OBJ_ONEMAPPING is cleared for nobject in vm_map_split().
* Fix a bug in a call to vm_map_transition_wait() that double-locked the vm_map in the partitioning code.
* General cleanups in vm/vm_object.c
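The two knobs would be declared with the usual BSD sysctl pattern; the variable names and descriptions below are illustrative, not the actual identifiers from the commit:
```c
#include <sys/param.h>
#include <sys/kernel.h>
#include <sys/sysctl.h>

/* Both features default to enabled, per the commit message. */
static int vm_virtual_copy_enable = 1;
static int vm_map_partition_enable = 1;

SYSCTL_INT(_vm, OID_AUTO, virtual_copy_enable, CTLFLAG_RW,
    &vm_virtual_copy_enable, 0,
    "Allow the virtual copy optimization in vm_fault");
SYSCTL_INT(_vm, OID_AUTO, map_partition_enable, CTLFLAG_RW,
    &vm_map_partition_enable, 0,
    "Auto-partition large vm_map_entry structures in 32MB chunks");
```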
# 46b71cbe | 06-Oct-2017 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Refuse to swapoff under certain conditions
* Both tmpfs and vn can't handle swapoff's method of bringing pages back in from the swap partition being decommissioned.
* Fixing this properly is fairly involved. The normal swapoff procedure is to page swapped-out data back into the related VM object, but tmpfs and vn use their VM objects ONLY to track swap blocks and not for vm_page manipulation, so that just won't work. In addition, the swap code may associate a swap block with a VM object before issuing the write I/O to page out the data, and the swapoff code's asynchronous pagein might cause problems.
For now, just make sure that swapoff refuses to remove the partition under these conditions, so it doesn't blow up tmpfs or vn.
# 0062b9ff | 26-Jan-2017 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Remove object->agg_pv_list_count
* Remove the object->agg_pv_list_count field. It represents an unnecessary global cache bounce, was only being used to help report vkernel RSS, and wasn't working very well anyway.
# 76f1911e | 23-Jan-2017 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - pmap and vkernel work
* Remove the pmap.pm_token entirely. The pmap is currently protected primarily by fine-grained locks and the vm_map lock. The intention is to eventually be able to protect it without the vm_map lock at all.
* Enhance pv_entry acquisition (representing PTE locations) to include a placemarker facility for non-existent PTEs, allowing the PTE location to be locked whether a pv_entry exists for it or not.
* Fix dev_dmmap (struct dev_mmap) (for future use), it was returning a page index for physical memory as a 32-bit integer instead of a 64-bit integer.
* Use pmap_kextract() instead of pmap_extract() where appropriate.
* Put the token contention test back in kern_clock.c for real kernels so token contention shows up as sys% instead of idle%.
* Modify the pmap_extract() API to also return a locked pv_entry, and add pmap_extract_done() to release it. Adjust users of pmap_extract().
* Change madvise/mcontrol MADV_INVAL (used primarily by the vkernel) to use a shared vm_map lock instead of an exclusive lock. This significantly improves the vkernel's performance and significantly reduces stalls and glitches when typing in one under heavy loads.
* The new placemarkers also have the side effect of fixing several difficult-to-reproduce bugs in the pmap code, by ensuring that shared and unmanaged pages are properly locked whereas before only managed pages (with pv_entry's) were properly locked.
* Adjust the vkernel's pmap code to use atomic ops in numerous places.
* Rename the pmap_change_wiring() call to pmap_unwire(). The routine was only being used to unwire (and could only safely be called for unwiring anyway). Remove the unused 'wired' and the 'entry' arguments.
Also change how pmap_unwire() works to remove a small race condition.
* Fix race conditions in the vmspace_*() system calls which could lead to pmap corruption. Note that the vkernel did not trigger any of these conditions, I found them while looking for another bug.
* Add missing maptypes to procfs's /proc/*/map report.
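Assumed usage of the reworked pmap_extract()/pmap_extract_done() pair; the commit confirms the pair exists, but the handle-out-parameter shape below is my guess at the calling convention:
```c
#include <sys/param.h>
#include <vm/vm.h>
#include <vm/pmap.h>

/* Look up a physical address while the matching pv_entry stays locked. */
static vm_paddr_t
lookup_phys(pmap_t pmap, vm_offset_t va)
{
        void *handle;
        vm_paddr_t pa;

        pa = pmap_extract(pmap, va, &handle);   /* pv_entry locked on return */
        /* ... use pa knowing the translation cannot change under us ... */
        pmap_extract_done(handle);              /* release the lock */
        return pa;
}
```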
# a17c6c05 | 18-Jan-2017 | Antonio Huete Jimenez <tuxillo@quantumachine.net>
kernel: Add a new vm_object_init()
# 4f077c8a | 18-Jan-2017 | Antonio Huete Jimenez <tuxillo@quantumachine.net>
kernel: Rename vm_object_init() to vm_object_init1()
- No functional change.
# fde6be6a | 03-Jan-2017 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - vm_object work
* Adjust OBJT_SWAP object management to be more SMP friendly. The hash table now uses a combined structure to reduce unnecessary cache interactions.
* Allocate VM objects via kmalloc() instead of zalloc. Remove the zalloc pool for VM objects and use kmalloc(). Early initialization of the kernel does not have to access vm_object allocation functions until after basic VM initialization.
* Remove a vm_page_cache console warning that is no longer applicable. (It could be triggered by the RSS rlimit handling code).
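The "combined structure" idea can be sketched as one cache-line-sized bucket holding both the lock and the list head, so a lookup touches a single line instead of bouncing two; the layout below is an assumption, not the committed code:
```c
#include <sys/queue.h>

#define CACHE_LINE_SIZE 64

struct swobj_model {
        TAILQ_ENTRY(swobj_model) entry;
        unsigned long handle;
};

/*
 * One bucket = one cache line containing both the lock and the list
 * head, instead of separate lock and head arrays that each bounce a
 * line between CPUs on every lookup.
 */
struct swobj_hash_bucket {
        int lock;                               /* stand-in for a spinlock */
        TAILQ_HEAD(, swobj_model) list;
} __attribute__((aligned(CACHE_LINE_SIZE)));

static struct swobj_hash_bucket swobj_hash[256];
```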
# 534ee349 | 28-Dec-2016 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Implement RLIMIT_RSS, Increase maximum supported swap
* Implement RLIMIT_RSS by forcing pages out to swap if a process's RSS exceeds the rlimit. Currently the algorithm used to choose the pages is fairly unsophisticated (we don't have the luxury of a per-process vm_page_queues[] array).
* Implement the swap_user_async sysctl, default off. This sysctl can be set to 1 to enable asynchronous paging in the RSS code. This is mostly for testing and is not recommended since it allows the process to eat memory more quickly than it can be paged out.
* Reimplement vm.swap_burst_read so the sysctl now specifies the number of pages that are allowed to be burst. Still disabled by default (will be enabled in a followup commit).
* Fix an overflow in the nswap_lowat and nswap_hiwat calculations.
* Refactor some of the pageout code to support synchronous direct paging, which the RSS code uses. The new code also implements a feature that will move clean pages to PQ_CACHE, making them immediately reallocatable.
* Refactor the vm_pageout_deficit variable, using atomic ops.
* Fix an issue in vm_pageout_clean() (originally part of the inactive scan) which prevented clustering from operating properly on write.
* Refactor kern/subr_blist.c and all associated code that uses it, increasing swblk_t from int32_t to int64_t and the supported radix from 31 bits to 63 bits.
This increases the maximum supported swap from 2TB to some ungodly large value. Remember that, by default, space for up to 4 swap devices is preallocated so if you are allocating insane amounts of swap it is best to do it with four equal-sized partitions instead of one so kernel memory is efficiently allocated.
* There are two kernel data structures associated with swap: the blmeta structure, which has approximately a 1:8192 ratio (ram:swap) and is pre-allocated up-front, and the swmeta structure, whose KVA is reserved but not allocated.
The swmeta structure has a 1:341 ratio. It tracks swap assignments for pages in vm_object's. The kernel limits the number of structures to approximately half of physical memory, meaning that if you have a machine with 16GB of ram the maximum amount of swapped-out data you can support with that is 16/2*341 = 2.7TB. Not that you would actually want to eat half your ram to actually do that.
A large system with, say, 128GB of ram, would be able to support 128/2*341 = 21TB of swap. The ultimate limitation is the 512GB of KVM. The swap system can use up to 256GB of this so the maximum swap currently supported by DragonFly on a machine with > 512GB of ram is going to be 256/2*341 = 43TB. To expand this further would require some adjustments to increase the amount of KVM supported by the kernel.
* WARNING! swmeta is allocated via zalloc(). Once allocated, the memory can be reused for swmeta but cannot be freed for use by other subsystems. You should only configure as much swap as you are willing to reserve ram for.
# 3b2f3463 | 18-Jul-2016 | zrj <rimvydas.jasinskas@gmail.com>
sys: Various include guard fixes.
# c66c7e2f | 25-Jan-2016 | zrj <rimvydas.jasinskas@gmail.com>
Correct BSD License clause numbering from 1-2-4 to 1-2-3.
# 15553805 | 30-Dec-2014 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Fix a major (pageable) memory leak
* Under certain relatively easy to reproduce conditions an extra ref_count can be added to a VM object during a fork(), preventing the object from ever being destroyed. Its pages may even be paged out, but the system will eventually run out of swap space too.
* The actual fix is to assign 'map_object = object' in vm_map_insert() (see the diff). The rest of this commit is conditionalized debugging code and code documentation.
* Because this change implements a relatively esoteric feature in the VM system by allowing an anonymous VM object to be extended to cover an area even though it might have a gap (so a new VM object does not have to be allocated), further testing is needed before we can MFC this to the RELEASE branch.
# 99ebfb7c | 06-May-2014 | Sascha Wildner <saw@online.de>
kernel: Fix some boolean_t vs. int confusion.
When boolean_t is defined to be _Bool instead of int (not part of this commit), this is what gcc is sad about.