.\"	$NetBSD: uvm.9,v 1.115 2024/02/04 05:43:06 mrg Exp $
.\"
.\" Copyright (c) 1998 Matthew R. Green
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
.\" OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
.\" IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
.\" INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
.\" BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
.\" LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
.\" AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
.\" OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd March 23, 2015
.Dt UVM 9
.Os
.Sh NAME
.Nm uvm
.Nd virtual memory system external interface
.Sh SYNOPSIS
.In sys/param.h
.In uvm/uvm.h
.Sh DESCRIPTION
The UVM virtual memory system manages access to the computer's memory
resources.
User processes and the kernel access these resources through
UVM's external interface.
UVM's external interface includes functions that:
.Pp
.Bl -hyphen -compact
.It
initialize UVM sub-systems
.It
manage virtual address spaces
.It
resolve page faults
.It
memory map files and devices
.It
perform uio-based I/O to virtual memory
.It
allocate and free kernel virtual memory
.It
allocate and free physical memory
.El
.Pp
In addition to exporting these services, UVM has two kernel-level processes:
pagedaemon and swapper.
The pagedaemon process sleeps until physical memory becomes scarce.
When that happens, pagedaemon is awoken.
It scans physical memory, paging out and freeing memory that has not
been recently used.
The swapper process swaps in runnable processes that are currently swapped
out, if there is room.
.Pp
There are also several miscellaneous functions.
.Sh INITIALIZATION
.Bl -ohang
.It Ft void
.Fn uvm_init "void" ;
.It Ft void
.Fn uvm_init_limits "struct lwp *l" ;
.It Ft void
.Fn uvm_setpagesize "void" ;
.It Ft void
.Fn uvm_swap_init "void" ;
.El
.Pp
.Fn uvm_init
sets up the UVM system at system boot time, after the
console has been set up.
It initializes global state, the page, map, kernel virtual memory state,
machine-dependent physical map, kernel memory allocator,
pager and anonymous memory sub-systems, and then enables
paging of kernel objects.
.Pp
.Fn uvm_init_limits
initializes process limits for the named process.
This is for use by the system startup for process zero, before any
other processes are created.
.Pp
.Fn uvm_md_init
does early boot initialization.
This currently includes
.Fn uvm_setpagesize ,
which initializes the uvmexp members pagesize (if not already done by
machine-dependent code), pageshift and pagemask, and
.Fn uvm_physseg_init ,
which initializes the
.Xr uvm_hotplug 9
subsystem.
It should be called by machine-dependent code early in the
.Fn pmap_init
call (see
.Xr pmap 9 ) .
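.Pp
The relationship between the three page constants named above can be
illustrated with a small userland sketch.
It is not the kernel implementation: the structure
.Vt struct page_consts
and the function
.Fn derive_page_consts
are hypothetical names invented for this example.
The sketch assumes only that the page size is a power of two, that the
mask is the page size minus one, and that the shift is the base-2
logarithm of the page size.
.Bd -literal
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the three uvmexp page constants. */
struct page_consts {
	int pagesize;	/* bytes per page, must be a power of two */
	int pagemask;	/* pagesize - 1: selects the offset within a page */
	int pageshift;	/* log2(pagesize) */
};

/*
 * Derive pagemask and pageshift from pagesize.
 * Returns 0 on success, -1 if pagesize is not a positive power of two.
 */
int
derive_page_consts(struct page_consts *pc)
{
	int shift;

	assert(pc != NULL);
	if (pc->pagesize <= 0 || (pc->pagesize & (pc->pagesize - 1)) != 0)
		return -1;

	pc->pagemask = pc->pagesize - 1;

	/* Count how far 1 must be shifted to reach pagesize. */
	for (shift = 0; (1 << shift) != pc->pagesize; shift++)
		continue;
	pc->pageshift = shift;
	return 0;
}
```
.Ed
.Pp
Under these assumptions, a page size of 4096 gives a mask of 4095 and a
shift of 12, while a size that is not a power of two is rejected.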
.Pp
.Fn uvm_swap_init
initializes the swap sub-system.
.Sh VIRTUAL ADDRESS SPACE MANAGEMENT
See
.Xr uvm_map 9 .
.Sh PAGE FAULT HANDLING
.Bl -ohang
.It Ft int
.Fn uvm_fault "struct vm_map *orig_map" "vaddr_t vaddr" "vm_prot_t access_type" ;
.El
.Pp
.Fn uvm_fault
is the main entry point for faults.
It takes
.Fa orig_map ,
the map in which the fault originated,
.Fa vaddr ,
the address within that map at which the fault occurred, and
.Fa access_type ,
which describes the type of access requested.
.Fn uvm_fault
returns a standard UVM return value.
.Sh MEMORY MAPPING FILES AND DEVICES
See
.Xr ubc 9 .
.Sh VIRTUAL MEMORY I/O
.Bl -ohang
.It Ft int
.Fn uvm_io "struct vm_map *map" "struct uio *uio" ;
.El
.Pp
.Fn uvm_io
performs the I/O described in
.Fa uio
on the memory described in
.Fa map .
.Sh ALLOCATION OF KERNEL MEMORY
See
.Xr uvm_km 9 .
.Sh ALLOCATION OF PHYSICAL MEMORY
.Bl -ohang
.It Ft struct vm_page *
.Fn uvm_pagealloc "struct uvm_object *uobj" "voff_t off" "struct vm_anon *anon" "int flags" ;
.It Ft void
.Fn uvm_pagerealloc "struct vm_page *pg" "struct uvm_object *newobj" "voff_t newoff" ;
.It Ft void
.Fn uvm_pagefree "struct vm_page *pg" ;
.It Ft int
.Fn uvm_pglistalloc "psize_t size" "paddr_t low" "paddr_t high" "paddr_t alignment" "paddr_t boundary" "struct pglist *rlist" "int nsegs" "int waitok" ;
.It Ft void
.Fn uvm_pglistfree "struct pglist *list" ;
.It Ft void
.Fn uvm_page_physload "paddr_t start" "paddr_t end" "paddr_t avail_start" "paddr_t avail_end" "int free_list" ;
.El
.Pp
.Fn uvm_pagealloc
allocates a page of memory at offset
.Fa off
in either the object
.Fa uobj
or the anonymous memory
.Fa anon ,
which must be locked by the caller.
Only one of
.Fa uobj
and
.Fa anon
can be
.Pf non- Dv NULL .
Returns
.Dv NULL
when no page can be found.
The flags can be any of
.Bd -literal
#define UVM_PGA_USERESERVE	0x0001	/* ok to use reserve pages */
#define UVM_PGA_ZERO		0x0002	/* returned page must be zero'd */
.Ed
.Pp
.Dv UVM_PGA_USERESERVE
means to allocate a page even if that will result in the number of free pages
being lower than
.Dv uvmexp.reserve_pagedaemon
(if the current thread is the pagedaemon) or
.Dv uvmexp.reserve_kernel
(if the current thread is not the pagedaemon).
.Dv UVM_PGA_ZERO
causes the returned page to be filled with zeroes, either by allocating it
from a pool of pre-zeroed pages or by zeroing it in-line as necessary.
.Pp
.Fn uvm_pagerealloc
reallocates page
.Fa pg
to a new object
.Fa newobj ,
at a new offset
.Fa newoff .
.Pp
.Fn uvm_pagefree
frees the physical page
.Fa pg .
If the content of the page is known to be zero-filled,
the caller should set
.Dv PG_ZERO
in pg->flags so that the page allocator will use
the page to serve future
.Dv UVM_PGA_ZERO
requests efficiently.
.Pp
.Fn uvm_pglistalloc
allocates a list of pages covering
.Fa size
bytes under various constraints.
.Fa low
and
.Fa high
describe the lowest and highest addresses acceptable for the list.
If
.Fa alignment
is non-zero, it describes the required alignment of the list, in
power-of-two notation.
If
.Fa boundary
is non-zero, no segment of the list may cross this power-of-two
boundary, relative to zero.
.Fa nsegs
is the maximum number of physically contiguous segments.
If
.Fa waitok
is non-zero, the function may sleep until enough memory is available.
(It also may give up in some situations, so a non-zero
.Fa waitok
does not imply that
.Fn uvm_pglistalloc
cannot return an error.)
The allocated memory is returned in the
.Fa rlist
list; the caller only has to provide storage, because the list is
initialized by
.Fn uvm_pglistalloc .
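.Pp
The address constraints described above can be made concrete with a
userland sketch that tests a single candidate segment.
This is only an illustration, not the allocator itself.
The function
.Fn seg_fits
is a hypothetical name, and the sketch assumes that
.Fa high
is an inclusive upper bound and that
.Fa alignment
and
.Fa boundary
are byte values that are either zero or a power of two.
It does not model
.Fa nsegs ,
.Fa waitok ,
or lists made of several segments.
.Bd -literal
```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: return 1 if the physical range
 * [start, start + size) satisfies the constraints, 0 otherwise.
 * Assumptions of this sketch: high is inclusive; alignment and
 * boundary are byte values, zero meaning "no constraint".
 */
int
seg_fits(uint64_t start, uint64_t size, uint64_t low, uint64_t high,
    uint64_t alignment, uint64_t boundary)
{
	uint64_t last;

	assert(size > 0);
	assert((alignment & (alignment - 1)) == 0);	/* 0 or power of 2 */
	assert((boundary & (boundary - 1)) == 0);	/* 0 or power of 2 */

	last = start + size - 1;	/* address of the final byte */

	/* The whole segment must lie within [low, high]. */
	if (start < low || last > high)
		return 0;

	/* The segment must start on a multiple of alignment. */
	if (alignment != 0 && (start & (alignment - 1)) != 0)
		return 0;

	/*
	 * The first and last byte must fall in the same boundary-sized
	 * window, counted from address zero.
	 */
	if (boundary != 0 &&
	    (start & ~(boundary - 1)) != (last & ~(boundary - 1)))
		return 0;

	return 1;
}
```
.Ed
.Pp
For example, under these assumptions a 16 KB segment starting at 0xe000
is rejected for a 64 KB boundary because it would span address 0x10000,
whereas the same segment starting at 0xc000 is accepted.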
.Pp
.Fn uvm_pglistfree
frees the list of pages pointed to by
.Fa list .
If the content of a page is known to be zero-filled,
the caller should set
.Dv PG_ZERO
in pg->flags so that the page allocator will use
the page to serve future
.Dv UVM_PGA_ZERO
requests efficiently.
.Pp
.Fn uvm_page_physload
loads physical memory segments into VM space on the specified
.Fa free_list .
It must be called at system boot time to set up physical memory
management pages.
The arguments describe the
.Fa start
and
.Fa end
of the physical addresses of the segment, and the available start and end
addresses of pages not already in use.
If a system has memory banks of
different speeds, the slower memory should be given a higher
.Fa free_list
value.
.\" XXX expand on "system boot time"!
.Sh PROCESSES
.Bl -ohang
.It Ft void
.Fn uvm_pageout "void" ;
.It Ft void
.Fn uvm_scheduler "void" ;
.El
.Pp
.Fn uvm_pageout
is the main loop for the page daemon.
.Pp
.Fn uvm_scheduler
is the process zero main loop, which is to be called after the
system has finished starting other processes.
It handles the swapping in of runnable, swapped out processes in priority
order.
.Sh PAGE LOAN
.Bl -ohang
.It Ft int
.Fn uvm_loan "struct vm_map *map" "vaddr_t start" "vsize_t len" "void *v" "int flags" ;
.It Ft void
.Fn uvm_unloan "void *v" "int npages" "int flags" ;
.El
.Pp
.Fn uvm_loan
loans pages in a map out to anons or to the kernel.
.Fa map
should be unlocked, and
.Fa start
and
.Fa len
should be multiples of
.Dv PAGE_SIZE .
Argument
.Fa flags
should be one of
.Bd -literal
#define UVM_LOAN_TOANON	0x01	/* loan to anons */
#define UVM_LOAN_TOPAGE	0x02	/* loan to kernel */
.Ed
.Pp
.Fa v
should be a pointer to an array of pointers to
.Li struct vm_anon
or
.Li struct vm_page ,
as appropriate.
The caller has to allocate memory for the array and
ensure it is big enough to hold
.Fa len / PAGE_SIZE
pointers.
Returns 0 for success, or an appropriate error number otherwise.
Note that wired pages cannot be loaned out;
.Fn uvm_loan
will fail in that case.
.Pp
.Fn uvm_unloan
kills loans on pages or anons.
.Fa v
must point to the array of pointers initialized by a previous call to
.Fn uvm_loan .
.Fa npages
should match the number of pages allocated for the loan, which is also the
number of items in the array.
Argument
.Fa flags
should be one of
.Bd -literal
#define UVM_LOAN_TOANON	0x01	/* loan to anons */
#define UVM_LOAN_TOPAGE	0x02	/* loan to kernel */
.Ed
.Pp
and should match what was used for the previous call to
.Fn uvm_loan .
.Sh MISCELLANEOUS FUNCTIONS
.Bl -ohang
.It Ft struct uvm_object *
.Fn uao_create "vsize_t size" "int flags" ;
.It Ft void
.Fn uao_detach "struct uvm_object *uobj" ;
.It Ft void
.Fn uao_reference "struct uvm_object *uobj" ;
.It Ft bool
.Fn uvm_chgkprot "void *addr" "size_t len" "int rw" ;
.It Ft void
.Fn uvm_kernacc "void *addr" "size_t len" "int rw" ;
.It Ft int
.Fn uvm_vslock "struct vmspace *vs" "void *addr" "size_t len" "vm_prot_t prot" ;
.It Ft void
.Fn uvm_vsunlock "struct vmspace *vs" "void *addr" "size_t len" ;
.It Ft void
.Fn uvm_meter "void" ;
.It Ft void
.Fn uvm_proc_fork "struct proc *p1" "struct proc *p2" "bool shared" ;
.It Ft int
.Fn uvm_grow "struct proc *p" "vaddr_t sp" ;
.It Ft void
.Fn uvn_findpages "struct uvm_object *uobj" "voff_t offset" "int *npagesp" "struct vm_page **pps" "int flags" ;
.It Ft void
.Fn uvm_vnp_setsize "struct vnode *vp" "voff_t newsize" ;
.El
.Pp
The
.Fn uao_create ,
.Fn uao_detach ,
and
.Fn uao_reference
functions operate on anonymous memory objects, such as those used to support
System V shared memory.
.Fn uao_create
returns an object of size
.Fa size
with flags:
.Bd -literal
#define UAO_FLAG_KERNOBJ	0x1	/* create kernel object */
#define UAO_FLAG_KERNSWAP	0x2	/* enable kernel swap */
.Ed
.Pp
which can only be used once each at system boot time.
.Fn uao_reference
creates an additional reference to the named anonymous memory object.
.Fn uao_detach
removes a reference from the named anonymous memory object, destroying
it if removing the last reference.
.Pp
.Fn uvm_chgkprot
changes the protection of kernel memory from
.Fa addr
to
.Fa addr + len
to the value of
.Fa rw .
This is primarily useful for debuggers, for setting breakpoints.
This function is only available with options
.Dv KGDB .
.Pp
.Fn uvm_kernacc
checks whether the range from
.Fa addr
to
.Fa addr + len
permits
.Fa rw
access in the kernel address space.
.Pp
.Fn uvm_vslock
and
.Fn uvm_vsunlock
control the wiring and unwiring of pages for the address space
.Fa vs
from
.Fa addr
to
.Fa addr + len .
These functions are normally used to wire memory for I/O.
.Pp
.Fn uvm_meter
calculates the load average.
.Pp
.Fn uvm_proc_fork
forks a virtual address space for the (old) process
.Fa p1
and the (new) process
.Fa p2 .
If the
.Fa shared
argument is non-zero,
.Fa p1
shares its address space with
.Fa p2 ;
otherwise a new address space is created.
This function currently has no return value, and thus cannot fail.
In the future, this function will be changed to allow it to
fail in low memory conditions.
.Pp
.Fn uvm_grow
increases the stack segment of process
.Fa p
to include
.Fa sp .
.Pp
.Fn uvn_findpages
looks up or creates pages in
.Fa uobj
at offset
.Fa offset ,
marks them busy and returns them in the
.Fa pps
array.
Currently
.Fa uobj
must be a vnode object.
The number of pages requested is pointed to by
.Fa npagesp ,
and this value is updated with the actual number of pages returned.
The flags can be any bitwise inclusive-or of:
.Pp
.Bl -tag -offset abcd -compact -width UVM_ADV_SEQUENTIAL
.It Dv UFP_ALL
Zero pseudo-flag meaning return all pages.
.It Dv UFP_NOWAIT
Don't sleep \(em yield
.Dv NULL
for busy pages or for uncached pages for which allocation would sleep.
.It Dv UFP_NOALLOC
Don't allocate \(em yield
.Dv NULL
for uncached pages.
.It Dv UFP_NOCACHE
Don't use cached pages \(em yield
.Dv NULL
instead.
.It Dv UFP_NORDONLY
Don't yield read-only pages \(em yield
.Dv NULL
for pages marked
.Dv PG_READONLY .
.It Dv UFP_DIRTYONLY
Don't yield clean pages \(em stop early at the first clean one.
As a side effect, mark yielded dirty pages clean.
Caller must write them to permanent storage before unbusying.
.It Dv UFP_BACKWARD
Traverse pages in reverse order.
If
.Fn uvn_findpages
returns early, it will have filled
.Li * Ns Fa npagesp
entries at the end of
.Fa pps
rather than the beginning.
.El
.Pp
.Fn uvm_vnp_setsize
sets the size of vnode
.Fa vp
to
.Fa newsize .
Caller must hold a reference to the vnode.
If the vnode shrinks, pages no longer used are discarded.
.Sh MISCELLANEOUS MACROS
.Bl -ohang
.It Ft paddr_t
.Fn atop "paddr_t pa" ;
.It Ft paddr_t
.Fn ptoa "paddr_t pn" ;
.It Ft paddr_t
.Fn round_page "address" ;
.It Ft paddr_t
.Fn trunc_page "address" ;
.El
.Pp
The
.Fn atop
macro converts a physical address
.Fa pa
into a page number.
The
.Fn ptoa
macro does the opposite by converting a page number
.Fa pn
into a physical address.
.Pp
The
.Fn round_page
and
.Fn trunc_page
macros round
.Fa address
up and down, respectively, to the nearest page boundary.
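.Pp
The arithmetic behind these four macros can be sketched in userland C.
The functions below are not the kernel macros: the
.Li ex_
and
.Li EX_
prefixed names are hypothetical, and the 4096-byte page size is an
assumption made for the example, since the real page size is
machine-dependent.
.Bd -literal
```c
#include <assert.h>
#include <stdint.h>

/* Assumed page geometry for this sketch only: 4096-byte pages. */
#define EX_PAGE_SHIFT	12
#define EX_PAGE_SIZE	((uint64_t)1 << EX_PAGE_SHIFT)
#define EX_PAGE_MASK	(EX_PAGE_SIZE - 1)

/* Physical address to page number: drop the offset bits. */
uint64_t
ex_atop(uint64_t pa)
{
	return pa >> EX_PAGE_SHIFT;
}

/* Page number to the physical address of the start of that page. */
uint64_t
ex_ptoa(uint64_t pn)
{
	assert(pn <= (UINT64_MAX >> EX_PAGE_SHIFT));	/* no overflow */
	return pn << EX_PAGE_SHIFT;
}

/* Round down to the start of the containing page. */
uint64_t
ex_trunc_page(uint64_t x)
{
	return x & ~EX_PAGE_MASK;
}

/* Round up to the next page boundary; aligned values are unchanged. */
uint64_t
ex_round_page(uint64_t x)
{
	return (x + EX_PAGE_MASK) & ~EX_PAGE_MASK;
}
```
.Ed
.Pp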
These macros work for either addresses or byte counts.
.Sh SYSCTL
UVM provides support for the
.Dv CTL_VM
domain of the
.Xr sysctl 3
hierarchy.
It handles the
.Dv VM_LOADAVG ,
.Dv VM_METER ,
.Dv VM_UVMEXP ,
and
.Dv VM_UVMEXP2
nodes, which return the current load averages, freshly calculated VM
totals, the uvmexp structure, and a kernel-version-independent
view of the uvmexp structure, respectively.
It also exports a number of tunables that control how much VM space is
allowed to be consumed by various tasks.
The load averages are typically accessed from userland using the
.Xr getloadavg 3
function.
The uvmexp structure holds all global state of the UVM system,
and has the following members:
.Bd -literal
/* vm_page constants */
int pagesize;	/* size of a page (PAGE_SIZE): must be power of 2 */
int pagemask;	/* page mask */
int pageshift;	/* page shift */

/* vm_page counters */
int npages;	/* number of pages we manage */
int free;	/* number of free pages */
int paging;	/* number of pages in the process of being paged out */
int wired;	/* number of wired pages */
int reserve_pagedaemon;	/* number of pages reserved for pagedaemon */
int reserve_kernel;	/* number of pages reserved for kernel */

/* pageout params */
int freemin;	/* min number of free pages */
int freetarg;	/* target number of free pages */
int inactarg;	/* target number of inactive pages */
int wiredmax;	/* max number of wired pages */

/* swap */
int nswapdev;	/* number of configured swap devices in system */
int swpages;	/* number of PAGE_SIZE'ed swap pages */
int swpginuse;	/* number of swap pages in use */
int nswget;	/* number of times fault calls uvm_swap_get() */
int nanon;	/* total number of anons in system */
int nfreeanon;	/* number of free anons */

/* stat counters */
int faults;	/* page fault count */
int traps;	/* trap count */
int intrs;	/* interrupt count */
int swtch;	/* context switch count */
int softs;	/* software interrupt count */
int syscalls;	/* system calls */
int pageins;	/* pagein operation count */
		/* pageouts are in pdpageouts below */
int pgswapin;	/* pages swapped in */
int pgswapout;	/* pages swapped out */
int forks;	/* forks */
int forks_ppwait;	/* forks where parent waits */
int forks_sharevm;	/* forks where vmspace is shared */

/* fault subcounters */
int fltnoram;	/* number of times fault was out of ram */
int fltnoanon;	/* number of times fault was out of anons */
int fltpgwait;	/* number of times fault had to wait on a page */
int fltpgrele;	/* number of times fault found a released page */
int fltrelck;	/* number of times fault relock called */
int fltrelckok;	/* number of times fault relock is a success */
int fltanget;	/* number of times fault gets anon page */
int fltanretry;	/* number of times fault retries an anon get */
int fltamcopy;	/* number of times fault clears "needs copy" */
int fltnamap;	/* number of times fault maps a neighbor anon page */
int fltnomap;	/* number of times fault maps a neighbor obj page */
int fltlget;	/* number of times fault does a locked pgo_get */
int fltget;	/* number of times fault does an unlocked get */
int flt_anon;	/* number of times fault anon (case 1a) */
int flt_acow;	/* number of times fault anon cow (case 1b) */
int flt_obj;	/* number of times fault is on object page (2a) */
int flt_prcopy;	/* number of times fault promotes with copy (2b) */
int flt_przero;	/* number of times fault promotes with zerofill (2b) */

/* daemon counters */
int pdwoke;	/* number of times daemon woke up */
int pdrevs;	/* number of times daemon rev'd clock hand */
int pdfreed;	/* number of pages daemon freed since boot */
int pdscans;	/* number of pages daemon scanned since boot */
int pdanscan;	/* number of anonymous pages scanned by daemon */
int pdobscan;	/* number of object pages scanned by daemon */
int pdreact;	/* number of pages daemon reactivated since boot */
int pdbusy;	/* number of times daemon found a busy page */
int pdpageouts;	/* number of times daemon started a pageout */
int pdpending;	/* number of times daemon got a pending pageout */
int pddeact;	/* number of pages daemon deactivates */
.Ed
.Sh NOTES
.Fn uvm_chgkprot
is only available if the kernel has been compiled with options
.Dv KGDB .
.Pp
All structures and types whose names begin with
.Dq vm_
will be renamed to
.Dq uvm_ .
.Sh SEE ALSO
.Xr swapctl 2 ,
.Xr getloadavg 3 ,
.Xr kvm 3 ,
.Xr sysctl 3 ,
.Xr ddb 4 ,
.Xr options 4 ,
.Xr memoryallocators 9 ,
.Xr pmap 9 ,
.Xr ubc 9 ,
.Xr uvm_km 9 ,
.Xr uvm_map 9
.Rs
.%A Charles D. Cranor
.%A Gurudatta M. Parulkar
.%T "The UVM Virtual Memory System"
.%I USENIX Association
.%B Proceedings of the USENIX Annual Technical Conference
.%P 117-130
.%D June 6-11, 1999
.%U http://www.usenix.org/event/usenix99/full_papers/cranor/cranor.pdf
.Re
.Sh HISTORY
UVM is a new VM system developed at Washington University in St. Louis
(Missouri).
UVM's roots lie partly in the Mach-based
.Bx 4.4
VM system, the
.Fx
VM system, and the SunOS 4 VM system.
UVM's basic structure is based on the
.Bx 4.4
VM system.
UVM's new anonymous memory system is based on the
anonymous memory system found in the SunOS 4 VM (as described in papers
published by Sun Microsystems, Inc.).
UVM also includes a number of features new to
.Bx
including page loanout, map entry passing, simplified
copy-on-write, and clustered anonymous memory pageout.
UVM is also further documented in an August 1998 dissertation by
Charles D. Cranor.
.Pp
UVM appeared in
.Nx 1.4 .
.Sh AUTHORS
.An -nosplit
.An Charles D. Cranor
.Aq Mt chuck@ccrc.wustl.edu
designed and implemented UVM.
.Pp
.An Matthew Green
.Aq Mt mrg@eterna23.net
wrote the swap-space management code and handled the logistical issues
involved with merging UVM into the
.Nx
source tree.
.Pp
.An Chuck Silvers
.Aq Mt chuq@chuq.com
implemented the aobj pager, thus allowing UVM to support System V shared
memory and process swapping.