.\" $NetBSD: uvm.9,v 1.110 2015/03/23 08:19:12 riastradh Exp $
.\"
.\" Copyright (c) 1998 Matthew R. Green
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
.\" OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
.\" IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
.\" INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
.\" BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
.\" LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
.\" AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
.\" OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd March 23, 2015
.Dt UVM 9
.Os
.Sh NAME
.Nm uvm
.Nd virtual memory system external interface
.Sh SYNOPSIS
.In sys/param.h
.In uvm/uvm.h
.Sh DESCRIPTION
The UVM virtual memory system manages access to the computer's memory
resources.
User processes and the kernel access these resources through
UVM's external interface.
UVM's external interface includes functions that:
.Pp
.Bl -hyphen -compact
.It
initialize UVM sub-systems
.It
manage virtual address spaces
.It
resolve page faults
.It
memory map files and devices
.It
perform uio-based I/O to virtual memory
.It
allocate and free kernel virtual memory
.It
allocate and free physical memory
.El
.Pp
In addition to exporting these services, UVM has two kernel-level processes:
pagedaemon and swapper.
The pagedaemon process sleeps until physical memory becomes scarce.
When that happens, pagedaemon is awoken.
It scans physical memory, paging out and freeing memory that has not
been recently used.
The swapper process swaps in runnable processes that are currently swapped
out, if there is room.
.Pp
There are also several miscellaneous functions.
.Sh INITIALIZATION
.Bl -ohang
.It Ft void
.Fn uvm_init "void" ;
.It Ft void
.Fn uvm_init_limits "struct lwp *l" ;
.It Ft void
.Fn uvm_setpagesize "void" ;
.It Ft void
.Fn uvm_swap_init "void" ;
.El
.Pp
.Fn uvm_init
sets up the UVM system at system boot time, after the
console has been set up.
It initializes global state, the page, map, kernel virtual memory state,
machine-dependent physical map, kernel memory allocator,
pager and anonymous memory sub-systems, and then enables
paging of kernel objects.
.Pp
.Fn uvm_init_limits
initializes process limits for the named process.
This is for use by the system startup for process zero, before any
other processes are created.
.Pp
.Fn uvm_setpagesize
initializes the uvmexp members pagesize (if not already done by
machine-dependent code), pageshift and pagemask.
It should be called by machine-dependent code early in the
.Fn pmap_init
call (see
.Xr pmap 9 ) .
.Pp
.Fn uvm_swap_init
initializes the swap sub-system.
.Sh VIRTUAL ADDRESS SPACE MANAGEMENT
See
.Xr uvm_map 9 .
.Sh PAGE FAULT HANDLING
.Bl -ohang
.It Ft int
.Fn uvm_fault "struct vm_map *orig_map" "vaddr_t vaddr" "vm_prot_t access_type" ;
.El
.Pp
.Fn uvm_fault
is the main entry point for faults.
It takes
.Fa orig_map
as the map the fault originated in, a
.Fa vaddr
offset into the map at which the fault occurred, and
.Fa access_type
describing the type of access requested.
.Fn uvm_fault
returns a standard UVM return value.
.Sh MEMORY MAPPING FILES AND DEVICES
See
.Xr ubc 9 .
.Sh VIRTUAL MEMORY I/O
.Bl -ohang
.It Ft int
.Fn uvm_io "struct vm_map *map" "struct uio *uio" ;
.El
.Pp
.Fn uvm_io
performs the I/O described in
.Fa uio
on the memory described in
.Fa map .
.Sh ALLOCATION OF KERNEL MEMORY
See
.Xr uvm_km 9 .
.Sh ALLOCATION OF PHYSICAL MEMORY
.Bl -ohang
.It Ft struct vm_page *
.Fn uvm_pagealloc "struct uvm_object *uobj" "voff_t off" "struct vm_anon *anon" "int flags" ;
.It Ft void
.Fn uvm_pagerealloc "struct vm_page *pg" "struct uvm_object *newobj" "voff_t newoff" ;
.It Ft void
.Fn uvm_pagefree "struct vm_page *pg" ;
.It Ft int
.Fn uvm_pglistalloc "psize_t size" "paddr_t low" "paddr_t high" "paddr_t alignment" "paddr_t boundary" "struct pglist *rlist" "int nsegs" "int waitok" ;
.It Ft void
.Fn uvm_pglistfree "struct pglist *list" ;
.It Ft void
.Fn uvm_page_physload "paddr_t start" "paddr_t end" "paddr_t avail_start" "paddr_t avail_end" "int free_list" ;
.El
.Pp
.Fn uvm_pagealloc
allocates a page of memory at virtual address
.Fa off
in either the object
.Fa uobj
or the anonymous memory
.Fa anon ,
which must be locked by the caller.
Only one of
.Fa uobj
and
.Fa anon
can be non
.Dv NULL .
It returns
.Dv NULL
when no page can be found.
The flags can be any of
.Bd -literal
#define UVM_PGA_USERESERVE	0x0001	/* ok to use reserve pages */
#define UVM_PGA_ZERO		0x0002	/* returned page must be zero'd */
.Ed
.Pp
.Dv UVM_PGA_USERESERVE
means to allocate a page even if that will result in the number of free pages
being lower than
.Dv uvmexp.reserve_pagedaemon
(if the current thread is the pagedaemon) or
.Dv uvmexp.reserve_kernel
(if the current thread is not the pagedaemon).
.Dv UVM_PGA_ZERO
causes the returned page to be filled with zeroes, either by allocating it
from a pool of pre-zeroed pages or by zeroing it in-line as necessary.
.Pp
.Fn uvm_pagerealloc
reallocates page
.Fa pg
to a new object
.Fa newobj ,
at a new offset
.Fa newoff .
.Pp
.Fn uvm_pagefree
frees the physical page
.Fa pg .
If the content of the page is known to be zero-filled,
the caller should set
.Dv PG_ZERO
in pg-\*[Gt]flags so that the page allocator will use
the page to serve future
.Dv UVM_PGA_ZERO
requests efficiently.
.Pp
.Fn uvm_pglistalloc
allocates a list of pages totalling
.Fa size
bytes under various constraints.
.Fa low
and
.Fa high
describe the lowest and highest addresses acceptable for the list.
If
.Fa alignment
is non-zero, it describes the required alignment of the list, in
power-of-two notation.
If
.Fa boundary
is non-zero, no segment of the list may cross this power-of-two
boundary, relative to zero.
.Fa nsegs
is the maximum number of physically contiguous segments.
If
.Fa waitok
is non-zero, the function may sleep until enough memory is available.
(It also may give up in some situations, so a non-zero
.Fa waitok
does not imply that
.Fn uvm_pglistalloc
cannot return an error.)
The allocated memory is returned in the
.Fa rlist
list; the caller has to provide storage only, the list is initialized by
.Fn uvm_pglistalloc .
.Pp
.Fn uvm_pglistfree
frees the list of pages pointed to by
.Fa list .
If the content of the page is known to be zero-filled,
the caller should set
.Dv PG_ZERO
in pg-\*[Gt]flags so that the page allocator will use
the page to serve future
.Dv UVM_PGA_ZERO
requests efficiently.
.Pp
.Fn uvm_page_physload
loads physical memory segments into VM space on the specified
.Fa free_list .
It must be called at system boot time to set up physical memory
management pages.
The arguments describe the
.Fa start
and
.Fa end
of the physical addresses of the segment, and the available start and end
addresses of pages not already in use.
If a system has memory banks of
different speeds the slower memory should be given a higher
.Fa free_list
value.
.\" XXX expand on "system boot time"!
.Sh PROCESSES
.Bl -ohang
.It Ft void
.Fn uvm_pageout "void" ;
.It Ft void
.Fn uvm_scheduler "void" ;
.El
.Pp
.Fn uvm_pageout
is the main loop for the page daemon.
.Pp
.Fn uvm_scheduler
is the process zero main loop, which is to be called after the
system has finished starting other processes.
It handles the swapping in of runnable, swapped out processes in priority
order.
.Sh PAGE LOAN
.Bl -ohang
.It Ft int
.Fn uvm_loan "struct vm_map *map" "vaddr_t start" "vsize_t len" "void *v" "int flags" ;
.It Ft void
.Fn uvm_unloan "void *v" "int npages" "int flags" ;
.El
.Pp
.Fn uvm_loan
loans pages in a map out to anons or to the kernel.
.Fa map
should be unlocked;
.Fa start
and
.Fa len
should be multiples of
.Dv PAGE_SIZE .
Argument
.Fa flags
should be one of
.Bd -literal
#define UVM_LOAN_TOANON	0x01	/* loan to anons */
#define UVM_LOAN_TOPAGE	0x02	/* loan to kernel */
.Ed
.Pp
.Fa v
should be a pointer to an array of pointers to
.Li struct vm_anon
or
.Li struct vm_page ,
as appropriate.
The caller has to allocate memory for the array and
ensure it's big enough to hold
.Fa len / PAGE_SIZE
pointers.
Returns 0 for success, or an appropriate error number otherwise.
Note that wired pages can't be loaned out and
.Fn uvm_loan
will fail in that case.
.Pp
.Fn uvm_unloan
kills loans on pages or anons.
Argument
.Fa v
must point to the array of pointers initialized by a previous call to
.Fn uvm_loan .
.Fa npages
should match the number of pages allocated for the loan, which is also the
number of items in the array.
Argument
.Fa flags
should be one of
.Bd -literal
#define UVM_LOAN_TOANON	0x01	/* loan to anons */
#define UVM_LOAN_TOPAGE	0x02	/* loan to kernel */
.Ed
.Pp
and should match what was used for the previous call to
.Fn uvm_loan .
.Sh MISCELLANEOUS FUNCTIONS
.Bl -ohang
.It Ft struct uvm_object *
.Fn uao_create "vsize_t size" "int flags" ;
.It Ft void
.Fn uao_detach "struct uvm_object *uobj" ;
.It Ft void
.Fn uao_reference "struct uvm_object *uobj" ;
.It Ft bool
.Fn uvm_chgkprot "void *addr" "size_t len" "int rw" ;
.It Ft void
.Fn uvm_kernacc "void *addr" "size_t len" "int rw" ;
.It Ft int
.Fn uvm_vslock "struct vmspace *vs" "void *addr" "size_t len" "vm_prot_t prot" ;
.It Ft void
.Fn uvm_vsunlock "struct vmspace *vs" "void *addr" "size_t len" ;
.It Ft void
.Fn uvm_meter "void" ;
.It Ft void
.Fn uvm_proc_fork "struct proc *p1" "struct proc *p2" "bool shared" ;
.It Ft int
.Fn uvm_grow "struct proc *p" "vaddr_t sp" ;
.It Ft void
.Fn uvn_findpages "struct uvm_object *uobj" "voff_t offset" "int *npagesp" "struct vm_page **pps" "int flags" ;
.It Ft void
.Fn uvm_vnp_setsize "struct vnode *vp" "voff_t newsize" ;
.El
.Pp
The
.Fn uao_create ,
.Fn uao_detach ,
and
.Fn uao_reference
functions operate on anonymous memory objects, such as those used to support
System V shared memory.
.Fn uao_create
returns an object of size
.Fa size
with flags:
.Bd -literal
#define UAO_FLAG_KERNOBJ	0x1	/* create kernel object */
#define UAO_FLAG_KERNSWAP	0x2	/* enable kernel swap */
.Ed
.Pp
which can only be used once each at system boot time.
.Fn uao_reference
creates an additional reference to the named anonymous memory object.
.Fn uao_detach
removes a reference from the named anonymous memory object, destroying
it if removing the last reference.
.Pp
.Fn uvm_chgkprot
changes the protection of kernel memory from
.Fa addr
to
.Fa addr + len
to the value of
.Fa rw .
This is primarily useful for debuggers, for setting breakpoints.
This function is only available with options
.Dv KGDB .
.Pp
.Fn uvm_kernacc
checks the access at address
.Fa addr
to
.Fa addr + len
for
.Fa rw
access in the kernel address space.
.Pp
.Fn uvm_vslock
and
.Fn uvm_vsunlock
control the wiring and unwiring of pages in the address space
.Fa vs
from
.Fa addr
to
.Fa addr + len .
These functions are normally used to wire memory for I/O.
.Pp
.Fn uvm_meter
calculates the load average.
.Pp
.Fn uvm_proc_fork
forks a virtual address space for the (old) process
.Fa p1
and the (new) process
.Fa p2 .
If the
.Fa shared
argument is non-zero,
.Fa p1
shares its address space with
.Fa p2 ,
otherwise a new address space is created.
This function currently has no return value, and thus cannot fail.
In the future, this function will be changed to allow it to
fail in low memory conditions.
.Pp
.Fn uvm_grow
increases the stack segment of process
.Fa p
to include
.Fa sp .
.Pp
.Fn uvn_findpages
looks up or creates pages in
.Fa uobj
at offset
.Fa offset ,
marks them busy and returns them in the
.Fa pps
array.
Currently
.Fa uobj
must be a vnode object.
The number of pages requested is pointed to by
.Fa npagesp ,
and this value is updated with the actual number of pages returned.
The flags can be any bitwise inclusive-or of:
.Pp
.Bl -tag -offset abcd -compact -width UVM_ADV_SEQUENTIAL
.It Dv UFP_ALL
Zero pseudo-flag meaning return all pages.
.It Dv UFP_NOWAIT
Don't sleep -- yield
.Dv NULL
for busy pages or for uncached pages for which allocation would sleep.
.It Dv UFP_NOALLOC
Don't allocate -- yield
.Dv NULL
for uncached pages.
.It Dv UFP_NOCACHE
Don't use cached pages -- yield
.Dv NULL
instead.
.It Dv UFP_NORDONLY
Don't yield read-only pages -- yield
.Dv NULL
for pages marked
.Dv PG_READONLY .
.It Dv UFP_DIRTYONLY
Don't yield clean pages -- stop early at the first clean one.
As a side effect, mark yielded dirty pages clean.
Caller must write them to permanent storage before unbusying.
.It Dv UFP_BACKWARD
Traverse pages in reverse order.
If
.Fn uvn_findpages
returns early, it will have filled
.Li * Ns Fa npagesp
entries at the end of
.Fa pps
rather than the beginning.
.El
.Pp
.Fn uvm_vnp_setsize
sets the size of vnode
.Fa vp
to
.Fa newsize .
Caller must hold a reference to the vnode.
If the vnode shrinks, pages no longer used are discarded.
.Sh MISCELLANEOUS MACROS
.Bl -ohang
.It Ft paddr_t
.Fn atop "paddr_t pa" ;
.It Ft paddr_t
.Fn ptoa "paddr_t pn" ;
.It Ft paddr_t
.Fn round_page "address" ;
.It Ft paddr_t
.Fn trunc_page "address" ;
.El
.Pp
The
.Fn atop
macro converts a physical address
.Fa pa
into a page number.
The
.Fn ptoa
macro does the opposite by converting a page number
.Fa pn
into a physical address.
.Pp
The
.Fn round_page
and
.Fn trunc_page
macros return a page address boundary from rounding
.Fa address
up and down, respectively, to the nearest page boundary.
These macros work for either addresses or byte counts.
.Sh SYSCTL
UVM provides support for the
.Dv CTL_VM
domain of the
.Xr sysctl 3
hierarchy.
It handles the
.Dv VM_LOADAVG ,
.Dv VM_METER ,
.Dv VM_UVMEXP ,
and
.Dv VM_UVMEXP2
nodes, which return the current load averages, the current VM
totals, the uvmexp structure, and a kernel version independent
view of the uvmexp structure, respectively.
It also exports a number of tunables that control how much VM space is
allowed to be consumed by various tasks.
The load averages are typically accessed from userland using the
.Xr getloadavg 3
function.
The uvmexp structure has all global state of the UVM system,
and has the following members:
.Bd -literal
/* vm_page constants */
int pagesize;	/* size of a page (PAGE_SIZE): must be power of 2 */
int pagemask;	/* page mask */
int pageshift;	/* page shift */

/* vm_page counters */
int npages;	/* number of pages we manage */
int free;	/* number of free pages */
int paging;	/* number of pages in the process of being paged out */
int wired;	/* number of wired pages */
int reserve_pagedaemon;	/* number of pages reserved for pagedaemon */
int reserve_kernel;	/* number of pages reserved for kernel */

/* pageout params */
int freemin;	/* min number of free pages */
int freetarg;	/* target number of free pages */
int inactarg;	/* target number of inactive pages */
int wiredmax;	/* max number of wired pages */

/* swap */
int nswapdev;	/* number of configured swap devices in system */
int swpages;	/* number of PAGE_SIZE'ed swap pages */
int swpginuse;	/* number of swap pages in use */
int nswget;	/* number of times fault calls uvm_swap_get() */
int nanon;	/* number total of anon's in system */
int nfreeanon;	/* number of free anon's */

/* stat counters */
int faults;	/* page fault count */
int traps;	/* trap count */
int intrs;	/* interrupt count */
int swtch;	/* context switch count */
int softs;	/* software interrupt count */
int syscalls;	/* system calls */
int pageins;	/* pagein operation count */
		/* pageouts are in pdpageouts below */
int pgswapin;	/* pages swapped in */
int pgswapout;	/* pages swapped out */
int forks;	/* forks */
int forks_ppwait;	/* forks where parent waits */
int forks_sharevm;	/* forks where vmspace is shared */

/* fault subcounters */
int fltnoram;	/* number of times fault was out of ram */
int fltnoanon;	/* number of times fault was out of anons */
int fltpgwait;	/* number of times fault had to wait on a page */
int fltpgrele;	/* number of times fault found a released page */
int fltrelck;	/* number of times fault relock called */
int fltrelckok;	/* number of times fault relock is a success */
int fltanget;	/* number of times fault gets anon page */
int fltanretry;	/* number of times fault retrys an anon get */
int fltamcopy;	/* number of times fault clears "needs copy" */
int fltnamap;	/* number of times fault maps a neighbor anon page */
int fltnomap;	/* number of times fault maps a neighbor obj page */
int fltlget;	/* number of times fault does a locked pgo_get */
int fltget;	/* number of times fault does an unlocked get */
int flt_anon;	/* number of times fault anon (case 1a) */
int flt_acow;	/* number of times fault anon cow (case 1b) */
int flt_obj;	/* number of times fault is on object page (2a) */
int flt_prcopy;	/* number of times fault promotes with copy (2b) */
int flt_przero;	/* number of times fault promotes with zerofill (2b) */

/* daemon counters */
int pdwoke;	/* number of times daemon woke up */
int pdrevs;	/* number of times daemon rev'd clock hand */
int pdfreed;	/* number of pages daemon freed since boot */
int pdscans;	/* number of pages daemon scanned since boot */
int pdanscan;	/* number of anonymous pages scanned by daemon */
int pdobscan;	/* number of object pages scanned by daemon */
int pdreact;	/* number of pages daemon reactivated since boot */
int pdbusy;	/* number of times daemon found a busy page */
int pdpageouts;	/* number of times daemon started a pageout */
int pdpending;	/* number of times daemon got a pending pageout */
int pddeact;	/* number of pages daemon deactivates */
.Ed
.Sh NOTES
.Fn uvm_chgkprot
is only available if the kernel has been compiled with options
.Dv KGDB .
.Pp
All structures and types whose names begin with
.Dq vm_
will be renamed to
.Dq uvm_ .
.Sh SEE ALSO
.Xr swapctl 2 ,
.Xr getloadavg 3 ,
.Xr kvm 3 ,
.Xr sysctl 3 ,
.Xr ddb 4 ,
.Xr options 4 ,
.Xr memoryallocators 9 ,
.Xr pmap 9 ,
.Xr ubc 9 ,
.Xr uvm_km 9 ,
.Xr uvm_map 9
.Rs
.%A Charles D. Cranor
.%A Gurudatta M. Parulkar
.%T "The UVM Virtual Memory System"
.%I USENIX Association
.%B Proceedings of the USENIX Annual Technical Conference
.%P 117-130
.%D June 6-11, 1999
.%U http://www.usenix.org/event/usenix99/full_papers/cranor/cranor.pdf
.Re
.Sh HISTORY
UVM is a new VM system developed at Washington University in St. Louis
(Missouri).
UVM's roots lie partly in the Mach-based
.Bx 4.4
VM system, the
.Fx
VM system, and the SunOS 4 VM system.
UVM's basic structure is based on the
.Bx 4.4
VM system.
UVM's new anonymous memory system is based on the
anonymous memory system found in the SunOS 4 VM (as described in papers
published by Sun Microsystems, Inc.).
UVM also includes a number of features new to
.Bx
including page loanout, map entry passing, simplified
copy-on-write, and clustered anonymous memory pageout.
UVM is also further documented in an August 1998 dissertation by
Charles D. Cranor.
.Pp
UVM appeared in
.Nx 1.4 .
.Sh AUTHORS
.An -nosplit
.An Charles D. Cranor
.Aq Mt chuck@ccrc.wustl.edu
designed and implemented UVM.
.Pp
.An Matthew Green
.Aq Mt mrg@eterna.com.au
wrote the swap-space management code and handled the logistical issues
involved with merging UVM into the
.Nx
source tree.
.Pp
.An Chuck Silvers
.Aq Mt chuq@chuq.com
implemented the aobj pager, thus allowing UVM to support System V shared
memory and process swapping.