.\" $NetBSD: uvm.9,v 1.107 2012/07/02 21:10:31 jym Exp $
.\"
.\" Copyright (c) 1998 Matthew R. Green
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
.\" OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
.\" IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
.\" INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
.\" BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
.\" LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
.\" AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
.\" OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd July 2, 2012
.Dt UVM 9
.Os
.Sh NAME
.Nm uvm
.Nd virtual memory system external interface
.Sh SYNOPSIS
.In sys/param.h
.In uvm/uvm.h
.Sh DESCRIPTION
The UVM virtual memory system manages access to the computer's memory
resources.
User processes and the kernel access these resources through
UVM's external interface.
UVM's external interface includes functions that:
.Pp
.Bl -hyphen -compact
.It
initialize UVM sub-systems
.It
manage virtual address spaces
.It
resolve page faults
.It
memory map files and devices
.It
perform uio-based I/O to virtual memory
.It
allocate and free kernel virtual memory
.It
allocate and free physical memory
.El
.Pp
In addition to exporting these services, UVM has two kernel-level processes:
pagedaemon and swapper.
The pagedaemon process sleeps until physical memory becomes scarce.
When that happens, pagedaemon is awoken.
It scans physical memory, paging out and freeing memory that has not
been recently used.
The swapper process swaps in runnable processes that are currently swapped
out, if there is room.
.Pp
There are also several miscellaneous functions.
.Sh INITIALIZATION
.Bl -ohang
.It Ft void
.Fn uvm_init "void" ;
.It Ft void
.Fn uvm_init_limits "struct lwp *l" ;
.It Ft void
.Fn uvm_setpagesize "void" ;
.It Ft void
.Fn uvm_swap_init "void" ;
.El
.Pp
.Fn uvm_init
sets up the UVM system at system boot time, after the
console has been set up.
It initializes global state, the page, map, kernel virtual memory state,
machine-dependent physical map, kernel memory allocator,
pager and anonymous memory sub-systems, and then enables
paging of kernel objects.
.Pp
.Fn uvm_init_limits
initializes process limits for the named process.
This is for use by the system startup for process zero, before any
other processes are created.
.Pp
.Fn uvm_setpagesize
initializes the uvmexp members pagesize (if not already done by
machine-dependent code), pageshift and pagemask.
It should be called by machine-dependent code early in the
.Fn pmap_init
call (see
.Xr pmap 9 ) .
.Pp
.Fn uvm_swap_init
initializes the swap sub-system.
.Sh VIRTUAL ADDRESS SPACE MANAGEMENT
See
.Xr uvm_map 9 .
.Sh PAGE FAULT HANDLING
.Bl -ohang
.It Ft int
.Fn uvm_fault "struct vm_map *orig_map" "vaddr_t vaddr" "vm_prot_t access_type" ;
.El
.Pp
.Fn uvm_fault
is the main entry point for faults.
It takes
.Fa orig_map ,
the map in which the fault originated;
.Fa vaddr ,
the address within that map at which the fault occurred; and
.Fa access_type ,
which describes the type of access requested.
.Fn uvm_fault
returns a standard UVM return value.
.Sh MEMORY MAPPING FILES AND DEVICES
See
.Xr ubc 9 .
.Sh VIRTUAL MEMORY I/O
.Bl -ohang
.It Ft int
.Fn uvm_io "struct vm_map *map" "struct uio *uio" ;
.El
.Pp
.Fn uvm_io
performs the I/O described in
.Fa uio
on the memory described in
.Fa map .
.Sh ALLOCATION OF KERNEL MEMORY
See
.Xr uvm_km 9 .
.Sh ALLOCATION OF PHYSICAL MEMORY
.Bl -ohang
.It Ft struct vm_page *
.Fn uvm_pagealloc "struct uvm_object *uobj" "voff_t off" "struct vm_anon *anon" "int flags" ;
.It Ft void
.Fn uvm_pagerealloc "struct vm_page *pg" "struct uvm_object *newobj" "voff_t newoff" ;
.It Ft void
.Fn uvm_pagefree "struct vm_page *pg" ;
.It Ft int
.Fn uvm_pglistalloc "psize_t size" "paddr_t low" "paddr_t high" "paddr_t alignment" "paddr_t boundary" "struct pglist *rlist" "int nsegs" "int waitok" ;
.It Ft void
.Fn uvm_pglistfree "struct pglist *list" ;
.It Ft void
.Fn uvm_page_physload "paddr_t start" "paddr_t end" "paddr_t avail_start" "paddr_t avail_end" "int free_list" ;
.El
.Pp
.Fn uvm_pagealloc
allocates a page of memory at virtual address
.Fa off
in either the object
.Fa uobj
or the anonymous memory
.Fa anon ,
which must be locked by the caller.
Only one of
.Fa uobj
and
.Fa anon
may be
.Pf non- Dv NULL .
Returns
.Dv NULL
when no page can be found.
The flags can be any of
.Bd -literal
#define UVM_PGA_USERESERVE  0x0001  /* ok to use reserve pages */
#define UVM_PGA_ZERO        0x0002  /* returned page must be zero'd */
.Ed
.Pp
.Dv UVM_PGA_USERESERVE
means to allocate a page even if that will result in the number of free pages
being lower than
.Dv uvmexp.reserve_pagedaemon
(if the current thread is the pagedaemon) or
.Dv uvmexp.reserve_kernel
(if the current thread is not the pagedaemon).
.Dv UVM_PGA_ZERO
causes the returned page to be filled with zeroes, either by allocating it
from a pool of pre-zeroed pages or by zeroing it in-line as necessary.
.Pp
.Fn uvm_pagerealloc
reallocates page
.Fa pg
to a new object
.Fa newobj ,
at a new offset
.Fa newoff .
.Pp
.Fn uvm_pagefree
frees the physical page
.Fa pg .
If the content of the page is known to be zero-filled,
the caller should set
.Dv PG_ZERO
in pg-\*[Gt]flags so that the page allocator will use
the page to serve future
.Dv UVM_PGA_ZERO
requests efficiently.
.Pp
.Fn uvm_pglistalloc
allocates a list of pages totalling
.Fa size
bytes under various constraints.
.Fa low
and
.Fa high
describe the lowest and highest addresses acceptable for the list.
If
.Fa alignment
is non-zero, it describes the required alignment of the list, in
power-of-two notation.
If
.Fa boundary
is non-zero, no segment of the list may cross this power-of-two
boundary, relative to zero.
.Fa nsegs
is the maximum number of physically contiguous segments.
If
.Fa waitok
is non-zero, the function may sleep until enough memory is available.
(It also may give up in some situations, so a non-zero
.Fa waitok
does not imply that
.Fn uvm_pglistalloc
cannot return an error.)
The allocated memory is returned in the
.Fa rlist
list; the caller has to provide storage only, the list is initialized by
.Fn uvm_pglistalloc .
.Pp
.Fn uvm_pglistfree
frees the list of pages pointed to by
.Fa list .
If the content of a page on the list is known to be zero-filled,
the caller should set
.Dv PG_ZERO
in that page's pg-\*[Gt]flags so that the page allocator will use
the page to serve future
.Dv UVM_PGA_ZERO
requests efficiently.
.Pp
.Fn uvm_page_physload
loads physical memory segments into VM space on the specified
.Fa free_list .
It must be called at system boot time to set up physical memory
management pages.
The arguments describe the
.Fa start
and
.Fa end
of the physical addresses of the segment, and the available start and end
addresses of pages not already in use.
If a system has memory banks of
different speeds, the slower memory should be given a higher
.Fa free_list
value.
.\" XXX expand on "system boot time"!
.Sh PROCESSES
.Bl -ohang
.It Ft void
.Fn uvm_pageout "void" ;
.It Ft void
.Fn uvm_scheduler "void" ;
.El
.Pp
.Fn uvm_pageout
is the main loop for the page daemon.
.Pp
.Fn uvm_scheduler
is the process zero main loop, which is to be called after the
system has finished starting other processes.
It handles the swapping in of runnable, swapped out processes in priority
order.
.Sh PAGE LOAN
.Bl -ohang
.It Ft int
.Fn uvm_loan "struct vm_map *map" "vaddr_t start" "vsize_t len" "void *v" "int flags" ;
.It Ft void
.Fn uvm_unloan "void *v" "int npages" "int flags" ;
.El
.Pp
.Fn uvm_loan
loans pages in a map out to anons or to the kernel.
.Fa map
should be unlocked;
.Fa start
and
.Fa len
should be multiples of
.Dv PAGE_SIZE .
Argument
.Fa flags
should be one of
.Bd -literal
#define UVM_LOAN_TOANON  0x01  /* loan to anons */
#define UVM_LOAN_TOPAGE  0x02  /* loan to kernel */
.Ed
.Pp
.Fa v
should be a pointer to an array of pointers to
.Li struct vm_anon
or
.Li struct vm_page ,
as appropriate.
The caller has to allocate memory for the array and
ensure it is big enough to hold
.Fa len / PAGE_SIZE
pointers.
Returns 0 on success, or an appropriate error number otherwise.
Note that wired pages cannot be loaned out and
.Fn uvm_loan
will fail in that case.
.Pp
.Fn uvm_unloan
kills loans on pages or anons.
.Fa v
must point to the array of pointers initialized by a previous call to
.Fn uvm_loan .
.Fa npages
should match the number of pages allocated for the loan, which is also the
number of items in the array.
Argument
.Fa flags
should be one of
.Bd -literal
#define UVM_LOAN_TOANON  0x01  /* loan to anons */
#define UVM_LOAN_TOPAGE  0x02  /* loan to kernel */
.Ed
.Pp
and should match what was used for the previous call to
.Fn uvm_loan .
.Sh MISCELLANEOUS FUNCTIONS
.Bl -ohang
.It Ft struct uvm_object *
.Fn uao_create "vsize_t size" "int flags" ;
.It Ft void
.Fn uao_detach "struct uvm_object *uobj" ;
.It Ft void
.Fn uao_reference "struct uvm_object *uobj" ;
.It Ft bool
.Fn uvm_chgkprot "void *addr" "size_t len" "int rw" ;
.It Ft void
.Fn uvm_kernacc "void *addr" "size_t len" "int rw" ;
.It Ft int
.Fn uvm_vslock "struct vmspace *vs" "void *addr" "size_t len" "vm_prot_t prot" ;
.It Ft void
.Fn uvm_vsunlock "struct vmspace *vs" "void *addr" "size_t len" ;
.It Ft void
.Fn uvm_meter "void" ;
.It Ft void
.Fn uvm_proc_fork "struct proc *p1" "struct proc *p2" "bool shared" ;
.It Ft int
.Fn uvm_grow "struct proc *p" "vaddr_t sp" ;
.It Ft void
.Fn uvn_findpages "struct uvm_object *uobj" "voff_t offset" "int *npagesp" "struct vm_page **pps" "int flags" ;
.It Ft void
.Fn uvm_vnp_setsize "struct vnode *vp" "voff_t newsize" ;
.El
.Pp
The
.Fn uao_create ,
.Fn uao_detach ,
and
.Fn uao_reference
functions operate on anonymous memory objects, such as those used to support
System V shared memory.
.Fn uao_create
returns an object of size
.Fa size
with flags:
.Bd -literal
#define UAO_FLAG_KERNOBJ   0x1  /* create kernel object */
#define UAO_FLAG_KERNSWAP  0x2  /* enable kernel swap */
.Ed
.Pp
which can only be used once each at system boot time.
.Fn uao_reference
creates an additional reference to the named anonymous memory object.
.Fn uao_detach
removes a reference from the named anonymous memory object, destroying
it if removing the last reference.
.Pp
.Fn uvm_chgkprot
changes the protection of kernel memory from
.Fa addr
to
.Fa addr + len
to the value of
.Fa rw .
This is primarily useful for debuggers, for setting breakpoints.
This function is only available with options
.Dv KGDB .
.Pp
.Fn uvm_kernacc
checks the access at address
.Fa addr
to
.Fa addr + len
for
.Fa rw
access in the kernel address space.
.Pp
.Fn uvm_vslock
and
.Fn uvm_vsunlock
control the wiring and unwiring of pages in the address space
.Fa vs
from
.Fa addr
to
.Fa addr + len .
These functions are normally used to wire memory for I/O.
.Pp
.Fn uvm_meter
calculates the load average.
.Pp
.Fn uvm_proc_fork
forks the virtual address space of the (old) process
.Fa p1
for the (new) process
.Fa p2 .
If the
.Fa shared
argument is non-zero, p1 shares its address space with p2;
otherwise a new address space is created.
This function currently has no return value, and thus cannot fail.
In the future, this function will be changed to allow it to
fail in low memory conditions.
.Pp
.Fn uvm_grow
increases the stack segment of process
.Fa p
to include
.Fa sp .
.Pp
.Fn uvn_findpages
looks up or creates pages in
.Fa uobj
at offset
.Fa offset ,
marks them busy and returns them in the
.Fa pps
array.
Currently
.Fa uobj
must be a vnode object.
The number of pages requested is pointed to by
.Fa npagesp ,
and this value is updated with the actual number of pages returned.
The flags can be
.Bd -literal
#define UFP_ALL       0x00  /* return all pages requested */
#define UFP_NOWAIT    0x01  /* don't sleep */
#define UFP_NOALLOC   0x02  /* don't allocate new pages */
#define UFP_NOCACHE   0x04  /* don't return pages which already exist */
#define UFP_NORDONLY  0x08  /* don't return PG_READONLY pages */
.Ed
.Pp
.Dv UFP_ALL
is a pseudo-flag meaning all requested pages should be returned.
.Dv UFP_NOWAIT
means that we must not sleep.
.Dv UFP_NOALLOC
causes any pages which do not already exist to be skipped.
.Dv UFP_NOCACHE
causes any pages which do already exist to be skipped.
.Dv UFP_NORDONLY
causes any pages which are marked PG_READONLY to be skipped.
.Pp
.Fn uvm_vnp_setsize
sets the size of vnode
.Fa vp
to
.Fa newsize .
The caller must hold a reference to the vnode.
If the vnode shrinks, pages no longer used are discarded.
.Sh MISCELLANEOUS MACROS
.Bl -ohang
.It Ft paddr_t
.Fn atop "paddr_t pa" ;
.It Ft paddr_t
.Fn ptoa "paddr_t pn" ;
.It Ft paddr_t
.Fn round_page "address" ;
.It Ft paddr_t
.Fn trunc_page "address" ;
.El
.Pp
The
.Fn atop
macro converts a physical address
.Fa pa
into a page number.
The
.Fn ptoa
macro does the opposite by converting a page number
.Fa pn
into a physical address.
.Pp
The
.Fn round_page
and
.Fn trunc_page
macros return the page boundary obtained by rounding
.Fa address
up and down, respectively, to the nearest page boundary.
These macros work for either addresses or byte counts.
.Sh SYSCTL
UVM provides support for the
.Dv CTL_VM
domain of the
.Xr sysctl 3
hierarchy.
It handles the
.Dv VM_LOADAVG ,
.Dv VM_METER ,
.Dv VM_UVMEXP ,
and
.Dv VM_UVMEXP2
nodes, which return, respectively, the current load averages, the
current VM totals, the uvmexp structure, and a kernel-version-independent
view of the uvmexp structure.
It also exports a number of tunables that control how much VM space is
allowed to be consumed by various tasks.
The load averages are typically accessed from userland using the
.Xr getloadavg 3
function.
The uvmexp structure holds all global state of the UVM system,
and has the following members:
.Bd -literal
/* vm_page constants */
int pagesize;   /* size of a page (PAGE_SIZE): must be power of 2 */
int pagemask;   /* page mask */
int pageshift;  /* page shift */

/* vm_page counters */
int npages;     /* number of pages we manage */
int free;       /* number of free pages */
int paging;     /* number of pages in the process of being paged out */
int wired;      /* number of wired pages */
int reserve_pagedaemon; /* number of pages reserved for pagedaemon */
int reserve_kernel; /* number of pages reserved for kernel */

/* pageout params */
int freemin;    /* min number of free pages */
int freetarg;   /* target number of free pages */
int inactarg;   /* target number of inactive pages */
int wiredmax;   /* max number of wired pages */

/* swap */
int nswapdev;   /* number of configured swap devices in system */
int swpages;    /* number of PAGE_SIZE'ed swap pages */
int swpginuse;  /* number of swap pages in use */
int nswget;     /* number of times fault calls uvm_swap_get() */
int nanon;      /* number total of anon's in system */
int nfreeanon;  /* number of free anon's */

/* stat counters */
int faults;     /* page fault count */
int traps;      /* trap count */
int intrs;      /* interrupt count */
int swtch;      /* context switch count */
int softs;      /* software interrupt count */
int syscalls;   /* system calls */
int pageins;    /* pagein operation count */
                /* pageouts are in pdpageouts below */
int pgswapin;   /* pages swapped in */
int pgswapout;  /* pages swapped out */
int forks;      /* forks */
int forks_ppwait;   /* forks where parent waits */
int forks_sharevm;  /* forks where vmspace is shared */

/* fault subcounters */
int fltnoram;   /* number of times fault was out of ram */
int fltnoanon;  /* number of times fault was out of anons */
int fltpgwait;  /* number of times fault had to wait on a page */
int fltpgrele;  /* number of times fault found a released page */
int fltrelck;   /* number of times fault relock called */
int fltrelckok; /* number of times fault relock is a success */
int fltanget;   /* number of times fault gets anon page */
int fltanretry; /* number of times fault retries an anon get */
int fltamcopy;  /* number of times fault clears "needs copy" */
int fltnamap;   /* number of times fault maps a neighbor anon page */
int fltnomap;   /* number of times fault maps a neighbor obj page */
int fltlget;    /* number of times fault does a locked pgo_get */
int fltget;     /* number of times fault does an unlocked get */
int flt_anon;   /* number of times fault anon (case 1a) */
int flt_acow;   /* number of times fault anon cow (case 1b) */
int flt_obj;    /* number of times fault is on object page (2a) */
int flt_prcopy; /* number of times fault promotes with copy (2b) */
int flt_przero; /* number of times fault promotes with zerofill (2b) */

/* daemon counters */
int pdwoke;     /* number of times daemon woke up */
int pdrevs;     /* number of times daemon rev'd clock hand */
int pdfreed;    /* number of pages daemon freed since boot */
int pdscans;    /* number of pages daemon scanned since boot */
int pdanscan;   /* number of anonymous pages scanned by daemon */
int pdobscan;   /* number of object pages scanned by daemon */
int pdreact;    /* number of pages daemon reactivated since boot */
int pdbusy;     /* number of times daemon found a busy page */
int pdpageouts; /* number of times daemon started a pageout */
int pdpending;  /* number of times daemon got a pending pageout */
int pddeact;    /* number of pages daemon deactivates */
.Ed
.Sh NOTES
.Fn uvm_chgkprot
is only available if the kernel has been compiled with options
.Dv KGDB .
.Pp
All structures and types whose names begin with
.Dq vm_
will be renamed to
.Dq uvm_ .
.Sh SEE ALSO
.Xr swapctl 2 ,
.Xr getloadavg 3 ,
.Xr kvm 3 ,
.Xr sysctl 3 ,
.Xr ddb 4 ,
.Xr options 4 ,
.Xr memoryallocators 9 ,
.Xr pmap 9 ,
.Xr ubc 9 ,
.Xr uvm_km 9 ,
.Xr uvm_map 9
.Rs
.%A Charles D. Cranor
.%A Gurudatta M. Parulkar
.%T "The UVM Virtual Memory System"
.%I USENIX Association
.%B Proceedings of the USENIX Annual Technical Conference
.%P 117-130
.%D June 6-11, 1999
.%U http://www.usenix.org/event/usenix99/full_papers/cranor/cranor.pdf
.Re
.Sh HISTORY
UVM is a new VM system developed at Washington University in St. Louis
(Missouri).
UVM's roots lie partly in the Mach-based
.Bx 4.4
VM system, the
.Fx
VM system, and the SunOS 4 VM system.
UVM's basic structure is based on the
.Bx 4.4
VM system.
UVM's new anonymous memory system is based on the
anonymous memory system found in the SunOS 4 VM (as described in papers
published by Sun Microsystems, Inc.).
UVM also includes a number of features new to
.Bx
including page loanout, map entry passing, simplified
copy-on-write, and clustered anonymous memory pageout.
UVM is also further documented in an August 1998 dissertation by
Charles D. Cranor.
.Pp
UVM appeared in
.Nx 1.4 .
.Sh AUTHORS
Charles D. Cranor
.Aq chuck@ccrc.wustl.edu
designed and implemented UVM.
.Pp
Matthew Green
.Aq mrg@eterna.com.au
wrote the swap-space management code and handled the logistical issues
involved with merging UVM into the
.Nx
source tree.
.Pp
Chuck Silvers
.Aq chuq@chuq.com
implemented the aobj pager, thus allowing UVM to support System V shared
memory and process swapping.