.\"	$NetBSD: uvm.9,v 1.97 2009/03/12 13:13:16 wiz Exp $
.\"
.\" Copyright (c) 1998 Matthew R. Green
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
.\" OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
.\" IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
.\" INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
.\" BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
.\" LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
.\" AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
.\" OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd March 12, 2009
.Dt UVM 9
.Os
.Sh NAME
.Nm uvm
.Nd virtual memory system external interface
.Sh SYNOPSIS
.In sys/param.h
.In uvm/uvm.h
.Sh DESCRIPTION
The UVM virtual memory system manages access to the computer's memory
resources.
User processes and the kernel access these resources through
UVM's external interface.
UVM's external interface includes functions that:
.Pp
.Bl -hyphen -compact
.It
initialize UVM sub-systems
.It
manage virtual address spaces
.It
resolve page faults
.It
memory map files and devices
.It
perform uio-based I/O to virtual memory
.It
allocate and free kernel virtual memory
.It
allocate and free physical memory
.El
.Pp
In addition to exporting these services, UVM has two kernel-level processes:
pagedaemon and swapper.
The pagedaemon process sleeps until physical memory becomes scarce.
When that happens, pagedaemon is awoken.
It scans physical memory, paging out and freeing memory that has not
been recently used.
The swapper process swaps in runnable processes that are currently swapped
out, if there is room.
.Pp
There are also several miscellaneous functions.
.Sh INITIALIZATION
.Bl -ohang
.It Ft void
.Fn uvm_init "void" ;
.It Ft void
.Fn uvm_init_limits "struct lwp *l" ;
.It Ft void
.Fn uvm_setpagesize "void" ;
.It Ft void
.Fn uvm_swap_init "void" ;
.El
.Pp
.Fn uvm_init
sets up the UVM system at system boot time, after the
console has been set up.
It initializes global state, the page, map, kernel virtual memory state,
machine-dependent physical map, kernel memory allocator,
pager and anonymous memory sub-systems, and then enables
paging of kernel objects.
.Pp
.Fn uvm_init_limits
initializes process limits for the named process.
This is for use by the system startup for process zero, before any
other processes are created.
.Pp
.Fn uvm_setpagesize
initializes the uvmexp members pagesize (if not already done by
machine-dependent code), pageshift and pagemask.
It should be called by machine-dependent code early in the
.Fn pmap_init
call (see
.Xr pmap 9 ) .
.Pp
.Fn uvm_swap_init
initializes the swap sub-system.
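.Pp
The relationship between the three values set by
.Fn uvm_setpagesize
can be illustrated with a small user-space sketch.
It is not the kernel implementation: the helper
.Fn derive_page_params
and the structure
.Vt struct page_params
are hypothetical names, and the sketch assumes only that the page size is a
power of two and that the mask is the page size minus one.
.Bd -literal
```c
#include <assert.h>

/* Hypothetical stand-in for the three uvmexp members. */
struct page_params {
	int pagesize;	/* size of a page in bytes, a power of 2 */
	int pagemask;	/* pagesize - 1: selects the offset within a page */
	int pageshift;	/* log2(pagesize): converts bytes to pages */
};

/* Derive pagemask and pageshift from a power-of-two page size. */
static struct page_params
derive_page_params(int pagesize)
{
	struct page_params pp;

	/* A power of two has exactly one bit set. */
	assert(pagesize > 0 && (pagesize & (pagesize - 1)) == 0);

	pp.pagesize = pagesize;
	pp.pagemask = pagesize - 1;
	pp.pageshift = 0;
	while ((1 << pp.pageshift) != pagesize)
		pp.pageshift++;
	return pp;
}
```
.Ed
.Pp
With such values, an address can be split into a page number (the address
shifted right by the page shift) and an in-page offset (the address ANDed
with the page mask).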
.Sh VIRTUAL ADDRESS SPACE MANAGEMENT
.Bl -ohang
.It Ft int
.Fn uvm_map "struct vm_map *map" "vaddr_t *startp" "vsize_t size" "struct uvm_object *uobj" "voff_t uoffset" "vsize_t align" "uvm_flag_t flags" ;
.It Ft void
.Fn uvm_unmap "struct vm_map *map" "vaddr_t start" "vaddr_t end" ;
.It Ft int
.Fn uvm_map_pageable "struct vm_map *map" "vaddr_t start" "vaddr_t end" "bool new_pageable" "int lockflags" ;
.It Ft bool
.Fn uvm_map_checkprot "struct vm_map *map" "vaddr_t start" "vaddr_t end" "vm_prot_t protection" ;
.It Ft int
.Fn uvm_map_protect "struct vm_map *map" "vaddr_t start" "vaddr_t end" "vm_prot_t new_prot" "bool set_max" ;
.It Ft int
.Fn uvm_deallocate "struct vm_map *map" "vaddr_t start" "vsize_t size" ;
.It Ft struct vmspace *
.Fn uvmspace_alloc "vaddr_t min" "vaddr_t max" "int pageable" ;
.It Ft void
.Fn uvmspace_exec "struct lwp *l" "vaddr_t start" "vaddr_t end" ;
.It Ft struct vmspace *
.Fn uvmspace_fork "struct vmspace *vm1" ;
.It Ft void
.Fn uvmspace_free "struct vmspace *vm" ;
.It Ft void
.Fn uvmspace_share "struct proc *p1" "struct proc *p2" ;
.It Ft void
.Fn uvmspace_unshare "struct lwp *l" ;
.It Ft bool
.Fn uvm_uarea_alloc "vaddr_t *uaddrp" ;
.It Ft void
.Fn uvm_uarea_free "vaddr_t uaddr" ;
.El
.Pp
.Fn uvm_map
establishes a valid mapping in map
.Fa map ,
which must be unlocked.
The new mapping has size
.Fa size ,
which must be a multiple of
.Dv PAGE_SIZE .
The
.Fa uobj
and
.Fa uoffset
arguments can have four meanings.
When
.Fa uobj
is
.Dv NULL
and
.Fa uoffset
is
.Dv UVM_UNKNOWN_OFFSET ,
.Fn uvm_map
does not use the machine-dependent
.Dv PMAP_PREFER
function.
If
.Fa uoffset
is any other value, it is used as the hint to
.Dv PMAP_PREFER .
When
.Fa uobj
is not
.Dv NULL
and
.Fa uoffset
is
.Dv UVM_UNKNOWN_OFFSET ,
.Fn uvm_map
finds the offset based upon the virtual address, passed as
.Fa startp .
If
.Fa uoffset
is any other value, we are doing a normal mapping at this offset.
The start address of the map will be returned in
.Fa startp .
.Pp
.Fa align
specifies alignment of mapping unless
.Dv UVM_FLAG_FIXED
is specified in
.Fa flags .
.Fa align
must be a power of 2.
.Pp
.Fa flags
passed to
.Fn uvm_map
are typically created using the
.Fn UVM_MAPFLAG "vm_prot_t prot" "vm_prot_t maxprot" "vm_inherit_t inh" "int advice" "int flags"
macro, which uses the following values.
The values that
.Fa prot
and
.Fa maxprot
can take are:
.Bd -literal
#define UVM_PROT_MASK   0x07    /* protection mask */
#define UVM_PROT_NONE   0x00    /* protection none */
#define UVM_PROT_ALL    0x07    /* everything */
#define UVM_PROT_READ   0x01    /* read */
#define UVM_PROT_WRITE  0x02    /* write */
#define UVM_PROT_EXEC   0x04    /* exec */
#define UVM_PROT_R      0x01    /* read */
#define UVM_PROT_W      0x02    /* write */
#define UVM_PROT_RW     0x03    /* read-write */
#define UVM_PROT_X      0x04    /* exec */
#define UVM_PROT_RX     0x05    /* read-exec */
#define UVM_PROT_WX     0x06    /* write-exec */
#define UVM_PROT_RWX    0x07    /* read-write-exec */
.Ed
.Pp
The values that
.Fa inh
can take are:
.Bd -literal
#define UVM_INH_MASK    0x30    /* inherit mask */
#define UVM_INH_SHARE   0x00    /* "share" */
#define UVM_INH_COPY    0x10    /* "copy" */
#define UVM_INH_NONE    0x20    /* "none" */
#define UVM_INH_DONATE  0x30    /* "donate" \*[Lt]\*[Lt] not used */
.Ed
.Pp
The values that
.Fa advice
can take are:
.Bd -literal
#define UVM_ADV_NORMAL     0x0  /* 'normal' */
#define UVM_ADV_RANDOM     0x1  /* 'random' */
#define UVM_ADV_SEQUENTIAL 0x2  /* 'sequential' */
#define UVM_ADV_MASK       0x7  /* mask */
.Ed
.Pp
The values that
.Fa flags
can take are:
.Bd -literal
#define UVM_FLAG_FIXED   0x010000 /* find space */
#define UVM_FLAG_OVERLAY 0x020000 /* establish overlay */
#define UVM_FLAG_NOMERGE 0x040000 /* don't merge map entries */
#define UVM_FLAG_COPYONW 0x080000 /* set copy_on_write flag */
#define UVM_FLAG_AMAPPAD 0x100000 /* for bss: pad amap to reduce malloc() */
#define UVM_FLAG_TRYLOCK 0x200000 /* fail if we can not lock map */
.Ed
.Pp
The
.Dv UVM_MAPFLAG
macro arguments can be combined with the bitwise OR operator.
There are several special purpose macros for checking protection
combinations, e.g., the
.Dv UVM_PROT_WX
macro.
There are also some additional macros to extract bits from the flags.
The
.Dv UVM_PROTECTION ,
.Dv UVM_INHERIT ,
.Dv UVM_MAXPROTECTION
and
.Dv UVM_ADVICE
macros return the protection, inheritance, maximum protection and advice,
respectively.
.Fn uvm_map
returns a standard UVM return value.
.Pp
.Fn uvm_unmap
removes a valid mapping,
from
.Fa start
to
.Fa end ,
in map
.Fa map ,
which must be unlocked.
.Pp
.Fn uvm_map_pageable
changes the pageability of the pages in the range from
.Fa start
to
.Fa end
in map
.Fa map
to
.Fa new_pageable .
.Fn uvm_map_pageable
returns a standard UVM return value.
.Pp
.Fn uvm_map_checkprot
checks the protection of the range from
.Fa start
to
.Fa end
in map
.Fa map
against
.Fa protection .
This returns either
.Dv true
or
.Dv false .
.Pp
.Fn uvm_map_protect
changes the protection of the range from
.Fa start
to
.Fa end
in map
.Fa map
to
.Fa new_prot ,
also setting the maximum protection of the region to
.Fa new_prot
if
.Fa set_max
is true.
This function returns a standard UVM return value.
.Pp
.Fn uvm_deallocate
deallocates kernel memory in map
.Fa map
from address
.Fa start
to
.Fa start + size .
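.Pp
Returning to the
.Fa flags
argument of
.Fn uvm_map :
the way a single flag word can carry protection, maximum protection,
inheritance, advice and mapping flags, and the way the extraction macros
can recover them, is sketched below in user space.
The constants are those listed for
.Fn UVM_MAPFLAG .
The shift amounts for the maximum protection and advice fields are
assumptions chosen so that the fields do not overlap; they are not stated
in this page and should be checked against the UVM headers.
All
.Fn sketch_*
names are hypothetical.
.Bd -literal
```c
#include <assert.h>

#define UVM_PROT_MASK    0x07
#define UVM_PROT_RW      0x03
#define UVM_PROT_RWX     0x07
#define UVM_INH_MASK     0x30
#define UVM_INH_COPY     0x10
#define UVM_ADV_MASK     0x7
#define UVM_ADV_RANDOM   0x1
#define UVM_FLAG_COPYONW 0x080000

/* Assumed field positions; not taken from this manual page. */
#define SKETCH_MAXPROT_SHIFT 8
#define SKETCH_ADVICE_SHIFT  12

typedef unsigned int sketch_flag_t;

/* Pack the five arguments into one word, one bit field per argument. */
static sketch_flag_t
sketch_mapflag(unsigned prot, unsigned maxprot, unsigned inh,
    unsigned advice, unsigned flags)
{
	assert((prot & ~UVM_PROT_MASK) == 0);
	assert((maxprot & ~UVM_PROT_MASK) == 0);
	assert((inh & ~UVM_INH_MASK) == 0);
	assert((advice & ~UVM_ADV_MASK) == 0);
	assert((flags & 0xffff) == 0);	/* mapping flags start at bit 16 */

	return prot | (maxprot << SKETCH_MAXPROT_SHIFT) | inh |
	    (advice << SKETCH_ADVICE_SHIFT) | flags;
}

/* Extractors; inheritance is returned unshifted (UVM_INH_* values). */
static unsigned
sketch_protection(sketch_flag_t f)
{
	return f & UVM_PROT_MASK;
}

static unsigned
sketch_maxprotection(sketch_flag_t f)
{
	return (f >> SKETCH_MAXPROT_SHIFT) & UVM_PROT_MASK;
}

static unsigned
sketch_inherit(sketch_flag_t f)
{
	return f & UVM_INH_MASK;
}

static unsigned
sketch_advice(sketch_flag_t f)
{
	return (f >> SKETCH_ADVICE_SHIFT) & UVM_ADV_MASK;
}
```
.Ed
.Pp
Under these assumptions, a read-write mapping with maximum protection
read-write-exec, copy inheritance, random advice and the copy-on-write
flag round-trips through the extractors unchanged.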
.Pp
.Fn uvmspace_alloc
allocates and returns a new address space, with ranges from
.Fa min
to
.Fa max ,
setting the pageability of the address space to
.Fa pageable .
.Pp
.Fn uvmspace_exec
either reuses the address space of lwp
.Fa l
if there are no other references to it, or creates
a new one with
.Fn uvmspace_alloc .
The range of valid addresses in the address space is reset to
.Fa start
through
.Fa end .
.Pp
.Fn uvmspace_fork
creates and returns a new address space based upon the
.Fa vm1
address space, typically used when allocating an address space for a
child process.
.Pp
.Fn uvmspace_free
lowers the reference count on the address space
.Fa vm ,
freeing the data structures if there are no other references.
.Pp
.Fn uvmspace_share
causes process
.Fa p2
to share the address space of
.Fa p1 .
.Pp
.Fn uvmspace_unshare
ensures that lwp
.Fa l
has its own, unshared address space, by creating a new one if
necessary by calling
.Fn uvmspace_fork .
.Pp
.Fn uvm_uarea_alloc
allocates virtual space for a u-area (i.e., a kernel stack) and stores
its virtual address in
.Fa *uaddrp .
The return value is
.Dv true
if the u-area is already backed by wired physical memory, otherwise
.Dv false .
.Pp
.Fn uvm_uarea_free
frees a u-area allocated with
.Fn uvm_uarea_alloc ,
freeing both the virtual space and any physical pages which may have been
allocated to back that virtual space later.
.Sh PAGE FAULT HANDLING
.Bl -ohang
.It Ft int
.Fn uvm_fault "struct vm_map *orig_map" "vaddr_t vaddr" "vm_prot_t access_type" ;
.El
.Pp
.Fn uvm_fault
is the main entry point for faults.
It takes
.Fa orig_map
as the map the fault originated in, a
.Fa vaddr
offset into the map at which the fault occurred, and
.Fa access_type
describing the type of access requested.
.Fn uvm_fault
returns a standard UVM return value.
.Sh MEMORY MAPPING FILES AND DEVICES
.Bl -ohang
.It Ft void
.Fn uvm_vnp_setsize "struct vnode *vp" "voff_t newsize" ;
.It Ft void *
.Fn ubc_alloc "struct uvm_object *uobj" "voff_t offset" "vsize_t *lenp" \
"int advice" "int flags" ;
.It Ft void
.Fn ubc_release "void *va" "int flags" ;
.It Ft int
.Fn ubc_uiomove "struct uvm_object *uobj" "struct uio *uio" "vsize_t todo" \
"int advice" "int flags" ;
.El
.Pp
.Fn uvm_vnp_setsize
sets the size of vnode
.Fa vp
to
.Fa newsize .
The caller must hold a reference to the vnode.
If the vnode shrinks, pages no longer used are discarded.
.Pp
.Fn ubc_alloc
creates a kernel mapping of
.Fa uobj
starting at offset
.Fa offset .
The desired length of the mapping is pointed to by
.Fa lenp ,
but the actual mapping may be smaller than this.
.Fa lenp
is updated to contain the actual length mapped.
.Fa advice
is the access pattern hint, which must be one of
.Pp
.Bl -tag -offset indent -width "UVM_ADV_SEQUENTIAL" -compact
.It UVM_ADV_NORMAL
No hint
.It UVM_ADV_RANDOM
Random access hint
.It UVM_ADV_SEQUENTIAL
Sequential access hint (from lower offset to higher offset)
.El
.Pp
The possible
.Fa flags
are
.Pp
.Bl -tag -offset indent -width "UVM_ADV_SEQUENTIAL" -compact
.It UBC_READ
Mapping will be accessed for read.
.It UBC_WRITE
Mapping will be accessed for write.
.It UBC_FAULTBUSY
Fault in the window's pages during the mapping operation.
Makes sense only for write.
.El
.Pp
Once the mapping is created, it must be accessed only by methods that can
handle faults, such as
.Fn uiomove
or
.Fn kcopy .
Page faults on the mapping will result in the object's pager
method being called to resolve the fault.
.Pp
.Fn ubc_release
frees the mapping at
.Fa va
for reuse.
The mapping may be cached to speed future accesses to the same region
of the object.
The flags can be any of
.Pp
.Bl -tag -offset indent -width "UVM_ADV_SEQUENTIAL" -compact
.It UBC_UNMAP
Do not cache mapping.
.El
.Pp
.Fn ubc_uiomove
allocates a UBC memory window, performs I/O on it and unmaps the window.
The
.Fa advice
parameter takes the same values as the respective parameter in
.Fn ubc_alloc
and the
.Fa flags
parameter takes the same arguments as
.Fn ubc_alloc
and
.Fn ubc_release .
Additionally, the flag
.Dv UBC_PARTIALOK
can be provided to indicate that it is acceptable to return if an error
occurs mid-transfer.
.Sh VIRTUAL MEMORY I/O
.Bl -ohang
.It Ft int
.Fn uvm_io "struct vm_map *map" "struct uio *uio" ;
.El
.Pp
.Fn uvm_io
performs the I/O described in
.Fa uio
on the memory described in
.Fa map .
.Sh ALLOCATION OF KERNEL MEMORY
.Bl -ohang
.It Ft vaddr_t
.Fn uvm_km_alloc "struct vm_map *map" "vsize_t size" "vsize_t align" "uvm_flag_t flags" ;
.It Ft void
.Fn uvm_km_free "struct vm_map *map" "vaddr_t addr" "vsize_t size" "uvm_flag_t flags" ;
.It Ft struct vm_map *
.Fn uvm_km_suballoc "struct vm_map *map" "vaddr_t *min" "vaddr_t *max" \
"vsize_t size" "int flags" "bool fixed" "struct vm_map *submap" ;
.El
.Pp
.Fn uvm_km_alloc
allocates
.Fa size
bytes of kernel memory in map
.Fa map .
The first address of the allocated memory range will be aligned according to the
.Fa align
argument
.Pq specify 0 if no alignment is necessary .
The alignment must be a multiple of page size.
The
.Fa flags
argument is a bitwise inclusive OR of the allocation type and operation flags.
.Pp
The allocation type should be one of:
.Bl -tag -width UVM_KMF_PAGEABLE
.It UVM_KMF_WIRED
Wired memory.
.It UVM_KMF_PAGEABLE
Demand-paged zero-filled memory.
.It UVM_KMF_VAONLY
Virtual address only.
No physical pages are mapped in the allocated region.
If necessary, it's the caller's responsibility to enter page mappings.
It's also the caller's responsibility to clean up the mappings before freeing
the address range.
.El
.Pp
The following operation flags are available:
.Bl -tag -width UVM_KMF_PAGEABLE
.It UVM_KMF_CANFAIL
Can fail even if
.Dv UVM_KMF_NOWAIT
is not specified and
.Dv UVM_KMF_WAITVA
is specified.
.It UVM_KMF_ZERO
Request zero-filled memory.
Only supported for
.Dv UVM_KMF_WIRED .
Shouldn't be used with other types.
.It UVM_KMF_TRYLOCK
Fail if we can't lock the map.
.It UVM_KMF_NOWAIT
Fail immediately if no memory is available.
.It UVM_KMF_WAITVA
Sleep to wait for the virtual address resources if needed.
.El
.Pp
(If neither
.Dv UVM_KMF_NOWAIT
nor
.Dv UVM_KMF_CANFAIL
is specified and
.Dv UVM_KMF_WAITVA
is specified,
.Fn uvm_km_alloc
will never fail, but rather sleep indefinitely until the allocation succeeds.)
.Pp
Pageability of the pages allocated with
.Dv UVM_KMF_PAGEABLE
can be changed by
.Fn uvm_map_pageable .
In that case, the entire range must be changed atomically.
Changing a part of the range is not supported.
.Pp
.Fn uvm_km_free
frees the memory range allocated by
.Fn uvm_km_alloc .
.Fa addr
must be an address returned by
.Fn uvm_km_alloc .
.Fa map
and
.Fa size
must be the same as the ones used for the corresponding
.Fn uvm_km_alloc .
.Fa flags
must be the allocation type used for the corresponding
.Fn uvm_km_alloc .
.Pp
.Fn uvm_km_free
is the only way to free memory ranges allocated by
.Fn uvm_km_alloc .
.Fn uvm_unmap
must not be used.
.Pp
.Fn uvm_km_suballoc
allocates a submap from
.Fa map ,
creating a new map if
.Fa submap
is
.Dv NULL .
The addresses of the submap can be specified exactly by setting the
.Fa fixed
argument to true, which causes the
.Fa min
argument to specify the beginning of the address range of the submap.
If
.Fa fixed
is false, any address range of size
.Fa size
will be allocated from
.Fa map
and the start and end addresses returned in
.Fa min
and
.Fa max .
The
.Fa flags
are used to initialize the created submap.
The following flags can be set:
.Bl -tag -width VM_MAP_PAGEABLE
.It VM_MAP_PAGEABLE
Entries in the map may be paged out.
.It VM_MAP_INTRSAFE
Map should be interrupt-safe.
.It VM_MAP_TOPDOWN
A top-down mapping should be arranged.
.El
.Sh ALLOCATION OF PHYSICAL MEMORY
.Bl -ohang
.It Ft struct vm_page *
.Fn uvm_pagealloc "struct uvm_object *uobj" "voff_t off" "struct vm_anon *anon" "int flags" ;
.It Ft void
.Fn uvm_pagerealloc "struct vm_page *pg" "struct uvm_object *newobj" "voff_t newoff" ;
.It Ft void
.Fn uvm_pagefree "struct vm_page *pg" ;
.It Ft int
.Fn uvm_pglistalloc "psize_t size" "paddr_t low" "paddr_t high" "paddr_t alignment" "paddr_t boundary" "struct pglist *rlist" "int nsegs" "int waitok" ;
.It Ft void
.Fn uvm_pglistfree "struct pglist *list" ;
.It Ft void
.Fn uvm_page_physload "vaddr_t start" "vaddr_t end" "vaddr_t avail_start" "vaddr_t avail_end" "int free_list" ;
.El
.Pp
.Fn uvm_pagealloc
allocates a page of memory at offset
.Fa off
in either the object
.Fa uobj
or the anonymous memory
.Fa anon ,
which must be locked by the caller.
Only one of
.Fa uobj
and
.Fa anon
can be other than
.Dv NULL .
Returns
.Dv NULL
when no page can be found.
The flags can be any of
.Bd -literal
#define UVM_PGA_USERESERVE      0x0001  /* ok to use reserve pages */
#define UVM_PGA_ZERO            0x0002  /* returned page must be zero'd */
.Ed
.Pp
.Dv UVM_PGA_USERESERVE
means to allocate a page even if that will result in the number of free pages
being lower than
.Dv uvmexp.reserve_pagedaemon
(if the current thread is the pagedaemon) or
.Dv uvmexp.reserve_kernel
(if the current thread is not the pagedaemon).
.Dv UVM_PGA_ZERO
causes the returned page to be filled with zeroes, either by allocating it
from a pool of pre-zeroed pages or by zeroing it in-line as necessary.
.Pp
.Fn uvm_pagerealloc
reallocates page
.Fa pg
to a new object
.Fa newobj ,
at a new offset
.Fa newoff .
.Pp
.Fn uvm_pagefree
frees the physical page
.Fa pg .
If the content of the page is known to be zero-filled,
the caller should set
.Dv PG_ZERO
in pg-\*[Gt]flags so that the page allocator will use
the page to serve future
.Dv UVM_PGA_ZERO
requests efficiently.
.Pp
.Fn uvm_pglistalloc
allocates a list of pages of total size
.Fa size
bytes under various constraints.
.Fa low
and
.Fa high
describe the lowest and highest addresses acceptable for the list.
If
.Fa alignment
is non-zero, it describes the required alignment of the list, in
power-of-two notation.
If
.Fa boundary
is non-zero, no segment of the list may cross this power-of-two
boundary, relative to zero.
.Fa nsegs
is the maximum number of physically contiguous segments.
If
.Fa waitok
is non-zero, the function may sleep until enough memory is available.
(It also may give up in some situations, so a non-zero
.Fa waitok
does not imply that
.Fn uvm_pglistalloc
cannot return an error.)
The allocated memory is returned in the
.Fa rlist
list; the caller has to provide storage only, the list is initialized by
.Fn uvm_pglistalloc .
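.Pp
The address window, alignment and boundary constraints of
.Fn uvm_pglistalloc
can be expressed as simple bit tests.
The following user-space sketch checks whether one physically contiguous
segment would satisfy them.
It is an illustration, not code from UVM: the names
.Fn is_pow2
and
.Fn segment_ok
are hypothetical,
.Vt uint64_t
stands in for
.Vt paddr_t ,
and treating
.Fa high
as an inclusive bound is an assumption.
.Bd -literal
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if x is a power of two. */
static bool
is_pow2(uint64_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/*
 * Check one segment [start, start + size) against an address window,
 * an alignment and a boundary.  A zero alignment or boundary means
 * that the corresponding constraint is not applied.
 */
static bool
segment_ok(uint64_t start, uint64_t size, uint64_t low, uint64_t high,
    uint64_t alignment, uint64_t boundary)
{
	uint64_t last;

	assert(size > 0);
	assert(alignment == 0 || is_pow2(alignment));
	assert(boundary == 0 || is_pow2(boundary));

	last = start + size - 1;	/* last byte of the segment */

	if (start < low || last > high)
		return false;
	if (alignment != 0 && (start & (alignment - 1)) != 0)
		return false;
	/* First and last byte must agree in all bits above the boundary. */
	if (boundary != 0 && ((start ^ last) & ~(boundary - 1)) != 0)
		return false;
	return true;
}
```
.Ed
.Pp
A segment that starts just below a multiple of the boundary and ends just
above it is rejected, because its first and last byte then differ in a bit
at or above the boundary.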
.Pp
.Fn uvm_pglistfree
frees the list of pages pointed to by
.Fa list .
If the content of the page is known to be zero-filled,
the caller should set
.Dv PG_ZERO
in pg-\*[Gt]flags so that the page allocator will use
the page to serve future
.Dv UVM_PGA_ZERO
requests efficiently.
.Pp
.Fn uvm_page_physload
loads physical memory segments into VM space on the specified
.Fa free_list .
It must be called at system boot time to set up physical memory
management pages.
The arguments describe the
.Fa start
and
.Fa end
of the physical addresses of the segment, and the available start and end
addresses of pages not already in use.
If a system has memory banks of
different speeds, the slower memory should be given a higher
.Fa free_list
value.
.\" XXX expand on "system boot time"!
.Sh PROCESSES
.Bl -ohang
.It Ft void
.Fn uvm_pageout "void" ;
.It Ft void
.Fn uvm_scheduler "void" ;
.It Ft void
.Fn uvm_swapin "struct lwp *l" ;
.El
.Pp
.Fn uvm_pageout
is the main loop for the page daemon.
.Pp
.Fn uvm_scheduler
is the process zero main loop, which is to be called after the
system has finished starting other processes.
It handles the swapping in of runnable, swapped out processes in priority
order.
.Pp
.Fn uvm_swapin
swaps in the named lwp.
.Sh PAGE LOAN
.Bl -ohang
.It Ft int
.Fn uvm_loan "struct vm_map *map" "vaddr_t start" "vsize_t len" "void *v" "int flags" ;
.It Ft void
.Fn uvm_unloan "void *v" "int npages" "int flags" ;
.El
.Pp
.Fn uvm_loan
loans pages in a map out to anons or to the kernel.
.Fa map
should be unlocked;
.Fa start
and
.Fa len
should be multiples of
.Dv PAGE_SIZE .
Argument
.Fa flags
should be one of
.Bd -literal
#define UVM_LOAN_TOANON       0x01    /* loan to anons */
#define UVM_LOAN_TOPAGE       0x02    /* loan to kernel */
.Ed
.Pp
.Fa v
should be a pointer to an array of pointers to
.Li struct vm_anon
or
.Li struct vm_page ,
as appropriate.
The caller has to allocate memory for the array and
ensure it's big enough to hold
.Fa len / PAGE_SIZE
pointers.
Returns 0 for success, or an appropriate error number otherwise.
Note that wired pages can't be loaned out and
.Fn uvm_loan
will fail in that case.
.Pp
.Fn uvm_unloan
kills loans on pages or anons.
The
.Fa v
argument must point to the array of pointers initialized by a previous call to
.Fn uvm_loan .
.Fa npages
should match the number of pages loaned, which is also the
number of items in the array.
Argument
.Fa flags
should be one of
.Bd -literal
#define UVM_LOAN_TOANON       0x01    /* loan to anons */
#define UVM_LOAN_TOPAGE       0x02    /* loan to kernel */
.Ed
.Pp
and should match what was used for the previous call to
.Fn uvm_loan .
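.Pp
The sizing rule for the array passed as
.Fa v
can be sketched as follows.
This is a user-space illustration only:
.Dv SKETCH_PAGE_SIZE
stands in for
.Dv PAGE_SIZE
with an assumed value of 4096, and
.Fn loan_npages
and
.Fn loan_array_alloc
are hypothetical helpers that are not part of UVM.
.Bd -literal
```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096	/* assumed; the kernel uses PAGE_SIZE */

/* Number of pointers needed to loan len bytes; len must be page-sized. */
static size_t
loan_npages(size_t len)
{
	assert(len % SKETCH_PAGE_SIZE == 0);
	return len / SKETCH_PAGE_SIZE;
}

/*
 * Allocate a zeroed array with one pointer slot per loaned page.
 * Returns NULL on allocation failure or if len is zero.
 * The caller releases the array with free().
 */
static void **
loan_array_alloc(size_t len)
{
	size_t n = loan_npages(len);

	if (n == 0)
		return NULL;
	return calloc(n, sizeof(void *));
}
```
.Ed
.Pp
The same count, len / PAGE_SIZE, is the value that would later be passed as
.Fa npages
to
.Fn uvm_unloan .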
.Sh MISCELLANEOUS FUNCTIONS
.Bl -ohang
.It Ft struct uvm_object *
.Fn uao_create "vsize_t size" "int flags" ;
.It Ft void
.Fn uao_detach "struct uvm_object *uobj" ;
.It Ft void
.Fn uao_reference "struct uvm_object *uobj" ;
.It Ft void
.Fn uvm_chgkprot "void *addr" "size_t len" "int rw" ;
.It Ft bool
.Fn uvm_kernacc "void *addr" "size_t len" "int rw" ;
.It Ft int
.Fn uvm_vslock "struct vmspace *vs" "void *addr" "size_t len" "vm_prot_t prot" ;
.It Ft void
.Fn uvm_vsunlock "struct vmspace *vs" "void *addr" "size_t len" ;
.It Ft void
.Fn uvm_meter "void" ;
.It Ft void
.Fn uvm_fork "struct lwp *l1" "struct lwp *l2" "bool shared" ;
.It Ft int
.Fn uvm_grow "struct proc *p" "vaddr_t sp" ;
.It Ft void
.Fn uvn_findpages "struct uvm_object *uobj" "voff_t offset" "int *npagesp" "struct vm_page **pps" "int flags" ;
.It Ft void
.Fn uvm_swap_stats "int cmd" "struct swapent *sep" "int sec" "register_t *retval" ;
.El
.Pp
The
.Fn uao_create ,
.Fn uao_detach ,
and
.Fn uao_reference
functions operate on anonymous memory objects, such as those used to support
System V shared memory.
.Fn uao_create
returns an object of size
.Fa size
with flags:
.Bd -literal
#define UAO_FLAG_KERNOBJ        0x1     /* create kernel object */
#define UAO_FLAG_KERNSWAP       0x2     /* enable kernel swap */
.Ed
.Pp
which can only be used once each at system boot time.
.Fn uao_reference
creates an additional reference to the named anonymous memory object.
.Fn uao_detach
removes a reference from the named anonymous memory object, destroying
it if removing the last reference.
.Pp
.Fn uvm_chgkprot
changes the protection of kernel memory from
.Fa addr
to
.Fa addr + len
to the value of
.Fa rw .
This is primarily useful for debuggers, for setting breakpoints.
This function is only available with options
.Dv KGDB .
.Pp
.Fn uvm_kernacc
checks the access at address
.Fa addr
to
.Fa addr + len
for
.Fa rw
access in the kernel address space.
.Pp
.Fn uvm_vslock
and
.Fn uvm_vsunlock
control the wiring and unwiring of pages in address space
.Fa vs
from
.Fa addr
to
.Fa addr + len .
These functions are normally used to wire memory for I/O.
.Pp
.Fn uvm_meter
calculates the load average and wakes up the swapper if necessary.
.Pp
.Fn uvm_fork
forks a virtual address space for the (old) lwp
.Fa l1
and the (new) lwp
.Fa l2 .
If the
.Fa shared
argument is non-zero,
.Fa l1
shares its address space with
.Fa l2 ,
otherwise a new address space is created.
This function currently has no return value, and thus cannot fail.
In the future, this function will be changed to allow it to
fail in low memory conditions.
.Pp
.Fn uvm_grow
increases the stack segment of process
.Fa p
to include
.Fa sp .
.Pp
.Fn uvn_findpages
looks up or creates pages in
.Fa uobj
at offset
.Fa offset ,
marks them busy and returns them in the
.Fa pps
array.
Currently
.Fa uobj
must be a vnode object.
The number of pages requested is pointed to by
.Fa npagesp ,
and this value is updated with the actual number of pages returned.
The flags can be
.Bd -literal
#define UFP_ALL         0x00    /* return all pages requested */
#define UFP_NOWAIT      0x01    /* don't sleep */
#define UFP_NOALLOC     0x02    /* don't allocate new pages */
#define UFP_NOCACHE     0x04    /* don't return pages which already exist */
#define UFP_NORDONLY    0x08    /* don't return PG_READONLY pages */
.Ed
.Pp
.Dv UFP_ALL
is a pseudo-flag meaning all requested pages should be returned.
.Dv UFP_NOWAIT
means that we must not sleep.
.Dv UFP_NOALLOC
causes any pages which do not already exist to be skipped.
.Dv UFP_NOCACHE
causes any pages which do already exist to be skipped.
.Dv UFP_NORDONLY
causes any pages which are marked PG_READONLY to be skipped.
.Pp
.Fn uvm_swap_stats
implements the
.Dv SWAP_STATS
and
.Dv SWAP_OSTATS
operations of the
.Xr swapctl 2
system call.
.Fa cmd
is the requested command,
.Dv SWAP_STATS
or
.Dv SWAP_OSTATS .
The function will copy no more than
.Fa sec
entries into the array pointed to by
.Fa sep .
On return,
.Fa retval
holds the actual number of entries copied into the array.
.Sh SYSCTL
UVM provides support for the
.Dv CTL_VM
domain of the
.Xr sysctl 3
hierarchy.
It handles the
.Dv VM_LOADAVG ,
.Dv VM_METER ,
.Dv VM_UVMEXP ,
and
.Dv VM_UVMEXP2
nodes, which return the current load averages, the current VM
totals, the uvmexp structure, and a kernel version independent
view of the uvmexp structure, respectively.
It also exports a number of tunables that control how much VM space is
allowed to be consumed by various tasks.
The load averages are typically accessed from userland using the
.Xr getloadavg 3
function.
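.Pp
From userland, reading the load averages can be sketched as below.
.Xr getloadavg 3
is an existing library function; the wrapper name
.Fn read_loadavg
is hypothetical, and the
.Dv _DEFAULT_SOURCE
definition is an assumption that is only needed on some non-BSD C
libraries to expose the declaration.
.Bd -literal
```c
#define _DEFAULT_SOURCE		/* assumption: exposes getloadavg() on glibc */
#include <assert.h>
#include <stdlib.h>

/*
 * Fetch up to three load averages (1, 5 and 15 minutes) into avg[].
 * Returns the number of samples retrieved, or -1 if the load average
 * is unobtainable.
 */
static int
read_loadavg(double avg[3])
{
	int n = getloadavg(avg, 3);

	assert(n <= 3);
	return n;
}
```
.Ed
.Pp
A caller should check the return value before using the array, since fewer
than three samples, or none at all, may be available.
.Pp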
The uvmexp structure has all global state of the UVM system,
and has the following members:
.Bd -literal
/* vm_page constants */
int pagesize;	/* size of a page (PAGE_SIZE): must be power of 2 */
int pagemask;	/* page mask */
int pageshift;	/* page shift */

/* vm_page counters */
int npages;	/* number of pages we manage */
int free;	/* number of free pages */
int active;	/* number of active pages */
int inactive;	/* number of pages that we free'd but may want back */
int paging;	/* number of pages in the process of being paged out */
int wired;	/* number of wired pages */
int reserve_pagedaemon;	/* number of pages reserved for pagedaemon */
int reserve_kernel;	/* number of pages reserved for kernel */

/* pageout params */
int freemin;	/* min number of free pages */
int freetarg;	/* target number of free pages */
int inactarg;	/* target number of inactive pages */
int wiredmax;	/* max number of wired pages */

/* swap */
int nswapdev;	/* number of configured swap devices in system */
int swpages;	/* number of PAGE_SIZE'ed swap pages */
int swpginuse;	/* number of swap pages in use */
int nswget;	/* number of times fault calls uvm_swap_get() */
int nanon;	/* number total of anon's in system */
int nfreeanon;	/* number of free anon's */

/* stat counters */
int faults;	/* page fault count */
int traps;	/* trap count */
int intrs;	/* interrupt count */
int swtch;	/* context switch count */
int softs;	/* software interrupt count */
int syscalls;	/* system calls */
int pageins;	/* pagein operation count */
		/* pageouts are in pdpageouts below */
int swapins;	/* swapins */
int swapouts;	/* swapouts */
int pgswapin;	/* pages swapped in */
int pgswapout;	/* pages swapped out */
int forks;	/* forks */
int forks_ppwait;	/* forks where parent waits */
int forks_sharevm;	/* forks where vmspace is shared */

/* fault subcounters */
int fltnoram;	/* number of times fault was out of ram */
int fltnoanon;	/* number of times fault was out of anons */
int fltpgwait;	/* number of times fault had to wait on a page */
int fltpgrele;	/* number of times fault found a released page */
int fltrelck;	/* number of times fault relock called */
int fltrelckok;	/* number of times fault relock is a success */
int fltanget;	/* number of times fault gets anon page */
int fltanretry;	/* number of times fault retrys an anon get */
int fltamcopy;	/* number of times fault clears "needs copy" */
int fltnamap;	/* number of times fault maps a neighbor anon page */
int fltnomap;	/* number of times fault maps a neighbor obj page */
int fltlget;	/* number of times fault does a locked pgo_get */
int fltget;	/* number of times fault does an unlocked get */
int flt_anon;	/* number of times fault anon (case 1a) */
int flt_acow;	/* number of times fault anon cow (case 1b) */
int flt_obj;	/* number of times fault is on object page (2a) */
int flt_prcopy;	/* number of times fault promotes with copy (2b) */
int flt_przero;	/* number of times fault promotes with zerofill (2b) */

/* daemon counters */
int pdwoke;	/* number of times daemon woke up */
int pdrevs;	/* number of times daemon rev'd clock hand */
int pdswout;	/* number of times daemon called for swapout */
int pdfreed;	/* number of pages daemon freed since boot */
int pdscans;	/* number of pages daemon scanned since boot */
int pdanscan;	/* number of anonymous pages scanned by daemon */
int pdobscan;	/* number of object pages scanned by daemon */
int pdreact;	/* number of pages daemon reactivated since boot */
int pdbusy;	/* number of times daemon found a busy page */
int pdpageouts;	/* number of times daemon started a pageout */
int pdpending;	/* number of times daemon got a pending pageout */
int pddeact;	/* number of pages daemon deactivates */
.Ed
.Sh NOTES
.Fn uvm_chgkprot
is only available if the kernel has been compiled with options
.Dv KGDB .
.Pp
All structures and types whose names begin with
.Dq vm_
will be renamed to
.Dq uvm_ .
.Sh SEE ALSO
.Xr swapctl 2 ,
.Xr getloadavg 3 ,
.Xr kvm 3 ,
.Xr sysctl 3 ,
.Xr ddb 4 ,
.Xr options 4 ,
.Xr memoryallocators 9 ,
.Xr pmap 9
.Sh HISTORY
UVM is a new VM system developed at Washington University in St. Louis
(Missouri).
UVM's roots lie partly in the Mach-based
.Bx 4.4
VM system, the
.Fx
VM system, and the SunOS 4 VM system.
UVM's basic structure is based on the
.Bx 4.4
VM system.
UVM's new anonymous memory system is based on the
anonymous memory system found in the SunOS 4 VM (as described in papers
published by Sun Microsystems, Inc.).
UVM also includes a number of features new to
.Bx
including page loanout, map entry passing, simplified
copy-on-write, and clustered anonymous memory pageout.
UVM is also further documented in an August 1998 dissertation by
Charles D. Cranor.
.Pp
UVM appeared in
.Nx 1.4 .
.Sh AUTHORS
Charles D. Cranor
.Aq chuck@ccrc.wustl.edu
designed and implemented UVM.
.Pp
Matthew Green
.Aq mrg@eterna.com.au
wrote the swap-space management code and handled the logistical issues
involved with merging UVM into the
.Nx
source tree.
.Pp
Chuck Silvers
.Aq chuq@chuq.com
implemented the aobj pager, thus allowing UVM to support System V shared
memory and process swapping.
He also designed and implemented the UBC part of UVM, which uses UVM pages
to cache vnode data rather than the traditional buffer cache buffers.