.\"	$NetBSD: uvm.9,v 1.97 2009/03/12 13:13:16 wiz Exp $
.\"
.\" Copyright (c) 1998 Matthew R. Green
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
.\" OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
.\" IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
.\" INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
.\" BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
.\" LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
.\" AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
.\" OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd March 12, 2009
.Dt UVM 9
.Os
.Sh NAME
.Nm uvm
.Nd virtual memory system external interface
.Sh SYNOPSIS
.In sys/param.h
.In uvm/uvm.h
.Sh DESCRIPTION
The UVM virtual memory system manages access to the computer's memory
resources.
User processes and the kernel access these resources through
UVM's external interface.
UVM's external interface includes functions that:
.Pp
.Bl -hyphen -compact
.It
initialize UVM sub-systems
.It
manage virtual address spaces
.It
resolve page faults
.It
memory map files and devices
.It
perform uio-based I/O to virtual memory
.It
allocate and free kernel virtual memory
.It
allocate and free physical memory
.El
.Pp
In addition to exporting these services, UVM has two kernel-level processes:
pagedaemon and swapper.
The pagedaemon process sleeps until physical memory becomes scarce.
When that happens, pagedaemon is awoken.
It scans physical memory, paging out and freeing memory that has not
been recently used.
The swapper process swaps in runnable processes that are currently swapped
out, if there is room.
.Pp
There are also several miscellaneous functions.
.Sh INITIALIZATION
.Bl -ohang
.It Ft void
.Fn uvm_init "void" ;
.It Ft void
.Fn uvm_init_limits "struct lwp *l" ;
.It Ft void
.Fn uvm_setpagesize "void" ;
.It Ft void
.Fn uvm_swap_init "void" ;
.El
.Pp
.Fn uvm_init
sets up the UVM system at system boot time, after the
console has been set up.
It initializes global state, the page, map, kernel virtual memory state,
machine-dependent physical map, kernel memory allocator,
pager and anonymous memory sub-systems, and then enables
paging of kernel objects.
.Pp
.Fn uvm_init_limits
initializes process limits for the named process.
This is for use by the system startup for process zero, before any
other processes are created.
.Pp
.Fn uvm_setpagesize
initializes the uvmexp members pagesize (if not already done by
machine-dependent code), pageshift and pagemask.
It should be called by machine-dependent code early in the
.Fn pmap_init
call (see
.Xr pmap 9 ) .
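.Pp
As a rough userland illustration of how the three page-size values relate,
the mask and shift can be derived from a power-of-two page size.
This is a sketch only, not the kernel's code; the structure and function
names are hypothetical.

```c
#include <assert.h>

/* Hypothetical stand-in for the three uvmexp page-size members. */
struct page_consts {
	int pagesize;	/* bytes per page; must be a power of 2 */
	int pagemask;	/* pagesize - 1: selects the byte offset within a page */
	int pageshift;	/* log2(pagesize) */
};

/* Derive pagemask and pageshift from a power-of-two pagesize. */
static struct page_consts
derive_page_consts(int pagesize)
{
	struct page_consts pc;

	/* Reject zero, negative, and non-power-of-two sizes. */
	assert(pagesize > 0 && (pagesize & (pagesize - 1)) == 0);

	pc.pagesize = pagesize;
	pc.pagemask = pagesize - 1;
	pc.pageshift = 0;
	while ((1 << pc.pageshift) != pagesize)
		pc.pageshift++;
	return pc;
}
```

With a 4096-byte page, this yields a mask of 4095 and a shift of 12, so an
address can be split into a page number (address shifted right by pageshift)
and an in-page offset (address ANDed with pagemask).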
.Pp
.Fn uvm_swap_init
initializes the swap sub-system.
.Sh VIRTUAL ADDRESS SPACE MANAGEMENT
.Bl -ohang
.It Ft int
.Fn uvm_map "struct vm_map *map" "vaddr_t *startp" "vsize_t size" "struct uvm_object *uobj" "voff_t uoffset" "vsize_t align" "uvm_flag_t flags" ;
.It Ft void
.Fn uvm_unmap "struct vm_map *map" "vaddr_t start" "vaddr_t end" ;
.It Ft int
.Fn uvm_map_pageable "struct vm_map *map" "vaddr_t start" "vaddr_t end" "bool new_pageable" "int lockflags" ;
.It Ft bool
.Fn uvm_map_checkprot "struct vm_map *map" "vaddr_t start" "vaddr_t end" "vm_prot_t protection" ;
.It Ft int
.Fn uvm_map_protect "struct vm_map *map" "vaddr_t start" "vaddr_t end" "vm_prot_t new_prot" "bool set_max" ;
.It Ft int
.Fn uvm_deallocate "struct vm_map *map" "vaddr_t start" "vsize_t size" ;
.It Ft struct vmspace *
.Fn uvmspace_alloc "vaddr_t min" "vaddr_t max" "int pageable" ;
.It Ft void
.Fn uvmspace_exec "struct lwp *l" "vaddr_t start" "vaddr_t end" ;
.It Ft struct vmspace *
.Fn uvmspace_fork "struct vmspace *vm1" ;
.It Ft void
.Fn uvmspace_free "struct vmspace *vm" ;
.It Ft void
.Fn uvmspace_share "struct proc *p1" "struct proc *p2" ;
.It Ft void
.Fn uvmspace_unshare "struct lwp *l" ;
.It Ft bool
.Fn uvm_uarea_alloc "vaddr_t *uaddrp" ;
.It Ft void
.Fn uvm_uarea_free "vaddr_t uaddr" ;
.El
.Pp
.Fn uvm_map
establishes a valid mapping in map
.Fa map ,
which must be unlocked.
The new mapping has size
.Fa size ,
which must be a multiple of
.Dv PAGE_SIZE .
The
.Fa uobj
and
.Fa uoffset
arguments can have four meanings.
When
.Fa uobj
is
.Dv NULL
and
.Fa uoffset
is
.Dv UVM_UNKNOWN_OFFSET ,
.Fn uvm_map
does not use the machine-dependent
.Dv PMAP_PREFER
function.
If
.Fa uoffset
is any other value, it is used as the hint to
.Dv PMAP_PREFER .
When
.Fa uobj
is not
.Dv NULL
and
.Fa uoffset
is
.Dv UVM_UNKNOWN_OFFSET ,
.Fn uvm_map
finds the offset based upon the virtual address, passed as
.Fa startp .
If
.Fa uoffset
is any other value, a normal mapping is made at this offset.
The start address of the mapping will be returned in
.Fa startp .
.Pp
.Fa align
specifies the alignment of the mapping unless
.Dv UVM_FLAG_FIXED
is specified in
.Fa flags .
.Fa align
must be a power of 2.
.Pp
.Fa flags
passed to
.Fn uvm_map
are typically created using the
.Fn UVM_MAPFLAG "vm_prot_t prot" "vm_prot_t maxprot" "vm_inherit_t inh" "int advice" "int flags"
macro, which uses the following values.
The values that
.Fa prot
and
.Fa maxprot
can take are:
.Bd -literal
#define UVM_PROT_MASK   0x07    /* protection mask */
#define UVM_PROT_NONE   0x00    /* protection none */
#define UVM_PROT_ALL    0x07    /* everything */
#define UVM_PROT_READ   0x01    /* read */
#define UVM_PROT_WRITE  0x02    /* write */
#define UVM_PROT_EXEC   0x04    /* exec */
#define UVM_PROT_R      0x01    /* read */
#define UVM_PROT_W      0x02    /* write */
#define UVM_PROT_RW     0x03    /* read-write */
#define UVM_PROT_X      0x04    /* exec */
#define UVM_PROT_RX     0x05    /* read-exec */
#define UVM_PROT_WX     0x06    /* write-exec */
#define UVM_PROT_RWX    0x07    /* read-write-exec */
.Ed
.Pp
The values that
.Fa inh
can take are:
.Bd -literal
#define UVM_INH_MASK    0x30    /* inherit mask */
#define UVM_INH_SHARE   0x00    /* "share" */
#define UVM_INH_COPY    0x10    /* "copy" */
#define UVM_INH_NONE    0x20    /* "none" */
#define UVM_INH_DONATE  0x30    /* "donate" \*[Lt]\*[Lt] not used */
.Ed
.Pp
The values that
.Fa advice
can take are:
.Bd -literal
#define UVM_ADV_NORMAL     0x0  /* 'normal' */
#define UVM_ADV_RANDOM     0x1  /* 'random' */
#define UVM_ADV_SEQUENTIAL 0x2  /* 'sequential' */
#define UVM_ADV_MASK       0x7  /* mask */
.Ed
.Pp
The values that
.Fa flags
can take are:
.Bd -literal
#define UVM_FLAG_FIXED   0x010000 /* find space */
#define UVM_FLAG_OVERLAY 0x020000 /* establish overlay */
#define UVM_FLAG_NOMERGE 0x040000 /* don't merge map entries */
#define UVM_FLAG_COPYONW 0x080000 /* set copy_on_write flag */
#define UVM_FLAG_AMAPPAD 0x100000 /* for bss: pad amap to reduce malloc() */
#define UVM_FLAG_TRYLOCK 0x200000 /* fail if we can not lock map */
.Ed
.Pp
The
.Dv UVM_MAPFLAG
macro arguments can be combined with the bitwise OR operator.
There are several special purpose macros for checking protection
combinations, e.g., the
.Dv UVM_PROT_WX
macro.
There are also some additional macros to extract bits from the flags.
The
.Dv UVM_PROTECTION ,
.Dv UVM_INHERIT ,
.Dv UVM_MAXPROTECTION
and
.Dv UVM_ADVICE
macros return the protection, inheritance, maximum protection and advice,
respectively.
.Fn uvm_map
returns a standard UVM return value.
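.Pp
The packing and extraction described above can be sketched in userland as
follows.
The constant values are copied from the tables in this section, but the bit
layout of the macros is an assumption made for illustration and should be
checked against the UVM headers of the tree in use; the
make_mapflag() helper is hypothetical.

```c
#include <assert.h>

/* Values copied from the tables in this section. */
#define UVM_PROT_MASK		0x07
#define UVM_PROT_RW		0x03
#define UVM_PROT_RWX		0x07
#define UVM_INH_MASK		0x30
#define UVM_INH_COPY		0x10
#define UVM_ADV_MASK		0x7
#define UVM_ADV_RANDOM		0x1
#define UVM_FLAG_COPYONW	0x080000

/*
 * ASSUMED layout: prot in bits 0-2, inherit in bits 4-5,
 * maxprot in bits 8-10, advice in bits 12-14, flags from bit 16 up.
 */
#define UVM_MAPFLAG(p, mp, i, a, f) ((p) | (i) | ((mp) << 8) | ((a) << 12) | (f))
#define UVM_PROTECTION(x)	((x) & UVM_PROT_MASK)
#define UVM_INHERIT(x)		(((x) & UVM_INH_MASK) >> 4)
#define UVM_MAXPROTECTION(x)	(((x) >> 8) & UVM_PROT_MASK)
#define UVM_ADVICE(x)		(((x) >> 12) & UVM_ADV_MASK)

/* Hypothetical helper: validate each field, then pack it. */
static unsigned int
make_mapflag(unsigned int prot, unsigned int maxprot, unsigned int inh,
    unsigned int advice, unsigned int flags)
{
	assert((prot & ~UVM_PROT_MASK) == 0);
	assert((maxprot & ~UVM_PROT_MASK) == 0);
	assert((inh & ~UVM_INH_MASK) == 0);
	assert((advice & ~UVM_ADV_MASK) == 0);
	assert((flags & 0xffff) == 0);	/* flags live above bit 15 */
	return UVM_MAPFLAG(prot, maxprot, inh, advice, flags);
}
```

Under this assumed layout, a read-write, copy-on-fork mapping whose maximum
protection is read-write-exec packs into a single word, and each extraction
macro recovers the field that was passed in.
Note that in this sketch UVM_INHERIT returns the inheritance code shifted
down to bit 0, not the unshifted UVM_INH_* constant.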
.Pp
.Fn uvm_unmap
removes a valid mapping,
from
.Fa start
to
.Fa end ,
in map
.Fa map ,
which must be unlocked.
.Pp
.Fn uvm_map_pageable
changes the pageability of the pages in the range from
.Fa start
to
.Fa end
in map
.Fa map
to
.Fa new_pageable .
.Fn uvm_map_pageable
returns a standard UVM return value.
.Pp
.Fn uvm_map_checkprot
checks the protection of the range from
.Fa start
to
.Fa end
in map
.Fa map
against
.Fa protection .
This returns either
.Dv true
or
.Dv false .
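.Pp
One plausible reading of checking a range against a protection is that every
part of the range must grant all of the requested bits.
The userland model below illustrates that reading only; it is an assumption,
not the kernel implementation, and the range_prot structure and
range_allows() function are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define UVM_PROT_R	0x01	/* read, from the table above */
#define UVM_PROT_W	0x02	/* write */
#define UVM_PROT_RW	0x03	/* read-write */

/* Hypothetical model of one map entry: [start, end) with a protection. */
struct range_prot {
	unsigned long start;
	unsigned long end;
	int prot;
};

/*
 * Return true if the sorted entries cover [start, end) without holes
 * and every covering entry grants all bits in 'wanted'.
 */
static bool
range_allows(const struct range_prot *e, int n, unsigned long start,
    unsigned long end, int wanted)
{
	unsigned long pos = start;
	int i;

	assert(start < end);
	for (i = 0; i < n && pos < end; i++) {
		if (e[i].end <= pos)
			continue;	/* entry lies before the range */
		if (e[i].start > pos)
			return false;	/* hole in the range */
		if ((e[i].prot & wanted) != wanted)
			return false;	/* a requested bit is missing */
		pos = e[i].end;
	}
	return pos >= end;
}
```

In this model a range that spans a read-write entry followed by a read-only
entry passes a read check but fails a write check, and a range extending
past the last entry fails because part of it is unmapped.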
.Pp
.Fn uvm_map_protect
changes the protection of the range from
.Fa start
to
.Fa end
in map
.Fa map
to
.Fa new_prot ,
also setting the maximum protection of the region to
.Fa new_prot
if
.Fa set_max
is true.
This function returns a standard UVM return value.
.Pp
.Fn uvm_deallocate
deallocates kernel memory in map
.Fa map
from address
.Fa start
to
.Fa start + size .
.Pp
.Fn uvmspace_alloc
allocates and returns a new address space, with ranges from
.Fa min
to
.Fa max ,
setting the pageability of the address space to
.Fa pageable .
.Pp
.Fn uvmspace_exec
either reuses the address space of lwp
.Fa l
if there are no other references to it, or creates
a new one with
.Fn uvmspace_alloc .
The range of valid addresses in the address space is reset to
.Fa start
through
.Fa end .
.Pp
.Fn uvmspace_fork
creates and returns a new address space based upon the
.Fa vm1
address space, typically used when allocating an address space for a
child process.
.Pp
.Fn uvmspace_free
lowers the reference count on the address space
.Fa vm ,
freeing the data structures if there are no other references.
.Pp
.Fn uvmspace_share
causes process
.Fa p2
to share the address space of
.Fa p1 .
.Pp
.Fn uvmspace_unshare
ensures that lwp
.Fa l
has its own, unshared address space, by creating a new one if
necessary by calling
.Fn uvmspace_fork .
.Pp
.Fn uvm_uarea_alloc
allocates virtual space for a u-area (i.e., a kernel stack) and stores
its virtual address in
.Fa *uaddrp .
The return value is
.Dv true
if the u-area is already backed by wired physical memory, otherwise
.Dv false .
.Pp
.Fn uvm_uarea_free
frees a u-area allocated with
.Fn uvm_uarea_alloc ,
freeing both the virtual space and any physical pages which may have been
allocated to back that virtual space later.
.Sh PAGE FAULT HANDLING
.Bl -ohang
.It Ft int
.Fn uvm_fault "struct vm_map *orig_map" "vaddr_t vaddr" "vm_prot_t access_type" ;
.El
.Pp
.Fn uvm_fault
is the main entry point for faults.
It takes
.Fa orig_map
as the map the fault originated in,
.Fa vaddr
as the address within that map at which the fault occurred, and
.Fa access_type
describing the type of access requested.
.Fn uvm_fault
returns a standard UVM return value.
.Sh MEMORY MAPPING FILES AND DEVICES
.Bl -ohang
.It Ft void
.Fn uvm_vnp_setsize "struct vnode *vp" "voff_t newsize" ;
.It Ft void *
.Fn ubc_alloc "struct uvm_object *uobj" "voff_t offset" "vsize_t *lenp" \
"int advice" "int flags" ;
.It Ft void
.Fn ubc_release "void *va" "int flags" ;
.It Ft int
.Fn ubc_uiomove "struct uvm_object *uobj" "struct uio *uio" "vsize_t todo" \
"int advice" "int flags" ;
.El
.Pp
.Fn uvm_vnp_setsize
sets the size of vnode
.Fa vp
to
.Fa newsize .
The caller must hold a reference to the vnode.
If the vnode shrinks, pages no longer used are discarded.
.Pp
.Fn ubc_alloc
creates a kernel mapping of
.Fa uobj
starting at offset
.Fa offset .
The desired length of the mapping is pointed to by
.Fa lenp ,
but the actual mapping may be smaller than this.
.Fa lenp
is updated to contain the actual length mapped.
.Fa advice
is the access pattern hint, which must be one of
.Pp
.Bl -tag -offset indent -width "UVM_ADV_SEQUENTIAL" -compact
.It UVM_ADV_NORMAL
No hint
.It UVM_ADV_RANDOM
Random access hint
.It UVM_ADV_SEQUENTIAL
Sequential access hint (from lower offset to higher offset)
.El
.Pp
The possible
.Fa flags
are
.Pp
.Bl -tag -offset indent -width "UVM_ADV_SEQUENTIAL" -compact
.It UBC_READ
Mapping will be accessed for read.
.It UBC_WRITE
Mapping will be accessed for write.
.It UBC_FAULTBUSY
Fault in the window's pages during the mapping operation.
Makes sense only for write.
.El
.Pp
Once the mapping is created, it must be accessed only by methods that can
handle faults, such as
.Fn uiomove
or
.Fn kcopy .
Page faults on the mapping will result in the object's pager
method being called to resolve the fault.
.Pp
.Fn ubc_release
frees the mapping at
.Fa va
for reuse.
The mapping may be cached to speed future accesses to the same region
of the object.
The flags can be any of
.Pp
.Bl -tag -offset indent -width "UVM_ADV_SEQUENTIAL" -compact
.It UBC_UNMAP
Do not cache mapping.
.El
.Pp
.Fn ubc_uiomove
allocates a UBC memory window, performs I/O on it and unmaps the window.
The
.Fa advice
parameter takes the same values as the respective parameter in
.Fn ubc_alloc
and the
.Fa flags
parameter takes the same arguments as
.Fn ubc_alloc
and
.Fn ubc_release .
Additionally, the flag
.Dv UBC_PARTIALOK
can be provided to indicate that it is acceptable to return if an error
occurs mid-transfer.
.Sh VIRTUAL MEMORY I/O
.Bl -ohang
.It Ft int
.Fn uvm_io "struct vm_map *map" "struct uio *uio" ;
.El
.Pp
.Fn uvm_io
performs the I/O described in
.Fa uio
on the memory described in
.Fa map .
.Sh ALLOCATION OF KERNEL MEMORY
.Bl -ohang
.It Ft vaddr_t
.Fn uvm_km_alloc "struct vm_map *map" "vsize_t size" "vsize_t align" "uvm_flag_t flags" ;
.It Ft void
.Fn uvm_km_free "struct vm_map *map" "vaddr_t addr" "vsize_t size" "uvm_flag_t flags" ;
.It Ft struct vm_map *
.Fn uvm_km_suballoc "struct vm_map *map" "vaddr_t *min" "vaddr_t *max" \
"vsize_t size" "int flags" "bool fixed" "struct vm_map *submap" ;
.El
.Pp
.Fn uvm_km_alloc
allocates
.Fa size
bytes of kernel memory in map
.Fa map .
The first address of the allocated memory range will be aligned according to the
.Fa align
argument
.Pq specify 0 if no alignment is necessary .
The alignment must be a multiple of the page size.
The
.Fa flags
argument is a bitwise inclusive OR of the allocation type and operation flags.
.Pp
The allocation type should be one of:
.Bl -tag -width UVM_KMF_PAGEABLE
.It UVM_KMF_WIRED
Wired memory.
.It UVM_KMF_PAGEABLE
Demand-paged zero-filled memory.
.It UVM_KMF_VAONLY
Virtual address only.
No physical pages are mapped in the allocated region.
If necessary, it's the caller's responsibility to enter page mappings.
It's also the caller's responsibility to clean up the mappings before freeing
the address range.
.El
.Pp
The following operation flags are available:
.Bl -tag -width UVM_KMF_PAGEABLE
.It UVM_KMF_CANFAIL
Can fail even if
.Dv UVM_KMF_NOWAIT
is not specified and
.Dv UVM_KMF_WAITVA
is specified.
.It UVM_KMF_ZERO
Request zero-filled memory.
Only supported for
.Dv UVM_KMF_WIRED .
Shouldn't be used with other types.
.It UVM_KMF_TRYLOCK
Fail if we can't lock the map.
.It UVM_KMF_NOWAIT
Fail immediately if no memory is available.
.It UVM_KMF_WAITVA
Sleep to wait for the virtual address resources if needed.
.El
.Pp
(If neither
.Dv UVM_KMF_NOWAIT
nor
.Dv UVM_KMF_CANFAIL
is specified and
.Dv UVM_KMF_WAITVA
is specified,
.Fn uvm_km_alloc
will never fail, but rather sleep indefinitely until the allocation succeeds.)
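.Pp
The argument rules above (exactly one allocation type, zero-fill only for
wired memory, alignment of 0 or a multiple of the page size) can be
expressed as a small userland check.
This is a sketch under stated assumptions: the KMF_* values below are
illustrative placeholders, not the real UVM_KMF_* values, and
km_args_ok() is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* ILLUSTRATIVE values only; the real flags are defined by the UVM headers. */
#define KMF_WIRED	0x01
#define KMF_PAGEABLE	0x02
#define KMF_VAONLY	0x04
#define KMF_TYPEMASK	(KMF_WIRED | KMF_PAGEABLE | KMF_VAONLY)
#define KMF_ZERO	0x10

/* Return true if align and flags follow the documented rules. */
static bool
km_args_ok(size_t align, size_t pagesize, unsigned int flags)
{
	unsigned int type = flags & KMF_TYPEMASK;

	assert(pagesize != 0);

	/* Exactly one allocation type must be chosen. */
	if (type != KMF_WIRED && type != KMF_PAGEABLE && type != KMF_VAONLY)
		return false;

	/* Zero-fill is only documented for wired allocations. */
	if ((flags & KMF_ZERO) != 0 && type != KMF_WIRED)
		return false;

	/* align is either 0 (no constraint) or a multiple of the page size. */
	return align % pagesize == 0;
}
```

A check of this kind is most useful in debugging builds, where a caller can
assert it before handing the arguments to the allocator.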
.Pp
Pageability of the pages allocated with
.Dv UVM_KMF_PAGEABLE
can be changed by
.Fn uvm_map_pageable .
In that case, the entire range must be changed atomically.
Changing a part of the range is not supported.
.Pp
.Fn uvm_km_free
frees the memory range allocated by
.Fn uvm_km_alloc .
.Fa addr
must be an address returned by
.Fn uvm_km_alloc .
.Fa map
and
.Fa size
must be the same as the ones used for the corresponding
.Fn uvm_km_alloc .
.Fa flags
must be the allocation type used for the corresponding
.Fn uvm_km_alloc .
.Pp
.Fn uvm_km_free
is the only way to free memory ranges allocated by
.Fn uvm_km_alloc .
.Fn uvm_unmap
must not be used.
.Pp
.Fn uvm_km_suballoc
allocates a submap from
.Fa map ,
creating a new map if
.Fa submap
is
.Dv NULL .
The addresses of the submap can be specified exactly by setting the
.Fa fixed
argument to true, which causes the
.Fa min
argument to specify the start address of the submap.
If
.Fa fixed
is false, any address range of size
.Fa size
will be allocated from
.Fa map
and the start and end addresses returned in
.Fa min
and
.Fa max .
The
.Fa flags
are used to initialize the created submap.
The following flags can be set:
.Bl -tag -width VM_MAP_PAGEABLE
.It VM_MAP_PAGEABLE
Entries in the map may be paged out.
.It VM_MAP_INTRSAFE
Map should be interrupt-safe.
.It VM_MAP_TOPDOWN
A top-down mapping should be arranged.
.El
.Sh ALLOCATION OF PHYSICAL MEMORY
.Bl -ohang
.It Ft struct vm_page *
.Fn uvm_pagealloc "struct uvm_object *uobj" "voff_t off" "struct vm_anon *anon" "int flags" ;
.It Ft void
.Fn uvm_pagerealloc "struct vm_page *pg" "struct uvm_object *newobj" "voff_t newoff" ;
.It Ft void
.Fn uvm_pagefree "struct vm_page *pg" ;
.It Ft int
.Fn uvm_pglistalloc "psize_t size" "paddr_t low" "paddr_t high" "paddr_t alignment" "paddr_t boundary" "struct pglist *rlist" "int nsegs" "int waitok" ;
.It Ft void
.Fn uvm_pglistfree "struct pglist *list" ;
.It Ft void
.Fn uvm_page_physload "paddr_t start" "paddr_t end" "paddr_t avail_start" "paddr_t avail_end" "int free_list" ;
.El
.Pp
.Fn uvm_pagealloc
allocates a page of memory at offset
.Fa off
in either the object
.Fa uobj
or the anonymous memory
.Fa anon ,
which must be locked by the caller.
Only one of
.Fa uobj
and
.Fa anon
can be non
.Dv NULL .
Returns
.Dv NULL
when no page can be found.
The flags can be any of
.Bd -literal
#define UVM_PGA_USERESERVE      0x0001  /* ok to use reserve pages */
#define UVM_PGA_ZERO            0x0002  /* returned page must be zero'd */
.Ed
.Pp
.Dv UVM_PGA_USERESERVE
means to allocate a page even if that will result in the number of free pages
being lower than
.Dv uvmexp.reserve_pagedaemon
(if the current thread is the pagedaemon) or
.Dv uvmexp.reserve_kernel
(if the current thread is not the pagedaemon).
.Dv UVM_PGA_ZERO
causes the returned page to be filled with zeroes, either by allocating it
from a pool of pre-zeroed pages or by zeroing it in-line as necessary.
.Pp
.Fn uvm_pagerealloc
reallocates page
.Fa pg
to a new object
.Fa newobj ,
at a new offset
.Fa newoff .
.Pp
.Fn uvm_pagefree
frees the physical page
.Fa pg .
If the content of the page is known to be zero-filled,
the caller should set
.Dv PG_ZERO
in pg-\*[Gt]flags so that the page allocator will use
the page to serve future
.Dv UVM_PGA_ZERO
requests efficiently.
.Pp
.Fn uvm_pglistalloc
allocates a list of pages totalling
.Fa size
bytes, subject to various constraints.
.Fa low
and
.Fa high
describe the lowest and highest addresses acceptable for the list.
If
.Fa alignment
is non-zero, it describes the required alignment of the list, in
power-of-two notation.
If
.Fa boundary
is non-zero, no segment of the list may cross this power-of-two
boundary, relative to zero.
.Fa nsegs
is the maximum number of physically contiguous segments.
If
.Fa waitok
is non-zero, the function may sleep until enough memory is available.
(It also may give up in some situations, so a non-zero
.Fa waitok
does not imply that
.Fn uvm_pglistalloc
cannot return an error.)
The allocated memory is returned in the
.Fa rlist
list; the caller has to provide storage only, the list is initialized by
.Fn uvm_pglistalloc .
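.Pp
The alignment and boundary constraints can be illustrated with a small
userland predicate on a single physically contiguous segment.
This is a sketch of the arithmetic only, not the allocator's code;
segment_ok() and is_pow2() are hypothetical names, and the low and high
limits are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if x is a non-zero power of two. */
static bool
is_pow2(uint64_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/*
 * Return true if the segment [addr, addr + size) starts on an
 * 'alignment' multiple and does not cross a 'boundary' multiple.
 * A value of 0 for either constraint disables that check.
 */
static bool
segment_ok(uint64_t addr, uint64_t size, uint64_t alignment,
    uint64_t boundary)
{
	uint64_t last;

	assert(size > 0);
	assert(alignment == 0 || is_pow2(alignment));
	assert(boundary == 0 || is_pow2(boundary));

	last = addr + size - 1;		/* last byte of the segment */
	if (alignment != 0 && (addr & (alignment - 1)) != 0)
		return false;		/* start is misaligned */
	if (boundary != 0 &&
	    (addr & ~(boundary - 1)) != (last & ~(boundary - 1)))
		return false;		/* first and last byte straddle a boundary */
	return true;
}
```

For example, an 8 KB segment starting at 0xF000 crosses the 64 KB boundary
at 0x10000 and is rejected, while the same segment starting at 0x10000 is
accepted.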
.Pp
.Fn uvm_pglistfree
frees the list of pages pointed to by
.Fa list .
If the content of a page is known to be zero-filled,
the caller should set
.Dv PG_ZERO
in pg-\*[Gt]flags so that the page allocator will use
the page to serve future
.Dv UVM_PGA_ZERO
requests efficiently.
.Pp
.Fn uvm_page_physload
loads physical memory segments into VM space on the specified
.Fa free_list .
It must be called at system boot time to set up physical memory
management pages.
The arguments describe the
.Fa start
and
.Fa end
of the physical addresses of the segment, and the available start and end
addresses of pages not already in use.
If a system has memory banks of
different speeds, the slower memory should be given a higher
.Fa free_list
value.
.\" XXX expand on "system boot time"!
.Sh PROCESSES
.Bl -ohang
.It Ft void
.Fn uvm_pageout "void" ;
.It Ft void
.Fn uvm_scheduler "void" ;
.It Ft void
.Fn uvm_swapin "struct lwp *l" ;
.El
.Pp
.Fn uvm_pageout
is the main loop for the page daemon.
.Pp
.Fn uvm_scheduler
is the process zero main loop, which is to be called after the
system has finished starting other processes.
It handles the swapping in of runnable, swapped out processes in priority
order.
.Pp
.Fn uvm_swapin
swaps in the named lwp.
.Sh PAGE LOAN
.Bl -ohang
.It Ft int
.Fn uvm_loan "struct vm_map *map" "vaddr_t start" "vsize_t len" "void *v" "int flags" ;
.It Ft void
.Fn uvm_unloan "void *v" "int npages" "int flags" ;
.El
.Pp
.Fn uvm_loan
loans pages in a map out to anons or to the kernel.
.Fa map
should be unlocked, and
.Fa start
and
.Fa len
should be multiples of
.Dv PAGE_SIZE .
Argument
.Fa flags
should be one of
.Bd -literal
#define UVM_LOAN_TOANON       0x01    /* loan to anons */
#define UVM_LOAN_TOPAGE       0x02    /* loan to kernel */
.Ed
.Pp
.Fa v
should be a pointer to an array of pointers to
.Li struct anon
or
.Li struct vm_page ,
as appropriate.
The caller has to allocate memory for the array and
ensure it's big enough to hold
.Fa len / PAGE_SIZE
pointers.
Returns 0 for success, or an appropriate error number otherwise.
Note that wired pages can't be loaned out and
.Fn uvm_loan
will fail in that case.
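.Pp
The sizing rule for the pointer array can be sketched in userland as
follows.
The page size is assumed to be 4096 bytes for the example (the real
PAGE_SIZE depends on the port), calloc() stands in for whatever kernel
allocator the caller would actually use, and both function names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096	/* assumed for illustration only */

/* Number of pointer slots needed to loan 'len' bytes. */
static size_t
loan_slots(size_t len)
{
	/* len must be a whole number of pages. */
	assert(len % SKETCH_PAGE_SIZE == 0);
	return len / SKETCH_PAGE_SIZE;
}

/*
 * Allocate the array that would be passed as 'v'.
 * Returns NULL on allocation failure; release with free().
 */
static void **
loan_array_alloc(size_t len)
{
	return calloc(loan_slots(len), sizeof(void *));
}
```

The same slot count is what the caller would later pass as the page count
when the loan is ended.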
.Pp
.Fn uvm_unloan
kills loans on pages or anons.
.Fa v
must point to the array of pointers initialized by a previous call to
.Fn uvm_loan .
.Fa npages
should match the number of pages allocated for the loan, which also matches
the number of items in the array.
Argument
.Fa flags
should be one of
.Bd -literal
#define UVM_LOAN_TOANON       0x01    /* loan to anons */
#define UVM_LOAN_TOPAGE       0x02    /* loan to kernel */
.Ed
.Pp
and should match what was used for the previous call to
.Fn uvm_loan .
.Sh MISCELLANEOUS FUNCTIONS
.Bl -ohang
.It Ft struct uvm_object *
.Fn uao_create "vsize_t size" "int flags" ;
.It Ft void
.Fn uao_detach "struct uvm_object *uobj" ;
.It Ft void
.Fn uao_reference "struct uvm_object *uobj" ;
.It Ft bool
.Fn uvm_chgkprot "void *addr" "size_t len" "int rw" ;
.It Ft void
.Fn uvm_kernacc "void *addr" "size_t len" "int rw" ;
.It Ft int
.Fn uvm_vslock "struct vmspace *vs" "void *addr" "size_t len" "vm_prot_t prot" ;
.It Ft void
.Fn uvm_vsunlock "struct vmspace *vs" "void *addr" "size_t len" ;
.It Ft void
.Fn uvm_meter "void" ;
.It Ft void
.Fn uvm_fork "struct lwp *l1" "struct lwp *l2" "bool shared" ;
.It Ft int
.Fn uvm_grow "struct proc *p" "vaddr_t sp" ;
.It Ft void
.Fn uvn_findpages "struct uvm_object *uobj" "voff_t offset" "int *npagesp" "struct vm_page **pps" "int flags" ;
.It Ft void
.Fn uvm_swap_stats "int cmd" "struct swapent *sep" "int sec" "register_t *retval" ;
.El
.Pp
The
.Fn uao_create ,
.Fn uao_detach ,
and
.Fn uao_reference
functions operate on anonymous memory objects, such as those used to support
System V shared memory.
.Fn uao_create
returns an object of size
.Fa size
with flags:
.Bd -literal
#define UAO_FLAG_KERNOBJ        0x1     /* create kernel object */
#define UAO_FLAG_KERNSWAP       0x2     /* enable kernel swap */
.Ed
.Pp
which can only be used once each at system boot time.
.Fn uao_reference
creates an additional reference to the named anonymous memory object.
.Fn uao_detach
removes a reference from the named anonymous memory object, destroying
it if removing the last reference.
.Pp
.Fn uvm_chgkprot
changes the protection of kernel memory from
.Fa addr
to
.Fa addr + len
to the value of
.Fa rw .
This is primarily useful for debuggers, for setting breakpoints.
This function is only available with options
.Dv KGDB .
.Pp
.Fn uvm_kernacc
checks the access at address
.Fa addr
to
.Fa addr + len
for
.Fa rw
access in the kernel address space.
.Pp
.Fn uvm_vslock
and
.Fn uvm_vsunlock
control the wiring and unwiring of pages in the address space
.Fa vs
from
.Fa addr
to
.Fa addr + len .
These functions are normally used to wire memory for I/O.
.Pp
.Fn uvm_meter
calculates the load average and wakes up the swapper if necessary.
.Pp
.Fn uvm_fork
forks a virtual address space for the (old) lwp
.Fa l1
and the (new) lwp
.Fa l2 .
If the
.Fa shared
argument is true,
.Fa l1
shares its address space with
.Fa l2 ,
otherwise a new address space is created.
This function currently has no return value, and thus cannot fail.
In the future, this function will be changed to allow it to
fail in low memory conditions.
.Pp
.Fn uvm_grow
increases the stack segment of process
.Fa p
to include
.Fa sp .
.Pp
.Fn uvn_findpages
looks up or creates pages in
.Fa uobj
at offset
.Fa offset ,
marks them busy and returns them in the
.Fa pps
array.
Currently
.Fa uobj
must be a vnode object.
The number of pages requested is pointed to by
.Fa npagesp ,
and this value is updated with the actual number of pages returned.
The flags can be
.Bd -literal
#define UFP_ALL         0x00    /* return all pages requested */
#define UFP_NOWAIT      0x01    /* don't sleep */
#define UFP_NOALLOC     0x02    /* don't allocate new pages */
#define UFP_NOCACHE     0x04    /* don't return pages which already exist */
#define UFP_NORDONLY    0x08    /* don't return PG_READONLY pages */
.Ed
.Pp
.Dv UFP_ALL
is a pseudo-flag meaning all requested pages should be returned.
.Dv UFP_NOWAIT
means that we must not sleep.
.Dv UFP_NOALLOC
causes any pages which do not already exist to be skipped.
.Dv UFP_NOCACHE
causes any pages which do already exist to be skipped.
.Dv UFP_NORDONLY
causes any pages which are marked PG_READONLY to be skipped.
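.Pp
The skipping rules above can be modelled in userland as a single predicate.
The flag values are copied from the table; the slot_state structure and
ufp_skips() function are hypothetical, and the sketch covers only the three
skip flags, not sleeping or busy-page handling.

```c
#include <assert.h>
#include <stdbool.h>

/* Values copied from the table above. */
#define UFP_ALL		0x00
#define UFP_NOWAIT	0x01
#define UFP_NOALLOC	0x02
#define UFP_NOCACHE	0x04
#define UFP_NORDONLY	0x08

/* Hypothetical summary of one page slot at lookup time. */
struct slot_state {
	bool exists;	/* a page is already present at this offset */
	bool readonly;	/* the existing page is marked PG_READONLY */
};

/* Return true if the slot would be skipped under 'flags'. */
static bool
ufp_skips(struct slot_state s, int flags)
{
	assert((flags & ~0x0f) == 0);	/* only the flags listed above */

	if (!s.exists)
		return (flags & UFP_NOALLOC) != 0;	/* would need allocation */
	if ((flags & UFP_NOCACHE) != 0)
		return true;				/* existing pages unwanted */
	return s.readonly && (flags & UFP_NORDONLY) != 0;
}
```

With UFP_ALL no slot is skipped, which matches its description as a
pseudo-flag with no bits set.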
.Pp
.Fn uvm_swap_stats
implements the
.Dv SWAP_STATS
and
.Dv SWAP_OSTATS
operations of the
.Xr swapctl 2
system call.
.Fa cmd
is the requested command,
.Dv SWAP_STATS
or
.Dv SWAP_OSTATS .
The function will copy no more than
.Fa sec
entries into the array pointed to by
.Fa sep .
On return,
.Fa retval
holds the actual number of entries copied into the array.
.Sh SYSCTL
UVM provides support for the
.Dv CTL_VM
domain of the
.Xr sysctl 3
hierarchy.
It handles the
.Dv VM_LOADAVG ,
.Dv VM_METER ,
.Dv VM_UVMEXP ,
and
.Dv VM_UVMEXP2
nodes, which return the current load averages, the current VM
totals, the uvmexp structure, and a kernel-version-independent
view of the uvmexp structure, respectively.
It also exports a number of tunables that control how much VM space is
allowed to be consumed by various tasks.
The load averages are typically accessed from userland using the
.Xr getloadavg 3
function.
The uvmexp structure has all global state of the UVM system,
and has the following members:
.Bd -literal
/* vm_page constants */
int pagesize;   /* size of a page (PAGE_SIZE): must be power of 2 */
int pagemask;   /* page mask */
int pageshift;  /* page shift */

/* vm_page counters */
int npages;     /* number of pages we manage */
int free;       /* number of free pages */
int active;     /* number of active pages */
int inactive;   /* number of pages that we free'd but may want back */
int paging;     /* number of pages in the process of being paged out */
int wired;      /* number of wired pages */
int reserve_pagedaemon; /* number of pages reserved for pagedaemon */
int reserve_kernel; /* number of pages reserved for kernel */

/* pageout params */
int freemin;    /* min number of free pages */
int freetarg;   /* target number of free pages */
int inactarg;   /* target number of inactive pages */
int wiredmax;   /* max number of wired pages */

/* swap */
int nswapdev;   /* number of configured swap devices in system */
int swpages;    /* number of PAGE_SIZE'ed swap pages */
int swpginuse;  /* number of swap pages in use */
int nswget;     /* number of times fault calls uvm_swap_get() */
int nanon;      /* total number of anons in system */
int nfreeanon;  /* number of free anons */

/* stat counters */
int faults;             /* page fault count */
int traps;              /* trap count */
int intrs;              /* interrupt count */
int swtch;              /* context switch count */
int softs;              /* software interrupt count */
int syscalls;           /* system calls */
int pageins;            /* pagein operation count */
                        /* pageouts are in pdpageouts below */
int swapins;            /* swapins */
int swapouts;           /* swapouts */
int pgswapin;           /* pages swapped in */
int pgswapout;          /* pages swapped out */
int forks;              /* forks */
int forks_ppwait;       /* forks where parent waits */
int forks_sharevm;      /* forks where vmspace is shared */

/* fault subcounters */
int fltnoram;   /* number of times fault was out of ram */
int fltnoanon;  /* number of times fault was out of anons */
int fltpgwait;  /* number of times fault had to wait on a page */
int fltpgrele;  /* number of times fault found a released page */
int fltrelck;   /* number of times fault relock called */
int fltrelckok; /* number of times fault relock is a success */
int fltanget;   /* number of times fault gets anon page */
int fltanretry; /* number of times fault retries an anon get */
int fltamcopy;  /* number of times fault clears "needs copy" */
int fltnamap;   /* number of times fault maps a neighbor anon page */
int fltnomap;   /* number of times fault maps a neighbor obj page */
int fltlget;    /* number of times fault does a locked pgo_get */
int fltget;     /* number of times fault does an unlocked get */
int flt_anon;   /* number of times fault anon (case 1a) */
int flt_acow;   /* number of times fault anon cow (case 1b) */
int flt_obj;    /* number of times fault is on object page (2a) */
int flt_prcopy; /* number of times fault promotes with copy (2b) */
int flt_przero; /* number of times fault promotes with zerofill (2b) */

/* daemon counters */
int pdwoke;     /* number of times daemon woke up */
int pdrevs;     /* number of times daemon rev'd clock hand */
int pdswout;    /* number of times daemon called for swapout */
int pdfreed;    /* number of pages daemon freed since boot */
int pdscans;    /* number of pages daemon scanned since boot */
int pdanscan;   /* number of anonymous pages scanned by daemon */
int pdobscan;   /* number of object pages scanned by daemon */
int pdreact;    /* number of pages daemon reactivated since boot */
int pdbusy;     /* number of times daemon found a busy page */
int pdpageouts; /* number of times daemon started a pageout */
int pdpending;  /* number of times daemon got a pending pageout */
int pddeact;    /* number of pages daemon deactivates */
.Ed
.Sh NOTES
.Fn uvm_chgkprot
is only available if the kernel has been compiled with options
.Dv KGDB .
.Pp
All structures and types whose names begin with
.Dq vm_
will be renamed to
.Dq uvm_ .
.Sh SEE ALSO
.Xr swapctl 2 ,
.Xr getloadavg 3 ,
.Xr kvm 3 ,
.Xr sysctl 3 ,
.Xr ddb 4 ,
.Xr options 4 ,
.Xr memoryallocators 9 ,
.Xr pmap 9
.Sh HISTORY
UVM is a new VM system developed at Washington University in St. Louis
(Missouri).
UVM's roots lie partly in the Mach-based
.Bx 4.4
VM system, the
.Fx
VM system, and the SunOS 4 VM system.
UVM's basic structure is based on the
.Bx 4.4
VM system.
UVM's new anonymous memory system is based on the
anonymous memory system found in the SunOS 4 VM (as described in papers
published by Sun Microsystems, Inc.).
UVM also includes a number of features new to
.Bx
including page loanout, map entry passing, simplified
copy-on-write, and clustered anonymous memory pageout.
UVM is also further documented in an August 1998 dissertation by
Charles D. Cranor.
.Pp
UVM appeared in
.Nx 1.4 .
.Sh AUTHORS
Charles D. Cranor
.Aq chuck@ccrc.wustl.edu
designed and implemented UVM.
.Pp
Matthew Green
.Aq mrg@eterna.com.au
wrote the swap-space management code and handled the logistical issues
involved with merging UVM into the
.Nx
source tree.
.Pp
Chuck Silvers
.Aq chuq@chuq.com
implemented the aobj pager, thus allowing UVM to support System V shared
memory and process swapping.
He also designed and implemented the UBC part of UVM, which uses UVM pages
to cache vnode data rather than the traditional buffer cache buffers.
1169