.\"	$NetBSD: uvm.9,v 1.115 2024/02/04 05:43:06 mrg Exp $
.\"
.\" Copyright (c) 1998 Matthew R. Green
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
.\" OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
.\" IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
.\" INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
.\" BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
.\" LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
.\" AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
.\" OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd March 23, 2015
.Dt UVM 9
.Os
.Sh NAME
.Nm uvm
.Nd virtual memory system external interface
.Sh SYNOPSIS
.In sys/param.h
.In uvm/uvm.h
.Sh DESCRIPTION
The UVM virtual memory system manages access to the computer's memory
resources.
User processes and the kernel access these resources through
UVM's external interface.
UVM's external interface includes functions that:
.Pp
.Bl -hyphen -compact
.It
initialize UVM sub-systems
.It
manage virtual address spaces
.It
resolve page faults
.It
memory map files and devices
.It
perform uio-based I/O to virtual memory
.It
allocate and free kernel virtual memory
.It
allocate and free physical memory
.El
.Pp
In addition to exporting these services, UVM has two kernel-level processes:
pagedaemon and swapper.
The pagedaemon process sleeps until physical memory becomes scarce.
When that happens, pagedaemon is awoken.
It scans physical memory, paging out and freeing memory that has not
been recently used.
The swapper process swaps in runnable processes that are currently swapped
out, if there is room.
.Pp
There are also several miscellaneous functions.
.Sh INITIALIZATION
.Bl -ohang
.It Ft void
.Fn uvm_init "void" ;
.It Ft void
.Fn uvm_init_limits "struct lwp *l" ;
.It Ft void
.Fn uvm_setpagesize "void" ;
.It Ft void
.Fn uvm_swap_init "void" ;
.El
.Pp
.Fn uvm_init
sets up the UVM system at system boot time, after the
console has been set up.
It initializes global state, the page, map, kernel virtual memory state,
machine-dependent physical map, kernel memory allocator,
pager and anonymous memory sub-systems, and then enables
paging of kernel objects.
.Pp
.Fn uvm_init_limits
initializes process limits for the named process.
This is for use by the system startup for process zero, before any
other processes are created.
.Pp
.Fn uvm_md_init
does early boot initialization.
This currently includes
.Fn uvm_setpagesize ,
which initializes the uvmexp members pagesize (if not already done by
machine-dependent code), pageshift and pagemask, and
.Fn uvm_physseg_init ,
which initializes the
.Xr uvm_hotplug 9
subsystem.
It should be called by machine-dependent code early in the
.Fn pmap_init
call (see
.Xr pmap 9 ) .
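.Pp
The relationship between the three page-size members can be sketched in
userland C.
This is an illustrative model only, not the kernel's code: the
.Li toy_pageinfo
structure and the
.Li toy_setpagesize
function are hypothetical names, and the sketch assumes that pagemask is
pagesize minus one and that pageshift is the base-two logarithm of pagesize.
.Bd -literal
```c
#include <assert.h>

/* Toy stand-in for the three uvmexp members named above. */
struct toy_pageinfo {
	int pagesize;	/* bytes per page, a power of two */
	int pagemask;	/* assumed to be pagesize - 1 */
	int pageshift;	/* assumed to be log2(pagesize) */
};

/* Derive the mask and shift for a given page size. */
static struct toy_pageinfo
toy_setpagesize(int pagesize)
{
	struct toy_pageinfo pi;

	/* A usable page size is positive and a power of two. */
	assert(pagesize > 0 && (pagesize & (pagesize - 1)) == 0);

	pi.pagesize = pagesize;
	pi.pagemask = pagesize - 1;
	pi.pageshift = 0;
	while ((1 << pi.pageshift) != pagesize)
		pi.pageshift++;
	return pi;
}
```
.Ed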
.Pp
.Fn uvm_swap_init
initializes the swap sub-system.
.Sh VIRTUAL ADDRESS SPACE MANAGEMENT
See
.Xr uvm_map 9 .
.Sh PAGE FAULT HANDLING
.Bl -ohang
.It Ft int
.Fn uvm_fault "struct vm_map *orig_map" "vaddr_t vaddr" "vm_prot_t access_type" ;
.El
.Pp
.Fn uvm_fault
is the main entry point for faults.
It takes
.Fa orig_map ,
the map the fault originated in;
.Fa vaddr ,
the offset into the map at which the fault occurred; and
.Fa access_type ,
describing the type of access requested.
.Fn uvm_fault
returns a standard UVM return value.
.Sh MEMORY MAPPING FILES AND DEVICES
See
.Xr ubc 9 .
.Sh VIRTUAL MEMORY I/O
.Bl -ohang
.It Ft int
.Fn uvm_io "struct vm_map *map" "struct uio *uio" ;
.El
.Pp
.Fn uvm_io
performs the I/O described in
.Fa uio
on the memory described in
.Fa map .
.Sh ALLOCATION OF KERNEL MEMORY
See
.Xr uvm_km 9 .
.Sh ALLOCATION OF PHYSICAL MEMORY
.Bl -ohang
.It Ft struct vm_page *
.Fn uvm_pagealloc "struct uvm_object *uobj" "voff_t off" "struct vm_anon *anon" "int flags" ;
.It Ft void
.Fn uvm_pagerealloc "struct vm_page *pg" "struct uvm_object *newobj" "voff_t newoff" ;
.It Ft void
.Fn uvm_pagefree "struct vm_page *pg" ;
.It Ft int
.Fn uvm_pglistalloc "psize_t size" "paddr_t low" "paddr_t high" "paddr_t alignment" "paddr_t boundary" "struct pglist *rlist" "int nsegs" "int waitok" ;
.It Ft void
.Fn uvm_pglistfree "struct pglist *list" ;
.It Ft void
.Fn uvm_page_physload "paddr_t start" "paddr_t end" "paddr_t avail_start" "paddr_t avail_end" "int free_list" ;
.El
.Pp
.Fn uvm_pagealloc
allocates a page of memory at offset
.Fa off
in either the object
.Fa uobj
or the anonymous memory
.Fa anon ,
which must be locked by the caller.
Only one of
.Fa uobj
and
.Fa anon
can be
.No non- Ns Dv NULL .
It returns
.Dv NULL
when no page can be found.
The
.Fa flags
can be any of
.Bd -literal
#define UVM_PGA_USERESERVE      0x0001  /* ok to use reserve pages */
#define UVM_PGA_ZERO            0x0002  /* returned page must be zero'd */
.Ed
.Pp
.Dv UVM_PGA_USERESERVE
means to allocate a page even if that will result in the number of free pages
being lower than
.Dv uvmexp.reserve_pagedaemon
(if the current thread is the pagedaemon) or
.Dv uvmexp.reserve_kernel
(if the current thread is not the pagedaemon).
.Dv UVM_PGA_ZERO
causes the returned page to be filled with zeroes, either by allocating it
from a pool of pre-zeroed pages or by zeroing it in-line as necessary.
.Pp
.Fn uvm_pagerealloc
reallocates page
.Fa pg
to a new object
.Fa newobj ,
at a new offset
.Fa newoff .
.Pp
.Fn uvm_pagefree
frees the physical page
.Fa pg .
If the content of the page is known to be zero-filled,
the caller should set
.Dv PG_ZERO
in
.Fa pg->flags
so that the page allocator will use
the page to serve future
.Dv UVM_PGA_ZERO
requests efficiently.
.Pp
.Fn uvm_pglistalloc
allocates a list of pages totalling
.Fa size
bytes, subject to various constraints.
.Fa low
and
.Fa high
describe the lowest and highest addresses acceptable for the list.
If
.Fa alignment
is non-zero, it describes the required alignment of the list, in
power-of-two notation.
If
.Fa boundary
is non-zero, no segment of the list may cross this power-of-two
boundary, relative to zero.
.Fa nsegs
is the maximum number of physically contiguous segments.
If
.Fa waitok
is non-zero, the function may sleep until enough memory is available.
(It also may give up in some situations, so a non-zero
.Fa waitok
does not imply that
.Fn uvm_pglistalloc
cannot return an error.)
The allocated memory is returned in the
.Fa rlist
list; the caller has to provide storage only, since the list is initialized by
.Fn uvm_pglistalloc .
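.Pp
The alignment and boundary constraints described above can be illustrated
with a small userland check.
This is a sketch under stated assumptions, not the allocator's
implementation: the
.Li toy_
names are hypothetical, and both
.Fa alignment
and
.Fa boundary
are assumed to be power-of-two byte values, with zero meaning
.Dq no constraint .
.Bd -literal
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t toy_paddr_t;	/* hypothetical stand-in for paddr_t */

static bool
toy_is_pow2(toy_paddr_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/* True if pa meets the alignment; zero means no constraint. */
static bool
toy_aligned(toy_paddr_t pa, toy_paddr_t alignment)
{
	if (alignment == 0)
		return true;
	assert(toy_is_pow2(alignment));
	return (pa & (alignment - 1)) == 0;
}

/* True if the segment [pa, pa + size) crosses a multiple of boundary. */
static bool
toy_crosses(toy_paddr_t pa, toy_paddr_t size, toy_paddr_t boundary)
{
	if (boundary == 0 || size == 0)
		return false;
	assert(toy_is_pow2(boundary));
	/* The first and last byte must share one boundary-sized window. */
	return ((pa ^ (pa + size - 1)) & ~(boundary - 1)) != 0;
}
```
.Ed
.Pp
With a 64 KiB boundary, for example, an 8 KiB segment starting at 0xF000
crosses the boundary at 0x10000, while the same segment starting at 0xE000
does not.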
.Pp
.Fn uvm_pglistfree
frees the list of pages pointed to by
.Fa list .
If the content of a page is known to be zero-filled,
the caller should set
.Dv PG_ZERO
in that page's flags so that the page allocator will use
the page to serve future
.Dv UVM_PGA_ZERO
requests efficiently.
.Pp
.Fn uvm_page_physload
loads physical memory segments into VM space on the specified
.Fa free_list .
It must be called at system boot time to set up physical memory
management pages.
The arguments describe the
.Fa start
and
.Fa end
of the physical addresses of the segment, and the available start and end
addresses of pages not already in use.
If a system has memory banks of
different speeds, the slower memory should be given a higher
.Fa free_list
value.
.\" XXX expand on "system boot time"!
.Sh PROCESSES
.Bl -ohang
.It Ft void
.Fn uvm_pageout "void" ;
.It Ft void
.Fn uvm_scheduler "void" ;
.El
.Pp
.Fn uvm_pageout
is the main loop for the page daemon.
.Pp
.Fn uvm_scheduler
is the process zero main loop, which is to be called after the
system has finished starting other processes.
It handles the swapping in of runnable, swapped out processes in priority
order.
.Sh PAGE LOAN
.Bl -ohang
.It Ft int
.Fn uvm_loan "struct vm_map *map" "vaddr_t start" "vsize_t len" "void *v" "int flags" ;
.It Ft void
.Fn uvm_unloan "void *v" "int npages" "int flags" ;
.El
.Pp
.Fn uvm_loan
loans pages in a map out to anons or to the kernel.
.Fa map
should be unlocked, and
.Fa start
and
.Fa len
should be multiples of
.Dv PAGE_SIZE .
Argument
.Fa flags
should be one of
.Bd -literal
#define UVM_LOAN_TOANON       0x01    /* loan to anons */
#define UVM_LOAN_TOPAGE       0x02    /* loan to kernel */
.Ed
.Pp
.Fa v
should be a pointer to an array of pointers to
.Li struct vm_anon
or
.Li struct vm_page ,
as appropriate.
The caller has to allocate memory for the array and
ensure it is big enough to hold
.Fa len / PAGE_SIZE
pointers.
.Fn uvm_loan
returns 0 on success, or an appropriate error number otherwise.
Note that wired pages cannot be loaned out;
.Fn uvm_loan
will fail in that case.
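.Pp
The sizing rule for the array passed in
.Fa v
can be sketched as follows.
The helper
.Li toy_loan_slots
is a hypothetical userland illustration, not part of UVM; it takes the page
size as a parameter rather than using
.Dv PAGE_SIZE ,
and returns zero when the alignment requirement above is not met.
.Bd -literal
```c
#include <stddef.h>

/*
 * Hypothetical helper: number of pointer slots needed for a loan of
 * len bytes starting at start, or 0 if either value is not a multiple
 * of the page size.
 */
static size_t
toy_loan_slots(unsigned long start, unsigned long len,
    unsigned long page_size)
{
	if (page_size == 0)
		return 0;
	if (start % page_size != 0 || len % page_size != 0)
		return 0;
	return (size_t)(len / page_size);
}
```
.Ed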
.Pp
.Fn uvm_unloan
kills loans on pages or anons.
.Fa v
must point to the array of pointers initialized by a previous call to
.Fn uvm_loan .
.Fa npages
should match the number of pages allocated for the loan, which is also the
number of items in the array.
Argument
.Fa flags
should be one of
.Bd -literal
#define UVM_LOAN_TOANON       0x01    /* loan to anons */
#define UVM_LOAN_TOPAGE       0x02    /* loan to kernel */
.Ed
.Pp
and should match what was used for the previous call to
.Fn uvm_loan .
.Sh MISCELLANEOUS FUNCTIONS
.Bl -ohang
.It Ft struct uvm_object *
.Fn uao_create "vsize_t size" "int flags" ;
.It Ft void
.Fn uao_detach "struct uvm_object *uobj" ;
.It Ft void
.Fn uao_reference "struct uvm_object *uobj" ;
.It Ft void
.Fn uvm_chgkprot "void *addr" "size_t len" "int rw" ;
.It Ft bool
.Fn uvm_kernacc "void *addr" "size_t len" "int rw" ;
.It Ft int
.Fn uvm_vslock "struct vmspace *vs" "void *addr" "size_t len" "vm_prot_t prot" ;
.It Ft void
.Fn uvm_vsunlock "struct vmspace *vs" "void *addr" "size_t len" ;
.It Ft void
.Fn uvm_meter "void" ;
.It Ft void
.Fn uvm_proc_fork "struct proc *p1" "struct proc *p2" "bool shared" ;
.It Ft int
.Fn uvm_grow "struct proc *p" "vaddr_t sp" ;
.It Ft void
.Fn uvn_findpages "struct uvm_object *uobj" "voff_t offset" "int *npagesp" "struct vm_page **pps" "int flags" ;
.It Ft void
.Fn uvm_vnp_setsize "struct vnode *vp" "voff_t newsize" ;
.El
.Pp
The
.Fn uao_create ,
.Fn uao_detach ,
and
.Fn uao_reference
functions operate on anonymous memory objects, such as those used to support
System V shared memory.
.Fn uao_create
returns an object of size
.Fa size
with flags:
.Bd -literal
#define UAO_FLAG_KERNOBJ        0x1     /* create kernel object */
#define UAO_FLAG_KERNSWAP       0x2     /* enable kernel swap */
.Ed
.Pp
which can only be used once each at system boot time.
.Fn uao_reference
creates an additional reference to the named anonymous memory object.
.Fn uao_detach
removes a reference from the named anonymous memory object, destroying
it if removing the last reference.
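.Pp
The reference-counting behaviour described above can be modelled in a few
lines of userland C.
This is a toy model only: the
.Li toy_obj
structure and the
.Li toy_create ,
.Li toy_reference ,
and
.Li toy_detach
functions are hypothetical, and the model assumes that a newly created
object starts with a single reference.
.Bd -literal
```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for an anonymous memory object. */
struct toy_obj {
	int refs;	/* outstanding references */
	bool alive;	/* false once the object has been destroyed */
};

/* Models creation: assumed to hand back one reference. */
static void
toy_create(struct toy_obj *o)
{
	o->refs = 1;
	o->alive = true;
}

/* Models taking an additional reference. */
static void
toy_reference(struct toy_obj *o)
{
	assert(o->alive);
	o->refs++;
}

/* Models dropping a reference; the last drop destroys the object. */
static void
toy_detach(struct toy_obj *o)
{
	assert(o->alive && o->refs > 0);
	if (--o->refs == 0)
		o->alive = false;
}
```
.Ed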
.Pp
.Fn uvm_chgkprot
changes the protection of kernel memory from
.Fa addr
to
.Fa addr + len
to the value of
.Fa rw .
This is primarily useful for debuggers, for setting breakpoints.
This function is only available with options
.Dv KGDB .
.Pp
.Fn uvm_kernacc
checks whether the range from
.Fa addr
to
.Fa addr + len
permits
.Fa rw
access in the kernel address space.
.Pp
.Fn uvm_vslock
and
.Fn uvm_vsunlock
control the wiring and unwiring of pages in the address space
.Fa vs
from
.Fa addr
to
.Fa addr + len .
These functions are normally used to wire memory for I/O.
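.Pp
Because wiring applies to whole pages, a byte range that is not
page-aligned can touch more pages than its length suggests.
The following userland sketch counts the pages covered by a range; the name
.Li toy_pages_spanned
and the 4 KiB page size are assumptions made for illustration, and the
rounding shown is a model of the idea rather than a copy of the kernel code.
.Bd -literal
```c
#include <stdint.h>

#define TOY_PAGE_SIZE	4096u	/* assumption: 4 KiB pages */

/* Number of whole pages covering the byte range [addr, addr + len). */
static uint64_t
toy_pages_spanned(uint64_t addr, uint64_t len)
{
	const uint64_t mask = (uint64_t)TOY_PAGE_SIZE - 1;
	uint64_t first, last;

	if (len == 0)
		return 0;
	first = addr & ~mask;			/* round start down */
	last = (addr + len + mask) & ~mask;	/* round end up */
	return (last - first) / TOY_PAGE_SIZE;
}
```
.Ed
.Pp
A 32-byte buffer at 0x1ff0, for example, straddles a page boundary and
therefore spans two pages.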
.Pp
.Fn uvm_meter
calculates the load average.
.Pp
.Fn uvm_proc_fork
sets up the virtual address space of the new process
.Fa p2
from that of the existing process
.Fa p1 .
If the
.Fa shared
argument is true,
.Fa p1
shares its address space with
.Fa p2 ;
otherwise a new address space is created.
This function currently has no return value, and thus cannot fail.
In the future, this function will be changed to allow it to
fail in low memory conditions.
.Pp
.Fn uvm_grow
increases the stack segment of process
.Fa p
to include
.Fa sp .
.Pp
.Fn uvn_findpages
looks up or creates pages in
.Fa uobj
at offset
.Fa offset ,
marks them busy and returns them in the
.Fa pps
array.
Currently
.Fa uobj
must be a vnode object.
The number of pages requested is pointed to by
.Fa npagesp ,
and this value is updated with the actual number of pages returned.
The flags can be any bitwise inclusive-or of:
.Pp
.Bl -tag -offset abcd -compact -width UVM_ADV_SEQUENTIAL
.It Dv UFP_ALL
Zero pseudo-flag meaning return all pages.
.It Dv UFP_NOWAIT
Don't sleep \(em yield
.Dv NULL
for busy pages or for uncached pages for which allocation would sleep.
.It Dv UFP_NOALLOC
Don't allocate \(em yield
.Dv NULL
for uncached pages.
.It Dv UFP_NOCACHE
Don't use cached pages \(em yield
.Dv NULL
instead.
.It Dv UFP_NORDONLY
Don't yield read-only pages \(em yield
.Dv NULL
for pages marked
.Dv PG_READONLY .
.It Dv UFP_DIRTYONLY
Don't yield clean pages \(em stop early at the first clean one.
As a side effect, mark yielded dirty pages clean.
Caller must write them to permanent storage before unbusying.
.It Dv UFP_BACKWARD
Traverse pages in reverse order.
If
.Fn uvn_findpages
returns early, it will have filled
.Li * Ns Fa npagesp
entries at the end of
.Fa pps
rather than the beginning.
.El
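.Pp
The effect of
.Dv UFP_BACKWARD
on where the returned entries sit in
.Fa pps
can be sketched as an index calculation.
The helper
.Li toy_first_filled
is hypothetical and models only the placement rule stated in the list above.
.Bd -literal
```c
#include <assert.h>
#include <stdbool.h>

/*
 * Index of the first filled slot in an array of `requested' entries
 * when only `returned' of them were filled.  A forward traversal fills
 * from the start; a backward traversal fills the tail of the array.
 */
static int
toy_first_filled(int requested, int returned, bool backward)
{
	assert(0 <= returned && returned <= requested);
	return backward ? requested - returned : 0;
}
```
.Ed
.Pp
For a request of eight pages that returns three, a caller would read slots
0 to 2 after a forward traversal and slots 5 to 7 after a backward one.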
.Pp
.Fn uvm_vnp_setsize
sets the size of vnode
.Fa vp
to
.Fa newsize .
The caller must hold a reference to the vnode.
If the vnode shrinks, pages no longer used are discarded.
.Sh MISCELLANEOUS MACROS
.Bl -ohang
.It Ft paddr_t
.Fn atop "paddr_t pa" ;
.It Ft paddr_t
.Fn ptoa "paddr_t pn" ;
.It Ft paddr_t
.Fn round_page "address" ;
.It Ft paddr_t
.Fn trunc_page "address" ;
.El
.Pp
The
.Fn atop
macro converts a physical address
.Fa pa
into a page number.
The
.Fn ptoa
macro does the opposite by converting a page number
.Fa pn
into a physical address.
.Pp
The
.Fn round_page
and
.Fn trunc_page
macros round
.Fa address
up and down, respectively, to the nearest page boundary.
These macros work for either addresses or byte counts.
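.Pp
The arithmetic behind these macros can be tried out in userland with
stand-in definitions.
The
.Li toy_
macros below are illustrative equivalents, not the kernel's definitions,
and they assume a 4 KiB page (a shift of 12); the real values are
machine-dependent.
.Bd -literal
```c
#include <stdint.h>

#define TOY_PAGE_SHIFT	12	/* assumption: 4 KiB pages */
#define TOY_PAGE_SIZE	((uint64_t)1 << TOY_PAGE_SHIFT)
#define TOY_PAGE_MASK	(TOY_PAGE_SIZE - 1)

/* Address to page number, and back. */
#define toy_atop(pa)	((uint64_t)(pa) >> TOY_PAGE_SHIFT)
#define toy_ptoa(pn)	((uint64_t)(pn) << TOY_PAGE_SHIFT)

/* Round an address or byte count down or up to a page boundary. */
#define toy_trunc_page(x)	((uint64_t)(x) & ~TOY_PAGE_MASK)
#define toy_round_page(x)	\
	(((uint64_t)(x) + TOY_PAGE_MASK) & ~TOY_PAGE_MASK)
```
.Ed
.Pp
Under these assumptions the address 0x3456 lies in page 3, truncates to
0x3000 and rounds up to 0x4000, while an address already on a page boundary
is left unchanged by both rounding macros.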
.Sh SYSCTL
UVM provides support for the
.Dv CTL_VM
domain of the
.Xr sysctl 3
hierarchy.
It handles the
.Dv VM_LOADAVG ,
.Dv VM_METER ,
.Dv VM_UVMEXP ,
and
.Dv VM_UVMEXP2
nodes, which return the current load averages, the current VM
totals, the uvmexp structure, and a kernel-version-independent
view of the uvmexp structure, respectively.
It also exports a number of tunables that control how much VM space is
allowed to be consumed by various tasks.
The load averages are typically accessed from userland using the
.Xr getloadavg 3
function.
The uvmexp structure holds the global state of the UVM system,
and has the following members:
.Bd -literal
/* vm_page constants */
int pagesize;   /* size of a page (PAGE_SIZE): must be power of 2 */
int pagemask;   /* page mask */
int pageshift;  /* page shift */

/* vm_page counters */
int npages;     /* number of pages we manage */
int free;       /* number of free pages */
int paging;     /* number of pages in the process of being paged out */
int wired;      /* number of wired pages */
int reserve_pagedaemon; /* number of pages reserved for pagedaemon */
int reserve_kernel; /* number of pages reserved for kernel */

/* pageout params */
int freemin;    /* min number of free pages */
int freetarg;   /* target number of free pages */
int inactarg;   /* target number of inactive pages */
int wiredmax;   /* max number of wired pages */

/* swap */
int nswapdev;   /* number of configured swap devices in system */
int swpages;    /* number of PAGE_SIZE'ed swap pages */
int swpginuse;  /* number of swap pages in use */
int nswget;     /* number of times fault calls uvm_swap_get() */
int nanon;      /* total number of anons in system */
int nfreeanon;  /* number of free anons */

/* stat counters */
int faults;             /* page fault count */
int traps;              /* trap count */
int intrs;              /* interrupt count */
int swtch;              /* context switch count */
int softs;              /* software interrupt count */
int syscalls;           /* system calls */
int pageins;            /* pagein operation count */
                        /* pageouts are in pdpageouts below */
int pgswapin;           /* pages swapped in */
int pgswapout;          /* pages swapped out */
int forks;              /* forks */
int forks_ppwait;       /* forks where parent waits */
int forks_sharevm;      /* forks where vmspace is shared */

/* fault subcounters */
int fltnoram;   /* number of times fault was out of ram */
int fltnoanon;  /* number of times fault was out of anons */
int fltpgwait;  /* number of times fault had to wait on a page */
int fltpgrele;  /* number of times fault found a released page */
int fltrelck;   /* number of times fault relock called */
int fltrelckok; /* number of times fault relock is a success */
int fltanget;   /* number of times fault gets anon page */
int fltanretry; /* number of times fault retries an anon get */
int fltamcopy;  /* number of times fault clears "needs copy" */
int fltnamap;   /* number of times fault maps a neighbor anon page */
int fltnomap;   /* number of times fault maps a neighbor obj page */
int fltlget;    /* number of times fault does a locked pgo_get */
int fltget;     /* number of times fault does an unlocked get */
int flt_anon;   /* number of times fault anon (case 1a) */
int flt_acow;   /* number of times fault anon cow (case 1b) */
int flt_obj;    /* number of times fault is on object page (2a) */
int flt_prcopy; /* number of times fault promotes with copy (2b) */
int flt_przero; /* number of times fault promotes with zerofill (2b) */

/* daemon counters */
int pdwoke;     /* number of times daemon woke up */
int pdrevs;     /* number of times daemon rev'd clock hand */
int pdfreed;    /* number of pages daemon freed since boot */
int pdscans;    /* number of pages daemon scanned since boot */
int pdanscan;   /* number of anonymous pages scanned by daemon */
int pdobscan;   /* number of object pages scanned by daemon */
int pdreact;    /* number of pages daemon reactivated since boot */
int pdbusy;     /* number of times daemon found a busy page */
int pdpageouts; /* number of times daemon started a pageout */
int pdpending;  /* number of times daemon got a pending pageout */
int pddeact;    /* number of pages daemon deactivates */
.Ed
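.Pp
Reading the load averages from userland, as mentioned earlier in this
section, might look like the following sketch.
It relies only on
.Xr getloadavg 3 ;
the wrapper name
.Li toy_print_loadavg
is hypothetical, and the feature-test macro on the first line is an
assumption about building with glibc in strict ISO C mode, where the
declaration is otherwise hidden.
.Bd -literal
```c
#define _DEFAULT_SOURCE 1	/* assumption: exposes getloadavg on glibc */
#include <stdio.h>
#include <stdlib.h>

/*
 * Print up to three load averages on one line.  Returns the number of
 * samples read, or -1 if the load average could not be obtained.
 */
static int
toy_print_loadavg(void)
{
	double avg[3];
	int i, n;

	n = getloadavg(avg, 3);
	for (i = 0; i < n; i++)
		printf("%.2f ", avg[i]);
	if (n > 0)
		puts("");
	return n;
}
```
.Ed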
.Sh NOTES
.Fn uvm_chgkprot
is only available if the kernel has been compiled with options
.Dv KGDB .
.Pp
All structures and types whose names begin with
.Dq vm_
will be renamed to
.Dq uvm_ .
.Sh SEE ALSO
.Xr swapctl 2 ,
.Xr getloadavg 3 ,
.Xr kvm 3 ,
.Xr sysctl 3 ,
.Xr ddb 4 ,
.Xr options 4 ,
.Xr memoryallocators 9 ,
.Xr pmap 9 ,
.Xr ubc 9 ,
.Xr uvm_km 9 ,
.Xr uvm_map 9
.Rs
.%A Charles D. Cranor
.%A Gurudatta M. Parulkar
.%T "The UVM Virtual Memory System"
.%I USENIX Association
.%B Proceedings of the USENIX Annual Technical Conference
.%P 117-130
.%D June 6-11, 1999
.%U http://www.usenix.org/event/usenix99/full_papers/cranor/cranor.pdf
.Re
.Sh HISTORY
UVM is a new VM system developed at Washington University in St. Louis
(Missouri).
UVM's roots lie partly in the Mach-based
.Bx 4.4
VM system, the
.Fx
VM system, and the SunOS 4 VM system.
UVM's basic structure is based on the
.Bx 4.4
VM system.
UVM's new anonymous memory system is based on the
anonymous memory system found in the SunOS 4 VM (as described in papers
published by Sun Microsystems, Inc.).
UVM also includes a number of features new to
.Bx
including page loanout, map entry passing, simplified
copy-on-write, and clustered anonymous memory pageout.
UVM is also further documented in an August 1998 dissertation by
Charles D. Cranor.
.Pp
UVM appeared in
.Nx 1.4 .
.Sh AUTHORS
.An -nosplit
.An Charles D. Cranor
.Aq Mt chuck@ccrc.wustl.edu
designed and implemented UVM.
.Pp
.An Matthew Green
.Aq Mt mrg@eterna23.net
wrote the swap-space management code and handled the logistical issues
involved with merging UVM into the
.Nx
source tree.
.Pp
.An Chuck Silvers
.Aq Mt chuq@chuq.com
implemented the aobj pager, thus allowing UVM to support System V shared
memory and process swapping.