.\"	$NetBSD: kmem.9,v 1.20 2016/02/29 00:34:17 chs Exp $
.\"
.\" Copyright (c)2006 YAMAMOTO Takashi,
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" ------------------------------------------------------------
.Dd February 28, 2016
.Dt KMEM 9
.Os
.\" ------------------------------------------------------------
.Sh NAME
.Nm kmem
.Nd kernel wired memory allocator
.\" ------------------------------------------------------------
.Sh SYNOPSIS
.In sys/kmem.h
.\" - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
.Ft void *
.Fn kmem_alloc \
"size_t size" "km_flag_t kmflags"
.Ft void *
.Fn kmem_zalloc \
"size_t size" "km_flag_t kmflags"
.Ft void
.Fn kmem_free \
"void *p" "size_t size"
.\" ---
.Ft void *
.Fn kmem_intr_alloc \
"size_t size" "km_flag_t kmflags"
.Ft void *
.Fn kmem_intr_zalloc \
"size_t size" "km_flag_t kmflags"
.Ft void
.Fn kmem_intr_free \
"void *p" "size_t size"
.\" ---
.Ft char *
.Fn kmem_asprintf \
"const char *fmt" "..."
.\" ------------------------------------------------------------
.Pp
.Cd "options KMEM_SIZE"
.Cd "options KMEM_REDZONE"
.Cd "options KMEM_GUARD"
.Sh DESCRIPTION
.Fn kmem_alloc
allocates kernel wired memory.
It takes the following arguments:
.Bl -tag -width kmflags
.It Fa size
The size of the allocation, in bytes.
.It Fa kmflags
One of the following:
.Bl -tag -width KM_NOSLEEP
.It Dv KM_SLEEP
If the allocation cannot be satisfied immediately, sleep until enough
memory is available.
If
.Dv KM_SLEEP
is specified, then the allocation cannot fail.
.It Dv KM_NOSLEEP
Don't sleep.
Immediately return
.Dv NULL
if there is not enough memory available.
It should only be used when failure to allocate will not have harmful,
user-visible effects.
.Pp
.Bf -symbolic
Use of
.Dv KM_NOSLEEP
is strongly discouraged as it can create transient, hard-to-debug failures
that occur when the system is under memory pressure.
.Ef
.Pp
In situations where it is not possible to sleep, for example because locks
are held by the caller, the code path should be restructured to allow the
allocation to be made in another place.
.El
.El
.Pp
The contents of allocated memory are uninitialized.
.Pp
Unlike Solaris,
.Fn kmem_alloc 0 flags
is illegal.
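.Pp
The rules above can be illustrated with a minimal sketch.
It is not kernel code: the
.Fn kmem_alloc
and
.Fn kmem_free
defined here are userland stand-ins built on
.Xr malloc 3 ,
and
.Vt struct example_rec
and
.Fn example_roundtrip
are hypothetical names used only for illustration.
In a kernel source file,
.In sys/kmem.h
would supply the real routines.
.Bd -literal
```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Userland stand-ins (assumptions, not the kernel implementation)
 * so that the sketch builds outside the kernel.
 */
typedef int km_flag_t;
#define KM_SLEEP	1

static void *
kmem_alloc(size_t size, km_flag_t kmflags)
{
	void *p;

	(void)kmflags;
	assert(size != 0);	/* a zero-byte request is illegal */
	p = malloc(size);
	assert(p != NULL);	/* models "KM_SLEEP cannot fail" */
	return p;
}

static void
kmem_free(void *p, size_t size)
{
	assert(p != NULL);	/* freeing NULL is illegal */
	assert(size != 0);
	free(p);
}

/* Hypothetical record type, for illustration only. */
struct example_rec {
	int	er_id;
	char	er_name[16];
};

static int
example_roundtrip(int id)
{
	struct example_rec *er;
	int seen;

	er = kmem_alloc(sizeof(*er), KM_SLEEP);
	memset(er, 0, sizeof(*er));	/* contents start uninitialized */
	er->er_id = id;
	seen = er->er_id;
	kmem_free(er, sizeof(*er));	/* same size as at allocation */
	return seen;
}
```
.Ed
.Pp
Using
.Li sizeof(*er)
at both the allocation and the release keeps the two sizes in agreement
even if the structure later changes.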
.Pp
.\" ------------------------------------------------------------
.Fn kmem_zalloc
is the equivalent of
.Fn kmem_alloc ,
except that it initializes the memory to zero.
.Pp
.\" ------------------------------------------------------------
.Fn kmem_asprintf
behaves like the well-known
.Fn asprintf
function, but allocates memory using
.Fn kmem_alloc .
This routine can sleep during allocation.
The size of the allocated area is the length of the returned character
string, plus one (for the NUL terminator).
This must be taken into account when freeing the returned area with
.Fn kmem_free .
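.Pp
The size rule for strings returned by
.Fn kmem_asprintf
can be sketched as follows.
This is a userland illustration under stated assumptions: the
.Fn kmem_asprintf
and
.Fn kmem_free
below are stand-ins built on
.Xr vsnprintf 3
and
.Xr malloc 3 ,
and
.Fn example_name
and
.Va last_freed_size
are hypothetical helpers that exist only to make the size visible.
.Bd -literal
```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Userland stand-in (assumption); the real routine is in the kernel. */
static char *
kmem_asprintf(const char *fmt, ...)
{
	va_list ap;
	char *str;
	int len;

	va_start(ap, fmt);
	len = vsnprintf(NULL, 0, fmt, ap);	/* measure only */
	va_end(ap);
	assert(len >= 0);

	str = malloc((size_t)len + 1);
	assert(str != NULL);

	va_start(ap, fmt);
	vsnprintf(str, (size_t)len + 1, fmt, ap);
	va_end(ap);
	return str;
}

/* Records the size passed by the caller so it can be inspected. */
static size_t last_freed_size;

static void
kmem_free(void *p, size_t size)
{
	assert(p != NULL);
	last_freed_size = size;
	free(p);
}

/* Hypothetical caller: builds a name, then releases it correctly. */
static size_t
example_name(int unit)
{
	char *name;
	size_t len;

	name = kmem_asprintf("wd%d", unit);
	len = strlen(name);		/* measure before freeing */
	kmem_free(name, len + 1);	/* + 1 for the NUL terminator */
	return last_freed_size;
}
```
.Ed
.Pp
The length is taken before the string is released, since the memory must
not be read afterwards.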
.Pp
.\" ------------------------------------------------------------
.Fn kmem_free
frees kernel wired memory allocated by
.Fn kmem_alloc
or
.Fn kmem_zalloc
so that it can be used for other purposes.
It takes the following arguments:
.Bl -tag -width kmflags
.It Fa p
The pointer to the memory being freed.
It must be the one returned by
.Fn kmem_alloc
or
.Fn kmem_zalloc .
.It Fa size
The size of the memory being freed, in bytes.
It must be the same as the
.Fa size
argument used for
.Fn kmem_alloc
or
.Fn kmem_zalloc
when the memory was allocated.
.El
.Pp
Freeing
.Dv NULL
is illegal.
.Pp
.\" ------------------------------------------------------------
.Fn kmem_intr_alloc ,
.Fn kmem_intr_zalloc
and
.Fn kmem_intr_free
are the equivalents of the above kmem routines which can be called
from interrupt context.
These routines are intended for special cases.
Normally,
.Xr pool_cache 9
should be used for memory allocation from interrupt context.
.\" ------------------------------------------------------------
.Sh NOTES
Making
.Dv KM_SLEEP
allocations while holding mutexes or reader/writer locks is discouraged, as the
caller can sleep for an unbounded amount of time in order to satisfy the
allocation.
This can in turn block other threads that wish to acquire locks held by the
caller.
It should be noted that
.Fn kmem_free
may also block.
.Pp
For some locks this is permissible or even unavoidable.
For others, particularly locks that may be taken from soft interrupt context,
it is a serious problem.
As a general rule it is better not to allow this type of situation to develop.
One way to circumvent the problem is to make allocations speculative and part
of a retryable sequence.
For example:
.Bd -literal
  retry:
        /* speculative unlocked check */
        if (need to allocate) {
                new_item = kmem_alloc(sizeof(*new_item), KM_SLEEP);
        } else {
                new_item = NULL;
        }
        mutex_enter(lock);
        /* check while holding lock for true status */
        if (need to allocate) {
                if (new_item == NULL) {
                        mutex_exit(lock);
                        goto retry;
                }
                consume(new_item);
                new_item = NULL;
        }
        mutex_exit(lock);
        if (new_item != NULL) {
                /* did not use it after all */
                kmem_free(new_item, sizeof(*new_item));
        }
.Ed
.\" ------------------------------------------------------------
.Sh OPTIONS
.Ss KMEM_SIZE
Kernels compiled with the
.Dv KMEM_SIZE
option ensure the size given in
.Fn kmem_free
matches the actual allocated size.
On
.Fn kmem_alloc ,
the kernel allocates an additional contiguous eight-byte kmem page
in the buffer, records the allocated size in the first kmem
page of that buffer, and returns a pointer to the second kmem page
in that same buffer.
When freeing, the kernel reads the first page and compares the
recorded size with the one given in
.Fn kmem_free .
Any mismatch triggers a panic.
.Pp
.Dv KMEM_SIZE
is enabled by default on
.Dv DIAGNOSTIC
and
.Dv DEBUG .
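.Pp
The size check described above can be sketched in userland as follows.
This illustrates the idea only and is not the code in
.Pa sys/kern/subr_kmem.c :
.Fn sized_alloc ,
.Fn sized_free
and
.Dv SIZE_HDR
are hypothetical names,
.Xr malloc 3
stands in for the kernel allocator, and a mismatch is reported with a
return value where the kernel would panic.
.Bd -literal
```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SIZE_HDR	8	/* eight bytes in front of the buffer */

/* Allocate size bytes plus a header that records the size. */
static void *
sized_alloc(size_t size)
{
	uint64_t recorded = size;
	uint8_t *buf;

	assert(size != 0);
	buf = malloc(SIZE_HDR + size);
	if (buf == NULL)
		return NULL;
	memcpy(buf, &recorded, sizeof(recorded));
	return buf + SIZE_HDR;		/* caller sees the second part */
}

/*
 * Return 0 and release the buffer if size matches the recorded one.
 * Return -1 and leave the buffer alone on a mismatch.
 */
static int
sized_free(void *p, size_t size)
{
	uint8_t *buf = (uint8_t *)p - SIZE_HDR;
	uint64_t recorded;

	memcpy(&recorded, buf, sizeof(recorded));
	if (recorded != size)
		return -1;		/* the kernel would panic here */
	free(buf);
	return 0;
}
```
.Ed
.Pp
Because the header sits directly in front of the returned pointer, a write
that underflows the buffer also damages the recorded size and is reported
at release time.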
.Ss KMEM_REDZONE
Kernels compiled with the
.Dv KMEM_REDZONE
option add a dynamic pattern of two bytes at the end of each allocated
buffer, and check this pattern when freeing to ensure the caller hasn't
written outside the requested area.
This option does not introduce a significant performance impact,
but has two drawbacks: it only catches write overflows, and catches
them only on
.Fn kmem_free .
.Pp
.Dv KMEM_REDZONE
is enabled by default on
.Dv DIAGNOSTIC .
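.Pp
A red zone of this kind can be sketched in userland as follows.
The sketch makes several assumptions:
.Fn rz_pattern
is a hypothetical pattern function (the pattern the kernel actually uses is
not specified here),
.Fn rz_alloc
and
.Fn rz_free
are illustrative names,
.Xr malloc 3
stands in for the kernel allocator, and corruption is reported with a
return value where the kernel would panic.
.Bd -literal
```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define REDZONE_SIZE	2	/* two trailing bytes */

/* Hypothetical pattern: depends on the address of the byte. */
static uint8_t
rz_pattern(const uint8_t *addr)
{
	return (uint8_t)((uintptr_t)addr ^ 0xa5u);
}

/* Allocate size bytes followed by a two-byte red zone. */
static void *
rz_alloc(size_t size)
{
	uint8_t *buf;
	size_t i;

	assert(size != 0);
	buf = malloc(size + REDZONE_SIZE);
	if (buf == NULL)
		return NULL;
	for (i = 0; i < REDZONE_SIZE; i++)
		buf[size + i] = rz_pattern(&buf[size + i]);
	return buf;
}

/* Return 0 and release the buffer if the red zone is intact, else -1. */
static int
rz_free(void *p, size_t size)
{
	uint8_t *buf = p;
	size_t i;

	for (i = 0; i < REDZONE_SIZE; i++) {
		if (buf[size + i] != rz_pattern(&buf[size + i]))
			return -1;	/* the kernel would panic here */
	}
	free(buf);
	return 0;
}
```
.Ed
.Pp
The check runs only at release time, which matches the drawback noted
above: an overflowing write goes unnoticed until the buffer is freed, and
an overflowing read is never noticed.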
.Ss KMEM_GUARD
Kernels compiled with the
.Dv KMEM_GUARD
option perform CPU-intensive sanity checks on kmem operations.
This adds very high-overhead runtime verification to kmem
operations.
It must be enabled together with
.Dv KMEM_SIZE .
.Pp
.Dv KMEM_GUARD
tries to catch the following types of bugs:
.Bl -bullet
.It
Overflow at time of occurrence, by means of a guard page.
An unmapped guard page sits immediately after the requested area;
a read/write overflow therefore triggers a page fault.
.It
Underflow at
.Fn kmem_free ,
by using
.Dv KMEM_SIZE Ap s
registered size.
If an underflow occurs, the size stored by
.Dv KMEM_SIZE
will be overwritten, which means that when freeing, the kernel will
spot the mismatch.
.It
Use-after-free at time of occurrence.
When freeing, the memory is unmapped, and depending on the value
of
.Va kmem_guard_depth ,
the kernel delays the recycling of that memory for a longer or
shorter time.
Consequently, any subsequent read/write access to the memory
triggers a page fault, provided it has not been recycled yet.
.El
.Pp
To enable it, boot the system with the
.Fl d
option, which causes the debugger to be entered early during the kernel
boot process.
Issue commands such as the following:
.Bd -literal
db\*[Gt] w kmem_guard_depth 0t30000
db\*[Gt] c
.Ed
.Pp
This instructs
.Dv kmem_guard
to queue up to 60000 (30000*2) pages of unmapped KVA to catch
use-after-free type errors.
When
.Fn kmem_free
is called, memory backing a freed item is unmapped and the kernel VA
space pushed onto a FIFO.
The VA space will not be reused until another 30k items have been freed.
Until reused the kernel will catch invalid accesses and panic with a page fault.
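.Pp
The delayed-reuse FIFO can be modelled with a small ring buffer.
The sketch below is a simplified userland model, not the kernel's
implementation:
.Vt struct guard_fifo ,
.Fn guard_push
and
.Dv GUARD_DEPTH
are hypothetical names, the depth is kept tiny for readability, and
entries are plain pointers where the kernel tracks unmapped VA ranges.
.Bd -literal
```c
#include <assert.h>
#include <stddef.h>

#define GUARD_DEPTH	4	/* stands in for kmem_guard_depth */

struct guard_fifo {
	void	*gf_slot[GUARD_DEPTH];
	size_t	 gf_head;	/* index of the oldest entry */
	size_t	 gf_count;	/* number of quarantined entries */
};

/*
 * Quarantine a freed item.  Return the oldest item once the FIFO is
 * full (it may now be reused), or NULL while there is still room.
 */
static void *
guard_push(struct guard_fifo *gf, void *p)
{
	void *evicted = NULL;
	size_t tail;

	assert(gf != NULL && p != NULL);
	if (gf->gf_count == GUARD_DEPTH) {
		evicted = gf->gf_slot[gf->gf_head];
		gf->gf_head = (gf->gf_head + 1) % GUARD_DEPTH;
		gf->gf_count--;
	}
	tail = (gf->gf_head + gf->gf_count) % GUARD_DEPTH;
	gf->gf_slot[tail] = p;
	gf->gf_count++;
	return evicted;
}
```
.Ed
.Pp
With a depth of four, the first four freed items stay quarantined and the
fifth release hands back the first item for reuse.
A larger depth widens the window in which a stale access still faults, at
the cost of more address space held in quarantine.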
.Pp
Limitations:
.Bl -bullet
.It
It has a severe impact on performance.
.It
It is best used on a 64-bit machine with lots of RAM.
.El
.Pp
.Dv KMEM_GUARD
is enabled by default on
.Dv DEBUG .
.Sh RETURN VALUES
On success,
.Fn kmem_alloc
and
.Fn kmem_zalloc
return a pointer to allocated memory.
Otherwise,
.Dv NULL
is returned.
.\" ------------------------------------------------------------
.Sh CODE REFERENCES
The
.Nm
subsystem is implemented within the file
.Pa sys/kern/subr_kmem.c .
.\" ------------------------------------------------------------
.Sh SEE ALSO
.Xr intro 9 ,
.Xr memoryallocators 9 ,
.Xr percpu 9 ,
.Xr pool_cache 9 ,
.Xr uvm_km 9
.\" ------------------------------------------------------------
.Sh CAVEATS
Neither
.Fn kmem_alloc
nor
.Fn kmem_free
can be used from interrupt context, from a soft interrupt, or from
a callout.
Use
.Xr pool_cache 9
in these situations.
.\" ------------------------------------------------------------
.Sh SECURITY CONSIDERATIONS
As the memory allocated by
.Fn kmem_alloc
is uninitialized, it can contain security-sensitive data left by its
previous user.
It is the caller's responsibility not to expose it to the world.
354It is the caller's responsibility not to expose it to the world.
355