.\"	$NetBSD: kmem.9,v 1.17 2015/07/28 09:52:43 wiz Exp $
.\"
.\" Copyright (c)2006 YAMAMOTO Takashi,
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" ------------------------------------------------------------
.Dd July 28, 2015
.Dt KMEM 9
.Os
.\" ------------------------------------------------------------
.Sh NAME
.Nm kmem
.Nd kernel wired memory allocator
.\" ------------------------------------------------------------
.Sh SYNOPSIS
.In sys/kmem.h
.\" - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
.Ft void *
.Fn kmem_alloc \
"size_t size" "km_flag_t kmflags"
.Ft void *
.Fn kmem_zalloc \
"size_t size" "km_flag_t kmflags"
.Ft void
.Fn kmem_free \
"void *p" "size_t size"
.\" ---
.Ft void *
.Fn kmem_intr_alloc \
"size_t size" "km_flag_t kmflags"
.Ft void *
.Fn kmem_intr_zalloc \
"size_t size" "km_flag_t kmflags"
.Ft void
.Fn kmem_intr_free \
"void *p" "size_t size"
.\" ---
.Ft char *
.Fn kmem_asprintf \
"const char *fmt" "..."
.\" ------------------------------------------------------------
.Pp
.Cd "options KMEM_SIZE"
.Cd "options KMEM_REDZONE"
.Cd "options KMEM_GUARD"
.Sh DESCRIPTION
.Fn kmem_alloc
allocates kernel wired memory.
It takes the following arguments.
.Bl -tag -width kmflags
.It Fa size
The size of the allocation, in bytes.
.It Fa kmflags
One of the following:
.Bl -tag -width KM_NOSLEEP
.It Dv KM_SLEEP
If the allocation cannot be satisfied immediately, sleep until enough
memory is available.
.It Dv KM_NOSLEEP
Don't sleep.
Immediately return
.Dv NULL
if there is not enough memory available.
It should only be used when failure to allocate will not have harmful,
user-visible effects.
.Pp
.Bf -symbolic
Use of
.Dv KM_NOSLEEP
is strongly discouraged as it can create transient, hard-to-debug failures
that occur when the system is under memory pressure.
.Ef
.Pp
In situations where it is not possible to sleep, for example because locks
are held by the caller, the code path should be restructured to allow the
allocation to be made in another place.
.El
.El
.Pp
The contents of allocated memory are uninitialized.
.Pp
Unlike Solaris,
.Fn kmem_alloc 0 flags
is illegal.
.Pp
.\" ------------------------------------------------------------
.Fn kmem_zalloc
is the equivalent of
.Fn kmem_alloc ,
except that it initializes the memory to zero.
.Pp
.\" ------------------------------------------------------------
.Fn kmem_asprintf
functions as the well-known
.Fn asprintf
function, but allocates memory using
.Fn kmem_alloc .
This routine can sleep during allocation.
The size of the allocated area is the length of the returned character
string, plus one (for the NUL terminator).
This must be taken into consideration when freeing the returned area with
.Fn kmem_free .
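.Pp
The size rule above can be sketched as follows.
This is an illustration only, not code from the
.Nm
implementation.
So that it compiles outside the kernel, it defines userland stand-ins for
.Fn kmem_asprintf
and
.Fn kmem_free
on top of
.Xr malloc 3 ;
the
.Fn name_free
helper and the
.Va freed_size
variable are hypothetical.
.Bd -literal
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Userland stand-ins (assumption): the real routines live in the kernel. */
static size_t freed_size;       /* last size handed to kmem_free() */

static void
kmem_free(void *p, size_t size)
{
        assert(p != NULL);      /* freeing NULL is illegal */
        freed_size = size;
        free(p);
}

static char *
kmem_asprintf(const char *fmt, ...)
{
        va_list ap;
        int len;
        char *str;

        va_start(ap, fmt);
        len = vsnprintf(NULL, 0, fmt, ap);      /* measure only */
        va_end(ap);
        assert(len >= 0);

        str = malloc((size_t)len + 1);          /* string plus NUL */
        assert(str != NULL);
        va_start(ap, fmt);
        vsnprintf(str, (size_t)len + 1, fmt, ap);
        va_end(ap);
        return str;
}

/* Hypothetical helper: free a string returned by kmem_asprintf(). */
static void
name_free(char *str)
{
        /* The allocation covers the string and its NUL terminator. */
        kmem_free(str, strlen(str) + 1);
}
.Ed
.Pp
Because the size is recomputed from the string itself, the string must not
be shortened or lengthened between allocation and release.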
.Pp
.\" ------------------------------------------------------------
.Fn kmem_free
frees kernel wired memory allocated by
.Fn kmem_alloc
or
.Fn kmem_zalloc
so that it can be used for other purposes.
It takes the following arguments.
.Bl -tag -width kmflags
.It Fa p
The pointer to the memory being freed.
It must be the one returned by
.Fn kmem_alloc
or
.Fn kmem_zalloc .
.It Fa size
The size of the memory being freed, in bytes.
It must be the same as the
.Fa size
argument used for
.Fn kmem_alloc
or
.Fn kmem_zalloc
when the memory was allocated.
.El
.Pp
Freeing
.Dv NULL
is illegal.
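.Pp
One way to satisfy the size requirement for variable-length objects is to
record the allocation size inside the object.
The following is a minimal sketch of that idea, not code from the
.Nm
implementation.
So that it compiles outside the kernel, it defines userland stand-ins for
.Fn kmem_alloc
and
.Fn kmem_free
on top of
.Xr malloc 3 ;
the
.Vt struct vrec
type and the
.Fn vrec_create
and
.Fn vrec_destroy
helpers are hypothetical.
.Bd -literal
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userland stand-ins (assumption): the real routines live in the kernel. */
typedef int km_flag_t;
#define KM_SLEEP 1

static void *
kmem_alloc(size_t size, km_flag_t kmflags)
{
        (void)kmflags;
        assert(size != 0);      /* zero-sized requests are illegal */
        return malloc(size);
}

static void
kmem_free(void *p, size_t size)
{
        assert(p != NULL);      /* freeing NULL is illegal */
        assert(size != 0);
        free(p);
}

/* Hypothetical variable-length record that remembers its own size. */
struct vrec {
        size_t vr_size;         /* exact size passed to kmem_alloc() */
        size_t vr_len;          /* payload length in bytes */
        unsigned char vr_data[];
};

static struct vrec *
vrec_create(size_t len)
{
        size_t size = sizeof(struct vrec) + len;
        struct vrec *vr = kmem_alloc(size, KM_SLEEP);

        if (vr == NULL)
                return NULL;
        vr->vr_size = size;
        vr->vr_len = len;
        return vr;
}

static void
vrec_destroy(struct vrec *vr)
{
        /* Pass back exactly the size used at allocation time. */
        kmem_free(vr, vr->vr_size);
}
.Ed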
.Pp
.\" ------------------------------------------------------------
.Fn kmem_intr_alloc ,
.Fn kmem_intr_zalloc
and
.Fn kmem_intr_free
are the equivalents of the above kmem routines which can be called
from interrupt context.
These routines are intended for special cases.
Normally,
.Xr pool_cache 9
should be used for memory allocation from interrupt context.
.\" ------------------------------------------------------------
.Sh NOTES
Making
.Dv KM_SLEEP
allocations while holding mutexes or reader/writer locks is discouraged, as the
caller can sleep for an unbounded amount of time in order to satisfy the
allocation.
This can in turn block other threads that wish to acquire locks held by the
caller.
Note that
.Fn kmem_free
may also block.
.Pp
For some locks this is permissible or even unavoidable.
For others, particularly locks that may be taken from soft interrupt context,
it is a serious problem.
As a general rule it is better not to allow this type of situation to develop.
One way to circumvent the problem is to make allocations speculative and part
of a retryable sequence.
For example:
.Bd -literal
  retry:
        /* speculative unlocked check */
        if (need to allocate) {
                new_item = kmem_alloc(sizeof(*new_item), KM_SLEEP);
        } else {
                new_item = NULL;
        }
        mutex_enter(lock);
        /* check while holding lock for true status */
        if (need to allocate) {
                if (new_item == NULL) {
                        mutex_exit(lock);
                        goto retry;
                }
                consume(new_item);
                new_item = NULL;
        }
        mutex_exit(lock);
        if (new_item != NULL) {
                /* did not use it after all */
                kmem_free(new_item, sizeof(*new_item));
        }
.Ed
.\" ------------------------------------------------------------
.Sh OPTIONS
.Ss KMEM_SIZE
Kernels compiled with the
.Dv KMEM_SIZE
option ensure the size given in
.Fn kmem_free
matches the actual allocated size.
On
.Fn kmem_alloc ,
the kernel allocates an additional contiguous kmem page of eight
bytes at the start of the buffer, records the allocated size in that first
kmem page, and returns a pointer to the second kmem page of the same buffer.
On
.Fn kmem_free ,
the kernel reads the first kmem page and compares the recorded size with
the one given by the caller.
Any mismatch triggers a panic.
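.Pp
As an illustration only, the header scheme described above can be modelled
in userland C.
The sketch below is not the code in
.Pa sys/kern/subr_kmem.c :
the names
.Fn sized_alloc ,
.Fn sized_free
and
.Dv SIZE_HDR
are hypothetical, the memory comes from
.Xr malloc 3 ,
and a mismatch is reported through the return value where the kernel
would panic.
.Bd -literal
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SIZE_HDR 8      /* hidden header size in bytes, as in the text */

/* Allocate size bytes preceded by a header recording the size. */
static void *
sized_alloc(size_t size)
{
        unsigned char *base;

        assert(sizeof(size) <= SIZE_HDR);
        base = malloc(SIZE_HDR + size);
        if (base == NULL)
                return NULL;
        memcpy(base, &size, sizeof(size));      /* record the size */
        return base + SIZE_HDR;                 /* caller sees the rest */
}

/*
 * Release a buffer obtained from sized_alloc().
 * Returns 0 on success, or -1 without freeing if the size is wrong.
 */
static int
sized_free(void *p, size_t size)
{
        unsigned char *base = (unsigned char *)p - SIZE_HDR;
        size_t recorded;

        memcpy(&recorded, base, sizeof(recorded));
        if (recorded != size)
                return -1;      /* the kernel would panic here */
        free(base);
        return 0;
}
.Ed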
.Pp
.Dv KMEM_SIZE
is enabled by default on
.Dv DIAGNOSTIC
and
.Dv DEBUG .
.Ss KMEM_REDZONE
Kernels compiled with the
.Dv KMEM_REDZONE
option add a dynamic pattern of two bytes at the end of each allocated
buffer, and check this pattern when freeing to ensure the caller hasn't
written outside the requested area.
This option does not introduce a significant performance impact,
but has two drawbacks: it only catches write overflows, and catches
them only on
.Fn kmem_free .
.Pp
.Dv KMEM_REDZONE
is enabled by default on
.Dv DIAGNOSTIC .
.Ss KMEM_GUARD
Kernels compiled with the
.Dv KMEM_GUARD
option perform CPU-intensive sanity checks on kmem operations, which adds
very high overhead runtime verification to them.
It must be enabled together with
.Dv KMEM_SIZE .
.Pp
.Dv KMEM_GUARD
tries to catch the following types of bugs:
.Bl -bullet
.It
Overflow at time of occurrence, by means of a guard page.
An unmapped guard page sits immediately after the requested area;
a read/write overflow therefore triggers a page fault.
.It
Underflow at
.Fn kmem_free ,
by using
.Dv KMEM_SIZE Ap s
registered size.
If an underflow occurs, the size stored by
.Dv KMEM_SIZE
will be overwritten, which means that when freeing, the kernel will
spot the mismatch.
.It
Use-after-free at time of occurrence.
When freeing, the memory is unmapped, and the kernel delays the recycling
of that memory by an amount that depends on the value of
.Va kmem_guard_depth .
Any subsequent read/write access to the memory therefore triggers a page
fault, provided the memory has not been recycled yet.
.El
.Pp
To enable it, boot the system with the
.Fl d
option, which causes the debugger to be entered early during the kernel
boot process.
Issue commands such as the following:
.Bd -literal
db\*[Gt] w kmem_guard_depth 0t30000
db\*[Gt] c
.Ed
.Pp
This instructs
.Dv kmem_guard
to queue up to 60000 (30000*2) pages of unmapped KVA to catch
use-after-free type errors.
When
.Fn kmem_free
is called, memory backing a freed item is unmapped and the kernel VA
space is pushed onto a FIFO.
The VA space will not be reused until another 30k items have been freed.
Until it is reused, the kernel will catch invalid accesses and panic with a
page fault.
.Pp
Limitations:
.Bl -bullet
.It
It has a severe impact on performance.
.It
It is best used on a 64-bit machine with lots of RAM.
.El
.Pp
.Dv KMEM_GUARD
is enabled by default on
.Dv DEBUG .
.Sh RETURN VALUES
On success,
.Fn kmem_alloc
and
.Fn kmem_zalloc
return a pointer to allocated memory.
Otherwise,
.Dv NULL
is returned.
.\" ------------------------------------------------------------
.Sh CODE REFERENCES
The
.Nm
subsystem is implemented within the file
.Pa sys/kern/subr_kmem.c .
.\" ------------------------------------------------------------
.Sh SEE ALSO
.Xr intro 9 ,
.Xr memoryallocators 9 ,
.Xr percpu 9 ,
.Xr pool_cache 9 ,
.Xr uvm_km 9
.\" ------------------------------------------------------------
.Sh CAVEATS
Neither
.Fn kmem_alloc
nor
.Fn kmem_free
can be used from interrupt context, from a soft interrupt, or from
a callout.
Use
.Xr pool_cache 9
in these situations.
.\" ------------------------------------------------------------
.Sh SECURITY CONSIDERATIONS
As the memory allocated by
.Fn kmem_alloc
is uninitialized, it can contain security-sensitive data left by its
previous user.
It is the caller's responsibility not to expose it to the world.