.\"	$NetBSD: kmem.9,v 1.19 2015/12/11 10:05:17 wiz Exp $
.\"
.\" Copyright (c)2006 YAMAMOTO Takashi,
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" ------------------------------------------------------------
.Dd December 10, 2015
.Dt KMEM 9
.Os
.\" ------------------------------------------------------------
.Sh NAME
.Nm kmem
.Nd kernel wired memory allocator
.\" ------------------------------------------------------------
.Sh SYNOPSIS
.In sys/kmem.h
.\" - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
.Ft void *
.Fn kmem_alloc \
"size_t size" "km_flag_t kmflags"
.Ft void *
.Fn kmem_zalloc \
"size_t size" "km_flag_t kmflags"
.Ft void
.Fn kmem_free \
"void *p" "size_t size"
.\" ---
.Ft void *
.Fn kmem_intr_alloc \
"size_t size" "km_flag_t kmflags"
.Ft void *
.Fn kmem_intr_zalloc \
"size_t size" "km_flag_t kmflags"
.Ft void
.Fn kmem_intr_free \
"void *p" "size_t size"
.\" ---
.Ft char *
.Fn kmem_asprintf \
"const char *fmt" "..."
.\" ------------------------------------------------------------
.Pp
.Cd "options KMEM_SIZE"
.Cd "options KMEM_REDZONE"
.Cd "options KMEM_GUARD"
.Sh DESCRIPTION
.Fn kmem_alloc
allocates kernel wired memory.
It takes the following arguments.
.Bl -tag -width kmflags
.It Fa size
The size of the allocation, in bytes.
.It Fa kmflags
One of the following:
.Bl -tag -width KM_NOSLEEP
.It Dv KM_SLEEP
If the allocation cannot be satisfied immediately, sleep until enough
memory is available.
Note that specifying
.Dv KM_SLEEP
does not guarantee that the allocation succeeds.
Under resource stress conditions, the allocation can fail and the
function will return
.Dv NULL .
One such scenario is when the requested size is larger than can ever
be allocated; another is when system memory is so exhausted that even
pools of pages cannot be allocated.
.It Dv KM_NOSLEEP
Don't sleep.
Immediately return
.Dv NULL
if there is not enough memory available.
It should only be used when failure to allocate will not have harmful,
user-visible effects.
.Pp
.Bf -symbolic
Use of
.Dv KM_NOSLEEP
is strongly discouraged as it can create transient, hard-to-debug failures
that occur when the system is under memory pressure.
.Ef
.Pp
In situations where it is not possible to sleep, for example because locks
are held by the caller, the code path should be restructured to allow the
allocation to be made in another place.
.El
.El
.Pp
The contents of allocated memory are uninitialized.
.Pp
Unlike Solaris,
.Fn kmem_alloc 0 flags
is illegal.
.Pp
.\" ------------------------------------------------------------
.Fn kmem_zalloc
is the equivalent of
.Fn kmem_alloc ,
except that it initializes the memory to zero.
.Pp
.\" ------------------------------------------------------------
.Fn kmem_asprintf
functions as the well-known
.Fn asprintf
function, but allocates memory using
.Fn kmem_alloc .
This routine can sleep during allocation.
The size of the allocated area is the length of the returned character
string, plus one (for the NUL terminator).
This must be taken into consideration when freeing the returned area with
.Fn kmem_free .
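.Pp
The size rule for strings returned by
.Fn kmem_asprintf
can be illustrated with a minimal sketch.
The helper
.Fn example_name
and its argument are hypothetical, and the caller is assumed to run in
thread context where sleeping is allowed.
The
.Dv NULL
check is a defensive assumption, not documented behaviour.
.Bd -literal
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kmem.h>

/* Hypothetical helper, for illustration only. */
static void
example_name(int unit)
{
	char *name;

	/* May sleep while allocating. */
	name = kmem_asprintf("example%d", unit);
	if (name == NULL)
		return;		/* defensive; see lead-in */

	/* ... use name, without changing its length ... */

	/* The allocation covers the string plus its NUL terminator. */
	kmem_free(name, strlen(name) + 1);
}
.Ed
.Pp
If the string is shortened in place before it is freed,
.Fn strlen
no longer yields the allocated size, so the length should be saved
beforehand in that case.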
.Pp
.\" ------------------------------------------------------------
.Fn kmem_free
frees kernel wired memory allocated by
.Fn kmem_alloc
or
.Fn kmem_zalloc
so that it can be used for other purposes.
It takes the following arguments.
.Bl -tag -width kmflags
.It Fa p
The pointer to the memory being freed.
It must be the one returned by
.Fn kmem_alloc
or
.Fn kmem_zalloc .
.It Fa size
The size of the memory being freed, in bytes.
It must be the same as the
.Fa size
argument used for
.Fn kmem_alloc
or
.Fn kmem_zalloc
when the memory was allocated.
.El
.Pp
Freeing
.Dv NULL
is illegal.
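.Pp
A typical allocate and free pairing might look like the following sketch.
The structure
.Vt struct example_softc
and both function names are hypothetical; only the
.Nm
calls come from this page.
.Bd -literal
#include <sys/param.h>
#include <sys/errno.h>
#include <sys/kmem.h>

/* Hypothetical structure, for illustration only. */
struct example_softc {
	int	sc_unit;
};

static int
example_create(struct example_softc **scp)
{
	struct example_softc *sc;

	sc = kmem_zalloc(sizeof(*sc), KM_SLEEP);
	if (sc == NULL)
		return ENOMEM;	/* can fail even with KM_SLEEP */
	*scp = sc;
	return 0;
}

static void
example_destroy(struct example_softc *sc)
{
	/* sc must not be NULL; pass the size used at allocation. */
	kmem_free(sc, sizeof(*sc));
}
.Ed
.Pp
Using
.Li sizeof(*sc)
in both places keeps the two sizes in agreement if the structure
later changes.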
.Pp
.\" ------------------------------------------------------------
.Fn kmem_intr_alloc ,
.Fn kmem_intr_zalloc
and
.Fn kmem_intr_free
are the equivalents of the above kmem routines which can be called
from interrupt context.
These routines are intended for special cases.
Normally,
.Xr pool_cache 9
should be used for memory allocation from interrupt context.
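.Pp
As a rough sketch of such a special case, the hypothetical function
below copies a buffer from interrupt context.
It assumes that
.Dv KM_NOSLEEP
is the appropriate flag there, since an interrupt handler cannot sleep,
and that
.Fa len
is non-zero.
.Bd -literal
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kmem.h>

/* Hypothetical function, for illustration only. */
static void
example_intr_copy(const void *data, size_t len)
{
	void *buf;

	buf = kmem_intr_alloc(len, KM_NOSLEEP);
	if (buf == NULL)
		return;		/* drop the data on memory shortage */
	memcpy(buf, data, len);

	/* ... process buf ... */

	/* Free with the matching routine and the same size. */
	kmem_intr_free(buf, len);
}
.Ed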
.\" ------------------------------------------------------------
.Sh NOTES
Making
.Dv KM_SLEEP
allocations while holding mutexes or reader/writer locks is discouraged, as the
caller can sleep for an unbounded amount of time in order to satisfy the
allocation.
This can in turn block other threads that wish to acquire locks held by the
caller.
It should be noted that
.Fn kmem_free
may also block.
.Pp
Always check the return value of the allocators, even when
.Dv KM_SLEEP
is specified, to avoid kernel crashes during resource stress conditions.
.Pp
For some locks, holding them across a sleeping allocation is permissible
or even unavoidable.
For others, particularly locks that may be taken from soft interrupt context,
it is a serious problem.
As a general rule it is better not to allow this type of situation to develop.
One way to circumvent the problem is to make allocations speculative and part
of a retryable sequence.
For example:
.Bd -literal
  retry:
        /* speculative unlocked check */
        if (need to allocate) {
                new_item = kmem_alloc(sizeof(*new_item), KM_SLEEP);
        } else {
                new_item = NULL;
        }
        mutex_enter(lock);
        /* check while holding lock for true status */
        if (need to allocate) {
                if (new_item == NULL) {
                        mutex_exit(lock);
                        goto retry;
                }
                consume(new_item);
                new_item = NULL;
        }
        mutex_exit(lock);
        if (new_item != NULL) {
                /* did not use it after all */
                kmem_free(new_item, sizeof(*new_item));
        }
.Ed
.\" ------------------------------------------------------------
.Sh OPTIONS
.Ss KMEM_SIZE
Kernels compiled with the
.Dv KMEM_SIZE
option ensure the size given in
.Fn kmem_free
matches the actual allocated size.
On
.Fn kmem_alloc ,
the kernel will allocate an additional contiguous kmem page of eight
bytes in the buffer, will register the allocated size in the first kmem
page of that buffer, and will return a pointer to the second kmem page
in that same buffer.
When freeing, the kernel reads the first page, and compares the
size registered with the one given in
.Fn kmem_free .
Any mismatch triggers a panic.
.Pp
.Dv KMEM_SIZE
is enabled by default on
.Dv DIAGNOSTIC
and
.Dv DEBUG .
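.Pp
The idea of a size header can be sketched as a wrapper layered on top of
.Fn kmem_alloc .
This is an illustration only and not the code in the allocator itself:
the structure and function names are hypothetical, and the header is
eight bytes only on platforms where
.Vt size_t
is 64 bits wide.
.Bd -literal
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kmem.h>

/* Hypothetical header placed in front of the caller's area. */
struct example_hdr {
	size_t	eh_size;
};

static void *
example_alloc(size_t size, km_flag_t kmflags)
{
	struct example_hdr *eh;

	eh = kmem_alloc(sizeof(*eh) + size, kmflags);
	if (eh == NULL)
		return NULL;
	eh->eh_size = size;	/* register the requested size */
	return eh + 1;		/* caller sees the area after it */
}

static void
example_free(void *p, size_t size)
{
	struct example_hdr *eh = (struct example_hdr *)p - 1;

	/* Compare the registered size with the one given by the caller. */
	if (eh->eh_size != size)
		panic("example_free: size mismatch");
	kmem_free(eh, sizeof(*eh) + size);
}
.Ed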
.Ss KMEM_REDZONE
Kernels compiled with the
.Dv KMEM_REDZONE
option add a dynamic pattern of two bytes at the end of each allocated
buffer, and check this pattern when freeing to ensure the caller hasn't
written outside the requested area.
This option does not introduce a significant performance impact,
but has two drawbacks: it only catches write overflows, and catches
them only on
.Fn kmem_free .
.Pp
.Dv KMEM_REDZONE
is enabled by default on
.Dv DIAGNOSTIC .
.Ss KMEM_GUARD
Kernels compiled with the
.Dv KMEM_GUARD
option perform CPU-intensive sanity checks on kmem operations.
This adds very high overhead runtime verification to those operations.
It must be enabled with
.Dv KMEM_SIZE .
.Pp
.Dv KMEM_GUARD
tries to catch the following types of bugs:
.Bl -bullet
.It
Overflow at time of occurrence, by means of a guard page.
An unmapped guard page sits immediately after the requested area;
a read/write overflow therefore triggers a page fault.
.It
Underflow at
.Fn kmem_free ,
by using
.Dv KMEM_SIZE Ap s
registered size.
If an underflow occurs, the size stored by
.Dv KMEM_SIZE
will be overwritten, which means that when freeing, the kernel will
spot the mismatch.
.It
Use-after-free at time of occurrence.
When freeing, the memory is unmapped, and depending on the value
of
.Va kmem_guard_depth ,
the kernel will delay the recycling of that memory for a longer or
shorter time.
This means that any subsequent read/write access to the memory will
trigger a page fault, provided it has not been recycled yet.
.El
.Pp
To enable it, boot the system with the
.Fl d
option, which causes the debugger to be entered early during the kernel
boot process.
Issue commands such as the following:
.Bd -literal
db\*[Gt] w kmem_guard_depth 0t30000
db\*[Gt] c
.Ed
.Pp
This instructs
.Dv kmem_guard
to queue up to 60000 (30000*2) pages of unmapped KVA to catch
use-after-free type errors.
When
.Fn kmem_free
is called, memory backing a freed item is unmapped and the kernel VA
space pushed onto a FIFO.
The VA space will not be reused until another 30k items have been freed.
Until reused the kernel will catch invalid accesses and panic with a page fault.
.Pp
Limitations:
.Bl -bullet
.It
It has a severe impact on performance.
.It
It is best used on a 64-bit machine with lots of RAM.
.El
.Pp
.Dv KMEM_GUARD
is enabled by default on
.Dv DEBUG .
.Sh RETURN VALUES
On success,
.Fn kmem_alloc
and
.Fn kmem_zalloc
return a pointer to allocated memory.
Otherwise,
.Dv NULL
is returned.
.\" ------------------------------------------------------------
.Sh CODE REFERENCES
The
.Nm
subsystem is implemented within the file
.Pa sys/kern/subr_kmem.c .
.\" ------------------------------------------------------------
.Sh SEE ALSO
.Xr intro 9 ,
.Xr memoryallocators 9 ,
.Xr percpu 9 ,
.Xr pool_cache 9 ,
.Xr uvm_km 9
.\" ------------------------------------------------------------
.Sh CAVEATS
Neither
.Fn kmem_alloc
nor
.Fn kmem_free
can be used from interrupt context, from a soft interrupt, or from
a callout.
Use
.Xr pool_cache 9
in these situations.
.\" ------------------------------------------------------------
.Sh SECURITY CONSIDERATIONS
As the memory allocated by
.Fn kmem_alloc
is uninitialized, it can contain security-sensitive data left by its
previous user.
It is the caller's responsibility not to expose it to the world.
368