.\"	$NetBSD: kmem.9,v 1.20 2016/02/29 00:34:17 chs Exp $
.\"
.\" Copyright (c)2006 YAMAMOTO Takashi,
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" ------------------------------------------------------------
.Dd February 28, 2016
.Dt KMEM 9
.Os
.\" ------------------------------------------------------------
.Sh NAME
.Nm kmem
.Nd kernel wired memory allocator
.\" ------------------------------------------------------------
.Sh SYNOPSIS
.In sys/kmem.h
.\" - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
.Ft void *
.Fn kmem_alloc \
"size_t size" "km_flag_t kmflags"
.Ft void *
.Fn kmem_zalloc \
"size_t size" "km_flag_t kmflags"
.Ft void
.Fn kmem_free \
"void *p" "size_t size"
.\" ---
.Ft void *
.Fn kmem_intr_alloc \
"size_t size" "km_flag_t kmflags"
.Ft void *
.Fn kmem_intr_zalloc \
"size_t size" "km_flag_t kmflags"
.Ft void
.Fn kmem_intr_free \
"void *p" "size_t size"
.\" ---
.Ft char *
.Fn kmem_asprintf \
"const char *fmt" "..."
.\" ------------------------------------------------------------
.Pp
.Cd "options KMEM_SIZE"
.Cd "options KMEM_REDZONE"
.Cd "options KMEM_GUARD"
.Sh DESCRIPTION
.Fn kmem_alloc
allocates kernel wired memory.
It takes the following arguments:
.Bl -tag -width kmflags
.It Fa size
The size of the allocation, in bytes.
.It Fa kmflags
One of the following:
.Bl -tag -width KM_NOSLEEP
.It Dv KM_SLEEP
If the allocation cannot be satisfied immediately, sleep until enough
memory is available.
If
.Dv KM_SLEEP
is specified, the allocation cannot fail.
.It Dv KM_NOSLEEP
Do not sleep.
Immediately return
.Dv NULL
if there is not enough memory available.
It should only be used when failure to allocate will not have harmful,
user-visible effects.
.Pp
.Bf -symbolic
Use of
.Dv KM_NOSLEEP
is strongly discouraged, as it can create transient, hard-to-debug failures
that occur when the system is under memory pressure.
.Ef
.Pp
In situations where it is not possible to sleep, for example because locks
are held by the caller, the code path should be restructured to allow the
allocation to be made in another place.
.El
.El
.Pp
The contents of allocated memory are uninitialized.
.Pp
Unlike on Solaris, calling
.Fn kmem_alloc
with a
.Fa size
of zero is illegal.
.Pp
.\" ------------------------------------------------------------
.Fn kmem_zalloc
is the equivalent of
.Fn kmem_alloc ,
except that it initializes the memory to zero.
.Pp
.\" ------------------------------------------------------------
.Fn kmem_asprintf
behaves like the well-known
.Xr asprintf 3
function, but allocates memory using
.Fn kmem_alloc .
This routine can sleep during allocation.
The size of the allocated area is the length of the returned string plus one
(for the NUL terminator).
This size must be passed to
.Fn kmem_free
when the returned area is freed.
.Pp
.\" ------------------------------------------------------------
.Fn kmem_free
frees kernel wired memory allocated by
.Fn kmem_alloc
or
.Fn kmem_zalloc
so that it can be used for other purposes.
It takes the following arguments:
.Bl -tag -width kmflags
.It Fa p
The pointer to the memory being freed.
It must be a pointer previously returned by
.Fn kmem_alloc
or
.Fn kmem_zalloc .
.It Fa size
The size of the memory being freed, in bytes.
It must be the same as the
.Fa size
argument that was passed to
.Fn kmem_alloc
or
.Fn kmem_zalloc
when the memory was allocated.
.El
.Pp
Freeing
.Dv NULL
is illegal.
.Pp
.\" ------------------------------------------------------------
.Fn kmem_intr_alloc ,
.Fn kmem_intr_zalloc
and
.Fn kmem_intr_free
are equivalents of the routines above that can be called
from interrupt context.
These routines are intended for special cases only.
Normally,
.Xr pool_cache 9
should be used for memory allocation from interrupt context.
.\" ------------------------------------------------------------
.Sh NOTES
Making
.Dv KM_SLEEP
allocations while holding mutexes or reader/writer locks is discouraged, as the
caller can sleep for an unbounded amount of time in order to satisfy the
allocation.
This can in turn block other threads that wish to acquire locks held by the
caller.
Note that
.Fn kmem_free
may also block.
.Pp
For some locks this is permissible or even unavoidable.
For others, particularly locks that may be taken from soft interrupt context,
it is a serious problem.
As a general rule it is better not to allow this type of situation to develop.
One way to circumvent the problem is to make allocations speculative and part
of a retryable sequence.
For example, where
.Fn need_to_allocate
and
.Fn consume
stand for caller-specific logic:
.Bd -literal
	retry:
		/* speculative unlocked check */
		if (need_to_allocate()) {
			new_item = kmem_alloc(sizeof(*new_item), KM_SLEEP);
		} else {
			new_item = NULL;
		}
		mutex_enter(lock);
		/* check again while holding the lock for the true status */
		if (need_to_allocate()) {
			if (new_item == NULL) {
				/* status changed: drop the lock, allocate, retry */
				mutex_exit(lock);
				goto retry;
			}
			consume(new_item);
			new_item = NULL;
		}
		mutex_exit(lock);
		if (new_item != NULL) {
			/* did not use it after all */
			kmem_free(new_item, sizeof(*new_item));
		}
.Ed
.\" ------------------------------------------------------------
.Sh OPTIONS
.Ss KMEM_SIZE
Kernels compiled with the
.Dv KMEM_SIZE
option check that the size passed to
.Fn kmem_free
matches the size that was actually allocated.
On
.Fn kmem_alloc ,
the kernel allocates an additional eight bytes at the start of the buffer,
records the requested size there, and returns a pointer to the area that
follows.
When freeing, the kernel reads the size stored at the start of the buffer and
compares it with the one given to
.Fn kmem_free .
Any mismatch triggers a panic.
.Pp
.Dv KMEM_SIZE
is enabled by default on
.Dv DIAGNOSTIC
and
.Dv DEBUG .
.Ss KMEM_REDZONE
Kernels compiled with the
.Dv KMEM_REDZONE
option add a dynamic pattern of two bytes at the end of each allocated
buffer, and check this pattern when freeing to ensure the caller has not
written outside the requested area.
This option does not introduce a significant performance impact,
but has two drawbacks: it only catches write overflows, and it catches
them only on
.Fn kmem_free .
.Pp
.Dv KMEM_REDZONE
is enabled by default on
.Dv DIAGNOSTIC .
.Ss KMEM_GUARD
Kernels compiled with the
.Dv KMEM_GUARD
option perform CPU-intensive runtime sanity checks on kmem operations,
at a very high overhead.
It requires
.Dv KMEM_SIZE
to be enabled as well.
.Pp
.Dv KMEM_GUARD
tries to catch the following types of bugs:
.Bl -bullet
.It
Overflow at time of occurrence, by means of a guard page.
An unmapped guard page sits immediately after the requested area;
a read/write overflow therefore triggers a page fault.
.It
Underflow at
.Fn kmem_free ,
by using the size registered by
.Dv KMEM_SIZE .
If an underflow occurs, the size stored by
.Dv KMEM_SIZE
is overwritten, so the kernel spots the mismatch when the buffer is freed.
.It
Use-after-free at time of occurrence.
When freeing, the memory is unmapped, and, depending on the value of
.Va kmem_guard_depth ,
the kernel delays the recycling of that memory for a longer or shorter time.
Any later read or write access to the memory therefore triggers a page
fault, provided the memory has not been recycled yet.
.El
.Pp
To enable it, boot the system with the
.Fl d
option, which causes the debugger to be entered early during the kernel
boot process.
Issue commands such as the following:
.Bd -literal
db\*[Gt] w kmem_guard_depth 0t30000
db\*[Gt] c
.Ed
.Pp
This instructs
.Dv kmem_guard
to queue up to 60000 (30000*2) pages of unmapped KVA to catch
use-after-free type errors.
When
.Fn kmem_free
is called, the memory backing a freed item is unmapped and the kernel VA
space is pushed onto a FIFO.
The VA space will not be reused until another 30000 items have been freed.
Until then, the kernel catches invalid accesses and panics with a page fault.
Limitations:
.Bl -bullet
.It
It has a severe impact on performance.
.It
It is best used on a 64-bit machine with lots of RAM.
.El
.Pp
.Dv KMEM_GUARD
is enabled by default on
.Dv DEBUG .
.Sh RETURN VALUES
On success,
.Fn kmem_alloc
and
.Fn kmem_zalloc
return a pointer to allocated memory.
Otherwise,
.Dv NULL
is returned; this can only happen when
.Dv KM_NOSLEEP
is specified.
.\" ------------------------------------------------------------
.Sh CODE REFERENCES
The
.Nm
subsystem is implemented within the file
.Pa sys/kern/subr_kmem.c .
.\" ------------------------------------------------------------
.Sh SEE ALSO
.Xr intro 9 ,
.Xr memoryallocators 9 ,
.Xr percpu 9 ,
.Xr pool_cache 9 ,
.Xr uvm_km 9
.\" ------------------------------------------------------------
.Sh CAVEATS
Neither
.Fn kmem_alloc
nor
.Fn kmem_free
can be used from interrupt context, from a soft interrupt, or from
a callout.
Use
.Xr pool_cache 9
in these situations.
.\" ------------------------------------------------------------
.Sh SECURITY CONSIDERATIONS
As the memory allocated by
.Fn kmem_alloc
is uninitialized, it can contain security-sensitive data left by its
previous user.
It is the caller's responsibility not to expose it to the world.
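.Pp
One way to avoid such exposure is to obtain zeroed memory with
.Fn kmem_zalloc
before filling in a structure that will be copied out.
Zeroing the whole area also clears padding bytes, so they cannot carry stale data.
The following user-space sketch illustrates the idea; it is not kernel code.
The functions
.Fn stale_alloc
and
.Fn zeroed_alloc
are hypothetical stand-ins for
.Fn kmem_alloc
and
.Fn kmem_zalloc ,
built on
.Xr malloc 3
and
.Xr memset 3 .
.Vt struct record
and the other helpers are invented for illustration.
.Bd -literal
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical example type; most ABIs insert padding after "type". */
struct record {
	char	type;
	long	value;
};

/* Byte pattern standing in for data left behind by a previous user. */
#define STALE	0xa5

/* Hypothetical stand-in for kmem_alloc(): contents are not cleared. */
static void *
stale_alloc(size_t size)
{
	void *p = malloc(size);

	if (p != NULL)
		memset(p, STALE, size);	/* simulate leftover data */
	return p;
}

/* Hypothetical stand-in for kmem_zalloc(): the whole area is zeroed. */
static void *
zeroed_alloc(size_t size)
{
	void *p = stale_alloc(size);

	if (p != NULL)
		memset(p, 0, size);	/* clears padding bytes too */
	return p;
}

/* Fill in a record field by field, as code preparing a copy-out might. */
static struct record *
make_record(void *(*alloc)(size_t))
{
	struct record *r = alloc(sizeof(*r));

	if (r != NULL) {
		r->type = 1;
		r->value = 42;
	}
	return r;
}

/* Count the bytes of a buffer that still hold the stale pattern. */
static size_t
count_stale(const void *buf, size_t len)
{
	const unsigned char *b = buf;
	size_t i, n = 0;

	for (i = 0; i < len; i++)
		if (b[i] == STALE)
			n++;
	return n;
}
.Ed
.Pp
With
.Fn zeroed_alloc ,
every byte of the record is either a field value or zero.
With
.Fn stale_alloc ,
the padding bytes typically still hold the stale pattern after the fields
have been assigned.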