.\"	$NetBSD: kmem.9,v 1.17 2015/07/28 09:52:43 wiz Exp $
.\"
.\" Copyright (c)2006 YAMAMOTO Takashi,
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" ------------------------------------------------------------
.Dd July 28, 2015
.Dt KMEM 9
.Os
.\" ------------------------------------------------------------
.Sh NAME
.Nm kmem
.Nd kernel wired memory allocator
.\" ------------------------------------------------------------
.Sh SYNOPSIS
.In sys/kmem.h
.\" - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
.Ft void *
.Fn kmem_alloc \
"size_t size" "km_flag_t kmflags"
.Ft void *
.Fn kmem_zalloc \
"size_t size" "km_flag_t kmflags"
.Ft void
.Fn kmem_free \
"void *p" "size_t size"
.\" ---
.Ft void *
.Fn kmem_intr_alloc \
"size_t size" "km_flag_t kmflags"
.Ft void *
.Fn kmem_intr_zalloc \
"size_t size" "km_flag_t kmflags"
.Ft void
.Fn kmem_intr_free \
"void *p" "size_t size"
.\" ---
.Ft char *
.Fn kmem_asprintf \
"const char *fmt" "..."
.\" ------------------------------------------------------------
.Pp
.Cd "options KMEM_SIZE"
.Cd "options KMEM_REDZONE"
.Cd "options KMEM_GUARD"
.Sh DESCRIPTION
.Fn kmem_alloc
allocates kernel wired memory.
It takes the following arguments.
.Bl -tag -width kmflags
.It Fa size
Specify the size of allocation in bytes.
.It Fa kmflags
Either of the following:
.Bl -tag -width KM_NOSLEEP
.It Dv KM_SLEEP
If the allocation cannot be satisfied immediately, sleep until enough
memory is available.
.It Dv KM_NOSLEEP
Don't sleep.
Immediately return
.Dv NULL
if there is not enough memory available.
It should only be used when failure to allocate will not have harmful,
user-visible effects.
.Pp
.Bf -symbolic
Use of
.Dv KM_NOSLEEP
is strongly discouraged as it can create transient, hard to debug failures
that occur when the system is under memory pressure.
.Ef
.Pp
In situations where it is not possible to sleep, for example because locks
are held by the caller, the code path should be restructured to allow the
allocation to be made in another place.
.El
.El
.Pp
The contents of allocated memory are uninitialized.
.Pp
Unlike Solaris, kmem_alloc(0, flags) is illegal.
.Pp
.\" ------------------------------------------------------------
.Fn kmem_zalloc
is the equivalent of
.Fn kmem_alloc ,
except that it initializes the memory to zero.
.Pp
.\" ------------------------------------------------------------
.Fn kmem_asprintf
functions as the well known
.Fn asprintf
function, but allocates memory using
.Fn kmem_alloc .
This routine can sleep during allocation.
The size of the allocated area is the length of the returned character string, plus one (for the NUL terminator).
This must be taken into consideration when freeing the returned area with
.Fn kmem_free .
.Pp
.\" ------------------------------------------------------------
.Fn kmem_free
frees kernel wired memory allocated by
.Fn kmem_alloc
or
.Fn kmem_zalloc
so that it can be used for other purposes.
It takes the following arguments.
.Bl -tag -width kmflags
.It Fa p
The pointer to the memory being freed.
It must be the one returned by
.Fn kmem_alloc
or
.Fn kmem_zalloc .
.It Fa size
The size of the memory being freed, in bytes.
It must be the same as the
.Fa size
argument used for
.Fn kmem_alloc
or
.Fn kmem_zalloc
when the memory was allocated.
.El
.Pp
Freeing
.Dv NULL
is illegal.
.Pp
.\" ------------------------------------------------------------
.Fn kmem_intr_alloc ,
.Fn kmem_intr_zalloc
and
.Fn kmem_intr_free
are the equivalents of the above kmem routines which can be called
from interrupt context.
These routines are intended for special cases.
Normally,
.Xr pool_cache 9
should be used for memory allocation from interrupt context.
.\" ------------------------------------------------------------
.Sh NOTES
Making
.Dv KM_SLEEP
allocations while holding mutexes or reader/writer locks is discouraged, as the
caller can sleep for an unbounded amount of time in order to satisfy the
allocation.
This can in turn block other threads that wish to acquire locks held by the
caller.
It should be noted that
.Fn kmem_free
may also block.
.Pp
For some locks this is permissible or even unavoidable.
For others, particularly locks that may be taken from soft interrupt context,
it is a serious problem.
As a general rule it is better not to allow this type of situation to develop.
One way to circumvent the problem is to make allocations speculative and part
of a retryable sequence.
For example:
.Bd -literal
  retry:
	/* speculative unlocked check */
	if (need to allocate) {
		new_item = kmem_alloc(sizeof(*new_item), KM_SLEEP);
	} else {
		new_item = NULL;
	}
	mutex_enter(lock);
	/* check while holding lock for true status */
	if (need to allocate) {
		if (new_item == NULL) {
			mutex_exit(lock);
			goto retry;
		}
		consume(new_item);
		new_item = NULL;
	}
	mutex_exit(lock);
	if (new_item != NULL) {
		/* did not use it after all */
		kmem_free(new_item, sizeof(*new_item));
	}
.Ed
.\" ------------------------------------------------------------
.Sh OPTIONS
.Ss KMEM_SIZE
Kernels compiled with the
.Dv KMEM_SIZE
option ensure the size given in
.Fn kmem_free
matches the actual allocated size.
On
.Fn kmem_alloc ,
the kernel will allocate an additional contiguous kmem page of eight
bytes in the buffer, will register the allocated size in the first kmem
page of that buffer, and will return a pointer to the second kmem page
in that same buffer.
When freeing, the kernel reads the first page, and compares the
size registered with the one given in
.Fn kmem_free .
Any mismatch triggers a panic.
.Pp
.Dv KMEM_SIZE
is enabled by default on
.Dv DIAGNOSTIC
and
.Dv DEBUG .
.Ss KMEM_REDZONE
Kernels compiled with the
.Dv KMEM_REDZONE
option add a dynamic pattern of two bytes at the end of each allocated
buffer, and check this pattern when freeing to ensure the caller hasn't
written outside the requested area.
This option does not introduce a significant performance impact,
but has two drawbacks: it only catches write overflows, and catches
them only on
.Fn kmem_free .
.Pp
.Dv KMEM_REDZONE
is enabled by default on
.Dv DIAGNOSTIC .
.Ss KMEM_GUARD
Kernels compiled with the
.Dv KMEM_GUARD
option perform CPU intensive sanity checks on kmem operations.
It adds additional, very high overhead runtime verification to kmem
operations.
It must be enabled with
.Dv KMEM_SIZE .
.Pp
.Dv KMEM_GUARD
tries to catch the following types of bugs:
.Bl -bullet
.It
Overflow at time of occurrence, by means of a guard page.
An unmapped guard page sits immediately after the requested area;
a read/write overflow therefore triggers a page fault.
.It
Underflow at
.Fn kmem_free ,
by using
.Dv KMEM_SIZE Ap s
registered size.
If an underflow occurs, the size stored by
.Dv KMEM_SIZE
will be overwritten, which means that when freeing, the kernel will
spot the mismatch.
.It
Use-after-free at time of occurrence.
When freeing, the memory is unmapped, and depending on the value
of kmem_guard_depth, the kernel will more or less delay the recycling
of that memory.
Any subsequent read/write access to the memory therefore triggers
a page fault, provided the memory has not been recycled yet.
.El
.Pp
To enable it, boot the system with the
.Fl d
option, which causes the debugger to be entered early during the kernel
boot process.
Issue commands such as the following:
.Bd -literal
db\*[Gt] w kmem_guard_depth 0t30000
db\*[Gt] c
.Ed
.Pp
This instructs
.Dv kmem_guard
to queue up to 60000 (30000*2) pages of unmapped KVA to catch
use-after-free type errors.
When
.Fn kmem_free
is called, memory backing a freed item is unmapped and the kernel VA
space pushed onto a FIFO.
The VA space will not be reused until another 30k items have been freed.
Until reused the kernel will catch invalid accesses and panic with a page fault.
.Pp
Limitations:
.Bl -bullet
.It
It has a severe impact on performance.
.It
It is best used on a 64-bit machine with lots of RAM.
.El
.Pp
.Dv KMEM_GUARD
is enabled by default on
.Dv DEBUG .
.Sh RETURN VALUES
On success,
.Fn kmem_alloc
and
.Fn kmem_zalloc
return a pointer to allocated memory.
Otherwise,
.Dv NULL
is returned.
.\" ------------------------------------------------------------
.Sh CODE REFERENCES
The
.Nm
subsystem is implemented within the file
.Pa sys/kern/subr_kmem.c .
.\" ------------------------------------------------------------
.Sh SEE ALSO
.Xr intro 9 ,
.Xr memoryallocators 9 ,
.Xr percpu 9 ,
.Xr pool_cache 9 ,
.Xr uvm_km 9
.\" ------------------------------------------------------------
.Sh CAVEATS
Neither
.Fn kmem_alloc
nor
.Fn kmem_free
can be used from interrupt context, from a soft interrupt, or from
a callout.
Use
.Xr pool_cache 9
in these situations.
.\" ------------------------------------------------------------
.Sh SECURITY CONSIDERATIONS
As the memory allocated by
.Fn kmem_alloc
is uninitialized, it can contain security-sensitive data left by its
previous user.
It is the caller's responsibility not to expose it to the world.
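.Pp
One possible way to honor this, shown here only as a minimal sketch, is to
obtain any buffer that will later be copied out to userland with
.Fn kmem_zalloc ,
so that structure padding and unset fields never carry stale kernel data.
The structure
.Vt struct example_info
and the function
.Fn example_get_info
are hypothetical names used for illustration and are not part of the
.Nm
interface; the field values are arbitrary.
.Bd -literal
	#include <sys/types.h>
	#include <sys/systm.h>
	#include <sys/kmem.h>

	struct example_info {		/* hypothetical structure */
		uint8_t		ei_kind;
		/* the compiler may insert padding here */
		uint64_t	ei_value;
	};

	int
	example_get_info(void *uaddr)	/* hypothetical function */
	{
		struct example_info *ei;
		int error;

		/* zeroed allocation: padding holds no stale data */
		ei = kmem_zalloc(sizeof(*ei), KM_SLEEP);
		ei->ei_kind = 1;
		ei->ei_value = 42;

		/* copy the fully initialized structure to userland */
		error = copyout(ei, uaddr, sizeof(*ei));

		/* the size must match the one used to allocate */
		kmem_free(ei, sizeof(*ei));
		return error;
	}
.Ed
.Pp
The sketch allocates with
.Dv KM_SLEEP
and therefore assumes it is called from a context that is allowed to sleep
and that holds no locks.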