.\" $NetBSD: kmem.9,v 1.19 2015/12/11 10:05:17 wiz Exp $
.\"
.\" Copyright (c)2006 YAMAMOTO Takashi,
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" ------------------------------------------------------------
.Dd December 10, 2015
.Dt KMEM 9
.Os
.\" ------------------------------------------------------------
.Sh NAME
.Nm kmem
.Nd kernel wired memory allocator
.\" ------------------------------------------------------------
.Sh SYNOPSIS
.In sys/kmem.h
.\" - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
.Ft void *
.Fn kmem_alloc \
"size_t size" "km_flag_t kmflags"
.Ft void *
.Fn kmem_zalloc \
"size_t size" "km_flag_t kmflags"
.Ft void
.Fn kmem_free \
"void *p" "size_t size"
.\" ---
.Ft void *
.Fn kmem_intr_alloc \
"size_t size" "km_flag_t kmflags"
.Ft void *
.Fn kmem_intr_zalloc \
"size_t size" "km_flag_t kmflags"
.Ft void
.Fn kmem_intr_free \
"void *p" "size_t size"
.\" ---
.Ft char *
.Fn kmem_asprintf \
"const char *fmt" "..."
.\" ------------------------------------------------------------
.Pp
.Cd "options KMEM_SIZE"
.Cd "options KMEM_REDZONE"
.Cd "options KMEM_GUARD"
.Sh DESCRIPTION
.Fn kmem_alloc
allocates kernel wired memory.
It takes the following arguments.
.Bl -tag -width kmflags
.It Fa size
The size of the allocation, in bytes.
.It Fa kmflags
One of the following:
.Bl -tag -width KM_NOSLEEP
.It Dv KM_SLEEP
If the allocation cannot be satisfied immediately, sleep until enough
memory is available.
Note that specifying
.Dv KM_SLEEP
does not guarantee that the allocation succeeds.
Under resource stress conditions, the allocation can fail and the
function will return
.Dv NULL .
One such scenario is when the allocation size is larger than can ever
be satisfied; another is when system memory is so exhausted that even
pools of pages cannot be allocated.
.It Dv KM_NOSLEEP
Don't sleep.
Immediately return
.Dv NULL
if there is not enough memory available.
It should only be used when failure to allocate will not have harmful,
user-visible effects.
.Pp
.Bf -symbolic
Use of
.Dv KM_NOSLEEP
is strongly discouraged as it can create transient, hard-to-debug failures
that occur when the system is under memory pressure.
.Ef
.Pp
In situations where it is not possible to sleep, for example because locks
are held by the caller, the code path should be restructured to allow the
allocation to be made in another place.
.El
.El
.Pp
The contents of allocated memory are uninitialized.
.Pp
Unlike Solaris, kmem_alloc(0, flags) is illegal.
.Pp
.\" ------------------------------------------------------------
.Fn kmem_zalloc
is the equivalent of
.Fn kmem_alloc ,
except that it initializes the memory to zero.
.Pp
.\" ------------------------------------------------------------
.Fn kmem_asprintf
functions as the well known
.Fn asprintf
function, but allocates memory using
.Fn kmem_alloc .
This routine can sleep during allocation.
The size of the allocated area is the length of the returned character
string, plus one (for the NUL terminator).
This must be taken into consideration when freeing the returned area with
.Fn kmem_free .
.Pp
.\" ------------------------------------------------------------
.Fn kmem_free
frees kernel wired memory allocated by
.Fn kmem_alloc
or
.Fn kmem_zalloc
so that it can be used for other purposes.
It takes the following arguments.
.Bl -tag -width kmflags
.It Fa p
The pointer to the memory being freed.
It must be the one returned by
.Fn kmem_alloc
or
.Fn kmem_zalloc .
.It Fa size
The size of the memory being freed, in bytes.
It must be the same as the
.Fa size
argument used for
.Fn kmem_alloc
or
.Fn kmem_zalloc
when the memory was allocated.
.El
.Pp
Freeing
.Dv NULL
is illegal.
.Pp
.\" ------------------------------------------------------------
.Fn kmem_intr_alloc ,
.Fn kmem_intr_zalloc
and
.Fn kmem_intr_free
are the equivalents of the above kmem routines which can be called
from interrupt context.
These routines are intended for special cases.
Normally,
.Xr pool_cache 9
should be used for memory allocation from interrupt context.
.\" ------------------------------------------------------------
.Sh NOTES
Always check the return value of the allocators, even when
.Dv KM_SLEEP
is specified, to avoid kernel crashes during resource stress conditions.
.Pp
Making
.Dv KM_SLEEP
allocations while holding mutexes or reader/writer locks is discouraged, as the
caller can sleep for an unbounded amount of time in order to satisfy the
allocation.
This can in turn block other threads that wish to acquire locks held by the
caller.
It should be noted that
.Fn kmem_free
may also block.
.Pp
For some locks this is permissible or even unavoidable.
For others, particularly locks that may be taken from soft interrupt context,
it is a serious problem.
As a general rule it is better not to allow this type of situation to develop.
One way to circumvent the problem is to make allocations speculative and part
of a retryable sequence.
For example:
.Bd -literal
  retry:
	/* speculative unlocked check */
	if (need to allocate) {
		new_item = kmem_alloc(sizeof(*new_item), KM_SLEEP);
	} else {
		new_item = NULL;
	}
	mutex_enter(lock);
	/* check while holding lock for true status */
	if (need to allocate) {
		if (new_item == NULL) {
			mutex_exit(lock);
			goto retry;
		}
		consume(new_item);
		new_item = NULL;
	}
	mutex_exit(lock);
	if (new_item != NULL) {
		/* did not use it after all */
		kmem_free(new_item, sizeof(*new_item));
	}
.Ed
.\" ------------------------------------------------------------
.Sh OPTIONS
.Ss KMEM_SIZE
Kernels compiled with the
.Dv KMEM_SIZE
option ensure the size given in
.Fn kmem_free
matches the actual allocated size.
On
.Fn kmem_alloc ,
the kernel will allocate an additional contiguous kmem page of eight
bytes in the buffer, will register the allocated size in the first kmem
page of that buffer, and will return a pointer to the second kmem page
in that same buffer.
When freeing, the kernel reads the first page, and compares the
size registered with the one given in
.Fn kmem_free .
Any mismatch triggers a panic.
.Pp
.Dv KMEM_SIZE
is enabled by default on
.Dv DIAGNOSTIC
and
.Dv DEBUG .
.Ss KMEM_REDZONE
Kernels compiled with the
.Dv KMEM_REDZONE
option add a dynamic pattern of two bytes at the end of each allocated
buffer, and check this pattern when freeing to ensure the caller hasn't
written outside the requested area.
This option does not introduce a significant performance impact,
but has two drawbacks: it only catches write overflows, and catches
them only on
.Fn kmem_free .
.Pp
.Dv KMEM_REDZONE
is enabled by default on
.Dv DIAGNOSTIC .
.Ss KMEM_GUARD
Kernels compiled with the
.Dv KMEM_GUARD
option perform CPU intensive sanity checks on kmem operations.
It adds additional, very high overhead runtime verification to kmem
operations.
It must be enabled with
.Dv KMEM_SIZE .
.Pp
.Dv KMEM_GUARD
tries to catch the following types of bugs:
.Bl -bullet
.It
Overflow at time of occurrence, by means of a guard page.
An unmapped guard page sits immediately after the requested area;
a read/write overflow therefore triggers a page fault.
.It
Underflow at
.Fn kmem_free ,
by using
.Dv KMEM_SIZE Ap s
registered size.
If an underflow occurs, the size stored by
.Dv KMEM_SIZE
will be overwritten, which means that when freeing, the kernel will
spot the mismatch.
.It
Use-after-free at time of occurrence.
When freeing, the memory is unmapped, and depending on the value
of
.Va kmem_guard_depth ,
the kernel will delay the recycling of that memory for a longer or
shorter time.
This means that any subsequent read/write access to the memory will
trigger a page fault, provided it has not been recycled yet.
.El
.Pp
To enable it, boot the system with the
.Fl d
option, which causes the debugger to be entered early during the kernel
boot process.
Issue commands such as the following:
.Bd -literal
db\*[Gt] w kmem_guard_depth 0t30000
db\*[Gt] c
.Ed
.Pp
This instructs
.Dv kmem_guard
to queue up to 60000 (30000*2) pages of unmapped KVA to catch
use-after-free type errors.
When
.Fn kmem_free
is called, memory backing a freed item is unmapped and the kernel VA
space pushed onto a FIFO.
The VA space will not be reused until another 30k items have been freed.
Until reused the kernel will catch invalid accesses and panic with a page fault.
.Pp
Limitations:
.Bl -bullet
.It
It has a severe impact on performance.
.It
It is best used on a 64-bit machine with lots of RAM.
.El
.Pp
.Dv KMEM_GUARD
is enabled by default on
.Dv DEBUG .
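.Pp
As a rough illustration of the use-after-free case described above,
the following hypothetical fragment contains a bug of the kind that a
.Dv KMEM_GUARD
kernel is meant to trap with a page fault at the point of the stray
access, provided the freed address has not yet been recycled.
.Vt struct example_item
and
.Fn example_buggy
are names invented for this sketch and are not part of the kernel.
.Bd -literal
#include <sys/kmem.h>

struct example_item {		/* hypothetical structure */
	int	ei_state;
};

static void
example_buggy(void)
{
	struct example_item *ei;

	ei = kmem_zalloc(sizeof(*ei), KM_SLEEP);
	if (ei == NULL)
		return;
	kmem_free(ei, sizeof(*ei));
	ei->ei_state = 1;	/* BUG: write after kmem_free() */
}
.Ed
.Pp
Without
.Dv KMEM_GUARD ,
a write like this may go unnoticed and corrupt whatever reuses the memory.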
.Sh RETURN VALUES
On success,
.Fn kmem_alloc
and
.Fn kmem_zalloc
return a pointer to allocated memory.
Otherwise,
.Dv NULL
is returned.
.\" ------------------------------------------------------------
.Sh CODE REFERENCES
The
.Nm
subsystem is implemented within the file
.Pa sys/kern/subr_kmem.c .
.\" ------------------------------------------------------------
.Sh SEE ALSO
.Xr intro 9 ,
.Xr memoryallocators 9 ,
.Xr percpu 9 ,
.Xr pool_cache 9 ,
.Xr uvm_km 9
.\" ------------------------------------------------------------
.Sh CAVEATS
Neither
.Fn kmem_alloc
nor
.Fn kmem_free
can be used from interrupt context, from a soft interrupt, or from
a callout.
Use
.Xr pool_cache 9
in these situations.
.\" ------------------------------------------------------------
.Sh SECURITY CONSIDERATIONS
As the memory allocated by
.Fn kmem_alloc
is uninitialized, it can contain security-sensitive data left by its
previous user.
It is the caller's responsibility not to expose it to the world.
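.Pp
One way to meet this responsibility is to allocate with
.Fn kmem_zalloc
whenever a buffer will be copied out to userland, so that padding and
unused bytes carry no stale data.
The following is a minimal sketch under that assumption, not a
prescribed pattern;
.Vt struct example_info
and
.Fn example_getinfo
are hypothetical names used only for illustration.
.Bd -literal
#include <sys/param.h>
#include <sys/errno.h>
#include <sys/systm.h>
#include <sys/kmem.h>

struct example_info {		/* hypothetical record for userland */
	uint32_t	ei_unit;
	char		ei_name[16];
};

static int
example_getinfo(void *uaddr, uint32_t unit)
{
	struct example_info *ei;
	int error;

	/* Zeroed allocation: no bytes left over from a previous user. */
	ei = kmem_zalloc(sizeof(*ei), KM_SLEEP);
	if (ei == NULL)
		return ENOMEM;
	ei->ei_unit = unit;
	strlcpy(ei->ei_name, "example", sizeof(ei->ei_name));
	error = copyout(ei, uaddr, sizeof(*ei));
	/* The size passed here must match the size allocated. */
	kmem_free(ei, sizeof(*ei));
	return error;
}
.Ed
.Pp
With
.Fn kmem_alloc
in place of
.Fn kmem_zalloc ,
the structure padding and the unused tail of the name field would be
copied out with whatever they previously contained.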