Lines Matching +full:memory +full:- +full:to +full:- +full:memory

13 .\"    may be used to endorse or promote products derived from this software
17 .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
21 .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
68 pp. 295-303, June 1988.
74 Design of a General Purpose Memory Allocator for the 4.3BSD UNIX\(dg Kernel
78 .EH 'Design of a General Purpose Memory ...''McKusick, Karels'
79 .OH 'McKusick, Karels''Design of a General Purpose Memory ...'
94 The 4.3BSD UNIX kernel uses many memory allocation mechanisms,
96 This paper describes a general purpose dynamic memory allocator
98 The design of this allocator takes advantage of known memory usage
99 patterns in the UNIX kernel and a hybrid strategy that is time-efficient
100 for small allocations and space-efficient for large allocations.
101 This allocator replaces the multiple memory allocation interfaces
102 with a single easy-to-program interface,
103 results in more efficient use of global memory by eliminating
104 partitioned and specialized memory pools,
106 relative to the current implementations.
108 the new memory allocator,
112 .H 1 "Kernel Memory Allocation in 4.3BSD
114 The 4.3BSD kernel has at least ten different memory allocators.
117 and others include information to describe I/O operations.
118 Often the allocations are for small pieces of memory that are only
120 In a user process such short-term
121 memory would be allocated on the run-time stack.
122 Because the kernel has a limited run-time stack,
123 it is not feasible to allocate even moderate blocks of memory on it.
124 Consequently, such memory must be allocated through a more dynamic mechanism.
127 it must allocate a one kilobyte buffer to hold the name.
128 Other blocks of memory must be more persistent than a single system call
129 and really have to be allocated from dynamic memory.
133 Demands for dynamic memory allocation in the kernel have increased
135 Each time a new type of memory allocation has been required,
136 a specialized memory allocation scheme has been written to handle it.
137 Often the new memory allocation scheme has been built on top
140 memory allocation through the allocation of empty buffers [Thompson78].
142 finding the oldest buffer, pushing its contents to disk if they are dirty,
143 and moving physical memory into or out of the buffer to create
145 To reduce the overhead, a ``new'' memory allocator was built in 4.3BSD
150 This memory allocation method has several drawbacks.
152 Second, it depletes the buffer pool, as it steals memory intended
153 to buffer disk blocks for other purposes.
157 A generalized memory allocator is needed to reduce the complexity
159 Rather than providing many semi-specialized ways of allocating memory,
162 programmers do not need to figure
163 out the most appropriate way to allocate memory.
168 To ease the task of understanding how to use it,
169 the memory allocator should have an interface similar to the interface
170 of the well-known memory allocator provided for
177 size of memory that is needed.
178 The range of sizes for memory requests should not be constrained.
179 The free routine should take a pointer to the storage being freed,
181 of the piece of memory being freed.
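The interface described above can be sketched in C. The names `kmem_alloc` and `kmem_free` are illustrative stand-ins, not the kernel's own routines, and the bodies simply defer to the C library to show the shape of the interface: the allocator takes only a size, and the free routine takes only a pointer.

```c
#include <stdlib.h>

/*
 * Sketch of the interface described above: allocation takes only the
 * size needed, and free takes only a pointer -- no size argument.
 * Names are illustrative; bodies defer to the C library for demonstration.
 */
void *
kmem_alloc(size_t size)
{
	return malloc(size);
}

void
kmem_free(void *ptr)
{
	free(ptr);
}
```

A caller allocating a pathname buffer would thus write `kmem_free(buf)` with no record of the original request size.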
182 .H 1 "Criteria for a Kernel Memory Allocator
184 The design specification for a kernel memory allocator is similar to,
185 but not identical to,
186 the design criteria for a user level memory allocator.
187 The first criterion for a memory allocator is that it make good use
188 of the physical memory.
189 Good use of memory is measured by the amount of memory needed to hold
203 Here, ``requested'' is the sum of the memory that has been requested
205 ``Required'' is the amount of memory that has been
207 An allocator requires more memory than requested because of fragmentation
208 and a need to have a ready supply of free memory for future requests.
209 A perfect memory allocator would have a utilization of 100%.
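The utilization metric described in the surrounding prose can be written as a ratio (reconstructed here from the definitions of ``requested'' and ``required'' above):

```latex
\mathrm{utilization} = \frac{\mathrm{requested}}{\mathrm{required}}
```

Fragmentation and the reserve of free memory make the denominator exceed the numerator, so utilization is at most 100%.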
213 Good memory utilization in the kernel is more important than
215 Because user processes run in virtual memory,
219 being ``requested'' need not tie up physical memory.
224 it is desirable to release unused memory in the ``required'' pool
225 rather than to hold it as is typically done with user processes.
227 releasing unused memory is fast;
228 a user process must do a system call to release memory.
230 The most important criterion for a memory allocator is that it be fast.
231 Because memory allocation is done frequently,
232 a slow memory allocator will degrade the system performance.
236 processes can allocate cheaply on their run-time stack.
242 Another problem with a slow memory allocator is that programmers
243 of frequently-used kernel interfaces will feel that they
244 cannot afford to use it as their primary memory allocator.
245 Instead they will build their own memory allocator on top of the
246 original by maintaining their own pool of memory blocks.
247 Multiple allocators reduce the efficiency with which memory is used.
248 The kernel ends up with many different free lists of memory
251 consider the case of two subsystems that need memory.
253 the amount of memory tied up in the two lists will be the
254 sum of the greatest amount of memory that each of
257 the amount of memory tied up in the free list may be as low as the
258 greatest amount of memory that either subsystem used.
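The free-list argument above can be made concrete with a small illustration; the demand figures are invented for the example. Private pools end up pinning the sum of each subsystem's individual peak, while a shared pool need only pin the largest combined demand seen at any one moment:

```c
/*
 * Illustration of the free-list argument above.  Two subsystems with
 * private pools pin the sum of their individual peaks; a shared pool
 * need pin only the largest combined demand at any one time.
 * The demand figures used in testing are invented for the example.
 */
int
private_pool_blocks(const int a[], const int b[], int n)
{
	int peak_a = 0, peak_b = 0;

	for (int i = 0; i < n; i++) {
		if (a[i] > peak_a) peak_a = a[i];
		if (b[i] > peak_b) peak_b = b[i];
	}
	return peak_a + peak_b;		/* each list holds its own peak */
}

int
shared_pool_blocks(const int a[], const int b[], int n)
{
	int peak = 0;

	for (int i = 0; i < n; i++)
		if (a[i] + b[i] > peak)
			peak = a[i] + b[i];	/* combined demand */
	return peak;
}
```

If subsystem A peaks at 10 blocks and B at 8, but never simultaneously, private lists tie up 18 blocks where a shared pool can get by with 10.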
261 .H 1 "Existing User-level Implementations
264 implementations of user-level memory allocators.
266 Nearly all of the memory allocators tested made good use of memory,
268 The fastest memory allocator in the survey by nearly a factor of two
269 was the memory allocator provided on 4.2BSD originally
272 the 4.2BSD memory allocator also wasted twice as much memory
275 The 4.2BSD user-level memory allocator works by maintaining a set of lists
277 Each list contains a set of memory blocks of its corresponding size.
278 To fulfill a memory request,
279 the size of the request is rounded up to the next power of two.
280 A piece of memory is then removed from the list corresponding
281 to the specified power of two and returned to the requester.
282 Thus, a request for a block of memory of size 53 returns
283 a block from the 64-sized list.
284 A typical memory allocation requires a roundup calculation
286 Only if the list is empty is a real memory allocation done.
288 the block of memory is put back onto the list from which it came.
290 immediately preceding the memory block.
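The rounding and list selection described above can be sketched as follows; `MINALLOC` (the smallest bucket size) is an assumption for the sketch, not a value taken from the paper:

```c
#include <stddef.h>

/*
 * Sketch of the 4.2BSD power-of-two strategy described above: round a
 * request up to the next power of two and pick the corresponding free
 * list.  MINALLOC is an assumed smallest bucket, not from the paper.
 */
#define MINALLOC 16

size_t
roundup_pow2(size_t size)
{
	size_t bucket = MINALLOC;

	while (bucket < size)
		bucket <<= 1;		/* next power of two >= size */
	return bucket;
}

int
bucket_index(size_t size)
{
	int indx = 0;

	for (size_t bucket = MINALLOC; bucket < size; bucket <<= 1)
		indx++;			/* which free list to use */
	return indx;
}
```

With these definitions a 53-byte request rounds to the 64-byte list, matching the example in the text.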
291 .H 1 "Considerations Unique to a Kernel Allocator
294 memory allocator for the kernel that do not apply to a user process
295 memory allocator.
296 First, the maximum memory allocation can be determined at
298 This number is never more than the amount of physical memory on the machine,
300 memory dedicated to the operating system is uninteresting to use.
302 to manage its dynamically allocated memory.
303 These data structures never need to be
304 expanded to accommodate memory requests;
306 For a user process, the maximum amount of memory that may be allocated
307 is a function of the maximum size of its virtual memory.
308 Although it could allocate static data structures to manage
309 its entire virtual memory,
311 The other alternative is to allocate data structures as they are needed.
314 structures and additional mechanisms to link them all together.
316 Another special condition of the kernel memory allocator is that it
320 pieces from that arena, which it then populates with physical memory.
324 allocated to its address space.
326 kernel exactly corresponds to the set of pages that it is really using.
327 .FI "One day memory usage on a Berkeley time-sharing machine"
331 A final special condition that applies to the kernel is that
332 all of the different uses of dynamic memory are known in advance.
333 Each one of these uses of dynamic memory can be assigned a type.
334 For each type of dynamic memory that is allocated,
338 its available memory and thus a single runaway
340 By putting limits on each type of memory,
341 the single general purpose memory allocator can provide the same
342 protection against memory starvation.\(dg
349 \*(Lb shows the memory usage of the kernel over a one day period
358 and are typically for long-lived objects
359 such as buffers to hold the superblock for
361 Thus, a memory allocator only needs to be
362 fast for small pieces of memory.
363 .H 1 "Implementation of the Kernel Memory Allocator
365 In reviewing the available memory allocators,
367 The kernel memory allocator that we ended up with is a hybrid
368 of the fast memory allocator found in the 4.2BSD C library
369 and a slower but more memory-efficient first-fit allocator.
371 Small allocations are done using the 4.2BSD power-of-two list strategy;
373 the list to use and the removal of an element if it is available,
375 Macros are provided to avoid the cost of a subroutine call.
377 made to the allocator itself.
379 the lists corresponding to large allocations are always empty.
382 Similarly, freeing a block of memory can be done with a macro.
383 The macro computes the list on which to place the request
385 The free routine is called only if the block of memory is
386 considered to be a large allocation.
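The fast path described above can be sketched in C. The names and the refill stub are illustrative; the kernel implements this inline with macros so that the common case avoids even a subroutine call, and only an empty list (or a large allocation) reaches the allocator proper:

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Sketch of the fast path described above: the common case pops the
 * head of the per-size free list inline; the allocator proper is
 * called only when the list is empty.  Names and the refill stub are
 * illustrative, not the kernel's.
 */
#define NBUCKETS 16

struct freehdr {
	struct freehdr *next;
};

static struct freehdr *freelist[NBUCKETS];

void *slow_alloc(int indx);		/* refill / large-allocation path */

static void *
fast_alloc(int indx)
{
	struct freehdr *p = freelist[indx];

	if (p != NULL) {		/* common case: pop the list head */
		freelist[indx] = p->next;
		return p;
	}
	return slow_alloc(indx);	/* rare case: list is empty */
}

static void
fast_free(void *ptr, int indx)
{
	struct freehdr *p = ptr;

	p->next = freelist[indx];	/* push back onto its size list */
	freelist[indx] = p;
}

void *
slow_alloc(int indx)
{
	(void)indx;
	return malloc(64);		/* stand-in for the real refill */
}
```

Freeing is the mirror image: the macro pushes the block back on the list it came from, calling a routine only for large allocations.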
391 Because of the inefficiency of power-of-two allocation strategies
395 the utilization of memory within the kernel,
396 that showed that 95 to 98% of allocations are of size one kilobyte or less.
397 A frequent caller of the memory allocator
401 pieces of memory in multiples of pages.
406 observes that the power-of-two algorithm yields sizes of 1, 2, 4, 8, \&...
412 that a difference emerges where the power-of-two algorithm will use
418 Large allocations are first rounded up to be a multiple of the page size.
419 The allocator then uses a first-fit algorithm to find space in the
421 Thus a request for a five kilobyte piece of memory will use exactly
422 five pages of memory rather than eight kilobytes as with
423 the power-of-two allocation strategy.
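The page rounding described above can be sketched as follows; `PAGE_SZ` is set to one kilobyte to match the paper's example, so a five kilobyte request occupies exactly five pages where the power-of-two policy would consume eight kilobytes:

```c
#include <stddef.h>

/*
 * Sketch of large-allocation sizing as described above: requests are
 * rounded up to a whole number of pages, not to a power of two.
 * PAGE_SZ of one kilobyte matches the paper's example.
 */
#define PAGE_SZ 1024u

size_t
pages_needed(size_t size)
{
	return (size + PAGE_SZ - 1) / PAGE_SZ;	/* round up to pages */
}

size_t
large_alloc_bytes(size_t size)
{
	return pages_needed(size) * PAGE_SZ;
}
```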
424 When a large piece of memory is freed,
425 the memory pages are returned to the free memory pool,
426 and the address space is returned to the kernel address arena
429 Another technique to improve both the efficiency of memory utilization
431 is to cluster same-sized small allocations on a page.
432 When a list for a power-of-two allocation is empty,
434 This strategy speeds future allocations as several pieces of memory
440 Because the size is not specified when a block of memory is freed,
442 The 4.2BSD user-level allocator stores the size of each block
444 However, this strategy doubles the memory requirement for allocations that
445 require a power-of-two-sized block.
447 instead of storing the size of each piece of memory with the piece itself,
448 the size information is associated with the memory page.
450 the size of a piece of memory that is being freed,
456 memory whose size is exactly a power of two.
457 These requests would be nearly doubled if the user-level strategy were used.
458 Now they can be accommodated with no wasted memory.
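The per-page size record described above can be sketched in C. `ARENA_PAGES`, `PAGE_SZ`, and the variable names are assumptions for the sketch: the point is that the size is indexed by the page a freed address falls in, rather than stored in a header in front of each block.

```c
#include <stddef.h>

/*
 * Sketch of the per-page size record described above: one size entry
 * per page of the arena, indexed by the page the address falls in.
 * ARENA_PAGES, PAGE_SZ, and the names are assumptions for the sketch.
 */
#define PAGE_SZ     1024u
#define ARENA_PAGES 64

static char   arena[ARENA_PAGES * PAGE_SZ];
static size_t pagesize[ARENA_PAGES];	/* allocation size used on each page */

void
record_page_size(void *addr, size_t size)
{
	size_t indx = (size_t)((char *)addr - arena) / PAGE_SZ;

	pagesize[indx] = size;
}

size_t
size_of_free(void *addr)
{
	return pagesize[(size_t)((char *)addr - arena) / PAGE_SZ];
}
```

Because no header precedes the block, a power-of-two-sized request fits its bucket exactly instead of spilling into the next one.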
461 which is willing to wait for memory to become available,
463 that cannot wait for memory to become available.
464 Clients indicate their willingness (and ability) to wait with a flag
466 For clients that are willing to wait,
469 If memory is unavailable and the client cannot wait,
471 These clients must be prepared to cope with this
473 (usually by giving up and hoping to do better later).
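The wait/no-wait convention described above can be sketched as follows. The flag names follow the BSD convention (`M_WAITOK`/`M_NOWAIT`), and the blocking path is reduced to a retry loop as a stand-in for sleeping until memory is freed:

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Sketch of the wait/no-wait flag described above.  Flag names follow
 * the BSD convention; the retry loop is a stand-in for a real kernel's
 * sleep-until-memory-is-freed path.
 */
#define M_WAITOK 0
#define M_NOWAIT 1

void *
alloc_with_flag(size_t size, int flags)
{
	void *p = malloc(size);

	if (p == NULL && flags == M_NOWAIT)
		return NULL;		/* caller must cope with failure */
	while (p == NULL)		/* M_WAITOK: retry until satisfied */
		p = malloc(size);
	return p;
}
```

A no-wait caller, such as an interrupt handler, must check for the null return and back off.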
476 The new memory allocator was written about a year ago.
477 Conversion from the old memory allocators to the new allocator
485 Many of the special purpose memory allocators built on
496 Consequently applications that used to allocate network
497 buffers for their own uses have been switched over to using
501 it is hard to measure the amount of time spent allocating
502 and freeing memory in the kernel.
503 The usual approach is to compile a kernel for profiling
506 The old routines are difficult to quantify because
510 routine was used both to allocate one kilobyte memory blocks
511 and for its intended purpose of providing buffers to the filesystem.
513 To get a measure of the cost of memory allocation before
516 exclusive task was memory allocation.
518 of the running time of the multi-purpose routines that could
519 clearly be identified as memory allocation usage.
521 the time spent in the kernel could be accounted to memory allocation.
523 The new allocator is difficult to measure
524 because the usual case of the memory allocator is implemented as a macro.
528 we changed the macro always to call the memory allocation routine.
529 Running in this mode, the memory allocator accounted for six percent
537 significant new run-time costs.
540 on a per-page basis.
541 This technique allows the most frequently requested sizes to be
544 with the allocator to four kilobytes of information
545 per megabyte of memory under management (with a one kilobyte page size).
548 Our next project is to convert many of the static
549 kernel tables to be dynamically allocated.
553 First, it will reduce the amount of memory
557 (although a limit will be retained to constrain runaway clients).
558 Other researchers have already shown the memory savings
562 memory is never moved from one size list to another.
563 With the 4.2BSD memory allocator this causes problems,
565 a quarter megabyte piece of memory once,
568 memory can be shuffled between large requests so that large blocks
569 of memory are never stranded as they are with the 4.2BSD allocator.
570 However, pages allocated to small requests are allocated once
573 that size would acquire a large amount of memory
577 However, we have been investigating ways to handle such problems
583 the effect of the sorting would be to group the list's entries into distinct pages.
585 the page itself could be released back to the free pool so that
586 it could be allocated to another purpose.
589 most allocations are short-lived, lasting only for the duration of
591 As new allocations would be made from the page sorted to
594 allow pages later in the list to be freed.
597 memory allocators remain in the current system.
599 That part of the system is expected to undergo major revision within
600 the next year or so, and it will probably be changed to use
604 the routine that manages the filesystem buffer pool memory
609 it manages the constant-sized buffer pool.
610 We plan to merge the filesystem buffer cache into the virtual memory system's
612 This change will allow the size of the buffer pool to be changed
613 according to memory load,
614 but will require a policy for balancing memory needs
619 we have made various versions of our allocator available to our test sites.
628 David Korn, Kiem-Phong Vo,
631 pp. 489-506, June 1985.
636 pp. 519-531, June 1985.
646 pp. 1931-1946, 1978.