<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html
          PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
          "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
   <meta name="AUTHOR" content="Stefan Olsson &lt;stefan@xapa.se&gt;" />
   <meta name="KEYWORDS" content="c++, libstdc++, g++, allocator, memory" />
   <meta name="DESCRIPTION" content="Allocators and allocation" />
   <meta name="GENERATOR" content="emacs and ten fingers" />
   <title>A fixed-size, multi-thread optimized allocator</title>
<link rel="StyleSheet" href="../lib3styles.css" type="text/css" />
<link rel="Start" href="../documentation.html" type="text/html"
  title="GNU C++ Standard Library" />
<link rel="Bookmark" href="howto.html" type="text/html" title="Extensions" />
<link rel="Copyright" href="../17_intro/license.html" type="text/html" />
</head>
<body>

<h1 class="centered"><a name="top">A fixed-size, multi-thread optimized allocator</a></h1>

<p class="fineprint"><em>
   The latest version of this document is always available at
   <a href="http://gcc.gnu.org/onlinedocs/libstdc++/ext/mt_allocator.html">
   http://gcc.gnu.org/onlinedocs/libstdc++/ext/mt_allocator.html</a>.
</em></p>

<p><em>
   To the <a href="http://gcc.gnu.org/libstdc++/">libstdc++-v3 homepage</a>.
</em></p>

<!-- ####################################################### -->
<hr />
<h3 class="left">
  <a name="intro">Introduction</a>
</h3>

<p> The mt allocator [hereinafter referred to simply as "the
allocator"] is a fixed-size (power of two) allocator that was
initially developed specifically to suit the needs of multi-threaded
applications [hereinafter referred to as MT applications]. Over time
the allocator has evolved and been improved in many ways; in
particular, it now also does a good job in single-threaded
applications [hereinafter referred to as ST applications]. (Note: in
this document, single-threaded applications also include applications
that are compiled with gcc without thread support enabled. This is
accomplished using <code>#ifdef</code>s on <code>__GTHREADS</code>.)
This allocator is tunable, very flexible, and capable of high
performance.
</p>

<p>
The aim of this document is to describe - from an application point of
view - the "inner workings" of the allocator.
</p>

<h3 class="left">
  <a name="design">Design Overview</a>
</h3>

<p> There are three general components to the allocator: a datum
describing the characteristics of the memory pool, a policy class
containing this pool that links instantiation types to common or
individual pools, and a class inheriting from the policy class that is
the actual allocator.
</p>

<p>The datum describing pool characteristics is
 <pre>
  template&lt;bool _Thread&gt;
    class __pool
 </pre>
This class is parameterized on thread support, and is explicitly
specialized for both multiple threads (with <code>bool==true</code>)
and single threads (via <code>bool==false</code>). It is possible to
use a custom pool datum instead of the default class that is provided.
</p>

<p> There are two distinct policy classes, each of which can be used
with either type of underlying pool datum.
</p>

<pre>
  template&lt;bool _Thread&gt;
    struct __common_pool_policy

  template&lt;typename _Tp, bool _Thread&gt;
    struct __per_type_pool_policy
</pre>

<p> The first policy, <code>__common_pool_policy</code>, implements a
common pool. This means that allocators that are instantiated with
different types, say <code>char</code> and <code>long</code>, will both
use the same pool. This is the default policy.
</p>

<p> The second policy, <code>__per_type_pool_policy</code>, implements
a separate pool for each instantiating type. Thus, <code>char</code>
and <code>long</code> will use separate pools. This allows per-type
tuning, for instance.
</p>

<p> Putting this all together, the actual allocator class is
<pre>
  template&lt;typename _Tp, typename _Poolp = __default_policy&gt;
    class __mt_alloc : public __mt_alloc_base&lt;_Tp&gt;, _Poolp
</pre>
This class has the interface required for standard library allocator
classes, namely member functions <code>allocate</code> and
<code>deallocate</code>, plus others.
</p>

<h3 class="left">
  <a name="init">Tunable parameters</a>
</h3>

<p>Certain allocation parameters can be modified, or tuned.
There
exists a nested <code>struct __pool_base::_Tune</code> that contains all
of these parameters, which include settings for
</p>
  <ul>
   <li>Alignment</li>
   <li>Maximum bytes before calling <code>::operator new</code> directly</li>
   <li>Minimum bytes</li>
   <li>Size of underlying global allocations</li>
   <li>Maximum number of supported threads</li>
   <li>Migration of deallocations to the global free list</li>
   <li>Shunt for global <code>new</code> and <code>delete</code></li>
 </ul>
<p>Adjusting parameters for a given instance of an allocator can only
happen before any allocations take place, when the allocator itself is
initialized. For instance:
</p>
<pre>
#include &lt;ext/mt_allocator.h&gt;

struct pod
{
  int i;
  int j;
};

int main()
{
  typedef pod value_type;
  typedef __gnu_cxx::__mt_alloc&lt;value_type&gt; allocator_type;
  typedef __gnu_cxx::__pool_base::_Tune tune_type;

  tune_type t_default;
  tune_type t_opt(16, 5120, 32, 5120, 20, 10, false);
  tune_type t_single(16, 5120, 32, 5120, 1, 10, false);

  tune_type t;
  t = allocator_type::_M_get_options();
  allocator_type::_M_set_options(t_opt);
  t = allocator_type::_M_get_options();

  allocator_type a;
  allocator_type::pointer p1 =
    a.allocate(128);
  allocator_type::pointer p2 = a.allocate(5128);

  a.deallocate(p1, 128);
  a.deallocate(p2, 5128);

  return 0;
}
</pre>

<h3 class="left">
  <a name="init">Initialization</a>
</h3>

<p>
The static variables (pointers to freelists, tuning parameters, etc.)
are initialized as above, or are set to the global defaults.
</p>

<p>
The very first allocate() call will always call the
_S_initialize_once() function. In order to make sure that this
function is called exactly once, we make use of a __gthread_once call
in MT applications and check a static bool (_S_init) in ST
applications.
</p>

<p>
The _S_initialize() function:
- If the GLIBCXX_FORCE_NEW environment variable is set, it sets the bool
  _S_force_new to true and then returns. This will cause subsequent calls to
  allocate() to return memory directly from a new() call, and deallocate() will
  only do a delete() call.
</p>

<p>
- If the GLIBCXX_FORCE_NEW environment variable is not set, both ST and MT
  applications will:
  - Calculate the number of bins needed. A bin is a specific power-of-two size
    of bytes.
I.e., by default the allocator will deal with requests of up to
    128 bytes (or whatever the value of _S_max_bytes is when _S_init() is
    called). This means that there will be bins of the following sizes
    (in bytes): 1, 2, 4, 8, 16, 32, 64, 128.

  - Create the _S_binmap array. All requests are rounded up to the next
    "large enough" bin. I.e., a request for 29 bytes will cause a block from
    the "32 byte bin" to be returned to the application. The purpose of
    _S_binmap is to speed up the process of finding out which bin to use.
    I.e., the value of _S_binmap[ 29 ] is initialized to 5 (bin 5 = 32 bytes).
</p>
<p>
  - Create the _S_bin array. This array consists of bin_records. There will be
    as many bin_records in this array as the number of bins that we calculated
    earlier. I.e., if _S_max_bytes = 128 there will be 8 entries.
    Each bin_record is then initialized:
    - bin_record->first = An array of pointers to block_records. There will be
      as many block_record pointers as the maximum number of threads
      (in a ST application there is only 1 thread, in a MT application there
      are _S_max_threads).
      This holds the pointer to the first free block for each thread in this
      bin. I.e., if we would like to know where the first free block of size 32
      for thread number 3 is, we would look this up as: _S_bin[ 5 ].first[ 3 ]

      The block_record pointers created above are now initialized to
      their initial values, i.e.
_S_bin[ n ].first[ n ] = NULL;
</p>

<p>
- Additionally a MT application will:
  - Create a list of free thread ids. The pointer to the first entry
    is stored in _S_thread_freelist_first. The reason for this approach is
    that the __gthread_self() call will not return a value that corresponds to
    the maximum number of threads allowed but rather a process id number or
    something else. So what we do is create a list of thread_records.
    This list is _S_max_threads long and each entry holds a size_t thread_id
    which is initialized to 1, 2, 3, 4, 5 and so on up to _S_max_threads.
    Each time a thread calls allocate() or deallocate() we call
    _S_get_thread_id(), which looks at the value of _S_thread_key, which is a
    thread local storage pointer. If this is NULL we know that this is a newly
    created thread and we pop the first entry from this list and save the
    pointer to this record in the _S_thread_key variable. The next time
    we will get the pointer to the thread_record back and we use the
    thread_record->thread_id as identification. I.e., the first thread that
    calls allocate will get the first record in this list and thus be thread
    number 1; it will then find the pointer to its first free 32-byte block
    in _S_bin[ 5 ].first[ 1 ].
    When we create the _S_thread_key we also define a destructor
    (_S_thread_key_destr), which means that when the thread dies, this
    thread_record is returned to the front of this list and the thread id
    can then be reused if a new thread is created.
This list is protected by a mutex (_S_thread_freelist_mutex), which is only
    locked when records are removed from or added to the list.
</p>
<p>
  - Initialize the free and used counters of each bin_record:
    - bin_record->free = An array of size_t. This keeps track of the number
      of blocks on a specific thread's freelist in each bin. I.e., if a thread
      has 12 32-byte blocks on its freelist and allocates one of these, this
      counter would be decreased to 11.

    - bin_record->used = An array of size_t. This keeps track of the number
      of blocks currently in use of this size by this thread. I.e., if a thread
      has made 678 requests (and no deallocations...) of 32-byte blocks this
      counter will read 678.

    The arrays created above are now initialized with their initial values.
    I.e. _S_bin[ n ].free[ n ] = 0;
</p>
<p>
  - Initialize the mutex of each bin_record: The bin_record->mutex
    is used to protect the global freelist. This concept of a global
    freelist is explained in more detail in the section "A multi
    threaded example", but basically this mutex is locked whenever a
    block of memory is retrieved from or returned to the global freelist
    for this specific bin. This only occurs when a number of blocks
    are grabbed from the global list to a thread specific list or when
    a thread decides to return some blocks to the global freelist.
</p>

<p> Notes about deallocation.
This allocator does not explicitly
release memory. Because of this, memory debugging programs like
valgrind or purify may notice leaks: sorry about this
inconvenience. Operating systems will reclaim allocated memory at
program termination anyway. If sidestepping this kind of noise is
desired, there are three options: use an allocator, like
<code>new_allocator</code>, that releases memory while debugging; use
GLIBCXX_FORCE_NEW to bypass the allocator's internal pools; or use a
custom pool datum that releases resources on destruction.</p>

<p>On systems with the function <code>__cxa_atexit</code>, the
allocator can be forced to free all memory allocated before program
termination with the member function
<code>__pool_type::_M_destroy</code>. However, because this member
function relies on the precise and exactly-conforming ordering of
static destructors, including those of a static local
<code>__pool</code> object, it should not be used, ever, on systems
that don't have the necessary underlying support. In addition, in
practice, forcing deallocation can be tricky, as it requires the
<code>__pool</code> object to be fully constructed before the object
that uses it is fully constructed. For most (but not all) STL
containers, this works, as an instance of the allocator is constructed
as part of a container's constructor. However, this assumption is
implementation-specific, and subject to change.
For an example of a
pool that frees memory, see the following
  <a href="http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libstdc%2b%2b-v3/testsuite/ext/mt_allocator/deallocate_local-6.cc">
  example.</a>
</p>

<h3 class="left">
  <a name="st_example">A single threaded example (and a primer for the multi threaded example!)</a>
</h3>

<p>
Let's start by describing how the data on a freelist is laid out in memory.
These are the first two blocks in the freelist for thread id 3 in bin 3 (8 bytes):
</p>
<pre>
+----------------+
| next* ---------|--+  (_S_bin[ 3 ].first[ 3 ] points here)
|                |  |
|                |  |
|                |  |
+----------------+  |
| thread_id = 3  |  |
|                |  |
|                |  |
|                |  |
+----------------+  |
| DATA           |  |  (A pointer to here is what is returned to the
|                |  |   application when needed)
|                |  |
|                |  |
|                |  |
|                |  |
|                |  |
|                |  |
+----------------+  |
+----------------+  |
| next*          |&lt;-+  (If next == NULL it's the last one on the list)
|                |
|                |
|                |
+----------------+
| thread_id = 3  |
|                |
|                |
|                |
+----------------+
| DATA           |
|                |
|                |
|                |
|                |
|                |
|                |
|                |
+----------------+
</pre>

<p>
With this in mind we simplify things a bit for a while and say that there is
only one thread (a ST application). In this case all operations are made to
what is referred to as the global pool - thread id 0 (no thread may be
assigned this id since thread ids span from 1 to _S_max_threads in a MT
application).
</p>
<p>
When the application requests memory (calling allocate()) we first look at the
requested size and if this is &gt; _S_max_bytes we call new() directly and return.
</p>
<p>
If the requested size is within limits we start by finding out from which
bin we should serve this request by looking in _S_binmap.
</p>
<p>
A quick look at _S_bin[ bin ].first[ 0 ] tells us if there are any blocks of
this size on the freelist (0). If this is not NULL - fine, just remove the
block that _S_bin[ bin ].first[ 0 ] points to from the list,
update _S_bin[ bin ].first[ 0 ] and return a pointer to that block's data.
</p>
<p>
If the freelist is empty (the pointer is NULL) we must get memory from the
system and build a freelist within this memory. All requests for new memory
are made in chunks of _S_chunk_size. Knowing the size of a block_record and
the number of bytes that this bin stores, we then calculate how many blocks we
can create within this chunk, build the list, remove the first block, update
the pointer (_S_bin[ bin ].first[ 0 ]) and return a pointer to that block's
data.
</p>

<p>
Deallocation is equally simple; the pointer is cast back to a block_record
pointer, we look up which bin to use based on the size, add the block to the
front of the global freelist and update the pointer as needed
(_S_bin[ bin ].first[ 0 ]).
</p>

<p>
The decision to add deallocated blocks to the front of the freelist was made
after a set of performance measurements that showed that this is roughly 10%
faster than maintaining a set of "last pointers" as well.
</p>

<h3 class="left">
  <a name="mt_example">A multi threaded example</a>
</h3>

<p>
In the ST example we never used the thread_id variable present in each block.
Let's start by explaining the purpose of this in a MT application.
</p>

<p>
The concept of "ownership" was introduced since many MT applications
allocate and deallocate memory to shared containers from different
threads (such as a cache shared amongst all threads). This introduces
a problem if the allocator only returns memory to the current thread's
freelist (i.e., there might be one thread doing all the allocation and
thus obtaining ever more memory from the system and another thread
that is getting a longer and longer freelist - this will in the end
consume all available memory).
</p>

<p>
Each time a block is moved from the global list (where ownership is
irrelevant) to a thread's freelist (or when a new freelist is built
from a chunk directly onto a thread's freelist, or when a deallocation
occurs on a block which was not allocated by the same thread id as the
one doing the deallocation) the thread id is set to the current one.
</p>

<p>
What's the use? Well, when a deallocation occurs we can now look at
the thread id and find out if it was allocated by another thread id
and decrease the used counter of that thread instead, thus keeping the
free and used counters correct. And keeping the free and used counters
correct is very important, since the relationship between these two
variables decides if memory should be returned to the global pool or
not when a deallocation occurs.
</p>

<p>
When the application requests memory (calling allocate()) we first
look at the requested size and if this is &gt; _S_max_bytes we call new()
directly and return.
</p>

<p>
If the requested size is within limits we start by finding out from which
bin we should serve this request by looking in _S_binmap.
</p>

<p>
A call to _S_get_thread_id() returns the thread id for the calling thread
(and if no value has been set in _S_thread_key, a new id is assigned and
returned).
</p>

<p>
A quick look at _S_bin[ bin ].first[ thread_id ] tells us if there are
any blocks of this size on the current thread's freelist. If this is
not NULL - fine, just remove the block that _S_bin[ bin ].first[
thread_id ] points to from the list, update _S_bin[ bin ].first[
thread_id ], update the free and used counters and return a pointer to
that block's data.
</p>

<p>
If the freelist is empty (the pointer is NULL) we start by looking at
the global freelist (0). If there are blocks available on the global
freelist we lock this bin's mutex and move up to block_count (the
number of blocks of this bin's size that will fit into a _S_chunk_size)
or until end of list - whichever comes first - to the current thread's
freelist and at the same time change the thread_id ownership and
update the counters and pointers. When the bin's mutex has been
unlocked, we remove the block that _S_bin[ bin ].first[ thread_id ]
points to from the list, update _S_bin[ bin ].first[ thread_id ],
update the free and used counters, and return a pointer to that block's
data.
</p>

<p>
The reason that the number of blocks moved to the current thread's
freelist is limited to block_count is to minimize the chance that a
subsequent deallocate() call will return the excess blocks to the
global freelist (based on the _S_freelist_headroom calculation, see
below).
</p>

<p>
However if there isn't any memory in the global pool we need to get
memory from the system - this is done in exactly the same way as in a
single threaded application with one major difference: the list built
in the newly allocated memory (of _S_chunk_size size) is added to the
current thread's freelist instead of to the global one.
</p>

<p>
The basic process of a deallocation call is simple: always add the
block to the front of the current thread's freelist and update the
counters and pointers (as described earlier, with the specific check of
ownership that causes the used counter of the thread that originally
allocated the block to be decreased instead of the current thread's
counter).
</p>

<p>
And here the free and used counters come into service. Each time a
deallocate() call is made, the length of the current thread's
freelist is compared to the amount of memory in use by this thread.
</p>

<p>
Let's go back to the example of an application that has one thread
that does all the allocations and one that deallocates. Both these
threads use, say, 516 32-byte blocks that were allocated during thread
creation, for example. Their used counters will both say 516 at this
point. The allocation thread now grabs 1000 32-byte blocks and puts
them in a shared container. The used counter for this thread is now
1516.
</p>

<p>
The deallocation thread now deallocates 500 of these blocks. For each
deallocation made, the used counter of the allocating thread is
decreased and the freelist of the deallocation thread gets longer and
longer. But the calculation made in deallocate() will limit the length
of the freelist in the deallocation thread to _S_freelist_headroom %
of its used counter. In this case, when the freelist (given that
_S_freelist_headroom is at its default value of 10%) grows beyond 52
blocks (roughly 516/10), blocks will be returned to the global pool
where the allocating thread may pick them up and reuse them.
</p>

<p>
In order to reduce lock contention (since this requires this bin's
mutex to be locked) this operation is also made in chunks of blocks
(just like when chunks of blocks are moved from the global freelist to
a thread's freelist, as mentioned above). The "formula" used can
probably be improved to further reduce the risk of blocks being
"bounced back and forth" between freelists.
</p>

<hr />
<p>Return <a href="#top">to the top of the page</a> or
   <a href="http://gcc.gnu.org/libstdc++/">to the libstdc++ homepage</a>.
</p>


<!-- ####################################################### -->

<hr />
<p class="fineprint"><em>
See <a href="../17_intro/license.html">license.html</a> for copying conditions.
Comments and suggestions are welcome, and may be sent to
<a href="mailto:libstdc++@gcc.gnu.org">the libstdc++ mailing list</a>.
</em></p>


</body>
</html>