<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html
          PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
          "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
   <meta name="AUTHOR" content="Stefan Olsson &lt;stefan@xapa.se&gt;" />
   <meta name="KEYWORDS" content="c++, libstdc++, g++, allocator, memory" />
   <meta name="DESCRIPTION" content="Allocators and allocation" />
   <meta name="GENERATOR" content="emacs and ten fingers" />
   <title>A fixed-size, multi-thread optimized allocator</title>
<link rel="StyleSheet" href="../lib3styles.css" type="text/css" />
<link rel="Start" href="../documentation.html" type="text/html"
  title="GNU C++ Standard Library" />
<link rel="Bookmark" href="howto.html" type="text/html" title="Extensions" />
<link rel="Copyright" href="../17_intro/license.html" type="text/html" />
</head>
<body>

<h1 class="centered"><a name="top">A fixed-size, multi-thread optimized allocator</a></h1>

<p class="fineprint"><em>
   The latest version of this document is always available at
   <a href="http://gcc.gnu.org/onlinedocs/libstdc++/ext/mt_allocator.html">
   http://gcc.gnu.org/onlinedocs/libstdc++/ext/mt_allocator.html</a>.
</em></p>

<p><em>
   To the <a href="http://gcc.gnu.org/libstdc++/">libstdc++-v3 homepage</a>.
</em></p>

<!-- ####################################################### -->
<hr />
<h3 class="left">
  <a name="intro">Introduction</a>
</h3>
<p> The mt allocator [hereinafter referred to simply as "the
allocator"] is a fixed-size (power-of-two) allocator that was
initially developed specifically to suit the needs of multi-threaded
applications [hereinafter referred to as MT applications]. Over time
the allocator has evolved and been improved in many ways; in
particular, it now also does a good job in single-threaded
applications [hereinafter referred to as ST applications]. (Note: in
this document, "single threaded" also includes applications that are
compiled with gcc without thread support enabled. This is
accomplished using ifdefs on __GTHREADS.) This allocator is tunable,
very flexible, and capable of high performance.
</p>

<p>
The aim of this document is to describe - from an application point of
view - the "inner workings" of the allocator.
</p>
57*404b540aSrobert
58*404b540aSrobert<h3 class="left">
59*404b540aSrobert  <a name="design">Design Overview</a>
60*404b540aSrobert</h3>
61*404b540aSrobert
62*404b540aSrobert<p> There are three general components to the allocator: a datum
63*404b540aSrobertdescribing the characteristics of the memory pool, a policy class
64*404b540aSrobertcontaining this pool that links instantiation types to common or
65*404b540aSrobertindividual pools, and a class inheriting from the policy class that is
66*404b540aSrobertthe actual allocator.
67*404b540aSrobert</p>
68*404b540aSrobert
69*404b540aSrobert<p>The datum describing pools characteristics is
70*404b540aSrobert <pre>
71*404b540aSrobert   template&lt;bool _Thread&gt;
72*404b540aSrobert     class __pool
73*404b540aSrobert </pre>
74*404b540aSrobertThis class is parametrized on thread support, and is explicitly
75*404b540aSrobertspecialized for both multiple threads (with <code>bool==true</code>)
76*404b540aSrobertand single threads (via <code>bool==false</code>.) It is possible to
77*404b540aSrobertuse a custom pool datum instead of the default class that is provided.
78*404b540aSrobert</p>
79*404b540aSrobert
80*404b540aSrobert<p> There are two distinct policy classes, each of which can be used
81*404b540aSrobertwith either type of underlying pool datum.
82*404b540aSrobert</p>
83*404b540aSrobert
84*404b540aSrobert<pre>
85*404b540aSrobert  template&lt;bool _Thread&gt;
86*404b540aSrobert    struct __common_pool_policy
87*404b540aSrobert
88*404b540aSrobert  template&lt;typename _Tp, bool _Thread&gt;
89*404b540aSrobert    struct __per_type_pool_policy
90*404b540aSrobert</pre>
91*404b540aSrobert
92*404b540aSrobert<p> The first policy, <code>__common_pool_policy</code>, implements a
93*404b540aSrobertcommon pool. This means that allocators that are instantiated with
94*404b540aSrobertdifferent types, say <code>char</code> and <code>long</code> will both
95*404b540aSrobertuse the same pool. This is the default policy.
96*404b540aSrobert</p>
97*404b540aSrobert
98*404b540aSrobert<p> The second policy, <code>__per_type_pool_policy</code>, implements
99*404b540aSroberta separate pool for each instantiating type. Thus, <code>char</code>
100*404b540aSrobertand <code>long</code> will use separate pools. This allows per-type
101*404b540aSroberttuning, for instance.
102*404b540aSrobert</p>
103*404b540aSrobert
104*404b540aSrobert<p> Putting this all together, the actual allocator class is
105*404b540aSrobert<pre>
106*404b540aSrobert  template&lt;typename _Tp, typename _Poolp = __default_policy&gt;
107*404b540aSrobert    class __mt_alloc : public __mt_alloc_base&lt;_Tp&gt;,  _Poolp
108*404b540aSrobert</pre>
109*404b540aSrobertThis class has the interface required for standard library allocator
110*404b540aSrobertclasses, namely member functions <code>allocate</code> and
111*404b540aSrobert<code>deallocate</code>, plus others.
112*404b540aSrobert</p>
113*404b540aSrobert
114*404b540aSrobert<h3 class="left">
115*404b540aSrobert  <a name="init">Tunable parameters</a>
116*404b540aSrobert</h3>
117*404b540aSrobert
118*404b540aSrobert<p>Certain allocation parameters can be modified, or tuned. There
119*404b540aSrobertexists a nested <pre>struct __pool_base::_Tune</pre> that contains all
120*404b540aSrobertthese parameters, which include settings for
121*404b540aSrobert</p>
122*404b540aSrobert   <ul>
123*404b540aSrobert     <li>Alignment </li>
124*404b540aSrobert     <li>Maximum bytes before calling <code>::operator new</code> directly</li>
125*404b540aSrobert     <li>Minimum bytes</li>
126*404b540aSrobert     <li>Size of underlying global allocations</li>
127*404b540aSrobert     <li>Maximum number of supported threads</li>
128*404b540aSrobert     <li>Migration of deallocations to the global free list</li>
129*404b540aSrobert     <li>Shunt for global <code>new</code> and <code>delete</code></li>
130*404b540aSrobert   </ul>
131*404b540aSrobert<p>Adjusting parameters for a given instance of an allocator can only
132*404b540aSroberthappen before any allocations take place, when the allocator itself is
133*404b540aSrobertinitialized. For instance:
134*404b540aSrobert</p>
<pre>
#include &lt;ext/mt_allocator.h&gt;

struct pod
{
  int i;
  int j;
};

int main()
{
  typedef pod value_type;
  typedef __gnu_cxx::__mt_alloc&lt;value_type&gt; allocator_type;
  typedef __gnu_cxx::__pool_base::_Tune tune_type;

  tune_type t_default;
  tune_type t_opt(16, 5120, 32, 5120, 20, 10, false);
  tune_type t_single(16, 5120, 32, 5120, 1, 10, false);

  tune_type t;
  t = allocator_type::_M_get_options();
  allocator_type::_M_set_options(t_opt);
  t = allocator_type::_M_get_options();

  allocator_type a;
  allocator_type::pointer p1 = a.allocate(128);
  allocator_type::pointer p2 = a.allocate(5128);

  a.deallocate(p1, 128);
  a.deallocate(p2, 5128);

  return 0;
}
</pre>
169*404b540aSrobert
170*404b540aSrobert<h3 class="left">
171*404b540aSrobert  <a name="init">Initialization</a>
172*404b540aSrobert</h3>
173*404b540aSrobert
174*404b540aSrobert<p>
175*404b540aSrobertThe static variables (pointers to freelists, tuning parameters etc)
176*404b540aSrobertare initialized as above, or are set to the global defaults.
177*404b540aSrobert</p>
178*404b540aSrobert
179*404b540aSrobert<p>
180*404b540aSrobertThe very first allocate() call will always call the
181*404b540aSrobert_S_initialize_once() function.  In order to make sure that this
182*404b540aSrobertfunction is called exactly once we make use of a __gthread_once call
183*404b540aSrobertin MT applications and check a static bool (_S_init) in ST
184*404b540aSrobertapplications.
185*404b540aSrobert</p>
186*404b540aSrobert
187*404b540aSrobert<p>
188*404b540aSrobertThe _S_initialize() function:
189*404b540aSrobert- If the GLIBCXX_FORCE_NEW environment variable is set, it sets the bool
190*404b540aSrobert  _S_force_new to true and then returns. This will cause subsequent calls to
191*404b540aSrobert  allocate() to return memory directly from a new() call, and deallocate will
192*404b540aSrobert  only do a delete() call.
193*404b540aSrobert</p>
194*404b540aSrobert
<p>
If the GLIBCXX_FORCE_NEW environment variable is not set, both ST and MT
applications will:
</p>
<ul>
  <li>Calculate the number of bins needed. A bin holds blocks of a
    specific power-of-two size in bytes. I.e., by default the allocator
    will deal with requests of up to 128 bytes (or whatever the value of
    _S_max_bytes is when _S_init() is called), which means there will be
    bins of the following sizes (in bytes): 1, 2, 4, 8, 16, 32, 64,
    128.</li>

  <li>Create the _S_binmap array. All requests are rounded up to the next
    "large enough" bin; i.e., a request for 29 bytes will cause a block
    from the "32 byte bin" to be returned to the application. The purpose
    of _S_binmap is to speed up the process of finding out which bin to
    use; i.e., the value of _S_binmap[ 29 ] is initialized to 5
    (bin 5 = 32 bytes).</li>
</ul>
<ul>
  <li>Create the _S_bin array. This array consists of bin_records; there
    will be as many bin_records in this array as the number of bins
    calculated earlier. I.e., if _S_max_bytes = 128 there will be 8
    entries. Each bin_record is then initialized:
    <ul>
      <li>bin_record-&gt;first: an array of pointers to block_records.
        There are as many block_record pointers as the maximum number of
        threads (in a ST application there is only 1 thread; in an MT
        application there are _S_max_threads). Each entry holds the
        pointer to the first free block for that thread in this bin.
        I.e., to find the first free block of size 32 for thread number
        3, we look up _S_bin[ 5 ].first[ 3 ].</li>
    </ul>
    The block_record pointers created above are then initialized to
    their initial values, i.e. _S_bin[ n ].first[ n ] = NULL;
  </li>
</ul>

<p>
Additionally, an MT application will create a list of free thread ids.
The pointer to the first entry is stored in _S_thread_freelist_first.
The reason for this approach is that the __gthread_self() call will
not return a value that corresponds to the maximum number of threads
allowed, but rather a process id number or something else. So we
create a list of thread_records; this list is _S_max_threads long and
each entry holds a size_t thread_id, initialized to 1, 2, 3, 4, 5 and
so on up to _S_max_threads. Each time a thread calls allocate() or
deallocate() we call _S_get_thread_id(), which looks at the value of
_S_thread_key, a thread local storage pointer. If this is NULL we know
that this is a newly created thread; we pop the first entry from the
list and save the pointer to this record in the _S_thread_key
variable. On subsequent calls we get the pointer to the thread_record
back and use thread_record-&gt;thread_id as identification. I.e., the
first thread that calls allocate() will get the first record in this
list and thus be thread number 1, and will then find the pointer to
its first free 32-byte block in _S_bin[ 5 ].first[ 1 ].
</p>
<p>
When we create _S_thread_key we also define a destructor
(_S_thread_key_destr), so that when a thread dies its thread_record is
returned to the front of this list and the thread id can be reused if
a new thread is created. The list is protected by a mutex
(_S_thread_freelist_mutex), which is locked only when records are
added to or removed from the list.
</p>
<p>
The free and used counters of each bin_record are also initialized:
</p>
<ul>
  <li>bin_record-&gt;free: an array of size_t that keeps track of the
    number of blocks on a specific thread's freelist in each bin. I.e.,
    if a thread has 12 32-byte blocks on its freelist and allocates one
    of these, this counter is decreased to 11.</li>

  <li>bin_record-&gt;used: an array of size_t that keeps track of the
    number of blocks of this size currently in use by this thread. I.e.,
    if a thread has made 678 requests (and no deallocations...) for
    32-byte blocks, this counter reads 678.</li>
</ul>
<p>
The arrays created above are then initialized to their initial values,
i.e. _S_bin[ n ].free[ n ] = 0;
</p>
<p>
Finally, the mutex of each bin_record is initialized. The
bin_record-&gt;mutex is used to protect the global freelist. This
concept of a global freelist is explained in more detail in the
section "A multi threaded example", but basically this mutex is locked
whenever a block of memory is retrieved from or returned to the global
freelist for this specific bin. This occurs only when a number of
blocks are grabbed from the global list onto a thread-specific list,
or when a thread decides to return some blocks to the global freelist.
</p>

<p> Notes about deallocation. This allocator does not explicitly
release memory. Because of this, memory debugging programs like
valgrind or purify may notice leaks: sorry about this
inconvenience. Operating systems will reclaim allocated memory at
program termination anyway. If sidestepping this kind of noise is
desired, there are three options: use an allocator, like
<code>new_allocator</code>, that releases memory while debugging; use
GLIBCXX_FORCE_NEW to bypass the allocator's internal pools; or use a
custom pool datum that releases resources on destruction.</p>

<p>On systems with the function <code>__cxa_atexit</code>, the
allocator can be forced to free all memory allocated before program
termination with the member function
<code>__pool_type::_M_destroy</code>. However, because this member
function relies on the precise and exactly-conforming ordering of
static destructors, including those of a static local
<code>__pool</code> object, it should not be used, ever, on systems
that don't have the necessary underlying support. In addition, in
practice, forcing deallocation can be tricky, as it requires the
<code>__pool</code> object to be fully constructed before the object
that uses it is fully constructed. For most (but not all) STL
containers, this works, as an instance of the allocator is constructed
as part of a container's constructor. However, this assumption is
implementation-specific, and subject to change. For an example of a
pool that frees memory, see the following
    <a href="http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libstdc%2b%2b-v3/testsuite/ext/mt_allocator/deallocate_local-6.cc">
    example.</a>
</p>
307*404b540aSrobert
308*404b540aSrobert<h3 class="left">
309*404b540aSrobert  <a name="st_example">A single threaded example (and a primer for the multi threaded example!)</a>
310*404b540aSrobert</h3>
311*404b540aSrobert
312*404b540aSrobert<p>
313*404b540aSrobertLet's start by describing how the data on a freelist is laid out in memory.
314*404b540aSrobertThis is the first two blocks in freelist for thread id 3 in bin 3 (8 bytes):
315*404b540aSrobert</p>
<pre>
+----------------+
| next* ---------|--+  (_S_bin[ 3 ].first[ 3 ] points here)
|                |  |
|                |  |
|                |  |
+----------------+  |
| thread_id = 3  |  |
|                |  |
|                |  |
|                |  |
+----------------+  |
| DATA           |  |  (A pointer to here is what is returned to
|                |  |   the application when needed)
|                |  |
|                |  |
|                |  |
|                |  |
|                |  |
|                |  |
+----------------+  |
+----------------+  |
| next*          |&lt;-+  (If next == NULL it's the last one on the list)
|                |
|                |
|                |
+----------------+
| thread_id = 3  |
|                |
|                |
|                |
+----------------+
| DATA           |
|                |
|                |
|                |
|                |
|                |
|                |
|                |
+----------------+
</pre>

<p>
With this in mind we simplify things a bit for a while and say that there is
only one thread (a ST application). In this case all operations are made to
what is referred to as the global pool - thread id 0. (No thread may be
assigned this id, since thread ids span from 1 to _S_max_threads in an MT
application.)
</p>
<p>
When the application requests memory (calling allocate()) we first look at
the requested size; if this is &gt; _S_max_bytes we call
<code>::operator new</code> directly and return.
</p>
<p>
If the requested size is within limits we start by finding out from which
bin we should serve this request by looking in _S_binmap.
</p>
<p>
A quick look at _S_bin[ bin ].first[ 0 ] tells us if there are any blocks of
this size on the freelist (0). If this is not NULL - fine, just remove the
block that _S_bin[ bin ].first[ 0 ] points to from the list,
update _S_bin[ bin ].first[ 0 ] and return a pointer to that block's data.
</p>
<p>
If the freelist is empty (the pointer is NULL) we must get memory from the
system and build a freelist within this memory. All requests for new memory
are made in chunks of _S_chunk_size bytes. Knowing the size of a
block_record and the number of bytes that this bin stores, we calculate how
many blocks we can create within this chunk, build the list, remove the
first block, update the pointer (_S_bin[ bin ].first[ 0 ]) and return a
pointer to that block's data.
</p>

<p>
Deallocation is equally simple; the pointer is cast back to a block_record
pointer, we look up which bin to use based on the size, add the block to the
front of the global freelist and update the pointer as needed
(_S_bin[ bin ].first[ 0 ]).
</p>

<p>
The decision to add deallocated blocks to the front of the freelist was made
after a set of performance measurements showed that this is roughly 10%
faster than also maintaining a set of "last" pointers.
</p>

<h3 class="left">
  <a name="mt_example">A multi threaded example</a>
</h3>

<p>
In the ST example we never used the thread_id variable present in each block.
Let's start by explaining the purpose of this in an MT application.
</p>

<p>
The concept of "ownership" was introduced because many MT applications
allocate and deallocate memory to shared containers from different
threads (such as a cache shared amongst all threads). This introduces
a problem if the allocator only returns memory to the current thread's
freelist: e.g., there might be one thread doing all the allocation,
and thus obtaining ever more memory from the system, and another
thread that is getting a longer and longer freelist - this would in
the end consume all available memory.
</p>

<p>
The thread id of a block is set to the current thread's id each time
the block is moved from the global list (where ownership is
irrelevant) to a thread's freelist, each time a new freelist is built
from a chunk directly onto a thread's freelist, and each time a
deallocation occurs on a block that was not allocated by the same
thread id as the one doing the deallocation.
</p>

<p>
What's the use? Well, when a deallocation occurs we can now look at
the thread id and find out if the block was allocated by another
thread, and if so decrease the used counter of that thread instead,
thus keeping the free and used counters correct. Keeping these
counters correct is very important, since the relationship between
these two variables decides whether memory should be returned to the
global pool when a deallocation occurs.
</p>

<p>
When the application requests memory (calling allocate()) we first
look at the requested size; if this is &gt; _S_max_bytes we call
<code>::operator new</code> directly and return.
</p>

<p>
If the requested size is within limits we start by finding out from which
bin we should serve this request by looking in _S_binmap.
</p>

<p>
A call to _S_get_thread_id() returns the thread id for the calling thread
(and if no value has been set in _S_thread_key, a new id is assigned and
returned).
</p>

<p>
A quick look at _S_bin[ bin ].first[ thread_id ] tells us if there are
any blocks of this size on the current thread's freelist. If this is
not NULL - fine, just remove the block that _S_bin[ bin ].first[
thread_id ] points to from the list, update _S_bin[ bin ].first[
thread_id ], update the free and used counters and return a pointer to
that block's data.
</p>

<p>
If the freelist is empty (the pointer is NULL) we start by looking at
the global freelist (0). If there are blocks available on the global
freelist we lock this bin's mutex and move up to block_count blocks
(the number of blocks of this bin's size that will fit into one
_S_chunk_size chunk) - or until the end of the list, whichever comes
first - to the current thread's freelist, changing the thread_id
ownership and updating the counters and pointers at the same time.
Once the bin's mutex has been unlocked, we remove the block that
_S_bin[ bin ].first[ thread_id ] points to from the list, update
_S_bin[ bin ].first[ thread_id ], update the free and used counters,
and return a pointer to that block's data.
</p>

<p>
The reason that the number of blocks moved to the current thread's
freelist is limited to block_count is to minimize the chance that a
subsequent deallocate() call will return the excess blocks to the
global freelist (based on the _S_freelist_headroom calculation, see
below).
</p>

<p>
However, if there isn't any memory in the global pool we need to get
memory from the system. This is done in exactly the same way as in a
single threaded application, with one major difference: the list built
in the newly allocated memory (of _S_chunk_size bytes) is added to the
current thread's freelist instead of to the global one.
</p>

<p>
The basic process of a deallocation call is simple: always add the
block to the front of the current thread's freelist and update the
counters and pointers (as described earlier, with the specific check
of ownership that causes the used counter of the thread that
originally allocated the block to be decreased instead of the current
thread's counter).
</p>

<p>
This is where the free and used counters come into play. Each time a
deallocate() call is made, the length of the current thread's freelist
is compared to the amount of memory in use by this thread.
</p>

<p>
Let's go back to the example of an application that has one thread
that does all the allocations and one that deallocates. Say both of
these threads use 516 32-byte blocks that were allocated during thread
creation. Their used counters will both say 516 at this point. The
allocation thread now grabs 1000 32-byte blocks and puts them in a
shared container. The used counter for this thread is now 1516.
</p>

<p>
The deallocation thread now deallocates 500 of these blocks. For each
deallocation made, the used counter of the allocating thread is
decreased and the freelist of the deallocation thread gets longer and
longer. But the calculation made in deallocate() limits the length of
the freelist in the deallocation thread to _S_freelist_headroom % of
its used counter. In this case, once the freelist exceeds roughly 52
(516/10) blocks (given that _S_freelist_headroom is at its default
value of 10%), blocks will be returned to the global pool, where the
allocating thread may pick them up and reuse them.
</p>

<p>
In order to reduce lock contention (since this operation requires the
bin's mutex to be locked), blocks are returned to the global freelist
in chunks as well (just like when chunks of blocks are moved from the
global freelist to a thread's freelist, as mentioned above). The
"formula" used could probably be improved to further reduce the risk
of blocks being "bounced back and forth" between freelists.
</p>

<hr />
<p>Return <a href="#top">to the top of the page</a> or
   <a href="http://gcc.gnu.org/libstdc++/">to the libstdc++ homepage</a>.
</p>


<!-- ####################################################### -->

<hr />
<p class="fineprint"><em>
See <a href="../17_intro/license.html">license.html</a> for copying conditions.
Comments and suggestions are welcome, and may be sent to
<a href="mailto:libstdc++@gcc.gnu.org">the libstdc++ mailing list</a>.
</em></p>


</body>
</html>