xref: /openbsd-src/gnu/gcc/libstdc++-v3/docs/html/ext/pb_ds/motivation.html (revision 404b540a9034ac75a6199ad1a32d1bbc7a0d4210)
1*404b540aSrobert<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
2*404b540aSrobert    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
3*404b540aSrobert
4*404b540aSrobert<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
5*404b540aSrobert<head>
6*404b540aSrobert  <meta name="generator" content=
7*404b540aSrobert  "HTML Tidy for Linux/x86 (vers 12 April 2005), see www.w3.org" />
8*404b540aSrobert
9*404b540aSrobert  <title>Motivation</title>
10*404b540aSrobert  <meta http-equiv="Content-Type" content=
11*404b540aSrobert  "text/html; charset=us-ascii" />
12*404b540aSrobert  </head>
13*404b540aSrobert
14*404b540aSrobert<body>
15*404b540aSrobert  <div id="page">
16*404b540aSrobert    <h1>Motivation</h1>
17*404b540aSrobert
18*404b540aSrobert    <p>Many fine associative-container libraries were already
19*404b540aSrobert    written, most notably, the STL's associative containers. Why
20*404b540aSrobert    then write another library? This section shows some possible
21*404b540aSrobert    advantages of this library, when considering the challenges in
22*404b540aSrobert    <a href="introduction.html">Introduction</a>. Many of these
23*404b540aSrobert    points stem from the fact that the STL introduced
24*404b540aSrobert    associative-containers in a two-step process (first
25*404b540aSrobert    standardizing tree-based containers, only then adding
26*404b540aSrobert    hash-based containers, which are fundamentally different), did
27*404b540aSrobert    not standardize priority queues as containers, and (in our
28*404b540aSrobert    opinion) overloads the iterator concept.</p>
29*404b540aSrobert
30*404b540aSrobert    <h2><a name="assoc" id="assoc">Associative Containers</a></h2>
31*404b540aSrobert
32*404b540aSrobert    <h3><a name="assoc_policies" id="assoc_policies">More
33*404b540aSrobert    Configuration Choices</a></h3>
34*404b540aSrobert
35*404b540aSrobert    <p>Associative containers require a relatively large number of
36*404b540aSrobert    policies to function efficiently in various settings. In some
37*404b540aSrobert    cases this is needed for making their common operations more
38*404b540aSrobert    efficient, and in other cases this allows them to support a
39*404b540aSrobert    larger set of operations</p>
40*404b540aSrobert
41*404b540aSrobert    <ol>
42*404b540aSrobert      <li>Hash-based containers, for example, support look-up and
43*404b540aSrobert      insertion methods (<i>e.g.</i>, <tt>find</tt> and
44*404b540aSrobert      <tt>insert</tt>). In order to locate elements quickly, they
45*404b540aSrobert      are supplied a hash functor, which instruct how to transform
46*404b540aSrobert      a key object into some size type; <i>e.g.</i>, a hash functor
47*404b540aSrobert      might transform <tt>"hello"</tt> into <tt>1123002298</tt>. A
48*404b540aSrobert      hash table, though, requires transforming each key object
49*404b540aSrobert      into some size-type type in some specific domain;
50*404b540aSrobert      <i>e.g.</i>, a hash table with a 128-long table might
51*404b540aSrobert      transform <tt>"hello"</tt> into position 63. The policy by
52*404b540aSrobert      which the hash value is transformed into a position within
53*404b540aSrobert      the table can dramatically affect performance (see <a href=
54*404b540aSrobert      "hash_based_containers.html#hash_policies">Design::Associative
55*404b540aSrobert      Containers::Hash-Based Containers::Hash Policies</a>).
56*404b540aSrobert      Hash-based containers also do not resize naturally (as
57*404b540aSrobert      opposed to tree-based containers, for example). The
58*404b540aSrobert      appropriate resize policy is unfortunately intertwined with
59*404b540aSrobert      the policy that transforms hash value into a position within
60*404b540aSrobert      the table (see <a href=
61*404b540aSrobert      "hash_based_containers.html#resize_policies">Design::Associative
62*404b540aSrobert      Containers::Hash-Based Containers::Resize Policies</a>).
63*404b540aSrobert
64*404b540aSrobert        <p><a href=
65*404b540aSrobert        "assoc_performance_tests.html#hash_based">Associative-Container
66*404b540aSrobert        Performance Tests::Hash-Based Containers</a> quantifies
67*404b540aSrobert        some of these points.</p>
68*404b540aSrobert      </li>
69*404b540aSrobert
70*404b540aSrobert      <li>Tree-based containers, for example, also support look-up
71*404b540aSrobert      and insertion methods, and are primarily useful when
72*404b540aSrobert      maintaining order between elements is important. In some
73*404b540aSrobert      cases, though, one can utilize their balancing algorithms for
74*404b540aSrobert      completely different purposes.
75*404b540aSrobert
76*404b540aSrobert        <p>Figure <a href="#node_invariants">Metadata for
77*404b540aSrobert        order-statistics and interval intersections</a>-A, for
78*404b540aSrobert        example, shows a tree whose each node contains two entries:
79*404b540aSrobert        a floating-point key, and some size-type <i>metadata</i>
80*404b540aSrobert        (in bold beneath it) that is the number of nodes in the
81*404b540aSrobert        sub-tree. (<i>E.g.</i>, the root has key 0.99, and has 5
82*404b540aSrobert        nodes (including itself) in its sub-tree.) A container based
83*404b540aSrobert        on this data structure can obviously answer efficiently
84*404b540aSrobert        whether 0.3 is in the container object, but it can also
85*404b540aSrobert        answer what is the order of 0.3 among all those in the
86*404b540aSrobert        container object [<a href=
87*404b540aSrobert        "references.html#clrs2001">clrs2001</a>] (see <a href=
88*404b540aSrobert        "assoc_examples.html#tree_like_based">Associative Container
89*404b540aSrobert        Examples::Tree-Like-Based Containers (Trees and
90*404b540aSrobert        Tries)</a>).</p>
91*404b540aSrobert
92*404b540aSrobert        <p>As another example, Figure <a href=
93*404b540aSrobert        "#node_invariants">Metadata for order-statistics and
94*404b540aSrobert        interval intersections</a>-B shows a tree whose each node
95*404b540aSrobert        contains two entries: a half-open geometric line interval,
96*404b540aSrobert        and a number <i>metadata</i> (in bold beneath it) that is
97*404b540aSrobert        the largest endpoint of all intervals in its sub-tree.
98*404b540aSrobert        (<i>E.g.</i>, the root describes the interval <i>[20,
99*404b540aSrobert        36)</i>, and the largest endpoint in its sub-tree is 99.) A
100*404b540aSrobert        container based on this data structure can obviously answer
101*404b540aSrobert        efficiently whether <i>[3, 41)</i> is in the container
102*404b540aSrobert        object, but it can also answer efficiently whether the
103*404b540aSrobert        container object has intervals that intersect <i>[3,
104*404b540aSrobert        41)</i> (see <a href=
105*404b540aSrobert        "assoc_examples.html#tree_like_based">Associative Container
106*404b540aSrobert        Examples::Tree-Like-Based Containers (Trees and
107*404b540aSrobert        Tries)</a>). These types of queries are very useful in
108*404b540aSrobert        geometric algorithms and lease-management algorithms.</p>
109*404b540aSrobert
110*404b540aSrobert        <p>It is important to note, however, that as the trees are
111*404b540aSrobert        modified, their internal structure changes. To maintain
112*404b540aSrobert        these invariants, one must supply some policy that is aware
113*404b540aSrobert        of these changes (see <a href=
114*404b540aSrobert        "tree_based_containers.html#invariants">Design::Associative
115*404b540aSrobert        Containers::Tree-Based Containers::Node Invariants</a>);
116*404b540aSrobert        without this, it would be better to use a linked list (in
117*404b540aSrobert        itself very efficient for these purposes).</p>
118*404b540aSrobert
119*404b540aSrobert        <p><a href=
120*404b540aSrobert        "assoc_performance_tests.html#tree_like_based">Associative-Container
121*404b540aSrobert        Performance Tests::Tree-Like-Based Containers</a>
122*404b540aSrobert        quantifies some of these points.</p>
123*404b540aSrobert      </li>
124*404b540aSrobert    </ol>
125*404b540aSrobert
126*404b540aSrobert    <h6 class="c1"><a name="node_invariants" id=
127*404b540aSrobert    "node_invariants"><img src="node_invariants.png" alt=
128*404b540aSrobert    "no image" /></a></h6>
129*404b540aSrobert
130*404b540aSrobert    <h6 class="c1">Metadata for order-statistics and interval
131*404b540aSrobert    intersections.</h6>
132*404b540aSrobert
133*404b540aSrobert    <h3><a name="assoc_ds_genericity" id="assoc_ds_genericity">More
134*404b540aSrobert    Data Structures and Traits</a></h3>
135*404b540aSrobert
136*404b540aSrobert    <p>The STL contains associative containers based on red-black
137*404b540aSrobert    trees and collision-chaining hash tables. These are obviously
138*404b540aSrobert    very useful, but they are not ideal for all types of
139*404b540aSrobert    settings.</p>
140*404b540aSrobert
141*404b540aSrobert    <p>Figure <a href=
142*404b540aSrobert    "#different_underlying_data_structures">Different underlying
143*404b540aSrobert    data structures</a> shows different underlying data structures
144*404b540aSrobert    (the ones currently supported in <tt>pb_ds</tt>). A shows a
145*404b540aSrobert    collision-chaining hash-table, B shows a probing hash-table, C
146*404b540aSrobert    shows a red-black tree, D shows a splay tree, E shows a tree
147*404b540aSrobert    based on an ordered vector(implicit in the order of the
148*404b540aSrobert    elements), F shows a PATRICIA trie, and G shows a list-based
149*404b540aSrobert    container with update policies.</p>
150*404b540aSrobert
151*404b540aSrobert    <p>Each of these data structures has some performance benefits,
152*404b540aSrobert    in terms of speed, size or both (see <a href=
153*404b540aSrobert    "assoc_performance_tests.html">Associative-Container
154*404b540aSrobert    Performance Tests</a> and <a href=
155*404b540aSrobert    "assoc_performance_tests.html#dss_family_choice">Associative-Container
156*404b540aSrobert    Performance Tests::Observations::Underlying Data-Structure
157*404b540aSrobert    Families</a>). For now, though, note that <i>e.g.</i>,
158*404b540aSrobert    vector-based trees and probing hash tables manipulate memory
159*404b540aSrobert    more efficiently than red-black trees and collision-chaining
160*404b540aSrobert    hash tables, and that list-based associative containers are
161*404b540aSrobert    very useful for constructing "multimaps" (see <a href=
162*404b540aSrobert    "#assoc_mapping_semantics">Alternative to Multiple Equivalent
163*404b540aSrobert    Keys</a>, <a href=
164*404b540aSrobert    "assoc_performance_tests.html#multimaps">Associative Container
165*404b540aSrobert    Performance Tests::Multimaps</a>, and <a href=
166*404b540aSrobert    "assoc_performance_tests.html#msc">Associative Container
167*404b540aSrobert    Performance Tests::Observations::Mapping-Semantics
168*404b540aSrobert    Considerations</a>).</p>
169*404b540aSrobert
170*404b540aSrobert    <h6 class="c1"><a name="different_underlying_data_structures"
171*404b540aSrobert    id="different_underlying_data_structures"><img src=
172*404b540aSrobert    "different_underlying_dss.png" alt="no image" /></a></h6>
173*404b540aSrobert
174*404b540aSrobert    <h6 class="c1">Different underlying data structures.</h6>
175*404b540aSrobert
176*404b540aSrobert    <p>Now consider a function manipulating a generic associative
177*404b540aSrobert    container, <i>e.g.</i>,</p>
178*404b540aSrobert    <pre>
179*404b540aSrobert<b>template</b>&lt;
180*404b540aSrobert    <b>class</b> Cntnr&gt;
181*404b540aSrobert<b>int</b>
182*404b540aSrobert    some_op_sequence
183*404b540aSrobert    (Cntnr &amp;r_cnt)
184*404b540aSrobert{
185*404b540aSrobert    ...
186*404b540aSrobert}
187*404b540aSrobert</pre>
188*404b540aSrobert
189*404b540aSrobert    <p>Ideally, the underlying data structure of <tt>Cntnr</tt>
190*404b540aSrobert    would not affect what can be done with <tt>r_cnt</tt>.
191*404b540aSrobert    Unfortunately, this is not the case.</p>
192*404b540aSrobert
193*404b540aSrobert    <p>For example, if <tt>Cntnr</tt> is <tt>std::map</tt>, then
194*404b540aSrobert    the function can use <tt>std::for_each(r_cnt.find(foo),
195*404b540aSrobert    r_cnt.find(bar), foobar)</tt> in order to apply <tt>foobar</tt>
196*404b540aSrobert    to all elements between <tt>foo</tt> and <tt>bar</tt>. If
197*404b540aSrobert    <tt>Cntnr</tt> is a hash-based container, then this call's
198*404b540aSrobert    results are undefined.</p>
199*404b540aSrobert
200*404b540aSrobert    <p>Also, if <tt>Cntnr</tt> is tree-based, the type and object
201*404b540aSrobert    of the comparison functor can be accessed. If <tt>Cntnr</tt> is
202*404b540aSrobert    hash based, these queries are nonsensical.</p>
203*404b540aSrobert
204*404b540aSrobert    <p>There are various other differences based on the container's
205*404b540aSrobert    underlying data structure. For one, they can be constructed by,
206*404b540aSrobert    and queried for, different policies. Furthermore:</p>
207*404b540aSrobert
208*404b540aSrobert    <ol>
209*404b540aSrobert      <li>Containers based on C, D, E and F store elements in a
210*404b540aSrobert      meaningful order; the others store elements in a meaningless
211*404b540aSrobert      (and probably time-varying) order. By implication, only
212*404b540aSrobert      containers based on C, D, E and F can support erase
213*404b540aSrobert      operations taking an iterator and returning an iterator to
214*404b540aSrobert      the following element without performance loss (see <a href=
215*404b540aSrobert      "#assoc_ers_methods">Slightly Different Methods::Methods
216*404b540aSrobert      Related to Erase</a>).</li>
217*404b540aSrobert
218*404b540aSrobert      <li>Containers based on C, D, E, and F can be split and
219*404b540aSrobert      joined efficiently, while the others cannot. Containers based
220*404b540aSrobert      on C and D, furthermore, can guarantee that this is
221*404b540aSrobert      exception-free; containers based on E cannot guarantee
222*404b540aSrobert      this.</li>
223*404b540aSrobert
224*404b540aSrobert      <li>Containers based on all but E can guarantee that erasing
225*404b540aSrobert      an element is exception free; containers based on E cannot
226*404b540aSrobert      guarantee this. Containers based on all but B and E can
227*404b540aSrobert      guarantee that modifying an object of their type does not
228*404b540aSrobert      invalidate iterators or references to their elements, while
229*404b540aSrobert      containers based on B and E cannot. Containers based on C, D,
230*404b540aSrobert      and E can furthermore make a stronger guarantee, namely that
231*404b540aSrobert      modifying an object of their type does not affect the order
232*404b540aSrobert      of iterators.</li>
233*404b540aSrobert    </ol>
234*404b540aSrobert
235*404b540aSrobert    <p>A unified tag and traits system (as used for the STL's
236*404b540aSrobert    iterators, for example) can ease generic manipulation of
237*404b540aSrobert    associative containers based on different underlying
238*404b540aSrobert    data structures (see <a href=
239*404b540aSrobert    "tutorial.html#assoc_ds_gen">Tutorial::Associative
240*404b540aSrobert    Containers::Determining Containers' Attributes</a> and <a href=
241*404b540aSrobert    "ds_gen.html#container_traits">Design::Associative
242*404b540aSrobert    Containers::Data-Structure Genericity::Data-Structure Tags and
243*404b540aSrobert    Traits</a>).</p>
244*404b540aSrobert
245*404b540aSrobert    <h3><a name="assoc_diff_it" id="assoc_diff_it">Differentiating
246*404b540aSrobert    between Iterator Types</a></h3>
247*404b540aSrobert
248*404b540aSrobert    <p>Iterators are centric to the STL's design, because of the
249*404b540aSrobert    container/algorithm/iterator decomposition that allows an
250*404b540aSrobert    algorithm to operate on a range through iterators of some
251*404b540aSrobert    sequence (<i>e.g.</i>, one originating from a container).
252*404b540aSrobert    Iterators, then, are useful because they allow going over a
253*404b540aSrobert    <u>sequence</u>. The STL also uses iterators for accessing a
254*404b540aSrobert    <u>specific</u> element - <i>e.g.</i>, when an associative
255*404b540aSrobert    container returns one through <tt>find</tt>. The STL, however,
256*404b540aSrobert    consistently uses the same types of iterators for both
257*404b540aSrobert    purposes: going over a range, and accessing a specific found
258*404b540aSrobert    element. Before the introduction of hash-based containers to
259*404b540aSrobert    the STL, this made sense (with the exception of priority
260*404b540aSrobert    queues, which are discussed in <a href="#pq">Priority
261*404b540aSrobert    Queues</a>).</p>
262*404b540aSrobert
263*404b540aSrobert    <p>Using the STL's associative containers together with
264*404b540aSrobert    non-order-preserving associative containers (and also because
265*404b540aSrobert    of priority-queues container), there is a possible need for
266*404b540aSrobert    different types of iterators for self-organizing containers -
267*404b540aSrobert    the iterator concept seems overloaded to mean two different
268*404b540aSrobert    things (in some cases). The following subsections explain this;
269*404b540aSrobert    <a href="tutorial.html#assoc_find_range">Tutorial::Associative
270*404b540aSrobert    Containers::Point-Type and Range-Type Methods</a> explains an
271*404b540aSrobert    alternative design which does not complicate the use of
272*404b540aSrobert    order-preserving containers, but is better for unordered
273*404b540aSrobert    containers; <a href=
274*404b540aSrobert    "ds_gen.html#find_range">Design::Associative
275*404b540aSrobert    Containers::Data-Structure Genericity::Point-Type and
276*404b540aSrobert    Range-Type Methods</a> explains the design further.</p>
277*404b540aSrobert
278*404b540aSrobert    <h4><a name="assoc_find_it_range_it" id=
279*404b540aSrobert    "assoc_find_it_range_it">Using Point-Type Iterators for
280*404b540aSrobert    Range-Type Operations</a></h4>
281*404b540aSrobert
282*404b540aSrobert    <p>Suppose <tt>cntnr</tt> is some associative container, and
283*404b540aSrobert    say <tt>c</tt> is an object of type <tt>cntnr</tt>. Then what
284*404b540aSrobert    will be the outcome of</p>
285*404b540aSrobert    <pre>
286*404b540aSrobertstd::for_each(c.find(1), c.find(5), foo);
287*404b540aSrobert</pre>
288*404b540aSrobert
289*404b540aSrobert    <p>If <tt>cntnr</tt> is a tree-based container object, then an
290*404b540aSrobert    in-order walk will apply <tt>foo</tt> to the relevant elements,
291*404b540aSrobert    <i>e.g.</i>, as in Figure <a href="#range_it_in_hts">Range
292*404b540aSrobert    iteration in different data structures</a> -A. If <tt>c</tt> is
293*404b540aSrobert    a hash-based container, then the order of elements between any
294*404b540aSrobert    two elements is undefined (and probably time-varying); there is
295*404b540aSrobert    no guarantee that the elements traversed will coincide with the
296*404b540aSrobert    <i>logical</i> elements between 1 and 5, <i>e.g.</i>, as in
297*404b540aSrobert    Figure <a href="#range_it_in_hts">Range iteration in different
298*404b540aSrobert    data structures</a>-B.</p>
299*404b540aSrobert
300*404b540aSrobert    <h6 class="c1"><a name="range_it_in_hts" id=
301*404b540aSrobert    "range_it_in_hts"><img src="point_iterators_range_ops_1.png"
302*404b540aSrobert    alt="no image" /></a></h6>
303*404b540aSrobert
304*404b540aSrobert    <h6 class="c1">Range iteration in different
305*404b540aSrobert    data structures.</h6>
306*404b540aSrobert
307*404b540aSrobert    <p>In our opinion, this problem is not caused just because
308*404b540aSrobert    red-black trees are order preserving while collision-chaining
309*404b540aSrobert    hash tables are (generally) not - it is more fundamental. Most
310*404b540aSrobert    of the STL's containers order sequences in a well-defined
311*404b540aSrobert    manner that is determined by their <u>interface</u>: calling
312*404b540aSrobert    <tt>insert</tt> on a tree-based container modifies its sequence
313*404b540aSrobert    in a predictable way, as does calling <tt>push_back</tt> on a
314*404b540aSrobert    list or a vector. Conversely, collision-chaining hash tables,
315*404b540aSrobert    probing hash tables, priority queues, and list-based containers
316*404b540aSrobert    (which are very useful for "multimaps") are self-organizing
317*404b540aSrobert    data structures; the effect of each operation modifies their
318*404b540aSrobert    sequences in a manner that is (practically) determined by their
319*404b540aSrobert    <u>implementation</u>.</p>
320*404b540aSrobert
321*404b540aSrobert    <p>Consequently, applying an algorithm to a sequence obtained
322*404b540aSrobert    from most containers <u>may or may not</u> make sense, but
323*404b540aSrobert    applying it to a sub-sequence of a self-organizing container
324*404b540aSrobert    <u>does not</u>.</p>
325*404b540aSrobert
326*404b540aSrobert    <h4><a name="assoc_range_it_for_find_it" id=
327*404b540aSrobert    "assoc_range_it_for_find_it">The Cost of Enabling Range
328*404b540aSrobert    Capabilities to Point-Type Iterators</a></h4>
329*404b540aSrobert
330*404b540aSrobert    <p>Suppose <tt>c</tt> is some collision-chaining hash-based
331*404b540aSrobert    container object, and one calls <tt>c.find(3)</tt>. Then what
332*404b540aSrobert    composes the returned iterator?</p>
333*404b540aSrobert
334*404b540aSrobert    <p>Figure <a href="#find_its_in_hash_tables">Point-type
335*404b540aSrobert    iterators in hash tables</a>-A shows the simplest (and most
336*404b540aSrobert    efficient) implementation of a collision-chaining hash table.
337*404b540aSrobert    The little box marked <tt>point_iterator</tt> shows an object
338*404b540aSrobert    that contains a pointer to the element's node. Note that this
339*404b540aSrobert    "iterator" has no way to move to the next element (<i>i.e.</i>,
340*404b540aSrobert    it cannot support <tt><b>operator</b>++</tt>). Conversely, the
341*404b540aSrobert    little box marked <tt>iterator</tt> stores both a pointer to
342*404b540aSrobert    the element, as well as some other information (<i>e.g.</i>,
343*404b540aSrobert    the bucket number of the element). the second iterator, then,
344*404b540aSrobert    is "heavier" than the first one- it requires more time and
345*404b540aSrobert    space. If we were to use a different container to
346*404b540aSrobert    cross-reference into this hash-table using these iterators - it
347*404b540aSrobert    would take much more space. As noted in <a href=
348*404b540aSrobert    "#assoc_find_it_range_it">Using Point-Type Iterators for
349*404b540aSrobert    Range-Type Operations</a>, nothing much can be done by
350*404b540aSrobert    incrementing these iterators, so why is this extra information
351*404b540aSrobert    needed?</p>
352*404b540aSrobert
353*404b540aSrobert    <p>Alternatively, one might create a collision-chaining
354*404b540aSrobert    hash-table where the lists might be linked, forming a
355*404b540aSrobert    monolithic total-element list, as in Figure <a href=
356*404b540aSrobert    "#find_its_in_hash_tables">Point-type iterators in hash
357*404b540aSrobert    tables</a>-B (this seems similar to the Dinkumware design
358*404b540aSrobert    [<a href="references.html#dinkumware_stl">dinkumware_stl</a>]).
359*404b540aSrobert    Here the iterators are as light as can be, but the hash-table's
360*404b540aSrobert    operations are more complicated.</p>
361*404b540aSrobert
362*404b540aSrobert    <h6 class="c1"><a name="find_its_in_hash_tables" id=
363*404b540aSrobert    "find_its_in_hash_tables"><img src=
364*404b540aSrobert    "point_iterators_range_ops_2.png" alt="no image" /></a></h6>
365*404b540aSrobert
366*404b540aSrobert    <h6 class="c1">Point-type iterators in hash tables.</h6>
367*404b540aSrobert
368*404b540aSrobert    <p>It should be noted that containers based on
369*404b540aSrobert    collision-chaining hash-tables are not the only ones with this
370*404b540aSrobert    type of behavior; many other self-organizing data structures
371*404b540aSrobert    display it as well.</p>
372*404b540aSrobert
373*404b540aSrobert    <h4><a name="assoc_inv_guar" id="assoc_inv_guar">Invalidation
374*404b540aSrobert    Guarantees</a></h4>
375*404b540aSrobert
376*404b540aSrobert    <p>Consider the following snippet:</p>
377*404b540aSrobert    <pre>
378*404b540aSrobertit = c.find(3);
379*404b540aSrobert
380*404b540aSrobertc.erase(5);
381*404b540aSrobert</pre>
382*404b540aSrobert
383*404b540aSrobert    <p>Following the call to <tt>erase</tt>, what is the validity
384*404b540aSrobert    of <tt>it</tt>: can it be de-referenced? can it be
385*404b540aSrobert    incremented?</p>
386*404b540aSrobert
387*404b540aSrobert    <p>The answer depends on the underlying data structure of the
388*404b540aSrobert    container. Figure <a href=
389*404b540aSrobert    "#invalidation_guarantee_erase">Effect of erase in different
390*404b540aSrobert    underlying data structures</a> shows three cases: A1 and A2
391*404b540aSrobert    show a red-black tree; B1 and B2 show a probing hash-table; C1
392*404b540aSrobert    and C2 show a collision-chaining hash table.</p>
393*404b540aSrobert
394*404b540aSrobert    <h6 class="c1"><a name="invalidation_guarantee_erase" id=
395*404b540aSrobert    "invalidation_guarantee_erase"><img src=
396*404b540aSrobert    "invalidation_guarantee_erase.png" alt="no image" /></a></h6>
397*404b540aSrobert
398*404b540aSrobert    <h6 class="c1">Effect of erase in different underlying
399*404b540aSrobert    data structures.</h6>
400*404b540aSrobert
401*404b540aSrobert    <ol>
402*404b540aSrobert      <li>Erasing 5 from A1 yields A2. Clearly, an iterator to 3
403*404b540aSrobert      can be de-referenced and incremented. The sequence of
404*404b540aSrobert      iterators changed, but in a way that is well-defined by the
405*404b540aSrobert      <u>interface</u>.</li>
406*404b540aSrobert
407*404b540aSrobert      <li>Erasing 5 from B1 yields B2. Clearly, an iterator to 3 is
408*404b540aSrobert      not valid at all - it cannot be de-referenced or incremented;
409*404b540aSrobert      the order of iterators changed in a way that is (practically)
410*404b540aSrobert      determined by the <u>implementation</u> and not by the
411*404b540aSrobert      <u>interface</u>.</li>
412*404b540aSrobert
413*404b540aSrobert      <li>Erasing 5 from C1 yields C2. Here the situation is more
414*404b540aSrobert      complicated. On the one hand, there is no problem in
415*404b540aSrobert      de-referencing <tt>it</tt>. On the other hand, the order of
416*404b540aSrobert      iterators changed in a way that is (practically) determined
417*404b540aSrobert      by the <u>implementation</u> and not by the
418*404b540aSrobert      <u>interface</u>.</li>
419*404b540aSrobert    </ol>
420*404b540aSrobert
421*404b540aSrobert    <p>So in classic STL, it is not always possible to express
422*404b540aSrobert    whether <tt>it</tt> is valid or not. This is true also for
423*404b540aSrobert    <tt>insert</tt>, <i>e.g.</i>. Again, the iterator concept seems
424*404b540aSrobert    overloaded.</p>
425*404b540aSrobert
426*404b540aSrobert    <h3><a name="assoc_methods" id="assoc_methods">Slightly
427*404b540aSrobert    Different Methods</a></h3>
428*404b540aSrobert
429*404b540aSrobert    <p>[<a href="references.html#meyers02both">meyers02both</a>]
430*404b540aSrobert    points out that a class's methods should comprise only
431*404b540aSrobert    operations which depend on the class's internal structure;
432*404b540aSrobert    other operations are best designed as external functions.
433*404b540aSrobert    Possibly, therefore, the STL's associative containers lack some
434*404b540aSrobert    useful methods, and provide some other methods which would be
435*404b540aSrobert    better left out (<i>e.g.</i>, [<a href=
436*404b540aSrobert    "references.html#sgi_stl">sgi_stl</a>] ).</p>
437*404b540aSrobert
438*404b540aSrobert    <h4><a name="assoc_ers_methods" id="assoc_ers_methods">Methods
439*404b540aSrobert    Related to Erase</a></h4>
440*404b540aSrobert
441*404b540aSrobert    <ol>
442*404b540aSrobert      <li>Order-preserving STL associative containers provide the
443*404b540aSrobert      method
444*404b540aSrobert        <pre>
445*404b540aSrobertiterator
446*404b540aSrobert    erase
447*404b540aSrobert    (iterator it)
448*404b540aSrobert</pre>which takes an iterator, erases the corresponding element,
449*404b540aSrobertand returns an iterator to the following element. Also hash-based
450*404b540aSrobertSTL associative containers provide this method. This <u>seemingly
451*404b540aSrobertincreases</u> genericity between associative containers, since, <i>
452*404b540aSrobert        e.g.</i>, it is possible to use
453*404b540aSrobert        <pre>
454*404b540aSrobert<b>typename</b> C::iterator it = c.begin();
455*404b540aSrobert<b>typename</b> C::iterator e_it = c.end();
456*404b540aSrobert
457*404b540aSrobert<b>while</b>(it != e_it)
458*404b540aSrobert    it = pred(*it)? c.erase(it) : ++it;
459*404b540aSrobert</pre>in order to erase from a container object <tt>
460*404b540aSrobert        c</tt> all element which match a predicate <tt>pred</tt>.
461*404b540aSrobert        However, in a different sense this actually
462*404b540aSrobert        <u>decreases</u> genericity: an integral implication of
463*404b540aSrobert        this method is that tree-based associative containers'
464*404b540aSrobert        memory use is linear in the total number of elements they
465*404b540aSrobert        store, while hash-based containers' memory use is unbounded
466*404b540aSrobert        in the total number of elements they store. Assume a
467*404b540aSrobert        hash-based container is allowed to decrease its size when
468*404b540aSrobert        an element is erased. Then the elements might be rehashed,
469*404b540aSrobert        which means that there is no "next" element - it is simply
470*404b540aSrobert        undefined. Consequently, it is possible to infer from the
471*404b540aSrobert        fact that STL hash-based containers provide this method
472*404b540aSrobert        that they cannot downsize when elements are erased
473*404b540aSrobert        (<a href="assoc_performance_tests.html#hash_based">Performance
474*404b540aSrobert        Tests::Hash-Based Container Tests</a> quantifies this.) As
475*404b540aSrobert        a consequence, different code is needed to manipulate
476*404b540aSrobert        different containers, assuming that memory should be
477*404b540aSrobert        conserved. <tt>pb_ds</tt>'s non-order preserving
478*404b540aSrobert        associative containers omit this method.
479*404b540aSrobert      </li>
480*404b540aSrobert
481*404b540aSrobert      <li>All of <tt>pb_ds</tt>'s associative containers include a
482*404b540aSrobert      conditional-erase method
483*404b540aSrobert        <pre>
484*404b540aSrobert<b>template</b>&lt;
485*404b540aSrobert    <b>class</b> Pred&gt;
486*404b540aSrobertsize_type
487*404b540aSrobert    erase_if
488*404b540aSrobert    (Pred pred)
489*404b540aSrobert</pre>which erases all elements matching a predicate. This is
490*404b540aSrobertprobably the only way to ensure linear-time multiple-item erase
491*404b540aSrobertwhich can actually downsize a container.
492*404b540aSrobert      </li>
493*404b540aSrobert
494*404b540aSrobert      <li>STL associative containers provide methods for
495*404b540aSrobert      multiple-item erase of the form
496*404b540aSrobert        <pre>
497*404b540aSrobertsize_type
498*404b540aSrobert    erase
499*404b540aSrobert    (It b,
500*404b540aSrobert        It e)
501*404b540aSrobert</pre>erasing a range of elements given by a pair of iterators. For
502*404b540aSroberttree-based or trie-based containers, this can implemented more
503*404b540aSrobertefficiently as a (small) sequence of split and join operations. For
504*404b540aSrobertother, unordered, containers, this method isn't much better than an
505*404b540aSrobertexternal loop. Moreover, if <tt>c</tt> is a hash-based container,
506*404b540aSrobertthen, <i>e.g.</i>, <tt>c.erase(c.find(2), c.find(5))</tt> is almost
507*404b540aSrobertcertain to do something different than erasing all elements whose
508*404b540aSrobertkeys are between 2 and 5, and is likely to produce other undefined
509*404b540aSrobertbehavior.
510*404b540aSrobert      </li>
511*404b540aSrobert    </ol>
512*404b540aSrobert
513*404b540aSrobert    <h4><a name="assoc_split_join_methods" id=
514*404b540aSrobert    "assoc_split_join_methods">Methods Related to Split and
515*404b540aSrobert    Join</a></h4>
516*404b540aSrobert
517*404b540aSrobert    <p>It is well-known that tree-based and trie-based container
518*404b540aSrobert    objects can be efficiently split or joined [<a href=
519*404b540aSrobert    "references.html#clrs2001">clrs2001</a>]. Externally splitting
520*404b540aSrobert    or joining trees is super-linear, and, furthermore, can throw
521*404b540aSrobert    exceptions. Split and join methods, consequently, seem good
522*404b540aSrobert    choices for tree-based container methods, especially, since as
523*404b540aSrobert    noted just before, they are efficient replacements for erasing
524*404b540aSrobert    sub-sequences. <a href=
525*404b540aSrobert    "assoc_performance_tests.html#tree_like_based">Performance
526*404b540aSrobert    Tests::Tree-Like-Based Container Tests</a> quantifies this.</p>
527*404b540aSrobert
528*404b540aSrobert    <h4><a name="assoc_insert_methods" id=
529*404b540aSrobert    "assoc_insert_methods">Methods Related to Insert</a></h4>
530*404b540aSrobert
531*404b540aSrobert    <p>STL associative containers provide methods of the form</p>
532*404b540aSrobert    <pre>
533*404b540aSrobert<b>template</b>&lt;
534*404b540aSrobert    <b>class</b> It&gt;
535*404b540aSrobertsize_type
536*404b540aSrobert    insert
537*404b540aSrobert    (It b,
538*404b540aSrobert        It e);
539*404b540aSrobert</pre>for inserting a range of elements given by a pair of
540*404b540aSrobertiterators. At best, this can be implemented as an external loop,
541*404b540aSrobertor, even more efficiently, as a join operation (for the case of
542*404b540aSroberttree-based or trie-based containers). Moreover, these methods seem
543*404b540aSrobertsimilar to constructors taking a range given by a pair of
544*404b540aSrobertiterators; the constructors, however, are transactional, whereas
545*404b540aSrobertthe insert methods are not; this is possibly confusing.
546*404b540aSrobert
547*404b540aSrobert    <h4><a name="assoc_equiv_comp_methods" id=
548*404b540aSrobert    "assoc_equiv_comp_methods">Functions Related to
549*404b540aSrobert    Comparison</a></h4>
550*404b540aSrobert
551*404b540aSrobert    <p>Associative containers are parametrized by policies
552*404b540aSrobert    allowing to test key equivalence; <i>e.g.</i> a hash-based
553*404b540aSrobert    container can do this through its equivalence functor, and a
554*404b540aSrobert    tree-based container can do this through its comparison
555*404b540aSrobert    functor. In addition, some STL associative containers have
556*404b540aSrobert    global function operators, <i>e.g.</i>,
557*404b540aSrobert    <tt><b>operator</b>==</tt> and <tt><b>operator</b>&lt;=</tt>,
558*404b540aSrobert    that allow comparing entire associative containers.</p>
559*404b540aSrobert
560*404b540aSrobert    <p>In our opinion, these functions are better left out. To
561*404b540aSrobert    begin with, they do not significantly improve over an external
562*404b540aSrobert    loop. More importantly, however, they are possibly misleading -
563*404b540aSrobert    <tt><b>operator</b>==</tt>, for example, usually checks for
564*404b540aSrobert    equivalence, or interchangeability, but the associative
565*404b540aSrobert    container cannot check for values' equivalence, only keys'
566*404b540aSrobert    equivalence; also, are two containers considered equivalent if
567*404b540aSrobert    they store the same values in different order? this is an
568*404b540aSrobert    arbitrary decision.</p>
569*404b540aSrobert
570*404b540aSrobert    <h3><a name="assoc_mapping_semantics" id=
571*404b540aSrobert    "assoc_mapping_semantics">Alternative to Multiple Equivalent
572*404b540aSrobert    Keys</a></h3>
573*404b540aSrobert
574*404b540aSrobert    <p>Maps (or sets) allow mapping (or storing) unique-key values.
575*404b540aSrobert    The STL, however, also supplies associative containers which
576*404b540aSrobert    map (or store) multiple values with equivalent keys:
577*404b540aSrobert    <tt>std::multimap</tt>, <tt>std::multiset</tt>,
578*404b540aSrobert    <tt>std::tr1::unordered_multimap</tt>, and
579*404b540aSrobert    <tt>unordered_multiset</tt>. We first discuss how these might
580*404b540aSrobert    be used, then why we think it is best to avoid them.</p>
581*404b540aSrobert
582*404b540aSrobert    <p>Suppose one builds a simple bank-account application that
583*404b540aSrobert    records for each client (identified by an <tt>std::string</tt>)
584*404b540aSrobert    and account-id (marked by an <tt><b>unsigned long</b></tt>) -
585*404b540aSrobert    the balance in the account (described by a
586*404b540aSrobert    <tt><b>float</b></tt>). Suppose further that ordering this
587*404b540aSrobert    information is not useful, so a hash-based container is
588*404b540aSrobert    preferable to a tree based container. Then one can use</p>
589*404b540aSrobert    <pre>
590*404b540aSrobertstd::tr1::unordered_map&lt;std::pair&lt;std::string, <b>unsigned long</b>&gt;, <b>float</b>, ...&gt;
591*404b540aSrobert</pre>which <u>hashes every combination of client and
592*404b540aSrobertaccount-id</u>. This might work well, except for the fact that it
593*404b540aSrobertis now impossible to efficiently list all of the accounts of a
594*404b540aSrobertspecific client (this would practically require iterating over all
595*404b540aSrobertentries). Instead, one can use
596*404b540aSrobert    <pre>
597*404b540aSrobertstd::tr1::unordered_multimap&lt;std::pair&lt;std::string, <tt><b>unsigned long</b></tt>&gt;, <b>float</b>, ...&gt;
598*404b540aSrobert</pre>which <u>hashes every client</u>, and <u>decides equivalence
599*404b540aSrobertbased on client</u> only. This will ensure that all accounts
600*404b540aSrobertbelonging to a specific user are stored consecutively.
601*404b540aSrobert
602*404b540aSrobert    <p>Also, suppose one wants an integers' priority queue
603*404b540aSrobert    (<i>i.e.,</i> a container that supports <tt>push</tt>,
604*404b540aSrobert    <tt>pop</tt>, and <tt>top</tt> operations, the last of which
605*404b540aSrobert    returns the largest <tt><b>int</b></tt>) that also supports
606*404b540aSrobert    operations such as <tt>find</tt> and <tt>lower_bound</tt>. A
607*404b540aSrobert    reasonable solution is to build an adapter over
608*404b540aSrobert    <tt>std::set&lt;<b>int</b>&gt;</tt>. In this adapter,
609*404b540aSrobert    <i>e.g.</i>, <tt>push</tt> will just call the tree-based
610*404b540aSrobert    associative container's <tt>insert</tt> method; <tt>pop</tt>
611*404b540aSrobert    will call its <tt>end</tt> method, and use it to return the
612*404b540aSrobert    preceding element (which must be the largest). Then this might
613*404b540aSrobert    work well, except that the container object cannot hold
614*404b540aSrobert    multiple instances of the same integer (<tt>push(4)</tt>,
615*404b540aSrobert    <i>e.g.</i>, will be a no-op if <tt>4</tt> is already in the
616*404b540aSrobert    container object). If multiple keys are necessary, then one
617*404b540aSrobert    might build the adapter over an
618*404b540aSrobert    <tt>std::multiset&lt;<b>int</b>&gt;</tt>.</p>
619*404b540aSrobert
620*404b540aSrobert    <p class="c1">STL non-unique-mapping containers, then, are
621*404b540aSrobert    useful when (1) a key can be decomposed in to a primary key and
622*404b540aSrobert    a secondary key, (2) a key is needed multiple times, or (3) any
623*404b540aSrobert    combination of (1) and (2).</p>
624*404b540aSrobert
625*404b540aSrobert    <p>Figure <a href="#embedded_lists_1">Non-unique mapping
626*404b540aSrobert    containers in the STL's design</a> shows how the STL's design
627*404b540aSrobert    works internally; in this figure nodes shaded equally represent
628*404b540aSrobert    equivalent-key values. Equivalent keys are stored consecutively
629*404b540aSrobert    using the properties of the underlying data structure: binary
630*404b540aSrobert    search trees (Figure <a href="#embedded_lists_1">Non-unique
631*404b540aSrobert    mapping containers in the STL's design</a>-A) store
632*404b540aSrobert    equivalent-key values consecutively (in the sense of an
633*404b540aSrobert    in-order walk) naturally; collision-chaining hash tables
634*404b540aSrobert    (Figure <a href="#embedded_lists_1">Non-unique mapping
635*404b540aSrobert    containers in the STL's design</a>-B) store equivalent-key
636*404b540aSrobert    values in the same bucket, the bucket can be arranged so that
637*404b540aSrobert    equivalent-key values are consecutive.</p>
638*404b540aSrobert
639*404b540aSrobert    <h6 class="c1"><a name="embedded_lists_1" id=
640*404b540aSrobert    "embedded_lists_1"><img src="embedded_lists_1.png" alt=
641*404b540aSrobert    "no image" /></a></h6>
642*404b540aSrobert
643*404b540aSrobert    <h6 class="c1">Non-unique mapping containers in the STL's
644*404b540aSrobert    design.</h6>
645*404b540aSrobert
646*404b540aSrobert    <p>Put differently, STL non-unique mapping
647*404b540aSrobert    associative-containers are associative containers that map
648*404b540aSrobert    primary keys to linked lists that are embedded into the
649*404b540aSrobert    container. Figure <a href="#embedded_lists_2">Effect of
650*404b540aSrobert    embedded lists in STL multimaps</a> shows again the two
651*404b540aSrobert    containers from Figure <a href="#embedded_lists_1">Non-unique
652*404b540aSrobert    mapping containers in the STL's design</a>, this time with the
653*404b540aSrobert    embedded linked lists of the grayed nodes marked
654*404b540aSrobert    explicitly.</p>
655*404b540aSrobert
656*404b540aSrobert    <h6 class="c1"><a name="embedded_lists_2" id=
657*404b540aSrobert    "embedded_lists_2"><img src="embedded_lists_2.png" alt=
658*404b540aSrobert    "no image" /></a></h6>
659*404b540aSrobert
660*404b540aSrobert    <h6 class="c1">Effect of embedded lists in STL multimaps.</h6>
661*404b540aSrobert
662*404b540aSrobert    <p>These embedded linked lists have several disadvantages.</p>
663*404b540aSrobert
664*404b540aSrobert    <ol>
665*404b540aSrobert      <li>The underlying data structure embeds the linked lists
666*404b540aSrobert      according to its own consideration, which means that the
667*404b540aSrobert      search path for a value might include several different
668*404b540aSrobert      equivalent-key values. For example, the search path for the
669*404b540aSrobert      the black node in either of Figures <a href=
670*404b540aSrobert      "#embedded_lists_1">Non-unique mapping containers in the
671*404b540aSrobert      STL's design</a> A or B, includes more than a single gray
672*404b540aSrobert      node.</li>
673*404b540aSrobert
674*404b540aSrobert      <li>The links of the linked lists are the underlying
675*404b540aSrobert      data structures' nodes, which typically are quite structured.
676*404b540aSrobert      <i>E.g.</i>, in the case of tree-based containers (Figure
677*404b540aSrobert      <a href="#embedded_lists_2">Effect of embedded lists in STL
678*404b540aSrobert      multimaps</a>-B), each "link" is actually a node with three
679*404b540aSrobert      pointers (one to a parent and two to children), and a
680*404b540aSrobert      relatively-complicated iteration algorithm. The linked lists,
681*404b540aSrobert      therefore, can take up quite a lot of memory, and iterating
682*404b540aSrobert      over all values equal to a given key (<i>e.g.</i>, through
683*404b540aSrobert      the return value of the STL's <tt>equal_range</tt>) can be
684*404b540aSrobert      expensive.</li>
685*404b540aSrobert
686*404b540aSrobert      <li>The primary key is stored multiply; this uses more
687*404b540aSrobert      memory.</li>
688*404b540aSrobert
689*404b540aSrobert      <li>Finally, the interface of this design excludes several
690*404b540aSrobert      useful underlying data structures. <i>E.g.</i>, of all the
691*404b540aSrobert      unordered self-organizing data structures, practically only
692*404b540aSrobert      collision-chaining hash tables can (efficiently) guarantee
693*404b540aSrobert      that equivalent-key values are stored consecutively.</li>
694*404b540aSrobert    </ol>
695*404b540aSrobert
696*404b540aSrobert    <p>The above reasons hold even when the ratio of secondary keys
697*404b540aSrobert    to primary keys (or average number of identical keys) is small,
698*404b540aSrobert    but when it is large, there are more severe problems:</p>
699*404b540aSrobert
700*404b540aSrobert    <ol>
701*404b540aSrobert      <li>The underlying data structures order the links inside
702*404b540aSrobert      each embedded linked-lists according to their internal
703*404b540aSrobert      considerations, which effectively means that each of the
704*404b540aSrobert      links is unordered. Irrespective of the underlying
705*404b540aSrobert      data structure, searching for a specific value can degrade to
706*404b540aSrobert      linear complexity.</li>
707*404b540aSrobert
708*404b540aSrobert      <li>Similarly to the above point, it is impossible to apply
709*404b540aSrobert      to the secondary keys considerations that apply to primary
710*404b540aSrobert      keys. For example, it is not possible to maintain secondary
711*404b540aSrobert      keys by sorted order.</li>
712*404b540aSrobert
713*404b540aSrobert      <li>While the interface "understands" that all equivalent-key
714*404b540aSrobert      values constitute a distinct list (<i>e.g.</i>, through
715*404b540aSrobert      <tt>equal_range</tt>), the underlying data structure
716*404b540aSrobert      typically does not. This means, <i>e.g.</i>, that operations
717*404b540aSrobert      such as erasing from a tree-based container all values whose
718*404b540aSrobert      keys are equivalent to a a given key can be super-linear in
719*404b540aSrobert      the size of the tree; this is also true also for several
720*404b540aSrobert      other operations that target a specific list.</li>
721*404b540aSrobert    </ol>
722*404b540aSrobert
723*404b540aSrobert    <p>In <tt>pb_ds</tt>, therefore, all associative containers map
724*404b540aSrobert    (or store) unique-key values. One can (1) map primary keys to
725*404b540aSrobert    secondary associative-containers (<i>i.e.</i>, containers of
726*404b540aSrobert    secondary keys) or non-associative containers (2) map identical
727*404b540aSrobert    keys to a size-type representing the number of times they
728*404b540aSrobert    occur, or (3) any combination of (1) and (2). Instead of
729*404b540aSrobert    allowing multiple equivalent-key values, <tt>pb_ds</tt>
730*404b540aSrobert    supplies associative containers based on underlying
731*404b540aSrobert    data structures that are suitable as secondary
732*404b540aSrobert    associative-containers (see <a href=
733*404b540aSrobert    "assoc_performance_tests.html#msc">Associative-Container
734*404b540aSrobert    Performance Tests::Observations::Mapping-Semantics
735*404b540aSrobert    Considerations</a>).</p>
736*404b540aSrobert
737*404b540aSrobert    <p>Figures <a href="#embedded_lists_3">Non-unique mapping
738*404b540aSrobert    containers in <tt>pb_ds</tt></a> A and B show the equivalent
739*404b540aSrobert    structures in <tt>pb_ds</tt>'s design, to those in Figures
740*404b540aSrobert    <a href="#embedded_lists_1">Non-unique mapping containers in
741*404b540aSrobert    the STL's design</a> A and B, respectively. Each shaded box
742*404b540aSrobert    represents some size-type or secondary
743*404b540aSrobert    associative-container.</p>
744*404b540aSrobert
745*404b540aSrobert    <h6 class="c1"><a name="embedded_lists_3" id=
746*404b540aSrobert    "embedded_lists_3"><img src="embedded_lists_3.png" alt=
747*404b540aSrobert    "no image" /></a></h6>
748*404b540aSrobert
749*404b540aSrobert    <h6 class="c1">Non-unique mapping containers in the
750*404b540aSrobert    <tt>pb_ds</tt>.</h6>
751*404b540aSrobert
752*404b540aSrobert    <p>In the first example above, then, one would use an
753*404b540aSrobert    associative container mapping each user to an associative
754*404b540aSrobert    container which maps each application id to a start time (see
755*404b540aSrobert    <a href=
756*404b540aSrobert    "../../../../testsuite/ext/pb_ds/example/basic_multimap.cc"><tt>basic_multimap.cc</tt></a>);
757*404b540aSrobert    in the second example, one would use an associative container
758*404b540aSrobert    mapping each <tt><b>int</b></tt> to some size-type indicating
759*404b540aSrobert    the number of times it logically occurs (see <a href=
760*404b540aSrobert    "../../../../testsuite/ext/pb_ds/example/basic_multiset.cc"><tt>basic_multiset.cc</tt></a>).</p>
761*404b540aSrobert
762*404b540aSrobert    <p><a href=
763*404b540aSrobert    "assoc_performance_tests.html#multimaps">Associative-Container
764*404b540aSrobert    Performance Tests::Multimaps</a> quantifies some of these
765*404b540aSrobert    points, and <a href=
766*404b540aSrobert    "assoc_performance_tests.html#msc">Associative-Container
767*404b540aSrobert    Performance Tests::Observations::Mapping-Semantics
768*404b540aSrobert    Considerations</a> shows some simple calculations.</p>
769*404b540aSrobert
770*404b540aSrobert    <p><a href="assoc_examples.html#mmaps">Associative-Container
771*404b540aSrobert    Examples::Multimaps</a> shows some simple examples of using
772*404b540aSrobert    "multimaps".</p>
773*404b540aSrobert
774*404b540aSrobert    <p><a href="lu_based_containers.html">Design::Associative
775*404b540aSrobert    Containers::List-Based Containers</a> discusses types of
776*404b540aSrobert    containers especially suited as secondary
777*404b540aSrobert    associative-containers.</p>
778*404b540aSrobert
779*404b540aSrobert    <h2><a name="pq" id="pq">Priority Queues</a></h2>
780*404b540aSrobert
781*404b540aSrobert    <h3><a name="pq_more_ops" id="pq_more_ops">Slightly Different
782*404b540aSrobert    Methods</a></h3>
783*404b540aSrobert
784*404b540aSrobert    <p>Priority queues are containers that allow efficiently
785*404b540aSrobert    inserting values and accessing the maximal value (in the sense
786*404b540aSrobert    of the container's comparison functor); <i>i.e.</i>, their
787*404b540aSrobert    interface supports <tt>push</tt> and <tt>pop</tt>. The STL's
788*404b540aSrobert    priority queues indeed support these methods, but they support
789*404b540aSrobert    little else. For algorithmic and software-engineering purposes,
790*404b540aSrobert    other methods are needed:</p>
791*404b540aSrobert
792*404b540aSrobert    <ol>
793*404b540aSrobert      <li>Many graph algorithms [<a href=
794*404b540aSrobert      "references.html#clrs2001">clrs2001</a>] require increasing a
795*404b540aSrobert      value in a priority queue (again, in the sense of the
796*404b540aSrobert      container's comparison functor), or joining two
797*404b540aSrobert      priority-queue objects.</li>
798*404b540aSrobert
799*404b540aSrobert      <li>It is sometimes necessary to erase an arbitrary value in
800*404b540aSrobert      a priority queue. For example, consider the <tt>select</tt>
801*404b540aSrobert      function for monitoring file descriptors:
802*404b540aSrobert        <pre>
803*404b540aSrobert<b>int</b>
804*404b540aSrobert    select
805*404b540aSrobert    (<b>int</b> nfds,
806*404b540aSrobert        fd_set *readfds,
807*404b540aSrobert        fd_set *writefds,
808*404b540aSrobert        fd_set *errorfds,
809*404b540aSrobert        <b>struct</b> timeval *timeout);
810*404b540aSrobert</pre>then, as the <tt>select</tt> manual page [<a href=
811*404b540aSrobert"references.html#select_man">select_man</a>] states:
812*404b540aSrobert
813*404b540aSrobert        <p><q>The nfds argument specifies the range of file
814*404b540aSrobert        descriptors to be tested. The select() function tests file
815*404b540aSrobert        descriptors in the range of 0 to nfds-1.</q></p>
816*404b540aSrobert
817*404b540aSrobert        <p>It stands to reason, therefore, that we might wish to
818*404b540aSrobert        maintain a minimal value for <tt>nfds</tt>, and priority
819*404b540aSrobert        queues immediately come to mind. Note, though, that when a
820*404b540aSrobert        socket is closed, the minimal file description might
821*404b540aSrobert        change; in the absence of an efficient means to erase an
822*404b540aSrobert        arbitrary value from a priority queue, we might as well
823*404b540aSrobert        avoid its use altogether.</p>
824*404b540aSrobert
825*404b540aSrobert        <p><a href="pq_examples.html#xref">Priority-Queue
826*404b540aSrobert        Examples::Cross-Referencing</a> shows examples for these
827*404b540aSrobert        types of operations.</p>
828*404b540aSrobert      </li>
829*404b540aSrobert
830*404b540aSrobert      <li>STL containers typically support iterators. It is
831*404b540aSrobert      somewhat unusual for <tt>std::priority_queue</tt> to omit
832*404b540aSrobert      them (see, <i>e.g.</i>, [<a href=
833*404b540aSrobert      "references.html#meyers01stl">meyers01stl</a>]). One might
834*404b540aSrobert      ask why do priority queues need to support iterators, since
835*404b540aSrobert      they are self-organizing containers with a different purpose
836*404b540aSrobert      than abstracting sequences. There are several reasons:
837*404b540aSrobert
838*404b540aSrobert        <ol>
839*404b540aSrobert          <li>Iterators (even in self-organizing containers) are
840*404b540aSrobert          useful for many purposes, <i>e.g.</i>, cross-referencing
841*404b540aSrobert          containers, serialization, and debugging code that uses
842*404b540aSrobert          these containers.</li>
843*404b540aSrobert
844*404b540aSrobert          <li>The STL's hash-based containers support iterators,
845*404b540aSrobert          even though they too are self-organizing containers with
846*404b540aSrobert          a different purpose than abstracting sequences.</li>
847*404b540aSrobert
848*404b540aSrobert          <li>In STL-like containers, it is natural to specify the
849*404b540aSrobert          interface of operations for modifying a value or erasing
850*404b540aSrobert          a value (discussed previously) in terms of a iterators.
851*404b540aSrobert          This is discussed further in <a href=
852*404b540aSrobert          "pq_design.html#pq_it">Design::Priority
853*404b540aSrobert          Queues::Iterators</a>. It should be noted that the STL's
854*404b540aSrobert          containers also use iterators for accessing and
855*404b540aSrobert          manipulating a specific value. <i>E.g.</i>, in hash-based
856*404b540aSrobert          containers, one checks the existence of a key by
857*404b540aSrobert          comparing the iterator returned by <tt>find</tt> to the
858*404b540aSrobert          iterator returned by <tt>end</tt>, and not by comparing a
859*404b540aSrobert          pointer returned by <tt>find</tt> to <tt>NULL</tt>.</li>
860*404b540aSrobert        </ol>
861*404b540aSrobert      </li>
862*404b540aSrobert    </ol>
863*404b540aSrobert
864*404b540aSrobert    <p><a href="pq_performance_tests.html">Performance
865*404b540aSrobert    Tests::Priority Queues</a> quantifies some of these points.</p>
866*404b540aSrobert
867*404b540aSrobert    <h3><a name="pq_ds_genericity" id="pq_ds_genericity">More Data
868*404b540aSrobert    Structures and Traits</a></h3>
869*404b540aSrobert
870*404b540aSrobert    <p>There are three main implementations of priority queues: the
871*404b540aSrobert    first employs a binary heap, typically one which uses a
872*404b540aSrobert    sequence; the second uses a tree (or forest of trees), which is
873*404b540aSrobert    typically less structured than an associative container's tree;
874*404b540aSrobert    the third simply uses an associative container. These are
875*404b540aSrobert    shown, respectively, in Figures <a href=
876*404b540aSrobert    "#pq_different_underlying_dss">Underlying Priority-Queue
877*404b540aSrobert    Data-Structures</a> A1 and A2, B, and C.</p>
878*404b540aSrobert
879*404b540aSrobert    <h6 class="c1"><a name="pq_different_underlying_dss" id=
880*404b540aSrobert    "pq_different_underlying_dss"><img src=
881*404b540aSrobert    "pq_different_underlying_dss.png" alt="no image" /></a></h6>
882*404b540aSrobert
883*404b540aSrobert    <h6 class="c1">Underlying Priority-Queue Data-Structures.</h6>
884*404b540aSrobert
885*404b540aSrobert    <p>No single implementation can completely replace any of the
886*404b540aSrobert    others. Some have better <tt>push</tt> and <tt>pop</tt>
887*404b540aSrobert    amortized performance, some have better bounded (worst case)
888*404b540aSrobert    response time than others, some optimize a single method at the
889*404b540aSrobert    expense of others, <i>etc.</i>. In general the "best"
890*404b540aSrobert    implementation is dictated by the problem (see <a href=
891*404b540aSrobert    "pq_performance_tests.html#pq_observations">Performance
892*404b540aSrobert    Tests::Priority Queues::Observations</a>).</p>
893*404b540aSrobert
894*404b540aSrobert    <p>As with associative containers (see <a href=
895*404b540aSrobert    "#assoc_ds_genericity">Associative Containers::Traits for
896*404b540aSrobert    Underlying Data-Structures</a>), the more implementations
897*404b540aSrobert    co-exist, the more necessary a traits mechanism is for handling
898*404b540aSrobert    generic containers safely and efficiently. This is especially
899*404b540aSrobert    important for priority queues, since the invalidation
900*404b540aSrobert    guarantees of one of the most useful data structures - binary
901*404b540aSrobert    heaps - is markedly different than those of most of the
902*404b540aSrobert    others.</p>
903*404b540aSrobert
904*404b540aSrobert    <p><a href="pq_design.html#pq_traits">Design::Priority
905*404b540aSrobert    Queues::Traits</a> discusses this further.</p>
906*404b540aSrobert
907*404b540aSrobert    <h3><a name="pq_binary_heap" id="pq_binary_heap">Binary Heap
908*404b540aSrobert    Implementation</a></h3>
909*404b540aSrobert
910*404b540aSrobert    <p>Binary heaps are one of the most useful underlying
911*404b540aSrobert    data structures for priority queues. They are very efficient in
912*404b540aSrobert    terms of memory (since they don't require per-value structure
913*404b540aSrobert    metadata), and have the best amortized <tt>push</tt> and
914*404b540aSrobert    <tt>pop</tt> performance for primitive types (<i>e.g.</i>,
915*404b540aSrobert    <tt><b>int</b></tt>s).</p>
916*404b540aSrobert
917*404b540aSrobert    <p>The STL's <tt>priority_queue</tt> implements this data
918*404b540aSrobert    structure as an adapter over a sequence, typically
919*404b540aSrobert    <tt>std::vector</tt> or <tt>std::deque</tt>, which correspond
920*404b540aSrobert    to Figures <a href="#pq_different_underlying_dss">Underlying
921*404b540aSrobert    Priority-Queue Data-Structures</a> A1 and A2, respectively.</p>
922*404b540aSrobert
923*404b540aSrobert    <p>This is indeed an elegant example of the adapter concept and
924*404b540aSrobert    the algorithm/container/iterator decomposition (see [<a href=
925*404b540aSrobert    "references.html#nelson96stlpq">nelson96stlpql</a>]). There are
926*404b540aSrobert    possibly reasons, however, why a binary-heap priority queue
927*404b540aSrobert    would be better implemented as a container instead of a
928*404b540aSrobert    sequence adapter:</p>
929*404b540aSrobert
930*404b540aSrobert    <ol>
931*404b540aSrobert      <li><tt>std::priority_queue</tt> cannot erase values from its
932*404b540aSrobert      adapted sequence (irrespective of the sequence type). This
933*404b540aSrobert      means that the memory use of an <tt>std::priority_queue</tt>
934*404b540aSrobert      object is always proportional to the maximal number of values
935*404b540aSrobert      it ever contained, and not to the number of values that it
936*404b540aSrobert      currently contains (see <a href=
937*404b540aSrobert      "priority_queue_text_pop_mem_usage_test.html">Priority Queue
938*404b540aSrobert      Text <tt>pop</tt> Memory Use Test</a>); this implementation
939*404b540aSrobert      of binary heaps acts very differently than other underlying
940*404b540aSrobert      data structures (<i>e.g.</i>, pairing heaps).</li>
941*404b540aSrobert
942*404b540aSrobert      <li>Some combinations of adapted sequences and value types
943*404b540aSrobert      are very inefficient or just don't make sense. If one uses
944*404b540aSrobert      <tt>std::priority_queue&lt;std::vector&lt;std::string&gt;
945*404b540aSrobert      &gt; &gt;</tt>, for example, then not only will each
946*404b540aSrobert      operation perform a logarithmic number of
947*404b540aSrobert      <tt>std::string</tt> assignments, but, furthermore, any
948*404b540aSrobert      operation (including <tt>pop</tt>) can render the container
949*404b540aSrobert      useless due to exceptions. Conversely, if one uses
950*404b540aSrobert      <tt>std::priority_queue&lt;std::deque&lt;<b>int</b>&gt; &gt;
951*404b540aSrobert      &gt;</tt>, then each operation uses incurs a logarithmic
952*404b540aSrobert      number of indirect accesses (through pointers) unnecessarily.
953*404b540aSrobert      It might be better to let the container make a conservative
954*404b540aSrobert      deduction whether to use the structure in Figures <a href=
955*404b540aSrobert      "#pq_different_underlying_dss">Underlying Priority-Queue
956*404b540aSrobert      Data-Structures</a> A1 or A2.</li>
957*404b540aSrobert
958*404b540aSrobert      <li>There does not seem to be a systematic way to determine
959*404b540aSrobert      what exactly can be done with the priority queue.
960*404b540aSrobert
961*404b540aSrobert        <ol>
962*404b540aSrobert          <li>If <tt>p</tt> is a priority queue adapting an
963*404b540aSrobert          <tt>std::vector</tt>, then it is possible to iterate over
964*404b540aSrobert          all values by using <tt>&amp;p.top()</tt> and
965*404b540aSrobert          <tt>&amp;p.top() + p.size()</tt>, but this will not work
966*404b540aSrobert          if <tt>p</tt> is adapting an <tt>std::deque</tt>; in any
967*404b540aSrobert          case, one cannot use <tt>p.begin()</tt> and
968*404b540aSrobert          <tt>p.end()</tt>. If a different sequence is adapted, it
969*404b540aSrobert          is even more difficult to determine what can be
970*404b540aSrobert          done.</li>
971*404b540aSrobert
972*404b540aSrobert          <li>If <tt>p</tt> is a priority queue adapting an
973*404b540aSrobert          <tt>std::deque</tt>, then the reference return by
974*404b540aSrobert          <tt>p.top()</tt> will remain valid until it is popped,
975*404b540aSrobert          but if <tt>p</tt> adapts an <tt>std::vector</tt>, the
976*404b540aSrobert          next <tt>push</tt> will invalidate it. If a different
977*404b540aSrobert          sequence is adapted, it is even more difficult to
978*404b540aSrobert          determine what can be done.</li>
979*404b540aSrobert        </ol>
980*404b540aSrobert      </li>
981*404b540aSrobert
982*404b540aSrobert      <li>Sequence-based binary heaps can still implement
983*404b540aSrobert      linear-time <tt>erase</tt> and <tt>modify</tt> operations.
984*404b540aSrobert      This means that if one needs, <i>e.g.</i>, to erase a small
985*404b540aSrobert      (say logarithmic) number of values, then one might still
986*404b540aSrobert      choose this underlying data structure. Using
987*404b540aSrobert      <tt>std::priority_queue</tt>, however, this will generally
988*404b540aSrobert      change the order of growth of the entire sequence of
989*404b540aSrobert      operations.</li>
990*404b540aSrobert    </ol>
991*404b540aSrobert  </div>
992*404b540aSrobert</body>
993*404b540aSrobert</html>
994