1*404b540aSrobert<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" 2*404b540aSrobert "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> 3*404b540aSrobert 4*404b540aSrobert<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> 5*404b540aSrobert<head> 6*404b540aSrobert <meta name="generator" content= 7*404b540aSrobert "HTML Tidy for Linux/x86 (vers 12 April 2005), see www.w3.org" /> 8*404b540aSrobert 9*404b540aSrobert <title>Motivation</title> 10*404b540aSrobert <meta http-equiv="Content-Type" content= 11*404b540aSrobert "text/html; charset=us-ascii" /> 12*404b540aSrobert </head> 13*404b540aSrobert 14*404b540aSrobert<body> 15*404b540aSrobert <div id="page"> 16*404b540aSrobert <h1>Motivation</h1> 17*404b540aSrobert 18*404b540aSrobert <p>Many fine associative-container libraries were already 19*404b540aSrobert written, most notably, the STL's associative containers. Why 20*404b540aSrobert then write another library? This section shows some possible 21*404b540aSrobert advantages of this library, when considering the challenges in 22*404b540aSrobert <a href="introduction.html">Introduction</a>. Many of these 23*404b540aSrobert points stem from the fact that the STL introduced 24*404b540aSrobert associative-containers in a two-step process (first 25*404b540aSrobert standardizing tree-based containers, only then adding 26*404b540aSrobert hash-based containers, which are fundamentally different), did 27*404b540aSrobert not standardize priority queues as containers, and (in our 28*404b540aSrobert opinion) overloads the iterator concept.</p> 29*404b540aSrobert 30*404b540aSrobert <h2><a name="assoc" id="assoc">Associative Containers</a></h2> 31*404b540aSrobert 32*404b540aSrobert <h3><a name="assoc_policies" id="assoc_policies">More 33*404b540aSrobert Configuration Choices</a></h3> 34*404b540aSrobert 35*404b540aSrobert <p>Associative containers require a relatively large number of 36*404b540aSrobert policies to function efficiently in various settings. In some 37*404b540aSrobert cases this is needed for making their common operations more 38*404b540aSrobert efficient, and in other cases this allows them to support a 39*404b540aSrobert larger set of operations</p> 40*404b540aSrobert 41*404b540aSrobert <ol> 42*404b540aSrobert <li>Hash-based containers, for example, support look-up and 43*404b540aSrobert insertion methods (<i>e.g.</i>, <tt>find</tt> and 44*404b540aSrobert <tt>insert</tt>). In order to locate elements quickly, they 45*404b540aSrobert are supplied a hash functor, which instruct how to transform 46*404b540aSrobert a key object into some size type; <i>e.g.</i>, a hash functor 47*404b540aSrobert might transform <tt>"hello"</tt> into <tt>1123002298</tt>. A 48*404b540aSrobert hash table, though, requires transforming each key object 49*404b540aSrobert into some size-type type in some specific domain; 50*404b540aSrobert <i>e.g.</i>, a hash table with a 128-long table might 51*404b540aSrobert transform <tt>"hello"</tt> into position 63. The policy by 52*404b540aSrobert which the hash value is transformed into a position within 53*404b540aSrobert the table can dramatically affect performance (see <a href= 54*404b540aSrobert "hash_based_containers.html#hash_policies">Design::Associative 55*404b540aSrobert Containers::Hash-Based Containers::Hash Policies</a>). 56*404b540aSrobert Hash-based containers also do not resize naturally (as 57*404b540aSrobert opposed to tree-based containers, for example). The 58*404b540aSrobert appropriate resize policy is unfortunately intertwined with 59*404b540aSrobert the policy that transforms hash value into a position within 60*404b540aSrobert the table (see <a href= 61*404b540aSrobert "hash_based_containers.html#resize_policies">Design::Associative 62*404b540aSrobert Containers::Hash-Based Containers::Resize Policies</a>). 63*404b540aSrobert 64*404b540aSrobert <p><a href= 65*404b540aSrobert "assoc_performance_tests.html#hash_based">Associative-Container 66*404b540aSrobert Performance Tests::Hash-Based Containers</a> quantifies 67*404b540aSrobert some of these points.</p> 68*404b540aSrobert </li> 69*404b540aSrobert 70*404b540aSrobert <li>Tree-based containers, for example, also support look-up 71*404b540aSrobert and insertion methods, and are primarily useful when 72*404b540aSrobert maintaining order between elements is important. In some 73*404b540aSrobert cases, though, one can utilize their balancing algorithms for 74*404b540aSrobert completely different purposes. 75*404b540aSrobert 76*404b540aSrobert <p>Figure <a href="#node_invariants">Metadata for 77*404b540aSrobert order-statistics and interval intersections</a>-A, for 78*404b540aSrobert example, shows a tree whose each node contains two entries: 79*404b540aSrobert a floating-point key, and some size-type <i>metadata</i> 80*404b540aSrobert (in bold beneath it) that is the number of nodes in the 81*404b540aSrobert sub-tree. (<i>E.g.</i>, the root has key 0.99, and has 5 82*404b540aSrobert nodes (including itself) in its sub-tree.) A container based 83*404b540aSrobert on this data structure can obviously answer efficiently 84*404b540aSrobert whether 0.3 is in the container object, but it can also 85*404b540aSrobert answer what is the order of 0.3 among all those in the 86*404b540aSrobert container object [<a href= 87*404b540aSrobert "references.html#clrs2001">clrs2001</a>] (see <a href= 88*404b540aSrobert "assoc_examples.html#tree_like_based">Associative Container 89*404b540aSrobert Examples::Tree-Like-Based Containers (Trees and 90*404b540aSrobert Tries)</a>).</p> 91*404b540aSrobert 92*404b540aSrobert <p>As another example, Figure <a href= 93*404b540aSrobert "#node_invariants">Metadata for order-statistics and 94*404b540aSrobert interval intersections</a>-B shows a tree whose each node 95*404b540aSrobert contains two entries: a half-open geometric line interval, 96*404b540aSrobert and a number <i>metadata</i> (in bold beneath it) that is 97*404b540aSrobert the largest endpoint of all intervals in its sub-tree. 98*404b540aSrobert (<i>E.g.</i>, the root describes the interval <i>[20, 99*404b540aSrobert 36)</i>, and the largest endpoint in its sub-tree is 99.) A 100*404b540aSrobert container based on this data structure can obviously answer 101*404b540aSrobert efficiently whether <i>[3, 41)</i> is in the container 102*404b540aSrobert object, but it can also answer efficiently whether the 103*404b540aSrobert container object has intervals that intersect <i>[3, 104*404b540aSrobert 41)</i> (see <a href= 105*404b540aSrobert "assoc_examples.html#tree_like_based">Associative Container 106*404b540aSrobert Examples::Tree-Like-Based Containers (Trees and 107*404b540aSrobert Tries)</a>). These types of queries are very useful in 108*404b540aSrobert geometric algorithms and lease-management algorithms.</p> 109*404b540aSrobert 110*404b540aSrobert <p>It is important to note, however, that as the trees are 111*404b540aSrobert modified, their internal structure changes. To maintain 112*404b540aSrobert these invariants, one must supply some policy that is aware 113*404b540aSrobert of these changes (see <a href= 114*404b540aSrobert "tree_based_containers.html#invariants">Design::Associative 115*404b540aSrobert Containers::Tree-Based Containers::Node Invariants</a>); 116*404b540aSrobert without this, it would be better to use a linked list (in 117*404b540aSrobert itself very efficient for these purposes).</p> 118*404b540aSrobert 119*404b540aSrobert <p><a href= 120*404b540aSrobert "assoc_performance_tests.html#tree_like_based">Associative-Container 121*404b540aSrobert Performance Tests::Tree-Like-Based Containers</a> 122*404b540aSrobert quantifies some of these points.</p> 123*404b540aSrobert </li> 124*404b540aSrobert </ol> 125*404b540aSrobert 126*404b540aSrobert <h6 class="c1"><a name="node_invariants" id= 127*404b540aSrobert "node_invariants"><img src="node_invariants.png" alt= 128*404b540aSrobert "no image" /></a></h6> 129*404b540aSrobert 130*404b540aSrobert <h6 class="c1">Metadata for order-statistics and interval 131*404b540aSrobert intersections.</h6> 132*404b540aSrobert 133*404b540aSrobert <h3><a name="assoc_ds_genericity" id="assoc_ds_genericity">More 134*404b540aSrobert Data Structures and Traits</a></h3> 135*404b540aSrobert 136*404b540aSrobert <p>The STL contains associative containers based on red-black 137*404b540aSrobert trees and collision-chaining hash tables. These are obviously 138*404b540aSrobert very useful, but they are not ideal for all types of 139*404b540aSrobert settings.</p> 140*404b540aSrobert 141*404b540aSrobert <p>Figure <a href= 142*404b540aSrobert "#different_underlying_data_structures">Different underlying 143*404b540aSrobert data structures</a> shows different underlying data structures 144*404b540aSrobert (the ones currently supported in <tt>pb_ds</tt>). A shows a 145*404b540aSrobert collision-chaining hash-table, B shows a probing hash-table, C 146*404b540aSrobert shows a red-black tree, D shows a splay tree, E shows a tree 147*404b540aSrobert based on an ordered vector(implicit in the order of the 148*404b540aSrobert elements), F shows a PATRICIA trie, and G shows a list-based 149*404b540aSrobert container with update policies.</p> 150*404b540aSrobert 151*404b540aSrobert <p>Each of these data structures has some performance benefits, 152*404b540aSrobert in terms of speed, size or both (see <a href= 153*404b540aSrobert "assoc_performance_tests.html">Associative-Container 154*404b540aSrobert Performance Tests</a> and <a href= 155*404b540aSrobert "assoc_performance_tests.html#dss_family_choice">Associative-Container 156*404b540aSrobert Performance Tests::Observations::Underlying Data-Structure 157*404b540aSrobert Families</a>). For now, though, note that <i>e.g.</i>, 158*404b540aSrobert vector-based trees and probing hash tables manipulate memory 159*404b540aSrobert more efficiently than red-black trees and collision-chaining 160*404b540aSrobert hash tables, and that list-based associative containers are 161*404b540aSrobert very useful for constructing "multimaps" (see <a href= 162*404b540aSrobert "#assoc_mapping_semantics">Alternative to Multiple Equivalent 163*404b540aSrobert Keys</a>, <a href= 164*404b540aSrobert "assoc_performance_tests.html#multimaps">Associative Container 165*404b540aSrobert Performance Tests::Multimaps</a>, and <a href= 166*404b540aSrobert "assoc_performance_tests.html#msc">Associative Container 167*404b540aSrobert Performance Tests::Observations::Mapping-Semantics 168*404b540aSrobert Considerations</a>).</p> 169*404b540aSrobert 170*404b540aSrobert <h6 class="c1"><a name="different_underlying_data_structures" 171*404b540aSrobert id="different_underlying_data_structures"><img src= 172*404b540aSrobert "different_underlying_dss.png" alt="no image" /></a></h6> 173*404b540aSrobert 174*404b540aSrobert <h6 class="c1">Different underlying data structures.</h6> 175*404b540aSrobert 176*404b540aSrobert <p>Now consider a function manipulating a generic associative 177*404b540aSrobert container, <i>e.g.</i>,</p> 178*404b540aSrobert <pre> 179*404b540aSrobert<b>template</b>< 180*404b540aSrobert <b>class</b> Cntnr> 181*404b540aSrobert<b>int</b> 182*404b540aSrobert some_op_sequence 183*404b540aSrobert (Cntnr &r_cnt) 184*404b540aSrobert{ 185*404b540aSrobert ... 186*404b540aSrobert} 187*404b540aSrobert</pre> 188*404b540aSrobert 189*404b540aSrobert <p>Ideally, the underlying data structure of <tt>Cntnr</tt> 190*404b540aSrobert would not affect what can be done with <tt>r_cnt</tt>. 191*404b540aSrobert Unfortunately, this is not the case.</p> 192*404b540aSrobert 193*404b540aSrobert <p>For example, if <tt>Cntnr</tt> is <tt>std::map</tt>, then 194*404b540aSrobert the function can use <tt>std::for_each(r_cnt.find(foo), 195*404b540aSrobert r_cnt.find(bar), foobar)</tt> in order to apply <tt>foobar</tt> 196*404b540aSrobert to all elements between <tt>foo</tt> and <tt>bar</tt>. If 197*404b540aSrobert <tt>Cntnr</tt> is a hash-based container, then this call's 198*404b540aSrobert results are undefined.</p> 199*404b540aSrobert 200*404b540aSrobert <p>Also, if <tt>Cntnr</tt> is tree-based, the type and object 201*404b540aSrobert of the comparison functor can be accessed. If <tt>Cntnr</tt> is 202*404b540aSrobert hash based, these queries are nonsensical.</p> 203*404b540aSrobert 204*404b540aSrobert <p>There are various other differences based on the container's 205*404b540aSrobert underlying data structure. For one, they can be constructed by, 206*404b540aSrobert and queried for, different policies. Furthermore:</p> 207*404b540aSrobert 208*404b540aSrobert <ol> 209*404b540aSrobert <li>Containers based on C, D, E and F store elements in a 210*404b540aSrobert meaningful order; the others store elements in a meaningless 211*404b540aSrobert (and probably time-varying) order. By implication, only 212*404b540aSrobert containers based on C, D, E and F can support erase 213*404b540aSrobert operations taking an iterator and returning an iterator to 214*404b540aSrobert the following element without performance loss (see <a href= 215*404b540aSrobert "#assoc_ers_methods">Slightly Different Methods::Methods 216*404b540aSrobert Related to Erase</a>).</li> 217*404b540aSrobert 218*404b540aSrobert <li>Containers based on C, D, E, and F can be split and 219*404b540aSrobert joined efficiently, while the others cannot. Containers based 220*404b540aSrobert on C and D, furthermore, can guarantee that this is 221*404b540aSrobert exception-free; containers based on E cannot guarantee 222*404b540aSrobert this.</li> 223*404b540aSrobert 224*404b540aSrobert <li>Containers based on all but E can guarantee that erasing 225*404b540aSrobert an element is exception free; containers based on E cannot 226*404b540aSrobert guarantee this. Containers based on all but B and E can 227*404b540aSrobert guarantee that modifying an object of their type does not 228*404b540aSrobert invalidate iterators or references to their elements, while 229*404b540aSrobert containers based on B and E cannot. Containers based on C, D, 230*404b540aSrobert and E can furthermore make a stronger guarantee, namely that 231*404b540aSrobert modifying an object of their type does not affect the order 232*404b540aSrobert of iterators.</li> 233*404b540aSrobert </ol> 234*404b540aSrobert 235*404b540aSrobert <p>A unified tag and traits system (as used for the STL's 236*404b540aSrobert iterators, for example) can ease generic manipulation of 237*404b540aSrobert associative containers based on different underlying 238*404b540aSrobert data structures (see <a href= 239*404b540aSrobert "tutorial.html#assoc_ds_gen">Tutorial::Associative 240*404b540aSrobert Containers::Determining Containers' Attributes</a> and <a href= 241*404b540aSrobert "ds_gen.html#container_traits">Design::Associative 242*404b540aSrobert Containers::Data-Structure Genericity::Data-Structure Tags and 243*404b540aSrobert Traits</a>).</p> 244*404b540aSrobert 245*404b540aSrobert <h3><a name="assoc_diff_it" id="assoc_diff_it">Differentiating 246*404b540aSrobert between Iterator Types</a></h3> 247*404b540aSrobert 248*404b540aSrobert <p>Iterators are centric to the STL's design, because of the 249*404b540aSrobert container/algorithm/iterator decomposition that allows an 250*404b540aSrobert algorithm to operate on a range through iterators of some 251*404b540aSrobert sequence (<i>e.g.</i>, one originating from a container). 252*404b540aSrobert Iterators, then, are useful because they allow going over a 253*404b540aSrobert <u>sequence</u>. The STL also uses iterators for accessing a 254*404b540aSrobert <u>specific</u> element - <i>e.g.</i>, when an associative 255*404b540aSrobert container returns one through <tt>find</tt>. The STL, however, 256*404b540aSrobert consistently uses the same types of iterators for both 257*404b540aSrobert purposes: going over a range, and accessing a specific found 258*404b540aSrobert element. Before the introduction of hash-based containers to 259*404b540aSrobert the STL, this made sense (with the exception of priority 260*404b540aSrobert queues, which are discussed in <a href="#pq">Priority 261*404b540aSrobert Queues</a>).</p> 262*404b540aSrobert 263*404b540aSrobert <p>Using the STL's associative containers together with 264*404b540aSrobert non-order-preserving associative containers (and also because 265*404b540aSrobert of priority-queues container), there is a possible need for 266*404b540aSrobert different types of iterators for self-organizing containers - 267*404b540aSrobert the iterator concept seems overloaded to mean two different 268*404b540aSrobert things (in some cases). The following subsections explain this; 269*404b540aSrobert <a href="tutorial.html#assoc_find_range">Tutorial::Associative 270*404b540aSrobert Containers::Point-Type and Range-Type Methods</a> explains an 271*404b540aSrobert alternative design which does not complicate the use of 272*404b540aSrobert order-preserving containers, but is better for unordered 273*404b540aSrobert containers; <a href= 274*404b540aSrobert "ds_gen.html#find_range">Design::Associative 275*404b540aSrobert Containers::Data-Structure Genericity::Point-Type and 276*404b540aSrobert Range-Type Methods</a> explains the design further.</p> 277*404b540aSrobert 278*404b540aSrobert <h4><a name="assoc_find_it_range_it" id= 279*404b540aSrobert "assoc_find_it_range_it">Using Point-Type Iterators for 280*404b540aSrobert Range-Type Operations</a></h4> 281*404b540aSrobert 282*404b540aSrobert <p>Suppose <tt>cntnr</tt> is some associative container, and 283*404b540aSrobert say <tt>c</tt> is an object of type <tt>cntnr</tt>. Then what 284*404b540aSrobert will be the outcome of</p> 285*404b540aSrobert <pre> 286*404b540aSrobertstd::for_each(c.find(1), c.find(5), foo); 287*404b540aSrobert</pre> 288*404b540aSrobert 289*404b540aSrobert <p>If <tt>cntnr</tt> is a tree-based container object, then an 290*404b540aSrobert in-order walk will apply <tt>foo</tt> to the relevant elements, 291*404b540aSrobert <i>e.g.</i>, as in Figure <a href="#range_it_in_hts">Range 292*404b540aSrobert iteration in different data structures</a> -A. If <tt>c</tt> is 293*404b540aSrobert a hash-based container, then the order of elements between any 294*404b540aSrobert two elements is undefined (and probably time-varying); there is 295*404b540aSrobert no guarantee that the elements traversed will coincide with the 296*404b540aSrobert <i>logical</i> elements between 1 and 5, <i>e.g.</i>, as in 297*404b540aSrobert Figure <a href="#range_it_in_hts">Range iteration in different 298*404b540aSrobert data structures</a>-B.</p> 299*404b540aSrobert 300*404b540aSrobert <h6 class="c1"><a name="range_it_in_hts" id= 301*404b540aSrobert "range_it_in_hts"><img src="point_iterators_range_ops_1.png" 302*404b540aSrobert alt="no image" /></a></h6> 303*404b540aSrobert 304*404b540aSrobert <h6 class="c1">Range iteration in different 305*404b540aSrobert data structures.</h6> 306*404b540aSrobert 307*404b540aSrobert <p>In our opinion, this problem is not caused just because 308*404b540aSrobert red-black trees are order preserving while collision-chaining 309*404b540aSrobert hash tables are (generally) not - it is more fundamental. Most 310*404b540aSrobert of the STL's containers order sequences in a well-defined 311*404b540aSrobert manner that is determined by their <u>interface</u>: calling 312*404b540aSrobert <tt>insert</tt> on a tree-based container modifies its sequence 313*404b540aSrobert in a predictable way, as does calling <tt>push_back</tt> on a 314*404b540aSrobert list or a vector. Conversely, collision-chaining hash tables, 315*404b540aSrobert probing hash tables, priority queues, and list-based containers 316*404b540aSrobert (which are very useful for "multimaps") are self-organizing 317*404b540aSrobert data structures; the effect of each operation modifies their 318*404b540aSrobert sequences in a manner that is (practically) determined by their 319*404b540aSrobert <u>implementation</u>.</p> 320*404b540aSrobert 321*404b540aSrobert <p>Consequently, applying an algorithm to a sequence obtained 322*404b540aSrobert from most containers <u>may or may not</u> make sense, but 323*404b540aSrobert applying it to a sub-sequence of a self-organizing container 324*404b540aSrobert <u>does not</u>.</p> 325*404b540aSrobert 326*404b540aSrobert <h4><a name="assoc_range_it_for_find_it" id= 327*404b540aSrobert "assoc_range_it_for_find_it">The Cost of Enabling Range 328*404b540aSrobert Capabilities to Point-Type Iterators</a></h4> 329*404b540aSrobert 330*404b540aSrobert <p>Suppose <tt>c</tt> is some collision-chaining hash-based 331*404b540aSrobert container object, and one calls <tt>c.find(3)</tt>. Then what 332*404b540aSrobert composes the returned iterator?</p> 333*404b540aSrobert 334*404b540aSrobert <p>Figure <a href="#find_its_in_hash_tables">Point-type 335*404b540aSrobert iterators in hash tables</a>-A shows the simplest (and most 336*404b540aSrobert efficient) implementation of a collision-chaining hash table. 337*404b540aSrobert The little box marked <tt>point_iterator</tt> shows an object 338*404b540aSrobert that contains a pointer to the element's node. Note that this 339*404b540aSrobert "iterator" has no way to move to the next element (<i>i.e.</i>, 340*404b540aSrobert it cannot support <tt><b>operator</b>++</tt>). Conversely, the 341*404b540aSrobert little box marked <tt>iterator</tt> stores both a pointer to 342*404b540aSrobert the element, as well as some other information (<i>e.g.</i>, 343*404b540aSrobert the bucket number of the element). the second iterator, then, 344*404b540aSrobert is "heavier" than the first one- it requires more time and 345*404b540aSrobert space. If we were to use a different container to 346*404b540aSrobert cross-reference into this hash-table using these iterators - it 347*404b540aSrobert would take much more space. As noted in <a href= 348*404b540aSrobert "#assoc_find_it_range_it">Using Point-Type Iterators for 349*404b540aSrobert Range-Type Operations</a>, nothing much can be done by 350*404b540aSrobert incrementing these iterators, so why is this extra information 351*404b540aSrobert needed?</p> 352*404b540aSrobert 353*404b540aSrobert <p>Alternatively, one might create a collision-chaining 354*404b540aSrobert hash-table where the lists might be linked, forming a 355*404b540aSrobert monolithic total-element list, as in Figure <a href= 356*404b540aSrobert "#find_its_in_hash_tables">Point-type iterators in hash 357*404b540aSrobert tables</a>-B (this seems similar to the Dinkumware design 358*404b540aSrobert [<a href="references.html#dinkumware_stl">dinkumware_stl</a>]). 359*404b540aSrobert Here the iterators are as light as can be, but the hash-table's 360*404b540aSrobert operations are more complicated.</p> 361*404b540aSrobert 362*404b540aSrobert <h6 class="c1"><a name="find_its_in_hash_tables" id= 363*404b540aSrobert "find_its_in_hash_tables"><img src= 364*404b540aSrobert "point_iterators_range_ops_2.png" alt="no image" /></a></h6> 365*404b540aSrobert 366*404b540aSrobert <h6 class="c1">Point-type iterators in hash tables.</h6> 367*404b540aSrobert 368*404b540aSrobert <p>It should be noted that containers based on 369*404b540aSrobert collision-chaining hash-tables are not the only ones with this 370*404b540aSrobert type of behavior; many other self-organizing data structures 371*404b540aSrobert display it as well.</p> 372*404b540aSrobert 373*404b540aSrobert <h4><a name="assoc_inv_guar" id="assoc_inv_guar">Invalidation 374*404b540aSrobert Guarantees</a></h4> 375*404b540aSrobert 376*404b540aSrobert <p>Consider the following snippet:</p> 377*404b540aSrobert <pre> 378*404b540aSrobertit = c.find(3); 379*404b540aSrobert 380*404b540aSrobertc.erase(5); 381*404b540aSrobert</pre> 382*404b540aSrobert 383*404b540aSrobert <p>Following the call to <tt>erase</tt>, what is the validity 384*404b540aSrobert of <tt>it</tt>: can it be de-referenced? can it be 385*404b540aSrobert incremented?</p> 386*404b540aSrobert 387*404b540aSrobert <p>The answer depends on the underlying data structure of the 388*404b540aSrobert container. Figure <a href= 389*404b540aSrobert "#invalidation_guarantee_erase">Effect of erase in different 390*404b540aSrobert underlying data structures</a> shows three cases: A1 and A2 391*404b540aSrobert show a red-black tree; B1 and B2 show a probing hash-table; C1 392*404b540aSrobert and C2 show a collision-chaining hash table.</p> 393*404b540aSrobert 394*404b540aSrobert <h6 class="c1"><a name="invalidation_guarantee_erase" id= 395*404b540aSrobert "invalidation_guarantee_erase"><img src= 396*404b540aSrobert "invalidation_guarantee_erase.png" alt="no image" /></a></h6> 397*404b540aSrobert 398*404b540aSrobert <h6 class="c1">Effect of erase in different underlying 399*404b540aSrobert data structures.</h6> 400*404b540aSrobert 401*404b540aSrobert <ol> 402*404b540aSrobert <li>Erasing 5 from A1 yields A2. Clearly, an iterator to 3 403*404b540aSrobert can be de-referenced and incremented. The sequence of 404*404b540aSrobert iterators changed, but in a way that is well-defined by the 405*404b540aSrobert <u>interface</u>.</li> 406*404b540aSrobert 407*404b540aSrobert <li>Erasing 5 from B1 yields B2. Clearly, an iterator to 3 is 408*404b540aSrobert not valid at all - it cannot be de-referenced or incremented; 409*404b540aSrobert the order of iterators changed in a way that is (practically) 410*404b540aSrobert determined by the <u>implementation</u> and not by the 411*404b540aSrobert <u>interface</u>.</li> 412*404b540aSrobert 413*404b540aSrobert <li>Erasing 5 from C1 yields C2. Here the situation is more 414*404b540aSrobert complicated. On the one hand, there is no problem in 415*404b540aSrobert de-referencing <tt>it</tt>. On the other hand, the order of 416*404b540aSrobert iterators changed in a way that is (practically) determined 417*404b540aSrobert by the <u>implementation</u> and not by the 418*404b540aSrobert <u>interface</u>.</li> 419*404b540aSrobert </ol> 420*404b540aSrobert 421*404b540aSrobert <p>So in classic STL, it is not always possible to express 422*404b540aSrobert whether <tt>it</tt> is valid or not. This is true also for 423*404b540aSrobert <tt>insert</tt>, <i>e.g.</i>. Again, the iterator concept seems 424*404b540aSrobert overloaded.</p> 425*404b540aSrobert 426*404b540aSrobert <h3><a name="assoc_methods" id="assoc_methods">Slightly 427*404b540aSrobert Different Methods</a></h3> 428*404b540aSrobert 429*404b540aSrobert <p>[<a href="references.html#meyers02both">meyers02both</a>] 430*404b540aSrobert points out that a class's methods should comprise only 431*404b540aSrobert operations which depend on the class's internal structure; 432*404b540aSrobert other operations are best designed as external functions. 433*404b540aSrobert Possibly, therefore, the STL's associative containers lack some 434*404b540aSrobert useful methods, and provide some other methods which would be 435*404b540aSrobert better left out (<i>e.g.</i>, [<a href= 436*404b540aSrobert "references.html#sgi_stl">sgi_stl</a>] ).</p> 437*404b540aSrobert 438*404b540aSrobert <h4><a name="assoc_ers_methods" id="assoc_ers_methods">Methods 439*404b540aSrobert Related to Erase</a></h4> 440*404b540aSrobert 441*404b540aSrobert <ol> 442*404b540aSrobert <li>Order-preserving STL associative containers provide the 443*404b540aSrobert method 444*404b540aSrobert <pre> 445*404b540aSrobertiterator 446*404b540aSrobert erase 447*404b540aSrobert (iterator it) 448*404b540aSrobert</pre>which takes an iterator, erases the corresponding element, 449*404b540aSrobertand returns an iterator to the following element. Also hash-based 450*404b540aSrobertSTL associative containers provide this method. This <u>seemingly 451*404b540aSrobertincreases</u> genericity between associative containers, since, <i> 452*404b540aSrobert e.g.</i>, it is possible to use 453*404b540aSrobert <pre> 454*404b540aSrobert<b>typename</b> C::iterator it = c.begin(); 455*404b540aSrobert<b>typename</b> C::iterator e_it = c.end(); 456*404b540aSrobert 457*404b540aSrobert<b>while</b>(it != e_it) 458*404b540aSrobert it = pred(*it)? c.erase(it) : ++it; 459*404b540aSrobert</pre>in order to erase from a container object <tt> 460*404b540aSrobert c</tt> all element which match a predicate <tt>pred</tt>. 461*404b540aSrobert However, in a different sense this actually 462*404b540aSrobert <u>decreases</u> genericity: an integral implication of 463*404b540aSrobert this method is that tree-based associative containers' 464*404b540aSrobert memory use is linear in the total number of elements they 465*404b540aSrobert store, while hash-based containers' memory use is unbounded 466*404b540aSrobert in the total number of elements they store. Assume a 467*404b540aSrobert hash-based container is allowed to decrease its size when 468*404b540aSrobert an element is erased. Then the elements might be rehashed, 469*404b540aSrobert which means that there is no "next" element - it is simply 470*404b540aSrobert undefined. Consequently, it is possible to infer from the 471*404b540aSrobert fact that STL hash-based containers provide this method 472*404b540aSrobert that they cannot downsize when elements are erased 473*404b540aSrobert (<a href="assoc_performance_tests.html#hash_based">Performance 474*404b540aSrobert Tests::Hash-Based Container Tests</a> quantifies this.) As 475*404b540aSrobert a consequence, different code is needed to manipulate 476*404b540aSrobert different containers, assuming that memory should be 477*404b540aSrobert conserved. <tt>pb_ds</tt>'s non-order preserving 478*404b540aSrobert associative containers omit this method. 479*404b540aSrobert </li> 480*404b540aSrobert 481*404b540aSrobert <li>All of <tt>pb_ds</tt>'s associative containers include a 482*404b540aSrobert conditional-erase method 483*404b540aSrobert <pre> 484*404b540aSrobert<b>template</b>< 485*404b540aSrobert <b>class</b> Pred> 486*404b540aSrobertsize_type 487*404b540aSrobert erase_if 488*404b540aSrobert (Pred pred) 489*404b540aSrobert</pre>which erases all elements matching a predicate. This is 490*404b540aSrobertprobably the only way to ensure linear-time multiple-item erase 491*404b540aSrobertwhich can actually downsize a container. 492*404b540aSrobert </li> 493*404b540aSrobert 494*404b540aSrobert <li>STL associative containers provide methods for 495*404b540aSrobert multiple-item erase of the form 496*404b540aSrobert <pre> 497*404b540aSrobertsize_type 498*404b540aSrobert erase 499*404b540aSrobert (It b, 500*404b540aSrobert It e) 501*404b540aSrobert</pre>erasing a range of elements given by a pair of iterators. For 502*404b540aSroberttree-based or trie-based containers, this can implemented more 503*404b540aSrobertefficiently as a (small) sequence of split and join operations. For 504*404b540aSrobertother, unordered, containers, this method isn't much better than an 505*404b540aSrobertexternal loop. Moreover, if <tt>c</tt> is a hash-based container, 506*404b540aSrobertthen, <i>e.g.</i>, <tt>c.erase(c.find(2), c.find(5))</tt> is almost 507*404b540aSrobertcertain to do something different than erasing all elements whose 508*404b540aSrobertkeys are between 2 and 5, and is likely to produce other undefined 509*404b540aSrobertbehavior. 510*404b540aSrobert </li> 511*404b540aSrobert </ol> 512*404b540aSrobert 513*404b540aSrobert <h4><a name="assoc_split_join_methods" id= 514*404b540aSrobert "assoc_split_join_methods">Methods Related to Split and 515*404b540aSrobert Join</a></h4> 516*404b540aSrobert 517*404b540aSrobert <p>It is well-known that tree-based and trie-based container 518*404b540aSrobert objects can be efficiently split or joined [<a href= 519*404b540aSrobert "references.html#clrs2001">clrs2001</a>]. Externally splitting 520*404b540aSrobert or joining trees is super-linear, and, furthermore, can throw 521*404b540aSrobert exceptions. Split and join methods, consequently, seem good 522*404b540aSrobert choices for tree-based container methods, especially, since as 523*404b540aSrobert noted just before, they are efficient replacements for erasing 524*404b540aSrobert sub-sequences. <a href= 525*404b540aSrobert "assoc_performance_tests.html#tree_like_based">Performance 526*404b540aSrobert Tests::Tree-Like-Based Container Tests</a> quantifies this.</p> 527*404b540aSrobert 528*404b540aSrobert <h4><a name="assoc_insert_methods" id= 529*404b540aSrobert "assoc_insert_methods">Methods Related to Insert</a></h4> 530*404b540aSrobert 531*404b540aSrobert <p>STL associative containers provide methods of the form</p> 532*404b540aSrobert <pre> 533*404b540aSrobert<b>template</b>< 534*404b540aSrobert <b>class</b> It> 535*404b540aSrobertsize_type 536*404b540aSrobert insert 537*404b540aSrobert (It b, 538*404b540aSrobert It e); 539*404b540aSrobert</pre>for inserting a range of elements given by a pair of 540*404b540aSrobertiterators. At best, this can be implemented as an external loop, 541*404b540aSrobertor, even more efficiently, as a join operation (for the case of 542*404b540aSroberttree-based or trie-based containers). Moreover, these methods seem 543*404b540aSrobertsimilar to constructors taking a range given by a pair of 544*404b540aSrobertiterators; the constructors, however, are transactional, whereas 545*404b540aSrobertthe insert methods are not; this is possibly confusing. 546*404b540aSrobert 547*404b540aSrobert <h4><a name="assoc_equiv_comp_methods" id= 548*404b540aSrobert "assoc_equiv_comp_methods">Functions Related to 549*404b540aSrobert Comparison</a></h4> 550*404b540aSrobert 551*404b540aSrobert <p>Associative containers are parametrized by policies 552*404b540aSrobert allowing to test key equivalence; <i>e.g.</i> a hash-based 553*404b540aSrobert container can do this through its equivalence functor, and a 554*404b540aSrobert tree-based container can do this through its comparison 555*404b540aSrobert functor. In addition, some STL associative containers have 556*404b540aSrobert global function operators, <i>e.g.</i>, 557*404b540aSrobert <tt><b>operator</b>==</tt> and <tt><b>operator</b><=</tt>, 558*404b540aSrobert that allow comparing entire associative containers.</p> 559*404b540aSrobert 560*404b540aSrobert <p>In our opinion, these functions are better left out. To 561*404b540aSrobert begin with, they do not significantly improve over an external 562*404b540aSrobert loop. More importantly, however, they are possibly misleading - 563*404b540aSrobert <tt><b>operator</b>==</tt>, for example, usually checks for 564*404b540aSrobert equivalence, or interchangeability, but the associative 565*404b540aSrobert container cannot check for values' equivalence, only keys' 566*404b540aSrobert equivalence; also, are two containers considered equivalent if 567*404b540aSrobert they store the same values in different order? this is an 568*404b540aSrobert arbitrary decision.</p> 569*404b540aSrobert 570*404b540aSrobert <h3><a name="assoc_mapping_semantics" id= 571*404b540aSrobert "assoc_mapping_semantics">Alternative to Multiple Equivalent 572*404b540aSrobert Keys</a></h3> 573*404b540aSrobert 574*404b540aSrobert <p>Maps (or sets) allow mapping (or storing) unique-key values. 575*404b540aSrobert The STL, however, also supplies associative containers which 576*404b540aSrobert map (or store) multiple values with equivalent keys: 577*404b540aSrobert <tt>std::multimap</tt>, <tt>std::multiset</tt>, 578*404b540aSrobert <tt>std::tr1::unordered_multimap</tt>, and 579*404b540aSrobert <tt>unordered_multiset</tt>. We first discuss how these might 580*404b540aSrobert be used, then why we think it is best to avoid them.</p> 581*404b540aSrobert 582*404b540aSrobert <p>Suppose one builds a simple bank-account application that 583*404b540aSrobert records for each client (identified by an <tt>std::string</tt>) 584*404b540aSrobert and account-id (marked by an <tt><b>unsigned long</b></tt>) - 585*404b540aSrobert the balance in the account (described by a 586*404b540aSrobert <tt><b>float</b></tt>). Suppose further that ordering this 587*404b540aSrobert information is not useful, so a hash-based container is 588*404b540aSrobert preferable to a tree based container. Then one can use</p> 589*404b540aSrobert <pre> 590*404b540aSrobertstd::tr1::unordered_map<std::pair<std::string, <b>unsigned long</b>>, <b>float</b>, ...> 591*404b540aSrobert</pre>which <u>hashes every combination of client and 592*404b540aSrobertaccount-id</u>. This might work well, except for the fact that it 593*404b540aSrobertis now impossible to efficiently list all of the accounts of a 594*404b540aSrobertspecific client (this would practically require iterating over all 595*404b540aSrobertentries). Instead, one can use 596*404b540aSrobert <pre> 597*404b540aSrobertstd::tr1::unordered_multimap<std::pair<std::string, <tt><b>unsigned long</b></tt>>, <b>float</b>, ...> 598*404b540aSrobert</pre>which <u>hashes every client</u>, and <u>decides equivalence 599*404b540aSrobertbased on client</u> only. This will ensure that all accounts 600*404b540aSrobertbelonging to a specific user are stored consecutively. 601*404b540aSrobert 602*404b540aSrobert <p>Also, suppose one wants an integers' priority queue 603*404b540aSrobert (<i>i.e.,</i> a container that supports <tt>push</tt>, 604*404b540aSrobert <tt>pop</tt>, and <tt>top</tt> operations, the last of which 605*404b540aSrobert returns the largest <tt><b>int</b></tt>) that also supports 606*404b540aSrobert operations such as <tt>find</tt> and <tt>lower_bound</tt>. A 607*404b540aSrobert reasonable solution is to build an adapter over 608*404b540aSrobert <tt>std::set<<b>int</b>></tt>. In this adapter, 609*404b540aSrobert <i>e.g.</i>, <tt>push</tt> will just call the tree-based 610*404b540aSrobert associative container's <tt>insert</tt> method; <tt>pop</tt> 611*404b540aSrobert will call its <tt>end</tt> method, and use it to return the 612*404b540aSrobert preceding element (which must be the largest). Then this might 613*404b540aSrobert work well, except that the container object cannot hold 614*404b540aSrobert multiple instances of the same integer (<tt>push(4)</tt>, 615*404b540aSrobert <i>e.g.</i>, will be a no-op if <tt>4</tt> is already in the 616*404b540aSrobert container object). If multiple keys are necessary, then one 617*404b540aSrobert might build the adapter over an 618*404b540aSrobert <tt>std::multiset<<b>int</b>></tt>.</p> 619*404b540aSrobert 620*404b540aSrobert <p class="c1">STL non-unique-mapping containers, then, are 621*404b540aSrobert useful when (1) a key can be decomposed in to a primary key and 622*404b540aSrobert a secondary key, (2) a key is needed multiple times, or (3) any 623*404b540aSrobert combination of (1) and (2).</p> 624*404b540aSrobert 625*404b540aSrobert <p>Figure <a href="#embedded_lists_1">Non-unique mapping 626*404b540aSrobert containers in the STL's design</a> shows how the STL's design 627*404b540aSrobert works internally; in this figure nodes shaded equally represent 628*404b540aSrobert equivalent-key values. Equivalent keys are stored consecutively 629*404b540aSrobert using the properties of the underlying data structure: binary 630*404b540aSrobert search trees (Figure <a href="#embedded_lists_1">Non-unique 631*404b540aSrobert mapping containers in the STL's design</a>-A) store 632*404b540aSrobert equivalent-key values consecutively (in the sense of an 633*404b540aSrobert in-order walk) naturally; collision-chaining hash tables 634*404b540aSrobert (Figure <a href="#embedded_lists_1">Non-unique mapping 635*404b540aSrobert containers in the STL's design</a>-B) store equivalent-key 636*404b540aSrobert values in the same bucket, the bucket can be arranged so that 637*404b540aSrobert equivalent-key values are consecutive.</p> 638*404b540aSrobert 639*404b540aSrobert <h6 class="c1"><a name="embedded_lists_1" id= 640*404b540aSrobert "embedded_lists_1"><img src="embedded_lists_1.png" alt= 641*404b540aSrobert "no image" /></a></h6> 642*404b540aSrobert 643*404b540aSrobert <h6 class="c1">Non-unique mapping containers in the STL's 644*404b540aSrobert design.</h6> 645*404b540aSrobert 646*404b540aSrobert <p>Put differently, STL non-unique mapping 647*404b540aSrobert associative-containers are associative containers that map 648*404b540aSrobert primary keys to linked lists that are embedded into the 649*404b540aSrobert container. Figure <a href="#embedded_lists_2">Effect of 650*404b540aSrobert embedded lists in STL multimaps</a> shows again the two 651*404b540aSrobert containers from Figure <a href="#embedded_lists_1">Non-unique 652*404b540aSrobert mapping containers in the STL's design</a>, this time with the 653*404b540aSrobert embedded linked lists of the grayed nodes marked 654*404b540aSrobert explicitly.</p> 655*404b540aSrobert 656*404b540aSrobert <h6 class="c1"><a name="embedded_lists_2" id= 657*404b540aSrobert "embedded_lists_2"><img src="embedded_lists_2.png" alt= 658*404b540aSrobert "no image" /></a></h6> 659*404b540aSrobert 660*404b540aSrobert <h6 class="c1">Effect of embedded lists in STL multimaps.</h6> 661*404b540aSrobert 662*404b540aSrobert <p>These embedded linked lists have several disadvantages.</p> 663*404b540aSrobert 664*404b540aSrobert <ol> 665*404b540aSrobert <li>The underlying data structure embeds the linked lists 666*404b540aSrobert according to its own consideration, which means that the 667*404b540aSrobert search path for a value might include several different 668*404b540aSrobert equivalent-key values. For example, the search path for the 669*404b540aSrobert the black node in either of Figures <a href= 670*404b540aSrobert "#embedded_lists_1">Non-unique mapping containers in the 671*404b540aSrobert STL's design</a> A or B, includes more than a single gray 672*404b540aSrobert node.</li> 673*404b540aSrobert 674*404b540aSrobert <li>The links of the linked lists are the underlying 675*404b540aSrobert data structures' nodes, which typically are quite structured. 676*404b540aSrobert <i>E.g.</i>, in the case of tree-based containers (Figure 677*404b540aSrobert <a href="#embedded_lists_2">Effect of embedded lists in STL 678*404b540aSrobert multimaps</a>-B), each "link" is actually a node with three 679*404b540aSrobert pointers (one to a parent and two to children), and a 680*404b540aSrobert relatively-complicated iteration algorithm. The linked lists, 681*404b540aSrobert therefore, can take up quite a lot of memory, and iterating 682*404b540aSrobert over all values equal to a given key (<i>e.g.</i>, through 683*404b540aSrobert the return value of the STL's <tt>equal_range</tt>) can be 684*404b540aSrobert expensive.</li> 685*404b540aSrobert 686*404b540aSrobert <li>The primary key is stored multiply; this uses more 687*404b540aSrobert memory.</li> 688*404b540aSrobert 689*404b540aSrobert <li>Finally, the interface of this design excludes several 690*404b540aSrobert useful underlying data structures. <i>E.g.</i>, of all the 691*404b540aSrobert unordered self-organizing data structures, practically only 692*404b540aSrobert collision-chaining hash tables can (efficiently) guarantee 693*404b540aSrobert that equivalent-key values are stored consecutively.</li> 694*404b540aSrobert </ol> 695*404b540aSrobert 696*404b540aSrobert <p>The above reasons hold even when the ratio of secondary keys 697*404b540aSrobert to primary keys (or average number of identical keys) is small, 698*404b540aSrobert but when it is large, there are more severe problems:</p> 699*404b540aSrobert 700*404b540aSrobert <ol> 701*404b540aSrobert <li>The underlying data structures order the links inside 702*404b540aSrobert each embedded linked-lists according to their internal 703*404b540aSrobert considerations, which effectively means that each of the 704*404b540aSrobert links is unordered. Irrespective of the underlying 705*404b540aSrobert data structure, searching for a specific value can degrade to 706*404b540aSrobert linear complexity.</li> 707*404b540aSrobert 708*404b540aSrobert <li>Similarly to the above point, it is impossible to apply 709*404b540aSrobert to the secondary keys considerations that apply to primary 710*404b540aSrobert keys. For example, it is not possible to maintain secondary 711*404b540aSrobert keys by sorted order.</li> 712*404b540aSrobert 713*404b540aSrobert <li>While the interface "understands" that all equivalent-key 714*404b540aSrobert values constitute a distinct list (<i>e.g.</i>, through 715*404b540aSrobert <tt>equal_range</tt>), the underlying data structure 716*404b540aSrobert typically does not. This means, <i>e.g.</i>, that operations 717*404b540aSrobert such as erasing from a tree-based container all values whose 718*404b540aSrobert keys are equivalent to a a given key can be super-linear in 719*404b540aSrobert the size of the tree; this is also true also for several 720*404b540aSrobert other operations that target a specific list.</li> 721*404b540aSrobert </ol> 722*404b540aSrobert 723*404b540aSrobert <p>In <tt>pb_ds</tt>, therefore, all associative containers map 724*404b540aSrobert (or store) unique-key values. One can (1) map primary keys to 725*404b540aSrobert secondary associative-containers (<i>i.e.</i>, containers of 726*404b540aSrobert secondary keys) or non-associative containers (2) map identical 727*404b540aSrobert keys to a size-type representing the number of times they 728*404b540aSrobert occur, or (3) any combination of (1) and (2). Instead of 729*404b540aSrobert allowing multiple equivalent-key values, <tt>pb_ds</tt> 730*404b540aSrobert supplies associative containers based on underlying 731*404b540aSrobert data structures that are suitable as secondary 732*404b540aSrobert associative-containers (see <a href= 733*404b540aSrobert "assoc_performance_tests.html#msc">Associative-Container 734*404b540aSrobert Performance Tests::Observations::Mapping-Semantics 735*404b540aSrobert Considerations</a>).</p> 736*404b540aSrobert 737*404b540aSrobert <p>Figures <a href="#embedded_lists_3">Non-unique mapping 738*404b540aSrobert containers in <tt>pb_ds</tt></a> A and B show the equivalent 739*404b540aSrobert structures in <tt>pb_ds</tt>'s design, to those in Figures 740*404b540aSrobert <a href="#embedded_lists_1">Non-unique mapping containers in 741*404b540aSrobert the STL's design</a> A and B, respectively. Each shaded box 742*404b540aSrobert represents some size-type or secondary 743*404b540aSrobert associative-container.</p> 744*404b540aSrobert 745*404b540aSrobert <h6 class="c1"><a name="embedded_lists_3" id= 746*404b540aSrobert "embedded_lists_3"><img src="embedded_lists_3.png" alt= 747*404b540aSrobert "no image" /></a></h6> 748*404b540aSrobert 749*404b540aSrobert <h6 class="c1">Non-unique mapping containers in the 750*404b540aSrobert <tt>pb_ds</tt>.</h6> 751*404b540aSrobert 752*404b540aSrobert <p>In the first example above, then, one would use an 753*404b540aSrobert associative container mapping each user to an associative 754*404b540aSrobert container which maps each application id to a start time (see 755*404b540aSrobert <a href= 756*404b540aSrobert "../../../../testsuite/ext/pb_ds/example/basic_multimap.cc"><tt>basic_multimap.cc</tt></a>); 757*404b540aSrobert in the second example, one would use an associative container 758*404b540aSrobert mapping each <tt><b>int</b></tt> to some size-type indicating 759*404b540aSrobert the number of times it logically occurs (see <a href= 760*404b540aSrobert "../../../../testsuite/ext/pb_ds/example/basic_multiset.cc"><tt>basic_multiset.cc</tt></a>).</p> 761*404b540aSrobert 762*404b540aSrobert <p><a href= 763*404b540aSrobert "assoc_performance_tests.html#multimaps">Associative-Container 764*404b540aSrobert Performance Tests::Multimaps</a> quantifies some of these 765*404b540aSrobert points, and <a href= 766*404b540aSrobert "assoc_performance_tests.html#msc">Associative-Container 767*404b540aSrobert Performance Tests::Observations::Mapping-Semantics 768*404b540aSrobert Considerations</a> shows some simple calculations.</p> 769*404b540aSrobert 770*404b540aSrobert <p><a href="assoc_examples.html#mmaps">Associative-Container 771*404b540aSrobert Examples::Multimaps</a> shows some simple examples of using 772*404b540aSrobert "multimaps".</p> 773*404b540aSrobert 774*404b540aSrobert <p><a href="lu_based_containers.html">Design::Associative 775*404b540aSrobert Containers::List-Based Containers</a> discusses types of 776*404b540aSrobert containers especially suited as secondary 777*404b540aSrobert associative-containers.</p> 778*404b540aSrobert 779*404b540aSrobert <h2><a name="pq" id="pq">Priority Queues</a></h2> 780*404b540aSrobert 781*404b540aSrobert <h3><a name="pq_more_ops" id="pq_more_ops">Slightly Different 782*404b540aSrobert Methods</a></h3> 783*404b540aSrobert 784*404b540aSrobert <p>Priority queues are containers that allow efficiently 785*404b540aSrobert inserting values and accessing the maximal value (in the sense 786*404b540aSrobert of the container's comparison functor); <i>i.e.</i>, their 787*404b540aSrobert interface supports <tt>push</tt> and <tt>pop</tt>. The STL's 788*404b540aSrobert priority queues indeed support these methods, but they support 789*404b540aSrobert little else. For algorithmic and software-engineering purposes, 790*404b540aSrobert other methods are needed:</p> 791*404b540aSrobert 792*404b540aSrobert <ol> 793*404b540aSrobert <li>Many graph algorithms [<a href= 794*404b540aSrobert "references.html#clrs2001">clrs2001</a>] require increasing a 795*404b540aSrobert value in a priority queue (again, in the sense of the 796*404b540aSrobert container's comparison functor), or joining two 797*404b540aSrobert priority-queue objects.</li> 798*404b540aSrobert 799*404b540aSrobert <li>It is sometimes necessary to erase an arbitrary value in 800*404b540aSrobert a priority queue. For example, consider the <tt>select</tt> 801*404b540aSrobert function for monitoring file descriptors: 802*404b540aSrobert <pre> 803*404b540aSrobert<b>int</b> 804*404b540aSrobert select 805*404b540aSrobert (<b>int</b> nfds, 806*404b540aSrobert fd_set *readfds, 807*404b540aSrobert fd_set *writefds, 808*404b540aSrobert fd_set *errorfds, 809*404b540aSrobert <b>struct</b> timeval *timeout); 810*404b540aSrobert</pre>then, as the <tt>select</tt> manual page [<a href= 811*404b540aSrobert"references.html#select_man">select_man</a>] states: 812*404b540aSrobert 813*404b540aSrobert <p><q>The nfds argument specifies the range of file 814*404b540aSrobert descriptors to be tested. The select() function tests file 815*404b540aSrobert descriptors in the range of 0 to nfds-1.</q></p> 816*404b540aSrobert 817*404b540aSrobert <p>It stands to reason, therefore, that we might wish to 818*404b540aSrobert maintain a minimal value for <tt>nfds</tt>, and priority 819*404b540aSrobert queues immediately come to mind. Note, though, that when a 820*404b540aSrobert socket is closed, the minimal file description might 821*404b540aSrobert change; in the absence of an efficient means to erase an 822*404b540aSrobert arbitrary value from a priority queue, we might as well 823*404b540aSrobert avoid its use altogether.</p> 824*404b540aSrobert 825*404b540aSrobert <p><a href="pq_examples.html#xref">Priority-Queue 826*404b540aSrobert Examples::Cross-Referencing</a> shows examples for these 827*404b540aSrobert types of operations.</p> 828*404b540aSrobert </li> 829*404b540aSrobert 830*404b540aSrobert <li>STL containers typically support iterators. It is 831*404b540aSrobert somewhat unusual for <tt>std::priority_queue</tt> to omit 832*404b540aSrobert them (see, <i>e.g.</i>, [<a href= 833*404b540aSrobert "references.html#meyers01stl">meyers01stl</a>]). One might 834*404b540aSrobert ask why do priority queues need to support iterators, since 835*404b540aSrobert they are self-organizing containers with a different purpose 836*404b540aSrobert than abstracting sequences. There are several reasons: 837*404b540aSrobert 838*404b540aSrobert <ol> 839*404b540aSrobert <li>Iterators (even in self-organizing containers) are 840*404b540aSrobert useful for many purposes, <i>e.g.</i>, cross-referencing 841*404b540aSrobert containers, serialization, and debugging code that uses 842*404b540aSrobert these containers.</li> 843*404b540aSrobert 844*404b540aSrobert <li>The STL's hash-based containers support iterators, 845*404b540aSrobert even though they too are self-organizing containers with 846*404b540aSrobert a different purpose than abstracting sequences.</li> 847*404b540aSrobert 848*404b540aSrobert <li>In STL-like containers, it is natural to specify the 849*404b540aSrobert interface of operations for modifying a value or erasing 850*404b540aSrobert a value (discussed previously) in terms of a iterators. 851*404b540aSrobert This is discussed further in <a href= 852*404b540aSrobert "pq_design.html#pq_it">Design::Priority 853*404b540aSrobert Queues::Iterators</a>. It should be noted that the STL's 854*404b540aSrobert containers also use iterators for accessing and 855*404b540aSrobert manipulating a specific value. <i>E.g.</i>, in hash-based 856*404b540aSrobert containers, one checks the existence of a key by 857*404b540aSrobert comparing the iterator returned by <tt>find</tt> to the 858*404b540aSrobert iterator returned by <tt>end</tt>, and not by comparing a 859*404b540aSrobert pointer returned by <tt>find</tt> to <tt>NULL</tt>.</li> 860*404b540aSrobert </ol> 861*404b540aSrobert </li> 862*404b540aSrobert </ol> 863*404b540aSrobert 864*404b540aSrobert <p><a href="pq_performance_tests.html">Performance 865*404b540aSrobert Tests::Priority Queues</a> quantifies some of these points.</p> 866*404b540aSrobert 867*404b540aSrobert <h3><a name="pq_ds_genericity" id="pq_ds_genericity">More Data 868*404b540aSrobert Structures and Traits</a></h3> 869*404b540aSrobert 870*404b540aSrobert <p>There are three main implementations of priority queues: the 871*404b540aSrobert first employs a binary heap, typically one which uses a 872*404b540aSrobert sequence; the second uses a tree (or forest of trees), which is 873*404b540aSrobert typically less structured than an associative container's tree; 874*404b540aSrobert the third simply uses an associative container. These are 875*404b540aSrobert shown, respectively, in Figures <a href= 876*404b540aSrobert "#pq_different_underlying_dss">Underlying Priority-Queue 877*404b540aSrobert Data-Structures</a> A1 and A2, B, and C.</p> 878*404b540aSrobert 879*404b540aSrobert <h6 class="c1"><a name="pq_different_underlying_dss" id= 880*404b540aSrobert "pq_different_underlying_dss"><img src= 881*404b540aSrobert "pq_different_underlying_dss.png" alt="no image" /></a></h6> 882*404b540aSrobert 883*404b540aSrobert <h6 class="c1">Underlying Priority-Queue Data-Structures.</h6> 884*404b540aSrobert 885*404b540aSrobert <p>No single implementation can completely replace any of the 886*404b540aSrobert others. Some have better <tt>push</tt> and <tt>pop</tt> 887*404b540aSrobert amortized performance, some have better bounded (worst case) 888*404b540aSrobert response time than others, some optimize a single method at the 889*404b540aSrobert expense of others, <i>etc.</i>. In general the "best" 890*404b540aSrobert implementation is dictated by the problem (see <a href= 891*404b540aSrobert "pq_performance_tests.html#pq_observations">Performance 892*404b540aSrobert Tests::Priority Queues::Observations</a>).</p> 893*404b540aSrobert 894*404b540aSrobert <p>As with associative containers (see <a href= 895*404b540aSrobert "#assoc_ds_genericity">Associative Containers::Traits for 896*404b540aSrobert Underlying Data-Structures</a>), the more implementations 897*404b540aSrobert co-exist, the more necessary a traits mechanism is for handling 898*404b540aSrobert generic containers safely and efficiently. This is especially 899*404b540aSrobert important for priority queues, since the invalidation 900*404b540aSrobert guarantees of one of the most useful data structures - binary 901*404b540aSrobert heaps - is markedly different than those of most of the 902*404b540aSrobert others.</p> 903*404b540aSrobert 904*404b540aSrobert <p><a href="pq_design.html#pq_traits">Design::Priority 905*404b540aSrobert Queues::Traits</a> discusses this further.</p> 906*404b540aSrobert 907*404b540aSrobert <h3><a name="pq_binary_heap" id="pq_binary_heap">Binary Heap 908*404b540aSrobert Implementation</a></h3> 909*404b540aSrobert 910*404b540aSrobert <p>Binary heaps are one of the most useful underlying 911*404b540aSrobert data structures for priority queues. They are very efficient in 912*404b540aSrobert terms of memory (since they don't require per-value structure 913*404b540aSrobert metadata), and have the best amortized <tt>push</tt> and 914*404b540aSrobert <tt>pop</tt> performance for primitive types (<i>e.g.</i>, 915*404b540aSrobert <tt><b>int</b></tt>s).</p> 916*404b540aSrobert 917*404b540aSrobert <p>The STL's <tt>priority_queue</tt> implements this data 918*404b540aSrobert structure as an adapter over a sequence, typically 919*404b540aSrobert <tt>std::vector</tt> or <tt>std::deque</tt>, which correspond 920*404b540aSrobert to Figures <a href="#pq_different_underlying_dss">Underlying 921*404b540aSrobert Priority-Queue Data-Structures</a> A1 and A2, respectively.</p> 922*404b540aSrobert 923*404b540aSrobert <p>This is indeed an elegant example of the adapter concept and 924*404b540aSrobert the algorithm/container/iterator decomposition (see [<a href= 925*404b540aSrobert "references.html#nelson96stlpq">nelson96stlpql</a>]). There are 926*404b540aSrobert possibly reasons, however, why a binary-heap priority queue 927*404b540aSrobert would be better implemented as a container instead of a 928*404b540aSrobert sequence adapter:</p> 929*404b540aSrobert 930*404b540aSrobert <ol> 931*404b540aSrobert <li><tt>std::priority_queue</tt> cannot erase values from its 932*404b540aSrobert adapted sequence (irrespective of the sequence type). This 933*404b540aSrobert means that the memory use of an <tt>std::priority_queue</tt> 934*404b540aSrobert object is always proportional to the maximal number of values 935*404b540aSrobert it ever contained, and not to the number of values that it 936*404b540aSrobert currently contains (see <a href= 937*404b540aSrobert "priority_queue_text_pop_mem_usage_test.html">Priority Queue 938*404b540aSrobert Text <tt>pop</tt> Memory Use Test</a>); this implementation 939*404b540aSrobert of binary heaps acts very differently than other underlying 940*404b540aSrobert data structures (<i>e.g.</i>, pairing heaps).</li> 941*404b540aSrobert 942*404b540aSrobert <li>Some combinations of adapted sequences and value types 943*404b540aSrobert are very inefficient or just don't make sense. If one uses 944*404b540aSrobert <tt>std::priority_queue<std::vector<std::string> 945*404b540aSrobert > ></tt>, for example, then not only will each 946*404b540aSrobert operation perform a logarithmic number of 947*404b540aSrobert <tt>std::string</tt> assignments, but, furthermore, any 948*404b540aSrobert operation (including <tt>pop</tt>) can render the container 949*404b540aSrobert useless due to exceptions. Conversely, if one uses 950*404b540aSrobert <tt>std::priority_queue<std::deque<<b>int</b>> > 951*404b540aSrobert ></tt>, then each operation uses incurs a logarithmic 952*404b540aSrobert number of indirect accesses (through pointers) unnecessarily. 953*404b540aSrobert It might be better to let the container make a conservative 954*404b540aSrobert deduction whether to use the structure in Figures <a href= 955*404b540aSrobert "#pq_different_underlying_dss">Underlying Priority-Queue 956*404b540aSrobert Data-Structures</a> A1 or A2.</li> 957*404b540aSrobert 958*404b540aSrobert <li>There does not seem to be a systematic way to determine 959*404b540aSrobert what exactly can be done with the priority queue. 960*404b540aSrobert 961*404b540aSrobert <ol> 962*404b540aSrobert <li>If <tt>p</tt> is a priority queue adapting an 963*404b540aSrobert <tt>std::vector</tt>, then it is possible to iterate over 964*404b540aSrobert all values by using <tt>&p.top()</tt> and 965*404b540aSrobert <tt>&p.top() + p.size()</tt>, but this will not work 966*404b540aSrobert if <tt>p</tt> is adapting an <tt>std::deque</tt>; in any 967*404b540aSrobert case, one cannot use <tt>p.begin()</tt> and 968*404b540aSrobert <tt>p.end()</tt>. If a different sequence is adapted, it 969*404b540aSrobert is even more difficult to determine what can be 970*404b540aSrobert done.</li> 971*404b540aSrobert 972*404b540aSrobert <li>If <tt>p</tt> is a priority queue adapting an 973*404b540aSrobert <tt>std::deque</tt>, then the reference return by 974*404b540aSrobert <tt>p.top()</tt> will remain valid until it is popped, 975*404b540aSrobert but if <tt>p</tt> adapts an <tt>std::vector</tt>, the 976*404b540aSrobert next <tt>push</tt> will invalidate it. If a different 977*404b540aSrobert sequence is adapted, it is even more difficult to 978*404b540aSrobert determine what can be done.</li> 979*404b540aSrobert </ol> 980*404b540aSrobert </li> 981*404b540aSrobert 982*404b540aSrobert <li>Sequence-based binary heaps can still implement 983*404b540aSrobert linear-time <tt>erase</tt> and <tt>modify</tt> operations. 984*404b540aSrobert This means that if one needs, <i>e.g.</i>, to erase a small 985*404b540aSrobert (say logarithmic) number of values, then one might still 986*404b540aSrobert choose this underlying data structure. Using 987*404b540aSrobert <tt>std::priority_queue</tt>, however, this will generally 988*404b540aSrobert change the order of growth of the entire sequence of 989*404b540aSrobert operations.</li> 990*404b540aSrobert </ol> 991*404b540aSrobert </div> 992*404b540aSrobert</body> 993*404b540aSrobert</html> 994