<chapter xmlns="http://docbook.org/ns/docbook" version="5.0"
         xml:id="std.containers" xreflabel="Containers">
<?dbhtml filename="containers.html"?>

<info><title>
  Containers
  <indexterm><primary>Containers</primary></indexterm>
</title>
  <keywordset>
    <keyword>ISO C++</keyword>
    <keyword>library</keyword>
  </keywordset>
</info>

<!-- Sect1 01 : Sequences -->
<section xml:id="std.containers.sequences" xreflabel="Sequences"><info><title>Sequences</title></info>
<?dbhtml filename="sequences.html"?>

<section xml:id="containers.sequences.list" xreflabel="list"><info><title>list</title></info>
<?dbhtml filename="list.html"?>

  <section xml:id="sequences.list.size" xreflabel="list::size() is O(n)"><info><title>list::size() is O(n)</title></info>

   <para>
     Yes it is, and that's okay.  This is a decision that we preserved
     when we imported SGI's STL implementation.  The following is
     quoted from <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://www.sgi.com/tech/stl/FAQ.html">their FAQ</link>:
   </para>
   <blockquote>
     <para>
       The size() member function, for list and slist, takes time
       proportional to the number of elements in the list.  This was a
       deliberate tradeoff.  The only way to get a constant-time
       size() for linked lists would be to maintain an extra member
       variable containing the list's size.  This would require taking
       extra time to update that variable (it would make splice() a
       linear time operation, for example), and it would also make the
       list larger.  Many list algorithms don't require that extra
       word (algorithms that do require it might do better with
       vectors than with lists), and, when it is necessary to maintain
       an explicit size count, it's something that users can do
       themselves.
     </para>
     <para>
       This choice is permitted by the C++ standard.
       The standard says
       that size() <quote>should</quote> be constant time, and
       <quote>should</quote> does not mean the same thing as
       <quote>shall</quote>.  This is the officially recommended ISO
       wording for saying that an implementation is supposed to do
       something unless there is a good reason not to.
     </para>
     <para>
       One implication of linear time size(): you should never write
     </para>
     <programlisting>
       if (L.size() == 0)
           ...
     </programlisting>

     <para>
       Instead, you should write
     </para>

     <programlisting>
       if (L.empty())
           ...
     </programlisting>
   </blockquote>
  </section>
</section>

<section xml:id="containers.sequences.vector" xreflabel="vector"><info><title>vector</title></info>
<?dbhtml filename="vector.html"?>

  <para>
  </para>
  <section xml:id="sequences.vector.management" xreflabel="Space Overhead Management"><info><title>Space Overhead Management</title></info>

   <para>
     In <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://gcc.gnu.org/ml/libstdc++/2002-04/msg00105.html">this
     message to the list</link>, Daniel Kostecky announced work on an
     alternate form of <code>std::vector</code> that would support
     hints on the number of elements to be over-allocated.  The design
     was also described, along with possible implementation choices.
   </para>
   <para>
     The first two alpha releases were announced <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://gcc.gnu.org/ml/libstdc++/2002-07/msg00048.html">here</link>
     and <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://gcc.gnu.org/ml/libstdc++/2002-07/msg00111.html">here</link>.
   </para>

  </section></section>
</section>

<!-- Sect1 02 : Associative -->
<section xml:id="std.containers.associative" xreflabel="Associative"><info><title>Associative</title></info>
<?dbhtml filename="associative.html"?>

  <section xml:id="containers.associative.insert_hints" xreflabel="Insertion Hints"><info><title>Insertion Hints</title></info>

   <para>
     Section [23.1.2], Table 69, of the C++ standard lists this
     function for all of the associative containers (map, set, etc.):
   </para>
   <programlisting>
      a.insert(p,t);
   </programlisting>
   <para>
     where 'p' is an iterator into the container 'a', and 't' is the
     item to insert.  The standard says that <quote><code>t</code> is
     inserted as close as possible to the position just prior to
     <code>p</code>.</quote> (Library DR #233 addresses this topic,
     referring to <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1780.html">N1780</link>.)
     Since version 4.2 GCC implements the resolution to DR 233, so
     that insertions happen as close as possible to the hint.  For
     earlier releases the hint was only used as described below.
   </para>
   <para>
     Here we'll describe how the hinting works in the libstdc++
     implementation, and what you need to do in order to take
     advantage of it.  (Insertions can change from logarithmic
     complexity to amortized constant time, if the hint is properly
     used.)  Because the current implementation is based on the
     SGI STL one, these points may also hold true for other library
     implementations, since the HP/SGI code is used in a lot of
     places.
   </para>
   <para>
     In the following text, the phrases <emphasis>greater
     than</emphasis> and <emphasis>less than</emphasis> refer to the
     results of the strict weak ordering imposed on the container by
     its comparison object, which defaults to (basically)
     <quote>&lt;</quote>.  Using those phrases is semantically sloppy,
     but I didn't want to get bogged down in syntax.  I assume that if
     you are intelligent enough to use your own comparison objects,
     you are also intelligent enough to assign <quote>greater</quote>
     and <quote>lesser</quote> their new meanings in the next
     paragraph.  *grin*
   </para>
   <para>
     If the <code>hint</code> parameter ('p' above) is equivalent to:
   </para>
   <itemizedlist>
    <listitem>
      <para>
        <code>begin()</code>, then the item being inserted should
        have a key less than all the other keys in the container.
        The item will be inserted at the beginning of the container,
        becoming the new entry at <code>begin()</code>.
      </para>
    </listitem>
    <listitem>
      <para>
        <code>end()</code>, then the item being inserted should have
        a key greater than all the other keys in the container.  The
        item will be inserted at the end of the container, becoming
        the new entry before <code>end()</code>.
      </para>
    </listitem>
    <listitem>
      <para>
        neither <code>begin()</code> nor <code>end()</code>, then:
        Let <code>h</code> be the entry in the container pointed to
        by <code>hint</code>, that is, <code>h = *hint</code>.  Then
        the item being inserted should have a key less than that of
        <code>h</code>, and greater than that of the item preceding
        <code>h</code>.  The new item will be inserted between
        <code>h</code> and <code>h</code>'s predecessor.
      </para>
    </listitem>
   </itemizedlist>
   <para>
     For <code>multimap</code> and <code>multiset</code>, the
     restrictions are slightly looser: <quote>greater than</quote>
     should be replaced by <quote>not less than</quote> and <quote>less
     than</quote> should be replaced by <quote>not greater
     than.</quote> (Why not replace greater with
     greater-than-or-equal-to?  You probably could in your head, but
     the mathematicians will tell you that it isn't the same thing.)
   </para>
   <para>
     If the conditions are not met, then the hint is not used, and the
     insertion proceeds as if you had called
     <code>a.insert(t)</code> instead.  (<emphasis>Note</emphasis> that GCC releases
     prior to 3.0.2 had a bug in the case with <code>hint ==
     begin()</code> for the <code>map</code> and <code>set</code>
     classes.  You should not use a hint argument in those releases.)
   </para>
   <para>
     This behavior goes well with other containers'
     <code>insert()</code> functions which take an iterator: if the
     hint is used, the new item will be inserted before the iterator
     passed as an argument, the same as in the other containers.
   </para>
   <para>
     <emphasis>Note</emphasis> also that the hint in this
     implementation is a one-shot.  The older insertion-with-hint
     routines check the immediately surrounding entries to ensure that
     the new item would in fact belong there.  If the hint does not
     point to the correct place, then no further local searching is
     done; the search begins from scratch in logarithmic time.
   </para>
  </section>

  <section xml:id="containers.associative.bitset" xreflabel="bitset"><info><title>bitset</title></info>
    <?dbhtml filename="bitset.html"?>

    <section xml:id="associative.bitset.size_variable" xreflabel="Variable"><info><title>Size Variable</title></info>

      <para>
        No, you cannot write code of the form
      </para>
      <!-- Careful, the leading spaces in PRE show up directly. -->
   <programlisting>
      #include &lt;bitset&gt;

      void foo (size_t n)
      {
          std::bitset&lt;n&gt;   bits;
          ....
      }
   </programlisting>
   <para>
     because <code>n</code> must be known at compile time.  Your
     compiler is correct; it is not a bug.  That's the way templates
     work.  (Yes, it <emphasis>is</emphasis> a feature.)
   </para>
   <para>
     There are a couple of ways to handle this kind of thing.  Please
     consider all of them before passing judgement.  They include, in
     no particular order:
   </para>
      <itemizedlist>
        <listitem><para>A very large N in <code>bitset&lt;N&gt;</code>.</para></listitem>
        <listitem><para>A container&lt;bool&gt;.</para></listitem>
        <listitem><para>Extremely weird solutions.</para></listitem>
      </itemizedlist>
   <para>
     <emphasis>A very large N in
     <code>bitset&lt;N&gt;</code>.  </emphasis> It has been
     pointed out a few times in newsgroups that N bits only take up
     (N/8) bytes on most systems, and division by a factor of eight is
     pretty impressive when speaking of memory.  Half a megabyte given
     over to a bitset (recall that there is zero space overhead for
     housekeeping info; it is known at compile time exactly how large
     the set is) will hold over four million bits.  If you're using
     those bits as status flags (e.g.,
     <quote>changed</quote>/<quote>unchanged</quote> flags), that's a
     <emphasis>lot</emphasis> of state.
   </para>
   <para>
     You can then keep track of the <quote>maximum bit used</quote>
     during some testing runs on representative data, make note of how
     many of those bits really need to be there, and then reduce N to
     a smaller number.  Leave some extra space, of course.  (If you
     plan to write code like the incorrect example above, where the
     bitset is a local variable, then you may have to talk your
     compiler into allowing that much stack space; there may be zero
     space overhead, but it's all allocated inside the object.)
   </para>
   <para>
     <emphasis>A container&lt;bool&gt;.  </emphasis> The
     Committee made provision for the space savings possible with that
     (N/8) usage previously mentioned, so that you don't have to do
     wasteful things like <code>Container&lt;char&gt;</code> or
     <code>Container&lt;short int&gt;</code>.  Specifically,
     <code>vector&lt;bool&gt;</code> is required to be specialized for
     that space savings.
   </para>
   <para>
     The problem is that <code>vector&lt;bool&gt;</code> doesn't
     behave like a normal vector anymore.  There have been
     journal articles which discuss the problems (the ones by Herb
     Sutter in the May and July/August 1999 issues of C++ Report cover
     it well).  Future revisions of the ISO C++ Standard may change
     the requirement for <code>vector&lt;bool&gt;</code>
     specialization.  In the meantime, <code>deque&lt;bool&gt;</code>
     is recommended (its behavior is sane, although you probably will
     not get the space savings, and its allocation scheme is different
     than that of vector).
   </para>
   <para>
     <emphasis>Extremely weird solutions.  </emphasis> If
     you have access to the compiler and linker at runtime, you can do
     something insane, like figuring out just how many bits you need,
     then writing a temporary source code file.
     That file contains an
     instantiation of <code>bitset</code> for the required number of
     bits, inside some wrapper functions with unchanging signatures.
     Have your program then call the compiler on that file using
     Position Independent Code, then open the newly-created object
     file and load those wrapper functions.  You'll have an
     instantiation of <code>bitset&lt;N&gt;</code> for the exact
     <code>N</code> that you need at the time.  Don't forget to delete
     the temporary files.  (Yes, this <emphasis>can</emphasis> be, and
     <emphasis>has been</emphasis>, done.)
   </para>
   <!-- I wonder if this next paragraph will get me in trouble... -->
   <para>
     This would be the approach of either a visionary genius or a
     raving lunatic, depending on your programming and management
     style.  Probably the latter.
   </para>
   <para>
     Which of the above techniques you use, if any, is up to you and
     your intended application.  Some time/space profiling is
     indicated if it really matters (don't just guess).  And, if you
     manage to do anything along the lines of the third category, the
     author would love to hear from you...
   </para>
   <para>
     Also note that the implementation of bitset used in libstdc++ has
     <link linkend="manual.ext.containers.sgi">some extensions</link>.
   </para>

    </section>
    <section xml:id="associative.bitset.type_string" xreflabel="Type String"><info><title>Type String</title></info>

      <para>
      </para>
   <para>
     In C++98 and C++03, bitsets do not take char* nor const char*
     arguments in their constructors.  This is something of an
     accident, but you can read
     about the problem: follow the library's <quote>Links</quote> from
     the homepage, and from the C++ information <quote>defect
     reflector</quote> link, select the library issues list.  Issue
     number 116 describes the problem.
   </para>
   <para>
     For now you can simply make a temporary string object using the
     constructor expression:
   </para>
   <programlisting>
      std::bitset&lt;5&gt; b ( std::string("10110") );
   </programlisting>

   <para>
     instead of
   </para>

   <programlisting>
      std::bitset&lt;5&gt; b ( "10110" );    // invalid
   </programlisting>
    </section>
  </section>

</section>

<!-- Sect1 03 : Unordered Associative -->
<section xml:id="std.containers.unordered" xreflabel="Unordered">
  <info><title>Unordered Associative</title></info>
  <?dbhtml filename="unordered_associative.html"?>

  <section xml:id="containers.unordered.hash" xreflabel="Hash">
    <info><title>Hash Code</title></info>

  <section xml:id="containers.unordered.cache" xreflabel="Cache">
    <info><title>Hash Code Caching Policy</title></info>

    <para>
      The unordered containers in libstdc++ may cache the hash code for each
      element alongside the element itself.  In some cases not recalculating
      the hash code every time it's needed can improve performance, but the
      additional memory overhead can also reduce performance, so whether an
      unordered associative container caches the hash code or not depends on
      a number of factors.  The caching policy for GCC 4.8 is described below.
    </para>
    <para>
      First, the C++ standard requires that <code>erase</code> and
      <code>swap</code> operations must not throw exceptions.  Those
      operations might need an
      element's hash code, but cannot use the hash function if it could
      throw.
      This means the hash codes will be cached unless the hash function
      has a non-throwing exception specification such as <code>noexcept</code>
      or <code>throw()</code>.
    </para>
    <para>
      Secondly, libstdc++ also needs the hash code in the implementation of
      <code>local_iterator</code> and <code>const_local_iterator</code> in
      order to know when the iterator has reached the end of the bucket.
      This means that the local iterator types will embed a copy of the hash
      function when possible.
      Because the local iterator types must be DefaultConstructible and
      CopyAssignable, if the hash function type does not model those concepts
      then it cannot be embedded and so the hash code must be cached.
      Note that a hash function might not be safe to use when
      default-constructed (e.g. if it is a function pointer), so a hash
      function that is contained in a local iterator won't be used until
      the iterator is valid, that is, until the hash function has been
      copied from a correctly-initialized object.
    </para>
    <para>
      If the hash function is non-throwing, DefaultConstructible and
      CopyAssignable then libstdc++ doesn't need to cache the hash code for
      correctness, but might still do so for performance if computing a
      hash code is an expensive operation, as it may be for arbitrarily
      long strings.
      As an extension libstdc++ provides a trait type to describe whether
      a hash function is fast.  By default hash functions are assumed to be
      fast unless the trait is specialized for the hash function and the
      trait's value is false, in which case the hash code will always be
      cached.
      The trait can be specialized for user-defined hash functions like so:
    </para>
    <programlisting>
      #include &lt;unordered_set&gt;

      struct hasher
      {
        std::size_t operator()(int val) const noexcept
        {
          // Some very slow computation of a hash code from an int !
          ...
        }
      };

      namespace std
      {
        template&lt;&gt;
          struct __is_fast_hash&lt;hasher&gt; : std::false_type
          { };
      }
    </programlisting>
  </section>
</section>

</section>

<!-- Sect1 04 : Interacting with C -->
<section xml:id="std.containers.c" xreflabel="Interacting with C"><info><title>Interacting with C</title></info>
<?dbhtml filename="containers_and_c.html"?>

  <section xml:id="containers.c.vs_array" xreflabel="Containers vs. Arrays"><info><title>Containers vs. Arrays</title></info>

   <para>
     You're writing some code and can't decide whether to use builtin
     arrays or some kind of container.  There are compelling reasons
     to use one of the container classes, but you're afraid that
     you'll eventually run into difficulties, change everything back
     to arrays, and then have to change all the code that uses those
     data types to keep up with the change.
   </para>
   <para>
     If your code makes use of the standard algorithms, this isn't as
     scary as it sounds.  The algorithms don't know, nor care, about
     the kind of <quote>container</quote> on which they work, since
     the algorithms are only given endpoints to work with.  For the
     container classes, these are iterators (usually
     <code>begin()</code> and <code>end()</code>, but not always).
     For builtin arrays, these are the address of the first element
     and the <link linkend="iterators.predefined.end">past-the-end</link> element.
   </para>
   <para>
     Some very simple wrapper functions can hide all of that from the
     rest of the code.  For example, a pair of functions called
     <code>beginof</code> can be written, one that takes an array,
     another that takes a vector.  The first returns a pointer to the
     first element, and the second returns the vector's
     <code>begin()</code> iterator.
   </para>
   <para>
     The functions should be made template functions, and should also
     be declared inline.
     Inlining can lead to <code>beginof</code> being optimized out
     of existence, so you pay absolutely nothing in terms of increased
     code size or execution time.
   </para>
   <para>
     The result is that if all your algorithm calls look like
   </para>
   <programlisting>
   std::transform(beginof(foo), endof(foo), beginof(foo), SomeFunction);
   </programlisting>
   <para>
     then the type of foo can change from an array of ints to a vector
     of ints to a deque of ints and back again, without ever changing
     any client code.
   </para>

<programlisting>
#include &lt;vector&gt;

// beginof
template&lt;typename T&gt;
  inline typename std::vector&lt;T&gt;::iterator
  beginof(std::vector&lt;T&gt; &amp;v)
  { return v.begin(); }

template&lt;typename T, unsigned int sz&gt;
  inline T*
  beginof(T (&amp;array)[sz]) { return array; }

// endof
template&lt;typename T&gt;
  inline typename std::vector&lt;T&gt;::iterator
  endof(std::vector&lt;T&gt; &amp;v)
  { return v.end(); }

template&lt;typename T, unsigned int sz&gt;
  inline T*
  endof(T (&amp;array)[sz]) { return array + sz; }

// lengthof
template&lt;typename T&gt;
  inline typename std::vector&lt;T&gt;::size_type
  lengthof(std::vector&lt;T&gt; &amp;v)
  { return v.size(); }

template&lt;typename T, unsigned int sz&gt;
  inline unsigned int
  lengthof(T (&amp;)[sz]) { return sz; }
</programlisting>

   <para>
     Astute readers will notice two things at once: first, that the
     container class is still a <code>vector&lt;T&gt;</code> instead
     of a more general <code>Container&lt;T&gt;</code>.  This would
     mean that three functions for <code>deque</code> would have to be
     added, another three for <code>list</code>, and so on.  This is
     due to problems with getting template resolution correct; I find
     it easier just to give the extra three lines and avoid confusion.
   </para>
   <para>
     Second, the line
   </para>
   <programlisting>
    inline unsigned int lengthof (T (&amp;)[sz]) { return sz; }
   </programlisting>
   <para>
     looks just weird!  Hint:  unused parameters can be left nameless.
   </para>
  </section>

</section>

</chapter>