1<?xml version="1.0" encoding="UTF-8" standalone="no"?> 2<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>Diagnostics</title><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><meta name="keywords" content="C++, library, profile" /><meta name="keywords" content="ISO C++, library" /><meta name="keywords" content="ISO C++, runtime, library" /><link rel="home" href="../index.html" title="The GNU C++ Library" /><link rel="up" href="profile_mode.html" title="Chapter 19. Profile Mode" /><link rel="prev" href="profile_mode_devel.html" title="Developer Information" /><link rel="next" href="mt_allocator.html" title="Chapter 20. The mt_allocator" /></head><body><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Diagnostics</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="profile_mode_devel.html">Prev</a> </td><th width="60%" align="center">Chapter 19. Profile Mode</th><td width="20%" align="right"> <a accesskey="n" href="mt_allocator.html">Next</a></td></tr></table><hr /></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a id="manual.ext.profile_mode.diagnostics"></a>Diagnostics</h2></div></div></div><p> 3 The table below presents all the diagnostics we intend to implement. 4 Each diagnostic has a corresponding compile time switch 5 <code class="code">-D_GLIBCXX_PROFILE_<diagnostic></code>. 6 Groups of related diagnostics can be turned on with a single switch. 7 For instance, <code class="code">-D_GLIBCXX_PROFILE_LOCALITY</code> is equivalent to 8 <code class="code">-D_GLIBCXX_PROFILE_SOFTWARE_PREFETCH 9 -D_GLIBCXX_PROFILE_RBTREE_LOCALITY</code>. 10 </p><p> 11 The benefit, cost, expected frequency and accuracy of each diagnostic 12 was given a grade from 1 to 10, where 10 is highest. 13 A high benefit means that, if the diagnostic is accurate, the expected 14 performance improvement is high. 15 A high cost means that turning this diagnostic on leads to high slowdown. 16 A high frequency means that we expect this to occur relatively often. 17 A high accuracy means that the diagnostic is unlikely to be wrong. 18 These grades are not perfect. They are just meant to guide users with 19 specific needs or time budgets. 20 </p><div class="table"><a id="table.profile_diagnostics"></a><p class="title"><strong>Table 19.2. Profile Diagnostics</strong></p><div class="table-contents"><table class="table" summary="Profile Diagnostics" border="1"><colgroup><col align="left" class="c1" /><col align="left" class="c2" /><col align="left" class="c3" /><col align="left" class="c4" /><col align="left" class="c5" /><col align="left" class="c6" /><col align="left" class="c7" /></colgroup><thead><tr><th align="left">Group</th><th align="left">Flag</th><th align="left">Benefit</th><th align="left">Cost</th><th align="left">Freq.</th><th align="left">Implemented</th><td class="auto-generated"> </td></tr></thead><tbody><tr><td align="left"><a class="link" href="profile_mode_diagnostics.html#manual.ext.profile_mode.analysis.containers" title="Containers"> 21 CONTAINERS</a></td><td align="left"><a class="link" href="profile_mode_diagnostics.html#manual.ext.profile_mode.analysis.hashtable_too_small" title="Hashtable Too Small"> 22 HASHTABLE_TOO_SMALL</a></td><td align="left">10</td><td align="left">1</td><td align="left"> </td><td align="left">10</td><td align="left">yes</td></tr><tr><td align="left"> </td><td align="left"><a class="link" href="profile_mode_diagnostics.html#manual.ext.profile_mode.analysis.hashtable_too_large" title="Hashtable Too Large"> 23 HASHTABLE_TOO_LARGE</a></td><td align="left">5</td><td align="left">1</td><td align="left"> </td><td align="left">10</td><td align="left">yes</td></tr><tr><td align="left"> </td><td align="left"><a class="link" href="profile_mode_diagnostics.html#manual.ext.profile_mode.analysis.inefficient_hash" title="Inefficient Hash"> 24 INEFFICIENT_HASH</a></td><td align="left">7</td><td align="left">3</td><td align="left"> </td><td align="left">10</td><td align="left">yes</td></tr><tr><td align="left"> </td><td align="left"><a class="link" href="profile_mode_diagnostics.html#manual.ext.profile_mode.analysis.vector_too_small" title="Vector Too Small"> 25 VECTOR_TOO_SMALL</a></td><td align="left">8</td><td align="left">1</td><td align="left"> </td><td align="left">10</td><td align="left">yes</td></tr><tr><td align="left"> </td><td align="left"><a class="link" href="profile_mode_diagnostics.html#manual.ext.profile_mode.analysis.vector_too_large" title="Vector Too Large"> 26 VECTOR_TOO_LARGE</a></td><td align="left">5</td><td align="left">1</td><td align="left"> </td><td align="left">10</td><td align="left">yes</td></tr><tr><td align="left"> </td><td align="left"><a class="link" href="profile_mode_diagnostics.html#manual.ext.profile_mode.analysis.vector_to_hashtable" title="Vector to Hashtable"> 27 VECTOR_TO_HASHTABLE</a></td><td align="left">7</td><td align="left">7</td><td align="left"> </td><td align="left">10</td><td align="left">no</td></tr><tr><td align="left"> </td><td align="left"><a class="link" href="profile_mode_diagnostics.html#manual.ext.profile_mode.analysis.hashtable_to_vector" title="Hashtable to Vector"> 28 HASHTABLE_TO_VECTOR</a></td><td align="left">7</td><td align="left">7</td><td align="left"> </td><td align="left">10</td><td align="left">no</td></tr><tr><td align="left"> </td><td align="left"><a class="link" href="profile_mode_diagnostics.html#manual.ext.profile_mode.analysis.vector_to_list" title="Vector to List"> 29 VECTOR_TO_LIST</a></td><td align="left">8</td><td align="left">5</td><td align="left"> </td><td align="left">10</td><td align="left">yes</td></tr><tr><td align="left"> </td><td align="left"><a class="link" href="profile_mode_diagnostics.html#manual.ext.profile_mode.analysis.list_to_vector" title="List to Vector"> 30 LIST_TO_VECTOR</a></td><td align="left">10</td><td align="left">5</td><td align="left"> </td><td align="left">10</td><td align="left">no</td></tr><tr><td align="left"> </td><td align="left"><a class="link" href="profile_mode_diagnostics.html#manual.ext.profile_mode.analysis.assoc_ord_to_unord" title="Ordered to Unordered Associative Container"> 31 ORDERED_TO_UNORDERED</a></td><td align="left">10</td><td align="left">5</td><td align="left"> </td><td align="left">10</td><td align="left">only map/unordered_map</td></tr><tr><td align="left"><a class="link" href="profile_mode_diagnostics.html#manual.ext.profile_mode.analysis.algorithms" title="Algorithms"> 32 ALGORITHMS</a></td><td align="left"><a class="link" href="profile_mode_diagnostics.html#manual.ext.profile_mode.analysis.algorithms.sort" title="Sort Algorithm Performance"> 33 SORT</a></td><td align="left">7</td><td align="left">8</td><td align="left"> </td><td align="left">7</td><td align="left">no</td></tr><tr><td align="left"><a class="link" href="profile_mode_diagnostics.html#manual.ext.profile_mode.analysis.locality" title="Data Locality"> 34 LOCALITY</a></td><td align="left"><a class="link" href="profile_mode_diagnostics.html#manual.ext.profile_mode.analysis.locality.sw_prefetch" title="Need Software Prefetch"> 35 SOFTWARE_PREFETCH</a></td><td align="left">8</td><td align="left">8</td><td align="left"> </td><td align="left">5</td><td align="left">no</td></tr><tr><td align="left"> </td><td align="left"><a class="link" href="profile_mode_diagnostics.html#manual.ext.profile_mode.analysis.locality.linked" title="Linked Structure Locality"> 36 RBTREE_LOCALITY</a></td><td align="left">4</td><td align="left">8</td><td align="left"> </td><td align="left">5</td><td align="left">no</td></tr><tr><td align="left"> </td><td align="left"><a class="link" href="profile_mode_diagnostics.html#manual.ext.profile_mode.analysis.mthread.false_share" title="False Sharing"> 37 FALSE_SHARING</a></td><td align="left">8</td><td align="left">10</td><td align="left"> </td><td align="left">10</td><td align="left">no</td></tr></tbody></table></div></div><br class="table-break" /><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="manual.ext.profile_mode.analysis.template"></a>Diagnostic Template</h3></div></div></div><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span> 38 <code class="code">_GLIBCXX_PROFILE_<diagnostic></code>. 39 </p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span> What problem will it diagnose? 40 </p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span>. 41 What is the fundamental reason why this is a problem</p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span> 42 Percentage reduction in execution time. When reduction is more than 43 a constant factor, describe the reduction rate formula. 44 </p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span> 45 What would the advise look like?</p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span> 46 What stdlibc++ components need to be instrumented?</p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span> 47 How do we decide when to issue the advice?</p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span> 48 How do we measure benefits? Math goes here.</p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span> 49</p><pre class="programlisting"> 50program code 51... 52advice sample 53</pre><p> 54</p></li></ul></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="manual.ext.profile_mode.analysis.containers"></a>Containers</h3></div></div></div><p> 55<span class="emphasis"><em>Switch:</em></span> 56 <code class="code">_GLIBCXX_PROFILE_CONTAINERS</code>. 57</p><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="manual.ext.profile_mode.analysis.hashtable_too_small"></a>Hashtable Too Small</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span> 58 <code class="code">_GLIBCXX_PROFILE_HASHTABLE_TOO_SMALL</code>. 59 </p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span> Detect hashtables with many 60 rehash operations, small construction size and large destruction size. 61 </p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span> Rehash is very expensive. 62 Read content, follow chains within bucket, evaluate hash function, place at 63 new location in different order.</p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span> 36%. 64 Code similar to example below. 65 </p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span> 66 Set initial size to N at construction site S. 67 </p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span> 68 <code class="code">unordered_set, unordered_map</code> constructor, destructor, rehash. 69 </p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span> 70 For each dynamic instance of <code class="code">unordered_[multi]set|map</code>, 71 record initial size and call context of the constructor. 72 Record size increase, if any, after each relevant operation such as insert. 73 Record the estimated rehash cost.</p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span> 74 Number of individual rehash operations * cost per rehash.</p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span> 75</p><pre class="programlisting"> 761 unordered_set<int> us; 772 for (int k = 0; k < 1000000; ++k) { 783 us.insert(k); 794 } 80 81foo.cc:1: advice: Changing initial unordered_set size from 10 to 1000000 saves 1025530 rehash operations. 82</pre><p> 83</p></li></ul></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="manual.ext.profile_mode.analysis.hashtable_too_large"></a>Hashtable Too Large</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span> 84 <code class="code">_GLIBCXX_PROFILE_HASHTABLE_TOO_LARGE</code>. 85 </p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span> Detect hashtables which are 86 never filled up because fewer elements than reserved are ever 87 inserted. 88 </p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span> Save memory, which 89 is good in itself and may also improve memory reference performance through 90 fewer cache and TLB misses.</p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span> unknown. 91 </p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span> 92 Set initial size to N at construction site S. 93 </p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span> 94 <code class="code">unordered_set, unordered_map</code> constructor, destructor, rehash. 95 </p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span> 96 For each dynamic instance of <code class="code">unordered_[multi]set|map</code>, 97 record initial size and call context of the constructor, and correlate it 98 with its size at destruction time. 99 </p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span> 100 Number of iteration operations + memory saved.</p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span> 101</p><pre class="programlisting"> 1021 vector<unordered_set<int>> v(100000, unordered_set<int>(100)) ; 1032 for (int k = 0; k < 100000; ++k) { 1043 for (int j = 0; j < 10; ++j) { 1054 v[k].insert(k + j); 1065 } 1076 } 108 109foo.cc:1: advice: Changing initial unordered_set size from 100 to 10 saves N 110bytes of memory and M iteration steps. 111</pre><p> 112</p></li></ul></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="manual.ext.profile_mode.analysis.inefficient_hash"></a>Inefficient Hash</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span> 113 <code class="code">_GLIBCXX_PROFILE_INEFFICIENT_HASH</code>. 114 </p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span> Detect hashtables with polarized 115 distribution. 116 </p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span> A non-uniform 117 distribution may lead to long chains, thus possibly increasing complexity 118 by a factor up to the number of elements. 119 </p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span> factor up 120 to container size. 121 </p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span> Change hash function 122 for container built at site S. Distribution score = N. Access score = S. 123 Longest chain = C, in bucket B. 124 </p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span> 125 <code class="code">unordered_set, unordered_map</code> constructor, destructor, [], 126 insert, iterator. 127 </p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span> 128 Count the exact number of link traversals. 129 </p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span> 130 Total number of links traversed.</p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span> 131</p><pre class="programlisting"> 132class dumb_hash { 133 public: 134 size_t operator() (int i) const { return 0; } 135}; 136... 137 unordered_set<int, dumb_hash> hs; 138 ... 139 for (int i = 0; i < COUNT; ++i) { 140 hs.find(i); 141 } 142</pre><p> 143</p></li></ul></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="manual.ext.profile_mode.analysis.vector_too_small"></a>Vector Too Small</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span> 144 <code class="code">_GLIBCXX_PROFILE_VECTOR_TOO_SMALL</code>. 145 </p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span>Detect vectors with many 146 resize operations, small construction size and large destruction size.. 147 </p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span>Resizing can be expensive. 148 Copying large amounts of data takes time. Resizing many small vectors may 149 have allocation overhead and affect locality.</p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span>%. 150 </p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span> 151 Set initial size to N at construction site S.</p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span><code class="code">vector</code>. 152 </p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span> 153 For each dynamic instance of <code class="code">vector</code>, 154 record initial size and call context of the constructor. 155 Record size increase, if any, after each relevant operation such as 156 <code class="code">push_back</code>. Record the estimated resize cost. 157 </p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span> 158 Total number of words copied * time to copy a word.</p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span> 159</p><pre class="programlisting"> 1601 vector<int> v; 1612 for (int k = 0; k < 1000000; ++k) { 1623 v.push_back(k); 1634 } 164 165foo.cc:1: advice: Changing initial vector size from 10 to 1000000 saves 166copying 4000000 bytes and 20 memory allocations and deallocations. 167</pre><p> 168</p></li></ul></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="manual.ext.profile_mode.analysis.vector_too_large"></a>Vector Too Large</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span> 169 <code class="code">_GLIBCXX_PROFILE_VECTOR_TOO_LARGE</code> 170 </p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span>Detect vectors which are 171 never filled up because fewer elements than reserved are ever 172 inserted. 173 </p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span>Save memory, which 174 is good in itself and may also improve memory reference performance through 175 fewer cache and TLB misses.</p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span>%. 176 </p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span> 177 Set initial size to N at construction site S.</p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span><code class="code">vector</code>. 178 </p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span> 179 For each dynamic instance of <code class="code">vector</code>, 180 record initial size and call context of the constructor, and correlate it 181 with its size at destruction time.</p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span> 182 Total amount of memory saved.</p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span> 183</p><pre class="programlisting"> 1841 vector<vector<int>> v(100000, vector<int>(100)) ; 1852 for (int k = 0; k < 100000; ++k) { 1863 for (int j = 0; j < 10; ++j) { 1874 v[k].insert(k + j); 1885 } 1896 } 190 191foo.cc:1: advice: Changing initial vector size from 100 to 10 saves N 192bytes of memory and may reduce the number of cache and TLB misses. 193</pre><p> 194</p></li></ul></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="manual.ext.profile_mode.analysis.vector_to_hashtable"></a>Vector to Hashtable</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span> 195 <code class="code">_GLIBCXX_PROFILE_VECTOR_TO_HASHTABLE</code>. 196 </p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span> Detect uses of 197 <code class="code">vector</code> that can be substituted with <code class="code">unordered_set</code> 198 to reduce execution time. 199 </p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span> 200 Linear search in a vector is very expensive, whereas searching in a hashtable 201 is very quick.</p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span>factor up 202 to container size. 203 </p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span>Replace 204 <code class="code">vector</code> with <code class="code">unordered_set</code> at site S. 205 </p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span><code class="code">vector</code> 206 operations and access methods.</p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span> 207 For each dynamic instance of <code class="code">vector</code>, 208 record call context of the constructor. Issue the advice only if the 209 only methods called on this <code class="code">vector</code> are <code class="code">push_back</code>, 210 <code class="code">insert</code> and <code class="code">find</code>. 211 </p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span> 212 Cost(vector::push_back) + cost(vector::insert) + cost(find, vector) - 213 cost(unordered_set::insert) + cost(unordered_set::find). 214 </p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span> 215</p><pre class="programlisting"> 2161 vector<int> v; 217... 2182 for (int i = 0; i < 1000; ++i) { 2193 find(v.begin(), v.end(), i); 2204 } 221 222foo.cc:1: advice: Changing "vector" to "unordered_set" will save about 500,000 223comparisons. 224</pre><p> 225</p></li></ul></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="manual.ext.profile_mode.analysis.hashtable_to_vector"></a>Hashtable to Vector</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span> 226 <code class="code">_GLIBCXX_PROFILE_HASHTABLE_TO_VECTOR</code>. 227 </p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span> Detect uses of 228 <code class="code">unordered_set</code> that can be substituted with <code class="code">vector</code> 229 to reduce execution time. 230 </p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span> 231 Hashtable iterator is slower than vector iterator.</p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span>95%. 232 </p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span>Replace 233 <code class="code">unordered_set</code> with <code class="code">vector</code> at site S. 234 </p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span><code class="code">unordered_set</code> 235 operations and access methods.</p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span> 236 For each dynamic instance of <code class="code">unordered_set</code>, 237 record call context of the constructor. Issue the advice only if the 238 number of <code class="code">find</code>, <code class="code">insert</code> and <code class="code">[]</code> 239 operations on this <code class="code">unordered_set</code> are small relative to the 240 number of elements, and methods <code class="code">begin</code> or <code class="code">end</code> 241 are invoked (suggesting iteration).</p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span> 242 Number of .</p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span> 243</p><pre class="programlisting"> 2441 unordered_set<int> us; 245... 2462 int s = 0; 2473 for (unordered_set<int>::iterator it = us.begin(); it != us.end(); ++it) { 2484 s += *it; 2495 } 250 251foo.cc:1: advice: Changing "unordered_set" to "vector" will save about N 252indirections and may achieve better data locality. 253</pre><p> 254</p></li></ul></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="manual.ext.profile_mode.analysis.vector_to_list"></a>Vector to List</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span> 255 <code class="code">_GLIBCXX_PROFILE_VECTOR_TO_LIST</code>. 256 </p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span> Detect cases where 257 <code class="code">vector</code> could be substituted with <code class="code">list</code> for 258 better performance. 259 </p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span> 260 Inserting in the middle of a vector is expensive compared to inserting in a 261 list. 262 </p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span>factor up to 263 container size. 264 </p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span>Replace vector with list 265 at site S.</p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span><code class="code">vector</code> 266 operations and access methods.</p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span> 267 For each dynamic instance of <code class="code">vector</code>, 268 record the call context of the constructor. Record the overhead of each 269 <code class="code">insert</code> operation based on current size and insert position. 270 Report instance with high insertion overhead. 271 </p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span> 272 (Sum(cost(vector::method)) - Sum(cost(list::method)), for 273 method in [push_back, insert, erase]) 274 + (Cost(iterate vector) - Cost(iterate list))</p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span> 275</p><pre class="programlisting"> 2761 vector<int> v; 2772 for (int i = 0; i < 10000; ++i) { 2783 v.insert(v.begin(), i); 2794 } 280 281foo.cc:1: advice: Changing "vector" to "list" will save about 5,000,000 282operations. 283</pre><p> 284</p></li></ul></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="manual.ext.profile_mode.analysis.list_to_vector"></a>List to Vector</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span> 285 <code class="code">_GLIBCXX_PROFILE_LIST_TO_VECTOR</code>. 286 </p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span> Detect cases where 287 <code class="code">list</code> could be substituted with <code class="code">vector</code> for 288 better performance. 289 </p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span> 290 Iterating through a vector is faster than through a list. 291 </p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span>64%. 292 </p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span>Replace list with vector 293 at site S.</p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span><code class="code">vector</code> 294 operations and access methods.</p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span> 295 Issue the advice if there are no <code class="code">insert</code> operations. 296 </p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span> 297 (Sum(cost(vector::method)) - Sum(cost(list::method)), for 298 method in [push_back, insert, erase]) 299 + (Cost(iterate vector) - Cost(iterate list))</p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span> 300</p><pre class="programlisting"> 3011 list<int> l; 302... 3032 int sum = 0; 3043 for (list<int>::iterator it = l.begin(); it != l.end(); ++it) { 3054 sum += *it; 3065 } 307 308foo.cc:1: advice: Changing "list" to "vector" will save about 1000000 indirect 309memory references. 310</pre><p> 311</p></li></ul></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="manual.ext.profile_mode.analysis.list_to_slist"></a>List to Forward List (Slist)</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span> 312 <code class="code">_GLIBCXX_PROFILE_LIST_TO_SLIST</code>. 313 </p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span> Detect cases where 314 <code class="code">list</code> could be substituted with <code class="code">forward_list</code> for 315 better performance. 316 </p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span> 317 The memory footprint of a forward_list is smaller than that of a list. 318 This has beneficial effects on memory subsystem, e.g., fewer cache misses. 319 </p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span>40%. 320 Note that the reduction is only noticeable if the size of the forward_list 321 node is in fact larger than that of the list node. For memory allocators 322 with size classes, you will only notice an effect when the two node sizes 323 belong to different allocator size classes. 324 </p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span>Replace list with 325 forward_list at site S.</p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span><code class="code">list</code> 326 operations and iteration methods.</p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span> 327 Issue the advice if there are no <code class="code">backwards</code> traversals 328 or insertion before a given node. 329 </p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span> 330 Always true.</p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span> 331</p><pre class="programlisting"> 3321 list<int> l; 333... 3342 int sum = 0; 3353 for (list<int>::iterator it = l.begin(); it != l.end(); ++it) { 3364 sum += *it; 3375 } 338 339foo.cc:1: advice: Change "list" to "forward_list". 340</pre><p> 341</p></li></ul></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="manual.ext.profile_mode.analysis.assoc_ord_to_unord"></a>Ordered to Unordered Associative Container</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span> 342 <code class="code">_GLIBCXX_PROFILE_ORDERED_TO_UNORDERED</code>. 343 </p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span> Detect cases where ordered 344 associative containers can be replaced with unordered ones. 345 </p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span> 346 Insert and search are quicker in a hashtable than in 347 a red-black tree.</p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span>52%. 348 </p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span> 349 Replace set with unordered_set at site S.</p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span> 350 <code class="code">set</code>, <code class="code">multiset</code>, <code class="code">map</code>, 351 <code class="code">multimap</code> methods.</p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span> 352 Issue the advice only if we are not using operator <code class="code">++</code> on any 353 iterator on a particular <code class="code">[multi]set|map</code>. 354 </p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span> 355 (Sum(cost(hashtable::method)) - Sum(cost(rbtree::method)), for 356 method in [insert, erase, find]) 357 + (Cost(iterate hashtable) - Cost(iterate rbtree))</p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span> 358</p><pre class="programlisting"> 3591 set<int> s; 3602 for (int i = 0; i < 100000; ++i) { 3613 s.insert(i); 3624 } 3635 int sum = 0; 3646 for (int i = 0; i < 100000; ++i) { 3657 sum += *s.find(i); 3668 } 367</pre><p> 368</p></li></ul></div></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="manual.ext.profile_mode.analysis.algorithms"></a>Algorithms</h3></div></div></div><p><span class="emphasis"><em>Switch:</em></span> 369 <code class="code">_GLIBCXX_PROFILE_ALGORITHMS</code>. 370 </p><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="manual.ext.profile_mode.analysis.algorithms.sort"></a>Sort Algorithm Performance</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span> 371 <code class="code">_GLIBCXX_PROFILE_SORT</code>. 372 </p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span> Give measure of sort algorithm 373 performance based on actual input. For instance, advise Radix Sort over 374 Quick Sort for a particular call context. 375 </p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span> 376 See papers: 377 <a class="link" href="https://dl.acm.org/citation.cfm?doid=1065944.1065981" target="_top"> 378 A framework for adaptive algorithm selection in STAPL</a> and 379 <a class="link" href="https://ieeexplore.ieee.org/document/4228227/" target="_top"> 380 Optimizing Sorting with Machine Learning Algorithms</a>. 381 </p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span>60%. 382 </p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span> Change sort algorithm 383 at site S from X Sort to Y Sort.</p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span> <code class="code">sort</code> 384 algorithm.</p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span> 385 Issue the advice if the cost model tells us that another sort algorithm 386 would do better on this input. Requires us to know what algorithm we 387 are using in our sort implementation in release mode.</p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span> 388 Runtime(algo) for algo in [radix, quick, merge, ...]</p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span> 389</p><pre class="programlisting"> 390</pre><p> 391</p></li></ul></div></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="manual.ext.profile_mode.analysis.locality"></a>Data Locality</h3></div></div></div><p><span class="emphasis"><em>Switch:</em></span> 392 <code class="code">_GLIBCXX_PROFILE_LOCALITY</code>. 393 </p><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="manual.ext.profile_mode.analysis.locality.sw_prefetch"></a>Need Software Prefetch</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span> 394 <code class="code">_GLIBCXX_PROFILE_SOFTWARE_PREFETCH</code>. 395 </p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span> Discover sequences of indirect 396 memory accesses that are not regular, thus cannot be predicted by 397 hardware prefetchers. 398 </p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span> 399 Indirect references are hard to predict and are very expensive when they 400 miss in caches.</p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span>25%. 401 </p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span> Insert prefetch 402 instruction.</p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span> Vector iterator and 403 access operator []. 404 </p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span> 405 First, get cache line size and page size from system. 406 Then record iterator dereference sequences for which the value is a pointer. 407 For each sequence within a container, issue a warning if successive pointer 408 addresses are not within cache lines and do not form a linear pattern 409 (otherwise they may be prefetched by hardware). 410 If they also step across page boundaries, make the warning stronger. 411 </p><p>The same analysis applies to containers other than vector. 412 However, we cannot give the same advice for linked structures, such as list, 413 as there is no random access to the n-th element. The user may still be 414 able to benefit from this information, for instance by employing frays (user 415 level light weight threads) to hide the latency of chasing pointers. 416 </p><p> 417 This analysis is a little oversimplified. A better cost model could be 418 created by understanding the capability of the hardware prefetcher. 419 This model could be trained automatically by running a set of synthetic 420 cases. 421 </p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span> 422 Total distance between pointer values of successive elements in vectors 423 of pointers.</p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span> 424</p><pre class="programlisting"> 4251 int zero = 0; 4262 vector<int*> v(10000000, &zero); 4273 for (int k = 0; k < 10000000; ++k) { 4284 v[random() % 10000000] = new int(k); 4295 } 4306 for (int j = 0; j < 10000000; ++j) { 4317 count += (*v[j] == 0 ? 0 : 1); 4328 } 433 434foo.cc:7: advice: Insert prefetch instruction. 435</pre><p> 436</p></li></ul></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="manual.ext.profile_mode.analysis.locality.linked"></a>Linked Structure Locality</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span> 437 <code class="code">_GLIBCXX_PROFILE_RBTREE_LOCALITY</code>. 438 </p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span> Give measure of locality of 439 objects stored in linked structures (lists, red-black trees and hashtables) 440 with respect to their actual traversal patterns. 441 </p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span>Allocation can be tuned 442 to a specific traversal pattern, to result in better data locality. 443 See paper: 444 <a class="link" href="https://parasol.tamu.edu/publications/download.php?file_id=570" target="_top"> 445 Custom Memory Allocation for Free</a> by Jula and Rauchwerger. 446 </p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span>30%. 447 </p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span> 448 High scatter score N for container built at site S. 449 Consider changing allocation sequence or choosing a structure conscious 450 allocator.</p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span> Methods of all 451 containers using linked structures.</p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span> 452 First, get cache line size and page size from system. 453 Then record the number of successive elements that are on different line 454 or page, for each traversal method such as <code class="code">find</code>. Give advice 455 only if the ratio between this number and the number of total node hops 456 is above a threshold.</p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span> 457 Sum(same_cache_line(this,previous))</p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span> 458</p><pre class="programlisting"> 459 1 set<int> s; 460 2 for (int i = 0; i < 10000000; ++i) { 461 3 s.insert(i); 462 4 } 463 5 set<int> s1, s2; 464 6 for (int i = 0; i < 10000000; ++i) { 465 7 s1.insert(i); 466 8 s2.insert(i); 467 9 } 468... 469 // Fast, better locality. 47010 for (set<int>::iterator it = s.begin(); it != s.end(); ++it) { 47111 sum += *it; 47212 } 473 // Slow, elements are further apart. 47413 for (set<int>::iterator it = s1.begin(); it != s1.end(); ++it) { 47514 sum += *it; 47615 } 477 478foo.cc:5: advice: High scatter score NNN for set built here. Consider changing 479the allocation sequence or switching to a structure conscious allocator. 480</pre><p> 481</p></li></ul></div></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="manual.ext.profile_mode.analysis.mthread"></a>Multithreaded Data Access</h3></div></div></div><p> 482 The diagnostics in this group are not meant to be implemented short term. 483 They require compiler support to know when container elements are written 484 to. Instrumentation can only tell us when elements are referenced. 485 </p><p><span class="emphasis"><em>Switch:</em></span> 486 <code class="code">_GLIBCXX_PROFILE_MULTITHREADED</code>. 487 </p><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="manual.ext.profile_mode.analysis.mthread.ddtest"></a>Data Dependence Violations at Container Level</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span> 488 <code class="code">_GLIBCXX_PROFILE_DDTEST</code>. 489 </p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span> Detect container elements 490 that are referenced from multiple threads in the parallel region or 491 across parallel regions. 492 </p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span> 493 Sharing data between threads requires communication and perhaps locking, 494 which may be expensive. 495 </p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span>?%. 496 </p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span> Change data 497 distribution or parallel algorithm.</p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span> Container access methods 498 and iterators. 499 </p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span> 500 Keep a shadow for each container. Record iterator dereferences and 501 container member accesses. Issue advice for elements referenced by 502 multiple threads. 503 See paper: <a class="link" href="https://dl.acm.org/citation.cfm?id=207110.207148" target="_top"> 504 The LRPD test: speculative run-time parallelization of loops with 505 privatization and reduction parallelization</a>. 506 </p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span> 507 Number of accesses to elements referenced from multiple threads 508 </p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span> 509</p><pre class="programlisting"> 510</pre><p> 511</p></li></ul></div></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a id="manual.ext.profile_mode.analysis.mthread.false_share"></a>False Sharing</h4></div></div></div><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p><span class="emphasis"><em>Switch:</em></span> 512 <code class="code">_GLIBCXX_PROFILE_FALSE_SHARING</code>. 513 </p></li><li class="listitem"><p><span class="emphasis"><em>Goal:</em></span> Detect elements in the 514 same container which share a cache line, are written by at least one 515 thread, and accessed by different threads. 516 </p></li><li class="listitem"><p><span class="emphasis"><em>Fundamentals:</em></span> Under these assumptions, 517 cache protocols require 518 communication to invalidate lines, which may be expensive. 519 </p></li><li class="listitem"><p><span class="emphasis"><em>Sample runtime reduction:</em></span>68%. 520 </p></li><li class="listitem"><p><span class="emphasis"><em>Recommendation:</em></span> Reorganize container 521 or use padding to avoid false sharing.</p></li><li class="listitem"><p><span class="emphasis"><em>To instrument:</em></span> Container access methods 522 and iterators. 523 </p></li><li class="listitem"><p><span class="emphasis"><em>Analysis:</em></span> 524 First, get the cache line size. 525 For each shared container, record all the associated iterator dereferences 526 and member access methods with the thread id. Compare the address lists 527 across threads to detect references in two different threads to the same 528 cache line. Issue a warning only if the ratio to total references is 529 significant. Do the same for iterator dereference values if they are 530 pointers.</p></li><li class="listitem"><p><span class="emphasis"><em>Cost model:</em></span> 531 Number of accesses to same cache line from different threads. 532 </p></li><li class="listitem"><p><span class="emphasis"><em>Example:</em></span> 533</p><pre class="programlisting"> 5341 vector<int> v(2, 0); 5352 #pragma omp parallel for shared(v, SIZE) schedule(static, 1) 5363 for (i = 0; i < SIZE; ++i) { 5374 v[i % 2] += i; 5385 } 539 540OMP_NUM_THREADS=2 ./a.out 541foo.cc:1: advice: Change container structure or padding to avoid false 542sharing in multithreaded access at foo.cc:4. Detected N shared cache lines. 543</pre><p> 544</p></li></ul></div></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="manual.ext.profile_mode.analysis.statistics"></a>Statistics</h3></div></div></div><p> 545<span class="emphasis"><em>Switch:</em></span> 546 <code class="code">_GLIBCXX_PROFILE_STATISTICS</code>. 547</p><p> 548 In some cases the cost model may not tell us anything because the costs 549 appear to offset the benefits. Consider the choice between a vector and 550 a list. When there are both inserts and iteration, an automatic advice 551 may not be issued. However, the programmer may still be able to make use 552 of this information in a different way. 553</p><p> 554 This diagnostic will not issue any advice, but it will print statistics for 555 each container construction site. The statistics will contain the cost 556 of each operation actually performed on the container. 557</p></div></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="profile_mode_devel.html">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="profile_mode.html">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="mt_allocator.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">Developer Information </td><td width="20%" align="center"><a accesskey="h" href="../index.html">Home</a></td><td width="40%" align="right" valign="top"> Chapter 20. The mt_allocator</td></tr></table></div></body></html>