Lines Matching full:clang

1 # Optimizing Clang: A Practical Example of Applying BOLT
12 programs can benefit too. Clang, one of the most popular open-source C/C++ compilers,
14 As we will see, the Clang binary suffers from many instruction cache
18 In this tutorial we will first build Clang with PGO and LTO, and then show the steps to
19 apply BOLT optimizations to make Clang up to 15% faster. We will also analyze where
23 ## Building Clang
25 The process of getting Clang sources and performing the build is very similar to the
26 one described at http://clang.llvm.org/get_started.html. For completeness, we provide the detailed steps
27 on how to obtain and build Clang in the [Bootstrapping Clang-7 with PGO and LTO](#bootstrapping-clang-7-with-pgo-and-lto) section.
29 The only difference from the standard Clang build is that we require the `-Wl,-q` flag to be present during
33 ## Optimizing Clang with BOLT
35 We will use the setup described in [Bootstrapping Clang-7 with PGO and LTO](#bootstrapping-clang-7-with-pgo-and-lto).
38 Before we can run BOLT optimizations, we need to collect the profile for Clang, and we will use
39 Clang/LLVM sources for that.
51 -DCMAKE_C_COMPILER=$CPATH/clang -DCMAKE_CXX_COMPILER=$CPATH/clang++ \
52 -DLLVM_ENABLE_PROJECTS="clang" \
54 $ perf record -e cycles:u -j any,u -- ninja clang
61 $ perf2bolt $CPATH/clang-7 -p perf.data -o clang-7.fdata -w clang-7.yaml
63 Notice that we are passing `clang-7` to `perf2bolt` which is the real binary that
64 `clang` and `clang++` are symlinking to. The next step will optimize Clang using
67 $ llvm-bolt $CPATH/clang-7 -o $CPATH/clang-7.bolt -b clang-7.yaml \
106 `clang-7.bolt` can be used as a replacement for *PGO+LTO* Clang:
108 $ mv $CPATH/clang-7 $CPATH/clang-7.org
109 $ ln -fs $CPATH/clang-7.bolt $CPATH/clang-7
111 Doing a new build of Clang using the new binary shows a significant overall
114 $ ln -fs $CPATH/clang-7.org $CPATH/clang-7
115 $ ninja clean && /bin/time -f %e ninja clang -j48
117 $ ln -fs $CPATH/clang-7.bolt $CPATH/clang-7
118 $ ninja clean && /bin/time -f %e ninja clang -j48
124 If we run BOLT on a Clang binary compiled without *PGO+LTO* (in which case the build finishes in 253.32 seconds),
130 We mentioned that Clang suffers from considerable instruction cache misses. This can be measured with `perf`:
132 $ ln -fs $CPATH/clang-7.org $CPATH/clang-7
133 $ ninja clean && perf stat -e instructions,L1-icache-misses -- ninja clang -j48
142 $ ln -fs $CPATH/clang-7.bolt $CPATH/clang-7
143 $ ninja clean && perf stat -e instructions,L1-icache-misses -- ninja clang -j48
154 ## Using Clang for Other Applications
156 We have collected a profile for Clang using its own source code. Would it be enough to speed up
159 On our 48-core Haswell system using the *PGO+LTO* Clang, the build finished in 136.06 seconds, while using the *PGO+LTO+BOLT* Clang, 126.10 seconds.
160 That's a noticeable improvement, but not as significant as the one we saw on Clang itself.
162 Another reason is that Clang is run with a different set of options while building `mysqld` compared
168 using the `merge-fdata` utility that comes with BOLT. Optimized with that profile, the *PGO+LTO+BOLT* Clang was able
169 to perform the `mysqld` build in 124.74 seconds, i.e. 11 seconds or 9% faster compared to *PGO+LTO* Clang.
170 The merged profile didn't make the original Clang compilation slower either, while the number of profiled functions in Clang increased from 11,415 to 14,025.
177 performance of the Clang compiler. Similarly, BOLT could be used to improve the performance
184 ## Bootstrapping Clang-7 with PGO and LTO
186 Below we describe the detailed steps to build Clang and make it ready for BOLT
190 ### Getting Clang-7 Sources
193 builds. E.g. `TOPLEV=~/clang-7/`. Follow with commands to clone the `release_70`
204 default system compiler to build Clang. If your system lacks a compiler, use
215 -DLLVM_ENABLE_PROJECTS="clang;lld" \
225 Using the freshly-baked stage 1 Clang compiler, we are going to build Clang with
233 -DCMAKE_C_COMPILER=$CPATH/clang -DCMAKE_CXX_COMPILER=$CPATH/clang++ \
234 -DLLVM_ENABLE_PROJECTS="clang;lld" \
244 while building Clang itself:
251 -DCMAKE_C_COMPILER=$CPATH/clang -DCMAKE_CXX_COMPILER=$CPATH/clang++ \
252 -DLLVM_ENABLE_PROJECTS="clang" \
254 $ ninja clang
258 passed back into Clang:
261 $ ${TOPLEV}/stage1/install/bin/llvm-profdata merge -output=clang.profdata *
264 ### Building Clang with PGO and LTO
267 our scenario, i.e. building Clang. We will also enable link-time optimizations
280 -DCMAKE_C_COMPILER=$CPATH/clang -DCMAKE_CXX_COMPILER=$CPATH/clang++ \
281 -DLLVM_ENABLE_PROJECTS="clang;lld" \
283 -DLLVM_PROFDATA_FILE=${TOPLEV}/stage2-prof-gen/profiles/clang.profdata \
288 Now we have a Clang compiler that can build itself much faster. As we will see,