11 large applications measuring hundreds of megabytes in size. However, medium-sized
12 programs can benefit too. Clang, one of the most popular open-source C/C++ compilers,
15 misses and can be significantly improved with BOLT, even on top of profile-guided and
16 link-time optimizations.
18 In this tutorial we will first build Clang with PGO and LTO, and then show how to
20 the compile-time performance gains are coming from, and verify that the speed-ups are
25 The process of getting Clang sources and performing the build is very similar to the
27 on how to obtain and build Clang in the [Bootstrapping Clang-7 with PGO and LTO](#bootstrapping-clang-7-with-pgo-and-lto) section.
29 The only difference from the standard Clang build is that we require the `-Wl,-q` flag to be present during
35 We will use the setup described in [Bootstrapping Clang-7 with PGO and LTO](#bootstrapping-clang-7-with-pgo-and-lto).
36 Adjust the steps accordingly if you skipped that section. We will also assume that `llvm-bolt` is present in your `$PATH`.
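Since the steps below assume several tools are reachable through `$PATH`, a quick check before starting can save a failed run later. The following is a minimal sketch; the function name `require_tools` is hypothetical and not part of BOLT.

```bash
# Report any of the given tools that cannot be found in $PATH.
# Returns non-zero if at least one of them is missing.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "not found in PATH: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For this tutorial it could be called as `require_tools llvm-bolt perf2bolt perf cmake ninja`.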
41 implements taken branch sampling (`-b/-j` flag). For that reason, it may not be possible to
49 $ CPATH=${TOPLEV}/stage2-prof-use-lto/install/bin/
50 $ cmake -G Ninja ${TOPLEV}/llvm-project/llvm -DLLVM_TARGETS_TO_BUILD=X86 -DCMAKE_BUILD_TYPE=Release \
51 -DCMAKE_C_COMPILER=$CPATH/clang -DCMAKE_CXX_COMPILER=$CPATH/clang++ \
52 -DLLVM_ENABLE_PROJECTS="clang" \
53 -DLLVM_USE_LINKER=lld -DCMAKE_INSTALL_PREFIX=${TOPLEV}/stage3/install
54 $ perf record -e cycles:u -j any,u -- ninja clang
61 $ perf2bolt $CPATH/clang-7 -p perf.data -o clang-7.fdata -w clang-7.yaml
63 Notice that we are passing `clang-7` to `perf2bolt`, which is the real binary that
67 $ llvm-bolt $CPATH/clang-7 -o $CPATH/clang-7.bolt -b clang-7.yaml \
68 -reorder-blocks=ext-tsp -reorder-functions=hfsort+ -split-functions \
69 -split-all-cold -dyno-stats -icf=1 -use-gnu-stack
74 BOLT-INFO: enabling relocation mode
75 BOLT-INFO: 11415 functions out of 104526 simple functions (10.9%) have non-empty execution profile.
77 BOLT-INFO: ICF folded 29144 out of 105177 functions in 8 passes. 82 functions had jump tables.
78 BOLT-INFO: Removing all identical functions will save 5466.69 KB of code space. Folded functions were called 2131985 times based on profile.
79 BOLT-INFO: basic block reordering modified layout of 7848 (10.32%) functions
81 660155947 : executed forward branches (-2.3%)
82 48252553 : taken forward branches (-57.2%)
84 52389551 : taken backward branches (-19.5%)
85 35650038 : executed unconditional branches (-33.2%)
89 6113398840 : executed instructions (-0.6%)
93 825703946 : total branches (-2.1%)
94 136292142 : taken branches (-41.1%)
95 689411804 : non-taken conditional branches (+12.6%)
96 100642104 : taken conditional branches (-43.4%)
104 ## Measuring Compile-time Improvement
106 `clang-7.bolt` can be used as a replacement for *PGO+LTO* Clang:
108 $ mv $CPATH/clang-7 $CPATH/clang-7.org
109 $ ln -fs $CPATH/clang-7.bolt $CPATH/clang-7
111 Doing a new build of Clang using the new binary shows a significant overall
112 build time reduction on a 48-core Haswell system:
114 $ ln -fs $CPATH/clang-7.org $CPATH/clang-7
115 $ ninja clean && /bin/time -f %e ninja clang -j48
117 $ ln -fs $CPATH/clang-7.bolt $CPATH/clang-7
118 $ ninja clean && /bin/time -f %e ninja clang -j48
121 That's 22.61 seconds (or 12%) faster compared to the *PGO+LTO* build.
122 Notice that we are measuring an improvement of the total build time, which includes the time spent in the linker.
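Switching between the two compilers by hand is easy to get wrong when repeating measurements. As a sketch, the symlink step can be wrapped in a small helper; `use_clang` is a hypothetical name, and the helper assumes the original binary was already renamed to `clang-7.org` as shown above.

```bash
# Hypothetical helper: point the clang-7 symlink at a chosen variant.
# $1 = directory holding the binaries, $2 = variant suffix ("org" or "bolt").
use_clang() {
    dir=$1
    variant=$2
    if [ ! -e "$dir/clang-7.$variant" ]; then
        echo "missing $dir/clang-7.$variant" >&2
        return 1
    fi
    # -f replaces the existing symlink, -s creates a symbolic link.
    ln -fs "$dir/clang-7.$variant" "$dir/clang-7"
}
```

A timed run would then look like `use_clang "$CPATH" bolt && ninja clean && /bin/time -f %e ninja clang -j48`.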
124 If we run BOLT on a Clang binary compiled without *PGO+LTO* (in which case the build is finished in 253.32 seconds),
126 but, as expected, the result is still slower than the *PGO+LTO+BOLT* build.
132 $ ln -fs $CPATH/clang-7.org $CPATH/clang-7
133 $ ninja clean && perf stat -e instructions,L1-icache-misses -- ninja clang -j48
136 359,996,216,537 L1-icache-misses
142 $ ln -fs $CPATH/clang-7.bolt $CPATH/clang-7
143 $ ninja clean && perf stat -e instructions,L1-icache-misses -- ninja clang -j48
146 244,888,677,972 L1-icache-misses
149 the number of stalls in the CPU front-end.
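To put the two `L1-icache-misses` counts above in relative terms, the reduction can be computed directly from the `perf stat` figures. This is a small sketch; `counter_reduction` is a hypothetical helper name, and it only strips the thousands separators that `perf` prints before doing the arithmetic.

```bash
# Percentage reduction between two perf counter values (before, after).
# perf prints counters with thousands separators, so strip commas first.
counter_reduction() {
    before=$(printf '%s' "$1" | tr -d ,)
    after=$(printf '%s' "$2" | tr -d ,)
    awk -v b="$before" -v a="$after" 'BEGIN { printf "%.1f\n", (b - a) / b * 100 }'
}

# I-cache misses: PGO+LTO Clang vs. PGO+LTO+BOLT Clang
counter_reduction 359,996,216,537 244,888,677,972   # prints 32.0
```

With the numbers reported here, the BOLT-optimized binary incurs roughly 32% fewer instruction-cache misses over the whole build.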
157 the compilation of other projects? We picked `mysqld`, an open-source database, to do the test.
159 On our 48-core Haswell system, the build finished in 136.06 seconds using the *PGO+LTO* Clang and in 126.10 seconds using the *PGO+LTO+BOLT* Clang.
168 using the `merge-fdata` utility that comes with BOLT. Optimized with that profile, the *PGO+LTO+BOLT* Clang was able
169 to perform the `mysqld` build in 124.74 seconds, i.e. 11 seconds or 9% faster compared to *PGO+LTO* Clang.
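The percentage depends on which ratio is taken, so it helps to spell out the arithmetic. The sketch below (with a hypothetical helper name, `compare_times`) prints the seconds saved, the reduction in build time, and the speed-up for the two `mysqld` timings quoted above.

```bash
# Compare two wall-clock build times in seconds: baseline first, optimized second.
# Prints seconds saved, time reduction in percent, and speed-up in percent.
compare_times() {
    awk -v base="$1" -v opt="$2" 'BEGIN {
        printf "saved=%.2fs reduction=%.1f%% speedup=%.1f%%\n",
            base - opt, (base - opt) / base * 100, (base / opt - 1) * 100
    }'
}

compare_times 136.06 124.74   # prints saved=11.32s reduction=8.3% speedup=9.1%
```

The "9% faster" figure corresponds to the speed-up ratio (136.06 / 124.74); expressed as a reduction in build time, the same measurement is about 8.3%.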
181 ----
184 ## Bootstrapping Clang-7 with PGO and LTO
186 Below we describe detailed steps to build Clang, and make it ready for BOLT
187 optimizations. If you already have the build setup, you can skip this section,
188 except for the last step that adds the `-Wl,-q` linker flag to the final build.
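If you are reusing an existing build, one way to check whether the binary was linked with `-Wl,-q` is to look for retained relocation sections. The sketch below assumes an x86-64 ELF binary (where the section is named `.rela.text`) and that `readelf` from binutils is installed; `has_text_relocs` is a hypothetical helper name.

```bash
# Succeeds if the ELF file $1 still carries relocations for .text,
# which is what linking with -Wl,-q (--emit-relocs) is expected to leave behind.
has_text_relocs() {
    readelf -S -W "$1" | grep -q '\.rela\.text'
}
```

For example, `has_text_relocs "$CPATH/clang-7" && echo "relocations present"`. This is only a heuristic check; BOLT itself reports `BOLT-INFO: enabling relocation mode` when it finds the relocations it needs.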
190 ### Getting Clang-7 Sources
193 builds. E.g. `TOPLEV=~/clang-7/`. Follow with commands to clone the `release/7.x`
198 $ git clone --branch=release/7.x https://github.com/llvm/llvm-project.git
203 Stage 1 will be the first build we are going to do, and we will be using the
204 default system compiler to build Clang. If your system lacks a compiler, use
207 `cmake` and `ninja` packages. Note that we disable the build of certain
208 compiler-rt components that are known to cause build issues at release/7.x.
212 $ cmake -G Ninja ${TOPLEV}/llvm-project/llvm -DLLVM_TARGETS_TO_BUILD=X86 \
213 -DCMAKE_BUILD_TYPE=Release \
214 -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ -DCMAKE_ASM_COMPILER=gcc \
215 -DLLVM_ENABLE_PROJECTS="clang;lld" \
216 -DLLVM_ENABLE_RUNTIMES="compiler-rt" \
217 -DCOMPILER_RT_BUILD_SANITIZERS=OFF -DCOMPILER_RT_BUILD_XRAY=OFF \
218 -DCOMPILER_RT_BUILD_LIBFUZZER=OFF \
219 -DCMAKE_INSTALL_PREFIX=${TOPLEV}/stage1/install
220 $ ninja install
225 Using the freshly-baked stage 1 Clang compiler, we are going to build Clang with
228 $ mkdir ${TOPLEV}/stage2-prof-gen
229 $ cd ${TOPLEV}/stage2-prof-gen
231 $ cmake -G Ninja ${TOPLEV}/llvm-project/llvm -DLLVM_TARGETS_TO_BUILD=X86 \
232 -DCMAKE_BUILD_TYPE=Release \
233 -DCMAKE_C_COMPILER=$CPATH/clang -DCMAKE_CXX_COMPILER=$CPATH/clang++ \
234 -DLLVM_ENABLE_PROJECTS="clang;lld" \
235 -DLLVM_USE_LINKER=lld -DLLVM_BUILD_INSTRUMENTED=ON \
236 -DCMAKE_INSTALL_PREFIX=${TOPLEV}/stage2-prof-gen/install
237 $ ninja install
246 $ mkdir ${TOPLEV}/stage3-train
247 $ cd ${TOPLEV}/stage3-train
248 $ CPATH=${TOPLEV}/stage2-prof-gen/install/bin
249 $ cmake -G Ninja ${TOPLEV}/llvm-project/llvm -DLLVM_TARGETS_TO_BUILD=X86 \
250 -DCMAKE_BUILD_TYPE=Release \
251 -DCMAKE_C_COMPILER=$CPATH/clang -DCMAKE_CXX_COMPILER=$CPATH/clang++ \
252 -DLLVM_ENABLE_PROJECTS="clang" \
253 -DLLVM_USE_LINKER=lld -DCMAKE_INSTALL_PREFIX=${TOPLEV}/stage3-train/install
254 $ ninja clang
256 Once the build is completed, the profile files will be saved under
257 `${TOPLEV}/stage2-prof-gen/profiles`. We will merge them before they can be
260 $ cd ${TOPLEV}/stage2-prof-gen/profiles
261 $ ${TOPLEV}/stage1/install/bin/llvm-profdata merge -output=clang.profdata *
267 our scenario, i.e. building Clang. We will also enable link-time optimizations
268 to allow cross-module inlining and other optimizations. Finally, we are going to
274 $ mkdir ${TOPLEV}/stage2-prof-use-lto
275 $ cd ${TOPLEV}/stage2-prof-use-lto
277 $ export LDFLAGS="-Wl,-q"
278 $ cmake -G Ninja ${TOPLEV}/llvm-project/llvm -DLLVM_TARGETS_TO_BUILD=X86 \
279 -DCMAKE_BUILD_TYPE=Release \
280 -DCMAKE_C_COMPILER=$CPATH/clang -DCMAKE_CXX_COMPILER=$CPATH/clang++ \
281 -DLLVM_ENABLE_PROJECTS="clang;lld" \
282 -DLLVM_ENABLE_LTO=Full \
283 -DLLVM_PROFDATA_FILE=${TOPLEV}/stage2-prof-gen/profiles/clang.profdata \
284 -DLLVM_USE_LINKER=lld \
285 -DCMAKE_INSTALL_PREFIX=${TOPLEV}/stage2-prof-use-lto/install
286 $ ninja install
288 Now we have a Clang compiler that can build itself much faster. As we will see,