Revision tags: llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3 |
|
#
d2696dec |
| 17-Sep-2020 |
Snehasish Kumar <snehasishk@google.com> |
[llvm] Add -bbsections-cold-text-prefix to emit cold clusters to a different section.
This change adds an option to basic block sections to allow cold clusters to be assigned a custom text prefix. W
[llvm] Add -bbsections-cold-text-prefix to emit cold clusters to a different section.
This change adds an option to basic block sections to allow cold clusters to be assigned a custom text prefix. With a custom prefix such as ".text.split." (D87840), lld can place them in a separate output section. The benefits are -
* Empirically shown to improve icache and itlb metrics by 3-5% (absolute) compared to placing split parts in .text.unlikely. * Mitigates against poor profiles, eg samplePGO profiles used with the machine function splitter. Optimizations such as hugepage remapping can make different decisions at the section granularity. * Enables section granularity hotness monitoring (checking on the decisions made during compilation vs sample data from production).
Differential Revision: https://reviews.llvm.org/D87813
show more ...
|
#
7841e21c |
| 14-Sep-2020 |
Rahman Lavaee <rahmanl@google.com> |
Let -basic-block-sections=labels emit basicblock metadata in a new .bb_addr_map section, instead of emitting special unary-encoded symbols.
This patch introduces the new .bb_addr_map section feature
Let -basic-block-sections=labels emit basicblock metadata in a new .bb_addr_map section, instead of emitting special unary-encoded symbols.
This patch introduces the new .bb_addr_map section feature which allows us to emit the bits needed for mapping binary profiles to basic blocks into a separate section. The format of the emitted data is represented as follows. It includes a header for every function:
| Address of the function | -> 8 bytes (pointer size) | Number of basic blocks in this function (>0) | -> ULEB128
The header is followed by a BB record for every basic block. These records are ordered in the same order as MachineBasicBlocks are placed in the function. Each BB Info is structured as follows:
| Offset of the basic block relative to function begin | -> ULEB128 | Binary size of the basic block | -> ULEB128 | BB metadata | -> ULEB128 [ MBB.isReturn() OR MBB.hasTailCall() << 1 OR MBB.isEHPad() << 2 ]
The new feature will replace the existing "BB labels" functionality with -basic-block-sections=labels. The .bb_addr_map section scrubs the specially-encoded BB symbols from the binary and makes it friendly to profilers and debuggers. Furthermore, the new feature reduces the binary size overhead from 70% bloat to only 12%.
For more information and results please refer to the RFC: https://lists.llvm.org/pipermail/llvm-dev/2020-July/143512.html
Reviewed By: MaskRay, snehasish
Differential Revision: https://reviews.llvm.org/D85408
show more ...
|
Revision tags: llvmorg-11.0.0-rc2 |
|
#
94faadac |
| 05-Aug-2020 |
Snehasish Kumar <snehasishk@google.com> |
[llvm][CodeGen] Machine Function Splitter
We introduce a codegen optimization pass which splits functions into hot and cold parts. This pass leverages the basic block sections feature recently intro
[llvm][CodeGen] Machine Function Splitter
We introduce a codegen optimization pass which splits functions into hot and cold parts. This pass leverages the basic block sections feature recently introduced in LLVM from the Propeller project. The pass targets functions with profile coverage, identifies cold blocks and moves them to a separate section. The linker groups all cold blocks across functions together, decreasing fragmentation and improving icache and itlb utilization.
We evaluated the Machine Function Splitter pass on clang bootstrap and SPECInt 2017.
For clang bootstrap we observe a mean 2.33% runtime improvement with a ~32% reduction in itlb and stlb misses. Additionally, L1 icache misses reduced by 9.5% while L2 instruction misses reduced by 20%.
For SPECInt we report the change in IntRate the C/C++ benchmarks. All benchmarks apart from mcf and x264 improve, on average by 0.6% with the max for deepsjeng at 1.6%.
Benchmark % Change 500.perlbench_r 0.78 502.gcc_r 0.82 505.mcf_r -0.30 520.omnetpp_r 0.18 523.xalancbmk_r 0.37 525.x264_r -0.46 531.deepsjeng_r 1.61 541.leela_r 0.83 557.xz_r 0.15
Differential Revision: https://reviews.llvm.org/D85368
show more ...
|
#
8d943a92 |
| 06-Aug-2020 |
Snehasish Kumar <snehasishk@google.com> |
[NFC] Rename BBSectionsPrepare -> BasicBlockSections.
Rename the BBSectionsPrepare pass as suggested by the review comment in https://reviews.llvm.org/D85368.
Differential Revision: https://reviews
[NFC] Rename BBSectionsPrepare -> BasicBlockSections.
Rename the BBSectionsPrepare pass as suggested by the review comment in https://reviews.llvm.org/D85368.
Differential Revision: https://reviews.llvm.org/D85380
show more ...
|