#
7f230fee |
| 07-Mar-2022 |
serge-sans-paille <sguelton@redhat.com> |
Cleanup codegen includes
after: 1061034926 before: 1063332844
Differential Revision: https://reviews.llvm.org/D121169
|
Revision tags: llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1 |
|
#
3da0aeea |
| 23-Apr-2021 |
Snehasish Kumar <snehasishk@google.com> |
[NFC] Use hasSection instead of getSection().empty()
Use the optimized check hasSection() instead of calling getSection().empty(). Originally suggested in D101004, but was dropped in the commit.
|
#
8077d0ff |
| 21-Apr-2021 |
Snehasish Kumar <snehasishk@google.com> |
[CodeGen] Do not split functions with attr "implicit-section-name".
The #pragma clang section can be used at a coarse granularity to specify the section used for bss/data/text/rodata for global objects. When split functions is enabled, the function may be split into two parts violating user expectations.
Reference: https://clang.llvm.org/docs/LanguageExtensions.html#specifying-section-names-for-global-objects-pragma-clang-section
Differential Revision: https://reviews.llvm.org/D101004
|
Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2 |
|
#
2c7077e6 |
| 09-Feb-2021 |
Snehasish Kumar <snehasishk@google.com> |
[CodeGen] Split out cold exception handling pads.
Support for splitting exception handling pads was added in D73739. This change updates the code to split out exception handling pads if profile information indicates that they are cold. For a given function with multiple landing pads, if any one of them is hot, all of them are retained as part of the hot code section.
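The landing pad rule described above can be sketched as follows. This is a minimal illustration, not the code from the change: the `Block` struct and `selectColdBlocks` are hypothetical names, and hot/cold status is assumed to be already known for each block.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a machine basic block; not an LLVM type.
struct Block {
  bool isEHPad; // true if the block is an exception handling (landing) pad
  bool isCold;  // true if profile information marks the block as cold
};

// Returns, per block, whether it would be moved to the cold section.
// Landing pads are split out only when every landing pad in the function
// is cold; a single hot pad keeps all pads in the hot section.
std::vector<bool> selectColdBlocks(const std::vector<Block> &blocks) {
  bool anyHotPad = false;
  for (const Block &b : blocks)
    if (b.isEHPad && !b.isCold)
      anyHotPad = true;

  std::vector<bool> split(blocks.size(), false);
  for (std::size_t i = 0; i < blocks.size(); ++i) {
    const Block &b = blocks[i];
    if (!b.isCold)
      continue; // hot blocks always stay
    if (b.isEHPad && anyHotPad)
      continue; // cold pad retained because another pad is hot
    split[i] = true;
  }
  assert(split.size() == blocks.size());
  return split;
}
```

With one hot and one cold landing pad, neither pad is selected; with only cold pads, all of them are.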
Differential Revision: https://reviews.llvm.org/D96372
|
Revision tags: llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2 |
|
#
7af80299 |
| 08-Dec-2020 |
Pan, Tao <tao.pan@intel.com> |
[CodeGen] Add text section prefix for COFF object file
The text section prefix is created in CodeGenPrepare, which is independent of the object file format. The text section name is written into the object file in TargetLoweringObjectFile, which is format dependent. This change ports the code that appends the text section prefix to the text section name from ELF to COFF. Unlike ELF, which uses '.' as the concatenation character, COFF uses '$'. Because the concatenation character varies by format, it is split out of the text section prefix. The text section prefix is an existing ELF feature: it can help reduce icache and itlb misses, and it also makes it possible to aggregate sections with the same prefix created by other compilers, e.g. v8. Furthermore, the recent Machine Function Splitter feature (basic block level text prefix section) is based on the text section prefix.
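A minimal sketch of the naming scheme described above, assuming a prefix stored without its separator. The enum, the function names, and the example prefix `hot` are hypothetical and only illustrate the '.' versus '$' distinction; they are not the LLVM API.

```cpp
#include <cassert>
#include <string>

// Hypothetical object file format tag, for illustration only.
enum class ObjectFormat { ELF, COFF };

// The concatenation character depends on the object file format.
char prefixSeparator(ObjectFormat format) {
  return format == ObjectFormat::ELF ? '.' : '$';
}

// Joins a base section name and a separator-free prefix.
std::string textSectionName(const std::string &base, const std::string &prefix,
                            ObjectFormat format) {
  assert(!base.empty());
  if (prefix.empty())
    return base; // no prefix: the section name is unchanged
  return base + prefixSeparator(format) + prefix;
}
```

Under these assumptions, `textSectionName(".text", "hot", ObjectFormat::ELF)` yields `.text.hot`, while the COFF variant yields `.text$hot`.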
Reviewed By: pengfei, rnk
Differential Revision: https://reviews.llvm.org/D92073
|
Revision tags: llvmorg-11.0.1-rc1 |
|
#
24bf6ff4 |
| 09-Oct-2020 |
Snehasish Kumar <snehasishk@google.com> |
[llvm] Update default cutoff threshold for machine function splitter.
Based on internal testing at Google we found that setting the profile summary cutoff threshold to 999950 yields the best results in terms of itlb and icache metrics (as observed on Intel CPUs).
*default* = Split out code if no profile count available for block
*size-%* = The fraction of bytes split out of .text and .text.hot
*itlb* = Misses per kilo instructions (MPKI) for itlb
*icache* = Misses per kilo instructions (MPKI) for L1 icache
Search1
| cutoff  | size-%  | itlb      | icache  |
|---------|---------|-----------|---------|
| default | 42.5861 | 0.0822151 | 2.46363 |
| 999999  | 44.9350 | 0.0767194 | 2.44416 |
| 999950  | 50.0660 | 0.075744  | 2.4091  |
| 999500  | 56.9158 | 0.082564  | 2.4188  |
| 995000  | 63.8625 | 0.0814927 | 2.42832 |
| 990000  | 71.7314 | 0.106906  | 2.57785 |
Search2
| cutoff  | size-% | itlb     | icache  |
|---------|--------|----------|---------|
| default | 2.8845 | 0.626712 | 4.73245 |
| 999999  | 3.3291 | 0.602309 | 4.70045 |
| 999950  | 3.8577 | 0.587842 | 4.71632 |
| 999500  | 4.4170 | 0.63577  | 4.68351 |
| 995000  | 5.1020 | 0.657969 | 4.82272 |
| 990000  | 5.7153 | 0.719122 | 5.39496 |
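To make the cutoff values concrete, here is a rough sketch of how a parts-per-million cutoff could translate into a cold-block decision. It assumes the cutoff selects the smallest execution count such that the hottest blocks together cover that fraction of the total count, and that blocks below this count are treated as cold. `countThreshold` and `isCold` are hypothetical helpers, not the LLVM profile summary API, and the exact semantics in LLVM may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Returns the smallest count among the hottest blocks that together cover
// at least cutoffPerMillion / 1,000,000 of the total execution count.
std::uint64_t countThreshold(std::vector<std::uint64_t> counts,
                             std::uint64_t cutoffPerMillion) {
  assert(cutoffPerMillion <= 1000000);
  std::sort(counts.begin(), counts.end(), std::greater<std::uint64_t>());

  long double total = 0;
  for (std::uint64_t c : counts)
    total += c;

  long double running = 0;
  for (std::uint64_t c : counts) {
    running += c;
    // Compare running / total against cutoff / 1e6 without dividing.
    if (running * 1000000.0L >= total * cutoffPerMillion)
      return c;
  }
  return 0; // no counts available
}

// Assumption for this sketch: a block is cold if it runs less often
// than the threshold.
bool isCold(std::uint64_t count, std::uint64_t threshold) {
  return count < threshold;
}
```

For block counts {1000000, 900, 50, 40, 10}, a cutoff of 999950 gives a threshold of 50, so only the two rarest blocks are cold; a cutoff of 990000 gives a threshold of 1000000, so every block but the hottest is cold. This matches the trend in the tables, where lower cutoffs split out a larger fraction of bytes.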
Differential Revision: https://reviews.llvm.org/D89085
|
Revision tags: llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2 |
|
#
94faadac |
| 05-Aug-2020 |
Snehasish Kumar <snehasishk@google.com> |
[llvm][CodeGen] Machine Function Splitter
We introduce a codegen optimization pass which splits functions into hot and cold parts. This pass leverages the basic block sections feature recently introduced in LLVM from the Propeller project. The pass targets functions with profile coverage, identifies cold blocks and moves them to a separate section. The linker groups all cold blocks across functions together, decreasing fragmentation and improving icache and itlb utilization.
We evaluated the Machine Function Splitter pass on clang bootstrap and SPECInt 2017.
For clang bootstrap we observe a mean 2.33% runtime improvement with a ~32% reduction in itlb and stlb misses. Additionally, L1 icache misses reduced by 9.5% while L2 instruction misses reduced by 20%.
For SPECInt we report the change in IntRate for the C/C++ benchmarks. All benchmarks apart from mcf and x264 improve, on average by 0.6%, with the maximum for deepsjeng at 1.6%.
Benchmark          % Change
500.perlbench_r     0.78
502.gcc_r           0.82
505.mcf_r          -0.30
520.omnetpp_r       0.18
523.xalancbmk_r     0.37
525.x264_r         -0.46
531.deepsjeng_r     1.61
541.leela_r         0.83
557.xz_r            0.15
Differential Revision: https://reviews.llvm.org/D85368
|