Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5 |
|
#
68f7b075 |
| 23-Nov-2024 |
Rahman Lavaee <rahmanl@google.com> |
[BasicBlockSections] Allow mixing of -basic-block-sections with MFS. (#117076)
This PR allows mixing `-basic-block-sections` with
`-enable-machine-function-splitter`. The strategy is to let
`-basi
[BasicBlockSections] Allow mixing of -basic-block-sections with MFS. (#117076)
This PR allows mixing `-basic-block-sections` with
`-enable-machine-function-splitter`. The strategy is to let
`-basic-block-sections` take precedence over functions with profiles.
show more ...
|
Revision tags: llvmorg-19.1.4 |
|
#
735ab61a |
| 13-Nov-2024 |
Kazu Hirata <kazu@google.com> |
[CodeGen] Remove unused includes (NFC) (#115996)
Identified with misc-include-cleaner.
|
Revision tags: llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init |
|
#
09989996 |
| 12-Jul-2024 |
paperchalice <liujunchang97@outlook.com> |
[CodeGen][NewPM] Port `machine-block-freq` to new pass manager (#98317)
- Add `MachineBlockFrequencyAnalysis`.
- Add `MachineBlockFrequencyPrinterPass`.
- Use `MachineBlockFrequencyInfoWrapperPass
[CodeGen][NewPM] Port `machine-block-freq` to new pass manager (#98317)
- Add `MachineBlockFrequencyAnalysis`.
- Add `MachineBlockFrequencyPrinterPass`.
- Use `MachineBlockFrequencyInfoWrapperPass` in legacy pass manager.
- `LazyMachineBlockFrequencyInfo::print` is empty, drop it due to new
pass manager migration.
show more ...
|
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4 |
|
#
163eaf3b |
| 22-Feb-2024 |
Daniel Hoekwater <hoekwater@google.com> |
[CodeGen] Clean up MachineFunctionSplitter MBB safety checking (NFC)
Move the "is MBB safe to split" check out of `isColdBlock` and update the comment since we're no longer using a temporary hack.
|
Revision tags: llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2 |
|
#
ef1c25eb |
| 04-Aug-2023 |
Daniel Hoekwater <hoekwater@google.com> |
[CodeGen][AArch64] Don't split jump table basic blocks
Jump tables on AArch64 are label-relative rather than table-relative, so having jump table destinations that are in different sections causes p
[CodeGen][AArch64] Don't split jump table basic blocks
Jump tables on AArch64 are label-relative rather than table-relative, so having jump table destinations that are in different sections causes problems with relocation. Jump table lookups have a max range of 1MB, so all destinations must be in the same section as the lookup code. Both of these restrictions can be mitigated with some careful and complex logic, but doing so doesn't gain a huge performance benefit.
Efficiently ensuring jump tables are correct and can be compressed on AArch64 is a TODO item. In the meantime, don't split blocks that can cause problems.
Differential Revision: https://reviews.llvm.org/D157124
show more ...
|
#
3dbabead |
| 25-Aug-2023 |
Snehasish Kumar <snehasishk@google.com> |
[CodeGen] Remove unused option in MachineFunctionSplitter.
The option was added in github.com/llvm/llvm-project/commit/90ab85a but it doesn't seem to be used. The triple check has been removed so th
[CodeGen] Remove unused option in MachineFunctionSplitter.
The option was added in github.com/llvm/llvm-project/commit/90ab85a but it doesn't seem to be used. The triple check has been removed so this shouldn't be required going forward.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D158885
show more ...
|
#
8c249c44 |
| 04-Aug-2023 |
Daniel Hoekwater <hoekwater@google.com> |
[CodeGen][AArch64] Don't split functions with a red zone on AArch64
Because unconditional branch relaxation on AArch64 grows the stack to spill a register, splitting a function would cause the red z
[CodeGen][AArch64] Don't split functions with a red zone on AArch64
Because unconditional branch relaxation on AArch64 grows the stack to spill a register, splitting a function would cause the red zone to be overwritten. Explicitly disable MFS for such functions.
Differential Revision: https://reviews.llvm.org/D157127
show more ...
|
#
90ab85a1 |
| 09-Aug-2023 |
Daniel Hoekwater <hoekwater@google.com> |
Reland "[CodeGen][AArch64] Make MFS testable on AArch64"
Reverted by 3d22dac6c3b97d7bb92f243886dfb0d32a5c42e9 because it depended on b9d079d6188b50730e0a67267b7fee36008435ce, which broke some tests.
|
#
77596e6b |
| 21-Aug-2023 |
Fangrui Song <i@maskray.me> |
Revert D157750 "[Driver][CodeGen] Properly handle -fsplit-machine-functions for fatbinary compilation."
This reverts commit 317a0fe5bd7113c0ac9d30b2de58ca409e5ff754. This reverts commit 30c4b97aec60
Revert D157750 "[Driver][CodeGen] Properly handle -fsplit-machine-functions for fatbinary compilation."
This reverts commit 317a0fe5bd7113c0ac9d30b2de58ca409e5ff754. This reverts commit 30c4b97aec60895a6905816670f493cdd1d7c546.
See post-commit discussions on https://reviews.llvm.org/D157750 that we should use a different mechanism to handle the error with --cuda-gpu-arch=
The IR/DiagnosticInfo.cpp, warn_drv_for_elf_only, codegne tests in clang/test/Driver, and the following driver behavior (downgrading error to warning) changes are undesired. ``` % clang --target=riscv64 -fsplit-machine-functions -c a.c warning: -fsplit-machine-functions is not valid for riscv64 [-Wbackend-plugin] ```
show more ...
|
#
317a0fe5 |
| 17-Aug-2023 |
Han Shen <shenhan@google.com> |
[Driver][CodeGen] Properly handle -fsplit-machine-functions for fatbinary compilation.
When building a fatbinary, the driver invokes the compiler multiple times with different "--target". (For examp
[Driver][CodeGen] Properly handle -fsplit-machine-functions for fatbinary compilation.
When building a fatbinary, the driver invokes the compiler multiple times with different "--target". (For example, with "-x cuda --cuda-gpu-arch=sm_70" flags, clang will be invoded twice, once with --target=x86_64_...., once with --target=sm_70) If we use -fsplit-machine-functions or -fno-split-machine-functions for such invocation, the driver reports an error.
This CL changes the behavior so:
- "-fsplit-machine-functions" is now passed to all targets, for non-X86 targets, the flag is a NOOP and causes a warning.
- "-fno-split-machine-functions" now negates -fsplit-machine-functions (if -fno-split-machine-functions appears after any -fsplit-machine-functions) for any target triple, previously, it causes an error.
- "-fsplit-machine-functions -Xarch_device -fno-split-machine-functions" enables MFS on host but disables MFS for GPUS without warnings/errors.
- "-Xarch_host -fsplit-machine-functions" enables MFS on host but disables MFS for GPUS without warnings/errors.
Reviewed by: xur, dhoekwater
Differential Revision: https://reviews.llvm.org/D157750
show more ...
|
Revision tags: llvmorg-17.0.0-rc1, llvmorg-18-init |
|
#
f7f744a5 |
| 18-Jul-2023 |
Han Shen <shenhan@google.com> |
[CodeGen] Separate MachineFunctionSplitter logic for different profile types.
In D152577 @xur has a post-submit comment regarding to an awkward usage of MFS for Autofdo - instead of just using -fspl
[CodeGen] Separate MachineFunctionSplitter logic for different profile types.
In D152577 @xur has a post-submit comment regarding to an awkward usage of MFS for Autofdo - instead of just using -fsplit-machine-function, the user needs to add "-mllvm -mfs-psi-cutoff=0" to choose the right logic for AutoFDO. The compiler should choose the right default values for such case.
This CL separate MFS logic for different profile types.
Reviewed By: xur, wenlei
Differential Revision: https://reviews.llvm.org/D155253
show more ...
|
#
8df75969 |
| 10-Jul-2023 |
Han Shen <shenhan@google.com> |
[CodeGen] Fine tune MachineFunctionSplitPass (MFS) for FSAFDO.
The original MFS work D85368 shows good performance improvement with Instrumented FDO. However, AutoFDO or Flow-Sensitive AutoFDO (FSAF
[CodeGen] Fine tune MachineFunctionSplitPass (MFS) for FSAFDO.
The original MFS work D85368 shows good performance improvement with Instrumented FDO. However, AutoFDO or Flow-Sensitive AutoFDO (FSAFDO) does not show performance gain. This is mainly caused by a less accurate profile compared to the iFDO profile.
For the past few months, we have been working to improve FSAFDO quality, like in D145171. Taking advantage of this improvement, MFS now shows performance improvements over FSAFDO profiles.
That being said, 2 minor changes need to be made, 1) An FS-AutoFDO profile generation pass needs to be added right before MFS pass and an FSAFDO profile load pass is needed when FS-AutoFDO is enabled and the MFS flag is present. 2) MFS only applies to hot functions, because we believe (and experiment also shows) FS-AutoFDO is more accurate about functions that have plenty of samples than those with no or very few samples.
With this improvement, we see a 1.2% performance improvement in clang benchmark, 0.9% QPS improvement in our internal search benchmark, and 3%-5% improvement in internal storage benchmark.
This is #1 of the two patches that enables the improvement.
Reviewed By: wenlei, snehasish, xur
Differential Revision: https://reviews.llvm.org/D152399
show more ...
|
Revision tags: llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1 |
|
#
950487bd |
| 26-Jan-2023 |
Hongtao Yu <hoy@fb.com> |
[Pseudo Probe] Do not instrument EH blocks.
This change avoids inserting probes to EH blocks. Pseudo probe can prevent block merging when probes in the blocks look different. This has a chained effe
[Pseudo Probe] Do not instrument EH blocks.
This change avoids inserting probes to EH blocks. Pseudo probe can prevent block merging when probes in the blocks look different. This has a chained effect to passes incurring exponential IR growth (such as jump threading) and as a consequence the compilation may time out. Not inserting probes to EH blocks could mitigate the issue. Another benefit is that both IR size and binary size are smaller. Since EH blocks are usually cold, the change should have minimal impact to profile quality.
Testing:
Out of two internal large benchmarks, no perf impact seen. 1% size savings to both the `text` and the `pseudo_probe` section.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D142747
show more ...
|
Revision tags: llvmorg-17-init, llvmorg-15.0.7 |
|
#
51b68573 |
| 16-Dec-2022 |
Fangrui Song <i@maskray.me> |
[Transforms,CodeGen] std::optional::value => operator*/operator->
value() has undesired exception checking semantics and calls __throw_bad_optional_access in libc++. Moreover, the API is unavailable
[Transforms,CodeGen] std::optional::value => operator*/operator->
value() has undesired exception checking semantics and calls __throw_bad_optional_access in libc++. Moreover, the API is unavailable without _LIBCPP_NO_EXCEPTIONS on older Mach-O platforms (see _LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS).
show more ...
|
#
c589730a |
| 05-Dec-2022 |
Krzysztof Parzyszek <kparzysz@quicinc.com> |
[YAML] Convert Optional to std::optional
|
#
89fae41e |
| 05-Dec-2022 |
Fangrui Song <i@maskray.me> |
[IR] llvm::Optional => std::optional
Many llvm/IR/* files have been migrated by other contributors. This migrates most remaining files.
|
Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3 |
|
#
e170d955 |
| 17-Aug-2022 |
Archit Saxena <archsaxe@fb.com> |
Split EH code by default
The current machine function splitter is reliant on profile data to do profile summary analysis to split blocks into cold section. This may sometimes limit the usage of mach
Split EH code by default
The current machine function splitter is reliant on profile data to do profile summary analysis to split blocks into cold section. This may sometimes limit the usage of machine function splitter especially in cases where we could do some form of static analysis to split out cold blocks if profile data is absent or profile data which may be faulty (Consider Sample PGO).
Of all code that could statically be marked cold Exception handling blocks are one of them (In fact BFI framework also tends to mark them as cold), and the most in size contribution. In my experiments I found out Exception handling pads and all code reachable from there account for up to 6-8% of the .text section on modern production binaries. This patch introduces a flag to split out all Exception handling blocks and blocks only reachable from Exceptional Handling pad to cold section. This flag has shown to give a performance win of up to 0.1% in terms of average cycles and instructions executed on internal facebook search service.
Reviewed By: snehasish
Differential Revision: https://reviews.llvm.org/D131824
show more ...
|
Revision tags: llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
#
3bb1ce23 |
| 22-Jul-2022 |
ARCHIT SAXENA <archsaxe@fb.com> |
Add a nop instruction if a section starts with landing pad for function splitter
This change adds a nop instruction if section starts with landing pad. This change is like [D73739](https://reviews.l
Add a nop instruction if a section starts with landing pad for function splitter
This change adds a nop instruction if section starts with landing pad. This change is like [D73739](https://reviews.llvm.org/D73739) which avoids zero offset landing pad in basic block sections.
Detailed description: The current machine functions splitter can create ˜sections which start with a landing pad themselves. This places landing pad at offset zero from LPStart. ``` .section .text.split.foo10,"ax",@progbits foo10.cold: # %lpad .cfi_startproc .cfi_personality 3, __gxx_personality_v0 .cfi_lsda 3, .Lexception5 .cfi_def_cfa %rsp, 16 .Ltmp11: <--- This is a Landing pad and also LP Start as it is start of this section movq %rax, %rdi <--- first instruction is at offest 0 from LPStart callq _Unwind_Resume@PLT
``` This will cause landing pad entries to become zero (.Ltmp11-foo10.cold) ``` .Lcst_begin4: .uleb128 .Ltmp9-.Lfunc_begin2 # >> Call Site 1 << .uleb128 .Ltmp10-.Ltmp9 # Call between .Ltmp9 and .Ltmp10 .uleb128 .Ltmp11-foo10.cold <---This is zero # jumps to .Ltmp11 .byte 3 # On action: 2 .uleb128 .Ltmp10-.Lfunc_begin2 # >> Call Site 2 << .uleb128 .Lfunc_end9-.Ltmp10 # Call between .Ltmp10 and .Lfunc_end9 .byte 0 # has no landing pad .byte 0 # On action: cleanup .p2align 2 ``` The C++ ABI somehow assumes that no landing pads point directly to LPStart (which works in the normal case since the function begin is never a landing pad), and uses LP.offset = 0 to specify no landing pad. This change adds a nop instruction at start of such sections so that such a case could be avoided. Output: ``` .section .text.split.foo10,"ax",@progbits foo10.cold: # %lpad .cfi_startproc .cfi_personality 3, __gxx_personality_v0 .cfi_lsda 3, .Lexception5 .cfi_def_cfa %rsp, 16 nop <--- new instruction that is added .Ltmp11: movq %rax, %rdi callq _Unwind_Resume@PLT ```
Reviewed By: modimo, snehasish, rahmanl
Differential Revision: https://reviews.llvm.org/D130133
show more ...
|
#
611ffcf4 |
| 14-Jul-2022 |
Kazu Hirata <kazu@google.com> |
[llvm] Use value instead of getValue (NFC)
|
#
a7938c74 |
| 26-Jun-2022 |
Kazu Hirata <kazu@google.com> |
[llvm] Don't use Optional::hasValue (NFC)
This patch replaces Optional::hasValue with the implicit cast to bool in conditionals only.
|
#
3b7c3a65 |
| 25-Jun-2022 |
Kazu Hirata <kazu@google.com> |
Revert "Don't use Optional::hasValue (NFC)"
This reverts commit aa8feeefd3ac6c78ee8f67bf033976fc7d68bc6d.
|
#
aa8feeef |
| 25-Jun-2022 |
Kazu Hirata <kazu@google.com> |
Don't use Optional::hasValue (NFC)
|
Revision tags: llvmorg-14.0.6 |
|
#
e0e687a6 |
| 20-Jun-2022 |
Kazu Hirata <kazu@google.com> |
[llvm] Don't use Optional::hasValue (NFC)
|
Revision tags: llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1 |
|
#
989f1c72 |
| 15-Mar-2022 |
serge-sans-paille <sguelton@redhat.com> |
Cleanup codegen includes
This is a (fixed) recommit of https://reviews.llvm.org/D121169
after: 1061034926 before: 1063332844
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-in
Cleanup codegen includes
This is a (fixed) recommit of https://reviews.llvm.org/D121169
after: 1061034926 before: 1063332844
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D121681
show more ...
|
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3 |
|
#
a278250b |
| 10-Mar-2022 |
Nico Weber <thakis@chromium.org> |
Revert "Cleanup codegen includes"
This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20. Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang, and many LLVM tests, see comments on https:/
Revert "Cleanup codegen includes"
This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20. Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang, and many LLVM tests, see comments on https://reviews.llvm.org/D121169
show more ...
|