Revision tags: llvmorg-17.0.3 |
|
#
2e26d091 |
| 06-Oct-2023 |
Matthias Braun <matze@braunis.de> |
BlockFrequencyInfo: Add PrintBlockFreq helper (#67512)
- Refactor the (Machine)BlockFrequencyInfo::printBlockFreq functions
into a `PrintBlockFreq()` function returning a `Printable` object. This
BlockFrequencyInfo: Add PrintBlockFreq helper (#67512)
- Refactor the (Machine)BlockFrequencyInfo::printBlockFreq functions
into a `PrintBlockFreq()` function returning a `Printable` object. This
simplifies usage as it can be directly piped to a `raw_ostream` like
`dbgs() << PrintBlockFreq(MBFI, Freq) << '\n';`.
- Previously there was an interesting behavior where
`BlockFrequencyInfoImpl` stores frequencies both as a `Scaled64` number
and as an `uint64_t`. Most algorithms use the `BlockFrequency`
abstraction with the integers, the print function for basic blocks
printed the `Scaled64` number potentially showing higher accuracy than
was used by the algorithm. This changes things to only print
`BlockFrequency` values.
- Replace some instances of `dbgs() << Freq.getFrequency()` with the new
function.
show more ...
|
#
5181156b |
| 05-Oct-2023 |
Matthias Braun <matze@braunis.de> |
Use BlockFrequency type in more places (NFC) (#68266)
The `BlockFrequency` class abstracts `uint64_t` frequency values. Use it
more consistently in various APIs and disable implicit conversion to
Use BlockFrequency type in more places (NFC) (#68266)
The `BlockFrequency` class abstracts `uint64_t` frequency values. Use it
more consistently in various APIs and disable implicit conversion to
make usage more consistent and explicit.
- Use `BlockFrequency Freq` parameter for `setBlockFreq`,
`getProfileCountFromFreq` and `setBlockFreqAndScale` functions.
- Return `BlockFrequency` in `getEntryFreq()` functions.
- While on it change some `const BlockFrequency& Freq` parameters to
plain `BlockFreqency Freq`.
- Mark `BlockFrequency(uint64_t)` constructor as explicit.
- Add missing `BlockFrequency::operator!=`.
- Remove `uint64_t BlockFreqency::getMaxFrequency()`.
- Add `BlockFrequency BlockFrequency::max()` function.
show more ...
|
Revision tags: llvmorg-17.0.2 |
|
#
6b8d04c2 |
| 21-Sep-2023 |
Fangrui Song <i@maskray.me> |
[CodeLayout] Refactor std::vector uses, namespace, and EdgeCountT. NFC
* Place types and functions in the llvm::codelayout namespace * Change EdgeCountT from pair<pair<uint64_t, uint64_t>, uint64_t>
[CodeLayout] Refactor std::vector uses, namespace, and EdgeCountT. NFC
* Place types and functions in the llvm::codelayout namespace * Change EdgeCountT from pair<pair<uint64_t, uint64_t>, uint64_t> to a struct and utilize structured bindings. It is not conventional to use the "T" suffix for structure types. * Remove a redundant copy in ChainT::merge. * Change {ExtTSPImpl,CDSortImpl}::run to use return value instead of an output parameter * Rename applyCDSLayout to computeCacheDirectedLayout: (a) avoid rare abbreviation "CDS" (cache-directed sort) (b) "compute" is more conventional for the specific use case * Change the parameter types from std::vector to ArrayRef so that SmallVector arguments can be used. * Similarly, rename applyExtTspLayout to computeExtTspLayout.
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D159526
show more ...
|
Revision tags: llvmorg-17.0.1, llvmorg-17.0.0 |
|
#
0a1aa6cd |
| 14-Sep-2023 |
Arthur Eubanks <aeubanks@google.com> |
[NFC][CodeGen] Change CodeGenOpt::Level/CodeGenFileType into enum classes (#66295)
This will make it easy for callers to see issues with and fix up calls
to createTargetMachine after a future chang
[NFC][CodeGen] Change CodeGenOpt::Level/CodeGenFileType into enum classes (#66295)
This will make it easy for callers to see issues with and fix up calls
to createTargetMachine after a future change to the params of
TargetMachine.
This matches other nearby enums.
For downstream users, this should be a fairly straightforward
replacement,
e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive
or s/CGFT_/CodeGenFileType::
show more ...
|
Revision tags: llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init |
|
#
1bcb6a3d |
| 21-Jun-2023 |
Guozhi Wei <carrot@google.com> |
[MBP] Enable duplicating return block to remove jump to return
Sometimes LLVM generates branch to return instruction, like PR63227.
It is because in function MachineBlockPlacement::canTailDuplicate
[MBP] Enable duplicating return block to remove jump to return
Sometimes LLVM generates branch to return instruction, like PR63227.
It is because in function MachineBlockPlacement::canTailDuplicateUnplacedPreds we avoid duplicating a BB into another already placed BB to prevent destroying computed layout. But if the successor BB is a return block, duplicating it will only reduce taken branches without hurt to any other branches.
Differential Revision: https://reviews.llvm.org/D153093
show more ...
|
Revision tags: llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3 |
|
#
43b38696 |
| 20-Apr-2023 |
Akshay Khadse <akshayskhadse@gmail.com> |
Fix uninitialized class members
Reviewed By: LuoYuanke
Differential Revision: https://reviews.llvm.org/D148692
|
Revision tags: llvmorg-16.0.2 |
|
#
8bf7f86d |
| 17-Apr-2023 |
Akshay Khadse <akshayskhadse@gmail.com> |
Fix uninitialized pointer members in CodeGen
This change initializes the members TSI, LI, DT, PSI, and ORE pointer feilds of the SelectOptimize class to nullptr.
Reviewed By: LuoYuanke
Differentia
Fix uninitialized pointer members in CodeGen
This change initializes the members TSI, LI, DT, PSI, and ORE pointer feilds of the SelectOptimize class to nullptr.
Reviewed By: LuoYuanke
Differential Revision: https://reviews.llvm.org/D148303
show more ...
|
Revision tags: llvmorg-16.0.1, llvmorg-16.0.0 |
|
#
a585fa26 |
| 14-Mar-2023 |
Kazu Hirata <kazu@google.com> |
[CodeGen] Use *{Set,Map}::contains (NFC)
|
Revision tags: llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3 |
|
#
1e692113 |
| 14-Feb-2023 |
Fangrui Song <i@maskray.me> |
Move global namespace cl::opt inside llvm::
|
Revision tags: llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5 |
|
#
36e8e193 |
| 04-Nov-2022 |
Mingming Liu <mingmingl@google.com> |
[NFC][BlockPlacement]Add an option to renumber blocks based on function layout order.
Use case: - When block layout is visualized after MBP pass, the basic blocks are labeled in layout order; meanwh
[NFC][BlockPlacement]Add an option to renumber blocks based on function layout order.
Use case: - When block layout is visualized after MBP pass, the basic blocks are labeled in layout order; meanwhile blocks could be numbered in a different order. - As a result, it's hard to map between the graph and pass output. With this option on, the basic blocks are renumbered in function layout order.
This option is only useful when a function is to be visualized (i.e., when view options are on) to make it debugging only.
Use https://godbolt.org/z/5WTW36bMr as an example: - As MBP pass output (shown in godbolt output window), `func2` is in a basic block numbered `2` (`bb.2`), and `func1` is in a basic block numbered `3` (`bb.3`); `bb.3` is a block with higher block frequency than `bb.2`, and `bb.3` is placed before `bb.2` in the functin layout. - Use [1] to get the dot graph (graph uploaded in [2]), the blocks are re-numbered. - `func1` is in 'if.end' block, and labeled `1` in visualized dot; `func2` is in 'if.then' blocks, and labeled `3` --> the labeled number and bb number won't map. - [[ https://github.com/llvm/llvm-project/blob/b5626ae9751f0d82aa04791a21689b289721738e/llvm/lib/CodeGen/MachineBlockFrequencyInfo.cpp#L127 | DOTGraphTraits<MachineBlockFrequencyInfo *>::getNodeLabel ]] is where labeled numbers are based on function layout number, and [[ https://github.com/llvm/llvm-project/blob/a8d93783f37c042ace67069ae4ca6f8fd849c2d0/llvm/include/llvm/Support/GraphWriter.h#L209 | called by graph writer ]]. So call 'MachineFunction::RenumberBlocks' would make labeled number (in dot graph) and block number (in pass output) consistent with each other.
[1] `./bin/clang++ -O3 -S -mllvm -view-block-layout-with-bfi=count -mllvm -view-bfi-func-name=_Z9func_loopv -mllvm -print-after=block-placement -mllvm -filter-print-funcs=_Z9func_loopv test.c`
[2] {F25201785}
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D137467
show more ...
|
Revision tags: llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
#
8d5b694d |
| 15-Jul-2022 |
spupyrev <spupyrev@fb.com> |
extending code layout alg
The diff modifies ext-tsp code layout algorithm in the following ways: (i) fixes merging of cold block chains (this is a port of D129397); (ii) adjusts the cost model utili
extending code layout alg
The diff modifies ext-tsp code layout algorithm in the following ways: (i) fixes merging of cold block chains (this is a port of D129397); (ii) adjusts the cost model utilized for optimization; (iii) adjusts some APIs so that the implementation can be used in BOLT; this is a prerequisite for D129895.
The only non-trivial change is (ii). Here we introduce different weights for conditional and unconditional branches in the cost model. Based on the new model it is slightly more important to increase the number of "fall-through unconditional" jumps, which makes sense, as placing two blocks with an unconditional jump next to each other reduces the number of jump instructions in the generated code. Experimentally, this makes a mild impact on the performance; I've seen up to 0.2%-0.3% perf win on some benchmarks.
Reviewed By: hoy
Differential Revision: https://reviews.llvm.org/D129893
show more ...
|
#
9e6d1f4b |
| 17-Jul-2022 |
Kazu Hirata <kazu@google.com> |
[CodeGen] Qualify auto variables in for loops (NFC)
|
Revision tags: llvmorg-14.0.6 |
|
#
1e67385d |
| 17-Jun-2022 |
Mingming Liu <mingmingl@google.com> |
[MachineBlockPlacementStats] Added check for "-filter-print-funcs" option to the machine-block-placement-stats.
Differential Revision: https://reviews.llvm.org/D128019
|
#
b7d09557 |
| 17-Jun-2022 |
Mingming Liu <mingmingl@google.com> |
Revert "[MachineBlockPlacementStats] Add check for `-filter-print-funcs` option to machine-block-placement stats."
This reverts commit 46d45df4516e9a5bc43460429cd02cd04a85db1a. Going to add differen
Revert "[MachineBlockPlacementStats] Add check for `-filter-print-funcs` option to machine-block-placement stats."
This reverts commit 46d45df4516e9a5bc43460429cd02cd04a85db1a. Going to add differential revision link to commit message and re-commit.
show more ...
|
#
46d45df4 |
| 17-Jun-2022 |
Mingming Liu <mingmingl@google.com> |
[MachineBlockPlacementStats] Add check for `-filter-print-funcs` option to machine-block-placement stats.
|
Revision tags: llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1 |
|
#
989f1c72 |
| 15-Mar-2022 |
serge-sans-paille <sguelton@redhat.com> |
Cleanup codegen includes
This is a (fixed) recommit of https://reviews.llvm.org/D121169
after: 1061034926 before: 1063332844
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-in
Cleanup codegen includes
This is a (fixed) recommit of https://reviews.llvm.org/D121169
after: 1061034926 before: 1063332844
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D121681
show more ...
|
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3 |
|
#
a278250b |
| 10-Mar-2022 |
Nico Weber <thakis@chromium.org> |
Revert "Cleanup codegen includes"
This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20. Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang, and many LLVM tests, see comments on https:/
Revert "Cleanup codegen includes"
This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20. Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang, and many LLVM tests, see comments on https://reviews.llvm.org/D121169
show more ...
|
#
7f230fee |
| 07-Mar-2022 |
serge-sans-paille <sguelton@redhat.com> |
Cleanup codegen includes
after: 1061034926 before: 1063332844
Differential Revision: https://reviews.llvm.org/D121169
|
Revision tags: llvmorg-14.0.0-rc2 |
|
#
bcdc0477 |
| 01-Mar-2022 |
spupyrev <spupyrev@fb.com> |
speeding up ext-tsp for huge instances
Differential Revision: https://reviews.llvm.org/D120780
|
Revision tags: llvmorg-14.0.0-rc1 |
|
#
dee058c6 |
| 05-Feb-2022 |
Hongtao Yu <hoy@fb.com> |
[CSSPGO] Turn on ext-tsp by default for CSSPGO.
I'm seeing ext-tsp helps CSSPGO for our intern large benchmarks so I'm turning on it for CSSPGO. For non-CS AutoFDO, ext-tsp doesn't seem to help, pro
[CSSPGO] Turn on ext-tsp by default for CSSPGO.
I'm seeing ext-tsp helps CSSPGO for our intern large benchmarks so I'm turning on it for CSSPGO. For non-CS AutoFDO, ext-tsp doesn't seem to help, probably because of lower profile counts quality.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D119048
show more ...
|
Revision tags: llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2 |
|
#
73d92faa |
| 01-Dec-2021 |
Nicholas Guy <nicholas.guy@arm.com> |
[CodeGen] Emit alignment "Max Skip" operand
The current AsmPrinter has support to emit the "Max Skip" operand (the 3rd of .p2align), however has no support for it to actually be specified. Adding Ma
[CodeGen] Emit alignment "Max Skip" operand
The current AsmPrinter has support to emit the "Max Skip" operand (the 3rd of .p2align), however has no support for it to actually be specified. Adding MaxBytesForAlignment to MachineBasicBlock provides this capability on a per-block basis. Leaving the value as default (0) causes no observable differences in behaviour.
Differential Revision: https://reviews.llvm.org/D114590
show more ...
|
Revision tags: llvmorg-13.0.1-rc1 |
|
#
f573f686 |
| 08-Nov-2021 |
spupyrev <spupyrev@fb.com> |
ext-tsp basic block layout
A new basic block ordering improving existing MachineBlockPlacement.
The algorithm tries to find a layout of nodes (basic blocks) of a given CFG optimizing jump locality
ext-tsp basic block layout
A new basic block ordering improving existing MachineBlockPlacement.
The algorithm tries to find a layout of nodes (basic blocks) of a given CFG optimizing jump locality and thus processor I-cache utilization. This is achieved via increasing the number of fall-through jumps and co-locating frequently executed nodes together. The name follows the underlying optimization problem, Extended-TSP, which is a generalization of classical (maximum) Traveling Salesmen Problem.
The algorithm is a greedy heuristic that works with chains (ordered lists) of basic blocks. Initially all chains are isolated basic blocks. On every iteration, we pick a pair of chains whose merging yields the biggest increase in the ExtTSP value, which models how i-cache "friendly" a specific chain is. A pair of chains giving the maximum gain is merged into a new chain. The procedure stops when there is only one chain left, or when merging does not increase ExtTSP. In the latter case, the remaining chains are sorted by density in decreasing order.
An important aspect is the way two chains are merged. Unlike earlier algorithms (e.g., based on the approach of Pettis-Hansen), two chains, X and Y, are first split into three, X1, X2, and Y. Then we consider all possible ways of gluing the three chains (e.g., X1YX2, X1X2Y, X2X1Y, X2YX1, YX1X2, YX2X1) and choose the one producing the largest score. This improves the quality of the final result (the search space is larger) while keeping the implementation sufficiently fast.
Differential Revision: https://reviews.llvm.org/D113424
show more ...
|
#
3678326d |
| 07-Dec-2021 |
Nico Weber <thakis@chromium.org> |
Revert "ext-tsp basic block layout"
This reverts commit c68f71eb37c2b6ffcf29e865d443a910e73083bd.
Breaks tests on arm hosts, see comments on https://reviews.llvm.org/D113424
|
#
c68f71eb |
| 08-Nov-2021 |
spupyrev <spupyrev@fb.com> |
ext-tsp basic block layout
A new basic block ordering improving existing MachineBlockPlacement.
The algorithm tries to find a layout of nodes (basic blocks) of a given CFG optimizing jump locality
ext-tsp basic block layout
A new basic block ordering improving existing MachineBlockPlacement.
The algorithm tries to find a layout of nodes (basic blocks) of a given CFG optimizing jump locality and thus processor I-cache utilization. This is achieved via increasing the number of fall-through jumps and co-locating frequently executed nodes together. The name follows the underlying optimization problem, Extended-TSP, which is a generalization of classical (maximum) Traveling Salesmen Problem.
The algorithm is a greedy heuristic that works with chains (ordered lists) of basic blocks. Initially all chains are isolated basic blocks. On every iteration, we pick a pair of chains whose merging yields the biggest increase in the ExtTSP value, which models how i-cache "friendly" a specific chain is. A pair of chains giving the maximum gain is merged into a new chain. The procedure stops when there is only one chain left, or when merging does not increase ExtTSP. In the latter case, the remaining chains are sorted by density in decreasing order.
An important aspect is the way two chains are merged. Unlike earlier algorithms (e.g., based on the approach of Pettis-Hansen), two chains, X and Y, are first split into three, X1, X2, and Y. Then we consider all possible ways of gluing the three chains (e.g., X1YX2, X1X2Y, X2X1Y, X2YX1, YX1X2, YX2X1) and choose the one producing the largest score. This improves the quality of the final result (the search space is larger) while keeping the implementation sufficiently fast.
Differential Revision: https://reviews.llvm.org/D113424
show more ...
|
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3 |
|
#
c9fca53a |
| 10-Sep-2021 |
Kazu Hirata <kazu@google.com> |
[CodeGen, Target] Use pred_empty and succ_empty (NFC)
|