History log of /llvm-project/llvm/lib/CodeGen/MachineBlockPlacement.cpp (Results 26 – 50 of 331)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-17.0.3
# 2e26d091 06-Oct-2023 Matthias Braun <matze@braunis.de>

BlockFrequencyInfo: Add PrintBlockFreq helper (#67512)

- Refactor the (Machine)BlockFrequencyInfo::printBlockFreq functions
into a `PrintBlockFreq()` function returning a `Printable` object. This

BlockFrequencyInfo: Add PrintBlockFreq helper (#67512)

- Refactor the (Machine)BlockFrequencyInfo::printBlockFreq functions
into a `PrintBlockFreq()` function returning a `Printable` object. This
simplifies usage as it can be directly piped to a `raw_ostream` like
`dbgs() << PrintBlockFreq(MBFI, Freq) << '\n';`.
- Previously there was an interesting behavior where
`BlockFrequencyInfoImpl` stores frequencies both as a `Scaled64` number
and as an `uint64_t`. Most algorithms use the `BlockFrequency`
abstraction with the integers, the print function for basic blocks
printed the `Scaled64` number potentially showing higher accuracy than
was used by the algorithm. This changes things to only print
`BlockFrequency` values.
- Replace some instances of `dbgs() << Freq.getFrequency()` with the new
function.

show more ...


# 5181156b 05-Oct-2023 Matthias Braun <matze@braunis.de>

Use BlockFrequency type in more places (NFC) (#68266)

The `BlockFrequency` class abstracts `uint64_t` frequency values. Use it
more consistently in various APIs and disable implicit conversion to

Use BlockFrequency type in more places (NFC) (#68266)

The `BlockFrequency` class abstracts `uint64_t` frequency values. Use it
more consistently in various APIs and disable implicit conversion to
make usage more consistent and explicit.

- Use `BlockFrequency Freq` parameter for `setBlockFreq`,
`getProfileCountFromFreq` and `setBlockFreqAndScale` functions.
- Return `BlockFrequency` in `getEntryFreq()` functions.
- While on it change some `const BlockFrequency& Freq` parameters to
plain `BlockFreqency Freq`.
- Mark `BlockFrequency(uint64_t)` constructor as explicit.
- Add missing `BlockFrequency::operator!=`.
- Remove `uint64_t BlockFreqency::getMaxFrequency()`.
- Add `BlockFrequency BlockFrequency::max()` function.

show more ...


Revision tags: llvmorg-17.0.2
# 6b8d04c2 21-Sep-2023 Fangrui Song <i@maskray.me>

[CodeLayout] Refactor std::vector uses, namespace, and EdgeCountT. NFC

* Place types and functions in the llvm::codelayout namespace
* Change EdgeCountT from pair<pair<uint64_t, uint64_t>, uint64_t>

[CodeLayout] Refactor std::vector uses, namespace, and EdgeCountT. NFC

* Place types and functions in the llvm::codelayout namespace
* Change EdgeCountT from pair<pair<uint64_t, uint64_t>, uint64_t> to a struct and utilize structured bindings.
It is not conventional to use the "T" suffix for structure types.
* Remove a redundant copy in ChainT::merge.
* Change {ExtTSPImpl,CDSortImpl}::run to use return value instead of an output parameter
* Rename applyCDSLayout to computeCacheDirectedLayout: (a) avoid rare
abbreviation "CDS" (cache-directed sort) (b) "compute" is more conventional
for the specific use case
* Change the parameter types from std::vector to ArrayRef so that
SmallVector arguments can be used.
* Similarly, rename applyExtTspLayout to computeExtTspLayout.

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D159526

show more ...


Revision tags: llvmorg-17.0.1, llvmorg-17.0.0
# 0a1aa6cd 14-Sep-2023 Arthur Eubanks <aeubanks@google.com>

[NFC][CodeGen] Change CodeGenOpt::Level/CodeGenFileType into enum classes (#66295)

This will make it easy for callers to see issues with and fix up calls
to createTargetMachine after a future chang

[NFC][CodeGen] Change CodeGenOpt::Level/CodeGenFileType into enum classes (#66295)

This will make it easy for callers to see issues with and fix up calls
to createTargetMachine after a future change to the params of
TargetMachine.

This matches other nearby enums.

For downstream users, this should be a fairly straightforward
replacement,
e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive
or s/CGFT_/CodeGenFileType::

show more ...


Revision tags: llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init
# 1bcb6a3d 21-Jun-2023 Guozhi Wei <carrot@google.com>

[MBP] Enable duplicating return block to remove jump to return

Sometimes LLVM generates branch to return instruction, like PR63227.

It is because in function MachineBlockPlacement::canTailDuplicate

[MBP] Enable duplicating return block to remove jump to return

Sometimes LLVM generates branch to return instruction, like PR63227.

It is because in function MachineBlockPlacement::canTailDuplicateUnplacedPreds
we avoid duplicating a BB into another already placed BB to prevent destroying
computed layout. But if the successor BB is a return block, duplicating it will
only reduce taken branches without hurt to any other branches.

Differential Revision: https://reviews.llvm.org/D153093

show more ...


Revision tags: llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3
# 43b38696 20-Apr-2023 Akshay Khadse <akshayskhadse@gmail.com>

Fix uninitialized class members

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D148692


Revision tags: llvmorg-16.0.2
# 8bf7f86d 17-Apr-2023 Akshay Khadse <akshayskhadse@gmail.com>

Fix uninitialized pointer members in CodeGen

This change initializes the members TSI, LI, DT, PSI, and ORE pointer feilds of the SelectOptimize class to nullptr.

Reviewed By: LuoYuanke

Differentia

Fix uninitialized pointer members in CodeGen

This change initializes the members TSI, LI, DT, PSI, and ORE pointer feilds of the SelectOptimize class to nullptr.

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D148303

show more ...


Revision tags: llvmorg-16.0.1, llvmorg-16.0.0
# a585fa26 14-Mar-2023 Kazu Hirata <kazu@google.com>

[CodeGen] Use *{Set,Map}::contains (NFC)


Revision tags: llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3
# 1e692113 14-Feb-2023 Fangrui Song <i@maskray.me>

Move global namespace cl::opt inside llvm::


Revision tags: llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5
# 36e8e193 04-Nov-2022 Mingming Liu <mingmingl@google.com>

[NFC][BlockPlacement]Add an option to renumber blocks based on function layout order.

Use case:
- When block layout is visualized after MBP pass, the basic blocks are labeled in layout order; meanwh

[NFC][BlockPlacement]Add an option to renumber blocks based on function layout order.

Use case:
- When block layout is visualized after MBP pass, the basic blocks are labeled in layout order; meanwhile blocks could be numbered in a different order.
- As a result, it's hard to map between the graph and pass output. With this option on, the basic blocks are renumbered in function layout order.

This option is only useful when a function is to be visualized (i.e., when view options are on) to make it debugging only.

Use https://godbolt.org/z/5WTW36bMr as an example:
- As MBP pass output (shown in godbolt output window), `func2` is in a basic block numbered `2` (`bb.2`), and `func1` is in a basic block numbered `3` (`bb.3`);
`bb.3` is a block with higher block frequency than `bb.2`, and `bb.3` is placed before `bb.2` in the functin layout.
- Use [1] to get the dot graph (graph uploaded in [2]), the blocks are re-numbered.
- `func1` is in 'if.end' block, and labeled `1` in visualized dot; `func2` is in 'if.then' blocks, and labeled `3` --> the labeled number and bb number won't map.
- [[ https://github.com/llvm/llvm-project/blob/b5626ae9751f0d82aa04791a21689b289721738e/llvm/lib/CodeGen/MachineBlockFrequencyInfo.cpp#L127 | DOTGraphTraits<MachineBlockFrequencyInfo *>::getNodeLabel ]] is where labeled numbers are based on function layout number, and [[ https://github.com/llvm/llvm-project/blob/a8d93783f37c042ace67069ae4ca6f8fd849c2d0/llvm/include/llvm/Support/GraphWriter.h#L209
| called by graph writer ]].
So call 'MachineFunction::RenumberBlocks' would make labeled number (in dot graph) and block number (in pass output) consistent with each other.

[1] `./bin/clang++ -O3 -S -mllvm -view-block-layout-with-bfi=count -mllvm -view-bfi-func-name=_Z9func_loopv -mllvm -print-after=block-placement -mllvm -filter-print-funcs=_Z9func_loopv test.c`

[2] {F25201785}

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D137467

show more ...


Revision tags: llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init
# 8d5b694d 15-Jul-2022 spupyrev <spupyrev@fb.com>

extending code layout alg

The diff modifies ext-tsp code layout algorithm in the following ways:
(i) fixes merging of cold block chains (this is a port of D129397);
(ii) adjusts the cost model utili

extending code layout alg

The diff modifies ext-tsp code layout algorithm in the following ways:
(i) fixes merging of cold block chains (this is a port of D129397);
(ii) adjusts the cost model utilized for optimization;
(iii) adjusts some APIs so that the implementation can be used in BOLT; this is
a prerequisite for D129895.

The only non-trivial change is (ii). Here we introduce different weights for
conditional and unconditional branches in the cost model. Based on the new model
it is slightly more important to increase the number of "fall-through
unconditional" jumps, which makes sense, as placing two blocks with an
unconditional jump next to each other reduces the number of jump instructions in
the generated code. Experimentally, this makes a mild impact on the performance;
I've seen up to 0.2%-0.3% perf win on some benchmarks.

Reviewed By: hoy

Differential Revision: https://reviews.llvm.org/D129893

show more ...


# 9e6d1f4b 17-Jul-2022 Kazu Hirata <kazu@google.com>

[CodeGen] Qualify auto variables in for loops (NFC)


Revision tags: llvmorg-14.0.6
# 1e67385d 17-Jun-2022 Mingming Liu <mingmingl@google.com>

[MachineBlockPlacementStats] Added check for "-filter-print-funcs"
option to the machine-block-placement-stats.

Differential Revision: https://reviews.llvm.org/D128019


# b7d09557 17-Jun-2022 Mingming Liu <mingmingl@google.com>

Revert "[MachineBlockPlacementStats] Add check for `-filter-print-funcs` option to machine-block-placement stats."

This reverts commit 46d45df4516e9a5bc43460429cd02cd04a85db1a. Going to
add differen

Revert "[MachineBlockPlacementStats] Add check for `-filter-print-funcs` option to machine-block-placement stats."

This reverts commit 46d45df4516e9a5bc43460429cd02cd04a85db1a. Going to
add differential revision link to commit message and re-commit.

show more ...


# 46d45df4 17-Jun-2022 Mingming Liu <mingmingl@google.com>

[MachineBlockPlacementStats] Add check for `-filter-print-funcs` option to machine-block-placement stats.


Revision tags: llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1
# 989f1c72 15-Mar-2022 serge-sans-paille <sguelton@redhat.com>

Cleanup codegen includes

This is a (fixed) recommit of https://reviews.llvm.org/D121169

after: 1061034926
before: 1063332844

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-in

Cleanup codegen includes

This is a (fixed) recommit of https://reviews.llvm.org/D121169

after: 1061034926
before: 1063332844

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D121681

show more ...


Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3
# a278250b 10-Mar-2022 Nico Weber <thakis@chromium.org>

Revert "Cleanup codegen includes"

This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20.
Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang,
and many LLVM tests, see comments on https:/

Revert "Cleanup codegen includes"

This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20.
Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang,
and many LLVM tests, see comments on https://reviews.llvm.org/D121169

show more ...


# 7f230fee 07-Mar-2022 serge-sans-paille <sguelton@redhat.com>

Cleanup codegen includes

after: 1061034926
before: 1063332844

Differential Revision: https://reviews.llvm.org/D121169


Revision tags: llvmorg-14.0.0-rc2
# bcdc0477 01-Mar-2022 spupyrev <spupyrev@fb.com>

speeding up ext-tsp for huge instances

Differential Revision: https://reviews.llvm.org/D120780


Revision tags: llvmorg-14.0.0-rc1
# dee058c6 05-Feb-2022 Hongtao Yu <hoy@fb.com>

[CSSPGO] Turn on ext-tsp by default for CSSPGO.

I'm seeing ext-tsp helps CSSPGO for our intern large benchmarks so I'm turning on it for CSSPGO. For non-CS AutoFDO, ext-tsp doesn't seem to help, pro

[CSSPGO] Turn on ext-tsp by default for CSSPGO.

I'm seeing ext-tsp helps CSSPGO for our intern large benchmarks so I'm turning on it for CSSPGO. For non-CS AutoFDO, ext-tsp doesn't seem to help, probably because of lower profile counts quality.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D119048

show more ...


Revision tags: llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2
# 73d92faa 01-Dec-2021 Nicholas Guy <nicholas.guy@arm.com>

[CodeGen] Emit alignment "Max Skip" operand

The current AsmPrinter has support to emit the "Max Skip" operand
(the 3rd of .p2align), however has no support for it to actually be specified.
Adding Ma

[CodeGen] Emit alignment "Max Skip" operand

The current AsmPrinter has support to emit the "Max Skip" operand
(the 3rd of .p2align), however has no support for it to actually be specified.
Adding MaxBytesForAlignment to MachineBasicBlock provides this capability on a
per-block basis. Leaving the value as default (0) causes no observable differences
in behaviour.

Differential Revision: https://reviews.llvm.org/D114590

show more ...


Revision tags: llvmorg-13.0.1-rc1
# f573f686 08-Nov-2021 spupyrev <spupyrev@fb.com>

ext-tsp basic block layout

A new basic block ordering improving existing MachineBlockPlacement.

The algorithm tries to find a layout of nodes (basic blocks) of a given CFG
optimizing jump locality

ext-tsp basic block layout

A new basic block ordering improving existing MachineBlockPlacement.

The algorithm tries to find a layout of nodes (basic blocks) of a given CFG
optimizing jump locality and thus processor I-cache utilization. This is
achieved via increasing the number of fall-through jumps and co-locating
frequently executed nodes together. The name follows the underlying
optimization problem, Extended-TSP, which is a generalization of classical
(maximum) Traveling Salesmen Problem.

The algorithm is a greedy heuristic that works with chains (ordered lists)
of basic blocks. Initially all chains are isolated basic blocks. On every
iteration, we pick a pair of chains whose merging yields the biggest increase
in the ExtTSP value, which models how i-cache "friendly" a specific chain is.
A pair of chains giving the maximum gain is merged into a new chain. The
procedure stops when there is only one chain left, or when merging does not
increase ExtTSP. In the latter case, the remaining chains are sorted by
density in decreasing order.

An important aspect is the way two chains are merged. Unlike earlier
algorithms (e.g., based on the approach of Pettis-Hansen), two
chains, X and Y, are first split into three, X1, X2, and Y. Then we
consider all possible ways of gluing the three chains (e.g., X1YX2, X1X2Y,
X2X1Y, X2YX1, YX1X2, YX2X1) and choose the one producing the largest score.
This improves the quality of the final result (the search space is larger)
while keeping the implementation sufficiently fast.

Differential Revision: https://reviews.llvm.org/D113424

show more ...


# 3678326d 07-Dec-2021 Nico Weber <thakis@chromium.org>

Revert "ext-tsp basic block layout"

This reverts commit c68f71eb37c2b6ffcf29e865d443a910e73083bd.

Breaks tests on arm hosts, see comments on https://reviews.llvm.org/D113424


# c68f71eb 08-Nov-2021 spupyrev <spupyrev@fb.com>

ext-tsp basic block layout

A new basic block ordering improving existing MachineBlockPlacement.

The algorithm tries to find a layout of nodes (basic blocks) of a given CFG
optimizing jump locality

ext-tsp basic block layout

A new basic block ordering improving existing MachineBlockPlacement.

The algorithm tries to find a layout of nodes (basic blocks) of a given CFG
optimizing jump locality and thus processor I-cache utilization. This is
achieved via increasing the number of fall-through jumps and co-locating
frequently executed nodes together. The name follows the underlying
optimization problem, Extended-TSP, which is a generalization of classical
(maximum) Traveling Salesmen Problem.

The algorithm is a greedy heuristic that works with chains (ordered lists)
of basic blocks. Initially all chains are isolated basic blocks. On every
iteration, we pick a pair of chains whose merging yields the biggest increase
in the ExtTSP value, which models how i-cache "friendly" a specific chain is.
A pair of chains giving the maximum gain is merged into a new chain. The
procedure stops when there is only one chain left, or when merging does not
increase ExtTSP. In the latter case, the remaining chains are sorted by
density in decreasing order.

An important aspect is the way two chains are merged. Unlike earlier
algorithms (e.g., based on the approach of Pettis-Hansen), two
chains, X and Y, are first split into three, X1, X2, and Y. Then we
consider all possible ways of gluing the three chains (e.g., X1YX2, X1X2Y,
X2X1Y, X2YX1, YX1X2, YX2X1) and choose the one producing the largest score.
This improves the quality of the final result (the search space is larger)
while keeping the implementation sufficiently fast.

Differential Revision: https://reviews.llvm.org/D113424

show more ...


Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3
# c9fca53a 10-Sep-2021 Kazu Hirata <kazu@google.com>

[CodeGen, Target] Use pred_empty and succ_empty (NFC)


12345678910>>...14