History log of /llvm-project/llvm/lib/CodeGen/MachineOutliner.cpp (Results 26 – 50 of 190)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
# fe35e142 04-Feb-2023 Jessica Paquette <jpaquette@apple.com>

[MachineOutliner] NFC: Pull variable out from erase_if

`Mapper.UnsignedVec.begin()` never changes throughout the call to
`erase_if`, so no need to recalculate it.

Also drop some redundant braces.


# 443c5b9f 04-Feb-2023 Jessica Paquette <jpaquette@apple.com>

[NFC] Remove redundant check for MBB being empty in outliner

If the size is < 2, then we just break anyway.


# 7bb9d70b 04-Feb-2023 Jessica Paquette <jpaquette@apple.com>

[NFC] Remove unneccessary `llvm::` in MachineOutliner/SuffixTree

We have `using llvm`, we don't need to say `llvm::`.


# ec37ebf5 04-Feb-2023 Jessica Paquette <jpaquette@apple.com>

[NFC] Use SmallVector/ArrayRef in MachineOutliner/SuffixTree for small types

The MachineOutliner + SuffixTree both used `std::vector` everywhere because I
didn't know any better at the time.

At lea

[NFC] Use SmallVector/ArrayRef in MachineOutliner/SuffixTree for small types

The MachineOutliner + SuffixTree both used `std::vector` everywhere because I
didn't know any better at the time.

At least for small types, such as `unsigned` and iterators, I can't see any
particular reason to use std::vector over `SmallVector` here.

show more ...


Revision tags: llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2
# 4de8521b 16-Feb-2022 Jessica Paquette <jpaquette@apple.com>

[MachineOutliner][AArch64] NFC: Split MBBs into "outlinable ranges"

Recommit with bug fixes + added testcases to the outliner. Also adds some
debug output.

We found a case in the Swift benchmarks w

[MachineOutliner][AArch64] NFC: Split MBBs into "outlinable ranges"

Recommit with bug fixes + added testcases to the outliner. Also adds some
debug output.

We found a case in the Swift benchmarks where the MachineOutliner introduces
about a 20% compile time overhead in comparison to building without the
MachineOutliner.

The origin of this slowdown is that the benchmark has long blocks which incur
lots of LRU checks for lots of candidates.

Imagine a case like this:

```
bb:
i1
i2
i3
...
i123456
```

Now imagine that all of the outlining candidates appear early in the block, and
that something like, say, NZCV is defined at the end of the block.

The outliner has to check liveness for certain registers across all candidates,
because outlining from areas where those registers are used is unsafe at call
boundaries.

This is fairly wasteful because in the previously-described case, the outlining
candidates will never appear in an area where those registers are live.

To avoid this, precalculate areas where we will consider outlining from.
Anything outside of these areas is mapped to illegal and not included in the
outlining search space. This allows us to reduce the size of the outliner's
suffix tree as well, giving us a potential memory win.

By precalculating areas, we can also optimize other checks too, like whether
or not LR is live across an outlining candidate.

Doing all of this is about a 16% compile time improvement on the case.

This is likely useful for other targets (e.g. ARM + RISCV) as well, but for now,
this only implements the AArch64 path. The original "is the MBB safe" method
still works as before.

show more ...


# 7ef8f9c9 20-Dec-2022 Jessica Paquette <jpaquette@apple.com>

[IR/MachineOutliner] Add a "nooutline" function attr and respect it

Add `nooutline` + update LangRef to say it exists.

This makes it possible to say "don't outline from this function ever."

We wan

[IR/MachineOutliner] Add a "nooutline" function attr and respect it

Add `nooutline` + update LangRef to say it exists.

This makes it possible to say "don't outline from this function ever."

We want to be able to toggle whether or not a function should be in the search
set regardless of default behaviour.

Add testcases for the IR Outliner + Machine Outliner.

Also remove an unnecessary check for an empty function in the Machine Outliner.

Differential Revision: https://reviews.llvm.org/D140438

show more ...


# 998960ee 03-Dec-2022 Kazu Hirata <kazu@google.com>

[CodeGen] Use std::nullopt instead of None (NFC)

This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of

[CodeGen] Use std::nullopt instead of None (NFC)

This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716

show more ...


# 0ff51d5d 10-Jun-2022 Eli Friedman <efriedma@quicinc.com>

Fix interaction of CFI instructions with MachineOutliner.

1. When checking if a candidate contains a CFI instruction, actually
iterate over all of the instructions, instead of stopping halfway
throu

Fix interaction of CFI instructions with MachineOutliner.

1. When checking if a candidate contains a CFI instruction, actually
iterate over all of the instructions, instead of stopping halfway
through.
2. Make sure copied CFI directives refer to the correct instruction.

Fixes https://github.com/llvm/llvm-project/issues/55842

Differential Revision: https://reviews.llvm.org/D126930

show more ...


# 3b9707db 05-Jun-2022 Kazu Hirata <kazu@google.com>

[llvm] Convert for_each to range-based for loops (NFC)


# 989f1c72 15-Mar-2022 serge-sans-paille <sguelton@redhat.com>

Cleanup codegen includes

This is a (fixed) recommit of https://reviews.llvm.org/D121169

after: 1061034926
before: 1063332844

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-in

Cleanup codegen includes

This is a (fixed) recommit of https://reviews.llvm.org/D121169

after: 1061034926
before: 1063332844

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D121681

show more ...


# a278250b 10-Mar-2022 Nico Weber <thakis@chromium.org>

Revert "Cleanup codegen includes"

This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20.
Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang,
and many LLVM tests, see comments on https:/

Revert "Cleanup codegen includes"

This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20.
Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang,
and many LLVM tests, see comments on https://reviews.llvm.org/D121169

show more ...


# 7f230fee 07-Mar-2022 serge-sans-paille <sguelton@redhat.com>

Cleanup codegen includes

after: 1061034926
before: 1063332844

Differential Revision: https://reviews.llvm.org/D121169


# 68c718c8 23-Feb-2022 Jessica Paquette <jpaquette@apple.com>

Revert "[MachineOutliner][AArch64] NFC: Split MBBs into "outlinable ranges""

This reverts commit d97f997eb79d91b2872ac13619f49cb3a7120781.

This commit was not NFC.

(See: https://reviews.llvm.org/r

Revert "[MachineOutliner][AArch64] NFC: Split MBBs into "outlinable ranges""

This reverts commit d97f997eb79d91b2872ac13619f49cb3a7120781.

This commit was not NFC.

(See: https://reviews.llvm.org/rGd97f997eb79d91b2872ac13619f49cb3a7120781)

show more ...


# d97f997e 16-Feb-2022 Jessica Paquette <jpaquette@apple.com>

[MachineOutliner][AArch64] NFC: Split MBBs into "outlinable ranges"

We found a case in the Swift benchmarks where the MachineOutliner introduces
about a 20% compile time overhead in comparison to bu

[MachineOutliner][AArch64] NFC: Split MBBs into "outlinable ranges"

We found a case in the Swift benchmarks where the MachineOutliner introduces
about a 20% compile time overhead in comparison to building without the
MachineOutliner.

The origin of this slowdown is that the benchmark has long blocks which incur
lots of LRU checks for lots of candidates.

Imagine a case like this:

```
bb:
i1
i2
i3
...
i123456
```

Now imagine that all of the outlining candidates appear early in the block, and
that something like, say, NZCV is defined at the end of the block.

The outliner has to check liveness for certain registers across all candidates,
because outlining from areas where those registers are used is unsafe at call
boundaries.

This is fairly wasteful because in the previously-described case, the outlining
candidates will never appear in an area where those registers are live.

To avoid this, precalculate areas where we will consider outlining from.
Anything outside of these areas is mapped to illegal and not included in the
outlining search space. This allows us to reduce the size of the outliner's
suffix tree as well, giving us a potential memory win.

By precalculating areas, we can also optimize other checks too, like whether
or not LR is live across an outlining candidate.

Doing all of this is about a 16% compile time improvement on the case.

This is likely useful for other targets (e.g. ARM + RISCV) as well, but for now,
this only implements the AArch64 path. The original "is the MBB safe" method
still works as before.

show more ...


# 12389e37 18-Feb-2022 Jessica Paquette <jpaquette@apple.com>

[MachineOutliner] Add statistics for unsigned vector size

Useful for debugging + evaluating improvements to the outliner.

Stats are the number of illegal, legal, and invisible instructions in the
u

[MachineOutliner] Add statistics for unsigned vector size

Useful for debugging + evaluating improvements to the outliner.

Stats are the number of illegal, legal, and invisible instructions in the
unsigned vector, and it's total length.

show more ...


# 6398903a 14-Feb-2022 Momchil Velikov <momchil.velikov@arm.com>

Extend the `uwtable` attribute with unwind table kind

We have the `clang -cc1` command-line option `-funwind-tables=1|2` and
the codegen option `VALUE_CODEGENOPT(UnwindTables, 2, 0) ///< Unwind
tabl

Extend the `uwtable` attribute with unwind table kind

We have the `clang -cc1` command-line option `-funwind-tables=1|2` and
the codegen option `VALUE_CODEGENOPT(UnwindTables, 2, 0) ///< Unwind
tables (1) or asynchronous unwind tables (2)`. However, this is
encoded in LLVM IR by the presence or the absence of the `uwtable`
attribute, i.e. we lose the information whether to generate want just
some unwind tables or asynchronous unwind tables.

Asynchronous unwind tables take more space in the runtime image, I'd
estimate something like 80-90% more, as the difference is adding
roughly the same number of CFI directives as for prologues, only a bit
simpler (e.g. `.cfi_offset reg, off` vs. `.cfi_restore reg`). Or even
more, if you consider tail duplication of epilogue blocks.
Asynchronous unwind tables could also restrict code generation to
having only a finite number of frame pointer adjustments (an example
of *not* having a finite number of `SP` adjustments is on AArch64 when
untagging the stack (MTE) in some cases the compiler can modify `SP`
in a loop).
Having the CFI precise up to an instruction generally also means one
cannot bundle together CFI instructions once the prologue is done,
they need to be interspersed with ordinary instructions, which means
extra `DW_CFA_advance_loc` commands, further increasing the unwind
tables size.

That is to say, async unwind tables impose a non-negligible overhead,
yet for the most common use cases (like C++ exceptions), they are not
even needed.

This patch extends the `uwtable` attribute with an optional
value:
- `uwtable` (default to `async`)
- `uwtable(sync)`, synchronous unwind tables
- `uwtable(async)`, asynchronous (instruction precise) unwind tables

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D114543

show more ...


Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2
# f5f28d5b 01-Dec-2021 Ties Stuij <ties.stuij@arm.com>

[ARM] Implement BTI placement pass for PACBTI-M

This patch implements a new MachineFunction in the ARM backend for
placing BTI instructions. It is similar to the existing AArch64
aarch64-branch-targ

[ARM] Implement BTI placement pass for PACBTI-M

This patch implements a new MachineFunction in the ARM backend for
placing BTI instructions. It is similar to the existing AArch64
aarch64-branch-targets pass.

BTI instructions are inserted into basic blocks that:
- Have their address taken
- Are the entry block of a function, if the function has external
linkage or has its address taken
- Are mentioned in jump tables
- Are exception/cleanup landing pads

Each BTI instructions is placed in the beginning of a BB after the
so-called meta instructions (e.g. exception handler labels).

Each outlining candidate and the outlined function need to be in agreement about
whether BTI placement is enabled or not. If branch target enforcement is
disabled for a function, the outliner should not covertly enable it by emitting
a call to an outlined function, which begins with BTI.

The cost mode of the outliner is adjusted to account for the extra BTI
instructions in the outlined function.

The ARM Constant Islands pass will maintain the count of the jump tables, which
reference a block. A `BTI` instruction is removed from a block only if the
reference count reaches zero.

PAC instructions in entry blocks are replaced with PACBTI instructions (tests
for this case will be added in a later patch because the compiler currently does
not generate PAC instructions).

The ARM Constant Island pass is adjusted to handle BTI
instructions correctly.

Functions with static linkage that don't have their address taken can
still be called indirectly by linker-generated veneers and thus their
entry points need be marked with BTI or PACBTI.

The changes are tested using "LLVM IR -> assembly" tests, jump tables
also have a MIR test. Unfortunately it is not possible add MIR tests
for exception handling and computed gotos because of MIR parser
limitations.

This patch is part of a series that adds support for the PACBTI-M extension of
the Armv8.1-M architecture, as detailed here:

https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension

The PACBTI-M specification can be found in the Armv8-M Architecture Reference
Manual:

https://developer.arm.com/documentation/ddi0553/latest

The following people contributed to this patch:

- Mikhail Maltsev
- Momchil Velikov
- Ties Stuij

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D112426

show more ...


Revision tags: llvmorg-13.0.1-rc1
# 1e9fa0b1 17-Nov-2021 DianQK <dianqk@icloud.com>

Fix the side effect of outlined function when the register is implicit use and implicit-def in the same instruction.

This is the diff associated with {D95267}, and we need to mark $x0 as live whethe

Fix the side effect of outlined function when the register is implicit use and implicit-def in the same instruction.

This is the diff associated with {D95267}, and we need to mark $x0 as live whether or not $x0 is dead.

The compiler also needs to mark register $x0 as live in for the following case.

```
$x1 = ADDXri $sp, 16, 0
BL @spam, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit killed $x1, implicit-def $sp, implicit-def $x0
```

This change fixes an issue where the wrong registers were used when -machine-outliner-reruns>0.
As an example:

```
lang=c
typedef struct {
double v1;
double v2;
} D16;

typedef struct {
D16 v1;
D16 v2;
} D32;

typedef long long LL8;
typedef struct {
long long v1;
long long v2;
} LL16;
typedef struct {
LL16 v1;
LL16 v2;
} LL32;

typedef struct {
LL32 v1;
LL32 v2;
} LL64;

LL8 needx0(LL8 v0, LL8 v1);

void bar(LL64 v1, LL32 v2, LL16 v3, LL32 v4, LL8 v5, D16 v6, D16 v7, D16 v8);

LL8 foo(LL8 v0, LL64 v1, LL32 v2, LL16 v3, LL32 v4, LL8 v5, D16 v6, D16 v7, D16 v8)
{
LL8 result = needx0(v0, 0);
bar(v1, v2, v3, v4, v5, v6, v7, v8);
return result + 1;
}
```

As you can see from the `foo` function, we should not modify the value of `x0` until we call `needx0`.
This code is compiled to give the following instruction MIR code.

```
$sp = frame-setup SUBXri $sp, 256, 0
frame-setup STPDi killed $d13, killed $d12, $sp, 16
frame-setup STPDi killed $d11, killed $d10, $sp, 18
frame-setup STPDi killed $d9, killed $d8, $sp, 20

frame-setup STPXi killed $x26, killed $x25, $sp, 22
frame-setup STPXi killed $x24, killed $x23, $sp, 24
frame-setup STPXi killed $x22, killed $x21, $sp, 26
frame-setup STPXi killed $x20, killed $x19, $sp, 28
...
$x1 = MOVZXi 0, 0
BL @needx0, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit $x1, implicit-def $sp, implicit-def $x0
...
```

Since there are some other instruction sequences that duplicate `foo`, after the first execution of Machine Outliner you will get:
```
$sp = frame-setup SUBXri $sp, 256, 0
frame-setup STPDi killed $d13, killed $d12, $sp, 16
frame-setup STPDi killed $d11, killed $d10, $sp, 18
frame-setup STPDi killed $d9, killed $d8, $sp, 20

$x7 = ORRXrs $xzr, $lr, 0
BL @OUTLINED_FUNCTION_0, implicit-def $lr, implicit $sp, implicit-def $lr, implicit $sp, implicit $xzr, implicit $x7, implicit $x19, implicit $x20, implicit $x21, implicit $x22, implicit $x23, implicit $x24, implicit $x25, implicit $x26
$lr = ORRXrs $xzr, $x7, 0
...
BL @OUTLINED_FUNCTION_1, implicit-def $lr, implicit $sp, implicit-def $lr, implicit-def $sp, implicit-def $x0, implicit-def $x1, implicit $sp
...
```

For the first time we outlined the following sequence:
```
frame-setup STPXi killed $x26, killed $x25, $sp, 22
frame-setup STPXi killed $x24, killed $x23, $sp, 24
frame-setup STPXi killed $x22, killed $x21, $sp, 26
frame-setup STPXi killed $x20, killed $x19, $sp, 28
```
and
```
$x1 = MOVZXi 0, 0
BL @needx0, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit $x1, implicit-def $sp, implicit-def $x0
```

When we execute the outline again, we will get:
```
$x0 = ORRXrs $xzr, $lr, 0 <---- here
BL @OUTLINED_FUNCTION_2_0, implicit-def $lr, implicit $sp, implicit-def $sp, implicit-def $lr, implicit $sp, implicit $xzr, implicit $d8, implicit $d9, implicit $d10, implicit $d11, implicit $d12, implicit $d13, implicit $x0
$lr = ORRXrs $xzr, $x0, 0

$x7 = ORRXrs $xzr, $lr, 0
BL @OUTLINED_FUNCTION_0, implicit-def $lr, implicit $sp, implicit-def $lr, implicit $sp, implicit $xzr, implicit $x7, implicit $x19, implicit $x20, implicit $x21, implicit $x22, implicit $x23, implicit $x24, implicit $x25, implicit $x26
$lr = ORRXrs $xzr, $x7, 0
...
BL @OUTLINED_FUNCTION_1, implicit-def $lr, implicit $sp, implicit-def $lr, implicit-def $sp, implicit-def $x0, implicit-def $x1, implicit $sp
```

When calling `OUTLINED_FUNCTION_2_0`, we used `x0` to save the `lr` register.
The reason for the above error appears to be that:
```
BL @OUTLINED_FUNCTION_1, implicit-def $lr, implicit $sp, implicit-def $lr, implicit-def $sp, implicit-def $x0, implicit-def $x1, implicit $sp
```
should be:
```
BL @OUTLINED_FUNCTION_1, implicit-def $lr, implicit $sp, implicit-def $lr, implicit-def $sp, implicit-def $x0, implicit-def $x1, implicit $sp, implicit $x0
```

When processing the same instruction with both `implicit-def $x0` and `implicit $x0` we should keep `implicit $x0`.
A reproducible demo is available at: [https://github.com/DianQK/reproduce_outlined_function_use_live_x0](https://github.com/DianQK/reproduce_outlined_function_use_live_x0).

Reviewed By: jinlin

Differential Revision: https://reviews.llvm.org/D112911

show more ...


Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3
# 7c2192b2 03-Mar-2021 Jin Lin <jinl@uber.com>

Add the use of register r for outlined function when register r is live in and defined later.

The compiler needs to mark register $x0 as live in for the following case.

$x1 = ADDXri $sp, 16, 0

Add the use of register r for outlined function when register r is live in and defined later.

The compiler needs to mark register $x0 as live in for the following case.

$x1 = ADDXri $sp, 16, 0
BL @spam, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit killed $x1, implicit-def $sp, implicit-def dead $x0

Reviewed By: paquette

Differential Revision: https://reviews.llvm.org/D95267

show more ...


Revision tags: llvmorg-12.0.0-rc2
# 22f00f61 15-Feb-2021 Kazu Hirata <kazu@google.com>

[CodeGen] Use range-based for loops (NFC)


Revision tags: llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1
# 6a6e3821 09-Jan-2021 Kazu Hirata <kazu@google.com>

[llvm] Drop unnecessary make_range (NFC)


# cfeecdf7 07-Jan-2021 Kazu Hirata <kazu@google.com>

[llvm] Use llvm::all_of (NFC)


# 1e3ed091 29-Dec-2020 Kazu Hirata <kazu@google.com>

[CodeGen] Use llvm::append_range (NFC)


Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1
# 5b30d9ad 05-Nov-2020 Momchil Velikov <momchil.velikov@arm.com>

[MachineOutliner] Do not outline debug instructions

The debug location is removed from any outlined instruction. This
causes the MachineVerifier to crash on outlined DBG_VALUE
instructions.

Then, d

[MachineOutliner] Do not outline debug instructions

The debug location is removed from any outlined instruction. This
causes the MachineVerifier to crash on outlined DBG_VALUE
instructions.

Then, debug instructions are "invisible" to the outliner, that is, two
ranges of instructions from different functions are considered
identical if the only difference is debug instructions. Since a debug
instruction from one function is unlikely to provide sensible debug
information about all functions, sharing an outlined sequence, this
patch just removes debug instructions from the outlined functions.

Differential Revision: https://reviews.llvm.org/D89485

show more ...


Revision tags: llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3
# 3c83b967 08-Sep-2020 Simon Pilgrim <llvm-dev@redking.me.uk>

LiveRegUnits.h - reduce MachineRegisterInfo.h include. NFC.

We only need to include MachineInstrBundle.h, but exposes an implicit dependency in MachineOutliner.h.

Also, remove duplicate includes fr

LiveRegUnits.h - reduce MachineRegisterInfo.h include. NFC.

We only need to include MachineInstrBundle.h, but exposes an implicit dependency in MachineOutliner.h.

Also, remove duplicate includes from LiveRegUnits.cpp + MachineOutliner.cpp.

show more ...


12345678