#
34c6c5e7 |
| 24-Jan-2025 |
Maksim Panchenko <maks@fb.com> |
[BOLT][AArch64] Fix PLT optimization (#124192)
Preserve C++ exception metadata while running PLT optimization on
AArch64.
|
#
3023b15f |
| 09-May-2024 |
Amir Ayupov <aaupov@fb.com> |
[BOLT] Support POSSIBLE_PIC_FIXED_BRANCH
Detect and support fixed PIC indirect jumps of the following form:
```
movslq En(%rip), %r1
leaq PIC_JUMP_TABLE(%rip), %r2
addq %r2, %r1
jmpq *%r1
```
with PIC_JUMP_TABLE that looks like the following:
```
JT:  ----------
 E1:| L1 - JT  |
    |----------|
 E2:| L2 - JT  |
    |----------|
    |          |
       ......
 En:| Ln - JT  |
     ----------
```
The code could be produced by compilers, see https://github.com/llvm/llvm-project/issues/91648.
Test Plan: updated jump-table-fixed-ref-pic.test
Reviewers: maksfb, ayermolo, dcci, rafaelauler
Reviewed By: rafaelauler
Pull Request: https://github.com/llvm/llvm-project/pull/91667
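The table layout above implies a simple address computation: each entry stores the signed 32-bit difference between a label and the table base, so the branch target is the table address plus the sign-extended entry. A minimal sketch of that arithmetic follows; `resolvePICEntry` and its parameters are hypothetical names for illustration, not part of BOLT.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: resolve entry `index` of a PIC jump table located at
// `tableAddr`. Each entry holds (Ln - JT) as a signed 32-bit value.
uint64_t resolvePICEntry(uint64_t tableAddr,
                         const std::vector<int32_t> &entries,
                         size_t index) {
  assert(index < entries.size() && "entry index out of range");
  // Sign-extend the entry to 64 bits, as movslq does.
  int64_t offset = entries[index];
  // Add the table base, as the leaq/addq pair does.
  return tableAddr + static_cast<uint64_t>(offset);
}
```

Because the jump in the commit message always reads the same entry `En`, the target can be resolved statically in the same way.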
|
#
587308c3 |
| 15-Jul-2024 |
Paschalis Mpeis <paschalis.mpeis@arm.com> |
[BOLT][AArch64] Provide createDummyReturnFunction (#96626)
AArch64 needs this function when instrumenting statically-linked binaries.
Sample commands:
```bash
clang -Wl,-q test.c -static -o out
llvm-bolt -instrument -instrumentation-sleep-time=5 out -o out.instr
```
|
#
344228eb |
| 02-Jul-2024 |
Amir Ayupov <aaupov@fb.com> |
[BOLT] Drop macro-fusion alignment (#97358)
9d0754ada5dbbc0c009bcc2f7824488419cc5530 dropped MC support required for
optimal macro-fusion alignment in BOLT. Remove the support in BOLT as
performance measurements with large binaries didn't show a significant
improvement.
Test Plan:
macro-fusion alignment was never upstreamed, so no upstream tests are
affected.
|
#
6c5b62b8 |
| 28-Jun-2024 |
Nathan Sidwell <nathan@acm.org> |
[BOLT][NFC] Separate isReversibleBranch's 2 semantics (#95572)
`isUnsupportedBranch` was renamed (and inverted) to `isReversibleBranch`, as that was how it was being used. But one use in `BinaryFunction::disassemble` was using the original meaning to detect unsupported branches, and the `isUnsupportedBranch` had 2 separate semantic checks.
Move the unsupported branch check from `isReversibleBranch` to a new entry point: `isUnsupportedInstruction`. Call that from `BinaryFunction::disassemble`.
Move the dynamic branch check from X86's isReversibleBranch to the base class, as it is not an architecture-specific check.
Remove unnecessary `isReversibleBranch` calls from Instrumentation and X86 MCPlusBuilder.
|
#
a13bc971 |
| 11-Jun-2024 |
Paschalis Mpeis <paschalis.mpeis@arm.com> |
[BOLT][AArch64] Implement PLTCall optimization (#93584)
`convertCallToIndirectCall` applies the PLTCall optimization and returns
an iterator (updated if needed) to the converted call instruction. Because
AArch64 needs to inject additional instructions to implement this
pass, the relevant BasicBlock and an iterator are passed to
`convertCallToIndirectCall`.
`NumCallsOptimized` is updated only on successful application of the
pass.
Tests:
- Inputs/plt-tailcall.c: an example of a tail-call-optimized PLT call.
- AArch64/plt-call.test: the actual A64 test, which runs the
PLTCall optimization on the above input file and verifies that the
pass is applied to the calls to 'printf' and 'puts'.
|
#
3fefb3c5 |
| 07-Jun-2024 |
Nathan Sidwell <nathan@acm.org> |
[BOLT][NFC] Infailable fns return void (#92018)
Both `reverseBranchCondition` and `replaceBranchTarget` return a success boolean. But all-but-one caller ignores the return value, and the exception emits a fatal error on failure.
Thus, just return nothing.
|
#
be83f5c1 |
| 24-May-2024 |
Amir Ayupov <aaupov@fb.com> |
[BOLT][NFC] Simplify analyzeIndirectBranch (#91662)
Simplify mutually exclusive sanity checks in analyzeIndirectBranch,
where an UNKNOWN IndirectBranchType is to be returned. Reduces confusion
and code duplication when adding a new IndirectBranchType (to be added
in #91667).
Test Plan: NFC
|
#
46588039 |
| 24-May-2024 |
Amir Ayupov <aaupov@fb.com> |
[BOLT][NFC] Add isRIPRel and isIndexed helpers (#91661)
Move out common X86MemOperand checks into helper lambdas. To be reused
in #91667.
Test Plan: NFC
|
#
76fdc2e5 |
| 17-May-2024 |
Nathan Sidwell <nathan@acm.org> |
[BOLT][NFC] Rename isUnsupportedBranch to isReversibleBranch (#92447)
`isUnsupportedBranch` is not a very informative name, and doesn't match
its corresponding `reverseBranchCondition`, as I noted in PR #92018.
Here's a renaming to a more mnemonic name.
|
#
7de82ca3 |
| 29-Mar-2024 |
Maksim Panchenko <maks@fb.com> |
[BOLT] Don't terminate on trap instruction for Linux kernel (#87021)
Under normal circumstances, we terminate basic blocks on a trap
instruction. However, the Linux kernel may resume execution after hitting a
trap (ud2 on x86). Thus, we introduce a "--terminal-trap" option that
specifies whether the trap instruction should terminate the control flow. The
option is on by default, except in Linux kernel mode, where it is off.
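To illustrate what the option changes, here is a toy model of basic-block splitting in which a trap ends a block only when a flag is set. This is a sketch under stated assumptions, not BOLT's disassembly code: `splitIntoBlocks`, the `terminalTrap` parameter, and the use of plain mnemonic strings are all hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

using Block = std::vector<std::string>;

// Toy model (hypothetical): "ret" and "jmp" always end a block, while "ud2"
// ends a block only when `terminalTrap` is true.
std::vector<Block> splitIntoBlocks(const std::vector<std::string> &insns,
                                   bool terminalTrap) {
  std::vector<Block> blocks;
  Block current;
  for (const std::string &mnemonic : insns) {
    assert(!mnemonic.empty() && "empty mnemonic");
    current.push_back(mnemonic);
    bool endsBlock = mnemonic == "ret" || mnemonic == "jmp" ||
                     (terminalTrap && mnemonic == "ud2");
    if (endsBlock) {
      blocks.push_back(current);
      current.clear();
    }
  }
  // Keep any trailing instructions that were not closed by a terminator.
  if (!current.empty())
    blocks.push_back(current);
  return blocks;
}
```

With the flag off, the instructions after `ud2` stay in the same block, which matches the kernel behavior of resuming execution after the trap.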
|
#
6b1cf004 |
| 21-Mar-2024 |
Maksim Panchenko <maks@fb.com> |
[BOLT] Add support for Linux kernel static keys jump table (#86090)
Runtime code modification used by static keys is the most ubiquitous
self-modifying feature of the Linux kernel. The idea is to eliminate
the condition check and associated conditional jump on a hot path if
that condition (based on a boolean value of a static key) does not
change often. Whenever the condition changes, the kernel runtime
modifies all code paths associated with that key, flipping the code
between nop and (unconditional) jump.
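The flipping described above can be pictured with a small simulation: a key owns a list of code sites, and changing the key rewrites every site to either a nop or a jump. This is a simplified model for illustration only; `StaticKey`, `KeySite`, and `jumpWhenEnabled` are hypothetical names, and the real kernel patches machine code rather than an enum.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class PatchedInsn { Nop, Jump };

// One code location governed by a key. `jumpWhenEnabled` (hypothetical)
// records which key state turns this site into a jump.
struct KeySite {
  bool jumpWhenEnabled;
  PatchedInsn current;
};

struct StaticKey {
  bool enabled = false;
  std::vector<KeySite> sites;

  // Instruction a site should hold for the key's current state.
  PatchedInsn insnFor(bool jumpWhenEnabled) const {
    return enabled == jumpWhenEnabled ? PatchedInsn::Jump : PatchedInsn::Nop;
  }

  void addSite(bool jumpWhenEnabled) {
    sites.push_back({jumpWhenEnabled, insnFor(jumpWhenEnabled)});
  }

  // Changing the key rewrites every associated site.
  void set(bool value) {
    enabled = value;
    for (KeySite &site : sites)
      site.current = insnFor(site.jumpWhenEnabled);
  }

  PatchedInsn siteAt(size_t index) const {
    assert(index < sites.size() && "site index out of range");
    return sites[index].current;
  }
};
```

The hot path pays only for a nop or a direct jump; the cost of the condition moves to the rare `set` call.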
|
#
49b8a99a |
| 14-Mar-2024 |
Maksim Panchenko <maks@fb.com> |
[BOLT] Add createCondBranch() and createLongUncondBranch() (#85315)
Add MCPlusBuilder interface for creating two new branch types.
|
#
bba790db |
| 14-Mar-2024 |
Maksim Panchenko <maks@fb.com> |
[BOLT] Refactor instruction creation interface. NFCI (#85292)
Refactor MCPlusBuilder's create{Instruction}() functions that used to
return bool. We almost never check the return value as we rely on
llvm_unreachable() to detect unimplemented functionality. There were a
couple of cases that checked the return value, but they would hit the
unreachable condition first (at least in debug builds) before the return
value gets checked.
|
#
59ab86bb |
| 14-Mar-2024 |
Maksim Panchenko <maks@fb.com> |
[BOLT] Clear operands when creating new instructions. NFCI (#85191)
Reset the operand list whenever we create a new instruction via a parameter
passed by reference. Most functions were already doing this, but
several places were missing the reset. Potentially, if we do not clear
the list, it could lead to invalid instruction operands. But the existing
code is unaffected.
|
#
082fe9a5 |
| 02-Feb-2024 |
Maksim Panchenko <maks@fb.com> |
[BOLT] Remove duplicate expression (#80380)
Reported by the cppcheck static analyzer in #80111.
Fixes #80111.
|
#
8fb83bf5 |
| 06-Oct-2023 |
Job Noorman <jnoorman@igalia.com> |
[BOLT][NFC] Add MCSubtargetInfo to MCPlusBuilder (#68223)
On RISC-V, it's helpful to have access to `MCSubtargetInfo` while
generating instructions in `MCPlusBuilder`. For example, a return
instruction might be generated differently based on if the target
supports compressed instructions (`c.jr ra`) or not (`jalr ra`).
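The return-instruction example can be sketched as a feature-dependent choice. The struct below is a hypothetical stand-in for the subtarget information, not the real `MCSubtargetInfo` API, and the function returns mnemonic strings only to keep the example self-contained.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the subtarget information a builder would hold.
struct SubtargetFeatures {
  bool hasCompressed; // true if the target supports compressed instructions
};

// Pick the return instruction form based on the available features.
std::string returnMnemonic(const SubtargetFeatures *sti) {
  assert(sti && "subtarget info must be provided");
  return sti->hasCompressed ? "c.jr ra" : "jalr ra";
}
```

Without access to the subtarget information, the builder would have to assume one form for every target.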
|
#
853e126c |
| 18-Aug-2023 |
Rafael Auler <rafaelauler@fb.com> |
[BOLT] Support input binaries that use R_X86_GOTPC64
In the large code model, the static linker calculates the address of the GOT via an R_X86_GOTPC64 relocation applied against a MOVABSQ instruction. In the final binary, the operand disassembles as a regular immediate, but because that immediate is the result of PC-relative pointer arithmetic, we need to parse this relocation and update the calculation whenever we move code; otherwise we break the code that reads the GOT.
A test case showing how GOT is accessed was provided.
Reviewed By: #bolt, maksfb
Differential Revision: https://reviews.llvm.org/D158911
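The reason the immediate must be updated can be shown with the relocation's formula. In the x86-64 psABI, R_X86_64_GOTPC64 is computed as GOT + A - P, where P is the address of the relocated field, so moving the instruction changes P and therefore the correct value. The helpers below are a sketch of that arithmetic with hypothetical names; they are not BOLT functions.

```cpp
#include <cassert>
#include <cstdint>

// Value the linker stores for the relocation: GOT + A - P.
uint64_t computeGOTPC64(uint64_t gotAddr, int64_t addend, uint64_t place) {
  return gotAddr + static_cast<uint64_t>(addend) - place;
}

// Inverse: recover the GOT address from a stored immediate.
uint64_t recoverGOT(uint64_t imm, int64_t addend, uint64_t place) {
  return imm - static_cast<uint64_t>(addend) + place;
}

// Recompute the immediate after the instruction moves from oldPlace to
// newPlace. The GOT address it refers to must stay the same.
uint64_t updateAfterMove(uint64_t oldImm, int64_t addend, uint64_t oldPlace,
                         uint64_t newPlace) {
  uint64_t updated = oldImm + oldPlace - newPlace;
  assert(recoverGOT(updated, addend, newPlace) ==
         recoverGOT(oldImm, addend, oldPlace));
  return updated;
}
```

Treating the operand as a plain constant would leave the old P baked in, which is the breakage the commit message describes.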
|
#
eafe4ee2 |
| 01-Sep-2023 |
Job Noorman <jnoorman@igalia.com> |
[BOLT] Rename isLoad/isStore to mayLoad/mayStore
As discussed in D159266, for some instructions it's impossible to know statically if they will load/store (e.g., predicated instructions). Therefore, mayLoad/mayStore are more appropriate names.
|
#
6e4c2305 |
| 08-Aug-2023 |
Elvina Yakubova <elvina.yakubova@huawei.com> |
[BOLT][Instrumentation] Initial instrumentation support for AArch64
This commit adds code generation for AArch64 instrumentation, including direct and indirect calls support.
Reviewed By: rafauler, yota9
Differential Revision: https://reviews.llvm.org/D151899
|
#
28fd2ca1 |
| 27-Jul-2023 |
Denis Revunov <revunov.denis@huawei-partners.com> |
[BOLT] Fix trap value for non-X86
The trap value used by BOLT was assumed to be a single-byte instruction. This left some functions unaligned on AArch64 (e.g., in the exceptions-instrumentation test) and caused emission failures. Fix that by changing the fill value to a StringRef.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D158191
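A multi-byte fill value matters because padding must consist of whole instructions on fixed-width targets. The sketch below fills a padding region by repeating a byte pattern and checks that the region is a multiple of the pattern size. `fillWithTrap` is a hypothetical helper, and the patterns a caller passes in are illustrative, not necessarily the values BOLT emits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Fill `size` bytes of padding by repeating `pattern`, which may be one byte
// (a single-byte trap) or several bytes (a fixed-width trap instruction).
std::vector<uint8_t> fillWithTrap(size_t size, const std::string &pattern) {
  assert(!pattern.empty() && "fill pattern must not be empty");
  assert(size % pattern.size() == 0 &&
         "padding must hold a whole number of trap instructions");
  std::vector<uint8_t> out;
  out.reserve(size);
  for (size_t i = 0; i < size; ++i)
    out.push_back(static_cast<uint8_t>(pattern[i % pattern.size()]));
  return out;
}
```

A single-byte pattern works for any padding size, whereas a 4-byte pattern only fits regions that are multiples of four, which is the alignment property the fix preserves.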
|
#
9fee2ac0 |
| 22-Aug-2023 |
zhoujiapeng <zjpzhoujiapeng@163.com> |
[BOLT][NFC] Split createRelocation in X86 and share the second part
This commit splits the createRelocation function for the X86 architecture into two parts, retaining the first half and moving the second half to a new function called extractFixupExpr. The purpose of this change is to make extractFixupExpr a shared function between AArch64 and X86 architectures, increasing code reusability and maintainability.
Child revision: https://reviews.llvm.org/D156018
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D157217
|
#
5c4d306a |
| 06-May-2023 |
Maksim Panchenko <maks@fb.com> |
[BOLT][NFC] Change signature of MCPlusBuilder::isUnsupportedBranch()
Make MCPlusBuilder::isUnsupportedBranch() take MCInst, not opcode.
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D152765
|
#
43f56a2f |
| 07-Jun-2023 |
Maksim Panchenko <maks@fb.com> |
[BOLT] Fix handling of code references from unmodified code
In lite mode (default for X86), BOLT optimizes and relocates functions with profile. The rest of the code is preserved, but if it references relocated code such references have to be updated. The update is handled by scanExternalRefs() function. Note that we cannot solely rely on relocations written by the linker, as not all code references are exposed to the linker. Additionally, the linker can modify certain instructions and relocations will no longer match the code.
With this change, start using symbolic disassembler for scanning code for references in scanExternalRefs(). Unlike the previous approach, the symbolizer properly detects and creates references for instructions with multiple/ambiguous symbolic operands and handles cases where a relocation doesn't match any operand. See test cases for examples.
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D152631
|
#
3f1e9468 |
| 20-May-2023 |
Shengchen Kan <shengchen.kan@intel.com> |
[X86][MC][bolt] Share code between encoding optimization and assembler relaxation, NFCI
PUSH[16|32|64]i[8|32] are not arithmetic instructions, so I renamed the functions.
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D151028
|