#
28fd2ca1 |
| 27-Jul-2023 |
Denis Revunov <revunov.denis@huawei-partners.com> |
[BOLT] Fix trap value for non-X86
The trap value used by BOLT was assumed to be a single-byte instruction. This left some functions unaligned on AArch64 (e.g. in the exceptions-instrumentation test) and caused emission failures. Fix that by changing the fill value to a StringRef.
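A minimal sketch of why the fill value has to be a byte string rather than a single byte: padding is built by repeating a trap encoding, and on a fixed-width ISA that encoding is four bytes long. The helper name `makeTrapPadding` is hypothetical, and the AArch64 pattern below (four zero bytes, which decode as `udf #0`) is an assumption for illustration, not a statement of what BOLT emits.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: build `Size` bytes of padding by repeating a trap
// instruction encoding. With a multi-byte pattern, the padding size must be
// a multiple of the pattern size, otherwise the code that follows is
// misaligned (the failure mode described above).
std::string makeTrapPadding(std::size_t Size, std::string_view TrapPattern) {
  assert(!TrapPattern.empty() && "trap pattern must not be empty");
  assert(Size % TrapPattern.size() == 0 && "padding would split an instruction");
  std::string Padding;
  Padding.reserve(Size);
  while (Padding.size() < Size)
    Padding.append(TrapPattern);
  return Padding;
}

// x86: a single 0xCC byte (int3).
constexpr std::string_view X86Trap{"\xCC", 1};
// AArch64 (assumed for this sketch): four zero bytes, i.e. `udf #0`.
constexpr std::string_view AArch64Trap{"\0\0\0\0", 4};
```

With a `char`-sized fill value, the AArch64 case simply cannot be expressed, which is what motivated switching the fill value to a string.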
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D158191
|
#
a86dd9ae |
| 29-Jun-2023 |
Denis Revunov <revunov.denis@huawei-partners.com> |
[BOLT][Instrumentation] Fix indirect call profile in PIE
Because indirect call tables use static addresses for call sites, but PC values recorded by the runtime may be subject to ASLR in PIE, we couldn't find indirect call descriptions by their runtime address in PIE. This resulted in [unknown] entries in the profile for all indirect calls. We need to subtract the base address of .text from runtime addresses to get the corresponding static addresses. Here we create a getter for the base address of .text and subtract its return value from recorded PC values. This converts them to static addresses, which can then be used to find the corresponding indirect call descriptions.
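The address translation can be sketched as follows. The table layout, the `IndCallDescription` type, and the `LoadBias` parameter (runtime address of .text minus its link-time address) are assumptions for illustration; this is not the actual BOLT runtime code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical description of one indirect call site, keyed by its static
// (link-time) address as recorded when the binary was instrumented.
struct IndCallDescription {
  uint64_t StaticCallSite;
  uint32_t Id;
};

// Convert a PC recorded at run time back into the static address space.
// Assumes LoadBias = runtime address of .text - link-time address of .text.
uint64_t toStaticAddress(uint64_t RuntimePC, uint64_t LoadBias) {
  return RuntimePC - LoadBias;
}

// Look up a call-site description in a table sorted by static address.
std::optional<uint32_t>
findIndCall(const std::vector<IndCallDescription> &Table, uint64_t RuntimePC,
            uint64_t LoadBias) {
  const uint64_t Static = toStaticAddress(RuntimePC, LoadBias);
  auto It = std::lower_bound(
      Table.begin(), Table.end(), Static,
      [](const IndCallDescription &D, uint64_t A) { return D.StaticCallSite < A; });
  if (It == Table.end() || It->StaticCallSite != Static)
    return std::nullopt; // this is what ends up as [unknown] in the profile
  return It->Id;
}
```

Looking up the raw runtime PC (a `LoadBias` of 0 in a PIE that was loaded elsewhere) misses every entry, which matches the symptom described in the commit.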
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D154121
|
#
23c8d382 |
| 21-Aug-2023 |
Job Noorman <jnoorman@igalia.com> |
[BOLT] Calculate input to output address map using BOLTLinker
BOLT uses MCAsmLayout to calculate the output values of basic blocks. This means output values are calculated based on a pre-linking state and any changes to symbol values during linking will cause incorrect values to be used.
This issue was first addressed in D154604 by adding all basic block symbols to the symbol table for the linker to resolve them. However, the runtime overhead of handling this huge symbol table turned out to be prohibitively large.
This patch solves the issue in a different way. First, a temporary section containing [input address, output symbol] pairs is emitted to the intermediary object file. The linker will resolve all these references so we end up with a section of [input address, output address] pairs. This section is then parsed and used to:
- Replace BinaryBasicBlock::OffsetTranslationTable
- Replace BinaryFunction::InputOffsetToAddressMap
- Update BinaryBasicBlock::OutputAddressRange
Note that the reason this is more performant than the previous attempt is that these symbol references do not cause entries to be added to the symbol table. Instead, section-relative references are used for the relocations.
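A sketch of the parsing step, assuming the linked section is a flat array of 64-bit (input address, output address) pairs in host byte order. The section layout and the function name are hypothetical; a multimap is used in case one input address maps to more than one output address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <vector>

using AddressMap = std::multimap<uint64_t, uint64_t>;

// Parse the (already linked) temporary section into an input -> output map.
AddressMap parseAddressMapSection(const std::vector<uint8_t> &Contents) {
  constexpr std::size_t EntrySize = 2 * sizeof(uint64_t);
  assert(Contents.size() % EntrySize == 0 && "truncated address map");
  AddressMap Map;
  for (std::size_t Off = 0; Off + EntrySize <= Contents.size(); Off += EntrySize) {
    uint64_t Input, Output;
    // memcpy avoids unaligned and type-punned loads.
    std::memcpy(&Input, Contents.data() + Off, sizeof(Input));
    std::memcpy(&Output, Contents.data() + Off + sizeof(Input), sizeof(Output));
    Map.emplace(Input, Output);
  }
  return Map;
}
```

The key point is that the output addresses in this section were written by the linker itself, so they reflect any symbol value changes made during linking.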
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D155604
|
#
fc395884 |
| 29-Jul-2023 |
Job Noorman <jnoorman@igalia.com> |
[BOLT][RISCV] Recognize mapping symbols
The RISC-V psABI [1] defines them similarly to AArch64.
[1] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-elf.adoc#mapping-symbol
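For illustration, a hypothetical classifier for mapping symbols. It assumes the AArch64-style convention of `$x` (code) and `$d` (data), optionally followed by a `.`-separated suffix; any ISA-string suffix on `$x` that newer psABI revisions may allow is not handled by this sketch.

```cpp
#include <cassert>
#include <string_view>

enum class MappingKind { None, Code, Data };

// Classify an ELF symbol name as a code/data mapping symbol (sketch).
MappingKind classifyMappingSymbol(std::string_view Name) {
  if (Name.size() < 2 || Name[0] != '$')
    return MappingKind::None;
  // Accept only "$x"/"$d" or "$x.<suffix>"/"$d.<suffix>".
  if (Name.size() > 2 && Name[2] != '.')
    return MappingKind::None;
  if (Name[1] == 'x')
    return MappingKind::Code;
  if (Name[1] == 'd')
    return MappingKind::Data;
  return MappingKind::None;
}
```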
Reviewed By: yota9, Amir
Differential Revision: https://reviews.llvm.org/D153277
|
#
dd630d83 |
| 13-Jul-2023 |
Maksim Panchenko <maks@fb.com> |
[BOLT][NFC] Add post-CFG processing to MetadataRewriter interface
Add MetadataRewriter::postCFGInitializer().
Reviewed By: jobnoorman
Differential Revision: https://reviews.llvm.org/D155153
|
#
7b72920a |
| 10-Jul-2023 |
Maksim Panchenko <maks@fb.com> |
[BOLT] Fix warning message
Add missing EOL in a warning message.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D154895
|
#
f2f1e670 |
| 11-Jul-2023 |
Job Noorman <jnoorman@igalia.com> |
[BOLT] Make sure temp object file is always written
BOLT used `ToolOutputFile::keep` to make sure the intermediary object file was written to disk for debugging purposes when `--keep-tmp` was passed. However, since an intermediary `buffer_ostream` was used to stream to, and this class only writes to its output stream in its destructor, the object file was lost whenever its destructor did not run. This could happen, for example, if there was a crash while linking.
This patch makes sure the object file is written to disk immediately after we're done creating it. This is very useful while debugging JITLink crashes. This patch also gets rid of creating a temporary file when `--keep-tmp` is not passed by streaming the object file directly to a `SmallString`.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D154826
|
#
87fb0ea2 |
| 12-Sep-2022 |
Rui Zhong <reversezr33@gmail.com> |
[BOLT][DWARF] Implement new mechanism for DWARFRewriter
This revision implements a new mechanism for DWARFRewriter. In the new mechanism, we adopt the same approach as DWARFLinker: by parsing debug information into an IR, we can handle debug information more flexibly. The debug information update process now relies on the IR, which is written out to the binary once updating has finished.
A new class was added: DIEBuilder. This class is responsible for parsing debug information and raising it to the IR level. It is also used to write out the .debug_info and .debug_abbrev sections. Since we output a brand-new abbrev section, we no longer need to always convert low_pc/high_pc into ranges. When conversion does happen, we can also remove the low_pc entry.
Reviewed By: maksfb, ayermolo
Differential Revision: https://reviews.llvm.org/D130315
|
#
de7781ea |
| 07-Jul-2023 |
Nico Weber <thakis@chromium.org> |
Revert "[DWARF][BOLT] Implement new mechanism for DWARFRewriter"
This reverts commit 460a2244430fae192298a5fd9fa2a269e540e8c1. It breaks building on macOS, and it was landed with a review URL pointing to some Facebook-internal service.
Also reverts a bunch of follow-ups:
Revert "[BOLT][DWARF] Don't check string offsets" This reverts commit f9d6f48c8bf5acaac07502403c41cf0b0d89c8d2.
Revert "[BOLT][DWARF] Change to process and write out TUs first then CUs in batches" This reverts commit 88e95c1e4bb6e2ad3bfd185b96341ad5c09eff6b.
Revert "[BOLT][DWARF] Output DWO files as they are being processed" This reverts commit 46ca2e3fcd419b1246357ed3b9cd36630f16e64d.
Revert "[BOLT][DWARF] Don't check string offsets" This reverts commit cfe4a4b04f219a9dbb4e3fc01883437b6ff0e702.
Revert "[BOLT][DWARF] Numerous fixes for a new DWARFRewriter" This reverts commit 2701a661daa393ad5901ac88d420d7aa931eda0d.
|
#
460a2244 |
| 12-Sep-2022 |
Alexander Yermolovich <ayermolo@meta.com> |
[DWARF][BOLT] Implement new mechanism for DWARFRewriter
Summary: This revision implements a new mechanism for DWARFRewriter. In the new mechanism, we adopt the same approach as DWARFLinker: by parsing debug information into an IR, we can handle debug information more flexibly. The debug information update process now relies on the IR, which is written out to the binary once updating has finished.
A new class was added: DIEBuilder. This class is responsible for parsing debug information and raising it to the IR level. It is also used to write out the .debug_info and .debug_abbrev sections. Since we output a brand-new abbrev section, we no longer need to always convert low_pc/high_pc into ranges. When conversion does happen, we can also remove the low_pc entry.
Differential Revision: https://phabricator.intern.facebook.com/D39484421
Tasks: T117448832
|
#
38639a81 |
| 28-Jun-2023 |
Maksim Panchenko <maks@fb.com> |
[BOLT][NFCI] Migrate Linux Kernel handling code to MetadataRewriter
Create LinuxKernelRewriter and move kernel-specific code to this class.
Depends on D154023
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D154024
|
#
43dce27c |
| 28-Jun-2023 |
Maksim Panchenko <maks@fb.com> |
[BOLT][NFCI] Migrate pseudo probes to MetadataRewriter interface
Use the new MetadataRewriter interface to update pseudo probes and move ProbeDecoder out of BinaryContext into the new PseudoProbeRewriter class.
Depends on D154021
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D154022 Differential Revision: https://reviews.llvm.org/D154023
|
#
98e2d630 |
| 28-Jun-2023 |
Maksim Panchenko <maks@fb.com> |
[BOLT][NFCI] Use MetadataRewriter interface to update SDT markers
Migrate SDT markers processing to the new MetadataRewriter interface.
Depends on D154020
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D154021
|
#
c9b1f062 |
| 28-Jun-2023 |
Maksim Panchenko <maks@fb.com> |
[BOLT] Introduce MetadataRewriter interface
Introduce the MetadataRewriter interface to handle updates for various types of auxiliary data stored in a binary file.
To implement metadata processing using this new interface, all metadata rewriters should derive from the RewriterBase class and implement one or more of the following methods, depending on the timing of metadata read and write operations:
* preCFGInitializer()
* postCFGInitializer() // TBD
* preEmitFinalizer() // TBD
* postEmitFinalizer()
By adopting this approach, we aim to simplify the RewriteInstance class and improve its scalability to accommodate new extensions of file formats, including various metadata types of the Linux Kernel.
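A simplified sketch of the hook structure described above. All class names here are hypothetical, the hooks return nothing (the real interface may report errors differently), and every hook defaults to a no-op so that a rewriter only overrides the stages it needs.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical base class: one virtual hook per stage of the rewrite.
class MetadataRewriterSketch {
public:
  virtual ~MetadataRewriterSketch() = default;
  virtual void preCFGInitializer() {}
  virtual void postCFGInitializer() {}
  virtual void preEmitFinalizer() {}
  virtual void postEmitFinalizer() {}
};

// Hypothetical driver: each stage runs the matching hook on all rewriters.
class MetadataManagerSketch {
  std::vector<std::unique_ptr<MetadataRewriterSketch>> Rewriters;

public:
  void add(std::unique_ptr<MetadataRewriterSketch> R) {
    Rewriters.push_back(std::move(R));
  }
  void runPreCFG() { for (auto &R : Rewriters) R->preCFGInitializer(); }
  void runPostCFG() { for (auto &R : Rewriters) R->postCFGInitializer(); }
  void runPreEmit() { for (auto &R : Rewriters) R->preEmitFinalizer(); }
  void runPostEmit() { for (auto &R : Rewriters) R->postEmitFinalizer(); }
};

// Example rewriter that reads its metadata before CFG construction and
// rewrites it after emission; the other hooks keep their no-op defaults.
struct CountingRewriter : MetadataRewriterSketch {
  int &PreCFG, &PostEmit;
  CountingRewriter(int &A, int &B) : PreCFG(A), PostEmit(B) {}
  void preCFGInitializer() override { ++PreCFG; }
  void postEmitFinalizer() override { ++PostEmit; }
};
```

Keeping the stage ordering in one driver is what lets RewriteInstance stay agnostic of individual metadata formats.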
Differential Revision: https://reviews.llvm.org/D154020
|
#
fd49cc87 |
| 29-Jun-2023 |
Amir Ayupov <aaupov@fb.com> |
[BOLT][NFC] Print functions after attaching profile (-print-profile)
Add an extra point of dumping functions: immediately after attaching the profile information. This dumping is enabled by newly introduced `-print-profile` and `-print-all`.
The reason is that in `aggregate-only`/perf2bolt mode BOLT may not reach the point of printing the function after CFG is constructed (`-print-cfg`), while we may still want to inspect the attached profile, especially for diff'ing purposes.
Reviewed By: #bolt, rafauler
Differential Revision: https://reviews.llvm.org/D153996
|
#
a89c9b35 |
| 22-Jun-2023 |
Shatian Wang <shatian@meta.com> |
[BOLT] Fixing relative ordering of cold sections under multi-way function splitting
Order code sections with names in the form of ".text.cold.i" based on the value of i
[Context] SplitFunctions.cpp implements splitting strategies that can potentially split each function into up to N fragments, with N > 2. When such N-way splitting happens, new code sections with names ".text.cold.1", ..., ".text.cold.i", ..., ".text.cold.N-2" will be created. A section with name ".text.cold.i" contains the (i+2)th fragment of each function. As an example, if each function is split into N=3 fragments (hot, warm, cold), then the code sections will include:
- a section with name ".text" containing hot fragments
- a section with name ".text.cold" containing warm fragments
- a section with name ".text.cold.1" containing cold fragments
The order of these new sections in the output binary currently depends on the order in which the emitter encounters them. For example, under N=3-way splitting, if the first function is split 2 ways into hot and cold, and the second function is split 3 ways into hot, warm, and cold, then the cold fragment is encountered first, resulting in the following final section order: .text (hot), .text.cold.1 (cold), .text.cold (warm)
The above is suboptimal because the distance of jumps/calls between the hot and the warm sections will be much bigger than when ordering the sections as follows: .text (hot), .text.cold (warm), .text.cold.1 (cold)
This diff orders the sections with names in the form of ".text.cold" or ".text.cold.i" based on the value of i (assuming the i-value of ".text.cold" is 0).
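The ordering rule can be sketched as a rank function plus a stable sort. The names are hypothetical, and the sketch assumes it is only given `.text`, `.text.cold`, and `.text.cold.<i>` names. The index is compared numerically, so `.text.cold.2` sorts before `.text.cold.10`.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// ".text" -> 0, ".text.cold" (i = 0) -> 1, ".text.cold.<i>" -> i + 1.
unsigned sectionRank(std::string_view Name) {
  constexpr std::string_view Cold = ".text.cold";
  if (Name == ".text")
    return 0;
  if (Name == Cold)
    return 1;
  assert(Name.substr(0, Cold.size() + 1) == ".text.cold." && "unexpected name");
  return 1 + static_cast<unsigned>(
                 std::stoul(std::string(Name.substr(Cold.size() + 1))));
}

// Order code sections from hottest to coldest, independent of the order
// in which the emitter first encountered them.
void orderCodeSections(std::vector<std::string> &Names) {
  std::stable_sort(Names.begin(), Names.end(),
                   [](const std::string &A, const std::string &B) {
                     return sectionRank(A) < sectionRank(B);
                   });
}
```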
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D152941
|
#
38ba2824 |
| 22-Jun-2023 |
Job Noorman <jnoorman@igalia.com> |
[BOLT] Don't register internal func relocs as external references
Currently, all relocations that point inside a function are registered as external references. If these relocations cannot be resolved as jump tables or computed gotos, the containing function gets marked as not-simple and excluded from optimizations.
RISC-V uses relocations for branches and jumps (to support linker relaxation) and as such, almost no functions get marked as simple. This patch fixes this by only registering relocations that originate outside of the referenced function as external references.
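The rule reduces to a small predicate. `FuncRange` and `isExternalReference` are hypothetical names, and the sketch treats a function as a single half-open address range.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical half-open address range [Start, End) of a function body.
struct FuncRange {
  uint64_t Start, End;
  bool contains(uint64_t Addr) const { return Addr >= Start && Addr < End; }
};

// A relocation whose target lies inside Func is registered as an external
// reference only if the relocation itself is located outside Func. Branches
// inside the same function (common on RISC-V because of linker relaxation)
// therefore no longer make the function non-simple.
bool isExternalReference(const FuncRange &Func, uint64_t RelocAddr,
                         uint64_t TargetAddr) {
  return Func.contains(TargetAddr) && !Func.contains(RelocAddr);
}
```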
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D153345
|
#
b410d24a |
| 22-Jun-2023 |
Job Noorman <jnoorman@igalia.com> |
[BOLT][RISCV] Implement R_RISCV_ADD32/SUB32
This patch implements the R_RISCV_ADD32 and R_RISCV_SUB32 relocations for RISC-V.
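Per the RISC-V psABI, R_RISCV_ADD32 computes V + S + A and R_RISCV_SUB32 computes V - S - A on the 32-bit word at the relocated location, where V is the value already stored there. A minimal sketch with hypothetical helper names, reading and writing the word little-endian as on RISC-V:

```cpp
#include <cassert>
#include <cstdint>

static uint32_t read32le(const uint8_t *P) {
  return uint32_t(P[0]) | uint32_t(P[1]) << 8 | uint32_t(P[2]) << 16 |
         uint32_t(P[3]) << 24;
}

static void write32le(uint8_t *P, uint32_t V) {
  for (int I = 0; I < 4; ++I)
    P[I] = uint8_t(V >> (8 * I));
}

// R_RISCV_ADD32: word = V + S + A (wraps modulo 2^32).
void applyAdd32(uint8_t *Loc, uint64_t S, int64_t A) {
  write32le(Loc, read32le(Loc) + uint32_t(S + uint64_t(A)));
}

// R_RISCV_SUB32: word = V - S - A (wraps modulo 2^32).
void applySub32(uint8_t *Loc, uint64_t S, int64_t A) {
  write32le(Loc, read32le(Loc) - uint32_t(S + uint64_t(A)));
}
```

Applying ADD32 with `end` and SUB32 with `start` to a zeroed word leaves `end - start` in it, which is how label differences are encoded when linker relaxation may still move the labels.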
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D146554
|
#
82ef86c1 |
| 21-Jun-2023 |
Amir Ayupov <aaupov@fb.com> |
[BOLT] Set IsRelro section attribute based on PT_GNU_RELRO segment
Handle PT_GNU_RELRO segment in accordance with Linux Standard Base spec chapter 12:
> PT_GNU_RELRO
> The array element specifies the location and size of a segment which may
> be made *read-only* after relocations have been processed.
Perform a readelf-style mapping check between this segment and sections, set `IsRelro` section attribute.
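A simplified version of the readelf-style check. readelf's real logic also special-cases TLS and zero-sized sections; this sketch only tests that the section's address range lies inside the segment's `[p_vaddr, p_vaddr + p_memsz)` range. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// True if [SecAddr, SecAddr + SecSize) lies within the segment's memory
// range; such a section would get the IsRelro attribute.
bool isSectionInSegment(uint64_t SecAddr, uint64_t SecSize, uint64_t SegVAddr,
                        uint64_t SegMemSize) {
  return SecAddr >= SegVAddr && SecAddr + SecSize <= SegVAddr + SegMemSize;
}
```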
Reviewed By: #bolt, maksfb
Differential Revision: https://reviews.llvm.org/D152944
|
#
e7541f56 |
| 18-Jun-2023 |
Kazu Hirata <kazu@google.com> |
[BOLT] Use llvm::is_contained (NFC)
|
#
f8730293 |
| 16-Jun-2023 |
Job Noorman <jnoorman@igalia.com> |
[BOLT] Add minimal RISC-V 64-bit support
Just enough features are implemented to process a simple "hello world" executable and produce something that still runs (including libc calls). This was mainly a matter of implementing support for various relocations. Currently, the following are handled:
- R_RISCV_JAL
- R_RISCV_CALL
- R_RISCV_CALL_PLT
- R_RISCV_BRANCH
- R_RISCV_RVC_BRANCH
- R_RISCV_RVC_JUMP
- R_RISCV_GOT_HI20
- R_RISCV_PCREL_HI20
- R_RISCV_PCREL_LO12_I
- R_RISCV_RELAX
- R_RISCV_NONE
Executables linked with linker relaxation will probably fail to be processed. BOLT relocates .text to a high address while leaving .plt at its original (low) address. As a result, PC-relative PLT calls that were relaxed to a JAL may no longer fit their offset in JAL's J-type immediate. This is something that will be addressed in a later patch.
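To make the range problem concrete: JAL encodes a signed, 2-byte-aligned offset of 21 bits, so it reaches only targets within [-1 MiB, +1 MiB) of the instruction. A hypothetical check:

```cpp
#include <cassert>
#include <cstdint>

// True if a PC-relative offset can be encoded in a JAL instruction.
bool fitsInJalOffset(int64_t Offset) {
  return Offset % 2 == 0 && Offset >= -(int64_t(1) << 20) &&
         Offset < (int64_t(1) << 20);
}
```

Once .text is moved far above .plt, the offset of a relaxed call to a PLT entry fails this check, so the instruction can no longer be re-encoded.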
Changes to the BOLT core are relatively minor. Two things were tricky to implement and needed slightly larger changes. I'll explain those below.
The R_RISCV_CALL(_PLT) relocation is put on the first instruction of an AUIPC/JALR pair; the second does not get any relocation (unlike other PCREL pairs). This causes issues with the combination of how BOLT processes binaries and how the RISC-V MC layer handles relocations:
- BOLT reassembles instructions one by one and since the JALR doesn't have a relocation, it simply gets copied without modification;
- Even though the MC layer handles R_RISCV_CALL properly (adjusts both the AUIPC and the JALR), it assumes the immediates of both instructions are 0 (to be able to OR in a new value). This will most likely not be the case for the JALR that got copied over.
To handle this difficulty without resorting to RISC-V-specific hacks in the BOLT core, a new binary pass was added that searches for AUIPC/JALR pairs and zeroes out the immediate of the JALR.
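A sketch of what such a pass does, expressed on raw 32-bit encodings rather than on BOLT's instruction representation. Helper names are hypothetical; the encodings (AUIPC opcode 0x17, JALR opcode 0x67 with funct3 = 0, 12-bit immediate in bits 31:20) come from the RISC-V base ISA.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint32_t OpcodeMask = 0x7f;
constexpr uint32_t OpAUIPC = 0x17;
constexpr uint32_t OpJALR = 0x67;

static uint32_t rdOf(uint32_t I) { return (I >> 7) & 0x1f; }
static uint32_t rs1Of(uint32_t I) { return (I >> 15) & 0x1f; }
static uint32_t funct3Of(uint32_t I) { return (I >> 12) & 0x7; }

// For each AUIPC immediately followed by a JALR whose base register is the
// AUIPC's destination, clear the JALR's immediate (bits 31:20) so that a
// later R_RISCV_CALL fixup can OR in the new low 12 bits.
void zeroJalrImmediates(std::vector<uint32_t> &Insns) {
  for (std::size_t I = 0; I + 1 < Insns.size(); ++I) {
    const uint32_t Hi = Insns[I], Lo = Insns[I + 1];
    if ((Hi & OpcodeMask) == OpAUIPC && (Lo & OpcodeMask) == OpJALR &&
        funct3Of(Lo) == 0 && rs1Of(Lo) == rdOf(Hi))
      Insns[I + 1] = Lo & 0x000fffffu;
  }
}
```

For example, `auipc ra, 0` (0x00000097) followed by `jalr ra, 16(ra)` (0x010080e7) becomes `auipc ra, 0` followed by `jalr ra, 0(ra)` (0x000080e7).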
A second difficulty was supporting ABS symbols. As far as I can tell, ABS symbols were not handled at all, causing __global_pointer$ to break. RewriteInstance::analyzeRelocation was updated to handle these generically.
Tests are provided for all supported relocations. Note that in order to test the correct handling of PLT entries, an ELF file produced by GCC had to be used. While I tried to strip the YAML representation, it's still quite large. Any suggestions on how to improve this would be appreciated.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D145687
|
#
05634f73 |
| 15-Jun-2023 |
Job Noorman <jnoorman@igalia.com> |
[BOLT] Move from RuntimeDyld to JITLink
RuntimeDyld has been deprecated in favor of JITLink. [1] This patch replaces all uses of RuntimeDyld in BOLT with JITLink.
Care has been taken to minimize the impact on the code structure in order to ease the inspection of this (rather large) changeset. Since BOLT relied on the RuntimeDyld API in multiple places, this wasn't always possible though and I'll explain the changes in code structure first.
Design note: BOLT uses a JIT linker to perform what essentially is static linking. No linked code is ever executed; the result of linking is simply written back to an executable file. For this reason, I restricted myself to the use of the core JITLink library and avoided ORC as much as possible.
RuntimeDyld contains methods for loading objects (loadObject) and symbol lookup (getSymbol). Since JITLink doesn't provide a class with a similar interface, the BOLTLinker abstract class was added to implement it. It was added to Core since both the Rewrite and RuntimeLibs libraries make use of it. Wherever a RuntimeDyld object was used before, it was replaced with a BOLTLinker object.
There is one major difference between the RuntimeDyld and BOLTLinker interfaces: in JITLink, section allocation and the application of fixups (relocation) happens in a single call (jitlink::link). That is, there is no separate method like finalizeWithMemoryManagerLocking in RuntimeDyld. BOLT used to remap sections between allocating (loadObject) and linking them (finalizeWithMemoryManagerLocking). This doesn't work anymore with JITLink. Instead, BOLTLinker::loadObject accepts a callback that is called before fixups are applied which is used to remap sections.
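The ordering constraint can be illustrated with a toy model. None of these types are the real BOLTLinker or JITLink API; the sketch only shows that the section mapper runs after allocation and before fixups are applied, so fixups observe the remapped addresses.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <vector>

struct ToySection {
  std::string Name;
  uint64_t Address = 0;
};

struct ToyFixup {
  std::string TargetSection; // section whose final address is needed
  uint64_t *Patch;           // where the resolved address is written
};

using SectionMapper = std::function<void(std::vector<ToySection> &)>;

void toyLoadObject(std::vector<ToySection> &Sections,
                   const std::vector<ToyFixup> &Fixups,
                   const SectionMapper &MapSections) {
  uint64_t Next = 0x1000;
  for (ToySection &S : Sections) { // 1. temporary allocation
    S.Address = Next;
    Next += 0x1000;
  }
  MapSections(Sections); // 2. caller moves sections to final addresses
  std::map<std::string, uint64_t> Final;
  for (const ToySection &S : Sections)
    Final[S.Name] = S.Address;
  for (const ToyFixup &F : Fixups) // 3. fixups use the remapped addresses
    *F.Patch = Final.at(F.TargetSection);
}
```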
The actual implementation of the BOLTLinker interface lives in the JITLinkLinker class in the Rewrite library. It's the only part of the BOLT code that should directly interact with the JITLink API.
For loading objects, JITLinkLinker first creates a LinkGraph (jitlink::createLinkGraphFromObject) and then links it (jitlink::link). For the latter, it uses a custom JITLinkContext with the following properties:
- Use BOLT's ExecutableFileMemoryManager. This one was updated to implement the JITLinkMemoryManager interface. Since BOLT never executes code, its finalization step is a no-op.
- Pass config: don't use the default target passes since they modify DWARF sections in a way that seems incompatible with BOLT. Also run a custom pre-prune pass that makes sure sections without symbols are not pruned by JITLink.
- Implement symbol lookup. This used to be implemented by BOLTSymbolResolver.
- Call the section mapper callback before the final linking step.
- Copy symbol values when the LinkGraph is resolved. Symbols are stored inside JITLinkLinker to ensure that later objects (i.e., instrumentation libraries) can find them. This functionality used to be provided by RuntimeDyld but I did not find a way to use JITLink directly for this.
Some more minor points of interest:
- BinarySection::SectionID: JITLink doesn't have something equivalent to RuntimeDyld's Section IDs. Instead, sections can only be referred to by name. Hence, SectionID was updated to a string.
- There seem to be no tests for Mach-O. I've tested a small hello-world style binary but not more than that.
- On Mach-O, JITLink "normalizes" section names to include the segment name. I had to parse the section name back from this manually which feels slightly hacky.
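Recovering the plain section name can be sketched as below, assuming the normalized name has the form `<segment>,<section>` (for example `__TEXT,__text`). The function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Strip a leading "<segment>," from a normalized Mach-O section name;
// names without a comma are returned unchanged.
std::string_view stripSegmentName(std::string_view Normalized) {
  const std::size_t Comma = Normalized.find(',');
  return Comma == std::string_view::npos ? Normalized
                                         : Normalized.substr(Comma + 1);
}
```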
[1] https://reviews.llvm.org/D145686#4222642
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D147544
|
#
1ebad216 |
| 13-Jun-2023 |
Maksim Panchenko <maks@fb.com> |
[BOLT][NFCI] Remove redundant instance of MCAsmBackend
Use instance of MCAsmBackend from BinaryContext instead of creating a new one.
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D152849
|
#
c4e60a7f |
| 12-Jun-2023 |
Maksim Panchenko <maks@fb.com> |
[BOLT] Fix --max-funcs=<N> option
Fix an off-by-one error in the handling of the --max-funcs=<N> option. We used to process N+1 functions when N was requested.
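A minimal sketch of the intended semantics (not BOLT's code): checking the limit before selecting a function yields exactly N functions. `selectFunctions` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Select at most MaxFuncs functions, in order.
std::vector<std::string> selectFunctions(const std::vector<std::string> &All,
                                         std::size_t MaxFuncs) {
  std::vector<std::string> Selected;
  for (const std::string &Name : All) {
    if (Selected.size() >= MaxFuncs) // stop once N functions are selected
      break;
    Selected.push_back(Name);
  }
  return Selected;
}
```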
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D152751
|
#
f5425c12 |
| 24-Apr-2023 |
Christian Ulmann <christian.ulmann@nextsilicon.com> |
[LoopInfo] Move generic LoopInfo into own files
This commit splits the generic part of `LoopInfo` into separate files. These new `GenericLoopInfo` files are located in `llvm/Support` to be in line with `GenericDomTree`.
Furthermore, this change ensures that MLIR's Bazel build does not have to link against `LLVMAnalysis` just to use these template headers.
Depends on D148219
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D148235
|