Revision tags: llvmorg-21-init, llvmorg-19.1.7 |
|
#
c1931241 |
| 22-Dec-2024 |
Fangrui Song <i@maskray.me> |
[MC] Remove fixup_begin/fixup_end
|
Revision tags: llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4 |
|
#
d73d5c8c |
| 15-Nov-2024 |
Kazu Hirata <kazu@google.com> |
[MC] Remove unused includes (NFC) (#116317)
Identified with misc-include-cleaner.
|
Revision tags: llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2 |
|
#
fb702823 |
| 29-Jul-2024 |
Fangrui Song <i@maskray.me> |
[MC] Move some bool members to MCFragment. NFC
Move `AllowAutoPadding` to MCFragment, which reduce the MCRelaxableFragment size by 8 bytes. While here, also move `AlignToBundleEnd` next to `HasInstr
[MC] Move some bool members to MCFragment. NFC
Move `AllowAutoPadding` to MCFragment, which reduce the MCRelaxableFragment size by 8 bytes. While here, also move `AlignToBundleEnd` next to `HasInstructions`. Functions that create fragments are slightly shorter due to fewer byte zeroing instructions.
Although fewer in number than MCDataFragments, MCRelaxableFragment objects still constitute a significant proportion warranting optimization. ``` % clang -c sqlite3.i -w -g -Xclang -print-stats ... 2206 assembler - Number of emitted assembler fragments - align 83980 assembler - Number of emitted assembler fragments - data 84 assembler - Number of emitted assembler fragments - fill 169462 assembler - Number of emitted assembler fragments - total 11396 assembler - Number of emitted assembler fragments - relaxable ```
Pull Request: https://github.com/llvm/llvm-project/pull/100976
show more ...
|
#
de5aa8d0 |
| 29-Jul-2024 |
Fangrui Song <i@maskray.me> |
[MC] Remove unused MCCompactEncodedInstFragment
This has been used after #94950.
|
Revision tags: llvmorg-19.1.0-rc1, llvmorg-20-init |
|
#
10c894cf |
| 30-Jun-2024 |
Fangrui Song <i@maskray.me> |
[MC] Move MCAsmLayout from MCFragment.cpp to MCAssembler.cpp. NFC
8d736236d36ca5c98832b7631aea2e538f6a54aa (2015) moved these MCAsmLayout functions to MCFragment.cpp, but the original placement is b
[MC] Move MCAsmLayout from MCFragment.cpp to MCAssembler.cpp. NFC
8d736236d36ca5c98832b7631aea2e538f6a54aa (2015) moved these MCAsmLayout functions to MCFragment.cpp, but the original placement is better as these functions are tightly coupled with MCAssembler.cpp.
show more ...
|
#
2afa193b |
| 30-Jun-2024 |
Fangrui Song <i@maskray.me> |
[MC] Remove MCAsmLayout::invalidateFragmentsFrom
The simplification is enabled by 9d0754ada5dbbc0c009bcc2f7824488419cc5530 ("[MC] Relax fragments eagerly").
|
#
c9f6a5e4 |
| 22-Jun-2024 |
Fangrui Song <i@maskray.me> |
[MC] Move computeBundlePadding closer to its only caller. NFC
There is only one caller after #95188.
|
#
8cb6e587 |
| 22-Jun-2024 |
Fangrui Song <i@maskray.me> |
[MC] Allocate MCFragment with a bump allocator
#95197 and 75006466296ed4b0f845cbbec4bf77c21de43b40 eliminated all raw `new MCXXXFragment`. We can now place fragments in a bump allocator. In addition
[MC] Allocate MCFragment with a bump allocator
#95197 and 75006466296ed4b0f845cbbec4bf77c21de43b40 eliminated all raw `new MCXXXFragment`. We can now place fragments in a bump allocator. In addition, remove the dead `Kind == FragmentType(~0)` condition.
~CodeViewContext may call `StrTabFragment->destroy()` and need to be reset before `FragmentAllocator.Reset()`. Tested by llvm/test/MC/COFF/cv-compiler-info.ll using asan.
Pull Request: https://github.com/llvm/llvm-project/pull/96402
show more ...
|
#
82f9b0fb |
| 22-Jun-2024 |
Fangrui Song <i@maskray.me> |
[MC] Remove Parent initializer from MCFragment ctor
|
#
9b44cfbd |
| 22-Jun-2024 |
Fangrui Song <i@maskray.me> |
[MC] Remove unused section parameters from MCFragment constructors
|
Revision tags: llvmorg-18.1.8 |
|
#
27588fe2 |
| 13-Jun-2024 |
Fangrui Song <i@maskray.me> |
[MC] Move MCFragment::Atom to MCSectionMachO::Atoms
Mach-O's `.subsections_via_symbols` mechanism associates a fragment with an atom (a non-temporary defined symbol). The current approach (`MCFragme
[MC] Move MCFragment::Atom to MCSectionMachO::Atoms
Mach-O's `.subsections_via_symbols` mechanism associates a fragment with an atom (a non-temporary defined symbol). The current approach (`MCFragment::Atom`) wastes space for other object file formats.
After #95077, `MCFragment::LayoutOrder` is only used by `AttemptToFoldSymbolOffsetDifference`. While it could be removed, we might explore future uses for `LayoutOrder`.
@aengelke suggests one use case: move `Atom` into MCSection. This works because Mach-O doesn't support `.subsection`, and `LayoutOrder`, as the index into the fragment list, is unchanged.
This patch moves MCFragment::Atom to MCSectionMachO::Atoms. `getAtom` may be called at parse time before `Atoms` is initialized, so a bound checking is needed to keep the hack working.
Pull Request: https://github.com/llvm/llvm-project/pull/95341
show more ...
|
#
846e47e7 |
| 13-Jun-2024 |
aengelke <engelke@in.tum.de> |
[MC] Reduce size of MCDataFragment by 8 bytes (#95293)
Due to alignment, the first two fields of MCEncodedFragment are
currently at bytes 40 and 41, so 1 byte over the 8 byte boundary,
causing 7 b
[MC] Reduce size of MCDataFragment by 8 bytes (#95293)
Due to alignment, the first two fields of MCEncodedFragment are
currently at bytes 40 and 41, so 1 byte over the 8 byte boundary,
causing 7 bytes padding to be inserted for the following pointer.
Fold two bools of MCFragment into bitfields to reduce move the two
fields of MCEncodedFragment one byte earlier to remove the padding
bytes. This works, as in the Itanium ABI, there is no padding after
base classes.
This gives a space reduction of MCDataFragment from 224 to 216 bytes.
show more ...
|
#
2cc4bc13 |
| 12-Jun-2024 |
Fangrui Song <i@maskray.me> |
MCFragment: Initialize Offset to 0
After 9d0754ada5dbbc0c009bcc2f7824488419cc5530 ("[MC] Relax fragments eagerly") removes the assert of Offset, it is no longer useful to initialize the member to -1
MCFragment: Initialize Offset to 0
After 9d0754ada5dbbc0c009bcc2f7824488419cc5530 ("[MC] Relax fragments eagerly") removes the assert of Offset, it is no longer useful to initialize the member to -1.
Now the symbol value estimate is more precise, which leads to slight behavior change to layout-interdependency.s.
show more ...
|
#
de19f7b6 |
| 11-Jun-2024 |
Fangrui Song <i@maskray.me> |
[MC] Replace fragment ilist with singly-linked lists
Fragments are allocated with `operator new` and stored in an ilist with Prev/Next/Parent pointers. A more efficient representation would be an ar
[MC] Replace fragment ilist with singly-linked lists
Fragments are allocated with `operator new` and stored in an ilist with Prev/Next/Parent pointers. A more efficient representation would be an array of fragments without the overhead of Prev/Next pointers.
As the first step, replace ilist with singly-linked lists.
* `getPrevNode` uses have been eliminated by previous changes. * The last use of the `Prev` pointer remains: for each subsection, there is an insertion point and the current insertion point is stored at `CurInsertionPoint`. * `HexagonAsmBackend::finishLayout` needs a backward iterator. Save all fragments within `Frags`. Hexagon programs are usually small, and the performance does not matter that much.
To eliminate `Prev`, change the subsection representation to singly-linked lists for subsections and a pointer to the active singly-linked list. The fragments from all subsections will be chained together at layout time.
Since fragment lists are disconnected before layout time, we can remove `MCFragment::SubsectionNumber` (https://reviews.llvm.org/D69411). The current implementation of `AttemptToFoldSymbolOffsetDifference` requires future improvement for robustness.
Pull Request: https://github.com/llvm/llvm-project/pull/95077
show more ...
|
#
9d0754ad |
| 10-Jun-2024 |
Fangrui Song <i@maskray.me> |
[MC] Relax fragments eagerly
Lazy relaxation caused hash table lookups (`getFragmentOffset`) and complex use/compute interdependencies. Some expressions involding forward declared symbols (e.g. `sub
[MC] Relax fragments eagerly
Lazy relaxation caused hash table lookups (`getFragmentOffset`) and complex use/compute interdependencies. Some expressions involding forward declared symbols (e.g. `subsection-if.s`) cannot be computed. Recursion detection requires complex `IsBeingLaidOut` (https://reviews.llvm.org/D79570).
D76114's `invalidateFragmentsFrom` makes lazy relaxation even less useful.
Switch to eager relaxation to greatly simplify code and resolve these issues. This change also removes a `getPrevNode` use, which makes it more feasible to replace the fragment representation, which might yield a large peak RSS win.
Minor downsides: The number of section relaxations may increase (offset by avoiding the hash table lookup). For relax-recompute-align.s, the computed layout is not optimal.
show more ...
|
#
dcb71c06 |
| 08-Jun-2024 |
Fangrui Song <i@maskray.me> |
[MC] Simplify Sec.getFragmentList().insert(Sec.begin(), F). NFC
Decrease the uses of getFragmentList() to make it easier to change the fragment list representation.
|
Revision tags: llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init |
|
#
9fec33aa |
| 19-Jan-2024 |
Amir Ayupov <aaupov@fb.com> |
Revert "[BOLT] Fix unconditional output of boltedcollection in merge-fdata (#78653)"
This reverts commit 82bc33ea3f1a539be50ed46919dc53fc6b685da9.
Accidentally pushed unrelated changes.
|
#
82bc33ea |
| 19-Jan-2024 |
Amir Ayupov <aaupov@fb.com> |
[BOLT] Fix unconditional output of boltedcollection in merge-fdata (#78653)
Fix the bug where merge-fdata unconditionally outputs boltedcollection
line, regardless of whether input files have it s
[BOLT] Fix unconditional output of boltedcollection in merge-fdata (#78653)
Fix the bug where merge-fdata unconditionally outputs boltedcollection
line, regardless of whether input files have it set.
Test Plan:
Added bolt/test/X86/merge-fdata-nobat-mode.test which fails without this
fix.
show more ...
|
#
7850c94b |
| 16-Jan-2024 |
David Green <david.green@arm.com> |
[NFC] sentinal -> sentinel
|
Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
#
eecd41aa |
| 11-Jul-2022 |
spupyrev <spupyrev@fb.com> |
Revert "Rebase: [Facebook] [MC] Introduce NeverAlign fragment type"
This reverts commit 6d0528636ae54fba75938a79ae7a98dfcc949f72.
|
Revision tags: llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2 |
|
#
6d052863 |
| 05-Aug-2021 |
Rafael Auler <rafaelauler@fb.com> |
Rebase: [Facebook] [MC] Introduce NeverAlign fragment type
Summary: Introduce NeverAlign fragment type.
The intended usage of this fragment is to insert it before a pair of macro-op fusion eligible
Rebase: [Facebook] [MC] Introduce NeverAlign fragment type
Summary: Introduce NeverAlign fragment type.
The intended usage of this fragment is to insert it before a pair of macro-op fusion eligible instructions. NeverAlign fragment ensures that the next fragment (first instruction in the pair) does not end at a given alignment boundary by emitting a minimal size nop if necessary.
In effect, it ensures that a pair of macro-fusible instructions is not split by a given alignment boundary, which is a precondition for macro-op fusion in modern Intel Cores (64B = cache line size, see Intel Architecture Optimization Reference Manual, 2.3.2.1 Legacy Decode Pipeline: Macro-Fusion).
This patch introduces functionality used by BOLT when emitting code with MacroFusion alignment already in place.
The use case is different from BoundaryAlign and instruction bundling: - BoundaryAlign can be extended to perform the desired alignment for the first instruction in the macro-op fusion pair (D101817). However, this approach has higher overhead due to reliance on relaxation as BoundaryAlign requires in the general case - see https://reviews.llvm.org/D97982#2710638. - Instruction bundling: the intent of NeverAlign fragment is to prevent the first instruction in a pair ending at a given alignment boundary, by inserting at most one minimum size nop. It's OK if either instruction crosses the cache line. Padding both instructions using bundles to not cross the alignment boundary would result in excessive padding. There's no straightforward way to request instruction bundling to avoid a given end alignment for the first instruction in the bundle.
LLVM: https://reviews.llvm.org/D97982
Manual rebase conflict history: https://phabricator.intern.facebook.com/D30142613
Test Plan: sandcastle
Reviewers: #llvm-bolt
Subscribers: phabricatorlinter
Differential Revision: https://phabricator.intern.facebook.com/D31361547
show more ...
|
#
412c788a |
| 14-Jun-2022 |
Guillaume Chatelet <gchatelet@google.com> |
[NFC][Alignment] Use Align in MCAlignFragment
|
#
5d57578a |
| 20-Oct-2021 |
Leonard Grey <lgrey@chromium.org> |
[MC] Recursively calculate symbol offset
This is speculative since I'm not sure if there's some implicit contract that a variable symbol must not have another variable symbol in its evaluation tree.
[MC] Recursively calculate symbol offset
This is speculative since I'm not sure if there's some implicit contract that a variable symbol must not have another variable symbol in its evaluation tree.
Downstream bug: https://bugs.chromium.org/p/chromium/issues/detail?id=471146#c23.
Test is based on alias.s (removed checks since we just need to know it didn't crash).
Differential Revision: https://reviews.llvm.org/D109109
show more ...
|
Revision tags: llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2 |
|
#
705a4c14 |
| 08-Dec-2020 |
Hongtao Yu <hoy@fb.com> |
[CSSPGO] Pseudo probe encoding and emission.
This change implements pseudo probe encoding and emission for CSSPGO. Please see RFC here for more context: https://groups.google.com/g/llvm-dev/c/1p1rdY
[CSSPGO] Pseudo probe encoding and emission.
This change implements pseudo probe encoding and emission for CSSPGO. Please see RFC here for more context: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s
Pseudo probes are in the form of intrinsic calls on IR/MIR but they do not turn into any machine instructions. Instead they are emitted into the binary as a piece of data in standalone sections. The probe-specific sections are not needed to be loaded into memory at execution time, thus they do not incur a runtime overhead.
**ELF object emission**
The binary data to emit are organized as two ELF sections, i.e, the `.pseudo_probe_desc` section and the `.pseudo_probe` section. The `.pseudo_probe_desc` section stores a function descriptor for each function and the `.pseudo_probe` section stores the actual probes, each fo which corresponds to an IR basic block or an IR function callsite. A function descriptor is stored as a module-level metadata during the compilation and is serialized into the object file during object emission.
Both the probe descriptors and pseudo probes can be emitted into a separate ELF section per function to leverage the linker for deduplication. A `.pseudo_probe` section shares the same COMDAT group with the function code so that when the function is dead, the probes are dead and disposed too. On the contrary, a `.pseudo_probe_desc` section has its own COMDAT group. This is because even if a function is dead, its probes may be inlined into other functions and its descriptor is still needed by the profile generation tool.
The format of `.pseudo_probe_desc` section looks like:
``` .section .pseudo_probe_desc,"",@progbits .quad 6309742469962978389 // Func GUID .quad 4294967295 // Func Hash .byte 9 // Length of func name .ascii "_Z5funcAi" // Func name .quad 7102633082150537521 .quad 138828622701 .byte 12 .ascii "_Z8funcLeafi" .quad 446061515086924981 .quad 4294967295 .byte 9 .ascii "_Z5funcBi" .quad -2016976694713209516 .quad 72617220756 .byte 7 .ascii "_Z3fibi" ```
For each `.pseudoprobe` section, the encoded binary data consists of a single function record corresponding to an outlined function (i.e, a function with a code entry in the `.text` section). A function record has the following format :
``` FUNCTION BODY (one for each outlined function present in the text section) GUID (uint64) GUID of the function NPROBES (ULEB128) Number of probes originating from this function. NUM_INLINED_FUNCTIONS (ULEB128) Number of callees inlined into this function, aka number of first-level inlinees PROBE RECORDS A list of NPROBES entries. Each entry contains: INDEX (ULEB128) TYPE (uint4) 0 - block probe, 1 - indirect call, 2 - direct call ATTRIBUTE (uint3) reserved ADDRESS_TYPE (uint1) 0 - code address, 1 - address delta CODE_ADDRESS (uint64 or ULEB128) code address or address delta, depending on ADDRESS_TYPE INLINED FUNCTION RECORDS A list of NUM_INLINED_FUNCTIONS entries describing each of the inlined callees. Each record contains: INLINE SITE GUID of the inlinee (uint64) ID of the callsite probe (ULEB128) FUNCTION BODY A FUNCTION BODY entry describing the inlined function. ```
To support building a context-sensitive profile, probes from inlinees are grouped by their inline contexts. An inline context is logically a call path through which a callee function lands in a caller function. The probe emitter builds an inline tree based on the debug metadata for each outlined function in the form of a trie tree. A tree root is the outlined function. Each tree edge stands for a callsite where inlining happens. Pseudo probes originating from an inlinee function are stored in a tree node and the tree path starting from the root all the way down to the tree node is the inline context of the probes. The emission happens on the whole tree top-down recursively. Probes of a tree node will be emitted altogether with their direct parent edge. Since a pseudo probe corresponds to a real code address, for size savings, the address is encoded as a delta from the previous probe except for the first probe. Variant-sized integer encoding, aka LEB128, is used for address delta and probe index.
**Assembling**
Pseudo probes can be printed as assembly directives alternatively. This allows for good assembly code readability and also provides a view of how optimizations and pseudo probes affect each other, especially helpful for diff time assembly analysis.
A pseudo probe directive has the following operands in order: function GUID, probe index, probe type, probe attributes and inline context. The directive is generated by the compiler and can be parsed by the assembler to form an encoded `.pseudoprobe` section in the object file.
A example assembly looks like:
``` foo2: # @foo2 # %bb.0: # %bb0 pushq %rax testl %edi, %edi .pseudoprobe 837061429793323041 1 0 0 je .LBB1_1 # %bb.2: # %bb2 .pseudoprobe 837061429793323041 6 2 0 callq foo .pseudoprobe 837061429793323041 3 0 0 .pseudoprobe 837061429793323041 4 0 0 popq %rax retq .LBB1_1: # %bb1 .pseudoprobe 837061429793323041 5 1 0 callq *%rsi .pseudoprobe 837061429793323041 2 0 0 .pseudoprobe 837061429793323041 4 0 0 popq %rax retq # -- End function .section .pseudo_probe_desc,"",@progbits .quad 6699318081062747564 .quad 72617220756 .byte 3 .ascii "foo" .quad 837061429793323041 .quad 281547593931412 .byte 4 .ascii "foo2" ```
With inlining turned on, the assembly may look different around %bb2 with an inlined probe:
``` # %bb.2: # %bb2 .pseudoprobe 837061429793323041 3 0 .pseudoprobe 6699318081062747564 1 0 @ 837061429793323041:6 .pseudoprobe 837061429793323041 4 0 popq %rax retq ```
**Disassembling**
We have a disassembling tool (llvm-profgen) that can display disassembly alongside with pseudo probes. So far it only supports ELF executable file.
An example disassembly looks like:
``` 00000000002011a0 <foo2>: 2011a0: 50 push rax 2011a1: 85 ff test edi,edi [Probe]: FUNC: foo2 Index: 1 Type: Block 2011a3: 74 02 je 2011a7 <foo2+0x7> [Probe]: FUNC: foo2 Index: 3 Type: Block [Probe]: FUNC: foo2 Index: 4 Type: Block [Probe]: FUNC: foo Index: 1 Type: Block Inlined: @ foo2:6 2011a5: 58 pop rax 2011a6: c3 ret [Probe]: FUNC: foo2 Index: 2 Type: Block 2011a7: bf 01 00 00 00 mov edi,0x1 [Probe]: FUNC: foo2 Index: 5 Type: IndirectCall 2011ac: ff d6 call rsi [Probe]: FUNC: foo2 Index: 4 Type: Block 2011ae: 58 pop rax 2011af: c3 ret ```
Reviewed By: wmi
Differential Revision: https://reviews.llvm.org/D91878
show more ...
|
#
7ead5f5a |
| 10-Dec-2020 |
Mitch Phillips <31459023+hctim@users.noreply.github.com> |
Revert "[CSSPGO] Pseudo probe encoding and emission."
This reverts commit b035513c06d1cba2bae8f3e88798334e877523e1.
Reason: Broke the ASan buildbots: http://lab.llvm.org:8011/#/builders/5/builds/
Revert "[CSSPGO] Pseudo probe encoding and emission."
This reverts commit b035513c06d1cba2bae8f3e88798334e877523e1.
Reason: Broke the ASan buildbots: http://lab.llvm.org:8011/#/builders/5/builds/2269
show more ...
|