History log of /llvm-project/llvm/lib/MC/MCFragment.cpp (Results 1 – 25 of 68)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-21-init, llvmorg-19.1.7
# c1931241 22-Dec-2024 Fangrui Song <i@maskray.me>

[MC] Remove fixup_begin/fixup_end


Revision tags: llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4
# d73d5c8c 15-Nov-2024 Kazu Hirata <kazu@google.com>

[MC] Remove unused includes (NFC) (#116317)

Identified with misc-include-cleaner.


Revision tags: llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2
# fb702823 29-Jul-2024 Fangrui Song <i@maskray.me>

[MC] Move some bool members to MCFragment. NFC

Move `AllowAutoPadding` to MCFragment, which reduce the
MCRelaxableFragment size by 8 bytes. While here, also move
`AlignToBundleEnd` next to `HasInstr

[MC] Move some bool members to MCFragment. NFC

Move `AllowAutoPadding` to MCFragment, which reduce the
MCRelaxableFragment size by 8 bytes. While here, also move
`AlignToBundleEnd` next to `HasInstructions`. Functions that create
fragments are slightly shorter due to fewer byte zeroing instructions.

Although fewer in number than MCDataFragments, MCRelaxableFragment
objects still constitute a significant proportion warranting
optimization.
```
% clang -c sqlite3.i -w -g -Xclang -print-stats
...
2206 assembler - Number of emitted assembler fragments - align
83980 assembler - Number of emitted assembler fragments - data
84 assembler - Number of emitted assembler fragments - fill
169462 assembler - Number of emitted assembler fragments - total
11396 assembler - Number of emitted assembler fragments - relaxable
```

Pull Request: https://github.com/llvm/llvm-project/pull/100976

show more ...


# de5aa8d0 29-Jul-2024 Fangrui Song <i@maskray.me>

[MC] Remove unused MCCompactEncodedInstFragment

This has been used after #94950.


Revision tags: llvmorg-19.1.0-rc1, llvmorg-20-init
# 10c894cf 30-Jun-2024 Fangrui Song <i@maskray.me>

[MC] Move MCAsmLayout from MCFragment.cpp to MCAssembler.cpp. NFC

8d736236d36ca5c98832b7631aea2e538f6a54aa (2015) moved these MCAsmLayout
functions to MCFragment.cpp, but the original placement is b

[MC] Move MCAsmLayout from MCFragment.cpp to MCAssembler.cpp. NFC

8d736236d36ca5c98832b7631aea2e538f6a54aa (2015) moved these MCAsmLayout
functions to MCFragment.cpp, but the original placement is better as
these functions are tightly coupled with MCAssembler.cpp.

show more ...


# 2afa193b 30-Jun-2024 Fangrui Song <i@maskray.me>

[MC] Remove MCAsmLayout::invalidateFragmentsFrom

The simplification is enabled by
9d0754ada5dbbc0c009bcc2f7824488419cc5530 ("[MC] Relax fragments eagerly").


# c9f6a5e4 22-Jun-2024 Fangrui Song <i@maskray.me>

[MC] Move computeBundlePadding closer to its only caller. NFC

There is only one caller after #95188.


# 8cb6e587 22-Jun-2024 Fangrui Song <i@maskray.me>

[MC] Allocate MCFragment with a bump allocator

#95197 and 75006466296ed4b0f845cbbec4bf77c21de43b40 eliminated all raw
`new MCXXXFragment`. We can now place fragments in a bump allocator.
In addition

[MC] Allocate MCFragment with a bump allocator

#95197 and 75006466296ed4b0f845cbbec4bf77c21de43b40 eliminated all raw
`new MCXXXFragment`. We can now place fragments in a bump allocator.
In addition, remove the dead `Kind == FragmentType(~0)` condition.

~CodeViewContext may call `StrTabFragment->destroy()` and need to be
reset before `FragmentAllocator.Reset()`.
Tested by llvm/test/MC/COFF/cv-compiler-info.ll using asan.

Pull Request: https://github.com/llvm/llvm-project/pull/96402

show more ...


# 82f9b0fb 22-Jun-2024 Fangrui Song <i@maskray.me>

[MC] Remove Parent initializer from MCFragment ctor


# 9b44cfbd 22-Jun-2024 Fangrui Song <i@maskray.me>

[MC] Remove unused section parameters from MCFragment constructors


Revision tags: llvmorg-18.1.8
# 27588fe2 13-Jun-2024 Fangrui Song <i@maskray.me>

[MC] Move MCFragment::Atom to MCSectionMachO::Atoms

Mach-O's `.subsections_via_symbols` mechanism associates a fragment with
an atom (a non-temporary defined symbol). The current approach
(`MCFragme

[MC] Move MCFragment::Atom to MCSectionMachO::Atoms

Mach-O's `.subsections_via_symbols` mechanism associates a fragment with
an atom (a non-temporary defined symbol). The current approach
(`MCFragment::Atom`) wastes space for other object file formats.

After #95077, `MCFragment::LayoutOrder` is only used by
`AttemptToFoldSymbolOffsetDifference`. While it could be removed, we
might explore future uses for `LayoutOrder`.

@aengelke suggests one use case: move `Atom` into MCSection. This works
because Mach-O doesn't support `.subsection`, and `LayoutOrder`, as the
index into the fragment list, is unchanged.

This patch moves MCFragment::Atom to MCSectionMachO::Atoms. `getAtom`
may be called at parse time before `Atoms` is initialized, so a bound
checking is needed to keep the hack working.

Pull Request: https://github.com/llvm/llvm-project/pull/95341

show more ...


# 846e47e7 13-Jun-2024 aengelke <engelke@in.tum.de>

[MC] Reduce size of MCDataFragment by 8 bytes (#95293)

Due to alignment, the first two fields of MCEncodedFragment are
currently at bytes 40 and 41, so 1 byte over the 8 byte boundary,
causing 7 b

[MC] Reduce size of MCDataFragment by 8 bytes (#95293)

Due to alignment, the first two fields of MCEncodedFragment are
currently at bytes 40 and 41, so 1 byte over the 8 byte boundary,
causing 7 bytes padding to be inserted for the following pointer.

Fold two bools of MCFragment into bitfields to reduce move the two
fields of MCEncodedFragment one byte earlier to remove the padding
bytes. This works, as in the Itanium ABI, there is no padding after
base classes.

This gives a space reduction of MCDataFragment from 224 to 216 bytes.

show more ...


# 2cc4bc13 12-Jun-2024 Fangrui Song <i@maskray.me>

MCFragment: Initialize Offset to 0

After 9d0754ada5dbbc0c009bcc2f7824488419cc5530 ("[MC] Relax fragments
eagerly") removes the assert of Offset, it is no longer useful to
initialize the member to -1

MCFragment: Initialize Offset to 0

After 9d0754ada5dbbc0c009bcc2f7824488419cc5530 ("[MC] Relax fragments
eagerly") removes the assert of Offset, it is no longer useful to
initialize the member to -1.

Now the symbol value estimate is more precise, which leads to slight
behavior change to layout-interdependency.s.

show more ...


# de19f7b6 11-Jun-2024 Fangrui Song <i@maskray.me>

[MC] Replace fragment ilist with singly-linked lists

Fragments are allocated with `operator new` and stored in an ilist with
Prev/Next/Parent pointers. A more efficient representation would be an
ar

[MC] Replace fragment ilist with singly-linked lists

Fragments are allocated with `operator new` and stored in an ilist with
Prev/Next/Parent pointers. A more efficient representation would be an
array of fragments without the overhead of Prev/Next pointers.

As the first step, replace ilist with singly-linked lists.

* `getPrevNode` uses have been eliminated by previous changes.
* The last use of the `Prev` pointer remains: for each subsection, there is an insertion point and
the current insertion point is stored at `CurInsertionPoint`.
* `HexagonAsmBackend::finishLayout` needs a backward iterator. Save all
fragments within `Frags`. Hexagon programs are usually small, and the
performance does not matter that much.

To eliminate `Prev`, change the subsection representation to
singly-linked lists for subsections and a pointer to the active
singly-linked list. The fragments from all subsections will be chained
together at layout time.

Since fragment lists are disconnected before layout time, we can remove
`MCFragment::SubsectionNumber` (https://reviews.llvm.org/D69411). The
current implementation of `AttemptToFoldSymbolOffsetDifference` requires
future improvement for robustness.

Pull Request: https://github.com/llvm/llvm-project/pull/95077

show more ...


# 9d0754ad 10-Jun-2024 Fangrui Song <i@maskray.me>

[MC] Relax fragments eagerly

Lazy relaxation caused hash table lookups (`getFragmentOffset`) and
complex use/compute interdependencies. Some expressions involding
forward declared symbols (e.g. `sub

[MC] Relax fragments eagerly

Lazy relaxation caused hash table lookups (`getFragmentOffset`) and
complex use/compute interdependencies. Some expressions involding
forward declared symbols (e.g. `subsection-if.s`) cannot be computed.
Recursion detection requires complex `IsBeingLaidOut`
(https://reviews.llvm.org/D79570).

D76114's `invalidateFragmentsFrom` makes lazy relaxation even less
useful.

Switch to eager relaxation to greatly simplify code and resolve these
issues. This change also removes a `getPrevNode` use, which makes it
more feasible to replace the fragment representation, which might yield
a large peak RSS win.

Minor downsides: The number of section relaxations may increase (offset
by avoiding the hash table lookup). For relax-recompute-align.s, the
computed layout is not optimal.

show more ...


# dcb71c06 08-Jun-2024 Fangrui Song <i@maskray.me>

[MC] Simplify Sec.getFragmentList().insert(Sec.begin(), F). NFC

Decrease the uses of getFragmentList() to make it easier to change the
fragment list representation.


Revision tags: llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init
# 9fec33aa 19-Jan-2024 Amir Ayupov <aaupov@fb.com>

Revert "[BOLT] Fix unconditional output of boltedcollection in merge-fdata (#78653)"

This reverts commit 82bc33ea3f1a539be50ed46919dc53fc6b685da9.

Accidentally pushed unrelated changes.


# 82bc33ea 19-Jan-2024 Amir Ayupov <aaupov@fb.com>

[BOLT] Fix unconditional output of boltedcollection in merge-fdata (#78653)

Fix the bug where merge-fdata unconditionally outputs boltedcollection
line, regardless of whether input files have it s

[BOLT] Fix unconditional output of boltedcollection in merge-fdata (#78653)

Fix the bug where merge-fdata unconditionally outputs boltedcollection
line, regardless of whether input files have it set.

Test Plan:
Added bolt/test/X86/merge-fdata-nobat-mode.test which fails without this
fix.

show more ...


# 7850c94b 16-Jan-2024 David Green <david.green@arm.com>

[NFC] sentinal -> sentinel


Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init
# eecd41aa 11-Jul-2022 spupyrev <spupyrev@fb.com>

Revert "Rebase: [Facebook] [MC] Introduce NeverAlign fragment type"

This reverts commit 6d0528636ae54fba75938a79ae7a98dfcc949f72.


Revision tags: llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2
# 6d052863 05-Aug-2021 Rafael Auler <rafaelauler@fb.com>

Rebase: [Facebook] [MC] Introduce NeverAlign fragment type

Summary:
Introduce NeverAlign fragment type.

The intended usage of this fragment is to insert it before a pair of
macro-op fusion eligible

Rebase: [Facebook] [MC] Introduce NeverAlign fragment type

Summary:
Introduce NeverAlign fragment type.

The intended usage of this fragment is to insert it before a pair of
macro-op fusion eligible instructions. NeverAlign fragment ensures that
the next fragment (first instruction in the pair) does not end at a
given alignment boundary by emitting a minimal size nop if necessary.

In effect, it ensures that a pair of macro-fusible instructions is not
split by a given alignment boundary, which is a precondition for
macro-op fusion in modern Intel Cores (64B = cache line size, see Intel
Architecture Optimization Reference Manual, 2.3.2.1 Legacy Decode
Pipeline: Macro-Fusion).

This patch introduces functionality used by BOLT when emitting code with
MacroFusion alignment already in place.

The use case is different from BoundaryAlign and instruction bundling:
- BoundaryAlign can be extended to perform the desired alignment for the
first instruction in the macro-op fusion pair (D101817). However, this
approach has higher overhead due to reliance on relaxation as
BoundaryAlign requires in the general case - see
https://reviews.llvm.org/D97982#2710638.
- Instruction bundling: the intent of NeverAlign fragment is to prevent
the first instruction in a pair ending at a given alignment boundary, by
inserting at most one minimum size nop. It's OK if either instruction
crosses the cache line. Padding both instructions using bundles to not
cross the alignment boundary would result in excessive padding. There's
no straightforward way to request instruction bundling to avoid a given
end alignment for the first instruction in the bundle.

LLVM: https://reviews.llvm.org/D97982

Manual rebase conflict history:
https://phabricator.intern.facebook.com/D30142613

Test Plan: sandcastle

Reviewers: #llvm-bolt

Subscribers: phabricatorlinter

Differential Revision: https://phabricator.intern.facebook.com/D31361547

show more ...


# 412c788a 14-Jun-2022 Guillaume Chatelet <gchatelet@google.com>

[NFC][Alignment] Use Align in MCAlignFragment


# 5d57578a 20-Oct-2021 Leonard Grey <lgrey@chromium.org>

[MC] Recursively calculate symbol offset

This is speculative since I'm not sure if there's some implicit contract that a
variable symbol must not have another variable symbol in its evaluation tree.

[MC] Recursively calculate symbol offset

This is speculative since I'm not sure if there's some implicit contract that a
variable symbol must not have another variable symbol in its evaluation tree.

Downstream bug: https://bugs.chromium.org/p/chromium/issues/detail?id=471146#c23.

Test is based on alias.s (removed checks since we just need to know it didn't
crash).

Differential Revision: https://reviews.llvm.org/D109109

show more ...


Revision tags: llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2
# 705a4c14 08-Dec-2020 Hongtao Yu <hoy@fb.com>

[CSSPGO] Pseudo probe encoding and emission.

This change implements pseudo probe encoding and emission for CSSPGO. Please see RFC here for more context: https://groups.google.com/g/llvm-dev/c/1p1rdY

[CSSPGO] Pseudo probe encoding and emission.

This change implements pseudo probe encoding and emission for CSSPGO. Please see RFC here for more context: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s

Pseudo probes are in the form of intrinsic calls on IR/MIR but they do not turn into any machine instructions. Instead they are emitted into the binary as a piece of data in standalone sections. The probe-specific sections are not needed to be loaded into memory at execution time, thus they do not incur a runtime overhead. 

**ELF object emission**

The binary data to emit are organized as two ELF sections, i.e, the `.pseudo_probe_desc` section and the `.pseudo_probe` section. The `.pseudo_probe_desc` section stores a function descriptor for each function and the `.pseudo_probe` section stores the actual probes, each fo which corresponds to an IR basic block or an IR function callsite. A function descriptor is stored as a module-level metadata during the compilation and is serialized into the object file during object emission.

Both the probe descriptors and pseudo probes can be emitted into a separate ELF section per function to leverage the linker for deduplication. A `.pseudo_probe` section shares the same COMDAT group with the function code so that when the function is dead, the probes are dead and disposed too. On the contrary, a `.pseudo_probe_desc` section has its own COMDAT group. This is because even if a function is dead, its probes may be inlined into other functions and its descriptor is still needed by the profile generation tool.

The format of `.pseudo_probe_desc` section looks like:

```
.section .pseudo_probe_desc,"",@progbits
.quad 6309742469962978389 // Func GUID
.quad 4294967295 // Func Hash
.byte 9 // Length of func name
.ascii "_Z5funcAi" // Func name
.quad 7102633082150537521
.quad 138828622701
.byte 12
.ascii "_Z8funcLeafi"
.quad 446061515086924981
.quad 4294967295
.byte 9
.ascii "_Z5funcBi"
.quad -2016976694713209516
.quad 72617220756
.byte 7
.ascii "_Z3fibi"
```

For each `.pseudoprobe` section, the encoded binary data consists of a single function record corresponding to an outlined function (i.e, a function with a code entry in the `.text` section). A function record has the following format :

```
FUNCTION BODY (one for each outlined function present in the text section)
GUID (uint64)
GUID of the function
NPROBES (ULEB128)
Number of probes originating from this function.
NUM_INLINED_FUNCTIONS (ULEB128)
Number of callees inlined into this function, aka number of
first-level inlinees
PROBE RECORDS
A list of NPROBES entries. Each entry contains:
INDEX (ULEB128)
TYPE (uint4)
0 - block probe, 1 - indirect call, 2 - direct call
ATTRIBUTE (uint3)
reserved
ADDRESS_TYPE (uint1)
0 - code address, 1 - address delta
CODE_ADDRESS (uint64 or ULEB128)
code address or address delta, depending on ADDRESS_TYPE
INLINED FUNCTION RECORDS
A list of NUM_INLINED_FUNCTIONS entries describing each of the inlined
callees. Each record contains:
INLINE SITE
GUID of the inlinee (uint64)
ID of the callsite probe (ULEB128)
FUNCTION BODY
A FUNCTION BODY entry describing the inlined function.
```

To support building a context-sensitive profile, probes from inlinees are grouped by their inline contexts. An inline context is logically a call path through which a callee function lands in a caller function. The probe emitter builds an inline tree based on the debug metadata for each outlined function in the form of a trie tree. A tree root is the outlined function. Each tree edge stands for a callsite where inlining happens. Pseudo probes originating from an inlinee function are stored in a tree node and the tree path starting from the root all the way down to the tree node is the inline context of the probes. The emission happens on the whole tree top-down recursively. Probes of a tree node will be emitted altogether with their direct parent edge. Since a pseudo probe corresponds to a real code address, for size savings, the address is encoded as a delta from the previous probe except for the first probe. Variant-sized integer encoding, aka LEB128, is used for address delta and probe index.

**Assembling**

Pseudo probes can be printed as assembly directives alternatively. This allows for good assembly code readability and also provides a view of how optimizations and pseudo probes affect each other, especially helpful for diff time assembly analysis.

A pseudo probe directive has the following operands in order: function GUID, probe index, probe type, probe attributes and inline context. The directive is generated by the compiler and can be parsed by the assembler to form an encoded `.pseudoprobe` section in the object file.

A example assembly looks like:

```
foo2: # @foo2
# %bb.0: # %bb0
pushq %rax
testl %edi, %edi
.pseudoprobe 837061429793323041 1 0 0
je .LBB1_1
# %bb.2: # %bb2
.pseudoprobe 837061429793323041 6 2 0
callq foo
.pseudoprobe 837061429793323041 3 0 0
.pseudoprobe 837061429793323041 4 0 0
popq %rax
retq
.LBB1_1: # %bb1
.pseudoprobe 837061429793323041 5 1 0
callq *%rsi
.pseudoprobe 837061429793323041 2 0 0
.pseudoprobe 837061429793323041 4 0 0
popq %rax
retq
# -- End function
.section .pseudo_probe_desc,"",@progbits
.quad 6699318081062747564
.quad 72617220756
.byte 3
.ascii "foo"
.quad 837061429793323041
.quad 281547593931412
.byte 4
.ascii "foo2"
```

With inlining turned on, the assembly may look different around %bb2 with an inlined probe:

```
# %bb.2: # %bb2
.pseudoprobe 837061429793323041 3 0
.pseudoprobe 6699318081062747564 1 0 @ 837061429793323041:6
.pseudoprobe 837061429793323041 4 0
popq %rax
retq
```

**Disassembling**

We have a disassembling tool (llvm-profgen) that can display disassembly alongside with pseudo probes. So far it only supports ELF executable file.

An example disassembly looks like:

```
00000000002011a0 <foo2>:
2011a0: 50 push rax
2011a1: 85 ff test edi,edi
[Probe]: FUNC: foo2 Index: 1 Type: Block
2011a3: 74 02 je 2011a7 <foo2+0x7>
[Probe]: FUNC: foo2 Index: 3 Type: Block
[Probe]: FUNC: foo2 Index: 4 Type: Block
[Probe]: FUNC: foo Index: 1 Type: Block Inlined: @ foo2:6
2011a5: 58 pop rax
2011a6: c3 ret
[Probe]: FUNC: foo2 Index: 2 Type: Block
2011a7: bf 01 00 00 00 mov edi,0x1
[Probe]: FUNC: foo2 Index: 5 Type: IndirectCall
2011ac: ff d6 call rsi
[Probe]: FUNC: foo2 Index: 4 Type: Block
2011ae: 58 pop rax
2011af: c3 ret
```

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D91878

show more ...


# 7ead5f5a 10-Dec-2020 Mitch Phillips <31459023+hctim@users.noreply.github.com>

Revert "[CSSPGO] Pseudo probe encoding and emission."

This reverts commit b035513c06d1cba2bae8f3e88798334e877523e1.

Reason: Broke the ASan buildbots:
http://lab.llvm.org:8011/#/builders/5/builds/

Revert "[CSSPGO] Pseudo probe encoding and emission."

This reverts commit b035513c06d1cba2bae8f3e88798334e877523e1.

Reason: Broke the ASan buildbots:
http://lab.llvm.org:8011/#/builders/5/builds/2269

show more ...


123