History log of /llvm-project/llvm/tools/llvm-profgen/ProfiledBinary.cpp (Results 1 – 25 of 91)
Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4
# 04ebd190 26-Aug-2024 Amir Ayupov <aaupov@fb.com>

[MC][NFC] Statically allocate storage for decoded pseudo probes and function records

Use #102774 to allocate storage for decoded probes (`PseudoProbeVec`)
and function records (`InlineTreeVec`).

Leverage that to also shrink sizes of `MCDecodedPseudoProbe`:
- Drop Guid since it's accessible via `InlineTree`.

`MCDecodedPseudoProbeInlineTree`:
- Keep track of probes and inlinees using `ArrayRef`s now that probes
and function records belonging to the same function are allocated
contiguously.

This reduces peak RSS from 13.7 GiB to 9.7 GiB and pseudo probe parsing
time (as part of perf2bolt) from 15.3s to 9.6s for a large binary with
400MiB .pseudo_probe section containing 43M probes and 25M function
records.

Depends on:
#102774
#102787
#102788

Reviewers: maksfb, rafaelauler, dcci, ayermolo, wlei-llvm

Reviewed By: wlei-llvm

Pull Request: https://github.com/llvm/llvm-project/pull/102789


# ca53611c 23-Aug-2024 Kazu Hirata <kazu@google.com>

[llvm] Use range-based for loops (NFC) (#105861)


Revision tags: llvmorg-19.1.0-rc3
# 242f4e85 11-Aug-2024 Amir Ayupov <aaupov@fb.com>

[profgen][NFC] Pass parameter as const_ref

Pass `ProbeNode` parameter of `trackInlineesOptimizedAway` as const
reference.

Reviewers: wlei-llvm, WenleiHe

Reviewed By: WenleiHe

Pull Request: https://github.com/llvm/llvm-project/pull/102787


Revision tags: llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init
# 75bc20ff 06-Jul-2024 Kazu Hirata <kazu@google.com>

[llvm] Remove redundant calls to std::unique_ptr<T>::get (NFC) (#97914)


Revision tags: llvmorg-18.1.8
# 2fa6eaf9 13-Jun-2024 xur-llvm <59886942+xur-llvm@users.noreply.github.com>

[llvm-profgen] Add support for Linux kernel profile (#92831)

Add support for handling Linux kernel perf files. The functionality is
under the option -kernel. Note that currently only the main kernel (in
vmlinux) is handled; kernel modules are not.

---------

Co-authored-by: Han Shen <shenhan@google.com>


Revision tags: llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2
# 8c03f400 15-Mar-2024 Haohai Wen <haohai.wen@intel.com>

[llvm-profgen] Support COFF binary (#83972)

Intel VTune/SEP supports collecting LBR on Windows and generating a
perf-script file in the same format as Linux perf script. This patch
teaches llvm-profgen to disassemble COFF binaries so that we can do
sampling-based PGO on Windows.


Revision tags: llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2
# 7ff2dc3b 30-Jan-2024 Nathan Lanza <nathanlanza@gmail.com>

[profgen] Use a 64bit integer for &'ing the loadable address (#79930)

For the Linux kernel, the loadable segments start at 0xffff... and thus
the 32-bit integer here was truncating all the meaningful bits. Grow it
to 64 bits.
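
The truncation described above can be reproduced with a small sketch. The helper names, the page size, and the sample address are assumptions chosen for illustration, not the code from the patch; the point is only that a mask built from 32-bit operands is zero-extended before it is AND'ed with a 64-bit address.

```cpp
#include <cstdint>

// Hypothetical helper: the mask is computed in 32 bits, so after
// zero-extension its upper 32 bits are all zero and the AND clears
// the upper half of the address.
inline uint64_t alignDownNarrow(uint64_t Addr, uint32_t PageSize) {
  return Addr & ~(PageSize - 1U);
}

// Hypothetical helper: the mask is computed in 64 bits, so only the
// low page-offset bits are cleared.
inline uint64_t alignDownWide(uint64_t Addr, uint32_t PageSize) {
  return Addr & ~(PageSize - 1ULL);
}

// With Addr = 0xffffffff81000123 and PageSize = 0x1000:
//   alignDownNarrow(Addr, PageSize) == 0x0000000081000000
//   alignDownWide(Addr, PageSize)   == 0xffffffff81000000
```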


Revision tags: llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4
# 7a3db658 23-Oct-2023 Hongtao Yu <hoy@fb.com>

[llvm-profgen] More tweaks to warnings (#68608)

Tweaking warnings more to avoid flooding user log.


# ef0e0adc 17-Oct-2023 William Junda Huang <williamjhuang@google.com>

[llvm-profdata] Do not create numerical strings for MD5 function names read from a Sample Profile. (#66164)

This is phase 2 of the MD5 refactoring on Sample Profile following
https://reviews.llvm.org/D147740

In the previous implementation, when an MD5 Sample Profile is read, the
reader first converts the MD5 values to strings and then creates a
StringRef as if the numerical strings were regular function names. Later
on, IPO transformation passes perform string comparisons over these
numerical strings for profile matching. This is inefficient since it
causes many small heap allocations.
In this patch I created a class `ProfileFuncRef` that is similar to
`StringRef` but can represent a hash value directly without any
conversion, and it will be more efficient (I will attach some benchmark
results later) when used in associative containers.

ProfileFuncRef guarantees that the same function name in string form or in
MD5 form has the same hash value, which also fixes a few issues in IPO
passes where function matching/lookup only checks the function name
string and returns a no-match if the profile is MD5.

When testing on a large internal profile (> 1 GB, with more than 10
million functions), the full profile load time is reduced from 28 sec to
25 sec on average, and reading the function offset table from 0.78s to 0.7s.
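
The idea of a reference that holds either a name or a hash, with both forms hashing to the same value, can be sketched roughly as follows. This is not the `ProfileFuncRef` class from the patch: the `FuncRef` type, its members, and the use of FNV-1a in place of the MD5-based hash are all assumptions made to keep the example self-contained.

```cpp
#include <cstdint>
#include <string_view>

// Stand-in 64-bit FNV-1a hash (assumption: the real code hashes with MD5).
inline uint64_t hashName(std::string_view Name) {
  uint64_t H = 14695981039346656037ULL;
  for (unsigned char C : Name) {
    H ^= C;
    H *= 1099511628211ULL;
  }
  return H;
}

// Hypothetical reference type: built either from a name or from a
// precomputed hash, with no string conversion in the hash-only case.
class FuncRef {
  std::string_view Name; // empty when only the hash is known
  uint64_t Hash = 0;

public:
  explicit FuncRef(std::string_view N) : Name(N), Hash(hashName(N)) {}
  explicit FuncRef(uint64_t H) : Hash(H) {}

  bool isHashOnly() const { return Name.empty(); }
  uint64_t getHashCode() const { return Hash; }

  // The name form and the hash form of the same function compare equal.
  friend bool operator==(const FuncRef &L, const FuncRef &R) {
    return L.Hash == R.Hash;
  }
};
```

Comparing by hash is what lets a lookup keyed by name succeed against a profile that only stores hashes, which is the matching problem the message describes.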


Revision tags: llvmorg-17.0.3
# 96776783 04-Oct-2023 Hongtao Yu <hoy@fb.com>

[llvm-profgen] Print DWP related warnings under show-detailed-warning (#68019)

Printing DWP related warnings under show-detailed-warning so that they
won't flood user log.


Revision tags: llvmorg-17.0.2, llvmorg-17.0.1
# 47669af4 18-Sep-2023 Hongtao Yu <hoy@fb.com>

[llvm-profgen] Ignore inline frames with an empty function name (#66678)

Broken debug information can give empty names for an inlined frame, e.g.,
```

0x1d605c68: ryKeyINS7_17SmartCounterTypesEEESt10shared_ptrINS7_15AsyncCacheValueIS9_EEESaIhESt6atomicEEE9fetch_subElSt12memory_order at Filename: edata.h
Function start filename: edata.h
Function start line: 266
Function start address: 0x1d605c68
Line: 267
Column: 0
(inlined by) at Filename: edata.h
Function start filename: edata.h
Function start line: 274
Function start address: 0x1d605c68
Line: 275
Column: 0
(inlined by) _EEEmmEv at Filename: arena.c
Function start filename: arena.c
Function start line: 1303
Line: 1308
Column: 0
```

This patch avoids creating a sample context with an empty function name
by stopping tracking at that frame. This prevents a hash failure that
leads to an ICE, where an empty context serves as an empty key for the
underlying MapVector:
https://github.com/llvm/llvm-project/blob/7624de5beae2f142abfdb3e32a63c263a586d768/llvm/lib/ProfileData/SampleProfWriter.cpp#L261


Revision tags: llvmorg-17.0.0, llvmorg-17.0.0-rc4
# 01b88dd6 30-Aug-2023 Takuya Shimizu <shimizu2486@gmail.com>

[NFC] Remove unused variables declared in conditions

D152495 makes clang warn on unused variables that are declared in conditions like `if (int var = init) {}`.
This patch is an NFC fix that suppresses the new warning in llvm, clang, and lld builds so that the above patch can pass CI.
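
A minimal sketch of the kind of rewrite this implies is shown below. The `lookup` and `contains` helpers are hypothetical and are not taken from the patch; the "before" form is kept in a comment so the example compiles cleanly even with warnings treated as errors.

```cpp
#include <cstddef>

// Hypothetical lookup: returns a pointer to the matching element or nullptr.
inline const int *lookup(const int *Table, std::size_t Size, int Key) {
  for (std::size_t I = 0; I < Size; ++I)
    if (Table[I] == Key)
      return &Table[I];
  return nullptr;
}

// Before (would trigger the new warning because `P` is never read):
//   if (const int *P = lookup(Table, Size, Key))
//     return true;
//
// After: test the result directly without naming it.
inline bool contains(const int *Table, std::size_t Size, int Key) {
  if (lookup(Table, Size, Key))
    return true;
  return false;
}
```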

Differential Revision: https://reviews.llvm.org/D158016


Revision tags: llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init
# 09742be8 23-Jun-2023 Hongtao Yu <hoy@fb.com>

[llvm-profgen] Remove target triple check to allow for more targets

llvm-profgen internally uses the LLVM libraries and the MCDesc interface to do disassembly and symbolization, and it never checks against target-specific instruction operators. This makes it quite transparent to targets, and a first attempt for an aarch64 binary just works. Therefore I'm removing the unnecessary triple check to unblock new targets.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D153449


Revision tags: llvmorg-16.0.6, llvmorg-16.0.5
# 27c37327 25-May-2023 Mark Santaniello <marksan@meta.com>

Avoid pointless canonicalize when using Dwarf names

CPU profile indicated memcmp was hot due to the two rfind calls in
getCanonicalFnName. If UseSymbolTable is false, we can avoid the cost entirely.

For CSSPGO profiles I've measured ~5% speedup with this change.

Profile similarity before/after matches 100%.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D151441


Revision tags: llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2
# 62c7f035 07-Feb-2023 Archibald Elliott <archibald.elliott@arm.com>

[NFC][TargetParser] Remove llvm/ADT/Triple.h

I also ran `git clang-format` to get the headers in the right order for
the new location, which has changed the order of other headers in two
files.


Revision tags: llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7
# 537cdf92 07-Dec-2022 Elena Lepilkina <elena.lepilkina@syntacore.com>

[llvm-objdump][RISCV] Use new common method to parse ARCH RISCV attribute

Differential Revision: https://reviews.llvm.org/D139553


# 21c4dc79 17-Dec-2022 Fangrui Song <i@maskray.me>

std::optional::value => operator*/operator->

value() has undesired exception checking semantics and calls
__throw_bad_optional_access in libc++. Moreover, the API is unavailable without
_LIBCPP_NO_EXCEPTIONS on older Mach-O platforms (see
_LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS).

This fixes clang.
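
As a rough illustration of the replacement pattern, the sketch below accesses a `std::optional` through `operator->` and `operator*` after an explicit check, instead of calling `value()`. The helper names are hypothetical and not part of the patch.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper: uses operator-> after an explicit check,
// where the older style would have written Name.value().size().
inline std::size_t nameLength(const std::optional<std::string> &Name) {
  if (!Name)
    return 0;
  return Name->size();
}

// Hypothetical helper: uses operator* after an explicit check,
// where the older style would have written V.value().
inline int valueOrZero(const std::optional<int> &V) {
  return V ? *V : 0;
}
```

Unlike `value()`, the dereference operators do no checking of their own, so the explicit test before the access is what keeps this safe.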


# 5d7950a4 16-Dec-2022 Hongtao Yu <hoy@fb.com>

[CSSPGO][llvm-profgen] Missing frame inference.

This change introduces a missing frame inferrer that aims to fix missing frames. It currently only handles missing frames due to compiler tail call elimination (TCE) but could also be extended to support other scenarios like frame pointer omission. When a tail-called function is sampled, the caller frame will be missing from the call chain because the caller frame is reused for the callee frame. While TCE is beneficial to both perf and reducing stack overflows, the workaround in this change aims to recover the missing frames as much as possible.

The idea behind this work is to build a dynamic call graph that consists of only tail call edges constructed from LBR samples and to DFS-search for a unique path for a given source frame and target frame on the graph. The unique path will be used to fill in the missing frames between the source and target. Note that only a unique path counts. Multiple paths are treated as unreachable since we don't want to overcount for any particular possible path.
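
The unique-path search can be sketched as below. This is an illustrative reconstruction under simplifying assumptions, not the inferrer from the patch: the graph is a plain adjacency map keyed by function address, `Graph`, `dfs`, and `inferUniquePath` are hypothetical names, and edges are assumed to be deduplicated.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <set>
#include <vector>

// Hypothetical tail-call graph: caller address -> tail-called addresses.
using Graph = std::map<uint64_t, std::vector<uint64_t>>;

// Depth-first search that records the first simple path to `To` and
// stops exploring as soon as a second one has been found.
inline void dfs(const Graph &G, uint64_t Cur, uint64_t To,
                std::set<uint64_t> &OnPath, std::vector<uint64_t> &Path,
                std::vector<uint64_t> &Found, unsigned &NumFound) {
  if (NumFound > 1)
    return;
  if (Cur == To) {
    if (++NumFound == 1)
      Found = Path;
    return;
  }
  auto It = G.find(Cur);
  if (It == G.end())
    return;
  for (uint64_t Next : It->second) {
    if (!OnPath.insert(Next).second)
      continue; // already on the current path: skip to avoid cycles
    Path.push_back(Next);
    dfs(G, Next, To, OnPath, Path, Found, NumFound);
    Path.pop_back();
    OnPath.erase(Next);
  }
}

// Returns the path From -> ... -> To only when exactly one exists;
// zero paths and multiple paths are both reported as "no result".
inline std::optional<std::vector<uint64_t>>
inferUniquePath(const Graph &G, uint64_t From, uint64_t To) {
  std::set<uint64_t> OnPath{From};
  std::vector<uint64_t> Path{From}, Found;
  unsigned NumFound = 0;
  dfs(G, From, To, OnPath, Path, Found, NumFound);
  if (NumFound != 1)
    return std::nullopt;
  return Found;
}
```

On a chain 1 -> 2 -> 3 the search returns the full path, while on a diamond where 1 reaches 3 through both 2 and 4 it returns nothing, matching the rule that ambiguous paths are treated as unreachable.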

A switch --infer-missing-frame is introduced and defaults to on.

Some testing results:
- 0.4% perf win according to three internal benchmarks.
- About 2/3 of the missing tail call frames can be recovered, according to an internal benchmark.
- 10% more profile generation time.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D139367


# 89fab98e 05-Dec-2022 Fangrui Song <i@maskray.me>

[DebugInfo] llvm::Optional => std::optional

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716


Revision tags: llvmorg-15.0.6
# 286223ed 27-Nov-2022 Kazu Hirata <kazu@google.com>

[llvm-profgen] Use std::optional in ProfiledBinary.cpp (NFC)

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716


Revision tags: llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3
# d5a963ab 17-Oct-2022 Hongtao Yu <hoy@fb.com>

[PseudoProbe] Replace relocation with offset for entry probe.

Currently pseudo probe encoding for a function is like:
- For the first probe, a relocation from it to its physical position in the code body
- For subsequent probes, an incremental offset from the current probe to the previous probe

The relocation could potentially cause relocation overflow at link time. I'm now replacing it with an offset from the first probe to the function start address.
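
The address arithmetic behind the new scheme can be sketched as follows. This ignores the actual on-disk encoding of the `.pseudo_probe` section entirely; `decodeProbeAddresses` and its parameters are hypothetical, and the function start address is assumed to be already known to the decoder.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical decoder for one binary function:
//   Offsets[0]   is the first probe's offset from the function start,
//   Offsets[i>0] is the delta from the previous probe's address.
inline std::vector<uint64_t>
decodeProbeAddresses(uint64_t FuncStart, const std::vector<int64_t> &Offsets) {
  std::vector<uint64_t> Addrs;
  uint64_t Prev = FuncStart;
  for (int64_t Off : Offsets) {
    Prev += static_cast<uint64_t>(Off); // wraps correctly for negative deltas
    Addrs.push_back(Prev);
  }
  return Addrs;
}
```

Because every address is derived from the function start plus small offsets, nothing in the probe data needs a relocation into the code section.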

A source function could be lowered into multiple binary functions due to outlining (e.g., coro-split). Since those binary functions have independent link-time layouts, to really avoid relocations from .pseudo_probe sections to .text sections, the replacement offset should be the offset from the probe's enclosing binary function rather than from the entry of the source function. This requires some changes to the previous section-based emission scheme, which now becomes function-based. The assembly form of the pseudo probe directive is also changed correspondingly, i.e., to reflect the binary function name.

Most source functions end up with only one binary function. For those that don't, a sentinel probe is emitted for each of the binary functions with a different name from the source. The sentinel probe indicates the binary function name to differentiate subsequent probes from the ones from a different binary function. For example, given the source function

```
Foo() {

Probe 1

Probe 2
}
```

If it is transformed into two binary functions:

```
Foo:


Foo.outlined:

```

The encoding for the two binary functions will be separate:

```

GUID of Foo
Probe 1

GUID of Foo
Sentinel probe of Foo.outlined
Probe 2
```

Then Probe 1 will be decoded against binary `Foo`'s address, and Probe 2 will be decoded against `Foo.outlined`. The sentinel probe of `Foo.outlined` makes sure there is no accidental relocation from `Foo.outlined`'s probes to `Foo`'s entry address.

On the BOLT side, to be minimally intrusive, the pseudo probe re-encoding sticks with the old encoding format. This is fine since, unlike the linker, BOLT processes the pseudo probe section as a whole and is free from relocation overflow issues.

The change is backward compatible as long as there's no mixed use of the old encoding and the new encoding.

Reviewed By: wenlei, maksfb

Differential Revision: https://reviews.llvm.org/D135912
Differential Revision: https://reviews.llvm.org/D135914
Differential Revision: https://reviews.llvm.org/D136394


# 91cc53d5 26-Oct-2022 wlei <wlei@fb.com>

[llvm-profgen] Do not cache the frame location stack during computing inlined context size

In `computeInlinedContextSizeForRange`, each range offset is used only once, so there is no need to cache the frame location stack.
Measured on one internal service binary, this saves 2GB of memory and slightly reduces run time (by avoiding one hash lookup).

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D128859


# 46765248 14-Oct-2022 wlei <wlei@fb.com>

[llvm-profgen] Fix inconsistent loading address issues

This fixes two issues related to the loading address:

1) When multiple MMAPs occur with different loading addresses, previously only the first MMAP was used as the base address, so all perf addresses after it used the wrong base address.

2) For pseudo probe profiles, the address is always based on the preferred loading address. If the base address is not equal to the preferred loading address, the pseudo probe address query will be wrong.

Solution: Instead of converting addresses to offsets lazily, all addresses are now converted on the fly at parsing time, based on the preferred loading address. There is no "offset" used in the profile generator any more.
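
The conversion described in the solution amounts to rebasing each sampled address from the base of the mapping it was observed in onto the preferred loading address. A minimal sketch, with a hypothetical function name and made-up addresses, might look like this:

```cpp
#include <cstdint>

// Hypothetical helper: rebase a runtime address from the mapping it was
// sampled in onto the binary's preferred loading address.
inline uint64_t canonicalizeAddress(uint64_t RuntimeAddr, uint64_t MMapBase,
                                    uint64_t PreferredBase) {
  return RuntimeAddr - MMapBase + PreferredBase;
}
```

Two samples taken under different mappings of the same binary then resolve to the same canonical address, so later lookups no longer depend on which MMAP event came first.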

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D126827


Revision tags: working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init
# 611ffcf4 14-Jul-2022 Kazu Hirata <kazu@google.com>

[llvm] Use value instead of getValue (NFC)


# 7e86b13c 28-Jun-2022 wlei <wlei@fb.com>

[CSSPGO][llvm-profgen] Reimplement SampleContextTracker using context trie

This is the follow-up patch to https://reviews.llvm.org/D125246 for the `SampleContextTracker` part. Previously, the promotion and merging of contexts were based on the SampleContext (the array of frames), which costs a lot of memory. This patch detaches the tracker from the array ref and uses the context trie itself instead. This saves a lot of memory and benefits both the compiler's CS inliner and llvm-profgen's pre-inliner.

One structure that needs special treatment is `FuncToCtxtProfiles`, which is used to get all the FunctionSamples for one function in order to do the merging and promoting. Previously it searched each function's context and traversed the trie to get the node of the context. Now that we don't have the context inside the profile, we directly use an auxiliary map `ProfileToNodeMap` for the profile; it is initialized with the FunctionSamples-to-TrieNode relations and is kept up to date while promoting and merging nodes.

Moreover, I was expecting the results before and after to remain the same, but I found that the order of FuncToCtxtProfiles matters and affects the results. This can happen in the recursive context case, but the difference should be small. Now that we don't have the context, I just used a vector for the order; the result is still deterministic.

Measured on one huge (12GB) profile from one of our internal services: the profile similarity is 99.999%, the running time is improved by 3X (debug mode), and the memory is reduced from 170GB to 90GB.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D127031
