Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4 |
|
#
04ebd190 |
| 26-Aug-2024 |
Amir Ayupov <aaupov@fb.com> |
[MC][NFC] Statically allocate storage for decoded pseudo probes and function records
Use #102774 to allocate storage for decoded probes (`PseudoProbeVec`) and function records (`InlineTreeVec`).
Leverage that to also shrink the sizes of:
- `MCDecodedPseudoProbe`: drop Guid since it's accessible via `InlineTree`.
- `MCDecodedPseudoProbeInlineTree`: keep track of probes and inlinees using `ArrayRef`s now that probes and function records belonging to the same function are allocated contiguously.
This reduces peak RSS from 13.7 GiB to 9.7 GiB and pseudo probe parsing time (as part of perf2bolt) from 15.3s to 9.6s for a large binary with 400MiB .pseudo_probe section containing 43M probes and 25M function records.
Depends on: #102774 #102787 #102788
Reviewers: maksfb, rafaelauler, dcci, ayermolo, wlei-llvm
Reviewed By: wlei-llvm
Pull Request: https://github.com/llvm/llvm-project/pull/102789
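A minimal sketch of the contiguous-storage idea, assuming simplified hypothetical types (`View`, `Probe`, `InlineTreeNode`, `DecodedProbes`) rather than the real MC classes. Storage is reserved once from precomputed totals, so each record can point at its probes with a pointer/size view (in the spirit of `ArrayRef`) without the view ever dangling, and probes no longer need their own Guid.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal read-only view over contiguous storage (in the spirit of llvm::ArrayRef).
template <typename T> struct View {
  const T *Data = nullptr;
  size_t Size = 0;
  const T *begin() const { return Data; }
  const T *end() const { return Data + Size; }
};

// Hypothetical simplified probe: no Guid field, the owning record knows it.
struct Probe {
  uint64_t Address;
  uint32_t Index;
};

// Hypothetical simplified function record (inline tree node).
struct InlineTreeNode {
  uint64_t Guid;
  View<Probe> Probes;            // this record's probes, contiguous in ProbeStorage
  View<InlineTreeNode> Inlinees; // this record's children, contiguous in NodeStorage
};

struct DecodedProbes {
  std::vector<Probe> ProbeStorage;
  std::vector<InlineTreeNode> NodeStorage;
};

// Allocate once, using totals from a prior counting pass, so views into the
// vectors are never invalidated by reallocation.
void allocate(DecodedProbes &D, size_t TotalProbes, size_t TotalRecords) {
  D.ProbeStorage.reserve(TotalProbes);
  D.NodeStorage.reserve(TotalRecords);
}

// Append one leaf function record whose probes are placed contiguously.
const InlineTreeNode &addRecord(DecodedProbes &D, uint64_t Guid, size_t NumProbes) {
  assert(D.ProbeStorage.size() + NumProbes <= D.ProbeStorage.capacity());
  assert(D.NodeStorage.size() < D.NodeStorage.capacity());
  const Probe *First = D.ProbeStorage.data() + D.ProbeStorage.size();
  for (size_t I = 0; I < NumProbes; ++I)
    D.ProbeStorage.push_back(
        {static_cast<uint64_t>(0x1000 + 4 * I), static_cast<uint32_t>(I + 1)});
  D.NodeStorage.push_back({Guid, {First, NumProbes}, {}});
  return D.NodeStorage.back();
}
```

Because both vectors are sized up front, consecutive records share one allocation and their probe ranges are adjacent.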
|
#
ca53611c |
| 23-Aug-2024 |
Kazu Hirata <kazu@google.com> |
[llvm] Use range-based for loops (NFC) (#105861)
|
Revision tags: llvmorg-19.1.0-rc3 |
|
#
242f4e85 |
| 11-Aug-2024 |
Amir Ayupov <aaupov@fb.com> |
[profgen][NFC] Pass parameter as const_ref
Pass `ProbeNode` parameter of `trackInlineesOptimizedAway` as const reference.
Reviewers: wlei-llvm, WenleiHe
Reviewed By: WenleiHe
Pull Request: https://github.com/llvm/llvm-project/pull/102787
|
Revision tags: llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init |
|
#
75bc20ff |
| 06-Jul-2024 |
Kazu Hirata <kazu@google.com> |
[llvm] Remove redundant calls to std::unique_ptr<T>::get (NFC) (#97914)
|
Revision tags: llvmorg-18.1.8 |
|
#
2fa6eaf9 |
| 13-Jun-2024 |
xur-llvm <59886942+xur-llvm@users.noreply.github.com> |
[llvm-profgen] Add support for Linux kernel profile (#92831)
Add the support to handle Linux kernel perf files. The functionality is
under option -kernel. Note that currently only main kernel (in vmlinux)
is handled: kernel modules are not handled.
---------
Co-authored-by: Han Shen <shenhan@google.com>
|
Revision tags: llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2 |
|
#
8c03f400 |
| 15-Mar-2024 |
Haohai Wen <haohai.wen@intel.com> |
[llvm-profgen] Support COFF binary (#83972)
Intel VTune/SEP supports collecting LBR on Windows and generating a
perf-script file in the same format as Linux perf script. This patch
teaches llvm-profgen to disassemble COFF binaries so that we can do
sampling-based PGO on Windows.
|
Revision tags: llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2 |
|
#
7ff2dc3b |
| 30-Jan-2024 |
Nathan Lanza <nathanlanza@gmail.com> |
[profgen] Use a 64bit integer for &'ing the loadable address (#79930)
For the linux kernel, the loadable segments start at 0xffff... and thus
the 32 bit integer here was truncating all the meaningful bits. Grow it
to 64 bits.
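The truncation can be reproduced with a page-alignment mask. This is an illustrative sketch with hypothetical helper names, not the code from the patch: when the mask is built from a 32-bit value, `~(PageSize - 1)` is computed as `0xFFFFF000` and zero-extended, which clears the upper half of a kernel address such as `0xffffffff81000123`.

```cpp
#include <cassert>
#include <cstdint>

// Buggy variant: the mask is computed in 32 bits (0xFFFFF000) and then
// zero-extended, so the upper 32 bits of the address are cleared.
uint64_t alignDownBuggy(uint64_t Vaddr, uint32_t PageSize) {
  return Vaddr & ~(PageSize - 1);
}

// Fixed variant: a 64-bit mask (0xFFFFFFFFFFFFF000) keeps the high bits.
// PageSize is assumed to be a power of two.
uint64_t alignDownFixed(uint64_t Vaddr, uint64_t PageSize) {
  return Vaddr & ~(PageSize - 1);
}
```

With a 4 KiB page, the buggy variant maps `0xffffffff81000123` to `0x81000000`, while the fixed one yields `0xffffffff81000000`.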
|
Revision tags: llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4 |
|
#
7a3db658 |
| 23-Oct-2023 |
Hongtao Yu <hoy@fb.com> |
[llvm-profgen] More tweaks to warnings (#68608)
Tweak warnings further to avoid flooding the user log.
|
#
ef0e0adc |
| 17-Oct-2023 |
William Junda Huang <williamjhuang@google.com> |
[llvm-profdata] Do not create numerical strings for MD5 function names read from a Sample Profile. (#66164)
This is phase 2 of the MD5 refactoring on Sample Profile following
https://reviews.llvm.org/D147740
In the previous implementation, when an MD5 Sample Profile is read, the
reader first converts the MD5 values to strings, then creates a
StringRef as if the numerical strings were regular function names, and
later IPO transformation passes perform string comparisons over these
numerical strings for profile matching. This is inefficient since it
causes many small heap allocations.
In this patch I created a class `ProfileFuncRef` that is similar to
`StringRef` but can represent a hash value directly without any
conversion, and it will be more efficient (I will attach some benchmark
results later) when used in associative containers.
ProfileFuncRef guarantees that the same function name has the same hash
value whether it is in string form or in MD5 form, which also fixes a few
issues in IPO passes where function matching/lookup only checks the
function name string and returns no match if the profile is MD5.
When testing on a large internal profile (> 1 GB, with more than 10
million functions), the full profile load time is reduced from 28 sec to
25 sec on average, and reading the function offset table from 0.78s to 0.7s
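The key property, that a name and its hash compare and hash identically, can be sketched with a hypothetical `FuncRef` class. This is not the `ProfileFuncRef` API. The real code uses MD5; FNV-1a is swapped in here only so the example is self-contained, and equality by hash ignores collisions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <unordered_set>

// Stand-in for the MD5-based function GUID (assumption: FNV-1a, not MD5).
uint64_t nameHash(std::string_view Name) {
  uint64_t H = 14695981039346656037ULL;
  for (unsigned char C : Name) {
    H ^= C;
    H *= 1099511628211ULL;
  }
  return H;
}

// Hypothetical ref holding either a real name or only a raw hash value.
class FuncRef {
  std::string_view Name; // empty when only the hash is known
  uint64_t Hash;

public:
  explicit FuncRef(std::string_view N) : Name(N), Hash(nameHash(N)) {}
  explicit FuncRef(uint64_t H) : Hash(H) {} // no string is ever materialized
  uint64_t hashCode() const { return Hash; }
  bool isHashOnly() const { return Name.empty(); }
  // Equal when hashes match, whichever form each side came from.
  bool operator==(const FuncRef &O) const { return Hash == O.Hash; }
};

struct FuncRefHash {
  size_t operator()(const FuncRef &F) const {
    return static_cast<size_t>(F.hashCode());
  }
};
```

A lookup keyed by a name therefore finds an entry inserted from an MD5-only profile, and vice versa, without converting the hash to a numeric string.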
|
Revision tags: llvmorg-17.0.3 |
|
#
96776783 |
| 04-Oct-2023 |
Hongtao Yu <hoy@fb.com> |
[llvm-profgen] Print DWP related warnings under show-detailed-warning (#68019)
Print DWP-related warnings under show-detailed-warning so that they
won't flood the user log.
|
Revision tags: llvmorg-17.0.2, llvmorg-17.0.1 |
|
#
47669af4 |
| 18-Sep-2023 |
Hongtao Yu <hoy@fb.com> |
[llvm-profgen] Ignore inline frames with an empty function name (#66678)
Broken debug information can give empty names for an inlined frame, e.g.,
```
0x1d605c68: ryKeyINS7_17SmartCounterTypesEEESt10shared_ptrINS7_15AsyncCacheValueIS9_EEESaIhESt6atomicEEE9fetch_subElSt12memory_order at Filename: edata.h
Function start filename: edata.h
Function start line: 266
Function start address: 0x1d605c68
Line: 267
Column: 0
(inlined by) at Filename: edata.h
Function start filename: edata.h
Function start line: 274
Function start address: 0x1d605c68
Line: 275
Column: 0
(inlined by) _EEEmmEv at Filename: arena.c
Function start filename: arena.c
Function start line: 1303
Line: 1308
Column: 0
```
This patch avoids creating a sample context with an empty function name
by stopping tracking at that frame. This prevents a hash failure that
leads to an ICE, where an empty context serves as the empty key for the
underlying MapVector:
https://github.com/llvm/llvm-project/blob/7624de5beae2f142abfdb3e32a63c263a586d768/llvm/lib/ProfileData/SampleProfWriter.cpp#L261
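A minimal sketch of "stop tracking at that frame", assuming a hypothetical `Frame` record ordered innermost-first as in the symbolizer output above. The context is built from the outermost caller inwards and truncated at the first empty name, so no context element can be the empty string. The direction of traversal in the real code is an assumption here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical symbolized frame, innermost inlined frame first.
struct Frame {
  std::string FuncName;
  unsigned Line;
};

// Build the calling context from the outermost frame inwards, stopping at the
// first frame whose name is empty so no empty key reaches the profile map.
std::vector<std::string> buildContext(const std::vector<Frame> &InlineStack) {
  std::vector<std::string> Context;
  for (auto It = InlineStack.rbegin(); It != InlineStack.rend(); ++It) {
    if (It->FuncName.empty())
      break;
    Context.push_back(It->FuncName);
  }
  return Context;
}
```

For the example output above, only the outermost frame (from `arena.c`) would be kept.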
|
Revision tags: llvmorg-17.0.0, llvmorg-17.0.0-rc4 |
|
#
01b88dd6 |
| 30-Aug-2023 |
Takuya Shimizu <shimizu2486@gmail.com> |
[NFC] Remove unused variables declared in conditions
D152495 makes clang warn on unused variables that are declared in conditions like `if (int var = init) {}`. This patch is an NFC fix to suppress the new warning in llvm, clang, and lld builds so that CI passes for the above patch.
Differential Revision: https://reviews.llvm.org/D158016
|
Revision tags: llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init |
|
#
09742be8 |
| 23-Jun-2023 |
Hongtao Yu <hoy@fb.com> |
[llvm-profgen] Remove target triple check to allow for more targets
Llvm-profgen internally uses the LLVM libraries and the MCDesc interface to do disassembling and symbolization, and it never checks against target-specific instruction operators. This makes it quite transparent to targets, and a first attempt on an aarch64 binary just works. Therefore I'm removing the unnecessary triple check to unblock new targets.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D153449
|
Revision tags: llvmorg-16.0.6, llvmorg-16.0.5 |
|
#
27c37327 |
| 25-May-2023 |
Mark Santaniello <marksan@meta.com> |
Avoid pointless canonicalize when using Dwarf names
CPU profile indicated memcmp was hot due to the two rfind calls in getCanonicalFnName. If UseSymbolTable is false, we can avoid the cost entirely.
For CSSPGO profiles I've measured ~5% speedup with this change.
Profile similarity before/after matches 100%.
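The shape of the optimization can be sketched as below. `canonicalize` and `functionName` are hypothetical helpers, and the suffix list is an illustrative assumption rather than the list `getCanonicalFnName` actually uses. The point is that the `rfind` scans only run for symbol-table names; DWARF names are returned unchanged.

```cpp
#include <cassert>
#include <initializer_list>
#include <string_view>

// Illustrative canonicalization: strip compiler-added suffixes found via rfind.
// The suffixes below are assumptions for the sketch.
std::string_view canonicalize(std::string_view Name) {
  for (std::string_view Suffix : {".llvm.", ".part."}) {
    auto Pos = Name.rfind(Suffix);
    if (Pos != std::string_view::npos)
      Name = Name.substr(0, Pos);
  }
  return Name;
}

// Skip the scans entirely when names come from DWARF (UseSymbolTable == false).
std::string_view functionName(std::string_view Name, bool UseSymbolTable) {
  return UseSymbolTable ? canonicalize(Name) : Name;
}
```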
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D151441
|
Revision tags: llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2 |
|
#
62c7f035 |
| 07-Feb-2023 |
Archibald Elliott <archibald.elliott@arm.com> |
[NFC][TargetParser] Remove llvm/ADT/Triple.h
I also ran `git clang-format` to get the headers in the right order for the new location, which has changed the order of other headers in two files.
|
Revision tags: llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7 |
|
#
537cdf92 |
| 07-Dec-2022 |
Elena Lepilkina <elena.lepilkina@syntacore.com> |
[llvm-objdump][RISCV] Use new common method to parse ARCH RISCV attribute
Differential Revision: https://reviews.llvm.org/D139553
|
#
21c4dc79 |
| 17-Dec-2022 |
Fangrui Song <i@maskray.me> |
std::optional::value => operator*/operator->
value() has undesired exception checking semantics and calls __throw_bad_optional_access in libc++. Moreover, the API is unavailable without _LIBCPP_NO_EXCEPTIONS on older Mach-O platforms (see _LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS).
This fixes clang.
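The pattern the commit applies looks like this (an illustrative function, not code from the patch): after an explicit engagement check, `operator*` reads the value without the extra check that makes `value()` throw `std::bad_optional_access`.

```cpp
#include <cassert>
#include <optional>

// value() re-checks engagement and may throw std::bad_optional_access;
// operator* does no check, so the caller must test the optional first.
int addrOrZero(const std::optional<int> &Addr) {
  if (!Addr)
    return 0;
  return *Addr; // previously written as Addr.value()
}
```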
|
#
5d7950a4 |
| 16-Dec-2022 |
Hongtao Yu <hoy@fb.com> |
[CSSPGO][llvm-profgen] Missing frame inference.
This change introduces a missing frame inferrer aimed at fixing missing frames. It currently only handles frames missing due to compiler tail call elimination (TCE), but could be extended to other scenarios like frame pointer omission. When a tail-called function is sampled, the caller frame is missing from the call chain because the caller frame is reused for the callee frame. While TCE is beneficial both for performance and for reducing stack overflow, the workaround in this change aims to recover as many of the missing frames as possible.
The idea behind this work is to build a dynamic call graph that consists only of tail call edges constructed from LBR samples, and to DFS-search for a unique path between a given source frame and target frame on the graph. The unique path is used to fill in the missing frames between the source and target. Note that only a unique path counts: multiple paths are treated as unreachable, since we don't want to overcount any particular possible path.
A switch --infer-missing-frame is introduced and defaults to on.
Some testing results:
- 0.4% perf win according to three internal benchmarks.
- About 2/3 of the missing tail call frames can be recovered, according to an internal benchmark.
- 10% more profile generation time.
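A minimal sketch of the unique-path search, assuming a hypothetical `Graph` type that maps a caller address to its tail-call targets; the real inferrer is structured differently. The DFS counts simple paths, stops as soon as a second one appears, and only reports intermediate frames when exactly one path exists.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Hypothetical tail-call graph built from LBR samples: caller -> tail-call targets.
using Graph = std::map<uint64_t, std::set<uint64_t>>;

// Count simple paths From -> To (stopping early once more than one is found)
// and remember the last one in Found. Visited holds the current DFS path.
static unsigned countPaths(const Graph &G, uint64_t From, uint64_t To,
                           std::set<uint64_t> &Visited,
                           std::vector<uint64_t> &Path,
                           std::vector<uint64_t> &Found) {
  if (From == To) {
    Found = Path;
    return 1;
  }
  auto It = G.find(From);
  if (It == G.end() || !Visited.insert(From).second)
    return 0;
  unsigned N = 0;
  for (uint64_t Next : It->second) {
    Path.push_back(Next);
    N += countPaths(G, Next, To, Visited, Path, Found);
    Path.pop_back();
    if (N > 1)
      break; // ambiguous: no need to search further
  }
  Visited.erase(From);
  return N;
}

// Fill Missing with the frames strictly between Src and Dst, but only when the
// path is unique; multiple paths are treated as unreachable.
bool inferMissingFrames(const Graph &G, uint64_t Src, uint64_t Dst,
                        std::vector<uint64_t> &Missing) {
  Missing.clear();
  if (Src == Dst)
    return true;
  std::set<uint64_t> Visited;
  std::vector<uint64_t> Path, Found;
  if (countPaths(G, Src, Dst, Visited, Path, Found) != 1)
    return false;
  Missing.assign(Found.begin(), Found.end() - 1); // drop Dst itself
  return true;
}
```

Refusing ambiguous paths trades recall for precision, which matches the stated goal of not overcounting any single possible path.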
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D139367
|
#
89fab98e |
| 05-Dec-2022 |
Fangrui Song <i@maskray.me> |
[DebugInfo] llvm::Optional => std::optional
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
|
Revision tags: llvmorg-15.0.6 |
|
#
286223ed |
| 27-Nov-2022 |
Kazu Hirata <kazu@google.com> |
[llvm-profgen] Use std::optional in ProfiledBinary.cpp (NFC)
This is part of an effort to migrate from llvm::Optional to std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
|
Revision tags: llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3 |
|
#
d5a963ab |
| 17-Oct-2022 |
Hongtao Yu <hoy@fb.com> |
[PseudoProbe] Replace relocation with offset for entry probe.
Currently, the pseudo probe encoding for a function is:
- For the first probe, a relocation from it to its physical position in the code body.
- For subsequent probes, an incremental offset from the current probe to the previous probe.
The relocation could potentially cause relocation overflow at link time. I'm now replacing it with an offset from the first probe to the function start address.
A source function could be lowered into multiple binary functions due to outlining (e.g., coro-split). Since those binary functions have independent link-time layouts, to really avoid relocations from .pseudo_probe sections to .text sections, the replacement offset should be relative to the probe's enclosing binary function rather than to the entry of the source function. This requires changes to the previous section-based emission scheme, which now becomes function-based. The assembly form of the pseudo probe directive is also changed correspondingly, i.e., it reflects the binary function name.
Most source functions end up with only one binary function. For those that don't, a sentinel probe is emitted for each binary function whose name differs from the source function's. The sentinel probe indicates the binary function name, to differentiate subsequent probes from those of a different binary function. For example, given the source function
```
Foo() {
  …
  Probe 1
  …
  Probe 2
}
```
If it is transformed into two binary functions:
```
Foo:
  …
Foo.outlined:
  …
```
The encoding for the two binary functions will be separate:
```
GUID of Foo
  Probe 1
GUID of Foo
  Sentinel probe of Foo.outlined
  Probe 2
```
Then Probe 1 will be decoded against binary `Foo`'s address, and Probe 2 will be decoded against `Foo.outlined`'s. The sentinel probe of `Foo.outlined` makes sure there's no accidental relocation from `Foo.outlined`'s probes to `Foo`'s entry address.
On the BOLT side, to be minimally intrusive, the pseudo probe re-encoding sticks with the old encoding format. This is fine since, unlike the linker, BOLT processes the pseudo probe section as a whole and is free from relocation overflow issues.
The change is backward compatible as long as the old and new encodings are not mixed.
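The position-independent scheme described above can be sketched as follows. This is a simplified model, not the MC encoder: `encodeProbes` and `decodeProbes` are hypothetical names, real probes also carry GUIDs, indices, and attributes, and the real format uses variable-length integers. It shows that storing the first probe as an offset from its enclosing binary function's start lets the same encoding decode correctly wherever the linker places that function.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// First probe: offset from the enclosing binary function's start (no
// relocation). Later probes: signed delta from the previous probe.
std::vector<int64_t> encodeProbes(uint64_t FuncStart,
                                  const std::vector<uint64_t> &ProbeAddrs) {
  std::vector<int64_t> Enc;
  uint64_t Prev = FuncStart;
  for (uint64_t A : ProbeAddrs) {
    Enc.push_back(static_cast<int64_t>(A - Prev)); // may be negative
    Prev = A;
  }
  return Enc;
}

// Decode against the final address of the binary function (e.g. Foo or
// Foo.outlined), which is known only after linking.
std::vector<uint64_t> decodeProbes(uint64_t FuncStart,
                                   const std::vector<int64_t> &Enc) {
  std::vector<uint64_t> Addrs;
  uint64_t Cur = FuncStart;
  for (int64_t D : Enc) {
    Cur += static_cast<uint64_t>(D);
    Addrs.push_back(Cur);
  }
  return Addrs;
}
```

Decoding the same deltas against a different function start simply shifts every probe, which is why no relocation into .text is needed.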
Reviewed By: wenlei, maksfb
Differential Revision: https://reviews.llvm.org/D135912
Differential Revision: https://reviews.llvm.org/D135914
Differential Revision: https://reviews.llvm.org/D136394
|
#
91cc53d5 |
| 26-Oct-2022 |
wlei <wlei@fb.com> |
[llvm-profgen] Do not cache the frame location stack during computing inlined context size
In `computeInlinedContextSizeForRange`, the offset of the range is used only once, so there is no need to cache the frame location stack. Measured on one internal service binary, this can save 2 GB of memory and slightly reduce run time (by avoiding one hash lookup).
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D128859
|
#
46765248 |
| 14-Oct-2022 |
wlei <wlei@fb.com> |
[llvm-profgen] Fix inconsistent loading address issues
This is to fix two issues related with loading address:
1) When multiple MMAPs occurred with different loading addresses, only the first MMAP was used as the base address, so all perf addresses after it were computed against the wrong base address.
2) For pseudo probe profiles, the address is always based on the preferred loading address. If the base address is not equal to the preferred loading address, the pseudo probe address query will be wrong.
Solution: Instead of converting addresses to offsets lazily, all addresses are now converted on the fly at parsing time, based on the preferred loading address. The profile generator no longer uses "offsets".
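A simplified sketch of converting at parse time, assuming a hypothetical `AddressConverter` that tracks the load base from the most recent MMAP event (issue 1) and rebases every runtime address to the preferred loading address (issue 2). Real MMAP handling also accounts for page offsets and per-segment mappings, which are omitted here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical parser-side state for one profiled binary.
class AddressConverter {
  uint64_t PreferredBase; // e.g. the binary's preferred loading address
  uint64_t RuntimeBase;   // load base from the latest MMAP event

public:
  explicit AddressConverter(uint64_t Preferred)
      : PreferredBase(Preferred), RuntimeBase(Preferred) {}

  // Each new MMAP updates the base instead of keeping only the first one.
  void onMMap(uint64_t NewBase) { RuntimeBase = NewBase; }

  // Rebase immediately so later stages (including pseudo probe lookup) only
  // ever see addresses in the preferred address space.
  uint64_t toPreferred(uint64_t RuntimeAddr) const {
    return RuntimeAddr - RuntimeBase + PreferredBase;
  }
};
```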
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D126827
|
Revision tags: working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
#
611ffcf4 |
| 14-Jul-2022 |
Kazu Hirata <kazu@google.com> |
[llvm] Use value instead of getValue (NFC)
|
#
7e86b13c |
| 28-Jun-2022 |
wlei <wlei@fb.com> |
[CSSPGO][llvm-profgen] Reimplement SampleContextTracker using context trie
This is the follow-up patch to https://reviews.llvm.org/D125246 for the `SampleContextTracker` part. Previously, promoting and merging contexts was based on SampleContext (the array of frames), which cost a lot of memory. This patch detaches the tracker from the array ref and uses the context trie itself instead. This can save a lot of memory and benefits both the compiler's CS inliner and llvm-profgen's pre-inliner.
One structure that needs special treatment is `FuncToCtxtProfiles`, which is used to get all the FunctionSamples of one function for merging and promoting. Previously it searched each function's context and traversed the trie to get the node for that context. Now that the profile no longer carries the context, we directly use an auxiliary map `ProfileToNodeMap`, which is initialized with the FunctionSamples-to-TrieNode relations and kept up to date during promoting and merging.
Moreover, I expected the results before and after to remain the same, but found that the order of FuncToCtxtProfiles matters and affects the results. This can happen in the recursive context case, but the difference should be small. Since the context is no longer available, I just used a vector to keep the order, so the result is still deterministic.
Measured on one huge (12 GB) profile from one of our internal services: the profile similarity is 99.999%, the running time is improved by 3X (debug mode), and memory is reduced from 170 GB to 90 GB.
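A minimal sketch of the trie-plus-auxiliary-map arrangement, with hypothetical types (`FunctionSamples` reduced to a name and count, `TrieNode`, `ContextTracker`). Children are keyed only by callee name here, whereas a real context trie also keys by call-site location. `ProfileToNode` locates a profile's node without storing a frame array per profile, and a per-function vector keeps insertion order so merging stays deterministic.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

struct FunctionSamples {
  std::string Name;
  uint64_t Total = 0;
};

// Hypothetical context trie node (real tries also key by call-site location).
struct TrieNode {
  std::string FuncName;
  FunctionSamples *Profile = nullptr;
  std::map<std::string, std::unique_ptr<TrieNode>> Children;
};

class ContextTracker {
  TrieNode Root;
  // Auxiliary map: profile -> owning trie node, instead of a frame array.
  std::unordered_map<const FunctionSamples *, TrieNode *> ProfileToNode;
  // Nodes per function in insertion order, for deterministic merging.
  std::map<std::string, std::vector<TrieNode *>> FuncToNodes;

public:
  TrieNode *insert(const std::vector<std::string> &Context, FunctionSamples *P) {
    TrieNode *N = &Root;
    for (const std::string &F : Context) {
      std::unique_ptr<TrieNode> &Child = N->Children[F];
      if (!Child) {
        Child = std::make_unique<TrieNode>();
        Child->FuncName = F;
      }
      N = Child.get();
    }
    N->Profile = P;
    ProfileToNode[P] = N;
    FuncToNodes[N->FuncName].push_back(N);
    return N;
  }

  TrieNode *nodeFor(const FunctionSamples *P) const {
    auto It = ProfileToNode.find(P);
    return It == ProfileToNode.end() ? nullptr : It->second;
  }

  const std::vector<TrieNode *> &nodesOf(const std::string &F) {
    return FuncToNodes[F];
  }
};
```

When a node is promoted or merged, the corresponding `ProfileToNode` entry would be updated to point at the new node; that step is omitted from this sketch.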
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D127031
|