ProfiledBinary.cpp - OpenGrok history log for /llvm-project/llvm/tools/llvm-profgen/ProfiledBinary.cpp

Revision	Date	Author	Comments
Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4
# 04ebd190	26-Aug-2024	Amir Ayupov <aaupov@fb.com>	[MC][NFC] Statically allocate storage for decoded pseudo probes and function records Use #102774 to allocate storage for decoded probes (`PseudoProbeVec`) and function records (`InlineTreeVec`). Leverage that to also shrink sizes of `MCDecodedPseudoProbe`: - Drop Guid since it's accessible via `InlineTree`. `MCDecodedPseudoProbeInlineTree`: - Keep track of probes and inlinees using `ArrayRef`s now that probes and function records belonging to the same function are allocated contiguously. This reduces peak RSS from 13.7 GiB to 9.7 GiB and pseudo probe parsing time (as part of perf2bolt) from 15.3s to 9.6s for a large binary with 400MiB .pseudo_probe section containing 43M probes and 25M function records. Depends on: #102774 #102787 #102788 Reviewers: maksfb, rafaelauler, dcci, ayermolo, wlei-llvm Reviewed By: wlei-llvm Pull Request: https://github.com/llvm/llvm-project/pull/102789
# ca53611c	23-Aug-2024	Kazu Hirata <kazu@google.com>	[llvm] Use range-based for loops (NFC) (#105861)
Revision tags: llvmorg-19.1.0-rc3
# 242f4e85	11-Aug-2024	Amir Ayupov <aaupov@fb.com>	[profgen][NFC] Pass parameter as const_ref Pass `ProbeNode` parameter of `trackInlineesOptimizedAway` as const reference. Reviewers: wlei-llvm, WenleiHe Reviewed By: WenleiHe Pull Request: https://github.com/llvm/llvm-project/pull/102787
Revision tags: llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init
# 75bc20ff	06-Jul-2024	Kazu Hirata <kazu@google.com>	[llvm] Remove redundant calls to std::unique_ptr<T>::get (NFC) (#97914)
Revision tags: llvmorg-18.1.8
# 2fa6eaf9	13-Jun-2024	xur-llvm <59886942+xur-llvm@users.noreply.github.com>	[llvm-profgen] Add support for Linux kernel profile (#92831) Add the support to handle Linux kernel perf files. The functionality is under option -kernel. Note that currently only main kernel (in vmlinux) is handled: kernel modules are not handled. --------- Co-authored-by: Han Shen <shenhan@google.com>
Revision tags: llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2
# 8c03f400	15-Mar-2024	Haohai Wen <haohai.wen@intel.com>	[llvm-profgen] Support COFF binary (#83972) Intel Vtune/SEP has supported collecting LBR on Windows and generating perf-script file which is same format as Linux perf script. This patch teaches llvm-profgen to disassemble COFF binary so that we can do Sampling based PGO on Windows.
Revision tags: llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2
# 7ff2dc3b	30-Jan-2024	Nathan Lanza <nathanlanza@gmail.com>	[profgen] Use a 64bit integer for &'ing the loadable address (#79930) For the linux kernel, the loadable segments start at 0xffff... and thus the 32 bit integer here was truncating all the meaningful bits. Grow it to 64 bits.
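The truncation described in 7ff2dc3b can be reproduced in isolation. The following is a minimal sketch, not the code from ProfiledBinary.cpp: the helper names `alignDownNarrow` and `alignDownWide` and the page-size parameter are hypothetical, chosen only to show how a 32-bit mask zero-extends and clears the upper half of a kernel-style address.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper mimicking the bug: the mask is built in 32 bits, so
// when it is combined with a 64-bit address it zero-extends to
// 0x00000000FFFFF000 and wipes out the upper 32 bits.
uint64_t alignDownNarrow(uint64_t Addr, uint32_t PageSize) {
  assert(PageSize != 0 && (PageSize & (PageSize - 1)) == 0 && "power of two");
  uint32_t Mask = ~(PageSize - 1U);
  return Addr & Mask;
}

// Hypothetical fixed variant: the mask is built in 64 bits, so the upper
// bits of addresses such as 0xffffffff8... survive the AND.
uint64_t alignDownWide(uint64_t Addr, uint64_t PageSize) {
  assert(PageSize != 0 && (PageSize & (PageSize - 1)) == 0 && "power of two");
  uint64_t Mask = ~(PageSize - 1ULL);
  return Addr & Mask;
}
```

With a page size of 0x1000, the narrow variant maps 0xffffffff81000123 to 0x81000000, while the wide variant keeps the high bits and yields 0xffffffff81000000.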
Revision tags: llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4
# 7a3db658	23-Oct-2023	Hongtao Yu <hoy@fb.com>	[llvm-profgen] More tweaks to warnings (#68608) Tweaking warnings more to avoid flooding user log.
# ef0e0adc	17-Oct-2023	William Junda Huang <williamjhuang@google.com>	[llvm-profdata] Do not create numerical strings for MD5 function names read from a Sample Profile. (#66164) This is phase 2 of the MD5 refactoring on Sample Profile following https://reviews.llvm.org/D147740 In previous implementation, when a MD5 Sample Profile is read, the reader first converts the MD5 values to strings, and then create a StringRef as if the numerical strings are regular function names, and later on IPO transformation passes perform string comparison over these numerical strings for profile matching. This is inefficient since it causes many small heap allocations. In this patch I created a class `ProfileFuncRef` that is similar to `StringRef` but it can represent a hash value directly without any conversion, and it will be more efficient (I will attach some benchmark results later) when being used in associative containers. ProfileFuncRef guarantees the same function name in string form or in MD5 form has the same hash value, which also fix a few issue in IPO passes where function matching/lookup only check for function name string, while returns a no-match if the profile is MD5. When testing on an internal large profile (> 1 GB, with more than 10 million functions), the full profile load time is reduced from 28 sec to 25 sec in average, and reading function offset table from 0.78s to 0.7s
Revision tags: llvmorg-17.0.3
# 96776783	04-Oct-2023	Hongtao Yu <hoy@fb.com>	[llvm-profgen] Print DWP related warnings under show-detailed-warning (#68019) Printing DWP related warnings under show-detailed-warning so that they won't flood user log.
Revision tags: llvmorg-17.0.2, llvmorg-17.0.1
# 47669af4	18-Sep-2023	Hongtao Yu <hoy@fb.com>	[llvm-profgen] Ignore inline frames with an empty function name (#66678) Broken debug information can give empty names for an inlined frame, e.g, ``` 0x1d605c68: ryKeyINS7_17SmartCounterTypesEEESt10shared_ptrINS7_15AsyncCacheValueIS9_EEESaIhESt6atomicEEE9fetch_subElSt12memory_order at Filename: edata.h Function start filename: edata.h Function start line: 266 Function start address: 0x1d605c68 Line: 267 Column: 0 (inlined by) at Filename: edata.h Function start filename: edata.h Function start line: 274 Function start address: 0x1d605c68 Line: 275 Column: 0 (inlined by) _EEEmmEv at Filename: arena.c Function start filename: arena.c Function start line: 1303 Line: 1308 Column: 0 ``` This patch avoids creating a sample context with an empty function name by stopping tracking at that frame. This prevents a hash failure that leads to an ICE, where empty context serves at an empty key for the underlying MapVector https://github.com/llvm/llvm-project/blob/7624de5beae2f142abfdb3e32a63c263a586d768/llvm/lib/ProfileData/SampleProfWriter.cpp#L261
Revision tags: llvmorg-17.0.0, llvmorg-17.0.0-rc4
# 01b88dd6	30-Aug-2023	Takuya Shimizu <shimizu2486@gmail.com>	[NFC] Remove unused variables declared in conditions D152495 makes clang warn on unused variables that are declared in conditions like `if (int var = init) {}` This patch is an NFC fix to suppress the new warning in llvm,clang,lld builds to pass CI in the above patch. Differential Revision: https://reviews.llvm.org/D158016
Revision tags: llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init
# 09742be8	23-Jun-2023	Hongtao Yu <hoy@fb.com>	[llvm-profgen] Remove target triple check to allow for more targets Llvm-profgen internally uses the llvm libraries and the MCDesc interface to do disassembling and symbolization and it never checks against target-specific instruction operators. This makes it quite transparent to targets and a first attempt for an aarch64 binary just works. Therefore I'm removing the unnecessary triple check to unblock for new targets. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D153449
Revision tags: llvmorg-16.0.6, llvmorg-16.0.5
# 27c37327	25-May-2023	Mark Santaniello <marksan@meta.com>	Avoid pointless canonicalize when using Dwarf names CPU profile indicated memcmp was hot due to the two rfind calls in getCanonicalFnName. If UseSymbolTable is false, we can avoid the cost entirely. For CSSPGO profiles I've measured ~5% speedup with this change. Profile similarity before/after matches 100%. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D151441
Revision tags: llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2
# 62c7f035	07-Feb-2023	Archibald Elliott <archibald.elliott@arm.com>	[NFC][TargetParser] Remove llvm/ADT/Triple.h I also ran `git clang-format` to get the headers in the right order for the new location, which has changed the order of other headers in two files.
Revision tags: llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7
# 537cdf92	07-Dec-2022	Elena Lepilkina <elena.lepilkina@syntacore.com>	[llvm-objdump][RISCV] Use new common method to parse ARCH RISCV attribute Differential Revision: https://reviews.llvm.org/D139553
# 21c4dc79	17-Dec-2022	Fangrui Song <i@maskray.me>	std::optional::value => operator*/operator-> value() has undesired exception checking semantics and calls __throw_bad_optional_access in libc++. Moreover, the API is unavailable without _LIBCPP_NO_EXCEPTIONS on older Mach-O platforms (see _LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS). This fixes clang.
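The access pattern that 21c4dc79 moves to can be sketched as follows. This is an illustration only, and the function names `valueOrDefault` and `lengthOrZero` are hypothetical. The point is that `operator*` and `operator->` perform no check and never throw, so the caller tests for a value first instead of relying on `value()` and its `std::bad_optional_access` path.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper: check for a value, then dereference with operator*.
// No exception machinery is involved, unlike V.value().
int valueOrDefault(const std::optional<int> &V, int Default) {
  return V ? *V : Default;
}

// Hypothetical helper: same idea with operator-> for member access.
std::size_t lengthOrZero(const std::optional<std::string> &S) {
  return S ? S->size() : 0;
}
```

Dereferencing an empty optional with `*` or `->` is undefined behavior, which is why both helpers test the optional before accessing it.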
# 5d7950a4	16-Dec-2022	Hongtao Yu <hoy@fb.com>	[CSSPGO][llvm-profgen] Missing frame inference. This change introduces a missing frame inferrer aiming at fixing missing frames. It currently only handles missing frames due to the compiler tail call elimination (TCE) but could also be extended to supporting other scenarios like frame pointer omission. When a tail called function is sampled, the caller frame will be missing from the call chain because the caller frame is reused for the callee frame. While TCE is beneficial to both perf and reducing stack overflow, a workaround being made in this change aims to find back the missing frames as much as possible. The idea behind this work is to build a dynamic call graph that consists of only tail call edges constructed from LBR samples and DFS-search for a unique path for a given source frame and target frame on the graph. The unique path will be used to fill in the missing frames between the source and target. Note that only a unique path counts. Multiple paths are treated unreachable since we don't want to overcount for any particular possible path. A switch --infer-missing-frame is introduced and defaults to be on. Some testing results: - 0.4% perf win according to three internal benchmarks. - About 2/3 of the missing tail call frames can be recovered, according to an internal benchmark. - 10% more profile generation time. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D139367
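The unique-path idea in 5d7950a4 can be sketched as a small depth-first search. This is not the inferrer that llvm-profgen ships: the `Graph` alias, `countPaths`, and `inferMissingFrames` are hypothetical names, and frames are reduced to plain integer addresses. The sketch only shows the rule stated in the commit message, that the frames are filled in when exactly one tail-call path connects source and target, and that zero or multiple paths are treated as unreachable.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Hypothetical tail-call graph: caller address -> set of tail-called callees.
using Graph = std::map<uint64_t, std::set<uint64_t>>;

// DFS over simple paths from Cur to To. Stops counting once a second path
// is found; remembers the first complete path in Found.
static void countPaths(const Graph &G, uint64_t Cur, uint64_t To,
                       std::set<uint64_t> &OnPath,
                       std::vector<uint64_t> &Stack,
                       std::vector<uint64_t> &Found, int &Count) {
  if (Count > 1)
    return;
  if (Cur == To) {
    if (++Count == 1)
      Found = Stack;
    return;
  }
  auto It = G.find(Cur);
  if (It == G.end())
    return;
  for (uint64_t Next : It->second) {
    if (!OnPath.insert(Next).second)
      continue; // already on the current path: skip to avoid cycles
    Stack.push_back(Next);
    countPaths(G, Next, To, OnPath, Stack, Found, Count);
    Stack.pop_back();
    OnPath.erase(Next);
  }
}

// Returns true and fills Missing with the frames strictly between From and
// To when exactly one path exists; returns false for zero or multiple paths.
bool inferMissingFrames(const Graph &G, uint64_t From, uint64_t To,
                        std::vector<uint64_t> &Missing) {
  std::set<uint64_t> OnPath{From};
  std::vector<uint64_t> Stack, Found;
  int Count = 0;
  countPaths(G, From, To, OnPath, Stack, Found, Count);
  if (Count != 1)
    return false;
  if (!Found.empty())
    Found.pop_back(); // drop the target itself, keep only intermediates
  Missing = Found;
  return true;
}
```

For a chain 1 -> 2 -> 3 the call `inferMissingFrames(G, 1, 3, Missing)` succeeds with `Missing` holding only frame 2. Adding a second route 1 -> 4 -> 3 makes the same query fail, which matches the stated goal of not overcounting any particular possible path.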
# 89fab98e	05-Dec-2022	Fangrui Song <i@maskray.me>	[DebugInfo] llvm::Optional => std::optional https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
Revision tags: llvmorg-15.0.6
# 286223ed	27-Nov-2022	Kazu Hirata <kazu@google.com>	[llvm-profgen] Use std::optional in ProfiledBinary.cpp (NFC) This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
Revision tags: llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3
# d5a963ab	17-Oct-2022	Hongtao Yu <hoy@fb.com>	[PseudoProbe] Replace relocation with offset for entry probe. Currently pseudo probe encoding for a function is like: - For the first probe, a relocation from it to its physical position in the code body - For subsequent probes, an incremental offset from the current probe to the previous probe The relocation could potentially cause relocation overflow during link time. I'm now replacing it with an offset from the first probe to the function start address. A source function could be lowered into multiple binary functions due to outlining (e.g, coro-split). Since those binary function have independent link-time layout, to really avoid relocations from .pseudo_probe sections to .text sections, the offset to replace with should really be the offset from the probe's enclosing binary function, rather than from the entry of the source function. This requires some changes to previous section-based emission scheme which now switches to be function-based. The assembly form of pseudo probe directive is also changed correspondingly, i.e, reflecting the binary function name. Most of the source functions end up with only one binary function. For those don't, a sentinel probe is emitted for each of the binary functions with a different name from the source. The sentinel probe indicates the binary function name to differentiate subsequent probes from the ones from a different binary function.
For examples, given source function ``` Foo() { … Probe 1 … Probe 2 } ``` If it is transformed into two binary functions: ``` Foo: … Foo.outlined: … ``` The encoding for the two binary functions will be separate: ``` GUID of Foo Probe 1 GUID of Foo Sentinel probe of Foo.outlined Probe 2 ``` Then probe1 will be decoded against binary `Foo`'s address, and Probe 2 will be decoded against `Foo.outlined`. The sentinel probe of `Foo.outlined` makes sure there's not accidental relocation from `Foo.outlined`'s probes to `Foo`'s entry address. On the BOLT side, to be minimal intrusive, the pseudo probe re-encoding sticks with the old encoding format. This is fine since unlike linker, Bolt processes the pseudo probe section as a whole and it is free from relocation overflow issues. The change is downwards compatible as long as there's no mixed use of the old encoding and the new encoding. Reviewed By: wenlei, maksfb Differential Revision: https://reviews.llvm.org/D135912 Differential Revision: https://reviews.llvm.org/D135914 Differential Revision: https://reviews.llvm.org/D136394
# 91cc53d5	26-Oct-2022	wlei <wlei@fb.com>	[llvm-profgen] Do not cache the frame location stack during computing inlined context size In `computeInlinedContextSizeForRange`, the offset of range is only used one time, there is no need to cache the frame location stack. Measured on one internal service binary, this can save 2GB memory usage and reduce a small run time (avoid one hash search). Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D128859
# 46765248	14-Oct-2022	wlei <wlei@fb.com>	[llvm-profgen] Fix inconsistent loading address issues This is to fix two issues related with loading address: 1) When multiple MMAPs occur and their loading address are different, before it only used the first MMap as base address, all perf address after it used the wrong base address. 2) For pseudo probe profile, the address is always based on preferred loading address. If the base address is not equal to the preferred loading address, the pseudo probe address query will be wrong. Solution: Instead of converting the address to offset lazily, right now all the address after parsing are converted on the fly based on preferred loading address in the parsing time. There is no "offset" used in profile generator any more. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D126827
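The address handling described in 46765248 amounts to rebasing each sampled address at parse time. The sketch below assumes a simplified model in which each MMAP event carries a single runtime base. The function name `canonicalizeAddress` and its parameters are hypothetical and do not come from llvm-profgen.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: rebase a runtime address onto the binary's preferred
// load address, using the base of the MMAP event that covers the sample.
// After this step every address is comparable with addresses recorded in
// the binary itself (for example pseudo probe addresses), so no separate
// "offset" representation is needed downstream.
uint64_t canonicalizeAddress(uint64_t RuntimeAddr, uint64_t MMapBase,
                             uint64_t PreferredBase) {
  assert(RuntimeAddr >= MMapBase && "sample must fall inside the mapping");
  return RuntimeAddr - MMapBase + PreferredBase;
}
```

Two samples of the same instruction taken under different mappings, say bases 0x7f0000000000 and 0x560000000000, both rebase to the same canonical address once the preferred base is applied, which is the inconsistency the first issue in the commit message describes.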
Revision tags: working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init
# 611ffcf4	14-Jul-2022	Kazu Hirata <kazu@google.com>	[llvm] Use value instead of getValue (NFC)
# 7e86b13c	28-Jun-2022	wlei <wlei@fb.com>	[CSSPGO][llvm-profgen] Reimplement SampleContextTracker using context trie This is the followup patch to https://reviews.llvm.org/D125246 for the `SampleContextTracker` part. Before the promotion and merging of the context is based on the SampleContext(the array of frame), this causes a lot of cost to the memory. This patch detaches the tracker from using the array ref instead to use the context trie itself. This can save a lot of memory usage and benefit both the compiler's CS inliner and llvm-profgen's pre-inliner. One structure needs to be specially treated is the `FuncToCtxtProfiles`, this is used to get all the functionSamples for one function to do the merging and promoting. Before it search each functions' context and traverse the trie to get the node of the context. Now we don't have the context inside the profile, instead we directly use an auxiliary map `ProfileToNodeMap` for profile , it initialize to create the FunctionSamples to TrieNode relations and keep updating it during promoting and merging the node. Moreover, I was expecting the results before and after remain the same, but I found that the order of FuncToCtxtProfiles matter and affect the results. This can happen on recursive context case, but the difference should be small. Now we don't have the context, so I just used a vector for the order, the result is still deterministic. Measured on one huge size(12GB) profile from one of our internal service. The profile similarity difference is 99.999%, and the running time is improved by 3X(debug mode) and the memory is reduced from 170GB to 90GB. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D127031