pseudo-probe-inline.ll - OpenGrok history log for /llvm-project/llvm/test/Transforms/SampleProfile/pseudo-probe-inline.ll

Revision (<<< Hide revision tags) (Show revision tags >>>)	Date	Author	Comments
Revision tags: llvmorg-18.1.8
# e20b9047	07-Jun-2024	Lei Wang <wlei@fb.com>	[PseudoProbe] Make probe discriminator compatible with dwarf base discriminator (#94506) It's useful if the probe-based build can consume a dwarf based profile(e.g. the profile transition), before [PseudoProbe] Make probe discriminator compatible with dwarf base discriminator (#94506) It's useful if the probe-based build can consume a dwarf based profile(e.g. the profile transition), before there is a conflict for the discriminator, this change tries to mitigate the issue by encoding the dwarf base discriminator into the probe discriminator. As the num of probe id(num of basic block and calls) starts from 1, there are some unused space. We try to reuse some bit of the probe id. The new encode rule is: - Use a bit to [28:28] to indicate whether dwarf base discriminator is encoded.(fortunately we can borrow this bit from the `PseudoProbeType`) - If the bit is set, use [15:3] for probe id, [18:16] for dwarf base discriminator. Otherwise, still use [18:3] for probe id. Note that these doesn't affect the original probe id capacity, we still prioritize probe id encoding, i.e. the base discriminator is not encoded when probe id is bigger than [15:3]. Then adjust `getBaseDiscriminatorFromDiscriminator` to use the base discriminator from the probe discriminator. show more ...
Revision tags: llvmorg-18.1.7
# 5a23d31c	28-May-2024	William Junda Huang <williamjhuang@google.com>	[Sample Profile] Check hot callsite threshold when inlining a function with a sample profile (#93286) Currently if a callsite is hot as determined by the sample profile, it is unconditionally inlin [Sample Profile] Check hot callsite threshold when inlining a function with a sample profile (#93286) Currently if a callsite is hot as determined by the sample profile, it is unconditionally inlined barring invalid cases (such as recursion). Inline cost check should still apply because a function's hotness and its inline cost are two different things. For example if a function is calling another very large function multiple times (at different code paths), the large function should not be inlined even if its hot. show more ...
Revision tags: llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0
# 6b856abc	08-Sep-2023	Hongtao Yu <hoy@fb.com>	[PseudoProbe] Use probe id as the base dwarf discriminator for callsites (#65685) With `-fpseudo-probe-for-profiling`, the dwarf discriminator for a callsite will be overwritten to pseudo probe rel [PseudoProbe] Use probe id as the base dwarf discriminator for callsites (#65685) With `-fpseudo-probe-for-profiling`, the dwarf discriminator for a callsite will be overwritten to pseudo probe related information for that callsite. The probe information is encoded in a special format (i.e., with all lowest three digits be one) in order to be distinguished from regular dwarf discriminator. The special encoding format will be decoded to zero by the regular discriminator logic. This means all callsites would have a zero discriminator in both the sample profile and the compiler, for classic AutoFDO. This is inconvenient in that no decent classic AutoFDO can be generated from a pseudo probe build. I'm mitigating the issue by allowing callsite probe id to be used as the base dwarf discriminator for classic AutoFDO, since probe id is also unique and can be used to differentiate callsites on the same source line. show more ...
Revision tags: llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init
# 0271ae65	18-Jul-2022	Fangrui Song <i@maskray.me>	[test] Change test/SampleProfile to use opaque pointers
Revision tags: llvmorg-14.0.6, llvmorg-14.0.5
# bc856eb3	01-Jun-2022	Mingming Liu <mingmingl@google.com>	[SampleProfile][Inline] Annotate sample profile inline remarks with link phase (prelink/postlink) information. Differential Revision: https://reviews.llvm.org/D126833
Revision tags: llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3
# 4e24ca1c	15-Sep-2020	Hongtao Yu <hoy@fb.com>	[CSSPGO] Turn on Profi by default As titled. Reviewed By: wenlei, wlei Differential Revision: https://reviews.llvm.org/D115011
# 76093b17	10-Aug-2021	Fangrui Song <i@maskray.me>	[InlineAdvisor] Add single quotes around caller/callee names Clang diagnostics refer to identifier names in quotes. This patch makes inline remarks conform to the convention. New behavior: ``` % cl [InlineAdvisor] Add single quotes around caller/callee names Clang diagnostics refer to identifier names in quotes. This patch makes inline remarks conform to the convention. New behavior: ``` % clang -O2 -Rpass=inline -Rpass-missed=inline -S a.c a.c:4:25: remark: 'foo' inlined into 'bar' with (cost=-30, threshold=337) at callsite bar:0:25; [-Rpass=inline] int bar(int a) { return foo(a); } ^ ``` Reviewed By: hoy Differential Revision: https://reviews.llvm.org/D107791 show more ...
# 3d89b3cb	11-Dec-2020	Hongtao Yu <hoy@fb.com>	[CSSPGO] Introducing distribution factor for pseudo probe. Sample re-annotation is required in LTO time to achieve a reasonable post-inline profile quality. However, we have seen that such LTO-time [CSSPGO] Introducing distribution factor for pseudo probe. Sample re-annotation is required in LTO time to achieve a reasonable post-inline profile quality. However, we have seen that such LTO-time re-annotation degrades profile quality. This is mainly caused by preLTO code duplication that is done by passes such as loop unrolling, jump threading, indirect call promotion etc, where samples corresponding to a source location are aggregated multiple times due to the duplicates. In this change we are introducing a concept of distribution factor for pseudo probes so that samples can be distributed for duplicated probes scaled by a factor. We hope that optimizations duplicating code well-maintain the branch frequency information (BFI) based on which probe distribution factors are calculated. Distribution factors are updated at the end of preLTO pipeline to reflect an estimated portion of the real execution count. This change also introduces a pseudo probe verifier that can be run after each IR passes to detect duplicated pseudo probes. A saturated distribution factor stands for 1.0. A pesudo probe will carry a factor with the value ranged from 0.0 to 1.0. A 64-bit integral distribution factor field that represents [0.0, 1.0] is associated to each block probe. Unfortunately this cannot be done for callsite probes due to the size limitation of a 32-bit Dwarf discriminator. A 7-bit distribution factor is used instead. Changes are also needed to the sample profile inliner to deal with prorated callsite counts. Call sites duplicated by PreLTO passes, when later on inlined in LTO time, should have the callees’s probe prorated based on the Prelink-computed distribution factors. The distribution factors should also be taken into account when computing hotness for inline candidates. Also, Indirect call promotion results in multiple callisites. The original samples should be distributed across them. This is fixed by adjusting the callisites' distribution factors. Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D93264 show more ...
# 1645f465	20-Jan-2021	Wenlei He <aktoon@gmail.com>	[CSSPGO] Factor out common part for CSSPGO inline and AFDO inline Refactoring SampleProfileLoader::inlineHotFunctions to use helpers from CSSPGO inlining and reduce similar code in the inlining loop [CSSPGO] Factor out common part for CSSPGO inline and AFDO inline Refactoring SampleProfileLoader::inlineHotFunctions to use helpers from CSSPGO inlining and reduce similar code in the inlining loop, plus minor cleanup for AFDO path. This is resubmit of D95024, with build break and overtighten assertion fixed. Test Plan: show more ...
# 48ca6da9	02-Feb-2021	Adrian Kuegel <akuegel@google.com>	Revert "[CSSPGO] Factor out common part for CSSPGO inline and AFDO inline" This reverts commit 9a03058d6322edb8abc803ba3e436cc62647d979.
# 9a03058d	20-Jan-2021	Wenlei He <aktoon@gmail.com>	[CSSPGO] Factor out common part for CSSPGO inline and AFDO inline Refactoring SampleProfileLoader::inlineHotFunctions to use helpers from CSSPGO inlining and reduce similar code in the inlining loop [CSSPGO] Factor out common part for CSSPGO inline and AFDO inline Refactoring SampleProfileLoader::inlineHotFunctions to use helpers from CSSPGO inlining and reduce similar code in the inlining loop, plus minor cleanup for AFDO path. Test Plan: Differential Revision: https://reviews.llvm.org/D95024 show more ...
# 6bae5973	04-Jan-2021	Wenlei He <aktoon@gmail.com>	[CSSPGO] Call site prioritized inlining for sample PGO This change implemented call site prioritized BFS profile guided inlining for sample profile loader. The new inlining strategy maximize the ben [CSSPGO] Call site prioritized inlining for sample PGO This change implemented call site prioritized BFS profile guided inlining for sample profile loader. The new inlining strategy maximize the benefit of context-sensitive profile as mentioned in the follow up discussion of CSSPGO RFC. The change will not affect today's AutoFDO as it's opt-in. CSSPGO now defaults to the new FDO inliner, but can fall back to today's replay inliner using a switch (`-sample-profile-prioritized-inline=0`). Motivation With baseline AutoFDO, the inliner in sample profile loader only replays previous inlining, and the use of profile is only for pruning previous inlining that turned out to be cold. Due to the nature of replay, the FDO inliner is simple with hotness being the only decision factor. It has the following limitations that we're improving now for CSSPGO. - It doesn't take inline candidate size into account. Since it's doing replay, the size growth is bounded by previous CGSCC inlining. With context-sensitive profile, FDO inliner is no longer limited by previous inlining, so we need to take size into account to avoid significant size bloat. - The way it looks at hotness is not accurate. It uses total samples in an inlinee as proxy for hotness, while what really matters for an inline decision is the call site count. This is an unfortunate fall back because call site count and callee entry count are not reliable due to dwarf based correlation, especially for inlinees. Now paired with pseudo-probe, we have accurate call site count and callee's entry count, so we can use that to gauge hotness more accurately. - It treats all call sites from a block as hot as long as there's one call site considered hot. This is normally true, but since total samples is used as hotness proxy, this transitiveness within block magnifies the inacurate hotness heuristic. With pseduo-probe and the change above, this is no longer an issue for CSSPGO. New FDO Inliner Putting all the requirement for CSSPGO together, we need a top-down call site prioritized BFS inliner. Here're reasons why each component is needed. - Top-down: We need a top-down inliner to better leverage context-sensitive profile, so inlining is driven by accurate context profile, and post-inline is also accurate. This is already implemented in https://reviews.llvm.org/D70655. - Size Cap: For top-down inliner, taking function size into account for inline decision alone isn't sufficient to control size growth. We also need to explicitly cap size growth because with top-down inlining, we can grow inliner size significantly with large number of smaller inlinees even if each individually passes the cost/size check. - Prioritize call sites: With size cap, inlining order also becomes important, because if we stop inlining due to size budget limit, we'd want to use budget towards the most beneficial call sites. - BFS inline: Same as call site prioritization, if we stop inlining due to size budget limit, we want a balanced inline tree, rather than going deep on one call path. Note that the new inliner avoids repeatedly evaluating same set of call site, so it should help with compile time too. For this reason, we could transition today's FDO inliner to use a queue with equal priority to avoid wasted reevaluation of same call site (TODO). Speculative indirect call promotion and inlining is also supported now with CSSPGO just like baseline AutoFDO. Tunings and knobs I created tuning knobs for size growth/cap control, and for hot threshold separate from CGSCC inliner. The default values are selected based on initial tuning with CSSPGO. Results Evaluated with an internal LLVM fork couple months ago, plus another change to adjust hot-threshold cutoff for context profile (will send up after this one), the new inliner show ~1% geomean perf win on spec2006 with CSSPGO, while reducing code size too. The measurement was done using train-train setup, MonoLTO w/ new pass manager and pseudo-probe. Note that this is just a starting point - we hope that the new inliner will open up more opportunity with CSSPGO, but it will certainly take more time and effort to make it fully calibrated and ready for bigger workloads (we're working on it). Differential Revision: https://reviews.llvm.org/D94001 show more ...
# 224fee82	01-Feb-2021	Hongtao Yu <hoy@fb.com>	[CSSPGO] Tweaking inlining with pseudo probes. Fixing up a couple places where `getCallSiteIdentifier` is needed to support pseudo-probe-based callsites. Also fixing an issue in the extbinary profi [CSSPGO] Tweaking inlining with pseudo probes. Fixing up a couple places where `getCallSiteIdentifier` is needed to support pseudo-probe-based callsites. Also fixing an issue in the extbinary profile reader where the metadata section is not fully scanned based on the number of profiles loaded only for the current module. Reviewed By: wmi, wenlei Differential Revision: https://reviews.llvm.org/D95791 show more ...