SampleProf.cpp - OpenGrok history log for /llvm-project/llvm/lib/ProfileData/SampleProf.cpp

Revision (<<< Hide revision tags) (Show revision tags >>>)	Date	Author	Comments
Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init
# 18cdfa72	17-Jul-2024	Lei Wang <wlei@fb.com>	[SampleFDO] Stale profile call-graph matching (#95135) Profile staleness could be due to function renaming. Given that sample profile loader relies on exact string matching, a trivial change in the [SampleFDO] Stale profile call-graph matching (#95135) Profile staleness could be due to function renaming. Given that sample profile loader relies on exact string matching, a trivial change in the function signature( such as `int foo()` --> `long foo()` ) can make the mangled name different, the function profile(including all nested children profile) becomes unavailable. This patch introduces stale profile call-graph level matching, targeting at identifying the trivial function renaming and reusing the old function profile. Some noteworthy details: 1. Extend the LCS based CFG level matching to identify new function. - Extend to match function and profile have different name instead of the exact function name matching. This leverages LCS, i.e during the finding of callsite anchor matching, when two function name are different, try matching the functions instead of return. - In LCS, the equal function check is replaced by `functionMatchesProfile`. - Only try matching functions that are new functions(neither appears on each side). This reduces the matching scope as we don't need to match the originally matched function. 2. Determine the matching by call-site anchor similarity check. - A new function `functionMatchesProfile(IRFunc, ProfFunc)` is used to check the renaming for the possible <IRFunc, ProfFunc> pair, use the LCS(diff) matching to compute the equal set and we define: `Similarity = \|equalSet * 2\| / (\|A\| + \|B\|)`. The profile name is marked as renamed if the similarity is above a threshold(`-func-profile-similarity-threshold`) 3. Process the matching in top-down function order - when a caller's is done matching, the new function names are saved for later use, using top-down order will maximize the reused results. - `ProfileNameToFuncMap` is used to save or cache the matching result. 4. Update the original profile at the end using `ProfileNameToFuncMap`. 5. Added a new switch --salvage-unused-profile to control this, default is false. Verified on one Meta's internal big service, confirmed 90%+ of the found renaming pair is good. (There could be incorrect renaming pair if the num of the anchor is small, but checked that those functions are simple cold function) show more ...
# ce03155a	09-Jul-2024	Mircea Trofin <mtrofin@google.com>	[NFC] Coding style fixes: SampleProf (#98208) Also some control flow simplifications. Notably, this doesn't address `sampleprof_error`. I think the style there tries to match `std::error_categ [NFC] Coding style fixes: SampleProf (#98208) Also some control flow simplifications. Notably, this doesn't address `sampleprof_error`. I think the style there tries to match `std::error_category`. Also left `hash_value` as-is, because it matches what we do in Hashing.h show more ...
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4
# ef0e0adc	17-Oct-2023	William Junda Huang <williamjhuang@google.com>	[llvm-profdata] Do not create numerical strings for MD5 function names read from a Sample Profile. (#66164) This is phase 2 of the MD5 refactoring on Sample Profile following https://reviews.llvm.o [llvm-profdata] Do not create numerical strings for MD5 function names read from a Sample Profile. (#66164) This is phase 2 of the MD5 refactoring on Sample Profile following https://reviews.llvm.org/D147740 In previous implementation, when a MD5 Sample Profile is read, the reader first converts the MD5 values to strings, and then create a StringRef as if the numerical strings are regular function names, and later on IPO transformation passes perform string comparison over these numerical strings for profile matching. This is inefficient since it causes many small heap allocations. In this patch I created a class `ProfileFuncRef` that is similar to `StringRef` but it can represent a hash value directly without any conversion, and it will be more efficient (I will attach some benchmark results later) when being used in associative containers. ProfileFuncRef guarantees the same function name in string form or in MD5 form has the same hash value, which also fix a few issue in IPO passes where function matching/lookup only check for function name string, while returns a no-match if the profile is MD5. When testing on an internal large profile (> 1 GB, with more than 10 million functions), the full profile load time is reduced from 28 sec to 25 sec in average, and reading function offset table from 0.78s to 0.7s show more ...
Revision tags: llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2
# 7624de5b	01-Aug-2023	William Huang <williamjhuang@google.com>	[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map This is phase 1 of multiple planned improvements on the sample profile loader. [llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map This is phase 1 of multiple planned improvements on the sample profile loader. The major change is to use MD5 hash code ((instead of the function itself) as the key to look up the function offset table and the profiles, which significantly reduce the time it takes to construct the map. The optimization is based on the fact that many practical sample profiles are using MD5 values for function names to reduce profile size, so we shouldn't need to convert the MD5 to a string and then to a SampleContext and use it as the map's key, because it's extremely slow. Several changes to note: (1) For non-CS SampleContext, if it is already MD5 string, the hash value will be its integral value, instead of hashing the MD5 again. In phase 2 this is going to be optimized further using a union to represent MD5 function (without converting it to string) and regular function names. (2) The SampleProfileMap is a wrapper to *map<uint64_t, FunctionSamples>, while providing interface allowing using SampleContext as key, so that existing code still work. It will check for MD5 collision (unlikely but not too unlikely, since we only takes the lower 64 bits) and handle it to at least guarantee compilation correctness (conflicting old profile is dropped, instead of returning an old profile with inconsistent context). Other code should not try to use MD5 as key to access the map directly, because it will not be able to handle MD5 collision at all. (see exception at (5) ) (3) Any SampleProfileMap::emplace() followed by SampleContext assignment if newly inserted, should be replaced with SampleProfileMap::Create(), which does the same thing. (4) Previously we ensure an invariant that in SampleProfileMap, the key is equal to the Context of the value, for profile map that is eventually being used for output (as in llvm-profdata/llvm-profgen). Since the key became MD5 hash, only the value keeps the context now, in several places where an intermediate SampleProfileMap is created, each new FunctionSample's context is set immediately after insertion, which is necessary to "remember" the context otherwise irretrievable. (5) When reading a profile, we cache the MD5 values of all functions, because they are used at least twice (one to index into FuncOffsetTable, the other into SampleProfileMap, more if there are additional sections), in this case the SampleProfileMap is directly accessed with MD5 value so that we don't recalculate it each time (expensive) Performance impact: When reading a ~1GB extbinary profile (fixed length MD5, not compressed) with 10 million function names and 2.5 million top level functions (non CS functions, each function has varying nesting level from 0 to 20), this patch improves the function offset table loading time by 20%, and improves full profile read by 5%. Reviewed By: davidxl, snehasish Differential Revision: https://reviews.llvm.org/D147740 show more ...
Revision tags: llvmorg-17.0.0-rc1
# 1a53b5c3	28-Jul-2023	Aaron Ballman <aaron@aaronballman.com>	Revert "[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map" This reverts commit 66ba71d913df7f7cd75e92c0c4265932b7c93292. Addressin Revert "[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map" This reverts commit 66ba71d913df7f7cd75e92c0c4265932b7c93292. Addressing issues found by: https://lab.llvm.org/buildbot/#/builders/245/builds/11732 https://lab.llvm.org/buildbot/#/builders/187/builds/12251 https://lab.llvm.org/buildbot/#/builders/186/builds/11099 https://lab.llvm.org/buildbot/#/builders/182/builds/6976 show more ...
Revision tags: llvmorg-18-init
# 66ba71d9	12-Jul-2023	William Huang <williamjhuang@google.com>	[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map This is phase 1 of multiple planned improvements on the sample profile loader. [llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map This is phase 1 of multiple planned improvements on the sample profile loader. The major change is to use MD5 hash code ((instead of the function itself) as the key to look up the function offset table and the profiles, which significantly reduce the time it takes to construct the map. The optimization is based on the fact that many practical sample profiles are using MD5 values for function names to reduce profile size, so we shouldn't need to convert the MD5 to a string and then to a SampleContext and use it as the map's key, because it's extremely slow. Several changes to note: (1) For non-CS SampleContext, if it is already MD5 string, the hash value will be its integral value, instead of hashing the MD5 again. In phase 2 this is going to be optimized further using a union to represent MD5 function (without converting it to string) and regular function names. (2) The SampleProfileMap is a wrapper to *map<uint64_t, FunctionSamples>, while providing interface allowing using SampleContext as key, so that existing code still work. It will check for MD5 collision (unlikely but not too unlikely, since we only takes the lower 64 bits) and handle it to at least guarantee compilation correctness (conflicting old profile is dropped, instead of returning an old profile with inconsistent context). Other code should not try to use MD5 as key to access the map directly, because it will not be able to handle MD5 collision at all. (see exception at (5) ) (3) Any SampleProfileMap::emplace() followed by SampleContext assignment if newly inserted, should be replaced with SampleProfileMap::Create(), which does the same thing. (4) Previously we ensure an invariant that in SampleProfileMap, the key is equal to the Context of the value, for profile map that is eventually being used for output (as in llvm-profdata/llvm-profgen). Since the key became MD5 hash, only the value keeps the context now, in several places where an intermediate SampleProfileMap is created, each new FunctionSample's context is set immediately after insertion, which is necessary to "remember" the context otherwise irretrievable. (5) When reading a profile, we cache the MD5 values of all functions, because they are used at least twice (one to index into FuncOffsetTable, the other into SampleProfileMap, more if there are additional sections), in this case the SampleProfileMap is directly accessed with MD5 value so that we don't recalculate it each time (expensive) Performance impact: When reading a ~1GB extbinary profile (fixed length MD5, not compressed) with 10 million function names and 2.5 million top level functions (non CS functions, each function has varying nesting level from 0 to 20), this patch improves the function offset table loading time by 20%, and improves full profile read by 5%. Reviewed By: davidxl, snehasish Differential Revision: https://reviews.llvm.org/D147740 show more ...
# 58056ae2	27-Jun-2023	Haojian Wu <hokein.wu@gmail.com>	Revert "[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map" This reverts commit 12e9c7aaa66b7624b5d7666ce2794d912bf9e4b7. The commi Revert "[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map" This reverts commit 12e9c7aaa66b7624b5d7666ce2794d912bf9e4b7. The commit has broken the buildbot, see comment https://reviews.llvm.org/D147740#4451540 show more ...
# 12e9c7aa	24-Jun-2023	William Huang <williamjhuang@google.com>	[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map This is phase 1 of multiple planned improvements on the sample profile loader. [llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map This is phase 1 of multiple planned improvements on the sample profile loader. The major change is to use MD5 hash code ((instead of the function itself) as the key to look up the function offset table and the profiles, which significantly reduce the time it takes to construct the map. The optimization is based on the fact that many practical sample profiles are using MD5 values for function names to reduce profile size, so we shouldn't need to convert the MD5 to a string and then to a SampleContext and use it as the map's key, because it's extremely slow. Several changes to note: (1) For non-CS SampleContext, if it is already MD5 string, the hash value will be its integral value, instead of hashing the MD5 again. In phase 2 this is going to be optimized further using a union to represent MD5 function (without converting it to string) and regular function names. (2) The SampleProfileMap is a wrapper to *map<uint64_t, FunctionSamples>, while providing interface allowing using SampleContext as key, so that existing code still work. It will check for MD5 collision (unlikely but not too unlikely, since we only takes the lower 64 bits) and handle it to at least guarantee compilation correctness (conflicting old profile is dropped, instead of returning an old profile with inconsistent context). Other code should not try to use MD5 as key to access the map directly, because it will not be able to handle MD5 collision at all. (see exception at (5) ) (3) Any SampleProfileMap::emplace() followed by SampleContext assignment if newly inserted, should be replaced with SampleProfileMap::Create(), which does the same thing. (4) Previously we ensure an invariant that in SampleProfileMap, the key is equal to the Context of the value, for profile map that is eventually being used for output (as in llvm-profdata/llvm-profgen). Since the key became MD5 hash, only the value keeps the context now, in several places where an intermediate SampleProfileMap is created, each new FunctionSample's context is set immediately after insertion, which is necessary to "remember" the context otherwise irretrievable. (5) When reading a profile, we cache the MD5 values of all functions, because they are used at least twice (one to index into FuncOffsetTable, the other into SampleProfileMap, more if there are additional sections), in this case the SampleProfileMap is directly accessed with MD5 value so that we don't recalculate it each time (expensive) Performance impact: When reading a ~1GB extbinary profile (fixed length MD5, not compressed) with 10 million function names and 2.5 million top level functions (non CS functions, each function has varying nesting level from 0 to 20), this patch improves the function offset table loading time by 20%, and improves full profile read by 5%. Reviewed By: davidxl, snehasish Differential Revision: https://reviews.llvm.org/D147740 show more ...
# c9a8a0e8	24-Jun-2023	Douglas Yung <douglas.yung@sony.com>	Revert "[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map" This reverts commit 31af18bccea95fe1ae8aa2c51cf7c8e92a1c208e. This chan Revert "[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map" This reverts commit 31af18bccea95fe1ae8aa2c51cf7c8e92a1c208e. This change is causing build failures on many Windows build bots: https://lab.llvm.org/buildbot/#/builders/216/builds/22833 https://lab.llvm.org/buildbot/#/builders/123/builds/19602 https://lab.llvm.org/buildbot/#/builders/172/builds/28315 https://lab.llvm.org/buildbot/#/builders/119/builds/13870 https://lab.llvm.org/buildbot/#/builders/233/builds/794 https://lab.llvm.org/buildbot/#/builders/235/builds/387 https://lab.llvm.org/buildbot/#/builders/13/builds/36921 https://lab.llvm.org/buildbot/#/builders/127/builds/50510 show more ...
Revision tags: llvmorg-16.0.6, llvmorg-16.0.5
# 31af18bc	25-May-2023	William Huang <williamjhuang@google.com>	[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map This is phase 1 of multiple planned improvements on the sample profile loader. [llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map This is phase 1 of multiple planned improvements on the sample profile loader. The major change is to use MD5 hash code ((instead of the function itself) as the key to look up the function offset table and the profiles, which significantly reduce the time it takes to construct the map. The optimization is based on the fact that many practical sample profiles are using MD5 values for function names to reduce profile size, so we shouldn't need to convert the MD5 to a string and then to a SampleContext and use it as the map's key, because it's extremely slow. Several changes to note: (1) For non-CS SampleContext, if it is already MD5 string, the hash value will be its integral value, instead of hashing the MD5 again. In phase 2 this is going to be optimized further using a union to represent MD5 function (without converting it to string) and regular function names. (2) The SampleProfileMap is a wrapper to *map<uint64_t, FunctionSamples>, while providing interface allowing using SampleContext as key, so that existing code still work. It will check for MD5 collision (unlikely but not too unlikely, since we only takes the lower 64 bits) and handle it to at least guarantee compilation correctness (conflicting old profile is dropped, instead of returning an old profile with inconsistent context). Other code should not try to use MD5 as key to access the map directly, because it will not be able to handle MD5 collision at all. (see exception at (5) ) (3) Any SampleProfileMap::emplace() followed by SampleContext assignment if newly inserted, should be replaced with SampleProfileMap::Create(), which does the same thing. (4) Previously we ensure an invariant that in SampleProfileMap, the key is equal to the Context of the value, for profile map that is eventually being used for output (as in llvm-profdata/llvm-profgen). Since the key became MD5 hash, only the value keeps the context now, in several places where an intermediate SampleProfileMap is created, each new FunctionSample's context is set immediately after insertion, which is necessary to "remember" the context otherwise irretrievable. (5) When reading a profile, we cache the MD5 values of all functions, because they are used at least twice (one to index into FuncOffsetTable, the other into SampleProfileMap, more if there are additional sections), in this case the SampleProfileMap is directly accessed with MD5 value so that we don't recalculate it each time (expensive) Performance impact: When reading a ~1GB extbinary profile (fixed length MD5, not compressed) with 10 million function names and 2.5 million top level functions (non CS functions, each function has varying nesting level from 0 to 20), this patch improves the function offset table loading time by 20%, and improves full profile read by 5%. Reviewed By: davidxl, snehasish Differential Revision: https://reviews.llvm.org/D147740 show more ...
Revision tags: llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1
# a98d6a11	27-Mar-2023	wlei <wlei@fb.com>	[SamplePGO] Stale profile matching(part 1) AutoFDO/CSSPGO often has to deal with stale profiles collected on binaries built from several revisions behind release. It’s likely to get incorrect profil [SamplePGO] Stale profile matching(part 1) AutoFDO/CSSPGO often has to deal with stale profiles collected on binaries built from several revisions behind release. It’s likely to get incorrect profile annotations using the stale profile, which results in unstable or low performing binaries. Currently for source location based profile, once a code change causes a profile mismatch, all the locations afterward are mismatched, the affected samples or inlining info are lost. If we can provide a matching framework to reuse parts of the mismatched profile - aka incremental PGO, it will make PGO more stable, also increase the optimization coverage and boost the performance of binary. This patch is the part 1 of stale profile matching, summary of the implementation: - Added a structure for the matching result:`LocToLocMap`, which is a location to location map meaning the location of current build is matched to the location of the previous build(to be used to query the “stale” profile). - In order to use the matching results for sample query, we need to pass them to all the location queries. For code cleanliness, we added a new pointer field(`IRToProfileLocationMap`) to `FunctionSamples`. - Added a wrapper(`mapIRLocToProfileLoc`) for the query to the location, the location from input IR will be remapped to the matched profile location. - Added a new switch `--salvage-stale-profile`. - Some refactoring for the staleness detection. Test case is in part 2 with the matching algorithm. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D147456 show more ...
# 339b8a00	20-Mar-2023	wlei <wlei@fb.com>	[AutoFDO] Use flattened profiles for profile staleness metrics For profile staleness report, before it only counts for the top-level function samples in the nested profile, the samples in the inline [AutoFDO] Use flattened profiles for profile staleness metrics For profile staleness report, before it only counts for the top-level function samples in the nested profile, the samples in the inlinees are ignored. This could affect the quality of the metrics when there are heavily inlined functions. This change adds a feature to flatten the nested profile and we're changing to use flatten profile as the input for stale profile detection and matching. Example for profile flattening: ``` Original profile: _Z3bazi:20301:1000 1: 1000 3: 2000 5: inline1:1600 1: 600 3: inline2:500 1: 500 Flattened profile: _Z3bazi:18701:1000 1: 1000 3: 2000 5: 600 inline1:600 inline1:1100:600 1: 600 3: 500 inline2: 500 inline2:500:500 1: 500 ``` This feature could be useful for offline analysis, like understanding the hotness of each individual function. So I'm adding the support to `llvm-profdata merge` under `--gen-flattened-profile`. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D146452 show more ...
Revision tags: llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3
# 1e692113	14-Feb-2023	Fangrui Song <i@maskray.me>	Move global namespace cl::opt inside llvm::
Revision tags: llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init
# ede60037	29-Jun-2022	Nicolai Hähnle <nicolai.haehnle@amd.com>	ManagedStatic: remove many straightforward uses in llvm (Reapply after revert in e9ce1a588030d8d4004f5d7e443afe46245e9a92 due to Fuchsia test failures. Removed changes in lib/ExecutionEngine/ other ManagedStatic: remove many straightforward uses in llvm (Reapply after revert in e9ce1a588030d8d4004f5d7e443afe46245e9a92 due to Fuchsia test failures. Removed changes in lib/ExecutionEngine/ other than error categories, to be checked in more detail and reapplied separately.) Bulk remove many of the more trivial uses of ManagedStatic in the llvm directory, either by defining a new getter function or, in many cases, moving the static variable directly into the only function that uses it. Differential Revision: https://reviews.llvm.org/D129120 show more ...
# e9ce1a58	10-Jul-2022	Nicolai Hähnle <nicolai.haehnle@amd.com>	Revert "ManagedStatic: remove many straightforward uses in llvm" This reverts commit e6f1f062457c928c18a88c612f39d9e168f65a85. Reverting due to a failure on the fuchsia-x86_64-linux buildbot.
# e6f1f062	29-Jun-2022	Nicolai Hähnle <nicolai.haehnle@amd.com>	ManagedStatic: remove many straightforward uses in llvm Bulk remove many of the more trivial uses of ManagedStatic in the llvm directory, either by defining a new getter function or, in many cases, ManagedStatic: remove many straightforward uses in llvm Bulk remove many of the more trivial uses of ManagedStatic in the llvm directory, either by defining a new getter function or, in many cases, moving the static variable directly into the only function that uses it. Differential Revision: https://reviews.llvm.org/D129120 show more ...
Revision tags: llvmorg-14.0.6, llvmorg-14.0.5
# 557efc9a	04-Jun-2022	Fangrui Song <i@maskray.me>	[llvm] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC Some cl::ZeroOrMore were added to avoid the `may only occur zero or one times!` error. More were added due to cargo cult. Since the err [llvm] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC Some cl::ZeroOrMore were added to avoid the `may only occur zero or one times!` error. More were added due to cargo cult. Since the error has been removed, cl::ZeroOrMore is unneeded. Also remove cl::init(false) while touching the lines. show more ...
Revision tags: llvmorg-14.0.4
# 1662cfa4	09-May-2022	Hongtao Yu <hoy@fb.com>	[CSSPGO][CSProfileConverter] Remove call target samples when including callee samples into caller. When a flat CS profile is converted to a nested profile, the call target samples for inlined callee [CSSPGO][CSProfileConverter] Remove call target samples when including callee samples into caller. When a flat CS profile is converted to a nested profile, the call target samples for inlined callee contexts are left over in the callsite target map. This could cause indirect call promotion to function improperly. One issue is that the inlined callsites are treated with double amount of samples. The other is the inlined callsites are reconsidered for subsequent PGO ICP. I'm fixing this by excluding call targets from the callsite for inlined targets. While fixing this I found that callsite target sum and the number of body samples for that callsite could be mismatched. {D122609} has an explanation and a fix for that on llvm-profgen side. For now I'm tolerating it in this change. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D125266 show more ...
# 23191a4f	12-May-2022	Hongtao Yu <hoy@fb.com>	[CSSPGO][llvm-profgen] Do not duplicate context profiles into base profile when converting CS flat profile to nested. Recent experiments with our two large internal services showed that duplicating [CSSPGO][llvm-profgen] Do not duplicate context profiles into base profile when converting CS flat profile to nested. Recent experiments with our two large internal services showed that duplicating context profiles into base profile caused code size inflation and didn't deliver good performance compared to no such duplication. It was a trick we made to catch up with the CS flat profile and I'm now turning it off by default. The code size inflation mainly comes from the enriched based profiles. A base profile for a function represents the uninlined (or outlined) portion of the whole function running time. Such portion could be very small if a function is inlined into most of its hot callsites. Duplicating context profiles of the function into its base profiles could cause the outlined body to be hot enough and in turn get many of its callees inlined, thus increases the code size. The size inflation could further cause perf regression. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D124796 show more ...
Revision tags: llvmorg-14.0.3
# e36786d1	28-Apr-2022	Hongtao Yu <hoy@fb.com>	[CSSPGO] Rename ProfileIsCSNested and ProfileIsCSFlat To be more clear and definitive, I'm renaming `ProfileIsCSFlat` back to `ProfileIsCS` which stands for full context-sensitive flat profiles. `P [CSSPGO] Rename ProfileIsCSNested and ProfileIsCSFlat To be more clear and definitive, I'm renaming `ProfileIsCSFlat` back to `ProfileIsCS` which stands for full context-sensitive flat profiles. `ProfileIsCSNested` is now renamed to `ProfileIsPreInlined` and is extended to be applicable for CS flat profiles too. More specifically, `ProfileIsPreInlined` is for any kind of profiles (flat or nested) that contain 'ShouldBeInlined' contexts. The flag is encoded in the profile summary section for extbinary profiles and is computed on-the-fly for text profiles. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D122602 show more ...
Revision tags: llvmorg-14.0.2, llvmorg-14.0.1
# 407c721c	14-Mar-2022	Fangrui Song <i@maskray.me>	[Support] Change zlib::compress to return void With a sufficiently large output buffer, the only failure is Z_MEM_ERROR. Check it and call the noreturn report_bad_alloc_error if applicable. resize_f [Support] Change zlib::compress to return void With a sufficiently large output buffer, the only failure is Z_MEM_ERROR. Check it and call the noreturn report_bad_alloc_error if applicable. resize_for_overwrite may call report_bad_alloc_error as well. Now that there is no other error type, we can replace the return type with void and simplify call sites. Reviewed By: ikudrin Differential Revision: https://reviews.llvm.org/D121512 show more ...
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2
# fc97efa4	23-Feb-2022	serge-sans-paille <sguelton@redhat.com>	Cleanup includes: ProfileData Estimation of the impact on preprocessor output: before: 1067349756 after: 1065940348 Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-clea Cleanup includes: ProfileData Estimation of the impact on preprocessor output: before: 1067349756 after: 1065940348 Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D120434 show more ...
# 045f07b7	23-Feb-2022	Fangrui Song <i@maskray.me>	[ProfileData] Remove unused and racy FunctionSamples::Format after D51643 The write may be racy if ThinLTO creates multiple `InProcessThinBackend` instances.
# f0f70ae6	11-Feb-2022	Hongtao Yu <hoy@fb.com>	[CSSPGO] Do not recount callee samples when computing profile summary for nested CS profile. When generating nested CS profile with all calling contexts of a function duplicated into a base profile [CSSPGO] Do not recount callee samples when computing profile summary for nested CS profile. When generating nested CS profile with all calling contexts of a function duplicated into a base profile under `--generate-merged-base-profiles`, do not recount callee samples when computing profile summary. This fixes the profile summary mismatch between flat cs profile and nested cs profile, for both extbinary and text format. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D119494 show more ...
Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2
# 156b1e6b	15-Dec-2021	Dmitri Gribenko <gribozavr@gmail.com>	Explicitly convert StringRef to std::string for compatibility with libstdc++ 5.4.0 For some reason, the user-defined implicit conversion from StringRef to std::string is not invoked by std::map::emp Explicitly convert StringRef to std::string for compatibility with libstdc++ 5.4.0 For some reason, the user-defined implicit conversion from StringRef to std::string is not invoked by std::map::emplace in libstdc++ 5.4.0, even though it works fine on modern systems. show more ...
12 3 4