Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init |
|
#
18cdfa72 |
| 17-Jul-2024 |
Lei Wang <wlei@fb.com> |
[SampleFDO] Stale profile call-graph matching (#95135)
Profile staleness could be due to function renaming. Given that sample
profile loader relies on exact string matching, a trivial change in the
[SampleFDO] Stale profile call-graph matching (#95135)
Profile staleness could be due to function renaming. Given that sample
profile loader relies on exact string matching, a trivial change in the
function signature( such as `int foo()` --> `long foo()` ) can make the
mangled name different, the function profile(including all nested
children profile) becomes unavailable.
This patch introduces stale profile call-graph level matching, targeting
at identifying the trivial function renaming and reusing the old
function profile.
Some noteworthy details:
1. Extend the LCS based CFG level matching to identify new function.
- Extend to match function and profile have different name instead of
the exact function name matching. This leverages LCS, i.e during the
finding of callsite anchor matching, when two function name are
different, try matching the functions instead of return.
- In LCS, the equal function check is replaced by
`functionMatchesProfile`.
- Only try matching functions that are new functions(neither appears on
each side). This reduces the matching scope as we don't need to match
the originally matched function.
2. Determine the matching by call-site anchor similarity check.
- A new function `functionMatchesProfile(IRFunc, ProfFunc)` is used to
check the renaming for the possible <IRFunc, ProfFunc> pair, use the
LCS(diff) matching to compute the equal set and we define: `Similarity =
|equalSet * 2| / (|A| + |B|)`. The profile name is marked as renamed if
the similarity is above a
threshold(`-func-profile-similarity-threshold`)
3. Process the matching in top-down function order
- when a caller's is done matching, the new function names are saved for
later use, using top-down order will maximize the reused results.
- `ProfileNameToFuncMap` is used to save or cache the matching result.
4. Update the original profile at the end using `ProfileNameToFuncMap`.
5. Added a new switch --salvage-unused-profile to control this, default
is false.
Verified on one Meta's internal big service, confirmed 90%+ of the found
renaming pair is good. (There could be incorrect renaming pair if the
num of the anchor is small, but checked that those functions are simple
cold function)
show more ...
|
#
ce03155a |
| 09-Jul-2024 |
Mircea Trofin <mtrofin@google.com> |
[NFC] Coding style fixes: SampleProf (#98208)
Also some control flow simplifications.
Notably, this doesn't address `sampleprof_error`. I *think* the style
there tries to match `std::error_categ
[NFC] Coding style fixes: SampleProf (#98208)
Also some control flow simplifications.
Notably, this doesn't address `sampleprof_error`. I *think* the style
there tries to match `std::error_category`.
Also left `hash_value` as-is, because it matches what we do in Hashing.h
show more ...
|
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4 |
|
#
ef0e0adc |
| 17-Oct-2023 |
William Junda Huang <williamjhuang@google.com> |
[llvm-profdata] Do not create numerical strings for MD5 function names read from a Sample Profile. (#66164)
This is phase 2 of the MD5 refactoring on Sample Profile following
https://reviews.llvm.o
[llvm-profdata] Do not create numerical strings for MD5 function names read from a Sample Profile. (#66164)
This is phase 2 of the MD5 refactoring on Sample Profile following
https://reviews.llvm.org/D147740
In previous implementation, when a MD5 Sample Profile is read, the
reader first converts the MD5 values to strings, and then create a
StringRef as if the numerical strings are regular function names, and
later on IPO transformation passes perform string comparison over these
numerical strings for profile matching. This is inefficient since it
causes many small heap allocations.
In this patch I created a class `ProfileFuncRef` that is similar to
`StringRef` but it can represent a hash value directly without any
conversion, and it will be more efficient (I will attach some benchmark
results later) when being used in associative containers.
ProfileFuncRef guarantees the same function name in string form or in
MD5 form has the same hash value, which also fix a few issue in IPO
passes where function matching/lookup only check for function name
string, while returns a no-match if the profile is MD5.
When testing on an internal large profile (> 1 GB, with more than 10
million functions), the full profile load time is reduced from 28 sec to
25 sec in average, and reading function offset table from 0.78s to 0.7s
show more ...
|
Revision tags: llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2 |
|
#
7624de5b |
| 01-Aug-2023 |
William Huang <williamjhuang@google.com> |
[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map
This is phase 1 of multiple planned improvements on the sample profile loader.
[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map
This is phase 1 of multiple planned improvements on the sample profile loader. The major change is to use MD5 hash code ((instead of the function itself) as the key to look up the function offset table and the profiles, which significantly reduce the time it takes to construct the map.
The optimization is based on the fact that many practical sample profiles are using MD5 values for function names to reduce profile size, so we shouldn't need to convert the MD5 to a string and then to a SampleContext and use it as the map's key, because it's extremely slow.
Several changes to note:
(1) For non-CS SampleContext, if it is already MD5 string, the hash value will be its integral value, instead of hashing the MD5 again. In phase 2 this is going to be optimized further using a union to represent MD5 function (without converting it to string) and regular function names.
(2) The SampleProfileMap is a wrapper to *map<uint64_t, FunctionSamples>, while providing interface allowing using SampleContext as key, so that existing code still work. It will check for MD5 collision (unlikely but not too unlikely, since we only takes the lower 64 bits) and handle it to at least guarantee compilation correctness (conflicting old profile is dropped, instead of returning an old profile with inconsistent context). Other code should not try to use MD5 as key to access the map directly, because it will not be able to handle MD5 collision at all. (see exception at (5) )
(3) Any SampleProfileMap::emplace() followed by SampleContext assignment if newly inserted, should be replaced with SampleProfileMap::Create(), which does the same thing.
(4) Previously we ensure an invariant that in SampleProfileMap, the key is equal to the Context of the value, for profile map that is eventually being used for output (as in llvm-profdata/llvm-profgen). Since the key became MD5 hash, only the value keeps the context now, in several places where an intermediate SampleProfileMap is created, each new FunctionSample's context is set immediately after insertion, which is necessary to "remember" the context otherwise irretrievable.
(5) When reading a profile, we cache the MD5 values of all functions, because they are used at least twice (one to index into FuncOffsetTable, the other into SampleProfileMap, more if there are additional sections), in this case the SampleProfileMap is directly accessed with MD5 value so that we don't recalculate it each time (expensive)
Performance impact: When reading a ~1GB extbinary profile (fixed length MD5, not compressed) with 10 million function names and 2.5 million top level functions (non CS functions, each function has varying nesting level from 0 to 20), this patch improves the function offset table loading time by 20%, and improves full profile read by 5%.
Reviewed By: davidxl, snehasish
Differential Revision: https://reviews.llvm.org/D147740
show more ...
|
Revision tags: llvmorg-17.0.0-rc1 |
|
#
1a53b5c3 |
| 28-Jul-2023 |
Aaron Ballman <aaron@aaronballman.com> |
Revert "[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map"
This reverts commit 66ba71d913df7f7cd75e92c0c4265932b7c93292.
Addressin
Revert "[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map"
This reverts commit 66ba71d913df7f7cd75e92c0c4265932b7c93292.
Addressing issues found by: https://lab.llvm.org/buildbot/#/builders/245/builds/11732 https://lab.llvm.org/buildbot/#/builders/187/builds/12251 https://lab.llvm.org/buildbot/#/builders/186/builds/11099 https://lab.llvm.org/buildbot/#/builders/182/builds/6976
show more ...
|
Revision tags: llvmorg-18-init |
|
#
66ba71d9 |
| 12-Jul-2023 |
William Huang <williamjhuang@google.com> |
[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map
This is phase 1 of multiple planned improvements on the sample profile loader.
[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map
This is phase 1 of multiple planned improvements on the sample profile loader. The major change is to use MD5 hash code ((instead of the function itself) as the key to look up the function offset table and the profiles, which significantly reduce the time it takes to construct the map.
The optimization is based on the fact that many practical sample profiles are using MD5 values for function names to reduce profile size, so we shouldn't need to convert the MD5 to a string and then to a SampleContext and use it as the map's key, because it's extremely slow.
Several changes to note:
(1) For non-CS SampleContext, if it is already MD5 string, the hash value will be its integral value, instead of hashing the MD5 again. In phase 2 this is going to be optimized further using a union to represent MD5 function (without converting it to string) and regular function names.
(2) The SampleProfileMap is a wrapper to *map<uint64_t, FunctionSamples>, while providing interface allowing using SampleContext as key, so that existing code still work. It will check for MD5 collision (unlikely but not too unlikely, since we only takes the lower 64 bits) and handle it to at least guarantee compilation correctness (conflicting old profile is dropped, instead of returning an old profile with inconsistent context). Other code should not try to use MD5 as key to access the map directly, because it will not be able to handle MD5 collision at all. (see exception at (5) )
(3) Any SampleProfileMap::emplace() followed by SampleContext assignment if newly inserted, should be replaced with SampleProfileMap::Create(), which does the same thing.
(4) Previously we ensure an invariant that in SampleProfileMap, the key is equal to the Context of the value, for profile map that is eventually being used for output (as in llvm-profdata/llvm-profgen). Since the key became MD5 hash, only the value keeps the context now, in several places where an intermediate SampleProfileMap is created, each new FunctionSample's context is set immediately after insertion, which is necessary to "remember" the context otherwise irretrievable.
(5) When reading a profile, we cache the MD5 values of all functions, because they are used at least twice (one to index into FuncOffsetTable, the other into SampleProfileMap, more if there are additional sections), in this case the SampleProfileMap is directly accessed with MD5 value so that we don't recalculate it each time (expensive)
Performance impact: When reading a ~1GB extbinary profile (fixed length MD5, not compressed) with 10 million function names and 2.5 million top level functions (non CS functions, each function has varying nesting level from 0 to 20), this patch improves the function offset table loading time by 20%, and improves full profile read by 5%.
Reviewed By: davidxl, snehasish
Differential Revision: https://reviews.llvm.org/D147740
show more ...
|
#
58056ae2 |
| 27-Jun-2023 |
Haojian Wu <hokein.wu@gmail.com> |
Revert "[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map"
This reverts commit 12e9c7aaa66b7624b5d7666ce2794d912bf9e4b7.
The commi
Revert "[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map"
This reverts commit 12e9c7aaa66b7624b5d7666ce2794d912bf9e4b7.
The commit has broken the buildbot, see comment https://reviews.llvm.org/D147740#4451540
show more ...
|
#
12e9c7aa |
| 24-Jun-2023 |
William Huang <williamjhuang@google.com> |
[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map
This is phase 1 of multiple planned improvements on the sample profile loader.
[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map
This is phase 1 of multiple planned improvements on the sample profile loader. The major change is to use MD5 hash code ((instead of the function itself) as the key to look up the function offset table and the profiles, which significantly reduce the time it takes to construct the map.
The optimization is based on the fact that many practical sample profiles are using MD5 values for function names to reduce profile size, so we shouldn't need to convert the MD5 to a string and then to a SampleContext and use it as the map's key, because it's extremely slow.
Several changes to note:
(1) For non-CS SampleContext, if it is already MD5 string, the hash value will be its integral value, instead of hashing the MD5 again. In phase 2 this is going to be optimized further using a union to represent MD5 function (without converting it to string) and regular function names.
(2) The SampleProfileMap is a wrapper to *map<uint64_t, FunctionSamples>, while providing interface allowing using SampleContext as key, so that existing code still work. It will check for MD5 collision (unlikely but not too unlikely, since we only takes the lower 64 bits) and handle it to at least guarantee compilation correctness (conflicting old profile is dropped, instead of returning an old profile with inconsistent context). Other code should not try to use MD5 as key to access the map directly, because it will not be able to handle MD5 collision at all. (see exception at (5) )
(3) Any SampleProfileMap::emplace() followed by SampleContext assignment if newly inserted, should be replaced with SampleProfileMap::Create(), which does the same thing.
(4) Previously we ensure an invariant that in SampleProfileMap, the key is equal to the Context of the value, for profile map that is eventually being used for output (as in llvm-profdata/llvm-profgen). Since the key became MD5 hash, only the value keeps the context now, in several places where an intermediate SampleProfileMap is created, each new FunctionSample's context is set immediately after insertion, which is necessary to "remember" the context otherwise irretrievable.
(5) When reading a profile, we cache the MD5 values of all functions, because they are used at least twice (one to index into FuncOffsetTable, the other into SampleProfileMap, more if there are additional sections), in this case the SampleProfileMap is directly accessed with MD5 value so that we don't recalculate it each time (expensive)
Performance impact: When reading a ~1GB extbinary profile (fixed length MD5, not compressed) with 10 million function names and 2.5 million top level functions (non CS functions, each function has varying nesting level from 0 to 20), this patch improves the function offset table loading time by 20%, and improves full profile read by 5%.
Reviewed By: davidxl, snehasish
Differential Revision: https://reviews.llvm.org/D147740
show more ...
|
#
c9a8a0e8 |
| 24-Jun-2023 |
Douglas Yung <douglas.yung@sony.com> |
Revert "[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map"
This reverts commit 31af18bccea95fe1ae8aa2c51cf7c8e92a1c208e.
This chan
Revert "[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map"
This reverts commit 31af18bccea95fe1ae8aa2c51cf7c8e92a1c208e.
This change is causing build failures on many Windows build bots:
https://lab.llvm.org/buildbot/#/builders/216/builds/22833 https://lab.llvm.org/buildbot/#/builders/123/builds/19602 https://lab.llvm.org/buildbot/#/builders/172/builds/28315 https://lab.llvm.org/buildbot/#/builders/119/builds/13870 https://lab.llvm.org/buildbot/#/builders/233/builds/794 https://lab.llvm.org/buildbot/#/builders/235/builds/387 https://lab.llvm.org/buildbot/#/builders/13/builds/36921 https://lab.llvm.org/buildbot/#/builders/127/builds/50510
show more ...
|
Revision tags: llvmorg-16.0.6, llvmorg-16.0.5 |
|
#
31af18bc |
| 25-May-2023 |
William Huang <williamjhuang@google.com> |
[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map
This is phase 1 of multiple planned improvements on the sample profile loader.
[llvm-profdata] Refactoring Sample Profile Reader to increase FDO build speed using MD5 as key to Sample Profile map
This is phase 1 of multiple planned improvements on the sample profile loader. The major change is to use MD5 hash code ((instead of the function itself) as the key to look up the function offset table and the profiles, which significantly reduce the time it takes to construct the map.
The optimization is based on the fact that many practical sample profiles are using MD5 values for function names to reduce profile size, so we shouldn't need to convert the MD5 to a string and then to a SampleContext and use it as the map's key, because it's extremely slow.
Several changes to note:
(1) For non-CS SampleContext, if it is already MD5 string, the hash value will be its integral value, instead of hashing the MD5 again. In phase 2 this is going to be optimized further using a union to represent MD5 function (without converting it to string) and regular function names.
(2) The SampleProfileMap is a wrapper to *map<uint64_t, FunctionSamples>, while providing interface allowing using SampleContext as key, so that existing code still work. It will check for MD5 collision (unlikely but not too unlikely, since we only takes the lower 64 bits) and handle it to at least guarantee compilation correctness (conflicting old profile is dropped, instead of returning an old profile with inconsistent context). Other code should not try to use MD5 as key to access the map directly, because it will not be able to handle MD5 collision at all. (see exception at (5) )
(3) Any SampleProfileMap::emplace() followed by SampleContext assignment if newly inserted, should be replaced with SampleProfileMap::Create(), which does the same thing.
(4) Previously we ensure an invariant that in SampleProfileMap, the key is equal to the Context of the value, for profile map that is eventually being used for output (as in llvm-profdata/llvm-profgen). Since the key became MD5 hash, only the value keeps the context now, in several places where an intermediate SampleProfileMap is created, each new FunctionSample's context is set immediately after insertion, which is necessary to "remember" the context otherwise irretrievable.
(5) When reading a profile, we cache the MD5 values of all functions, because they are used at least twice (one to index into FuncOffsetTable, the other into SampleProfileMap, more if there are additional sections), in this case the SampleProfileMap is directly accessed with MD5 value so that we don't recalculate it each time (expensive)
Performance impact: When reading a ~1GB extbinary profile (fixed length MD5, not compressed) with 10 million function names and 2.5 million top level functions (non CS functions, each function has varying nesting level from 0 to 20), this patch improves the function offset table loading time by 20%, and improves full profile read by 5%.
Reviewed By: davidxl, snehasish
Differential Revision: https://reviews.llvm.org/D147740
show more ...
|
Revision tags: llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1 |
|
#
a98d6a11 |
| 27-Mar-2023 |
wlei <wlei@fb.com> |
[SamplePGO] Stale profile matching(part 1)
AutoFDO/CSSPGO often has to deal with stale profiles collected on binaries built from several revisions behind release. It’s likely to get incorrect profil
[SamplePGO] Stale profile matching(part 1)
AutoFDO/CSSPGO often has to deal with stale profiles collected on binaries built from several revisions behind release. It’s likely to get incorrect profile annotations using the stale profile, which results in unstable or low performing binaries. Currently for source location based profile, once a code change causes a profile mismatch, all the locations afterward are mismatched, the affected samples or inlining info are lost. If we can provide a matching framework to reuse parts of the mismatched profile - aka incremental PGO, it will make PGO more stable, also increase the optimization coverage and boost the performance of binary.
This patch is the part 1 of stale profile matching, summary of the implementation: - Added a structure for the matching result:`LocToLocMap`, which is a location to location map meaning the location of current build is matched to the location of the previous build(to be used to query the “stale” profile). - In order to use the matching results for sample query, we need to pass them to all the location queries. For code cleanliness, we added a new pointer field(`IRToProfileLocationMap`) to `FunctionSamples`. - Added a wrapper(`mapIRLocToProfileLoc`) for the query to the location, the location from input IR will be remapped to the matched profile location. - Added a new switch `--salvage-stale-profile`. - Some refactoring for the staleness detection.
Test case is in part 2 with the matching algorithm.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D147456
show more ...
|
#
339b8a00 |
| 20-Mar-2023 |
wlei <wlei@fb.com> |
[AutoFDO] Use flattened profiles for profile staleness metrics
For profile staleness report, before it only counts for the top-level function samples in the nested profile, the samples in the inline
[AutoFDO] Use flattened profiles for profile staleness metrics
For profile staleness report, before it only counts for the top-level function samples in the nested profile, the samples in the inlinees are ignored. This could affect the quality of the metrics when there are heavily inlined functions. This change adds a feature to flatten the nested profile and we're changing to use flatten profile as the input for stale profile detection and matching. Example for profile flattening:
``` Original profile: _Z3bazi:20301:1000 1: 1000 3: 2000 5: inline1:1600 1: 600 3: inline2:500 1: 500
Flattened profile: _Z3bazi:18701:1000 1: 1000 3: 2000 5: 600 inline1:600 inline1:1100:600 1: 600 3: 500 inline2: 500 inline2:500:500 1: 500 ``` This feature could be useful for offline analysis, like understanding the hotness of each individual function. So I'm adding the support to `llvm-profdata merge` under `--gen-flattened-profile`.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D146452
show more ...
|
Revision tags: llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3 |
|
#
1e692113 |
| 14-Feb-2023 |
Fangrui Song <i@maskray.me> |
Move global namespace cl::opt inside llvm::
|
Revision tags: llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
#
ede60037 |
| 29-Jun-2022 |
Nicolai Hähnle <nicolai.haehnle@amd.com> |
ManagedStatic: remove many straightforward uses in llvm
(Reapply after revert in e9ce1a588030d8d4004f5d7e443afe46245e9a92 due to Fuchsia test failures. Removed changes in lib/ExecutionEngine/ other
ManagedStatic: remove many straightforward uses in llvm
(Reapply after revert in e9ce1a588030d8d4004f5d7e443afe46245e9a92 due to Fuchsia test failures. Removed changes in lib/ExecutionEngine/ other than error categories, to be checked in more detail and reapplied separately.)
Bulk remove many of the more trivial uses of ManagedStatic in the llvm directory, either by defining a new getter function or, in many cases, moving the static variable directly into the only function that uses it.
Differential Revision: https://reviews.llvm.org/D129120
show more ...
|
#
e9ce1a58 |
| 10-Jul-2022 |
Nicolai Hähnle <nicolai.haehnle@amd.com> |
Revert "ManagedStatic: remove many straightforward uses in llvm"
This reverts commit e6f1f062457c928c18a88c612f39d9e168f65a85.
Reverting due to a failure on the fuchsia-x86_64-linux buildbot.
|
#
e6f1f062 |
| 29-Jun-2022 |
Nicolai Hähnle <nicolai.haehnle@amd.com> |
ManagedStatic: remove many straightforward uses in llvm
Bulk remove many of the more trivial uses of ManagedStatic in the llvm directory, either by defining a new getter function or, in many cases,
ManagedStatic: remove many straightforward uses in llvm
Bulk remove many of the more trivial uses of ManagedStatic in the llvm directory, either by defining a new getter function or, in many cases, moving the static variable directly into the only function that uses it.
Differential Revision: https://reviews.llvm.org/D129120
show more ...
|
Revision tags: llvmorg-14.0.6, llvmorg-14.0.5 |
|
#
557efc9a |
| 04-Jun-2022 |
Fangrui Song <i@maskray.me> |
[llvm] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC
Some cl::ZeroOrMore were added to avoid the `may only occur zero or one times!` error. More were added due to cargo cult. Since the err
[llvm] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC
Some cl::ZeroOrMore were added to avoid the `may only occur zero or one times!` error. More were added due to cargo cult. Since the error has been removed, cl::ZeroOrMore is unneeded.
Also remove cl::init(false) while touching the lines.
show more ...
|
Revision tags: llvmorg-14.0.4 |
|
#
1662cfa4 |
| 09-May-2022 |
Hongtao Yu <hoy@fb.com> |
[CSSPGO][CSProfileConverter] Remove call target samples when including callee samples into caller.
When a flat CS profile is converted to a nested profile, the call target samples for inlined callee
[CSSPGO][CSProfileConverter] Remove call target samples when including callee samples into caller.
When a flat CS profile is converted to a nested profile, the call target samples for inlined callee contexts are left over in the callsite target map. This could cause indirect call promotion to function improperly. One issue is that the inlined callsites are treated with double amount of samples. The other is the inlined callsites are reconsidered for subsequent PGO ICP.
I'm fixing this by excluding call targets from the callsite for inlined targets. While fixing this I found that callsite target sum and the number of body samples for that callsite could be mismatched. {D122609} has an explanation and a fix for that on llvm-profgen side. For now I'm tolerating it in this change.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D125266
show more ...
|
#
23191a4f |
| 12-May-2022 |
Hongtao Yu <hoy@fb.com> |
[CSSPGO][llvm-profgen] Do not duplicate context profiles into base profile when converting CS flat profile to nested.
Recent experiments with our two large internal services showed that duplicating
[CSSPGO][llvm-profgen] Do not duplicate context profiles into base profile when converting CS flat profile to nested.
Recent experiments with our two large internal services showed that duplicating context profiles into base profile caused code size inflation and didn't deliver good performance compared to no such duplication. It was a trick we made to catch up with the CS flat profile and I'm now turning it off by default.
The code size inflation mainly comes from the enriched based profiles. A base profile for a function represents the uninlined (or outlined) portion of the whole function running time. Such portion could be very small if a function is inlined into most of its hot callsites. Duplicating context profiles of the function into its base profiles could cause the outlined body to be hot enough and in turn get many of its callees inlined, thus increases the code size. The size inflation could further cause perf regression.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D124796
show more ...
|
Revision tags: llvmorg-14.0.3 |
|
#
e36786d1 |
| 28-Apr-2022 |
Hongtao Yu <hoy@fb.com> |
[CSSPGO] Rename ProfileIsCSNested and ProfileIsCSFlat
To be more clear and definitive, I'm renaming `ProfileIsCSFlat` back to `ProfileIsCS` which stands for full context-sensitive flat profiles. `P
[CSSPGO] Rename ProfileIsCSNested and ProfileIsCSFlat
To be more clear and definitive, I'm renaming `ProfileIsCSFlat` back to `ProfileIsCS` which stands for full context-sensitive flat profiles. `ProfileIsCSNested` is now renamed to `ProfileIsPreInlined` and is extended to be applicable for CS flat profiles too. More specifically, `ProfileIsPreInlined` is for any kind of profiles (flat or nested) that contain 'ShouldBeInlined' contexts. The flag is encoded in the profile summary section for extbinary profiles and is computed on-the-fly for text profiles.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D122602
show more ...
|
Revision tags: llvmorg-14.0.2, llvmorg-14.0.1 |
|
#
407c721c |
| 14-Mar-2022 |
Fangrui Song <i@maskray.me> |
[Support] Change zlib::compress to return void
With a sufficiently large output buffer, the only failure is Z_MEM_ERROR. Check it and call the noreturn report_bad_alloc_error if applicable. resize_f
[Support] Change zlib::compress to return void
With a sufficiently large output buffer, the only failure is Z_MEM_ERROR. Check it and call the noreturn report_bad_alloc_error if applicable. resize_for_overwrite may call report_bad_alloc_error as well.
Now that there is no other error type, we can replace the return type with void and simplify call sites.
Reviewed By: ikudrin
Differential Revision: https://reviews.llvm.org/D121512
show more ...
|
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2 |
|
#
fc97efa4 |
| 23-Feb-2022 |
serge-sans-paille <sguelton@redhat.com> |
Cleanup includes: ProfileData
Estimation of the impact on preprocessor output:
before: 1067349756 after: 1065940348
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-clea
Cleanup includes: ProfileData
Estimation of the impact on preprocessor output:
before: 1067349756 after: 1065940348
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D120434
show more ...
|
#
045f07b7 |
| 23-Feb-2022 |
Fangrui Song <i@maskray.me> |
[ProfileData] Remove unused and racy FunctionSamples::Format after D51643
The write may be racy if ThinLTO creates multiple `InProcessThinBackend` instances.
|
#
f0f70ae6 |
| 11-Feb-2022 |
Hongtao Yu <hoy@fb.com> |
[CSSPGO] Do not recount callee samples when computing profile summary for nested CS profile.
When generating nested CS profile with all calling contexts of a function duplicated into a base profile
[CSSPGO] Do not recount callee samples when computing profile summary for nested CS profile.
When generating nested CS profile with all calling contexts of a function duplicated into a base profile under `--generate-merged-base-profiles`, do not recount callee samples when computing profile summary. This fixes the profile summary mismatch between flat cs profile and nested cs profile, for both extbinary and text format.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D119494
show more ...
|
Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2 |
|
#
156b1e6b |
| 15-Dec-2021 |
Dmitri Gribenko <gribozavr@gmail.com> |
Explicitly convert StringRef to std::string for compatibility with libstdc++ 5.4.0
For some reason, the user-defined implicit conversion from StringRef to std::string is not invoked by std::map::emp
Explicitly convert StringRef to std::string for compatibility with libstdc++ 5.4.0
For some reason, the user-defined implicit conversion from StringRef to std::string is not invoked by std::map::emplace in libstdc++ 5.4.0, even though it works fine on modern systems.
show more ...
|