History log of /llvm-project/llvm/lib/Transforms/IPO/SampleProfileMatcher.cpp (Results 1 – 8 of 8)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4
# 3a5791e7 04-Nov-2024 Kazu Hirata <kazu@google.com>

[SampleProf] Templatize longestCommonSequence (NFC) (#114633)

This patch moves the implementation of longestCommonSequence to a new
header file.

I'm planning to implement a profile undrifting al

[SampleProf] Templatize longestCommonSequence (NFC) (#114633)

This patch moves the implementation of longestCommonSequence to a new
header file.

I'm planning to implement a profile undrifting algorithm for MemProf
so that the compiler can ingest somewhat stale MemProf profile and
still deliver most of the benefits that would be delivered if the
profile were completely up to date (with no line number or column
number differences).

Since the core undrifting algorithm is the same between MemProf and
AutoFDO, this patch turns longestCommonSequence into a template. The
original longestCommonSequence implementation is repurposed and now
serves as a wrapper around a template specialization.

Note that the usage differences between MemProf and AutoFDO are minor.
For example, I'm planning to use line-column number pair instead of
LineLocation, which uses a discriminator. To identify a function, I'm
planning to use uint64_t GUID instead of FunctionId.

For now, I'm returning matches via a function object InsertMatching
because it's impossible to infer the map type from LineLocation alone.
Specifically:

std::unordered_map<LineLocation, LineLocation>

does not work because we cannot infer the hash functor
LineLocationHash. I could define std::hash<LineLocation>.
Alternatively, in the future, I might switch to DenseMap and define
DenseMapInfo<LineLocation>. This way:

DenseMap<LineLocation, LineLocation>

automatically picks up DenseMapInfo<LineLocation>.

show more ...


Revision tags: llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0
# 6e60330a 04-Sep-2024 Lei Wang <wlei@fb.com>

[SampleFDO] Read call-graph matching recovered top-level function profile (#101053)

With extbinary profile format, initial profile loading only reads
profile based on current function names in the

[SampleFDO] Read call-graph matching recovered top-level function profile (#101053)

With extbinary profile format, initial profile loading only reads
profile based on current function names in the module. However, if a
function is renamed, sample loader skips to load its original
profile(which has a different name), we will miss this case. To address
this, we load the top-level profile candidate explicitly for the
matching. If a match is found later, the function profile will be
further preserved for use by the sample loader.

show more ...


Revision tags: llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init
# 18cdfa72 17-Jul-2024 Lei Wang <wlei@fb.com>

[SampleFDO] Stale profile call-graph matching (#95135)

Profile staleness could be due to function renaming. Given that sample
profile loader relies on exact string matching, a trivial change in the

[SampleFDO] Stale profile call-graph matching (#95135)

Profile staleness could be due to function renaming. Given that sample
profile loader relies on exact string matching, a trivial change in the
function signature( such as `int foo()` --> `long foo()` ) can make the
mangled name different, the function profile(including all nested
children profile) becomes unavailable.

This patch introduces stale profile call-graph level matching, targeting
at identifying the trivial function renaming and reusing the old
function profile.

Some noteworthy details:

1. Extend the LCS based CFG level matching to identify new function.
- Extend to match function and profile have different name instead of
the exact function name matching. This leverages LCS, i.e during the
finding of callsite anchor matching, when two function name are
different, try matching the functions instead of return.
- In LCS, the equal function check is replaced by
`functionMatchesProfile`.
- Only try matching functions that are new functions(neither appears on
each side). This reduces the matching scope as we don't need to match
the originally matched function.
2. Determine the matching by call-site anchor similarity check.
- A new function `functionMatchesProfile(IRFunc, ProfFunc)` is used to
check the renaming for the possible <IRFunc, ProfFunc> pair, use the
LCS(diff) matching to compute the equal set and we define: `Similarity =
|equalSet * 2| / (|A| + |B|)`. The profile name is marked as renamed if
the similarity is above a
threshold(`-func-profile-similarity-threshold`)

3. Process the matching in top-down function order
- when a caller's is done matching, the new function names are saved for
later use, using top-down order will maximize the reused results.
- `ProfileNameToFuncMap` is used to save or cache the matching result.
4. Update the original profile at the end using `ProfileNameToFuncMap`.

5. Added a new switch --salvage-unused-profile to control this, default
is false.

Verified on one Meta's internal big service, confirmed 90%+ of the found
renaming pair is good. (There could be incorrect renaming pair if the
num of the anchor is small, but checked that those functions are simple
cold function)

show more ...


# 21ba91c4 17-Jun-2024 Krzysztof Pszeniczny <kpszeniczny@google.com>

[SamplePGO] Add a cutoff for number of profile matching anchors (#95542)

The algorithm added by PR #87375 can be potentially quadratic in the
number of anchors. This is almost never a problem becau

[SamplePGO] Add a cutoff for number of profile matching anchors (#95542)

The algorithm added by PR #87375 can be potentially quadratic in the
number of anchors. This is almost never a problem because normally
functions have a reasonable number of function calls.

However, in some rare cases of auto-generated code we observed very
large functions that trigger quadratic behaviour here (resulting in
>130GB of peak heap memory usage for clang). Let's add a knob for
controlling the max number of callsites in a function above which stale
profile matching won't be performed.

show more ...


Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6
# 5b6f1511 13-May-2024 Lei Wang <wlei@fb.com>

[SampleFDO] Improve stale profile matching by diff algorithm (#87375)

This change improves the matching algorithm by using the diff algorithm,
the current matching algorithm only processes the call

[SampleFDO] Improve stale profile matching by diff algorithm (#87375)

This change improves the matching algorithm by using the diff algorithm,
the current matching algorithm only processes the callsites grouped by
the same name functions, it doesn't consider the order relationships
between different name functions, this sometimes fails to handle this
ambiguous anchor case. For example. (`Foo:1` means a
calliste[callee_name: callsite_location])
```
IR : foo:1 bar:2 foo:4 bar:5
Profile : bar:3 foo:5 bar:6
```
The `foo:1` is matched to the 2nd `foo:5` and using the diff
algorithm(finding longest common subsequence ) can help on this issue.
One well-known diff algorithm is the Myers diff algorithm(paper "An
O(ND) Difference Algorithm and Its Variations∗" Eugene W. Myers), its
variations have been implemented and used in many famous tools, like the
GNU diff or git diff. It provides an efficient way to find the longest
common subsequence or the shortest edit script through graph searching.
There are several variations/refinements for the algorithm, but as in
our case, the num of function callsites is usually very small, so we
implemented the basic greedy version in this change which should be good
enough.
We observed better matchings and positive perf improvement on our
internal services.

show more ...


# e06d6ed1 02-May-2024 Krzysztof Pszeniczny <kpszeniczny@google.com>

[SamplePGO] Handle FS discriminators in SampleProfileMatcher (#90858)

Currently the code uses FunctionSamples::getCallSiteIdentifier which
will sometimes incorrectly guess that FSAFDO discriminator

[SamplePGO] Handle FS discriminators in SampleProfileMatcher (#90858)

Currently the code uses FunctionSamples::getCallSiteIdentifier which
will sometimes incorrectly guess that FSAFDO discriminators are probe
based and will convert them incorrectly.

This change doesn't affect builds which don't use FSAFDO, it only fixes
sample profile matching with FS discriminators.

The test for this is manually updated to use discriminator value 15,
which is a perfectly valid base discriminator in the FS world, but
satisfies `isPseudoProbeDiscriminator`, so
`getBaseDiscriminatorFromDiscriminator` will incorrectly extract the
probe index from it.

Note: this change only affects how the base discriminators will be
extracted when doing stale profile matching in the IR-level sample
profile loader. It doesn't add stale profile matching to the MIR-level
FS profile loader pass.

show more ...


Revision tags: llvmorg-18.1.5, llvmorg-18.1.4
# 17642c76 03-Apr-2024 Krzysztof Pszeniczny <kpszeniczny@google.com>

[SamplePGO] Support -salvage-stale-profile without probes too (#86116)

Currently -salvage-stale-profile is a no-op if the profile is not
probe-based. We observed that it can help for regular, non-p

[SamplePGO] Support -salvage-stale-profile without probes too (#86116)

Currently -salvage-stale-profile is a no-op if the profile is not
probe-based. We observed that it can help for regular, non-probe- based
profiles too: some of our internal benchmarks show 0.2-0.3% QPS
improvement.

There seems to be no good reason to limit this flag to only work for
probe-based profiles.

show more ...


Revision tags: llvmorg-18.1.3
# 1d99d7a6 29-Mar-2024 Lei Wang <wlei@fb.com>

[SampleFDO][NFC] Refactoring SampleProfileMatcher (#86988)

Move all the stale profile matching stuffs into new files so that it can
be shared for unit testing.