History log of /llvm-project/llvm/lib/Transforms/Utils/SampleProfileInference.cpp (Results 1 – 25 of 27)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6
# 1650f1b3 15-May-2024 Jay Foad <jay.foad@amd.com>

Fix typo "indicies" (#92232)


Revision tags: llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3
# 5675f44c 19-Aug-2023 Kazu Hirata <kazu@google.com>

[Transforms] Remove unnecessary const from a return type (NFC)


Revision tags: llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4
# 2ee4ddae 12-May-2023 spupyrev <spupyrev@fb.com>

profilie inference changes for stale profile matching

This diff facilitates a new stale profile matching in BOLT: D144500

This is a no-op for existing usages of profi (CSSPGO).

Reviewed By: hoy

D

profilie inference changes for stale profile matching

This diff facilitates a new stale profile matching in BOLT: D144500

This is a no-op for existing usages of profi (CSSPGO).

Reviewed By: hoy

Differential Revision: https://reviews.llvm.org/D150466

show more ...


Revision tags: llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7
# 45b15592 12-Dec-2022 spupyrev <spupyrev@fb.com>

[BOLT] using jump weights in profi

We want to use profile inference (profi) in BOLT for stale profile matching.
This is the second change for existing usages of profi (e.g., CSSPGO):

(i) Added the

[BOLT] using jump weights in profi

We want to use profile inference (profi) in BOLT for stale profile matching.
This is the second change for existing usages of profi (e.g., CSSPGO):

(i) Added the ability to provide (estimated) jump weights for the algorithm. The
goal of the algorithm is to create a valid control flow for a given function
(that is, one in which incoming counts equal outgoing counts for every basic
block while minimally modifying the original input block and jump weights). The
input jump weights will be provided based on collected LBR profiles in BOLT.

(ii) Added the corresponding options to ProfiParams.

(iii) Slightly modified / simplified the construction of the flow network in profi
so as it utilizes fewer auxiliary nodes. This is done by introducing parallel
edges to the network (which is supported by MMF) and reduces the size of the
network from 3*|V| to 2*|V|, where |V| is the number of basic blocks in the
function.

**Inference (profile quality) impact:**
The diff is supposed to be a no-op for the inferred counts. However, our
implementation of MCF is not fully deterministic and might return different
results depending on the input network model. Since we changed the model
construction, there are a few differences in comparison to the original
implementation. I checked manually on an internal benchmark and see a minor
difference (+/- 1 count for certain basic blocks) in just a dozen of instances
(out of 10000+ input functions). Hence, the diff is highly unlikely to have an
impact for existing prod workloads.

**Runtime impact:**
I measure up to 10% speedup for block-only (ie CSSPGO/AutoFDO) inference and up
to 50% speedup for block+jump inference (ie BOLT) in comparison to the original
unoptimized version.

Reviewed By: hoy

Differential Revision: https://reviews.llvm.org/D139870

show more ...


Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2
# 61eb12e1 27-Sep-2022 spupyrev <spupyrev@fb.com>

[BOLT] introducing profi params

We want to use profile inference (**profi**) in BOLT for stale profile matching.
To this end, I am making a few changes modifying the interface of the algorithm.
This

[BOLT] introducing profi params

We want to use profile inference (**profi**) in BOLT for stale profile matching.
To this end, I am making a few changes modifying the interface of the algorithm.
This is the first change for existing usages of profi (e.g., CSSPGO):
- introducing an object holding the algorithmic parameters;
- some renaming of existing options;
- dropped unused option, SampleProfileInferEntryCount, as we don't plan to change its default value;
- no changes in the output / tests.

Reviewed By: hoy

Differential Revision: https://reviews.llvm.org/D134756

show more ...


Revision tags: llvmorg-15.0.1, llvmorg-15.0.0
# fedc5973 03-Sep-2022 Kazu Hirata <kazu@google.com>

[llvm] Use range-based for loops (NFC)


# d0166c61 28-Aug-2022 Kazu Hirata <kazu@google.com>

[Utils] Remove redundaunt declarations (NFC)

Identified with readability-redundant-declaration.


# 56ea4f9b 28-Aug-2022 Kazu Hirata <kazu@google.com>

[Transforms] Qualify auto in range-based for loops (NFC)

Identified with readability-qualified-auto.


Revision tags: llvmorg-15.0.0-rc3
# 50724716 14-Aug-2022 Kazu Hirata <kazu@google.com>

[Transforms] Qualify auto in range-based for loops (NFC)

Identified with readability-qualified-auto.


Revision tags: llvmorg-15.0.0-rc2
# 5deb6782 06-Aug-2022 Fangrui Song <i@maskray.me>

Revert "[SampleProfileInference] Work around odr-use of const non-inline static data member to fix -O0 builds after D120508"

This reverts commit 48c74bb2e2a72830f1068823bfc2f6fd4b53d427.
With C++17

Revert "[SampleProfileInference] Work around odr-use of const non-inline static data member to fix -O0 builds after D120508"

This reverts commit 48c74bb2e2a72830f1068823bfc2f6fd4b53d427.
With C++17 the workaround is no longer needed.

show more ...


Revision tags: llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5
# d86a206f 05-Jun-2022 Fangrui Song <i@maskray.me>

Remove unneeded cl::ZeroOrMore for cl::opt/cl::list options


# 557efc9a 04-Jun-2022 Fangrui Song <i@maskray.me>

[llvm] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC

Some cl::ZeroOrMore were added to avoid the `may only occur zero or one times!`
error. More were added due to cargo cult. Since the err

[llvm] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC

Some cl::ZeroOrMore were added to avoid the `may only occur zero or one times!`
error. More were added due to cargo cult. Since the error has been removed,
cl::ZeroOrMore is unneeded.

Also remove cl::init(false) while touching the lines.

show more ...


Revision tags: llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3
# 851332a1 09-Mar-2022 Benoit Jacob <benoitjacob@google.com>

Fix linking error, undefined class static constants.

Reviewed By: spupyrev

Differential Revision: https://reviews.llvm.org/D121293


# ce29a042 09-Mar-2022 Vitaly Buka <vitalybuka@google.com>

Revert "Attempt to fix linking issue on the bot"

The issue was fixed with 48c74bb2e2a72830f1068823bfc2f6fd4b53d427

This reverts commit ac423a8c8aa87a128e51f3690afc1405d06b8c9d.


# ac423a8c 08-Mar-2022 Vitaly Buka <vitalybuka@google.com>

Attempt to fix linking issue on the bot


# 48c74bb2 08-Mar-2022 Fangrui Song <i@maskray.me>

[SampleProfileInference] Work around odr-use of const non-inline static data member to fix -O0 builds after D120508

MinBaseDistance may be odr-used by std::max, leading to an undefined symbol linker

[SampleProfileInference] Work around odr-use of const non-inline static data member to fix -O0 builds after D120508

MinBaseDistance may be odr-used by std::max, leading to an undefined symbol linker error:

```
ld.lld: error: undefined symbol: (anonymous namespace)::MinCostMaxFlow::MinBaseDistance
>>> referenced by SampleProfileInference.cpp:744 (/home/ray/llvm-project/llvm/lib/Transforms/Utils/SampleProfileInference.cpp:744)
>>> lib/Transforms/Utils/CMakeFiles/LLVMTransformUtils.dir/SampleProfileInference.cpp.o:((anonymous namespace)::FlowAdjuster::jumpDistance(llvm::FlowJump*) const)
```

Since llvm-project is still using C++ 14, workaround it with a cast.

show more ...


Revision tags: llvmorg-14.0.0-rc2
# 81aedab7 24-Feb-2022 spupyrev <spupyrev@fb.com>

introducing some profi flags

Differential Revision: https://reviews.llvm.org/D120508


Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init
# f2ade65f 31-Jan-2022 spupyrev <spupyrev@fb.com>

[CSSPGO] Even flow distribution

Differential Revision: https://reviews.llvm.org/D118640


Revision tags: llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2
# 13d1364a 10-Jan-2022 spupyrev <spupyrev@fb.com>

A better profi rebalancer

This is an extension of **profi** post-processing step that rebalances counts
in CFGs that have basic blocks w/o probes (aka "unknown" blocks). Specifically,
the new versio

A better profi rebalancer

This is an extension of **profi** post-processing step that rebalances counts
in CFGs that have basic blocks w/o probes (aka "unknown" blocks). Specifically,
the new version finds many more "unknown" subgraphs and marks more "unknown"
basic blocks as hot (which prevents unwanted optimization passes).

I see up to 0.5% perf on some (large) binaries, e.g., clang-10 and gcc-8.

The algorithm is still linear and yields no build time overhead.

show more ...


# 5f4ae564 18-Jan-2022 Jan Svoboda <jan_svoboda@apple.com>

[llvm] Remove uses of `std::vector<bool>`

LLVM Programmer’s Manual strongly discourages the use of `std::vector<bool>` and suggests `llvm::BitVector` as a possible replacement.

This patch does just

[llvm] Remove uses of `std::vector<bool>`

LLVM Programmer’s Manual strongly discourages the use of `std::vector<bool>` and suggests `llvm::BitVector` as a possible replacement.

This patch does just that for llvm.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D117121

show more ...


# e7774f49 26-Dec-2021 Kazu Hirata <kazu@google.com>

Use static_assert instead of assert (NFC)

Identified with misc-static-assert.


# 93a2c291 02-Dec-2021 spupyrev <spupyrev@fb.com>

profi - a flow-based profile inference algorithm: Part III (out of 3)

This is a continuation of D109860 and D109903.

An important challenge for profile inference is caused by the fact that the
samp

profi - a flow-based profile inference algorithm: Part III (out of 3)

This is a continuation of D109860 and D109903.

An important challenge for profile inference is caused by the fact that the
sample profile is collected on a fully optimized binary, while the block and
edge frequencies are consumed on an early stage of the compilation that operates
with a non-optimized IR. As a result, some of the basic blocks may not have
associated sample counts, and it is up to the algorithm to deduce missing
frequencies. The problem is illustrated in the figure where three basic
blocks are not present in the optimized binary and hence, receive no samples
during profiling.

We found that it is beneficial to treat all such blocks equally. Otherwise the
compiler may decide that some blocks are “cold” and apply undesirable
optimizations (e.g., hot-cold splitting) regressing the performance. Therefore,
we want to distribute the counts evenly along the blocks with missing samples.
This is achieved by a post-processing step that identifies "dangling" subgraphs
consisting of basic blocks with no sampled counts; once the subgraphs are
found, we rebalance the flow so as every branch probability is 50:50 within the
subgraphs.

Our experiments indicate up to 1% performance win using the optimization on
some binaries and a significant improvement in the quality of profile counts
(when compared to ground-truth instrumentation-based counts)

{F19093045}

Reviewed By: hoy

Differential Revision: https://reviews.llvm.org/D109980

show more ...


# 98dd2f9e 02-Dec-2021 spupyrev <spupyrev@fb.com>

profi - a flow-based profile inference algorithm: Part II (out of 3)

This is a continuation of D109860.

Traditional flow-based algorithms cannot guarantee that the resulting edge
frequencies corres

profi - a flow-based profile inference algorithm: Part II (out of 3)

This is a continuation of D109860.

Traditional flow-based algorithms cannot guarantee that the resulting edge
frequencies correspond to a *connected* flow in the control-flow graph. For
example, for an instance in the attached figure, a flow-based (or any other)
inference algorithm may produce an output in which the hot loop is disconnected
from the entry block (refer to the rightmost graph in the figure). Furthermore,
creating a connected minimum-cost maximum flow is a computationally NP-hard
problem. Hence, we apply a post-processing adjustments to the computed flow
by connecting all isolated flow components ("islands").

This feature helps to keep all blocks with sample counts connected and results
in significant performance wins for some binaries.
{F19077343}

Reviewed By: hoy

Differential Revision: https://reviews.llvm.org/D109903

show more ...


# 22d82949 02-Dec-2021 Kazu Hirata <kazu@google.com>

[llvm] Fix "unused variable" warnings


# 7cc2493d 01-Dec-2021 spupyrev <spupyrev@fb.com>

profi - a flow-based profile inference algorithm: Part I (out of 3)

The benefits of sampling-based PGO crucially depends on the quality of profile
data. This diff implements a flow-based algorithm,

profi - a flow-based profile inference algorithm: Part I (out of 3)

The benefits of sampling-based PGO crucially depends on the quality of profile
data. This diff implements a flow-based algorithm, called profi, that helps to
overcome the inaccuracies in a profile after it is collected.

Profi is an extended and significantly re-engineered classic MCMF (min-cost
max-flow) approach suggested by Levin, Newman, and Haber [2008, Complementing
missing and inaccurate profiling using a minimum cost circulation algorithm]. It
models profile inference as an optimization problem on a control-flow graph with
the objectives and constraints capturing the desired properties of profile data.
Three important challenges that are being solved by profi:
- "fixing" errors in profiles caused by sampling;
- converting basic block counts to edge frequencies (branch probabilities);
- dealing with "dangling" blocks having no samples in the profile.

The main implementation (and required docs) are in SampleProfileInference.cpp.
The worst-time complexity is quadratic in the number of blocks in a function,
O(|V|^2). However a careful engineering and extensive evaluation shows that
the running time is (slightly) super-linear. In particular, instances with
1000 blocks are solved within 0.1 second.

The algorithm has been extensively tested internally on prod workloads,
significantly improving the quality of generated profile data and providing
speedups in the range from 0% to 5%. For "smaller" benchmarks (SPEC06/17), it
generally improves the performance (with a few outliers) but extra work in
the compiler might be needed to re-tune existing optimization passes relying on
profile counts.

UPD Dec 1st 2021:
- synced the declaration and definition of the option `SampleProfileUseProfi ` to use type `cl::opt<bool`;
- added `inline` for `SampleProfileInference<BT>::findUnlikelyJumps` and `SampleProfileInference<BT>::isExit` to avoid linking problems on windows.

Reviewed By: wenlei, hoy

Differential Revision: https://reviews.llvm.org/D109860

show more ...


12