History log of /llvm-project/bolt/lib/Passes/BinaryPasses.cpp (Results 1 – 25 of 69)
Revision Date Author Comments
# 1b8e0cf0 13-Nov-2024 Maksim Panchenko <maks@fb.com>

[BOLT] Never emit "large" functions (#115974)

"Large" functions are functions that are too big to fit into their
original slots after code modifications. CheckLargeFunctions pass is
designed to pr

[BOLT] Never emit "large" functions (#115974)

"Large" functions are functions that are too big to fit into their
original slots after code modifications. CheckLargeFunctions pass is
designed to prevent such functions from emission. Extend this pass to
work with functions with constant islands.

Now that CheckLargeFunctions covers all functions, it guarantees that we
will never see such functions after code emission on all platforms
(previously it was guaranteed on x86 only). Hence, we can get rid of
RewriteInstance extensions that were meant to support "large" functions.

show more ...


# 9a9af0a2 12-Nov-2024 Shaw Young <58664393+shawbyoung@users.noreply.github.com>

[BOLT] Match blocks with pseudo probes (#99891)

Match inline trees first between profile and the binary: by GUID,
checksum, parent, and inline site for inlined functions. Map profile
probes to bin

[BOLT] Match blocks with pseudo probes (#99891)

Match inline trees first between profile and the binary: by GUID,
checksum, parent, and inline site for inlined functions. Map profile
probes to binary probes via matched inline tree nodes. Each binary probe
has an associated binary basic block. If all probes from one profile
basic block map to the same binary basic block, it’s an exact match,
otherwise the block is determined by majority vote and reported as loose
match.

Pseudo probe matching happens between exact hash matching and call/loose
matching.

Introduce ProbeMatchSpec - a mechanism to match probes belonging to
another binary function. For example, given functions foo and bar:
```
void foo() {
bar();
}
```
profiled binary: bar is not inlined => have top-level function bar
new binary where the profile is applied to: bar is inlined into foo.

Currently, BOLT does 1:1 matching between profile functions and binary
functions based on the name. #100446 will extend this to N:M where
multiple profiles can be matched to one binary function (as in the
example above where binary function foo would use profiles for foo and
bar), and one profile can be matched to multiple binary functions (e.g.
if bar was inlined into multiple functions).

In this diff, ProbeMatchSpecs would only have one BinaryFunctionProfile
(existing name-based matching).

Test Plan: Added match-blocks-with-pseudo-probes.test

Performance test:
- Setup:
- Baseline no-BOLT: Clang with pseudo probes, ThinLTO + CSSPGO
(#79942)
- BOLT fresh: BOLTed Clang using fresh profile,
- BOLT stale (hash): BOLTed Clang using stale profile (collected on
Clang 10K commits back), `-infer-stale-profile` (hash+call block
matching)
- BOLT stale (+probe): BOLTed Clang using stale profile,
`-infer-stale-profile` with `-stale-matching-with-pseudo-probes`
(hash+call+pseudo probe block matching)
- 2S Intel SKX Xeon 6138 with 40C/80T and 256GB RAM, using 20C/40T for
build,
- BOLT profiles are collected on Clang compiling large preprocessed
C++ file.
- Benchmark: building Clang (average of 5 runs), see driver in
aaupov/llvm-devmtg-2022
- Results, wall time, lower is better:
- Baseline no-BOLT: 429.52 +- 2.61s,
- BOLT stale (hash): 413.21 +- 2.19s,
- BOLT stale (+probe): 409.69 +- 1.41s,
- BOLT fresh: 384.50 +- 1.80s.

---------

Co-authored-by: Amir Ayupov <aaupov@fb.com>

show more ...


# 6ee5ff95 25-Oct-2024 Amir Ayupov <aaupov@fb.com>

[BOLT] Add profile density computation

Reuse the definition of profile density from llvm-profgen (#92144):
- the density is computed in perf2bolt using raw samples (perf.data or
pre-aggregated dat

[BOLT] Add profile density computation

Reuse the definition of profile density from llvm-profgen (#92144):
- the density is computed in perf2bolt using raw samples (perf.data or
pre-aggregated data),
- function density is the ratio of dynamically executed function bytes
to the static function size in bytes,
- profile density:
- functions are sorted by density in decreasing order, accumulating
their respective sample counts,
- profile density is the smallest density covering 99% of total sample
count.

In other words, BOLT binary profile density is the minimum amount of
profile information per function (excluding functions in tail 1% sample
count) which is sufficient to optimize the binary well.

The density threshold of 60 was determined through experiments with
large binaries by reducing the sample count and checking resulting
profile density and performance. The threshold is conservative.

perf2bolt would print the warning if the density is below the threshold
and suggest to increase the sampling duration and/or frequency to reach
a given density, e.g.:
```
BOLT-WARNING: BOLT is estimated to optimize better with 2.8x more samples.
```

Test Plan: updated pre-aggregated-perf.test

Reviewers: maksfb, wlei-llvm, rafaelauler, ayermolo, dcci, WenleiHe

Reviewed By: WenleiHe, wlei-llvm

Pull Request: https://github.com/llvm/llvm-project/pull/101094

show more ...


# 344228eb 02-Jul-2024 Amir Ayupov <aaupov@fb.com>

[BOLT] Drop macro-fusion alignment (#97358)

9d0754ada5dbbc0c009bcc2f7824488419cc5530 dropped MC support required for
optimal macro-fusion alignment in BOLT. Remove the support in BOLT as
performan

[BOLT] Drop macro-fusion alignment (#97358)

9d0754ada5dbbc0c009bcc2f7824488419cc5530 dropped MC support required for
optimal macro-fusion alignment in BOLT. Remove the support in BOLT as
performance measurements with large binaries didn't show a significant
improvement.

Test Plan:
macro-fusion alignment was never upstreamed, so no upstream tests are
affected.

show more ...


# c1a7c5ac 20-Jun-2024 aengelke <engelke@in.tum.de>

[MC] Eliminate two symbol-related hash maps (#95464)

Previously, a symbol insertion requires (at least) three hash table
operations:

- Lookup/create entry in Symbols (main symbol table)
- Looku

[MC] Eliminate two symbol-related hash maps (#95464)

Previously, a symbol insertion requires (at least) three hash table
operations:

- Lookup/create entry in Symbols (main symbol table)
- Lookup NextUniqueID to deduplicate identical temporary labels
- Add entry to UsedNames, which is also used to serve as storage for the
symbol name in the MCSymbol.

All three lookups are done with the same name, so combining these into a
single table reduces the number of lookups to one. Thus, a pointer to a
symbol table entry can be passed to createSymbol to avoid a duplicate
lookup of the same name.

The new symbol table entry value is placed in a separate header to avoid
including MCContext in MCSymbol or vice versa.

show more ...


# c8fc234e 22-May-2024 shaw young <58664393+shawbyoung@users.noreply.github.com>

[BOLT][NFC] Eliminate uses of throwing std::map::at (#92950)

Remove calls to std::unordered_map::at, std::map::at, and
std::vector::at.


# a9b67490 22-May-2024 Amir Ayupov <aaupov@fb.com>

[BOLT] Report adjusted program stats from perf2bolt in BAT mode (#91683)


# 87864295 19-May-2024 Amir Ayupov <aaupov@fb.com>

[BOLT] Fix preserved offset in fixDoubleJumps (#92485)


# 54b17fa4 12-May-2024 Amir Ayupov <aaupov@fb.com>

[BOLT] Preserve Offset annotation in fixDoubleJumps (#91898)

Offset annotation was missed when optimizing an unconditional branch to
a tail call.

Test Plan: update bb-with-two-tail-calls.s


# 6b9bca8f 10-May-2024 Amir Ayupov <aaupov@fb.com>

[BOLT] Preserve Offset annotation in SCTC (#91693)

Offset annotation is used in writing BAT tables.

Test Plan: updated sctc-bug4.test


# 6b1cf004 21-Mar-2024 Maksim Panchenko <maks@fb.com>

[BOLT] Add support for Linux kernel static keys jump table (#86090)

Runtime code modification used by static keys is the most ubiquitous
self-modifying feature of the Linux kernel. The idea is to t

[BOLT] Add support for Linux kernel static keys jump table (#86090)

Runtime code modification used by static keys is the most ubiquitous
self-modifying feature of the Linux kernel. The idea is to to eliminate
the condition check and associated conditional jump on a hot path if
that condition (based on a boolean value of a static key) does not
change often. Whenever they condition changes, the kernel runtime
modifies all code paths associated with that key flipping the code
between nop and (unconditional) jump.

show more ...


# bba790db 14-Mar-2024 Maksim Panchenko <maks@fb.com>

[BOLT] Refactor instruction creation interface. NFCI (#85292)

Refactor MCPlusBuilder's create{Instruction}() functions that used to
return bool. We almost never check the return value as we rely on

[BOLT] Refactor instruction creation interface. NFCI (#85292)

Refactor MCPlusBuilder's create{Instruction}() functions that used to
return bool. We almost never check the return value as we rely on
llvm_unreachable() to detect unimplemented functionality. There were a
couple of cases that checked the return value, but they would hit the
unreachable condition first (at least in debug builds) before the return
value gets checked.

show more ...


# 52cf0711 12-Feb-2024 Amir Ayupov <aaupov@fb.com>

[BOLT][NFC] Log through JournalingStreams (#81524)

Make core BOLT functionality more friendly to being used as a
library instead of in our standalone driver llvm-bolt. To
accomplish this, we augme

[BOLT][NFC] Log through JournalingStreams (#81524)

Make core BOLT functionality more friendly to being used as a
library instead of in our standalone driver llvm-bolt. To
accomplish this, we augment BinaryContext with journaling streams
that are to be used by most BOLT code whenever something needs to
be logged to the screen. Users of the library can decide if logs
should be printed to a file, no file or to the screen, as
before. To illustrate this, this patch adds a new option
`--log-file` that allows the user to redirect BOLT logging to a
file on disk or completely hide it by using
`--log-file=/dev/null`. Future BOLT code should now use
`BinaryContext::outs()` for printing important messages instead of
`llvm::outs()`. A new test log.test enforces this by verifying that
no strings are print to screen once the `--log-file` option is
used.

In previous patches we also added a new BOLTError class to report
common and fatal errors, so code shouldn't call exit(1) now. To
easily handle problems as before (by quitting with exit(1)),
callers can now use
`BinaryContext::logBOLTErrorsAndQuitOnFatal(Error)` whenever code
needs to deal with BOLT errors. To test this, we have fatal.s
that checks we are correctly quitting and printing a fatal error
to the screen.

Because this is a significant change by itself, not all code was
yet ported. Code from Profiler libs (DataAggregator and friends)
still print errors directly to screen.

Co-authored-by: Rafael Auler <rafaelauler@fb.com>

Test Plan: NFC

show more ...


# 13d60ce2 12-Feb-2024 Amir Ayupov <aaupov@fb.com>

[BOLT][NFC] Propagate BOLTErrors from Core, RewriteInstance, and passes (2/2) (#81523)

As part of the effort to refactor old error handling code that
would directly call exit(1), in this patch cont

[BOLT][NFC] Propagate BOLTErrors from Core, RewriteInstance, and passes (2/2) (#81523)

As part of the effort to refactor old error handling code that
would directly call exit(1), in this patch continue the migration
on libCore, libRewrite and libPasses to use the new BOLTError
class whenever a failure occurs.

Test Plan: NFC

Co-authored-by: Rafael Auler <rafaelauler@fb.com>

show more ...


# fa7dd491 12-Feb-2024 Amir Ayupov <aaupov@fb.com>

[BOLT][NFC] Add BOLTError and return it from passes (1/2) (#81522)

As part of the effort to refactor old error handling code that
would directly call exit(1), in this patch we add a new class
BOLT

[BOLT][NFC] Add BOLTError and return it from passes (1/2) (#81522)

As part of the effort to refactor old error handling code that
would directly call exit(1), in this patch we add a new class
BOLTError and auxiliary functions `createFatalBOLTError()` and
`createNonFatalBOLTError()` that allow BOLT code to bubble up the
problem to the caller by using the Error class as a return
type (or Expected). Also changes passes to use these.

Co-authored-by: Rafael Auler <rafaelauler@fb.com>

Test Plan: NFC

show more ...


# a5f3d1a8 12-Feb-2024 Amir Ayupov <aaupov@fb.com>

[BOLT][NFC] Return Error from BinaryFunctionPass::runOnFunctions (#81521)

As part of the effort to refactor old error handling code that
would directly call exit(1), in this patch we change the
in

[BOLT][NFC] Return Error from BinaryFunctionPass::runOnFunctions (#81521)

As part of the effort to refactor old error handling code that
would directly call exit(1), in this patch we change the
interface to `BinaryFunctionPass` to return an Error on
`runOnFunctions()`. This gives passes the ability to report a
serious problem to the caller (RewriteInstance class), so the
caller may decide how to best handle the exceptional situation.

Co-authored-by: Rafael Auler <rafaelauler@fb.com>

Test Plan: NFC

show more ...


# 7fe97f04 08-Feb-2024 Maksim Panchenko <maks@fb.com>

[BOLT] Always run CheckLargeFunctions in non-relocation mode (#80922)

We run CheckLargeFunctions pass in non-relocation mode to prevent the
emission of functions that later could not be written to

[BOLT] Always run CheckLargeFunctions in non-relocation mode (#80922)

We run CheckLargeFunctions pass in non-relocation mode to prevent the
emission of functions that later could not be written to the output due
to their large size. The main reason behind the pass is to prevent the
emission of metadata for such functions since this metadata becomes
incorrect if the function is left unmodified.

Currently, the pass is enabled in non-relocation mode only when debug
info output is also enabled. As we emit increasingly more kinds of
metadata, e.g. for the Linux kernel, it becomes more challenging to
track metadata that needs to be fixed. Hence, I'm enabling the pass to
always run in non-relocation mode.

show more ...


# 8ea7f1d2 07-Feb-2024 Maksim Panchenko <maks@fb.com>

[BOLT][NFCI] Keep instruction annotations (#80382)

We used to delete most instruction annotations before code emission. It
was done to release memory taken by annotations and to reduce overall
mem

[BOLT][NFCI] Keep instruction annotations (#80382)

We used to delete most instruction annotations before code emission. It
was done to release memory taken by annotations and to reduce overall
memory consumption. However, since the implementation of annotations has
moved to using existing instruction operands, the memory overhead
associated with them has reduced drastically. I measured that savings
are less than 0.5% on large binaries and processing time is just
slightly reduced if we keep them. Additionally, I plan to use
annotations in pre-emission passes for the Linux kernel rewriter.

show more ...


# 3c64b24e 01-Feb-2024 Amir Ayupov <aaupov@fb.com>

[BOLT] Add extra staleness logging (#80225)

Report two extra metrics:
- # of stale functions with matching block count,
- # of stale blocks with matching instruction count.


# e9309b27 25-Jan-2024 Amir Ayupov <aaupov@fb.com>

[BOLT] Report input staleness (#79496)

It's beneficial to have uniform reporting in both `infer-stale-profile`
on and off cases, primarily for logging purposes.

Without this change, BOLT would r

[BOLT] Report input staleness (#79496)

It's beneficial to have uniform reporting in both `infer-stale-profile`
on and off cases, primarily for logging purposes.

Without this change, BOLT would report "input" staleness in
`infer-stale-profile=0` case (without matching), and "output" staleness
in `infer-stale-profile=1` case (after matching).

This change makes BOLT report "input" staleness in both cases. "Output"
staleness information is printed separately with "BOLT-INFO: inferred
profile..."

show more ...


# f633f325 14-Nov-2023 Maksim Panchenko <maks@fb.com>

[BOLT] Fix NOP instruction emission on x86 (#72186)

Use MCAsmBackend::writeNopData() interface to emit NOP instructions on
x86. There are multiple forms of NOP instruction on x86 with different
si

[BOLT] Fix NOP instruction emission on x86 (#72186)

Use MCAsmBackend::writeNopData() interface to emit NOP instructions on
x86. There are multiple forms of NOP instruction on x86 with different
sizes. Currently, LLVM's assembly/disassembly does not support all forms
correctly which can lead to a breakage of input code semantics, e.g. if
the program relies on NOP instructions for reserving a patch space.

Add "--keep-nops" option to preserve NOP instructions.

show more ...


# ec4a03c6 13-Nov-2023 Maksim Panchenko <maks@fb.com>

[BOLT] Enhance LowerAnnotations pass. NFCI. (#71847)

After #70147, all primary annotation types are stored directly in the
instruction and hence there's no need for the temporary storage we've
use

[BOLT] Enhance LowerAnnotations pass. NFCI. (#71847)

After #70147, all primary annotation types are stored directly in the
instruction and hence there's no need for the temporary storage we've
used previously for repopulating preserved annotations.

show more ...


# c6c04a83 09-Nov-2023 Vladislav Khmelevsky <och95@yandex.ru>

[BOLT] Run EliminateUnreachableBlocks in parallel (#71299)

The wall time for this pass decreased on my laptop from ~80 sec to 5
sec processing the clang.


# e28c393b 06-Nov-2023 maksfb <maks@fb.com>

[BOLT] Reduce the number of emitted symbols. NFCI. (#70175)

We emit a symbol before an instruction for a number of reasons, e.g. for
tracking LocSyms, debug line, or if the instruction has a label

[BOLT] Reduce the number of emitted symbols. NFCI. (#70175)

We emit a symbol before an instruction for a number of reasons, e.g. for
tracking LocSyms, debug line, or if the instruction has a label
annotation. Currently, we may emit multiple symbols per instruction.

Reuse the same label instead of creating and emitting new ones when
possible. I'm planning to refactor EH labels as well in a separate diff.

Change getLabel() to return a pointer instead of std::optional<> since
an empty label should be treated identically to no label.

show more ...


# 43e9eae6 11-Oct-2023 Job Noorman <jnoorman@igalia.com>

[BOLT] Preserve label annotations for injected functions (#68713)

Needed for instrumentation on RISC-V.


123