#
4f127667 |
| 11-May-2024 |
Amir Ayupov <aaupov@fb.com> |
[BOLT] Set entry counts in BAT YAML profile (#91775)
Align with DataReader::readProfile that sets entry block counts from
FuncBranchData->EntryData.
Test Plan: updated bolt-address-translation-yaml.test
|
#
bbcdd4f4 |
| 11-May-2024 |
Amir Ayupov <aaupov@fb.com> |
[BOLT] Use disambiguated local names in BAT YAML
Align BAT YAML to fdata profile.
Test Plan: updated register-fragments-bolt-symbols.s
Reviewers: dcci, rafaelauler, ayermolo, maksfb
Reviewed By: dcci
Pull Request: https://github.com/llvm/llvm-project/pull/91773
|
#
db29f20f |
| 08-May-2024 |
Amir Ayupov <aaupov@fb.com> |
[BOLT] Ignore returns in DataAggregator
Returns are ignored in perf/pre-aggregated/fdata profile reader (see DataReader::convertBranchData). They are also omitted in YAMLProfileWriter by virtue of not having the profile attached to them in the reader, and YAMLProfileWriter converting the profile attached to BinaryFunctions. Thus, return profile is universally ignored across all profile types except BAT YAML.
To make returns ignored for YAML produced in BAT mode, we can: 1) ignore them in YAMLProfileReader, 2) omit them from YAML profile in profile conversion/writing.
The first option is prone to profile staleness issue, where the profiled binary doesn't match the one to be optimized, and thus returns in the profile can no longer be reliably detected (as we don't distinguish them from calls in the profile).
The second option is robust to staleness but requires disassembling the branch source instruction.
Test Plan: Updated bolt-address-translation-yaml.test
Reviewers: rafaelauler, dcci, ayermolo, maksfb
Reviewed By: maksfb
Pull Request: https://github.com/llvm/llvm-project/pull/90807
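The second option above (omitting returns while writing the profile) can be sketched as a filter over branch records. This is a minimal illustration, not BOLT's code: `Branch`, `drop_returns`, and the `is_return` callback are hypothetical names, and the callback stands in for disassembling the branch source instruction.

```python
from typing import Callable, Iterable, List, NamedTuple


class Branch(NamedTuple):
    src: int    # address of the branch source instruction
    dst: int    # branch target address
    count: int  # number of times the branch was observed


def drop_returns(branches: Iterable[Branch],
                 is_return: Callable[[int], bool]) -> List[Branch]:
    """Keep only branches whose source instruction is not a return."""
    return [b for b in branches if not is_return(b.src)]


# Stand-in for disassembly: pretend this address holds a return instruction.
RETURN_ADDRS = {0x401010}
sample = [Branch(0x401000, 0x401020, 5), Branch(0x401010, 0x400F00, 3)]
kept = drop_returns(sample, lambda addr: addr in RETURN_ADDRS)
```

Because the check runs against the profiled binary at conversion time, it does not depend on the profile matching the binary that is later optimized.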
|
#
f2d71305 |
| 01-May-2024 |
Amir Ayupov <aaupov@fb.com> |
[BOLT][NFC] Simplify DataAggregator::getFallthroughsInTrace (#90752)
|
#
5fb59e74 |
| 25-Apr-2024 |
Amir Ayupov <aaupov@fb.com> |
[BOLT] Print program stats in perf2bolt/aggregate-only mode (#89763)
|
#
3997f0eb |
| 11-Apr-2024 |
Amir Ayupov <aaupov@fb.com> |
[BOLT] Cover all call sites in writeBATYAML
Call site information setting was conditioned on branch information presence for a given block. However, it's possible to have sampled profile lacking one or the other for a given basic block.
Iterate over branch profiles and call profiles independently to cover all recorded profile data.
Depends on https://github.com/llvm/llvm-project/pull/87569
Test Plan: Updated bolt/test/X86/yaml-secondary-entry-discriminator.s
Reviewers: ayermolo, dcci, maksfb, rafaelauler
Reviewed By: maksfb
Pull Request: https://github.com/llvm/llvm-project/pull/87743
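A rough sketch of the independent iteration described above, assuming branch and call records are simple tuples keyed by a basic-block id (the record shapes and the `build_blocks` name are hypothetical, not the writer's data structures):

```python
from collections import defaultdict


def build_blocks(branches, calls):
    """Collect per-block profile data from two independent record lists.

    branches: iterable of (block, successor, count)
    calls:    iterable of (block, callee, count)
    """
    blocks = defaultdict(lambda: {"successors": [], "calls": []})
    # Walk branch records on their own...
    for block, successor, count in branches:
        blocks[block]["successors"].append((successor, count))
    # ...and call records on their own, so a block with calls but no
    # branch data (or the reverse) is still emitted.
    for block, callee, count in calls:
        blocks[block]["calls"].append((callee, count))
    return dict(blocks)


profile = build_blocks(
    branches=[(0, 1, 10)],
    calls=[(2, "callee", 4)],  # block 2 has call data only
)
```

Nesting the call loop inside the branch loop would drop block 2 here, which is the situation the commit fixes.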
|
#
88409926 |
| 11-Apr-2024 |
Amir Ayupov <aaupov@fb.com> |
[BOLT][BAT] Fix handling of split functions
Move BAT parent function lookup outside `getLocationName`, to the scope where we retrieve `FuncBranchData` linked with the function.
Previously DataAggregator would store branch profile recorded in the split fragment in `FuncBranchData` associated with the fragment, and perform name translation in `getLocationName` for symbol name only. This works for fdata profile which is printed out as-is, but doesn't work with BAT YAML profile writer which requires a combined profile.
The issue necessitated `fixupBATProfile` which partially addressed the issue (reassigned inter-fragment calls back into intra-function branches). However, `fixupBATProfile` fails to address disjoint profiles (i.e. doesn't merge `FuncBranchData` for fragments back into parent). This diff eliminates the need for `fixupBATProfile` by removing the root cause of the issue.
Test Plan: NFC for existing tests
Reviewers: ayermolo, dcci, rafaelauler, maksfb
Reviewed By: maksfb
Pull Request: https://github.com/llvm/llvm-project/pull/87569
|
#
2d3c827c |
| 05-Apr-2024 |
Amir Ayupov <aaupov@fb.com> |
[BOLT] Use BAT for YAML profile call target information
Provide a mechanism to resolve call target information for calls from non-BAT functions to BAT functions (`YAMLProfileWriter::convert`). Make it generic for future use in BAT-to-BAT calls.
Test Plan: Updated bolt/test/X86/bolt-address-translation-yaml.test
Reviewers: ayermolo, maksfb, rafaelauler, dcci
Reviewed By: maksfb
Pull Request: https://github.com/llvm/llvm-project/pull/86219
|
#
213eda15 |
| 25-Mar-2024 |
Amir Ayupov <aaupov@fb.com> |
[BOLT] Add CallSiteInfo entries in YAMLBAT (#76896)
Attach call counters to YAML profile, covering inter-function control
flow.
Depends on: https://github.com/llvm/llvm-project/pull/86218
Test Plan:
Updated bolt/test/X86/bolt-address-translation-yaml.test
|
#
d7d2f7ca |
| 24-Mar-2024 |
Amir Ayupov <aaupov@fb.com> |
[BOLT] Emit intra-function control flow in YAMLBAT
Attach branch counters to YAML profile, covering intra-function control flow.
Depends on: https://github.com/llvm/llvm-project/pull/86353
Test Plan: Updated bolt/test/X86/bolt-address-translation-yaml.test
Reviewers: rafaelauler, dcci, ayermolo, maksfb
Reviewed By: rafaelauler
Pull Request: https://github.com/llvm/llvm-project/pull/76911
|
#
62806811 |
| 21-Mar-2024 |
Amir Ayupov <aaupov@fb.com> |
[BOLT] Output basic YAML profile in BAT mode
Relax assumptions that YAML output is not supported in BAT mode. Set up basic infrastructure for emitting YAML for functions not covered by BAT, such as from `.bolt.org.text` section (code identical to input binary sans external refs), or non-rewritten functions in non-relocation mode (where the function stays in the same section but BAT mapping is not emitted).
This diff only produces YAML profile for non-BAT functions (skipped, non-simple). YAML profile for BAT functions is added in follow-up diffs:
- https://github.com/llvm/llvm-project/pull/76911 emits YAML profile with internal control flow information only (branch profile),
- https://github.com/llvm/llvm-project/pull/76896 adds cross-function profile (calls profile).
Test Plan: Added bolt/test/X86/bolt-address-translation-yaml.test
Reviewers: ayermolo, dcci, maksfb, rafaelauler
Reviewed By: rafaelauler
Pull Request: https://github.com/llvm/llvm-project/pull/76910
|
#
2abcbbd9 |
| 31-Jan-2024 |
Maksim Panchenko <maks@fb.com> |
[BOLT] Detect Linux kernel based on ELF program headers (#80086)
Check if program header addresses fall into the kernel space to detect a
Linux kernel binary on x86-64.
Delete opts::LinuxKernelMode and use BinaryContext::IsLinuxKernel
instead.
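The check can be pictured roughly as follows. This is a sketch under stated assumptions: the `0xFFFFFFFF80000000` threshold is an assumed start of the x86-64 kernel address range, not a value stated in this commit message, and `looks_like_linux_kernel` is a hypothetical name.

```python
# Assumption: start of the x86-64 kernel address range (not from the commit).
KERNEL_START_X86_64 = 0xFFFF_FFFF_8000_0000


def looks_like_linux_kernel(segment_vaddrs):
    """Return True if any program header address falls into kernel space."""
    return any(vaddr >= KERNEL_START_X86_64 for vaddr in segment_vaddrs)


user_binary = looks_like_linux_kernel([0x400000, 0x600000])
kernel_binary = looks_like_linux_kernel([0xFFFF_FFFF_8100_0000])
```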
|
#
ad8fd5b1 |
| 14-Dec-2023 |
Kazu Hirata <kazu@google.com> |
[BOLT] Use StringRef::{starts,ends}_with (NFC)
This patch replaces uses of StringRef::{starts,ends}with with StringRef::{starts,ends}_with for consistency with std::{string,string_view}::{starts,ends}_with in C++20.
I'm planning to deprecate and eventually remove StringRef::{starts,ends}with.
|
#
22bea0c5 |
| 06-Nov-2023 |
Jonathan Davies <jonathan.davies@arm.com> |
[BOLT] Add itrace aggregation for AUX data (#70426)
If you have a perf.data with Arm ETM data the only way to use perf2bolt
with Branch Aggregation is to first run `perf inject --itrace=l64i1us -o
perf-brstack.data` and then pass the new perf-brstack.data into
perf2bolt. perf2bolt then runs `perf script -F pid,ip,brstack` to
produce the brstacks.
This PR adds `--itrace` arg to perf2bolt to enable Itrace Aggregation.
It takes a string that is passed through to `perf script -F
pid,ip,brstack --itrace={0}`. This command produces the brstacks without
having to run perf inject and create a new perf.data file.
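As an illustration of how the option changes the spawned command, here is a minimal sketch that assembles the argument list. The helper name `perf_script_brstack_args` is hypothetical; the `-f -i` suffix mirrors the launch lines perf2bolt prints elsewhere in this log.

```python
def perf_script_brstack_args(perf_data, itrace=None):
    """Build the `perf script` argument list used to read branch stacks."""
    args = ["perf", "script", "-F", "pid,ip,brstack"]
    if itrace is not None:
        # The --itrace value is forwarded verbatim, e.g. "l64i1us".
        args.append(f"--itrace={itrace}")
    args += ["-f", "-i", perf_data]
    return args


plain = perf_script_brstack_args("perf.data")
with_itrace = perf_script_brstack_args("perf.data", itrace="l64i1us")
```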
|
#
5db75d74 |
| 20-Oct-2023 |
Jonathan Davies <jonathan.davies@arm.com> |
[BOLT] Filter itrace from perf script mmap & task events (#69585)
perf2bolt launches a few perf script commands and stores the output in
temporary files before processing the output and cleaning them up before
it exits.
The command `perf script --show-mmap-events` outputs PERF_RECORD_MMAP2
and instruction tracing data but when processed it only looks for
PERF_RECORD_MMAP2 and the instruction tracing data is ignored. This is
fine for small amounts of instruction trace data but when I've recorded
Arm ETM or Intel PT AUX data I get lots of it.
Adding `--no-itrace` makes it show only the PERF_RECORD_MMAP2 records,
which saves time running `perf script`, disk space storing the
output, and time parsing the output.
It is the same for `perf script --show-task-events` where BOLT is only
interested in the PERF_RECORD_COMM & PERF_RECORD_FORK records.
### Data
| Perf Record | Perf Data Size | MMap Size | MMap No Itrace Size |
|---|---|---|---|
| perf record -e cs_etm/@tmc_etr0/u | 137K | 4468K | 0.632K |
| perf record -e intel_pt//u | 890K | 33378K | 0.673K |
|
#
4627446d |
| 12-Sep-2023 |
Amir Ayupov <aaupov@fb.com> |
[BOLT] Fix AutoFDO output format after D154120
AutoFDO profile has no leading 0x in hex dumps.
Reviewed By: #bolt, rafauler
Differential Revision: https://reviews.llvm.org/D159507
|
#
ffef4fe0 |
| 11-Sep-2023 |
Amir Ayupov <aaupov@fb.com> |
[BOLT][NFC] Use formatv in DataAggregator/DataReader prints
Reviewed By: #bolt, maksfb
Differential Revision: https://reviews.llvm.org/D154120
|
#
d796f36f |
| 31-Jul-2023 |
Amir Ayupov <aaupov@fb.com> |
[BOLT][NFC] Simplify DataAggregator
Use short loop instead of duplicating the code for setHasProfileAvailable.
Reviewed By: #bolt, maksfb
Differential Revision: https://reviews.llvm.org/D154749
|
#
224e4cc5 |
| 15-Jun-2023 |
Amir Ayupov <aaupov@fb.com> |
[BOLT] Sort BranchData in DataAggregator
Align perf reader to fdata behavior by sorting BranchData after reading samples, in the same way as DataReader: https://github.com/llvm/llvm-project/blob/20c66a0c66340f44f04b6526e45bcc5d872d480a/bolt/lib/Profile/DataReader.cpp#L1239
Namely, that order affects CallSiteInfo annotations which determine the construction order of CallGraph, which in turn affects function reordering.
Reviewed By: #bolt, rafauler
Differential Revision: https://reviews.llvm.org/D152731
|
#
5acac7db |
| 09-Jun-2023 |
Amir Ayupov <aaupov@fb.com> |
[BOLT][NFCI] Use StringRef.split in launchPerfProcess
Use StringRef method instead of reimplementing the splitting. Incidentally, it also fixes the duplicate printing of the command arguments:
```
PERF2BOLT: spawning perf job to read branch events
Launching perf: /usr/bin/perf script^@-F^@pid,ip,brstack -F^@pid,ip,brstack pid,ip,brstack -f -i
PERF2BOLT: spawning perf job to read mem events
Launching perf: /usr/bin/perf script^@-F^@pid,event,addr,ip -F^@pid,event,addr,ip pid,event,addr,ip -f -i
PERF2BOLT: spawning perf job to read process events
Launching perf: /usr/bin/perf script^@--show-mmap-events --show-mmap-events -f -i
PERF2BOLT: spawning perf job to read task events
Launching perf: /usr/bin/perf script^@--show-task-events --show-task-events -f -i
```
Fixes it to:
```
PERF2BOLT: spawning perf job to read branch events
Launching perf: /usr/bin/perf script -F pid,ip,brstack -f -i
PERF2BOLT: spawning perf job to read mem events
Launching perf: /usr/bin/perf script -F pid,event,addr,ip -f -i
PERF2BOLT: spawning perf job to read process events
Launching perf: /usr/bin/perf script --show-mmap-events -f -i
PERF2BOLT: spawning perf job to read task events
Launching perf: /usr/bin/perf script --show-task-events -f -i
```
Reviewed By: #bolt, rafauler
Differential Revision: https://reviews.llvm.org/D152483
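The idea of relying on a library split rather than a hand-written one can be shown with a short analogue. This sketch uses Python's `str.split` as a stand-in for the StringRef method; `split_perf_args` is a hypothetical helper, not the signature of `launchPerfProcess`.

```python
def split_perf_args(perf_path, args, perf_data):
    """Turn a space-separated argument string into a clean argv list."""
    # One library call yields each argument exactly once, so the printed
    # command line contains no repeated or NUL-joined fragments.
    argv = [perf_path] + args.split(" ")
    argv += ["-f", "-i", perf_data]
    return argv


argv = split_perf_args("/usr/bin/perf", "script -F pid,ip,brstack", "perf.data")
command_line = " ".join(argv)
```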
|
#
c061f755 |
| 07-Jun-2023 |
Amir Ayupov <aaupov@fb.com> |
[BOLT] Handle recursive calls as inter-branches in DataAggregator
Align yaml and fdata profiles by applying the same treatment to recursive calls (direct, indirect, tail). fdata profile increments entry count when handling recursive calls. Make perf/pre-aggregated perf reader (DataAggregator) do the same.
Test Plan: In pre-aggregated-perf.test, add a dummy pre-aggregated branch entry between an indirect call in `frame_dummy` function and its entry point. Check that YAML profile gets incremented entry count for this function.
End-to-end test: https://github.com/rafaelauler/bolt-tests/pull/24
Reviewed By: #bolt, maksfb
Differential Revision: https://reviews.llvm.org/D152338
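One way to picture the treatment described above: a branch that lands on a function's entry point is counted as an entry even when it originates in the same function. The sketch below is an assumption-laden illustration; `record_branch`, its parameters, and the use of offset 0 as the entry point are hypothetical simplifications.

```python
def record_branch(entry_counts, src_func, dst_func, dst_offset, count=1):
    """Record one branch; return True if it was treated as intra-function."""
    # A same-function branch to offset 0 is a recursive call, so it is
    # handled like an inter-function branch and bumps the entry count.
    is_intra = src_func == dst_func and dst_offset != 0
    if not is_intra and dst_offset == 0:
        entry_counts[dst_func] = entry_counts.get(dst_func, 0) + count
    return is_intra


entry_counts = {}
recursive = record_branch(entry_counts, "frame_dummy", "frame_dummy", 0)
local_jump = record_branch(entry_counts, "frame_dummy", "frame_dummy", 0x10)
```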
|
#
713b2853 |
| 06-Jun-2023 |
Amir Ayupov <aaupov@fb.com> |
[BOLT][NFC] Fix debug messages
Fix debug printing, making it easier to compare two debug logs side by side:
- `BinaryFunction::addRelocation`: print function name instead of `this` ptr,
- `DataAggregator::doTrace`: remove duplicated function name.
Reviewed By: #bolt, maksfb
Differential Revision: https://reviews.llvm.org/D152314
|
#
a478a091 |
| 05-Jun-2023 |
Amir Ayupov <aaupov@fb.com> |
[BOLT][NFC] Drop MMap events for deleted files
Don't parse/handle mmap events with "(deleted)" filename.
Reviewed By: #bolt, rafauler
Differential Revision: https://reviews.llvm.org/D151948
|
#
bce889c8 |
| 31-May-2023 |
Amir Ayupov <aaupov@fb.com> |
[BOLT] Align BranchInfo and FuncBranchData in DataAggregator::recordTrace
`DataAggregator::recordTrace` serves two purposes:
- Attaching LBR fallthrough ("trace") information to CFG (`getBranchInfo`), which eventually gets emitted as YAML profile.
- Populating vector of offsets that gets added to `FuncBranchData`, which eventually gets emitted as fdata profile.
`recordTrace` is invoked from `getFallthroughsInTrace` which checks its return status and passes on the collected vector of offsets to `doTrace`.
However, if a malformed trace is passed to `recordTrace` it might partially attach the profile to CFG and exit with false, not propagating the vector of offsets to `doTrace`. This leads to a difference between fdata and yaml profile collected from the same binary and the same perf file.
(Skylake LBR errata might produce such malformed traces where the last entry is duplicated, resulting in invalid fallthrough path between the last two entries).
There are two ways to handle this mismatch: conservative (aligned with fdata), or aggressive (aligned with yaml). Conservative approach would discard the trace entirely, buffering the CFG updates until all fallthroughs are confirmed. Aggressive approach would apply CFG updates and return the matching fallthroughs in the vector even if the trace is invalid (doesn't correspond to a valid fallthrough path). I chose to go with the former (conservative/fdata) approach which produces more accurate profile.
We can't rely on pre-filtering such traces early (in LBR sample processing) as DataAggregator is used for both perf samples and pre-aggregated perf information which loses branch stack information.
Test Plan: https://github.com/rafaelauler/bolt-tests/pull/22
Reviewed By: #bolt, rafauler
Differential Revision: https://reviews.llvm.org/D151614
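The conservative approach chosen above (buffer the CFG updates and apply them only once the whole fallthrough path is confirmed) can be sketched as follows. This is a simplified model, not the BOLT implementation: blocks are plain labels, `fallthrough_of` is a hypothetical map from a block to its layout successor, and `cfg_counts` stands in for the counts attached to the CFG.

```python
def record_trace(cfg_counts, fallthrough_of, start, end, count=1):
    """Apply a fallthrough trace to cfg_counts only if the whole path is valid.

    Returns the list of edges on success, or None if the trace is malformed,
    in which case cfg_counts is left untouched.
    """
    pending = []  # buffered updates, not yet visible in cfg_counts
    block = start
    while block != end:
        nxt = fallthrough_of.get(block)
        # No fallthrough successor (or a runaway walk): discard everything.
        if nxt is None or len(pending) > len(fallthrough_of):
            return None
        pending.append((block, nxt))
        block = nxt
    # The path is confirmed, so commit the buffered updates.
    for edge in pending:
        cfg_counts[edge] = cfg_counts.get(edge, 0) + count
    return pending


layout = {"A": "B", "B": "C"}
counts = {}
good = record_trace(counts, layout, "A", "C")
bad = record_trace(counts, layout, "A", "Z")  # "Z" is never reached
```

With this shape, the data that would feed the YAML profile (`cfg_counts`) and the data that would feed the fdata profile (the returned edge list) are either both updated or both left alone.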
|
#
860543d9 |
| 19-May-2023 |
Amir Ayupov <aaupov@fb.com> |
[BOLT][NFC] Extract DataAggregator::parseLBRSample
Reviewed By: #bolt, rafauler
Differential Revision: https://reviews.llvm.org/D150986
|