Revision tags: llvmorg-21-init |
|
#
e6c9cd9c |
| 21-Jan-2025 |
Amir Ayupov <aaupov@fb.com> |
[BOLT] Drop parsing sample PC when processing LBR perf data (#123420)
Remove options to generate autofdo data (unused) and `use-event-pc` (not beneficial).
Cuts down perf2bolt time for 11GB perf.data by 40s (11:10->10:30).
|
Revision tags: llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5 |
|
#
51003076 |
| 02-Dec-2024 |
Paschalis Mpeis <paschalis.mpeis@arm.com> |
Reapply [BOLT] DataAggregator support for binaries with multiple text segments (#118023)
When a binary has multiple text segments, the Size is computed as the
difference of the last address of these segments from the BaseAddress.
The base addresses of all text segments must be the same.
Introduces flag 'perf-script-events' for testing, which allows passing
perf events without BOLT having to parse them by invoking 'perf script'.
The flag is used to pass a mock perf profile that has two memory
mappings for a mock binary that has two text segments. The mapping
size is updated as `parseMMapEvents` now processes all text segments.
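The size rule described above can be illustrated with a minimal Python sketch. This is not BOLT's C++ code: `TextSegment`, its fields, and `mapping_size` are hypothetical names, and the sketch assumes each segment reports the base address it was mapped with.

```python
from typing import NamedTuple


class TextSegment(NamedTuple):
    """Hypothetical record for one executable segment mapping of a binary."""
    base_address: int  # base address reported for the mapping
    start: int         # first mapped address of the segment
    size: int          # segment length in bytes


def mapping_size(segments: list[TextSegment]) -> int:
    """Return the last mapped address of all text segments minus the base address."""
    if not segments:
        raise ValueError("no text segments")
    base = segments[0].base_address
    # The commit message requires all text segments to share one base address.
    if any(s.base_address != base for s in segments):
        raise ValueError("text segments have mismatched base addresses")
    last_address = max(s.start + s.size for s in segments)
    return last_address - base
```

With two segments sharing one base address, the size spans from the base to the end of the furthest segment; a mismatched base address is rejected.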
|
#
537343de |
| 26-Nov-2024 |
Hans Wennborg <hans@chromium.org> |
Revert "[BOLT] DataAggregator support for binaries with multiple text segments (#92815)"
This caused test failures, see comment on the PR:
Failed Tests (2):
BOLT-Unit :: Core/./CoreTests/AArch64/MemoryMapsTester/MultipleSegmentsMismatchedBaseAddress/0
BOLT-Unit :: Core/./CoreTests/X86/MemoryMapsTester/MultipleSegmentsMismatchedBaseAddress/0
> When a binary has multiple text segments, the Size is computed as the
> difference of the last address of these segments from the BaseAddress.
> The base addresses of all text segments must be the same.
>
> Introduces flag 'perf-script-events' for testing. It allows passing perf events
> without BOLT having to parse them using 'perf script'. The flag is used to
> pass a mock perf profile that has two memory mappings for a mock binary
> that has two text segments. The size of the mapping is updated as this
> change `parseMMapEvents` processes all text segments.
This reverts commit 4b71b3782d217db0138b701c4514bd2168ca1659.
|
#
4b71b378 |
| 25-Nov-2024 |
Paschalis Mpeis <paschalis.mpeis@arm.com> |
[BOLT] DataAggregator support for binaries with multiple text segments (#92815)
When a binary has multiple text segments, the Size is computed as the
difference of the last address of these segments from the BaseAddress.
The base addresses of all text segments must be the same.
Introduces flag 'perf-script-events' for testing. It allows passing perf events
without BOLT having to parse them using 'perf script'. The flag is used to
pass a mock perf profile that has two memory mappings for a mock binary
that has two text segments. The size of the mapping is updated because, with
this change, `parseMMapEvents` processes all text segments.
|
Revision tags: llvmorg-19.1.4 |
|
#
74e6478f |
| 08-Nov-2024 |
Amir Ayupov <aaupov@fb.com> |
[BOLT] Set call to continuation count in pre-aggregated profile
#109683 identified an issue with pre-aggregated profile where a call to continuation fallthrough edge count is missing (profile discontinuity).
This issue only affects pre-aggregated profile but not perf data since LBR stack has the necessary information to determine if the trace (fallthrough) starts at call continuation, whereas pre-aggregated fallthrough lacks this information.
The solution is to look at branch records in pre-aggregated profiles that correspond to returns and assign counts to call to continuation fallthrough:
- BranchFrom is in another function or DSO,
- BranchTo may be a call continuation site:
  - not an entry point/landing pad.
Note that we can't directly check if BranchFrom corresponds to a return instruction if it's in external DSO.
Keep call continuation handling for perf data (`getFallthroughsInTrace`) [1] as-is due to marginally better performance. The difference is that return-converted call to continuation fallthrough is slightly more frequent than other fallthroughs since the former only requires one LBR address while the latter needs two that belong to the profiled binary. Hence return-converted fallthroughs have larger "weight" which affects code layout.
[1] `DataAggregator::getFallthroughsInTrace` https://github.com/llvm/llvm-project/blob/fea18afeed39fe4435d67eee1834f0f34b23013d/bolt/lib/Profile/DataAggregator.cpp#L906-L915
Test Plan: added callcont-fallthru.s
Reviewers: maksfb, ayermolo, ShatianWang, dcci
Reviewed By: maksfb, ShatianWang
Pull Request: https://github.com/llvm/llvm-project/pull/109486
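The return-detection conditions listed in this commit can be sketched in Python as follows. This is only an illustration of the stated rule, not BOLT's C++ implementation: `Function`, `find_function`, and `may_be_return_to_call_continuation` are hypothetical names, and entry points and landing pads are assumed to be tracked as offsets within each function.

```python
from dataclasses import dataclass, field


@dataclass
class Function:
    """Hypothetical stand-in for a function in the profiled binary."""
    name: str
    start: int
    size: int
    entry_points: set = field(default_factory=set)  # offsets of entry points
    landing_pads: set = field(default_factory=set)  # offsets of landing pads

    def contains(self, address):
        return self.start <= address < self.start + self.size


def find_function(functions, address):
    """Return the function containing address, or None (e.g. an external DSO)."""
    for func in functions:
        if func.contains(address):
            return func
    return None


def may_be_return_to_call_continuation(functions, branch_from, branch_to):
    """Apply the two conditions from the commit message to one branch record."""
    to_func = find_function(functions, branch_to)
    if to_func is None:
        return False  # target is outside the profiled binary
    # BranchFrom must be in another function or in another DSO (None here).
    if find_function(functions, branch_from) is to_func:
        return False
    # BranchTo must not be an entry point or a landing pad.
    offset = branch_to - to_func.start
    return offset not in to_func.entry_points and offset not in to_func.landing_pads
```

Note that the check is necessarily conservative: as the commit message says, a source address in an external DSO cannot be verified to be a return instruction.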
|
Revision tags: llvmorg-19.1.3 |
|
#
6ee5ff95 |
| 25-Oct-2024 |
Amir Ayupov <aaupov@fb.com> |
[BOLT] Add profile density computation
Reuse the definition of profile density from llvm-profgen (#92144):
- the density is computed in perf2bolt using raw samples (perf.data or pre-aggregated data),
- function density is the ratio of dynamically executed function bytes to the static function size in bytes,
- profile density:
  - functions are sorted by density in decreasing order, accumulating their respective sample counts,
  - profile density is the smallest density covering 99% of total sample count.
In other words, BOLT binary profile density is the minimum amount of profile information per function (excluding functions in tail 1% sample count) which is sufficient to optimize the binary well.
The density threshold of 60 was determined through experiments with large binaries by reducing the sample count and checking resulting profile density and performance. The threshold is conservative.
perf2bolt prints a warning if the density is below the threshold and suggests increasing the sampling duration and/or frequency to reach a given density, e.g.:
```
BOLT-WARNING: BOLT is estimated to optimize better with 2.8x more samples.
```
Test Plan: updated pre-aggregated-perf.test
Reviewers: maksfb, wlei-llvm, rafaelauler, ayermolo, dcci, WenleiHe
Reviewed By: WenleiHe, wlei-llvm
Pull Request: https://github.com/llvm/llvm-project/pull/101094
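The density definition above can be sketched in Python as follows. This is a reading of the commit message rather than BOLT's C++ code: `profile_density` and `suggested_sample_multiplier` are hypothetical names, the input tuples are an assumed representation, and computing the multiplier in the warning as threshold divided by density is an assumption.

```python
def profile_density(functions, coverage=0.99):
    """Return the smallest function density covering `coverage` of all samples.

    functions: iterable of (executed_bytes, size_bytes, sample_count) tuples,
    a hypothetical representation of per-function profile data.
    """
    entries = [
        (executed / size, samples)
        for executed, size, samples in functions
        if size > 0 and samples > 0
    ]
    total = sum(samples for _, samples in entries)
    if total == 0:
        return 0.0
    # Sort by density in decreasing order and accumulate sample counts.
    entries.sort(key=lambda entry: entry[0], reverse=True)
    accumulated = 0
    density = 0.0
    for function_density, samples in entries:
        accumulated += samples
        density = function_density
        if accumulated >= coverage * total:
            break
    return density


def suggested_sample_multiplier(density, threshold=60.0):
    """Assumed form of the factor in the warning: threshold / density."""
    return threshold / density
```

For example, with three functions of density 100, 40, and 1 holding 500, 495, and 5 samples, the first two already cover 99% of the samples, so the profile density is 40 and the low-density tail function is ignored.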
|
#
08916cef |
| 25-Oct-2024 |
Amir Ayupov <aaupov@fb.com> |
[BOLT] Set RawBranchCount in DataAggregator
Align DataAggregator (Linux perf and pre-aggregated profile reader) to DataReader (fdata profile reader) behavior: set BF->RawBranchCount which is used in profile density computation (#101094).
Reviewers: ayermolo, maksfb, dcci, rafaelauler, WenleiHe
Reviewed By: WenleiHe
Pull Request: https://github.com/llvm/llvm-project/pull/101093
|
Revision tags: llvmorg-19.1.2, llvmorg-19.1.1 |
|
#
6d216fb7 |
| 23-Sep-2024 |
Kristof Beyls <kristof.beyls@arm.com> |
[perf2bolt] Improve heuristic to map in-process addresses to specific segments in Elf binary (#109397)
The heuristic is improved by also taking into account that only
executable segments should contain instructions.
Fixes #109384.
|
Revision tags: llvmorg-19.1.0 |
|
#
c00c62c1 |
| 13-Sep-2024 |
Amir Ayupov <aaupov@fb.com> |
[BOLT] Add pseudo probe inline tree to YAML profile
Add probe inline tree information to YAML profile, at function level:
- function GUID,
- checksum,
- parent node id,
- call site in the parent.
This information is used for pseudo probe block matching (#99891).
The encoding adds/changes probe information in multiple levels of YAML profile:
- BinaryProfile: add pseudo_probe_desc with GUIDs and Hashes, which permits deduplication of data:
  - many GUIDs are duplicate as the same callee is commonly inlined into multiple callers,
  - hashes are also very repetitive, especially for functions with low block counts.
- FunctionProfile: add inline tree (see above). Top-level function is included as root of function inline tree, which makes guid and pseudo_probe_desc_hash fields redundant.
- BlockProfile: densely-encoded block probe information:
  - probes reference their containing inline tree node,
  - separate lists for block, call, indirect call probes,
  - block probe encoding is specialized: ids are encoded as bitset in uint64_t. If only block probe with id=1 is present, it's encoded as implicit entry (id=0, omitted).
  - inline tree nodes with identical probes share probe description where node indices are combined into a list.
On top of #107970, profile with new probe encoding has the following characteristics (profile for a large binary):
- Profile without probe information: 33MB, 3.8MB compressed (baseline).
- Profile with inline tree information: 92MB, 14MB compressed.
Profile processing time (YAML parsing, inference, attaching steps):
- profile without pseudo probes: 5s,
- profile with pseudo probes, without pseudo probe matching: 11s,
- with pseudo probe matching: 12.5s.
Test Plan: updated pseudoprobe-decoding-inline.test
Reviewers: wlei-llvm, ayermolo, rafaelauler, dcci, maksfb
Reviewed By: wlei-llvm, rafaelauler
Pull Request: https://github.com/llvm/llvm-project/pull/107137
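The bitset encoding of block probe ids mentioned in this commit can be illustrated with a short Python sketch. The exact bit layout is not given in the commit message, so mapping id N to bit N-1 is an assumption, as is modeling the implicit entry by omitting the field (returning `None`); all function names are hypothetical.

```python
def encode_block_probe_ids(ids):
    """Pack probe ids 1..64 into one 64-bit mask (assumed layout: id N -> bit N-1)."""
    mask = 0
    for probe_id in ids:
        if not 1 <= probe_id <= 64:
            raise ValueError(f"probe id {probe_id} does not fit in a 64-bit set")
        mask |= 1 << (probe_id - 1)
    return mask


def decode_block_probe_ids(mask):
    """Recover the sorted list of probe ids from a mask built by the encoder."""
    return [bit + 1 for bit in range(64) if mask & (1 << bit)]


def block_probe_field(ids):
    """Return the value to store, or None when the entry can be left implicit.

    Assumption: a block whose only block probe has id=1 needs no explicit
    field, which is how this sketch models the "implicit entry" case.
    """
    if set(ids) == {1}:
        return None
    return encode_block_probe_ids(ids)
```

A single integer per block replaces a list of ids, which is one way such an encoding keeps the profile compact.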
|
#
ccc7a072 |
| 11-Sep-2024 |
Amir Ayupov <aaupov@fb.com> |
[BOLT] Drop blocks without profile in BAT YAML (#107970)
Align BAT YAML (DataAggregator) to YAMLProfileWriter which drops blocks
without profile:
https://github.com/llvm/llvm-project/blob/61372fc5db9b14fd612be8a58a76edd7f0ee38aa/bolt/lib/Profile/YAMLProfileWriter.cpp#L162-L176
Test Plan: NFCI
|
#
c820bd3e |
| 11-Sep-2024 |
Amir Ayupov <aaupov@fb.com> |
[BOLT][NFC] Rename profile-use-pseudo-probes
The flag currently controls writing of probe information in YAML profile. #99891 adds a separate flag to use probe information for stale profile matching. Thus `profile-use-pseudo-probes` becomes a misnomer and `profile-write-pseudo-probes` better captures the intent.
Reviewers: maksfb, WenleiHe, ayermolo, rafaelauler, dcci
Reviewed By: rafaelauler
Pull Request: https://github.com/llvm/llvm-project/pull/106364
|
Revision tags: llvmorg-19.1.0-rc4 |
|
#
ee09f7d1 |
| 26-Aug-2024 |
Amir Ayupov <aaupov@fb.com> |
[MC][NFC] Reduce Address2ProbesMap size
Replace the map from addresses to list of probes with a flat vector containing probe references sorted by their addresses.
Reduces pseudo probe parsing time from 9.56s to 8.59s and peak RSS from 9.66 GiB to 9.08 GiB as part of perf2bolt processing a large binary.
Test Plan:
```
bin/llvm-lit -sv test/tools/llvm-profgen
```
Reviewers: maksfb, rafaelauler, dcci, ayermolo, wlei-llvm
Reviewed By: wlei-llvm
Pull Request: https://github.com/llvm/llvm-project/pull/102904
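The data structure change in this commit, from a map of per-address probe lists to one flat vector sorted by address, can be sketched in Python. This shows the general technique only, not the LLVM C++ class: `FlatAddressProbes` and its methods are hypothetical names.

```python
from bisect import bisect_left, bisect_right


class FlatAddressProbes:
    """Probes stored in one list sorted by address, looked up by binary search."""

    def __init__(self, probes):
        # probes: iterable of (address, probe) pairs in any order.
        # sorted() is stable, so probes at the same address keep their order.
        self._entries = sorted(probes, key=lambda entry: entry[0])
        self._addresses = [address for address, _ in self._entries]

    def find(self, address):
        """Return all probes at exactly this address (empty list if none)."""
        lo = bisect_left(self._addresses, address)
        hi = bisect_right(self._addresses, address)
        return [probe for _, probe in self._entries[lo:hi]]
```

A flat sorted container avoids one separately allocated list per address, which is consistent with the reduction in time and peak RSS reported above.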
|
Revision tags: llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1 |
|
#
4d19676d |
| 24-Jul-2024 |
Amir Ayupov <aaupov@fb.com> |
[BOLT] Add profile-use-pseudo-probes option
Move pseudo probe profile generation under --profile-use-pseudo-probes option. Note that updating pseudo probes is independent from this flag.
Test Plan: updated pseudoprobe-decoding-inline.test
Reviewers: maksfb, rafaelauler, ayermolo, dcci, WenleiHe
Reviewed By: WenleiHe
Pull Request: https://github.com/llvm/llvm-project/pull/100299
|
Revision tags: llvmorg-20-init |
|
#
c905db67 |
| 19-Jul-2024 |
Amir Ayupov <aaupov@fb.com> |
[BOLT] Attach pseudo probes to blocks in YAML profile
Read pseudo probes in regular and BAT YAML profile generation, and attach them to YAML profile basic blocks. This exposes GUID, probe id, and probe type in profile for future use in stale profile matching.
Test Plan: updated pseudoprobe-decoding-inline.test
Reviewers: dcci, rafaelauler, ayermolo, maksfb
Reviewed By: rafaelauler
Pull Request: https://github.com/llvm/llvm-project/pull/99554
|
#
9b007a19 |
| 19-Jul-2024 |
Amir Ayupov <aaupov@fb.com> |
[BOLT] Expose pseudo probe function checksum and GUID (#99389)
Add a BinaryFunction field for pseudo probe function GUID.
Populate it during pseudo probe section parsing, and emit it in YAML
profile (both regular and BAT), along with function checksum.
To be used for stale function matching.
Test Plan: update pseudoprobe-decoding-inline.test
|
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7 |
|
#
d1d9545e |
| 24-May-2024 |
Amir Ayupov <aaupov@fb.com> |
[BOLT][BAT] Add entries for deleted basic blocks
Deleted basic blocks are required for correct mapping of branches modified by SCTC.
Increases BAT size, bytes:
- large binary: 8622496 -> 8703244.
- small binary (X86/bolt-address-translation.test): 928 -> 940.
Test Plan: updated bb-with-two-tail-calls.s
Reviewers: ayermolo, dcci, maksfb, rafaelauler
Reviewed By: rafaelauler
Pull Request: https://github.com/llvm/llvm-project/pull/91906
|
#
465bfd41 |
| 22-May-2024 |
Amir Ayupov <aaupov@fb.com> |
[BOLT][NFC] Simplify BBHashMapTy (#91812)
|
#
1529ec08 |
| 22-May-2024 |
Amir Ayupov <aaupov@fb.com> |
[BOLT][NFC] Move out PrintProgramStats from Profile into Rewrite (#93075)
Eliminate the dependence of Profile on Passes.
Test Plan: NFC
|
#
97025bd9 |
| 22-May-2024 |
Amir Ayupov <aaupov@fb.com> |
[BOLT] Use getLocationName in YAMLProfileWriter (#92493)
Disambiguate local functions using the containing file symbol in BAT
mode. Make local function naming consistent across BAT fdata and YAML
profiles.
Test Plan: updated register-fragments-bolt-symbols.s
|
#
a9b67490 |
| 22-May-2024 |
Amir Ayupov <aaupov@fb.com> |
[BOLT] Report adjusted program stats from perf2bolt in BAT mode (#91683)
|
#
1486653d |
| 21-May-2024 |
Kazu Hirata <kazu@google.com> |
[BOLT] Use StringRef::contains (NFC) (#92842)
|
Revision tags: llvmorg-18.1.6 |
|
#
9f15aa00 |
| 17-May-2024 |
Amir Ayupov <aaupov@fb.com> |
[BOLT][NFC] Rename DataAggregator::BranchInfo to TakenBranchInfo
Align the name with its counterpart `FTInfo`, which avoids name aliasing with llvm::bolt::BranchInfo and allows dropping the namespace specifier.
Test Plan: NFC
Reviewers: maksfb, rafaelauler, ayermolo, dcci
Reviewed By: dcci
Pull Request: https://github.com/llvm/llvm-project/pull/92017
|
#
4ecf2caf |
| 13-May-2024 |
Amir Ayupov <aaupov@fb.com> |
[BOLT] Use aggregated FuncBranchData in writeBATYAML
Switch from FuncBranchData intermediate maps (Intra/InterIndex) to aggregated Data, same as one used by DataReader: https://github.com/llvm/llvm-project/blob/e62ce1f8842cca36eb14126d79dcca0a85bf6d36/bolt/lib/Profile/DataReader.cpp#L385-L389 This aligns the order of the output between YAMLProfileWriter and writeBATYAML.
Test Plan: updated bolt-address-translation-yaml.test
Reviewers: rafaelauler, dcci, ayermolo, maksfb
Reviewed By: ayermolo, maksfb
Pull Request: https://github.com/llvm/llvm-project/pull/91289
|
#
f841ca0c |
| 13-May-2024 |
Kazu Hirata <kazu@google.com> |
Use StringRef::operator== instead of StringRef::equals (NFC) (#91864)
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.
- StringRef::operator==/!= outnumber StringRef::equals by a factor of
276 under llvm-project/ in terms of their usage.
- The elimination of StringRef::equals brings StringRef closer to
std::string_view, which has operator== but not equals.
- S == "foo" is more readable than S.equals("foo"), especially for
!Long.Expression.equals("str") vs Long.Expression != "str".
|
#
b5af667b |
| 13-May-2024 |
Amir Ayupov <aaupov@fb.com> |
[BOLT] Map branch source address to the containing basic block in BAT YAML
Fix an issue where the profile for all branches that have a BRANCHENTRY is dropped. If the branch has an entry in BAT, it will be translated to its input offset. We used to only permit the basic block offset as a branch source. Perform a lookup of containing basic block instead.
Test Plan: Updated bolt-address-translation-yaml.test
Reviewers: maksfb, dcci, rafaelauler, ayermolo
Reviewed By: maksfb
Pull Request: https://github.com/llvm/llvm-project/pull/91273
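The lookup of the containing basic block described in this commit can be illustrated with a small Python sketch. It is not the BOLT implementation: `containing_block_offset` is a hypothetical name, and the function's basic blocks are assumed to be available as a sorted list of start offsets.

```python
from bisect import bisect_right


def containing_block_offset(block_offsets, branch_offset):
    """Return the start offset of the basic block containing branch_offset.

    block_offsets: sorted list of basic block start offsets in one function.
    Returns None when the offset precedes the first block.
    """
    # The containing block is the last one starting at or before the offset.
    index = bisect_right(block_offsets, branch_offset) - 1
    if index < 0:
        return None
    return block_offsets[index]
```

A branch source in the middle of a block (for example the input offset obtained through BAT translation) thus resolves to its block start instead of being rejected for not matching a block offset exactly.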
|