Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2 |
|
#
516e3017 |
| 01-Feb-2023 |
Steven Wu <stevenwu@apple.com> |
[NFC][Profile] Access profile through VirtualFileSystem
Make the access to profile data going through virtual file system so the inputs can be remapped. In the context of the caching, it can make su
[NFC][Profile] Access profile through VirtualFileSystem
Make the access to profile data going through virtual file system so the inputs can be remapped. In the context of the caching, it can make sure we capture the inputs and provided an immutable input as profile data.
Reviewed By: akyrtzi, benlangmuir
Differential Revision: https://reviews.llvm.org/D139052
show more ...
|
Revision tags: llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7 |
|
#
da2f5d0a |
| 14-Dec-2022 |
Fangrui Song <i@maskray.me> |
[tools] llvm::Optional => std::optional
|
Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2 |
|
#
c136d858 |
| 20-Sep-2022 |
wlei <wlei@fb.com> |
[llvm-profgen] Remove CommaSeparated option from perf file cl
There could be comma in one perf file path, since at this point it only support one file as input, there is no need for the `llvm::cl::M
[llvm-profgen] Remove CommaSeparated option from perf file cl
There could be comma in one perf file path, since at this point it only support one file as input, there is no need for the `llvm::cl::MiscFlags::CommaSeparated` option.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D134287
show more ...
|
Revision tags: llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
#
aa58b7b1 |
| 24-Jun-2022 |
wlei <wlei@fb.com> |
[CSSPGO][llvm-profgen] Reimplement computeSummaryAndThreshold using context trie
Follow-up patch to https://reviews.llvm.org/D125246, support `computeSummaryAndThreshold` based on context trie.
Rev
[CSSPGO][llvm-profgen] Reimplement computeSummaryAndThreshold using context trie
Follow-up patch to https://reviews.llvm.org/D125246, support `computeSummaryAndThreshold` based on context trie.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D127026
show more ...
|
Revision tags: llvmorg-14.0.6, llvmorg-14.0.5 |
|
#
d86a206f |
| 05-Jun-2022 |
Fangrui Song <i@maskray.me> |
Remove unneeded cl::ZeroOrMore for cl::opt/cl::list options
|
#
36c7d79d |
| 04-Jun-2022 |
Fangrui Song <i@maskray.me> |
Remove unneeded cl::ZeroOrMore for cl::opt options
Similar to 557efc9a8b68628c2c944678c6471dac30ed9e8e. This commit handles options where cl::ZeroOrMore is more than one line below cl::opt.
|
#
557efc9a |
| 04-Jun-2022 |
Fangrui Song <i@maskray.me> |
[llvm] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC
Some cl::ZeroOrMore were added to avoid the `may only occur zero or one times!` error. More were added due to cargo cult. Since the err
[llvm] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC
Some cl::ZeroOrMore were added to avoid the `may only occur zero or one times!` error. More were added due to cargo cult. Since the error has been removed, cl::ZeroOrMore is unneeded.
Also remove cl::init(false) while touching the lines.
show more ...
|
Revision tags: llvmorg-14.0.4, llvmorg-14.0.3 |
|
#
e36786d1 |
| 28-Apr-2022 |
Hongtao Yu <hoy@fb.com> |
[CSSPGO] Rename ProfileIsCSNested and ProfileIsCSFlat
To be more clear and definitive, I'm renaming `ProfileIsCSFlat` back to `ProfileIsCS` which stands for full context-sensitive flat profiles. `P
[CSSPGO] Rename ProfileIsCSNested and ProfileIsCSFlat
To be more clear and definitive, I'm renaming `ProfileIsCSFlat` back to `ProfileIsCS` which stands for full context-sensitive flat profiles. `ProfileIsCSNested` is now renamed to `ProfileIsPreInlined` and is extended to be applicable for CS flat profiles too. More specifically, `ProfileIsPreInlined` is for any kind of profiles (flat or nested) that contain 'ShouldBeInlined' contexts. The flag is encoded in the profile summary section for extbinary profiles and is computed on-the-fly for text profiles.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D122602
show more ...
|
Revision tags: llvmorg-14.0.2 |
|
#
17f6cba3 |
| 15-Apr-2022 |
Wenlei He <aktoon@gmail.com> |
[llvm-profgen] Add process filter for perf reader
For profile generation, we need to filter raw perf samples for binary of interest. Sometimes binary name along isn't enough as we can have binary of
[llvm-profgen] Add process filter for perf reader
For profile generation, we need to filter raw perf samples for binary of interest. Sometimes binary name along isn't enough as we can have binary of the same name running in the system. This change adds a process id filter to allow users to further disambiguiate the input raw samples.
Differential Revision: https://reviews.llvm.org/D123869
show more ...
|
Revision tags: llvmorg-14.0.1 |
|
#
937924eb |
| 30-Mar-2022 |
Hongtao Yu <hoy@fb.com> |
[llvm-profgen] Read sample profiles for post-processing.
Sometimes we would like to run post-processing repeatedly on the original sample profile for tuning. In order to avoid regenerating the origi
[llvm-profgen] Read sample profiles for post-processing.
Sometimes we would like to run post-processing repeatedly on the original sample profile for tuning. In order to avoid regenerating the original profile from scratch every time, this change adds the support of reading in the original profile (called symbolized profile) and running the post-processor on it.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D121655
show more ...
|
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2 |
|
#
db29f437 |
| 23-Feb-2022 |
serge-sans-paille <sguelton@redhat.com> |
Cleanup include: DebugInfo/Symbolize
Estimation of the impact on preprocessor output after: 1067349756 before:1067487786
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-
Cleanup include: DebugInfo/Symbolize
Estimation of the impact on preprocessor output after: 1067349756 before:1067487786
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D120433
show more ...
|
Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init |
|
#
fec57e5b |
| 01-Feb-2022 |
Hongtao Yu <hoy@fb.com> |
Revert "[llvm-profgen] Clean up unnecessary memory reservations between phases."
This reverts commit 057e784b0962a7c5a17e858932bb6f03c7676c47.
|
#
057e784b |
| 01-Feb-2022 |
Hongtao Yu <hoy@fb.com> |
[llvm-profgen] Clean up unnecessary memory reservations between phases.
Cleaning up data structures that are not used after a certain point. This further brings down peak memory usage by 15% for a l
[llvm-profgen] Clean up unnecessary memory reservations between phases.
Cleaning up data structures that are not used after a certain point. This further brings down peak memory usage by 15% for a large benchmark.
Before: note: Before parsePerfTraces note: VM: 40.73 GB RSS: 39.18 GB note: Before parseAndAggregateTrace note: VM: 40.73 GB RSS: 39.18 GB note: After parseAndAggregateTrace note: VM: 88.93 GB RSS: 87.97 GB note: Before generateUnsymbolizedProfile note: VM: 88.95 GB RSS: 87.99 GB note: After generateUnsymbolizedProfile note: VM: 93.50 GB RSS: 92.53 GB note: After computeSizeForProfiledFunctions note: VM: 101.13 GB RSS: 99.36 GB note: After generateProbeBasedProfile note: VM: 215.61 GB RSS: 210.88 GB note: After postProcessProfiles note: VM: 237.48 GB RSS: 212.50 GB
After: note: Before parsePerfTraces note: VM: 40.73 GB RSS: 39.18 GB note: Before parseAndAggregateTrace note: VM: 40.73 GB RSS: 39.18 GB note: After parseAndAggregateTrace note: VM: 88.93 GB RSS: 87.96 GB note: Before generateUnsymbolizedProfile note: VM: 88.95 GB RSS: 87.97 GB note: After generateUnsymbolizedProfile note: VM: 93.50 GB RSS: 92.51 GB note: After computeSizeForProfiledFunctions note: VM: 93.50 GB RSS: 92.53 GB note: After generateProbeBasedProfile note: VM: 164.87 GB RSS: 163.55 GB note: After postProcessProfiles note: VM: 182.28 GB RSS: 179.43 GB
Reviewed By: wenlei, wlei
Differential Revision: https://reviews.llvm.org/D118677
show more ...
|
#
6693c562 |
| 25-Jan-2022 |
wlei <wlei@fb.com> |
[llvm-profgen] Support to load debug info from a second binary
For reducing binary size purpose, the binary's debug info and executable segment can be separated(like using objcopy --only-keep-debug)
[llvm-profgen] Support to load debug info from a second binary
For reducing binary size purpose, the binary's debug info and executable segment can be separated(like using objcopy --only-keep-debug). Here add support in llvm-profgen to use two binaries as input. The original one is executable binary and added for debug info only binary. Adding a flag `--debug-binary=file-path`, with this, the binary will load debug info from debug binary.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D115948
show more ...
|
Revision tags: llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2 |
|
#
5740bb80 |
| 14-Dec-2021 |
Hongtao Yu <hoy@fb.com> |
[CSSPGO] Use nested context-sensitive profile.
CSSPGO currently employs a flat profile format for context-sensitive profiles. Such a flat profile allows for precisely manipulating contexts that is e
[CSSPGO] Use nested context-sensitive profile.
CSSPGO currently employs a flat profile format for context-sensitive profiles. Such a flat profile allows for precisely manipulating contexts that is either inlined or not inlined. This is a benefit over the nested profile format used by non-CS AutoFDO. A downside of this is the longer build time due to parsing the indexing the full CS contexts.
For a CS flat profile, though only the context profiles relevant to a module are loaded when that module is compiled, the cost to figure out what profiles are relevant is noticeably high when there're many contexts, since the sample reader will need to scan all context strings anyway. On the contrary, a nested function profile has its related inline subcontexts isolated from other unrelated contexts. Therefore when compiling a set of functions, unrelated contexts will never need to be scanned.
In this change we are exploring using nested profile format for CSSPGO. This is expected to work based on an assumption that with a preinliner-computed profile all contexts are precomputed and expected to be inlined by the compiler. Contexts not expected to be inlined will be cut off and returned to corresponding base profiles (for top-level outlined functions). This naturally forms a nested profile where all nested contexts are expected to be inlined. The compiler will less likely optimize on derived contexts that are not precomputed.
A CS-nested profile will look exactly the same with regular nested profile except that each nested profile can come with an attributes. With pseudo probes, a nested profile shown as below can also have a CFG checksum.
```
main:1968679:12 2: 24 3: 28 _Z5funcAi:18 3.1: 28 _Z5funcBi:30 3: _Z5funcAi:1467398 0: 10 1: 10 _Z8funcLeafi:11 3: 24 1: _Z8funcLeafi:1467299 0: 6 1: 6 3: 287884 4: 287864 _Z3fibi:315608 15: 23 !CFGChecksum: 138828622701 !Attributes: 2 !CFGChecksum: 281479271677951 !Attributes: 2 ```
Specific work included in this change: - A recursive profile converter to convert CS flat profile to nested profile. - Extend function checksum and attribute metadata to be stored in nested way for text profile and extbinary profile. - Unifiy sample loader inliner path for CS and preinlined nested profile. - Changes in the sample loader to support probe-based nested profile.
I've seen promising results regarding build time. A nested profile can result in a 20% shorter build time than a CS flat profile while keep an on-par performance. This is with -duplicate-contexts-into-base=1.
Test Plan:
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D115205
show more ...
|
Revision tags: llvmorg-13.0.1-rc1 |
|
#
a5f411b7 |
| 15-Oct-2021 |
wlei <wlei@fb.com> |
[llvm-profgen] Allow unsymbolized profile as perf input
This change allows the unsymbolized profile as input. The unsymbolized profile is created by `llvm-profgen` with `--skip-symbolization` and it
[llvm-profgen] Allow unsymbolized profile as perf input
This change allows the unsymbolized profile as input. The unsymbolized profile is created by `llvm-profgen` with `--skip-symbolization` and it's after the sample aggregation but before symbolization , so it has much small file size. It can be used for sample merging and trimming, also is useful for debugging or adding test cases. A switch `--unsymbolized-profile=file-patch` is added for this.
Format of unsymbolized profile: ```
[context stack1] # If it's a CS profile number of entries in RangeCounter from_1-to_1:count_1 from_2-to_2:count_2 ...... from_n-to_n:count_n number of entries in BranchCounter src_1->dst_1:count_1 src_2->dst_2:count_2 ...... src_n->dst_n:count_n [context stack2] ...... ```
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D111750
show more ...
|
#
1f0bc617 |
| 30-Sep-2021 |
Wenlei He <aktoon@gmail.com> |
[llvm-porfgen] Allow perf data as input
This change enables llvm-profgen to take raw perf data as alternative input format. Sometimes we need to retrieve evenets for processes with matching binary.
[llvm-porfgen] Allow perf data as input
This change enables llvm-profgen to take raw perf data as alternative input format. Sometimes we need to retrieve evenets for processes with matching binary. Using perf data as input allows us to retrieve process Ids from mmap events for matching binary, then filter by process id during perf script generation.
Differential Revision: https://reviews.llvm.org/D110793
show more ...
|
#
941191aa |
| 29-Sep-2021 |
Wenlei He <aktoon@gmail.com> |
[llvm-profgen] Refactor and better diagnostics
This change contains diagnostics improvments, refactoring and preparation for consuming perf data directly.
Diagnostics: - We now have more detailed
[llvm-profgen] Refactor and better diagnostics
This change contains diagnostics improvments, refactoring and preparation for consuming perf data directly.
Diagnostics: - We now have more detailed diagnostics when no mmap is found. - We also print warning for abnormal transition to external code.
Refactoring: - Simplify input perf trace processing to only allow a single input file. This is because 1) using multiple input perf trace (perf script) is error prone because we may miss key mmap events. 2) the functionality is not really being used anyways. - Make more functions private for Readers, move non-trivial definitions out of header. Cleanup some inconsistency. - Prepare for consuming perf data as input directly.
Differential Revision: https://reviews.llvm.org/D110729
show more ...
|
#
a03cf331 |
| 24-Sep-2021 |
wlei <wlei@fb.com> |
[llvm-profgen] Strip context to support non-CS profile generation for hybrid sample
Differential Revision: https://reviews.llvm.org/D109769
|
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4 |
|
#
d5f20130 |
| 23-Sep-2021 |
wlei <wlei@fb.com> |
[AutoFDO][llvm-profgen] Profile generation for LBR(non-CS) sample
This patch introduces non-CS AutoFDO profile generation into LLVM. The profile is supposed to be well consumed by compiler using `-f
[AutoFDO][llvm-profgen] Profile generation for LBR(non-CS) sample
This patch introduces non-CS AutoFDO profile generation into LLVM. The profile is supposed to be well consumed by compiler using `-fprofile-sample-use=[profile]`.
After range and branch counters are extracted from the LBR sample, here we go through each addresses for symbolization, create FunctionSamples and populate its sub fields like TotalSamples, BodySamples and HeadSamples etc. For inlined code, as we need to map back to original code, so we always add body samples to the leaf frame's function sample.
Reviewed By: wenlei, hoy
Differential Revision: https://reviews.llvm.org/D109551
show more ...
|
Revision tags: llvmorg-13.0.0-rc3 |
|
#
964053d5 |
| 31-Aug-2021 |
wlei <wlei@fb.com> |
[llvm-profgen] Support LBR only perf script
This change aims at supporting LBR only sample perf script which is used for regular(Non-CS) profile generation. A LBR perf script includes a batch of LB
[llvm-profgen] Support LBR only perf script
This change aims at supporting LBR only sample perf script which is used for regular(Non-CS) profile generation. A LBR perf script includes a batch of LBR sample which starts with a frame pointer and a group of 32 LBR entries is followed. The FROM/TO LBR pair and the range between two consecutive entries (the former entry's TO and the latter entry's FROM) will be used to infer function profile info.
An example of LBR perf script(created by `perf script -F ip,brstack -i perf.data`) ``` 40062f 0x40062f/0x4005b0/P/-/-/9 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 ... 4005d7 0x4005d7/0x4005e5/P/-/-/8 0x40062f/0x4005b0/P/-/-/6 0x400645/0x4005ff/P/-/-/1 ... ... ```
For implementation: - Extended a new child class `LBRPerfReader` for the sample parsing, reused all the functionalities in `extractLBRStack` except for an extension to parsing leading instruction pointer. - `HybridSample` is reused(just leave the call stack empty) and the parsed samples is still aggregated in `AggregatedSamples`. After that, range samples, branch sample, address samples are computed and recorded. - Reused `ContextSampleCounterMap` to store the raw profile, since it's no need to aggregation by context, here it just registered one sample counter with a fake context key. - Unified to use `show-raw-profile` instead of `show-unwinder-output` to dump the intermediate raw profile, see the comments of the format of the raw profile. For CS profile, it remains to output the unwinder output.
Profile generation part will come soon.
Differential Revision: https://reviews.llvm.org/D108153
show more ...
|
Revision tags: llvmorg-13.0.0-rc2 |
|
#
9af46710 |
| 17-Aug-2021 |
wlei <wlei@fb.com> |
[llvm-profgen] Move profiled binary loading out of PerfReader
Change to use unique pointer of profiled binary to unblock asan.
At same time, I realized we can decouple to move the profiled binary l
[llvm-profgen] Move profiled binary loading out of PerfReader
Change to use unique pointer of profiled binary to unblock asan.
At same time, I realized we can decouple to move the profiled binary loading out of PerfReader, so I made some other related refactors.
Reviewed By: hoy
Differential Revision: https://reviews.llvm.org/D108254
show more ...
|
#
f812c192 |
| 12-Aug-2021 |
wlei <wlei@fb.com> |
[llvm-profgen] Clean up code dealing with multiple binaries
As we decided to support only one binary each time, this patch cleans up the related code dealing with multiple binaries. We can use `llvm
[llvm-profgen] Clean up code dealing with multiple binaries
As we decided to support only one binary each time, this patch cleans up the related code dealing with multiple binaries. We can use `llvm-profdata` to merge profile from multiple binaries.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D108002
show more ...
|
Revision tags: llvmorg-13.0.0-rc1 |
|
#
6da9241a |
| 28-Jul-2021 |
wlei <wlei@fb.com> |
[llvm-profgen] Refactor PerfReader to allow different types of perf scripts
In order to support different types of perf scripts, this change tried to refactor `PerfReader` by adding the base class `
[llvm-profgen] Refactor PerfReader to allow different types of perf scripts
In order to support different types of perf scripts, this change tried to refactor `PerfReader` by adding the base class `PerfReaderBase` and current HybridPerfReader is derived from it for CS profile generation. Common functions like, passMM2PEvents, extract_lbrs, extract_callstack, etc. can be reused.
Next step is to add LBR only reader(for non-CS profile) and aggregated perf scripts reader(do a pre-aggregation of scripts).
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D107014
show more ...
|
Revision tags: llvmorg-14-init |
|
#
d16f1542 |
| 20-Jul-2021 |
Timm Bäder <tbaeder@redhat.com> |
[llvm][tools] Hide more unrelated LLVM tool options
Differential Revision: https://reviews.llvm.org/D106366
|