Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4 |
|
#
9513f2fd |
| 15-Nov-2024 |
Teresa Johnson <tejohnson@google.com> |
[MemProf] Print full context hash when reporting hinted bytes (#114465)
Improve the information printed when -memprof-report-hinted-sizes is
enabled. Now print the full context hash computed from the original
profile, similar to what we do when reporting matching statistics. This
will make it easier to correlate with the profile.
Note that the full context hash must be computed at profile match time
and saved in the metadata and summary, because we may trim the context
during matching when it isn't needed for distinguishing hotness.
Similarly, due to the context trimming, we may have more than one full
context id and total size pair per MIB in the metadata and summary,
which now get a list of these pairs.
Remove the old aggregate size from the metadata and summary support.
One other change from the prior support is that we no longer write the
size information into the combined index for the LTO backends, which
don't use this information; this avoids unnecessary bloat in
distributed index files.
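The reason an MIB can carry several (full context hash, total size) pairs can be sketched in a few lines. This is a minimal illustration under stated assumptions, not LLVM's data structures: `ContextSizeInfo`, `MIBMap`, `addProfiledContext` and `reportHintedSizes` are hypothetical names, and the sketch assumes that full contexts collapsing to the same trimmed context share an allocation type, so they end up in one MIB that keeps each original pair.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: one (full context hash, total profiled size) pair.
struct ContextSizeInfo {
  uint64_t FullStackId; // hash of the full, untrimmed allocation context
  uint64_t TotalSize;   // total profiled bytes allocated via that context
};

// Hypothetical MIB key: (trimmed context, allocation type). Contexts are only
// trimmed where the dropped frames don't affect hotness, so contexts that
// collapse to the same trimmed context share an allocation type.
using MIBKey = std::pair<std::string, std::string>;
using MIBMap = std::map<MIBKey, std::vector<ContextSizeInfo>>;

// Record one full profiled context under its trimmed MIB. Merged contexts
// keep their own pair instead of being folded into one aggregate size.
void addProfiledContext(MIBMap &MIBs, const std::string &TrimmedContext,
                        const std::string &AllocType, uint64_t FullStackId,
                        uint64_t TotalSize) {
  MIBs[{TrimmedContext, AllocType}].push_back({FullStackId, TotalSize});
}

// Produce one report line per full context hash (format is illustrative).
std::vector<std::string> reportHintedSizes(const MIBMap &MIBs) {
  std::vector<std::string> Lines;
  for (const auto &[Key, Sizes] : MIBs)
    for (const ContextSizeInfo &Info : Sizes)
      Lines.push_back("full context " + std::to_string(Info.FullStackId) +
                      " total size " + std::to_string(Info.TotalSize) + " " +
                      Key.second);
  return Lines;
}
```

Because each pair survives trimming, the report can still be correlated with the original profile by full context hash.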
|
#
236fda55 |
| 06-Nov-2024 |
Kazu Hirata <kazu@google.com> |
[Analysis] Remove unused includes (NFC) (#114936)
Identified with misc-include-cleaner.
|
Revision tags: llvmorg-19.1.3 |
|
#
5995e4b9 |
| 18-Oct-2024 |
Teresa Johnson <tejohnson@google.com> |
[MemProf] Disable memprof ICP support by default (#112940)
A failure showed up after this was committed; rather than revert, simply
disable this new support to simplify investigation and further testing.
|
#
6264288d |
| 18-Oct-2024 |
Teresa Johnson <tejohnson@google.com> |
[MemProf] Fix the option to disable memprof ICP (#112917)
The -enable-memprof-indirect-call-support meant to guard the recently
added memprof ICP support was not used in enough places. Specifically,
it was not checked in mayHaveMemprofSummary, which is called from the
ThinLTO backend applyImports. This led to failures when checking the
callsite records, as we incorrectly expected records for indirect calls.
Fix the option to be checked in all necessary locations, and add
testing.
|
Revision tags: llvmorg-19.1.2 |
|
#
1de71652 |
| 11-Oct-2024 |
Teresa Johnson <tejohnson@google.com> |
[MemProf] Support cloning for indirect calls with ThinLTO (#110625)
This patch enables support for cloning of indirect callsites.
This is done by synthesizing callsite records for each virtual call
target from the profile metadata. In the thin link all the synthesized
records for a particular indirect callsite initially share the same
context node, but support is added to partition the callsites and
outgoing edges based on the callee function, creating a separate node
for each target.
In the LTO backend, when cloning is needed we first perform indirect
call promotion, then change the target of the new direct call to the
desired clone.
Note this is ThinLTO-specific, since for regular LTO indirect call
promotion should have already occurred.
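The partitioning step (one shared context node per indirect callsite, split into one node per callee target) can be sketched as below. The types `CallEdge` and `ContextNode` and the function `partitionByCallee` are hypothetical stand-ins, not the context graph classes used by MemProf context disambiguation; the sketch only shows outgoing edges being regrouped by callee.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical outgoing edge from an indirect callsite's context node.
struct CallEdge {
  std::string Callee;               // target function this edge reaches
  std::vector<unsigned> ContextIds; // allocation contexts flowing through it
};

// Hypothetical context node; initially shared by all synthesized records.
struct ContextNode {
  std::string Callee; // empty while the node is still shared
  std::vector<CallEdge> CalleeEdges;
};

// Split one shared node into one node per distinct callee, moving each
// outgoing edge to the node for its target.
std::vector<ContextNode> partitionByCallee(const ContextNode &Shared) {
  std::map<std::string, ContextNode> ByCallee; // ordered for determinism
  for (const CallEdge &E : Shared.CalleeEdges) {
    ContextNode &N = ByCallee[E.Callee];
    N.Callee = E.Callee;
    N.CalleeEdges.push_back(E);
  }
  std::vector<ContextNode> Result;
  for (auto &Entry : ByCallee)
    Result.push_back(std::move(Entry.second));
  return Result;
}
```

Each resulting node can then be cloned independently, which is what lets the backend promote the indirect call and retarget the new direct call to the chosen clone.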
|
Revision tags: llvmorg-19.1.1, llvmorg-19.1.0 |
|
#
51d3829d |
| 07-Sep-2024 |
Kazu Hirata <kazu@google.com> |
[ThinLTO] Shrink FunctionSummary by 8 bytes (#107706)
During the ThinLTO indexing step for one of our large applications, we
create 4 million instances of FunctionSummary.
Changing:
std::vector<EdgeTy> CallGraphEdgeList;
to:
SmallVector<EdgeTy, 0> CallGraphEdgeList;
in FunctionSummary reduces the size of each instance by 8 bytes. The
rest of the patch makes the same change to other places so that the
types stay compatible across function boundaries.
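The 8-byte saving follows from the container header layouts. This rough sketch assumes a 64-bit target where `std::vector` holds three pointers and `SmallVector<T, 0>` holds a data pointer plus 32-bit size and capacity fields. That holds for common implementations but is not guaranteed by either library, and the two structs below are stand-ins that model only those layouts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Models the usual std::vector header: begin, end and end-of-storage.
struct VectorLikeHeader {
  void *Begin;
  void *End;
  void *EndOfStorage;
};

// Models SmallVector<T, 0>: a data pointer plus 32-bit size and capacity.
// With zero inline elements there is no inline buffer to add.
struct SmallVectorZeroLikeHeader {
  void *Data;
  uint32_t Size;
  uint32_t Capacity;
};

// Bytes saved per member by switching from the first layout to the second.
constexpr std::size_t bytesSavedPerMember() {
  return sizeof(VectorLikeHeader) - sizeof(SmallVectorZeroLikeHeader);
}
```

On a 64-bit target this is 24 - 16 = 8 bytes; across 4 million `FunctionSummary` instances that is about 32 MB, and the same reasoning gives about 60 MB for the 7.5 million `GlobalValueSummary` instances mentioned in 0ffa377c.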
|
#
d4ddf06b |
| 06-Sep-2024 |
Mingming Liu <mingmingl@google.com> |
[NFCI]Remove EntryCount from FunctionSummary and clean up surrounding synthetic count passes. (#107471)
The primary motivation is to remove `EntryCount` from `FunctionSummary`.
This frees 8 bytes out of `sizeof(FunctionSummary)` (136 bytes as of
https://github.com/llvm/llvm-project/commit/64498c54831bed9cf069e0923b9b73678c6451d8).
While I'm at it, this PR cleans up {SummaryBasedOptimizations,
SyntheticCountsPropagation} since they are not used and there are no
plans to invest further in them.
With this patch, bitcode writer writes a placeholder 0 at the byte
offset of `EntryCount` and bitcode reader can parse the function entry
count at the correct byte offset. Added a TODO to stop writing
`EntryCount` and bump the bitcode version.
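The placeholder technique keeps the record layout stable while the in-memory field goes away. This is a minimal sketch assuming a summary record is a flat list of 64-bit fields with entry count at a fixed slot; the slot numbers and the names `FunctionInfo`, `writeFunctionRecord` and `readFunctionRecord` are hypothetical, and the real summary record layout differs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed slots in a flat summary record.
constexpr std::size_t kFlagsSlot = 0;
constexpr std::size_t kInstCountSlot = 1;
constexpr std::size_t kEntryCountSlot = 2; // no longer kept in memory
constexpr std::size_t kFirstRefSlot = 3;
static_assert(kEntryCountSlot + 1 == kFirstRefSlot,
              "refs must start right after the EntryCount slot");

// In-memory summary: there is no EntryCount member anymore.
struct FunctionInfo {
  uint64_t Flags;
  uint64_t InstCount;
  std::vector<uint64_t> RefIds;
};

// Writer: emit a placeholder 0 where EntryCount used to be, so every later
// field keeps its offset.
std::vector<uint64_t> writeFunctionRecord(const FunctionInfo &F) {
  std::vector<uint64_t> Record = {F.Flags, F.InstCount, /*EntryCount=*/0};
  Record.insert(Record.end(), F.RefIds.begin(), F.RefIds.end());
  return Record;
}

// Reader: step over the EntryCount slot (a placeholder, or a real count in
// records written before the change) and resume at the next field.
FunctionInfo readFunctionRecord(const std::vector<uint64_t> &Record) {
  assert(Record.size() >= kFirstRefSlot && "truncated record");
  FunctionInfo F;
  F.Flags = Record[kFlagsSlot];
  F.InstCount = Record[kInstCountSlot];
  F.RefIds.assign(Record.begin() + static_cast<std::ptrdiff_t>(kFirstRefSlot),
                  Record.end());
  return F;
}
```

Dropping the slot entirely would shift every later field, which is why removing it is deferred until the bitcode version is bumped.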
|
#
0ffa377c |
| 06-Sep-2024 |
Kazu Hirata <kazu@google.com> |
[ThinLTO] Shrink GlobalValueSummary by 8 bytes (#107342)
During the ThinLTO indexing step for one of our large applications, we
create 7.5 million instances of GlobalValueSummary.
Changing:
std::vector<ValueInfo> RefEdgeList;
to:
SmallVector<ValueInfo, 0> RefEdgeList;
in GlobalValueSummary reduces the size of each instance by 8 bytes.
The rest of the patch makes the same change to other places so that
the types stay compatible across function boundaries.
|
Revision tags: llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init |
|
#
9f8205d9 |
| 11-Jul-2024 |
Teresa Johnson <tejohnson@google.com> |
[MemProf] Track and report profiled sizes through cloning (#98382)
If requested, via the -memprof-report-hinted-sizes option, track the
total profiled size of each MIB through the thin link, then report on
the corresponding allocation coldness after all cloning is complete.
To save size, a different bitcode record type is used for the allocation
info when the option is specified, and the sizes are kept separate from
the MIBs in the index.
|
#
b0ae923a |
| 22-Jun-2024 |
Kazu Hirata <kazu@google.com> |
[ProfileData] Add a variant of getValueProfDataFromInst (#95993)
This patch adds a variant of getValueProfDataFromInst that returns
std::vector<InstrProfValueData> instead of
std::unique_ptr<InstrProfValueData[]>. The new return type carries the
length with it, so we can drop the out parameter ActualNumValueData.
Also, the caller can directly feed the return value into a range-based
for loop as shown in the patch.
I'm planning to migrate other callers of getValueProfDataFromInst to
the new variant in follow-up patches.
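The difference in calling convention can be sketched with a stand-in for `InstrProfValueData`, which pairs a profiled `Value` with its `Count`. The functions `getDataOldStyle` and `getDataNewStyle` are hypothetical and hard-code two entries; they only contrast the two return shapes and are not LLVM's `getValueProfDataFromInst` signatures.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Stand-in for InstrProfValueData: a profiled value and its count.
struct ValueData {
  uint64_t Value;
  uint64_t Count;
};

// Old shape: a raw array plus an out parameter carrying its length.
std::unique_ptr<ValueData[]> getDataOldStyle(uint32_t &ActualNumValueData) {
  ActualNumValueData = 2;
  auto Data = std::make_unique<ValueData[]>(ActualNumValueData);
  Data[0] = {0x1000, 90};
  Data[1] = {0x2000, 10};
  return Data;
}

// New shape: the vector carries its own length.
std::vector<ValueData> getDataNewStyle() {
  return {{0x1000, 90}, {0x2000, 10}};
}

// The old shape needs an index loop bounded by the out parameter.
uint64_t totalCountOldStyle() {
  uint32_t N = 0;
  std::unique_ptr<ValueData[]> Data = getDataOldStyle(N);
  uint64_t Total = 0;
  for (uint32_t I = 0; I < N; ++I)
    Total += Data[I].Count;
  return Total;
}

// The new shape feeds straight into a range-based for loop.
uint64_t totalCount() {
  uint64_t Total = 0;
  for (const ValueData &VD : getDataNewStyle())
    Total += VD.Count;
  return Total;
}
```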
|
#
2c2f4905 |
| 18-Jun-2024 |
Kazu Hirata <kazu@google.com> |
[Analysis] Clean up getPromotionCandidatesForInstruction (NFC) (#95624)
Callers of getPromotionCandidatesForInstruction pass NumVals as an out
parameter for the number of value-count pairs of the value profiling
data, but nobody uses the out parameter.
This patch removes the parameter and updates the callers. Note that
the number of value-count pairs is still available via
getPromotionCandidatesForInstruction(...).size().
|
Revision tags: llvmorg-18.1.8 |
|
#
d3342e5b |
| 12-Jun-2024 |
Abhina Sree <69635948+abhina-sree@users.noreply.github.com> |
[SystemZ][z/OS] Continue marking text files with OF_Text (#95111)
Text files should be opened with OF_Text to have the correct encoding.
|
Revision tags: llvmorg-18.1.7, llvmorg-18.1.6 |
|
#
fa9b1be4 |
| 16-May-2024 |
Mingming Liu <mingmingl@google.com> |
[ThinLTO]Mark referencers of local ifunc not eligible for import (#92431)
If an ifunc has local linkage, do not add it into ref edges and mark its
referencer (a function or global variable) not eligible for import. An
ifunc doesn't have a summary and ThinLTO cannot promote it, so importing
the referencer may cause linkage errors.
For reference, a similar fix, https://reviews.llvm.org/D158961, marks
callers of a local ifunc not eligible for import to fix
https://github.com/llvm/llvm-project/issues/58740
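The rule can be sketched over a toy representation of globals. `Kind`, `Global`, `Summary` and `summarize` are hypothetical, not the module summary analysis API. The sketch models only the decision described above: a reference to a local-linkage ifunc is not recorded as a ref edge, and the referencing function or variable is flagged as not eligible for import.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Kind { Function, Variable, IFunc };

// Hypothetical global value with the globals it references.
struct Global {
  std::string Name;
  Kind K;
  bool HasLocalLinkage;
  std::vector<const Global *> Refs;
};

// Hypothetical per-global summary.
struct Summary {
  std::vector<std::string> RefEdges;
  bool NotEligibleToImport = false;
};

Summary summarize(const Global &G) {
  Summary S;
  for (const Global *R : G.Refs) {
    if (R->K == Kind::IFunc && R->HasLocalLinkage) {
      // A local ifunc has no summary and cannot be promoted, so importing G
      // could leave an unresolved reference: skip the edge, block the import.
      S.NotEligibleToImport = true;
      continue;
    }
    S.RefEdges.push_back(R->Name);
  }
  return S;
}
```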
|
Revision tags: llvmorg-18.1.5, llvmorg-18.1.4 |
|
#
58c5f50f |
| 11-Apr-2024 |
Leonard Chan <leonardchan@google.com> |
Reapply "[llvm] Teach GlobalDCE about dso_local_equivalent"
Also reapply "[llvm] Teach whole program devirtualization about relative vtables"
This reverts commit 1c604a9780fcfe92a99d539913553f0835b81de3 and 474f5efebed24547e76d022f0c5ffcc9db97ce6f.
|
#
c0b77e0a |
| 15-Apr-2024 |
Leonard Chan <leonardchan@google.com> |
Revert "Reapply "[llvm] Teach whole program devirtualization about relative vtables""
This reverts commit 09c3bfe9b3eb47a2af0c10531b25f90cfb5fa9f4.
|
#
09c3bfe9 |
| 11-Apr-2024 |
Leonard Chan <leonardchan@google.com> |
Reapply "[llvm] Teach whole program devirtualization about relative vtables"
This reverts commit 474f5efebed24547e76d022f0c5ffcc9db97ce6f.
|
#
dda73336 |
| 11-Apr-2024 |
Mingming Liu <mingmingl@google.com> |
[ThinLTO]Record import type in GlobalValueSummary::GVFlags (#87597)
The motivating use case is to support importing the function declaration
across modules to construct call graph edges for indirect calls [1]
when importing the function definition costs too much compile time
(e.g., the function is too large and has no `noinline` attribute).
1. Currently, when the compiled IR module doesn't have a function
definition but its postlink combined summary contains the function
summary or a global alias summary with this function as aliasee, the
function definition will be imported from source module by IRMover. The
implementation is in FunctionImporter::importFunctions [2]
2. In order for FunctionImporter to import a declaration of a function,
both the function summary and the alias summary need to carry the def/decl
state. Specifically, none of the existing summary fields differ across
importing modules, but the def/decl state is determined by the
`<ImportModule, Function>` pair.
This change encodes the def/decl state in `GlobalValueSummary::GVFlags`.
In subsequent changes:
1. The indexing step `computeImportForModule` [3]
will compute the set of definitions and the set of declarations for each
module, and pass the information on to the bitcode writer.
2. The bitcode writer will look up the def/decl state and set the state
when it writes out the flag value. This is demonstrated in
https://github.com/llvm/llvm-project/pull/87600
3. The function importer will read the def/decl state when reading the
combined summary to figure out two sets of global values, and IRMover
will be updated to import the declaration (aka linkGlobalValuePrototype [4])
into the destination module.
- The next change is https://github.com/llvm/llvm-project/pull/87600
[1] mentioned in rfc https://discourse.llvm.org/t/rfc-for-better-call-graph-sort-build-a-more-complete-call-graph-by-adding-more-indirect-call-edges/74029#support-cross-module-function-declaration-import-5
[2] https://github.com/llvm/llvm-project/blob/3b337242ee165554f0017b00671381ec5b1ba855/llvm/lib/Transforms/IPO/FunctionImport.cpp#L1608-L1764
[3] https://github.com/llvm/llvm-project/blob/3b337242ee165554f0017b00671381ec5b1ba855/llvm/lib/Transforms/IPO/FunctionImport.cpp#L856
[4] https://github.com/llvm/llvm-project/blob/3b337242ee165554f0017b00671381ec5b1ba855/llvm/lib/Linker/IRMover.cpp#L605
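The key point, that the import type belongs to the (importing module, function) pair and is packed into the per-summary flags when a given module's combined summary is written, can be sketched as follows. The bit position, the `ImportKind` enum and the helpers `setImportKind`, `getImportKind` and `flagsForModule` are hypothetical, not the actual `GVFlags` layout or writer code.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

enum class ImportKind : uint32_t { Definition = 0, Declaration = 1 };

// Hypothetical: one bit of the packed flags word holds the import kind.
constexpr unsigned kImportKindBit = 10;

uint32_t setImportKind(uint32_t RawFlags, ImportKind K) {
  RawFlags &= ~(1u << kImportKindBit);
  return RawFlags | (static_cast<uint32_t>(K) << kImportKindBit);
}

ImportKind getImportKind(uint32_t RawFlags) {
  return static_cast<ImportKind>((RawFlags >> kImportKindBit) & 1u);
}

// Decisions are keyed by <ImportModule, Function>, not by function alone.
using ImportDecisions =
    std::map<std::pair<std::string, std::string>, ImportKind>;

// Build the flags for one destination module. Defaulting to Definition when
// no decision is recorded is an arbitrary choice for this sketch.
uint32_t flagsForModule(uint32_t BaseFlags, const ImportDecisions &Decisions,
                        const std::string &ImportModule,
                        const std::string &Function) {
  auto It = Decisions.find({ImportModule, Function});
  ImportKind K = It == Decisions.end() ? ImportKind::Definition : It->second;
  return setImportKind(BaseFlags, K);
}
```

The same function summary can thus be written with different flag values for different importing modules, while all other flag bits stay identical.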
|
Revision tags: llvmorg-18.1.3 |
|
#
1e15371d |
| 01-Apr-2024 |
Mingming Liu <mingmingl@google.com> |
[ThinLTO][TypeProf] Implement vtable def import (#79381)
Add annotated vtable GUID as referenced variables in per function
summary, and update bitcode writer to create value-ids for these
referenced vtables.
- This is part 3 of the type profiling work, described in the "Virtual Table Definition Import" [1] section of the
RFC.
[1] https://github.com/llvm/llvm-project/pull/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
|
Revision tags: llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init |
|
#
88fbc4d3 |
| 06-Dec-2023 |
Teresa Johnson <tejohnson@google.com> |
[ThinLTO] Add tail call flag to call edges in summary (#74043)
This adds support for a HasTailCall flag on function call edges in the
ThinLTO summary. It is intended for use in aiding discovery of missing
frames from tail calls in profiled call stacks for MemProf of profiled
binaries that did not disable tail call elimination. A follow-on change
will add the use of this new flag during MemProf context disambiguation.
The new flag is encoded in the bitcode along with either the hotness
flag from the profile, or the relative block frequency under the
-write-relbf-to-summary flag when there is no profile data.
Because we will now always have some additional call edge information, I
have removed the non-profile function summary record format, and we
simply encode the tail call flag along with a hotness type of none when
there is no profile information or relative block frequency. The change
of record format and name caused most of the test case changes.
I have added explicit testing of generation of the new tail call flag
into the bitcode and IR assembly format as part of the changes to
llvm/test/Bitcode/thinlto-function-summary-refgraph.ll. I have also
added round trip testing through assembly and bitcode to
llvm/test/Assembler/thinlto-summary.ll.
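Carrying a one-bit flag alongside an existing small field in a single record value is ordinary bit packing. In this sketch the `Hotness` enumerator values, the 3-bit hotness width, placing the tail call bit directly above it, and the function names are all assumptions for illustration, not the bitcode encoding defined by this change. It does mirror the described fallback of writing hotness `None` when no profile is available.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical hotness categories; all values fit in 3 bits.
enum class Hotness : uint8_t { Unknown = 0, Cold = 1, None = 2, Hot = 3, Critical = 4 };

constexpr unsigned kHotnessBits = 3;
constexpr uint64_t kHotnessMask = (1u << kHotnessBits) - 1;

// Pack the hotness into the low bits and the tail call flag just above.
uint64_t encodeEdgeInfo(Hotness H, bool HasTailCall) {
  return static_cast<uint64_t>(H) |
         (static_cast<uint64_t>(HasTailCall) << kHotnessBits);
}

Hotness decodeHotness(uint64_t Encoded) {
  return static_cast<Hotness>(Encoded & kHotnessMask);
}

bool decodeHasTailCall(uint64_t Encoded) {
  return (Encoded >> kHotnessBits) & 1;
}
```

With no profile data, a tail call edge would be encoded as `encodeEdgeInfo(Hotness::None, true)`, i.e. 2 | (1 << 3) = 10 under these assumed values.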
|
Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3 |
|
#
5181156b |
| 05-Oct-2023 |
Matthias Braun <matze@braunis.de> |
Use BlockFrequency type in more places (NFC) (#68266)
The `BlockFrequency` class abstracts `uint64_t` frequency values. Use it
more consistently in various APIs and disable implicit conversion to
make usage more consistent and explicit.
- Use `BlockFrequency Freq` parameter for `setBlockFreq`,
`getProfileCountFromFreq` and `setBlockFreqAndScale` functions.
- Return `BlockFrequency` in `getEntryFreq()` functions.
- While on it change some `const BlockFrequency& Freq` parameters to
plain `BlockFrequency Freq`.
- Mark `BlockFrequency(uint64_t)` constructor as explicit.
- Add missing `BlockFrequency::operator!=`.
- Remove `uint64_t BlockFrequency::getMaxFrequency()`.
- Add `BlockFrequency BlockFrequency::max()` function.
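Why an explicit constructor helps: this minimal wrapper in the same spirit shows the pieces listed above. `Freq` and `isMax` are hypothetical stand-ins, not LLVM's `BlockFrequency`. The `static_assert` documents that a raw `uint64_t` can no longer silently become a frequency, while explicit construction still works.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical strong type wrapping a raw uint64_t frequency.
class Freq {
  uint64_t Value = 0;

public:
  Freq() = default;
  // Explicit: callers must opt in to treating an integer as a frequency.
  explicit Freq(uint64_t V) : Value(V) {}

  uint64_t getFrequency() const { return Value; }

  // Replaces a raw "max frequency" integer with a typed value.
  static Freq max() { return Freq(std::numeric_limits<uint64_t>::max()); }

  bool operator==(const Freq &Other) const { return Value == Other.Value; }
  bool operator!=(const Freq &Other) const { return Value != Other.Value; }
};

// Pass by value: the wrapper is a single uint64_t, so a const reference
// buys nothing.
bool isMax(Freq F) { return F == Freq::max(); }

// Implicit conversion from an integer is rejected at compile time.
static_assert(!std::is_convertible<uint64_t, Freq>::value,
              "Freq must be constructed explicitly");
```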
|
Revision tags: llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4 |
|
#
6276927b |
| 29-Aug-2023 |
Fangrui Song <i@maskray.me> |
[ThinLTO] Mark callers of local ifunc not eligible for import
Fix https://github.com/llvm/llvm-project/issues/58740
The `target_clones` attribute results in ifunc on eligible targets (Linux glibc/Android or FreeBSD). If the function has internal linkage, we will get an internal linkage ifunc.
```
__attribute__((target_clones("popcnt", "default")))
static int foo(int n) { return __builtin_popcount(n); }
int use(int n) { return foo(n); }

@foo.ifunc = internal ifunc i32 (i32), ptr @foo.resolver
define internal nonnull ptr @foo.resolver() comdat {
; local linkage comdat is another issue that should be fixed
  ...
  select i1 %.not, ptr @foo.default.1, ptr @foo.popcnt.0
  ...
}
define internal i32 @foo.default.1(i32 noundef %n)
```
ifuncs are not included in module summaries, so LTO doesn't know the local linkage `foo.default.1` referenced by `foo.resolver` should be promoted. If a caller of `foo` (e.g. `use`) is imported, the local linkage `foo.resolver` will be cloned as a definition (IRLinker::shouldLink), leading to linker errors.
```
ld.lld: error: undefined hidden symbol: foo.default.1.llvm.8017227050314953235
>>> referenced by bar.c
>>>               lto.tmp:(foo.ifunc)
```
As a simple fix, just mark `use` as not eligible for import. Non-local linkage ifuncs do not have the problem, because they are not imported, and not cloned when a caller is imported.
---
https://reviews.llvm.org/D82745 contains a more involved fix, though the original bug it intended to fix (https://github.com/llvm/llvm-project/issues/45833) now works.
Note: importing an ifunc is tricky. If we import an ifunc, we need to make sure the resolver and the implementations are in the same translation unit, as required by https://sourceware.org/glibc/wiki/GNU_IFUNC
> Requirement (a): Resolver must be defined in the same translation unit as the implementations.
This is infeasible if the implementation is changed to available_externally.
In addition, the imported ifunc may be referenced by two translation units. This doesn't work with PowerPC32 -msecure-plt (https://maskray.me/blog/2021-01-18-gnu-indirect-function). At the very least, every referencing translation unit needs one extra IRELATIVE dynamic relocation.
At least for the local linkage ifunc case, it doesn't have much use outside of `target_clones`, as a global pointer is usually a better replacement.
I think ifuncs just have too many pitfalls to design more IR features around it to optimize them.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D158961
|
Revision tags: llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1 |
|
#
1b162fab |
| 25-Jul-2023 |
Fangrui Song <i@maskray.me> |
[Support] Change SetVector's default template parameter to SmallVector<*, 0>
Similar to D156016 for MapVector.
This brings back commit fae7b98c221b5b28797f7b56b656b6b819d99f27 with a fix to llvm/unittests/Support/ThreadPool.cpp's `_WIN32` code path.
|
Revision tags: llvmorg-18-init |
|
#
3d83912c |
| 25-Jul-2023 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
Revert rGfae7b98c221b5b28797f7b56b656b6b819d99f27 "[Support] Change SetVector's default template parameter to SmallVector<*, 0>"
This is failing on Windows MSVC builds: llvm\unittests\Support\ThreadPool.cpp(380): error C2440: 'return': cannot convert from 'Vector' to 'std::vector<llvm::BitVector,std::allocator<llvm::BitVector>>' with [ Vector=llvm::SmallVector<llvm::BitVector,0> ]
|
#
fae7b98c |
| 25-Jul-2023 |
Fangrui Song <i@maskray.me> |
[Support] Change SetVector's default template parameter to SmallVector<*, 0>
Similar to D156016 for MapVector.
|
#
fb2a971c |
| 25-Jul-2023 |
Fangrui Song <i@maskray.me> |
[Support] Change MapVector's default template parameter to SmallVector<*, 0>
SmallVector<*, 0> is often a better replacement for std::vector: both the object size and the code size are smaller. (SmallMapVector uses SmallVector as well, but it is not common.)
clang size decreases by 0.0226%. instructions:u decreases by 0.037% when compiling a sqlite3 amalgamation.
Reviewed By: JDevlieghere
Differential Revision: https://reviews.llvm.org/D156016
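For context on what the default template parameter controls: a MapVector pairs a key-to-index map with a vector of (key, value) pairs, so lookup is by key while iteration follows insertion order. The hypothetical `TinyMapVector` below is a simplified sketch, not LLVM's class. Because LLVM's `SmallVector` is not available outside LLVM, it defaults `VectorType` to `std::vector`; under this change the analogous LLVM default is `SmallVector<std::pair<KeyT, ValueT>, 0>`.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

template <typename KeyT, typename ValueT,
          typename MapType = std::map<KeyT, std::size_t>,
          typename VectorType = std::vector<std::pair<KeyT, ValueT>>>
class TinyMapVector {
  MapType Map;       // key -> index into Vector
  VectorType Vector; // (key, value) pairs in insertion order

public:
  // Insert a default value if Key is absent; return a reference to the
  // value (invalidated by later insertions, as with any vector).
  ValueT &operator[](const KeyT &Key) {
    auto [It, Inserted] = Map.try_emplace(Key, Vector.size());
    if (Inserted)
      Vector.emplace_back(Key, ValueT());
    return Vector[It->second].second;
  }

  std::size_t size() const { return Vector.size(); }
  auto begin() const { return Vector.begin(); }
  auto end() const { return Vector.end(); }
};
```

Only the storage for the pairs changes with the default; the insertion-order semantics stay the same.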
|