#
ae6d5dd5 |
| 29-Jan-2025 |
Teresa Johnson <tejohnson@google.com> |
[MemProf] Prune unneeded non-cold contexts (#124823)
We can take advantage of the fact that we subsequently only clone cold
allocation contexts, since not cold behavior is the default, and
signifi
[MemProf] Prune unneeded non-cold contexts (#124823)
We can take advantage of the fact that we subsequently only clone cold
allocation contexts, since not cold behavior is the default, and
significantly reduce the amount of metadata (and later ThinLTO summary
and MemProfContextDisambiguation graph nodes) by pruning unnecessary not
cold contexts when building metadata from the trie.
Specifically, we only need to keep notcold contexts that overlap the
longest with cold allocations, to know how deeply to clone those
contexts to expose the cold allocation behavior.
For a large target this reduced ThinLTO bitcode object sizes by about
35%. It reduced the ThinLTO indexing time by about half and the peak
ThinLTO indexing memory by about 20%.
show more ...
|
Revision tags: llvmorg-21-init |
|
#
c725a95e |
| 24-Jan-2025 |
Teresa Johnson <tejohnson@google.com> |
[MemProf] Convert Hot contexts to NotCold early (#124219)
While we convert hot contexts to notcold contexts during the cloning
step, their existence was greatly limiting the context trimming
perfo
[MemProf] Convert Hot contexts to NotCold early (#124219)
While we convert hot contexts to notcold contexts during the cloning
step, their existence was greatly limiting the context trimming
performed when we add the MemProf profile to the IR. To address this,
any hot contexts are converted to notcold contexts immediately after
first checking for unambiguous allocation types, and before checking it
again and before adding metadata while performing context trimming.
Note that hot hints are now disabled by default, however, this avoids
adding unnecessary overhead if they are re-enabled.
show more ...
|
#
ae8b5608 |
| 24-Jan-2025 |
Teresa Johnson <tejohnson@google.com> |
[MemProf] Disable hot hints by default (#124338)
By default we were marking some contexts as hot, and adding hot hints to
unambiguously hot allocations. However, there is not yet support for
cloni
[MemProf] Disable hot hints by default (#124338)
By default we were marking some contexts as hot, and adding hot hints to
unambiguously hot allocations. However, there is not yet support for
cloning to expose hot allocation contexts, and none is planned for the
forseeable future.
While we convert hot contexts to notcold contexts during the cloning
step, their existence was greatly limiting the context trimming
performed when we add the MemProf profile to the IR. This change simply
disables the generation of hot contexts / hints by default, as few
allocations were unambiguously hot.
A subsequent change will address the issue when hot hints are optionally
enabled. See PR124219 for details.
This change resulted in significant overhead reductions for a large
target:
~48% reduction in the per-module ThinLTO bitcode summary sizes
~72% reduction in the distributed ThinLTO bitcode combined summary sizes
~68% reduction in thin link time
~34% reduction in thin link peak memory
show more ...
|
Revision tags: llvmorg-19.1.7 |
|
#
3a423a10 |
| 02-Jan-2025 |
Teresa Johnson <tejohnson@google.com> |
[MemProf][PGO] Prevent dropping of profile metadata during optimization (#121359)
This patch fixes a couple of places where memprof-related metadata
(!memprof and !callsite) were being dropped, and
[MemProf][PGO] Prevent dropping of profile metadata during optimization (#121359)
This patch fixes a couple of places where memprof-related metadata
(!memprof and !callsite) were being dropped, and one place where PGO
metadata (!prof) was being dropped.
All were due to instances of combineMetadata() being invoked. That
function drops all metadata not in the list provided by the client, and
also drops any not in its switch statement.
Memprof metadata needed a case in the combineMetadata switch statement.
For now we simply keep the metadata of the instruction being kept, which
doesn't retain all the profile information when two calls with
memprof metadata are being combined, but at least retains some.
For the memprof metadata being dropped during call CSE, add memprof and
callsite metadata to the list of known ids in combineMetadataForCSE.
Neither memprof nor regular prof metadata were in the list of known ids
for the callsite in MemCpyOptimizer, which was added to combine AA
metadata after optimization of byval arguments fed by memcpy
instructions, and similar types of optimizations of memcpy uses.
There is one other callsite of combineMetadata, but it is only invoked
on load instructions, which do not carry these types of metadata.
show more ...
|
#
d7d0e740 |
| 17-Dec-2024 |
Teresa Johnson <tejohnson@google.com> |
[MemProf] Refactor single alloc type handling and use in more cases (#120290)
Emit message when we have aliased contexts that are conservatively
hinted not cold. This is not a change in behavior, j
[MemProf] Refactor single alloc type handling and use in more cases (#120290)
Emit message when we have aliased contexts that are conservatively
hinted not cold. This is not a change in behavior, just in message when
the -memprof-report-hinted-sizes flag is enabled.
show more ...
|
Revision tags: llvmorg-19.1.6 |
|
#
bf700c39 |
| 17-Dec-2024 |
Teresa Johnson <tejohnson@google.com> |
[MemProf] Remove dead code (NFC) (#120156)
Remove unused collection of context size information that was likely
leftover from debugging / testing.
|
Revision tags: llvmorg-19.1.5, llvmorg-19.1.4 |
|
#
9513f2fd |
| 15-Nov-2024 |
Teresa Johnson <tejohnson@google.com> |
[MemProf] Print full context hash when reporting hinted bytes (#114465)
Improve the information printed when -memprof-report-hinted-sizes is
enabled. Now print the full context hash computed from t
[MemProf] Print full context hash when reporting hinted bytes (#114465)
Improve the information printed when -memprof-report-hinted-sizes is
enabled. Now print the full context hash computed from the original
profile, similar to what we do when reporting matching statistics. This
will make it easier to correlate with the profile.
Note that the full context hash must be computed at profile match time
and saved in the metadata and summary, because we may trim the context
during matching when it isn't needed for distinguishing hotness.
Similarly, due to the context trimming, we may have more than one full
context id and total size pair per MIB in the metadata and summary,
which now get a list of these pairs.
Remove the old aggregate size from the metadata and summary support.
One other change from the prior support is that we no longer write the
size information into the combined index for the LTO backends, which
don't use this information, which reduces unnecessary bloat in
distributed index files.
show more ...
|
#
890c4bec |
| 02-Nov-2024 |
Kazu Hirata <kazu@google.com> |
[memprof] Use SmallVector for InlinedCallStack (NFC) (#114599)
We can stay within 8 inlined elements more than 99% of the time while
building a large application.
|
Revision tags: llvmorg-19.1.3, llvmorg-19.1.2 |
|
#
b2f3ac83 |
| 04-Oct-2024 |
Kazu Hirata <kazu@google.com> |
[memprof] Teach createMIBNode to take ArrayRef (NFC) (#111195)
createMIBNode does not modify MIBCallStack, so we can take ArrayRef
instead.
While I am at it, this patch changes the type of MIBPa
[memprof] Teach createMIBNode to take ArrayRef (NFC) (#111195)
createMIBNode does not modify MIBCallStack, so we can take ArrayRef
instead.
While I am at it, this patch changes the type of MIBPayload to
SmallVector. We put at most three elements, so we can avoid a heap
allocation.
show more ...
|
Revision tags: llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3 |
|
#
0da2ba81 |
| 17-Aug-2024 |
Daniil Fukalov <dfukalov@gmail.com> |
[NFC] Cleanup in ADT and Analysis headers. (#104484)
Remove unused directly includes and forward declarations in ADT and
Analysis headers.
|
Revision tags: llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init |
|
#
8c1bd67d |
| 10-Jul-2024 |
Teresa Johnson <tejohnson@google.com> |
[MemProf] Optionally print or record the profiled sizes of allocations (#98248)
This is the first step in being able to track the total profiled sizes
of allocations successfully marked as cold.
[MemProf] Optionally print or record the profiled sizes of allocations (#98248)
This is the first step in being able to track the total profiled sizes
of allocations successfully marked as cold.
Under a new option -memprof-report-hinted-sizes:
- For unambiguous (non-context-sensitive) allocations, print the
profiled size and the allocation coldness, along with a hash of the
allocation's location (to allow for deduplication across modules or
inline instances).
- For context sensitive allocations, add the size as a 3rd operand on
the MIB metadata. A follow on patch will propagate this through to the
thin link where the sizes will be reported for each context after
cloning.
show more ...
|
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6 |
|
#
026a29e8 |
| 07-May-2024 |
Kazu Hirata <kazu@google.com> |
[Analysis, CodeGen, DebugInfo] Use StringRef::operator== instead of StringRef::equals (NFC) (#91304)
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.
- StringRef::oper
[Analysis, CodeGen, DebugInfo] Use StringRef::operator== instead of StringRef::equals (NFC) (#91304)
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.
- StringRef::operator==/!= outnumber StringRef::equals by a factor of
53 under llvm/ in terms of their usage.
- The elimination of StringRef::equals brings StringRef closer to
std::string_view, which has operator== but not equals.
- S == "foo" is more readable than S.equals("foo"), especially for
!Long.Expression.equals("str") vs Long.Expression != "str".
show more ...
|
Revision tags: llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2 |
|
#
1e7d5871 |
| 02-Feb-2024 |
lifengxiang1025 <lifengxiang.1025@bytedance.com> |
[MemProf] Fix when CallStackTrie has a single chain to leaf with multi alloc type (#79433)
Fix one corner case when `CallStackTrie` has a single chain to leaf with
multi alloc type. This will cause
[MemProf] Fix when CallStackTrie has a single chain to leaf with multi alloc type (#79433)
Fix one corner case when `CallStackTrie` has a single chain to leaf with
multi alloc type. This will cause stackIds in function summary is empty.
show more ...
|
Revision tags: llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4 |
|
#
b8d2f717 |
| 05-May-2023 |
Kan Wu <kanwu@google.com> |
[MemProf] Add hot allocation type
Add "Hot" AllocationType (in addition to existing cold, notcold).
Use lifetime access density as metric to identify hot allocations. Treat hot as notcold for MemPr
[MemProf] Add hot allocation type
Add "Hot" AllocationType (in addition to existing cold, notcold).
Use lifetime access density as metric to identify hot allocations. Treat hot as notcold for MemProfContextDisambiguation for now before the disambiguation for "hot" is done.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D149932
show more ...
|
#
a28261c7 |
| 06-May-2023 |
Teresa Johnson <tejohnson@google.com> |
[MemProf] Create single version of helper function (NFC)
Small clean up to keep a single version of getAllocTypeAttributeString which was duplicated locally.
|
#
a4bdb275 |
| 02-May-2023 |
Teresa Johnson <tejohnson@google.com> |
[MemProf] Use profiled lifetime access density directly
Now that the runtime tracks the lifetime access density directly, we can use that directly in the threshold checks instead of less accurately
[MemProf] Use profiled lifetime access density directly
Now that the runtime tracks the lifetime access density directly, we can use that directly in the threshold checks instead of less accurately computing from other statistics.
Differential Revision: https://reviews.llvm.org/D149684
show more ...
|
Revision tags: llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3 |
|
#
5fd82ca0 |
| 18-Feb-2023 |
Teresa Johnson <tejohnson@google.com> |
[MemProf] Make hasSingleAllocType helper non-static
As suggested in D140908, make the hasSingleAllocType helper non-static so that it can be used in other files. Add unit testing.
Differential Revi
[MemProf] Make hasSingleAllocType helper non-static
As suggested in D140908, make the hasSingleAllocType helper non-static so that it can be used in other files. Add unit testing.
Differential Revision: https://reviews.llvm.org/D144318
show more ...
|
Revision tags: llvmorg-16.0.0-rc2 |
|
#
6827c4f0 |
| 02-Feb-2023 |
Teresa Johnson <tejohnson@google.com> |
[MemProf] Add helper to access the back (last) call stack id
This is split out of D140908 as suggested.
Differential Revision: https://reviews.llvm.org/D143184
|
Revision tags: llvmorg-16.0.0-rc1, llvmorg-17-init |
|
#
caa99a01 |
| 22-Jan-2023 |
Kazu Hirata <kazu@google.com> |
Use llvm::popcount instead of llvm::countPopulation(NFC)
|
Revision tags: llvmorg-15.0.7, llvmorg-15.0.6 |
|
#
9eacbba2 |
| 16-Nov-2022 |
Teresa Johnson <tejohnson@google.com> |
Restore "[MemProf] ThinLTO summary support" with more fixes
This restores commit 98ed423361de2f9dc0113a31be2aa04524489ca9 and follow on fix 00c22351ba697dbddb4b5bf0ad94e4bcea4b316b, which were rever
Restore "[MemProf] ThinLTO summary support" with more fixes
This restores commit 98ed423361de2f9dc0113a31be2aa04524489ca9 and follow on fix 00c22351ba697dbddb4b5bf0ad94e4bcea4b316b, which were reverted in 5d938eb6f79b16f55266dd23d5df831f552ea082 due to an MSVC bot failure. I've included a fix for that failure.
Differential Revision: https://reviews.llvm.org/D135714
show more ...
|
#
5d938eb6 |
| 16-Nov-2022 |
Jeremy Morse <jeremy.morse@sony.com> |
Revert "Restore "[MemProf] ThinLTO summary support" with fixes"
This reverts commit 00c22351ba697dbddb4b5bf0ad94e4bcea4b316b. This reverts commit 98ed423361de2f9dc0113a31be2aa04524489ca9.
Seemingly
Revert "Restore "[MemProf] ThinLTO summary support" with fixes"
This reverts commit 00c22351ba697dbddb4b5bf0ad94e4bcea4b316b. This reverts commit 98ed423361de2f9dc0113a31be2aa04524489ca9.
Seemingly MSVC has some kind of issue with this patch, in terms of linking:
https://lab.llvm.org/buildbot/#/builders/123/builds/14137
I'll post more detail on D135714 momentarily.
show more ...
|
Revision tags: llvmorg-15.0.5 |
|
#
98ed4233 |
| 15-Nov-2022 |
Teresa Johnson <tejohnson@google.com> |
Restore "[MemProf] ThinLTO summary support" with fixes
This restores 47459455009db4790ffc3765a2ec0f8b4934c2a4, which was reverted in commit 452a14efc84edf808d1e2953dad2c694972b312f, along with fixes
Restore "[MemProf] ThinLTO summary support" with fixes
This restores 47459455009db4790ffc3765a2ec0f8b4934c2a4, which was reverted in commit 452a14efc84edf808d1e2953dad2c694972b312f, along with fixes for a couple of bot failures.
show more ...
|
#
452a14ef |
| 15-Nov-2022 |
Teresa Johnson <tejohnson@google.com> |
Revert "[MemProf] ThinLTO summary support"
This reverts commit 47459455009db4790ffc3765a2ec0f8b4934c2a4.
Revert while I try to fix a couple of non-Linux build failures.
|
Revision tags: llvmorg-15.0.4, llvmorg-15.0.3 |
|
#
47459455 |
| 11-Oct-2022 |
Teresa Johnson <tejohnson@google.com> |
[MemProf] ThinLTO summary support
Implements the ThinLTO summary support for memprof related metadata.
This includes support for the assembly format, and for building the summary from IR during Mod
[MemProf] ThinLTO summary support
Implements the ThinLTO summary support for memprof related metadata.
This includes support for the assembly format, and for building the summary from IR during ModuleSummaryAnalysis.
To reduce space in both the bitcode format and the in memory index, we do 2 things: 1. We keep a single vector of all uniq stack id hashes, and record the index into this vector in the callsite and allocation memprof summaries. 2. When building the combined index during the LTO link, the callsite and allocation memprof summaries are only kept on the FunctionSummary of the prevailing copy.
Differential Revision: https://reviews.llvm.org/D135714
show more ...
|
Revision tags: working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2 |
|
#
e20d210e |
| 08-Aug-2022 |
Kazu Hirata <kazu@google.com> |
[llvm] Qualify auto (NFC)
Identified with readability-qualified-auto.
|