Revision tags: llvmorg-21-init

416f1c46 | 20-Jan-2025 | Mats Jun Larsen <mats@jun.codes>
[IR] Replace PointerType::get(Type) with opaque version (NFC) (#123617)
In accordance with https://github.com/llvm/llvm-project/issues/123569
To keep the patch at a reasonable size, this PR only covers the llvm subproject, unittests excluded.

Revision tags: llvmorg-19.1.7

adf0c817 | 20-Dec-2024 | Kazu Hirata <kazu@google.com>
[memprof] Undrift MemProf profile even when some frames are missing (#120500)
This patch makes the MemProf undrifting process a little more lenient.
Consider an inlined call hierarchy:
foo -> bar -> ::new
If bar tail-calls ::new, the profile appears to indicate that foo
directly calls ::new. This is a problem because the perceived call
hierarchy in the profile looks different from what we can obtain from
the inline stack in the IR.
Recall that undrifting works by constructing and comparing a list of
direct calls from the profile and that from the IR. This patch
modifies the construction of the latter. Specifically, if foo calls
bar in the IR, but bar is missing from the profile, we pretend that foo
directly calls some heap allocation function. We apply this
transformation only in the inline stack leading to some heap
allocation function.
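The substitution described above can be sketched as follows. This is a hypothetical Python outline of the idea, not the LLVM implementation; the names `HEAP_ALLOC_GUID` and `ir_direct_calls` are invented for illustration.

```python
# GUID 0 stands in for "some heap allocation function" (an assumption
# mirroring the convention described above, not the real encoding).
HEAP_ALLOC_GUID = 0

def ir_direct_calls(inline_stack, profile_functions):
    """inline_stack: (caller, callee) pairs from the IR, outermost first,
    leading to a heap allocation function. If a callee is absent from the
    profile (e.g. it tail-called ::new), pretend the caller directly
    calls a heap allocation function instead."""
    calls = []
    for caller, callee in inline_stack:
        if callee in profile_functions:
            calls.append((caller, callee))
        else:
            calls.append((caller, HEAP_ALLOC_GUID))
    return calls
```

With `foo -> bar -> ::new` and `bar` absent from the profile, `foo`'s entry becomes a direct call to GUID 0, which matches what the profile appears to say.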

c7451ffc | 20-Dec-2024 | Teresa Johnson <tejohnson@google.com>
[MemProf] Support hinting mostly-cold allocations after cloning (#120633)
Optionally hint allocations unconditionally as cold or not cold during
the cloning step if the percentage of bytes allocated cold is at least
the given threshold. This is similar to PR120301, which supports this
during matching, but enables the same behavior during cloning, reducing
the false positives that can be addressed by cloning at the cost of
carrying the additional size metadata/summary.

28865769 | 19-Dec-2024 | Kazu Hirata <kazu@google.com>
[memprof] clang-format MemProf-related files (NFC) (#120504)

ac8a9f8f | 18-Dec-2024 | Kazu Hirata <kazu@google.com>
[memprof] Undrift MemProfRecord (#120138)
This patch undrifts source locations in MemProfRecord before readMemprof
starts the matching process.
The theory of operation is as follows:
1. Collect the lists of direct calls, one from the IR and the other
from the profile.
2. Compute the correspondence (called undrift map in the patch)
between the two lists with longestCommonSequence.
3. Apply the undrift map just before readMemprof consumes
MemProfRecord.
The new function is gated by a flag that is off by default.
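Steps 2 and 3 can be sketched in Python. This is an illustrative reconstruction with invented names, using a textbook longest-common-subsequence walk as a stand-in for LLVM's `longestCommonSequence`; each call site is a `(location, callee)` pair.

```python
def compute_undrift_map(profile_calls, ir_calls):
    """Match two call-site lists with a classic LCS over callees and map
    each matched profile location to the corresponding IR location."""
    n, m = len(profile_calls), len(ir_calls)
    # dp[i][j]: LCS length of profile_calls[i:] vs. ir_calls[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            same = profile_calls[i][1] == ir_calls[j][1]  # same callee
            dp[i][j] = max(dp[i + 1][j], dp[i][j + 1],
                           dp[i + 1][j + 1] + (1 if same else 0))
    # Walk the table to recover the matching: profile loc -> IR loc.
    undrift, i, j = {}, 0, 0
    while i < n and j < m:
        if (profile_calls[i][1] == ir_calls[j][1]
                and dp[i][j] == dp[i + 1][j + 1] + 1):
            undrift[profile_calls[i][0]] = ir_calls[j][0]
            i, j = i + 1, j + 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return undrift
```

The resulting map is what step 3 applies just before readMemprof consumes MemProfRecord.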

a15e7b11 | 17-Dec-2024 | Teresa Johnson <tejohnson@google.com>
[MemProf] Add option to hint allocations at a given cold byte percentage (#120301)
Optionally hint allocations unconditionally as cold or not cold during
the matching step if the percentage of bytes allocated cold is at least
the given threshold.
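As a sketch of the threshold test (hypothetical Python; the real option handling and pass logic are in C++, and the function name here is invented):

```python
def hint_allocation(cold_bytes, total_bytes, threshold_percent):
    # Unconditionally hint "cold" when at least threshold_percent of all
    # allocated bytes were profiled as cold; otherwise hint "notcold".
    # Integer cross-multiplication avoids floating-point comparison.
    if total_bytes == 0:
        return "notcold"
    if 100 * cold_bytes >= threshold_percent * total_bytes:
        return "cold"
    return "notcold"
```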

Revision tags: llvmorg-19.1.6

7c294eb7 | 14-Dec-2024 | Kazu Hirata <kazu@google.com>
[memprof] Simplify readMemprof (NFC) (#119930)
This patch essentially replaces:
std::pair<const std::vector<Frame> *, unsigned>
with:
ArrayRef<Frame>
This way, we can store and pass ArrayRef<Frame>, conceptually one
item, instead of the pointer and index.
The only problem is that we don't have an existing hash function for
ArrayRef<Frame>, so we provide a custom one, namely CallStackHash.
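The payoff of the refactoring, one hashable value instead of a (container, index) pair, can be illustrated in Python, where a tuple plays the role of `ArrayRef<Frame>`. The names here are illustrative, not LLVM's.

```python
def call_stack_suffix(frames, start):
    # A tuple is one self-contained, hashable value, analogous to passing
    # ArrayRef<Frame> instead of std::pair<const std::vector<Frame>*, unsigned>.
    return tuple(frames[start:])

# Suffixes can now key a dict directly, with no custom hash plumbing.
counts = {}
for suffix in (call_stack_suffix(["foo", "bar", "new"], 1),
               call_stack_suffix(["baz", "bar", "new"], 1)):
    counts[suffix] = counts.get(suffix, 0) + 1
```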

2e33ed9e | 06-Dec-2024 | Ellis Hoag <ellis.sparky.hoag@gmail.com>
[memprof] Use -memprof-runtime-default-options to set options during compile time (#118874)
Add the `__memprof_default_options_str` variable, initialized via the `-memprof-runtime-default-options` LLVM flag, to hold the default options string for memprof. This allows us to set these options during compile time in the clang invocation.
Also update the docs to describe the various ways to set these options.

Revision tags: llvmorg-19.1.5

51cdf1f6 | 21-Nov-2024 | Kazu Hirata <kazu@google.com>
[memprof] Skip MemProfUsePass on the empty module (#117210)
This patch teaches the MemProfUsePass to return immediately on
the empty module.
Aside from saving time to deserialize the MemProf profile, this patch
ensures that we can obtain TLI like so:
TargetLibraryInfo &TLI =
FAM.getResult<TargetLibraryAnalysis>(*M.begin());
when we undrift the MemProf profile in the near future.

a2e266b3 | 20-Nov-2024 | Kazu Hirata <kazu@google.com>
[memprof] Add computeUndriftMap (#116478)
This patch adds computeUndriftMap, a function to compute mappings from
source locations in the MemProf profile to source locations in the IR.

Revision tags: llvmorg-19.1.4

9513f2fd | 15-Nov-2024 | Teresa Johnson <tejohnson@google.com>
[MemProf] Print full context hash when reporting hinted bytes (#114465)
Improve the information printed when -memprof-report-hinted-sizes is
enabled. Now print the full context hash computed from the original
profile, similar to what we do when reporting matching statistics. This
will make it easier to correlate with the profile.
Note that the full context hash must be computed at profile match time
and saved in the metadata and summary, because we may trim the context
during matching when it isn't needed for distinguishing hotness.
Similarly, due to the context trimming, we may have more than one full
context id and total size pair per MIB in the metadata and summary,
which now get a list of these pairs.
Remove the old aggregate size from the metadata and summary support.
One other change from the prior support is that we no longer write the
size information into the combined index for the LTO backends, which
don't use it; this reduces unnecessary bloat in distributed index files.

95554cbd | 13-Nov-2024 | Kazu Hirata <kazu@google.com>
[memprof] Teach extractCallsFromIR to recognize heap allocation functions (#115938)
This patch teaches extractCallsFromIR to recognize heap allocation
functions. Specifically, when we encounter a callee that is known to
be a heap allocation function like "new", we set the callee GUID to 0.
Note that I am planning to do the same for the caller-callee pairs
extracted from the profile. That is, when I encounter a frame that
does not have a callee, we assume that the frame is calling some heap
allocation function with GUID 0.
Technically, I'm not recognizing enough functions in this patch.
TCMalloc is known to drop certain frames in the call stack immediately
above new. This patch is meant to lay the groundwork, setting up
GetTLI, plumbing it to extractCallsFromIR, and adjusting the unit
tests. I'll address remaining issues in subsequent patches.

c6183244 | 09-Nov-2024 | Kazu Hirata <kazu@google.com>
[memprof] Teach extractCallsFromIR to look into inline stacks (#115441)
To undrift the profile, we need to extract as many caller-callee pairs
from the IR as we can to maximize the number of call sites in the
profile we can undrift.
Now, since MemProfUsePass runs after early inlining, some functions
have been inlined, and we may no longer have bodies for those
functions in the IR. To cope with this, this patch teaches
extractCallsFromIR to extract caller-callee pairs from inline stacks.
The output format of extractCallsFromIR remains the same. We still
return a map from caller GUIDs to lists of corresponding call sites.

e189d619 | 07-Nov-2024 | Kazu Hirata <kazu@google.com>
[memprof] Add extractCallsFromIR (#115218)
This patch adds extractCallsFromIR, a function to extract calls from
the IR, which will be used to undrift call site locations in the
MemProf profile.
In a nutshell, the MemProf undrifting works as follows:
- Extract call site locations from the IR.
- Extract call site locations from the MemProf profile.
- Undrift the call site locations with longestCommonSequence.
This patch implements the first bullet point above. Specifically,
given the IR, the new function returns a map from caller GUIDs to
lists of corresponding call sites. For example:
Given:
foo() {
f1();
f2(); f3();
}
extractCallsFromIR returns:
Caller: foo ->
{{(Line 1, Column 3), Callee: f1},
{(Line 2, Column 3), Callee: f2},
{(Line 2, Column 9), Callee: f3}}
where the line numbers (relative to the beginning of the caller) and
column numbers are sorted in ascending order. The value side of
the map -- the list of call sites -- can be directly passed to
longestCommonSequence.
To facilitate the review process, I've only implemented basic features
in extractCallsFromIR in this patch.
- The new function extracts calls from the LLVM "call" instructions
only. It does not look into the inline stack.
- It does not recognize or treat heap allocation functions in any
special way.
I will address these missing features in subsequent patches.
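The returned shape can be sketched as follows (illustrative Python; LLVM's version keys on GUIDs and Frame structs rather than strings, and the function name here is invented):

```python
def extract_calls(ir_calls):
    """ir_calls: (caller, line, col, callee) tuples, with line numbers
    relative to the beginning of the caller. Returns a map from caller
    to its call sites, sorted by (line, column)."""
    out = {}
    for caller, line, col, callee in ir_calls:
        out.setdefault(caller, []).append(((line, col), callee))
    for sites in out.values():
        sites.sort()
    return out
```

For the `foo` example above, the three call sites come back sorted as (1, 3) f1, (2, 3) f2, (2, 9) f3, ready to feed to longestCommonSequence.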

77b7d9de | 04-Nov-2024 | Kazu Hirata <kazu@google.com>
[memprof] Add const to isAllocationWithHotColdVariant (NFC) (#114719)

890c4bec | 02-Nov-2024 | Kazu Hirata <kazu@google.com>
[memprof] Use SmallVector for InlinedCallStack (NFC) (#114599)
We can stay within 8 inlined elements more than 99% of the time while
building a large application.

Revision tags: llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4

9b00ef52 | 26-Aug-2024 | Snehasish Kumar <snehasishk@google.com>
Revert "Add unit tests for size returning new funcs in the MemProf use pass. (#105473)" (#106114)
This reverts commit 2e426fe8ff314c2565073e73e27fdbdf36c140a3.

2e426fe8 | 26-Aug-2024 | Snehasish Kumar <snehasishk@google.com>
Add unit tests for size returning new funcs in the MemProf use pass. (#105473)
We use a unit test to verify correctness since:
a) we don't have a text format profile
b) size returning new isn't supported natively
c) a raw profile will need to be manipulated artificially
The changes this test covers were made in
https://github.com/llvm/llvm-project/pull/102258.

Revision tags: llvmorg-19.1.0-rc3

95daf1ae | 15-Aug-2024 | Snehasish Kumar <snehasishk@google.com>
Allow optimization of __size_returning_new variants. (#102258)
https://github.com/llvm/llvm-project/pull/101564 added support to TLI to
detect variants of operator new which provide feedback on the actual
size of memory allocated (http://wg21.link/P0901R5). This patch extends
SimplifyLibCalls to handle hot cold hinting of these variants.

Revision tags: llvmorg-19.1.0-rc2

17993eb1 | 29-Jul-2024 | Matthew Weingarten <matt@weingarten.org>
[Memprof] Adds instrumentation support for memprof with histograms. (#100834)
This patch allows running `-fmemory-profile` without the flag
`-memprof-use-callbacks`, meaning the `RecordAccessesHistogram` is
injected into IR as a sequence of instructions. This significantly
increases performance of the instrumented binary.

Revision tags: llvmorg-19.1.0-rc1, llvmorg-20-init

8c1bd67d | 10-Jul-2024 | Teresa Johnson <tejohnson@google.com>
[MemProf] Optionally print or record the profiled sizes of allocations (#98248)
This is the first step in being able to track the total profiled sizes
of allocations successfully marked as cold.
Under a new option -memprof-report-hinted-sizes:
- For unambiguous (non-context-sensitive) allocations, print the
profiled size and the allocation coldness, along with a hash of the
allocation's location (to allow for deduplication across modules or
inline instances).
- For context sensitive allocations, add the size as a 3rd operand on
the MIB metadata. A follow-on patch will propagate this through to the
thin link where the sizes will be reported for each context after
cloning.

9df71d76 | 28-Jun-2024 | Nikita Popov <npopov@redhat.com>
[IR] Add getDataLayout() helpers to Function and GlobalValue (#96919)
Similar to https://github.com/llvm/llvm-project/pull/96902, this adds
`getDataLayout()` helpers to Function and GlobalValue, replacing the
current `getParent()->getDataLayout()` pattern.

30b93db5 | 26-Jun-2024 | Matthew Weingarten <matt@weingarten.org>
[Memprof] Adds the option to collect AccessCountHistograms for memprof. (#94264)
Adds compile time flag -mllvm -memprof-histogram and runtime flag
histogram=true|false to turn Histogram collection on and off. The
-memprof-histogram flag relies on -memprof-use-callbacks=true to work.
Updates shadow mapping logic in histogram mode from having one 8 byte
counter for 64 bytes, to 1 byte for 8 bytes, capped at 255. Only
supports this granularity as of now.
Updates the RawMemprofReader and the serialization of MemoryInfoBlocks
to the binary format, bumping the raw binary format from version 3 to
version 4.
Updates creating MemoryInfoBlocks with and without Histograms. When two
MemoryInfoBlocks are merged, AccessCounts are summed up and the shorter
Histogram is removed.
Adds a memprof_histogram test case.
This is the initial commit for adding AccessCountHistograms, up to the
RawProfile stage of memprof.
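The shadow-counter and merge rules described above can be sketched as follows (a hypothetical Python mirror of the runtime logic; the function names are invented):

```python
GRANULARITY = 8    # one counter covers 8 bytes of memory
COUNTER_MAX = 255  # counters are one byte and saturate at 255

def record_access(histogram, byte_offset):
    # Bump the saturating one-byte counter covering this byte offset.
    idx = byte_offset // GRANULARITY
    histogram[idx] = min(histogram[idx] + 1, COUNTER_MAX)

def merge_mibs(a, b):
    # Each MemoryInfoBlock is (access_count, histogram): access counts
    # are summed and the shorter histogram is dropped.
    count_a, hist_a = a
    count_b, hist_b = b
    merged_hist = hist_a if len(hist_a) >= len(hist_b) else hist_b
    return (count_a + count_b, merged_hist)
```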

d75f9dd1 | 24-Jun-2024 | Stephen Tozer <stephen.tozer@sony.com>
Revert "[IR][NFC] Update IRBuilder to use InsertPosition (#96497)"
Reverts the above commit, as it updates a common header function and did not update all callsites:
https://lab.llvm.org/buildbot/#/builders/29/builds/382
This reverts commit 6481dc57612671ebe77fe9c34214fba94e1b3b27.

6481dc57 | 24-Jun-2024 | Stephen Tozer <stephen.tozer@sony.com>
[IR][NFC] Update IRBuilder to use InsertPosition (#96497)
Uses the new InsertPosition class (added in #94226) to simplify some of
the IRBuilder interface, and removes the need to pass a BasicBlock
alongside a BasicBlock::iterator, using the fact that we can now get the
parent basic block from the iterator even if it points to the sentinel.
This patch removes the BasicBlock argument from each constructor or call
to setInsertPoint.
This has no functional effect, but later on as we look to remove the
`Instruction *InsertBefore` argument from instruction-creation
(discussed
[here](https://discourse.llvm.org/t/psa-instruction-constructors-changing-to-iterator-only-insertion/77845)),
this will simplify the process by allowing us to deprecate the
InsertPosition constructor directly and catch all the cases where we use
instructions rather than iterators.