Revision tags: llvmorg-19.1.5 |
|
#
82b43794 |
| 27-Nov-2024 |
Kazu Hirata <kazu@google.com> |
[memprof] Use "using" directives in unit tests (NFC) (#117852)
This tests uses existing "using" directives to shorten unit tests.
- llvm::memprof::hashCallStack -> hashCallStack
- testing::Pair
[memprof] Use "using" directives in unit tests (NFC) (#117852)
This tests uses existing "using" directives to shorten unit tests.
- llvm::memprof::hashCallStack -> hashCallStack
- testing::Pair -> Pair
- testing::ElementsAreArray -> ElementsAre
- testing::Contains -> UnorderedElementsAre
show more ...
|
#
e98396f4 |
| 27-Nov-2024 |
Kazu Hirata <kazu@google.com> |
Reapply [memprof] Add YAML-based deserialization for MemProf profile (#117829)
This patch adds YAML-based deserialization for MemProf profile.
It's been painful to write tests for MemProf passes be
Reapply [memprof] Add YAML-based deserialization for MemProf profile (#117829)
This patch adds YAML-based deserialization for MemProf profile.
It's been painful to write tests for MemProf passes because we do not have a text format for the MemProf profile. We would write a test case in C++, run it for a binary MemProf profile, and then finally run a test written in LLVM IR with the binary profile.
This patch paves the way toward YAML-based MemProf profile. Specifically, it adds new class YAMLMemProfReader derived from MemProfReader. For now, it only adds a function to parse StringRef pointing to YAML data. Subseqeunt patches will wire it to llvm-profdata and read from a file.
The field names are based on various printYAML functions in MemProf.h. I'm not aiming for compatibility with the format used in printYAML, but I don't see a point in changing the field names.
This iteration works around the unavailability of ScalarTraits<uintptr_t> on macOS.
show more ...
|
#
7e312c3b |
| 27-Nov-2024 |
Florian Hahn <flo@fhahn.com> |
Revert "[memprof] Add YAML-based deserialization for MemProf profile (#117829)"
This reverts commit c00e53208db638c35499fc80b555f8e14baa35f0.
It looks like this breaks building LLVM on macOS and so
Revert "[memprof] Add YAML-based deserialization for MemProf profile (#117829)"
This reverts commit c00e53208db638c35499fc80b555f8e14baa35f0.
It looks like this breaks building LLVM on macOS and some other platform/compiler combos
https://lab.llvm.org/buildbot/#/builders/23/builds/5252 https://green.lab.llvm.org/job/llvm.org/job/clang-san-iossim/5356/console
In file included from /Users/ec2-user/jenkins/workspace/llvm.org/clang-san-iossim/llvm-project/llvm/lib/ProfileData/MemProfReader.cpp:34: In file included from /Users/ec2-user/jenkins/workspace/llvm.org/clang-san-iossim/llvm-project/llvm/include/llvm/ProfileData/MemProfReader.h:24: In file included from /Users/ec2-user/jenkins/workspace/llvm.org/clang-san-iossim/llvm-project/llvm/include/llvm/ProfileData/InstrProfReader.h:22: In file included from /Users/ec2-user/jenkins/workspace/llvm.org/clang-san-iossim/llvm-project/llvm/include/llvm/ProfileData/InstrProfCorrelator.h:21: /Users/ec2-user/jenkins/workspace/llvm.org/clang-san-iossim/llvm-project/llvm/include/llvm/Support/YAMLTraits.h:1173:36: error: implicit instantiation of undefined template 'llvm::yaml::MissingTrait<unsigned long>' char missing_yaml_trait_for_type[sizeof(MissingTrait<T>)]; ^ /Users/ec2-user/jenkins/workspace/llvm.org/clang-san-iossim/llvm-project/llvm/include/llvm/Support/YAMLTraits.h:961:7: note: in instantiation of function template specialization 'llvm::yaml::yamlize<unsigned long>' requested here yamlize(*this, Val, Required, Ctx); ^ /Users/ec2-user/jenkins/workspace/llvm.org/clang-san-iossim/llvm-project/llvm/include/llvm/Support/YAMLTraits.h:883:11: note: in instantiation of function template specialization 'llvm::yaml::IO::processKey<unsigned long, llvm::yaml::EmptyContext>' requested here this->processKey(Key, Val, true, Ctx); ^ /Users/ec2-user/jenkins/workspace/llvm.org/clang-san-iossim/llvm-project/llvm/include/llvm/ProfileData/MIBEntryDef.inc:55:1: note: in instantiation of function template specialization 'llvm::yaml::IO::mapRequired<unsigned long>' requested here MIBEntryDef(AccessHistogram = 27, AccessHistogram, uintptr_t) ^ /Users/ec2-user/jenkins/workspace/llvm.org/clang-san-iossim/llvm-project/llvm/lib/ProfileData/MemProfReader.cpp:77:8: note: expanded from macro 'MIBEntryDef' Io.mapRequired(KeyStr.str().c_str(), MIB.Name); \ ^ /Users/ec2-user/jenkins/workspace/llvm.org/clang-san-iossim/llvm-project/llvm/include/llvm/Support/YAMLTraits.h:310:8: note: template is declared here struct MissingTrait; ^ 1 error generated.
show more ...
|
#
c00e5320 |
| 27-Nov-2024 |
Kazu Hirata <kazu@google.com> |
[memprof] Add YAML-based deserialization for MemProf profile (#117829)
This patch adds YAML-based deserialization for MemProf profile.
It's been painful to write tests for MemProf passes because
[memprof] Add YAML-based deserialization for MemProf profile (#117829)
This patch adds YAML-based deserialization for MemProf profile.
It's been painful to write tests for MemProf passes because we do not
have a text format for the MemProf profile. We would write a test
case in C++, run it for a binary MemProf profile, and then finally run
a test written in LLVM IR with the binary profile.
This patch paves the way toward YAML-based MemProf profile.
Specifically, it adds new class YAMLMemProfReader derived from
MemProfReader. For now, it only adds a function to parse StringRef
pointing to YAML data. Subseqeunt patches will wire it to
llvm-profdata and read from a file.
The field names are based on various printYAML functions in MemProf.h.
I'm not aiming for compatibility with the format used in printYAML,
but I don't see a point in changing the field names.
show more ...
|
#
5add295f |
| 26-Nov-2024 |
Kazu Hirata <kazu@google.com> |
[memprof] Use IndexedMemProfRecord in MemProfReader (NFC) (#117613)
IndexedMemProfRecord contains a complete package of the MemProf
profile, including frames, call stacks, and records. This patch
[memprof] Use IndexedMemProfRecord in MemProfReader (NFC) (#117613)
IndexedMemProfRecord contains a complete package of the MemProf
profile, including frames, call stacks, and records. This patch
replaces the three member variables of MemProfReader with
IndexedMemProfRecord.
This transition significantly simplies both the constructor and the
final "take" method:
MemProfReader(IndexedMemProfData MemProfData)
: MemProfData(std::move(MemProfData)) {}
IndexedMemProfData takeMemProfData() { return std::move(MemProfData); }
show more ...
|
#
ff7b42c1 |
| 25-Nov-2024 |
Kazu Hirata <kazu@google.com> |
[memprof] Speed up llvm-profdata (#117446)
CallStackRadixTreeBuilder::build takes the parameter
MemProfFrameIndexes by value, involving copies:
std::optional<const llvm::DenseMap<FrameIdTy, Li
[memprof] Speed up llvm-profdata (#117446)
CallStackRadixTreeBuilder::build takes the parameter
MemProfFrameIndexes by value, involving copies:
std::optional<const llvm::DenseMap<FrameIdTy, LinearFrameId>>
MemProfFrameIndexes
Then "build" makes another copy of MemProfFrameIndexe and passes it to
encodeCallStack for every call stack, which is painfully slow.
This patch changes the type to a pointer so that we don't have to make
a copy every time we pass the argument.
Without this patch, it takes 553 seconds to run "llvm-profdata merge"
on a large MemProf raw profile. This patch shortenes that down to 67
seconds.
show more ...
|
#
ad2bdd8f |
| 22-Nov-2024 |
Kazu Hirata <kazu@google.com> |
[memprof] Remove MemProf format Version 1 (#117357)
This patch removes MemProf format Version 1 now that Version 2 and 3
are working well.
|
#
b170ab21 |
| 20-Nov-2024 |
Kazu Hirata <kazu@google.com> |
[memprof] Construct MemProfReader with IndexedMemProfData (#117022)
This patch updates a unit test to construct MemProfReader with
IndexedMemProfData, a complete package of MemProf profile.
With
[memprof] Construct MemProfReader with IndexedMemProfData (#117022)
This patch updates a unit test to construct MemProfReader with
IndexedMemProfData, a complete package of MemProf profile.
With this change, nobody in the LLVM codebase is using the
MemProfReader constructor that takes individual components of the
MemProf profile, so this patch deprecates the constructor.
show more ...
|
#
e14827f0 |
| 20-Nov-2024 |
Teresa Johnson <tejohnson@google.com> |
[MemProf] Templatize CallStackRadixTreeBuilder (NFC) (#117014)
Prepare for usage in the bitcode reader/writer where we already have a
LinearFrameId:
- templatize input frame id type in CallStackRa
[MemProf] Templatize CallStackRadixTreeBuilder (NFC) (#117014)
Prepare for usage in the bitcode reader/writer where we already have a
LinearFrameId:
- templatize input frame id type in CallStackRadixTreeBuilder
- templatize input frame id type in computeFrameHistogram
- make the map from FrameId to LinearFrameId optional
We plan to use the same radix format in the ThinLTO summary records,
where we already have a LinearFrameId.
show more ...
|
#
f88c913f |
| 20-Nov-2024 |
Kazu Hirata <kazu@google.com> |
[memprof] Add a new constructor to MemProfReader (NFC) (#116918)
This patch adds a new constructor to MemProfReader that takes
IndexedMemProfData, a complete package of MemProf profile. To
showca
[memprof] Add a new constructor to MemProfReader (NFC) (#116918)
This patch adds a new constructor to MemProfReader that takes
IndexedMemProfData, a complete package of MemProf profile. To
showcase its usage, I'm updating one of the unit tests to use the new
constructor.
Because of type mismatches between DenseMap and MapVector, I'm copying
Frames and CallStacks for now. Once we remove the methods and old
constructors that take or return individual components (frames, call
stacks, and records), we will drop the copying, and the new
constructor will collapse down to:
MemProfReader(IndexedMemProfData MemProfData)
: MemProfData(std::move(MemProfData)) {}
Since nobody in the LLVM codebase uses the constructor that takes the
three indivdual components, I'm deprecating the old constructor.
show more ...
|
Revision tags: llvmorg-19.1.4 |
|
#
a4e1a3dc |
| 18-Nov-2024 |
Kazu Hirata <kazu@google.com> |
[memprof] Add another constructor to IndexedAllocationInfo (NFC) (#116684)
This patch adds another constructor to IndexedAllocationInfo that is
identical to the existing constructor except that the
[memprof] Add another constructor to IndexedAllocationInfo (NFC) (#116684)
This patch adds another constructor to IndexedAllocationInfo that is
identical to the existing constructor except that the new one leaves
the CallStack field empty.
I'm planning to remove MemProf format Version 1. Then we will migrate
the users of the existing constructor to the new one as nobody will be
using the CallStack field anymore.
Adding the new constructor now allows us to migrate a few existing
users of the old constructor even before we remove the CallStack
field. In turn, that simplifies the patch to actually remove the
field.
show more ...
|
#
0d38f64e |
| 15-Nov-2024 |
Kazu Hirata <kazu@google.com> |
[memprof] Remove MemProf format Version 0 (#116442)
This patch removes MemProf format Version 0 now that version 2 and 3
seem to be working well.
I'm not touching version 1 for now because some
[memprof] Remove MemProf format Version 0 (#116442)
This patch removes MemProf format Version 0 now that version 2 and 3
seem to be working well.
I'm not touching version 1 for now because some tests still rely on
version 1.
Note that Version 0 is identical to Version 1 except that the MemProf
section of the indexed format has a MemProf version field.
show more ...
|
Revision tags: llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0 |
|
#
459a82e6 |
| 13-Sep-2024 |
JOE1994 <joseph942010@gmail.com> |
[llvm][unittests] Don't call raw_string_ostream::flush() (NFC)
raw_string_ostream::flush() is essentially a no-op (also specified in docs). Don't call it in tests that aren't meant to test 'raw_stri
[llvm][unittests] Don't call raw_string_ostream::flush() (NFC)
raw_string_ostream::flush() is essentially a no-op (also specified in docs). Don't call it in tests that aren't meant to test 'raw_string_ostream' itself.
p.s. remove a few redundant calls to raw_string_ostream::str()
show more ...
|
Revision tags: llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init |
|
#
30b93db5 |
| 26-Jun-2024 |
Matthew Weingarten <matt@weingarten.org> |
[Memprof] Adds the option to collect AccessCountHistograms for memprof. (#94264)
Adds compile time flag -mllvm -memprof-histogram and runtime flag
histogram=true|false to turn Histogram collection
[Memprof] Adds the option to collect AccessCountHistograms for memprof. (#94264)
Adds compile time flag -mllvm -memprof-histogram and runtime flag
histogram=true|false to turn Histogram collection on and off. The
-memprof-histogram flag relies on -memprof-use-callbacks=true to work.
Updates shadow mapping logic in histogram mode from having one 8 byte
counter for 64 bytes, to 1 byte for 8 bytes, capped at 255. Only
supports this granularity as of now.
Updates the RawMemprofReader and serializing MemoryInfoBlocks to binary
format, including changing to a new version of the raw binary format
from version 3 to version 4.
Updates creating MemoryInfoBlocks with and without Histograms. When two
MemoryInfoBlocks are merged, AccessCounts are summed up and the shorter
Histogram is removed.
Adds a memprof_histogram test case.
Initial commit for adding AccessCountHistograms up until RawProfile for
memprof
show more ...
|
Revision tags: llvmorg-18.1.8 |
|
#
dc3f8c2f |
| 08-Jun-2024 |
Kazu Hirata <kazu@google.com> |
[memprof] Improve deserialization performance in V3 (#94787)
We call llvm::sort in a couple of places in the V3 encoding:
- We sort Frames by FrameIds for stability of the output.
- We sort ca
[memprof] Improve deserialization performance in V3 (#94787)
We call llvm::sort in a couple of places in the V3 encoding:
- We sort Frames by FrameIds for stability of the output.
- We sort call stacks in the dictionary order to maximize the length
of the common prefix between adjacent call stacks.
It turns out that we can improve the deserialization performance by
modifying the comparison functions -- without changing the format at
all. Both places take advantage of the histogram of Frames -- how
many times each Frame occurs in the call stacks.
- Frames: We serialize popular Frames in the descending order of
popularity for improved cache locality. For two equally popular
Frames, we break a tie by serializing one that tends to appear
earlier in call stacks. Here, "earlier" means a smaller index
within llvm::SmallVector<FrameId>.
- Call Stacks: We sort the call stacks to reduce the number of times
we follow pointers to parents during deserialization. Specifically,
instead of comparing two call stacks in the strcmp style -- integer
comparisons of FrameIds, we compare two FrameIds F1 and F2 with
Histogram[F1] < Histogram[F2] at respective indexes. Since we
encode from the end of the sorted list of call stacks, we tend to
encode popular call stacks first.
Since the two places use the same histogram, we compute it once and
share it in the two places.
Sorting the call stacks reduces the number of "jumps" by 74% when we
deserialize all MemProfRecords. The cycle and instruction counts go
down by 10% and 1.5%, respectively.
If we sort the Frames in addition to the call stacks, then the cycle
and instruction counts go down by 14% and 1.6%, respectively, relative
to the same baseline (that is, without this patch).
show more ...
|
#
c348e265 |
| 07-Jun-2024 |
Kazu Hirata <kazu@google.com> |
[memprof] Use CallStackRadixTreeBuilder in the V3 format (#94708)
This patch integrates CallStackRadixTreeBuilder into the V3 format,
reducing the profile size to about 27% of the V2 profile size.
[memprof] Use CallStackRadixTreeBuilder in the V3 format (#94708)
This patch integrates CallStackRadixTreeBuilder into the V3 format,
reducing the profile size to about 27% of the V2 profile size.
- Serialization: writeMemProfCallStackArray just needs to write out
the radix tree array prepared by CallStackRadixTreeBuilder.
Mappings from CallStackIds to LinearCallStackIds are moved by new
function CallStackRadixTreeBuilder::takeCallStackPos.
- Deserialization: Deserializing a call stack is the same as
deserializing an array encoded in the obvious manner -- the length
followed by the payload, except that we need to follow a pointer to
the parent to take advantage of common prefixes once in a while.
This patch teaches LinearCallStackIdConverter to how to handle those
pointers.
show more ...
|
#
5c0df5fe |
| 06-Jun-2024 |
Kazu Hirata <kazu@google.com> |
[memprof] Add CallStackRadixTreeBuilder (#93784)
Call stacks are a huge portion of the MemProf profile, taking up 70+%
of the profile file size.
This patch implements a radix tree to compress ca
[memprof] Add CallStackRadixTreeBuilder (#93784)
Call stacks are a huge portion of the MemProf profile, taking up 70+%
of the profile file size.
This patch implements a radix tree to compress call stacks, which are
known to have long common prefixes. Specifically,
CallStackRadixTreeBuilder, introduced in this patch, takes call stacks
in the MemProf profile, sorts them in the dictionary order to maximize
the common prefix between adjacent call stacks, and then encodes a
radix tree into a single array that is ready for serialization.
The resulting radix array is essentially a concatenation of call stack
arrays, each encoded with its length followed by the payload, except
that these arrays contain "instructions" like "skip 7 elements
forward" to borrow common prefixes from other call stacks.
This patch does not integrate with the MemProf
serialization/deserialization infrastructure yet. Once integrated,
the radix tree is expected to roughly halve the file size of the
MemProf profile.
show more ...
|
Revision tags: llvmorg-18.1.7, llvmorg-18.1.6 |
|
#
26fabdde |
| 16-May-2024 |
Kazu Hirata <kazu@google.com> |
[memprof] Pass FrameIdConverter and CallStackIdConverter by reference (#92327)
CallStackIdConverter sets LastUnmappedId when a mapping failure
occurs. Now, since toMemProfRecord takes an instance
[memprof] Pass FrameIdConverter and CallStackIdConverter by reference (#92327)
CallStackIdConverter sets LastUnmappedId when a mapping failure
occurs. Now, since toMemProfRecord takes an instance of
CallStackIdConverter by value, namely std::function, the caller of
toMemProfRecord never receives the mapping failure that occurs inside
toMemProfRecord. The same problem applies to FrameIdConverter.
The patch fixes the problem by passing FrameIdConverter and
CallStackIdConverter by reference, namely llvm::function_ref.
While I am it, this patch deletes the copy constructor and copy
assignment operator to avoid accidental copies.
show more ...
|
#
181e2e8f |
| 10-May-2024 |
Mircea Trofin <mtrofin@google.com> |
[nfc][memprof] Add missing license to `MemProfTest` (#91695)
|
Revision tags: llvmorg-18.1.5 |
|
#
c9dae434 |
| 28-Apr-2024 |
Kazu Hirata <kazu@google.com> |
[memprof] Add access checks to PortableMemInfoBlock::get* (#90121)
commit 4c8ec8f8bc3fb4dda4fd36c3b2ad745bd3451970
Author: Kazu Hirata <kazu@google.com>
Date: Wed Apr 24 16:25:35 2024 -0700
[memprof] Add access checks to PortableMemInfoBlock::get* (#90121)
commit 4c8ec8f8bc3fb4dda4fd36c3b2ad745bd3451970
Author: Kazu Hirata <kazu@google.com>
Date: Wed Apr 24 16:25:35 2024 -0700
introduced the idea of serializing/deserializing a subset of the
fields in PortableMemInfoBlock. While it reduces the size of the
indexed MemProf profile file, we now could inadvertently access
unavailable fields and go without noticing.
To protect ourselves from the risk, this patch adds access checks to
PortableMemInfoBlock::get* methods by embedding a bit set representing
available fields into PortableMemInfoBlock.
show more ...
|
#
35260201 |
| 28-Apr-2024 |
Kazu Hirata <kazu@google.com> |
Repply [memprof] Introduce FrameIdConverter and CallStackIdConverter (#90307)
Currently, we convert FrameId to Frame and CallStackId to a call stack at several places. This patch unifies those into
Repply [memprof] Introduce FrameIdConverter and CallStackIdConverter (#90307)
Currently, we convert FrameId to Frame and CallStackId to a call stack at several places. This patch unifies those into function objects -- FrameIdConverter and CallStackIdConverter.
The existing implementation of CallStackIdConverter, being removed in this patch, handles both FrameId and CallStackId conversions. This patch splits it into two phases for flexibility (but make them composable) because some places only require the FrameId conversion.
This iteration fixes a problem uncovered with ubsan, where we were dereferencing an uninitialized std::unique_ptr.
show more ...
|
#
7aa6896d |
| 27-Apr-2024 |
Vitaly Buka <vitalybuka@google.com> |
Revert "[memprof] Introduce FrameIdConverter and CallStackIdConverter" (#90318)
Reverts llvm/llvm-project#90307
Breaks bots https://lab.llvm.org/buildbot/#/builders/5/builds/42943
|
#
e04df693 |
| 27-Apr-2024 |
Kazu Hirata <kazu@google.com> |
[memprof] Introduce FrameIdConverter and CallStackIdConverter (#90307)
Currently, we convert FrameId to Frame and CallStackId to a call stack
at several places. This patch unifies those into funct
[memprof] Introduce FrameIdConverter and CallStackIdConverter (#90307)
Currently, we convert FrameId to Frame and CallStackId to a call stack
at several places. This patch unifies those into function objects --
FrameIdConverter and CallStackIdConverter.
The existing implementation of CallStackIdConverter, being removed in
this patch, handles both FrameId and CallStackId conversions. This
patch splits it into two phases for flexibility (but make them
composable) because some places only require the FrameId conversion.
show more ...
|
#
1f38b8a2 |
| 25-Apr-2024 |
Kazu Hirata <kazu@google.com> |
[memprof] Use DenseMap::contains (NFC) (#90124)
This patch replaces count with contains, following the spirit of
clang-tidy's readability-container-contains.
|
#
cb9589b2 |
| 25-Apr-2024 |
Kazu Hirata <kazu@google.com> |
[memprof] Move getFullSchema and getHotColdSchema outside PortableMemInfoBlock (#90103)
These functions do not operate on PortableMemInfoBlock. This patch
moves them outside the class.
|