History log of /llvm-project/llvm/lib/ProfileData/InstrProfReader.cpp (Results 1 – 25 of 241)
Revision Date Author Comments
Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6
# 684e79f2 08-Dec-2024 Kazu Hirata <kazu@google.com>

[memprof] Add YAML read/write support to llvm-profdata (#118915)

This patch adds YAML read/write support to llvm-profdata. The primary
intent is to accommodate MemProf profiles in test cases, thereby
avoiding the binary format.

The read support is via llvm-profdata merge. This is useful when we
want to verify that the compiler does the right thing on a given .ll
file and a MemProf profile in a test case. In the test case, we would
convert the MemProf profile in YAML to an indexed profile and invoke
the compiler on the .ll file along with the indexed profile.

The write support is via llvm-profdata show --memory. This is useful
when we wish to convert an indexed MemProf profile to YAML while
writing tests. We would compile a test case in C++, run it to produce an
indexed MemProf profile, and then convert that profile to the text format.


# ff281f7d 04-Dec-2024 ronryvchin <94285266+ronryvchin@users.noreply.github.com>

[PGO] Add option to always instrumenting loop entries (#116789)

This patch extends the PGO infrastructure with an option to prefer the
instrumentation of loop entry blocks.
This option is a generalization of
https://github.com/llvm/llvm-project/commit/19fb5b467bb97f95eace1f3637d2d1041cebd3ce,
and helps to cover cases where the loop exit is never executed.
An example where this can occur is an event handling loop.

Note that this change does NOT change the default behavior.


Revision tags: llvmorg-19.1.5
# a0153eaa 23-Nov-2024 Kazu Hirata <kazu@google.com>

[memprof] Fix builds under EXPENSIVE_CHECKS

memprof::Version1 has been removed, so the whole block of code is
dead.


# ad2bdd8f 22-Nov-2024 Kazu Hirata <kazu@google.com>

[memprof] Remove MemProf format Version 1 (#117357)

This patch removes MemProf format Version 1 now that Versions 2 and 3
are working well.


# 4f1b20f0 20-Nov-2024 Kazu Hirata <kazu@google.com>

[ProfileData] Remove unused includes (NFC) (#116751)

Identified with misc-include-cleaner.


Revision tags: llvmorg-19.1.4
# 0d38f64e 15-Nov-2024 Kazu Hirata <kazu@google.com>

[memprof] Remove MemProf format Version 0 (#116442)

This patch removes MemProf format Version 0 now that Versions 2 and 3
seem to be working well.

I'm not touching version 1 for now because some tests still rely on
version 1.

Note that Version 0 is identical to Version 1 except that the MemProf
section of the indexed format has a MemProf version field.


# 57ed628f 15-Nov-2024 Kazu Hirata <kazu@google.com>

[memprof] Speed up caller-callee pair extraction (Part 2) (#116441)

This patch further speeds up the extraction of caller-callee pairs
from the profile.

Recall that we reconstruct a call stack by traversing the radix tree
from one of its leaf nodes toward a root. The implication is that
when we decode many different call stacks, we end up visiting nodes
near the root(s) repeatedly. That in turn adds many duplicates to our
data structure:

DenseMap<uint64_t, SmallVector<CallEdgeTy, 0>> Calls;

only to be deduplicated later with sort+unique for each vector.

This patch makes the extraction process more efficient by keeping
track of indices of the radix tree array we've visited so far and
terminating traversal as soon as we encounter an element previously
visited.

Note that even with this improvement, we still add at least one
caller-callee pair to the data structure above for each call stack
because we do need to add a caller-callee pair for the leaf node with
the callee GUID being 0.

Without this patch, it takes 4 seconds to extract caller-callee pairs
from a large MemProf profile. This patch shortens that down to
900ms.
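
To illustrate the early termination, here is a minimal sketch; ToyRadixTree,
its parent-pointer layout, and the pair-recording details are illustrative
assumptions, not the on-disk radix tree format or the actual extractor code.

  #include <cstdint>
  #include <utility>
  #include <vector>

  // Toy model: Parent[I] is the next element toward a root (-1 at a root)
  // and GUID[I] is the function GUID of the frame stored at element I.
  struct ToyRadixTree {
    std::vector<int> Parent;
    std::vector<uint64_t> GUID;
  };

  // Extract caller-callee pairs for the call stack whose leaf is Leaf.
  // Once we reach an element that an earlier walk already marked in
  // Visited, everything above it has been recorded, so we stop right
  // after emitting the pair at the junction.
  void extractPairs(const ToyRadixTree &T, int Leaf, std::vector<bool> &Visited,
                    std::vector<std::pair<uint64_t, uint64_t>> &Pairs) {
    uint64_t CalleeGUID = 0; // The leaf's callee GUID is recorded as 0.
    for (int I = Leaf; I != -1; I = T.Parent[I]) {
      Pairs.push_back({T.GUID[I], CalleeGUID}); // caller -> callee
      CalleeGUID = T.GUID[I];
      if (Visited[I])
        break; // The rest of this suffix was handled by an earlier walk.
      Visited[I] = true;
    }
  }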


# ec353b74 15-Nov-2024 Kazu Hirata <kazu@google.com>

[memprof] Use llvm::function_ref instead of std::function (#116306)

We've seen bugs where we lost track of error states stored in the
functor because we passed the functor by value (that is,
std::function) as opposed to reference (llvm::function_ref).

This patch fixes a couple of places we pass functors by value.

While we are at it, this patch adds curly braces around a "for" loop
spanning multiple lines.
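
A small standalone example of the failure mode (ErrorTracker and the visit
functions are hypothetical, for illustration only): passing a stateful
functor as std::function by value mutates a copy, while llvm::function_ref
refers to the caller's object.

  #include "llvm/ADT/STLFunctionalExtras.h"
  #include <cstdio>
  #include <functional>

  // A stateful callback that remembers whether it ever saw an error.
  struct ErrorTracker {
    bool SawError = false;
    void operator()(bool IsError) { SawError |= IsError; }
  };

  // Taking std::function by value copies the functor, so the update to
  // SawError lands in the copy and is lost to the caller.
  static void visitByValue(std::function<void(bool)> Callback) { Callback(true); }

  // llvm::function_ref is a non-owning reference; the caller's object is
  // the one that gets mutated.
  static void visitByRef(llvm::function_ref<void(bool)> Callback) { Callback(true); }

  int main() {
    ErrorTracker T1, T2;
    visitByValue(T1); // T1.SawError stays false: the error state is lost.
    visitByRef(T2);   // T2.SawError is now true.
    std::printf("by value: %d, by ref: %d\n", T1.SawError, T2.SawError);
  }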


# 59da1afd 14-Nov-2024 Kazu Hirata <kazu@google.com>

[memprof] Speed up caller-callee pair extraction (#116184)

We know that the MemProf profile has a lot of duplicate call stacks.
Extracting caller-callee pairs from a call stack we've seen before is
a wasteful effort.

This patch makes the extraction more efficient by first coming up with
a work list of linear call stack IDs -- the set of starting positions
in the radix tree array -- and then extracting caller-callee pairs from
each call stack in the work list.

We implement the work list as a bit vector because we expect the work
list to be dense in the range [0, RadixTreeSize). Also, we want the
set insertion to be cheap.

Without this patch, it takes 25 seconds to extract caller-callee pairs
from a large MemProf profile. This patch shortens that down to 4
seconds.
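
A minimal sketch of the work-list idea (the helper and its inputs are
hypothetical; only llvm::BitVector is the real ADT): the bit vector acts
both as a cheap set and as the iteration order.

  #include "llvm/ADT/BitVector.h"
  #include <cstdint>
  #include <vector>

  // Collect the distinct starting positions (linear call stack IDs) and
  // then process each distinct call stack exactly once.  The IDs are
  // dense in [0, RadixTreeSize), so a BitVector makes insertion cheap.
  void processUniqueCallStacks(
      const std::vector<std::vector<uint32_t>> &CallStackIdsPerRecord,
      uint32_t RadixTreeSize) {
    llvm::BitVector Worklist(RadixTreeSize);
    for (const auto &Ids : CallStackIdsPerRecord)
      for (uint32_t Id : Ids)
        Worklist.set(Id); // Duplicate insertions are effectively free.

    for (unsigned Id : Worklist.set_bits()) {
      (void)Id; // ... extract caller-callee pairs for call stack Id ...
    }
  }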


# 9a730d87 14-Nov-2024 Kazu Hirata <kazu@google.com>

[memprof] Add IndexedMemProfReader::getMemProfCallerCalleePairs (#115807)

Undrifting the MemProf profile requires two sets of information:

- caller-callee pairs from the profile
- caller-callee pairs from the IR

This patch adds a function to do the former. The latter has been
addressed by extractCallsFromIR.

Unfortunately, the current MemProf format does not directly give us
the caller-callee pairs from the profile. "struct Frame" just tells
us where the call site is -- Caller GUID and line/column numbers; it
doesn't tell us what function a given Frame is calling. To extract
caller-callee pairs, we need to scan each call stack, look at two
adjacent Frames, and extract a caller-callee pair.

Conceptually, we would extract caller-callee pairs with:

for each MemProfRecord in the profile:
  for each call stack in AllocSites:
    extract caller-callee pairs from adjacent pairs of Frames

However, this is highly inefficient. Obtaining MemProfRecord involves
looking up the OnDiskHashTable, allocating several vectors on the
heap, and populating fields that are irrelevant to us, such as MIB and
CallSites.

This patch adds an efficient way of doing the above. Specifically, we

- go through all IndexedMemProfRecords,
- look at each linear call stack ID, and
- extract caller-callee pairs from each call stack.

The extraction is done by a new class CallerCalleePairExtractor,
modified from LinearCallStackIdConverter, which reconstructs a call
stack from the radix tree array. For our purposes, we skip the
reconstruction and immediately populate the data structure for
caller-callee pairs.

The resulting caller-callee pairs are of the type:

DenseMap<uint64_t, SmallVector<CallEdgeTy, 0>> CallerCalleePairs;

which can be passed directly to longestCommonSequence just like the
result of extractCallsFromIR.

Further performance optimizations are possible for the new functions
in this patch. I'll address those in follow-up patches.
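
As an illustration of the adjacent-frame scan, here is a simplified sketch
(ToyFrame and ToyCallEdge are stand-ins for memprof::Frame and CallEdgeTy;
this is not the actual CallerCalleePairExtractor):

  #include "llvm/ADT/ArrayRef.h"
  #include "llvm/ADT/DenseMap.h"
  #include "llvm/ADT/SmallVector.h"
  #include <cstdint>
  #include <utility>

  // Simplified frame: the function containing the call site plus the
  // call site's line offset and column.
  struct ToyFrame {
    uint64_t CallerGUID;
    uint32_t LineOffset;
    uint32_t Column;
  };
  // (call-site location, callee GUID)
  using ToyCallEdge = std::pair<std::pair<uint32_t, uint32_t>, uint64_t>;

  // Given one call stack in leaf-first order, record for each frame the
  // call site and the GUID of the function it calls (the frame below it;
  // 0 for the leaf, which performs the allocation itself).
  void addCallStack(
      llvm::ArrayRef<ToyFrame> CallStack,
      llvm::DenseMap<uint64_t, llvm::SmallVector<ToyCallEdge, 0>> &CallerCalleePairs) {
    uint64_t CalleeGUID = 0;
    for (const ToyFrame &F : CallStack) {
      CallerCalleePairs[F.CallerGUID].push_back({{F.LineOffset, F.Column}, CalleeGUID});
      CalleeGUID = F.CallerGUID;
    }
  }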


Revision tags: llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0
# 787cd8f0 06-Sep-2024 gulfemsavrun <gulfem@google.com>

[InstrProf] Add debuginfod correlation support (#106606)

This patch adds debuginfod support to llvm-profdata so that it can
find the associated executable by a build ID in a raw profile and
correlate the profile with the provided correlation kind (debug-info
or binary).


Revision tags: llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init
# 6c8ff4cb 11-Jul-2024 Kazu Hirata <kazu@google.com>

[ProfileData] Take ArrayRef<InstrProfValueData> in addValueData (NFC) (#97363)

This patch fixes another place in ProfileData where we have a pointer
to an array of InstrProfValueData and its length separately.

addValueData is a bit unique in that it remaps incoming values in
place before adding them to ValueSites. AFAICT, no caller of
addValueData uses the updated incoming values. With this patch, we add
value data to ValueSites first and then remap values there. This
way, we can take ArrayRef<InstrProfValueData> as a parameter.
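
A before/after sketch of the parameter change (ToyValueData and the function
names are illustrative, not the real addValueData signature):

  #include "llvm/ADT/ArrayRef.h"
  #include <cstdint>

  // Toy stand-in for InstrProfValueData (a {Value, Count} pair).
  struct ToyValueData {
    uint64_t Value;
    uint64_t Count;
  };

  // Before: the array and its length travel as two loosely coupled
  // parameters.
  uint64_t sumCountsOld(const ToyValueData *VData, uint32_t N) {
    uint64_t Sum = 0;
    for (uint32_t I = 0; I < N; ++I)
      Sum += VData[I].Count;
    return Sum;
  }

  // After: a single ArrayRef carries both pointer and length, and callers
  // can pass a std::vector, a C array, or an explicit {Ptr, N} unchanged.
  uint64_t sumCountsNew(llvm::ArrayRef<ToyValueData> VData) {
    uint64_t Sum = 0;
    for (const ToyValueData &VD : VData)
      Sum += VD.Count;
    return Sum;
  }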


# afbd7d1e 09-Jul-2024 Mircea Trofin <mtrofin@google.com>

[NFC] Coding style: drop `k` in `kGlobalIdentifierDelimiter` (#98230)


# fef144ce 25-Jun-2024 Kazu Hirata <kazu@google.com>

Revert "[llvm] Use llvm::sort (NFC) (#96434)"

This reverts commit 05d167fc201b4f2e96108be0d682f6800a70c23d.

Reverting the patch fixes the following under EXPENSIVE_CHECKS:

LLVM :: CodeGen/AMDGPU/sched-group-barrier-pipeline-solver.mir
LLVM :: CodeGen/AMDGPU/sched-group-barrier-pre-RA.mir
LLVM :: CodeGen/PowerPC/aix-xcoff-used-with-stringpool.ll
LLVM :: CodeGen/PowerPC/merge-string-used-by-metadata.mir
LLVM :: CodeGen/PowerPC/mergeable-string-pool-large.ll
LLVM :: CodeGen/PowerPC/mergeable-string-pool-pass-only.mir
LLVM :: CodeGen/PowerPC/mergeable-string-pool.ll


# 05d167fc 23-Jun-2024 Kazu Hirata <kazu@google.com>

[llvm] Use llvm::sort (NFC) (#96434)


Revision tags: llvmorg-18.1.8
# 4403cdba 10-Jun-2024 Kazu Hirata <kazu@google.com>

[ProfileData] Refactor BinaryIdsStart and BinaryIdsSize (NFC) (#94922)

BinaryIdsStart and BinaryIdsSize in IndexedInstrProfReader are always
used together, so this patch packages them into an ArrayRef<uint8_t>.

For now, readBinaryIdsInternal immediately unpacks the ArrayRef into its
constituents to avoid touching the rest of readBinaryIdsInternal.
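
A schematic of the packaging (ToyIndexedReader is hypothetical; only the
BinaryIdsStart / BinaryIdsSize names come from the message):

  #include "llvm/ADT/ArrayRef.h"
  #include <cstdint>

  struct ToyIndexedReader {
    // One member replaces the separate BinaryIdsStart / BinaryIdsSize pair.
    llvm::ArrayRef<uint8_t> BinaryIdsBuffer;

    void readBinaryIdsInternal() {
      // Unpacking into the old constituents keeps the rest of the
      // decoding logic unchanged for now.
      const uint8_t *BinaryIdsStart = BinaryIdsBuffer.data();
      uint64_t BinaryIdsSize = BinaryIdsBuffer.size();
      (void)BinaryIdsStart;
      (void)BinaryIdsSize;
      // ... decode build IDs from the buffer ...
    }
  };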


# 521238d1 09-Jun-2024 Kazu Hirata <kazu@google.com>

[ProfileData] Refactor VTableNamePtr and CompressedVTableNamesLen (NFC) (#94859)

VTableNamePtr and CompressedVTableNamesLen are always used together to
create a StringRef in getSymtab.

We can create the StringRef ahead of time in readHeader. This way,
IndexedInstrProfReader becomes a tiny bit simpler with fewer member
variables. Also, StringRef default-constructs itself with its Data
and Length set to nullptr and 0, respectively, which is exactly what
we need.
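
A schematic of the change (ToyIndexedReader and the member/parameter names
are illustrative):

  #include "llvm/ADT/StringRef.h"
  #include <cstdint>

  struct ToyIndexedReader {
    // Replaces the VTableNamePtr / CompressedVTableNamesLen pair.  A
    // default-constructed StringRef is already {nullptr, 0}, i.e. "no
    // vtable names seen yet".
    llvm::StringRef CompressedVTableNames;

    void readHeader(const char *SectionStart, uint64_t SectionLen) {
      CompressedVTableNames = llvm::StringRef(SectionStart, SectionLen);
    }
  };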


# 089c4bb5 09-Jun-2024 Kazu Hirata <kazu@google.com>

[ProfileData] Use ArrayRef instead of const std::vector<T> & (NFC) (#94878)


# e62c2146 08-Jun-2024 Kazu Hirata <kazu@google.com>

[ProfileData] Simplify calls to readNext in readBinaryIdsInternal (NFC) (#94862)

readNext has two variants:

- readNext<uint64_t, endian>(ptr)
- readNext<uint64_t>(ptr, endian)

This patch uses the latter to simplify readBinaryIdsInternal. Both
forms default to unaligned.
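
The two spellings side by side, as a minimal sketch (assuming the readNext
helpers in llvm/Support/Endian.h and the llvm::endianness enumerators):

  #include "llvm/Support/Endian.h"
  #include <cstdint>

  // Both calls read an unaligned little-endian uint64_t and advance Ptr;
  // the second form, with endianness as a regular argument, is the one
  // the patch switches readBinaryIdsInternal to.
  uint64_t readTwoWords(const unsigned char *&Ptr) {
    using namespace llvm::support;
    uint64_t A = endian::readNext<uint64_t, llvm::endianness::little>(Ptr);
    uint64_t B = endian::readNext<uint64_t>(Ptr, llvm::endianness::little);
    return A + B;
  }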


# 38124fef 08-Jun-2024 Kazu Hirata <kazu@google.com>

[ProfileData] Use a range-based for loop (NFC) (#94856)

While I am at it, this patch adds const to a couple of places.


# eb33e462 07-Jun-2024 Kazu Hirata <kazu@google.com>

[memprof] Clean up IndexedMemProfReader (NFC) (#94710)

Parameter "Version" is confusing in deserializeV012 and deserializeV3
because we also have member variable "Version". Fortunately,
parameter "Version" and member variable "Version" always have the same
value because IndexedMemProfReader::deserialize initializes the member
variable and passes it to deserializeV012 and deserializeV3.

This patch removes the parameter.


Revision tags: llvmorg-18.1.7
# 4ce65423 02-Jun-2024 Kazu Hirata <kazu@google.com>

[memprof] Use const ref for IndexedRecord (#94114)

The type of *Iter here is "const IndexedMemProfRecord &" as defined in
RecordLookupTrait. Assigning *Iter to a variable of type
"const IndexedMemProfRecord &" avoids a copy, reducing the cycle and
instruction counts by 1.8% and 0.2%, respectively, with
"llvm-profdata show" modified to deserialize all MemProfRecords.

Note that RecordLookupTrait has an internal copy of
IndexedMemProfRecord, so we don't have to worry about a dangling
reference to a temporary.
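
The difference in a minimal form (ToyRecord stands in for
IndexedMemProfRecord; the real iterator comes from the on-disk hash table):

  #include <cstddef>
  #include <vector>

  struct ToyRecord {
    std::vector<unsigned> AllocSites; // copying this is the hidden cost
  };

  std::size_t countAllocSites(const std::vector<ToyRecord> &Records) {
    std::size_t N = 0;
    for (auto Iter = Records.begin(); Iter != Records.end(); ++Iter) {
      // ToyRecord Record = *Iter;        // would copy the whole record
      const ToyRecord &Record = *Iter;    // binds without copying
      N += Record.AllocSites.size();
    }
    return N;
  }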


# 90acfbf9 30-May-2024 Kazu Hirata <kazu@google.com>

[memprof] Use linear IDs for Frames and call stacks (#93740)

With this patch, we stop using on-disk hash tables for Frames and call
stacks. Instead, we'll write out all the Frames as a flat array while
maintaining mappings from FrameIds to the indexes into the array.
Then we serialize call stacks in terms of those indexes.

Likewise, we'll write out all the call stacks as another flat array
while maintaining mappings from CallStackIds to the indexes into the
call stack array. One minor difference from Frames is that the
indexes into the call stack array are not contiguous because call
stacks are variable-length objects.

Then we serialize IndexedMemProfRecords in terms of the indexes
into the call stack array.

Now, we describe each call stack with 32-bit indexes into the Frame
array (as opposed to the 64-bit FrameIds in Version 2). The use of
the smaller type cuts down the profile file size by about 40% relative
to Version 2. The departure from the on-disk hash tables contributes
a little bit to the savings, too.

For now, IndexedMemProfRecords refer to call stacks with 64-bit
indexes into the call stack array. As a follow-up, I'll change that
to uint32_t, including necessary updates to RecordWriterTrait.
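
A schematic of the linear-ID assignment (names and the payload stored in the
flat array are simplified assumptions):

  #include "llvm/ADT/DenseMap.h"
  #include <cstdint>
  #include <vector>

  using FrameId = uint64_t;

  // Assign each distinct frame a linear ID equal to its position in the
  // flat output array; FlatFrames holds just the FrameId here as a
  // stand-in for the serialized frame payload.
  struct FrameTable {
    std::vector<FrameId> FlatFrames;
    llvm::DenseMap<FrameId, uint32_t> ToLinearId;

    uint32_t getLinearId(FrameId Id) {
      auto [It, Inserted] = ToLinearId.try_emplace(Id, FlatFrames.size());
      if (Inserted)
        FlatFrames.push_back(Id);
      return It->second;
    }
  };

  // Call stacks are then rewritten as 32-bit indexes into the flat array.
  std::vector<uint32_t> encodeCallStack(FrameTable &Frames,
                                        const std::vector<FrameId> &CallStack) {
    std::vector<uint32_t> Encoded;
    Encoded.reserve(CallStack.size());
    for (FrameId Id : CallStack)
      Encoded.push_back(Frames.getLinearId(Id));
    return Encoded;
  }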


# 99b9ab45 29-May-2024 Kazu Hirata <kazu@google.com>

[memprof] Reorder MemProf sections in profile (#93640)

This patch teaches the V3 format to serialize Frames, call stacks, and
IndexedMemProfRecords, in that order.

I'm planning to use linear IDs for Frames. That is, Frames will be
numbered 0, 1, 2, and so on in the order we serialize them. In turn,
we will serialize the call stacks in terms of those linear IDs.

Likewise, I'm planning to use linear IDs for call stacks and then
serialize IndexedMemProfRecords in terms of those linear IDs for call
stacks.

With the new order, we can successively free data structures as we
serialize them. That is, once we serialize Frames, we can free the
Frames' data proper and just retain mappings from FrameIds to linear
IDs. A similar story applies to call stacks.


# 737a3018 29-May-2024 Mingming Liu <mingmingl@google.com>

[nfc][InstrFDO] Add Header::getIndexedProfileVersion and use it to decide profile version. (#93613)

This is a split of https://github.com/llvm/llvm-project/pull/93346 as
discussed.

