#
5740bb80 |
| 14-Dec-2021 |
Hongtao Yu <hoy@fb.com> |
[CSSPGO] Use nested context-sensitive profile.
CSSPGO currently employs a flat profile format for context-sensitive profiles. Such a flat profile allows for precisely manipulating contexts that is e
[CSSPGO] Use nested context-sensitive profile.
CSSPGO currently employs a flat profile format for context-sensitive profiles. Such a flat profile allows for precisely manipulating contexts that is either inlined or not inlined. This is a benefit over the nested profile format used by non-CS AutoFDO. A downside of this is the longer build time due to parsing the indexing the full CS contexts.
For a CS flat profile, though only the context profiles relevant to a module are loaded when that module is compiled, the cost to figure out what profiles are relevant is noticeably high when there're many contexts, since the sample reader will need to scan all context strings anyway. On the contrary, a nested function profile has its related inline subcontexts isolated from other unrelated contexts. Therefore when compiling a set of functions, unrelated contexts will never need to be scanned.
In this change we are exploring using nested profile format for CSSPGO. This is expected to work based on an assumption that with a preinliner-computed profile all contexts are precomputed and expected to be inlined by the compiler. Contexts not expected to be inlined will be cut off and returned to corresponding base profiles (for top-level outlined functions). This naturally forms a nested profile where all nested contexts are expected to be inlined. The compiler will less likely optimize on derived contexts that are not precomputed.
A CS-nested profile will look exactly the same with regular nested profile except that each nested profile can come with an attributes. With pseudo probes, a nested profile shown as below can also have a CFG checksum.
```
main:1968679:12 2: 24 3: 28 _Z5funcAi:18 3.1: 28 _Z5funcBi:30 3: _Z5funcAi:1467398 0: 10 1: 10 _Z8funcLeafi:11 3: 24 1: _Z8funcLeafi:1467299 0: 6 1: 6 3: 287884 4: 287864 _Z3fibi:315608 15: 23 !CFGChecksum: 138828622701 !Attributes: 2 !CFGChecksum: 281479271677951 !Attributes: 2 ```
Specific work included in this change: - A recursive profile converter to convert CS flat profile to nested profile. - Extend function checksum and attribute metadata to be stored in nested way for text profile and extbinary profile. - Unifiy sample loader inliner path for CS and preinlined nested profile. - Changes in the sample loader to support probe-based nested profile.
I've seen promising results regarding build time. A nested profile can result in a 20% shorter build time than a CS flat profile while keep an on-par performance. This is with -duplicate-contexts-into-base=1.
Test Plan:
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D115205
show more ...
|
Revision tags: llvmorg-13.0.1-rc1 |
|
#
d1378544 |
| 01-Nov-2021 |
Hongtao Yu <hoy@fb.com> |
[SamplePGO] Fix callsite sample lookup to use dwarf names when dwarf linkage name isn't available.
When linkage name isn't available in dwarf (ususally the case of C code), looking up callee sample
[SamplePGO] Fix callsite sample lookup to use dwarf names when dwarf linkage name isn't available.
When linkage name isn't available in dwarf (ususally the case of C code), looking up callee samples should be based on the dwarf name instead of using an empty string.
Also fixing a test issue where using empty string to look up callee samples accidentally returns the correct samples because it is treated as indirect call.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D112948
show more ...
|
#
259e4c56 |
| 27-Oct-2021 |
Hongtao Yu <hoy@fb.com> |
[CSSPGO] Trim cold base profiles for the CS preinliner.
Adding support to the CS preinliner to trim cold base profiles. This makes trimming consistent with the inline decision made by the preinliner
[CSSPGO] Trim cold base profiles for the CS preinliner.
Adding support to the CS preinliner to trim cold base profiles. This makes trimming consistent with the inline decision made by the preinliner. Also disable the existing profile merger when preinliner is on unless explicitly specified.
Reviewed By: wenlei, wlei
Differential Revision: https://reviews.llvm.org/D112489
show more ...
|
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2 |
|
#
b9db7036 |
| 25-Aug-2021 |
Hongtao Yu <hoy@fb.com> |
[CSSPGO] Split context string to deduplicate function name used in the context.
Currently context strings contain a lot of duplicated function names and that significantly increase the profile size.
[CSSPGO] Split context string to deduplicate function name used in the context.
Currently context strings contain a lot of duplicated function names and that significantly increase the profile size. This change split the context into a series of {name, offset, discriminator} tuples so function names used in the context can be replaced by the index into the name table and that significantly reduce the size consumed by context.
A follow-up improvement made in the compiler and profiling tools is to avoid reconstructing full context strings which is time- and memory- consuming. Instead a context vector of `StringRef` is adopted to represent the full context in all scenarios. As a result, the previous prevalent profile map which was implemented as a `StringRef` is now engineered as an unordered map keyed by `SampleContext`. `SampleContext` is reshaped to using an `ArrayRef` to represent a full context for CS profile. For non-CS profile, it falls back to use `StringRef` to represent a contextless function name. Both the `ArrayRef` and `StringRef` objects are underpinned by real array and string objects that are stored in producer buffers. For compiler, they are maintained by the sample reader. For llvm-profgen, they are maintained in `ProfiledBinary` and `ProfileGenerator`. Full context strings can be generated only in those cases of debugging and printing.
When it comes to profile format, nothing has changed to the text format, though internally CS context is implemented as a vector. Extbinary format is only changed for CS profile, with an additional `SecCSNameTable` section which stores all full contexts logically in the form of `vector<int>`, which each element as an offset points to `SecNameTable`. All occurrences of contexts elsewhere are redirected to using the offset of `SecCSNameTable`.
Testing This is no-diff change in terms of code quality and profile content (for text profile).
For our internal large service (aka ads), the profile generation is cut to half, with a 20x smaller string-based extbinary format generated.
The compile time of ads is dropped by 25%.
Differential Revision: https://reviews.llvm.org/D107299
show more ...
|
#
f27fee62 |
| 16-Aug-2021 |
Hongtao Yu <hoy@fb.com> |
[SamplePGO][NFC] Dump function profiles in order
Sample profiles are stored in a string map which is basically an unordered map. Printing out profiles by simply walking the string map doesn't enforc
[SamplePGO][NFC] Dump function profiles in order
Sample profiles are stored in a string map which is basically an unordered map. Printing out profiles by simply walking the string map doesn't enforce an order. I'm sorting the map in the decreasing order of total samples to enable a more stable dump, which is good for comparing two dumps.
Reviewed By: wenlei, wlei
Differential Revision: https://reviews.llvm.org/D108147
show more ...
|
Revision tags: llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3 |
|
#
c60f1d5d |
| 16-Jun-2021 |
Hongtao Yu <hoy@fb.com> |
[CSSPGO] Fix an invalid hash table reference issue in the CS preinliner.
We were using a `StringMap` object to store all profiles to be emitted. The object is basically an unordered hash table, ther
[CSSPGO] Fix an invalid hash table reference issue in the CS preinliner.
We were using a `StringMap` object to store all profiles to be emitted. The object is basically an unordered hash table, therefore updating it in the process of trasvering it may cause issue since the underlying bucket array could change.
I'm also moving the `csspgo-preinliner` switch around so that no context tri will be constructed (by the constructor of `CSPreInliner`) when the switch is off.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D104267
show more ...
|
Revision tags: llvmorg-12.0.1-rc2 |
|
#
cef9b96b |
| 15-Jun-2021 |
Hongtao Yu <hoy@fb.com> |
[CSSPGO] Report zero-count probe in profile instead of dangling probes.
Previously dangling samples were represented by INT64_MAX in sample profile while probes never executed were not reported. Thi
[CSSPGO] Report zero-count probe in profile instead of dangling probes.
Previously dangling samples were represented by INT64_MAX in sample profile while probes never executed were not reported. This was based on an observation that dangling probes were only at a smaller portion than zero-count probes. However, with compiler optimizations, dangling probes end up becoming at large portion of all probes in general and reporting them does not make sense from profile size point of view. This change flips sample reporting by reporting zero-count probes instead. This enabled dangling probe to be represented by none (missing entry in profile). This has a couple benefits:
1. Reducing sample profile size in optimize mode, even when the number of non-executed probes outperform the number of dangling probes, since INT64_MAX takes more space over 0 to encode.
2. Binary size savings. No need to encode dangling probe anymore, since missing probes are treated as dangling in the profile reader.
3. Reducing compiler work to track dangling probes. However, for probes that are real dead and removed, we still need the compiler to identify them so that they can be reported as zero-count, instead of mistreated as dangling probes.
4. Improving counts quality by respecting the counts already collected on the non-dangling copy of a probe. A probe, when duplicated, gets two copies at runtime. If one of them is dangling while the other is not, merging the two probes at profile generation time will cause the real samples collected on the non-dangling one to be discarded. Not reporting the dangling counterpart will keep the real samples.
5. Better readability.
6. Be consistent with non-CS dwarf line number based profile. Zero counts are trusted by the compiler counts inferencer while missing counts will be inferred by the compiler.
Note that the current patch does include any work for #3. There will be follow-up changes.
For #1, I've seen for a large Facebook service, the text profile is reduced by 7%. For extbinary profile, the size of LBRProfileSection is reduced by 35%.
For #4, I have seen general counts quality for SPEC2017 is improved by 10%.
Reviewed By: wenlei, wlei, wmi
Differential Revision: https://reviews.llvm.org/D104129
show more ...
|
#
863184dd |
| 11-Jun-2021 |
wlei <wlei@fb.com> |
[CSSPGO] Aggregation by the last K context frames for cold profiles
This change provides the option to merge and aggregate cold context by the last k frames instead of context-less name. By default
[CSSPGO] Aggregation by the last K context frames for cold profiles
This change provides the option to merge and aggregate cold context by the last k frames instead of context-less name. By default K = 1 means the context-less one.
This is for better perf tuning. The more selective merging and trimming will rely on llvm-profgen's preinliner.
Reviewed By: wenlei, hoy
Differential Revision: https://reviews.llvm.org/D104131
show more ...
|
#
6745ffe4 |
| 27-May-2021 |
Rong Xu <xur@google.com> |
[SampleFDO] New hierarchical discriminator for FS SampleFDO (ProfileData part)
This patch was split from https://reviews.llvm.org/D102246 [SampleFDO] New hierarchical discriminator for Flow Sensitiv
[SampleFDO] New hierarchical discriminator for FS SampleFDO (ProfileData part)
This patch was split from https://reviews.llvm.org/D102246 [SampleFDO] New hierarchical discriminator for Flow Sensitive SampleFDO This is mainly for ProfileData part of change. It will load FS Profile when such profile is detected. For an extbinary format profile, create_llvm_prof tool will add a flag to profile summary section. For other format profiles, the users need to use an internal option (-profile-isfs) to tell the compiler that the profile uses FS discriminators.
This patch also simplified the bit API used by FS discriminators.
Differential Revision: https://reviews.llvm.org/D103041
show more ...
|
Revision tags: llvmorg-12.0.1-rc1 |
|
#
00ef28ef |
| 08-Apr-2021 |
Wenlei He <aktoon@gmail.com> |
[CSSPGO] Fix dangling context strings and improve profile order consistency and error handling
This patch fixed the following issues along side with some refactoring:
1. Fix bugs where StringRef fo
[CSSPGO] Fix dangling context strings and improve profile order consistency and error handling
This patch fixed the following issues along side with some refactoring:
1. Fix bugs where StringRef for context string out live the underlying std::string. We now keep string table in profile generator to hold std::strings. We also do the same for bracketed context strings in profile writer. 2. Make sure profile output strictly follow (total sample, name) order. Previously, there's inconsistency between ProfileMap's key and FunctionSamples's name, leading to inconsistent ordering. This is now fixed by introducing context profile canonicalization. Assertions are also added to make sure ProfileMap's key and FunctionSamples's name are always consistent. 3. Enhanced error handling for profile writing to make sure we bubble up errors properly for both llvm-profgen and llvm-profdata when string table is not populated correctly for extended binary profile. 4. Keep all internal context representation bracket free. This avoids creating new strings for context trimming, merging and preinline. getNameWithContext API is now simplied accordingly. 5. Factor out the code for context trimming and merging into SampleContextTrimmer in SampleProf.cpp. This enables llvm-profdata to use the trimmer when merging profiles. Changes in llvm-profgen will be in separate patch.
Differential Revision: https://reviews.llvm.org/D100090
show more ...
|
Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2 |
|
#
ee35784a |
| 19-Jan-2021 |
Wei Mi <wmi@google.com> |
[SampleFDO] Support enabling -funique-internal-linkage-name.
now -funique-internal-linkage-name flag is available, and we want to flip it on by default since it is beneficial to have separate sample
[SampleFDO] Support enabling -funique-internal-linkage-name.
now -funique-internal-linkage-name flag is available, and we want to flip it on by default since it is beneficial to have separate sample profiles for different internal symbols with the same name. As a preparation, we want to avoid regression caused by the flip.
When we flip -funique-internal-linkage-name on, the profile is collected from binary built without -funique-internal-linkage-name so it has no uniq suffix, but the IR in the optimized build contains the suffix. This kind of mismatch may introduce transient regression.
To avoid such mismatch, we introduce a NameTable section flag indicating whether there is any name in the profile containing uniq suffix. Compiler will decide whether to keep uniq suffix during name canonicalization depending on the NameTable section flag. The flag is only available for extbinary format. For other formats, by default compiler will keep uniq suffix so they will only experience transient regression when -funique-internal-linkage-name is just flipped.
Another type of regression is caused by places where we miss to call getCanonicalFnName. Those places are fixed.
Differential Revision: https://reviews.llvm.org/D96932
show more ...
|
#
ad2a59f5 |
| 25-Feb-2021 |
Hongtao Yu <hoy@fb.com> |
[CSSPGO] Introducing dangling pseudo probes.
Dangling probes are the probes associated to an empty block. This usually happens when all real instructions are optimized away from the block. There is
[CSSPGO] Introducing dangling pseudo probes.
Dangling probes are the probes associated to an empty block. This usually happens when all real instructions are optimized away from the block. There is a problem with dangling probes during the offline counts processing. The way the sample profiler works is that samples collected on the first physical instruction following a probe will be counted towards the probe. This logically equals to treating the instruction next to a probe as if it is from the same block of the probe. In the dangling probe case, the real instruction following a dangling probe actually starts a new block, and samples collected on the new block may cause issues when counted towards the empty block.
To mitigate this issue, we first try to move around a dangling probe inside its owning block. If there are still native instructions preceding the probe in the same block, we can then use them as a place holder to collect samples for the probe. A pass is added to walk each block backwards looking for probes not followed by any real instruction and moving them before the first real instruction. This is done right before the object emission.
If we are unlucky to find such in-block preceding instructions for a probe, the solution we are taking is to tag such probe as dangling so that the samples reported for them will not be trusted by the compiler. We leave it up to the counts inference algorithm to get such probes a reasonable count. The number `UINT64_MAX` is used to mark sample count as collected for a dangling probe.
Reviewed By: wmi
Differential Revision: https://reviews.llvm.org/D95962
show more ...
|
#
7fb40011 |
| 28-Feb-2021 |
Wei Mi <wmi@google.com> |
[SampleFDO] Add a cutoff flag to control how many symbols will be included into profile symbol list.
When test is unrepresentative to production behavior, sample profile collected from production ca
[SampleFDO] Add a cutoff flag to control how many symbols will be included into profile symbol list.
When test is unrepresentative to production behavior, sample profile collected from production can cause unexpected performance behavior in test. To triage such issue, it is useful to have a cutoff flag to control how many symbols will be included into profile symbol list in order to do binary search.
Differential Revision: https://reviews.llvm.org/D97623
show more ...
|
#
28ea50f5 |
| 18-Jan-2021 |
Kazu Hirata <kazu@google.com> |
[llvm] Populate std::vector at construction time (NFC)
|
Revision tags: llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2 |
|
#
ac068e01 |
| 16-Dec-2020 |
Hongtao Yu <hoy@fb.com> |
[CSSPGO] Consume pseudo-probe-based AutoFDO profile
This change enables pseudo-probe-based sample counts to be consumed by the sample profile loader under the regular `-fprofile-sample-use` switch w
[CSSPGO] Consume pseudo-probe-based AutoFDO profile
This change enables pseudo-probe-based sample counts to be consumed by the sample profile loader under the regular `-fprofile-sample-use` switch with minimal adjustments to the existing sample file formats. After the counts are imported, a probe helper, aka, a `PseudoProbeManager` object, is automatically launched to verify the CFG checksum of every function in the current compilation against the corresponding checksum from the profile. Mismatched checksums will cause a function profile to be slipped. A `SampleProfileProber` pass is scheduled before any of the `SampleProfileLoader` instances so that the CFG checksums as well as probe mappings are available during the profile loading time. The `PseudoProbeManager` object is set up right after the profile reading is done. In the future a CFG-based fuzzy matching could be done in `PseudoProbeManager`.
Samples will be applied only to pseudo probe instructions as well as probed callsites once the checksum verification goes through. Those instructions are processed in the same way that regular instructions would be processed in the line-number-based scenario. In other words, a function is processed in a regular way as if it was reduced to just containing pseudo probes (block probes and callsites).
**Adjustment to profile format **
A CFG checksum field is being added to the existing AutoFDO profile formats. So far only the text format and the extended binary format are supported. For the text format, a new line like ``` !CFGChecksum: 12345 ``` is added to the end of the body sample lines. For the extended binary profile format, we introduce a metadata section to store the checksum map from function names to their CFG checksums.
Differential Revision: https://reviews.llvm.org/D92347
show more ...
|
Revision tags: llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1 |
|
#
6b989a17 |
| 24-Mar-2020 |
Wenlei He <wenlei@fb.com> |
[CSSPGO] Infrastructure for context-sensitive Sample PGO and Inlining
This change adds the context-senstive sample PGO infracture described in CSSPGO RFC (https://groups.google.com/g/llvm-dev/c/1p1r
[CSSPGO] Infrastructure for context-sensitive Sample PGO and Inlining
This change adds the context-senstive sample PGO infracture described in CSSPGO RFC (https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s). It introduced an abstraction between input profile and profile loader that queries input profile for functions. Specifically, there's now the notion of base profile and context profile, and they are managed by the new SampleContextTracker for adjusting and merging profiles based on inline decisions. It works with top-down profiled guided inliner in profile loader (https://reviews.llvm.org/D70655) for better inlining with specialization and better post-inline profile fidelity. In the future, we can also expose this infrastructure to CGSCC inliner in order for it to take advantage of context-sensitive profile. This change is the consumption part of context-sensitive profile (The generation part is in this stack: https://reviews.llvm.org/D89707). We've seen good results internally in conjunction with Pseudo-probe (https://reviews.llvm.org/D86193). Pacthes for integration with Pseudo-probe coming up soon.
Currently the new infrastructure kick in when input profile contains the new context-sensitive profile; otherwise it's no-op and does not affect existing AutoFDO.
**Interface**
There're two sets of interfaces for query and tracking respectively exposed from SampleContextTracker. For query, now instead of simply getting a profile from input for a function, we can explicitly query base profile or context profile for given call path of a function. For tracking, there're separate APIs for marking context profile as inlined, or promoting and merging not inlined context profile.
- Query base profile (`getBaseSamplesFor`) Base profile is the merged synthetic profile for function's CFG profile from any outstanding (not inlined) context. We can query base profile by function.
- Query context profile (`getContextSamplesFor`) Context profile is a function's CFG profile for a given calling context. We can query context profile by context string.
- Track inlined context profile (`markContextSamplesInlined`) When a function is inlined for given calling context, we need to mark the context profile for that context as inlined. This is to make sure we don't include inlined context profile when synthesizing base profile for that inlined function.
- Track not-inlined context profile (`promoteMergeContextSamplesTree`) When a function is not inlined for given calling context, we need to promote the context profile tree so the not inlined context becomes top-level context. This preserve the sub-context under that function so later inline decision for that not inlined function will still have context profile for its call tree. Note that profile will be merged if needed when promoting a context profile tree if any of the node already exists at its promoted destination.
**Implementation**
Implementation-wise, `SampleContext` is created as abstraction for context. Currently it's a string for call path, and we can later optimize it to something more efficient, e.g. context id. Each `SampleContext` also has a `ContextState` indicating whether it's raw context profile from input, whether it's inlined or merged, whether it's synthetic profile created by compiler. Each `FunctionSamples` now has a `SampleContext` that tells whether it's base profile or context profile, and for context profile what is the context and state.
On top of the above context representation, a custom trie tree is implemented to track and manager context profiles. Specifically, `SampleContextTracker` is implemented that encapsulates a trie tree with `ContextTireNode` as node. Each node of the trie tree represents a frame in calling context, thus the path from root to a node represents a valid calling context. We also track `FunctionSamples` for each node, so this trie tree can serve efficient query for context profile. Accordingly, context profile tree promotion now becomes moving a subtree to be under the root of entire tree, and merge nodes for subtree if this move encounters existing nodes.
**Integration**
`SampleContextTracker` is now also integrated with AutoFDO, `SampleProfileReader` and `SampleProfileLoader`. When we detected input profile contains context-sensitive profile, `SampleContextTracker` will be used to track profiles, and all profile query will go to `SampleContextTracker` instead of `SampleProfileReader` automatically. Tracking APIs are called automatically for each inline decision from `SampleProfileLoader`.
Differential Revision: https://reviews.llvm.org/D90125
show more ...
|
#
c67ccf5f |
| 25-Aug-2020 |
Wei Mi <wmi@google.com> |
[SampleFDO] Enhance profile remapping support for searching inline instance and indirect call promotion candidate.
Profile remapping is a feature to match a function in the module with its profile i
[SampleFDO] Enhance profile remapping support for searching inline instance and indirect call promotion candidate.
Profile remapping is a feature to match a function in the module with its profile in sample profile if the function name and the name in profile look different but are equivalent using given remapping rules. This is a useful feature to keep the performance stable by specifying some remapping rules when sampleFDO targets are going through some large scale function signature change.
However, currently profile remapping support is only valid for outline function profile in SampleFDO. It cannot match a callee with an inline instance profile if they have different but equivalent names. We found that without the support for inline instance profile, remapping is less effective for some large scale change.
To add that support, before any remapping lookup happens, all the names in the profile will be inserted into remapper and the Key to the name mapping will be recorded in a map called NameMap in the remapper. During name lookup, a Key will be returned for the given name and it will be used to extract an equivalent name in the profile from NameMap. So with the help of the NameMap, we can translate any given name to an equivalent name in the profile if it exists. Whenever we try to match a name in the module to a name in the profile, we will try the match with the original name first, and if it doesn't match, we will use the equivalent name got from remapper to try the match for another time. In this way, the patch can enhance the profile remapping support for searching inline instance and searching indirect call promotion candidate.
In a planned large scale change of int64 type (long long) to int64_t (long), we found the performance of a google internal benchmark degraded by 2% if nothing was done. If existing profile remapping was enabled, the performance degradation dropped to 1.2%. If the profile remapping with the current patch was enabled, the performance degradation further dropped to 0.14% (Note the experiment was done before searching indirect call promotion candidate was added. We hope with the remapping support of searching indirect call promotion candidate, the degradation can drop to 0% in the end. It will be evaluated post commit).
Differential Revision: https://reviews.llvm.org/D86332
show more ...
|
Revision tags: llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3 |
|
#
ebad6788 |
| 03-Mar-2020 |
Wei Mi <wmi@google.com> |
[SampleFDO] Port MD5 name table support to extbinary format.
Compbinary format uses MD5 to represent strings in name table. That gives smaller profile without the need of compression/decompression w
[SampleFDO] Port MD5 name table support to extbinary format.
Compbinary format uses MD5 to represent strings in name table. That gives smaller profile without the need of compression/decompression when writing/reading the profile. The patch adds the support in extbinary format. It is off by default but user can choose to enable it.
Note the feature of using MD5 in name table can bring very small chance of name conflict leading to profile mismatch. Besides, profile using the feature won't have the profile remapping support.
Differential Revision: https://reviews.llvm.org/D76255
show more ...
|
Revision tags: llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1 |
|
#
b3342e18 |
| 07-Oct-2019 |
Wenlei He <aktoon@gmail.com> |
[llvm-profdata] Minor format fix
Summary: Minor format fix for output of "llvm-profdata -show"
Reviewers: wmi
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://revi
[llvm-profdata] Minor format fix
Summary: Minor format fix for output of "llvm-profdata -show"
Reviewers: wmi
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68440
llvm-svn: 373917
show more ...
|
#
b523790a |
| 07-Oct-2019 |
Wei Mi <wmi@google.com> |
[SampleFDO] Add compression support for any section in ExtBinary profile format
Previously ExtBinary profile format only supports compression using zlib for profile symbol list. In this patch, we ex
[SampleFDO] Add compression support for any section in ExtBinary profile format
Previously ExtBinary profile format only supports compression using zlib for profile symbol list. In this patch, we extend the compression support to any section. User can select some or all of the sections to compress. In an experiment, for a 45M profile in ExtBinary format, compressing name table reduced its size to 24M, and compressing all the sections reduced its size to 11M.
Differential Revision: https://reviews.llvm.org/D68253
llvm-svn: 373914
show more ...
|
Revision tags: llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4 |
|
#
198009ae |
| 31-Aug-2019 |
Wei Mi <wmi@google.com> |
Fix some errors introduced by rL370563 which were not exposed on my local machine. 1. zlib::compress accept &size_t but the param is an uint64_t. 2. Some systems don't have zlib installed. Don't use
Fix some errors introduced by rL370563 which were not exposed on my local machine. 1. zlib::compress accept &size_t but the param is an uint64_t. 2. Some systems don't have zlib installed. Don't use compression by default.
llvm-svn: 370564
show more ...
|
#
798e59b8 |
| 31-Aug-2019 |
Wei Mi <wmi@google.com> |
[SampleFDO] Add profile symbol list section to discriminate function being cold versus function being newly added.
This is the second half of https://reviews.llvm.org/D66374.
Profile symbol list is
[SampleFDO] Add profile symbol list section to discriminate function being cold versus function being newly added.
This is the second half of https://reviews.llvm.org/D66374.
Profile symbol list is the collection of function symbols showing up in the binary which generates the current profile. It is used to discriminate function being cold versus function being newly added. Profile symbol list is only added for profile with ExtBinary format.
During profile use compilation, when profile-sample-accurate is enabled, a function without profile will be regarded as cold only when it is contained in that list.
Differential Revision: https://reviews.llvm.org/D66766
llvm-svn: 370563
show more ...
|
Revision tags: llvmorg-9.0.0-rc3 |
|
#
5adace35 |
| 20-Aug-2019 |
Wenlei He <aktoon@gmail.com> |
[AutoFDO] Make call targets order deterministic for sample profile
Summary: StringMap is used for storing call target to frequency map for AutoFDO. However the iterating order of StringMap is non-de
[AutoFDO] Make call targets order deterministic for sample profile
Summary: StringMap is used for storing call target to frequency map for AutoFDO. However the iterating order of StringMap is non-deterministic, which leads to non-determinism in AutoFDO profile output. Now new API getSortedCallTargets and SortCallTargets are added for deterministic ordering and output.
Roundtrip test for text profile and binary profile is added.
Reviewers: wmi, davidxl, danielcdh
Subscribers: hiraditya, mgrang, llvm-commits, twoh
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66191
llvm-svn: 369440
show more ...
|
Revision tags: llvmorg-9.0.0-rc2 |
|
#
4b99b58a |
| 12-Aug-2019 |
Wenlei He <aktoon@gmail.com> |
[ThinLTO][AutoFDO] Fix memory corruption due to race condition from thin backends
Summary: This commit fixed a race condition from multi-threaded thinLTO backends that causes non-deterministic memor
[ThinLTO][AutoFDO] Fix memory corruption due to race condition from thin backends
Summary: This commit fixed a race condition from multi-threaded thinLTO backends that causes non-deterministic memory corruption for a data structure used only by AutoFDO with compact binary profile. GUIDToFuncNameMap, a static data member of type DenseMap in FunctionSamples is used as a per-module mapping from function name MD5 to name string when input AutoFDO profile is in compact binary format. However with ThinLTO, we can have parallel backends modifying and accessing the class static map concurrently. The fix is to make GUIDToFuncNameMap a member of SampleProfileLoader instead of a file static data.
Reviewers: wmi, davidxl, danielcdh
Subscribers: mehdi_amini, inglorion, hiraditya, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65848
llvm-svn: 368596
show more ...
|
Revision tags: llvmorg-9.0.0-rc1, llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3, llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1, llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3, llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2, llvmorg-8.0.0-rc1 |
|
#
2946cd70 |
| 19-Jan-2019 |
Chandler Carruth <chandlerc@gmail.com> |
Update the file headers across all of the LLVM projects in the monorepo to reflect the new license.
We understand that people may be surprised that we're moving the header entirely to discuss the ne
Update the file headers across all of the LLVM projects in the monorepo to reflect the new license.
We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository.
llvm-svn: 351636
show more ...
|