Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4 |
|
#
f6795e6b |
| 12-Nov-2024 |
Michael Kruse <llvm-project@meinersbur.de> |
[CodeExtractor] Refactor extractCodeRegion, fix alloca emission. (#114419)
Reorganize the code into phases:
* Analyze/normalize
* Create extracted function prototype
* Generate the new funct
[CodeExtractor] Refactor extractCodeRegion, fix alloca emission. (#114419)
Reorganize the code into phases:
* Analyze/normalize
* Create extracted function prototype
* Generate the new function's implementation
* Generate call to new function
* Connect call to original function's CFG
The motivation is #114669 to optionally clone the selected code region
into the new function instead of moving it. The current structure made
it difficult to add such functionality since there was no obvious place
to do so, not made easier by some functions doing more than their name
suggests. For instance, constructFunction modifies code outside the
constructed function, but also function properties such as
setPersonalityFn are derived somewhere else. Another example is
emitCallAndSwitchStatement, which despite its name also inserts stores
for output parameters.
Many operations also implicitly depend on the order they are applied
which this patch tries to reduce. For instance, ExtractedFuncRetVals
becomes the list exit blocks which also defines the return value when
leaving via that block. It is computed early such that the new
function's return instructions and the switch can be generated
independently. Also, ExtractedFuncRetVals is combining the lists
ExitBlocks and OldTargets which were not always kept consistent with
each other or NumExitBlocks. The method recomputeExitBlocks() will
update it when necessary.
The coding style partially contradict the current coding standard. For
instance some local variable start with lower case letters. I updated
some, but not all occurrences to make the diff match at least some lines
as unchanged.
The patch [D96854](https://reviews.llvm.org/D96854) introduced some
confusion of function argument indexes this is fixed here as well, hence
the patch is not NFC anymore. Tested in modified CodeExtractorTest.cpp.
Patch [D121061](https://reviews.llvm.org/D121061) introduced
AllocationBlock, but not all allocas were inserted there.
Efectively includes the following fixes:
1. https://github.com/llvm/llvm-project/commit/ce73b1672a6053d5974dc2342881aac02efe2dbb
2. https://github.com/llvm/llvm-project/commit/4aaa92578686176243a294eeb2ca5697a99edcaa
3. Missing allocas, still unfixed
Originally submitted as https://reviews.llvm.org/D115218
show more ...
|
#
4aaa9257 |
| 04-Nov-2024 |
Tom Eccles <tom.eccles@arm.com> |
[llvm][CodeExtractor] fix bug in parameter naming (#114237)
The code extractor tries to apply the names of source input and output
values to function arguments. Not all input and output values get
[llvm][CodeExtractor] fix bug in parameter naming (#114237)
The code extractor tries to apply the names of source input and output
values to function arguments. Not all input and output values get added
as arguments: some are instead placed inside of a struct passed to the
function. The existing renaming code skipped trying to set these
struct-packed arguments names (as there is no corresponding function
argument to rename), but it still incremented the iterator over the
function arguments. This could result in dereferencing an end iterator
if struct-packed inputs/outputs preceded non-struct-packed
inputs/outputs.
This patch rewrites this loop to avoid the end iterator dereference.
show more ...
|
Revision tags: llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init |
|
#
32f9983c |
| 15-Dec-2023 |
Jessica Del <50999226+OutOfCache@users.noreply.github.com> |
[AMDGPU] - Add address space for strided buffers (#74471)
This is an experimental address space for strided buffers. These buffers
can have structs as elements and
a stride > 1.
These pointers al
[AMDGPU] - Add address space for strided buffers (#74471)
This is an experimental address space for strided buffers. These buffers
can have structs as elements and
a stride > 1.
These pointers allow the indexed access in units of stride, i.e., they
point at `buffer[index * stride]`.
Thus, we can use the `idxen` modifier for buffer loads.
We assign address space 9 to 192-bit buffer pointers which contain a
128-bit descriptor, a 32-bit offset and a 32-bit index. Essentially,
they are fat buffer pointers with an additional 32-bit index.
show more ...
|
Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4 |
|
#
eee8dd90 |
| 18-Oct-2023 |
Dominik Adamski <dominik.adamski@amd.com> |
[CodeExtractor] Allow to use 0 addr space for aggregate arg (#66998)
The user of CodeExtractor should be able to specify that
the aggregate argument should be passed as a pointer in zero address
s
[CodeExtractor] Allow to use 0 addr space for aggregate arg (#66998)
The user of CodeExtractor should be able to specify that
the aggregate argument should be passed as a pointer in zero address
space.
CodeExtractor is used to generate outlined functions required by OpenMP
runtime. The arguments of the outlined functions for OpenMP GPU code
are in 0 address space. 0 address space does not need to be the default
address space for GPU device. That's why there is a need to allow
the user of CodeExtractor to specify, that the allocated aggregate parameter
is passed as pointer in zero address space.
show more ...
|
Revision tags: llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2 |
|
#
c1b96725 |
| 23-Feb-2022 |
Bill Wendling <isanbard@gmail.com> |
[NFC] Add #include for constants
|
Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init |
|
#
95b981ca |
| 26-Jan-2022 |
Giorgis Georgakoudis <georgakoudis1@llnl.gov> |
[CodeExtractor] Enable partial aggregate arguments
Summary: Enable CodeExtractor to construct output functions that partially aggregate inputs/outputs in their argument list. A use case is the OMPIR
[CodeExtractor] Enable partial aggregate arguments
Summary: Enable CodeExtractor to construct output functions that partially aggregate inputs/outputs in their argument list. A use case is the OMPIRBuilder to create outlined functions for parallel regions that aggregate in a struct the payload variables for the region while passing as scalars thread and bound identifiers.
Differential Revision: https://reviews.llvm.org/D96854
show more ...
|
Revision tags: llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2 |
|
#
144cd22b |
| 23-Aug-2021 |
Andrew Litteken <andrew_litteken@apple.com> |
[CodeExtractor] Creating exit stubs based off original order branch instructions.
Previously the CodeExtractor created exit stubs, and the subsequent return value of the outlined function based on t
[CodeExtractor] Creating exit stubs based off original order branch instructions.
Previously the CodeExtractor created exit stubs, and the subsequent return value of the outlined function based on the order of out-of-region blocks after splitting any phi nodes, and collecting the blocks to be outlined. This could cause differences in order if there was a difference of exit block phi nodes between the two regions. This patch moves the collection of the output target blocks to be before this occurs, so that the assignment of target block to output value will be the same, regardless of the contents of the output block.
Reviewers: paquette, roelofs
Differential Revision: https://reviews.llvm.org/D108657
show more ...
|
#
9d2c859e |
| 26-Aug-2021 |
Andrew Litteken <andrew.litteken@gmail.com> |
[CodeExtractor] Making the arguments outlined easier to access from the outside
The Code Extractor does not provide an easy mechanism for determining the inputs and outputs after extraction has occu
[CodeExtractor] Making the arguments outlined easier to access from the outside
The Code Extractor does not provide an easy mechanism for determining the inputs and outputs after extraction has occurred, this patch gives the ability to pass in empty SetVectors to be filled with the inputs and outputs if they need to be analyzed.
Added Tests: - InputOutputMonitoring in unittests/Transforms/Utils/CodeExtractorTests.cpp
Reviewers: paquette
Differential Revision: https://reviews.llvm.org/D106991
show more ...
|
Revision tags: llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1 |
|
#
700d2417 |
| 03-Nov-2020 |
Giorgis Georgakoudis <georgakoudis1@llnl.gov> |
[CodeExtractor] Replace uses of extracted bitcasts in out-of-region lifetime markers
CodeExtractor handles bitcasts in the extracted region that have lifetime markers users in the outer region as ou
[CodeExtractor] Replace uses of extracted bitcasts in out-of-region lifetime markers
CodeExtractor handles bitcasts in the extracted region that have lifetime markers users in the outer region as outputs. That creates unnecessary alloca/reload instructions and extra lifetime markers. The patch identifies those cases, and replaces uses in out-of-region lifetime markers with new bitcasts in the outer region.
**Example** ``` define void @foo() { entry: %0 = alloca i32 br label %extract
extract: %1 = bitcast i32* %0 to i8* call void @llvm.lifetime.start.p0i8(i64 4, i8* %1) call void @use(i32* %0) br label %exit
exit: call void @use(i32* %0) call void @llvm.lifetime.end.p0i8(i64 4, i8* %1) ret void } ```
**Current extraction** ``` define void @foo() { entry: %.loc = alloca i8*, align 8 %0 = alloca i32, align 4 br label %codeRepl
codeRepl: ; preds = %entry %lt.cast = bitcast i8** %.loc to i8* call void @llvm.lifetime.start.p0i8(i64 -1, i8* %lt.cast) %lt.cast1 = bitcast i32* %0 to i8* call void @llvm.lifetime.start.p0i8(i64 -1, i8* %lt.cast1) call void @foo.extract(i32* %0, i8** %.loc) %.reload = load i8*, i8** %.loc, align 8 call void @llvm.lifetime.end.p0i8(i64 -1, i8* %lt.cast) br label %exit
exit: ; preds = %codeRepl call void @use(i32* %0) call void @llvm.lifetime.end.p0i8(i64 4, i8* %.reload) ret void }
define internal void @foo.extract(i32* %0, i8** %.out) { newFuncRoot: br label %extract
exit.exitStub: ; preds = %extract ret void
extract: ; preds = %newFuncRoot %1 = bitcast i32* %0 to i8* store i8* %1, i8** %.out, align 8 call void @use(i32* %0) br label %exit.exitStub } ```
**Extraction with patch** ``` define void @foo() { entry: %0 = alloca i32, align 4 br label %codeRepl
codeRepl: ; preds = %entry %lt.cast1 = bitcast i32* %0 to i8* call void @llvm.lifetime.start.p0i8(i64 -1, i8* %lt.cast1) call void @foo.extract(i32* %0) br label %exit
exit: ; preds = %codeRepl call void @use(i32* %0) %lt.cast = bitcast i32* %0 to i8* call void @llvm.lifetime.end.p0i8(i64 4, i8* %lt.cast) ret void }
define internal void @foo.extract(i32* %0) { newFuncRoot: br label %extract
exit.exitStub: ; preds = %extract ret void
extract: ; preds = %newFuncRoot %1 = bitcast i32* %0 to i8* call void @use(i32* %0) br label %exit.exitStub } ```
Reviewed By: vsk
Differential Revision: https://reviews.llvm.org/D90689
show more ...
|
Revision tags: llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1 |
|
#
8359511c |
| 29-Jan-2020 |
Vedant Kumar <vsk@apple.com> |
[CodeExtractor] Remove stale llvm.assume calls from extracted region
During extraction, stale llvm.assume handles may be retained in the original function. The setup is:
1) CodeExtractor unregister
[CodeExtractor] Remove stale llvm.assume calls from extracted region
During extraction, stale llvm.assume handles may be retained in the original function. The setup is:
1) CodeExtractor unregisters assumptions in the blocks that are to be extracted.
2) Extraction happens. There are now two functions: f1 and f1.extracted.
3) Leftover assumptions in f1 (/not/ removed as they were not in the set of blocks to be extracted) now have affected-value llvm.assume handles in f1.extracted.
When assumptions for a value used in f1 are looked up, ValueTracking can assert as some of the handles are in the wrong function. To fix this, simply erase the llvm.assume calls in the extracted function.
Alternatives include flushing the assumption cache in the original function, or walking all values used in the original function to prune stale affected-value handles. Both seem more expensive.
Testing: check-llvm, LNT run with -mllvm -hot-cold-split enabled
rdar://58460728
show more ...
|
Revision tags: llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1 |
|
#
9852699d |
| 08-Oct-2019 |
Vedant Kumar <vsk@apple.com> |
[CodeExtractor] Factor out and reuse shrinkwrap analysis
Factor out CodeExtractor's analysis of allocas (for shrinkwrapping purposes), and allow the analysis to be reused.
This resolves a quadratic
[CodeExtractor] Factor out and reuse shrinkwrap analysis
Factor out CodeExtractor's analysis of allocas (for shrinkwrapping purposes), and allow the analysis to be reused.
This resolves a quadratic compile-time bug observed when compiling AMDGPUDisassembler.cpp.o.
Pre-patch (Release + LTO clang):
``` ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name --- 176.5278 ( 57.8%) 0.4915 ( 18.5%) 177.0192 ( 57.4%) 177.4112 ( 57.3%) Hot Cold Splitting ```
Post-patch (ReleaseAsserts clang):
``` ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name --- 1.4051 ( 3.3%) 0.0079 ( 0.3%) 1.4129 ( 3.2%) 1.4129 ( 3.2%) Hot Cold Splitting ```
Testing: check-llvm, and comparing the AMDGPUDisassembler.cpp.o binary pre- vs. post-patch.
An alternate approach is to hide CodeExtractorAnalysisCache from clients of CodeExtractor, and to recompute the analysis from scratch inside of CodeExtractor::extractCodeRegion(). This eliminates some redundant work in the shrinkwrapping legality check. However, some clients continue to exhibit O(n^2) compile time behavior as computing the analysis is O(n).
rdar://55912966
Differential Revision: https://reviews.llvm.org/D68616
llvm-svn: 374089
show more ...
|
#
50afaa9d |
| 04-Oct-2019 |
Aditya Kumar <hiraditya@msn.com> |
Add a unittest to verify for assumption cache
Reviewers: vsk, tejohnson
Reviewed By: vsk
Differential Revision: https://reviews.llvm.org/D68095
llvm-svn: 373811
|
Revision tags: llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3, llvmorg-9.0.0-rc2, llvmorg-9.0.0-rc1, llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3, llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1, llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3, llvmorg-7.1.0, llvmorg-7.1.0-rc1 |
|
#
0e5dd512 |
| 08-Feb-2019 |
Vedant Kumar <vsk@apple.com> |
[CodeExtractor] Restore outputs after creating exit stubs
When CodeExtractor saves the result of InvokeInst at the first insertion point of the 'normal destination' basic block, this block can be om
[CodeExtractor] Restore outputs after creating exit stubs
When CodeExtractor saves the result of InvokeInst at the first insertion point of the 'normal destination' basic block, this block can be omitted in the outlined region, so store is placed outside of the function. The suggested solution is to process saving outputs after creating exit stubs for new function, and stores will be placed in that blocks before return in this case.
Patch by Sergei Kachkov!
Fixes llvm.org/PR40455.
Differential Revision: https://reviews.llvm.org/D57919
llvm-svn: 353562
show more ...
|
Revision tags: llvmorg-8.0.0-rc2, llvmorg-8.0.0-rc1 |
|
#
2946cd70 |
| 19-Jan-2019 |
Chandler Carruth <chandlerc@gmail.com> |
Update the file headers across all of the LLVM projects in the monorepo to reflect the new license.
We understand that people may be surprised that we're moving the header entirely to discuss the ne
Update the file headers across all of the LLVM projects in the monorepo to reflect the new license.
We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository.
llvm-svn: 351636
show more ...
|
Revision tags: llvmorg-7.0.1, llvmorg-7.0.1-rc3 |
|
#
b2a6f8e5 |
| 07-Dec-2018 |
Vedant Kumar <vsk@apple.com> |
[CodeExtractor] Store outputs at the first valid insertion point
When CodeExtractor outlines values which are used by the original function, it must store those values in some in-out parameter. This
[CodeExtractor] Store outputs at the first valid insertion point
When CodeExtractor outlines values which are used by the original function, it must store those values in some in-out parameter. This store instruction must not be inserted in between a PHI and an EH pad instruction, as that results in invalid IR.
This fixes the following verifier failure seen while outlining within ObjC methods with live exit values:
The unwind destination does not have an exception handling instruction! %call35 = invoke i8* bitcast (i8* (i8*, i8*, ...)* @objc_msgSend to i8* (i8*, i8*)*)(i8* %exn.adjusted, i8* %1) to label %invoke.cont34 unwind label %lpad33, !dbg !4183 The unwind destination does not have an exception handling instruction! invoke void @objc_exception_throw(i8* %call35) #12 to label %invoke.cont36 unwind label %lpad33, !dbg !4184 LandingPadInst not the first non-PHI instruction in the block. %3 = landingpad { i8*, i32 } catch i8* null, !dbg !1411
rdar://46540815
llvm-svn: 348562
show more ...
|
#
d129569e |
| 03-Dec-2018 |
Vedant Kumar <vsk@apple.com> |
[CodeExtractor] Split PHI nodes with incoming values from outlined region (PR39433)
If a PHI node out of extracted region has multiple incoming values from it, split this PHI on two parts. First PHI
[CodeExtractor] Split PHI nodes with incoming values from outlined region (PR39433)
If a PHI node out of extracted region has multiple incoming values from it, split this PHI on two parts. First PHI has incomings only from region and extracts with it (they are placed to the separate basic block that added to the list of outlined), and incoming values in original PHI are replaced by first PHI. Similar solution is already used in CodeExtractor for PHIs in entry block (severSplitPHINodes method). It covers PR39433 bug.
Patch by Sergei Kachkov!
Differential Revision: https://reviews.llvm.org/D55018
llvm-svn: 348205
show more ...
|
Revision tags: llvmorg-7.0.1-rc2, llvmorg-7.0.1-rc1 |
|
#
c2990068 |
| 24-Oct-2018 |
Vedant Kumar <vsk@apple.com> |
[HotColdSplitting] Identify larger cold regions using domtree queries
The current splitting algorithm works in three stages:
1) Identify cold blocks, then 2) Use forward/backward propagation to
[HotColdSplitting] Identify larger cold regions using domtree queries
The current splitting algorithm works in three stages:
1) Identify cold blocks, then 2) Use forward/backward propagation to mark hot blocks, then 3) Grow a SESE region of blocks *outside* of the set of hot blocks and start outlining.
While testing this pass on Apple internal frameworks I noticed that some kinds of control flow (e.g. loops) are never outlined, even though they unconditionally lead to / follow cold blocks. I noticed two other issues related to how cold regions are identified:
- An inconsistency can arise in the internal state of the hotness propagation stage, as a block may end up in both the ColdBlocks set and the HotBlocks set. Further inconsistencies can arise as these sets do not match what's in ProfileSummaryInfo.
- It isn't necessary to limit outlining to single-exit regions.
This patch teaches the splitting algorithm to identify maximal cold regions and outline them. A maximal cold region is defined as the set of blocks post-dominated by a cold sink block, or dominated by that sink block. This approach can successfully outline loops in the cold path. As a side benefit, it maintains less internal state than the current approach.
Due to a limitation in CodeExtractor, blocks within the maximal cold region which aren't dominated by a single entry point (a so-called "max ancestor") are filtered out.
Results: - X86 (LNT + -Os + externals): 134KB of TEXT were outlined compared to 47KB pre-patch, or a ~3x improvement. Did not see a performance impact across two runs. - AArch64 (LNT + -Os + externals + Apple-internal benchmarks): 149KB of TEXT were outlined. Ditto re: performance impact. - Outlining results improve marginally in the internal frameworks I tested.
Follow-ups: - Outline more than once per function, outline large single basic blocks, & try to remove unconditional branches in outlined functions.
Differential Revision: https://reviews.llvm.org/D53627
llvm-svn: 345209
show more ...
|
Revision tags: llvmorg-7.0.0, llvmorg-7.0.0-rc3 |
|
#
8267b333 |
| 03-Sep-2018 |
Nico Weber <nicolasweber@gmx.de> |
Rename a few unittests/.../Foo.cpp files to FooTest.cpp
The convention for unit test sources is that they're called FooTest.cpp.
No behavior change. https://reviews.llvm.org/D51579
llvm-svn: 341313
|