#
29441e4f |
| 29-Jan-2025 |
Nikita Popov <npopov@redhat.com> |
[IR] Convert from nocapture to captures(none) (#123181)
This PR removes the old `nocapture` attribute, replacing it with the new
`captures` attribute introduced in #116990. This change is
intended to be essentially NFC, replacing existing uses of `nocapture`
with `captures(none)` without adding any new analysis capabilities.
Making use of non-`none` values is left for a followup.
Some notes:
* `nocapture` will be upgraded to `captures(none)` by the bitcode
reader.
* `nocapture` will also be upgraded by the textual IR reader. This is to
make it easier to use old IR files and somewhat reduce the test churn in
this PR.
* Helper APIs like `doesNotCapture()` will check for `captures(none)`.
* MLIR import will convert `captures(none)` into an `llvm.nocapture`
attribute. The representation in the LLVM IR dialect should be updated
separately.
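The reader upgrade described in the notes above can be illustrated at string level. The following is a minimal sketch, assuming attributes arrive as a whitespace-separated string. The helper `upgrade_attrs` is hypothetical. LLVM's bitcode and textual IR readers perform the upgrade on parsed attribute objects, not on raw text.

```python
def upgrade_attrs(attr_text: str) -> str:
    """Replace the legacy `nocapture` attribute with `captures(none)`.

    Hypothetical string-level sketch; the real readers upgrade parsed
    attributes, not raw text.
    """
    upgraded = []
    for token in attr_text.split():
        # Only the exact legacy keyword is rewritten; other attributes pass through.
        upgraded.append("captures(none)" if token == "nocapture" else token)
    return " ".join(upgraded)


print(upgrade_attrs("noundef nocapture readonly"))
```

Because the mapping is one-to-one, the rewrite does not change behaviour, which matches the "essentially NFC" intent stated in the commit.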
|
Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6 |
|
#
455b4fd0 |
| 04-Dec-2024 |
Mariusz Sikora <mariusz.sikora@amd.com> |
[AMDGPU] Emit amdgcn.if.break in the same BB as amdgcn.loop (#118081)
Before this change, if.break was placed at the wrong loop level, which
resulted in accumulating values only from the last iteration of the inner
loop.
|
Revision tags: llvmorg-19.1.5, llvmorg-19.1.4 |
|
#
6548b635 |
| 09-Nov-2024 |
Shilei Tian <i@tianshilei.me> |
Reapply "[AMDGPU] Still set up the two SGPRs for queue ptr even it is COV5 (#112403)"
This reverts commit ca33649abe5fad93c57afef54e43ed9b3249cd86.
|
#
ca33649a |
| 08-Nov-2024 |
Shilei Tian <i@tianshilei.me> |
Revert "[AMDGPU] Still set up the two SGPRs for queue ptr even it is COV5 (#112403)"
This reverts commit e215a1e27d84adad2635a52393621eb4fa439dc9 as it broke both hip and openmp buildbots.
|
#
e215a1e2 |
| 08-Nov-2024 |
Shilei Tian <i@tianshilei.me> |
[AMDGPU] Still set up the two SGPRs for queue ptr even it is COV5 (#112403)
|
Revision tags: llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init |
|
#
b1bcb7ca |
| 15-Jul-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
Reapply "AMDGPU: Move attributor into optimization pipeline (#83131)" and follow up commit "clang/AMDGPU: Defeat attribute optimization in attribute test" (#98851)
This reverts commit adaff46d087799072438dd744b038e6fd50a2d78.
Drop the -O3 checks from default-attributes.hip. I don't know why they are different on some bots but reverting this is far too disruptive.
|
#
adaff46d |
| 15-Jul-2024 |
dyung <douglas.yung@sony.com> |
Revert "AMDGPU: Move attributor into optimization pipeline (#83131)" and follow up commit "clang/AMDGPU: Defeat attribute optimization in attribute test" (#98851)
This reverts commits 677cc15e0ff2e0e6aa30538eb187990a6a8f53c0 and
78bc1b64a6dc3fb6191355a5e1b502be8b3668e7.
The test CodeGenHIP/default-attributes.hip is failing on multiple bots
even after the attempted fix including the following:
- https://lab.llvm.org/buildbot/#/builders/3/builds/1473
- https://lab.llvm.org/buildbot/#/builders/65/builds/1380
- https://lab.llvm.org/buildbot/#/builders/161/builds/595
- https://lab.llvm.org/buildbot/#/builders/154/builds/1372
- https://lab.llvm.org/buildbot/#/builders/133/builds/1547
- https://lab.llvm.org/buildbot/#/builders/81/builds/755
- https://lab.llvm.org/buildbot/#/builders/40/builds/570
- https://lab.llvm.org/buildbot/#/builders/13/builds/748
- https://lab.llvm.org/buildbot/#/builders/12/builds/1845
- https://lab.llvm.org/buildbot/#/builders/11/builds/1695
- https://lab.llvm.org/buildbot/#/builders/190/builds/1829
- https://lab.llvm.org/buildbot/#/builders/193/builds/962
- https://lab.llvm.org/buildbot/#/builders/23/builds/991
- https://lab.llvm.org/buildbot/#/builders/144/builds/2256
- https://lab.llvm.org/buildbot/#/builders/46/builds/1614
These bots have been broken for a day, so reverting to get everything
back to green.
|
#
78bc1b64 |
| 14-Jul-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Move attributor into optimization pipeline (#83131)
Removing it from the codegen pipeline induces a lot of test churn
because llc is no longer optimizing out implicit arguments to kernels.
Mostly mechanical, but there are some creative test updates. I preferred
to take the changes as-is in tests where the ABI isn't relevant. In
cases where it's more relevant, or the optimize out logic was too
ingrained in the test, I pre-run the optimization. Some cases manually
add attributes to disable inputs.
|
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init |
|
#
9e9907f1 |
| 17-Jan-2024 |
Fangrui Song <i@maskray.me> |
[AMDGPU,test] Change llc -march= to -mtriple= (#75982)
Similar to 806761a7629df268c8aed49657aeccffa6bca449.
For IR files without a target triple, -mtriple= specifies the full
target triple while -march= merely sets the architecture part of the
default target triple, leaving a target triple which may not make sense,
e.g. amdgpu-apple-darwin.
Therefore, -march= is error-prone and not recommended for tests without
a target triple. The issue has been benign because we recognize
$unknown-apple-darwin as ELF instead of rejecting it outright.
This patch changes AMDGPU tests to not rely on the default
OS/environment components. Tests that need fixes are not changed:
```
LLVM :: CodeGen/AMDGPU/fabs.f64.ll
LLVM :: CodeGen/AMDGPU/fabs.ll
LLVM :: CodeGen/AMDGPU/floor.ll
LLVM :: CodeGen/AMDGPU/fneg-fabs.f64.ll
LLVM :: CodeGen/AMDGPU/fneg-fabs.ll
LLVM :: CodeGen/AMDGPU/r600-infinite-loop-bug-while-reorganizing-vector.ll
LLVM :: CodeGen/AMDGPU/schedule-if-2.ll
```
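The difference between the two flags can be sketched as follows. This assumes a triple is a plain `arch-vendor-os` string and that the host default triple is `x86_64-apple-darwin`. Both are illustrative assumptions, and the helper names are hypothetical, so they do not mirror llc's actual option handling.

```python
DEFAULT_TRIPLE = "x86_64-apple-darwin"  # assumed host default, for illustration only


def triple_from_march(arch: str, default: str = DEFAULT_TRIPLE) -> str:
    # -march= only swaps the architecture component of the default triple,
    # so the host's vendor and OS are inherited.
    _, rest = default.split("-", 1)
    return f"{arch}-{rest}"


def triple_from_mtriple(triple: str) -> str:
    # -mtriple= supplies the whole triple; nothing comes from the host.
    return triple


print(triple_from_march("amdgcn"))
print(triple_from_mtriple("amdgcn-amd-amdhsa"))
```

The first call yields a nonsensical mix such as `amdgcn-apple-darwin`. That is the kind of triple the commit message warns about.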
|
Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4 |
|
#
e39f6c18 |
| 25-Oct-2023 |
Alex Richardson <alexrichardson@google.com> |
[opt] Infer DataLayout from triple if not specified
There are many tests that specify a target triple/CPU flags but no DataLayout which can lead to IR being generated that has unusual behaviour. This commit attempts to use the default DataLayout based on the relevant flags if there is no explicit override on the command line or in the IR file.
One case that cannot currently be handled is differentiating a missing datalayout from an explicit `target datalayout = ""` in the IR file, since the current APIs don't allow detecting this case. If it is considered useful to support this case (instead of passing "-data-layout=" on the command line), I can change the IR parsers to track whether they have seen such a directive and change the callback type.
Differential Revision: https://reviews.llvm.org/D141060
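The lookup order described in this commit can be sketched as a small function. This is a minimal sketch under stated assumptions. `resolve_datalayout` is hypothetical, and the assumption that an explicit command-line value wins over the IR file is illustrative. As the message notes, an empty `target datalayout = ""` is treated like a missing one, while passing an empty `-data-layout=` still forces an empty layout.

```python
from typing import Optional


def resolve_datalayout(cli_override: Optional[str],
                       ir_layout: Optional[str],
                       triple_default: str) -> str:
    """Hypothetical model of choosing a DataLayout string."""
    if cli_override is not None:
        # Assumption: an explicit command-line value, even "", takes priority.
        return cli_override
    if ir_layout:
        # None (no directive) and "" (empty directive) both fall through,
        # mirroring the limitation described in the commit message.
        return ir_layout
    # Otherwise infer the default layout from the target triple.
    return triple_default
```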
|
Revision tags: llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7 |
|
#
bdf2fbba |
| 19-Dec-2022 |
Nikita Popov <npopov@redhat.com> |
[AMDGPU] Convert some tests to opaque pointers (NFC)
|
Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3 |
|
#
40e9284f |
| 22-Aug-2022 |
Ruiling Song <ruiling.song@amd.com> |
StructurizeCFG: prefer reduced number of live values
The instruction simplification will try to simplify the affected phis. In some cases, this might extend the liveness of values. For example:
The instruction simplification will try to simplify the affected phis. In some cases, this might extend the liveness of values. For example:
    BB0
     |  \
     |  BB1
     |  /
    BB2: phi (BB0, v), (BB1, undef)
The phi in BB2 will be simplified to v because v dominates BB2, but this increases the number of live values in BB1. By setting CanUseUndef to false, we no longer simplify the phi this way, which helps register pressure. This is required by a later change that reduces VGPR pressure for AMDGPU.
Reviewed by: foad, sameerds
Differential Revision: https://reviews.llvm.org/D132449
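The effect of the CanUseUndef setting on the example above can be modelled as follows. This is a minimal sketch. `simplify_phi` is a hypothetical helper that takes (predecessor, value) pairs. It omits the dominance check that the real simplifier also performs, and it does not reproduce LLVM's InstructionSimplify API.

```python
from typing import List, Optional, Tuple

UNDEF = "undef"


def simplify_phi(incoming: List[Tuple[str, str]], can_use_undef: bool) -> Optional[str]:
    """Return the value a phi folds to, or None if it must stay a phi."""
    values = {value for _, value in incoming}
    has_undef = UNDEF in values
    values.discard(UNDEF)
    if len(values) != 1:
        return None  # zero or several distinct defined values: keep the phi
    if has_undef and not can_use_undef:
        return None  # folding undef into v would extend v's live range into BB1
    return values.pop()


phi = [("BB0", "v"), ("BB1", "undef")]
print(simplify_phi(phi, can_use_undef=True))
print(simplify_phi(phi, can_use_undef=False))
```

With `can_use_undef=True` the phi folds to `v`. With `False` it is kept, so `v` does not have to stay live across BB1.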
|
Revision tags: llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
#
716ca2e3 |
| 20-Jul-2022 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Pre-sink IR input for some tests
Edit the IR input for some codegen tests to simulate what the IR code sinking pass would do to it. This makes the tests immune to the presence or absence of the code sinking pass in the codegen pass pipeline, which does not belong there.
Differential Revision: https://reviews.llvm.org/D130169
|
#
c945d88d |
| 14-Jul-2022 |
Brendon Cahoon <brendon.cahoon@amd.com> |
Revert "[StructurizeCFG] Improve basic block ordering"
This reverts commit f1b05a0a2bbbea160002be709f8a1c59de366761.
This reverts commit f1b05a0a2bbbea160002be709f8a1c59de366761.
Reverting due to issues identified in testing: the transformation is incorrect for blocks that contain convergent instructions.
|
Revision tags: llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1 |
|
#
f1b05a0a |
| 06-Apr-2022 |
Brendon Cahoon <brendon.cahoon@amd.com> |
[StructurizeCFG] Improve basic block ordering
StructurizeCFG linearizes the successors of a branching basic block by adding Flow blocks to record the true/false path for branches and back edges. This patch reduces the number of Phi values needed to capture the control flow path by improving the basic block ordering.
Previously, StructurizeCFG added loop exit blocks outside of the loop. It set a boolean value to indicate the path taken, and all live values of the exit blocks extended past the end of the loop. For loops with a large number of exit blocks, this creates a huge number of values that must be kept live, which increases compilation time and register pressure. This is especially a problem with ASAN, which adds early exits to blocks with unreachable instructions for each instrumented check in the loop.
In specific cases, this patch reduces the number of values needed after the loop by moving the exit block into the loop. This is done for blocks that have a single predecessor and single successor by moving the block to appear just after the predecessor.
Differential Revision: https://reviews.llvm.org/D123231
|
#
24e16e4a |
| 27-May-2022 |
Serguei Katkov <serguei.katkov@azul.com> |
[SSAUpdaterImpl] Do not generate phi node with all the same incoming values
If all available vals to basic block are the same - do not build new phi node and just use this value.
If all available values for a basic block are the same, do not build a new phi node; just use that value.
Reviewed By: sameerds
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D126525
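The rule in this commit can be sketched as a small decision function. This is a minimal sketch under stated assumptions: `value_for_block` is hypothetical, and a phi is represented as a tuple rather than an IR node. It does not reflect SSAUpdaterImpl's actual data structures.

```python
from typing import Dict, Tuple, Union

Phi = Tuple[str, str, Tuple[Tuple[str, str], ...]]


def value_for_block(available: Dict[str, str], block: str) -> Union[str, Phi]:
    """Pick the value reaching `block` from its predecessors' available values."""
    distinct = set(available.values())
    if len(distinct) == 1:
        # Every predecessor provides the same value: reuse it, no phi needed.
        return next(iter(distinct))
    # Otherwise build a phi, modelled as ("phi", block, sorted incoming pairs).
    return ("phi", block, tuple(sorted(available.items())))
```

Skipping the trivial phi avoids creating a node that a later pass would only have to fold away again.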
|
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2 |
|
#
565af157 |
| 25-Feb-2022 |
Carl Ritson <carl.ritson@amd.com> |
[AMDGPU] Extend pre-emit peephole to redundantly masked VCC
Extend the pre-emit peephole for S_CBRANCH_VCC[N]Z to eliminate redundant S_AND operations against EXEC for V_CMP results in VCC. These occur after register allocation when VCC has been selected as the comparison destination.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D120202
|
Revision tags: llvmorg-14.0.0-rc1 |
|
#
b9cf52bc |
| 03-Feb-2022 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Simplify AMDGPUAnnotateUniformValues::visitLoadInst
Always set uniform metadata on the pointer if it is an instruction, but otherwise do not bother to create a trivial getelementptr instruction, because AMDGPUInstrInfo::isUniformMMO can already detect that various non-instruction pointers are uniform.
Most of the test case churn is from tests that used undef as a pointer, which AMDGPUInstrInfo::isUniformMMO treats as uniform.
Differential Revision: https://reviews.llvm.org/D118909
|
Revision tags: llvmorg-15-init |
|
#
d2e5d351 |
| 31-Jan-2022 |
Jay Foad <jay.foad@amd.com> |
[StructurizeCFG] Clean up some boolean not instructions
In some cases StructurizeCFG inserts i1 xor instructions to invert predicates. Add a quick loop to clean these up afterwards if we can get away with modifying an existing compare instruction instead. (StructurizeCFG is generally run late in the pipeline so instcombine does not clean them up for us.)
Differential Revision: https://reviews.llvm.org/D118623
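The cleanup relies on the fact that negating an integer compare is the same as using the inverse predicate, so `xor (icmp P a, b), true` can become `icmp inverse(P) a, b`. The following is a minimal sketch of that fold. `ICmp` and `fold_not` are hypothetical. The `cmp_has_other_users` flag stands in for whatever safety conditions the real pass checks; the reverted commit below shows that a plain use count was not sufficient inside StructurizeCFG.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Logical negation of each integer predicate: !(a P b) == (a INVERSE[P] b).
INVERSE_PREDICATE = {
    "eq": "ne", "ne": "eq",
    "slt": "sge", "sge": "slt", "sgt": "sle", "sle": "sgt",
    "ult": "uge", "uge": "ult", "ugt": "ule", "ule": "ugt",
}


@dataclass(frozen=True)
class ICmp:
    pred: str
    lhs: str
    rhs: str


def fold_not(cmp: ICmp, cmp_has_other_users: bool) -> Optional[ICmp]:
    """Try to replace `xor i1 (icmp ...), true` with an inverted compare.

    Returns the inverted compare, or None when the xor must be kept because
    something else still relies on the original comparison result.
    """
    if cmp_has_other_users:
        return None
    return replace(cmp, pred=INVERSE_PREDICATE[cmp.pred])
```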
|
#
8faad296 |
| 31-Jan-2022 |
Jay Foad <jay.foad@amd.com> |
Revert "[Local] invertCondition: try modifying an existing ICmpInst"
This reverts commit a6b54ddaba2d5dc0f72dcc4591c92b9544eb0016.
This reverts commit a6b54ddaba2d5dc0f72dcc4591c92b9544eb0016.
Apparently it is not safe to modify the condition even if it passes the hasOneUse test, because StructurizeCFG might have other references to the condition that are not manifest in the IR use-def chains.
|
#
a6b54dda |
| 28-Jan-2022 |
Jay Foad <jay.foad@amd.com> |
[Local] invertCondition: try modifying an existing ICmpInst
This avoids various cases where StructurizeCFG would otherwise insert an xor i1 instruction; since StructurizeCFG generally runs late in the pipeline, instcombine does not clean up the xor-of-cmp pattern.
Differential Revision: https://reviews.llvm.org/D118478
|
Revision tags: llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1 |
|
#
18f93512 |
| 19-Nov-2021 |
RamNalamothu <VenkataRamanaiah.Nalamothu@amd.com> |
[AMDGPU] Do not generate ELF symbols for the local branch target labels
The compiler was generating symbols in the final code object for local branch target labels. This bloats the code object, slows down the loader, and is only used to simplify disassembly.
Use '--symbolize-operands' with llvm-objdump to improve readability of the branch target operands in disassembly.
Fixes: SWDEV-312223
Reviewed By: scott.linder
Differential Revision: https://reviews.llvm.org/D114273
|
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1 |
|
#
2f499b9a |
| 19-Dec-2020 |
Tony <Tony.Tye@amd.com> |
[AMDGPU] Add volatile support to SIMemoryLegalizer
Treat a non-atomic volatile load and store as a relaxed atomic at system scope for the address spaces accessed. This will ensure all relevant caches will be bypassed.
A volatile atomic is not changed and still only bypasses caches up to the level specified by the SyncScope operand.
Differential Revision: https://reviews.llvm.org/D94214
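The two rules above can be summarised as a small classification function. This is a minimal sketch under stated assumptions. `MemOp` and `effective_atomic_info` are hypothetical. Orderings use LLVM's name `monotonic` for relaxed, and scope names are plain labels rather than the backend's actual sync-scope identifiers.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class MemOp:
    is_volatile: bool
    ordering: Optional[str]  # None for non-atomic, else e.g. "monotonic"
    scope: Optional[str]     # sync scope label for atomics, e.g. "agent"


def effective_atomic_info(op: MemOp) -> Optional[Tuple[str, str]]:
    """Return the (ordering, scope) used to decide cache bypass, or None."""
    if op.ordering is not None:
        # Atomic, volatile or not: keep its own ordering and sync scope.
        return (op.ordering, op.scope)
    if op.is_volatile:
        # Non-atomic volatile: treat as a relaxed ("monotonic") atomic at
        # system scope, so all relevant caches are bypassed.
        return ("monotonic", "system")
    return None  # plain non-atomic access: no special handling
```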
|
Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1 |
|
#
7ecf1969 |
| 17-Nov-2020 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Fix and extend vccz workarounds
We have workarounds for two different cases where vccz can get out of sync with the value in vcc. This fixes them in two ways:
We have workarounds for two different cases where vccz can get out of sync with the value in vcc. This fixes them in two ways:
1. Fix the case where the def of vcc was in a previous basic block, by pessimistically assuming that vccz might be incorrect at a basic block boundary.
2. Fix the handling of pre-existing waitcnt instructions by calling generateWaitcntInstBefore before examining ScoreBrackets to determine whether there's an outstanding smem read operation.
Differential Revision: https://reviews.llvm.org/D91636
|
Revision tags: llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2 |
|
#
d8f651d3 |
| 09-Jun-2020 |
Sameer Sahasrabuddhe <sameer.sahasrabuddhe@amd.com> |
[AMDGPU] Enable structurizer workarounds by default
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D81211
|