Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4 |
|
#
6548b635 |
| 09-Nov-2024 |
Shilei Tian <i@tianshilei.me> |
Reapply "[AMDGPU] Still set up the two SGPRs for queue ptr even it is COV5 (#112403)"
This reverts commit ca33649abe5fad93c57afef54e43ed9b3249cd86.
|
#
ca33649a |
| 08-Nov-2024 |
Shilei Tian <i@tianshilei.me> |
Revert "[AMDGPU] Still set up the two SGPRs for queue ptr even it is COV5 (#112403)"
This reverts commit e215a1e27d84adad2635a52393621eb4fa439dc9 as it broke both hip and openmp buildbots.
|
#
e215a1e2 |
| 08-Nov-2024 |
Shilei Tian <i@tianshilei.me> |
[AMDGPU] Still set up the two SGPRs for queue ptr even it is COV5 (#112403)
|
Revision tags: llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init |
|
#
b1bcb7ca |
| 15-Jul-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
Reapply "AMDGPU: Move attributor into optimization pipeline (#83131)" and follow up commit "clang/AMDGPU: Defeat attribute optimization in attribute test" (#98851)
This reverts commit adaff46d087799
Reapply "AMDGPU: Move attributor into optimization pipeline (#83131)" and follow up commit "clang/AMDGPU: Defeat attribute optimization in attribute test" (#98851)
This reverts commit adaff46d087799072438dd744b038e6fd50a2d78.
Drop the -O3 checks from default-attributes.hip. I don't know why they are different on some bots but reverting this is far too disruptive.
show more ...
|
#
adaff46d |
| 15-Jul-2024 |
dyung <douglas.yung@sony.com> |
Revert "AMDGPU: Move attributor into optimization pipeline (#83131)" and follow up commit "clang/AMDGPU: Defeat attribute optimization in attribute test" (#98851)
This reverts commits 677cc15e0ff2e0
Revert "AMDGPU: Move attributor into optimization pipeline (#83131)" and follow up commit "clang/AMDGPU: Defeat attribute optimization in attribute test" (#98851)
This reverts commits 677cc15e0ff2e0e6aa30538eb187990a6a8f53c0 and
78bc1b64a6dc3fb6191355a5e1b502be8b3668e7.
The test CodeGenHIP/default-attributes.hip is failing on multiple bots
even after the attempted fix including the following:
- https://lab.llvm.org/buildbot/#/builders/3/builds/1473
- https://lab.llvm.org/buildbot/#/builders/65/builds/1380
- https://lab.llvm.org/buildbot/#/builders/161/builds/595
- https://lab.llvm.org/buildbot/#/builders/154/builds/1372
- https://lab.llvm.org/buildbot/#/builders/133/builds/1547
- https://lab.llvm.org/buildbot/#/builders/81/builds/755
- https://lab.llvm.org/buildbot/#/builders/40/builds/570
- https://lab.llvm.org/buildbot/#/builders/13/builds/748
- https://lab.llvm.org/buildbot/#/builders/12/builds/1845
- https://lab.llvm.org/buildbot/#/builders/11/builds/1695
- https://lab.llvm.org/buildbot/#/builders/190/builds/1829
- https://lab.llvm.org/buildbot/#/builders/193/builds/962
- https://lab.llvm.org/buildbot/#/builders/23/builds/991
- https://lab.llvm.org/buildbot/#/builders/144/builds/2256
- https://lab.llvm.org/buildbot/#/builders/46/builds/1614
These bots have been broken for a day, so reverting to get everything
back to green.
show more ...
|
#
78bc1b64 |
| 14-Jul-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Move attributor into optimization pipeline (#83131)
Removing it from the codegen pipeline induces a lot of test churn
because llc is no longer optimizing out implicit arguments to kernels.
AMDGPU: Move attributor into optimization pipeline (#83131)
Removing it from the codegen pipeline induces a lot of test churn
because llc is no longer optimizing out implicit arguments to kernels.
Mostly mechanical, but there are some creative test updates. I preferred
to take the changes as-is in tests where the ABI isn't relevant. In
cases where it's more relevant, or the optimize out logic was too
ingrained in the test, I pre-run the optimization. Some cases manually
add attributes to disable inputs.
show more ...
|
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init |
|
#
9e9907f1 |
| 17-Jan-2024 |
Fangrui Song <i@maskray.me> |
[AMDGPU,test] Change llc -march= to -mtriple= (#75982)
Similar to 806761a7629df268c8aed49657aeccffa6bca449.
For IR files without a target triple, -mtriple= specifies the full
target triple while
[AMDGPU,test] Change llc -march= to -mtriple= (#75982)
Similar to 806761a7629df268c8aed49657aeccffa6bca449.
For IR files without a target triple, -mtriple= specifies the full
target triple while -march= merely sets the architecture part of the
default target triple, leaving a target triple which may not make sense,
e.g. amdgpu-apple-darwin.
Therefore, -march= is error-prone and not recommended for tests without
a target triple. The issue has been benign as we recognize
$unknown-apple-darwin as ELF instead of rejecting it outrightly.
This patch changes AMDGPU tests to not rely on the default
OS/environment components. Tests that need fixes are not changed:
```
LLVM :: CodeGen/AMDGPU/fabs.f64.ll
LLVM :: CodeGen/AMDGPU/fabs.ll
LLVM :: CodeGen/AMDGPU/floor.ll
LLVM :: CodeGen/AMDGPU/fneg-fabs.f64.ll
LLVM :: CodeGen/AMDGPU/fneg-fabs.ll
LLVM :: CodeGen/AMDGPU/r600-infinite-loop-bug-while-reorganizing-vector.ll
LLVM :: CodeGen/AMDGPU/schedule-if-2.ll
```
show more ...
|
Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6 |
|
#
5b657f50 |
| 08-Jun-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Move LICM after AMDGPUCodeGenPrepare
The commit that added the run says it's to hoist uniform parts of integer division expansion. That expansion is performed later, so this didn't do anythi
AMDGPU: Move LICM after AMDGPUCodeGenPrepare
The commit that added the run says it's to hoist uniform parts of integer division expansion. That expansion is performed later, so this didn't do anything in that case. Move this later so the original test shows the improvement.
This also saves a run of "Canonicalize natural loops". Not sure why this appears to be still getting a separate loop PM run. Also feels a bit heavy to run this just for divide. Is there a way to specifically hoist the divide sequence when it expands?
show more ...
|
Revision tags: llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7 |
|
#
bdf2fbba |
| 19-Dec-2022 |
Nikita Popov <npopov@redhat.com> |
[AMDGPU] Convert some tests to opaque pointers (NFC)
|
#
5ecd3632 |
| 05-Dec-2022 |
Jonas Paulsson <paulsson@linux.vnet.ibm.com> |
Reapply "[CodeGen] Add new pass for late cleanup of redundant definitions."
This reverts commit 122efef8ee9be57055d204d52c38700fe933c033.
- Patch fixed to not reuse definitions from predecessors in
Reapply "[CodeGen] Add new pass for late cleanup of redundant definitions."
This reverts commit 122efef8ee9be57055d204d52c38700fe933c033.
- Patch fixed to not reuse definitions from predecessors in EH landing pads. - Late review suggestions (by MaskRay) have been addressed. - M68k/pipeline.ll test updated. - Init captures added in processBlock() to avoid capturing structured bindings. - RISCV has this disabled for now.
Original commit message:
A new pass MachineLateInstrsCleanup is added to be run after PEI.
This is a simple pass that removes redundant and identical instructions whenever found by scanning the MF once while keeping track of register definitions in a map. These instructions are typically immediate loads resulting from rematerialization, and address loads emitted by target in eliminateFrameInde().
This is enabled by default, but a target could easily disable it by means of 'disablePass(&MachineLateInstrsCleanupID);'.
This late cleanup is naturally not "optimal" in removing instructions as it is done by looking at phys-regs, but still quite effective. It would be desirable to improve other parts of CodeGen and avoid these redundant instructions in the first place, but there are no ideas for this yet.
Differential Revision: https://reviews.llvm.org/D123394
Reviewed By: RKSimon, foad, craig.topper, arsenm, asb
show more ...
|
#
122efef8 |
| 04-Dec-2022 |
Jonas Paulsson <paulsson@linux.vnet.ibm.com> |
Revert "Reapply "[CodeGen] Add new pass for late cleanup of redundant definitions.""
This reverts commit 17db0de330f943833296ae72e26fa988bba39cb3.
Some more bots got broken - need to investigate.
|
#
17db0de3 |
| 01-Dec-2022 |
Jonas Paulsson <paulsson@linux.vnet.ibm.com> |
Reapply "[CodeGen] Add new pass for late cleanup of redundant definitions."
Init captures added in processBlock() to avoid capturing structured bindings, which caused the build problems (with clang)
Reapply "[CodeGen] Add new pass for late cleanup of redundant definitions."
Init captures added in processBlock() to avoid capturing structured bindings, which caused the build problems (with clang).
RISCV has this disabled for now until problems relating to post RA pseudo expansions are resolved.
show more ...
|
#
8ef46326 |
| 01-Dec-2022 |
Jonas Paulsson <paulsson@linux.vnet.ibm.com> |
Revert "[CodeGen] Add new pass for late cleanup of redundant definitions."
Temporarily revert and fix buildbot failure.
This reverts commit 6d12599fd4134c1da63198c74a25490d28c733f6.
|
Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1 |
|
#
6d12599f |
| 06-Apr-2022 |
Jonas Paulsson <paulsson@linux.vnet.ibm.com> |
[CodeGen] Add new pass for late cleanup of redundant definitions.
A new pass MachineLateInstrsCleanup is added to be run after PEI.
This is a simple pass that removes redundant and identical instru
[CodeGen] Add new pass for late cleanup of redundant definitions.
A new pass MachineLateInstrsCleanup is added to be run after PEI.
This is a simple pass that removes redundant and identical instructions whenever found by scanning the MF once while keeping track of register definitions in a map. These instructions are typically immediate loads resulting from rematerialization, and address loads emitted by target in eliminateFrameInde().
This is enabled by default, but a target could easily disable it by means of 'disablePass(&MachineLateInstrsCleanupID);'.
This late cleanup is naturally not "optimal" in removing instructions as it is done by looking at phys-regs, but still quite effective. It would be desirable to improve other parts of CodeGen and avoid these redundant instructions in the first place, but there are no ideas for this yet.
Differential Revision: https://reviews.llvm.org/D123394
Reviewed By: RKSimon, foad, craig.topper, arsenm, asb
show more ...
|
#
6b852ffa |
| 18-Nov-2022 |
Fangrui Song <i@maskray.me> |
[Sink] Process basic blocks with a single successor
This condition seems unnecessary.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D93511
|
#
f59205ae |
| 09-Oct-2022 |
Brendon Cahoon <brendon.cahoon@amd.com> |
[BasicBlockUtils] Add a new way for CreateControlFlowHub()
The existing way of creating the predicate in the guard blocks uses a boolean value per outgoing block. This increases the number of live b
[BasicBlockUtils] Add a new way for CreateControlFlowHub()
The existing way of creating the predicate in the guard blocks uses a boolean value per outgoing block. This increases the number of live booleans as the number of outgoing blocks increases. The new way added in this change is to store one integer to represent the outgoing block we want to branch to, then at each guard block, an integer equality check is performed to decide which a specific outgoing block is taken.
Using an integer reduces the number of live values and decreases register pressure especially in cases where there are a large number of outgoing blocks. The integer based approach is used when the number of outgoing blocks crosses a threshold, which is currently set to 32.
Patch by Ruiling Song.
Differential review: https://reviews.llvm.org/D127831
show more ...
|
#
40e9284f |
| 22-Aug-2022 |
Ruiling Song <ruiling.song@amd.com> |
StructurizeCFG: prefer reduced number of live values
The instruction simplification will try to simplify the affected phis. In some cases, this might extend the liveness of values. For example:
B
StructurizeCFG: prefer reduced number of live values
The instruction simplification will try to simplify the affected phis. In some cases, this might extend the liveness of values. For example:
BB0: | \ | BB1 | / BB2:phi (BB0, v), (BB1, undef)
The phi in BB2 will be simplified to v as v dominates BB2, but this is increasing the number of active values in BB1. By setting CanUseUndef to false, we will not simplify the phi in this way, this would help register pressure. This is mandatory for the later change to help reducing VGPR pressure for AMDGPU.
Reviewed by: foad, sameerds
Differential Revision: https://reviews.llvm.org/D132449
show more ...
|
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1 |
|
#
b9cf52bc |
| 03-Feb-2022 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Simplify AMDGPUAnnotateUniformValues::visitLoadInst
Always set uniform metadata on the pointer if it is an instruction, but otherwise do not bother to create a trivial getelementptr instruc
[AMDGPU] Simplify AMDGPUAnnotateUniformValues::visitLoadInst
Always set uniform metadata on the pointer if it is an instruction, but otherwise do not bother to create a trivial getelementptr instruction, because AMDGPUInstrInfo::isUniformMMO can already detect that various non-instruction pointers are uniform.
Most of the test case churn is from tests that used undef as a pointer, which AMDGPUInstrInfo::isUniformMMO treats as uniform.
Differential Revision: https://reviews.llvm.org/D118909
show more ...
|
Revision tags: llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1 |
|
#
da067ed5 |
| 10-Nov-2021 |
Austin Kerbow <Austin.Kerbow@amd.com> |
[AMDGPU] Set most sched model resource's BufferSize to one
Using a BufferSize of one for memory ProcResources will result in better ILP since it more accurately models the dependencies between memor
[AMDGPU] Set most sched model resource's BufferSize to one
Using a BufferSize of one for memory ProcResources will result in better ILP since it more accurately models the dependencies between memory ops and their consumers on an in-order processor. After this change, the scheduler will treat the data edges from loads as blocking so that stalls are guaranteed when waiting for data to be retreaved from memory. Since we don't actually track waitcnt here, this should do a better job at modeling their behavior.
Practically, this means that the scheduler will trigger the 'STALL' heuristic more often.
This type of change needs to be evaluated experimentally. Preliminary results are positive.
Fixes: SWDEV-282962
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D114777
show more ...
|
#
18f93512 |
| 19-Nov-2021 |
RamNalamothu <VenkataRamanaiah.Nalamothu@amd.com> |
[AMDGPU] Do not generate ELF symbols for the local branch target labels
The compiler was generating symbols in the final code object for local branch target labels. This bloats the code object, slow
[AMDGPU] Do not generate ELF symbols for the local branch target labels
The compiler was generating symbols in the final code object for local branch target labels. This bloats the code object, slows down the loader, and is only used to simplify disassembly.
Use '--symbolize-operands' with llvm-objdump to improve readability of the branch target operands in disassembly.
Fixes: SWDEV-312223
Reviewed By: scott.linder
Differential Revision: https://reviews.llvm.org/D114273
show more ...
|
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init |
|
#
ed0f4415 |
| 15-Jul-2021 |
alex-t <alexander.timofeev@amd.com> |
[AMDGPU] Divergence-driven compare operations instruction selection
Description: This change enables the compare operations to be selected to SALU/VALU form dependent of the SDNode dive
[AMDGPU] Divergence-driven compare operations instruction selection
Description: This change enables the compare operations to be selected to SALU/VALU form dependent of the SDNode divergence flag.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D106079
show more ...
|
#
d719f1c3 |
| 03-Aug-2021 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add alloc priority to global ranges
The requested register class priorities weren't respected globally. Not sure why this is a target option, and not just the expected behavior (recently add
AMDGPU: Add alloc priority to global ranges
The requested register class priorities weren't respected globally. Not sure why this is a target option, and not just the expected behavior (recently added in 1a6dc92be7d68611077f0fb0b723b361817c950c). This avoids an allocation failure when many wide tuple spills are introduced. I think this is a workaround since I would not expect the allocation priority to be required, and only a performance hint. The allocator should be smarter about when only a subregister needs to be spilled and restored.
This does regress a couple of degenerate store stress lit tests which shouldn't be too important.
show more ...
|
Revision tags: llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1 |
|
#
2f499b9a |
| 19-Dec-2020 |
Tony <Tony.Tye@amd.com> |
[AMDGPU] Add volatile support to SIMemoryLegalizer
Treat a non-atomic volatile load and store as a relaxed atomic at system scope for the address spaces accessed. This will ensure all relevant cache
[AMDGPU] Add volatile support to SIMemoryLegalizer
Treat a non-atomic volatile load and store as a relaxed atomic at system scope for the address spaces accessed. This will ensure all relevant caches will be bypassed.
A volatile atomic is not changed and still only bypasses caches upto the level specified by the SyncScope operand.
Differential Revision: https://reviews.llvm.org/D94214
show more ...
|
Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1 |
|
#
7ecf1969 |
| 17-Nov-2020 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Fix and extend vccz workarounds
We have workarounds for two different cases where vccz can get out of sync with the value in vcc. This fixes them in two ways:
1. Fix the case where the def
[AMDGPU] Fix and extend vccz workarounds
We have workarounds for two different cases where vccz can get out of sync with the value in vcc. This fixes them in two ways:
1. Fix the case where the def of vcc was in a previous basic block, by pessimistically assuming that vccz might be incorrect at a basic block boundary.
2. Fix the handling of pre-existing waitcnt instructions by calling generateWaitcntInstBefore before examining ScoreBrackets to determine whether there's an outstanding smem read operation.
Differential Revision: https://reviews.llvm.org/D91636
show more ...
|
Revision tags: llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3 |
|
#
f2284e34 |
| 31-Aug-2020 |
Fangrui Song <i@maskray.me> |
[Sink] Optimize/simplify sink candidate finding with nearest common dominator
For an instruction in the basic block BB, SinkingPass enumerates basic blocks dominated by BB and BB's successors. For e
[Sink] Optimize/simplify sink candidate finding with nearest common dominator
For an instruction in the basic block BB, SinkingPass enumerates basic blocks dominated by BB and BB's successors. For each enumerated basic block, SinkingPass uses `AllUsesDominatedByBlock` to check whether the basic block dominates all of the instruction's users. This is inefficient.
Use the nearest common dominator of all users to avoid enumerating the candidate. The nearest common dominator may be in a parent loop which is not beneficial. In that case, find the ancestors in the dominator tree.
In the case that the instruction has no user, with this change we will not perform unnecessary move. This causes some amdgpu test changes.
A stage-2 x86-64 clang is a byte identical with this change.
show more ...
|