|
Revision tags: llvmorg-21-init, llvmorg-19.1.7 |
|
| #
d3508ccd |
| 19-Dec-2024 |
Konstantina Mitropoulou <44334539+kmitropoulou@users.noreply.github.com> |
[AMDGPU] Emit S_CBRANCH_SCC for floating-point conditions. (#120588)
- **[AMDGPU] Add new test.**
- **[AMDGPU] Emit S_CBRANCH_SCC for floating-point conditions.**
---------
Co-authored-by: Ko
[AMDGPU] Emit S_CBRANCH_SCC for floating-point conditions. (#120588)
- **[AMDGPU] Add new test.**
- **[AMDGPU] Emit S_CBRANCH_SCC for floating-point conditions.**
---------
Co-authored-by: Konstantina Mitropoulou <KonstantinaMitropoulou@amd.com>
show more ...
|
|
Revision tags: llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4 |
|
| #
6548b635 |
| 09-Nov-2024 |
Shilei Tian <i@tianshilei.me> |
Reapply "[AMDGPU] Still set up the two SGPRs for queue ptr even it is COV5 (#112403)"
This reverts commit ca33649abe5fad93c57afef54e43ed9b3249cd86.
|
| #
ca33649a |
| 08-Nov-2024 |
Shilei Tian <i@tianshilei.me> |
Revert "[AMDGPU] Still set up the two SGPRs for queue ptr even it is COV5 (#112403)"
This reverts commit e215a1e27d84adad2635a52393621eb4fa439dc9 as it broke both hip and openmp buildbots.
|
| #
e215a1e2 |
| 08-Nov-2024 |
Shilei Tian <i@tianshilei.me> |
[AMDGPU] Still set up the two SGPRs for queue ptr even it is COV5 (#112403)
|
|
Revision tags: llvmorg-19.1.3 |
|
| #
3277c7cd |
| 21-Oct-2024 |
Stanislav Mekhanoshin <rampitec@users.noreply.github.com> |
[AMDGPU] Skip VGPR deallocation for waveslot limited kernels (#112765)
MSG_DEALLOC_VGPRS slows down very small waveslot limited kernels. It's
been identified this message is only really needed for
[AMDGPU] Skip VGPR deallocation for waveslot limited kernels (#112765)
MSG_DEALLOC_VGPRS slows down very small waveslot limited kernels. It's
been identified this message is only really needed for VGPR limited
kernels. A kernel becomes VGPR limited if a total number of VGPRs per
SIMD / number of used VGPRs is more than a number of wave slots.
show more ...
|
|
Revision tags: llvmorg-19.1.2, llvmorg-19.1.1 |
|
| #
d147b6d5 |
| 22-Sep-2024 |
Carl Ritson <carl.ritson@amd.com> |
[AMDGPU] Add hazard workarounds to insertIndirectBranch (#109127)
BranchRelaxation runs after the hazard recognizer, so workarounds for
SGPR accesses need to be applied directly inline to the code
[AMDGPU] Add hazard workarounds to insertIndirectBranch (#109127)
BranchRelaxation runs after the hazard recognizer, so workarounds for
SGPR accesses need to be applied directly inline to the code it
generates.
show more ...
|
|
Revision tags: llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init |
|
| #
b1bcb7ca |
| 15-Jul-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
Reapply "AMDGPU: Move attributor into optimization pipeline (#83131)" and follow up commit "clang/AMDGPU: Defeat attribute optimization in attribute test" (#98851)
This reverts commit adaff46d087799
Reapply "AMDGPU: Move attributor into optimization pipeline (#83131)" and follow up commit "clang/AMDGPU: Defeat attribute optimization in attribute test" (#98851)
This reverts commit adaff46d087799072438dd744b038e6fd50a2d78.
Drop the -O3 checks from default-attributes.hip. I don't know why they are different on some bots but reverting this is far too disruptive.
show more ...
|
| #
adaff46d |
| 15-Jul-2024 |
dyung <douglas.yung@sony.com> |
Revert "AMDGPU: Move attributor into optimization pipeline (#83131)" and follow up commit "clang/AMDGPU: Defeat attribute optimization in attribute test" (#98851)
This reverts commits 677cc15e0ff2e0
Revert "AMDGPU: Move attributor into optimization pipeline (#83131)" and follow up commit "clang/AMDGPU: Defeat attribute optimization in attribute test" (#98851)
This reverts commits 677cc15e0ff2e0e6aa30538eb187990a6a8f53c0 and
78bc1b64a6dc3fb6191355a5e1b502be8b3668e7.
The test CodeGenHIP/default-attributes.hip is failing on multiple bots
even after the attempted fix including the following:
- https://lab.llvm.org/buildbot/#/builders/3/builds/1473
- https://lab.llvm.org/buildbot/#/builders/65/builds/1380
- https://lab.llvm.org/buildbot/#/builders/161/builds/595
- https://lab.llvm.org/buildbot/#/builders/154/builds/1372
- https://lab.llvm.org/buildbot/#/builders/133/builds/1547
- https://lab.llvm.org/buildbot/#/builders/81/builds/755
- https://lab.llvm.org/buildbot/#/builders/40/builds/570
- https://lab.llvm.org/buildbot/#/builders/13/builds/748
- https://lab.llvm.org/buildbot/#/builders/12/builds/1845
- https://lab.llvm.org/buildbot/#/builders/11/builds/1695
- https://lab.llvm.org/buildbot/#/builders/190/builds/1829
- https://lab.llvm.org/buildbot/#/builders/193/builds/962
- https://lab.llvm.org/buildbot/#/builders/23/builds/991
- https://lab.llvm.org/buildbot/#/builders/144/builds/2256
- https://lab.llvm.org/buildbot/#/builders/46/builds/1614
These bots have been broken for a day, so reverting to get everything
back to green.
show more ...
|
| #
78bc1b64 |
| 14-Jul-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Move attributor into optimization pipeline (#83131)
Removing it from the codegen pipeline induces a lot of test churn
because llc is no longer optimizing out implicit arguments to kernels.
AMDGPU: Move attributor into optimization pipeline (#83131)
Removing it from the codegen pipeline induces a lot of test churn
because llc is no longer optimizing out implicit arguments to kernels.
Mostly mechanical, but there are some creative test updates. I preferred
to take the changes as-is in tests where the ABI isn't relevant. In
cases where it's more relevant, or the optimize out logic was too
ingrained in the test, I pre-run the optimization. Some cases manually
add attributes to disable inputs.
show more ...
|
|
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5 |
|
| #
521ac12a |
| 06-Nov-2023 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Remove AMDGPUAsmPrinter::isBlockOnlyReachableByFallthrough (#71407)
The special handling for blocks ending with a long branch has been
unnecessary since D106445:
"[amdgpu] Add 64-bit PC s
[AMDGPU] Remove AMDGPUAsmPrinter::isBlockOnlyReachableByFallthrough (#71407)
The special handling for blocks ending with a long branch has been
unnecessary since D106445:
"[amdgpu] Add 64-bit PC support when expanding unconditional branches."
show more ...
|
|
Revision tags: llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0 |
|
| #
806761a7 |
| 11-Sep-2023 |
Fangrui Song <i@maskray.me> |
[test] Change llc -march= to -mtriple=
The issue is uncovered by #47698: for IR files without a target triple, -mtriple= specifies the full target triple while -march= merely sets the architecture p
[test] Change llc -march= to -mtriple=
The issue is uncovered by #47698: for IR files without a target triple, -mtriple= specifies the full target triple while -march= merely sets the architecture part of the default target triple, leaving a target triple which may not make sense, e.g. riscv64-apple-darwin.
Therefore, -march= is error-prone and not recommended for tests without a target triple. The issue has been benign as we recognize $unknown-apple-darwin as ELF instead of rejecting it outrightly.
show more ...
|
|
Revision tags: llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init |
|
| #
853b2a84 |
| 29-Jun-2023 |
Brendon Cahoon <brendon.cahoon@amd.com> |
[AMDGPU] Reserve SGPR pair when long branches are present
Branch relaxation requires 2 additional SGPRs for AMDGPU to handle the case when an indirect branch target is too far away. The register sca
[AMDGPU] Reserve SGPR pair when long branches are present
Branch relaxation requires 2 additional SGPRs for AMDGPU to handle the case when an indirect branch target is too far away. The register scavanger may not find available registers, which causes a “did not find scavenging index” assert to occur in assignRegToScavengingIndex.
In this patch, we estimate before register allocation whether an indirect branch is likely to be needed, and reserve 2 SGPRs if the branch distance is found to be above a threshold. The distance threshold is an approximation as the exact code size and branch distance are unknown prior to register allocation.
Patch by Corbin Robeck. Thanks!
Differential Review: https://reviews.llvm.org/D149775
show more ...
|
|
Revision tags: llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4 |
|
| #
15d5c592 |
| 01-Mar-2023 |
Zhongyunde <zhongyunde@huawei.com> |
[InstCombine] Improvement the analytics through the dominating condition
Address the dominating condition, the urem fold is benefit from the analytics improvements. Fix https://github.com/llvm/llvm-
[InstCombine] Improvement the analytics through the dominating condition
Address the dominating condition, the urem fold is benefit from the analytics improvements. Fix https://github.com/llvm/llvm-project/issues/60546
NOTE: delete the calls in simplifyBinaryIntrinsic and foldICmpWithDominatingICmp is used to reduce compile time.
Reviewed By: nikic, arsenm, erikdesjardins Differential Revision: https://reviews.llvm.org/D144248
show more ...
|
| #
d514726d |
| 27-Feb-2023 |
zhongyunde <zhongyunde@huawei.com> |
[AMDGPU] Update the CHECK autogenerated as it's expired
Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D144771
|
|
Revision tags: llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7 |
|
| #
4bbcbdae |
| 04-Jan-2023 |
Anshil Gandhi <gandhi21299@gmail.com> |
[AMDGPU] Unify divergent nodes if the PostDom tree has one root
This patch allows AMDGPUUnifyDivergenceExitNodes pass to transform a function whose PDT has exactly one root and ends in a branch inst
[AMDGPU] Unify divergent nodes if the PostDom tree has one root
This patch allows AMDGPUUnifyDivergenceExitNodes pass to transform a function whose PDT has exactly one root and ends in a branch instruction. Fixes https://github.com/llvm/llvm-project/issues/58861.
Reviewed By: ruiling, arsenm
Differential Revision: https://reviews.llvm.org/D139780
show more ...
|
| #
bdf2fbba |
| 19-Dec-2022 |
Nikita Popov <npopov@redhat.com> |
[AMDGPU] Convert some tests to opaque pointers (NFC)
|
|
Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
| #
716ca2e3 |
| 20-Jul-2022 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Pre-sink IR input for some tests
Edit the IR input for some codegen tests to simulate what the IR code sinking pass would do to it. This makes the tests immune to the presence or absence of
[AMDGPU] Pre-sink IR input for some tests
Edit the IR input for some codegen tests to simulate what the IR code sinking pass would do to it. This makes the tests immune to the presence or absence of the code sinking pass in the codegen pass pipeline, which does not belong there.
Differential Revision: https://reviews.llvm.org/D130169
show more ...
|
| #
fd64a857 |
| 29-Jun-2022 |
Thomas Symalla <thomas.symalla@amd.com> |
[AMDGPU] Combine s_or_saveexec, s_xor instructions.
This patch merges a consecutive sequence of
s_or_saveexec s_o, s_i s_xor exec, exec, s_o
into a single
s_andn2_saveexec s_o, s_i instruction. T
[AMDGPU] Combine s_or_saveexec, s_xor instructions.
This patch merges a consecutive sequence of
s_or_saveexec s_o, s_i s_xor exec, exec, s_o
into a single
s_andn2_saveexec s_o, s_i instruction. This patch also cleans up the SIOptimizeExecMasking pass a bit.
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D129073
show more ...
|
|
Revision tags: llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3 |
|
| #
f510045d |
| 14-Jan-2022 |
Jay Foad <jay.foad@amd.com> |
[CodeGen] Remove unneeded regex escaping in FileCheck patterns. NFC.
Take advantage of D117117 to simplify all {{\[}} to [ and {{\]}} to ].
Differential Revision: https://reviews.llvm.org/D117298
|
| #
d2e5d351 |
| 31-Jan-2022 |
Jay Foad <jay.foad@amd.com> |
[StructurizeCFG] Clean up some boolean not instructions
In some cases StructurizeCFG inserts i1 xor instructions to invert predicates. Add a quick loop to clean these up afterwards if we can get awa
[StructurizeCFG] Clean up some boolean not instructions
In some cases StructurizeCFG inserts i1 xor instructions to invert predicates. Add a quick loop to clean these up afterwards if we can get away with modifying an existing compare instruction instead. (StructurizeCFG is generally run late in the pipeline so instcombine does not clean them up for us.)
Differential Revision: https://reviews.llvm.org/D118623
show more ...
|
| #
8faad296 |
| 31-Jan-2022 |
Jay Foad <jay.foad@amd.com> |
Revert "[Local] invertCondition: try modifying an existing ICmpInst"
This reverts commit a6b54ddaba2d5dc0f72dcc4591c92b9544eb0016.
Apparently it is not safe to modify the condition even if it passe
Revert "[Local] invertCondition: try modifying an existing ICmpInst"
This reverts commit a6b54ddaba2d5dc0f72dcc4591c92b9544eb0016.
Apparently it is not safe to modify the condition even if it passes the hasOneUse test, because StructurizeCFG might have other references to the condition that are not manifest in the IR use-def chains.
show more ...
|
| #
a6b54dda |
| 28-Jan-2022 |
Jay Foad <jay.foad@amd.com> |
[Local] invertCondition: try modifying an existing ICmpInst
This avoids various cases where StructurizeCFG would otherwise insert an xor i1 instruction, and it since it generally runs late in the pi
[Local] invertCondition: try modifying an existing ICmpInst
This avoids various cases where StructurizeCFG would otherwise insert an xor i1 instruction, and it since it generally runs late in the pipeline, instcombine does not clean up the xor-of-cmp pattern.
Differential Revision: https://reviews.llvm.org/D118478
show more ...
|
|
Revision tags: llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1 |
|
| #
da067ed5 |
| 10-Nov-2021 |
Austin Kerbow <Austin.Kerbow@amd.com> |
[AMDGPU] Set most sched model resource's BufferSize to one
Using a BufferSize of one for memory ProcResources will result in better ILP since it more accurately models the dependencies between memor
[AMDGPU] Set most sched model resource's BufferSize to one
Using a BufferSize of one for memory ProcResources will result in better ILP since it more accurately models the dependencies between memory ops and their consumers on an in-order processor. After this change, the scheduler will treat the data edges from loads as blocking so that stalls are guaranteed when waiting for data to be retreaved from memory. Since we don't actually track waitcnt here, this should do a better job at modeling their behavior.
Practically, this means that the scheduler will trigger the 'STALL' heuristic more often.
This type of change needs to be evaluated experimentally. Preliminary results are positive.
Fixes: SWDEV-282962
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D114777
show more ...
|
| #
18f93512 |
| 19-Nov-2021 |
RamNalamothu <VenkataRamanaiah.Nalamothu@amd.com> |
[AMDGPU] Do not generate ELF symbols for the local branch target labels
The compiler was generating symbols in the final code object for local branch target labels. This bloats the code object, slow
[AMDGPU] Do not generate ELF symbols for the local branch target labels
The compiler was generating symbols in the final code object for local branch target labels. This bloats the code object, slows down the loader, and is only used to simplify disassembly.
Use '--symbolize-operands' with llvm-objdump to improve readability of the branch target operands in disassembly.
Fixes: SWDEV-312223
Reviewed By: scott.linder
Differential Revision: https://reviews.llvm.org/D114273
show more ...
|
|
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init |
|
| #
ed0f4415 |
| 15-Jul-2021 |
alex-t <alexander.timofeev@amd.com> |
[AMDGPU] Divergence-driven compare operations instruction selection
Description: This change enables the compare operations to be selected to SALU/VALU form dependent of the SDNode dive
[AMDGPU] Divergence-driven compare operations instruction selection
Description: This change enables the compare operations to be selected to SALU/VALU form dependent of the SDNode divergence flag.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D106079
show more ...
|