Revision tags: llvmorg-21-init
# 11b04019 | 24-Jan-2025 | Aaditya <115080342+easyonaadit@users.noreply.github.com>
[AMDGPU] Restore SP from saved-FP or saved-BP (#124007)
Currently, the AMDGPU backend bumps the Stack Pointer
by fixed size offsets in the prolog of device functions, and
restores it by the same amount in the epilog.
Prolog:
sp += frameSize
Epilog:
sp -= frameSize
If a function has dynamic stack realignment,
Prolog:
sp += frameSize + max_alignment
Epilog:
sp -= frameSize + max_alignment
These calculations are not optimal in case of dynamic
stack realignment, and completely fail in case of
dynamic stack readjustment.
This patch uses the saved Frame Pointer to restore SP.
Prolog:
fp = sp
sp += frameSize
Epilog:
sp = fp
In case of dynamic stack realignment, SP is restored from
the saved Base Pointer.
Prolog:
fp = sp + (max_alignment - 1)
fp = fp & (-max_alignment)
bp = sp
sp += frameSize + max_alignment
Epilog:
sp = bp
(Note: The presence of BP has been enforced in case of any
dynamic stack realignment.)
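A minimal standalone C++ sketch of the realignment arithmetic above (the frameSize, maxAlign, and incoming SP values are made up for illustration; this is not the backend's code):
  // Illustration only: mirrors the prolog/epilog pseudo-code above with
  // hypothetical values.  fp & (-max_alignment) from the message equals
  // the ~(maxAlign - 1) mask below for a power-of-two alignment.
  #include <cassert>
  #include <cstdint>
  #include <cstdio>

  int main() {
    const uint64_t frameSize = 144;   // assumed static frame size
    const uint64_t maxAlign  = 64;    // assumed max stack alignment (power of two)
    uint64_t sp = 0x1000;             // assumed incoming SP; AMDGPU's stack grows up

    // Prolog with dynamic stack realignment:
    uint64_t fp = (sp + (maxAlign - 1)) & ~(maxAlign - 1); // round FP up to maxAlign
    uint64_t bp = sp;                                      // BP saves the incoming SP
    sp += frameSize + maxAlign;                            // worst-case bump
    assert(fp % maxAlign == 0);

    // Epilog: restore SP from the saved BP rather than subtracting a
    // fixed amount, which would be wrong after any dynamic adjustment.
    sp = bp;
    std::printf("fp = %#llx, sp restored to %#llx\n",
                (unsigned long long)fp, (unsigned long long)sp);
    return 0;
  }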
---------
Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
Revision tags: llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4
# 69879ffa | 13-Nov-2024 | Matt Arsenault <Matthew.Arsenault@amd.com>
AMDGPU: Fix using illegal VOP3 literal in frame index elimination (#115747)
# 1bf385f1 | 09-Nov-2024 | Matt Arsenault <Matthew.Arsenault@amd.com>
AMDGPU: Default to selecting frame indexes to SGPRs (#115060)
Only select to a VGPR if it's trivially used in VGPR-only contexts. This fixes mishandling of frame indexes used in SGPR-only contexts, like inline assembly constraints.
This is suboptimal in the common case where the frame index is transitively used by only VALU ops. We make up for this by later folding the copy to VALU plus scalar op in SIFoldOperands.
# 6548b635 | 09-Nov-2024 | Shilei Tian <i@tianshilei.me>
Reapply "[AMDGPU] Still set up the two SGPRs for queue ptr even it is COV5 (#112403)"
This reverts commit ca33649abe5fad93c57afef54e43ed9b3249cd86.
# ca33649a | 08-Nov-2024 | Shilei Tian <i@tianshilei.me>
Revert "[AMDGPU] Still set up the two SGPRs for queue ptr even it is COV5 (#112403)"
This reverts commit e215a1e27d84adad2635a52393621eb4fa439dc9 as it broke both the HIP and OpenMP buildbots.
# e215a1e2 | 08-Nov-2024 | Shilei Tian <i@tianshilei.me>
[AMDGPU] Still set up the two SGPRs for queue ptr even it is COV5 (#112403)
Revision tags: llvmorg-19.1.3, llvmorg-19.1.2
# 5f94b0cb | 07-Oct-2024 | Matt Arsenault <Matthew.Arsenault@amd.com>
AMDGPU: Try to reuse dest reg for s_add_i32 frame indexes (#111201)
Hack around the register scavenger doing the wrong thing.
It does not find the result register as available in the case where the frame index add isn't also reading the dest register.
This is a quick fix for a regression where the scavenger would create a broken spill of an SGPR to memory. I believe this is still broken for cases where we cannot use the result register.
I'm confused about what position the scavenger iterator is supposed to be in, and what RestoreAfter is for. The scavenger is missing a full set of forward/backward APIs, and there seems to be an off-by-one error somewhere.
Revision tags: llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4
# 8039886e | 22-Aug-2024 | Matt Arsenault <Matthew.Arsenault@amd.com>
AMDGPU: Handle folding frame indexes into s_add_i32 (#101694)
This does not yet enable producing direct frame index references in s_add_i32, only the lowering.
Revision tags: llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init
# b1bcb7ca | 15-Jul-2024 | Matt Arsenault <Matthew.Arsenault@amd.com>
Reapply "AMDGPU: Move attributor into optimization pipeline (#83131)" and follow up commit "clang/AMDGPU: Defeat attribute optimization in attribute test" (#98851)
This reverts commit adaff46d087799
Reapply "AMDGPU: Move attributor into optimization pipeline (#83131)" and follow up commit "clang/AMDGPU: Defeat attribute optimization in attribute test" (#98851)
This reverts commit adaff46d087799072438dd744b038e6fd50a2d78.
Drop the -O3 checks from default-attributes.hip. I don't know why they are different on some bots but reverting this is far too disruptive.
# adaff46d | 15-Jul-2024 | dyung <douglas.yung@sony.com>
Revert "AMDGPU: Move attributor into optimization pipeline (#83131)" and follow up commit "clang/AMDGPU: Defeat attribute optimization in attribute test" (#98851)
This reverts commits 677cc15e0ff2e0
Revert "AMDGPU: Move attributor into optimization pipeline (#83131)" and follow up commit "clang/AMDGPU: Defeat attribute optimization in attribute test" (#98851)
This reverts commits 677cc15e0ff2e0e6aa30538eb187990a6a8f53c0 and
78bc1b64a6dc3fb6191355a5e1b502be8b3668e7.
The test CodeGenHIP/default-attributes.hip is failing on multiple bots
even after the attempted fix, including the following:
- https://lab.llvm.org/buildbot/#/builders/3/builds/1473
- https://lab.llvm.org/buildbot/#/builders/65/builds/1380
- https://lab.llvm.org/buildbot/#/builders/161/builds/595
- https://lab.llvm.org/buildbot/#/builders/154/builds/1372
- https://lab.llvm.org/buildbot/#/builders/133/builds/1547
- https://lab.llvm.org/buildbot/#/builders/81/builds/755
- https://lab.llvm.org/buildbot/#/builders/40/builds/570
- https://lab.llvm.org/buildbot/#/builders/13/builds/748
- https://lab.llvm.org/buildbot/#/builders/12/builds/1845
- https://lab.llvm.org/buildbot/#/builders/11/builds/1695
- https://lab.llvm.org/buildbot/#/builders/190/builds/1829
- https://lab.llvm.org/buildbot/#/builders/193/builds/962
- https://lab.llvm.org/buildbot/#/builders/23/builds/991
- https://lab.llvm.org/buildbot/#/builders/144/builds/2256
- https://lab.llvm.org/buildbot/#/builders/46/builds/1614
These bots have been broken for a day, so reverting to get everything
back to green.
# 78bc1b64 | 14-Jul-2024 | Matt Arsenault <Matthew.Arsenault@amd.com>
AMDGPU: Move attributor into optimization pipeline (#83131)
Removing it from the codegen pipeline induces a lot of test churn
because llc is no longer optimizing out implicit arguments to kernels.
Mostly mechanical, but there are some creative test updates. I preferred
to take the changes as-is in tests where the ABI isn't relevant. In
cases where it's more relevant, or the optimize out logic was too
ingrained in the test, I pre-run the optimization. Some cases manually
add attributes to disable inputs.
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4
# 5aea839a | 16-May-2023 | Jay Foad <jay.foad@amd.com>
[AMDGPU] Switch to backwards scavenging in eliminateFrameIndex
Frame index elimination runs backwards so we must use backwards scavenging. Otherwise, when a scavenged register is spilled, the scavenger will remember that the register is in use until the restore point, but it will never reach that restore point. The result is that in some cases it will keep scavenging different registers instead of reusing the same one.
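A toy C++ simulation of that failure mode (the instruction indices and SGPR names are invented for illustration; this is not the real RegScavenger):
  // Walk a block bottom-up, as frame index elimination does.  A
  // forward-style scavenger records each scavenged register as in use
  // until a restore point placed *after* the use, i.e. at an index the
  // backward walk has already left behind and will never visit, so the
  // register is never released and every scavenge picks a fresh one.
  #include <cstdio>
  #include <set>
  #include <string>

  int main() {
    std::set<std::string> neverReleased;                 // regs stuck "in use"
    const char *pool[] = {"sgpr4", "sgpr5", "sgpr6"};
    int next = 0;

    for (int idx = 9; idx >= 0; --idx) {                 // backward walk
      if (idx != 8 && idx != 5 && idx != 2)              // scavenge at 8, 5, 2
        continue;
      const char *reg = pool[next++ % 3];
      neverReleased.insert(reg);                         // restore at idx + 1 is never reached
      std::printf("idx %d: scavenged %s (restore point at idx %d was already passed)\n",
                  idx, reg, idx + 1);
    }
    std::printf("%zu registers never handed back\n", neverReleased.size());
    return 0;
  }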
Differential Revision: https://reviews.llvm.org/D152394
# 8fcb4fa8 | 17-May-2023 | Jay Foad <jay.foad@amd.com>
[RegScavenger] Change scavengeRegister to pick registers in allocation order
This matches what scavengeRegisterBackwards does.
This is in preparation for converting most uses of scavengeRegister to scavengeRegisterBackwards, to reduce test case churn when that lands and to help with bisection if anything goes wrong.
Differential Revision: https://reviews.llvm.org/D150792
Revision tags: llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7
# bdf2fbba | 19-Dec-2022 | Nikita Popov <npopov@redhat.com>
[AMDGPU] Convert some tests to opaque pointers (NFC)
Revision tags: llvmorg-15.0.6
# 32bd7571 | 17-Nov-2022 | Alexander Timofeev <alexander.timofeev@amd.com>
PEI should be able to use backward walk in replaceFrameIndicesBackward.
The backward register scavenger has correct register liveness information. PEI should leverage the backward register scavenger.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D137574
# 6b852ffa | 18-Nov-2022 | Fangrui Song <i@maskray.me>
[Sink] Process basic blocks with a single successor
This condition seems unnecessary.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D93511
Revision tags: llvmorg-15.0.5
# 6c7666a4 | 15-Nov-2022 | Fangrui Song <i@maskray.me>
Revert D137574 "PEI should be able to use backward walk in replaceFrameIndicesBackward."
This reverts commit e05ce03cfa0b36e9b99149e21afcb1fc039df813.
Caused asan use-after-poison to 4 DebugInfo/AM
Revert D137574 "PEI should be able to use backward walk in replaceFrameIndicesBackward."
This reverts commit e05ce03cfa0b36e9b99149e21afcb1fc039df813.
Caused an ASan use-after-poison in 4 DebugInfo/AMDGPU/ tests, triggered when PEI::replaceFrameIndicesBackward called llvm::MachineInstr::getNumOperands.
# e05ce03c | 04-Nov-2022 | Alexander Timofeev <alexander.timofeev@amd.com>
PEI should be able to use backward walk in replaceFrameIndicesBackward.
The backward register scavenger has correct register liveness information. PEI should leverage the backward register scavenger.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D137574
Revision tags: llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2
# a5d4f82b | 11-Feb-2022 | Sebastian Neubauer <Sebastian.Neubauer@amd.com>
[AMDGPU] Make enable-flat-scratch a subtarget feature
Use a subtarget feature instead of a command line argument to reduce global state. We want to enable flat scratch for graphics in some cases and this doesn't work well with command line options.
Differential Revision: https://reviews.llvm.org/D119425
Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1
# 273a0c8b | 04-Nov-2021 | Matt Arsenault <Matthew.Arsenault@amd.com>
PrologEpilogInserter: Use explicit control for scavenge slot placement
AMDGPU is unusual in that the stack is indexed in the same direction as stack growth (up). We therefore always need the emergency stack slots placed as low as possible to ensure they are in range of load/store instruction immediate offsets. The existing logic is mostly OK, but failed if we required stack realignment.
I don't understand what the existing control isFPCloseToIncomingSP is supposed to mean, but it can only be used to stop placing the scavenge slots earlier. Make this explicit so that targets can opt in rather than only opt out.
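A rough standalone illustration of why the slot wants the lowest offset on an upward-growing stack (the 12-bit unsigned immediate width and the frame size are assumptions for the example, not the actual encoding rules):
  // Toy model: with an upward-growing private stack, objects get
  // non-negative offsets from the base.  A scavenge (emergency spill)
  // slot placed above a large frame can fall out of the immediate range
  // of the load/store used to reach it; placing it at the lowest offset
  // keeps it reachable.  The 12-bit unsigned immediate is assumed here
  // purely for illustration.
  #include <cstdint>
  #include <cstdio>

  constexpr uint32_t kImmBits = 12;                 // assumed immediate width
  constexpr uint32_t kMaxImm  = (1u << kImmBits) - 1;

  static bool fitsImmediate(uint32_t offset) { return offset <= kMaxImm; }

  int main() {
    const uint32_t bigFrameBytes = 16384;           // hypothetical large frame

    uint32_t slotAboveFrame = bigFrameBytes;        // placed after all other objects
    uint32_t slotAtBottom   = 0;                    // placed as low as possible

    std::printf("slot above frame reachable: %s\n",
                fitsImmediate(slotAboveFrame) ? "yes" : "no (needs extra address math)");
    std::printf("slot at lowest offset reachable: %s\n",
                fitsImmediate(slotAtBottom) ? "yes" : "no");
    return 0;
  }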
# 18f93512 | 19-Nov-2021 | RamNalamothu <VenkataRamanaiah.Nalamothu@amd.com>
[AMDGPU] Do not generate ELF symbols for the local branch target labels
The compiler was generating symbols in the final code object for local branch target labels. This bloats the code object, slows down the loader, and is only used to simplify disassembly.
Use '--symbolize-operands' with llvm-objdump to improve readability of the branch target operands in disassembly.
Fixes: SWDEV-312223
Reviewed By: scott.linder
Differential Revision: https://reviews.llvm.org/D114273
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3
# 3ce1b963 | 08-Sep-2021 | Joe Nash <Joseph.Nash@amd.com>
[AMDGPU] Switch PostRA sched to MachineSched
Use GCNHazardRecognizer in postra sched. Updated tests for the new schedules.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D109536
Change-Id: Ia86ba2ae168f12fb34b4d8efdab491f84d936cde
Revision tags: llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init
# 4359b870 | 14-Jul-2021 | Sebastian Neubauer <sebastian.neubauer@amd.com>
[AMDGPU] Init scratch only if necessary
If no scratch or flat instructions are used, we do not need to initialize the flat scratch hardware register.
Differential Revision: https://reviews.llvm.org/D105920
Revision tags: llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2
# 96e1fcb1 | 07-Jun-2021 | Sebastian Neubauer <sebastian.neubauer@amd.com>
[AMDGPU] Use s_add_i32 for address additions
This allows the add instruction to be converted to s_addk_i32, and to v_add_nc_u32 instead of v_add_co_u32 when it is converted to a VALU instruction.
Differential Revision: https://reviews.llvm.org/D103322
Revision tags: llvmorg-12.0.1-rc1
# 13c03162 | 03-May-2021 | Sebastian Neubauer <sebastian.neubauer@amd.com>
[AMDGPU] Restrict immediate scratch offsets
gfx9 does not work with negative offsets, and gfx10 works only with aligned negative offsets, not with unaligned ones.
This is slightly more conservative than needed: gfx9 does support negative offsets when a VGPR address is used, and gfx10 supports negative, unaligned offsets when an SGPR address is used, but we do not make use of that with this patch.
Differential Revision: https://reviews.llvm.org/D101292
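A standalone sketch of the rule as stated above (the dword alignment check is an assumption for illustration; the real legality check lives in the backend):
  // Toy predicate mirroring the rule described in the message:
  //   gfx9:  no negative immediate scratch offsets at all
  //   gfx10: negative offsets allowed only when aligned
  // The dword (4-byte) alignment used here is an assumption for the
  // example, not a statement of the exact hardware rule.
  #include <cstdio>

  enum class Gen { GFX9, GFX10 };

  static bool isImmScratchOffsetLegal(Gen gen, int offset) {
    if (offset >= 0)
      return true;                 // range checks omitted in this sketch
    if (gen == Gen::GFX9)
      return false;                // never negative on gfx9
    return offset % 4 == 0;        // gfx10: only aligned negative offsets
  }

  int main() {
    std::printf("gfx9  -8: %d\n", isImmScratchOffsetLegal(Gen::GFX9, -8));
    std::printf("gfx10 -8: %d\n", isImmScratchOffsetLegal(Gen::GFX10, -8));
    std::printf("gfx10 -6: %d\n", isImmScratchOffsetLegal(Gen::GFX10, -6));
    return 0;
  }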