Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4
# 6548b635 09-Nov-2024 Shilei Tian <i@tianshilei.me>

Reapply "[AMDGPU] Still set up the two SGPRs for queue ptr even it is COV5 (#112403)"

This reverts commit ca33649abe5fad93c57afef54e43ed9b3249cd86.


# ca33649a 08-Nov-2024 Shilei Tian <i@tianshilei.me>

Revert "[AMDGPU] Still set up the two SGPRs for queue ptr even it is COV5 (#112403)"

This reverts commit e215a1e27d84adad2635a52393621eb4fa439dc9 as it broke both
hip and openmp buildbots.


# e215a1e2 08-Nov-2024 Shilei Tian <i@tianshilei.me>

[AMDGPU] Still set up the two SGPRs for queue ptr even it is COV5 (#112403)


Revision tags: llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0
# 86627149 04-Sep-2024 Carl Ritson <carl.ritson@amd.com>

[AMDGPU] Mitigate GFX12 VALU read SGPR hazard (#100067)

Any SGPR read by a VALU can potentially obscure SALU writes to the same
register.
Insert s_wait_alu instructions to mitigate the hazard on a

[AMDGPU] Mitigate GFX12 VALU read SGPR hazard (#100067)

Any SGPR read by a VALU can potentially obscure SALU writes to the same
register.
Insert s_wait_alu instructions to mitigate the hazard on affected paths.

Compute a global cache of SGPRs with any VALU reads and use this to
avoid inserting mitigation for SGPRs never accessed by VALUs.

To avoid excessive search when compile time is priority implement
secondary mode where all SALU writes are mitigated.

Co-authored-by: Shilei Tian <shilei.tian@amd.com>

show more ...


Revision tags: llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5
# 0b21b25e 01-May-2024 Jay Foad <jay.foad@amd.com>

[AMDGPU] Do not optimize away pre-existing waitcnt instructions at -O0 (#90716)

The autogenerated memory legalizer tests use -O0 so this allows us to
see the exact waitcnts that were inserted by th

[AMDGPU] Do not optimize away pre-existing waitcnt instructions at -O0 (#90716)

The autogenerated memory legalizer tests use -O0 so this allows us to
see the exact waitcnts that were inserted by the memory legalizer
without them being optimized away.

show more ...


Revision tags: llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1
# 982e9022 04-Mar-2024 Mirko Brkušanin <Mirko.Brkusanin@amd.com>

[AMDGPU] Add GFX12 memory legalizer tests (#83814)


Revision tags: llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init
# ac296b69 22-Jan-2024 Pierre van Houtryve <pierre.vanhoutryve@amd.com>

[AMDGPU] Drop verify from SIMemoryLegalizer tests (#78697)

SIMemoryLegalizer tests were slow, with most of them taking 4.5 to 5.3s
to complete and that's on a fast machine. I also recall seeing the

[AMDGPU] Drop verify from SIMemoryLegalizer tests (#78697)

SIMemoryLegalizer tests were slow, with most of them taking 4.5 to 5.3s
to complete and that's on a fast machine. I also recall seeing them in
the slowest tests list on build bots.

This removes the verify-machineinstrs option from these tests to speed
them up, bringing the slowest test down to +-2s.
Verifier still runs in EXPENSIVE_CHECKS builds.

show more ...


Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7
# 8dce4333 05-Dec-2022 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Bulk update memory legalizer tests to use opaque pointers


Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1
# 4c4db816 30-Jul-2022 Carl Ritson <carl.ritson@amd.com>

[AMDGPU] Extend SILoadStoreOptimizer to s_load instructions

Apply merging to s_load as is done for s_buffer_load.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D130742


Revision tags: llvmorg-16-init
# d1af09ad 23-Jun-2022 Joe Nash <Joseph.Nash@amd.com>

[AMDGPU] gfx11 Generate VOPD Instructions

We form VOPD instructions in the GCNCreateVOPD pass by combining
back-to-back component instructions. There are strict register
constraints for creating a

[AMDGPU] gfx11 Generate VOPD Instructions

We form VOPD instructions in the GCNCreateVOPD pass by combining
back-to-back component instructions. There are strict register
constraints for creating a legal VOPD, namely that the matching operands
(e.g. src0x and src0y, src1x and src1y) must be in different register
banks. We add a PostRA scheduler
mutation to put possible VOPD components back-to-back.

Depends on D128442, D128270

Reviewed By: #amdgpu, rampitec

Differential Revision: https://reviews.llvm.org/D128656

show more ...


Revision tags: llvmorg-14.0.6, llvmorg-14.0.5
# a3fc8adb 09-Jun-2022 Jay Foad <jay.foad@amd.com>

[AMDGPU] Add GFX11 test coverage for the memory legalizer


Revision tags: llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3
# 47bac63d 08-Mar-2022 Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>

[AMDGPU] gfx940 memory model

Differential Revision: https://reviews.llvm.org/D121242


Revision tags: llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3
# 89c447e4 16-Jan-2022 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Stop reserving 36-bytes before kernel arguments for amdpal

This was inheriting the mesa behavior, and as far as I know nobody is
using opencl kernels with amdpal. The isMesaKernel check was

AMDGPU: Stop reserving 36-bytes before kernel arguments for amdpal

This was inheriting the mesa behavior, and as far as I know nobody is
using opencl kernels with amdpal. The isMesaKernel check was
irrelevant because this property needs to be held for all functions.

show more ...


Revision tags: llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1
# da067ed5 10-Nov-2021 Austin Kerbow <Austin.Kerbow@amd.com>

[AMDGPU] Set most sched model resource's BufferSize to one

Using a BufferSize of one for memory ProcResources will result in better
ILP since it more accurately models the dependencies between memor

[AMDGPU] Set most sched model resource's BufferSize to one

Using a BufferSize of one for memory ProcResources will result in better
ILP since it more accurately models the dependencies between memory ops
and their consumers on an in-order processor. After this change, the
scheduler will treat the data edges from loads as blocking so that
stalls are guaranteed when waiting for data to be retreaved from memory.
Since we don't actually track waitcnt here, this should do a better job
at modeling their behavior.

Practically, this means that the scheduler will trigger the 'STALL'
heuristic more often.

This type of change needs to be evaluated experimentally. Preliminary
results are positive.

Fixes: SWDEV-282962

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D114777

show more ...


Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init
# 53eb4691 23-Jul-2021 Tony Tye <Tony.Tye@amd.com>

[AMDGPU] Support non-strictly stronger memory orderings in SIMemoryLegalizer

C++20 no longer requires the failure memory ordering to be no stronger than the
success memory ordering. Adjust assert in

[AMDGPU] Support non-strictly stronger memory orderings in SIMemoryLegalizer

C++20 no longer requires the failure memory ordering to be no stronger than the
success memory ordering. Adjust assert in AMD GPU SIMemoryLegalizer, and merge
instruction memory orderings

Add common operation to merge memory orders that allows non strict memory
orderings to be combined. Use it in SIMemoryLegalizer and
MachineMemOperand::getMergedOrdering.

Reviewed By: efriedma, rampitec

Differential Revision: https://reviews.llvm.org/D106729

show more ...


Revision tags: llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2
# a8d9d507 17-Feb-2021 Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>

[AMDGPU] gfx90a support

Differential Revision: https://reviews.llvm.org/D96906


Revision tags: llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1
# 61177943 24-Dec-2020 Praveen Velliengiri <Praveen.Velliengiri@amd.com>

[AMDGPU] Use MUBUF instructions for global address space access

Currently, the compiler crashes in instruction selection of global
load/stores in gfx600 due to the lack of FLAT instructions. This pa

[AMDGPU] Use MUBUF instructions for global address space access

Currently, the compiler crashes in instruction selection of global
load/stores in gfx600 due to the lack of FLAT instructions. This patch
fix the crash by selecting MUBUF instructions for global load/stores
in gfx600.

Authored-by: Praveen Velliengiri <Praveen.Velliengiri@amd.com>

Reviewed by: t-tye

Differential revision: https://reviews.llvm.org/D92483

show more ...


Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2
# f6b9afae 30-Nov-2020 Scott Linder <Scott.Linder@amd.com>

[AMDGPU] Extend and reorganize memory legalizer tests

* Rename some tests to try to make a convention (where all components
are optional) of:

<addrspace>_<syncscope>_<memory-orders>_<operatio

[AMDGPU] Extend and reorganize memory legalizer tests

* Rename some tests to try to make a convention (where all components
are optional) of:

<addrspace>_<syncscope>_<memory-orders>_<operation>

* Split up at a level of granularity appropriate for the different RUN
lines (i.e. split on addrspace so GFX6 can avoid FLAT) and that makes
running a specific test reasonable in terms of wall time taken. This
also means when run as part of the test suite the testing is not one
serial bottleneck.

* Auto-generate check lines with `update_llc_test_checks.py` to make
future maintenance more tractable.

Reviewed By: rampitec, t-tye

Differential Revision: https://reviews.llvm.org/D91545

show more ...