#
f7060f4f |
| 30-Apr-2020 |
Ram Nalamothu <VenkataRamanaiah.Nalamothu@amd.com> |
For PAL, make sure Scratch Buffer Descriptor do not clobber GIT pointer
Since SRSRC has alignment requirements, first find non GIT pointer clobbered registers for SRSRC and then if those registers c
For PAL, make sure Scratch Buffer Descriptor do not clobber GIT pointer
Since SRSRC has alignment requirements, first find non GIT pointer clobbered registers for SRSRC and then if those registers clobber preloaded Scratch Wave Offset register, copy the Scratch Wave Offset register to a free SGPR.
show more ...
|
Revision tags: llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1 |
|
#
60b1967c |
| 21-Jan-2020 |
Scott Linder <Scott.Linder@amd.com> |
[AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions
Add the scratch wave offset to the scratch buffer descriptor (SRSrc) in the entry function prologue. This allows us t
[AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions
Add the scratch wave offset to the scratch buffer descriptor (SRSrc) in the entry function prologue. This allows us to removes the scratch wave offset register from the calling convention ABI.
As part of this change, allow the use of an inline constant zero for the SOffset of MUBUF instructions accessing the stack in entry functions when a frame pointer is not requested/required. Entry functions with calls still need to set up the calling convention ABI stack pointer register, and reference it in order to address arguments of called functions. The ABI stack pointer register remains unswizzled, but is now wave-relative instead of queue-relative.
Non-entry functions also use an inline constant zero SOffset for wave-relative scratch access, but continue to use the stack and frame pointers as before. When the stack or frame pointer is converted to a swizzled offset it is now scaled directly, as the scratch wave offset no longer needs to be subtracted first.
Update llvm/docs/AMDGPUUsage.rst to reflect these changes to the calling convention.
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75138
show more ...
|
#
db099f99 |
| 11-Mar-2020 |
Scott Linder <Scott.Linder@amd.com> |
[AMDGPU][NFC] Refactor some uses of unsigned to Register
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76035
|
Revision tags: llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2 |
|
#
1024b73e |
| 03-Dec-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Split denormal mode tracking bits
Prepare to accurately track the future denormal-fp-math attribute changes. The way to actually set these separately is not wired in yet.
This is just a mec
AMDGPU: Split denormal mode tracking bits
Prepare to accurately track the future denormal-fp-math attribute changes. The way to actually set these separately is not wired in yet.
This is just a mechanical change, and mostly still assumes the input and output mode match. This should be refined for some cases. For example, fcanonicalize lowering should use the flushing variant if either input or output flushing is enabled
show more ...
|
Revision tags: llvmorg-9.0.1-rc1 |
|
#
db0ed3e4 |
| 01-Nov-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Refactor treatment of denormal mode
Start moving towards treating this as a property of the calling convention, and not the subtarget. The default denormal mode should not be part of the sub
AMDGPU: Refactor treatment of denormal mode
Start moving towards treating this as a property of the calling convention, and not the subtarget. The default denormal mode should not be part of the subtarget, and be moved into a separate function attribute.
This patch is still NFC. The denormal mode remains as a subtarget feature for now, but make the necessary changes to switch to using an attribute.
show more ...
|
#
19e7f8a2 |
| 28-Oct-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add default denormal mode to MachineFunctionInfo
The default FP mode should really be a property of a specific function, and not a subtarget. Introduce the necessary fields to the SIMachineF
AMDGPU: Add default denormal mode to MachineFunctionInfo
The default FP mode should really be a property of a specific function, and not a subtarget. Introduce the necessary fields to the SIMachineFunctionInfo to help move towards this goal.
show more ...
|
Revision tags: llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3 |
|
#
ff07631b |
| 27-Aug-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add amdgpu-32bit-address-high-bits to MIR serialization
llvm-svn: 370089
|
#
0eaee545 |
| 15-Aug-2019 |
Jonas Devlieghere <jonas@devlieghere.com> |
[llvm] Migrate llvm::make_unique to std::make_unique
Now that we've moved to C++14, we no longer need the llvm::make_unique implementation from STLExtras.h. This patch is a mechanical replacement of
[llvm] Migrate llvm::make_unique to std::make_unique
Now that we've moved to C++14, we no longer need the llvm::make_unique implementation from STLExtras.h. This patch is a mechanical replacement of (hopefully) all the llvm::make_unique instances across the monorepo.
llvm-svn: 369013
show more ...
|
Revision tags: llvmorg-9.0.0-rc2, llvmorg-9.0.0-rc1, llvmorg-10-init |
|
#
937ff6e7 |
| 11-Jul-2019 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] gfx908 agpr spilling
Differential Revision: https://reviews.llvm.org/D64594
llvm-svn: 365833
|
#
58426a37 |
| 10-Jul-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Serialize mode from MachineFunctionInfo
llvm-svn: 365653
|
Revision tags: llvmorg-8.0.1, llvmorg-8.0.1-rc4 |
|
#
71dfb7ec |
| 08-Jul-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Make s34 the FP register
Make the FP register callee saved.
This is tricky because now the FP needs to be spilled in the prolog relative to the incoming SP register, rather than the frame r
AMDGPU: Make s34 the FP register
Make the FP register callee saved.
This is tricky because now the FP needs to be spilled in the prolog relative to the incoming SP register, rather than the frame register used throughout the rest of the function. I don't like how this bypassess the standard mechanism for CSR spills just to get the correct insert point. I may look for a better solution, since all CSR VGPRs may also need to have all lanes activated. Another option might be to make getFrameIndexReference change the base register if the frame index is a CSR, and then try to figure out the right insertion point in emitProlog.
If there is a free VGPR lane available for SGPR spilling, try to use it for the FP. If that would require intrtoducing a new VGPR spill, try to use a free call clobbered SGPR. Only fallback to introducing a new VGPR spill as a last resort.
This also doesn't attempt to handle SGPR spilling with scalar stores.
llvm-svn: 365372
show more ...
|
#
80177ca5 |
| 03-Jul-2019 |
Michael Liao <michael.hliao@gmail.com> |
[AMDGPU] Enable serializing of argument info.
Summary: - Support serialization of all arguments in machine function info. This enables fabricating MIR tests depending on argument info.
Reviewers:
[AMDGPU] Enable serializing of argument info.
Summary: - Support serialization of all arguments in machine function info. This enables fabricating MIR tests depending on argument info.
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64096
llvm-svn: 364995
show more ...
|
#
4dc3b2bf |
| 01-Jul-2019 |
Nicolai Haehnle <nhaehnle@gmail.com> |
AMDGPU: Support GDS atomics
Summary: Original patch by Marek Olšák
Change-Id: Ia97d5d685a63a377d86e82942436d1fe6e429bab
Reviewers: mareko, arsenm, rampitec
Subscribers: kzhuravl, jvesely, wdng, y
AMDGPU: Support GDS atomics
Summary: Original patch by Marek Olšák
Change-Id: Ia97d5d685a63a377d86e82942436d1fe6e429bab
Reviewers: mareko, arsenm, rampitec
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, jfb, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63452
llvm-svn: 364814
show more ...
|
#
1b317685 |
| 01-Jul-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Convert some places to Register
llvm-svn: 364769
|
Revision tags: llvmorg-8.0.1-rc3 |
|
#
4be636eb |
| 25-Jun-2019 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Removed dead SIMachineFunctionInfo::getWorkItemIDVGPR()
Differential Revision: https://reviews.llvm.org/D63780
llvm-svn: 364339
|
#
4d55d024 |
| 19-Jun-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
Reapply "AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsics"
This reapplies r363678, using the correct chain for the CopyToReg for v0. glueCopyToM0 counterintuitively changes the operands of the or
Reapply "AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsics"
This reapplies r363678, using the correct chain for the CopyToReg for v0. glueCopyToM0 counterintuitively changes the operands of the original node.
llvm-svn: 363870
show more ...
|
#
128ce93c |
| 19-Jun-2019 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
Revert rL363678 : AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsics
There may or may not be additional work to handle this correctly on SI/CI. ........ Breaks EXPENSIVE_CHECKS buildbots - http://l
Revert rL363678 : AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsics
There may or may not be additional work to handle this correctly on SI/CI. ........ Breaks EXPENSIVE_CHECKS buildbots - http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/78/
llvm-svn: 363797
show more ...
|
#
8d35dcd7 |
| 18-Jun-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsics
There may or may not be additional work to handle this correctly on SI/CI.
llvm-svn: 363678
|
#
e683eba0 |
| 17-Jun-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Cleanup custom PseudoSourceValue definitions
Use separate enums for each kind, avoid repeating overloads, and add missing classof implementation.
llvm-svn: 363558
|
Revision tags: llvmorg-8.0.1-rc2 |
|
#
b812b7a4 |
| 05-Jun-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Invert frame index offset interpretation
Since the beginning, the offset of a frame index has been consistently interpreted backwards. It was treating it as an offset from the scratch wave o
AMDGPU: Invert frame index offset interpretation
Since the beginning, the offset of a frame index has been consistently interpreted backwards. It was treating it as an offset from the scratch wave offset register as a frame register. The correct interpretation is the offset from the SP on entry to the function, before the prolog. Frame index elimination then should select either SP or another register as an FP.
Treat the scratch wave offset on kernel entry as the pre-incremented SP. Rely more heavily on the standard hasFP and frame pointer elimination logic, and clean up the private reservation code. This saves a copy in most callee functions.
The kernel prolog emission code is still kind of a mess relying on checking the uses of physical registers, which I would prefer to eliminate.
Currently selection directly emits MUBUF instructions, which require using a reference to some register. Use the register chosen for SP, and then ignore this later. This should probably be cleaned up to use pseudos that don't refer to any specific base register until frame index elimination.
Add a workaround for shaders using large numbers of SGPRs. I'm not sure these cases were ever working correctly, since as far as I can tell the logic for figuring out which SGPR is the scratch wave offset doesn't match up with the shader input initialization in the shader programming guide.
llvm-svn: 362661
show more ...
|
Revision tags: llvmorg-8.0.1-rc1 |
|
#
0a30f33c |
| 01-Apr-2019 |
Neil Henning <neil.henning@amd.com> |
[AMDGPU] Pre-allocate WWM registers to reduce VGPR pressure.
This change incorporates an effort by Connor Abbot to change how we deal with WWM operations potentially trashing valid values in inactiv
[AMDGPU] Pre-allocate WWM registers to reduce VGPR pressure.
This change incorporates an effort by Connor Abbot to change how we deal with WWM operations potentially trashing valid values in inactive lanes.
Previously, the SIFixWWMLiveness pass would work out which registers were being trashed within WWM regions, and ensure that the register allocator did not have any values it was depending on resident in those registers if the WWM section would trash them. This worked perfectly well, but would cause sometimes severe register pressure when the WWM section resided before divergent control flow (or at least that is where I mostly observed it).
This fix instead runs through the WWM sections and pre allocates some registers for WWM. It then reserves these registers so that the register allocator cannot use them. This results in a significant register saving on some WWM shaders I'm working with (130 -> 104 VGPRs, with just this change!).
Differential Revision: https://reviews.llvm.org/D59295
llvm-svn: 357400
show more ...
|
#
055e4dce |
| 29-Mar-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Remove dx10-clamp from subtarget features
Since this can be set with s_setreg*, it should not be a subtarget property. Set a default based on the calling convention, and Introduce a new amdg
AMDGPU: Remove dx10-clamp from subtarget features
Since this can be set with s_setreg*, it should not be a subtarget property. Set a default based on the calling convention, and Introduce a new amdgpu-dx10-clamp attribute to override this if desired.
Also introduce a new amdgpu-ieee attribute to match.
The values need to match to allow inlining. I think it is OK for the caller's dx10-clamp attribute to override the callee, but there doesn't appear to be the infrastructure to do this currently without definining the attribute in the generic Attributes.td.
Eventually the calling convention lowering will need to insert a mode switch somewhere for these.
llvm-svn: 357302
show more ...
|
Revision tags: llvmorg-8.0.0 |
|
#
bc6d07ca |
| 14-Mar-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
MIR: Allow targets to serialize MachineFunctionInfo
This has been a very painful missing feature that has made producing reduced testcases difficult. In particular the various registers determined f
MIR: Allow targets to serialize MachineFunctionInfo
This has been a very painful missing feature that has made producing reduced testcases difficult. In particular the various registers determined for stack access during function lowering were necessary to avoid undefined register errors in a large percentage of cases. Implement a subset of the important fields that need to be preserved for AMDGPU.
Most of the changes are to support targets parsing register fields and properly reporting errors. The biggest sort-of bug remaining is for fields that can be initialized from the IR section will be overwritten by a default initialized machineFunctionInfo section. Another remaining bug is the machineFunctionInfo section is still printed even if empty.
llvm-svn: 356215
show more ...
|
Revision tags: llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3 |
|
#
aa6fb4c4 |
| 21-Feb-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Remove debugger related subtarget features
As far as I know these aren't needed anymore.
llvm-svn: 354634
|
Revision tags: llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2, llvmorg-8.0.0-rc1 |
|
#
2946cd70 |
| 19-Jan-2019 |
Chandler Carruth <chandlerc@gmail.com> |
Update the file headers across all of the LLVM projects in the monorepo to reflect the new license.
We understand that people may be surprised that we're moving the header entirely to discuss the ne
Update the file headers across all of the LLVM projects in the monorepo to reflect the new license.
We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository.
llvm-svn: 351636
show more ...
|