# 361b5b21 | 21-Mar-2019 | Tim Renouf <tpr.llvm@botech.co.uk> |
[AMDGPU] Support for v3i32/v3f32
Added support for dwordx3 for most load/store types, but not DS, and not intrinsics yet.
SI (gfx6) does not have dwordx3 instructions, so they are not enabled there
[AMDGPU] Support for v3i32/v3f32
Added support for dwordx3 for most load/store types, but not DS, and not intrinsics yet.
SI (gfx6) does not have dwordx3 instructions, so they are not enabled there.
Some of this patch is from Matt Arsenault, also of AMD.
Differential Revision: https://reviews.llvm.org/D58902
Change-Id: I913ef54f1433a7149da8d72f4af54dbb13436bd9
llvm-svn: 356659
|
Revision tags: llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4 |
|
# 6023d599 | 04-Mar-2019 | Dmitry Preobrazhensky <dmitry.preobrazhensky@amd.com> |
[AMDGPU][MC] Enable lds_direct operand for v_readfirstlane_b32, v_readlane_b32 and v_writelane_b32
See bug 40662: https://bugs.llvm.org/show_bug.cgi?id=40662
Reviewers: artem.tamazov, arsenm, rampitec
Differential Revision: https://reviews.llvm.org/D58713
llvm-svn: 355312
|
Revision tags: llvmorg-8.0.0-rc3, llvmorg-7.1.0, llvmorg-7.1.0-rc1 |
|
# d7047276 | 08-Feb-2019 | Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Remove GCN features and predicates
These are no longer necessary since the R600 tablegen files are split out now.
llvm-svn: 353548
|
Revision tags: llvmorg-8.0.0-rc2, llvmorg-8.0.0-rc1 |
|
# 2946cd70 | 19-Jan-2019 | Chandler Carruth <chandlerc@gmail.com> |
Update the file headers across all of the LLVM projects in the monorepo to reflect the new license.
We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository.
llvm-svn: 351636
|
# 76504a4c | 12-Dec-2018 | Neil Henning <neil.henning@amd.com> |
[AMDGPU] Extend the SI Load/Store optimizer to combine more things.
I've extended the load/store optimizer to be able to produce dwordx3 loads and stores. This change allows many more load/stores to be combined, and results in much more optimal code for our hardware.
Differential Revision: https://reviews.llvm.org/D54042
llvm-svn: 348937
|
Revision tags: llvmorg-7.0.1, llvmorg-7.0.1-rc3 |
|
# a7b00058 | 30-Nov-2018 | Nicolai Haehnle <nhaehnle@gmail.com> |
AMDGPU: Divergence-driven selection of scalar buffer load intrinsics
Summary: Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if the load is really uniform. So select the scalar load intrinsics directly to either VMEM or SMRD buffer loads based on divergence analysis.
If an offset happens to end up in a VGPR -- either because a floating point calculation was involved, or due to other remaining deficiencies in SIFixSGPRCopies -- we use v_readfirstlane.
There is some unrelated churn in tests since we now select MUBUF offsets in a unified way with non-scalar buffer loads.
Change-Id: I170e6816323beb1348677b358c9d380865cd1a19
Reviewers: arsenm, alex-t, rampitec, tpr
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D53283
llvm-svn: 348050
|
# 1a94cbb3 | 29-Nov-2018 | Nicolai Haehnle <nhaehnle@gmail.com> |
AMDGPU/InsertWaitcnts: Untangle some semi-global state
Summary: Reduce the statefulness of the algorithm in two ways:
1. More clearly split generateWaitcntInstBefore into two phases: the first one which determines the required wait, if any, without changing the ScoreBrackets, and the second one which actually inserts the wait and updates the brackets.
2. Communicate pre-existing s_waitcnt instructions using an argument to generateWaitcntInstBefore instead of through the ScoreBrackets.
To simplify these changes, a Waitcnt structure is introduced which carries the counts of an s_waitcnt instruction in decoded form.
There are some functional changes:
1. The FIXME for the VCCZ bug workaround was implemented: we only wait for SMEM instructions as required instead of waiting on all counters.
2. We now properly track pre-existing waitcnt's in all cases, which leads to less conservative waitcnts being emitted in some cases.
    s_load_dword ...
    s_waitcnt lgkmcnt(0) <-- pre-existing wait count
    ds_read_b32 v0, ...
    ds_read_b32 v1, ...
    s_waitcnt lgkmcnt(0) <-- this is too conservative
    use(v0)
    more code
    use(v1)
This increases code size a bit, but the reduced latency should still be a win in basically all cases. The worst code size regressions in my shader-db are:
WORST REGRESSIONS - Code Size
  Before   After   Delta   Percentage
    1724    1736      12     0.70 %   shaders/private/f1-2015/1334.shader_test [0]
    2276    2284       8     0.35 %   shaders/private/f1-2015/1306.shader_test [0]
    4632    4640       8     0.17 %   shaders/private/ue4_elemental/62.shader_test [0]
    2376    2384       8     0.34 %   shaders/private/f1-2015/1308.shader_test [0]
    3284    3292       8     0.24 %   shaders/private/talos_principle/1955.shader_test [0]
Reviewers: msearles, rampitec, scott.linder, kanarayan
Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam
Differential Revision: https://reviews.llvm.org/D54226
llvm-svn: 347848
|
# cac749ac | 16-Nov-2018 | Ron Lieberman <ronlieb.g@gmail.com> |
[AMDGPU] Add FixupVectorISel pass, currently Supports SREGs in GLOBAL LD/ST
Add a pass to fixup various vector ISel issues. Currently we handle converting GLOBAL_{LOAD|STORE}_* and GLOBAL_Atomic_* instructions into their _SADDR variants. This involves feeding the sreg into the saddr field of the new instruction.
llvm-svn: 347008
|
# af7b5d70 | 15-Nov-2018 | Konstantin Zhuravlyov <kzhuravl_dev@outlook.com> |
AMDHSA: More code object v3 fixes:
- Make sure IsaInfo::hasCodeObjectV3 returns true only for AMDHSA
- Update assembler metadata tests to use v2 by default
llvm-svn: 347001
|
# bc233f55 | 07-Nov-2018 | Nicolai Haehnle <nhaehnle@gmail.com> |
Revert "AMDGPU: Divergence-driven selection of scalar buffer load intrinsics"
This reverts commit r344696 for now (except for some test additions).
See https://bugs.freedesktop.org/show_bug.cgi?id=108611.
llvm-svn: 346364
|
# 108927b9 | 05-Nov-2018 | Konstantin Zhuravlyov <kzhuravl_dev@outlook.com> |
AMDGPU: Add sram-ecc feature
Differential Revision: https://reviews.llvm.org/D53222
llvm-svn: 346177
|
Revision tags: llvmorg-7.0.1-rc2, llvmorg-7.0.1-rc1 |
|
# c4a2ff09 | 17-Oct-2018 | Nicolai Haehnle <nhaehnle@gmail.com> |
AMDGPU: Divergence-driven selection of scalar buffer load intrinsics
Summary: Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if the load is really uniform. So select the scalar load intrinsics directly to either VMEM or SMRD buffer loads based on divergence analysis.
If an offset happens to end up in a VGPR -- either because a floating point calculation was involved, or due to other remaining deficiencies in SIFixSGPRCopies -- we use v_readfirstlane.
There is some unrelated churn in tests since we now select MUBUF offsets in a unified way with non-scalar buffer loads.
Change-Id: I170e6816323beb1348677b358c9d380865cd1a19
Reviewers: arsenm, alex-t, rampitec, tpr
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D53283
llvm-svn: 344696
|
Revision tags: llvmorg-7.0.0 |
|
# 71e43ee4 | 12-Sep-2018 | Konstantin Zhuravlyov <kzhuravl_dev@outlook.com> |
AMDGPU: Re-apply r341982 after fixing the layering issue
Move isa version determination into TargetParser.
Also switch away from target features to the CPU string when determining the ISA version. This fixes an issue where we output the wrong ISA version in the object code when features of a particular CPU are altered (e.g. gfx902 w/o xnack used to result in gfx900).
llvm-svn: 342069
|
# 95066496 | 12-Sep-2018 | Ilya Biryukov <ibiryukov@google.com> |
Revert "AMDGPU: Move isa version and EF_AMDGPU_MACH_* determination into TargetParser."
This reverts commit r341982.
The change introduced a layering violation. Reverting to unbreak our integrate.
llvm-svn: 342023
|
# 941615e4 | 11-Sep-2018 | Konstantin Zhuravlyov <kzhuravl_dev@outlook.com> |
AMDGPU: Move isa version and EF_AMDGPU_MACH_* determination into TargetParser.
Also switch away from target features to the CPU string when determining the ISA version. This fixes an issue where we output the wrong ISA version in the object code when features of a particular CPU are altered (e.g. gfx902 w/o xnack used to result in gfx900).
Differential Revision: https://reviews.llvm.org/D51890
llvm-svn: 341982
|
Revision tags: llvmorg-7.0.0-rc3 |
|
# 0da6350d | 31-Aug-2018 | Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Remove remnants of old address space mapping
llvm-svn: 341165
|
Revision tags: llvmorg-7.0.0-rc2 |
|
# 4f703f5e | 21-Aug-2018 | Tim Renouf <tpr.llvm@botech.co.uk> |
[AMDGPU] New buffer intrinsics
Summary: This commit adds new intrinsics
  llvm.amdgcn.raw.buffer.load
  llvm.amdgcn.raw.buffer.load.format
  llvm.amdgcn.raw.buffer.load.format.d16
  llvm.amdgcn.struct.buffer.load
  llvm.amdgcn.struct.buffer.load.format
  llvm.amdgcn.struct.buffer.load.format.d16
  llvm.amdgcn.raw.buffer.store
  llvm.amdgcn.raw.buffer.store.format
  llvm.amdgcn.raw.buffer.store.format.d16
  llvm.amdgcn.struct.buffer.store
  llvm.amdgcn.struct.buffer.store.format
  llvm.amdgcn.struct.buffer.store.format.d16
  llvm.amdgcn.raw.buffer.atomic.*
  llvm.amdgcn.struct.buffer.atomic.*
with the following changes from the llvm.amdgcn.buffer.* intrinsics:
* there are separate raw and struct versions: raw does not have an index arg and sets idxen=0 in the instruction, and struct always sets idxen=1 in the instruction even if the index is 0, to allow for the fact that gfx9 does bounds checking differently depending on whether idxen is set;
* there is a combined cachepolicy arg (glc+slc)
* there are now only two offset args: one for the offset that is included in bounds checking and swizzling, to be split between the instruction's voffset and immoffset fields, and one for the offset that is excluded from bounds checking and swizzling, to go into the instruction's soffset field.
The AMDISD::BUFFER_* SD nodes always have an index operand, all three offset operands, combined cachepolicy operand, and an extra idxen operand.
The obsolescent llvm.amdgcn.buffer.* intrinsics continue to work.
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D50306
Change-Id: If897ea7dc34fcbf4d5496e98cc99a934f62fc205
llvm-svn: 340269
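To make the raw/struct split described above concrete, the following is a minimal LLVM IR sketch of the two load forms; the function name, resource descriptor, offsets, and vector types are illustrative assumptions and are not taken from the commit:

    ; Hypothetical example, not part of D50306. Assumes the signatures described
    ; above: raw.* takes (rsrc, voffset, soffset, cachepolicy); struct.* adds a
    ; leading index operand and always sets idxen=1 in the selected instruction.
    declare <4 x float> @llvm.amdgcn.raw.buffer.load.v4f32(<4 x i32>, i32, i32, i32)
    declare <4 x float> @llvm.amdgcn.struct.buffer.load.v4f32(<4 x i32>, i32, i32, i32, i32)

    define amdgpu_ps <4 x float> @buffer_load_example(<4 x i32> inreg %rsrc, i32 %idx, i32 %voffset) {
      ; raw form: no index operand, so idxen=0 in the selected MUBUF instruction
      %raw = call <4 x float> @llvm.amdgcn.raw.buffer.load.v4f32(<4 x i32> %rsrc, i32 %voffset, i32 0, i32 0)
      ; struct form: leading index operand; idxen=1 even when %idx is 0
      %strct = call <4 x float> @llvm.amdgcn.struct.buffer.load.v4f32(<4 x i32> %rsrc, i32 %idx, i32 %voffset, i32 0, i32 0)
      ; the final i32 operand is the combined cachepolicy (glc/slc bits); 0 = default caching
      %sum = fadd <4 x float> %raw, %strct
      ret <4 x float> %sum
    }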
|
Revision tags: llvmorg-7.0.0-rc1 |
|
# 894c8fd0 | 01-Aug-2018 | Ryan Taylor <rtayl@amd.com> |
[AMDGPU] Optimize _L image intrinsic to _LZ when lod is zero
Summary: Add an _L to _LZ image intrinsic table mapping to TableGen. In ISelLowering, check whether the image intrinsic has an lod operand and whether it is equal to zero; if so, remove the lod and change the opcode to the equivalent mapped _LZ variant.
Change-Id: Ie24cd7e788e2195d846c7bd256151178cbb9ec71
Subscribers: arsenm, mehdi_amini, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, steven_wu, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D49483
llvm-svn: 338523
|
# c5a154db | 28-Jun-2018 | Tom Stellard <tstellar@redhat.com> |
AMDGPU: Separate R600 and GCN TableGen files
Summary: We now have two sets of generated TableGen files, one for R600 and one for GCN, so each sub-target now has its own tables of instructions, registers, ISel patterns, etc. This should help reduce compile time since each sub-target now only has to consider information that is specific to itself. This will also help prevent the R600 sub-target from slowing down new features for GCN, like disassembler support, GlobalISel, etc.
Reviewers: arsenm, nhaehnle, jvesely
Reviewed By: arsenm
Subscribers: MatzeB, kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D46365
llvm-svn: 335942
|
# 1e8c2c70 | 21-Jun-2018 | Scott Linder <scott@scottlinder.com> |
[AMDGPU] Update assembler for HSA Code Object v3
Update AMDGPU assembler syntax behind the code-object-v3 feature:
* Replace/rename most AMDGPU assembler directives/symbols and document them.
* Provide more diagnostics (e.g. values out of range, missing values, repeated values).
* Provide path for backwards compatibility, even with underlying descriptor changes.
Differential Revision: https://reviews.llvm.org/D47736
llvm-svn: 335281
|
# 7a9c03f4 | 21-Jun-2018 | Nicolai Haehnle <nhaehnle@gmail.com> |
AMDGPU: Select MIMG instructions manually in SITargetLowering
Summary: Having TableGen patterns for image intrinsics is hitting limitations: for D16 we already have to manually pre-lower the packing of data values, and we will have to do the same for A16 eventually.
Since there is already some custom C++ code anyway, it is arguably easier to just do everything in C++, now that we can use the beefed-up generic tables backend of TableGen to provide all the required metadata and map intrinsics to corresponding opcodes. With this approach, all image intrinsic lowering happens in SITargetLowering::lowerImage. That code is dense due to all the cases that it handles, but it should still be easier to follow than what we had before, by virtue of it all being done in a single location, and by virtue of not relying on the TableGen pattern magic that very few people really understand.
This means that we will have MachineSDNodes with MIMG instructions during DAG combining, but that seems alright: previously we had intrinsic nodes instead, but those are similarly opaque to the generic CodeGen infrastructure, and the final pattern matching just did a 1:1 translation to machine instructions anyway. If anything, the fact that we now merge the address words into a vector before DAG combine should be an advantage.
Change-Id: I417f26bd88f54ce9781c1668acc01f3f99774de6
Reviewers: arsenm, rampitec, rtaylor, tstellar
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D48017
llvm-svn: 335228
|
# 0ab200b6 | 21-Jun-2018 | Nicolai Haehnle <nhaehnle@gmail.com> |
AMDGPU: Refactor MIMG instruction TableGen using generic tables
Summary: This allows us to access rich information about MIMG opcodes from C++ code. Simplifying the mapping between equivalent opcodes of different data size becomes quite natural.
This also flattens the MIMG-related class and multiclass hierarchy a little, and collapses together some of the scaffolding for sample and gather4 opcodes.
Change-Id: I1a2549fdc1e881ff100e5393d2d87e73729a0ccd
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D48016
llvm-svn: 335227
|
# e741d7e0 | 21-Jun-2018 | Nicolai Haehnle <nhaehnle@gmail.com> |
AMDGPU: Use generic tables instead of SearchableTable
Summary:
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D48014
Change-Id: Ibb43f90d955275571aff17d0c3ecfb5e5b299641
llvm-svn: 335226
|
Revision tags: llvmorg-6.0.1, llvmorg-6.0.1-rc3 |
|
# 00f2cb11 | 12-Jun-2018 | Konstantin Zhuravlyov <kzhuravl_dev@outlook.com> |
AMDHSA: Code object v3 updates
- Do not emit following assembler directives:
  - .hsa_code_object_version
  - .hsa_code_object_isa
  - .amd_amdgpu_isa
  - .amd_amdgpu_hsa_metadata
  - .amd_amdgpu_pal_metadata
- Do not emit .note entries
- Cleanup and bring in sync kernel descriptor header file
- Emit kernel descriptor into .rodata with appropriate relocations and alignments
llvm-svn: 334519
|
Revision tags: llvmorg-6.0.1-rc2 |
|
# c72ece6c | 16-May-2018 | Konstantin Zhuravlyov <kzhuravl_dev@outlook.com> |
AMDGPU : Recalculate SGPRs when trap handler is supported
Differential Revision: https://reviews.llvm.org/D29911
llvm-svn: 332523
|