|
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init |
|
| #
5879162f |
| 15-Dec-2023 |
Mirko Brkušanin <Mirko.Brkusanin@amd.com> |
[AMDGPU] CodeGen for GFX12 VBUFFER instructions (#75492)
|
|
Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2 |
|
| #
c3cfbbc4 |
| 07-Aug-2023 |
pvanhout <pierre.vanhoutryve@amd.com> |
[GlobalISel] Add dead flags to implicit defs in ISel
Checks for implicit defs that are unused within a pattern and mark them as dead.
This is done directly at the TableGen level forr efficiency. Th
[GlobalISel] Add dead flags to implicit defs in ISel
Checks for implicit defs that are unused within a pattern and mark them as dead.
This is done directly at the TableGen level forr efficiency. The instructions are directly created with the "dead" operand and no further analysis is needed later.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D157273
show more ...
|
|
Revision tags: llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4 |
|
| #
f0415f2a |
| 03-May-2023 |
Krzysztof Drewniak <Krzysztof.Drewniak@amd.com> |
Re-land "[AMDGPU] Define data layout entries for buffers""
Re-land D145441 with data layout upgrade code fixed to not break OpenMP.
This reverts commit 3f2fbe92d0f40bcb46db7636db9ec3f7e7899b27.
Di
Re-land "[AMDGPU] Define data layout entries for buffers""
Re-land D145441 with data layout upgrade code fixed to not break OpenMP.
This reverts commit 3f2fbe92d0f40bcb46db7636db9ec3f7e7899b27.
Differential Revision: https://reviews.llvm.org/D149776
show more ...
|
| #
3f2fbe92 |
| 03-May-2023 |
Krzysztof Drewniak <Krzysztof.Drewniak@amd.com> |
Revert "[AMDGPU] Define data layout entries for buffers"
This reverts commit f9c1ede2543b37fabe9f2d8f8fed5073c475d850.
Differential Revision: https://reviews.llvm.org/D149758
|
|
Revision tags: llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2 |
|
| #
f9c1ede2 |
| 07-Feb-2023 |
Krzysztof Drewniak <Krzysztof.Drewniak@amd.com> |
[AMDGPU] Define data layout entries for buffers
Per discussion at https://discourse.llvm.org/t/representing-buffer-descriptors-in-the-amdgpu-target-call-for-suggestions/68798, we define two new addr
[AMDGPU] Define data layout entries for buffers
Per discussion at https://discourse.llvm.org/t/representing-buffer-descriptors-in-the-amdgpu-target-call-for-suggestions/68798, we define two new address spaces for AMDGCN targets.
The first is address space 7, a non-integral address space (which was already in the data layout) that has 160-bit pointers (which are 256-bit aligned) and uses a 32-bit offset. These pointers combine a 128-bit buffer descriptor and a 32-bit offset, and will be usable with normal LLVM operations (load, store, GEP). However, they will be rewritten out of existence before code generation.
The second of these is address space 8, the address space for "buffer resources". These will be used to represent the resource arguments to buffer instructions, and new buffer intrinsics will be defined that take them instead of <4 x i32> as resource arguments. ptr addrspace(8). These pointers are 128-bits long (with the same alignment). They must not be used as the arguments to getelementptr or otherwise used in address computations, since they can have arbitrarily complex inherent addressing semantics that can't be represented in LLVM. Even though, like their address space 7 cousins, these pointers have deterministic ptrtoint/inttoptr semantics, they are defined to be non-integral in order to prevent optimizations that rely on pointers being a [0, [addr_max]] value from applying to them.
Future work includes: - Defining new buffer intrinsics that take ptr addrspace(8) resources. - A late rewrite to turn address space 7 operations into buffer intrinsics and offset computations.
This commit also updates the "fallback address space" for buffer intrinsics to the buffer resource, and updates the alias analysis table.
Depends on D143437
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D145441
show more ...
|
|
Revision tags: llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7 |
|
| #
3f4055de |
| 09-Jan-2023 |
Chen Zheng <czhengsz@cn.ibm.com> |
[GlobalISelEmitter] handle operand without MVT/class
There are some patterns in td files without MVT/class set for some operands in target pattern that are from the source pattern. This prevents Glo
[GlobalISelEmitter] handle operand without MVT/class
There are some patterns in td files without MVT/class set for some operands in target pattern that are from the source pattern. This prevents GlobalISelEmitter from adding them as a valid rule, because the target child operand is an unsupported kind operand. For now, for a leaf child, only IntInit and DefInit are handled in GlobalISelEmitter.
This issue can be workaround by adding MVT/class to the patterns in the td files, like the workarounds for patterns anyext and setcc in PPCInstrInfo.td in D140878.
To avoid adding the same workarounds for other patterns in td files, this patch tries to handle the UnsetInit case in GlobalISelEmitter.
Adding the new handling allows us to remove the workarounds in the td files and also generates many selection rules for PPC target.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D141247
show more ...
|
| #
d9a6fc82 |
| 24-Jan-2023 |
Pierre van Houtryve <pierre.vanhoutryve@amd.com> |
[AMDGPU] Run unmerge combines post regbankselect
RegBankSelect can insert G_UNMERGE_VALUES in a lot of places which left us with a lot of unmerge/merge pairs that could be simplified. These often go
[AMDGPU] Run unmerge combines post regbankselect
RegBankSelect can insert G_UNMERGE_VALUES in a lot of places which left us with a lot of unmerge/merge pairs that could be simplified. These often got in the way of pattern matching and made codegen worse.
This patch: - Makes the necessary changes to the merge/unmerge combines so they can run post RegBankSelect - Adds relevant unmerge combines to the list of RegBankSelect combines for AMDGPU - Updates some tablegen patterns that were missing explicit cross-regbank copies (V_BFI patterns were causing constant bus violations with this change).
This seems to be mostly beneficial for code quality.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D142192
show more ...
|
|
Revision tags: llvmorg-15.0.6 |
|
| #
43b86bf9 |
| 25-Nov-2022 |
Nicolai Hähnle <nicolai.haehnle@amd.com> |
AMDGPU: Remove BufferPseudoSourceValue
The use of a PSV for buffer intrinsics is misleading because it may be misinterpreted as all buffer intrinsics accessing the same address in memory, which is c
AMDGPU: Remove BufferPseudoSourceValue
The use of a PSV for buffer intrinsics is misleading because it may be misinterpreted as all buffer intrinsics accessing the same address in memory, which is clearly not true.
Instead, build MachineMemOperands without a pointer value but with an address space, so that address space-based alias analysis can still work.
There is a lot of test churn because previously address space 4 (constant address space) was used as an address space for buffer intrinsics. This doesn't make much sense and seems to have been an accident -- see the change in AMDGPUTargetMachine::getAddressSpaceForPseudoSourceKind.
Differential Revision: https://reviews.llvm.org/D138711
show more ...
|
|
Revision tags: llvmorg-15.0.5 |
|
| #
1b560e6a |
| 14-Nov-2022 |
Ivan Kosarev <ivan.kosarev@amd.com> |
[AMDGPU][MC] Support TFE modifiers in MUBUF loads and stores.
Reviewed By: dp, arsenm
Differential Revision: https://reviews.llvm.org/D137783
|
|
Revision tags: llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4 |
|
| #
affa1b1c |
| 10-May-2022 |
Nicolai Hähnle <nicolai.haehnle@amd.com> |
AMDGPU/GISel: Factor out AMDGPURegisterBankInfo::buildReadFirstLane
A later change will add a 3rd user, so factoring out the common code seems useful.
Reorganizing the executeInWaterfallLoop causes
AMDGPU/GISel: Factor out AMDGPURegisterBankInfo::buildReadFirstLane
A later change will add a 3rd user, so factoring out the common code seems useful.
Reorganizing the executeInWaterfallLoop causes some more COPYs to be generated, but those all fold away during instruction selection. Generating the comparisons uses generic instructions over machine instructions now which admittedly shouldn't make a difference (though it should make it easier to move the waterfall loop generation to another place).
(Resubmit with missing test added.)
Differential Revision: https://reviews.llvm.org/D125324
show more ...
|
| #
afc90101 |
| 25-May-2022 |
Nicolai Hähnle <nicolai.haehnle@amd.com> |
Revert "AMDGPU/GISel: Factor out AMDGPURegisterBankInfo::buildReadFirstLane"
This reverts commit 2a28467e5389c4d741d1825fadd39ae84ecaa5dc.
|
| #
2a28467e |
| 10-May-2022 |
Nicolai Hähnle <nicolai.haehnle@amd.com> |
AMDGPU/GISel: Factor out AMDGPURegisterBankInfo::buildReadFirstLane
A later change will add a 3rd user, so factoring out the common code seems useful.
Reorganizing the executeInWaterfallLoop causes
AMDGPU/GISel: Factor out AMDGPURegisterBankInfo::buildReadFirstLane
A later change will add a 3rd user, so factoring out the common code seems useful.
Reorganizing the executeInWaterfallLoop causes some more COPYs to be generated, but those all fold away during instruction selection. Generating the comparisons uses generic instructions over machine instructions now which admittedly shouldn't make a difference (though it should make it easier to move the waterfall loop generation to another place).
Differential Revision: https://reviews.llvm.org/D125324
show more ...
|
|
Revision tags: llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1 |
|
| #
1f52d02c |
| 28-Mar-2022 |
Carl Ritson <carl.ritson@amd.com> |
[AMDGPU] Split waterfall loop exec manipulation
Split waterfall loops into multiple blocks so that exec mask manipulation (s_and_saveexec) does not occur in the middle of a block.
VGPR live range o
[AMDGPU] Split waterfall loop exec manipulation
Split waterfall loops into multiple blocks so that exec mask manipulation (s_and_saveexec) does not occur in the middle of a block.
VGPR live range optimizer is updated to handle waterfall loops spanning multiple blocks.
Reviewed By: ruiling
Differential Revision: https://reviews.llvm.org/D122200
show more ...
|
|
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2 |
|
| #
bd2c01e9 |
| 11-Jan-2022 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU/GlobalISel: Do not use terminator copy before waterfall loops
Stop using the _term variants of the mov to save the initial exec value before the waterfall loop. This cannot be glued to the bo
AMDGPU/GlobalISel: Do not use terminator copy before waterfall loops
Stop using the _term variants of the mov to save the initial exec value before the waterfall loop. This cannot be glued to the bottom of the block because we may need to spill the result register. Just use a regular mov, like the loops produced on the DAG path. Fixes some verification errors with regalloc fast.
show more ...
|
| #
0cf860ec |
| 11-Jan-2022 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU/GlobalISel: Regenerate baseline checks to include -NEXT
|
|
Revision tags: llvmorg-13.0.1-rc1 |
|
| #
fd1cfc90 |
| 28-Oct-2021 |
Sebastian Neubauer <Sebastian.Neubauer@amd.com> |
[AMDGPU][GlobalISel] Fix waterfall loops
- Move the `s_and exec` to its correct position before the content of the waterfall loop - Use the SI_WATERFALL pseudo instruction, like for sdag, to benef
[AMDGPU][GlobalISel] Fix waterfall loops
- Move the `s_and exec` to its correct position before the content of the waterfall loop - Use the SI_WATERFALL pseudo instruction, like for sdag, to benefit from optimizations - Add support for indirect function calls
To support indirect calls, add a G_SI_CALL instruction without register class restrictions and insert a waterfall loop when applying register banks.
Differential Revision: https://reviews.llvm.org/D109052
show more ...
|
|
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init |
|
| #
59f68652 |
| 21-Jul-2021 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU][GISel] Fix MMO for raw/struct buffer access with non-constant offset
Codegen for the raw/struct buffer access intrinsics would update the offset in the MMO to reflect the combined offset, i
[AMDGPU][GISel] Fix MMO for raw/struct buffer access with non-constant offset
Codegen for the raw/struct buffer access intrinsics would update the offset in the MMO to reflect the combined offset, if it was known to be constant. If the combined offset was not known to be constant, or if there was an index, it would set the offset in the MMO to 0. This is unsafe because it makes it look like the access does not alias with another access with a fixed non-zero offset.
Fix these cases by setting the pointer in the MMO to null, to reflect the fact that we do not have any known IR value pointer + constant offset for the access.
D106284 did this for SelectionDAG. This is the corresponding fix for GlobalISel.
Differential Revision: https://reviews.llvm.org/D106451
show more ...
|
|
Revision tags: llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1 |
|
| #
fae05692 |
| 20-May-2021 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
CodeGen: Print/parse LLTs in MachineMemOperands
This will currently accept the old number of bytes syntax, and convert it to a scalar. This should be removed in the near future (I think I converted
CodeGen: Print/parse LLTs in MachineMemOperands
This will currently accept the old number of bytes syntax, and convert it to a scalar. This should be removed in the near future (I think I converted all of the tests already, but likely missed a few).
Not sure what the exact syntax and policy should be. We can continue printing the number of bytes for non-generic instructions to avoid test churn and only allow non-scalar types for generic instructions.
This will currently print the LLT in parentheses, but accept parsing the existing integers and implicitly converting to scalar. The parentheses are a bit ugly, but the parser logic seems unable to deal without either parentheses or some keyword to indicate the start of a type.
show more ...
|
|
Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2 |
|
| #
3bffb1cd |
| 09-Feb-2021 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] Use single cache policy operand
Replace individual operands GLC, SLC, and DLC with a single cache_policy bitmask operand. This will reduce the number of operands in MIR and I hope the amoun
[AMDGPU] Use single cache policy operand
Replace individual operands GLC, SLC, and DLC with a single cache_policy bitmask operand. This will reduce the number of operands in MIR and I hope the amount of code. These operands are mostly 0 anyway.
Additional advantage that parser will accept these flags in any order unlike now.
Differential Revision: https://reviews.llvm.org/D96469
show more ...
|
| #
62d946e1 |
| 07-Feb-2021 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
GlobalISel: Merge some AMDGPU ABI lowering code to generic code
AMDGPU currently has a lot of pre-processing code to pre-split argument types into 32-bit pieces before passing it to the generic code
GlobalISel: Merge some AMDGPU ABI lowering code to generic code
AMDGPU currently has a lot of pre-processing code to pre-split argument types into 32-bit pieces before passing it to the generic code in handleAssignments. This is a bit sloppy and also requires some overly fancy iterator work when building the calls. It's better if all argument marshalling code is handled directly in handleAssignments. This handles more situations like decomposing large element vectors into sub-element sized pieces.
This should mostly be NFC, but does change the generated code by shifting where the initial argument packing instructions are placed. I think this is nicer looking, since it now emits the packing code directly after the relevant copies, rather than after the copies for the remaining arguments.
This doubles down on gfx6/gfx7 using the gfx8+ ABI for 16-bit types. This is ultimately the better option, but incompatible with the DAG. Fixing this requires more work, especially for f16.
show more ...
|
| #
a8d9d507 |
| 17-Feb-2021 |
Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> |
[AMDGPU] gfx90a support
Differential Revision: https://reviews.llvm.org/D96906
|
|
Revision tags: llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2 |
|
| #
8214982b |
| 21-Jan-2021 |
Sebastian Neubauer <sebastian.neubauer@amd.com> |
[AMDGPU] Implement mir parseCustomPseudoSourceValue
Allow parsing generated mir with custom pseudo source value tokens. Also rename pseudo source values to have more meaningful names.
Relands ba7dc
[AMDGPU] Implement mir parseCustomPseudoSourceValue
Allow parsing generated mir with custom pseudo source value tokens. Also rename pseudo source values to have more meaningful names.
Relands ba7dcd8542ab, which had memory leaks.
Differential Revision: https://reviews.llvm.org/D95215
show more ...
|
| #
4dbdff66 |
| 21-Jan-2021 |
Sebastian Neubauer <sebastian.neubauer@amd.com> |
Revert "[AMDGPU] Implement mir parseCustomPseudoSourceValue"
This reverts commit ba7dcd8542abfc784255efcb0767701dec42fe83.
(caused memory leaks)
|
| #
ba7dcd85 |
| 15-Jan-2021 |
Sebastian Neubauer <sebastian.neubauer@amd.com> |
[AMDGPU] Implement mir parseCustomPseudoSourceValue
Allow parsing generated mir with custom pseudo source value tokens. Also rename pseudo source values to have more meaningful names.
Differential
[AMDGPU] Implement mir parseCustomPseudoSourceValue
Allow parsing generated mir with custom pseudo source value tokens. Also rename pseudo source values to have more meaningful names.
Differential Revision: https://reviews.llvm.org/D94768
show more ...
|
|
Revision tags: llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4 |
|
| #
d07f9e73 |
| 10-Mar-2020 |
Carl Ritson <carl.ritson@amd.com> |
[AMDGPU] Allow struct.buffer.*.format intrinsics to accept i32
Summary: In the same manner as struct.buffer.load / struct.buffer.store, allow struct.buffer.load.format / struct.buffer.store.format t
[AMDGPU] Allow struct.buffer.*.format intrinsics to accept i32
Summary: In the same manner as struct.buffer.load / struct.buffer.store, allow struct.buffer.load.format / struct.buffer.store.format to return / accept any type. This simplifies front-end code gen.
Reviewers: tpr, arsenm, nhaehnle
Reviewed By: arsenm
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75789
show more ...
|