Revision tags: llvmorg-21-init |
|
#
e8811ad3 |
| 22-Jan-2025 |
Shoreshen <372660931@qq.com> |
[AMDGPU] Fix unreachable reg bit width (#122107)
Add register class bit width for SReg_256_XNULL and SReg_128_XNULL
|
Revision tags: llvmorg-19.1.7, llvmorg-19.1.6 |
|
#
7dbd6cd2 |
| 11-Dec-2024 |
Shilei Tian <i@tianshilei.me> |
[AMDGPU][Attributor] Make `AAAMDFlatWorkGroupSize` honor existing attribute (#114357)
If a function has `amdgpu-flat-work-group-size`, honor it in `initialize` by
taking its value directly; otherwise, it uses the default range as a starting
point. We will no longer manipulate the known range, which can cause issues
because the known range is a "throttle" to the assumed range such that the
assumed range can't get widened properly in `updateImpl` if the known range is
not set properly for whatever reason. Another benefit of not touching the known
range is that, if we indicate a pessimistic state, it also invalidates the AA so that
`manifest` will not be called. Since we honor the attribute, we don't want to, and
will not, add any half-baked attribute to a function.
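
The initialization rule in this commit message can be sketched outside LLVM as follows. This is a simplified model, not the Attributor code: `Range`, `parseFlatWorkGroupSize`, `initialRange`, and whatever default bounds the caller passes in are hypothetical. Only the attribute name and its `"min,max"` string form come from LLVM.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical stand-in for an inclusive integer range [Min, Max].
struct Range {
  uint32_t Min;
  uint32_t Max;
};

// Parse an attribute value of the form "min,max". Returns std::nullopt when
// the string is malformed or the bounds are inverted.
std::optional<Range> parseFlatWorkGroupSize(const std::string &Value) {
  const std::size_t Comma = Value.find(',');
  if (Comma == std::string::npos || Comma == 0 || Comma + 1 == Value.size())
    return std::nullopt;
  try {
    const std::string MinStr = Value.substr(0, Comma);
    const std::string MaxStr = Value.substr(Comma + 1);
    std::size_t UsedMin = 0, UsedMax = 0;
    const unsigned long Min = std::stoul(MinStr, &UsedMin);
    const unsigned long Max = std::stoul(MaxStr, &UsedMax);
    // Reject trailing garbage and inverted bounds.
    if (UsedMin != MinStr.size() || UsedMax != MaxStr.size() || Min > Max)
      return std::nullopt;
    return Range{static_cast<uint32_t>(Min), static_cast<uint32_t>(Max)};
  } catch (...) {
    return std::nullopt; // Not a number, or out of range for unsigned long.
  }
}

// Pick the starting range: an existing attribute is taken as-is; otherwise
// the caller-supplied default range is the starting point.
Range initialRange(const std::optional<std::string> &Attr, Range Default) {
  if (Attr)
    if (std::optional<Range> Parsed = parseFlatWorkGroupSize(*Attr))
      return *Parsed;
  return Default;
}
```

In this model a function carrying `"amdgpu-flat-work-group-size"="64,256"` starts at [64, 256], while a function without the attribute starts at the default range.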
|
#
5e007afa |
| 11-Dec-2024 |
Pravin Jagtap <Pravin.Jagtap@amd.com> |
[AMDGPU] Handle hazard in v_scalef32_sr_fp4_* conversions (#118589)
Presently, the compiler selectively adds a nop when opsel != 0, i.e. only when
partially writing to high bytes.
Experiments in SWDEV-499733 and SWDEV-501347 suggest that we need a nop
for the above cases irrespective of opsel values.
Note: We might need to add a few others to the same table.
|
#
24699841 |
| 05-Dec-2024 |
Pravin Jagtap <Pravin.Jagtap@amd.com> |
[AMDGPU][NFC] Delete duplicate decl and impl defines. (#118843)
|
#
68bcba6d |
| 04-Dec-2024 |
Shilei Tian <i@tianshilei.me> |
Revert "[AMDGPU] Use COV6 by default (#118515)"
This reverts commit 410cbe3cf28913cca2fc61b3437306b841d08172 because some buildbots are not ready yet.
|
#
410cbe3c |
| 04-Dec-2024 |
Shilei Tian <i@tianshilei.me> |
[AMDGPU] Use COV6 by default (#118515)
|
Revision tags: llvmorg-19.1.5 |
|
#
39337ff2 |
| 02-Dec-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Handle cvt_scale F32/F16->F4/F8 gfx950 hazard (#117844)
gfx950 SP changes doc says: No 4 clk forwarding on opcodes that convert from F32/F16->F8 or F32/F16->F4. Must insert a NOP or instruction writing some other destination VREG after a conversion to F4/F8 since it writes either low/high half or bytes.
Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
Co-authored-by: Jeffrey Byrnes <Jeffrey.Byrnes@amd.com>
|
#
d9c4e9ff |
| 27-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Verify f8f6f4 formats in assembler (#117826)
Verify the register widths of the corresponding operands match the floating point format expected size.
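
A width check of this kind can be sketched as below. This is an illustration, not the assembler's code: the enum, the function names, and the assumption that each operand carries 32 elements per lane packed into 32-bit registers are all hypothetical. Only the element sizes (8, 6, and 4 bits) follow directly from the format names.

```cpp
#include <cstdint>

// Hypothetical format tags; the names are illustrative, not LLVM's enum.
enum class F8F6F4Format : uint8_t { FP8, BF8, FP6, BF6, FP4 };

// Size of one element in bits, taken from the format name.
constexpr unsigned bitsPerElement(F8F6F4Format Fmt) {
  switch (Fmt) {
  case F8F6F4Format::FP8:
  case F8F6F4Format::BF8:
    return 8;
  case F8F6F4Format::FP6:
  case F8F6F4Format::BF6:
    return 6;
  case F8F6F4Format::FP4:
    return 4;
  }
  return 0; // Unreachable for valid enum values.
}

// Assumption: 32 elements per lane, packed into 32-bit registers.
constexpr unsigned expectedNumRegs(F8F6F4Format Fmt) {
  constexpr unsigned ElementsPerLane = 32;
  constexpr unsigned RegBits = 32;
  return bitsPerElement(Fmt) * ElementsPerLane / RegBits;
}

// True when the operand's register count matches what the format expects.
constexpr bool operandWidthMatches(F8F6F4Format Fmt, unsigned NumRegs) {
  return NumRegs == expectedNumRegs(Fmt);
}
```

Under these assumptions an FP8 or BF8 operand needs 8 registers, FP6 or BF6 needs 6, and FP4 needs 4, so a mismatched pairing such as FP4 with an 8-register operand would be rejected.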
|
#
716364eb |
| 26-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Add support for v_dot2c_f32_bf16 instruction for gfx950 (#117598)
The encoding of the v_dot2c_f32_bf16 opcode is the same as that of v_mac_f32 in gfx90a; both are from the gfx9 series. This required a new decoderNameSpace, GFX950_DOT.
Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>
|
#
01c9a14c |
| 21-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Define v_mfma_f32_{16x16x128|32x32x64}_f8f6f4 instructions (#116723)
These use a new VOP3PX encoding for the v_mfma_scale_* instructions, which bundles the pre-scale v_mfma_ld_scale_b32. None of the modifiers are supported yet (op_sel, neg or clamp).
I'm not sure the intrinsic should really expose op_sel (or any of the others). If I'm reading the documentation correctly, we should be able to just have the raw scale operands and auto-match op_sel to byte extract patterns.
The op_sel syntax also seems extra horrible in this usage, especially with the usual assumed op_sel_hi=-1 behavior.
|
Revision tags: llvmorg-19.1.4 |
|
#
5a556d55 |
| 18-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Increase the LDS size to support to 160 KB for gfx950 (#116309)
|
#
8ed3b055 |
| 14-Nov-2024 |
Joe Nash <joseph.nash@amd.com> |
[AMDGPU][True16][MC] Implement V_CVT_PK_F32_FP8/BF8 (#116106)
Existing Fake16 versions of these instructions do not support op_sel on
the _e32 encoding, which leaves a hole in the disassembler support.
Implement the true16 version of the instructions in the MC layer.
|
#
be187369 |
| 14-Nov-2024 |
Kazu Hirata <kazu@google.com> |
[AMDGPU] Remove unused includes (NFC) (#116154)
Identified with misc-include-cleaner.
|
#
e8644e3b |
| 05-Nov-2024 |
Brox Chen <guochen2@amd.com> |
[AMDGPU][True16][MC] VOP2 update instructions with fake16 format (#114436)
Some old "t16" VOP2 instructions are actually in fake16 format. Correct
and update test file
|
#
0b40f979 |
| 05-Nov-2024 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
AMDGPU: Treat uint32_max as the default value for amdgpu-max-num-workgroups (#113751)
0 does not make sense as a value for this to be, much less the default. Also stop emitting each individual field if it is the default, rather than if any element was the default. Also fix the name of the test since it didn't exactly match the real attribute name.
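
The per-field rule described here can be sketched as follows. This is a standalone model, not the backend's metadata emitter: `DefaultMaxNumWorkgroups`, `fieldsToEmit`, and the short field names `x`, `y`, `z` are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>
#include <utility>
#include <vector>

// uint32_max acts as the "not specified" default for each dimension.
constexpr uint32_t DefaultMaxNumWorkgroups =
    std::numeric_limits<uint32_t>::max();

// Return the (name, value) pairs that would be emitted. Each dimension is
// tested on its own, so one default element does not suppress the others.
std::vector<std::pair<std::string, uint32_t>>
fieldsToEmit(const std::array<uint32_t, 3> &MaxNumWorkgroups) {
  static const char *const Names[3] = {"x", "y", "z"}; // Illustrative names.
  std::vector<std::pair<std::string, uint32_t>> Out;
  for (std::size_t I = 0; I < MaxNumWorkgroups.size(); ++I)
    if (MaxNumWorkgroups[I] != DefaultMaxNumWorkgroups)
      Out.emplace_back(Names[I], MaxNumWorkgroups[I]);
  return Out;
}
```

With the input `{16, uint32_max, 4}` this model emits the `x` and `z` fields and skips `y`. A rule keyed on "any element is the default" would have dropped all three.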
|
Revision tags: llvmorg-19.1.3, llvmorg-19.1.2 |
|
#
8d13e7b8 |
| 03-Oct-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Qualify auto. NFC. (#110878)
Generated automatically with:
$ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find lib/Target/AMDGPU/ -type f)
|
Revision tags: llvmorg-19.1.1 |
|
#
6f956e31 |
| 30-Sep-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Rename LocalMemorySize features to AddressableLocalMemorySize (#110242)
Change the names of the TableGen features to match the names used by
AMDGPUSubtarget. "Addressable" refers to the amount that can be accessed
by a single workgroup. Add some explanatory comments. NFC.
|
#
fd50cdfb |
| 28-Sep-2024 |
Craig Topper <craig.topper@sifive.com> |
[AMDGPU] Use MCRegister. NFC
|
#
396f6775 |
| 24-Sep-2024 |
Scott Egerton <9487234+ScottEgerton@users.noreply.github.com> |
[AMDGPU] Remove unused VGPRSingleUseHintInsts feature (#109769)
|
#
d31e3141 |
| 20-Sep-2024 |
Youngsuk Kim <youngsuk.kim@hpe.com> |
[llvm] Don't call raw_string_ostream::flush() (NFC)
Don't call raw_string_ostream::flush(), which is essentially a no-op. As specified in the docs, raw_string_ostream is always unbuffered. (See 65b13610a5226b84889b923bae884ba395ad084d for further reference.)
|
Revision tags: llvmorg-19.1.0, llvmorg-19.1.0-rc4 |
|
#
7bcf4d63 |
| 22-Aug-2024 |
Jeffrey Byrnes <jeffrey.byrnes@amd.com> |
[AMDGPU] Correctly insert s_nops for dst forwarding hazard (#100276)
MI300 ISA section 4.5 states there is a hazard between "VALU op which
uses OPSEL or SDWA with changes the result’s bit position" and "VALU op
consumes result of that op"
This includes the case where the second op is SDWA with same dest and
dst_sel != DWORD && dst_unused == UNUSED_PRESERVE. In this case, there
is an implicit read of the first op dst and the compiler needs to
resolve this hazard. Confirmed with HW team.
We model dst_unused == UNUSED_PRESERVE as tied-def of implicit operand,
so this PR checks for that.
MI300_SP_MAS section 1.3.9.2 specifies that CVT_SR_FP8_F32 and
CVT_SR_BF8_F32 with opsel[3:2] !=0 have dest forwarding issue.
Currently, we only check CVT_SR_FP8_F32 with opsel[3] != 0;
this PR adds support for opsel[2] != 0 as well.
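
The difference between the two opsel conditions can be sketched as a pair of bit tests. This is only a model of the predicate, not the hazard recognizer: the function names are hypothetical, and opsel is assumed to be a 4-bit mask in which bit i holds opsel[i].

```cpp
#include <cstdint>

// Earlier condition as described in the message: only opsel[3] is tested.
constexpr bool hazardOldCheck(uint8_t OpSel) { return (OpSel & 0x8) != 0; }

// Condition matching "opsel[3:2] != 0": either bit 3 or bit 2 is set.
constexpr bool hazardNewCheck(uint8_t OpSel) { return (OpSel & 0xC) != 0; }
```

An instruction with only opsel[2] set (mask `0b0100`) is missed by the first predicate and caught by the second. Masks that touch only bits 1 and 0 are rejected by both.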
|
Revision tags: llvmorg-19.1.0-rc3 |
|
#
2f89c1c7 |
| 17-Aug-2024 |
Mariusz Sikora <mariusz.sikora@amd.com> |
[AMDGPU][NFC] Remove duplicate code by using getAddressableLocalMemorySize (#104604)
|
#
f0fe6c66 |
| 14-Aug-2024 |
Ivan Kosarev <ivan.kosarev@amd.com> |
[AMDGPU][NFC] Rename isHi() to isHi16Reg() for clarity. (#103888)
And declare it to take an MCRegister.
Also rename related entities and remove a comment on the function that,
depending on its purpose, is either irrelevant or misleading.
|
Revision tags: llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init |
|
#
63fae3ed |
| 17-Jul-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] clang-tidy: no else after return etc. NFC. (#99298)
|
#
74b87b02 |
| 16-Jul-2024 |
Jay Foad <jay.foad@amd.com> |
[AMDGPU] Fix and add namespace closing comments. NFC.
|