History log of /llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp (Results 1 – 25 of 364)
Revision Date Author Comments
Revision tags: llvmorg-21-init
# e8811ad3 22-Jan-2025 Shoreshen <372660931@qq.com>

[AMDGPU] Fix unreachable reg bit width (#122107)

Add register class bit width for SReg_256_XNULL and SReg_128_XNULL


Revision tags: llvmorg-19.1.7, llvmorg-19.1.6
# 7dbd6cd2 11-Dec-2024 Shilei Tian <i@tianshilei.me>

[AMDGPU][Attributor] Make `AAAMDFlatWorkGroupSize` honor existing attribute (#114357)

If a function has `amdgpu-flat-work-group-size`, honor it in `initialize` by
taking its value directly; otherwise, it uses the default range as a starting
point. We will no longer manipulate the known range, which can cause issues
because the known range is a "throttle" to the assumed range such that the
assumed range can't get widened properly in `updateImpl` if the known range is
not set properly for whatever reasons. Another benefit of not touching the known
range is, if we indicate pessimistic state, it also invalidates the AA such that
`manifest` will not be called. Since we honor the attribute, we don't want and
will not add any half-baked attribute added to a function.


# 5e007afa 11-Dec-2024 Pravin Jagtap <Pravin.Jagtap@amd.com>

[AMDGPU] Handle hazard in v_scalef32_sr_fp4_* conversions (#118589)

Presently, compiler selectively adds nop when opsel != 0 i.e. only when
partially writing to high bytes.
Experiments in SWDEV-499733 and SWDEV-501347 suggest that we need nop
for above cases irrespective of opsel values.

Note: We might need to add few others into the same table.


# 24699841 05-Dec-2024 Pravin Jagtap <Pravin.Jagtap@amd.com>

[AMDGPU][NFC] Delete duplicate decl and impl defines. (#118843)


# 68bcba6d 04-Dec-2024 Shilei Tian <i@tianshilei.me>

Revert "[AMDGPU] Use COV6 by default (#118515)"

This reverts commit 410cbe3cf28913cca2fc61b3437306b841d08172 because some
buildbots are not ready yet.


# 410cbe3c 04-Dec-2024 Shilei Tian <i@tianshilei.me>

[AMDGPU] Use COV6 by default (#118515)


Revision tags: llvmorg-19.1.5
# 39337ff2 02-Dec-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Handle cvt_scale F32/F16->F4/F8 gfx950 hazard (#117844)

gfx950 SP changes doc says:
No 4 clk forwarding on opcodes that convert from
F32/F16->F8 or F32/F16->F4. Must insert a NOP or
instruction writing some other destination VREG
after a conversion to F4/F8 since it writes either
low/high half or bytes.

Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
Co-authored-by: Jeffrey Byrnes <Jeffrey.Byrnes@amd.com>


# d9c4e9ff 27-Nov-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Verify f8f6f4 formats in assembler (#117826)

Verify the register widths of the corresponding operands match
the floating point format expected size.


# 716364eb 26-Nov-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Add support for v_dot2c_f32_bf16 instruction for gfx950 (#117598)

The encoding of v_dot2c_f32_bf16 opcode is same as v_mac_f32 in gfx90a,
both from gfx9 series. This required a new decoderNameSpace GFX950_DOT.

Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>


# 01c9a14c 21-Nov-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Define v_mfma_f32_{16x16x128|32x32x64}_f8f6f4 instructions (#116723)

These use a new VOP3PX encoding for the v_mfma_scale_* instructions,
which bundles the pre-scale v_mfma_ld_scale_b32. None of the modifiers
are supported yet (op_sel, neg or clamp).

I'm not sure the intrinsic should really expose op_sel (or any of the
others). If I'm reading the documentation correctly, we should be able
to just have the raw scale operands and auto-match op_sel to byte
extract patterns.

The op_sel syntax also seems extra horrible in this usage, especially with the
usual assumed op_sel_hi=-1 behavior.


Revision tags: llvmorg-19.1.4
# 5a556d55 18-Nov-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Increase the LDS size to support to 160 KB for gfx950 (#116309)


# 8ed3b055 14-Nov-2024 Joe Nash <joseph.nash@amd.com>

[AMDGPU][True16][MC] Implement V_CVT_PK_F32_FP8/BF8 (#116106)

Existing Fake16 versions of these instructions do not support op_sel on
the _e32 encoding, which leaves a hole in the disassembler support.
Implement the true16 version of the instructions in the MC layer.


# be187369 14-Nov-2024 Kazu Hirata <kazu@google.com>

[AMDGPU] Remove unused includes (NFC) (#116154)

Identified with misc-include-cleaner.


# e8644e3b 05-Nov-2024 Brox Chen <guochen2@amd.com>

[AMDGPU][True16][MC] VOP2 update instructions with fake16 format (#114436)

Some old "t16" VOP2 instructions are actually in fake16 format. Correct
and update test file


# 0b40f979 05-Nov-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Treat uint32_max as the default value for amdgpu-max-num-workgroups (#113751)

0 does not make sense as a value for this to be, much less the default.
Also stop emitting each individual field if it is the default, rather than
if any element was the default. Also fix the name of the test since it didn't
exactly match the real attribute name.


Revision tags: llvmorg-19.1.3, llvmorg-19.1.2
# 8d13e7b8 03-Oct-2024 Jay Foad <jay.foad@amd.com>

[AMDGPU] Qualify auto. NFC. (#110878)

Generated automatically with:
$ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find
lib/Target/AMDGPU/ -type f)


Revision tags: llvmorg-19.1.1
# 6f956e31 30-Sep-2024 Jay Foad <jay.foad@amd.com>

[AMDGPU] Rename LocalMemorySize features to AddressableLocalMemorySize (#110242)

Change the names of the TableGen features to match the names used by
AMDGPUSubtarget. "Addressable" refers to the amount that can be accessed
by a single workgroup. Add some explanatory comments. NFC.


# fd50cdfb 28-Sep-2024 Craig Topper <craig.topper@sifive.com>

[AMDGPU] Use MCRegister. NFC


# 396f6775 24-Sep-2024 Scott Egerton <9487234+ScottEgerton@users.noreply.github.com>

[AMDGPU] Remove unused VGPRSingleUseHintInsts feature (#109769)


# d31e3141 20-Sep-2024 Youngsuk Kim <youngsuk.kim@hpe.com>

[llvm] Don't call raw_string_ostream::flush() (NFC)

Don't call raw_string_ostream::flush(), which is essentially a no-op.
As specified in the docs, raw_string_ostream is always unbuffered.
( 65b13610a5226b84889b923bae884ba395ad084d for further reference )


Revision tags: llvmorg-19.1.0, llvmorg-19.1.0-rc4
# 7bcf4d63 22-Aug-2024 Jeffrey Byrnes <jeffrey.byrnes@amd.com>

[AMDGPU] Correctly insert s_nops for dst forwarding hazard (#100276)

MI300 ISA section 4.5 states there is a hazard between "VALU op which
uses OPSEL or SDWA with changes the result’s bit position" and "VALU op
consumes result of that op"

This includes the case where the second op is SDWA with same dest and
dst_sel != DWORD && dst_unused == UNUSED_PRESERVE. In this case, there
is an implicit read of the first op dst and the compiler needs to
resolve this hazard. Confirmed with HW team.

We model dst_unused == UNUSED_PRESERVE as tied-def of implicit operand,
so this PR checks for that.

MI300_SP_MAS section 1.3.9.2 specifies that CVT_SR_FP8_F32 and
CVT_SR_BF8_F32 with opsel[3:2] !=0 have dest forwarding issue.
Currently, we only add check for CVT_SR_FP8_F32 with opsel[3] != 0 --
this PR adds support for opsel[2] != 0 as well


Revision tags: llvmorg-19.1.0-rc3
# 2f89c1c7 17-Aug-2024 Mariusz Sikora <mariusz.sikora@amd.com>

[AMDGPU][NFC] Remove duplicate code by using getAddressableLocalMemorySize (#104604)


# f0fe6c66 14-Aug-2024 Ivan Kosarev <ivan.kosarev@amd.com>

[AMDGPU][NFC] Rename isHi() to isHi16Reg() for clarity. (#103888)

And declare it to take an MCRegister.

Also rename related entities and remove a comment for the function that
depending on its purpose is either irrelevant or misleading.


Revision tags: llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init
# 63fae3ed 17-Jul-2024 Jay Foad <jay.foad@amd.com>

[AMDGPU] clang-tidy: no else after return etc. NFC. (#99298)


# 74b87b02 16-Jul-2024 Jay Foad <jay.foad@amd.com>

[AMDGPU] Fix and add namespace closing comments. NFC.

