AMDGPUBaseInfo.cpp - OpenGrok history log for /llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp

Revision	Date	Author	Comments
Revision tags: llvmorg-21-init
# e8811ad3	22-Jan-2025	Shoreshen <372660931@qq.com>	[AMDGPU] Fix unreachable reg bit width (#122107) Add register class bit width for SReg_256_XNULL and SReg_128_XNULL
Revision tags: llvmorg-19.1.7, llvmorg-19.1.6
# 7dbd6cd2	11-Dec-2024	Shilei Tian <i@tianshilei.me>	[AMDGPU][Attributor] Make `AAAMDFlatWorkGroupSize` honor existing attribute (#114357) If a function has `amdgpu-flat-work-group-size`, honor it in `initialize` by taking its value directly; otherwise, it uses the default range as a starting point. We will no longer manipulate the known range, which can cause issues because the known range is a "throttle" to the assumed range such that the assumed range can't get widened properly in `updateImpl` if the known range is not set properly for whatever reasons. Another benefit of not touching the known range is, if we indicate pessimistic state, it also invalidates the AA such that `manifest` will not be called. Since we honor the attribute, we don't want and will not add any half-baked attribute added to a function.
# 5e007afa	11-Dec-2024	Pravin Jagtap <Pravin.Jagtap@amd.com>	[AMDGPU] Handle hazard in v_scalef32_sr_fp4_* conversions (#118589) Presently, compiler selectivelly adds nop when opsel != 0 i.e. only when partially writing to high bytes. Experiments in SWDEV-499733 and SWDEV-501347 suggest that we need nop for above cases irrespective of opsel values. Note: We might need to add few others into the same table.
# 24699841	05-Dec-2024	Pravin Jagtap <Pravin.Jagtap@amd.com>	[AMDGPU][NFC] Delete duplicate decl and impl defines. (#118843)
# 68bcba6d	04-Dec-2024	Shilei Tian <i@tianshilei.me>	Revert "[AMDGPU] Use COV6 by default (#118515)" This reverts commit 410cbe3cf28913cca2fc61b3437306b841d08172 because some buildbots are not ready yet.
# 410cbe3c	04-Dec-2024	Shilei Tian <i@tianshilei.me>	[AMDGPU] Use COV6 by default (#118515)
Revision tags: llvmorg-19.1.5
# 39337ff2	02-Dec-2024	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Handle cvt_scale F32/F16->F4/F8 gfx950 hazard (#117844) gfx950 SP changes doc says: No 4 clk forwarding on opcodes that convert from F32/F16->F8 or F32/F16->F4. Must insert a NOP or instruction writing some other destination VREG after a conversion to F4/F8 since it writes either low/high half or bytes. Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com> Co-authored-by: Jeffrey Byrnes <Jeffrey.Byrnes@amd.com>
# d9c4e9ff	27-Nov-2024	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Verify f8f6f4 formats in assembler (#117826) Verify the register widths of the corresponding operands match the floating point format expected size.
# 716364eb	26-Nov-2024	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Add support for v_dot2c_f32_bf16 instruction for gfx950 (#117598) The encoding of v_dot2c_f32_bf16 opcode is same as v_mac_f32 in gfx90a, both from gfx9 series. This required a new decoderNameSpace GFX950_DOT. Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>
# 01c9a14c	21-Nov-2024	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Define v_mfma_f32_{16x16x128\|32x32x64}_f8f6f4 instructions (#116723) These use a new VOP3PX encoding for the v_mfma_scale_* instructions, which bundles the pre-scale v_mfma_ld_scale_b32. None of the modifiers are supported yet (op_sel, neg or clamp). I'm not sure the intrinsic should really expose op_sel (or any of the others). If I'm reading the documentation correctly, we should be able to just have the raw scale operands and auto-match op_sel to byte extract patterns. The op_sel syntax also seems extra horrible in this usage, especially with the usual assumed op_sel_hi=-1 behavior.
Revision tags: llvmorg-19.1.4
# 5a556d55	18-Nov-2024	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Increase the LDS size to support to 160 KB for gfx950 (#116309)
# 8ed3b055	14-Nov-2024	Joe Nash <joseph.nash@amd.com>	[AMDGPU][True16][MC] Implement V_CVT_PK_F32_FP8/BF8 (#116106) Existing Fake16 versions of these instructions do not support op_sel on the _e32 encoding, which leaves a hole in the disassembler support. Implement the true16 version of the instructions in the MC layer.
# be187369	14-Nov-2024	Kazu Hirata <kazu@google.com>	[AMDGPU] Remove unused includes (NFC) (#116154) Identified with misc-include-cleaner.
# e8644e3b	05-Nov-2024	Brox Chen <guochen2@amd.com>	[AMDGPU][True16][MC] VOP2 update instructions with fake16 format (#114436) Some old "t16" VOP2 instructions are actually in fake16 format. Correct and update test file
# 0b40f979	05-Nov-2024	Matt Arsenault <Matthew.Arsenault@amd.com>	AMDGPU: Treat uint32_max as the default value for amdgpu-max-num-workgroups (#113751) 0 does not make sense as a value for this to be, much less the default. Also stop emitting each individual field if it is the default, rather than if any element was the default. Also fix the name of the test since it didn't exactly match the real attribute name.
Revision tags: llvmorg-19.1.3, llvmorg-19.1.2
# 8d13e7b8	03-Oct-2024	Jay Foad <jay.foad@amd.com>	[AMDGPU] Qualify auto. NFC. (#110878) Generated automatically with: $ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find lib/Target/AMDGPU/ -type f)
Revision tags: llvmorg-19.1.1
# 6f956e31	30-Sep-2024	Jay Foad <jay.foad@amd.com>	[AMDGPU] Rename LocalMemorySize features to AddressableLocalMemorySize (#110242) Change the names of the TableGen features to match the names used by AMDGPUSubtarget. "Addressable" refers to the amount that can be accessed by a single workgroup. Add some explanatory comments. NFC.
# fd50cdfb	28-Sep-2024	Craig Topper <craig.topper@sifive.com>	[AMDGPU] Use MCRegister. NFC
# 396f6775	24-Sep-2024	Scott Egerton <9487234+ScottEgerton@users.noreply.github.com>	[AMDGPU] Remove unused VGPRSingleUseHintInsts feature (#109769)
# d31e3141	20-Sep-2024	Youngsuk Kim <youngsuk.kim@hpe.com>	[llvm] Don't call raw_string_ostream::flush() (NFC) Don't call raw_string_ostream::flush(), which is essentially a no-op. As specified in the docs, raw_string_ostream is always unbuffered. ( 65b13610a5226b84889b923bae884ba395ad084d for further reference )
Revision tags: llvmorg-19.1.0, llvmorg-19.1.0-rc4
# 7bcf4d63	22-Aug-2024	Jeffrey Byrnes <jeffrey.byrnes@amd.com>	[AMDGPU] Correctly insert s_nops for dst forwarding hazard (#100276) MI300 ISA section 4.5 states there is a hazard between "VALU op which uses OPSEL or SDWA with changes the result’s bit position" and "VALU op consumes result of that op" This includes the case where the second op is SDWA with same dest and dst_sel != DWORD && dst_unused == UNUSED_PRESERVE. In this case, there is an implicit read of the first op dst and the compiler needs to resolve this hazard. Confirmed with HW team. We model dst_unused == UNUSED_PRESERVE as tied-def of implicit operand, so this PR checks for that. MI300_SP_MAS section 1.3.9.2 specifies that CVT_SR_FP8_F32 and CVT_SR_BF8_F32 with opsel[3:2] !=0 have dest forwarding issue. Currently, we only add check for CVT_SR_FP8_F32 with opsel[3] != 0 -- this PR adds support opsel[2] != 0 as well
Revision tags: llvmorg-19.1.0-rc3
# 2f89c1c7	17-Aug-2024	Mariusz Sikora <mariusz.sikora@amd.com>	[AMDGPU][NFC] Remove duplicate code by using getAddressableLocalMemorySize (#104604)
# f0fe6c66	14-Aug-2024	Ivan Kosarev <ivan.kosarev@amd.com>	[AMDGPU][NFC] Rename isHi() to isHi16Reg() for clarity. (#103888) And declare it to take an MCRegister. Also rename related entities and remove a comment for the function that depending on its purpose is either irrelevant or misleading.
Revision tags: llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init
# 63fae3ed	17-Jul-2024	Jay Foad <jay.foad@amd.com>	[AMDGPU] clang-tidy: no else after return etc. NFC. (#99298)
# 74b87b02	16-Jul-2024	Jay Foad <jay.foad@amd.com>	[AMDGPU] Fix and add namespace closing comments. NFC.