Revision tags: llvmorg-21-init

# 6206f544 | 23-Jan-2025 | Lucas Ramirez <11032120+lucas-rami@users.noreply.github.com>
[AMDGPU] Occupancy w.r.t. workgroup size range is also a range (#123748)
Occupancy (i.e., the number of waves per EU) depends, in addition to register usage, on per-workgroup LDS usage as well as on the range of possible workgroup sizes. Mirroring the latter, occupancy should therefore be expressed as a range since different group sizes generally yield different achievable occupancies.
`getOccupancyWithLocalMemSize` currently returns a scalar occupancy based on the maximum workgroup size and LDS usage. With respect to the workgroup size range, this scalar can be the minimum, the maximum, or neither bound of the range of achievable occupancies. This commit fixes the function by making it compute and return the range of achievable occupancies w.r.t. workgroup size and LDS usage; it also renames it to `getOccupancyWithWorkGroupSizes`, since it is the range of workgroup sizes that produces the range of achievable occupancies.
Computing the achievable occupancy range is surprisingly involved. Minimum/maximum workgroup sizes do not necessarily yield maximum/minimum occupancies; sometimes workgroup sizes strictly inside the range yield the occupancy bounds. The implementation finds these sizes in constant time; extensive documentation explains the rationale behind the sometimes relatively obscure calculations.
As a justifying example, consider a target with 10 waves / EU, 4 EUs/CU, 64-wide waves. Also consider a function with no LDS usage and a flat workgroup size range of [513,1024].
- A group of 513 items requires 9 waves per group. Only 4 groups made up of 9 waves each can fit fully on a CU at any given time, for a total of 36 waves on the CU, or 9 per EU. However, filling the remaining 40-36=4 wave slots as much as possible without decreasing the number of groups reveals that a larger group of 640 items yields 40 waves on the CU, or 10 per EU.
- Similarly, a group of 1024 items requires 16 waves per group. Only 2 groups made up of 16 waves each can fit fully on a CU at any given time, for a total of 32 waves on the CU, or 8 per EU. However, removing as many waves as possible from the groups without being able to fit another equal-sized group on the CU reveals that a smaller group of 896 items yields 28 waves on the CU, or 7 per EU.
Therefore the achievable occupancy range for this function is not [8,9] as the group size bounds directly yield, but [7,10].
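The arithmetic above generalizes; the following minimal standalone C++ sketch (not the actual LLVM implementation, with simplified rounding assumptions) reproduces the example's bounds:

// Hedged sketch: achievable occupancy per workgroup size for the example
// target (10 waves/EU, 4 EUs/CU, 64-wide waves, no LDS pressure). Rounding
// choices here are simplifying assumptions, not LLVM's exact rules.
#include <algorithm>
#include <cstdio>

constexpr int WaveSize = 64, EUsPerCU = 4, MaxWavesPerEU = 10;

int occupancy(int GroupSize) {
  int GroupWaves = (GroupSize + WaveSize - 1) / WaveSize; // waves per group
  int WaveSlots = MaxWavesPerEU * EUsPerCU;               // 40 slots per CU
  int GroupsPerCU = WaveSlots / GroupWaves;               // whole groups only
  return GroupsPerCU * GroupWaves / EUsPerCU;             // waves per EU
}

int main() {
  // Scan the whole flat workgroup size range [513, 1024]: the bounds come
  // from sizes strictly inside the range (640 and 896), not its endpoints.
  int MinOcc = occupancy(513), MaxOcc = MinOcc;
  for (int Size = 514; Size <= 1024; ++Size) {
    MinOcc = std::min(MinOcc, occupancy(Size));
    MaxOcc = std::max(MaxOcc, occupancy(Size));
  }
  std::printf("endpoints: %d and %d; range: [%d,%d]\n", occupancy(513),
              occupancy(1024), MinOcc, MaxOcc); // endpoints: 9 and 8; [7,10]
}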
Naturally this change causes a lot of test churn as instruction scheduling is driven by achievable occupancy estimates. In most unit tests the flat workgroup size range is the default [1,1024] which, ignoring potential LDS limitations, would previously produce a scalar occupancy of 8 (derived from 1024) on a lot of targets, whereas we now consider the maximum occupancy to be 10 in such cases. Most tests are updated automatically and checked manually for sanity. I also manually changed some non-automatically generated assertions when necessary.
Fixes #118220.

# 21704a68 | 17-Jan-2025 | Stanislav Mekhanoshin <rampitec@users.noreply.github.com>
[AMDGPU] Fix printing hasInitWholeWave in mir (#123232)

Revision tags: llvmorg-19.1.7

# 2e5c2982 | 10-Jan-2025 | Austin Kerbow <Austin.Kerbow@amd.com>
[AMDGPU] Add backward compatibility layer for kernarg preloading (#119167)
Add a prologue to the kernel entry to handle cases where code designed for kernarg preloading is executed on hardware equipped with incompatible firmware. If the hardware has compatible firmware, the 256 bytes at the start of the kernel entry will be skipped. This skipping is done automatically by hardware that supports the feature.
A pass is added which is intended to run at the very end of the pipeline, to avoid any optimizations that would assume the prologue is a real predecessor block of the actual code start. In reality we have two possible entry points for the function (a toy sketch follows the list):
1. The optimized path that supports kernarg preloading, which begins at an offset of 256 bytes.
2. The backwards-compatible entry point, which starts at offset 0.
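As a rough mental model only, here is a trivial, purely illustrative C++ sketch of the dispatch behavior described above; the real mechanism is firmware behavior, not code, and the constant below simply mirrors the 256-byte figure from the commit text:

#include <cstdio>

constexpr unsigned PrologueSize = 256; // bytes reserved for the compat path

// Where execution begins depends on whether firmware supports preloading.
unsigned entryOffset(bool FirmwareSupportsPreload) {
  return FirmwareSupportsPreload ? PrologueSize : 0;
}

int main() {
  std::printf("preload-aware firmware enters at byte %u\n", entryOffset(true));
  std::printf("legacy firmware enters at byte %u\n", entryOffset(false));
}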

# 67c55b1f | 18-Dec-2024 | Ruiling, Song <ruiling.song@amd.com>
[AMDGPU] Make max dwords of memory cluster configurable (#119342)
We find it helpful to increase the value for graphics workloads. Make it configurable so we can experiment with different values.
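For reference, a knob like this is typically surfaced in the AMDGPU backend via cl::opt; the option name and default below are assumptions for illustration, not copied from the patch:

#include "llvm/Support/CommandLine.h"

// Assumed flag name and default, shown only to illustrate the mechanism.
static llvm::cl::opt<unsigned> MaxMemoryClusterDWords(
    "amdgpu-max-memory-cluster-dwords",
    llvm::cl::desc("Maximum number of dwords of memory operations to cluster"),
    llvm::cl::init(8));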

Revision tags: llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4

# bc7e099a | 07-Nov-2024 | dyung <douglas.yung@sony.com>
Revert "[AMDGPU][MIR] Serialize NumPhysicalVGPRSpillLanes" (#115353)
Reverts llvm/llvm-project#115291
Reverting due to test failures on many bots including
https://lab.llvm.org/buildbot/#/builde
Revert "[AMDGPU][MIR] Serialize NumPhysicalVGPRSpillLanes" (#115353)
Reverts llvm/llvm-project#115291
Reverting due to test failures on many bots including
https://lab.llvm.org/buildbot/#/builders/174/builds/8049

# 21835ee2 | 07-Nov-2024 | Akshat Oke <Akshat.Oke@amd.com>
[AMDGPU][MIR] Serialize NumPhysicalVGPRSpillLanes (#115291)

# 3495d045 | 05-Nov-2024 | Akshat Oke <Akshat.Oke@amd.com>
[AMDGPU][MIR] Serialize SpillPhysVGPRs (#113129)

Revision tags: llvmorg-19.1.3, llvmorg-19.1.2

# 8d13e7b8 | 03-Oct-2024 | Jay Foad <jay.foad@amd.com>
[AMDGPU] Qualify auto. NFC. (#110878)
Generated automatically with:
$ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find lib/Target/AMDGPU/ -type f)

Revision tags: llvmorg-19.1.1

# ac0f64f0 | 30-Sep-2024 | Christudasan Devadasan <christudasan.devadasan@amd.com>
[AMDGPU] Split vgpr regalloc pipeline (#93526)
Allocating wwm-registers and per-thread VGPR operands together imposes many challenges in the way the registers are reused during allocation. There are times when regalloc reuses the registers of regular VGPR operations for wwm-operations in a small range, unintentionally clobbering their inactive lanes and causing correctness issues that are hard to trace.
This patch splits the VGPR allocation pipeline further to allocate wwm-registers first and the regular VGPR operands in a separate pipeline. The split ensures that the physical registers used for wwm allocations won't take part in the subsequent allocation pipeline, avoiding any such clobbering.

# 23487be4 | 26-Sep-2024 | Christudasan Devadasan <christudasan.devadasan@amd.com>
[AMDGPU] Merge the conditions used for deciding CS spills for amdgpu_cs_chain[_preserve] (#109911)
Multiple conditions exist to decide whether callee save spills/restores are required for the amdgpu_cs_chain or amdgpu_cs_chain_preserve calling conventions. This patch consolidates them all in a single place.

# e03f4271 | 19-Sep-2024 | Jay Foad <jay.foad@amd.com>
[LLVM] Use {} instead of std::nullopt to initialize empty ArrayRef (#109133)
It is almost always simpler to use {} instead of std::nullopt to
initialize an empty ArrayRef. This patch changes all occurrences I could
find in LLVM itself. In future the ArrayRef(std::nullopt_t) constructor
could be deprecated or removed.
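A minimal before/after illustration of the change, using a hypothetical callee:

#include "llvm/ADT/ArrayRef.h"
#include <optional>

// Hypothetical function taking an ArrayRef parameter.
void takesArrayRef(llvm::ArrayRef<int> Vals) { (void)Vals; }

void caller() {
  takesArrayRef(std::nullopt); // old style: relies on ArrayRef(std::nullopt_t)
  takesArrayRef({});           // new style: empty braced initializer
}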

Revision tags: llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3

# a5666359 | 19-Aug-2024 | Christudasan Devadasan <christudasan.devadasan@amd.com>
[AMDGPU] Move AMDGPUCodeGenPassBuilder into AMDGPUTargetMachine (NFC) (#103720)
This will allow us to reuse the existing flags and the static functions while building the pipeline for the new pass manager.

Revision tags: llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init

# 63fae3ed | 17-Jul-2024 | Jay Foad <jay.foad@amd.com>
[AMDGPU] clang-tidy: no else after return etc. NFC. (#99298)

# c7309dad | 17-Jul-2024 | Jay Foad <jay.foad@amd.com>
[AMDGPU] Use range-based for loops. NFC. (#99047)

# 5e338f1f | 17-Jul-2024 | Jay Foad <jay.foad@amd.com>
[AMDGPU] clang-tidy: use emplace_back instead of push_back. NFC.

# 0b43d573 | 16-Jul-2024 | Jay Foad <jay.foad@amd.com>
[AMDGPU] clang-tidy: replace macro with enum. NFC.

# 7e9b49f6 | 25-Jun-2024 | Nicolai Hähnle <nicolai.haehnle@amd.com>
AMDGPU: Add plumbing for private segment size argument (#96445)
The actual size of scratch/private is determined at dispatch time, so add more plumbing to request it. Will be used in a subsequent change.

# d6c74102 | 25-Jun-2024 | Nicolai Hähnle <nicolai.haehnle@amd.com>
AMDGPU: Remove an outdated TODO (#96446)
We have a fixed calling convention for the stack pointer and frame pointer; we shouldn't try to shift anything around.

Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5

# 5b187751 | 29-Apr-2024 | Jay Foad <jay.foad@amd.com>
[AMDGPU] Fix typo in #89773
Fixes #90281

# 46163688 | 24-Apr-2024 | Jay Foad <jay.foad@amd.com>
[AMDGPU] Allow WorkgroupID intrinsics in amdgpu_gfx functions (#89773)
With GFX12 architected SGPRs, the workgroup IDs are trivially available in any function called from a compute entrypoint.
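As a hedged example of what this enables, a clang builtin that lowers to the workgroup-ID intrinsic can now appear in an amdgpu_gfx function; the builtin spelling follows clang's AMDGPU support, and the surrounding calling-convention glue is assumed context:

// Callable from a compute entrypoint; with GFX12 architected SGPRs the ID is
// read directly from an SGPR rather than requiring explicit ABI plumbing.
extern "C" unsigned currentWorkgroupIdX() {
  return __builtin_amdgcn_workgroup_id_x(); // llvm.amdgcn.workgroup.id.x
}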

Revision tags: llvmorg-18.1.4, llvmorg-18.1.3

# b6b703b2 | 21-Mar-2024 | Matt Arsenault <Matthew.Arsenault@amd.com>
AMDGPU: Infer no-agpr usage in AMDGPUAttributor (#85948)
SIMachineFunctionInfo has a scan of the function body for inline asm which may use AGPRs, or callees in SIMachineFunctionInfo. Move this into the attributor, so it actually works interprocedurally.
Most of the test churn could probably be avoided if this avoided adding the attribute on subtargets without AGPRs. We should also try to delete the MIR scan in usesAGPRs, but it seems trickier to eliminate.

Revision tags: llvmorg-18.1.2

# c4e517f5 | 12-Mar-2024 | Jun Wang <jwang86@yahoo.com>
[AMDGPU] Adding the amdgpu_num_work_groups function attribute (#79035)
A new function attribute named amdgpu_num_work_groups is added. This attribute, which consists of three integers, allows programmers to let the compiler know the number of workgroups to be launched in each of the three dimensions, enabling optimizations based on that information.
Co-authored-by: Jun Wang <jun.wang7@amd.com>
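A hedged sketch of how a programmer might spell this; the exact attribute syntax is assumed from the commit description, and the amdgpu_kernel calling-convention attribute is used only to make the example self-contained:

// Assumed spelling: tells the compiler the kernel is always launched as a
// 32 x 4 x 1 grid of workgroups, so the backend can specialize accordingly.
__attribute__((amdgpu_num_work_groups(32, 4, 1)))
__attribute__((amdgpu_kernel)) void myKernel(int *Out) {
  *Out = 0;
}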

Revision tags: llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3

# bc6955f1 | 09-Feb-2024 | Diana Picus <Diana-Magda.Picus@amd.com>
[AMDGPU] Don't fix the scavenge slot at offset 0 (#79136)
At the moment, the emergency spill slot is a fixed object for entry
functions and chain functions, and a regular stack object otherwise.
This patch adopts the latter behaviour for entry/chain functions too. It
seems this was always the intention [1] and it will also save us a bit
of stack space in cases where the first stack object has a large
alignment.
[1]
https://github.com/llvm/llvm-project/commit/34c8b835b16fb3879f1b9770e91df21883356bb6

Revision tags: llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init

# 230c13d5 | 24-Jan-2024 | Christudasan Devadasan <christudasan.devadasan@amd.com>
[AMDGPU] Pick available high VGPR for CSR SGPR spilling (#78669)
CSR SGPR spilling currently uses the earliest available physical VGPRs, which imposes high register pressure while trying to allocate large VGPR tuples within the default register budget.
This patch changes the spilling strategy to pick VGPRs in reverse order, highest available VGPR first, and to shift them back to the lowest available range after regalloc. With that, the initial VGPRs remain available for allocation, improving the chances of finding a large number of contiguous registers.

# 51392996 | 17-Dec-2023 | Carl Ritson <carl.ritson@amd.com>
[AMDGPU] Track physical VGPRs used for SGPR spills (#75573)
Physical VGPRs used for SGPR spills need to be tracked independently of WWM reserved registers. The WWM reserved set contains extra registers allocated during the WWM pre-allocation pass.
This causes SGPR spills allocated after WWM pre-allocation to overlap with WWM register usage, e.g., if the frame pointer is spilled during prologue/epilogue insertion.