AMDGPULowerModuleLDSPass.cpp - OpenGrok cross reference for /freebsd-src/contrib/llvm-project/llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp

Lines Matching defs:kernel
9 // This pass eliminates local data store, LDS, uses from non-kernel functions.
10 // LDS is contiguous memory allocated per kernel execution.
16 // kernels this is straightforward - assign an integer to the kernel for the
21 // If this is more than the kernel assumed, the excess can be made available
28 // - a global accessed by one kernel exists independent of other kernels
29 // - a global exists independent of simultaneous execution of the same kernel
34 // Uses from kernels are implemented here by grouping them in a per-kernel
44 // design goal is to avoid one kernel imposing overhead on another.
56 // struct and allocate that struct in every kernel such that the original
68 // memory to save LDS. "Kernel" is as fast as kernel allocation but only works
69 // for variables that are known reachable from a single kernel. "Hybrid" picks
74 // non-kernel functions and creates a new struct with a field for each of those
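The "module" strategy described above packs every variable used from non-kernel functions into one struct, so each variable becomes a fixed offset from the struct base. The C++ below is a minimal sketch of one plausible way to compute such a layout. `LdsVar`, `Field`, `Layout`, `alignTo` and `layoutModuleStruct` are hypothetical names, and the sort-by-alignment policy is an assumption, not necessarily the layout algorithm the pass uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical description of one LDS global (not an LLVM type).
struct LdsVar {
  std::string name;
  std::uint64_t size;  // bytes
  std::uint64_t align; // bytes, must be a power of two
};

// One field of the packed struct: which variable, and its byte offset.
struct Field {
  std::string name;
  std::uint64_t offset;
};

struct Layout {
  std::vector<Field> fields;
  std::uint64_t size;  // total size, padded to the struct alignment
  std::uint64_t align; // largest field alignment
};

// Round value up to the next multiple of align (a power of two).
static std::uint64_t alignTo(std::uint64_t value, std::uint64_t align) {
  assert(align != 0 && (align & (align - 1)) == 0);
  return (value + align - 1) & ~(align - 1);
}

// Pack variables into one struct, highest alignment first to limit padding.
// Each variable then lives at a constant offset from the struct base.
Layout layoutModuleStruct(std::vector<LdsVar> vars) {
  std::sort(vars.begin(), vars.end(), [](const LdsVar &a, const LdsVar &b) {
    if (a.align != b.align)
      return a.align > b.align;
    return a.name < b.name; // deterministic order for equal alignment
  });
  Layout out{{}, 0, 1};
  for (const LdsVar &v : vars) {
    out.size = alignTo(out.size, v.align);
    out.fields.push_back({v.name, out.size});
    out.size += v.size;
    out.align = std::max(out.align, v.align);
  }
  out.size = alignTo(out.size, out.align);
  return out;
}
```

For variables `a` (4 bytes, align 4), `b` (8 bytes, align 8) and `c` (1 byte, align 1) this places `b` at offset 0, `a` at 8 and `c` at 12, giving a 16-byte struct.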
81 // Second, each kernel allocates LDS variables independent of other kernels and
83 // order. If the kernel does not allocate a given variable, it writes undef to
85 // table in the order matching the kernel unique integer identifier.
86 // Third, uses from non-kernel functions are replaced with a table lookup using
90 // reachable from exactly one kernel. For those cases, accesses to the variable
92 // one kernel. This is zero cost in space and in compute. It will raise a fatal
96 // Hybrid lowering is a mixture of the above. It uses the zero cost kernel
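The table strategy can be modelled as a two-dimensional array indexed by kernel id and by variable. In this hypothetical sketch (`LookupTable` and `lookupAddress` are illustrative, not LLVM APIs), `std::nullopt` plays the role of the undef/poison entry written for a kernel that does not allocate a variable. In the real lowering the kernel id is obtained through an intrinsic rather than passed as a parameter.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Row k holds the LDS address of each variable within kernel k's frame.
// std::nullopt stands in for the undef/poison entry of a kernel that does
// not allocate that variable.
using LdsAddress = std::uint32_t;
using LookupTable = std::vector<std::vector<std::optional<LdsAddress>>>;

// What a non-kernel function does after lowering: take the id of the kernel
// at the root of the call stack and index the table with a constant variable
// index. A missing entry should never be read, because a kernel allocates
// every variable reachable from it; the assert makes that explicit.
LdsAddress lookupAddress(const LookupTable &table, unsigned kernelId,
                         unsigned varIndex) {
  assert(kernelId < table.size());
  assert(varIndex < table[kernelId].size());
  const std::optional<LdsAddress> &entry = table[kernelId][varIndex];
  assert(entry.has_value() && "kernel does not allocate this variable");
  return *entry;
}
```

The same variable index can map to different addresses in different kernels, which is why the lookup needs the kernel id at run time.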
111 // - Implementations that instantiate templates per-kernel where those templates
117 // same intrinsic to identify which kernel is at the root of the dynamic call
120 // kernel. Therefore this pass creates new dynamic LDS variables for each kernel
123 // The corresponding optimisation for "kernel" lowering where the table lookup
128 // A single LDS global variable represents an instance per kernel that can reach
129 // said variables. This pass essentially specialises said variables per kernel.
150 // symbol metadata will occur. For kernel scope and dynamic, this is by _name_
152 // kernel to have a name (which is only a limitation for tests in practice) and
158 // have the same memory layout can map to the same kernel id (as the address
160 // uses of the "kernel" style faster lowering and reduce the size of the lookup
172 // the kernel scope (and thus error on the address check). Given a new invariant
224 enum class LoweringKind { module, table, kernel, hybrid };
233             LoweringKind::kernel, "kernel",
234             "Lower variables reachable from one kernel, otherwise abort"),
281     // immediately used by the kernel must still be allocated by it. An
307   // for each kernel
308   // an array with an element for each kernel containing where the corresponding
315     // kernel corresponding to LDSVarsToConstantGEP, or poison if that kernel
498         // A variable reachable by only one kernel is best lowered with kernel
585         OrderedKernels[i]->setMetadata("llvm.amdgcn.lds.kernel.id",
634       case LoweringKind::kernel:
640               "' to kernel access as it is reachable from multiple kernels");
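The option values above select how variables reached from non-kernel functions are split between strategies. The sketch below is a hypothetical approximation of that partitioning: it assumes each variable's set of reaching kernels is already known, throws where the pass raises a fatal error, and assumes that hybrid mode sends the variable reached by the most kernels (ties broken by smaller size, then by name) to the module struct. The pass's actual selection heuristic may differ. `VarInfo`, `Strategy` and `partitionVariables` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <stdexcept>
#include <string>

enum class LoweringKind { module, table, kernel, hybrid };
enum class Strategy { Module, Table, KernelAccess };

// Hypothetical input: kernels that can reach a use of the variable, and its size.
struct VarInfo {
  std::set<std::string> kernels;
  std::size_t size;
};

// Assumed ranking for the hybrid module candidate.
static bool betterModuleCandidate(const std::string &aName, const VarInfo &a,
                                  const std::string &bName, const VarInfo &b) {
  if (a.kernels.size() != b.kernels.size())
    return a.kernels.size() > b.kernels.size(); // reached by more kernels
  if (a.size != b.size)
    return a.size < b.size; // smaller costs less in every kernel
  return aName < bName;     // deterministic tie-break
}

std::map<std::string, Strategy>
partitionVariables(const std::map<std::string, VarInfo> &vars,
                   LoweringKind kind) {
  std::string moduleRoot;
  const VarInfo *bestInfo = nullptr;
  if (kind == LoweringKind::hybrid) {
    for (const auto &[name, info] : vars) {
      if (info.kernels.size() < 2)
        continue; // single-kernel variables use free kernel access
      if (!bestInfo || betterModuleCandidate(name, info, moduleRoot, *bestInfo)) {
        moduleRoot = name;
        bestInfo = &info;
      }
    }
  }

  std::map<std::string, Strategy> result;
  for (const auto &[name, info] : vars) {
    assert(!info.kernels.empty() && "sketch assumes at least one kernel");
    const bool single = info.kernels.size() == 1;
    switch (kind) {
    case LoweringKind::module:
      result[name] = Strategy::Module;
      break;
    case LoweringKind::table:
      result[name] = Strategy::Table;
      break;
    case LoweringKind::kernel:
      if (!single)
        throw std::runtime_error("Cannot lower LDS '" + name +
                                 "' to kernel access as it is reachable "
                                 "from multiple kernels");
      result[name] = Strategy::KernelAccess;
      break;
    case LoweringKind::hybrid:
      if (single)
        result[name] = Strategy::KernelAccess;
      else if (bestInfo && name == moduleRoot)
        result[name] = Strategy::Module;
      else
        result[name] = Strategy::Table;
      break;
    }
  }
  return result;
}
```

Under this model, "kernel" mode is all-or-nothing, while "hybrid" degrades gracefully: only variables shared by several kernels, other than the one chosen for the module struct, pay for a table lookup.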
671     // Replace all uses of those variables from non-kernel functions with the
672     // new struct instance. Replace only the uses from kernel functions that will
676     // of the per-kernel lowering).
692     // module.lds will be allocated at zero in any kernel that allocates it
698     // Replace all uses of module scope variable from non-kernel functions
709     // Replace uses of module scope variable from kernel functions that
711     // Record on each kernel whether the module scope global is used by it
742     // Create a struct for each kernel for the non-module-scope variables.
767       // not to the per-kernel instance.
780       // The association between kernel function and LDS struct is done by
791           (Twine("llvm.amdgcn.kernel.") + Func.getName() + ".lds").str();
808       // Rewrite uses within kernel to the new struct
823     // reachable from this kernel. Dynamic LDS is allocated after the static LDS
826     // reachable by that kernel. All dynamic LDS variables are allocated at the
827     // same address in each kernel in order to provide the documented aliasing
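The dynamic LDS comment above can be illustrated with a small calculation. This sketch assumes that the dynamic base is the kernel's static frame size rounded up to the largest alignment among the dynamic variables reachable from that kernel. `dynamicLdsBase` is a hypothetical helper; it shows why every dynamic variable in one kernel ends up at the same address.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Round value up to a power-of-two alignment.
static std::uint64_t alignUp(std::uint64_t value, std::uint64_t align) {
  assert(align != 0 && (align & (align - 1)) == 0);
  return (value + align - 1) & ~(align - 1);
}

// Offset at which every dynamic (extern, zero-sized) LDS variable reachable
// from a kernel starts. All of them share this single address, so they alias.
std::uint64_t dynamicLdsBase(std::uint64_t staticFrameSize,
                             const std::vector<std::uint64_t> &dynamicAligns) {
  std::uint64_t maxAlign = 1;
  for (std::uint64_t a : dynamicAligns)
    maxAlign = std::max(maxAlign, a);
  return alignUp(staticFrameSize, maxAlign);
}
```

For example, a 20-byte static frame with dynamic variables aligned to 4 and 16 gives a dynamic base of 32.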
933     // For each kernel, what variables does it access directly or through
957     // If the kernel accesses a variable that is going to be stored in the
958     // module instance through a call then that kernel needs to allocate the
979     // Lower zero cost accesses to the kernel instances just created
982       assert(funcs.size() == 1); // Only one kernel can access it
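Asking which variables a kernel accesses "directly or through" calls is a transitive closure over the call graph. This sketch uses a plain worklist over a hypothetical string-keyed call graph (`CallGraph`, `DirectUses` and `variablesReachableFrom` are illustrative). Unlike the pass, it ignores indirect calls, which a real implementation has to handle conservatively.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Simplified module: direct call edges, and the LDS variables each function
// uses in its own body. Functions with no entry simply have no calls/uses.
using CallGraph = std::map<std::string, std::vector<std::string>>;
using DirectUses = std::map<std::string, std::set<std::string>>;

// Collect every LDS variable a kernel can touch, in its own body or in any
// function transitively reachable from it.
std::set<std::string> variablesReachableFrom(const std::string &kernel,
                                             const CallGraph &cg,
                                             const DirectUses &uses) {
  std::set<std::string> visited;
  std::set<std::string> vars;
  std::vector<std::string> worklist{kernel};
  while (!worklist.empty()) {
    std::string fn = worklist.back();
    worklist.pop_back();
    if (!visited.insert(fn).second)
      continue; // already processed; this also terminates on recursion
    if (auto it = uses.find(fn); it != uses.end())
      vars.insert(it->second.begin(), it->second.end());
    if (auto it = cg.find(fn); it != cg.end())
      for (const std::string &callee : it->second)
        worklist.push_back(callee);
  }
  return vars;
}
```

A result set of size one is what makes a variable eligible for the zero-cost "kernel" access checked by the assert above.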
994     // The ith element of this vector is kernel id i
1014       // Strip amdgpu-no-lds-kernel-id from all functions reachable from the
1015       // kernel. We may have inferred this wasn't used prior to the pass.
1019         removeFnAttrFromReachable(CG, F, {"amdgpu-no-lds-kernel-id"});
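Kernel ids need to be dense and reproducible, since the ith table row belongs to kernel id i. The sketch assumes ordering by function name, consistent with the requirement above that kernels have names. `assignKernelIds` and `kernelIdMap` are hypothetical helpers. The pass itself attaches the id to each kernel as `llvm.amdgcn.lds.kernel.id` metadata.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Order kernels reproducibly; element i of the result is kernel id i.
// Sorting by name makes ids independent of the order kernels appear in the module.
std::vector<std::string> assignKernelIds(std::vector<std::string> kernels) {
  std::sort(kernels.begin(), kernels.end());
  // Function names within one LLVM module are unique, so no two kernels
  // should share a slot.
  assert(std::adjacent_find(kernels.begin(), kernels.end()) == kernels.end());
  return kernels;
}

// Inverse mapping used when filling table rows: kernel name -> id.
std::map<std::string, unsigned>
kernelIdMap(const std::vector<std::string> &ordered) {
  std::map<std::string, unsigned> ids;
  for (std::size_t i = 0; i < ordered.size(); ++i)
    ids[ordered[i]] = static_cast<unsigned>(i);
  return ids;
}
```

Because the order depends only on names, compiling the same module twice yields the same table layout.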
1027     // All kernel frames have been allocated. Calculate and record the
1042         //  kernel instance
1062           // kernel
1075         // variable in the kernel itself, so without including it here, that
1405                       "Lower uses of LDS variables from non-kernel functions",
1409                     "Lower uses of LDS variables from non-kernel functions",