Lines Matching defs:kernel
9 // This pass eliminates local data store, LDS, uses from non-kernel functions.
10 // LDS is contiguous memory allocated per kernel execution.
16 // kernels this is straightforward - assign an integer to the kernel for the
21 // If this is more than the kernel assumed, the excess can be made available
28 // - a global accessed by one kernel exists independent of other kernels
29 // - a global exists independent of simultaneous execution of the same kernel
34 // Uses from kernels are implemented here by grouping them in a per-kernel
44 // design goal is to avoid one kernel imposing overhead on another.
56 // struct and allocate that struct in every kernel such that the original
68 // memory to save LDS. "Kernel" is as fast as kernel allocation but only works
69 // for variables that are known reachable from a single kernel. "Hybrid" picks
74 // non-kernel functions and creates a new struct with a field for each of those
81 // Second, each kernel allocates LDS variables independent of other kernels and
83 // order. If the kernel does not allocate a given variable, it writes undef to
85 // table in the order matching the kernel unique integer identifier.
86 // Third, uses from non-kernel functions are replaced with a table lookup using
90 // reachable from exactly one kernel. For those cases, accesses to the variable
92 // one kernel. This is zero cost in space and in compute. It will raise a fatal
96 // Hybrid lowering is a mixture of the above. It uses the zero cost kernel
111 // - Implementations that instantiate templates per-kernel where those templates
117 // same intrinsic to identify which kernel is at the root of the dynamic call
120 // kernel. Therefore this pass creates new dynamic LDS variables for each kernel
123 // The corresponding optimisation for "kernel" lowering where the table lookup
128 // A single LDS global variable represents an instance per kernel that can reach
129 // said variables. This pass essentially specialises said variables per kernel.
150 // symbol metadata will occur. For kernel scope and dynamic, this is by _name_
152 // kernel to have a name (which is only a limitation for tests in practice) and
158 // have the same memory layout can map to the same kernel id (as the address
160 // uses of the "kernel" style faster lowering and reduce the size of the lookup
172 // the kernel scope (and thus error on the address check). Given a new invariant
224 enum class LoweringKind { module, table, kernel, hybrid };
233 LoweringKind::kernel, "kernel",
234 "Lower variables reachable from one kernel, otherwise abort"),
281 // immediately used by the kernel must still be allocated by it. An
307 // for each kernel
308 // an array with an element for each kernel containing where the corresponding
315 // kernel corresponding to LDSVarsToConstantGEP, or poison if that kernel
498 // A variable reachable by only one kernel is best lowered with kernel
585 OrderedKernels[i]->setMetadata("llvm.amdgcn.lds.kernel.id",
634 case LoweringKind::kernel:
640 "' to kernel access as it is reachable from multiple kernels");
671 // Replace all uses of those variables from non-kernel functions with the
672 // new struct instance. Replace only the uses from kernel functions that will
676 // of the per-kernel lowering).
692 // module.lds will be allocated at zero in any kernel that allocates it
698 // Replace all uses of module scope variable from non-kernel functions
709 // Replace uses of module scope variable from kernel functions that
711 // Record on each kernel whether the module scope global is used by it
742 // Create a struct for each kernel for the non-module-scope variables.
767 // not to the per-kernel instance.
780 // The association between kernel function and LDS struct is done by
791 (Twine("llvm.amdgcn.kernel.") + Func.getName() + ".lds").str();
808 // Rewrite uses within kernel to the new struct
823 // reachable from this kernel. Dynamic LDS is allocated after the static LDS
826 // reachable by that kernel. All dynamic LDS variables are allocated at the
827 // same address in each kernel in order to provide the documented aliasing
933 // For each kernel, what variables does it access directly or through
957 // If the kernel accesses a variable that is going to be stored in the
958 // module instance through a call then that kernel needs to allocate the
979 // Lower zero cost accesses to the kernel instances just created
982 assert(funcs.size() == 1); // Only one kernel can access it
994 // The ith element of this vector is kernel id i
1014 // Strip amdgpu-no-lds-kernel-id from all functions reachable from the
1015 // kernel. We may have inferred this wasn't used prior to the pass.
1019 removeFnAttrFromReachable(CG, F, {"amdgpu-no-lds-kernel-id"});
1027 // All kernel frames have been allocated. Calculate and record the
1042 // kernel instance
1062 // kernel
1075 // variable in the kernel itself, so without including it here, that
1405 "Lower uses of LDS variables from non-kernel functions",
1409 "Lower uses of LDS variables from non-kernel functions",
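The matches about the "table" lowering rely on each kernel having a unique integer id, with the ith table row belonging to kernel id i. Below is a minimal sketch of such an assignment. It assumes the ids are dense (0..N-1) and follow sorted kernel-name order, which gives a reproducible ordering and is one reason an anonymous kernel is rejected. `assignKernelIds` is a hypothetical helper, not the pass's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: give each kernel that needs a lookup-table row a dense
// id. Assumption: ids follow sorted name order, so the result does not depend
// on the order in which kernels appear in the module.
std::map<std::string, unsigned> assignKernelIds(std::vector<std::string> kernels) {
  for (const std::string &name : kernels)
    if (name.empty())
      throw std::invalid_argument("anonymous kernel cannot be given an id");

  std::sort(kernels.begin(), kernels.end());

  std::map<std::string, unsigned> ids;
  for (std::size_t i = 0; i < kernels.size(); ++i)
    ids[kernels[i]] = static_cast<unsigned>(i); // row i of the table is kernel i
  return ids;
}
```

Because the ids are dense, they can index a plain array with one row per kernel, with no hashing at run time.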
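The "table" strategy works as follows. Each kernel allocates its LDS variables independently, and the address of each variable goes into a table indexed by kernel id. A kernel that does not allocate a variable contributes an undef entry. Non-kernel functions then find a variable with a lookup keyed on the current kernel id. The sketch below models that table on the host. It makes two assumptions: each kernel packs the variables it allocates in a shared variable-index order starting at address 0, and `std::nullopt` stands in for the undef/poison entry. `LdsTable`, `VarInfo` and `buildTable` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

struct VarInfo {
  uint32_t size;
  uint32_t align; // assumed to be a power of two
};

// rows[kernelId][varIndex] holds the LDS address of that variable in that
// kernel, or std::nullopt (standing in for undef) if the kernel lacks it.
struct LdsTable {
  std::vector<std::vector<std::optional<uint32_t>>> rows;

  std::optional<uint32_t> lookup(unsigned kernelId, unsigned varIndex) const {
    return rows.at(kernelId).at(varIndex);
  }
};

static uint32_t alignTo(uint32_t value, uint32_t align) {
  return (value + align - 1) & ~(align - 1);
}

// allocates[k][v] says whether kernel k allocates variable v.
LdsTable buildTable(const std::vector<VarInfo> &vars,
                    const std::vector<std::vector<bool>> &allocates) {
  LdsTable table;
  for (const auto &kernelAllocs : allocates) {
    std::vector<std::optional<uint32_t>> row(vars.size());
    uint32_t offset = 0;
    for (std::size_t v = 0; v < vars.size(); ++v) {
      if (!kernelAllocs[v])
        continue; // entry stays nullopt, like writing undef to the table
      offset = alignTo(offset, vars[v].align);
      row[v] = offset;
      offset += vars[v].size;
    }
    table.rows.push_back(std::move(row));
  }
  return table;
}
```

The trade-off is an extra load per access in non-kernel functions. In exchange, no kernel has to reserve space for variables it never reaches.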
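The "module" strategy collects the variables used from non-kernel functions into one struct, and that struct (module.lds) is allocated at address zero in any kernel that allocates it. This means each field offset is also the variable's absolute LDS address, so no lookup is needed. The sketch computes such a layout. It assumes a C-like layout in declaration order with tail padding. The real pass may order fields differently, and `Field`, `ModuleLayout` and `layoutModuleStruct` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Field {
  uint32_t size;
  uint32_t align; // assumed to be a power of two
};

struct ModuleLayout {
  std::vector<uint32_t> fieldOffsets; // also the LDS address, since the struct sits at 0
  uint32_t size;                      // LDS reserved by every kernel that allocates it
};

static uint32_t alignTo(uint32_t value, uint32_t align) {
  return (value + align - 1) & ~(align - 1);
}

// Assumption: declaration-order, C-like layout with padding between fields
// and at the end.
ModuleLayout layoutModuleStruct(const std::vector<Field> &fields) {
  ModuleLayout layout{{}, 0};
  uint32_t maxAlign = 1;
  for (const Field &f : fields) {
    layout.size = alignTo(layout.size, f.align);
    layout.fieldOffsets.push_back(layout.size);
    layout.size += f.size;
    if (f.align > maxAlign)
      maxAlign = f.align;
  }
  layout.size = alignTo(layout.size, maxAlign); // tail padding
  return layout;
}
```

Access is free because every address is a constant. The cost is memory: every kernel that allocates the struct reserves all of it, including fields that kernel never touches.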
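The `LoweringKind` enum and the error raised in the `kernel` case suggest a per-variable decision. "kernel" lowering is used only when exactly one kernel can reach the variable and aborts otherwise, while "hybrid" prefers the zero-cost kernel lowering where it applies. Below is a simplified sketch of that decision. The assumption is that hybrid falls back to the table for every other variable; the actual hybrid policy may also place some variables in the module struct. `Strategy` and `chooseStrategy` are hypothetical names, and the error text only paraphrases the matched message.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

enum class LoweringKind { module, table, kernel, hybrid };
enum class Strategy { Module, Table, Kernel };

// reachingKernels: number of kernels from which the variable is reachable.
Strategy chooseStrategy(LoweringKind mode, const std::string &var,
                        std::size_t reachingKernels) {
  if (reachingKernels == 0)
    throw std::invalid_argument("'" + var + "' is not reachable from any kernel");
  switch (mode) {
  case LoweringKind::module:
    return Strategy::Module;
  case LoweringKind::table:
    return Strategy::Table;
  case LoweringKind::kernel:
    if (reachingKernels == 1)
      return Strategy::Kernel; // zero cost: rewrite to that kernel's instance
    throw std::runtime_error("cannot lower '" + var +
                             "' to kernel access as it is reachable from "
                             "multiple kernels");
  case LoweringKind::hybrid:
    // Simplification: kernel lowering when only one kernel reaches the
    // variable, otherwise the table.
    return reachingKernels == 1 ? Strategy::Kernel : Strategy::Table;
  }
  throw std::logic_error("unknown lowering kind");
}
```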
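The comments on dynamic LDS say two things. First, it is placed after the static LDS a kernel allocates. Second, all dynamic LDS variables get the same address within a kernel, which is what makes them alias. The sketch below computes that shared start address. It assumes the start is rounded up to the strictest alignment among the dynamic variables the kernel can reach. `dynamicLdsBase` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: start of dynamic LDS in one kernel. Every dynamic LDS
// variable in the kernel is assigned this same address.
uint32_t dynamicLdsBase(uint32_t staticLdsSize,
                        const std::vector<uint32_t> &dynamicAligns) {
  uint32_t maxAlign = 1;
  for (uint32_t a : dynamicAligns) {
    assert(a != 0 && (a & (a - 1)) == 0 && "alignment must be a power of two");
    if (a > maxAlign)
      maxAlign = a;
  }
  // Round the end of static LDS up to the strictest dynamic alignment.
  return (staticLdsSize + maxAlign - 1) & ~(maxAlign - 1);
}
```

For example, a kernel with 100 bytes of static LDS that reaches dynamic variables aligned to 4 and 16 would start its dynamic LDS at 112 under this assumption.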