# aa295216 | 29-Jan-2025 | Jay Foad <jay.foad@amd.com>
Fix typo "tranpose" (#124929)

# 2e6cc79f | 23-Jan-2025 | Durgadoss R <durgadossr@nvidia.com>
[MLIR][NVVM] Migrate CpAsyncOp to intrinsics (#123789)
Intrinsics are also available for the 'cpSize' variants, so this patch
migrates the Op to lower to the intrinsics in all cases.
* Update the existing tests to check the lowering to intrinsics.
* Add new cp_async_zfill tests to verify the lowering for the 'cpSize' variants.
* Tidy up the CHECK lines in the cp_async() function in nvvmir.mlir (NFC).
PTX spec link:
https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cp-async
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
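
As a point of reference, here is a minimal sketch of the two forms of the op in NVVM dialect assembly (operand names are placeholders; the exact types and attribute spellings are assumptions, not taken from the patch):
```
// Plain variant: asynchronously copy 16 bytes from global to shared memory.
nvvm.cp.async.shared.global %dst, %src, 16, cache = cg : !llvm.ptr<3>, !llvm.ptr<1>
// 'cpSize' variant: only %cpSize bytes are copied and the remainder of the
// 16-byte window is zero-filled.
nvvm.cp.async.shared.global %dst, %src, 16, cache = cg, %cpSize : !llvm.ptr<3>, !llvm.ptr<1>, i32
```
With this change, both forms lower to the corresponding llvm.nvvm.cp.async.* intrinsics.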

# 6dcb2a09 | 17-Jan-2025 | Durgadoss R <durgadossr@nvidia.com>
[MLIR][NVVM] Add Float to TF32 conversion Op (#123199)
PR #121507 added 'cvt' intrinsics to convert
float to tf32, with the valid set of rounding and
saturation modes. This PR adds an NVVM Dialect Op
for the same.
* Lit tests are added to verify the lowering to intrinsics.
* Negative tests are also added to check the error handling of invalid
combinations.
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
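
A rough sketch of what such a conversion could look like in NVVM dialect assembly; the mnemonic, attribute name, and result handling below are assumptions for illustration rather than the exact spelling introduced by the patch:
```
// Convert an f32 value to tf32 (carried in an i32), using a rounding mode
// selected via an attribute; invalid rounding/saturation combinations are
// rejected by the verifier.
%tf32 = nvvm.cvt.float.to.tf32 %x {rnd = #nvvm.fp_rnd_mode<rna>}
```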

# d03f35f9 | 13-Jan-2025 | xiaoleis-nv <99947620+xiaoleis-nv@users.noreply.github.com>
[MLIR][NVVM] Fix the datatype error for nvvm.mma.sync when the operand is bf16 (#122664)
The PR fixes the datatype error for `nvvm.mma.sync` when the operand is
`bf16`. This operation originally required the A/B type to be `f16x2`
for the `bf16` MMA. However, this violates the NVVM intrinsic definition
[here](https://github.com/xiaoleis-nv/llvm-project/blob/372044ee09d39942925824f8f335aef40bfe92f0/llvm/include/llvm/IR/IntrinsicsNVVM.td#L119),
where the A/B operand type should be `i32`. This is a bug, and there are
no tests in MLIR that cover this datatype.
```
// mma bf16 -> s32 @ m16n8k16/m16n8k8
!eq(gft,"m16n8k16:a:bf16") : !listsplat(llvm_i32_ty, 4),
!eq(gft,"m16n8k16:b:bf16") : !listsplat(llvm_i32_ty, 2),
!eq(gft,"m16n8k8:a:bf16") : !listsplat(llvm_i32_ty, 2),
!eq(gft,"m16n8k8:b:bf16") : [llvm_i32_ty],
```
This PR addresses this bug and adds tests to guarantee correctness.
Co-authored-by: Xiaolei Shi <xiaoleis@nvidia.com>

# 66e41a1a | 10-Jan-2025 | Guray Ozen <guray.ozen@gmail.com>
[MLIR][NVVM] Declare InferIntRangeInterface for RangeableRegisterOp (#122263)

# b4b819ce | 11-Dec-2024 | Durgadoss R <durgadossr@nvidia.com>
[MLIR][NVVM] Add Op for TMA Store with reduction (#118853)
PR #116854 adds intrinsics for TMA Store with reduction. This patch adds an NVVM Dialect Op for the same.
* Lit tests are added to verify the lowering to LLVM intrinsics, as well as the invalid cases.
* The common verifier method is updated to handle im2col modes without offsets. This helps Ops like TMA Store, TMA StoreReduce etc.
* The nvvmir.mlir test file is already large, so this patch adds the tests for this Op in a new file under a separate "nvvm/" directory (mlir/test/Target/LLVMIR/nvvm/tma_store_reduce.mlir).
PTX Spec reference:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cp-reduce-async-bulk-tensor
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>

# 81055ff0 | 20-Nov-2024 | arthurqiu <arthurq@nvidia.com>
[mlir][nvvm] Add attributes for cluster dimension PTX directives (#116973)
The PTX programming model provides cluster dimension directives, which are
leveraged by the downstream `ptxas` compiler. See
https://docs.nvidia.com/cuda/nvvm-ir-spec/#supported-properties and
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#cluster-dimension-directives
This PR introduces the cluster dimension directives to MLIR's NVVM
dialect as listed below:
```
cluster_dim_{x,y,z} -> exact number of CTAs per cluster
cluster_max_blocks -> max number of CTAs per cluster
```

# 1b23ebe0 | 15-Nov-2024 | Durgadoss R <durgadossr@nvidia.com>
[MLIR][NVVM] Add Op for TMA Prefetch (#116232)
PR #115527 adds intrinsics for TMA prefetch.
This patch adds an NVVM Dialect Op for the same.
Lit tests to verify the lowering to LLVM intrinsics as well as
verifier tests (for invalid cases) are added.
PTX Spec reference:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cp-async-bulk-prefetch-tensor
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
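
Purely as an illustration (the op mnemonic and assembly form below are assumptions patterned after the bulk-tensor copy op that appears later in this log, and are not confirmed by the commit message):
```
// Prefetch the tile addressed by the TMA descriptor and the box coordinates
// into the L2 cache.
nvvm.cp.async.bulk.tensor.prefetch %tmaDescriptor, box[%crd0, %crd1] : !llvm.ptr
```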

# 884221ed | 16-Sep-2024 | JOE1994 <joseph942010@gmail.com>
[mlir] Tidy uses of llvm::raw_string_ostream (NFC)
As specified in the docs, 1) raw_string_ostream is always unbuffered and 2) the underlying buffer may be used directly
(see 65b13610a5226b84889b923bae884ba395ad084d for further reference).
* Don't call raw_string_ostream::flush(), which is essentially a no-op.
* Avoid unneeded calls to raw_string_ostream::str(), to avoid excess indirection.

# ece65660 | 30-Aug-2024 | Pradeep Kumar <pradeepku@nvidia.com>
[MLIR][NVVM] Add support for fence.proxy.{acquire, release} Ops (#106689)

# 948862b2 | 04-Jul-2024 | bangyu shen <94283495+shubaoyu2@users.noreply.github.com>
[mlir][nvvm] Fix the verifier of `wgmma.mma_async` wrt transposed layouts (#97538)
WGMMA expects the layouts for A/B to be row/col; the transposed versions
should be col/row. When checking that other datatypes cannot use transposed
layouts, the verifier should reject col-major for A and row-major for B.

# db791b27 | 02-Jul-2024 | Ramkumar Ramachandra <ramkumar.ramachandra@codasip.com>
mlir/LogicalResult: move into llvm (#97309)
This patch is part of a project to move the Presburger library into LLVM.

# 0cc3fe46 | 28-Jun-2024 | Jeff Niu <jeff@modular.com>
Revert "[mlir][NVVM] Disallow results on kernel functions (#96399)" (#97074)
NVVM IR itself doesn't place any restriction that a function annotated
as `nvvm.kernel` actually has no results, so this
Revert "[mlir][NVVM] Disallow results on kernel functions (#96399)" (#97074)
NVVM IR itself doesn't place any restriction that a function annotated
as `nvvm.kernel` actually has no results, so this is a mismatch between the
NVVM dialect and NVVM IR. The GPU dialect might enforce that kernel
functions have no results, but it doesn't make sense to propagate this
constraint downstream.
Reverts llvm/llvm-project#96399

# 346c4a88 | 23-Jun-2024 | Matthias Springer <me@m-sp.org>
[mlir][NVVM] Disallow results on kernel functions (#96399)
Functions that have the `nvvm.kernel` attribute should have 0 results.

# 35d55f28 | 27-Mar-2024 | Justin Fargnoli <justinfargnoli@gmail.com>
[NFC][mlir] Reorder `declarePromisedInterface()` operands (#86628)
Reorder the template operands of `declarePromisedInterface()` to match
`declarePromisedInterfaces()`.

# 8819f879 | 19-Mar-2024 | Guray Ozen <guray.ozen@gmail.com>
[MLIR][NVVM] Add barrier.arrive (#85412)
The PR adds the `nvvm.barrier.arrive` Op. It is a useful op for producer-consumer
modeling.

# b5d694ba | 14-Feb-2024 | Guray Ozen <guray.ozen@gmail.com>
[mlir][nvvm] Introduce `nvvm.barrier` OP (#81487)
This PR introduces the `nvvm.barrier` OP to the NVVM dialect.
Currently, NVVM only supports `nvvm.barrier0`, which synchronizes
all threads using barrier resource 0.
The new `nvvm.barrier` has two essential arguments: the barrier resource
and the number of threads. This added flexibility allows for selective
synchronization of threads within a CTA, aligning with the capabilities
provided by LLVM intrinsics or the PTX model.
I think we can deprecate `nvvm.barrier0` in favor of the more generic
`nvvm.barrier`.
```
// Equivalent to nvvm.barrier0 (or __syncthreads() in CUDA)
nvvm.barrier
// Synchronize all threads using the 3rd barrier resource.
nvvm.barrier id = 3
// Synchronize %numberOfThreads threads using the 3rd barrier resource.
nvvm.barrier id = 3 number_of_threads = %numberOfThreads
```

# fa6850a9 | 12-Feb-2024 | Rishi Surendran <142182875+rishisurendran@users.noreply.github.com>
[mlir][nvvm] Add support for grid_constant attribute on LLVM function arguments (#78228)
Add support for the attribute nvvm.grid_constant on LLVM function arguments.
The attribute can be attached only to arguments of type llvm.ptr that
have the llvm.byval attribute.
Generate LLVM metadata for functions with nvvm.grid_constant arguments.
The metadata node is a list of integers, where each integer n denotes
that the nth parameter has the grid_constant annotation (numbering from 1).
The generated metadata node will be handled by the NVVM compiler. See
https://docs.nvidia.com/cuda/nvvm-ir-spec/index.html#supported-properties
for documentation on the grid_constant property.
This patch also adds convertParameterAttr to
LLVMTranslationDialectInterface to support the translation of
derived dialect attributes on function parameters.
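
A minimal sketch of what an annotated kernel might look like in the LLVM dialect, assuming `nvvm.grid_constant` is a unit attribute placed on a byval pointer argument (the argument and function names are placeholders):
```
// Hypothetical kernel with one byval pointer argument marked grid_constant.
llvm.func @kernel(%arg0: !llvm.ptr {llvm.byval = i32, nvvm.grid_constant}) attributes {nvvm.kernel} {
  llvm.return
}
```
During translation, such a function would receive a grid_constant metadata entry listing the 1-based indices of the annotated parameters (here, parameter 1).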

# 12c241b3 | 22-Jan-2024 | Guray Ozen <guray.ozen@gmail.com>
[MLIR][NVVM] Explicit Data Type for Output in `wgmma.mma_async` (#78713)
The current implementation of the `nvvm.wgmma.mma_async` Op deduces the data
type of the output matrix from the data type of the struct member, which can be
non-intuitive, especially in cases where types like `2xf16` are packed
into `i32`.
This PR addresses this issue by improving the Op to include an explicit
data type for the output matrix.
The modified Op now includes an explicit data type for Matrix-D (<f16>),
and looks as follows:
```
%result = llvm.mlir.undef : !llvm.struct<(struct<(i32, i32, ...
nvvm.wgmma.mma_async
%descA, %descB, %result,
#nvvm.shape<m = 64, n = 32, k = 16>,
D [<f16>, #nvvm.wgmma_scale_out<zero>],
A [<f16>, #nvvm.wgmma_scale_in<neg>, <col>],
B [<f16>, #nvvm.wgmma_scale_in<neg>, <col>]
```

# 2aec7083 | 09-Jan-2024 | Guray Ozen <guray.ozen@gmail.com>
[mlir][gpu] Use DenseI32Array for NVVM's maxntid and reqntid (NFC) (#77466)

# 85b23271 | 22-Dec-2023 | Adam Paszke <apaszke@google.com>
[mlir][nvvm] Fix the PTX lowering of wgmma.mma_async (#76150)

# 80ff67be | 04-Dec-2023 | Guray Ozen <guray.ozen@gmail.com>
[mlir][nvvm] Introduce `nvvm.fence.proxy` (#74057)
This PR introduces the `nvvm.fence.proxy` OP for the following cases:
```
nvvm.fence.proxy { kind = #nvvm.proxy_kind<alias>}
nvvm.fence.proxy { kind = #nvvm.proxy_kind<async>}
nvvm.fence.proxy { kind = #nvvm.proxy_kind<async.global>}
nvvm.fence.proxy { kind = #nvvm.proxy_kind<async.shared>, space = #nvvm.shared_space<cta>}
nvvm.fence.proxy { kind = #nvvm.proxy_kind<async.shared>, space = #nvvm.shared_space<cluster>}
```

# 68433f6b | 29-Nov-2023 | Guray Ozen <guray.ozen@gmail.com>
[mlir][nvvm] Introduce `setmaxregister.sync.aligned` Op (#73780)
This PR introduces the `setmaxregister.sync.aligned` Op to increase or
decrease the per-thread register count.
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#miscellaneous-instructions-setmaxnreg
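
For illustration only, the op might be used roughly as follows; the mnemonic and the increase/decrease spelling are assumptions, and the register counts are arbitrary examples:
```
// Raise the per-thread register ceiling for the warps in this warpgroup ...
nvvm.setmaxregister increase 232
// ... and release the extra registers again afterwards.
nvvm.setmaxregister decrease 40
```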

# 9ceea088 | 22-Nov-2023 | Guray Ozen <guray.ozen@gmail.com>
[mlir] `im2col` & `l2cache` on `cp.async.bulk.tensor.shared.cluster.global` (#72967)
The PR adds support for `im2col` and `l2cache` to
`cp.async.bulk.tensor.shared.cluster.global`. The Op now supports all
the traits of the corresponding PTX instruction.
The current structure of this operation looks somewhat like below. The
PR also simplifies types so we don't need to write obvious types after
`:` anymore.
```
nvvm.cp.async.bulk.tensor.shared.cluster.global
%dest, %tmaDescriptor, %barrier,
box[%crd0,%crd1,%crd2,%crd3,%crd4]
im2col[%off0,%off1,%off2] <-- PR introduces
multicast_mask = %ctamask
l2_cache_hint = %cacheHint <-- PR introduces
: !llvm.ptr<3>, !llvm.ptr
```

# 5316d19e | 19-Oct-2023 | Guray Ozen <guray.ozen@gmail.com>
[mlir][nvvm] Introduce `nvvm.stmatrix` Op (#69467)
This PR adds the `nvvm.stmatrix` Op to the NVVM dialect. The Op collectively
stores one or more matrices across all threads in a warp to the given
address location in shared memory.
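
A hypothetical usage sketch (the operand types and the layout attribute spelling are assumptions, not taken from the commit):
```
// Each thread contributes a 32-bit fragment %frag; collectively the warp
// stores one 8x8 matrix of 16-bit elements to shared memory at %ptr.
nvvm.stmatrix %ptr, %frag {layout = #nvvm.mma_layout<row>} : !llvm.ptr<3>, i32
```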