LowerGpuOpsToROCDLOps.cpp - OpenGrok history log for /llvm-project/mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp

Revision	Date	Author	Comments
# 599c7399	06-Jan-2025	Matthias Springer <me@m-sp.org>	[mlir][GPU] Add NVVM-specific `cf.assert` lowering (#120431) This commit add an NVIDIA-specific lowering of `cf.assert` to to `__assertfail`. Note: `getUniqueFormatGlobalName`, `getOrCreateForma [mlir][GPU] Add NVVM-specific `cf.assert` lowering (#120431) This commit add an NVIDIA-specific lowering of `cf.assert` to to `__assertfail`. Note: `getUniqueFormatGlobalName`, `getOrCreateFormatStringConstant` and `getOrDefineFunction` are moved to `GPUOpsLowering.h`, so that they can be reused. show more ...
# 0e23cb0c	31-Dec-2024	Ivan Butygin <ivan.butygin@gmail.com>	[mlir][nfc] GpuToROCDL: Remove some dead code (#121403)
# 018b32ca	31-Dec-2024	Ivan Butygin <ivan.butygin@gmail.com>	Revert "[mlir][nfc] GpuToROCDL: Remove some dead code" (#121402) Reverts llvm/llvm-project#121395
# 0b08e095	31-Dec-2024	Ivan Butygin <ivan.butygin@gmail.com>	[mlir][nfc] GpuToROCDL: Remove some dead code (#121395)
# 09dfc571	20-Dec-2024	Jacques Pienaar <jpienaar@google.com>	[mlir] Enable decoupling two kinds of greedy behavior. (#104649) The greedy rewriter is used in many different flows and it has a lot of convenience (work list management, debugging actions, tracin [mlir] Enable decoupling two kinds of greedy behavior. (#104649) The greedy rewriter is used in many different flows and it has a lot of convenience (work list management, debugging actions, tracing, etc). But it combines two kinds of greedy behavior 1) how ops are matched, 2) folding wherever it can. These are independent forms of greedy and leads to inefficiency. E.g., cases where one need to create different phases in lowering and is required to applying patterns in specific order split across different passes. Using the driver one ends up needlessly retrying folding/having multiple rounds of folding attempts, where one final run would have sufficed. Of course folks can locally avoid this behavior by just building their own, but this is also a common requested feature that folks keep on working around locally in suboptimal ways. For downstream users, there should be no behavioral change. Updating from the deprecated should just be a find and replace (e.g., `find ./ -type f -exec sed -i 's\|applyPatternsAndFoldGreedily\|applyPatternsGreedily\|g' {} \;` variety) as the API arguments hasn't changed between the two. show more ...
# 596bfb80	20-Nov-2024	Dragan Mladjenovic <Dragan.Mladjenovic@syrmia.com>	[MLIR][AMDGPU] Support gpu::ShuffleMode::DOWN lowering in ROCDL (#106237)
# 206fad0e	05-Oct-2024	Matthias Springer <me@m-sp.org>	[mlir][NFC] Mark type converter in `populate...` functions as `const` (#111250) This commit marks the type converter in `populate...` functions as `const`. This is useful for debugging. Patterns [mlir][NFC] Mark type converter in `populate...` functions as `const` (#111250) This commit marks the type converter in `populate...` functions as `const`. This is useful for debugging. Patterns already take a `const` type converter. However, some `populate...` functions do not only add new patterns, but also add additional type conversion rules. That makes it difficult to find the place where a type conversion was added in the code base. With this change, all `populate...` functions that only populate pattern now have a `const` type converter. Programmers can then conclude from the function signature that these functions do not register any new type conversion rules. Also some minor cleanups around the 1:N dialect conversion infrastructure, which did not always pass the type converter as a `const` object internally. show more ...
# 1c47fa9b	23-Sep-2024	Daniel Hernandez-Juarez <dhernandez0@gmail.com>	[mlir][AMDGPU] Add support for AMD f16 math library calls (#108809) In this PR we add support for AMD f16 math library calls (`__ocml_*_f16`) CC: @krzysz00 @manupak
# a16164d0	12-Sep-2024	Nirvedh Meshram <96096277+nirvedhmeshram@users.noreply.github.com>	[MLIR][ROCDL] Add dynamically legal ops to LowerGpuOpsToROCDLOpsPass (#108302) Similar to https://github.com/llvm/llvm-project/pull/108266 After https://github.com/llvm/llvm-project/pull/102971 It [MLIR][ROCDL] Add dynamically legal ops to LowerGpuOpsToROCDLOpsPass (#108302) Similar to https://github.com/llvm/llvm-project/pull/108266 After https://github.com/llvm/llvm-project/pull/102971 It is legal to generate `LLVM::ExpOp` and `LLVM::LogOp` if the type is is a float16 or float32 show more ...
# c31d3438	11-Sep-2024	Nirvedh Meshram <96096277+nirvedhmeshram@users.noreply.github.com>	Update legalizations for LowerGpuOpsToROCDLOps (#108266) LLVM::FAbsOp and LLVM::SqrtOp are legal after https://github.com/llvm/llvm-project/pull/102971
# 70302803	13-Aug-2024	Matthias Springer <me@m-sp.org>	[mlir][GPU] Improve `gpu.module` op implementation (#102866) - Replace hand-written parser/printer with auto-generated assembly format. - Remove implicit `gpu.module_end` terminator and use the `N [mlir][GPU] Improve `gpu.module` op implementation (#102866) - Replace hand-written parser/printer with auto-generated assembly format. - Remove implicit `gpu.module_end` terminator and use the `NoTerminator` trait instead. (Same as `builtin.module`.) - Turn the region into a graph region. (Same as `builtin.module`.) show more ...
# d45de800	09-Aug-2024	Victor Perez <victor.perez@codeplay.com>	[MLIR][GPU-LLVM] Convert `gpu.func` to `llvm.func` (#101664) Add support in `-convert-gpu-to-llvm-spv` to convert `gpu.func` to `llvm.func` operations. - `spir_kernel`/`spir_func` calling conven [MLIR][GPU-LLVM] Convert `gpu.func` to `llvm.func` (#101664) Add support in `-convert-gpu-to-llvm-spv` to convert `gpu.func` to `llvm.func` operations. - `spir_kernel`/`spir_func` calling conventions used for kernels/functions. - `workgroup` attributions encoded as additional `llvm.ptr<3>` arguments. - No attribute used to annotate kernels - `reqd_work_group_size` attribute using to encode `gpu.known_block_size`. - `llvm.mlir.workgroup_attrib_size` used to encode workgroup attribution sizes. This will be attached to the pointer argument workgroup attributions lower to. Note: A notable missing feature that will be addressed in a follow-up PR is a `-use-bare-ptr-memref-call-conv` option to replace MemRef arguments with bare pointers to the MemRef element types instead of the current MemRef descriptor approach. --------- Signed-off-by: Victor Perez <victor.perez@codeplay.com> show more ...
# 3fae5551	17-Jul-2024	Jan Leyonberg <jan_sjodin@yahoo.com>	[MLIR][ROCDL] Refactor conversion of math operations to ROCDL calls to a separate pass (#98653) This patch refactors the conversion of math operations to ROCDL library calls. This pass will also be [MLIR][ROCDL] Refactor conversion of math operations to ROCDL calls to a separate pass (#98653) This patch refactors the conversion of math operations to ROCDL library calls. This pass will also be used in flang to lower Fortran intrinsics/math functions for OpenMP target offloading codgen. show more ...
# 43fd4c49	18-Jun-2024	Krzysztof Drewniak <Krzysztof.Drewniak@amd.com>	[mlir][GPU] Improve handling of GPU bounds (#95166) This change reworks how range information for GPU dispatch IDs (block IDs, thread IDs, and so on) is handled. 1. `known_block_size` and `known [mlir][GPU] Improve handling of GPU bounds (#95166) This change reworks how range information for GPU dispatch IDs (block IDs, thread IDs, and so on) is handled. 1. `known_block_size` and `known_grid_size` become inherent attributes of GPU functions. This makes them less clunky to work with. As a consequence, the `gpu.func` lowering patterns now only look at the inherent attributes when setting target-specific attributes on the `llvm.func` that they lower to. 2. At the same time, `gpu.known_block_size` and `gpu.known_grid_size` are made official dialect-level discardable attributes which can be placed on arbitrary functions. This allows for progressive lowerings (without this, a lowering for `gpu.thread_id` couldn't know about the bounds if it had already been moved from a `gpu.func` to an `llvm.func`) and allows for range information to be provided even when `gpu._{id,dim}` are being used outside of a `gpu.func` context. 3. All of these index operations have gained an optional `upper_bound` attribute, allowing for an alternate mode of operation where the bounds are specified locally and not inherited from the operation's context. These also allow handling of cases where the precise launch sizes aren't known, but can be bounded more precisely than the maximum of what any platform's API allows. (I'd like to thank @benvanik for pointing out that this could be useful.) When inferring bounds (either for range inference or for setting `range` during lowering) these sources of information are consulted in order of specificity (`upper_bound` > inherent attribute > discardable attribute, except that dimension sizes check for `known__bounds` to see if they can be constant-folded before checking their `upper_bound`). This patch also updates the documentation about the bounds and inference behavior to clarify what these attributes do when set and the consequences of setting them up incorrectly. --------- Co-authored-by: Mehdi Amini <joker.eph@gmail.com> show more ...
# 94be8018	28-May-2024	stefankoncarevic <skoncare@amd.com>	[mlir][ROCDL] Update the LLVM data layout for ROCDL lowering. (#92127) This change updates the dataLayout string to ensure alignment with the latest LLVM TargetMachine configuration. The aim is to [mlir][ROCDL] Update the LLVM data layout for ROCDL lowering. (#92127) This change updates the dataLayout string to ensure alignment with the latest LLVM TargetMachine configuration. The aim is to maintain consistency and prevent potential compilation issues related to memory address space handling. show more ...
# 4cba5957	27-Feb-2024	Krzysztof Drewniak <Krzysztof.Drewniak@amd.com>	[mlir][ROCDL] Set the LLVM data layout when lowering to ROCDL LLVM (#74501) In order to ensure operations lower correctly (especially memref.addrspacecast, which relies on the data layout benig set [mlir][ROCDL] Set the LLVM data layout when lowering to ROCDL LLVM (#74501) In order to ensure operations lower correctly (especially memref.addrspacecast, which relies on the data layout benig set correctly then dealing with dynamic memrefs) and to prevent compilation issues later down the line, set the `llvm.data_layout` attribute on GPU modules when lowering their contents to a ROCDL / AMDGPU target. If there's a good way to test the embedded string to prevent it from going out of sync with the LLVM TargetMachine, I'd appreciate hearing about it. (Or, alternatively, if there's a place I could farctor the string out to). show more ...
# 45c226d4	20-Feb-2024	Mehdi Amini <joker.eph@gmail.com>	[MLIR] Add ODS support for generating helpers for dialect (discardable) attributes (#77024) This is a new ODS feature that allows dialects to define a list of key/value pair representing an attribu [MLIR] Add ODS support for generating helpers for dialect (discardable) attributes (#77024) This is a new ODS feature that allows dialects to define a list of key/value pair representing an attribute type and a name. This will generate helper classes on the dialect to be able to manage discardable attributes on operations in a type safe way. For example the `test` dialect can define: ``` let discardableAttrs = (ins "mlir::IntegerAttr":$discardable_attr_key, ); ``` And the following will be generated in the TestDialect class: ``` /// Helper to manage the discardable attribute `discardable_attr_key`. class DiscardableAttrKeyAttrHelper { ::mlir::StringAttr name; public: static constexpr ::llvm::StringLiteral getNameStr() { return "test.discardable_attr_key"; } constexpr ::mlir::StringAttr getName() { return name; } DiscardableAttrKeyAttrHelper(::mlir::MLIRContext ctx) : name(::mlir::StringAttr::get(ctx, getNameStr())) {} mlir::IntegerAttr getAttr(::mlir::Operation op) { return op->getAttrOfType<mlir::IntegerAttr>(name); } void setAttr(::mlir::Operation op, mlir::IntegerAttr val) { op->setAttr(name, val); } bool isAttrPresent(::mlir::Operation op) { return op->hasAttrOfType<mlir::IntegerAttr>(name); } void removeAttr(::mlir::Operation *op) { assert(op->hasAttrOfType<mlir::IntegerAttr>(name)); op->removeAttr(name); } }; DiscardableAttrKeyAttrHelper getDiscardableAttrKeyAttrHelper() { return discardableAttrKeyAttrName; } ``` User code having an instance of the TestDialect can then manipulate this attribute on operation using: ``` auto helper = testDialect.getDiscardableAttrKeyAttrHelper(); helper.setAttr(op, value); helper.isAttrPresent(op); ... ``` show more ...
# 65066c02	01-Feb-2024	Hugo Trachino <32955781+nujaa@users.noreply.github.com>	[mlir] Use `create` instead of `createOrFold` for ConstantOp as folding has no effect (NFC) (#80129) This aims to clean-up confusing uses of builder.createOrFold<ConstantOp> since folding of consta [mlir] Use `create` instead of `createOrFold` for ConstantOp as folding has no effect (NFC) (#80129) This aims to clean-up confusing uses of builder.createOrFold<ConstantOp> since folding of constants fails. show more ...
# 391a7577	05-Dec-2023	Guray Ozen <guray.ozen@gmail.com>	[mlir][gpu] Add lowering dynamic_shared_memory op for rocdl (#74473) This PR adds lowering of `gpu.dynamic_shared_memory` to rocdl target.
# 4279a642	01-Nov-2023	Christian Ulmann <christian.ulmann@nextsilicon.com>	[MLIR][GPUToROCDL] Remove typed pointer support (#70908) This commit removes the support for lowering GPU to ROCDL dialect with typed pointers. Typed pointers have been deprecated for a while now a [MLIR][GPUToROCDL] Remove typed pointer support (#70908) This commit removes the support for lowering GPU to ROCDL dialect with typed pointers. Typed pointers have been deprecated for a while now and it's planned to soon remove them from the LLVM dialect. Related PSA: https://discourse.llvm.org/t/psa-removal-of-typed-pointers-from-the-llvm-dialect/74502 show more ...
# baf2d135	04-Sep-2023	Adrian Kuegel <akuegel@google.com>	[mlir][GPUToROCDL] Lower arith.remf to GPU intrinsic. Differential Revision: https://reviews.llvm.org/D159423
# 18960960	24-Aug-2023	Stanley Winata <stanley@nod-labs.com>	[mlir][ROCM] Add Wave/Warp shuffle lowering and op for ROCM. Reduction is heavily used for many DL workload especially with softmax/Attention layers. Wave/Warp shuffle and reduction is known to be a [mlir][ROCM] Add Wave/Warp shuffle lowering and op for ROCM. Reduction is heavily used for many DL workload especially with softmax/Attention layers. Wave/Warp shuffle and reduction is known to be a speedy/efficient way to do these reductions. In this patch we introduce AMD shuffle intrinsic Ops to ROCDL, along with it's corresponding lowering from gpu.shuffle. This should speed up a lot of DL workloads on ROCM backend. Currently, we have support for xor and idx, which are the more common ones. In the future, we plan on adding support for Down and Up, as well as using the ds_swizzle to further enhance it's performance when width and offsets are constant. Reviewed By: antiagainst Differential Revision: https://reviews.llvm.org/D158684 show more ...
# 51b65d08	20-Jul-2023	Krzysztof Drewniak <Krzysztof.Drewniak@amd.com>	[mlir][AMDGPU] Improve BF16 handling through AMDGPU compilation Many previous sets of AMDGPU dialect code have been incorrect in the presence of the bf16 type (when lowered to LLVM's bfloat) as they [mlir][AMDGPU] Improve BF16 handling through AMDGPU compilation Many previous sets of AMDGPU dialect code have been incorrect in the presence of the bf16 type (when lowered to LLVM's bfloat) as they were developed in a setting that run a custom bf16-to-i16 pass before LLVM lowering. An overall effect of this patch is that you should run --arith-emulate-unsupported-floats="source-types=bf16 target-type=f32" on your GPU module before calling --convert-gpu-to-rocdl if your code performs bf16 arithmetic. While LLVM now supports software bfloat, initial experiments showed that using this support on AMDGPU inserted a large number of conversions around loads and stores which had substantial performance imparts. Furthermore, all of the native AMDGPU operations on bf16 types (like the WMMA operations) operate on 16-bit integers instead of the bfloat type. First, we make the following changes to preserve compatibility once the LLVM bfloat type is reenabled. 1. The matrix multiplication operations (MFMA and WMMA) will bitcast bfloat vectors to i16 vectors. 2. Buffer loads and stores will operate on the relevant integer datatype and then cast to bfloat if needed. Second, we add type conversions to convert bf16 and vectors of it to equivalent i16 types. Third, we add the bfloat <-> f32 expansion patterns to the set of operations run before the main LLVM conversion so that MLIR's implementation of these conversion routines is used. Finally, we extend the "floats treated as integers" support in the LLVM exporter to handle types other than fp8. We also fix a bug in the unsupported floats emulation where it tried to operate on `arith.bitcast` due to an oversight. Reviewed By: rsuderman Differential Revision: https://reviews.llvm.org/D156361 show more ...
# 888717e8	09-Aug-2023	Nicolas Vasilache <nicolasvasilache@users.noreply.github.com>	[mlir][transform] Enable gpu-to-nvvm via conversion patterns driven by TD This revision untangles a few more conversion pieces and allows rewriting the relatively intricate (and somewhat inconsisten [mlir][transform] Enable gpu-to-nvvm via conversion patterns driven by TD This revision untangles a few more conversion pieces and allows rewriting the relatively intricate (and somewhat inconsistent) LowerGpuOpsToNVVMOpsPass in a declarative fashion that provides a much better understanding and control. Differential Revision: https://reviews.llvm.org/D157617 show more ...
# cdf7ca6d	26-Jul-2023	SJW <swaters@amd.com>	[MLIR][ROCDL] Add conversion for gpu.lane_id to ROCDL Creates rocdl.lane_id op with llvm conversion to: __device__ static unsigned int __lane_id() { return __builtin_amdgcn_mbcnt_hi( [MLIR][ROCDL] Add conversion for gpu.lane_id to ROCDL Creates rocdl.lane_id op with llvm conversion to: __device__ static unsigned int __lane_id() { return __builtin_amdgcn_mbcnt_hi( -1, __builtin_amdgcn_mbcnt_lo(-1, 0)); } Reviewed By: krzysz00 Differential Revision: https://reviews.llvm.org/D154666 show more ...
12 3 4 5