History log of /llvm-project/mlir/lib/Conversion/ArithToAMDGPU/ArithToAMDGPU.cpp (Results 1 – 12 of 12)
Revision Date Author Comments
Revision tags: llvmorg-21-init
# 7a77f14c 20-Jan-2025 Matthias Springer <me@m-sp.org>

[mlir][IR] Remove `isF...()` type API for low-precision FP types (#123326)

Remove `type.isFloat4E2M1FN()` etc. Use `isa<Float4E2M1FNType>(type)`
instead.

For details, see:
https://discourse.llvm.org/t/rethink-on-approach-to-low-precision-fp-types/82361/28


Revision tags: llvmorg-19.1.7
# 09dfc571 20-Dec-2024 Jacques Pienaar <jpienaar@google.com>

[mlir] Enable decoupling two kinds of greedy behavior. (#104649)

The greedy rewriter is used in many different flows and it has a lot of
convenience (work list management, debugging actions, tracing, etc). But
it combines two kinds of greedy behavior 1) how ops are matched, 2)
folding wherever it can.

These are independent forms of greediness, and combining them leads to
inefficiency, e.g., in cases where one needs to create different phases
in lowering and is required to apply patterns in a specific order split
across different passes. Using the driver, one ends up needlessly
retrying folding or having multiple rounds of folding attempts, where
one final run would have sufficed.

Of course folks can locally avoid this behavior by just building their
own, but this is also a commonly requested feature that folks keep on
working around locally in suboptimal ways.

For downstream users, there should be no behavioral change. Updating
from the deprecated API should just be a find and replace (e.g., of the
`find ./ -type f -exec sed -i
's|applyPatternsAndFoldGreedily|applyPatternsGreedily|g' {} \;` variety)
as the API arguments haven't changed between the two.


Revision tags: llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4
# 8e663039 13-Nov-2024 Kunwar Grover <groverkss@gmail.com>

[mlir][Vector] Remove trivial uses of vector.extractelement/vector.insertelement (1/N) (#116053)

This patch removes trivial usages of
vector.extractelement/vector.insertelement. These operations can be
fully represented by vector.extract/vector.insert. See
https://discourse.llvm.org/t/rfc-psa-remove-vector-extractelement-and-vector-insertelement-ops-in-favor-of-vector-extract-and-vector-insert-ops/71116
for more information.

Further patches will remove more usages of these ops.


Revision tags: llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0
# 763bc924 09-Sep-2024 Jakub Kuderski <jakub@nod-labs.com>

[mlir][amdgpu] Align Chipset with TargetParser (#107720)

Update the Chipset struct to follow the `IsaVersion` definition from
llvm's `TargetParser`. This is a follow up to
https://github.com/llvm/llvm-project/pull/106169#discussion_r1733955012.

* Add the stepping version. Note: This may break downstream code that
compares against the minor version directly.
* Use comparisons with full Chipset version where possible.

Note that we can't use the code in `TargetParser` directly because the
chipset utility is outside of `mlir/Target` that re-exports llvm's
target library.
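
The downstream breakage mentioned in the first bullet can be illustrated with a minimal, self-contained sketch. `ChipsetVersion`, `isAtLeast`, and `minorOnlyAtLeast` are hypothetical stand-ins, not the actual MLIR `Chipset` API, and the version numbers are illustrative only. The sketch compares the full (major, minor, stepping) triple lexicographically and contrasts that with a check that looks at the minor field alone.

```cpp
#include <tuple>

// Hypothetical stand-in for a chipset version; NOT the MLIR `Chipset` struct.
struct ChipsetVersion {
  unsigned majorVersion;
  unsigned minorVersion;
  unsigned steppingVersion;
};

// Lexicographic comparison over the full (major, minor, stepping) triple.
inline bool operator<(const ChipsetVersion &a, const ChipsetVersion &b) {
  return std::tie(a.majorVersion, a.minorVersion, a.steppingVersion) <
         std::tie(b.majorVersion, b.minorVersion, b.steppingVersion);
}

// Full-version check: true when `chip` is the same as or newer than `minimum`.
inline bool isAtLeast(const ChipsetVersion &chip, const ChipsetVersion &minimum) {
  return !(chip < minimum);
}

// Fragile check that inspects only the minor field. It ignores both the
// major version and the stepping, so it can give the wrong answer.
inline bool minorOnlyAtLeast(const ChipsetVersion &chip, unsigned minimumMinor) {
  return chip.minorVersion >= minimumMinor;
}
```

With these definitions, a chip at version 10.0.0 passes `isAtLeast` against 9.4.2 but fails `minorOnlyAtLeast(chip, 4)`, which is the kind of mismatch that comparing against the full version avoids.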


Revision tags: llvmorg-19.1.0-rc4
# 1387ba48 26-Aug-2024 Giuseppe Rossini <giuseppe.rossini@amd.com>

[MLIR][AMDGPU] Introduce fp16 packed arithmetic (#105688)

This PR introduces rocdl.cvt.pkrtz in the ROCDL dialect and uses that
instruction when lowering `arith::TruncFOp`.


Revision tags: llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init
# f35318e8 09-Jul-2024 Rob Suderman <rob.suderman@gmail.com>

[mlir][amdgpu] Add support for multi-dim arith.truncf/extf fp8 lowering (#98074)

The existing `fp8` lowering from `arith` to `amdgpu` bails out on the
multidimensional case. We can handle this by using `vector.shape_cast`
to collapse to the 1-D case on extraction and then casting back to the
desired output shape.
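
The collapse-then-restore idea can be sketched outside of MLIR with plain C++. This is an analogy under stated assumptions, not the pass's implementation: `ShapedBuffer`, `collapsedLength`, `applyVia1D`, and `halve` are hypothetical names, and a row-major flat buffer stands in for a vector value.

```cpp
#include <cstdint>
#include <functional>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical N-D value: a shape plus row-major flat storage.
struct ShapedBuffer {
  std::vector<int64_t> shape;
  std::vector<float> data;
};

// Length of the collapsed 1-D view: the product of all dimensions.
inline int64_t collapsedLength(const std::vector<int64_t> &shape) {
  return std::accumulate(shape.begin(), shape.end(), int64_t{1},
                         std::multiplies<int64_t>());
}

// Example element-wise operation, standing in for a conversion that is
// only implemented for the 1-D case.
inline float halve(float x) { return x * 0.5f; }

// Collapse to 1-D, run the 1-D-only operation, then restore the shape.
inline ShapedBuffer applyVia1D(const ShapedBuffer &in, float (*op)(float)) {
  // "Cast" to a rank-1 shape; the underlying data is unchanged.
  ShapedBuffer flat{{collapsedLength(in.shape)}, in.data};
  for (float &value : flat.data)
    value = op(value);
  // "Cast" back to the original N-D shape.
  return ShapedBuffer{in.shape, std::move(flat.data)};
}
```

The point of the design is that the 1-D code path stays the only place where the actual conversion logic lives; the multi-dimensional case only adds two reshapes around it.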


# db791b27 02-Jul-2024 Ramkumar Ramachandra <ramkumar.ramachandra@codasip.com>

mlir/LogicalResult: move into llvm (#97309)

This patch is part of a project to move the Presburger library into
LLVM.


Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5
# a5757c5b 19-Apr-2024 Christian Sigg <chsigg@users.noreply.github.com>

Switch member calls to `isa/dyn_cast/cast/...` to free function calls. (#89356)

This change cleans up call sites. Next step is to mark the member
functions deprecated.

See https://mlir.llvm.org/deprecation and
https://discourse.llvm.org/t/preferred-casting-style-going-forward.


Revision tags: llvmorg-18.1.4, llvmorg-18.1.3
# 8827ff92 01-Apr-2024 Victor Perez <victor.perez@codeplay.com>

[MLIR][Arith] Add rounding mode attribute to `truncf` (#86152)

Add rounding mode attribute to `arith`. This attribute can be used in
different FP `arith` operations to control rounding mode. Rounding modes
correspond to IEEE 754-specified rounding modes. Use in `arith.truncf` folding.

As this is not supported in dialects other than LLVM, conversion should fail for
now in case this attribute is present.

---------

Signed-off-by: Victor Perez <victor.perez@codeplay.com>


Revision tags: llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2
# 65066c02 01-Feb-2024 Hugo Trachino <32955781+nujaa@users.noreply.github.com>

[mlir] Use `create` instead of `createOrFold` for ConstantOp as folding has no effect (NFC) (#80129)

This aims to clean up confusing uses of
builder.createOrFold<ConstantOp> since folding of constants fails.


Revision tags: llvmorg-18.1.0-rc1, llvmorg-19-init
# 750e90e4 23-Jan-2024 Krzysztof Drewniak <Krzysztof.Drewniak@amd.com>

[mlir][ArithToAMDGPU] Add option for saturating truncation to fp8 (#74153)

Many machine-learning applications (and most software written at AMD)
expect the operation that truncates floats to 8-bit floats to be
saturating. That is, they expect `truncf 256.0 : f32 to f8E4M3FNUZ` to
yield `240.0`, not `NaN`, and similarly for negative numbers. However,
the underlying hardware instruction that can be used for this truncation
implements overflow-to-NaN semantics.

To enable handling this use case, we add the saturate-fp8-truncf option
to ArithToAMDGPU (off by default), which causes the requisite clamping
code to be emitted. Said clamping code ensures that Inf and NaN are
passed through exactly (and thus truncate to NaN).

Per review feedback, this commit refactors
createScalarOrSplatConstant() to the Arith dialect utilities and uses
it in this code. It also fixes naming of existing patterns and
switches from vector.extractelement/insertelement to
vector.extract/insert.
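
The clamping behavior described above can be sketched in scalar C++. This is a minimal illustration, not the code the pass emits: the function and constant names are hypothetical, and the bound of 240.0 is taken from the `f8E4M3FNUZ` example in this message.

```cpp
#include <cmath>

// Assumed largest finite magnitude of the target fp8 format, taken from the
// commit message's example (256.0 saturates to 240.0). Hypothetical name.
constexpr float kAssumedFp8MaxFinite = 240.0f;

// Clamp a float before an overflow-to-NaN truncation so that finite
// out-of-range inputs saturate instead of becoming NaN.
// Inf and NaN are returned unchanged, so the later truncation still maps
// them to NaN.
inline float clampBeforeFp8Truncation(float x) {
  if (std::isnan(x) || std::isinf(x))
    return x;
  return std::fmin(std::fmax(x, -kAssumedFp8MaxFinite), kAssumedFp8MaxFinite);
}
```

For example, `clampBeforeFp8Truncation(256.0f)` yields `240.0f`, while an infinite input is passed through untouched.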


Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4
# 2ebd633f 12-May-2023 Krzysztof Drewniak <Krzysztof.Drewniak@amd.com>

[mlir][AMDGPU] Add packed 8-bit float conversion ops and lowering

Define operations that wrap the gfx940's new operations for converting
between f32 and registers containing packed sets of four 8-bit floats.

Define rocdl operations for the intrinsics and an AMDGPU dialect
wrapper around them (to account for the fact that MLIR distinguishes
the two float formats at the type level but that the LLVM IR does
not).

Define an ArithToAMDGPU pass, meant to run before conversion to LLVM,
that replaces relevant calls to arith.extf and arith.truncf with the
packed operations in the AMDGPU dialect. Note that the conversion
currently only handles scalars and vectors of rank <= 1, as we do not
have a use case for multi-dimensional vector support right now.

Reviewed By: jsjodin

Differential Revision: https://reviews.llvm.org/D152457
