History log of /llvm-project/flang/runtime/matmul.cpp (Results 1 – 15 of 15)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-21-init, llvmorg-19.1.7
# fc97d2e6 18-Dec-2024 Peter Klausler <pklausler@nvidia.com>

[flang] Add UNSIGNED (#113504)

Implement the UNSIGNED extension type and operations under control of a
language feature flag (-funsigned).

This is nearly identical to the UNSIGNED feature that h

[flang] Add UNSIGNED (#113504)

Implement the UNSIGNED extension type and operations under control of a
language feature flag (-funsigned).

This is nearly identical to the UNSIGNED feature that has been available
in Sun Fortran for years, and now implemented in GNU Fortran for
gfortran 15, and proposed for ISO standardization in J3/24-116.txt.

See the new documentation for details; but in short, this is C's
unsigned type, with guaranteed modular arithmetic for +, -, and *, and
the related transformational intrinsic functions SUM & al.

show more ...


Revision tags: llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1
# 104f3c18 19-Sep-2024 Slava Zakharin <szakharin@nvidia.com>

Reland "[flang][runtime] Use cuda::std::complex in F18 runtime CUDA build. (#109078)" (#109207)

`std::complex` operators do not work for the CUDA device compilation
of F18 runtime. This change make

Reland "[flang][runtime] Use cuda::std::complex in F18 runtime CUDA build. (#109078)" (#109207)

`std::complex` operators do not work for the CUDA device compilation
of F18 runtime. This change makes use of `cuda::std::complex` from
`libcudacxx`.
`cuda::std::complex` does not have specializations for `long double`,
so the change is accompanied with a clean-up for `long double` usage.

Additional change on top of #109078 is to use `cuda::std::complex`
only for the device compilation, otherwise the host compilation
fails because `libcudacxx` may not support `long double` specialization
at all (depending on the compiler).

show more ...


# 36192fdf 18-Sep-2024 Slava Zakharin <szakharin@nvidia.com>

Revert "[flang][runtime] Use cuda::std::complex in F18 runtime CUDA build." (#109173)

Reverts llvm/llvm-project#109078


# be187a68 18-Sep-2024 Slava Zakharin <szakharin@nvidia.com>

[flang][runtime] Use cuda::std::complex in F18 runtime CUDA build. (#109078)

`std::complex` operators do not work for the CUDA device compilation
of F18 runtime. This change makes use of `cuda::std

[flang][runtime] Use cuda::std::complex in F18 runtime CUDA build. (#109078)

`std::complex` operators do not work for the CUDA device compilation
of F18 runtime. This change makes use of `cuda::std::complex` from
`libcudacxx`.
`cuda::std::complex` does not have specializations for `long double`,
so the change is accompanied with a clean-up for `long double` usage.

show more ...


Revision tags: llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init
# 8ce1aed5 04-Jul-2024 Slava Zakharin <szakharin@nvidia.com>

[flang] Lower MATMUL to type specific runtime calls. (#97547)

Lower MATMUL to the new runtime entries added in #97406.


# dd220853 03-Jul-2024 Slava Zakharin <szakharin@nvidia.com>

[flang][runtime] Split MATMUL[_TRANSPOSE] into separate entries. (#97406)

Device compilation is much faster for separate MATMUL[_TRANPOSE]
entries than for a single one that covers all data types.

[flang][runtime] Split MATMUL[_TRANSPOSE] into separate entries. (#97406)

Device compilation is much faster for separate MATMUL[_TRANPOSE]
entries than for a single one that covers all data types.
The lowering changes and the removal of the generic entries will follow.

show more ...


# e55aa027 27-Jun-2024 Pete Steinfeld <47540744+psteinfeld@users.noreply.github.com>

[flang] Fix runtime error messages for the MATMUL intrinsic (#96928)

There are three forms of MATMUL -- where the first argument is a rank 1
array, where the second argument is a rank 1 array, and

[flang] Fix runtime error messages for the MATMUL intrinsic (#96928)

There are three forms of MATMUL -- where the first argument is a rank 1
array, where the second argument is a rank 1 array, and where both
arguments are rank 2 arrays. There's code in the runtime that detects
when the array shapes are incorrect. But the code that emits an error
message assumes that both arguments are rank 2 arrays.

This change contains code for the other two cases.

show more ...


Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2
# 71e0261f 15-Mar-2024 Slava Zakharin <szakharin@nvidia.com>

[flang][runtime] Added Fortran::common::optional for use on device.

This is a simplified implementation of std::optional that can be used
in the offload builds for the device code. The methods are p

[flang][runtime] Added Fortran::common::optional for use on device.

This is a simplified implementation of std::optional that can be used
in the offload builds for the device code. The methods are properly
marked with RT_API_ATTRS so that the device compilation succedes.

Reviewers: klausler, jeanPerier

Reviewed By: jeanPerier

Pull Request: https://github.com/llvm/llvm-project/pull/85177

show more ...


Revision tags: llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init
# 76facde3 28-Dec-2023 Slava Zakharin <szakharin@nvidia.com>

[flang][runtime] Enable more APIs in the offload build. (#76486)


# b4b23ff7 20-Dec-2023 Slava Zakharin <szakharin@nvidia.com>

[flang][runtime] Enable more APIs in the offload build. (#75996)

This patch enables more numeric (mod, sum, matmul, etc.) APIs,
and some others.

I added new macros to disable warnings about usin

[flang][runtime] Enable more APIs in the offload build. (#75996)

This patch enables more numeric (mod, sum, matmul, etc.) APIs,
and some others.

I added new macros to disable warnings about using C++ STD methods
like operators of std::complex, which do not have __device__ attribute.
This may probably result in unresolved references, if the header files
implementation relies on libstdc++. I will need to follow up on this.

show more ...


Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4
# 4d977174 29-Aug-2023 Slava Zakharin <szakharin@nvidia.com>

[flang] Improved performance of runtime Matmul/MatmulTranspose.

This patch mostly affects performance of the code produced by
HLIFR lowering. If MATMUL argument is an array slice, then
HLFIR lowerin

[flang] Improved performance of runtime Matmul/MatmulTranspose.

This patch mostly affects performance of the code produced by
HLIFR lowering. If MATMUL argument is an array slice, then
HLFIR lowering passes the slice to the runtime, whereas
FIR lowering would create a contiguous temporary for the slice.
Performance might be better than the generic implementation
for cases where the leading dimension is contiguous.
This patch improves CPU2000/178.galgel making HLFIR version
faster than FIR version (due to avoiding the temporary copies
for MATMUL arguments).

Reviewed By: klausler

Differential Revision: https://reviews.llvm.org/D159134

show more ...


Revision tags: llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3
# f5884fd9 12-Aug-2022 Peter Klausler <pklausler@nvidia.com>

[flang][runtime] Improve error message for incompatible MATMUL arguments

Print the full shapes of both argument when the dimensions that must
match do not do so.

Differential Revision: https://revi

[flang][runtime] Improve error message for incompatible MATMUL arguments

Print the full shapes of both argument when the dimensions that must
match do not do so.

Differential Revision: https://reviews.llvm.org/D132153

show more ...


Revision tags: llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1
# a5a493e1 19-Oct-2021 Peter Klausler <pklausler@nvidia.com>

[flang] Speed common runtime cases of DOT_PRODUCT & MATMUL

Look for contiguous numeric argument arrays at runtime and
use specialized code for them.

Differential Revision: https://reviews.llvm.org/

[flang] Speed common runtime cases of DOT_PRODUCT & MATMUL

Look for contiguous numeric argument arrays at runtime and
use specialized code for them.

Differential Revision: https://reviews.llvm.org/D112239

show more ...


Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3
# 830c0b90 01-Sep-2021 Peter Klausler <pklausler@nvidia.com>

[flang] Move runtime API headers to flang/include/flang/Runtime

Move the closure of the subset of flang/runtime/*.h header files that
are referenced by source files outside flang/runtime (apart from

[flang] Move runtime API headers to flang/include/flang/Runtime

Move the closure of the subset of flang/runtime/*.h header files that
are referenced by source files outside flang/runtime (apart from unit tests)
into a new directory (flang/include/flang/Runtime) so that relative
include paths into ../runtime need not be used.

flang/runtime/pgmath.h.inc is moved to flang/include/flang/Evaluate;
it's not used by the runtime.

Differential Revision: https://reviews.llvm.org/D109107

show more ...


Revision tags: llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1
# 5e1421b2 17-May-2021 peter klausler <pklausler@nvidia.com>

[flang] Implement MATMUL in the runtime

Define an API for the transformational intrinsic function MATMUL,
implement it, and add some basic unit tests. The large number of
possible argument type com

[flang] Implement MATMUL in the runtime

Define an API for the transformational intrinsic function MATMUL,
implement it, and add some basic unit tests. The large number of
possible argument type combinations are covered by a set of
generalized templates that are instantiated for each valid
pair of possible argument types.

Places where BLAS-2/3 routines could be called for acceleration
are marked with TODOs. Handling for other special cases (e.g.,
known-shape 3x3 matrices and vectors) are deferred.

Some minor tweaks were made to the recent related implementation
of DOT_PRODUCT to reflect lessons learned.

Differential Revision: https://reviews.llvm.org/D102652

show more ...