Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1 |
|
#
104f3c18 |
| 19-Sep-2024 |
Slava Zakharin <szakharin@nvidia.com> |
Reland "[flang][runtime] Use cuda::std::complex in F18 runtime CUDA build. (#109078)" (#109207)
`std::complex` operators do not work for the CUDA device compilation
of F18 runtime. This change make
Reland "[flang][runtime] Use cuda::std::complex in F18 runtime CUDA build. (#109078)" (#109207)
`std::complex` operators do not work for the CUDA device compilation
of F18 runtime. This change makes use of `cuda::std::complex` from
`libcudacxx`.
`cuda::std::complex` does not have specializations for `long double`,
so the change is accompanied with a clean-up for `long double` usage.
Additional change on top of #109078 is to use `cuda::std::complex`
only for the device compilation, otherwise the host compilation
fails because `libcudacxx` may not support `long double` specialization
at all (depending on the compiler).
show more ...
|
#
36192fdf |
| 18-Sep-2024 |
Slava Zakharin <szakharin@nvidia.com> |
Revert "[flang][runtime] Use cuda::std::complex in F18 runtime CUDA build." (#109173)
Reverts llvm/llvm-project#109078
|
#
be187a68 |
| 18-Sep-2024 |
Slava Zakharin <szakharin@nvidia.com> |
[flang][runtime] Use cuda::std::complex in F18 runtime CUDA build. (#109078)
`std::complex` operators do not work for the CUDA device compilation
of F18 runtime. This change makes use of `cuda::std
[flang][runtime] Use cuda::std::complex in F18 runtime CUDA build. (#109078)
`std::complex` operators do not work for the CUDA device compilation
of F18 runtime. This change makes use of `cuda::std::complex` from
`libcudacxx`.
`cuda::std::complex` does not have specializations for `long double`,
so the change is accompanied with a clean-up for `long double` usage.
show more ...
|
Revision tags: llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init |
|
#
8ce1aed5 |
| 04-Jul-2024 |
Slava Zakharin <szakharin@nvidia.com> |
[flang] Lower MATMUL to type specific runtime calls. (#97547)
Lower MATMUL to the new runtime entries added in #97406.
|
#
dd220853 |
| 03-Jul-2024 |
Slava Zakharin <szakharin@nvidia.com> |
[flang][runtime] Split MATMUL[_TRANSPOSE] into separate entries. (#97406)
Device compilation is much faster for separate MATMUL[_TRANPOSE]
entries than for a single one that covers all data types.
[flang][runtime] Split MATMUL[_TRANSPOSE] into separate entries. (#97406)
Device compilation is much faster for separate MATMUL[_TRANPOSE]
entries than for a single one that covers all data types.
The lowering changes and the removal of the generic entries will follow.
show more ...
|
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2 |
|
#
71e0261f |
| 15-Mar-2024 |
Slava Zakharin <szakharin@nvidia.com> |
[flang][runtime] Added Fortran::common::optional for use on device.
This is a simplified implementation of std::optional that can be used in the offload builds for the device code. The methods are p
[flang][runtime] Added Fortran::common::optional for use on device.
This is a simplified implementation of std::optional that can be used in the offload builds for the device code. The methods are properly marked with RT_API_ATTRS so that the device compilation succedes.
Reviewers: klausler, jeanPerier
Reviewed By: jeanPerier
Pull Request: https://github.com/llvm/llvm-project/pull/85177
show more ...
|
Revision tags: llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init |
|
#
76facde3 |
| 28-Dec-2023 |
Slava Zakharin <szakharin@nvidia.com> |
[flang][runtime] Enable more APIs in the offload build. (#76486)
|
#
b4b23ff7 |
| 20-Dec-2023 |
Slava Zakharin <szakharin@nvidia.com> |
[flang][runtime] Enable more APIs in the offload build. (#75996)
This patch enables more numeric (mod, sum, matmul, etc.) APIs,
and some others.
I added new macros to disable warnings about usin
[flang][runtime] Enable more APIs in the offload build. (#75996)
This patch enables more numeric (mod, sum, matmul, etc.) APIs,
and some others.
I added new macros to disable warnings about using C++ STD methods
like operators of std::complex, which do not have __device__ attribute.
This may probably result in unresolved references, if the header files
implementation relies on libstdc++. I will need to follow up on this.
show more ...
|
Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4 |
|
#
4d977174 |
| 29-Aug-2023 |
Slava Zakharin <szakharin@nvidia.com> |
[flang] Improved performance of runtime Matmul/MatmulTranspose.
This patch mostly affects performance of the code produced by HLIFR lowering. If MATMUL argument is an array slice, then HLFIR lowerin
[flang] Improved performance of runtime Matmul/MatmulTranspose.
This patch mostly affects performance of the code produced by HLIFR lowering. If MATMUL argument is an array slice, then HLFIR lowering passes the slice to the runtime, whereas FIR lowering would create a contiguous temporary for the slice. Performance might be better than the generic implementation for cases where the leading dimension is contiguous. This patch improves CPU2000/178.galgel making HLFIR version faster than FIR version (due to avoiding the temporary copies for MATMUL arguments).
Reviewed By: klausler
Differential Revision: https://reviews.llvm.org/D159134
show more ...
|
Revision tags: llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init |
|
#
ea7d6a1b |
| 21-Jul-2023 |
Slava Zakharin <szakharin@nvidia.com> |
[NFC][flang] Distinguish MATMUL and MATMUL-TRANSPOSE printouts.
When MatmulTranpose reports incorrect shapes of the arguments it cannot represent itself as MATMUL, because the reading of the first a
[NFC][flang] Distinguish MATMUL and MATMUL-TRANSPOSE printouts.
When MatmulTranpose reports incorrect shapes of the arguments it cannot represent itself as MATMUL, because the reading of the first argument's shape will be confusing.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D155911
show more ...
|
Revision tags: llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0 |
|
#
4ff8ba72 |
| 17-Mar-2023 |
Tom Eccles <tom.eccles@arm.com> |
[flang] add fused matmul-transpose to the runtime
This fused operation should run a lot faster than first transposing the lhs array and then multiplying the matrices separately.
Based on flang/runt
[flang] add fused matmul-transpose to the runtime
This fused operation should run a lot faster than first transposing the lhs array and then multiplying the matrices separately.
Based on flang/runtime/matmul.cpp
Depends on D145959
Reviewed By: klausler
Differential Revision: https://reviews.llvm.org/D145960
show more ...
|