#
8ce1aed5 |
| 04-Jul-2024 |
Slava Zakharin <szakharin@nvidia.com> |
[flang] Lower MATMUL to type specific runtime calls. (#97547)
Lower MATMUL to the new runtime entries added in #97406.
|
#
dd220853 |
| 03-Jul-2024 |
Slava Zakharin <szakharin@nvidia.com> |
[flang][runtime] Split MATMUL[_TRANSPOSE] into separate entries. (#97406)
Device compilation is much faster for separate MATMUL[_TRANSPOSE]
entries than for a single one that covers all data types.
The lowering changes and the removal of the generic entries will follow.
|
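The difference between one generic entry and separate type-specific entries can be sketched as follows. This is only an illustration under stated assumptions: the names `matmulKernel`, `ElemType`, `MatmulGenericSketch`, `MatmulInt32Sketch`, and `MatmulReal64Sketch` are hypothetical and are not the flang runtime's actual entry points or signatures. The generic entry pulls every kernel instantiation into one function body, while each separate entry instantiates exactly one kernel.

```cpp
#include <cstddef>
#include <cstdint>

// Shared kernel: result (m x n) = a (m x k) * b (k x n), column-major storage.
template <typename T>
void matmulKernel(T *result, const T *a, const T *b, std::size_t m,
                  std::size_t k, std::size_t n) {
  for (std::size_t j = 0; j < n; ++j) {
    for (std::size_t i = 0; i < m; ++i) {
      T sum{};
      for (std::size_t l = 0; l < k; ++l) {
        sum += a[i + l * m] * b[l + j * k];
      }
      result[i + j * m] = sum;
    }
  }
}

// Hypothetical type codes, for illustration only.
enum class ElemType { Int32, Real64 };

// One generic entry: a single function that instantiates every kernel
// and selects among them at run time.
void MatmulGenericSketch(ElemType type, void *result, const void *a,
                         const void *b, std::size_t m, std::size_t k,
                         std::size_t n) {
  switch (type) {
  case ElemType::Int32:
    matmulKernel(static_cast<std::int32_t *>(result),
                 static_cast<const std::int32_t *>(a),
                 static_cast<const std::int32_t *>(b), m, k, n);
    break;
  case ElemType::Real64:
    matmulKernel(static_cast<double *>(result),
                 static_cast<const double *>(a),
                 static_cast<const double *>(b), m, k, n);
    break;
  }
}

// Separate entries: each function instantiates exactly one kernel, so a
// caller that knows the types at compile time references only what it needs.
void MatmulInt32Sketch(std::int32_t *result, const std::int32_t *a,
                       const std::int32_t *b, std::size_t m, std::size_t k,
                       std::size_t n) {
  matmulKernel(result, a, b, m, k, n);
}

void MatmulReal64Sketch(double *result, const double *a, const double *b,
                        std::size_t m, std::size_t k, std::size_t n) {
  matmulKernel(result, a, b, m, k, n);
}
```

With only two element types the difference is small; the sketch is meant to show the shape of the split, not to reproduce the compile-time effect reported in the commit message.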
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init |
|
#
a599a612 |
| 06-Dec-2023 |
kkwli <kkwli@users.noreply.github.com> |
[flang] match the actual data size with the KIND (NFC) (#73179)
|
Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2 |
|
#
ffc67bb3 |
| 02-Oct-2023 |
David Spickett <david.spickett@linaro.org> |
Revert "[Flang] [FlangRT] Introduce FlangRT project as solution to Flang's runtime LLVM integration"
This reverts commit 6403287eff71a3d6f6c862346d6ed3f0f000eb70.
This is failing on all but 1 of Linaro's flang builders. CMake Error at /home/tcwg-buildbot/worker/clang-aarch64-full-2stage/llvm/flang-rt/unittests/CMakeLists.txt:37 (message): Target llvm_gtest not found.
|
Revision tags: llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4 |
|
#
4d977174 |
| 29-Aug-2023 |
Slava Zakharin <szakharin@nvidia.com> |
[flang] Improved performance of runtime Matmul/MatmulTranspose.
This patch mostly affects performance of the code produced by HLFIR lowering. If a MATMUL argument is an array slice, then HLFIR lowering passes the slice to the runtime, whereas FIR lowering would create a contiguous temporary for the slice. Performance might be better than the generic implementation for cases where the leading dimension is contiguous. This patch improves CPU2000/178.galgel, making the HLFIR version faster than the FIR version (due to avoiding the temporary copies for MATMUL arguments).
Reviewed By: klausler
Differential Revision: https://reviews.llvm.org/D159134
|
Revision tags: llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0 |
|
#
4ff8ba72 |
| 17-Mar-2023 |
Tom Eccles <tom.eccles@arm.com> |
[flang] add fused matmul-transpose to the runtime
This fused operation should run a lot faster than first transposing the lhs array and then multiplying the matrices separately.
Based on flang/runtime/matmul.cpp
Depends on D145959
Reviewed By: klausler
Differential Revision: https://reviews.llvm.org/D145960
|
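The idea behind the fused matmul-transpose entry above can be sketched as follows: instead of first building TRANSPOSE(lhs) as a temporary and then multiplying, the kernel reads the lhs with swapped indices. This is a minimal sketch, not the runtime's implementation; the `Matrix` type and `matmulTransposeFused` function are hypothetical names, and the code assumes dense, column-major `double` matrices with matching row counts.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical column-major matrix (Fortran storage order).
struct Matrix {
  std::size_t rows, cols;
  std::vector<double> data;
  Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
  double &at(std::size_t i, std::size_t j) { return data[i + j * rows]; }
  double at(std::size_t i, std::size_t j) const { return data[i + j * rows]; }
};

// Computes TRANSPOSE(a) * b without materializing the transpose.
// a is k x m, b is k x n (assumed: a.rows == b.rows); the result is m x n.
Matrix matmulTransposeFused(const Matrix &a, const Matrix &b) {
  Matrix result(a.cols, b.cols);
  for (std::size_t j = 0; j < b.cols; ++j) {
    for (std::size_t i = 0; i < a.cols; ++i) {
      double sum = 0.0;
      // Element (i, l) of TRANSPOSE(a) is element (l, i) of a, so both
      // operands are walked down a column, which is contiguous in memory.
      for (std::size_t l = 0; l < a.rows; ++l) {
        sum += a.at(l, i) * b.at(l, j);
      }
      result.at(i, j) = sum;
    }
  }
  return result;
}
```

In this layout the inner loop reads both operands with unit stride, and no temporary for the transposed lhs is allocated or copied.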