History log of /llvm-project/llvm/test/CodeGen/NVPTX/cp-async-bulk.ll (Results 1 – 2 of 2)
Revision Date Author Comments
Revision tags: llvmorg-21-init
# fa7f0e58 23-Jan-2025 Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>

[NVPTX] Add Bulk Copy Prefetch Intrinsics (#123226)

This patch adds NVVM intrinsics and NVPTX codegen for:

- cp.async.bulk.prefetch.L2.* variants
- These intrinsics optionally support cache_hints as indicated by the
boolean flag argument.
- Lit tests are added for all combinations of these intrinsics in
cp-async-bulk.ll.
- The generated PTX is verified with a 12.3 ptxas executable.
- Added docs for these intrinsics in NVPTXUsage.rst file.

PTX Spec reference:
https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cp-async-bulk-prefetch


Co-authored-by: abmajumder <abmajumder@nvidia.com>


Revision tags: llvmorg-19.1.7
# 372044ee 10-Jan-2025 Durgadoss R <durgadossr@nvidia.com>

[NVPTX] Add TMA Bulk Copy intrinsics (#122344)

PR #96083 added intrinsics for async copy of 'tensor' data
using TMA. Following a similar design, this PR adds intrinsics
for async copy of bulk data (non-tensor variants) through TMA.

* These intrinsics optionally support multicast and cache_hints,
as indicated by the boolean arguments at the end of the intrinsics.
* The backend looks through these flag arguments and lowers to the
appropriate PTX instructions.
* Lit tests are added for all combinations of these intrinsics in
cp-async-bulk.ll.
* The generated PTX is verified with a 12.3 ptxas executable.
* Added docs for these intrinsics in NVPTXUsage.rst file.

PTX Spec reference:
https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cp-async-bulk

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
