| #
fa7f0e58 |
| 23-Jan-2025 |
Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com> |
[NVPTX] Add Bulk Copy Prefetch Intrinsics (#123226)
This patch adds NVVM intrinsics and NVPTX codegen for:
- cp.async.bulk.prefetch.L2.* variants
- These intrinsics optionally support cache_hin
[NVPTX] Add Bulk Copy Prefetch Intrinsics (#123226)
This patch adds NVVM intrinsics and NVPTX codegen for:
- cp.async.bulk.prefetch.L2.* variants
- These intrinsics optionally support cache_hints as indicated by the
boolean flag argument.
- Lit tests are added for all combinations of these intrinsics in
cp-async-bulk.ll.
- The generated PTX is verified with a 12.3 ptxas executable.
- Added docs for these intrinsics in NVPTXUsage.rst file.
PTX Spec reference:
https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cp-async-bulk-prefetch
Co-authored-by: abmajumder <abmajumder@nvidia.com>
show more ...
|
| #
372044ee |
| 10-Jan-2025 |
Durgadoss R <durgadossr@nvidia.com> |
[NVPTX] Add TMA Bulk Copy intrinsics (#122344)
PR #96083 added intrinsics for async copy of 'tensor' data
using TMA. Following a similar design, this PR adds intrinsics
for async copy of bulk data
[NVPTX] Add TMA Bulk Copy intrinsics (#122344)
PR #96083 added intrinsics for async copy of 'tensor' data
using TMA. Following a similar design, this PR adds intrinsics
for async copy of bulk data (non-tensor variants) through TMA.
* These intrinsics optionally support multicast and cache_hints,
as indicated by the boolean arguments at the end of the intrinsics.
* The backend looks through these flag arguments and lowers to the
appropriate PTX instructions.
* Lit tests are added for all combinations of these intrinsics in
cp-async-bulk.ll.
* The generated PTX is verified with a 12.3 ptxas executable.
* Added docs for these intrinsics in NVPTXUsage.rst file.
PTX Spec reference:
https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cp-async-bulk
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
show more ...
|