cp-async-bulk.ll - OpenGrok history log for /llvm-project/llvm/test/CodeGen/NVPTX/cp-async-bulk.ll

Revision	Date	Author	Comments
Revision tags: llvmorg-21-init
# fa7f0e58	23-Jan-2025	Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>	[NVPTX] Add Bulk Copy Prefetch Intrinsics (#123226) This patch adds NVVM intrinsics and NVPTX codegen for: - cp.async.bulk.prefetch.L2.* variants - These intrinsics optionally support cache_hints as indicated by the boolean flag argument. - Lit tests are added for all combinations of these intrinsics in cp-async-bulk.ll. - The generated PTX is verified with a 12.3 ptxas executable. - Added docs for these intrinsics in NVPTXUsage.rst file. PTX Spec reference: https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cp-async-bulk-prefetch Co-authored-by: abmajumder <abmajumder@nvidia.com>
Revision tags: llvmorg-19.1.7
# 372044ee	10-Jan-2025	Durgadoss R <durgadossr@nvidia.com>	[NVPTX] Add TMA Bulk Copy intrinsics (#122344) PR #96083 added intrinsics for async copy of 'tensor' data using TMA. Following a similar design, this PR adds intrinsics for async copy of bulk data (non-tensor variants) through TMA. * These intrinsics optionally support multicast and cache_hints, as indicated by the boolean arguments at the end of the intrinsics. * The backend looks through these flag arguments and lowers to the appropriate PTX instructions. * Lit tests are added for all combinations of these intrinsics in cp-async-bulk.ll. * The generated PTX is verified with a 12.3 ptxas executable. * Added docs for these intrinsics in NVPTXUsage.rst file. PTX Spec reference: https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cp-async-bulk Signed-off-by: Durgadoss R <durgadossr@nvidia.com>