History log of /llvm-project/llvm/test/CodeGen/NVPTX/async-copy.ll (Results 1 – 8 of 8)
Revision Date Author Comments
Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6
# b279f6b0 15-Dec-2024 Fangrui Song <i@maskray.me>

[NVPTX,test] Change llc -march= to -mtriple=

Similar to 806761a7629df268c8aed49657aeccffa6bca449

-mtriple= specifies the full target triple while -march= merely sets the
architecture part of the default target triple (e.g. Windows, macOS),
leaving a target triple which may not make sense.

Therefore, -march= is error-prone and not recommended for tests without
a target triple. The issue has been benign as we recognize
nvptx{,64}-apple-darwin as ELF instead of rejecting it outright.
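
To illustrate the point of the commit message above, here is a toy model of how `-march=` could yield a triple such as `nvptx64-apple-darwin`. It is only a sketch: `apply_march` and the host triples are hypothetical names chosen for illustration, and this is not LLVM's actual triple-handling code.

```python
def apply_march(default_triple: str, arch: str) -> str:
    """Toy model: swap only the architecture component of a target triple.

    Assumption: the triple is a dash-separated string whose first
    component is the architecture. This is NOT LLVM's implementation.
    """
    _, _, rest = default_triple.partition("-")
    # Vendor/OS/environment parts of the default triple are kept as-is.
    return f"{arch}-{rest}" if rest else arch


# Hypothetical default triples for two build hosts.
for host in ("x86_64-unknown-linux-gnu", "arm64-apple-darwin"):
    print(host, "->", apply_march(host, "nvptx64"))
```

In this model the result depends on the host that runs the test, which is why a full `-mtriple=` is the more predictable choice.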


Revision tags: llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6
# ef8655ad 05-Jun-2023 Artem Belevich <tra@google.com>

[NVPTX] Adapt tests to make them usable with CUDA-12.x

CUDA-12 no longer supports 32-bit compilation.

Tests agnostic to 32/64 compilation mode are switched to use nvptx64.
Tests that do care about it have 32-bit ptxas compilation disabled with cuda-12+.

Differential Revision: https://reviews.llvm.org/D152199


Revision tags: llvmorg-16.0.5
# 6963c61f 17-May-2023 Artem Belevich <tra@google.com>

[NVPTX/CUDA] added an optional src_size argument to __nvvm_cp_async*

The optional argument is needed for CUDA-11+ headers when we're compiling for
sm_80+ GPUs.

Differential Revision: https://reviews.llvm.org/D150820


# 0e43eb24 18-May-2023 Artem Belevich <tra@google.com>

Revert "[NVPTX/CUDA] added an optional src_size argument to __nvvm_cp_async*"

Breaks MLIR which happens to be using the intrinsics.

This reverts commit e7b9c2f00fa04ef8d9b69ee0e36d7775823dbe6b.


# e7b9c2f0 17-May-2023 Artem Belevich <tra@google.com>

[NVPTX/CUDA] added an optional src_size argument to __nvvm_cp_async*

The optional argument is needed for CUDA-11+ headers when we're compiling for
sm_80+ GPUs.

Differential Revision: https://reviews.llvm.org/D150820


Revision tags: llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7
# 9b81548a 19-Dec-2022 Nikita Popov <npopov@redhat.com>

[NVPTX] Convert some tests to opaque pointers (NFC)


Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3
# 0f1b5f11 27-Apr-2022 Andrew Savonichev <andrew.savonichev@gmail.com>

[NVPTX] Integrate ptxas to LIT tests

ptxas is a proprietary compiler from Nvidia that can compile PTX to
machine code (SASS). It has a lot of diagnostics to catch errors
in PTX, which can be used to verify PTX output from llc.

Set -DPXTAS_EXECUTABLE=/path/to/ptxas CMake option to enable it.
If this option is not set, ptxas is replaced with `true`, which
effectively disables all ptxas RUN lines.

The LLVM_PTXAS_EXECUTABLE environment variable takes precedence over
the CMake option and allows overriding the ptxas executable used by LIT
without a complete reconfiguration.

Differential Revision: https://reviews.llvm.org/D121727
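
The lookup order described in the commit message above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from LLVM's LIT configuration: `resolve_ptxas` and its parameters are hypothetical, and only the `LLVM_PTXAS_EXECUTABLE` variable name and the `true` fallback come from the commit message.

```python
import os


def resolve_ptxas(cmake_value: str = "", environ=None) -> str:
    """Pick the command that stands in for ptxas in RUN lines.

    Hypothetical helper: `cmake_value` stands for whatever path the
    CMake option supplied (empty if unset); `environ` defaults to the
    process environment.
    """
    environ = os.environ if environ is None else environ
    env_value = environ.get("LLVM_PTXAS_EXECUTABLE", "")
    if env_value:
        # The environment variable takes precedence over the CMake option.
        return env_value
    if cmake_value:
        return cmake_value
    # Nothing configured: `true` turns every ptxas RUN line into a no-op.
    return "true"


print(resolve_ptxas("", {}))
print(resolve_ptxas("/path/to/ptxas", {}))
print(resolve_ptxas("/path/to/ptxas", {"LLVM_PTXAS_EXECUTABLE": "/other/ptxas"}))
```

Falling back to `true` rather than failing lets the same test files run on machines without the CUDA toolkit installed.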


Revision tags: llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1
# 02c24688 17-May-2021 Stuart Adams <stuart.adams@codeplay.com>

[Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX cp.async instructions

Adds NVPTX builtins and intrinsics for the CUDA PTX `cp.async` instructions for
`sm_80` architecture or newer.

PTX ISA description of `cp.async`:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-asynchronous-copy
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-cp-async-mbarrier-arrive

Authored-by: Stuart Adams <stuart.adams@codeplay.com>
Co-Authored-by: Alexander Johnston <alexander@codeplay.com>

Differential Revision: https://reviews.llvm.org/D100394
