|
Revision tags: llvmorg-21-init, llvmorg-19.1.7 |
|
| #
4583f6d3 |
| 08-Jan-2025 |
Alex MacLean <amaclean@nvidia.com> |
[NVPTX] Switch front-ends and tests to ptx_kernel cc (#120806)
The `ptx_kernel` calling convention is a more idiomatic and standard way of specifying an NVPTX kernel than using metadata, which is not supposed to change the meaning of the program. Further, checking the calling convention is significantly faster than traversing the metadata, improving compile time.
This change updates the clang and mlir frontends as well as the NVPTXCtorDtorLowering pass to emit kernels using the calling convention. In addition, this updates all NVPTX unit tests to use the calling convention as well.
|
|
Revision tags: llvmorg-19.1.6 |
|
| #
b279f6b0 |
| 15-Dec-2024 |
Fangrui Song <i@maskray.me> |
[NVPTX,test] Change llc -march= to -mtriple=
Similar to 806761a7629df268c8aed49657aeccffa6bca449
-mtriple= specifies the full target triple while -march= merely sets the architecture part of the default target triple (e.g. Windows, macOS), leaving a target triple which may not make sense.
Therefore, -march= is error-prone and not recommended for tests without a target triple. The issue has been benign as we recognize nvptx{,64}-apple-darwin as ELF instead of rejecting it outright.
|
|
Revision tags: llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3 |
|
| #
0f0a96b8 |
| 19-Oct-2024 |
Youngsuk Kim <youngsuk.kim@hpe.com> |
[llvm][NVPTX] Strip unneeded '+0' in PTX load/store (#113017)
Remove the extraneous '+0' immediate offset part in PTX load/stores, to
improve readability of output PTX code.
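A minimal sketch of the formatting rule this commit describes, in Python for illustration only. The helper name `format_ptx_address` is hypothetical and this is not the NVPTX instruction printer's code; it only shows the idea of omitting the immediate offset from a load/store address operand when that offset is zero.

```python
def format_ptx_address(base: str, offset: int) -> str:
    """Format a PTX-style address operand (hypothetical helper, not LLVM code)."""
    if offset == 0:
        # A zero offset carries no information, so print only the base register.
        return f"[{base}]"
    # A non-zero offset is printed as base+offset.
    return f"[{base}+{offset}]"
```

With this rule, an operand that would previously have been printed as `[%SP+0]` is printed as `[%SP]`, while `[%SP+8]` is unchanged.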
|
|
Revision tags: llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6 |
|
| #
ef8655ad |
| 05-Jun-2023 |
Artem Belevich <tra@google.com> |
[NVPTX] Adapt tests to make them usable with CUDA-12.x
CUDA-12 no longer supports 32-bit compilation.
Tests agnostic to 32/64 compilation mode are switched to use nvptx64. Tests that do care about it have 32-bit ptxas compilation disabled with CUDA-12+.
Differential Revision: https://reviews.llvm.org/D152199
|
|
Revision tags: llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7 |
|
| #
9b81548a |
| 19-Dec-2022 |
Nikita Popov <npopov@redhat.com> |
[NVPTX] Convert some tests to opaque pointers (NFC)
|
|
Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3 |
|
| #
0f1b5f11 |
| 27-Apr-2022 |
Andrew Savonichev <andrew.savonichev@gmail.com> |
[NVPTX] Integrate ptxas to LIT tests
ptxas is a proprietary compiler from Nvidia that can compile PTX to machine code (SASS). It has a lot of diagnostics to catch errors in PTX, which can be used to verify PTX output from llc.
Set the -DPTXAS_EXECUTABLE=/path/to/ptxas CMake option to enable it. If this option is not set, ptxas is substituted with true, which effectively disables all ptxas RUN lines.
The LLVM_PTXAS_EXECUTABLE environment variable takes precedence over the CMake option, and allows overriding the ptxas executable used by LIT without a complete reconfiguration.
Differential Revision: https://reviews.llvm.org/D121727
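The precedence described in this commit message can be sketched as follows. This is an illustrative Python sketch, not the actual LIT configuration code: the function name `resolve_ptxas` and its parameters are hypothetical, and only the ordering (environment variable first, then the CMake-configured path, then `true`) comes from the message above.

```python
import os
from typing import Mapping, Optional


def resolve_ptxas(cmake_ptxas_executable: str,
                  environ: Optional[Mapping[str, str]] = None) -> str:
    """Pick the command substituted for ptxas (hypothetical helper)."""
    env = os.environ if environ is None else environ
    # The LLVM_PTXAS_EXECUTABLE environment variable takes precedence.
    from_env = env.get("LLVM_PTXAS_EXECUTABLE")
    if from_env:
        return from_env
    # Otherwise use the path configured through the CMake option, if any.
    if cmake_ptxas_executable:
        return cmake_ptxas_executable
    # Nothing configured: `true` always succeeds, so ptxas RUN lines become no-ops.
    return "true"
```

Falling back to `true` rather than failing keeps the test suite usable on machines without the proprietary toolchain.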
|
|
Revision tags: llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1 |
|
| #
51eefa81 |
| 14-Oct-2021 |
Andrew Savonichev <andrew.savonichev@gmail.com> |
[NVPTX] Add VRFrame and VRFrameLocal to integer register classes
These registers are used as operands for instructions that expect an integer register, so they should be added to the Int32Regs or Int64Regs register classes. Otherwise, the machine verifier emits the following error for the LIT tests listed below when the LLVM_ENABLE_MACHINE_VERIFIER=1 environment variable is set:
*** Bad machine code: Illegal physical register for instruction ***
- function: kernel_func
- basic block: %bb.0 entry (0x55c8903d5438)
- instruction: %3:int64regs = LEA_ADDRi64 $vrframelocal, 0
- operand 1: $vrframelocal
$vrframelocal is not a Int64Regs register.
Affected tests:
CodeGen/NVPTX/call-with-alloca-buffer.ll
CodeGen/NVPTX/disable-opt.ll
CodeGen/NVPTX/lower-alloca.ll
CodeGen/NVPTX/lower-args.ll
CodeGen/NVPTX/param-align.ll
CodeGen/NVPTX/reg-types.ll
DebugInfo/NVPTX/dbg-declare-alloca.ll
DebugInfo/NVPTX/dbg-value-const-byref.ll
Differential Revision: https://reviews.llvm.org/D110164
|
|
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1, llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3, llvmorg-9.0.0-rc2, llvmorg-9.0.0-rc1, llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3, llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1, llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3, llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2, llvmorg-8.0.0-rc1, llvmorg-7.0.1, llvmorg-7.0.1-rc3, llvmorg-7.0.1-rc2, llvmorg-7.0.1-rc1, llvmorg-7.0.0, llvmorg-7.0.0-rc3, llvmorg-7.0.0-rc2, llvmorg-7.0.0-rc1, llvmorg-6.0.1, llvmorg-6.0.1-rc3, llvmorg-6.0.1-rc2, llvmorg-6.0.1-rc1, llvmorg-5.0.2, llvmorg-5.0.2-rc2, llvmorg-5.0.2-rc1, llvmorg-6.0.0, llvmorg-6.0.0-rc3, llvmorg-6.0.0-rc2, llvmorg-6.0.0-rc1, llvmorg-5.0.1, llvmorg-5.0.1-rc3, llvmorg-5.0.1-rc2, llvmorg-5.0.1-rc1, llvmorg-5.0.0, llvmorg-5.0.0-rc5, llvmorg-5.0.0-rc4, llvmorg-5.0.0-rc3, llvmorg-5.0.0-rc2, llvmorg-5.0.0-rc1, llvmorg-4.0.1, llvmorg-4.0.1-rc3, llvmorg-4.0.1-rc2, llvmorg-4.0.1-rc1, llvmorg-4.0.0, llvmorg-4.0.0-rc4, llvmorg-4.0.0-rc3, 
llvmorg-4.0.0-rc2, llvmorg-4.0.0-rc1, llvmorg-3.9.1, llvmorg-3.9.1-rc3, llvmorg-3.9.1-rc2, llvmorg-3.9.1-rc1, llvmorg-3.9.0, llvmorg-3.9.0-rc3, llvmorg-3.9.0-rc2, llvmorg-3.9.0-rc1, llvmorg-3.8.1, llvmorg-3.8.1-rc1, llvmorg-3.8.0, llvmorg-3.8.0-rc3, llvmorg-3.8.0-rc2, llvmorg-3.8.0-rc1, llvmorg-3.7.1, llvmorg-3.7.1-rc2, llvmorg-3.7.1-rc1, llvmorg-3.7.0, llvmorg-3.7.0-rc4, llvmorg-3.7.0-rc3, studio-1.4, llvmorg-3.7.0-rc2, llvmorg-3.7.0-rc1 |
|
| #
77b5b385 |
| 01-Jul-2015 |
Jingyue Wu <jingyue@google.com> |
[NVPTX] Move NVPTXPeephole after NVPTXPrologEpilogPass
Summary: The offset of a frame index is calculated by NVPTXPrologEpilogPass. Before that pass runs, the correct offset of stack objects cannot be obtained, which leads to a wrong offset if there are more than 2 frame objects. This patch moves NVPTXPeephole after NVPTXPrologEpilogPass. Because the frame index is already replaced by %VRFrame in NVPTXPrologEpilogPass, we check the VRFrame register instead, and try to remove VRFrame if it has no uses after the NVPTXPeephole pass.
Patched by Xuetian Weng.
Test Plan: Strengthened test/CodeGen/NVPTX/local-stack-frame.ll to check the offset calculation based on SP and SPL.
Reviewers: jholewinski, jingyue
Reviewed By: jingyue
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D10853
llvm-svn: 241185
|
| #
9fe08c4b |
| 30-Jun-2015 |
Jingyue Wu <jingyue@google.com> |
[NVPTX] Fix issue introduced in D10321
Summary: Really check that %SP is not used in other places, instead of checking only for exactly one non-dbg use.
Patched by Xuetian Weng.
Test Plan: @foo4 in test/CodeGen/NVPTX/local-stack-frame.ll creates a case where SP appears twice.
Reviewers: jholewinski, jingyue
Reviewed By: jingyue
Subscribers: llvm-commits, sfantao, jholewinski
Differential Revision: http://reviews.llvm.org/D10844
llvm-svn: 241099
|
| #
9c71150b |
| 24-Jun-2015 |
Jingyue Wu <jingyue@google.com> |
Add NVPTXPeephole pass to reduce unnecessary address cast
Summary: This patch first changes the register that holds the local address of the stack frame to %SPL. Then the new NVPTXPeephole pass scans for the following pattern
%vreg0<def> = LEA_ADDRi64 <fi#0>, 4
%vreg1<def> = cvta_to_local %vreg0
and transform it into
%vreg1<def> = LEA_ADDRi64 %VRFrameLocal, 4
Patched by Xuetian Weng
Test Plan: test/CodeGen/NVPTX/local-stack-frame.ll
Reviewers: jholewinski, jingyue
Reviewed By: jingyue
Subscribers: eliben, jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D10549
llvm-svn: 240587
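The rewrite described in this commit can be illustrated with a toy model. The Python sketch below is not the NVPTXPeephole implementation: the `Instr` tuple, the function name, and the simplifying assumption that the two instructions are adjacent are all hypothetical. It only demonstrates folding a frame-index address computation followed by `cvta_to_local` into a single `LEA_ADDRi64` on `%VRFrameLocal`.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    """Toy instruction: destination register, opcode, operand strings."""
    dest: str
    opcode: str
    operands: Tuple[str, ...]


def fold_local_frame_address(instrs: List[Instr]) -> List[Instr]:
    """Fold `LEA_ADDRi64 <fi#N>, off` + `cvta_to_local` into one instruction."""
    out: List[Instr] = []
    i = 0
    while i < len(instrs):
        cur = instrs[i]
        nxt = instrs[i + 1] if i + 1 < len(instrs) else None
        if (nxt is not None
                and cur.opcode == "LEA_ADDRi64"
                and cur.operands[0].startswith("<fi#")
                and nxt.opcode == "cvta_to_local"
                and nxt.operands == (cur.dest,)):
            # Keep the original address computation only if something other
            # than the cast still reads its result.
            used_elsewhere = any(
                cur.dest in ins.operands
                for j, ins in enumerate(instrs) if j != i + 1
            )
            if used_elsewhere:
                out.append(cur)
            # Compute the local address directly from %VRFrameLocal.
            out.append(Instr(nxt.dest, "LEA_ADDRi64",
                             ("%VRFrameLocal", cur.operands[1])))
            i += 2
            continue
        out.append(cur)
        i += 1
    return out
```

Applied to the two-instruction pattern from the commit message, the sketch yields the single instruction `%vreg1 = LEA_ADDRi64 %VRFrameLocal, 4`.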
|
|
Revision tags: llvmorg-3.6.2, llvmorg-3.6.2-rc1, llvmorg-3.6.1, llvmorg-3.6.1-rc1, llvmorg-3.5.2, llvmorg-3.5.2-rc1, llvmorg-3.6.0, llvmorg-3.6.0-rc4, llvmorg-3.6.0-rc3, llvmorg-3.6.0-rc2, llvmorg-3.6.0-rc1, llvmorg-3.5.1, llvmorg-3.5.1-rc2, llvmorg-3.5.1-rc1, llvmorg-3.5.0, llvmorg-3.5.0-rc4, llvmorg-3.5.0-rc3, llvmorg-3.5.0-rc2, llvmorg-3.5.0-rc1 |
|
| #
3e037d98 |
| 16-Jul-2014 |
Justin Holewinski <jholewinski@nvidia.com> |
[NVPTX] Rename registers %fl -> %fd and %rl -> %rd
This matches the internal behavior of NVIDIA tools like libnvvm.
llvm-svn: 213168
|
|
Revision tags: llvmorg-3.4.2, llvmorg-3.4.2-rc1, llvmorg-3.4.1, llvmorg-3.4.1-rc2 |
|
| #
7cd70df7 |
| 21-Apr-2014 |
Eli Bendersky <eliben@google.com> |
Fix the test: DCE optimized away everything.
Use volatile store to protect the generated PTX from DCE.
Patch by Jingyue Wu.
llvm-svn: 206763
|
|
Revision tags: llvmorg-3.4.1-rc1, llvmorg-3.4.0, llvmorg-3.4.0-rc3, llvmorg-3.4.0-rc2, llvmorg-3.4.0-rc1 |
|
| #
871ec939 |
| 06-Aug-2013 |
Justin Holewinski <jholewinski@nvidia.com> |
[NVPTX] Fix bug in stack code generation caused by MC conversion
We do use a very small set of physical registers, so account for them in the virtual register encoding between MachineInstr and MC.
llvm-svn: 187799
|