|
Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3 |
|
| #
0f0a96b8 |
| 19-Oct-2024 |
Youngsuk Kim <youngsuk.kim@hpe.com> |
[llvm][NVPTX] Strip unneeded '+0' in PTX load/store (#113017)
Remove the extraneous '+0' immediate offset part in PTX load/stores, to
improve readability of output PTX code.
|
|
Revision tags: llvmorg-19.1.2, llvmorg-19.1.1 |
|
| #
da46244e |
| 17-Sep-2024 |
Craig Topper <craig.topper@sifive.com> |
Revert "[LegalizeVectorOps] Make the AArch64 hack in ExpandFNEG more specific."
This reverts commit 884ff9e3f9741ac282b6cf8087b8d3f62b8e138a.
Regression was reported in Halide for arm32.
|
|
Revision tags: llvmorg-19.1.0 |
|
| #
884ff9e3 |
| 17-Sep-2024 |
Craig Topper <craig.topper@sifive.com> |
[LegalizeVectorOps] Make the AArch64 hack in ExpandFNEG more specific.
Only scalarize single element vectors when vector FSUB is not supported and scalar FNEG is supported.
|
|
Revision tags: llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init |
|
| #
d2645141 |
| 25-Jun-2024 |
Akshay Deodhar <adeodhar@nvidia.com> |
Enforce parameter order in f16 call, flipped call unit tests (#96258)
|
| #
049630d0 |
| 18-Jun-2024 |
Justin Fargnoli <jfargnoli@nvidia.com> |
[NFC][NVPTX][test] Update test for `fneg half` (#95856)
`test_fneg` function uses `fsub half 0.0, %x`. Add a test that uses the
`fneg` instruction directly.
|
|
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init |
|
| #
250f2bb2 |
| 23-Jun-2023 |
root <kahmadian@nvidia.com> |
adding bf16 support to NVPTX
Currently, bf16 has been scatteredly added to the PTX codegen. This patch aims to complete the set of instructions and code path required to support bf16 data type.
Rev
adding bf16 support to NVPTX
Currently, bf16 has been scatteredly added to the PTX codegen. This patch aims to complete the set of instructions and code path required to support bf16 data type.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D144911
Co-authored-by: Artem Belevich <tra@google.com>
show more ...
|
|
Revision tags: llvmorg-16.0.6, llvmorg-16.0.5 |
|
| #
dc90f42e |
| 27-May-2023 |
Artem Belevich <tra@google.com> |
Coalesce 16-bit FP types to use integer register classes.
i16/f16/bf16 will use the same .b16 registers and i32/v2f16 and v2bf16 will share .b32 registers.
The changes are mostly mechanical, intend
Coalesce 16-bit FP types to use integer register classes.
i16/f16/bf16 will use the same .b16 registers and i32/v2f16 and v2bf16 will share .b32 registers.
The changes are mostly mechanical, intended to remove unnecessary register classes which tend to produce redundant register moves.
Differential Revision: https://reviews.llvm.org/D151601
v2f16 regtype conversion to i32
show more ...
|
|
Revision tags: llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2 |
|
| #
70459669 |
| 07-Feb-2023 |
Artem Belevich <tra@google.com> |
[NVPTX] Lower extraction of upper half of i32/i64 as partial move.
This produces better SASS than right-shift + truncate and is fairly common for CUDA code that operates on __half2 values represente
[NVPTX] Lower extraction of upper half of i32/i64 as partial move.
This produces better SASS than right-shift + truncate and is fairly common for CUDA code that operates on __half2 values represented as opaque integer.
Differential Revision: https://reviews.llvm.org/D143448
show more ...
|
|
Revision tags: llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7 |
|
| #
2c3f82b7 |
| 05-Jan-2023 |
Benjamin Chetioui <bchetioui@google.com> |
[NVPTX] Fix NVPTX lowering of frem when denominator is infinite.
`frem x, {+,-}inf` must return x to match the specification of LLVM's frem.
Reviewed By: tra
Differential Revision: https://reviews
[NVPTX] Fix NVPTX lowering of frem when denominator is infinite.
`frem x, {+,-}inf` must return x to match the specification of LLVM's frem.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D140846
show more ...
|
| #
9b81548a |
| 19-Dec-2022 |
Nikita Popov <npopov@redhat.com> |
[NVPTX] Convert some tests to opaque pointers (NFC)
|
|
Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3 |
|
| #
8407fdbd |
| 13-Oct-2022 |
Jakub Chlanda <j.chlanda@gmail.com> |
[NVPTX] Support neg{.ftz} for f16 and f16x2
Differential Revision: https://reviews.llvm.org/D135428
|
|
Revision tags: working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0 |
|
| #
3ccaabe0 |
| 25-Aug-2022 |
Benjamin Kramer <benny.kra@googlemail.com> |
[NVPTX] Lower llvm.roundeven to cvt.rni
|
|
Revision tags: llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4 |
|
| #
2e7e0975 |
| 30-Apr-2022 |
Dmitry Vassiliev <dvassiliev@accesssoftek.com> |
[NVPTX] Prefix "$L__" for branch label names
A global variable may have the same name as a label, and ptxas does not accept it. Prefix labels with $L__ to fix this.
Reviewed By: MaskRay, tra
Diffe
[NVPTX] Prefix "$L__" for branch label names
A global variable may have the same name as a label, and ptxas does not accept it. Prefix labels with $L__ to fix this.
Reviewed By: MaskRay, tra
Differential Revision: https://reviews.llvm.org/D119669
show more ...
|
|
Revision tags: llvmorg-14.0.3 |
|
| #
0f1b5f11 |
| 27-Apr-2022 |
Andrew Savonichev <andrew.savonichev@gmail.com> |
[NVPTX] Integrate ptxas to LIT tests
ptxas is a proprietary compiler from Nvidia that can compile PTX to machine code (SASS). It has a lot of diagnostics to catch errors in PTX, which can be used to
[NVPTX] Integrate ptxas to LIT tests
ptxas is a proprietary compiler from Nvidia that can compile PTX to machine code (SASS). It has a lot of diagnostics to catch errors in PTX, which can be used to verify PTX output from llc.
Set -DPXTAS_EXECUTABLE=/path/to/ptxas CMake option to enable it. If this option is not set, then ptxas is substituted to true which effectively disables all ptxas RUN lines.
LLVM_PTXAS_EXECUTABLE environment variable takes precedence over the CMake option, and allows to override ptxas executable that is used for LIT without complete re-configuration.
Differential Revision: https://reviews.llvm.org/D121727
show more ...
|
|
Revision tags: llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2 |
|
| #
a00ae86a |
| 14-Feb-2022 |
Fangrui Song <i@maskray.me> |
Revert D119669 "[NVPTX] Prefix "$L__" for branch label names"
This reverts commit cccef321096c20825fe8738045c1d91d3b9fd57d.
Broke clang-cuda-t4
``` /buildbot/cuda-t4-0/work/clang-cuda-t4/clang/bin
Revert D119669 "[NVPTX] Prefix "$L__" for branch label names"
This reverts commit cccef321096c20825fe8738045c1d91d3b9fd57d.
Broke clang-cuda-t4
``` /buildbot/cuda-t4-0/work/clang-cuda-t4/clang/bin/clang++ -DNDEBUG -O3 -DNDEBUG -w -Werror=date-time -UNDEBUG --cuda-path=/buildbot/cuda-t4-0/work/clang-cuda-t4/external/cuda/cuda-11.0 -I/buildbot/cuda-t4-0/work/clang-cuda-t4/external/cuda/cuda-11.0/include --cuda-gpu-arch=sm_75 -std=c++20 -stdlib=libstdc++ --gcc-toolchain=/buildbot/cuda-t4-0/work/clang-cuda-t4/external/cuda/gcc-8 -DSTDLIB_VERSION=2014 -MD -MT External/CUDA/CMakeFiles/complex-cuda-11.0-c++20-libstdc++-8.dir/complex.cu.o -MF External/CUDA/CMakeFiles/complex-cuda-11.0-c++20-libstdc++-8.dir/complex.cu.o.d -o External/CUDA/CMakeFiles/complex-cuda-11.0-c++20-libstdc++-8.dir/complex.cu.o -c /buildbot/cuda-t4-0/work/clang-cuda-t4/llvm-test-suite/External/CUDA/complex.cu ptxas /tmp/complex-cfa050/complex-sm_75.s, line 250; fatal : Parsing error near '$L__BB6_2': syntax error ptxas fatal : Ptx assembly aborted due to errors ```
show more ...
|
| #
cccef321 |
| 14-Feb-2022 |
Dmitry Vassiliev <dvassiliev@accesssoftek.com> |
[NVPTX] Prefix "$L__" for branch label names
A global variable may have the same name as a label, and ptxas does not accept it. Prefix labels with $L__ to fix this.
Reviewed By: MaskRay, tra
Diffe
[NVPTX] Prefix "$L__" for branch label names
A global variable may have the same name as a label, and ptxas does not accept it. Prefix labels with $L__ to fix this.
Reviewed By: MaskRay, tra
Differential Revision: https://reviews.llvm.org/D119669
show more ...
|
|
Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4 |
|
| #
4c7f820b |
| 26-Mar-2021 |
Bjorn Pettersson <bjorn.a.pettersson@ericsson.com> |
Update @llvm.powi to handle different int sizes for the exponent
This can be seen as a follow up to commit 0ee439b705e82a4fe20e2, that changed the second argument of __powidf2, __powisf2 and __powit
Update @llvm.powi to handle different int sizes for the exponent
This can be seen as a follow up to commit 0ee439b705e82a4fe20e2, that changed the second argument of __powidf2, __powisf2 and __powitf2 in compiler-rt from si_int to int. That was to align with how those runtimes are defined in libgcc. One thing that seem to have been missing in that patch was to make sure that the rest of LLVM also handle that the argument now depends on the size of int (not using the si_int machine mode for 32-bit). When using __builtin_powi for a target with 16-bit int clang crashed. And when emitting libcalls to those rtlib functions, typically when lowering @llvm.powi), the backend would always prepare the exponent argument as an i32 which caused miscompiles when the rtlib was compiled with 16-bit int.
The solution used here is to use an overloaded type for the second argument in @llvm.powi. This way clang can use the "correct" type when lowering __builtin_powi, and then later when emitting the libcall it is assumed that the type used in @llvm.powi matches the rtlib function.
One thing that needed some extra attention was that when vectorizing calls several passes did not support that several arguments could be overloaded in the intrinsics. This patch allows overload of a scalar operand by adding hasVectorInstrinsicOverloadedScalarOpd, with an entry for powi.
Differential Revision: https://reviews.llvm.org/D99439
show more ...
|
| #
505933a4 |
| 24-May-2021 |
thomasraoux <thomasraoux@google.com> |
[NVPTX] Fix lowering of frem for negative values
to match fmod frem result must have the dividend sign. Previous implementation had the wrong sign when passing negative numbers. For ex: frem(-16, 7)
[NVPTX] Fix lowering of frem for negative values
to match fmod frem result must have the dividend sign. Previous implementation had the wrong sign when passing negative numbers. For ex: frem(-16, 7) was returning 5 instead of -2. We should just a ftrunc instead of floor when lowering to get the right behavior.
Differential Revision: https://reviews.llvm.org/D102528
show more ...
|
|
Revision tags: llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1 |
|
| #
05e90cef |
| 11-Jan-2021 |
Mircea Trofin <mtrofin@google.com> |
[NFC] Disallow unused prefixes under llvm/test/CodeGen
This patch finishes addressing unused prefixes under CodeGen: 2 remaining tests fixed, and then undo-ing the lit.local.cfg changes under variou
[NFC] Disallow unused prefixes under llvm/test/CodeGen
This patch finishes addressing unused prefixes under CodeGen: 2 remaining tests fixed, and then undo-ing the lit.local.cfg changes under various subdirs and moving the policy under CodeGen.
Differential Revision: https://reviews.llvm.org/D94430
show more ...
|
|
Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2 |
|
| #
a8cc9047 |
| 04-Dec-2019 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
CodeGen: Add -denormal-fp-math-f32 flag
Make the set of FP related attributes and command flags closer.
|
| #
0fd8030b |
| 27-Mar-2020 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
Fix line endings in test
|
|
Revision tags: llvmorg-9.0.1-rc1, llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3, llvmorg-9.0.0-rc2, llvmorg-9.0.0-rc1, llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3, llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1 |
|
| #
6c21ccd2 |
| 01-Apr-2019 |
Bixia Zheng <bixia@google.com> |
[NVPTX] Fix the codegen for llvm.round.
Summary: Previously, we translate llvm.round to PTX cvt.rni, which rounds to the even interger when the source is equidistant between two integers. This is no
[NVPTX] Fix the codegen for llvm.round.
Summary: Previously, we translate llvm.round to PTX cvt.rni, which rounds to the even interger when the source is equidistant between two integers. This is not correct as llvm.round should round away from zero. This change replaces llvm.round with a round away from zero implementation through target specific custom lowering.
Modify a few affected tests to not check for cvt.rni. Instead, we check for the use of a few constants used in implementing round. We are also adding CUDA runnable tests to check for the values produced by llvm.round to test-suites/External/CUDA.
Reviewers: tra
Subscribers: jholewinski, sanjoy, jlebar, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59947
llvm-svn: 357407
show more ...
|
|
Revision tags: llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3, llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2, llvmorg-8.0.0-rc1 |
|
| #
b7cef81f |
| 14-Jan-2019 |
Francis Visoiu Mistrih <francisvm@yahoo.com> |
Replace "no-frame-pointer-*" function attributes with "frame-pointer"
Part of the effort to refactoring frame pointer code generation. We used to use two function attributes "no-frame-pointer-elim"
Replace "no-frame-pointer-*" function attributes with "frame-pointer"
Part of the effort to refactoring frame pointer code generation. We used to use two function attributes "no-frame-pointer-elim" and "no-frame-pointer-elim-non-leaf" to represent three kinds of frame pointer usage: (all) frames use frame pointer, (non-leaf) frames use frame pointer, (none) frame use frame pointer. This CL makes the idea explicit by using only one enum function attribute "frame-pointer"
Option "-frame-pointer=" replaces "-disable-fp-elim" for tools such as llc.
"no-frame-pointer-elim" and "no-frame-pointer-elim-non-leaf" are still supported for easy migration to "frame-pointer".
tests are mostly updated with
// replace command line args ‘-disable-fp-elim=false’ with ‘-frame-pointer=none’ grep -iIrnl '\-disable-fp-elim=false' * | xargs sed -i '' -e "s/-disable-fp-elim=false/-frame-pointer=none/g"
// replace command line args ‘-disable-fp-elim’ with ‘-frame-pointer=all’ grep -iIrnl '\-disable-fp-elim' * | xargs sed -i '' -e "s/-disable-fp-elim/-frame-pointer=all/g"
Patch by Yuanfang Chen (tabloid.adroit)!
Differential Revision: https://reviews.llvm.org/D56351
llvm-svn: 351049
show more ...
|
|
Revision tags: llvmorg-7.0.1, llvmorg-7.0.1-rc3, llvmorg-7.0.1-rc2, llvmorg-7.0.1-rc1, llvmorg-7.0.0, llvmorg-7.0.0-rc3, llvmorg-7.0.0-rc2 |
|
| #
d66dde5a |
| 21-Aug-2018 |
Benjamin Kramer <benny.kra@googlemail.com> |
[NVPTX] Remove ftz variants of cvt with rounding mode
These do not exist in ptxas, it refuses to compile them.
Differential Revision: https://reviews.llvm.org/D51042
llvm-svn: 340317
|
|
Revision tags: llvmorg-7.0.0-rc1, llvmorg-6.0.1, llvmorg-6.0.1-rc3, llvmorg-6.0.1-rc2, llvmorg-6.0.1-rc1, llvmorg-5.0.2, llvmorg-5.0.2-rc2, llvmorg-5.0.2-rc1, llvmorg-6.0.0, llvmorg-6.0.0-rc3, llvmorg-6.0.0-rc2, llvmorg-6.0.0-rc1, llvmorg-5.0.1, llvmorg-5.0.1-rc3, llvmorg-5.0.1-rc2, llvmorg-5.0.1-rc1, llvmorg-5.0.0, llvmorg-5.0.0-rc5, llvmorg-5.0.0-rc4, llvmorg-5.0.0-rc3, llvmorg-5.0.0-rc2, llvmorg-5.0.0-rc1, llvmorg-4.0.1, llvmorg-4.0.1-rc3, llvmorg-4.0.1-rc2 |
|
| #
55ff5786 |
| 15-May-2017 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[NVPTX] Don't flag StoreParam/LoadParam memory chain operands as ReadMem/WriteMem (PR32146)
Follow up to D33147
NVPTXTargetLowering::LowerCall was trusting the default argument values.
Fixes anoth
[NVPTX] Don't flag StoreParam/LoadParam memory chain operands as ReadMem/WriteMem (PR32146)
Follow up to D33147
NVPTXTargetLowering::LowerCall was trusting the default argument values.
Fixes another 17 of the NVPTX '-verify-machineinstrs with EXPENSIVE_CHECKS' errors in PR32146.
Differential Revision: https://reviews.llvm.org/D33189
llvm-svn: 303082
show more ...
|