f16-instructions.ll - OpenGrok history log for /llvm-project/llvm/test/CodeGen/NVPTX/f16-instructions.ll

Revision	Date	Author	Comments
Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3
# 0f0a96b8	19-Oct-2024	Youngsuk Kim <youngsuk.kim@hpe.com>	[llvm][NVPTX] Strip unneeded '+0' in PTX load/store (#113017) Remove the extraneous '+0' immediate offset part in PTX load/stores, to improve readability of output PTX code.
Revision tags: llvmorg-19.1.2, llvmorg-19.1.1
# da46244e	17-Sep-2024	Craig Topper <craig.topper@sifive.com>	Revert "[LegalizeVectorOps] Make the AArch64 hack in ExpandFNEG more specific." This reverts commit 884ff9e3f9741ac282b6cf8087b8d3f62b8e138a. Regression was reported in Halide for arm32.
Revision tags: llvmorg-19.1.0
# 884ff9e3	17-Sep-2024	Craig Topper <craig.topper@sifive.com>	[LegalizeVectorOps] Make the AArch64 hack in ExpandFNEG more specific. Only scalarize single element vectors when vector FSUB is not supported and scalar FNEG is supported.
Revision tags: llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init
# d2645141	25-Jun-2024	Akshay Deodhar <adeodhar@nvidia.com>	Enforce parameter order in f16 call, flipped call unit tests (#96258)
# 049630d0	18-Jun-2024	Justin Fargnoli <jfargnoli@nvidia.com>	[NFC][NVPTX][test] Update test for `fneg half` (#95856) `test_fneg` function uses `fsub half 0.0, %x`. Add a test that uses the `fneg` instruction directly.
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init
# 250f2bb2	23-Jun-2023	root <kahmadian@nvidia.com>	adding bf16 support to NVPTX Currently, bf16 has been scatteredly added to the PTX codegen. This patch aims to complete the set of instructions and code path required to support bf16 data type. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D144911 Co-authored-by: Artem Belevich <tra@google.com>
Revision tags: llvmorg-16.0.6, llvmorg-16.0.5
# dc90f42e	27-May-2023	Artem Belevich <tra@google.com>	Coalesce 16-bit FP types to use integer register classes. i16/f16/bf16 will use the same .b16 registers and i32/v2f16 and v2bf16 will share .b32 registers. The changes are mostly mechanical, intended to remove unnecessary register classes which tend to produce redundant register moves. Differential Revision: https://reviews.llvm.org/D151601 v2f16 regtype conversion to i32
Revision tags: llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2
# 70459669	07-Feb-2023	Artem Belevich <tra@google.com>	[NVPTX] Lower extraction of upper half of i32/i64 as partial move. This produces better SASS than right-shift + truncate and is fairly common for CUDA code that operates on __half2 values represented as opaque integer. Differential Revision: https://reviews.llvm.org/D143448
Revision tags: llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7
# 2c3f82b7	05-Jan-2023	Benjamin Chetioui <bchetioui@google.com>	[NVPTX] Fix NVPTX lowering of frem when denominator is infinite. `frem x, {+,-}inf` must return x to match the specification of LLVM's frem. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D140846
# 9b81548a	19-Dec-2022	Nikita Popov <npopov@redhat.com>	[NVPTX] Convert some tests to opaque pointers (NFC)
Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3
# 8407fdbd	13-Oct-2022	Jakub Chlanda <j.chlanda@gmail.com>	[NVPTX] Support neg{.ftz} for f16 and f16x2 Differential Revision: https://reviews.llvm.org/D135428
Revision tags: working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0
# 3ccaabe0	25-Aug-2022	Benjamin Kramer <benny.kra@googlemail.com>	[NVPTX] Lower llvm.roundeven to cvt.rni
Revision tags: llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4
# 2e7e0975	30-Apr-2022	Dmitry Vassiliev <dvassiliev@accesssoftek.com>	[NVPTX] Prefix "$L__" for branch label names A global variable may have the same name as a label, and ptxas does not accept it. Prefix labels with $L__ to fix this. Reviewed By: MaskRay, tra Differential Revision: https://reviews.llvm.org/D119669
Revision tags: llvmorg-14.0.3
# 0f1b5f11	27-Apr-2022	Andrew Savonichev <andrew.savonichev@gmail.com>	[NVPTX] Integrate ptxas to LIT tests ptxas is a proprietary compiler from Nvidia that can compile PTX to machine code (SASS). It has a lot of diagnostics to catch errors in PTX, which can be used to verify PTX output from llc. Set -DPTXAS_EXECUTABLE=/path/to/ptxas CMake option to enable it. If this option is not set, then ptxas is substituted to true which effectively disables all ptxas RUN lines. LLVM_PTXAS_EXECUTABLE environment variable takes precedence over the CMake option, and allows to override ptxas executable that is used for LIT without complete re-configuration. Differential Revision: https://reviews.llvm.org/D121727
Revision tags: llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2
# a00ae86a	14-Feb-2022	Fangrui Song <i@maskray.me>	Revert D119669 "[NVPTX] Prefix "$L__" for branch label names" This reverts commit cccef321096c20825fe8738045c1d91d3b9fd57d. Broke clang-cuda-t4 ``` /buildbot/cuda-t4-0/work/clang-cuda-t4/clang/bin/clang++ -DNDEBUG -O3 -DNDEBUG -w -Werror=date-time -UNDEBUG --cuda-path=/buildbot/cuda-t4-0/work/clang-cuda-t4/external/cuda/cuda-11.0 -I/buildbot/cuda-t4-0/work/clang-cuda-t4/external/cuda/cuda-11.0/include --cuda-gpu-arch=sm_75 -std=c++20 -stdlib=libstdc++ --gcc-toolchain=/buildbot/cuda-t4-0/work/clang-cuda-t4/external/cuda/gcc-8 -DSTDLIB_VERSION=2014 -MD -MT External/CUDA/CMakeFiles/complex-cuda-11.0-c++20-libstdc++-8.dir/complex.cu.o -MF External/CUDA/CMakeFiles/complex-cuda-11.0-c++20-libstdc++-8.dir/complex.cu.o.d -o External/CUDA/CMakeFiles/complex-cuda-11.0-c++20-libstdc++-8.dir/complex.cu.o -c /buildbot/cuda-t4-0/work/clang-cuda-t4/llvm-test-suite/External/CUDA/complex.cu ptxas /tmp/complex-cfa050/complex-sm_75.s, line 250; fatal : Parsing error near '$L__BB6_2': syntax error ptxas fatal : Ptx assembly aborted due to errors ```
# cccef321	14-Feb-2022	Dmitry Vassiliev <dvassiliev@accesssoftek.com>	[NVPTX] Prefix "$L__" for branch label names A global variable may have the same name as a label, and ptxas does not accept it. Prefix labels with $L__ to fix this. Reviewed By: MaskRay, tra Differential Revision: https://reviews.llvm.org/D119669
Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4
# 4c7f820b	26-Mar-2021	Bjorn Pettersson <bjorn.a.pettersson@ericsson.com>	Update @llvm.powi to handle different int sizes for the exponent This can be seen as a follow up to commit 0ee439b705e82a4fe20e2, that changed the second argument of __powidf2, __powisf2 and __powitf2 in compiler-rt from si_int to int. That was to align with how those runtimes are defined in libgcc. One thing that seem to have been missing in that patch was to make sure that the rest of LLVM also handle that the argument now depends on the size of int (not using the si_int machine mode for 32-bit). When using __builtin_powi for a target with 16-bit int clang crashed. And when emitting libcalls to those rtlib functions, typically when lowering @llvm.powi), the backend would always prepare the exponent argument as an i32 which caused miscompiles when the rtlib was compiled with 16-bit int. The solution used here is to use an overloaded type for the second argument in @llvm.powi. This way clang can use the "correct" type when lowering __builtin_powi, and then later when emitting the libcall it is assumed that the type used in @llvm.powi matches the rtlib function. One thing that needed some extra attention was that when vectorizing calls several passes did not support that several arguments could be overloaded in the intrinsics. This patch allows overload of a scalar operand by adding hasVectorInstrinsicOverloadedScalarOpd, with an entry for powi. Differential Revision: https://reviews.llvm.org/D99439
# 505933a4	24-May-2021	thomasraoux <thomasraoux@google.com>	[NVPTX] Fix lowering of frem for negative values to match fmod frem result must have the dividend sign. Previous implementation had the wrong sign when passing negative numbers. For ex: frem(-16, 7) was returning 5 instead of -2. We should just use ftrunc instead of floor when lowering to get the right behavior. Differential Revision: https://reviews.llvm.org/D102528
Revision tags: llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1
# 05e90cef	11-Jan-2021	Mircea Trofin <mtrofin@google.com>	[NFC] Disallow unused prefixes under llvm/test/CodeGen This patch finishes addressing unused prefixes under CodeGen: 2 remaining tests fixed, and then undo-ing the lit.local.cfg changes under various subdirs and moving the policy under CodeGen. Differential Revision: https://reviews.llvm.org/D94430
Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2
# a8cc9047	04-Dec-2019	Matt Arsenault <Matthew.Arsenault@amd.com>	CodeGen: Add -denormal-fp-math-f32 flag Make the set of FP related attributes and command flags closer.
# 0fd8030b	27-Mar-2020	Matt Arsenault <Matthew.Arsenault@amd.com>	Fix line endings in test
Revision tags: llvmorg-9.0.1-rc1, llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3, llvmorg-9.0.0-rc2, llvmorg-9.0.0-rc1, llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3, llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1
# 6c21ccd2	01-Apr-2019	Bixia Zheng <bixia@google.com>	[NVPTX] Fix the codegen for llvm.round. Summary: Previously, we translate llvm.round to PTX cvt.rni, which rounds to the even integer when the source is equidistant between two integers. This is not correct as llvm.round should round away from zero. This change replaces llvm.round with a round away from zero implementation through target specific custom lowering. Modify a few affected tests to not check for cvt.rni. Instead, we check for the use of a few constants used in implementing round. We are also adding CUDA runnable tests to check for the values produced by llvm.round to test-suites/External/CUDA. Reviewers: tra Subscribers: jholewinski, sanjoy, jlebar, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59947 llvm-svn: 357407
Revision tags: llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3, llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2, llvmorg-8.0.0-rc1
# b7cef81f	14-Jan-2019	Francis Visoiu Mistrih <francisvm@yahoo.com>	Replace "no-frame-pointer-" function attributes with "frame-pointer" Part of the effort to refactoring frame pointer code generation. We used to use two function attributes "no-frame-pointer-elim" and "no-frame-pointer-elim-non-leaf" to represent three kinds of frame pointer usage: (all) frames use frame pointer, (non-leaf) frames use frame pointer, (none) frame use frame pointer. This CL makes the idea explicit by using only one enum function attribute "frame-pointer" Option "-frame-pointer=" replaces "-disable-fp-elim" for tools such as llc. "no-frame-pointer-elim" and "no-frame-pointer-elim-non-leaf" are still supported for easy migration to "frame-pointer". tests are mostly updated with // replace command line args ‘-disable-fp-elim=false’ with ‘-frame-pointer=none’ grep -iIrnl '\-disable-fp-elim=false' * | xargs sed -i '' -e "s/-disable-fp-elim=false/-frame-pointer=none/g" // replace command line args ‘-disable-fp-elim’ with ‘-frame-pointer=all’ grep -iIrnl '\-disable-fp-elim' * | xargs sed -i '' -e "s/-disable-fp-elim/-frame-pointer=all/g" Patch by Yuanfang Chen (tabloid.adroit)! Differential Revision: https://reviews.llvm.org/D56351 llvm-svn: 351049
Revision tags: llvmorg-7.0.1, llvmorg-7.0.1-rc3, llvmorg-7.0.1-rc2, llvmorg-7.0.1-rc1, llvmorg-7.0.0, llvmorg-7.0.0-rc3, llvmorg-7.0.0-rc2
# d66dde5a	21-Aug-2018	Benjamin Kramer <benny.kra@googlemail.com>	[NVPTX] Remove ftz variants of cvt with rounding mode These do not exist in ptxas, it refuses to compile them. Differential Revision: https://reviews.llvm.org/D51042 llvm-svn: 340317
Revision tags: llvmorg-7.0.0-rc1, llvmorg-6.0.1, llvmorg-6.0.1-rc3, llvmorg-6.0.1-rc2, llvmorg-6.0.1-rc1, llvmorg-5.0.2, llvmorg-5.0.2-rc2, llvmorg-5.0.2-rc1, llvmorg-6.0.0, llvmorg-6.0.0-rc3, llvmorg-6.0.0-rc2, llvmorg-6.0.0-rc1, llvmorg-5.0.1, llvmorg-5.0.1-rc3, llvmorg-5.0.1-rc2, llvmorg-5.0.1-rc1, llvmorg-5.0.0, llvmorg-5.0.0-rc5, llvmorg-5.0.0-rc4, llvmorg-5.0.0-rc3, llvmorg-5.0.0-rc2, llvmorg-5.0.0-rc1, llvmorg-4.0.1, llvmorg-4.0.1-rc3, llvmorg-4.0.1-rc2
# 55ff5786	15-May-2017	Simon Pilgrim <llvm-dev@redking.me.uk>	[NVPTX] Don't flag StoreParam/LoadParam memory chain operands as ReadMem/WriteMem (PR32146) Follow up to D33147 NVPTXTargetLowering::LowerCall was trusting the default argument values. Fixes another 17 of the NVPTX '-verify-machineinstrs with EXPENSIVE_CHECKS' errors in PR32146. Differential Revision: https://reviews.llvm.org/D33189 llvm-svn: 303082
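
Three entries in this log (505933a4, 2c3f82b7, 6c21ccd2) describe numeric semantics that are easy to check by hand. The sketch below is a plain-Python illustration of those semantics only; it is not the NVPTX lowering itself. The function names are hypothetical, inputs are assumed to be finite unless stated otherwise, and `round_half_away` uses a simple `floor(|x| + 0.5)` formulation that is an assumption of this example rather than the constants the backend uses.

```python
import math


def frem_floor(x: float, y: float) -> float:
    """Floor-based remainder: the result follows the sign of the divisor."""
    return x - math.floor(x / y) * y


def frem_trunc(x: float, y: float) -> float:
    """Truncation-based remainder: the result follows the sign of the dividend."""
    return x - math.trunc(x / y) * y


def frem_guarded(x: float, y: float) -> float:
    """Truncation-based remainder that returns x for an infinite divisor.

    Assumes x is finite; without the guard, 0 * inf produces NaN.
    """
    if math.isinf(y):
        return x
    return frem_trunc(x, y)


def round_half_away(x: float) -> float:
    """Round to nearest integer, ties away from zero (illustrative only)."""
    return math.copysign(math.floor(abs(x) + 0.5), x)


if __name__ == "__main__":
    print(frem_floor(-16.0, 7.0))        # 5.0, the sign described as wrong in 505933a4
    print(frem_trunc(-16.0, 7.0))        # -2.0, agrees with math.fmod
    print(frem_trunc(5.0, math.inf))     # nan, the case addressed by 2c3f82b7
    print(frem_guarded(5.0, math.inf))   # 5.0
    print(round(2.5), round_half_away(2.5))  # 2 3.0 (ties-to-even vs. ties-away)
```

Python's built-in `round` uses ties-to-even, which makes it a convenient stand-in for the `cvt.rni` behavior that 6c21ccd2 describes as incorrect for `llvm.round`.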