History log of /llvm-project/llvm/test/CodeGen/NVPTX/f16-instructions.ll (Results 1 – 25 of 31)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3
# 0f0a96b8 19-Oct-2024 Youngsuk Kim <youngsuk.kim@hpe.com>

[llvm][NVPTX] Strip unneeded '+0' in PTX load/store (#113017)

Remove the extraneous '+0' immediate offset part in PTX load/stores, to
improve readability of output PTX code.


Revision tags: llvmorg-19.1.2, llvmorg-19.1.1
# da46244e 17-Sep-2024 Craig Topper <craig.topper@sifive.com>

Revert "[LegalizeVectorOps] Make the AArch64 hack in ExpandFNEG more specific."

This reverts commit 884ff9e3f9741ac282b6cf8087b8d3f62b8e138a.

Regression was reported in Halide for arm32.


Revision tags: llvmorg-19.1.0
# 884ff9e3 17-Sep-2024 Craig Topper <craig.topper@sifive.com>

[LegalizeVectorOps] Make the AArch64 hack in ExpandFNEG more specific.

Only scalarize single element vectors when vector FSUB is not
supported and scalar FNEG is supported.


Revision tags: llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init
# d2645141 25-Jun-2024 Akshay Deodhar <adeodhar@nvidia.com>

Enforce parameter order in f16 call, flipped call unit tests (#96258)


# 049630d0 18-Jun-2024 Justin Fargnoli <jfargnoli@nvidia.com>

[NFC][NVPTX][test] Update test for `fneg half` (#95856)

`test_fneg` function uses `fsub half 0.0, %x`. Add a test that uses the
`fneg` instruction directly.


Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init
# 250f2bb2 23-Jun-2023 root <kahmadian@nvidia.com>

adding bf16 support to NVPTX

Currently, bf16 has been scatteredly added to the PTX codegen. This patch aims to complete the set of instructions and code path required to support bf16 data type.

Rev

adding bf16 support to NVPTX

Currently, bf16 has been scatteredly added to the PTX codegen. This patch aims to complete the set of instructions and code path required to support bf16 data type.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D144911

Co-authored-by: Artem Belevich <tra@google.com>

show more ...


Revision tags: llvmorg-16.0.6, llvmorg-16.0.5
# dc90f42e 27-May-2023 Artem Belevich <tra@google.com>

Coalesce 16-bit FP types to use integer register classes.

i16/f16/bf16 will use the same .b16 registers and
i32/v2f16 and v2bf16 will share .b32 registers.

The changes are mostly mechanical, intend

Coalesce 16-bit FP types to use integer register classes.

i16/f16/bf16 will use the same .b16 registers and
i32/v2f16 and v2bf16 will share .b32 registers.

The changes are mostly mechanical, intended to remove unnecessary register
classes which tend to produce redundant register moves.

Differential Revision: https://reviews.llvm.org/D151601

v2f16 regtype conversion to i32

show more ...


Revision tags: llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2
# 70459669 07-Feb-2023 Artem Belevich <tra@google.com>

[NVPTX] Lower extraction of upper half of i32/i64 as partial move.

This produces better SASS than right-shift + truncate and is fairly common for
CUDA code that operates on __half2 values represente

[NVPTX] Lower extraction of upper half of i32/i64 as partial move.

This produces better SASS than right-shift + truncate and is fairly common for
CUDA code that operates on __half2 values represented as opaque integer.

Differential Revision: https://reviews.llvm.org/D143448

show more ...


Revision tags: llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7
# 2c3f82b7 05-Jan-2023 Benjamin Chetioui <bchetioui@google.com>

[NVPTX] Fix NVPTX lowering of frem when denominator is infinite.

`frem x, {+,-}inf` must return x to match the specification of LLVM's frem.

Reviewed By: tra

Differential Revision: https://reviews

[NVPTX] Fix NVPTX lowering of frem when denominator is infinite.

`frem x, {+,-}inf` must return x to match the specification of LLVM's frem.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D140846

show more ...


# 9b81548a 19-Dec-2022 Nikita Popov <npopov@redhat.com>

[NVPTX] Convert some tests to opaque pointers (NFC)


Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3
# 8407fdbd 13-Oct-2022 Jakub Chlanda <j.chlanda@gmail.com>

[NVPTX] Support neg{.ftz} for f16 and f16x2

Differential Revision: https://reviews.llvm.org/D135428


Revision tags: working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0
# 3ccaabe0 25-Aug-2022 Benjamin Kramer <benny.kra@googlemail.com>

[NVPTX] Lower llvm.roundeven to cvt.rni


Revision tags: llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4
# 2e7e0975 30-Apr-2022 Dmitry Vassiliev <dvassiliev@accesssoftek.com>

[NVPTX] Prefix "$L__" for branch label names

A global variable may have the same name as a label, and ptxas does not accept it.
Prefix labels with $L__ to fix this.

Reviewed By: MaskRay, tra

Diffe

[NVPTX] Prefix "$L__" for branch label names

A global variable may have the same name as a label, and ptxas does not accept it.
Prefix labels with $L__ to fix this.

Reviewed By: MaskRay, tra

Differential Revision: https://reviews.llvm.org/D119669

show more ...


Revision tags: llvmorg-14.0.3
# 0f1b5f11 27-Apr-2022 Andrew Savonichev <andrew.savonichev@gmail.com>

[NVPTX] Integrate ptxas to LIT tests

ptxas is a proprietary compiler from Nvidia that can compile PTX to
machine code (SASS). It has a lot of diagnostics to catch errors
in PTX, which can be used to

[NVPTX] Integrate ptxas to LIT tests

ptxas is a proprietary compiler from Nvidia that can compile PTX to
machine code (SASS). It has a lot of diagnostics to catch errors
in PTX, which can be used to verify PTX output from llc.

Set -DPXTAS_EXECUTABLE=/path/to/ptxas CMake option to enable it.
If this option is not set, then ptxas is substituted to true which
effectively disables all ptxas RUN lines.

LLVM_PTXAS_EXECUTABLE environment variable takes precedence over
the CMake option, and allows to override ptxas executable that is used for LIT
without complete re-configuration.

Differential Revision: https://reviews.llvm.org/D121727

show more ...


Revision tags: llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2
# a00ae86a 14-Feb-2022 Fangrui Song <i@maskray.me>

Revert D119669 "[NVPTX] Prefix "$L__" for branch label names"

This reverts commit cccef321096c20825fe8738045c1d91d3b9fd57d.

Broke clang-cuda-t4

```
/buildbot/cuda-t4-0/work/clang-cuda-t4/clang/bin

Revert D119669 "[NVPTX] Prefix "$L__" for branch label names"

This reverts commit cccef321096c20825fe8738045c1d91d3b9fd57d.

Broke clang-cuda-t4

```
/buildbot/cuda-t4-0/work/clang-cuda-t4/clang/bin/clang++ -DNDEBUG -O3 -DNDEBUG -w -Werror=date-time -UNDEBUG --cuda-path=/buildbot/cuda-t4-0/work/clang-cuda-t4/external/cuda/cuda-11.0 -I/buildbot/cuda-t4-0/work/clang-cuda-t4/external/cuda/cuda-11.0/include --cuda-gpu-arch=sm_75 -std=c++20 -stdlib=libstdc++ --gcc-toolchain=/buildbot/cuda-t4-0/work/clang-cuda-t4/external/cuda/gcc-8 -DSTDLIB_VERSION=2014 -MD -MT External/CUDA/CMakeFiles/complex-cuda-11.0-c++20-libstdc++-8.dir/complex.cu.o -MF External/CUDA/CMakeFiles/complex-cuda-11.0-c++20-libstdc++-8.dir/complex.cu.o.d -o External/CUDA/CMakeFiles/complex-cuda-11.0-c++20-libstdc++-8.dir/complex.cu.o -c /buildbot/cuda-t4-0/work/clang-cuda-t4/llvm-test-suite/External/CUDA/complex.cu
ptxas /tmp/complex-cfa050/complex-sm_75.s, line 250; fatal : Parsing error near '$L__BB6_2': syntax error
ptxas fatal : Ptx assembly aborted due to errors
```

show more ...


# cccef321 14-Feb-2022 Dmitry Vassiliev <dvassiliev@accesssoftek.com>

[NVPTX] Prefix "$L__" for branch label names

A global variable may have the same name as a label, and ptxas does not accept it.
Prefix labels with $L__ to fix this.

Reviewed By: MaskRay, tra

Diffe

[NVPTX] Prefix "$L__" for branch label names

A global variable may have the same name as a label, and ptxas does not accept it.
Prefix labels with $L__ to fix this.

Reviewed By: MaskRay, tra

Differential Revision: https://reviews.llvm.org/D119669

show more ...


Revision tags: llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4
# 4c7f820b 26-Mar-2021 Bjorn Pettersson <bjorn.a.pettersson@ericsson.com>

Update @llvm.powi to handle different int sizes for the exponent

This can be seen as a follow up to commit 0ee439b705e82a4fe20e2,
that changed the second argument of __powidf2, __powisf2 and
__powit

Update @llvm.powi to handle different int sizes for the exponent

This can be seen as a follow up to commit 0ee439b705e82a4fe20e2,
that changed the second argument of __powidf2, __powisf2 and
__powitf2 in compiler-rt from si_int to int. That was to align with
how those runtimes are defined in libgcc.
One thing that seem to have been missing in that patch was to make
sure that the rest of LLVM also handle that the argument now depends
on the size of int (not using the si_int machine mode for 32-bit).
When using __builtin_powi for a target with 16-bit int clang crashed.
And when emitting libcalls to those rtlib functions, typically when
lowering @llvm.powi), the backend would always prepare the exponent
argument as an i32 which caused miscompiles when the rtlib was
compiled with 16-bit int.

The solution used here is to use an overloaded type for the second
argument in @llvm.powi. This way clang can use the "correct" type
when lowering __builtin_powi, and then later when emitting the libcall
it is assumed that the type used in @llvm.powi matches the rtlib
function.

One thing that needed some extra attention was that when vectorizing
calls several passes did not support that several arguments could
be overloaded in the intrinsics. This patch allows overload of a
scalar operand by adding hasVectorInstrinsicOverloadedScalarOpd, with
an entry for powi.

Differential Revision: https://reviews.llvm.org/D99439

show more ...


# 505933a4 24-May-2021 thomasraoux <thomasraoux@google.com>

[NVPTX] Fix lowering of frem for negative values

to match fmod frem result must have the dividend sign. Previous implementation
had the wrong sign when passing negative numbers. For ex: frem(-16, 7)

[NVPTX] Fix lowering of frem for negative values

to match fmod frem result must have the dividend sign. Previous implementation
had the wrong sign when passing negative numbers. For ex: frem(-16, 7) was
returning 5 instead of -2. We should just a ftrunc instead of floor when
lowering to get the right behavior.

Differential Revision: https://reviews.llvm.org/D102528

show more ...


Revision tags: llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1
# 05e90cef 11-Jan-2021 Mircea Trofin <mtrofin@google.com>

[NFC] Disallow unused prefixes under llvm/test/CodeGen

This patch finishes addressing unused prefixes under CodeGen: 2
remaining tests fixed, and then undo-ing the lit.local.cfg changes under
variou

[NFC] Disallow unused prefixes under llvm/test/CodeGen

This patch finishes addressing unused prefixes under CodeGen: 2
remaining tests fixed, and then undo-ing the lit.local.cfg changes under
various subdirs and moving the policy under CodeGen.

Differential Revision: https://reviews.llvm.org/D94430

show more ...


Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2
# a8cc9047 04-Dec-2019 Matt Arsenault <Matthew.Arsenault@amd.com>

CodeGen: Add -denormal-fp-math-f32 flag

Make the set of FP related attributes and command flags closer.


# 0fd8030b 27-Mar-2020 Matt Arsenault <Matthew.Arsenault@amd.com>

Fix line endings in test


Revision tags: llvmorg-9.0.1-rc1, llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3, llvmorg-9.0.0-rc2, llvmorg-9.0.0-rc1, llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3, llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1
# 6c21ccd2 01-Apr-2019 Bixia Zheng <bixia@google.com>

[NVPTX] Fix the codegen for llvm.round.

Summary:
Previously, we translate llvm.round to PTX cvt.rni, which rounds to the
even interger when the source is equidistant between two integers. This
is no

[NVPTX] Fix the codegen for llvm.round.

Summary:
Previously, we translate llvm.round to PTX cvt.rni, which rounds to the
even interger when the source is equidistant between two integers. This
is not correct as llvm.round should round away from zero. This change
replaces llvm.round with a round away from zero implementation through
target specific custom lowering.

Modify a few affected tests to not check for cvt.rni. Instead, we check
for the use of a few constants used in implementing round. We are also
adding CUDA runnable tests to check for the values produced by
llvm.round to test-suites/External/CUDA.

Reviewers: tra

Subscribers: jholewinski, sanjoy, jlebar, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59947

llvm-svn: 357407

show more ...


Revision tags: llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3, llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2, llvmorg-8.0.0-rc1
# b7cef81f 14-Jan-2019 Francis Visoiu Mistrih <francisvm@yahoo.com>

Replace "no-frame-pointer-*" function attributes with "frame-pointer"

Part of the effort to refactoring frame pointer code generation. We used
to use two function attributes "no-frame-pointer-elim"

Replace "no-frame-pointer-*" function attributes with "frame-pointer"

Part of the effort to refactoring frame pointer code generation. We used
to use two function attributes "no-frame-pointer-elim" and
"no-frame-pointer-elim-non-leaf" to represent three kinds of frame
pointer usage: (all) frames use frame pointer, (non-leaf) frames use
frame pointer, (none) frame use frame pointer. This CL makes the idea
explicit by using only one enum function attribute "frame-pointer"

Option "-frame-pointer=" replaces "-disable-fp-elim" for tools such as
llc.

"no-frame-pointer-elim" and "no-frame-pointer-elim-non-leaf" are still
supported for easy migration to "frame-pointer".

tests are mostly updated with

// replace command line args ‘-disable-fp-elim=false’ with ‘-frame-pointer=none’
grep -iIrnl '\-disable-fp-elim=false' * | xargs sed -i '' -e "s/-disable-fp-elim=false/-frame-pointer=none/g"

// replace command line args ‘-disable-fp-elim’ with ‘-frame-pointer=all’
grep -iIrnl '\-disable-fp-elim' * | xargs sed -i '' -e "s/-disable-fp-elim/-frame-pointer=all/g"

Patch by Yuanfang Chen (tabloid.adroit)!

Differential Revision: https://reviews.llvm.org/D56351

llvm-svn: 351049

show more ...


Revision tags: llvmorg-7.0.1, llvmorg-7.0.1-rc3, llvmorg-7.0.1-rc2, llvmorg-7.0.1-rc1, llvmorg-7.0.0, llvmorg-7.0.0-rc3, llvmorg-7.0.0-rc2
# d66dde5a 21-Aug-2018 Benjamin Kramer <benny.kra@googlemail.com>

[NVPTX] Remove ftz variants of cvt with rounding mode

These do not exist in ptxas, it refuses to compile them.

Differential Revision: https://reviews.llvm.org/D51042

llvm-svn: 340317


Revision tags: llvmorg-7.0.0-rc1, llvmorg-6.0.1, llvmorg-6.0.1-rc3, llvmorg-6.0.1-rc2, llvmorg-6.0.1-rc1, llvmorg-5.0.2, llvmorg-5.0.2-rc2, llvmorg-5.0.2-rc1, llvmorg-6.0.0, llvmorg-6.0.0-rc3, llvmorg-6.0.0-rc2, llvmorg-6.0.0-rc1, llvmorg-5.0.1, llvmorg-5.0.1-rc3, llvmorg-5.0.1-rc2, llvmorg-5.0.1-rc1, llvmorg-5.0.0, llvmorg-5.0.0-rc5, llvmorg-5.0.0-rc4, llvmorg-5.0.0-rc3, llvmorg-5.0.0-rc2, llvmorg-5.0.0-rc1, llvmorg-4.0.1, llvmorg-4.0.1-rc3, llvmorg-4.0.1-rc2
# 55ff5786 15-May-2017 Simon Pilgrim <llvm-dev@redking.me.uk>

[NVPTX] Don't flag StoreParam/LoadParam memory chain operands as ReadMem/WriteMem (PR32146)

Follow up to D33147

NVPTXTargetLowering::LowerCall was trusting the default argument values.

Fixes anoth

[NVPTX] Don't flag StoreParam/LoadParam memory chain operands as ReadMem/WriteMem (PR32146)

Follow up to D33147

NVPTXTargetLowering::LowerCall was trusting the default argument values.

Fixes another 17 of the NVPTX '-verify-machineinstrs with EXPENSIVE_CHECKS' errors in PR32146.

Differential Revision: https://reviews.llvm.org/D33189

llvm-svn: 303082

show more ...


12