History log of /llvm-project/llvm/test/CodeGen/AMDGPU/inline-asm.ll (Results 1 – 25 of 36)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2
# 9843843c 30-Jul-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

SelectionDAG: Do not propagate divergence through copy glue (#101210)

This fixes DAG divergence mishandling inline asm.

This was considering the glue nodes for divergence, when the
divergence sh

SelectionDAG: Do not propagate divergence through copy glue (#101210)

This fixes DAG divergence mishandling inline asm.

This was considering the glue nodes for divergence, when the
divergence should only come from the individual CopyFromRegs
that are glued. As a result, having any VGPR CopyFromRegs would
taint any uniform SGPR copies as divergent, resulting in SGPR
copies to VGPR virtual registers later.

show more ...


# d2b6a8ee 30-Jul-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Fix asserting when trying to print scc (#101175)

This is printable using inline assembly. Also we should
handle using scc directly as instruction operands.


Revision tags: llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init
# 9e9907f1 17-Jan-2024 Fangrui Song <i@maskray.me>

[AMDGPU,test] Change llc -march= to -mtriple= (#75982)

Similar to 806761a7629df268c8aed49657aeccffa6bca449.

For IR files without a target triple, -mtriple= specifies the full
target triple while

[AMDGPU,test] Change llc -march= to -mtriple= (#75982)

Similar to 806761a7629df268c8aed49657aeccffa6bca449.

For IR files without a target triple, -mtriple= specifies the full
target triple while -march= merely sets the architecture part of the
default target triple, leaving a target triple which may not make sense,
e.g. amdgpu-apple-darwin.

Therefore, -march= is error-prone and not recommended for tests without
a target triple. The issue has been benign as we recognize
$unknown-apple-darwin as ELF instead of rejecting it outrightly.

This patch changes AMDGPU tests to not rely on the default
OS/environment components. Tests that need fixes are not changed:

```
LLVM :: CodeGen/AMDGPU/fabs.f64.ll
LLVM :: CodeGen/AMDGPU/fabs.ll
LLVM :: CodeGen/AMDGPU/floor.ll
LLVM :: CodeGen/AMDGPU/fneg-fabs.f64.ll
LLVM :: CodeGen/AMDGPU/fneg-fabs.ll
LLVM :: CodeGen/AMDGPU/r600-infinite-loop-bug-while-reorganizing-vector.ll
LLVM :: CodeGen/AMDGPU/schedule-if-2.ll
```

show more ...


Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4
# fe8335ba 30-Oct-2023 Stanislav Mekhanoshin <rampitec@users.noreply.github.com>

[AMDGPU] Select 64-bit imm moves if can be encoded as 32 bit operand (#70395)

This allows folding of 64-bit operands if fit into 32-bit. Fixes
https://github.com/llvm/llvm-project/issues/67781


Revision tags: llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2
# a3dfa4e0 18-Apr-2023 Kriti Gupta <kriti.gupta@research.iiit.ac.in>

[test] Remove occurences of br undef in CodeGen/AMDGPU tests

Differential Revision: https://reviews.llvm.org/D148041


Revision tags: llvmorg-16.0.1
# 047efda6 04-Apr-2023 Nuno Lopes <nuno.lopes@tecnico.ulisboa.pt>

Revert "[test] Remove occurences of br undef in CodeGen/AMDGPU tests"

This reverts commit 18c594c176036b7ffcd8439ed9c4b08d2085a244.
Build bots broke


# 18c594c1 04-Apr-2023 Kriti Gupta <kriti.gupta@research.iiit.ac.in>

[test] Remove occurences of br undef in CodeGen/AMDGPU tests

Differential Revision: https://reviews.llvm.org/D145622


Revision tags: llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7
# bdf2fbba 19-Dec-2022 Nikita Popov <npopov@redhat.com>

[AMDGPU] Convert some tests to opaque pointers (NFC)


Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3
# f510045d 14-Jan-2022 Jay Foad <jay.foad@amd.com>

[CodeGen] Remove unneeded regex escaping in FileCheck patterns. NFC.

Take advantage of D117117 to simplify all {{\[}} to [ and {{\]}} to ].

Differential Revision: https://reviews.llvm.org/D117298


Revision tags: llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1
# 8a52bd82 19-Nov-2021 Jay Foad <jay.foad@amd.com>

[AMDGPU] Only select VOP3 forms of VOP2 instructions

Change VOP_PAT_GEN to default to not generating an instruction selection
pattern for the VOP2 (e32) form of an instruction, only for the VOP3
(e6

[AMDGPU] Only select VOP3 forms of VOP2 instructions

Change VOP_PAT_GEN to default to not generating an instruction selection
pattern for the VOP2 (e32) form of an instruction, only for the VOP3
(e64) form. This allows SIFoldOperands maximum freedom to fold copies
into the operands of an instruction, before SIShrinkInstructions tries
to shrink it back to the smaller encoding.

This affects the following VOP2 instructions:
v_min_i32
v_max_i32
v_min_u32
v_max_u32
v_and_b32
v_or_b32
v_xor_b32
v_lshr_b32
v_ashr_i32
v_lshl_b32

A further cleanup could simplify or remove VOP_PAT_GEN, since its
optional second argument is never used.

Differential Revision: https://reviews.llvm.org/D114252

show more ...


Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init
# 381ded34 28-Jun-2021 Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>

[AMDGPU] Add S_MOV_B64_IMM_PSEUDO for wide constants

This is to allow 64 bit constant rematerialization. If a constant
is split into two separate moves initializing sub0 and sub1 like
now RA cannot

[AMDGPU] Add S_MOV_B64_IMM_PSEUDO for wide constants

This is to allow 64 bit constant rematerialization. If a constant
is split into two separate moves initializing sub0 and sub1 like
now RA cannot rematerizalize a 64 bit register.

This gives 10-20% uplift in a set of huge apps heavily using double
precession math.

Fixes: SWDEV-292645

Differential Revision: https://reviews.llvm.org/D104874

show more ...


Revision tags: llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1
# 2f499b9a 19-Dec-2020 Tony <Tony.Tye@amd.com>

[AMDGPU] Add volatile support to SIMemoryLegalizer

Treat a non-atomic volatile load and store as a relaxed atomic at
system scope for the address spaces accessed. This will ensure all
relevant cache

[AMDGPU] Add volatile support to SIMemoryLegalizer

Treat a non-atomic volatile load and store as a relaxed atomic at
system scope for the address spaces accessed. This will ensure all
relevant caches will be bypassed.

A volatile atomic is not changed and still only bypasses caches upto
the level specified by the SyncScope operand.

Differential Revision: https://reviews.llvm.org/D94214

show more ...


Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2
# cb7b661d 03-Feb-2020 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Analyze divergence of inline asm


# 5df1ac78 31-Jan-2020 alex-t <alexander.timofeev@amd.com>

[AMDGPU] fixed divergence driven shift operations selection

Differential Revision: https://reviews.llvm.org/D73483

Reviewers: rampitec


Revision tags: llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1
# 21bc8e5a 28-Oct-2019 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Make VReg_1 only include 1 artificial register

When TableGen is inferring register classes from contexts, it uses a
sorting function based on the number of registers in the class. Since
this

AMDGPU: Make VReg_1 only include 1 artificial register

When TableGen is inferring register classes from contexts, it uses a
sorting function based on the number of registers in the class. Since
this was being treated as an alias of VGPR_32, they had exactly the
same size. The sort used wasn't a stable sort, and even if it were, I
believe the tie breaker would effectively end up being the
alphabetical ordering of the class name. There appear to be issues
trying to use an empty set of registers, so add only one so this will
always sort to the end.

Also add a comment explaining how VReg_1 is a dirty hack for
SelectionDAG.

This does end up changing the behavior of i1 with inline asm and VGPR
constraints, but the existing behavior was was already nonsensical and
inconsistent. It should probably be disallowed anyway.

Fixes bug 43699

show more ...


Revision tags: llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3, llvmorg-9.0.0-rc2, llvmorg-9.0.0-rc1, llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3, llvmorg-8.0.1-rc2
# 5fc1dfa7 28-May-2019 Michael Liao <michael.hliao@gmail.com>

[AMDGPU] Correct the handling of inlineasm output registers.

Summary:
- There's a regression due to the cross-block RC assignment. Use the
proper way to derive the output register RC in inline asm

[AMDGPU] Correct the handling of inlineasm output registers.

Summary:
- There's a regression due to the cross-block RC assignment. Use the
proper way to derive the output register RC in inline asm.

Reviewers: rampitec, alex-t

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, dstuttard, tpr, t-tye, eraman, hiraditya, llvm-commits, yaxunl

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62537

llvm-svn: 361868

show more ...


Revision tags: llvmorg-8.0.1-rc1
# fbdd2a18 15-Apr-2019 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Fix printed format of SReg_96

These are artificial, so I think this should only come up with inline
asm comments.

llvm-svn: 358446


Revision tags: llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3, llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2, llvmorg-8.0.0-rc1, llvmorg-7.0.1, llvmorg-7.0.1-rc3, llvmorg-7.0.1-rc2, llvmorg-7.0.1-rc1
# 814abb59 31-Oct-2018 Nicolai Haehnle <nhaehnle@gmail.com>

AMDGPU: Rewrite SILowerI1Copies to always stay on SALU

Summary:
Instead of writing boolean values temporarily into 32-bit VGPRs
if they are involved in PHIs or are observed from outside a loop,
we u

AMDGPU: Rewrite SILowerI1Copies to always stay on SALU

Summary:
Instead of writing boolean values temporarily into 32-bit VGPRs
if they are involved in PHIs or are observed from outside a loop,
we use bitwise masking operations to combine lane masks in a way
that is consistent with wave control flow.

Move SIFixSGPRCopies to before this pass, since that pass
incorrectly attempts to move SGPR phis to VGPRs.

This should recover most of the code quality that was lost with
the bug fix in "AMDGPU: Remove PHI loop condition optimization".

There are still some relevant cases where code quality could be
improved, in particular:

- We often introduce redundant masks with EXEC. Ideally, we'd
have a generic computeKnownBits-like analysis to determine
whether masks are already masked by EXEC, so we can avoid this
masking both here and when lowering uniform control flow.

- The criterion we use to determine whether a def is observed
from outside a loop is conservative: it doesn't check whether
(loop) branch conditions are uniform.

Change-Id: Ibabdb373a7510e426b90deef00f5e16c5d56e64b

Reviewers: arsenm, rampitec, tpr

Subscribers: kzhuravl, jvesely, wdng, mgorny, yaxunl, dstuttard, t-tye, eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D53496

llvm-svn: 345719

show more ...


# 611b533f 30-Oct-2018 Jonas Paulsson <paulsson@linux.vnet.ibm.com>

[SchedModel] Fix for read advance cycles with implicit pseudo operands.

The SchedModel allows the addition of ReadAdvances to express that certain
operands of the instructions are needed at a later

[SchedModel] Fix for read advance cycles with implicit pseudo operands.

The SchedModel allows the addition of ReadAdvances to express that certain
operands of the instructions are needed at a later point than the others.

RegAlloc may add pseudo operands that are not part of the instruction
descriptor, and therefore cannot have any read advance entries. This meant
that in some cases the desired read advance was nullified by such a pseudo
operand, which still had the original latency.

This patch fixes this by making sure that such pseudo operands get a zero
latency during DAG construction.

Review: Matthias Braun, Ulrich Weigand.
https://reviews.llvm.org/D49671

llvm-svn: 345606

show more ...


Revision tags: llvmorg-7.0.0, llvmorg-7.0.0-rc3, llvmorg-7.0.0-rc2, llvmorg-7.0.0-rc1, llvmorg-6.0.1, llvmorg-6.0.1-rc3, llvmorg-6.0.1-rc2, llvmorg-6.0.1-rc1, llvmorg-5.0.2, llvmorg-5.0.2-rc2, llvmorg-5.0.2-rc1, llvmorg-6.0.0, llvmorg-6.0.0-rc3, llvmorg-6.0.0-rc2, llvmorg-6.0.0-rc1, llvmorg-5.0.1, llvmorg-5.0.1-rc3, llvmorg-5.0.1-rc2, llvmorg-5.0.1-rc1, llvmorg-5.0.0, llvmorg-5.0.0-rc5, llvmorg-5.0.0-rc4, llvmorg-5.0.0-rc3, llvmorg-5.0.0-rc2
# 36cd1859 09-Aug-2017 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Fix assert on n inline asm constraint

llvm-svn: 310515


Revision tags: llvmorg-5.0.0-rc1
# 9aa45f04 06-Jul-2017 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Add macro fusion schedule DAG mutation

Try to increase opportunities to shrink vcc uses.

llvm-svn: 307313


Revision tags: llvmorg-4.0.1, llvmorg-4.0.1-rc3
# 3c7581bb 08-Jun-2017 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Use correct register names in inline assembly

Fixes using physical registers in inline asm from clang.

llvm-svn: 305004


Revision tags: llvmorg-4.0.1-rc2
# 2b1f9aa5 17-May-2017 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Start defining a calling convention

Partially implement callee-side for arguments and return values.
byval doesn't work properly, and most likely sret or other on-stack
return values most as

AMDGPU: Start defining a calling convention

Partially implement callee-side for arguments and return values.
byval doesn't work properly, and most likely sret or other on-stack
return values most as well.

llvm-svn: 303308

show more ...


# 2a80369a 29-Apr-2017 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Fix copies from physical registers in SIFixSGPRCopies

This would assert when there were multiple defs of
a physical register.

We just need to move all of the users of it.

llvm-svn: 301730


Revision tags: llvmorg-4.0.1-rc1
# 0d0d6c2f 12-Apr-2017 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Fix invalid copies when copying i1 to phys reg

Insert a VReg_1 virtual register so the i1 workaround pass
can handle it.

llvm-svn: 300113


12