History log of /llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp (Results 251 – 275 of 364)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
# 361b5b21 21-Mar-2019 Tim Renouf <tpr.llvm@botech.co.uk>

[AMDGPU] Support for v3i32/v3f32

Added support for dwordx3 for most load/store types, but not DS, and not
intrinsics yet.

SI (gfx6) does not have dwordx3 instructions, so they are not enabled
there

[AMDGPU] Support for v3i32/v3f32

Added support for dwordx3 for most load/store types, but not DS, and not
intrinsics yet.

SI (gfx6) does not have dwordx3 instructions, so they are not enabled
there.

Some of this patch is from Matt Arsenault, also of AMD.

Differential Revision: https://reviews.llvm.org/D58902

Change-Id: I913ef54f1433a7149da8d72f4af54dbb13436bd9
llvm-svn: 356659

show more ...


Revision tags: llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4
# 6023d599 04-Mar-2019 Dmitry Preobrazhensky <dmitry.preobrazhensky@amd.com>

[AMDGPU][MC] Enable lds_direct operand for v_readfirstlane_b32, v_readlane_b32 and v_writelane_b32

See bug 40662: https://bugs.llvm.org/show_bug.cgi?id=40662

Reviewers: artem.tamazov, arsenm, rampi

[AMDGPU][MC] Enable lds_direct operand for v_readfirstlane_b32, v_readlane_b32 and v_writelane_b32

See bug 40662: https://bugs.llvm.org/show_bug.cgi?id=40662

Reviewers: artem.tamazov, arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D58713

llvm-svn: 355312

show more ...


Revision tags: llvmorg-8.0.0-rc3, llvmorg-7.1.0, llvmorg-7.1.0-rc1
# d7047276 08-Feb-2019 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Remove GCN features and predicates

These are no longer necessary since the R600 tablegen files are split
out now.

llvm-svn: 353548


Revision tags: llvmorg-8.0.0-rc2, llvmorg-8.0.0-rc1
# 2946cd70 19-Jan-2019 Chandler Carruth <chandlerc@gmail.com>

Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the ne

Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636

show more ...


# 76504a4c 12-Dec-2018 Neil Henning <neil.henning@amd.com>

[AMDGPU] Extend the SI Load/Store optimizer to combine more things.

I've extended the load/store optimizer to be able to produce dwordx3
loads and stores, This change allows many more load/stores to

[AMDGPU] Extend the SI Load/Store optimizer to combine more things.

I've extended the load/store optimizer to be able to produce dwordx3
loads and stores, This change allows many more load/stores to be combined,
and results in much more optimal code for our hardware.

Differential Revision: https://reviews.llvm.org/D54042

llvm-svn: 348937

show more ...


Revision tags: llvmorg-7.0.1, llvmorg-7.0.1-rc3
# a7b00058 30-Nov-2018 Nicolai Haehnle <nhaehnle@gmail.com>

AMDGPU: Divergence-driven selection of scalar buffer load intrinsics

Summary:
Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if
the load is really uniform. So select the scalar

AMDGPU: Divergence-driven selection of scalar buffer load intrinsics

Summary:
Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if
the load is really uniform. So select the scalar load intrinsics directly
to either VMEM or SMRD buffer loads based on divergence analysis.

If an offset happens to end up in a VGPR -- either because a floating
point calculation was involved, or due to other remaining deficiencies
in SIFixSGPRCopies -- we use v_readfirstlane.

There is some unrelated churn in tests since we now select MUBUF offsets
in a unified way with non-scalar buffer loads.

Change-Id: I170e6816323beb1348677b358c9d380865cd1a19

Reviewers: arsenm, alex-t, rampitec, tpr

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D53283

llvm-svn: 348050

show more ...


# 1a94cbb3 29-Nov-2018 Nicolai Haehnle <nhaehnle@gmail.com>

AMDGPU/InsertWaitcnts: Untangle some semi-global state

Summary:
Reduce the statefulness of the algorithm in two ways:

1. More clearly split generateWaitcntInstBefore into two phases: the
first o

AMDGPU/InsertWaitcnts: Untangle some semi-global state

Summary:
Reduce the statefulness of the algorithm in two ways:

1. More clearly split generateWaitcntInstBefore into two phases: the
first one which determines the required wait, if any, without changing
the ScoreBrackets, and the second one which actually inserts the wait
and updates the brackets.

2. Communicate pre-existing s_waitcnt instructions using an argument to
generateWaitcntInstBefore instead of through the ScoreBrackets.

To simplify these changes, a Waitcnt structure is introduced which carries
the counts of an s_waitcnt instruction in decoded form.

There are some functional changes:

1. The FIXME for the VCCZ bug workaround was implemented: we only wait for
SMEM instructions as required instead of waiting on all counters.

2. We now properly track pre-existing waitcnt's in all cases, which leads
to less conservative waitcnts being emitted in some cases.

s_load_dword ...
s_waitcnt lgkmcnt(0) <-- pre-existing wait count
ds_read_b32 v0, ...
ds_read_b32 v1, ...
s_waitcnt lgkmcnt(0) <-- this is too conservative
use(v0)
more code
use(v1)

This increases code size a bit, but the reduced latency should still be a
win in basically all cases. The worst code size regressions in my shader-db
are:

WORST REGRESSIONS - Code Size
Before After Delta Percentage
1724 1736 12 0.70 % shaders/private/f1-2015/1334.shader_test [0]
2276 2284 8 0.35 % shaders/private/f1-2015/1306.shader_test [0]
4632 4640 8 0.17 % shaders/private/ue4_elemental/62.shader_test [0]
2376 2384 8 0.34 % shaders/private/f1-2015/1308.shader_test [0]
3284 3292 8 0.24 % shaders/private/talos_principle/1955.shader_test [0]

Reviewers: msearles, rampitec, scott.linder, kanarayan

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam

Differential Revision: https://reviews.llvm.org/D54226

llvm-svn: 347848

show more ...


# cac749ac 16-Nov-2018 Ron Lieberman <ronlieb.g@gmail.com>

[AMDGPU] Add FixupVectorISel pass, currently Supports SREGs in GLOBAL LD/ST

Add a pass to fixup various vector ISel issues.
Currently we handle converting GLOBAL_{LOAD|STORE}_*
and GLOBAL_Atomic_* i

[AMDGPU] Add FixupVectorISel pass, currently Supports SREGs in GLOBAL LD/ST

Add a pass to fixup various vector ISel issues.
Currently we handle converting GLOBAL_{LOAD|STORE}_*
and GLOBAL_Atomic_* instructions into their _SADDR variants.
This involves feeding the sreg into the saddr field of the new instruction.

llvm-svn: 347008

show more ...


# af7b5d70 15-Nov-2018 Konstantin Zhuravlyov <kzhuravl_dev@outlook.com>

AMDHSA: More code object v3 fixes:

- Make sure IsaInfo::hasCodeObjectV3 returns true only
for AMDHSA
- Update assembler metadata tests to use v2 by default

llvm-svn: 347001


# bc233f55 07-Nov-2018 Nicolai Haehnle <nhaehnle@gmail.com>

Revert "AMDGPU: Divergence-driven selection of scalar buffer load intrinsics"

This reverts commit r344696 for now (except for some test additions).

See https://bugs.freedesktop.org/show_bug.cgi?id=

Revert "AMDGPU: Divergence-driven selection of scalar buffer load intrinsics"

This reverts commit r344696 for now (except for some test additions).

See https://bugs.freedesktop.org/show_bug.cgi?id=108611.

llvm-svn: 346364

show more ...


# 108927b9 05-Nov-2018 Konstantin Zhuravlyov <kzhuravl_dev@outlook.com>

AMDGPU: Add sram-ecc feature

Differential Revision: https://reviews.llvm.org/D53222

llvm-svn: 346177


Revision tags: llvmorg-7.0.1-rc2, llvmorg-7.0.1-rc1
# c4a2ff09 17-Oct-2018 Nicolai Haehnle <nhaehnle@gmail.com>

AMDGPU: Divergence-driven selection of scalar buffer load intrinsics

Summary:
Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if
the load is really uniform. So select the scalar

AMDGPU: Divergence-driven selection of scalar buffer load intrinsics

Summary:
Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if
the load is really uniform. So select the scalar load intrinsics directly
to either VMEM or SMRD buffer loads based on divergence analysis.

If an offset happens to end up in a VGPR -- either because a floating
point calculation was involved, or due to other remaining deficiencies
in SIFixSGPRCopies -- we use v_readfirstlane.

There is some unrelated churn in tests since we now select MUBUF offsets
in a unified way with non-scalar buffer loads.

Change-Id: I170e6816323beb1348677b358c9d380865cd1a19

Reviewers: arsenm, alex-t, rampitec, tpr

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D53283

llvm-svn: 344696

show more ...


Revision tags: llvmorg-7.0.0
# 71e43ee4 12-Sep-2018 Konstantin Zhuravlyov <kzhuravl_dev@outlook.com>

AMDGPU: Re-apply r341982 after fixing the layering issue

Move isa version determination into TargetParser.

Also switch away from target features to CPU string when
determining isa version. This fix

AMDGPU: Re-apply r341982 after fixing the layering issue

Move isa version determination into TargetParser.

Also switch away from target features to CPU string when
determining isa version. This fixes an issue when we
output wrong isa version in the object code when features
of a particular CPU are altered (i.e. gfx902 w/o xnack
used to result in gfx900).

llvm-svn: 342069

show more ...


# 95066496 12-Sep-2018 Ilya Biryukov <ibiryukov@google.com>

Revert "AMDGPU: Move isa version and EF_AMDGPU_MACH_* determination into TargetParser."

This reverts commit r341982.

The change introduced a layering violation. Reverting to unbreak
our integrate.

Revert "AMDGPU: Move isa version and EF_AMDGPU_MACH_* determination into TargetParser."

This reverts commit r341982.

The change introduced a layering violation. Reverting to unbreak
our integrate.

llvm-svn: 342023

show more ...


# 941615e4 11-Sep-2018 Konstantin Zhuravlyov <kzhuravl_dev@outlook.com>

AMDGPU: Move isa version and EF_AMDGPU_MACH_* determination
into TargetParser.

Also switch away from target features to CPU string when
determining isa version. This fixes an issue when we
output wr

AMDGPU: Move isa version and EF_AMDGPU_MACH_* determination
into TargetParser.

Also switch away from target features to CPU string when
determining isa version. This fixes an issue when we
output wrong isa version in the object code when features
of a particular CPU are altered (i.e. gfx902 w/o xnack
used to result in gfx900).

Differential Revision: https://reviews.llvm.org/D51890

llvm-svn: 341982

show more ...


Revision tags: llvmorg-7.0.0-rc3
# 0da6350d 31-Aug-2018 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Remove remnants of old address space mapping

llvm-svn: 341165


Revision tags: llvmorg-7.0.0-rc2
# 4f703f5e 21-Aug-2018 Tim Renouf <tpr.llvm@botech.co.uk>

[AMDGPU] New buffer intrinsics

Summary:
This commit adds new intrinsics
llvm.amdgcn.raw.buffer.load
llvm.amdgcn.raw.buffer.load.format
llvm.amdgcn.raw.buffer.load.format.d16
llvm.amdgcn.stru

[AMDGPU] New buffer intrinsics

Summary:
This commit adds new intrinsics
llvm.amdgcn.raw.buffer.load
llvm.amdgcn.raw.buffer.load.format
llvm.amdgcn.raw.buffer.load.format.d16
llvm.amdgcn.struct.buffer.load
llvm.amdgcn.struct.buffer.load.format
llvm.amdgcn.struct.buffer.load.format.d16
llvm.amdgcn.raw.buffer.store
llvm.amdgcn.raw.buffer.store.format
llvm.amdgcn.raw.buffer.store.format.d16
llvm.amdgcn.struct.buffer.store
llvm.amdgcn.struct.buffer.store.format
llvm.amdgcn.struct.buffer.store.format.d16
llvm.amdgcn.raw.buffer.atomic.*
llvm.amdgcn.struct.buffer.atomic.*

with the following changes from the llvm.amdgcn.buffer.*
intrinsics:

* there are separate raw and struct versions: raw does not have an
index arg and sets idxen=0 in the instruction, and struct always sets
idxen=1 in the instruction even if the index is 0, to allow for the
fact that gfx9 does bounds checking differently depending on whether
idxen is set;

* there is a combined cachepolicy arg (glc+slc)

* there are now only two offset args: one for the offset that is
included in bounds checking and swizzling, to be split between the
instruction's voffset and immoffset fields, and one for the offset
that is excluded from bounds checking and swizzling, to go into the
instruction's soffset field.

The AMDISD::BUFFER_* SD nodes always have an index operand, all three
offset operands, combined cachepolicy operand, and an extra idxen
operand.

The obsolescent llvm.amdgcn.buffer.* intrinsics continue to work.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D50306

Change-Id: If897ea7dc34fcbf4d5496e98cc99a934f62fc205
llvm-svn: 340269

show more ...


Revision tags: llvmorg-7.0.0-rc1
# 894c8fd0 01-Aug-2018 Ryan Taylor <rtayl@amd.com>

[AMDGPU] Optimize _L image intrinsic to _LZ when lod is zero

Summary:
Add _L to _LZ image intrinsic table mapping to table gen.
In ISelLowering check if image intrinsic has lod and if it's equal
to

[AMDGPU] Optimize _L image intrinsic to _LZ when lod is zero

Summary:
Add _L to _LZ image intrinsic table mapping to table gen.
In ISelLowering check if image intrinsic has lod and if it's equal
to zero, if so remove lod and change opcode to equivalent mapped _LZ.

Change-Id: Ie24cd7e788e2195d846c7bd256151178cbb9ec71

Subscribers: arsenm, mehdi_amini, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D49483

llvm-svn: 338523

show more ...


# c5a154db 28-Jun-2018 Tom Stellard <tstellar@redhat.com>

AMDGPU: Separate R600 and GCN TableGen files

Summary:
We now have two sets of generated TableGen files, one for R600 and one
for GCN, so each sub-target now has its own tables of instructions,
regis

AMDGPU: Separate R600 and GCN TableGen files

Summary:
We now have two sets of generated TableGen files, one for R600 and one
for GCN, so each sub-target now has its own tables of instructions,
registers, ISel patterns, etc. This should help reduce compile time
since each sub-target now only has to consider information that
is specific to itself. This will also help prevent the R600
sub-target from slowing down new features for GCN, like disassembler
support, GlobalISel, etc.

Reviewers: arsenm, nhaehnle, jvesely

Reviewed By: arsenm

Subscribers: MatzeB, kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D46365

llvm-svn: 335942

show more ...


# 1e8c2c70 21-Jun-2018 Scott Linder <scott@scottlinder.com>

[AMDGPU] Update assembler for HSA Code Object v3

Update AMDGPU assembler syntax behind the code-object-v3 feature:

* Replace/rename most AMDGPU assembler directives/symbols and document them.
* Pro

[AMDGPU] Update assembler for HSA Code Object v3

Update AMDGPU assembler syntax behind the code-object-v3 feature:

* Replace/rename most AMDGPU assembler directives/symbols and document them.
* Provide more diagnostics (e.g. values out of range, missing values, repeated
values).
* Provide path for backwards compatibility, even with underlying descriptor
changes.

Differential Revision: https://reviews.llvm.org/D47736

llvm-svn: 335281

show more ...


# 7a9c03f4 21-Jun-2018 Nicolai Haehnle <nhaehnle@gmail.com>

AMDGPU: Select MIMG instructions manually in SITargetLowering

Summary:
Having TableGen patterns for image intrinsics is hitting limitations:
for D16 we already have to manually pre-lower the packing

AMDGPU: Select MIMG instructions manually in SITargetLowering

Summary:
Having TableGen patterns for image intrinsics is hitting limitations:
for D16 we already have to manually pre-lower the packing of data
values, and we will have to do the same for A16 eventually.

Since there is already some custom C++ code anyway, it is arguably easier
to just do everything in C++, now that we can use the beefed-up generic
tables backend of TableGen to provide all the required metadata and map
intrinsics to corresponding opcodes. With this approach, all image
intrinsic lowering happens in SITargetLowering::lowerImage. That code is
dense due to all the cases that it handles, but it should still be easier
to follow than what we had before, by virtue of it all being done in a
single location, and by virtue of not relying on the TableGen pattern
magic that very few people really understand.

This means that we will have MachineSDNodes with MIMG instructions
during DAG combining, but that seems alright: previously we had
intrinsic nodes instead, but those are similarly opaque to the generic
CodeGen infrastructure, and the final pattern matching just did a 1:1
translation to machine instructions anyway. If anything, the fact that
we now merge the address words into a vector before DAG combine should
be an advantage.

Change-Id: I417f26bd88f54ce9781c1668acc01f3f99774de6

Reviewers: arsenm, rampitec, rtaylor, tstellar

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48017

llvm-svn: 335228

show more ...


# 0ab200b6 21-Jun-2018 Nicolai Haehnle <nhaehnle@gmail.com>

AMDGPU: Refactor MIMG instruction TableGen using generic tables

Summary:
This allows us to access rich information about MIMG opcodes from C++ code.
Simplifying the mapping between equivalent opcode

AMDGPU: Refactor MIMG instruction TableGen using generic tables

Summary:
This allows us to access rich information about MIMG opcodes from C++ code.
Simplifying the mapping between equivalent opcodes of different data size
becomes quite natural.

This also flattens the MIMG-related class and multiclass hierarchy a little,
and collapses together some of the scaffolding for sample and gather4 opcodes.

Change-Id: I1a2549fdc1e881ff100e5393d2d87e73729a0ccd

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48016

llvm-svn: 335227

show more ...


# e741d7e0 21-Jun-2018 Nicolai Haehnle <nhaehnle@gmail.com>

AMDGPU: Use generic tables instead of SearchableTable

Summary:

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://

AMDGPU: Use generic tables instead of SearchableTable

Summary:

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48014

Change-Id: Ibb43f90d955275571aff17d0c3ecfb5e5b299641
llvm-svn: 335226

show more ...


Revision tags: llvmorg-6.0.1, llvmorg-6.0.1-rc3
# 00f2cb11 12-Jun-2018 Konstantin Zhuravlyov <kzhuravl_dev@outlook.com>

AMDHSA: Code object v3 updates

- Do not emit following assembler directives:
- .hsa_code_object_version
- .hsa_code_object_isa
- .amd_amdgpu_isa
- .amd_amdgpu_hsa_metadata
- .amd_amdgpu_pa

AMDHSA: Code object v3 updates

- Do not emit following assembler directives:
- .hsa_code_object_version
- .hsa_code_object_isa
- .amd_amdgpu_isa
- .amd_amdgpu_hsa_metadata
- .amd_amdgpu_pal_metadata
- Do not emit .note entries
- Cleanup and bring in sync kernel descriptor header file
- Emit kernel descriptor into .rodata with appropriate relocations and
alignments

llvm-svn: 334519

show more ...


Revision tags: llvmorg-6.0.1-rc2
# c72ece6c 16-May-2018 Konstantin Zhuravlyov <kzhuravl_dev@outlook.com>

AMDGPU : Recalculate SGPRs when trap handler is supported

Differential Revision: https://reviews.llvm.org/D29911

llvm-svn: 332523


1...<<1112131415