History log of /llvm-project/llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.h (Results 1 – 25 of 64)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0
# 33562085 13-Sep-2024 Diana Picus <Diana-Magda.Picus@amd.com>

Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108512)

This reverts commit
https://github.com/llvm/llvm-project/commit/7792b4ae79e5ac9355ee13b01f16e25455f8427f.

The problem was a

Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108512)

This reverts commit
https://github.com/llvm/llvm-project/commit/7792b4ae79e5ac9355ee13b01f16e25455f8427f.

The problem was a conflict with
https://github.com/llvm/llvm-project/commit/e55d6f5ea2656bf842973d8bee86c3ace31bc865
"[AMDGPU] Simplify and improve codegen for llvm.amdgcn.set.inactive
(https://github.com/llvm/llvm-project/pull/107889)"
which changed the syntax of V_SET_INACTIVE (and thus made my MIR test
crash).

...if only we had a merge queue.

show more ...


# 7792b4ae 12-Sep-2024 Diana Picus <Diana-Magda.Picus@amd.com>

Revert "Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054)"" (#108341)

Reverts llvm/llvm-project#108173

si-init-whole-wave.mir crashes on some buildbots (although it passed
bo

Revert "Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054)"" (#108341)

Reverts llvm/llvm-project#108173

si-init-whole-wave.mir crashes on some buildbots (although it passed
both locally with sanitizers enabled and in pre-merge tests).
Investigating.

show more ...


# 703ebca8 12-Sep-2024 Diana Picus <Diana-Magda.Picus@amd.com>

Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054)" (#108173)

This reverts commit
https://github.com/llvm/llvm-project/commit/c7a7767fca736d0447832ea4d4587fb3b9e797c2.

The bui

Reland "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054)" (#108173)

This reverts commit
https://github.com/llvm/llvm-project/commit/c7a7767fca736d0447832ea4d4587fb3b9e797c2.

The buildbots failed because I removed a MI from its parent before
updating LIS. This PR should fix that.

show more ...


# c7a7767f 10-Sep-2024 Vitaly Buka <vitalybuka@google.com>

Revert "[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic" (#108054)

Breaks bots, see #105822.

Reverts llvm/llvm-project#105822


# 44556e64 10-Sep-2024 Diana Picus <Diana-Magda.Picus@amd.com>

[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic (#105822)

This intrinsic is meant to be used in functions that have a "tail" that
needs to be run with all the lanes enabled. The "tail" may conta

[amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic (#105822)

This intrinsic is meant to be used in functions that have a "tail" that
needs to be run with all the lanes enabled. The "tail" may contain
complex control flow that makes it unsuitable for the use of the
existing WWM intrinsics. Instead, we will pretend that the function
starts with all the lanes enabled, then branches into the actual body of
the function for the lanes that were meant to run it, and then finally
all the lanes will rejoin and run the tail.

As such, the intrinsic will return the EXEC mask for the body of the
function, and is meant to be used only as part of a very limited pattern
(for now only in amdgpu_cs_chain functions):

```
entry:
%func_exec = call i1 @llvm.amdgcn.init.whole.wave()
br i1 %func_exec, label %func, label %tail

func:
; ... stuff that should run with the actual EXEC mask
br label %tail

tail:
; ... stuff that runs with all the lanes enabled;
; can contain more than one basic block
```

It's an error to use the result of this intrinsic for anything
other than a branch (but unfortunately checking that in the verifier is
non-trivial because SIAnnotateControlFlow will introduce an amdgcn.if
between the intrinsic and the branch).

The intrinsic is lowered to a SI_INIT_WHOLE_WAVE pseudo, which for now
is expanded in si-wqm (which is where SI_INIT_EXEC is handled too);
however the information that the function was conceptually started in
whole wave mode is stored in the machine function info
(hasInitWholeWave). This will be useful in prolog epilog insertion,
where we can skip saving the inactive lanes for CSRs (since if the
function started with all the lanes active, then there are no inactive
lanes to preserve).

show more ...


Revision tags: llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init
# 9803de0e 04-Jan-2024 Chaitanya <Krishna.Sankisa@amd.com>

[AMDGPU] Add dynamic LDS size implicit kernel argument to CO-v5 (#65273)

"hidden_dynamic_lds_size" argument will be added in the reserved section
at offset 120 of the implicit argument layout.
Add

[AMDGPU] Add dynamic LDS size implicit kernel argument to CO-v5 (#65273)

"hidden_dynamic_lds_size" argument will be added in the reserved section
at offset 120 of the implicit argument layout.
Add "isDynamicLDSUsed" flag to AMDGPUMachineFunction to identify if a
function uses dynamic LDS.

hidden argument will be added in below cases:

- LDS global is used in the kernel.
- Kernel calls a function which uses LDS global.
- LDS pointer is passed as argument to kernel itself.

show more ...


# 55531e71 10-Dec-2023 Kazu Hirata <kazu@google.com>

[Target] Remove unused forward declarations (NFC)


# 40d802a6 06-Dec-2023 Diana Picus <Diana-Magda.Picus@amd.com>

[AMDGPU] Introduce isBottomOfStack helper. NFC (#74288)

Introduce a helper to check if a function is at the bottom of the stack,
i.e. if it's an entry function or a chain function.
This was sugges

[AMDGPU] Introduce isBottomOfStack helper. NFC (#74288)

Introduce a helper to check if a function is at the bottom of the stack,
i.e. if it's an entry function or a chain function.
This was suggested in #71913.

show more ...


Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1
# 5272ae66 27-Jul-2023 Diana Picus <Diana-Magda.Picus@amd.com>

[AMDGPU] Add IsChainFunction to the MachineFunctionInfo

This will represent functions with the amdgpu_cs_chain or
amdgpu_cs_chain_preserve calling conventions.

Differential Revision: https://review

[AMDGPU] Add IsChainFunction to the MachineFunctionInfo

This will represent functions with the amdgpu_cs_chain or
amdgpu_cs_chain_preserve calling conventions.

Differential Revision: https://reviews.llvm.org/D156410

show more ...


Revision tags: llvmorg-18-init
# 74e928a0 13-Jul-2023 Jon Chesterfield <jonathanchesterfield@gmail.com>

[amdgpu][lds] Remove recalculation of LDS frame from backend

Do the LDS frame calculation once, in the IR pass, instead of repeating the work in the backend.

Prior to this patch:
The IR lowering pa

[amdgpu][lds] Remove recalculation of LDS frame from backend

Do the LDS frame calculation once, in the IR pass, instead of repeating the work in the backend.

Prior to this patch:
The IR lowering pass sets up a per-kernel LDS frame and annotates the variables with absolute_symbol
metadata so that the assembler can build lookup tables out of it. There is a fragile association between
kernel functions and named structs which is used to recompute the frame layout in the backend, with
fatal_errors catching inconsistencies in the second calculation.

After this patch:
The IR lowering pass additionally sets a frame size attribute on kernels. The backend uses the same
absolute_symbol metadata that the assembler uses to place objects within that frame size.

Deleted the now dead allocation code from the backend. Left for a later cleanup:
- enabling lowering for anonymous functions
- removing the elide-module-lds attribute (test churn, it's not used by llc any more)
- adjusting the dynamic alignment check to not use symbol names

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D155190

show more ...


Revision tags: llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1
# 0507448d 04-Apr-2023 Jon Chesterfield <jonathanchesterfield@gmail.com>

[amdgpu] Implement dynamic LDS accesses from non-kernel functions

The premise here is to allow non-kernel functions to locate external LDS variables without using LDS or extra magic SGPRs to do so.

[amdgpu] Implement dynamic LDS accesses from non-kernel functions

The premise here is to allow non-kernel functions to locate external LDS variables without using LDS or extra magic SGPRs to do so.

1/ First it crawls the callgraph to work out which external LDS variables are reachable from a given kernel
2/ Then it creates a new `extern char[0]` variable for each kernel, which will alias all the other extern LDS variables because that's the documented behaviour of these variables
3/ The address of that variable is written to a lookup table. The global variable is tagged with metadata to track what address it was allocated at by codegen
4/ The assembler builds the lookup table using the metadata
5/ Any non-kernel functions use the same magic intrinsic used by table lookups of non-dynamic LDS variables to find the address to use

Heavy overlap with the code paths taken for other lowering, in particular the same intrinsic is used to pass the dynamic scope information through the same sgpr as for table lookups of static LDS.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D144233

show more ...


Revision tags: llvmorg-16.0.0
# d3dda422 12-Mar-2023 Jon Chesterfield <jonathanchesterfield@gmail.com>

[amdgpu][nfc] Replace ad hoc LDS frame recalculation with absolute_symbol MD

Post ISel, LDS variables are absolute values. Representing them as
such is simpler than the frame recalculation currently

[amdgpu][nfc] Replace ad hoc LDS frame recalculation with absolute_symbol MD

Post ISel, LDS variables are absolute values. Representing them as
such is simpler than the frame recalculation currently used to build assembler
tables from their addresses.

This is a precursor to lowering dynamic/external LDS accesses from non-kernel
functions.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D144221

show more ...


Revision tags: llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2
# 69e75ae6 18-Jun-2020 Matt Arsenault <Matthew.Arsenault@amd.com>

CodeGen: Don't lazily construct MachineFunctionInfo

This fixes what I consider to be an API flaw I've tripped over
multiple times. The point this is constructed isn't well defined, so
depending on w

CodeGen: Don't lazily construct MachineFunctionInfo

This fixes what I consider to be an API flaw I've tripped over
multiple times. The point this is constructed isn't well defined, so
depending on where this is first called, you can conclude different
information based on the MachineFunction. For example, the AMDGPU
implementation inspected the MachineFrameInfo on construction for the
stack objects and if the frame has calls. This kind of worked in
SelectionDAG which visited all allocas up front, but broke in
GlobalISel which hasn't visited any of the IR when arguments are
lowered.

I've run into similar problems before with the MIR parser and trying
to make use of other MachineFunction fields, so I think it's best to
just categorically disallow dependency on the MachineFunction state in
the constructor and to always construct this at the same time as the
MachineFunction itself.

A missing feature I still could use is a way to access an custom
analysis pass on the IR here.

show more ...


# 67819a72 13-Dec-2022 Fangrui Song <i@maskray.me>

[CodeGen] llvm::Optional => std::optional


# d77ae7f2 07-Dec-2022 Jon Chesterfield <jonathanchesterfield@gmail.com>

[amdgpu] Reimplement LDS lowering

Renames the current lowering scheme to "module" and introduces two new
ones, "kernel" and "table", plus a "hybrid" that chooses between those three
on a per-variabl

[amdgpu] Reimplement LDS lowering

Renames the current lowering scheme to "module" and introduces two new
ones, "kernel" and "table", plus a "hybrid" that chooses between those three
on a per-variable basis.

Unit tests are set up to pass with the default lowering of "module" or "hybrid"
with this patch defaulting to "module", which will be a less dramatic codegen
change relative to the current. This reflects the sparsity of test coverage for
the table lowering method. Hybrid is better than module in every respect and
will be default in a subsequent patch.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D139433

show more ...


# a862d09a 06-Dec-2022 Nico Weber <thakis@chromium.org>

Revert "[amdgpu] Reimplement LDS lowering"

This reverts commit 982017240d7f25a8a6969b8b73dc51f9ac5b93ed.
Breaks check-llvm, see https://reviews.llvm.org/D139433#3974862


# 98201724 06-Dec-2022 Jon Chesterfield <jonathanchesterfield@gmail.com>

[amdgpu] Reimplement LDS lowering

Renames the current lowering scheme to "module" and introduces two new
ones, "kernel" and "table", plus a "hybrid" that chooses between those three
on a per-variabl

[amdgpu] Reimplement LDS lowering

Renames the current lowering scheme to "module" and introduces two new
ones, "kernel" and "table", plus a "hybrid" that chooses between those three
on a per-variable basis.

Unit tests are set up to pass with the default lowering of "module" or "hybrid"
with this patch defaulting to "module", which will be a less dramatic codegen
change relative to the current. This reflects the sparsity of test coverage for
the table lowering method. Hybrid is better than module in every respect and
will be default in a subsequent patch.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D139433

show more ...


# 4b1b9e22 05-Dec-2022 Fangrui Song <i@maskray.me>

Remove unused #include "llvm/ADT/Optional.h"


# 5a3fe9a0 27-Sep-2022 Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>

[AMDGPU] Move SIModeRegisterDefaults to SI MFI

It does not belong to a general AMDGPU MFI.

Differential Revision: https://reviews.llvm.org/D134666


# 80ba4328 28-Sep-2022 Jon Chesterfield <jonathanchesterfield@gmail.com>

[amdgpu][nfc] Allocate kernel-specific LDS struct deterministically

A kernel may have an associated struct for laying out LDS variables.
This patch puts that instance, if present, at a deterministic

[amdgpu][nfc] Allocate kernel-specific LDS struct deterministically

A kernel may have an associated struct for laying out LDS variables.
This patch puts that instance, if present, at a deterministic address by
allocating it at the same time as the module scope instance.

This is relatively likely to be where the instance was allocated anyway (~NFC)
but will allow later patches to calculate where a given field can be found,
which means a function which is only reachable from a single kernel will be
able to access a LDS variable with zero overhead. That will be particularly
helpful for applications that instantiate a function template containing LDS
variables once per kernel.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D127052

show more ...


# 20a80d60 27-Sep-2022 Vitaly Buka <vitalybuka@google.com>

Revert "[AMDGPU] Move SIModeRegisterDefaults to SI MFI"

Break msan bots. Details in D134666.

This reverts commit 0ce96e06ee0226938e723bd0c8e16e3d2d51f203.


# 0ce96e06 26-Sep-2022 Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>

[AMDGPU] Move SIModeRegisterDefaults to SI MFI

It does not belong to a general AMDGPU MFI.

Differential Revision: https://reviews.llvm.org/D134666


# 3a205977 19-Jul-2022 Jon Chesterfield <jonathanchesterfield@gmail.com>

[amdgpu] Implement lds kernel id intrinsic

Implement an intrinsic for use lowering LDS variables to different
addresses from different kernels. This will allow kernels that cannot
reach an LDS varia

[amdgpu] Implement lds kernel id intrinsic

Implement an intrinsic for use lowering LDS variables to different
addresses from different kernels. This will allow kernels that cannot
reach an LDS variable to avoid wasting space for it.

There are a number of implicit arguments accessed by intrinsic already
so this implementation closely follows the existing handling. It is slightly
novel in that this SGPR is written by the kernel prologue.

It is necessary in the general case to put variables at different addresses
such that they can be compactly allocated and thus necessary for an
indirect function call to have some means of determining where a
given variable was allocated. Claiming an arbitrary SGPR into which
an integer can be written by the kernel, in this implementation based
on metadata associated with that kernel, which is then passed on to
indirect call sites is sufficient to determine the variable address.

The intent is to emit a __const array of LDS addresses and index into it.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D125060

show more ...


# cef65864 18-Jun-2022 Guillaume Chatelet <gchatelet@google.com>

[Alignment] Use Align for MaxKernArgAlign

Differential Revision: https://reviews.llvm.org/D128118


# fb67d683 25-May-2022 serge-sans-paille <sguelton@redhat.com>

[iwyu] Handle regressions in libLLVM header include

Running iwyu-diff on LLVM codebase since 7030654296a0416bd9402a0278 detected a few
regressions, fixing them.

Differential Revision: https://revie

[iwyu] Handle regressions in libLLVM header include

Running iwyu-diff on LLVM codebase since 7030654296a0416bd9402a0278 detected a few
regressions, fixing them.

Differential Revision: https://reviews.llvm.org/D126417

show more ...


123