History log of /llvm-project/llvm/lib/Target/AMDGPU/AMDGPULowerKernelAttributes.cpp (Results 1 – 21 of 21)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6
# 009368f1 09-Dec-2024 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Mark grid size loads with range metadata (#113019)

Only handles the v5 case.


Revision tags: llvmorg-19.1.5, llvmorg-19.1.4
# be187369 14-Nov-2024 Kazu Hirata <kazu@google.com>

[AMDGPU] Remove unused includes (NFC) (#116154)

Identified with misc-include-cleaner.


Revision tags: llvmorg-19.1.3
# 6924fc03 16-Oct-2024 Rahul Joshi <rjoshi@nvidia.com>

[LLVM] Add `Intrinsic::getDeclarationIfExists` (#112428)

Add `Intrinsic::getDeclarationIfExists` to lookup an existing
declaration of an intrinsic in a `Module`.


Revision tags: llvmorg-19.1.2
# 8d13e7b8 03-Oct-2024 Jay Foad <jay.foad@amd.com>

[AMDGPU] Qualify auto. NFC. (#110878)

Generated automatically with:
$ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find
lib/Target/AMDGPU/ -type f)


Revision tags: llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2
# 62e9f409 29-Jul-2024 Yingwei Zheng <dtcxzyw2333@gmail.com>

[PatternMatch] Use `m_SpecificCmp` matchers. NFC. (#100878)

Compile-time improvement:
http://llvm-compile-time-tracker.com/compare.php?from=13996378d81c8fa9a364aeaafd7382abbc1db83a&to=861ffa4ec5f7b

[PatternMatch] Use `m_SpecificCmp` matchers. NFC. (#100878)

Compile-time improvement:
http://llvm-compile-time-tracker.com/compare.php?from=13996378d81c8fa9a364aeaafd7382abbc1db83a&to=861ffa4ec5f7bde5a194a7715593a1b5359eb581&stat=instructions:u
baseline: 803eaf29267c6aae9162d1a83a4a2ae508b440d3
```
Top 5 improvements:
stockfish/movegen.ll 2541620819 2538599412 -0.12%
minetest/profiler.cpp.ll 431724935 431246500 -0.11%
abc/luckySwap.c.ll 581173720 580581935 -0.10%
abc/kitTruth.c.ll 2521936288 2519445570 -0.10%
abc/extraUtilTruth.c.ll 1216674614 1215495502 -0.10%
Top 5 regressions:
openssl/libcrypto-shlib-sm4.ll 1155054721 1155943201 +0.08%
openssl/libcrypto-lib-sm4.ll 1155054838 1155943063 +0.08%
spike/vsm4r_vv.ll 1296430080 1297039258 +0.05%
spike/vsm4r_vs.ll 1312496906 1313093460 +0.05%
nuttx/lib_rand48.c.ll 126201233 126246692 +0.04%
Overall: -0.02112308%
```

show more ...


Revision tags: llvmorg-19.1.0-rc1, llvmorg-20-init
# 9df71d76 28-Jun-2024 Nikita Popov <npopov@redhat.com>

[IR] Add getDataLayout() helpers to Function and GlobalValue (#96919)

Similar to https://github.com/llvm/llvm-project/pull/96902, this adds
`getDataLayout()` helpers to Function and GlobalValue, re

[IR] Add getDataLayout() helpers to Function and GlobalValue (#96919)

Similar to https://github.com/llvm/llvm-project/pull/96902, this adds
`getDataLayout()` helpers to Function and GlobalValue, replacing the
current `getParent()->getDataLayout()` pattern.

show more ...


Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init
# bc82cfb3 21-Jan-2024 Emma Pilkington <emma.pilkington95@gmail.com>

[AMDGPU] Add an asm directive to track code_object_version (#76267)

Named '.amdhsa_code_object_version'. This directive sets the
e_ident[ABIVERSION] in the ELF header, and should be used as the ass

[AMDGPU] Add an asm directive to track code_object_version (#76267)

Named '.amdhsa_code_object_version'. This directive sets the
e_ident[ABIVERSION] in the ELF header, and should be used as the assumed
COV for the rest of the asm file.

This commit also weakens the --amdhsa-code-object-version CL flag.
Previously, the CL flag took precedence over the IR flag. Now the IR
flag/asm directive take precedence over the CL flag. This is implemented
by merging a few COV-checking functions in AMDGPUBaseInfo.h.

show more ...


Revision tags: llvmorg-17.0.6, llvmorg-17.0.5
# d08496b5 01-Nov-2023 Nikita Popov <npopov@redhat.com>

[AMDGPULowerKernelAttribute] Avoid use of ConstantExpr::getIntegerCast() (NFC)

We are working on ConstantInt here, so folding will always succeed.
This just avoids use of the ConstantExpr API.


Revision tags: llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3
# 7ca3444f 10-Feb-2023 Changpeng Fang <changpeng.fang@amd.com>

AMDGPU: Use module flag to get code object version at IR level folow-up

Summary:
This is part of the leftover work for https://reviews.llvm.org/D143138.
In this work, we pass code object version a

AMDGPU: Use module flag to get code object version at IR level folow-up

Summary:
This is part of the leftover work for https://reviews.llvm.org/D143138.
In this work, we pass code object version as an argument to initialize target ID
and use it for targetID dump.

Reviewers: arsenm

Differential Revision
https://reviews.llvm.org/D143293

show more ...


Revision tags: llvmorg-16.0.0-rc2
# 54cf69c9 03-Feb-2023 Changpeng Fang <changpeng.fang@amd.com>

AMDGPU: Use module flag to get code object version at IR level

Summary:
This patch introduces a mechanism to check the code object version from the module flag, This avoids checking from command l

AMDGPU: Use module flag to get code object version at IR level

Summary:
This patch introduces a mechanism to check the code object version from the module flag, This avoids checking from command line.
In case the module flag is missing, we use the current default code object version supported in the compiler.

For tools whose inputs are not IR, we may need other approach (directive, for example) to check the code
object version, That will be in a separate patch later.

For LIT tests update, we directly add module flag if there is only a single code object version associated with all checks in one file.
In cause of multiple code object version in one file, we use the "sed" method to "clone" the checks to achieve the goal.

Reviewer: arsenm

Differential Revision:
https://reviews.llvm.org/D14313

show more ...


Revision tags: llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working
# fa2b1cb8 05-Oct-2022 Juan Manuel MARTINEZ CAAMAÑO <juamarti@amd.com>

[NFC][AMDGPULowerKernelAttributes] Factorize repeated code into function

Differential Revision: https://reviews.llvm.org/D135266


Revision tags: llvmorg-15.0.2
# dee4bc4a 26-Sep-2022 Changpeng Fang <changpeng.fang@amd.com>

AMDGPU: Handle new address pattern in LowerKernelAttributes introduced by opaque pointers

Summary:
With opaque pointer support, the "ptr" type is introduced and thus BitCast is not necessary in so

AMDGPU: Handle new address pattern in LowerKernelAttributes introduced by opaque pointers

Summary:
With opaque pointer support, the "ptr" type is introduced and thus BitCast is not necessary in some cases.
This work takes care of this change, and recognizes the new address patterns to do appropriate optimizations.

Reviewers:
arsenm

Differential Revision:
https://reviews.llvm.org/D134596

show more ...


# 3ae4c358 21-Sep-2022 Changpeng Fang <changpeng.fang@amd.com>

AMDGPU: Implicit kernel arguments related optimization when uniform-workgroup-size=true

Summary:
Under code object version 5, ockl_get_local_size returns the value computed by the expression:
wo

AMDGPU: Implicit kernel arguments related optimization when uniform-workgroup-size=true

Summary:
Under code object version 5, ockl_get_local_size returns the value computed by the expression:
workgroup_id < hidden_block_count ? hidden_group_size : hidden_remainder
For functions with the attribute uniform-work-group-size=true. we can evaluate workgroup_id < hidden_block_count
as true, and thus hidden_group_size is returned for ockl_get_local_size.
With uniform-workgroup-size=true, this work also set all remainders to zero, and if there
is reqd_work_group_size, we also set work-group-size to the required value from the metadata.

Reviewers:
arsenm and bcahoon

Differential Revision:
https://reviews.llvm.org/D131276

show more ...


Revision tags: llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1
# c986d476 07-Apr-2022 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Update reqd-work-group-size optimization for umin intrinsic

This code was pattern matching the ID computation expression as it
appears in the library. This was a compare and select, but now

AMDGPU: Update reqd-work-group-size optimization for umin intrinsic

This code was pattern matching the ID computation expression as it
appears in the library. This was a compare and select, but now that
umin is canonical, we were no longer matching. Update to match the
intrinsic instead.

show more ...


Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init
# aff66b7e 06-Jul-2021 Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>

[AMDGPU] Fix pass name of AMDGPULowerKernelAttributes. NFC.

This was obviously copy-pasted.


Revision tags: llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4
# d6de1e1a 24-Mar-2021 Serge Guelton <sguelton@redhat.com>

Normalize interaction with boolean attributes

Such attributes can either be unset, or set to "true" or "false" (as string).
throughout the codebase, this led to inelegant checks ranging from

Normalize interaction with boolean attributes

Such attributes can either be unset, or set to "true" or "false" (as string).
throughout the codebase, this led to inelegant checks ranging from

if (Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")

to

if (Fn->hasAttribute("no-jump-tables") && Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")

Introduce a getValueAsBool that normalize the check, with the following
behavior:

no attributes or attribute set to "false" => return false
attribute set to "true" => return true

Differential Revision: https://reviews.llvm.org/D99299

show more ...


Revision tags: llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2
# 560d7e04 20-Jan-2021 dfukalov <daniil.fukalov@amd.com>

[NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets

... to reduce headers dependency.

Reviewed By: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D95036


Revision tags: llvmorg-11.1.0-rc1
# 6a87e9b0 25-Dec-2020 dfukalov <daniil.fukalov@amd.com>

[NFC][AMDGPU] Reduce include files dependency.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D93813


# 7ecbe0c7 28-Dec-2020 Arthur Eubanks <aeubanks@google.com>

[NewPM][AMDGPU] Port amdgpu-lower-kernel-attributes

And add it to the AMDGPU opt pipeline.

This is a function pass instead of a module pass (like the legacy pass)
because it's getting added to a CG

[NewPM][AMDGPU] Port amdgpu-lower-kernel-attributes

And add it to the AMDGPU opt pipeline.

This is a function pass instead of a module pass (like the legacy pass)
because it's getting added to a CGSCCPassManager, and you can't put a
module pass in a CGSCCPassManager.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D93885

show more ...


Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1, llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3, llvmorg-9.0.0-rc2, llvmorg-9.0.0-rc1, llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3, llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1, llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3, llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2, llvmorg-8.0.0-rc1
# 2946cd70 19-Jan-2019 Chandler Carruth <chandlerc@gmail.com>

Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the ne

Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636

show more ...


Revision tags: llvmorg-7.0.1, llvmorg-7.0.1-rc3, llvmorg-7.0.1-rc2, llvmorg-7.0.1-rc1, llvmorg-7.0.0, llvmorg-7.0.0-rc3, llvmorg-7.0.0-rc2, llvmorg-7.0.0-rc1, llvmorg-6.0.1, llvmorg-6.0.1-rc3, llvmorg-6.0.1-rc2
# 372d796a 18-May-2018 Matt Arsenault <Matthew.Arsenault@amd.com>

AMDGPU: Add pass to optimize reqd_work_group_size

Eliminate loads from the dispatch packet when they will have
a known value.

Also pattern match the code used by the library to handle partial
workg

AMDGPU: Add pass to optimize reqd_work_group_size

Eliminate loads from the dispatch packet when they will have
a known value.

Also pattern match the code used by the library to handle partial
workgroup dispatches, which isn't necessary if reqd_work_group_size
is used.

llvm-svn: 332771

show more ...