History log of /llvm-project/clang/lib/Basic/Targets/AMDGPU.cpp (Results 1 – 25 of 136)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-21-init
# d92bac8a 25-Jan-2025 Helena Kotas <hekotas@microsoft.com>

[HLSL] Introduce address space `hlsl_constant(2)` for constant buffer declarations (#123411)

Introduces a new address space `hlsl_constant(2)` for constant buffer
declarations.

This address spac

[HLSL] Introduce address space `hlsl_constant(2)` for constant buffer declarations (#123411)

Introduces a new address space `hlsl_constant(2)` for constant buffer
declarations.

This address space is applied to declarations inside `cbuffer` block.
Later on, it will also be applied to `ConstantBuffer<T>` syntax and the
default `$Globals` constant buffer.

Clang codegen translates constant buffer declarations to global
variables and loads from `hlsl_constant(2)` address space. More work
coming soon will include addition of metadata that will map these
globals to individual constant buffers and enable their transformation
to appropriate constant buffer load intrinsics later on in an LLVM pass.

Fixes #123406

show more ...


Revision tags: llvmorg-19.1.7, llvmorg-19.1.6
# ca79ff07 14-Dec-2024 Chandler Carruth <chandlerc@gmail.com>

Revert "Switch builtin strings to use string tables" (#119638)

Reverts llvm/llvm-project#118734

There are currently some specific versions of MSVC that are miscompiling
this code (we think). We

Revert "Switch builtin strings to use string tables" (#119638)

Reverts llvm/llvm-project#118734

There are currently some specific versions of MSVC that are miscompiling
this code (we think). We don't know why as all the other build bots and
at least some folks' local Windows builds work fine.

This is a candidate revert to help the relevant folks catch their
builders up and have time to debug the issue. However, the expectation
is to roll forward at some point with a workaround if at all possible.

show more ...


# be2df95e 09-Dec-2024 Chandler Carruth <chandlerc@gmail.com>

Switch builtin strings to use string tables (#118734)

The Clang binary (and any binary linking Clang as a library), when built
using PIE, ends up with a pretty shocking number of dynamic relocation

Switch builtin strings to use string tables (#118734)

The Clang binary (and any binary linking Clang as a library), when built
using PIE, ends up with a pretty shocking number of dynamic relocations
to apply to the executable image: roughly 400k.

Each of these takes up binary space in the executable, and perhaps most
interestingly takes start-up time to apply the relocations.

The largest pattern I identified were the strings used to describe
target builtins. The addresses of these string literals were stored into
huge arrays, each one requiring a dynamic relocation. The way to avoid
this is to design the target builtins to use a single large table of
strings and offsets within the table for the individual strings. This
switches the builtin management to such a scheme.

This saves over 100k dynamic relocations by my measurement, an over 25%
reduction. Just looking at byte size improvements, using the `bloaty`
tool to compare a newly built `clang` binary to an old one:

```
FILE SIZE VM SIZE
-------------- --------------
+1.4% +653Ki +1.4% +653Ki .rodata
+0.0% +960 +0.0% +960 .text
+0.0% +197 +0.0% +197 .dynstr
+0.0% +184 +0.0% +184 .eh_frame
+0.0% +96 +0.0% +96 .dynsym
+0.0% +40 +0.0% +40 .eh_frame_hdr
+114% +32 [ = ] 0 [Unmapped]
+0.0% +20 +0.0% +20 .gnu.hash
+0.0% +8 +0.0% +8 .gnu.version
+0.9% +7 +0.9% +7 [LOAD #2 [R]]
[ = ] 0 -75.4% -3.00Ki .relro_padding
-16.1% -802Ki -16.1% -802Ki .data.rel.ro
-27.3% -2.52Mi -27.3% -2.52Mi .rela.dyn
-1.6% -2.66Mi -1.6% -2.66Mi TOTAL
```

We get a 16% reduction in the `.data.rel.ro` section, and nearly 30%
reduction in `.rela.dyn` where those reloctaions are stored.

This is also visible in my benchmarking of binary start-up overhead at
least:

```
Benchmark 1: ./old_clang --version
Time (mean ± σ): 17.6 ms ± 1.5 ms [User: 4.1 ms, System: 13.3 ms]
Range (min … max): 14.2 ms … 22.8 ms 162 runs

Benchmark 2: ./new_clang --version
Time (mean ± σ): 15.5 ms ± 1.4 ms [User: 3.6 ms, System: 11.8 ms]
Range (min … max): 12.4 ms … 20.3 ms 216 runs

Summary
'./new_clang --version' ran
1.13 ± 0.14 times faster than './old_clang --version'
```

We get about 2ms faster `--version` runs. While there is a lot of noise
in binary execution time, this delta is pretty consistent, and
represents over 10% improvement. This is particularly interesting to me
because for very short source files, repeatedly starting the `clang`
binary is actually the dominant cost. For example, `configure` scripts
running against the `clang` compiler are slow in large part because of
binary start up time, not the time to process the actual inputs to the
compiler.

----

This PR implements the string tables using `constexpr` code and the
existing macro system. I understand that the builtins are moving towards
a TableGen model, and if complete that would provide more options for
modeling this. Unfortunately, that migration isn't complete, and even
the parts that are migrated still rely on the ability to break out of
the TableGen model and directly expand an X-macro style `BUILTIN(...)`
textually. I looked at trying to complete the move to TableGen, but it
would both require the difficult migration of the remaining targets, and
solving some tricky problems with how to move away from any macro-based
expansion.

I was also able to find a reasonably clean and effective way of doing
this with the existing macros and some `constexpr` code that I think is
clean enough to be a pretty good intermediate state, and maybe give a
good target for the eventual TableGen solution. I was also able to
factor the macros into set of consistent patterns that avoids a
significant regression in overall boilerplate.

show more ...


Revision tags: llvmorg-19.1.5
# f8b4182f 02-Dec-2024 Nathan Gauër <brioche@google.com>

Revert "[SPIR-V] Fixup storage class for global private (#116636)" (#118312)

This reverts commit aa7fe1c10e5d6d0d3aacdb345fed995de413e142.


# aa7fe1c1 02-Dec-2024 Nathan Gauër <brioche@google.com>

[SPIR-V] Fixup storage class for global private (#116636)

Adds a new address spaces: `hlsl_private`. Variables with such address
space will be emitted with a `Private` storage class.
This is usefu

[SPIR-V] Fixup storage class for global private (#116636)

Adds a new address spaces: `hlsl_private`. Variables with such address
space will be emitted with a `Private` storage class.
This is useful for variables global to a SPIR-V module, since up to now,
they were still emitted with a `Function` storage class, which is wrong.

---------

Signed-off-by: Nathan Gauër <brioche@google.com>

show more ...


Revision tags: llvmorg-19.1.4
# d893c5ad 11-Nov-2024 Fabian Ritter <fabian.ritter@amd.com>

[Clang][HIP] Reapply: Deprecate the AMDGCN_WAVEFRONT_SIZE macros (#115507)

So far, these macros can be used in contexts where no meaningful
wavefront size is available. We therefore deprecate these

[Clang][HIP] Reapply: Deprecate the AMDGCN_WAVEFRONT_SIZE macros (#115507)

So far, these macros can be used in contexts where no meaningful
wavefront size is available. We therefore deprecate these macros, to
replace them with a more resilient interface to access wavefront size
information where it is available.

Reapplies #112849 with a fix for the non-hermetic clang test that failed
on Mac after the revert in #115499.

For SWDEV-491529.

show more ...


# e734de1f 08-Nov-2024 Fabian Ritter <fabian.ritter@amd.com>

Revert "[Clang][HIP] Deprecate the AMDGCN_WAVEFRONT_SIZE macros" (#115499)

Reverts llvm/llvm-project#112849 due to test failure on Mac, reported by
@nico


# e5c6d1f4 08-Nov-2024 Fabian Ritter <fabian.ritter@amd.com>

[Clang][HIP] Deprecate the AMDGCN_WAVEFRONT_SIZE macros (#112849)

So far, these macros can be used in contexts where no meaningful
wavefront size is available. We therefore deprecate these macros,

[Clang][HIP] Deprecate the AMDGCN_WAVEFRONT_SIZE macros (#112849)

So far, these macros can be used in contexts where no meaningful
wavefront size is available. We therefore deprecate these macros, to
replace them with a more resilient interface to access wavefront size
information where it is available.

For SWDEV-491529.

show more ...


Revision tags: llvmorg-19.1.3
# 6e0b0038 22-Oct-2024 Alex Voicu <alexandru.voicu@amd.com>

[clang][OpenCL][CodeGen][AMDGPU] Do not use `private` as the default AS for when `generic` is available (#112442)

Currently, for AMDGPU, when compiling for OpenCL, we unconditionally use
`private`

[clang][OpenCL][CodeGen][AMDGPU] Do not use `private` as the default AS for when `generic` is available (#112442)

Currently, for AMDGPU, when compiling for OpenCL, we unconditionally use
`private` as the default address space. This is wrong for cases where
the `generic` address space is available, and is corrected via this
patch. In general, this AS map abuse is a bad hack and we should re-work
it altogether, but at least after this patch we will stop being
incorrect for e.g. OpenCL 2.0.

show more ...


Revision tags: llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init
# f363e30f 09-Jul-2024 Stanislav Mekhanoshin <rampitec@users.noreply.github.com>

[AMDGPU] Report error in clang if wave32 is requested where unsupported (#97633)


Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6
# a98a6e95 04-May-2024 luolent <56246516+luolent@users.noreply.github.com>

Add clarifying parenthesis around non-trivial conditions in ternary expressions. (#90391)

Fixes [#85868](https://github.com/llvm/llvm-project/issues/85868)

Parenthesis are added as requested on t

Add clarifying parenthesis around non-trivial conditions in ternary expressions. (#90391)

Fixes [#85868](https://github.com/llvm/llvm-project/issues/85868)

Parenthesis are added as requested on ternary operators with non trivial conditions.

I used this [precedence table](https://en.cppreference.com/w/cpp/language/operator_precedence) for reference, to make sure we get the expected behavior on each change.

show more ...


Revision tags: llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3
# 43c7eb5d 14-Feb-2024 Pierre van Houtryve <pierre.vanhoutryve@amd.com>

[AMDGPU] Replace '.' with '-' in generic target names (#81718)

The dot is too confusing for tools. Output temporaries would have
'10.3-generic' so tools could parse it as an extension, device libs

[AMDGPU] Replace '.' with '-' in generic target names (#81718)

The dot is too confusing for tools. Output temporaries would have
'10.3-generic' so tools could parse it as an extension, device libs &
the associated clang driver logic are also confused by the dot.

After discussions, we decided it's better to just remove the '.' from
the target name than fix each issue one by one.

show more ...


# f93aa515 12-Feb-2024 Pierre van Houtryve <pierre.vanhoutryve@amd.com>

[AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (#76955)

These generic targets include multiple GPUs and will, in the future,
provide a way to build once and run on multiple GPU, at the cost o

[AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (#76955)

These generic targets include multiple GPUs and will, in the future,
provide a way to build once and run on multiple GPU, at the cost of less
optimization opportunities.

Note that this is just doing the compiler side of things, device libs an
runtimes/loader/etc. don't know about these targets yet, so none of them
actually work in practice right now. This is just the initial commit to
make LLVM aware of them.

This contains the documentation changes for both this change and #76954
as well.

show more ...


Revision tags: llvmorg-18.1.0-rc2
# 6fecfbc7 30-Jan-2024 Joseph Huber <huberjn@outlook.com>

[AMDGPU] Correctly exclude the HIP host from arch macros

Summary:
This logic was wrong and accidentally appling to OpenCL.


# f2a78e68 30-Jan-2024 Joseph Huber <huberjn@outlook.com>

[AMDGPU] Do not emit arch dependent macros with unspecified cpu (#80035)

Summary:
Currently, the AMDGPU toolchain accepts not passing `-mcpu` as a means
to create a sort of "generic" IR. The resulti

[AMDGPU] Do not emit arch dependent macros with unspecified cpu (#80035)

Summary:
Currently, the AMDGPU toolchain accepts not passing `-mcpu` as a means
to create a sort of "generic" IR. The resulting IR will not contain any
target dependent attributes and can then be inserted into another
program via `-mlink-builtin-bitcode` to inherit its attributes.

However, there are a handful of macros that can leak incorrect
information when compiling for an unspecified architecture. Currently,
things like the wavefront size will default to 64, which is actually
variable. We should not expose these macros unless it is known.

show more ...


Revision tags: llvmorg-18.1.0-rc1
# 72d4fc1b 29-Jan-2024 Joseph Huber <huberjn@outlook.com>

Revert "[AMDGPU] Do not emit arch dependent macros with unspecified cpu (#79660)"

This reverts commit c9a6e993f7b349405b6c8f9244cd9cf0f56a6a81.

This breaks HIP code that incorrectly depended on GPU

Revert "[AMDGPU] Do not emit arch dependent macros with unspecified cpu (#79660)"

This reverts commit c9a6e993f7b349405b6c8f9244cd9cf0f56a6a81.

This breaks HIP code that incorrectly depended on GPU-specific macros to
be set. The code is totally wrong as using `__WAVEFRTONSIZE__` on the
host is absolutely meaningless, but it seems this entire corner of the
toolchain is fundmentally broken. Reverting for now to avoid breakages.

show more ...


# c9a6e993 29-Jan-2024 Joseph Huber <huberjn@outlook.com>

[AMDGPU] Do not emit arch dependent macros with unspecified cpu (#79660)

Summary:
Currently, the AMDGPU toolchain accepts not passing `-mcpu` as a means
to create a sort of "generic" IR. The resulti

[AMDGPU] Do not emit arch dependent macros with unspecified cpu (#79660)

Summary:
Currently, the AMDGPU toolchain accepts not passing `-mcpu` as a means
to create a sort of "generic" IR. The resulting IR will not contain any
target dependent attributes and can then be inserted into another
program via `-mlink-builtin-bitcode` to inherit its attributes.

However, there are a handful of macros that can leak incorrect
information when compiling for an unspecified architecture. Currently,
things like the wavefront size will default to 64, which is actually
variable. We should not expose these macros unless it is known.

show more ...


Revision tags: llvmorg-19-init
# 32f9983c 15-Dec-2023 Jessica Del <50999226+OutOfCache@users.noreply.github.com>

[AMDGPU] - Add address space for strided buffers (#74471)

This is an experimental address space for strided buffers. These buffers
can have structs as elements and
a stride > 1.
These pointers al

[AMDGPU] - Add address space for strided buffers (#74471)

This is an experimental address space for strided buffers. These buffers
can have structs as elements and
a stride > 1.
These pointers allow the indexed access in units of stride, i.e., they
point at `buffer[index * stride]`.
Thus, we can use the `idxen` modifier for buffer loads.

We assign address space 9 to 192-bit buffer pointers which contain a
128-bit descriptor, a 32-bit offset and a 32-bit index. Essentially,
they are fat buffer pointers with an additional 32-bit index.

show more ...


# f3dcc235 13-Dec-2023 Kazu Hirata <kazu@google.com>

[clang] Use StringRef::{starts,ends}_with (NFC) (#75149)

This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}:

[clang] Use StringRef::{starts,ends}_with (NFC) (#75149)

This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.

I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.

show more ...


# 276a024b 11-Dec-2023 Dominik Adamski <dominik.adamski@amd.com>

[NFC][AMDGPU] Unify AMDGPU address space enum (#73944)

Types of AMDGPU address space were defined not only in Clang-specific class
but also in LLVM header.

If we unify the AMD GPU address space

[NFC][AMDGPU] Unify AMDGPU address space enum (#73944)

Types of AMDGPU address space were defined not only in Clang-specific class
but also in LLVM header.

If we unify the AMD GPU address space enumeration, then we can reuse it in
Clang, Flang and LLVM.

show more ...


Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3
# b8a9c50f 20-Aug-2023 Yaxun (Sam) Liu <yaxun.liu@amd.com>

[AMDGPU] Add target feature gws to clang

Reviewed by: Matt Arsenault

Differential Revision: https://reviews.llvm.org/D158367


Revision tags: llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4
# 7f12dcac 06-Mar-2023 Yaxun (Sam) Liu <yaxun.liu@amd.com>

[HIP] Fix regression about `__fp16` args and return value

HIP allows __fp16 as function arguments and return value by passing
-fallow-half-arguments-and-returns to clang through hipcc.

https://revi

[HIP] Fix regression about `__fp16` args and return value

HIP allows __fp16 as function arguments and return value by passing
-fallow-half-arguments-and-returns to clang through hipcc.

https://reviews.llvm.org/D133885 removed -fallow-half-arguments-and-returns
and add a TargetInfo member to control it.

This caused regressions in some HIP apps
(https://github.com/ROCm-Developer-Tools/HIP/issues/3178).

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D145345

Fixes: https://github.com/ROCm-Developer-Tools/HIP/issues/3178

show more ...


# ad96f25b 30-Jun-2023 Yaxun (Sam) Liu <yaxun.liu@amd.com>

[AMDGPU] Rename predefined macro __AMDGCN_WAVEFRONT_SIZE

rename it to __AMDGCN_WAVEFRONT_SIZE__ for consistency.

__AMDGCN_WAVEFRONT_SIZE will be deprecated in the future.

Reviewed by: Matt Arsenau

[AMDGPU] Rename predefined macro __AMDGCN_WAVEFRONT_SIZE

rename it to __AMDGCN_WAVEFRONT_SIZE__ for consistency.

__AMDGCN_WAVEFRONT_SIZE will be deprecated in the future.

Reviewed by: Matt Arsenault, Johannes Doerfert

Differential Revision: https://reviews.llvm.org/D154207

show more ...


# c0f0d506 24-May-2023 Yaxun (Sam) Liu <yaxun.liu@amd.com>

[HIP] emit macro `__HIP_NO_IMAGE_SUPPORT`

HIP texture/image support is optional as some devices
do not have image instructions. A macro __HIP_NO_IMAGE_SUPPORT
is defined for device not supporting im

[HIP] emit macro `__HIP_NO_IMAGE_SUPPORT`

HIP texture/image support is optional as some devices
do not have image instructions. A macro __HIP_NO_IMAGE_SUPPORT
is defined for device not supporting images (https://github.com/ROCm-Developer-Tools/HIP/blob/d0448aa4c4dd0f4b29ccf6a663b7f5ad9f5183e0/docs/reference/kernel_language.md?plain=1#L426 )

Currently the macro is defined by HIP header based on predefined macros
for GPU, e.g __gfx*__ , which is error prone. This patch let clang
emit the predefined macro.

Reviewed by: Matt Arsenault, Artem Belevich

Differential Revision: https://reviews.llvm.org/D151349

show more ...


# 6adb9a06 06-Mar-2023 Yaxun (Sam) Liu <yaxun.liu@amd.com>

[AMDGPU] Emit predefined macro `__AMDGCN_CUMODE__`

Predefine __AMDGCN_CUMODE__ as 1 or 0 when compilation assumes CU or WGP modes.

If WGP mode is not supported, ignore -mno-cumode and emit a warnin

[AMDGPU] Emit predefined macro `__AMDGCN_CUMODE__`

Predefine __AMDGCN_CUMODE__ as 1 or 0 when compilation assumes CU or WGP modes.

If WGP mode is not supported, ignore -mno-cumode and emit a warning.

This is needed for implementing device functions like __smid
(https://github.com/ROCm-Developer-Tools/hipamd/blob/312dff7b794337aa040be0691acc78e9f968a8d2/include/hip/amd_detail/amd_device_functions.h#L957)

Reviewed by: Matt Arsenault, Artem Belevich, Brian Sumner

Differential Revision: https://reviews.llvm.org/D145343

show more ...


123456