AMDGPU.cpp - OpenGrok history log for /llvm-project/clang/lib/Basic/Targets/AMDGPU.cpp

Revision (<<< Hide revision tags) (Show revision tags >>>)	Date	Author	Comments
Revision tags: llvmorg-21-init
# d92bac8a	25-Jan-2025	Helena Kotas <hekotas@microsoft.com>	[HLSL] Introduce address space `hlsl_constant(2)` for constant buffer declarations (#123411) Introduces a new address space `hlsl_constant(2)` for constant buffer declarations. This address spac [HLSL] Introduce address space `hlsl_constant(2)` for constant buffer declarations (#123411) Introduces a new address space `hlsl_constant(2)` for constant buffer declarations. This address space is applied to declarations inside `cbuffer` block. Later on, it will also be applied to `ConstantBuffer<T>` syntax and the default `$Globals` constant buffer. Clang codegen translates constant buffer declarations to global variables and loads from `hlsl_constant(2)` address space. More work coming soon will include addition of metadata that will map these globals to individual constant buffers and enable their transformation to appropriate constant buffer load intrinsics later on in an LLVM pass. Fixes #123406 show more ...
Revision tags: llvmorg-19.1.7, llvmorg-19.1.6
# ca79ff07	14-Dec-2024	Chandler Carruth <chandlerc@gmail.com>	Revert "Switch builtin strings to use string tables" (#119638) Reverts llvm/llvm-project#118734 There are currently some specific versions of MSVC that are miscompiling this code (we think). We Revert "Switch builtin strings to use string tables" (#119638) Reverts llvm/llvm-project#118734 There are currently some specific versions of MSVC that are miscompiling this code (we think). We don't know why as all the other build bots and at least some folks' local Windows builds work fine. This is a candidate revert to help the relevant folks catch their builders up and have time to debug the issue. However, the expectation is to roll forward at some point with a workaround if at all possible. show more ...
# be2df95e	09-Dec-2024	Chandler Carruth <chandlerc@gmail.com>	Switch builtin strings to use string tables (#118734) The Clang binary (and any binary linking Clang as a library), when built using PIE, ends up with a pretty shocking number of dynamic relocation Switch builtin strings to use string tables (#118734) The Clang binary (and any binary linking Clang as a library), when built using PIE, ends up with a pretty shocking number of dynamic relocations to apply to the executable image: roughly 400k. Each of these takes up binary space in the executable, and perhaps most interestingly takes start-up time to apply the relocations. The largest pattern I identified were the strings used to describe target builtins. The addresses of these string literals were stored into huge arrays, each one requiring a dynamic relocation. The way to avoid this is to design the target builtins to use a single large table of strings and offsets within the table for the individual strings. This switches the builtin management to such a scheme. This saves over 100k dynamic relocations by my measurement, an over 25% reduction. Just looking at byte size improvements, using the `bloaty` tool to compare a newly built `clang` binary to an old one: ``` FILE SIZE VM SIZE -------------- -------------- +1.4% +653Ki +1.4% +653Ki .rodata +0.0% +960 +0.0% +960 .text +0.0% +197 +0.0% +197 .dynstr +0.0% +184 +0.0% +184 .eh_frame +0.0% +96 +0.0% +96 .dynsym +0.0% +40 +0.0% +40 .eh_frame_hdr +114% +32 [ = ] 0 [Unmapped] +0.0% +20 +0.0% +20 .gnu.hash +0.0% +8 +0.0% +8 .gnu.version +0.9% +7 +0.9% +7 [LOAD #2 [R]] [ = ] 0 -75.4% -3.00Ki .relro_padding -16.1% -802Ki -16.1% -802Ki .data.rel.ro -27.3% -2.52Mi -27.3% -2.52Mi .rela.dyn -1.6% -2.66Mi -1.6% -2.66Mi TOTAL ``` We get a 16% reduction in the `.data.rel.ro` section, and nearly 30% reduction in `.rela.dyn` where those reloctaions are stored. This is also visible in my benchmarking of binary start-up overhead at least: ``` Benchmark 1: ./old_clang --version Time (mean ± σ): 17.6 ms ± 1.5 ms [User: 4.1 ms, System: 13.3 ms] Range (min … max): 14.2 ms … 22.8 ms 162 runs Benchmark 2: ./new_clang --version Time (mean ± σ): 15.5 ms ± 1.4 ms [User: 3.6 ms, System: 11.8 ms] Range (min … max): 12.4 ms … 20.3 ms 216 runs Summary './new_clang --version' ran 1.13 ± 0.14 times faster than './old_clang --version' ``` We get about 2ms faster `--version` runs. While there is a lot of noise in binary execution time, this delta is pretty consistent, and represents over 10% improvement. This is particularly interesting to me because for very short source files, repeatedly starting the `clang` binary is actually the dominant cost. For example, `configure` scripts running against the `clang` compiler are slow in large part because of binary start up time, not the time to process the actual inputs to the compiler. ---- This PR implements the string tables using `constexpr` code and the existing macro system. I understand that the builtins are moving towards a TableGen model, and if complete that would provide more options for modeling this. Unfortunately, that migration isn't complete, and even the parts that are migrated still rely on the ability to break out of the TableGen model and directly expand an X-macro style `BUILTIN(...)` textually. I looked at trying to complete the move to TableGen, but it would both require the difficult migration of the remaining targets, and solving some tricky problems with how to move away from any macro-based expansion. I was also able to find a reasonably clean and effective way of doing this with the existing macros and some `constexpr` code that I think is clean enough to be a pretty good intermediate state, and maybe give a good target for the eventual TableGen solution. I was also able to factor the macros into set of consistent patterns that avoids a significant regression in overall boilerplate. show more ...
Revision tags: llvmorg-19.1.5
# f8b4182f	02-Dec-2024	Nathan Gauër <brioche@google.com>	Revert "[SPIR-V] Fixup storage class for global private (#116636)" (#118312) This reverts commit aa7fe1c10e5d6d0d3aacdb345fed995de413e142.
# aa7fe1c1	02-Dec-2024	Nathan Gauër <brioche@google.com>	[SPIR-V] Fixup storage class for global private (#116636) Adds a new address spaces: `hlsl_private`. Variables with such address space will be emitted with a `Private` storage class. This is usefu [SPIR-V] Fixup storage class for global private (#116636) Adds a new address spaces: `hlsl_private`. Variables with such address space will be emitted with a `Private` storage class. This is useful for variables global to a SPIR-V module, since up to now, they were still emitted with a `Function` storage class, which is wrong. --------- Signed-off-by: Nathan Gauër <brioche@google.com> show more ...
Revision tags: llvmorg-19.1.4
# d893c5ad	11-Nov-2024	Fabian Ritter <fabian.ritter@amd.com>	[Clang][HIP] Reapply: Deprecate the AMDGCN_WAVEFRONT_SIZE macros (#115507) So far, these macros can be used in contexts where no meaningful wavefront size is available. We therefore deprecate these [Clang][HIP] Reapply: Deprecate the AMDGCN_WAVEFRONT_SIZE macros (#115507) So far, these macros can be used in contexts where no meaningful wavefront size is available. We therefore deprecate these macros, to replace them with a more resilient interface to access wavefront size information where it is available. Reapplies #112849 with a fix for the non-hermetic clang test that failed on Mac after the revert in #115499. For SWDEV-491529. show more ...
# e734de1f	08-Nov-2024	Fabian Ritter <fabian.ritter@amd.com>	Revert "[Clang][HIP] Deprecate the AMDGCN_WAVEFRONT_SIZE macros" (#115499) Reverts llvm/llvm-project#112849 due to test failure on Mac, reported by @nico
# e5c6d1f4	08-Nov-2024	Fabian Ritter <fabian.ritter@amd.com>	[Clang][HIP] Deprecate the AMDGCN_WAVEFRONT_SIZE macros (#112849) So far, these macros can be used in contexts where no meaningful wavefront size is available. We therefore deprecate these macros, [Clang][HIP] Deprecate the AMDGCN_WAVEFRONT_SIZE macros (#112849) So far, these macros can be used in contexts where no meaningful wavefront size is available. We therefore deprecate these macros, to replace them with a more resilient interface to access wavefront size information where it is available. For SWDEV-491529. show more ...
Revision tags: llvmorg-19.1.3
# 6e0b0038	22-Oct-2024	Alex Voicu <alexandru.voicu@amd.com>	[clang][OpenCL][CodeGen][AMDGPU] Do not use `private` as the default AS for when `generic` is available (#112442) Currently, for AMDGPU, when compiling for OpenCL, we unconditionally use `private` [clang][OpenCL][CodeGen][AMDGPU] Do not use `private` as the default AS for when `generic` is available (#112442) Currently, for AMDGPU, when compiling for OpenCL, we unconditionally use `private` as the default address space. This is wrong for cases where the `generic` address space is available, and is corrected via this patch. In general, this AS map abuse is a bad hack and we should re-work it altogether, but at least after this patch we will stop being incorrect for e.g. OpenCL 2.0. show more ...
Revision tags: llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init
# f363e30f	09-Jul-2024	Stanislav Mekhanoshin <rampitec@users.noreply.github.com>	[AMDGPU] Report error in clang if wave32 is requested where unsupported (#97633)
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6
# a98a6e95	04-May-2024	luolent <56246516+luolent@users.noreply.github.com>	Add clarifying parenthesis around non-trivial conditions in ternary expressions. (#90391) Fixes [#85868](https://github.com/llvm/llvm-project/issues/85868) Parenthesis are added as requested on t Add clarifying parenthesis around non-trivial conditions in ternary expressions. (#90391) Fixes [#85868](https://github.com/llvm/llvm-project/issues/85868) Parenthesis are added as requested on ternary operators with non trivial conditions. I used this [precedence table](https://en.cppreference.com/w/cpp/language/operator_precedence) for reference, to make sure we get the expected behavior on each change. show more ...
Revision tags: llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3
# 43c7eb5d	14-Feb-2024	Pierre van Houtryve <pierre.vanhoutryve@amd.com>	[AMDGPU] Replace '.' with '-' in generic target names (#81718) The dot is too confusing for tools. Output temporaries would have '10.3-generic' so tools could parse it as an extension, device libs [AMDGPU] Replace '.' with '-' in generic target names (#81718) The dot is too confusing for tools. Output temporaries would have '10.3-generic' so tools could parse it as an extension, device libs & the associated clang driver logic are also confused by the dot. After discussions, we decided it's better to just remove the '.' from the target name than fix each issue one by one. show more ...
# f93aa515	12-Feb-2024	Pierre van Houtryve <pierre.vanhoutryve@amd.com>	[AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (#76955) These generic targets include multiple GPUs and will, in the future, provide a way to build once and run on multiple GPU, at the cost o [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (#76955) These generic targets include multiple GPUs and will, in the future, provide a way to build once and run on multiple GPU, at the cost of less optimization opportunities. Note that this is just doing the compiler side of things, device libs an runtimes/loader/etc. don't know about these targets yet, so none of them actually work in practice right now. This is just the initial commit to make LLVM aware of them. This contains the documentation changes for both this change and #76954 as well. show more ...
Revision tags: llvmorg-18.1.0-rc2
# 6fecfbc7	30-Jan-2024	Joseph Huber <huberjn@outlook.com>	[AMDGPU] Correctly exclude the HIP host from arch macros Summary: This logic was wrong and accidentally appling to OpenCL.
# f2a78e68	30-Jan-2024	Joseph Huber <huberjn@outlook.com>	[AMDGPU] Do not emit arch dependent macros with unspecified cpu (#80035) Summary: Currently, the AMDGPU toolchain accepts not passing `-mcpu` as a means to create a sort of "generic" IR. The resulti [AMDGPU] Do not emit arch dependent macros with unspecified cpu (#80035) Summary: Currently, the AMDGPU toolchain accepts not passing `-mcpu` as a means to create a sort of "generic" IR. The resulting IR will not contain any target dependent attributes and can then be inserted into another program via `-mlink-builtin-bitcode` to inherit its attributes. However, there are a handful of macros that can leak incorrect information when compiling for an unspecified architecture. Currently, things like the wavefront size will default to 64, which is actually variable. We should not expose these macros unless it is known. show more ...
Revision tags: llvmorg-18.1.0-rc1
# 72d4fc1b	29-Jan-2024	Joseph Huber <huberjn@outlook.com>	Revert "[AMDGPU] Do not emit arch dependent macros with unspecified cpu (#79660)" This reverts commit c9a6e993f7b349405b6c8f9244cd9cf0f56a6a81. This breaks HIP code that incorrectly depended on GPU Revert "[AMDGPU] Do not emit arch dependent macros with unspecified cpu (#79660)" This reverts commit c9a6e993f7b349405b6c8f9244cd9cf0f56a6a81. This breaks HIP code that incorrectly depended on GPU-specific macros to be set. The code is totally wrong as using `__WAVEFRTONSIZE__` on the host is absolutely meaningless, but it seems this entire corner of the toolchain is fundmentally broken. Reverting for now to avoid breakages. show more ...
# c9a6e993	29-Jan-2024	Joseph Huber <huberjn@outlook.com>	[AMDGPU] Do not emit arch dependent macros with unspecified cpu (#79660) Summary: Currently, the AMDGPU toolchain accepts not passing `-mcpu` as a means to create a sort of "generic" IR. The resulti [AMDGPU] Do not emit arch dependent macros with unspecified cpu (#79660) Summary: Currently, the AMDGPU toolchain accepts not passing `-mcpu` as a means to create a sort of "generic" IR. The resulting IR will not contain any target dependent attributes and can then be inserted into another program via `-mlink-builtin-bitcode` to inherit its attributes. However, there are a handful of macros that can leak incorrect information when compiling for an unspecified architecture. Currently, things like the wavefront size will default to 64, which is actually variable. We should not expose these macros unless it is known. show more ...
Revision tags: llvmorg-19-init
# 32f9983c	15-Dec-2023	Jessica Del <50999226+OutOfCache@users.noreply.github.com>	[AMDGPU] - Add address space for strided buffers (#74471) This is an experimental address space for strided buffers. These buffers can have structs as elements and a stride > 1. These pointers al [AMDGPU] - Add address space for strided buffers (#74471) This is an experimental address space for strided buffers. These buffers can have structs as elements and a stride > 1. These pointers allow the indexed access in units of stride, i.e., they point at `buffer[index * stride]`. Thus, we can use the `idxen` modifier for buffer loads. We assign address space 9 to 192-bit buffer pointers which contain a 128-bit descriptor, a 32-bit offset and a 32-bit index. Essentially, they are fat buffer pointers with an additional 32-bit index. show more ...
# f3dcc235	13-Dec-2023	Kazu Hirata <kazu@google.com>	[clang] Use StringRef::{starts,ends}_with (NFC) (#75149) This patch replaces uses of StringRef::{starts,ends}with with StringRef::{starts,ends}_with for consistency with std::{string,string_view}: [clang] Use StringRef::{starts,ends}_with (NFC) (#75149) This patch replaces uses of StringRef::{starts,ends}with with StringRef::{starts,ends}_with for consistency with std::{string,string_view}::{starts,ends}_with in C++20. I'm planning to deprecate and eventually remove StringRef::{starts,ends}with. show more ...
# 276a024b	11-Dec-2023	Dominik Adamski <dominik.adamski@amd.com>	[NFC][AMDGPU] Unify AMDGPU address space enum (#73944) Types of AMDGPU address space were defined not only in Clang-specific class but also in LLVM header. If we unify the AMD GPU address space [NFC][AMDGPU] Unify AMDGPU address space enum (#73944) Types of AMDGPU address space were defined not only in Clang-specific class but also in LLVM header. If we unify the AMD GPU address space enumeration, then we can reuse it in Clang, Flang and LLVM. show more ...
Revision tags: llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3
# b8a9c50f	20-Aug-2023	Yaxun (Sam) Liu <yaxun.liu@amd.com>	[AMDGPU] Add target feature gws to clang Reviewed by: Matt Arsenault Differential Revision: https://reviews.llvm.org/D158367
Revision tags: llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4
# 7f12dcac	06-Mar-2023	Yaxun (Sam) Liu <yaxun.liu@amd.com>	[HIP] Fix regression about `__fp16` args and return value HIP allows __fp16 as function arguments and return value by passing -fallow-half-arguments-and-returns to clang through hipcc. https://revi [HIP] Fix regression about `__fp16` args and return value HIP allows __fp16 as function arguments and return value by passing -fallow-half-arguments-and-returns to clang through hipcc. https://reviews.llvm.org/D133885 removed -fallow-half-arguments-and-returns and add a TargetInfo member to control it. This caused regressions in some HIP apps (https://github.com/ROCm-Developer-Tools/HIP/issues/3178). Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D145345 Fixes: https://github.com/ROCm-Developer-Tools/HIP/issues/3178 show more ...
# ad96f25b	30-Jun-2023	Yaxun (Sam) Liu <yaxun.liu@amd.com>	[AMDGPU] Rename predefined macro __AMDGCN_WAVEFRONT_SIZE rename it to __AMDGCN_WAVEFRONT_SIZE__ for consistency. __AMDGCN_WAVEFRONT_SIZE will be deprecated in the future. Reviewed by: Matt Arsenau [AMDGPU] Rename predefined macro __AMDGCN_WAVEFRONT_SIZE rename it to __AMDGCN_WAVEFRONT_SIZE__ for consistency. __AMDGCN_WAVEFRONT_SIZE will be deprecated in the future. Reviewed by: Matt Arsenault, Johannes Doerfert Differential Revision: https://reviews.llvm.org/D154207 show more ...
# c0f0d506	24-May-2023	Yaxun (Sam) Liu <yaxun.liu@amd.com>	[HIP] emit macro `__HIP_NO_IMAGE_SUPPORT` HIP texture/image support is optional as some devices do not have image instructions. A macro __HIP_NO_IMAGE_SUPPORT is defined for device not supporting im [HIP] emit macro `__HIP_NO_IMAGE_SUPPORT` HIP texture/image support is optional as some devices do not have image instructions. A macro __HIP_NO_IMAGE_SUPPORT is defined for device not supporting images (https://github.com/ROCm-Developer-Tools/HIP/blob/d0448aa4c4dd0f4b29ccf6a663b7f5ad9f5183e0/docs/reference/kernel_language.md?plain=1#L426 ) Currently the macro is defined by HIP header based on predefined macros for GPU, e.g __gfx*__ , which is error prone. This patch let clang emit the predefined macro. Reviewed by: Matt Arsenault, Artem Belevich Differential Revision: https://reviews.llvm.org/D151349 show more ...
# 6adb9a06	06-Mar-2023	Yaxun (Sam) Liu <yaxun.liu@amd.com>	[AMDGPU] Emit predefined macro `__AMDGCN_CUMODE__` Predefine __AMDGCN_CUMODE__ as 1 or 0 when compilation assumes CU or WGP modes. If WGP mode is not supported, ignore -mno-cumode and emit a warnin [AMDGPU] Emit predefined macro `__AMDGCN_CUMODE__` Predefine __AMDGCN_CUMODE__ as 1 or 0 when compilation assumes CU or WGP modes. If WGP mode is not supported, ignore -mno-cumode and emit a warning. This is needed for implementing device functions like __smid (https://github.com/ROCm-Developer-Tools/hipamd/blob/312dff7b794337aa040be0691acc78e9f968a8d2/include/hip/amd_detail/amd_device_functions.h#L957) Reviewed by: Matt Arsenault, Artem Belevich, Brian Sumner Differential Revision: https://reviews.llvm.org/D145343 show more ...
12 3 4 5 6