#
1d97cb1f |
| 04-Feb-2022 |
Yaxun (Sam) Liu <yaxun.liu@amd.com> |
[HIP] Emit amdgpu_code_object_version module flag
The code object version determines the ABI, so modules built for different versions should not be mixed.
This patch emits the amdgpu_code_object_version module flag in LLVM IR based on the code object version (default 4).
The amdgpu_code_object_version value is the code object version times 100.
LLVM IR modules with different amdgpu_code_object_version module flags cannot be linked.
The -cc1 option -mcode-object-version=none is for ROCm device library use only, since the device library supports multiple ABIs.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D119026
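A minimal sketch of the flag arithmetic described in this message. The helper names codeObjectVersionFlag and flagsCompatible are hypothetical and are not taken from the patch; the sketch only models "version times 100" and "different flags cannot be linked".

```cpp
// Hypothetical helper: the module flag value is the code object version times 100.
constexpr unsigned codeObjectVersionFlag(unsigned version) {
  return version * 100;
}

// Hypothetical helper: two modules can be linked only when their flag values agree.
constexpr bool flagsCompatible(unsigned lhs, unsigned rhs) {
  return lhs == rhs;
}

// The default code object version named in the message is 4.
static_assert(codeObjectVersionFlag(4) == 400, "default version maps to 400");
```

For example, a module built for version 4 and one built for version 5 would carry the values 400 and 500 and would be rejected under this model.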
|
#
171da443 |
| 04-Feb-2022 |
Yaxun (Sam) Liu <yaxun.liu@amd.com> |
[HIPSPV] Fix literals are mapped to Generic address space
This issue is an oversight in D108621.
Literals in HIP are emitted as global constant variables with the default address space, which maps to the Generic address space for HIPSPV. In SPIR-V such variables translate to OpVariable instructions with the Generic storage class, which is not legal. Fix this by mapping literals to the CrossWorkGroup address space.
The literals are not mapped to UniformConstant because “flat” pointers in HIP may reference them, and “flat” pointers are modeled as Generic pointers in SPIR-V. In SPIR-V/OpenCL, UniformConstant pointers may not be cast to Generic.
Patch by: Henry Linjamäki
Reviewed by: Yaxun Liu
Differential Revision: https://reviews.llvm.org/D118876
|
#
853e0aa4 |
| 04-Feb-2022 |
Hans Wennborg <hans@chromium.org> |
Don't dllexport reference temporaries
Even if the reference itself is dllexport, the temporary should not be. In fact, we're already giving it internal linkage, so dllexporting it is not just wasteful, but will fail to link, as in the example below:
$ cat /tmp/a.cc
void _DllMainCRTStartup() {}
const int __declspec(dllexport) &foo = 42;
$ clang-cl -fuse-ld=lld /tmp/a.cc /Zl /link /dll /out:a.dll
lld-link: error: <root>: undefined symbol: int const &foo::$RT1
Differential revision: https://reviews.llvm.org/D118980
|
#
1f08b086 |
| 28-Jan-2022 |
Amilendra Kodithuwakku <Amilendra.Kodithuwakku@arm.com> |
[clang][ARM] Emit warnings when PACBTI-M is used with unsupported architectures
Branch protection in M-class is supported by:
- Armv8.1-M.Main
- Armv8-M.Main
- Armv7-M
Attempting to enable this for other architectures, either on the command line (e.g. -mbranch-protection=bti) or by a target attribute in source code (e.g. __attribute__((target("branch-protection=...")))), will generate a warning.
In both cases function attributes related to branch protection will not be emitted. Regardless of the warning, module level attributes related to branch protection will be emitted when it is enabled via the command-line.
The following people also contributed to this patch:
- Victor Campos
Reviewed By: chill
Differential Revision: https://reviews.llvm.org/D115501
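The check described above can be sketched as a simple lookup. This is an illustration only: the function name warnsOnBranchProtection and the lowercase architecture spellings are assumptions, and the sketch covers only the M-class list from this message, not how clang actually represents architectures.

```cpp
#include <array>
#include <string_view>

// Architectures the commit message lists as supporting M-class branch protection.
// The lowercase spellings are an assumption made for this sketch.
constexpr std::array<std::string_view, 3> kBranchProtectionArchs = {
    "armv8.1-m.main", "armv8-m.main", "armv7-m"};

// Hypothetical helper: true when enabling branch protection for `arch`
// would produce the warning described in the message.
inline bool warnsOnBranchProtection(std::string_view arch) {
  for (std::string_view supported : kBranchProtectionArchs) {
    if (supported == arch)
      return false;  // supported: no warning
  }
  return true;  // any other architecture: warn
}
```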
|
#
82af9502 |
| 21-Jan-2022 |
Joao Moreira <joao.moreira@intel.com> |
[X86] Enable ibt-seal optimization when LTO is used in Kernel
Intel's CET/IBT requires every indirect branch target to be an ENDBR instruction. Because of that, the compiler needs to correctly emit these instructions in function prologues. Because this is a security feature, it is desirable that only functions that are actually targeted by indirect branches are emitted with ENDBRs. While it is possible to identify address-taken functions through LTO, minimizing these ENDBR instructions remains a hard task for user-space binaries, because exported functions may end up being reachable through PLT entries, which use an indirect branch to reach them. Because this cannot be determined at compile time, the compiler currently emits ENDBRs in every non-local-linkage function.
Despite the challenge presented for user space, the kernel landscape is different, as no PLTs are used. With the intent of providing the best-fitting ENDBR emission for the kernel, kernel developers proposed an optimization named "ibt-seal", which replaces the ENDBRs with NOPs directly in the binary. The discussion of this feature can be seen in [1].
This diff enables the flag -mibt-seal, which in combination with LTO enforces a different policy for ENDBR placement when the code model is set to "kernel". In this scenario, the compiler only emits ENDBRs in address-taken functions, ignoring non-address-taken functions that don't have local linkage.
A comparison between LTO-compiled kernel binaries without and with the -mibt-seal feature enabled shows that when -mibt-seal was used, the number of ENDBRs in the vmlinux.o binary patched by objtool decreased from 44383 to 33192, and that the number of superfluous ENDBR instructions nopped out decreased from 11730 to 540.
The 540 missed superfluous ENDBRs need to be investigated further, but hypotheses are: assembly code not being taken care of by the compiler, kernel exported symbols mechanisms creating bogus address taken situations or even these being removed due to other binary optimizations like kernel's static_calls. For now, I assume that the large drop in the number of ENDBR instructions already justifies the feature being merged.
[1] - https://lkml.org/lkml/2021/11/22/591
Reviewed By: xiangzhangllvm
Differential Revision: https://reviews.llvm.org/D116070
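The placement policy described in this message can be sketched as a small predicate. The struct and function names are hypothetical, and the treatment of address-taken local functions in the default policy is an assumption; the message only states that every non-local-linkage function receives an ENDBR by default.

```cpp
// Hypothetical description of a function as seen by the policy.
struct FunctionInfo {
  bool hasLocalLinkage;
  bool addressTaken;
};

// Hypothetical description of the relevant compilation settings.
struct Options {
  bool ibtSeal;          // -mibt-seal
  bool kernelCodeModel;  // code model set to "kernel"
  bool lto;              // LTO in use
};

// Returns true when the function should start with an ENDBR under this model.
inline bool needsEndbr(const FunctionInfo &fn, const Options &opts) {
  // ibt-seal policy: only address-taken functions get an ENDBR.
  if (opts.ibtSeal && opts.kernelCodeModel && opts.lto)
    return fn.addressTaken;
  // Default policy: every non-local-linkage function gets an ENDBR.
  // Assumption: local functions get one only when their address is taken.
  return !fn.hasLocalLinkage || fn.addressTaken;
}
```

Under this model, an exported function whose address is never taken keeps its ENDBR by default and loses it once all three settings are enabled.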
|
#
85c2bd2a |
| 19-Jan-2022 |
Yaxun (Sam) Liu <yaxun.liu@amd.com> |
Prevent adding module flag amdgpu_hostcall multiple times
A HIP program with a printf call fails to compile with the -fsanitize=address option because the module flag amdgpu_hostcall is appended twice, once for printf and once for the sanitizer option. This patch fixes that issue.
Patch by: Praveen Velliengiri
Reviewed by: Yaxun Liu, Roman Lebedev
Differential Revision: https://reviews.llvm.org/D116216
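The fix amounts to making flag insertion idempotent. The following is a sketch of that idea with a hypothetical ModuleFlags class; it does not use the LLVM Module API and is not the code from the patch.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for a module's flag table.
class ModuleFlags {
public:
  // Adds the flag only if it is not present yet.
  // Returns true when the flag was newly added, false when it already existed.
  bool addOnce(const std::string &name, int value) {
    return flags_.emplace(name, value).second;
  }

  std::size_t size() const { return flags_.size(); }

private:
  std::map<std::string, int> flags_;
};
```

With this shape, both the printf path and the sanitizer path can request amdgpu_hostcall without producing a duplicate entry.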
|
#
c63a3175 |
| 15-Jan-2022 |
Nikita Popov <nikita.ppv@gmail.com> |
[AttrBuilder] Remove ctor accepting AttributeList and Index
Use the AttributeSet constructor instead. There's no good reason why AttrBuilder itself should extract the AttributeSet from the AttributeList. Moving this out of the AttrBuilder generally results in cleaner code.
|
#
2bcba21c |
| 14-Jan-2022 |
Erich Keane <erich.keane@intel.com> |
[CPU-Dispatch] Make sure Dispatch names get updated if previously mangled
Cases where a cpu-dispatch/cpu-specific function is mangled before the function becomes 'multiversion' (such as a member function) cause the wrong name to be emitted for one of the variants or the resolver, since the name is cached. Make sure we invalidate the cache in cpu-dispatch/cpu-specific modes, like we previously did for just target multiversioning.
|
#
b699e8b1 |
| 13-Jan-2022 |
Erich Keane <erich.keane@intel.com> |
Add another assert to cpu-dispatch emission to help track down a tough to repro error.
As mentioned yesterday, I've got a problem that I can only reproduce on Godbolt (none of the build configs on my local machine!), so this is at least somewhat usable until I figure out a cause.
|
#
6e77ad11 |
| 12-Jan-2022 |
Erich Keane <erich.keane@intel.com> |
Add an assert in cpudispatch emit to try to track down an error.
I'm attempting to debug an issue that I can only get to happen on godbolt, where the cpu-dispatch resolver for an out of line member function is generated with the wrong name, causing a link failure.
|
#
d2cc6c2d |
| 03-Jan-2022 |
Serge Guelton <sguelton@redhat.com> |
Use a sorted array instead of a map to store AttrBuilder string attributes
Using a std::map<SmallString, SmallString> for target-dependent attributes is inefficient: it makes the constructor slightly heavier and involves an extra allocation for each new string attribute. Storing the attribute key/value as strings implies an extra allocation/copy step.
Use a sorted vector instead. Given the low number of attributes generally involved, this is cheaper, as showcased by
https://llvm-compile-time-tracker.com/compare.php?from=5de322295f4ade692dc4f1823ae4450ad3c48af2&to=05bc480bf641a9e3b466619af43a2d123ee3f71d&stat=instructions
Differential Revision: https://reviews.llvm.org/D116599
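A minimal sketch of the sorted-vector technique, using std::string and std::vector in place of the LLVM types. The class name SortedStringAttrs and its interface are hypothetical and do not mirror AttrBuilder.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container: key/value string attributes kept sorted by key.
class SortedStringAttrs {
public:
  using Entry = std::pair<std::string, std::string>;

  // Inserts a new attribute or overwrites the value of an existing key.
  void set(const std::string &key, const std::string &value) {
    auto it = std::lower_bound(attrs_.begin(), attrs_.end(), key, keyLess);
    if (it != attrs_.end() && it->first == key)
      it->second = value;             // key already present: overwrite
    else
      attrs_.emplace(it, key, value); // insert at the sorted position
  }

  // Binary search; returns nullptr when the key is absent.
  const std::string *find(const std::string &key) const {
    auto it = std::lower_bound(attrs_.begin(), attrs_.end(), key, keyLess);
    if (it != attrs_.end() && it->first == key)
      return &it->second;
    return nullptr;
  }

  std::size_t size() const { return attrs_.size(); }

private:
  static bool keyLess(const Entry &entry, const std::string &key) {
    return entry.first < key;
  }

  std::vector<Entry> attrs_;
};
```

Insertion into the middle of a vector is linear, which is acceptable here because, as the message notes, the number of attributes is generally low.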
|
#
40446663 |
| 09-Jan-2022 |
Kazu Hirata <kazu@google.com> |
[clang] Use true/false instead of 1/0 (NFC)
Identified with modernize-use-bool-literals.
|
#
9290ccc3 |
| 04-Jan-2022 |
serge-sans-paille <sguelton@redhat.com> |
Introduce the AttributeMask class
This class is solely used as a lightweight and clean way to build a set of attributes to be removed from an AttrBuilder. Previously AttrBuilder was used both for building and removing, which introduced odd situations such as creating an Attribute with a dummy value because the only relevant part was the attribute kind.
Differential Revision: https://reviews.llvm.org/D116110
|
#
ec2e26ea |
| 10-Aug-2021 |
Sami Tolvanen <samitolvanen@google.com> |
[Clang] Add __builtin_function_start
Control-Flow Integrity (CFI) replaces references to address-taken functions with pointers to the CFI jump table. This is a problem for low-level code, such as operating system kernels, which may need the address of an actual function body without the jump table indirection.
This change adds the __builtin_function_start() builtin, which accepts an argument that can be constant-evaluated to a function, and returns the address of the function body.
Link: https://github.com/ClangBuiltLinux/linux/issues/1353
Depends on D108478
Reviewed By: pcc, rjmccall
Differential Revision: https://reviews.llvm.org/D108479
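A hedged usage sketch of the builtin. The function names handler and handlerBodyAddress are hypothetical. The fallback branch exists only so the sketch compiles with compilers that lack the builtin; it does not bypass a CFI jump table.

```cpp
// Compilers without __has_builtin are treated as lacking the builtin.
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

// Hypothetical function whose body address is wanted.
static void handler() {}

// Returns the address of handler's body. With CFI enabled, a plain &handler
// may refer to a jump-table entry instead of the body.
void *handlerBodyAddress() {
#if __has_builtin(__builtin_function_start)
  return __builtin_function_start(handler);
#else
  // Fallback for other compilers: ordinary function address.
  return reinterpret_cast<void *>(&handler);
#endif
}
```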
|
#
c3b624a1 |
| 15-Dec-2021 |
Nikita Popov <npopov@redhat.com> |
[CodeGen] Avoid deprecated ConstantAddress constructor
Change all uses of the deprecated constructor to pass the element type explicitly and drop it.
For cases where the correct element type was not immediately obvious to me or would require a slightly larger change I'm falling back to explicitly calling getPointerElementType() for now.
|
#
0a14674f |
| 03-Dec-2021 |
Peter Collingbourne <peter@pcc.me.uk> |
CodeGen: Strip exception specifications from function types in CFI type names.
With C++17 the exception specification has been made part of the function type, and therefore part of mangled type names.
However, it's valid to convert function pointers with an exception specification to function pointers with the same argument and return types but without an exception specification, which means that e.g. a function of type "void () noexcept" can be called through a pointer of type "void ()". We must therefore consider the two types to be compatible for CFI purposes.
We can do this by stripping the exception specification before mangling the type name, which is what this patch does.
Differential Revision: https://reviews.llvm.org/D115015
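The language rule behind this change can be checked directly in C++17. The StripNoexcept trait below is a hypothetical illustration of removing the exception specification from a function type; it is not the code the patch uses before mangling.

```cpp
#include <type_traits>

// A noexcept function used for the demonstration.
int answer() noexcept { return 42; }

// Hypothetical trait: removes noexcept from a plain function type.
template <typename T> struct StripNoexcept {
  using type = T;
};
template <typename R, typename... Args>
struct StripNoexcept<R(Args...) noexcept> {
  using type = R(Args...);
};

// In C++17 the exception specification is part of the function type...
static_assert(!std::is_same_v<int() noexcept, int()>, "distinct types");
// ...but a noexcept function pointer converts to a plain one, not vice versa.
static_assert(std::is_convertible_v<int (*)() noexcept, int (*)()>, "ok");
static_assert(!std::is_convertible_v<int (*)(), int (*)() noexcept>, "no");
// After stripping, both types are identical, which is the property that
// makes the two compatible for CFI purposes.
static_assert(std::is_same_v<StripNoexcept<int() noexcept>::type, int()>, "");

// Calls a noexcept function through a pointer without noexcept.
int callThroughPlainPointer() {
  int (*plain)() = answer;
  return plain();
}
```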
|
#
e3b2f022 |
| 01-Dec-2021 |
Ties Stuij <ties.stuij@arm.com> |
[clang][ARM] PACBTI-M frontend support
Handle branch protection option on the commandline as well as a function attribute. One patch for both mechanisms, as they use the same underlying parsing mechanism.
These are recorded in a set of LLVM IR module-level attributes like we do for AArch64 PAC/BTI (see https://reviews.llvm.org/D85649):
- command-line options are "translated" to module-level LLVM IR attributes (metadata).
- functions have PAC/BTI specific attributes iff the __attribute__((target("branch-protection=..."))) was used in the function declaration.
- command-line option -mbranch-protection to armclang targeting Arm, following this grammar:
branch-protection ::= "-mbranch-protection=" <protection>
protection ::= "none" | "standard" | "bti" [ "+" <pac-ret-clause> ] | <pac-ret-clause> [ "+" "bti"]
pac-ret-clause ::= "pac-ret" [ "+" <pac-ret-option> ]
pac-ret-option ::= "leaf" ["+" "b-key"] | "b-key" ["+" "leaf"]
b-key is simply a placeholder to make it consistent with AArch64's version. In Arm, however, it triggers a warning informing that b-key is unsupported and a-key will be selected instead.
- Handle __attribute__((target("branch-protection=..."))) for AArch32 with the same grammar as the command-line options.
This patch is part of a series that adds support for the PACBTI-M extension of the Armv8.1-M architecture, as detailed here:
https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension
The PACBTI-M specification can be found in the Armv8-M Architecture Reference Manual:
https://developer.arm.com/documentation/ddi0553/latest
The following people contributed to this patch:
- Momchil Velikov
- Victor Campos
- Ties Stuij
Reviewed By: vhscampos
Differential Revision: https://reviews.llvm.org/D112421
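The grammar for the protection value given in this message can be sketched as a small recursive-descent check. All function names here are hypothetical. The sketch accepts exactly what the grammar in the message describes and may be stricter or looser than clang's real parser.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Splits "a+b+c" into {"a", "b", "c"}; empty pieces are preserved.
inline std::vector<std::string> splitOnPlus(const std::string &text) {
  std::vector<std::string> parts;
  std::size_t start = 0;
  while (true) {
    std::size_t pos = text.find('+', start);
    if (pos == std::string::npos) {
      parts.push_back(text.substr(start));
      return parts;
    }
    parts.push_back(text.substr(start, pos - start));
    start = pos + 1;
  }
}

// pac-ret-clause ::= "pac-ret" [ "+" <pac-ret-option> ]
// pac-ret-option ::= "leaf" ["+" "b-key"] | "b-key" ["+" "leaf"]
inline bool parsePacRetClause(const std::vector<std::string> &tok,
                              std::size_t &i) {
  if (i >= tok.size() || tok[i] != "pac-ret")
    return false;
  ++i;
  if (i < tok.size() && (tok[i] == "leaf" || tok[i] == "b-key")) {
    const std::string other = (tok[i] == "leaf") ? "b-key" : "leaf";
    ++i;
    if (i < tok.size() && tok[i] == other)
      ++i;
  }
  return true;
}

// protection ::= "none" | "standard"
//              | "bti" [ "+" <pac-ret-clause> ]
//              | <pac-ret-clause> [ "+" "bti" ]
inline bool isValidProtection(const std::string &value) {
  const std::vector<std::string> tok = splitOnPlus(value);
  if (tok.size() == 1 && (tok[0] == "none" || tok[0] == "standard"))
    return true;
  std::size_t i = 0;
  if (tok[0] == "bti") {
    i = 1;
    if (i == tok.size())
      return true;  // plain "bti"
    return parsePacRetClause(tok, i) && i == tok.size();
  }
  if (!parsePacRetClause(tok, i))
    return false;
  if (i < tok.size() && tok[i] == "bti")
    ++i;
  return i == tok.size();  // reject trailing tokens
}
```

For instance, "pac-ret+b-key+leaf+bti" is accepted under this grammar, while "leaf" alone or "bti+bti" is rejected.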
|
#
fc53eb69 |
| 29-Nov-2021 |
Erich Keane <erich.keane@intel.com> |
Reapply 'Implement target_clones multiversioning'
See discussion in D51650, this change was a little aggressive in an error while doing a 'while we were here', so this removes that error condition, as it is apparently useful.
This reverts commit bb4934601d731465e01e2e22c80ce2dbe687d73f.
|
#
de34a940 |
| 18-Nov-2021 |
Phoebe Wang <pengfei.wang@intel.com> |
[X86] Add -mskip-rax-setup support to align with GCC
The AMD64 ABI requires the caller to specify the number of SSE registers used when passing variable arguments. GCC also provides the option -mskip-rax-setup to skip the setup of rax when SSE is disabled. This helps reduce code size; see pr23258.
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D112413
|
#
d0ac215d |
| 14-Nov-2021 |
Kazu Hirata <kazu@google.com> |
[clang] Use isa instead of dyn_cast (NFC)
|
#
bb493460 |
| 12-Nov-2021 |
Adrian Kuegel <akuegel@google.com> |
Revert "Implement target_clones multiversioning"
This reverts commit 9deab60ae710f8c4cc810cd680edfb64c803f42d. There is a possibly unintended semantic change.
|
#
9deab60a |
| 05-Nov-2021 |
Erich Keane <erich.keane@intel.com> |
Implement target_clones multiversioning
As discussed here: https://lwn.net/Articles/691932/
GCC6.0 adds target_clones multiversioning. This functionality is an odd cross between the cpu_dispatch and 'target' MV, but is compatible with neither.
This attribute allows you to list all options, then emits a separately optimized version of each function per-option (similar to the cpu_specific attribute). It automatically generates a resolver, just like the other two.
The mangling however, is... ODD to say the least. The mangling format is: <normal_mangling>.<option string>.<option ordinal>.
Differential Revision: https://reviews.llvm.org/D51650
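The mangling format quoted in this message can be sketched as plain string concatenation. The function name targetClonesName is hypothetical, and the sketch assumes the option string is used verbatim; the real implementation may normalize it first.

```cpp
#include <string>

// Hypothetical helper following the stated format:
// <normal_mangling>.<option string>.<option ordinal>
inline std::string targetClonesName(const std::string &normalMangling,
                                    const std::string &option,
                                    unsigned ordinal) {
  return normalMangling + "." + option + "." + std::to_string(ordinal);
}
```

Under these assumptions, targetClonesName("_Z3foov", "avx2", 0) yields "_Z3foov.avx2.0".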
|
#
4b3881e9 |
| 10-Nov-2021 |
Yaxun (Sam) Liu <yaxun.liu@amd.com> |
Emit hidden hostcall argument for sanitized kernels
The patch https://reviews.llvm.org/D110337 changed how the hostcall hidden argument is emitted for printf. However, sanitized kernels also use the hostcall buffer to report an error for an invalid memory access. That case is not handled by the above patch, which leads to a vdi runtime error:
Device::callbackQueue aborting with error : HSA_STATUS_ERROR_MEMORY_FAULT: Agent attempted to access an inaccessible address. code: 0x2b
Patch by: Praveen Velliengiri
Reviewed by: Yaxun Liu, Matt Arsenault
Differential Revision: https://reviews.llvm.org/D112820
|
#
80072fde |
| 04-Nov-2021 |
Yaxun (Sam) Liu <yaxun.liu@amd.com> |
[CUDA][HIP] Allow comdat for kernels
Two identical instantiations of a template function can be emitted by two TUs with linkonce_odr linkage without causing duplicate symbols in the linker. MSVC also requires these symbols to be in comdat sections. Linux does not require the symbols to be in comdat sections for the linker to merge them, but by default clang puts them in comdat sections.
If a template kernel is instantiated identically in two TUs, MSVC requires the instantiations to be in comdat sections; otherwise the MSVC linker diagnoses them as duplicate symbols. However, clang currently does not put instantiated template kernels in comdat sections, which causes a link error with MSVC.
This patch allows putting instantiated template kernels into comdat sections.
Reviewed by: Artem Belevich, Reid Kleckner
Differential Revision: https://reviews.llvm.org/D112492
|
#
9efce0ba |
| 06-Nov-2021 |
Itay Bookstein <ibookstein@gmail.com> |
[clang] Run LLVM Verifier in modes without CodeGen too
Previously, the Backend_Emit{Nothing,BC,LL} modes did not run the LLVM verifier since it is usually added via the TargetMachine::addPassesToEmitFile method according to the DisableVerify parameter. This is called from EmitAssemblyHelper::AddEmitPasses, which is only relevant for BackendAction-s that require CodeGen.
Note:
* In these particular situations the verifier is added to the optimization pipeline rather than the codegen pipeline so that it runs prior to the BC/LL emission pass.
* This change applies to both the old and the new PMs.
* Because the clang tests use -emit-llvm ubiquitously, this change will enable the verifier for them.
* A small bug is fixed in emitIFuncDefinition so that the clang/test/CodeGen/ifunc.c test would pass: the emitIFuncDefinition incorrectly passed the GlobalDecl of the IFunc itself to the call to GetOrCreateLLVMFunction for creating the resolver.
Signed-off-by: Itay Bookstein <ibookstein@gmail.com>
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D113352
|