#
4ea1d435 |
| 08-Apr-2022 |
Yaxun (Sam) Liu <yaxun.liu@amd.com> |
[CUDA][HIP] Externalize kernels in anonymous name space
kernels in anonymous name space needs to have unique name to avoid duplicate symbols.
Fixes: https://github.com/llvm/llvm-project/issues/5456
[CUDA][HIP] Externalize kernels in anonymous name space
kernels in anonymous name space needs to have unique name to avoid duplicate symbols.
Fixes: https://github.com/llvm/llvm-project/issues/54560
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D123353
show more ...
|
#
fa34951f |
| 08-Apr-2022 |
Mitch Phillips <31459023+hctim@users.noreply.github.com> |
Reland "[MTE] Add -fsanitize=memtag* and friends."
Differential Revision: https://reviews.llvm.org/D118948
|
#
4aaf25b4 |
| 08-Apr-2022 |
Aaron Ballman <aaron@aaronballman.com> |
Revert "[MTE] Add -fsanitize=memtag* and friends."
This reverts commit 8aa1490513f111afd407d87c3f07d26f65c8a686.
Broke testing: https://lab.llvm.org/buildbot/#/builders/109/builds/36233
|
#
8aa14905 |
| 01-Apr-2022 |
Mitch Phillips <31459023+hctim@users.noreply.github.com> |
[MTE] Add -fsanitize=memtag* and friends.
Currently, enablement of heap MTE on Android is specified by an ELF note, which signals to the linker to enable heap MTE. This change allows -fsanitize=memt
[MTE] Add -fsanitize=memtag* and friends.
Currently, enablement of heap MTE on Android is specified by an ELF note, which signals to the linker to enable heap MTE. This change allows -fsanitize=memtag-heap to synthesize these notes, rather than adding them through the build system. We need to extend this feature to also signal the linker to do special work for MTE globals (in future) and MTE stack (currently implemented in the toolchain, but not implemented in the loader).
Current Android uses a non-backwards-compatible ELF note, called ".note.android.memtag". Stack MTE is an ABI break anyway, so we don't mind that we won't be able to run executables with stack MTE on Android 11/12 devices.
The current expectation is to support the verbiage used by Android, in that "SYNC" means MTE Synchronous mode, and "ASYNC" effectively means "fast", using the Kernel auto-upgrade feature that allows hardware-specific and core-specific configuration as to whether "ASYNC" would end up being Asynchronous, Asymmetric, or Synchronous on that particular core, whichever has a reasonable performance delta. Of course, this is platform and loader-specific.
Differential Revision: https://reviews.llvm.org/D118948
show more ...
|
#
74b56e02 |
| 08-Apr-2022 |
Chuanqi Xu <yedeng.yd@linux.alibaba.com> |
[NFC] Remove unused variable in CodeGenModules
This eliminates an unused-variable warning
|
#
b1ea0191 |
| 07-Mar-2022 |
Kavitha Natarajan <kavitha.natarajan@amd.com> |
[clang][DebugInfo] Support debug info for alias variable
clang to emit DWARF information for global alias variable as DW_TAG_imported_declaration. This change also handles nested (recursive) importe
[clang][DebugInfo] Support debug info for alias variable
clang to emit DWARF information for global alias variable as DW_TAG_imported_declaration. This change also handles nested (recursive) imported declarations.
Reviewed by: dblaikie, aprantl
Differential Revision: https://reviews.llvm.org/D120989
show more ...
|
#
5531abaf |
| 24-Mar-2022 |
Tom Honermann <tom.honermann@intel.com> |
[clang] Corrections for target_clones multiversion functions.
This change merges code for emit of target and target_clones multiversion resolver functions and, in doing so, corrects handling of targ
[clang] Corrections for target_clones multiversion functions.
This change merges code for emit of target and target_clones multiversion resolver functions and, in doing so, corrects handling of target_clones functions that are declared but not defined. Previously, a use of such a target_clones function would result in an attempted emit of an ifunc that referenced an undefined resolver function. Ifunc references to undefined resolver functions are not allowed and, when the LLVM verifier is not disabled (via '-disable-llvm-verifier'), resulted in the verifier issuing a "IFunc resolver must be a definition" error and aborting the compilation. With this change, ifuncs and resolver function definitions are always emitted for used target_clones functions regardless of whether the target_clones function is defined (if the function is defined, then the ifunc and resolver are emitted regardless of whether the function is used).
This change has the side effect of causing target_clones variants and resolver functions to be emitted in a different order than they were previously. This is harmless and is reflected in the updated tests.
Reviewed By: erichkeane
Differential Revision: https://reviews.llvm.org/D122958
show more ...
|
#
40af8df6 |
| 01-Apr-2022 |
Tom Honermann <tom.honermann@intel.com> |
[clang] NFC: Preparation for merging code to emit target and target_clones resolvers.
This change modifies CodeGenModule::emitMultiVersionFunctions() in preparation for a change that will merge supp
[clang] NFC: Preparation for merging code to emit target and target_clones resolvers.
This change modifies CodeGenModule::emitMultiVersionFunctions() in preparation for a change that will merge support for emitting target_clones resolvers into this function. This change mostly serves to isolate indentation changes from later behavior modifying changes.
Reviewed By: erichkeane
Differential Revision: https://reviews.llvm.org/D122957
show more ...
|
#
0ace0100 |
| 01-Apr-2022 |
Tom Honermann <tom.honermann@intel.com> |
[clang] NFC: Simplify the interface to CodeGenModule::GetOrCreateMultiVersionResolver().
Previously, GetOrCreateMultiVersionResolver() required the caller to provide a GlobalDecl along with an llvm:
[clang] NFC: Simplify the interface to CodeGenModule::GetOrCreateMultiVersionResolver().
Previously, GetOrCreateMultiVersionResolver() required the caller to provide a GlobalDecl along with an llvm::type and FunctionDecl. The latter two can be cheaply obtained from the first, and the llvm::type parameter is not always used, so requiring the caller to provide them was unnecessary and created the possibility that callers would pass an inconsistent set. This change simplifies the interface to only require the GlobalDecl value.
Reviewed By: erichkeane
Differential Revision: https://reviews.llvm.org/D122956
show more ...
|
#
bed5ee3f |
| 31-Mar-2022 |
Tom Honermann <tom.honermann@intel.com> |
[clang] NFC: Enhance comments in CodeGen for multiversion function support.
Reviewed By: erichkeane
Differential Revision: https://reviews.llvm.org/D122955
|
#
15a17696 |
| 04-Apr-2022 |
Shangwu Yao <shangwuyao@gmail.com> |
Emit OpenCL metadata when targeting SPIR-V
This is required for converting function calls such as get_global_id() into SPIR-V builtins.
Differential Revision: https://reviews.llvm.org/D123049
|
#
7c53fc4f |
| 25-Mar-2022 |
Tom Honermann <tom.honermann@intel.com> |
[clang] Emit target_clones resolver functions as COMDAT.
Previously, resolver functions synthesized for target_clones multiversion functions were not emitted as COMDAT. Now fixed.
|
#
ff18b158 |
| 05-Apr-2022 |
Nikita Popov <npopov@redhat.com> |
[CodeGen] Avoid unnecessary ConstantExpr cast
With opaque pointers, this is not necessarily a ConstantExpr. And we don't need one here either, just Constant is sufficient.
|
#
9ba8c402 |
| 28-Mar-2022 |
Erich Keane <erich.keane@intel.com> |
Fix behavior of ifuncs with 'used' extern "C" static functions
We expect that `extern "C"` static functions to be usable in things like inline assembly, as well as ifuncs: See the bug report here: h
Fix behavior of ifuncs with 'used' extern "C" static functions
We expect that `extern "C"` static functions to be usable in things like inline assembly, as well as ifuncs: See the bug report here: https://github.com/llvm/llvm-project/issues/54549
However, we were diagnosing this as 'not defined', because the ifunc's attempt to look up its resolver would generate a declared IR function.
Additionally, as background, the way we allow these static extern "C" functions to work in inline assembly is by making an alias with the C mangling in MOST situations to the version we emit with internal-linkage/mangling.
The problem here was multi-fold: First- We generated the alias after the ifunc was checked, so the function by that name didn't exist yet. Second, the ifunc's generation caused a symbol to exist under the name of the alias already (the declared function above), which suppressed the alias generation.
This patch fixes all of this by moving the checking of ifuncs/CFE aliases until AFTER we have generated the extern-C alias. Then, it does a 'fixup' around the GlobalIFunc to make sure we correct the reference.
Differential Revision: https://reviews.llvm.org/D122608
show more ...
|
#
dfde3549 |
| 29-Mar-2022 |
Chris Bieneman <chris.bieneman@me.com> |
NFC. Fixing warnings from adding DXContainer
Adds DXContainer to switch statements in Clang and LLDB to silence warnings.
|
#
d6148749 |
| 28-Mar-2022 |
James Y Knight <jyknight@google.com> |
[Clang] Implement __builtin_source_location.
This builtin returns the address of a global instance of the `std::source_location::__impl` type, which must be defined (with an appropriate shape) befor
[Clang] Implement __builtin_source_location.
This builtin returns the address of a global instance of the `std::source_location::__impl` type, which must be defined (with an appropriate shape) before calling the builtin.
It will be used to implement std::source_location in libc++ in a future change. The builtin is compatible with GCC's implementation, and libstdc++'s usage. An intentional divergence is that GCC declares the builtin's return type to be `const void*` (for ease-of-implementation reasons), while Clang uses the actual type, `const std::source_location::__impl*`.
In order to support this new functionality, I've also added a new 'UnnamedGlobalConstantDecl'. This artificial Decl is modeled after MSGuidDecl, and is used to represent a generic concept of an lvalue constant with global scope, deduplicated by its value. It's possible that MSGuidDecl itself, or some of the other similar sorts of things in Clang might be able to be refactored onto this more-generic concept, but there's enough special-case weirdness in MSGuidDecl that I gave up attempting to share code there, at least for now.
Finally, for compatibility with libstdc++'s <source_location> header, I've added a second exception to the "cannot cast from void* to T* in constant evaluation" rule. This seems a bit distasteful, but feels like the best available option.
Reviewers: aaron.ballman, erichkeane
Differential Revision: https://reviews.llvm.org/D120159
show more ...
|
#
b8f0e128 |
| 22-Mar-2022 |
Nikita Popov <npopov@redhat.com> |
[CodeGen] Remove some uses of deprecated Address constructor
Remove two stray uses in CodeGenModule and CGCUDANV.
|
#
5d2ce766 |
| 18-Mar-2022 |
Benjamin Kramer <benny.kra@googlemail.com> |
Use llvm::append_range instead of push_back loops where applicable. NFCI.
|
#
806bbc49 |
| 21-Feb-2022 |
Joseph Huber <jhuber6@vols.utk.edu> |
[OpenMP] Try to embed offloading objects after codegen
Currently we use the `-fembed-offload-object` option to embed a binary file into the host as a named section. This is currently only used as a
[OpenMP] Try to embed offloading objects after codegen
Currently we use the `-fembed-offload-object` option to embed a binary file into the host as a named section. This is currently only used as a codegen action, meaning we only handle this option correctly when the input is a bitcode file. This patch adds the same handling to embed an offloading object after we complete code generation. This allows us to embed the object correctly if the input file is source or bitcode.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D120270
show more ...
|
#
dc152659 |
| 10-Mar-2022 |
Erich Keane <erich.keane@intel.com> |
Have cpu-specific variants set 'tune-cpu' as an optimization hint
Due to various implementation constraints, despite the programmer choosing a 'processor' cpu_dispatch/cpu_specific needs to use the
Have cpu-specific variants set 'tune-cpu' as an optimization hint
Due to various implementation constraints, despite the programmer choosing a 'processor' cpu_dispatch/cpu_specific needs to use the 'feature' list of a processor to identify it. This results in the identified processor in source-code not being propogated to the optimizer, and thus, not able to be tuned for.
This patch changes to use the actual cpu as written for tune-cpu so that opt can make decisions based on the cpu-as-spelled, which should better match the behavior expected by the programmer.
Note that the 'valid' list of processors for x86 is in llvm/include/llvm/Support/X86TargetParser.def. At the moment, this list contains only Intel processors, but other vendors may wish to add their own entries as 'alias'es (or with different feature lists!).
If this is not done, there is two potential performance issues with the patch, but I believe them to be worth it in light of the improvements to behavior and performance.
1- In the event that the user spelled "ProcessorB", but we only have the features available to test for "ProcessorA" (where A is B minus features), AND there is an optimization opportunity for "B" that negatively affects "A", the optimizer will likely choose to do so.
2- In the event that the user spelled VendorI's processor, and the feature list allows it to run on VendorA's processor of similar features, AND there is an optimization opportunity for VendorIs that negatively affects "A"s, the optimizer will likely choose to do so. This can be fixed by adding an alias to X86TargetParser.def.
Differential Revision: https://reviews.llvm.org/D121410
show more ...
|
#
f3480390 |
| 29-Jan-2022 |
Itay Bookstein <ibookstein@gmail.com> |
[clang][CodeGen] Avoid emitting ifuncs with undefined resolvers
The purpose of this change is to fix the following codegen bug:
``` // main.c __attribute__((cpu_specific(generic))) int *foo(void) {
[clang][CodeGen] Avoid emitting ifuncs with undefined resolvers
The purpose of this change is to fix the following codegen bug:
``` // main.c __attribute__((cpu_specific(generic))) int *foo(void) { static int z; return &z;} int main() { return *foo() = 5; }
// other.c __attribute__((cpu_dispatch(generic))) int *foo(void);
// run: clang main.c other.c -o main; ./main ```
This will segfault prior to the change, and return the correct exit code 5 after the change.
The underlying cause is that when a translation unit contains a cpu_specific function without the corresponding cpu_dispatch the generated code binds the reference to foo() against a GlobalIFunc whose resolver is undefined. This is invalid: the resolver must be defined in the same translation unit as the ifunc, but historically the LLVM bitcode verifier did not check that. The generated code then binds against the resolver rather than the ifunc, so it ends up calling the resolver rather than the resolvee. In the example above it treats its return value as an int *, therefore trying to write to program text.
The root issue at the representation level is that GlobalIFunc, like GlobalAlias, does not support a "declaration" state. The object which provides the correct semantics in these cases is a Function declaration, but unlike Functions, changing a declaration to a definition in the GlobalIFunc case constitutes a change of the object type, as opposed to simply emitting code into a Function.
I think this limitation is unlikely to change, so I implemented the fix by returning a function declaration rather than an ifunc when encountering cpu_specific, and upgrading it to an ifunc when emitting cpu_dispatch. This uses `takeName` + `replaceAllUsesWith` in similar vein to other places where the correct IR object type cannot be known locally/up-front, like in `CodeGenModule::EmitAliasDefinition`.
Previous discussion in: https://reviews.llvm.org/D112349
Signed-off-by: Itay Bookstein <ibookstein@gmail.com>
Reviewed By: erichkeane
Differential Revision: https://reviews.llvm.org/D120266
show more ...
|
#
50650766 |
| 16-Feb-2022 |
Nikita Popov <npopov@redhat.com> |
[CodeGen] Rename deprecated Address constructor
To make uses of the deprecated constructor easier to spot, and to ensure that no new uses are introduced, rename it to Address::deprecated().
While d
[CodeGen] Rename deprecated Address constructor
To make uses of the deprecated constructor easier to spot, and to ensure that no new uses are introduced, rename it to Address::deprecated().
While doing the rename, I've filled in element types in cases where it was relatively obvious, but we're still left with 135 calls to the deprecated constructor.
show more ...
|
#
6398903a |
| 14-Feb-2022 |
Momchil Velikov <momchil.velikov@arm.com> |
Extend the `uwtable` attribute with unwind table kind
We have the `clang -cc1` command-line option `-funwind-tables=1|2` and the codegen option `VALUE_CODEGENOPT(UnwindTables, 2, 0) ///< Unwind tabl
Extend the `uwtable` attribute with unwind table kind
We have the `clang -cc1` command-line option `-funwind-tables=1|2` and the codegen option `VALUE_CODEGENOPT(UnwindTables, 2, 0) ///< Unwind tables (1) or asynchronous unwind tables (2)`. However, this is encoded in LLVM IR by the presence or the absence of the `uwtable` attribute, i.e. we lose the information whether to generate want just some unwind tables or asynchronous unwind tables.
Asynchronous unwind tables take more space in the runtime image, I'd estimate something like 80-90% more, as the difference is adding roughly the same number of CFI directives as for prologues, only a bit simpler (e.g. `.cfi_offset reg, off` vs. `.cfi_restore reg`). Or even more, if you consider tail duplication of epilogue blocks. Asynchronous unwind tables could also restrict code generation to having only a finite number of frame pointer adjustments (an example of *not* having a finite number of `SP` adjustments is on AArch64 when untagging the stack (MTE) in some cases the compiler can modify `SP` in a loop). Having the CFI precise up to an instruction generally also means one cannot bundle together CFI instructions once the prologue is done, they need to be interspersed with ordinary instructions, which means extra `DW_CFA_advance_loc` commands, further increasing the unwind tables size.
That is to say, async unwind tables impose a non-negligible overhead, yet for the most common use cases (like C++ exceptions), they are not even needed.
This patch extends the `uwtable` attribute with an optional value: - `uwtable` (default to `async`) - `uwtable(sync)`, synchronous unwind tables - `uwtable(async)`, asynchronous (instruction precise) unwind tables
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D114543
show more ...
|
#
87dd3d35 |
| 11-Feb-2022 |
Arthur Eubanks <aeubanks@google.com> |
[clang][OpaquePtr] Remove call to getPointerElementType() in CodeGenModule::GetAddrOfGlobalTemporary()
|
#
d8f99bb6 |
| 11-Feb-2022 |
Sameer Sahasrabuddhe <sameer.sahasrabuddhe@amd.com> |
[AMDGPU] replace hostcall module flag with function attribute
The module flag to indicate use of hostcall is insufficient to catch all cases where hostcall might be in use by a kernel. This is now r
[AMDGPU] replace hostcall module flag with function attribute
The module flag to indicate use of hostcall is insufficient to catch all cases where hostcall might be in use by a kernel. This is now replaced by a function attribute that gets propagated to top-level kernel functions via their respective call-graph.
If the attribute "amdgpu-no-hostcall-ptr" is absent on a kernel, the default behaviour is to emit kernel metadata indicating that the kernel uses the hostcall buffer pointer passed as an implicit argument.
The attribute may be placed explicitly by the user, or inferred by the AMDGPU attributor by examining the call-graph. The attribute is inferred only if the function is not being sanitized, and the implictarg_ptr does not result in a load of any byte in the hostcall pointer argument.
Reviewed By: jdoerfert, arsenm, kpyzhov
Differential Revision: https://reviews.llvm.org/D119216
show more ...
|