#
37561ba8 |
| 19-May-2021 |
Fangrui Song <i@maskray.me> |
-fno-semantic-interposition: Don't set dso_local on GlobalVariable
`clang -fpic -fno-semantic-interposition` may set dso_local on variables for -fpic.
GCC folks consider there are 'address interposition' and 'semantic interposition', and 'disabling semantic interposition' can optimize function calls but cannot change variable references to use local aliases (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100483).
This patch removes dso_local for variables in `clang -fpic -fno-semantic-interposition` mode so that the built shared objects can work with copy relocations. Building llvm-project itself with -fno-semantic-interposition (D102453) should now be safe with trunk Clang.
Example:
```
// a.c
int var;
int *addr() { return var; }

// old: cannot be interposed
movslq .Lvar$local(%rip), %rax
// new: can be interposed
movq var@GOTPCREL(%rip), %rax
movslq (%rax), %rax
```
The local alias lowering for `GlobalVariable`s is kept in case there is a future option allowing local aliases.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D102583
|
#
797ad701 |
| 18-May-2021 |
Ten Tzen <tentzen@microsoft.com> |
[Windows SEH]: HARDWARE EXCEPTION HANDLING (MSVC -EHa) - Part 1
This patch is the Part-1 (FE Clang) implementation of HW Exception handling.
This new feature adds the support of Hardware Exception for Microsoft Windows SEH (Structured Exception Handling). This is the first step of this project; only X86_64 target is enabled in this patch.
Compiler options: For clang-cl.exe, the option is -EHa, the same as MSVC. For clang.exe, the extra option is -fasync-exceptions, plus -triple x86_64-windows -fexceptions and -fcxx-exceptions as usual.
NOTE: Without -EHa or -fasync-exceptions, this patch is a NO-DIFF change.
The rules for C code: For C code, one way (the MSVC approach) to achieve SEH -EHa semantics is to follow three rules:
* First, no exception can move into or out of a _try region, i.e., no "potential faulty instruction" can be moved across a _try boundary.
* Second, the order of exceptions for instructions 'directly' under a _try must be preserved (this does not apply to those in callees).
* Finally, global states (local/global/heap variables) that can be read outside of the _try region must be updated in memory (not just in a register) before the subsequent exception occurs.
The impact on C++ code: Although SEH is a feature for C code, -EHa does have a profound effect on the C++ side. When a C++ function (in the same compilation unit with option -EHa) is called by an SEH C function, a hardware exception that occurs in the C++ code can also be handled properly by an upstream SEH _try-handler or a C++ catch(...). As such, when that happens in the middle of an object's lifetime, the dtor must be invoked during unwinding in the same way as for a C++ synchronous exception.
Design: A natural way to achieve the rules above in LLVM today is to allow an EH edge to be added on memory/computation instructions (the previous iload/istore idea) so that the exception path is modeled precisely in the flow graph. However, tracking every single memory instruction and potential faulty instruction can create many Invokes, complicate the flow graph, and possibly hurt performance for downstream optimization and code generation. Making all optimizations aware of the new semantics is also a substantial effort.
This design does not intend to model exception path at instruction level. Instead, the proposed design tracks and reports EH state at BLOCK-level to reduce the complexity of flow graph and minimize the performance-impact on CPP code under -EHa option.
One key element of this design is the ability to compute State number at block-level. Our algorithm is based on the following rationales:
* A _try scope is always a SEME (Single Entry Multiple Exits) region, as jumping into a _try is not allowed. The single entry must start with a seh_try_begin() invoke with a correct State number that is the initial state of the SEME.
* Through control flow, the state number is propagated into all blocks.
* Side exits marked by seh_try_end() will unwind to the parent state based on the existing SEHUnwindMap[].
* Note that side exits can ONLY jump into parent scopes (lower state number).
* Thus, when a block inherits different states from its predecessors, the lowest State wins over the others.
* If some exits flow to unreachable, propagation on those paths terminates, not affecting the remaining blocks.
* For CPP code, an object lifetime region is usually a SEME, like an SEH _try. However, there is one rare exception: jumping into a lifetime that has a Dtor but no Ctor is warned about, but allowed:
Warning: jump bypasses variable with a non-trivial destructor
In that case, the region is actually a MEME (Multiple Entry Multiple Exits). Our solution is to inject an eha_scope_begin() invoke in the side entry block to ensure a correct State.
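The "lowest state wins" propagation rule described above can be sketched as a small worklist pass. This is only an illustration under simplifying assumptions, not code from the patch: `Edge`, `Block`, `propagateStates`, `parentOf`, and `kUnset` are hypothetical names, blocks are plain indices, and a boolean on each edge stands in for a side exit marked by seh_try_end().

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical CFG edge: `leavesScope` marks a side exit (seh_try_end()).
struct Edge {
  std::size_t to;
  bool leavesScope;
};

struct Block {
  std::vector<Edge> succs;
};

// Sentinel for blocks that no state has reached (e.g. unreachable blocks).
constexpr int kUnset = 1 << 30;

// parentOf[s] is the enclosing state of s; -1 means "outside every scope".
std::vector<int> propagateStates(const std::vector<Block> &cfg,
                                 const std::vector<int> &parentOf,
                                 std::size_t entry, int entryState) {
  assert(entry < cfg.size() && "entry block must exist");
  struct Item {
    std::size_t block;
    int state;
  };
  std::vector<int> state(cfg.size(), kUnset);
  std::deque<Item> work;
  work.push_back({entry, entryState});
  while (!work.empty()) {
    Item cur = work.front();
    work.pop_front();
    if (cur.state >= state[cur.block])
      continue; // an equal or lower state already reached this block
    state[cur.block] = cur.state; // the lowest state wins
    for (const Edge &e : cfg[cur.block].succs) {
      int next = cur.state;
      if (e.leavesScope && next >= 0) {
        assert(static_cast<std::size_t>(next) < parentOf.size());
        next = parentOf[static_cast<std::size_t>(next)]; // unwind to parent
      }
      work.push_back({e.to, next});
    }
  }
  return state;
}
```

Because a block is only revisited when a strictly lower state arrives, the loop terminates even on cyclic graphs, and blocks that are never reached keep the `kUnset` sentinel.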
Implementation: Part-1: Clang implementation described below.
Two intrinsics are created to track CPP object scopes: eha_scope_begin() and eha_scope_end(). _scope_begin() is added immediately after ctor() is called and EHStack is pushed, so it must be an invoke, not a call. With that, it is also guaranteed that an EH-cleanup-pad is created regardless of whether there is a call in this scope. _scope_end is added before dtor(). These two intrinsics make the computation of Block-State possible in the downstream code gen pass, even in the presence of ctor/dtor inlining.
Two intrinsics, seh_try_begin() and seh_try_end(), are added for C code to mark the _try boundary and to prevent exceptions from being moved across it. All memory instructions inside a _try are considered 'volatile' to ensure the 2nd and 3rd rules for C code above. This is slightly suboptimal, but acceptable, as the amount of code directly under _try is very small.
Part-2 (will be in Part-2 patch): LLVM implementation described below.
For both C++ & C-code, the state of each block is computed at the same place in BE (WinEHPreparing pass) where all other EH tables/maps are calculated. In addition to _scope_begin & _scope_end, the computation of block state also relies on the existing State tracking code (UnwindMap and InvokeStateMap).
For both C++ & C-code, the state of each block with a potential trap instruction is marked and reported in the DAG Instruction Selection pass, the same place where the state for -EHsc (synchronous exceptions) is handled. If the first instruction in a reported block scope can trap, a Nop is injected before this instruction. This nop is needed to accommodate the LLVM Windows EH implementation, in which the address in the IPToState table is offset by +1. (Note that the purpose of that offset is to ensure the return address of a call is in the same scope as the call address.)
The handler for catch(...) for -EHa must handle HW exceptions, so its 'adjective' flag is reset (it cannot be IsStdDotDot (0x40), which only catches C++ exceptions). Push/popTerminate() scopes (from noexcept/nothrow) are suppressed so that HW exceptions can be passed through.
Original llvm-dev [RFC] discussions can be found in these two threads below: https://lists.llvm.org/pipermail/llvm-dev/2020-March/140541.html https://lists.llvm.org/pipermail/llvm-dev/2020-April/141338.html
Differential Revision: https://reviews.llvm.org/D80344/new/
|
#
9f7d552c |
| 17-May-2021 |
Arthur Eubanks <aeubanks@google.com> |
[NFC] Pass GV value type instead of pointer type to GetOrCreateLLVMGlobal
For opaque pointers, to avoid PointerType::getElementType().
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D102638
|
#
a624cec5 |
| 13-May-2021 |
Roman Lebedev <lebedev.ri@gmail.com> |
[Clang][Codegen] Do not annotate thunk's this/return types with align/deref/nonnull attrs
As it was discovered in post-commit feedback for 0aa0458f1429372038ca6a4edc7e94c96cd9a753, we handle thunks incorrectly, and end up annotating their this/return with attributes that are valid for their callees, not for thunks themselves.
While it would be good to fix this properly and keep annotating them on thunks, I tried doing that in https://reviews.llvm.org/D100388 with little success, and the patch has been stuck for a month now.
So for now, as a stopgap measure, do what the subject line says.
|
#
98575708 |
| 11-May-2021 |
Yaxun (Sam) Liu <yaxun.liu@amd.com> |
[CUDA][HIP] Fix device template variables
Currently clang does not emit device template variables instantiated only in host functions; however, nvcc is able to do that:
https://godbolt.org/z/fneEfferY
This patch fixes this issue by refactoring and extending the existing mechanism for emitting static device variables ODR-used by host code only. Basically, clang records device variables ODR-used by host code and forces them to be emitted in device compilation. The existing mechanism makes sure these device variables ODR-used by host code are added to llvm.compiler-used, so they are guaranteed not to be deleted.
It also fixes a bug where a non-ODR-use of a static device variable by host code caused the variable to be emitted and registered, which should not happen.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D102237
|
#
df729e2b |
| 22-Apr-2021 |
Johannes Doerfert <johannes@jdoerfert.de> |
[OpenMP] Overhaul `declare target` handling
This patch fixes various issues with our prior `declare target` handling and extends it to support `omp begin declare target` as well.
This started with PR49649 in mind, trying to provide a way for users to avoid the "ref" global use introduced for globals with internal linkage. From there it went down the rabbit hole, e.g., all variables, even `nohost` ones, were emitted into the device code so it was impossible to determine if "ref" was needed late in the game (based on the name only). To make it really useful, `begin declare target` was needed as it can carry the `device_type`. Not emitting variables eagerly had a ripple effect. Finally, the precedence of the (explicit) declare target list items needed to be taken into account, that meant we cannot just look for any declare target attribute to make a decision. This caused the handling of functions to require fixup as well.
I tried to clean things up while I was at it, e.g., we should not "parse declarations and definitions" as part of OpenMP parsing, as this will always break at some point. Instead, we keep track of what region we are in and act on definitions and declarations; this is what we already do for declare variant and other begin/end directives.
Highlights:
- new diagnostics for restrictions specified in the standard,
- delayed emission of globals not mentioned in an explicit list of a declare target,
- omission of `nohost` globals on the host and `host` globals on the device,
- no explicit parsing of declarations in-between `omp [begin] declare variant` and the corresponding end anymore, regular parsing instead,
- precedence for explicit mentions in `declare target` lists over implicit mentions in the declaration-definition-seq, and
- `omp allocate` declarations will now replace an earlier emitted global, if necessary.
---
Notes:
The patch is larger than I hoped but it turns out that most changes do on their own lead to "inconsistent states", which seem less desirable overall.
After working through this I feel the standard should remove the explicit declare target forms as the delayed emission is horrible. That said, while we delay things anyway, it seems to me we check too often for the current status even though that is often not sufficient to act upon. There seems to be a lot of duplication that can probably be trimmed down. Eagerly emitting some things seems pretty weak as an argument to keep so much logic around.
---
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D101030
|
#
84c47543 |
| 21-Apr-2021 |
Leonard Chan <leonardchan@google.com> |
[clang] Add -fc++-abi= flag for specifying which C++ ABI to use
This implements the flag proposed in RFC http://lists.llvm.org/pipermail/cfe-dev/2020-August/066437.html.
The goal is to add a way to override the default target C++ ABI through a compiler flag. This makes it easier to test and transition between different C++ ABIs through compile flags rather than build flags.
In this patch:
- Store -fc++-abi= in a LangOpt. This isn't stored in a CodeGenOpt because there are instances outside of codegen where Clang needs to know what the ABI is (particularly through ASTContext::createCXXABI), and we should be able to override the target default if the flag is provided at that point.
- Expose the existing ABIs in TargetCXXABI as values that can be passed through this flag.
- Create a .def file for these ABIs to make it easier to check flag values.
- Add an error for diagnosing bad ABI flag values.
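The ".def file" idea in the list above is the X-macro pattern: one list of entries drives both the enum and the check of flag values. The following is a minimal sketch of that pattern, not the patch's code. `SKETCH_CXXABI_LIST`, `CXXABIKind`, and `parseCXXABI` are hypothetical names, the list is inlined as a macro instead of living in a separate .def file, and the three entries are an illustrative subset rather than a copy of the real file.

```cpp
#include <cassert>
#include <cstring>

// Illustrative subset only; a real .def file would be #include'd instead.
#define SKETCH_CXXABI_LIST(X)                                                  \
  X(GenericItanium, "itanium")                                                 \
  X(Fuchsia, "fuchsia")                                                        \
  X(Microsoft, "microsoft")

// First expansion: one enumerator per entry.
enum class CXXABIKind {
#define X(Name, Str) Name,
  SKETCH_CXXABI_LIST(X)
#undef X
};

// Second expansion: one string comparison per entry.
// Returns true and sets `out` if `spelling` names a known ABI.
bool parseCXXABI(const char *spelling, CXXABIKind &out) {
  assert(spelling && "flag value must not be null");
#define X(Name, Str)                                                           \
  if (std::strcmp(spelling, Str) == 0) {                                       \
    out = CXXABIKind::Name;                                                    \
    return true;                                                               \
  }
  SKETCH_CXXABI_LIST(X)
#undef X
  return false; // caller would report the "bad ABI flag value" error here
}
```

Adding an ABI then means adding one line to the list; the enum and the validity check cannot drift apart.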
Differential Revision: https://reviews.llvm.org/D85802
|
#
2669abae |
| 04-May-2021 |
Alex Lorenz <arphaman@gmail.com> |
[clang][CodeGen] Use llvm::stable_sort for multi version resolver options
The use of llvm::sort causes periodic failures on the bot with EXPENSIVE_CHECKS enabled, as the regular sort pre-shuffles the array in the expensive checks mode, leading to a non-deterministic test result which causes the CodeGenCXX/attr-cpuspecific-outoflinedefs.cpp testcase to fail on the bot (http://lab.llvm.org:8080/green/job/clang-stage1-cmake-RA-expensive/).
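To illustrate why a stable sort removes the non-determinism: when the comparator treats some elements as equal, an unstable sort may leave them in any order (and a pre-shuffle makes that visible), while a stable sort keeps their input order. The sketch below uses `std::stable_sort` from the standard library as a stand-in for `llvm::stable_sort`; `ResolverOption` and `orderByPriority` are hypothetical names, not the types used in CodeGenModule.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for a multiversion resolver option.
struct ResolverOption {
  std::string name;
  unsigned priority;
};

// Sort by descending priority. Options with equal priority keep their
// original relative order because the sort is stable.
std::vector<std::string> orderByPriority(std::vector<ResolverOption> opts) {
  std::stable_sort(opts.begin(), opts.end(),
                   [](const ResolverOption &a, const ResolverOption &b) {
                     return a.priority > b.priority;
                   });
  std::vector<std::string> names;
  for (const ResolverOption &o : opts)
    names.push_back(o.name);
  return names;
}
```

With `std::sort` in place of `std::stable_sort`, the relative order of equal-priority options would be unspecified, which is the kind of difference a test that checks emission order can trip over.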
|
#
7818906c |
| 20-Dec-2019 |
Alexey Bader <alexey.bader@intel.com> |
[SYCL] Implement SYCL address space attributes handling
Default address space (applies when no explicit address space was specified) maps to generic (4) address space.
Added SYCL named address spaces `sycl_global`, `sycl_local` and `sycl_private` defined as sub-sets of the default address space.
Static variables without an address space now reside in the global address space when compiling for a SPIR target, unless they have an explicit address space qualifier in source code.
Differential Revision: https://reviews.llvm.org/D89909
|
#
2786e673 |
| 23-Apr-2021 |
Fangrui Song <i@maskray.me> |
[IR][sanitizer] Add module flag "frame-pointer" and set it for cc1 -mframe-pointer={non-leaf,all}
The Linux kernel objtool diagnostic `call without frame pointer save/setup` arises in multiple instrumentation passes (asan/tsan/gcov). With the mechanism introduced in D100251, it's trivial to respect the command line -m[no-]omit-leaf-frame-pointer/-f[no-]omit-frame-pointer, so let's do it.
Fix: https://github.com/ClangBuiltLinux/linux/issues/1236 (tsan) Fix: https://github.com/ClangBuiltLinux/linux/issues/1238 (asan)
Also document the function attribute "frame-pointer" which is long overdue.
Differential Revision: https://reviews.llvm.org/D101016
|
#
775a9483 |
| 21-Apr-2021 |
Fangrui Song <i@maskray.me> |
[IR][sanitizer] Set nounwind on module ctor/dtor, additionally set uwtable if -fasynchronous-unwind-tables
On ELF targets, if a function has uwtable or personality, or does not have nounwind (`needsUnwindTableEntry`), it marks that `.eh_frame` is needed in the module.
Then, a function gets `.eh_frame` if `needsUnwindTableEntry` or `-g[123]` is specified. (i.e. If -g[123], every function gets `.eh_frame`. This behavior is strange but that is the status quo on GCC and Clang.)
Let's take asan as an example. Other sanitizers are similar. `asan.module_[cd]tor` has no attribute. `needsUnwindTableEntry` returns true, so every function gets `.eh_frame` if `-g[123]` is specified. This is the root cause that `-fno-exceptions -fno-asynchronous-unwind-tables -g` produces .debug_frame while `-fno-exceptions -fno-asynchronous-unwind-tables -g -fsanitize=address` produces .eh_frame.
This patch
* sets the nounwind attribute on sanitizer module ctor/dtor.
* lets Clang emit a module flag metadata "uwtable" for -fasynchronous-unwind-tables. If "uwtable" is set, sanitizer module ctor/dtor additionally get the uwtable attribute.
The "uwtable" mechanism is generic: synthesized functions not cloned/specialized from existing ones should consider `Function::createWithDefaultAttr` instead of `Function::create` if they want to get some default attributes which have more of module semantics.
Other candidates: "frame-pointer" (https://github.com/ClangBuiltLinux/linux/issues/955 https://github.com/ClangBuiltLinux/linux/issues/1238), dso_local, etc.
Differential Revision: https://reviews.llvm.org/D100251
|
#
0ed61361 |
| 20-Apr-2021 |
Erich Keane <erich.keane@intel.com> |
Ensure target-multiversioning emits deferred declarations
As reported in PR50025, sometimes we would end up not emitting functions needed by inline multiversioned variants. This is because we typically use the 'deferred decl' mechanism to emit these. However, the variants are emitted after that typically happens. This fixes that by ensuring we re-run deferred decls after this happens. Also, the multiversion emission is done recursively to ensure that MV functions that require other MV functions to be emitted get emitted.
|
#
6a72ed23 |
| 19-Apr-2021 |
Jan Svoboda <jan_svoboda@apple.com> |
[clang] NFC: Fix range-based for loop warnings related to decl lookup
|
#
f6f21dcd |
| 24-Mar-2021 |
Alexey Bataev <a.bataev@outlook.com> |
[OPENMP]Fix PR49636: Assertion `(!Entry.getAddress() || Entry.getAddress() == Addr) && "Resetting with the new address."' failed.
The original issue is caused by the fact that the variable is allocated with incorrect type i1 instead of i8. This causes the bitcasting of the declaration to i8 type and the bitcast expression does not match the original variable. To fix the problem, the UndefValue initializer and the original variable should be emitted with type i8, not i1.
Differential Revision: https://reviews.llvm.org/D99297
|
#
d672d521 |
| 11-Mar-2021 |
Alex Lorenz <arphaman@gmail.com> |
Revert "[CodeGenModule] Set dso_local for Mach-O GlobalValue"
This reverts commit 809a1e0ffd7af40ee27270ff8ba2ffc927330e71.
Mach-O doesn't support dso_local and this change broke XNU because of the use of dso_local.
Differential Revision: https://reviews.llvm.org/D98458
|
#
440f6bdf |
| 16-Mar-2021 |
Luke Drummond <luke.drummond@codeplay.com> |
[OpenCL][NFCI] Prefer CodeGenFunction::EmitRuntimeCall
`CodeGenFunction::EmitRuntimeCall` automatically sets the right calling convention for the callee so we can avoid setting it ourselves.
As requested in https://reviews.llvm.org/D98411
Reviewed by: anastasia
Differential Revision: https://reviews.llvm.org/D98705
|
#
fcfd3fda |
| 10-Mar-2021 |
Luke Drummond <luke.drummond@codeplay.com> |
[OpenCL] Respect calling convention for builtin
`__translate_sampler_initializer` has a calling convention of `spir_func`, but clang generated calls to it using the default CC.
Instruction Combining was lowering these mismatching calling conventions to `store i1* undef`, which was subsequently lowered to a trap instruction by simplifyCFG, resulting in a runtime `SIGILL`.
There are arguably two bugs here, but whether there's any wisdom in converting an obviously invalid call into a runtime crash rather than aborting with a sensible error message will require further discussion. So for now it's enough to set the right calling convention on the runtime helper.
Reviewed By: svenh, bader
Differential Revision: https://reviews.llvm.org/D98411
|
#
cdb42a4c |
| 12-Mar-2021 |
Sriraman Tallam <tmsriram@google.com> |
Disable unique linkage suffixes for global vars until demanglers can be fixed.
D96109 added support for unique internal linkage names for both internal linkage functions and global variables. There was a lot of discussion on how to get the demangling right for functions but I completely missed the point that demanglers do not support suffixes for global vars. For example:
$ c++filt _ZL3foo
foo
$ c++filt _ZL3foo.uniq.123
_ZL3foo.uniq.123
The demangling for functions works as expected.
I am not sure of the impact of this. I don't understand how debuggers and other tools depend on the correctness of global variable demangling so I am pre-emptively disabling it until we can get the demangling support added.
Importantly, uniquifying global variables is not needed right now, as we do not do profile attribution to global vars based on sampling. It was added for completeness, so this feature is not exactly missed.
Differential Revision: https://reviews.llvm.org/D98392
|
#
78d0e918 |
| 05-Mar-2021 |
Sriraman Tallam <tmsriram@google.com> |
Refactor -funique-internal-linkage-names implementation.
The option -funique-internal-linkage-names was added in D73307 and D78243 as an early LLVM pass that inserts a unique suffix into the names of internal linkage functions and vars. The unique suffix was the hash of the module path. However, we found that this can be done more cleanly early in clang, and the fixes that would otherwise be needed later can be avoided completely. Those fixes are, in particular, modifying the DW_AT_linkage_name and finding the right place to insert the pass.
This patch resurrects the original implementation proposed in D73307, which was reviewed and then ditched in favor of the pass-based approach.
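As a rough sketch of the idea of a suffix derived from a hash of the module path, the snippet below appends such a suffix to an internal-linkage name. It is an assumption-laden illustration: `uniqueInternalName` is a hypothetical helper, `std::hash` stands in for whatever hash the compiler actually uses, and the `.uniq.` spelling simply follows the form of suffix seen elsewhere in this log rather than the exact string the option produces.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical helper: the same module path always yields the same suffix,
// so identically named internal symbols from different files get distinct
// names while builds stay reproducible for a given path.
std::string uniqueInternalName(const std::string &mangledName,
                               const std::string &modulePath) {
  // std::hash is a placeholder for the real hash function (an assumption).
  std::size_t h = std::hash<std::string>{}(modulePath);
  return mangledName + ".uniq." + std::to_string(h);
}
```

Doing this while the front end is still assigning names means later consumers, such as debug info, see the final name from the start instead of needing to be patched afterwards.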
Differential Revision: https://reviews.llvm.org/D96109
|
#
19005035 |
| 10-Feb-2021 |
Akira Hatanaka <ahatanaka@apple.com> |
[ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of explicitly emitting retainRV or claimRV calls in the IR
This reapplies ed4718eccb12bd42214ca4fb17d196d49561c0c7, which was reverted because it was causing a miscompile. The bug that was causing the miscompile has been fixed in 75805dce5ff874676f3559c069fcd6737838f5c0.
Original commit message:
Background:
This fixes a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569.
https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
What this patch does to fix the problem:
- The front-end adds operand bundle "clang.arc.attachedcall" to calls, which indicates the call is implicitly followed by a marker instruction and an implicit retainRV/claimRV call that consumes the call result. In addition, it emits a call to @llvm.objc.clang.arc.noop.use, which consumes the call result, to prevent the middle-end passes from changing the return type of the called function. This is currently done only when the target is arm64 and the optimization level is higher than -O0.
- ARC optimizer temporarily emits retainRV/claimRV calls after the calls with the operand bundle in the IR and removes the inserted calls after processing the function.
- ARC contract pass emits retainRV/claimRV calls after the call with the operand bundle. It doesn't remove the operand bundle on the call since the backend needs it to emit the marker instruction. The retainRV and claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925).
- The function inliner removes an autoreleaseRV call in the callee if nothing in the callee prevents it from being paired up with the retainRV/claimRV call in the caller. It then inserts a release call if claimRV is attached to the call since autoreleaseRV+claimRV is equivalent to a release. If it cannot find an autoreleaseRV call, it tries to transfer the operand bundle to a function call in the callee. This is important since the ARC optimizer can remove the autoreleaseRV returning the callee result, which makes it impossible to pair it up with the retainRV/claimRV call in the caller. If that fails, it simply emits a retain call in the IR if retainRV is attached to the call and does nothing if claimRV is attached to it.
- SCCP refrains from replacing the return value of a call with a constant value if the call has the operand bundle. This ensures the call always has at least one user (the call to @llvm.objc.clang.arc.noop.use).
- This patch also fixes a bug in replaceUsesOfNonProtoConstant where multiple operand bundles of the same kind were being added to a call.
Future work:
- Use the operand bundle on x86-64.
- Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls with the operand bundles.
rdar://71443534
Differential Revision: https://reviews.llvm.org/D92808
|
#
e7e67c93 |
| 03-Mar-2021 |
Wang, Pengfei <pengfei.wang@intel.com> |
Add Windows ehcont section support (/guard:ehcont).
Add option /guard:ehcont
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D96709
|
#
0a5dd067 |
| 03-Mar-2021 |
Hans Wennborg <hans@chromium.org> |
Revert "[ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of explicitly emitting retainRV or claimRV calls in the IR"
This caused miscompiles of Chromium tests for iOS due to clobbering of live registers. See discussion on the code review for details.
> Background:
>
> This fixes a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569.
>
> https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
>
> What this patch does to fix the problem:
>
> - The front-end adds operand bundle "clang.arc.attachedcall" to calls, which indicates the call is implicitly followed by a marker instruction and an implicit retainRV/claimRV call that consumes the call result. In addition, it emits a call to @llvm.objc.clang.arc.noop.use, which consumes the call result, to prevent the middle-end passes from changing the return type of the called function. This is currently done only when the target is arm64 and the optimization level is higher than -O0.
>
> - ARC optimizer temporarily emits retainRV/claimRV calls after the calls with the operand bundle in the IR and removes the inserted calls after processing the function.
>
> - ARC contract pass emits retainRV/claimRV calls after the call with the operand bundle. It doesn't remove the operand bundle on the call since the backend needs it to emit the marker instruction. The retainRV and claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925).
>
> - The function inliner removes an autoreleaseRV call in the callee if nothing in the callee prevents it from being paired up with the retainRV/claimRV call in the caller. It then inserts a release call if claimRV is attached to the call since autoreleaseRV+claimRV is equivalent to a release. If it cannot find an autoreleaseRV call, it tries to transfer the operand bundle to a function call in the callee. This is important since the ARC optimizer can remove the autoreleaseRV returning the callee result, which makes it impossible to pair it up with the retainRV/claimRV call in the caller. If that fails, it simply emits a retain call in the IR if retainRV is attached to the call and does nothing if claimRV is attached to it.
>
> - SCCP refrains from replacing the return value of a call with a constant value if the call has the operand bundle. This ensures the call always has at least one user (the call to @llvm.objc.clang.arc.noop.use).
>
> - This patch also fixes a bug in replaceUsesOfNonProtoConstant where multiple operand bundles of the same kind were being added to a call.
>
> Future work:
>
> - Use the operand bundle on x86-64.
>
> - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls with the operand bundles.
>
> rdar://71443534
>
> Differential Revision: https://reviews.llvm.org/D92808
This reverts commit ed4718eccb12bd42214ca4fb17d196d49561c0c7.
|
#
9e2579db |
| 01-Mar-2021 |
Richard Smith <richard@metafoo.co.uk> |
Fix infinite recursion during IR emission if a constant-initialized lifetime-extended temporary object's initializer refers back to the same object.
`GetAddrOfGlobalTemporary` previously tried to emit the initializer of a global temporary before updating the global temporary map. Emitting the initializer could recurse back into `GetAddrOfGlobalTemporary` for the same temporary, resulting in an infinite recursion.
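The shape of the fix, recording the entry in the map before emitting the initializer, can be sketched in isolation. This is a toy model, not Clang's code: `Emitter`, its `emitInitializer` hook, the integer ids, and the `tmp.N` symbol names are all hypothetical, and only the ordering of the two steps mirrors the description above.

```cpp
#include <functional>
#include <map>
#include <string>

// Toy model of an emitter with a cache of already-created global temporaries.
struct Emitter {
  std::map<int, std::string> globalTemporaries; // temporary id -> symbol name
  int initializersEmitted = 0;

  // Hypothetical hook: emitting an initializer may ask for other addresses,
  // including the address of the temporary currently being emitted.
  std::function<void(Emitter &, int)> emitInitializer;

  std::string getAddrOfGlobalTemporary(int id) {
    auto it = globalTemporaries.find(id);
    if (it != globalTemporaries.end())
      return it->second; // also taken when the initializer refers back to `id`

    std::string symbol = "tmp." + std::to_string(id);
    globalTemporaries[id] = symbol; // record first ...
    ++initializersEmitted;
    if (emitInitializer)
      emitInitializer(*this, id); // ... so a self-reference finds the entry
    return symbol;
  }
};
```

If the map update came after the `emitInitializer` call, a self-referential initializer would miss the cache on every nested call and recurse without bound.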
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D97733
|
#
5cf2a37f |
| 08-Feb-2021 |
Yaxun (Sam) Liu <yaxun.liu@amd.com> |
[HIP] Emit kernel symbol
Currently clang uses a stub function to launch a kernel. This is inconvenient for interop with C++ programs, since the stub function has a different name from the kernel, as required by the ROCm debugger.
This patch emits a variable symbol which has the same name as the kernel and uses it to register and launch the kernel. This allows a C++ program to launch a kernel by using the original kernel name.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D86376
|
#
8afdacba |
| 26-Feb-2021 |
Fangrui Song <i@maskray.me> |
Add GNU attribute 'retain'
For ELF targets, GCC 11 will set SHF_GNU_RETAIN on the section of a `__attribute__((retain))` function/variable to prevent linker garbage collection. (See AttrDocs.td for the linker support).
This patch adds `retain` functions/variables to the `llvm.used` list, which has the desired linker GC semantics. Note: `retain` does not imply `used`, so an unused function/variable can be dropped by Sema.
Before 'retain' was introduced, ELF solutions required inline asm or linker tricks, e.g. `asm volatile(".reloc 0, R_X86_64_NONE, target");` (architecture dependent), or defining a non-local symbol in the section and using `ld -u`. There was no elegant source-level solution.
With D97448, `__attribute__((retain))` will set `SHF_GNU_RETAIN` on ELF targets.
Differential Revision: https://reviews.llvm.org/D97447
|