#
2e4d2762 |
| 09-Feb-2024 |
Pranav Kant <prka@google.com> |
[X86][CodeGen] Emit float128 libcalls for math functions (#79611)
Make LLVM emit libcalls to proper float128 variants for float128 types.
|
Revision tags: llvmorg-18.1.0-rc2 |
|
#
cca49663 |
| 06-Feb-2024 |
Craig Topper <craig.topper@sifive.com> |
[FastISel][X86] Use getTypeForExtReturn in GetReturnInfo. (#80803)
The comment and code here seem to match getTypeForExtReturn. The
history shows that at the time this code was added, similar code existed
in SelectionDAGBuilder. The SelectionDAGBuilder code has since been
refactored into getTypeForExtReturn.
This patch makes FastISel match SelectionDAGBuilder.
The test changes are because X86 has customization of
getTypeForExtReturn. So now we only extend returns to i8.
Stumbled onto this difference by accident.
|
#
274d1b00 |
| 02-Feb-2024 |
Harald van Dijk <harald@gigawatt.nl> |
[NFC] Add useFPRegsForHalfType(). (#74147)
Currently, half operations can be promoted in one of two ways.
* If softPromoteHalfType() returns false, fp16 values are passed around
in fp32 registers, and whole chains of fp16 operations are promoted to
fp32 in one go.
* If softPromoteHalfType() returns true, fp16 values are passed around
in i16 registers, and individual fp16 operations are promoted to fp32
and the result truncated to fp16 right away.
The softPromoteHalfType behavior is necessary for correctness, but
changing this for an existing target breaks the ABI. Therefore, this
commit adds a third option:
* If softPromoteHalfType() returns true and useFPRegsForHalfType()
returns true as well, fp16 values are passed around in fp32 registers,
but individual fp16 operations are promoted to fp32 and the result
truncated to fp16 right away.
This change does not yet update any target to make use of it.
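The three configurations described above can be summarized as a small decision table. The following is only an illustrative sketch: `HalfRegs`, `PromoteScope`, `HalfStrategy` and `halfStrategy` are hypothetical names that do not exist in LLVM, and the two boolean parameters merely stand in for the return values of softPromoteHalfType() and useFPRegsForHalfType(). The commit message does not say what happens when softPromoteHalfType() returns false and useFPRegsForHalfType() returns true, so the sketch assumes the second hook is ignored in that case.

```cpp
#include <cassert>

// Hypothetical model types; these are NOT part of LLVM.
enum class HalfRegs { FP32, I16 };                  // register class carrying fp16 values
enum class PromoteScope { WholeChain, PerOperation }; // granularity of promotion to fp32

struct HalfStrategy {
  HalfRegs Regs;
  PromoteScope Scope;
};

// SoftPromote / UseFPRegs stand in for the results of
// softPromoteHalfType() and useFPRegsForHalfType().
inline HalfStrategy halfStrategy(bool SoftPromote, bool UseFPRegs) {
  if (!SoftPromote) // assumption: UseFPRegs is irrelevant here
    return {HalfRegs::FP32, PromoteScope::WholeChain};
  if (UseFPRegs)    // the third option added by this commit
    return {HalfRegs::FP32, PromoteScope::PerOperation};
  return {HalfRegs::I16, PromoteScope::PerOperation};
}
```

For instance, `halfStrategy(true, true)` yields fp32 registers with per-operation promotion, which is the combination that keeps the existing ABI while getting the rounding behavior of soft promotion.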
|
Revision tags: llvmorg-18.1.0-rc1 |
|
#
184ca395 |
| 25-Jan-2024 |
Nico Weber <thakis@chromium.org> |
[llvm] Move CodeGenTypes library to its own directory (#79444)
Finally addresses https://reviews.llvm.org/D148769#4311232 :)
No behavior change.
|
Revision tags: llvmorg-19-init |
|
#
b58f91a3 |
| 09-Jan-2024 |
James Y Knight <jyknight@google.com> |
Set the default value for MaxAtomicSizeInBitsSupported to 0.
This was planned since its introduction, but wasn't rolled out for a little bit longer than intended (ahem...8 years).
All in-tree targets have now been adjusted to call setMaxAtomicSizeInBitsSupported explicitly where required, so this should be a no-op. The docs in docs/Atomics.rst already claimed the default was 0, so that doesn't need updating.
|
#
ff0c1f20 |
| 04-Jan-2024 |
Jie Fu <jiefu@tencent.com> |
[CodeGen] Remove unused variables in TargetLoweringBase.cpp (NFC)
llvm-project/llvm/lib/CodeGen/TargetLoweringBase.cpp:570:12: error: unused variable 'ModeN' [-Werror,-Wunused-variable]
  570 |   unsigned ModeN, ModelN;
      |            ^~~~~
llvm-project/llvm/lib/CodeGen/TargetLoweringBase.cpp:570:19: error: unused variable 'ModelN' [-Werror,-Wunused-variable]
  570 |   unsigned ModeN, ModelN;
      |                   ^~~~~~
2 errors generated.
|
#
ce61b0e9 |
| 04-Jan-2024 |
Thomas Preud'homme <thomas.preudhomme@arm.com> |
Add out-of-line-atomics support to GlobalISel (#74588)
This patch implements the GlobalISel counterpart to
4d7df43ffdb460dddb2877a886f75f45c3fee188.
|
Revision tags: llvmorg-17.0.6 |
|
#
d8b8aa3a |
| 27-Nov-2023 |
Youngsuk Kim <youngsuk.kim@hpe.com> |
[llvm] Replace calls to Type::getPointerTo (NFC)
Cleanup work towards removing the method Type::getPointerTo.
If a call to Type::getPointerTo is used solely to support an unneeded pointer-cast, remove the call entirely.
|
#
f3138524 |
| 14-Nov-2023 |
Acim-Maravic <119684637+Acim-Maravic@users.noreply.github.com> |
[AMDGPU] Generic lowering for rint and nearbyint (#69596)
There are three different rounding intrinsics that are brought down to
the same instruction.
Co-authored-by: Acim Maravic <acim.maravic@amd.com>
|
Revision tags: llvmorg-17.0.5 |
|
#
7b9d73c2 |
| 07-Nov-2023 |
Paulo Matos <pmatos@igalia.com> |
[NFC] Remove Type::getInt8PtrTy (#71029)
Replace this with PointerType::getUnqual().
Followup to the opaque pointer transition. Fixes an in-code TODO item.
|
#
50f69e5f |
| 31-Oct-2023 |
Fangrui Song <i@maskray.me> |
insertSSPDeclarations: adjust Darwin condition that sets dso_local
This change is for AArch32 and not strictly needed, but it ensures that we follow the model that direct accesses are only emitted for dso_local and we do not need TargetMachine::shouldAssumeDSOLocal to force dso_local for a dso_preemptable variable.
There is no behavior change to the arm/arm64 configurations listed in commit 5888dee7d04748744743a35d3aef030018bdc275.
|
Revision tags: llvmorg-17.0.4 |
|
#
98c90a13 |
| 19-Oct-2023 |
Ramkumar Ramachandra <Ramkumar.Ramachandra@imgtec.com> |
ISel: introduce vector ISD::LRINT, ISD::LLRINT; custom RISCV lowering (#66924)
The issue #55208 noticed that std::rint is vectorized by the
SLPVectorizer, but a very similar function, std::lrint, is not.
std::lrint corresponds to ISD::LRINT in the SelectionDAG, and
std::llrint is a familiar cousin corresponding to ISD::LLRINT. Now,
neither ISD::LRINT nor ISD::LLRINT have a corresponding vector variant,
and the LangRef makes this clear in the documentation of llvm.lrint.*
and llvm.llrint.*.
This patch extends the LangRef to include vector variants of
llvm.lrint.* and llvm.llrint.*, and lays the necessary ground-work of
scalarizing it for all targets. However, this patch would be devoid of
motivation unless we show the utility of these new vector variants.
Hence, the RISCV target has been chosen to implement a custom lowering
to the vfcvt.x.f.v instruction. The patch also includes a CostModel for
RISCV, and a trivial follow-up can potentially enable the SLPVectorizer
to vectorize std::lrint and std::llrint, fixing #55208.
The patch includes tests, obviously for the RISCV target, but also for
the X86, AArch64, and PowerPC targets to justify the addition of the
vector variants to the LangRef.
|
Revision tags: llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3 |
|
#
b14e83d1 |
| 12-Aug-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
IR: Add llvm.exp10 intrinsic
We currently have log, log2, log10, exp and exp2 intrinsics. Add exp10 to fix this asymmetry. AMDGPU already has most of the code for f32 exp10 expansion implemented alongside exp, so the current implementation is duplicating nearly identical effort between the compiler and library which is inconvenient.
https://reviews.llvm.org/D157871
|
#
6862f0fa |
| 24-Aug-2023 |
Serge Pavlov <sepavloff@gmail.com> |
[FPEnv] Intrinsics for access to FP control modes
The change introduces intrinsics 'get_fpmode', 'set_fpmode' and 'reset_fpmode'. They manage all target dynamic floating-point control modes, which include, for instance, rounding direction, precision, treatment of denormals and so on. The intrinsics do the same operations as the C library functions 'fegetmode' and 'fesetmode'. By default they are lowered to calls to these functions.
Two main use cases are supported by this implementation.
1. Local modification of the control modes. In this case the code usually has a pattern (in pseudocode):
    saved_modes = get_fpmode()
    set_fpmode(<new_modes>)
    ...
    <do operations under the new modes>
    ...
    set_fpmode(saved_modes)
In the case when it is known that the current FP environment is default, the code may be shorter:
    set_fpmode(<new_modes>)
    ...
    <do operations under the new modes>
    ...
    reset_fpmode()
Such patterns appear not only in user code but also in implementations of various FP controlling pragmas. In particular, the implementation of `#pragma STDC FENV_ROUND` requires similar code if the target does not support static rounding mode.
2. Portable control of FP modes. Usually FP control modes are set by writing to some control register. Different targets have different layouts of this register, and the way the register is accessed may also differ. Using a set of target-specific definitions for the control register bits together with these intrinsic functions provides a sufficiently portable way to handle control modes across a wide range of hardware.
This change defines only the LLVM intrinsic functions, which implement the access required for the aforementioned use cases.
Differential Revision: https://reviews.llvm.org/D82525
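The "local modification" pattern from use case 1 can be approximated in portable C++ with a scope guard. This is a rough sketch under stated assumptions, not the lowering performed by this change: it uses the standard `std::fegetround`/`std::fesetround` from `<cfenv>` as a stand-in for 'fegetmode'/'fesetmode', so it covers only the rounding direction rather than all control modes. `ScopedRounding` and `roundingInsideGuard` are hypothetical names, and the usage below assumes the platform defines `FE_UPWARD`.

```cpp
#include <cassert>
#include <cfenv>

// Hypothetical RAII guard: saves the current rounding direction, installs a
// new one, and restores the saved one when the scope ends. This mirrors
//   saved = get_fpmode(); set_fpmode(new); ...; set_fpmode(saved)
// but only for the rounding direction.
class ScopedRounding {
  int Saved;

public:
  explicit ScopedRounding(int NewMode) : Saved(std::fegetround()) {
    std::fesetround(NewMode);
  }
  ~ScopedRounding() { std::fesetround(Saved); }
  ScopedRounding(const ScopedRounding &) = delete;
  ScopedRounding &operator=(const ScopedRounding &) = delete;
};

// Returns the rounding direction observed while the guard is active.
inline int roundingInsideGuard(int NewMode) {
  ScopedRounding Guard(NewMode);
  return std::fegetround();
}
```

Calling `roundingInsideGuard(FE_UPWARD)` reports the new mode inside the guarded region, and the previous mode is back in effect once the function returns.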
|
Revision tags: llvmorg-17.0.0-rc2 |
|
#
778fa4ed |
| 31-Jul-2023 |
David Green <david.green@arm.com> |
[AArch64] Add some basic handling for bf16 constants.
This adds some basic handling for bf16 constants, attempting to treat them a lot like fp16 constants where it can. Zero immediates get lowered to FMOVH0, others either get lowered to FMOVWHr(MOVi32imm) or use FMOVHi if they can. Without fp16 they get expanded. This may not always be optimal, but fixes a gap in our lowering. See llvm/test/CodeGen/AArch64/f16-imm.ll for the equivalent fp16 test.
Differential Revision: https://reviews.llvm.org/D156649
|
Revision tags: llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3 |
|
#
003b58f6 |
| 27-Apr-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
IR: Add llvm.frexp intrinsic
Add an intrinsic which returns the two pieces as multiple return values. Alternatively could introduce a pair of intrinsics to separately return the fractional and exponent parts.
AMDGPU has native instructions to return the two halves, but could use some generic legalization and optimization handling. For example, we should be able to handle legalization of f16 on older targets, and for bf16. Additionally antique targets need a hardware workaround which would be better handled in the backend rather than in library code where it is now.
|
#
1ec30106 |
| 23-Jun-2023 |
Amara Emerson <amara@apple.com> |
Darwin: Use the GOT to reference ___stack_chk_guard.
e018cbf7208b changed the default behaviour for Darwin, and this breaks some existing software.
rdar://110350601
|
#
26bfbec5 |
| 09-Jun-2023 |
Anna Thomas <anna@azul.com> |
[Intrinsic] Introduce reduction intrinsics for minimum/maximum
This patch introduces reduction intrinsics for floating-point minimum and maximum, which have the same semantics (for NaN and signed zero) as llvm.minimum and llvm.maximum.
Reviewed-By: nikic
Differential Revision: https://reviews.llvm.org/D152370
|
#
eece6ba2 |
| 27-Apr-2023 |
Matt Arsenault <Matthew.Arsenault@amd.com> |
IR: Add llvm.ldexp and llvm.experimental.constrained.ldexp intrinsics
AMDGPU has native instructions and target intrinsics for this, but these really should be subject to legalization and generic optimizations. This will enable legalization of f16->f32 on targets without f16 support.
Implement a somewhat horrible inline expansion for targets without libcall support. This could be better if we could introduce control flow (GlobalISel version not yet implemented). Support for strictfp legalization is less complete but works for the simple cases.
|
#
eecaeb6f |
| 05-Jun-2023 |
Serge Pavlov <sepavloff@gmail.com> |
[FPEnv] Intrinsics for access to FP environment
The change implements intrinsics 'get_fpenv', 'set_fpenv' and 'reset_fpenv'. They are used to read floating-point environment, set it or reset to some default state. They do the same actions as C library functions 'fegetenv' and 'fesetenv'. By default these intrinsics are lowered to calls to these functions.
The new intrinsics specify the FP environment as a value of integer type, which is convenient for most targets where the FP state is the content of some register. Some targets, however, use long representations. On X86 the size of the FP environment is 256 bits, and even half of this size is not a legal integer type. To facilitate legalization in such cases, two sets of DAG nodes are used. Nodes GET_FPENV and SET_FPENV are used when the FP environment may be represented by a legal integer type. Nodes GET_FPENV_MEM and SET_FPENV_MEM consider the FP environment as a region in memory, much like `fesetenv` and `fegetenv` do. They are used when the target has a long representation for the floating-point state.
Differential Revision: https://reviews.llvm.org/D71742
|
#
e018cbf7 |
| 23-May-2023 |
Fangrui Song <i@maskray.me> |
[IR] Make stack protector symbol dso_local according to -f[no-]direct-access-external-data
There are two motivations.
`-fno-pic -fstack-protector -mstack-protector-guard=global` created `__stack_chk_guard` is referenced directly on all ELF OSes except FreeBSD. This patch allows referencing the symbol indirectly with -fno-direct-access-external-data.
Some Linux kernel folks want `-fno-pic -fstack-protector -mstack-protector-guard-reg=gs -mstack-protector-guard-symbol=__stack_chk_guard` created `__stack_chk_guard` to be referenced directly, avoiding R_X86_64_REX_GOTPCRELX (even if the relocation may be optimized out by the linker). https://github.com/llvm/llvm-project/issues/60116 Why they need this isn't so clear to me.
---
Add module flag "direct-access-external-data" and set the dso_local property of the stack protector symbol. The module flag can benefit other LLVMCodeGen synthesized symbols that are not represented in LLVM IR.
Nowadays, with `-fno-pic` being uncommon, ideally we should set "direct-access-external-data" when it is true. However, doing so would require ~90 clang/test tests to be updated, which is too much.
As a compromise, we set "direct-access-external-data" only when it's different from the implied default value.
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D150841
|
Revision tags: llvmorg-16.0.2 |
|
#
c1221251 |
| 10-Apr-2023 |
NAKAMURA Takumi <geek4civic@gmail.com> |
Restore CodeGen/MachineValueType.h from `Support`
This is a rework of:
- rG13e77db2df94 (r328395; MVT)
Since `LowLevelType.h` has been restored to `CodeGen`, `MachineValueType.h` can be restored as well.
Depends on D148767
Differential Revision: https://reviews.llvm.org/D149024
|
#
e744e51b |
| 29-Apr-2023 |
Sergei Barannikov <barannikov88@gmail.com> |
[SelectionDAG] Rename ADDCARRY/SUBCARRY to UADDO_CARRY/USUBO_CARRY (NFC)
This will make them consistent with other overflow-aware nodes.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D148196
|
#
f1924d96 |
| 06-Apr-2023 |
Craig Topper <craig.topper@sifive.com> |
[SelectionDAG] Expand VP SDNodes by default.
Differential Revision: https://reviews.llvm.org/D147643
|
Revision tags: llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4 |
|
#
ddccc5ba |
| 27-Feb-2023 |
Nikita Popov <npopov@redhat.com> |
[CodeGen] Always expand division larger than i128
Default MaxDivRemBitWidthSupported to 128, so that divisions larger than 128 bits are always expanded, without requiring additional configuration from the target.
Note that this may still emit calls to __udivti3 on 32-bit targets, which likely don't have an implementation of that builtin. However, I believe this is sufficient to fix https://github.com/llvm/llvm-project/issues/60531, because Zig must already be defining those builtins.
Differential Revision: https://reviews.llvm.org/D144871
|