#
11ef356d |
| 05-Feb-2021 |
Craig Topper <craig.topper@sifive.com> |
[TargetLowering] Use Align in allowsMisalignedMemoryAccesses.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D96097
|
#
d745b82d |
| 25-Jan-2021 |
Fangrui Song <i@maskray.me> |
[XRay] Support DW_TAG_call_site and delete unneeded PATCHABLE_EVENT_CALL/PATCHABLE_TYPED_EVENT_CALL lowering
|
#
551aaa24 |
| 22-Jan-2021 |
Kazu Hirata <kazu@google.com> |
[llvm] Use isDigit (NFC)
|
#
cfc60730 |
| 18-Dec-2020 |
Jessica Paquette <jpaquette@apple.com> |
[GlobalISel] Combine (a[0]) | (a[1] << k1) | ...| (a[m] << kn) into a wide load
This is a restricted version of the combine in `DAGCombiner::MatchLoadCombine`. (See D27861)
This tries to recognize patterns like below (assuming a little-endian target):
```
s8* x = ...
s32 val = a[0] | (a[1] << 8) | (a[2] << 16) | (a[3] << 24)
->
s32 val = *((i32)a)

s8* x = ...
s32 val = a[3] | (a[2] << 8) | (a[1] << 16) | (a[0] << 24)
->
s32 val = BSWAP(*((s32)a))
```
(This patch also handles the big-endian target case, in which the first example above has a BSWAP and the second example does not.)
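The equivalence behind the two patterns above can be checked in plain C++. The following is only a sketch of the arithmetic, not the GlobalISel combine itself; all function names (`orBytesLowFirst`, `orBytesHighFirst`, `byteSwap32`, `wideLoad32`, `hostIsLittleEndian`, `checkLoadCombine`) are hypothetical and chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// First pattern: a[0] supplies the least significant byte.
inline uint32_t orBytesLowFirst(const uint8_t *a) {
  return uint32_t(a[0]) | (uint32_t(a[1]) << 8) | (uint32_t(a[2]) << 16) |
         (uint32_t(a[3]) << 24);
}

// Second pattern: a[3] supplies the least significant byte.
inline uint32_t orBytesHighFirst(const uint8_t *a) {
  return uint32_t(a[3]) | (uint32_t(a[2]) << 8) | (uint32_t(a[1]) << 16) |
         (uint32_t(a[0]) << 24);
}

// Portable byte swap, standing in for BSWAP.
inline uint32_t byteSwap32(uint32_t v) {
  return (v >> 24) | ((v >> 8) & 0x0000FF00u) | ((v << 8) & 0x00FF0000u) |
         (v << 24);
}

// A single wide load of four bytes in host byte order.
inline uint32_t wideLoad32(const uint8_t *a) {
  uint32_t v;
  std::memcpy(&v, a, sizeof v);
  return v;
}

// Detects host endianness by inspecting the first byte of the value 1.
inline bool hostIsLittleEndian() {
  const uint16_t probe = 1;
  uint8_t first;
  std::memcpy(&first, &probe, 1);
  return first == 1;
}

// Asserts the equivalence for either host endianness: one pattern matches
// the plain wide load, the other matches the byte-swapped wide load.
inline void checkLoadCombine(const uint8_t *a) {
  const uint32_t wide = wideLoad32(a);
  if (hostIsLittleEndian()) {
    assert(orBytesLowFirst(a) == wide);
    assert(orBytesHighFirst(a) == byteSwap32(wide));
  } else {
    assert(orBytesLowFirst(a) == byteSwap32(wide));
    assert(orBytesHighFirst(a) == wide);
  }
}
```

Calling `checkLoadCombine` on any four-byte buffer exercises both cases; which of the two patterns needs the byte swap depends only on the host's endianness.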
To recognize the pattern, this searches from the last G_OR in the expression tree.
E.g.
```
    Reg   Reg
     \    /
      OR_1   Reg
       \    /
        OR_2
          \     Reg
           .. /
          Root
```
Each non-OR register in the tree is put in a list. Each register in the list is then checked to see whether it is defined by an appropriate load, optionally followed by a shift.
If every register is a load + potentially a shift, the combine checks if those loads + shifts, when OR'd together, are equivalent to a wide load (possibly with a BSWAP.)
To simplify things, this patch
(1) Only handles G_ZEXTLOADs (which appear to be the common case)
(2) Only works in a single MachineBasicBlock
(3) Only handles G_SHL as the bit twiddling to stick the small load into a specific location
An IR example of this is here: https://godbolt.org/z/4sP9Pj (lifted from test/CodeGen/AArch64/load-combine.ll)
At -Os on AArch64, this is a 0.5% code size improvement for CTMark/sqlite3, and a 0.4% improvement for CTMark/7zip-benchmark.
Also fix a bug in `isPredecessor` which caused it to fail whenever `DefMI` was the first instruction in the block.
Differential Revision: https://reviews.llvm.org/D94350
|
#
8f004471 |
| 02-Jan-2021 |
Brandon Bergren <bdragon@FreeBSD.org> |
[PowerPC] Add the LLVM triple for powerpcle [1/5]
Add a triple for powerpcle-*-*.
This is a little-endian encoding of the 32-bit PowerPC ABI, useful in certain niche situations:
1) A loader such as the FreeBSD loader which will be loading a little endian kernel. This is required for PowerPC64LE to load properly in pseries VMs. Such a loader is implemented as a freestanding ELF32 LSB binary.
2) Userspace emulation of a 32-bit LE architecture such as x86 on 64-bit hosts such as PowerPC64LE with tools like box86 requires having a 32-bit LE toolchain and library set, as they operate by translating only the main binary and switching to native code when making library calls.
3) The Void Linux for PowerPC project is experimenting with running an entire powerpcle userland.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D93918
|
Revision tags: llvmorg-11.0.1, llvmorg-11.0.1-rc2 |
|
#
a89d751f |
| 17-Dec-2020 |
Bjorn Pettersson <bjorn.a.pettersson@ericsson.com> |
Add intrinsics for saturating float to int casts
This patch adds support for the fptoui.sat and fptosi.sat intrinsics, which provide basically the same functionality as the existing fptoui and fptosi instructions, but will saturate (or return 0 for NaN) on values unrepresentable in the target type, instead of returning poison. Related mailing list discussion can be found at: https://groups.google.com/d/msg/llvm-dev/cgDFaBmCnDQ/CZAIMj4IBAAJ
The intrinsics have overloaded source and result type and support vector operands:
    i32 @llvm.fptoui.sat.i32.f32(float %f)
    i100 @llvm.fptoui.sat.i100.f64(double %f)
    <4 x i32> @llvm.fptoui.sat.v4i32.v4f16(half %f)
    // etc
On the SelectionDAG layer two new ISD opcodes are added, FP_TO_UINT_SAT and FP_TO_SINT_SAT. These opcodes have two operands and one result. The second operand is an integer constant specifying the scalar saturation width. The idea here is that initially the second operand and the scalar width of the result type are the same, but they may change during type legalization. For example:
    i19 @llvm.fptosi.sat.i19.f32(float %f)
    // builds
    i19 fp_to_sint_sat f, 19
    // type legalizes (through integer result promotion)
    i32 fp_to_sint_sat f, 19
I went for this approach, because saturated conversion does not compose well. There is no good way of "adjusting" a saturating conversion to i32 into one to i19 short of saturating twice. Specifying the saturation width separately allows directly saturating to the correct width.
There are two baseline expansions for the fp_to_xint_sat opcodes. If the integer bounds can be exactly represented in the float type and fminnum/fmaxnum are legal, we can expand to something like:
    f = fmaxnum f, FP(MIN)
    f = fminnum f, FP(MAX)
    i = fptoxi f
    i = select f uo f, 0, i # unnecessary if unsigned as 0 = MIN
If the bounds cannot be exactly represented, we expand to something like this instead:
    i = fptoxi f
    i = select f ult FP(MIN), MIN, i
    i = select f ogt FP(MAX), MAX, i
    i = select f uo f, 0, i # unnecessary if unsigned as 0 = MIN
It should be noted that this expansion assumes a non-trapping fptoxi.
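The intended semantics, including the separate saturation width, can be sketched in C++ as follows. This is an illustration under stated assumptions, not the SelectionDAG expansion: the name `fpToSIntSat` and its `width` parameter are hypothetical. Because an out-of-range float-to-int conversion is undefined behaviour in C++, the sketch checks the bounds before converting, whereas the expansions above convert first and rely on a non-trapping fptoxi.

```cpp
#include <cassert>
#include <cstdint>
#include <cmath>
#include <limits>

// Saturating signed conversion of a float to a `width`-bit integer,
// returned in an int32_t (mirroring "i32 fp_to_sint_sat f, 19").
// NaN maps to 0; out-of-range values clamp to the width's MIN or MAX.
inline int32_t fpToSIntSat(float f, unsigned width) {
  assert(width >= 1 && width <= 32 && "sketch only covers up to 32 bits");
  if (std::isnan(f))
    return 0;
  const int64_t maxVal = (int64_t(1) << (width - 1)) - 1;
  const int64_t minVal = -maxVal - 1;
  // float -> double is exact, and both bounds are exactly representable
  // as doubles, so these comparisons introduce no rounding.
  const double d = f;
  if (d <= double(minVal))
    return int32_t(minVal);
  if (d >= double(maxVal))
    return int32_t(maxVal);
  // Strictly inside the range: the conversion truncates toward zero.
  return static_cast<int32_t>(d);
}
```

Passing the width separately is what lets the same routine saturate directly to 19 bits while still producing a 32-bit result, rather than saturating to 32 bits and then again to 19.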
Initial tests are for AArch64, x86_64 and ARM. This exercises all of the scalar and vector legalization. ARM is included to test float softening.
Original patch by @nikic and @ebevhan (based on D54696).
Differential Revision: https://reviews.llvm.org/D54749
|
#
08e287aa |
| 14-Dec-2020 |
QingShan Zhang <qshanz@cn.ibm.com> |
[PowerPC][FP128] Fix the incorrect signature for math library call
The runtime library has two families of implementations, one for ppc_fp128 and one for fp128. For IBM long double (ppc_fp128), functions are suffixed with 'l' (e.g. sqrtl). For IEEE long double (fp128), they are suffixed with "ieee128" or "f128". Several libcalls for IEEE long double were not being mapped.
Reviewed By: qiucf
Differential Revision: https://reviews.llvm.org/D91675
|
Revision tags: llvmorg-11.0.1-rc1 |
|
#
c5978f42 |
| 21-Oct-2020 |
Tim Northover <t.p.northover@gmail.com> |
UBSAN: emit distinctive traps
Sometimes people get minimal crash reports after a UBSAN incident. This change tags each trap with an integer representing the kind of failure encountered, which can aid in tracking down the root cause of the problem.
|
#
78a57069 |
| 06-Dec-2020 |
Martin Storsjö <martin@martin.st> |
[CodeGen] Restore accessing __stack_chk_guard via a .refptr stub on mingw after 2518433f861fcb87
Add tests for this particular detail for x86 and arm (similar tests already existed for x86_64 and aarch64).
The libssp implementation may be located in a separate DLL, and in those cases, the references need to be in a .refptr stub, to avoid needing to touch up code in the text section at runtime (which is supported but inefficient for x86, and unsupported for arm).
Differential Revision: https://reviews.llvm.org/D92738
|
#
2518433f |
| 05-Dec-2020 |
Fangrui Song <i@maskray.me> |
Make __stack_chk_guard dso_local if Reloc::Static
This is currently implied by TargetMachine::shouldAssumeDSOLocal but will be changed in the future.
|
Revision tags: llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3 |
|
#
f7bc7c29 |
| 03-Jul-2020 |
Hsiangkai Wang <kai.wang@sifive.com> |
[RISCV] Support Zfh half-precision floating-point extension.
Support "Zfh" extension according to https://github.com/riscv/riscv-isa-manual/blob/zfh/src/zfh.tex
Differential Revision: https://reviews.llvm.org/D90738
|
#
4d7df43f |
| 19-Nov-2020 |
Pavel Iliin <Pavel.Iliin@arm.com> |
[AArch64] Out-of-line atomics (-moutline-atomics) implementation.
This patch implements out-of-line atomics for the LSE deployment mechanism. Details of how it works can be found in llvm/docs/Atomics.rst. The options -moutline-atomics and -mno-outline-atomics, which enable and disable it, were added to the clang driver. This is the clang and llvm part of the out-of-line atomics interface; the library part is already supported by libgcc. Compiler-rt support is provided in a separate patch.
Differential Revision: https://reviews.llvm.org/D91157
|
#
80732011 |
| 18-Nov-2020 |
Adhemerval Zanella <adhemerval.zanella@linaro.org> |
[AArch64] Lower fptrunc/fpext from/to FP128t to/from FP16
The compiler-rt part which adds the emitted symbols is handled in a subsequent patch.
Differential Revision: https://reviews.llvm.org/D91731
|
#
c126eb75 |
| 04-Nov-2020 |
Cameron McInally <mcinally@cray.com> |
[SelectionDAG] Add legalizations for VECREDUCE_SEQ_FMUL
Hook up legalizations for VECREDUCE_SEQ_FMUL. This is following up on the VECREDUCE_SEQ_FADD work from D90247.
Differential Revision: https://reviews.llvm.org/D90644
|
#
dda1e74b |
| 30-Oct-2020 |
Cameron McInally <mcinally@cray.com> |
[Legalize] Add legalizations for VECREDUCE_SEQ_FADD
Add Legalization support for VECREDUCE_SEQ_FADD, so that we don't need to depend on ExpandReductionsPass.
Differential Revision: https://reviews.llvm.org/D90247
|
#
35a531fb |
| 29-Sep-2020 |
David Sherwood <david.sherwood@arm.com> |
[SVE][CodeGen][NFC] Replace TypeSize comparison operators with their scalar equivalents
In certain places in llvm/lib/CodeGen we were relying upon the TypeSize comparison operators when in fact the code was only ever expecting either scalar values or fixed width vectors. I've changed some of these places to use the equivalent scalar operator.
Differential Revision: https://reviews.llvm.org/D88482
|
#
1687a8d8 |
| 13-Oct-2020 |
Craig Topper <craig.topper@intel.com> |
[X86][SelectionDAG] Add SADDO_CARRY and SSUBO_CARRY to support multipart signed add/sub overflow legalization.
This passes the existing X86 tests, but I'm not sure if it handles all the type legalization cases it needs to.
Alternative to D89200
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D89222
|
#
c5ba0d33 |
| 23-Sep-2020 |
David Sherwood <david.sherwood@arm.com> |
[SVE] Make ElementCount and TypeSize use a new PolySize class
I have introduced a new template PolySize class, where the template parameter determines the type of quantity, i.e. for an element count this is just an unsigned value. The ElementCount class is now just a simple derivation of PolySize<unsigned>, whereas TypeSize is more complicated because it still needs to contain the uint64_t cast operator, since there are still many places in the code that rely upon this implicit cast. As such the class also still needs some of its own operators.
I've tried to minimise the amount of code in the base PolySize class, which led to a couple of changes:
1. In some places we were relying on '==' operator comparisons between ElementCounts and the scalar value 1. I didn't put this operator in the new PolySize class, and thought it was actually clearer to use the isScalar() function instead.
2. I removed the isByteSized function and replaced it with calls to isKnownMultipleOf(8).
I've also renamed NextPowerOf2 to be coefficientNextPowerOf2 so that it's more consistent with coefficientDivideBy.
Differential Revision: https://reviews.llvm.org/D88409
|
#
b8ce6a67 |
| 01-Oct-2020 |
David Sherwood <david.sherwood@arm.com> |
[SVE][CodeGen] Add new EVT/MVT getFixedSizeInBits() functions
When we know that a particular type is always going to be fixed width we have so far been writing code like this:
getSizeInBits().getFixedSize()
Since we are doing this in quite a few places now it seems to make sense to add a new helper function that allows us to replace these calls with a single getFixedSizeInBits() call.
Differential Revision: https://reviews.llvm.org/D88649
|
#
bafdd113 |
| 11-Sep-2020 |
David Sherwood <david.sherwood@arm.com> |
[SVE] Replace / operator in TypeSize/ElementCount with divideCoefficientBy
After some recent upstream discussion we decided that it was best to avoid having the / operator for both ElementCount and TypeSize, since this could give the impression that these classes can be used in the same way as basic integer types. However, division for scalable types is a bit odd because we are only dividing the minimum quantity by a value, as opposed to something like:
(MinSize * Vscale) / SomeValue
This is why, when performing division, it's important that the caller first establishes whether the operation makes sense, perhaps by calling isKnownMultipleOf() prior to division. The caller must now explicitly call divideCoefficientBy() on the class to perform the operation.
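The calling convention described above can be sketched with a toy class. `ToySize` and `halve` are hypothetical and are not LLVM's TypeSize or ElementCount; only the member names `getKnownMinValue`, `isScalable`, `isKnownMultipleOf` and `divideCoefficientBy` are borrowed from the commit messages, and their bodies here are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of a possibly scalable size. When Scalable is true the real
// size is MinValue * vscale, with vscale unknown at compile time.
class ToySize {
  uint64_t MinValue;
  bool Scalable;

public:
  constexpr ToySize(uint64_t MinValue, bool Scalable)
      : MinValue(MinValue), Scalable(Scalable) {}

  constexpr uint64_t getKnownMinValue() const { return MinValue; }
  constexpr bool isScalable() const { return Scalable; }

  // Sufficient check on the coefficient: if MinValue is a multiple of RHS,
  // then MinValue * vscale is a multiple of RHS for every vscale.
  constexpr bool isKnownMultipleOf(uint64_t RHS) const {
    return MinValue % RHS == 0;
  }

  // Divides only the coefficient; the scalable flag is carried through.
  constexpr ToySize divideCoefficientBy(uint64_t RHS) const {
    return ToySize(MinValue / RHS, Scalable);
  }
};

// Caller-side pattern: establish that the division is exact, then divide.
inline ToySize halve(ToySize S) {
  assert(S.isKnownMultipleOf(2) && "halving would drop a remainder");
  return S.divideCoefficientBy(2);
}
```

The explicit method name makes it visible at the call site that only the minimum quantity is divided, which an overloaded `/` would hide.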
Differential Revision: https://reviews.llvm.org/D87700
|
#
e077367a |
| 18-Sep-2020 |
David Sherwood <david.sherwood@arm.com> |
[SVE] Make EVT::getScalarSizeInBits and others consistent with Type::getScalarSizeInBits
An existing function Type::getScalarSizeInBits returns a uint64_t instead of a TypeSize class because the caller is requesting a scalar size, which cannot be scalable. This patch makes other similar functions requesting a scalar size consistent with that, thereby eliminating more than 1000 implicit TypeSize -> uint64_t casts.
Differential revision: https://reviews.llvm.org/D87889
|
#
ad3d6f99 |
| 12-Sep-2020 |
Craig Topper <craig.topper@intel.com> |
[SelectionDAG][X86][ARM][AArch64] Add ISD opcode for __builtin_parity. Expand it to shifts and xors.
Clang emits (and (ctpop X), 1) for __builtin_parity. If ctpop isn't natively supported by the target, this leads to poor codegen due to the expansion of ctpop being more complex than what is needed for parity.
This adds a DAG combine to convert the pattern to ISD::PARITY before operation legalization. Type legalization is updated to handle Expanding and Promoting this operation. If after type legalization, CTPOP is supported for this type, LegalizeDAG will turn it back into CTPOP+AND. Otherwise LegalizeDAG will emit a series of shifts and xors followed by an AND with 1.
I've left vectors out of this patch to avoid additional legalization complexity.
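The "shifts and xors followed by an AND with 1" idea can be sketched for a 32-bit value in C++. This illustrates the arithmetic only and is not the code LegalizeDAG emits; the function names `parityShiftXor`, `parityViaPopcount` and `checkParity` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Folds the value onto itself with xors so that bit 0 ends up holding the
// xor of all 32 input bits, then masks with 1.
inline uint32_t parityShiftXor(uint32_t x) {
  x ^= x >> 16;
  x ^= x >> 8;
  x ^= x >> 4;
  x ^= x >> 2;
  x ^= x >> 1;
  return x & 1u;
}

// The (and (ctpop X), 1) form: count the set bits, keep the low bit.
inline uint32_t parityViaPopcount(uint32_t x) {
  uint32_t count = 0;
  for (; x != 0; x &= x - 1) // clears the lowest set bit each iteration
    ++count;
  return count & 1u;
}

// Asserts that both formulations agree for a given input.
inline void checkParity(uint32_t x) {
  assert(parityShiftXor(x) == parityViaPopcount(x));
}
```

The folding version needs five shift/xor pairs for 32 bits regardless of the input, which is why it is cheaper than a full ctpop expansion on targets without a native population count.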
X86 previously had a custom DAG combiner for this. This is now moved to Custom lowering for the new opcode. There is a minor regression in vector-reduce-xor-bool.ll, but a follow up patch can easily fix that.
Fixes PR47433
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D87209
|
#
f4257c58 |
| 14-Aug-2020 |
David Sherwood <david.sherwood@arm.com> |
[SVE] Make ElementCount members private
This patch changes ElementCount so that the Min and Scalable members are now private and can only be accessed via the get functions getKnownMinValue() and isScalable(). In addition I've added some other member functions for more commonly used operations. Hopefully this makes the class more useful and will reduce the need for calling getKnownMinValue().
Differential Revision: https://reviews.llvm.org/D86065
|
#
d870e363 |
| 27-Aug-2020 |
Brad Smith <brad@comstyle.com> |
[SSP] Restore setting the visibility of __guard_local to hidden for better code generation.
Patch by: Philip Guenther
|
#
a407ec9b |
| 19-Aug-2020 |
Mehdi Amini <joker.eph@gmail.com> |
Revert "Revert "[NFC][llvm] Make the contructors of `ElementCount` private.""
Was reverted because MLIR/Flang builds were broken, these APIs have been fixed in the meantime.
|