Revision tags: llvmorg-21-init |
|
#
754ed95b |
| 20-Jan-2025 |
yingopq <115543042+yingopq@users.noreply.github.com> |
[Mips] Fix compiler crash when returning fp128 after calling a function returning { i8, i128 } (#117525)
Fixes https://github.com/llvm/llvm-project/issues/96432.
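A minimal LLVM IR sketch of the shape described in the title (a hypothetical reduction, not the reproducer from the issue):
```llvm
; Hypothetical reduction: an fp128 return in a function that also calls a
; function returning { i8, i128 }; compile for a mips64 triple to reach the
; affected calling-convention/legalisation code.
declare { i8, i128 } @returns_pair()

define fp128 @caller(fp128 %x) {
entry:
  %pair = call { i8, i128 } @returns_pair()
  ret fp128 %x
}
```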
|
Revision tags: llvmorg-19.1.7 |
|
#
9ae92d70 |
| 21-Dec-2024 |
Sergei Barannikov <barannikov88@gmail.com> |
[SelectionDAG] Virtualize isTargetStrictFPOpcode / isTargetMemoryOpcode (#119969)
With this change, targets are no longer required to put memory / strict-fp opcodes after special `ISD::FIRST_TARGET_MEMORY_OPCODE`/`ISD::FIRST_TARGET_STRICTFP_OPCODE` markers. This will also allow autogenerating `isTargetMemoryOpcode`/`isTargetStrictFPOpcode` (#119709).
Pull Request: https://github.com/llvm/llvm-project/pull/119969
|
Revision tags: llvmorg-19.1.6 |
|
#
8630a7ba |
| 09-Dec-2024 |
David Sherwood <david.sherwood@arm.com> |
Reapply "[DAGCombiner] Add support for scalarising extracts of a vector setcc (#117566)" (#118823)
[Reverts d57892a2a153ab71a796f07e39d939eae6910c21]
For IR like this:
%icmp = icmp ult <4 x i32> %a, splat (i32 5)
%res = extractelement <4 x i1> %icmp, i32 1
where there is only one use of %icmp we can take a similar approach
to what we already do for binary ops such as add, sub, etc. and convert
this into
%ext = extractelement <4 x i32> %a, i32 1
%res = icmp ult i32 %ext, 5
For AArch64 targets at least the scalar boolean result will almost
certainly need to be in a GPR anyway, since it will probably be
used by branches for control flow. I've tried to reuse existing code
in scalarizeExtractedBinop to also work for setcc.
NOTE: The optimisations don't apply for tests such as
extract_icmp_v4i32_splat_rhs in the file
CodeGen/AArch64/extract-vector-cmp.ll
because scalarizeExtractedBinOp only works if one of the input
operands is a constant.
---------
Co-authored-by: Paul Walker <paul.walker@arm.com>
|
#
d57892a2 |
| 04-Dec-2024 |
Vitaly Buka <vitalybuka@google.com> |
Revert "[DAGCombiner] Add support for scalarising extracts of a vector setcc" (#118693)
Reverts llvm/llvm-project#117566
Breaks libc++ tests with HWASAN
https://lab.llvm.org/buildbot/#/builders/55/builds/3959
|
#
4675db5f |
| 04-Dec-2024 |
David Sherwood <david.sherwood@arm.com> |
[DAGCombiner] Add support for scalarising extracts of a vector setcc (#117566)
For IR like this:
%icmp = icmp ult <4 x i32> %a, splat (i32 5)
%res = extractelement <4 x i1> %icmp, i32 1
where there is only one use of %icmp we can take a similar approach
to what we already do for binary ops such as add, sub, etc. and convert
this into
%ext = extractelement <4 x i32> %a, i32 1
%res = icmp ult i32 %ext, 5
For AArch64 targets at least the scalar boolean result will almost
certainly need to be in a GPR anyway, since it will probably be
used by branches for control flow. I've tried to reuse existing code
in scalarizeExtractedBinop to also work for setcc.
NOTE: The optimisations don't apply for tests such as
extract_icmp_v4i32_splat_rhs in the file
CodeGen/AArch64/extract-vector-cmp.ll
because scalarizeExtractedBinOp only works if one of the input
operands is a constant.
|
Revision tags: llvmorg-19.1.5 |
|
#
c3536b26 |
| 03-Dec-2024 |
Dan Gohman <dev@sunfishcode.online> |
[WebAssembly] Define call-indirect-overlong and bulk-memory-opt features (#117087)
This defines some new target features. These are subsets of existing
features that reflect implementation concerns:
- "call-indirect-overlong" - implied by "reference-types"; just the
overlong encoding for the `call_indirect` immediate, and not the actual
reference types.
- "bulk-memory-opt" - implied by "bulk-memory": just `memory.copy` and
`memory.fill`, and not the other instructions in the bulk-memory
proposal.
This is split out from https://github.com/llvm/llvm-project/pull/112035.
---------
Co-authored-by: Heejin Ahn <aheejin@gmail.com>
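As a rough illustration (not taken from the patch), the new feature names can be enabled per function through the standard "target-features" attribute:
```llvm
; Hypothetical opt-in to the new subset features; the spellings
; "bulk-memory-opt" and "call-indirect-overlong" are the names this commit adds.
define void @uses_subset_features() #0 {
  ret void
}

attributes #0 = { "target-features"="+bulk-memory-opt,+call-indirect-overlong" }
```
They should equally be usable through the usual `-mattr=+…` mechanism.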
|
#
ea58410d |
| 27-Nov-2024 |
Sam Clegg <sbc@chromium.org> |
[WebAssembly] Implement %llvm.thread.pointer intrinsic (#117817)
We can simply use the `__tls_base` global for this which is guaranteed
to be non-zero and unique per thread.
Fixes: #117433
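For reference, a small IR sketch that exercises the intrinsic (signature as documented for the default address space at the time of this commit):
```llvm
declare ptr @llvm.thread.pointer()

; On WebAssembly this is now lowered to a read of the __tls_base global.
define ptr @current_thread_pointer() {
  %tp = call ptr @llvm.thread.pointer()
  ret ptr %tp
}
```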
|
#
9b76e7fc |
| 25-Nov-2024 |
David Sherwood <david.sherwood@arm.com> |
Revert "[DAGCombiner] Add support for scalarising extracts of a vector setcc (#116031)" (#117556)
This reverts commit 22ec44f509ff266b581dbb490d7b040473b7c31a.
|
#
22ec44f5 |
| 25-Nov-2024 |
David Sherwood <david.sherwood@arm.com> |
[DAGCombiner] Add support for scalarising extracts of a vector setcc (#116031)
For IR like this:
%icmp = icmp ult <4 x i32> %a, splat (i32 5)
%res = extractelement <4 x i1> %icmp, i32 1
where there is only one use of %icmp we can take a similar approach
to what we already do for binary ops such as add, sub, etc. and convert
this into
%ext = extractelement <4 x i32> %a, i32 1
%res = icmp ult i32 %ext, 5
For AArch64 targets at least the scalar boolean result will almost
certainly need to be in a GPR anyway, since it will probably be
used by branches for control flow. I've tried to reuse existing code
in scalarizeExtractedBinop to also work for setcc.
NOTE: The optimisations don't apply for tests such as
extract_icmp_v4i32_splat_rhs in the file
CodeGen/AArch64/extract-vector-cmp.ll
because scalarizeExtractedBinOp only works if one of the input
operands is a constant.
|
Revision tags: llvmorg-19.1.4 |
|
#
43570a28 |
| 15-Nov-2024 |
Kazu Hirata <kazu@google.com> |
[WebAssembly] Remove unused includes (NFC) (#116318)
Identified with misc-include-cleaner.
|
Revision tags: llvmorg-19.1.3 |
|
#
11844584 |
| 24-Oct-2024 |
Dan Gohman <dev@sunfishcode.online> |
[WebAssembly] Protect memory.fill and memory.copy from zero-length ranges. (#112617)
WebAssembly's `memory.fill` and `memory.copy` instructions trap if the
pointers are out of bounds, even if the length is zero. This is
different from LLVM, which expects that it can call `memcpy` on
arbitrary invalid pointers if the length is zero. To avoid spurious
traps, branch around `memory.fill` and `memory.copy` when the length is
zero.
---------
Co-authored-by: Heejin Ahn <aheejin@gmail.com>
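A sketch of the kind of call this protects (hypothetical example, not the test from the PR): LLVM allows a zero-length `llvm.memcpy` even when the pointers are not dereferenceable, so the Wasm lowering now branches around `memory.copy` when the dynamic length is zero.
```llvm
declare void @llvm.memcpy.p0.p0.i32(ptr, ptr, i32, i1)

; If %len is 0, %dst and %src may be arbitrary (even invalid) pointers under
; LLVM's semantics, but an unguarded memory.copy would still trap on an
; out-of-bounds address; hence the zero-length branch added by this commit.
define void @copy_bytes(ptr %dst, ptr %src, i32 %len) {
  call void @llvm.memcpy.p0.p0.i32(ptr %dst, ptr %src, i32 %len, i1 false)
  ret void
}
```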
|
#
33363521 |
| 23-Oct-2024 |
Jordan Rupprecht <rupprecht@google.com> |
[NFC][WebAssembly] Inline var only used in assertion (#113507)
|
#
c2293b33 |
| 23-Oct-2024 |
Alex Crichton <alex@alexcrichton.com> |
[WebAssembly] Implement the wide-arithmetic proposal (#111598)
This commit implements the [wide-arithmetic] proposal which has recently
reached phase 2 in the WebAssembly proposals process. The goal here is
to implement support in LLVM for emitting the proposal's four new
instructions, which are gated behind a new `wide-arithmetic` feature flag
and are not emitted by default.
Emission of each instruction is relatively simple given LLVM's
preexisting lowering rules and infrastructure. The main gotcha is that,
because all of these instructions produce multiple results, the
lowerings had to be implemented in C++ rather than in TableGen.
[wide-arithmetic]: https://github.com/WebAssembly/wide-arithmetic
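As a hedged sketch of the kind of IR that can benefit once the feature is enabled (the exact selection is up to the backend):
```llvm
; A 64x64 -> 128-bit widening multiply; with the wide-arithmetic feature
; enabled, the backend may select the proposal's wide-multiply instruction
; for this pattern (an assumption about the lowering, not a guarantee).
define i128 @mul_wide_u(i64 %a, i64 %b) {
  %za = zext i64 %a to i128
  %zb = zext i64 %b to i128
  %prod = mul i128 %za, %zb
  ret i128 %prod
}
```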
|
Revision tags: llvmorg-19.1.2 |
|
#
853c43d0 |
| 09-Oct-2024 |
Jeffrey Byrnes <jeffrey.byrnes@amd.com> |
[TTI] NFC: Port TLI.shouldSinkOperands to TTI (#110564)
Porting to TTI provides direct access to the instruction cost model,
which can enable instruction cost based sinking without introducing code
duplication.
|
Revision tags: llvmorg-19.1.1 |
|
#
f8f0a266 |
| 22-Sep-2024 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[clang][wasm] Replace the target integer sub saturate intrinsics with the equivalent generic `__builtin_elementwise_sub_sat` intrinsics (#109405)
Remove the Intrinsic::wasm_sub_sat_signed/wasm_sub_sat_unsigned entries
and just use sub_sat_s/sub_sat_u directly
|
Revision tags: llvmorg-19.1.0 |
|
#
c076638c |
| 11-Sep-2024 |
Brendan Dahl <brendan.dahl@gmail.com> |
[WebAssembly] Support BUILD_VECTOR with F16x8. (#108117)
Convert BUILD_VECTORS with FP16x8 to I16x8 since there's no FP16 scalar
value to initialize v128.const.
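A small illustrative case (assuming the fp16 and simd128 target features are enabled): a constant `<8 x half>` build_vector, which per this change is materialised through an integer `v128.const`.
```llvm
; 0xH3C00 is 1.0 in IEEE half; there is no FP16 form of v128.const, so the
; constant is emitted as an i16x8 v128.const and reinterpreted.
define <8 x half> @const_ones_f16x8() {
  ret <8 x half> <half 0xH3C00, half 0xH3C00, half 0xH3C00, half 0xH3C00,
                  half 0xH3C00, half 0xH3C00, half 0xH3C00, half 0xH3C00>
}
```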
|
#
415288a2 |
| 11-Sep-2024 |
Brendan Dahl <brendan.dahl@gmail.com> |
[WebAssembly] Add load and store patterns for V8F16. (#108119)
|
Revision tags: llvmorg-19.1.0-rc4 |
|
#
5703d857 |
| 30-Aug-2024 |
Brendan Dahl <brendan.dahl@gmail.com> |
[WebAssembly] Add intrinsics to wasm_simd128.h for all FP16 instructions (#106465)
Getting this to work required a few additional changes:
- Add builtins for any instructions that can't be done with plain C
currently.
- Add support for the saturating version of fp_to_<s,i>_I16x8. Other
vector sizes supported this already.
- Support bitcast of f16x8 to v128. Needed to return a __f16x8 as
v128_t.
|
#
4d7a0aba |
| 27-Aug-2024 |
Sergei Barannikov <barannikov88@gmail.com> |
[DataLayout] Change return type of `getStackAlignment` to `MaybeAlign` (#105478)
Currently, `getStackAlignment` asserts if the stack alignment wasn't
specified. This makes it inconvenient to use and complicates testing.
This change also makes `exceedsNaturalStackAlignment` method redundant.
|
#
7d373cef |
| 22-Aug-2024 |
Brendan Dahl <brendan.dahl@gmail.com> |
[WebAssembly] Change half-precision feature name to fp16. (#105434)
This better aligns with how the feature is being referred to and what
runtimes (V8) are calling it.
|
Revision tags: llvmorg-19.1.0-rc3 |
|
#
76c45295 |
| 05-Aug-2024 |
Sam Parker <sam.parker@arm.com> |
[WebAssembly] Fix assertion in LowerBUILD_VECTOR (#101961)
The assertion was failing in the case where we were trying to lower to
loadxx_zero, but lane zero was undef.
|
Revision tags: llvmorg-19.1.0-rc2 |
|
#
08decd20 |
| 02-Aug-2024 |
Sam Parker <sam.parker@arm.com> |
[WebAssembly] load_zero to initialise build_vector (#100610)
Instead of splatting a single lane to initialise a build_vector, lower
to scalar_to_vector, which can be selected to load_zero.
Also add load_zero and load_lane patterns for f32x4 and f64x2.
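A sketch of the shape this targets (hypothetical example, assuming simd128): a build_vector with only lane zero initialised from a load, which can now go through scalar_to_vector and be selected to a load_zero instruction.
```llvm
; Only lane 0 is defined; the remaining lanes are poison, so the whole
; vector can be produced with a single v128.load32_zero-style load.
define <4 x float> @build_from_loaded_lane0(ptr %p) {
  %s = load float, ptr %p
  %v = insertelement <4 x float> poison, float %s, i32 0
  ret <4 x float> %v
}
```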
|
Revision tags: llvmorg-19.1.0-rc1, llvmorg-20-init |
|
#
f270a4dd |
| 17-Jul-2024 |
Amara Emerson <amara@apple.com> |
[AArch64] Don't tail call memset if it would convert to a bzero. (#98969)
Well, not quite that simple. We can tail-call memset since it returns its
first argument, but bzero does not, and therefore we can end up
miscompiling.
This patch also refactors the logic out of isInTailCallPosition() into the callers.
As a result memcpy and memmove are also modified to do the same thing
for consistency.
rdar://131419786
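An IR sketch of the risky shape (a hypothetical example illustrating the property the message relies on, not necessarily the exact case analysed in the patch): the caller forwards memset's return value, so if the backend turned the call into bzero, which returns nothing, a tail call would lose that value.
```llvm
declare ptr @memset(ptr, i32, i64)

; memset returns its first argument; bzero does not, so converting this
; tail call to bzero would miscompile the `ret ptr %r`. After this patch
; such a conversion is avoided for calls in tail-call position.
define ptr @clear(ptr %p, i64 %n) {
  %r = tail call ptr @memset(ptr %p, i32 0, i64 %n)
  ret ptr %r
}
```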
|
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7 |
|
#
05e6bb40 |
| 30-May-2024 |
Roger Ferrer Ibáñez <rofirrim@gmail.com> |
[SelectionDAG] Add an ISD::CLEAR_CACHE node to lower llvm.clear_cache (#93795)
The current way of lowering `llvm.clear_cache` is a bit unusual. As
suggested by Matt Arsenault we are better off using an ISD node.
This change introduces a new `ISD::CLEAR_CACHE`, registers a new libcall
by default named `__clear_cache` and the default legalisation is a
libcall.
This is preparatory work for a custom lowering of `ISD::CLEAR_CACHE`
needed by RISC-V on some platforms.
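For reference, the IR that triggers this path (intrinsic signature as in the LangRef):
```llvm
declare void @llvm.clear_cache(ptr, ptr)

; Lowered via the new ISD::CLEAR_CACHE node; the default legalisation is a
; call to the __clear_cache libcall.
define void @flush_icache(ptr %begin, ptr %end) {
  call void @llvm.clear_cache(ptr %begin, ptr %end)
  ret void
}
```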
|
#
60bce6ea |
| 28-May-2024 |
Brendan Dahl <brendan.dahl@gmail.com> |
[WebAssembly] Implement all f16x8 binary instructions. (#93360)
This reuses most of the code that was created for f32x4 and f64x2 binary
instructions and tries to follow how they were implemented.
- add/sub/mul/div - use regular LLVM IR instructions
- min/max - use the minimum/maximum intrinsics, and also have builtins
- pmin/pmax - use the wasm.pmax/pmin intrinsics, and also have builtins
Specified at:
https://github.com/WebAssembly/half-precision/blob/29a9b9462c9285d4ccc1a5dc39214ddfd1892658/proposals/half-precision/Overview.md
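A short IR sketch (assuming the half-precision and simd128 features are enabled) of the two kinds of lowering described above: plain IR instructions for add/sub/mul/div and the minimum/maximum intrinsics for min/max.
```llvm
declare <8 x half> @llvm.minimum.v8f16(<8 x half>, <8 x half>)

define <8 x half> @f16x8_sample(<8 x half> %a, <8 x half> %b) {
  ; add/sub/mul/div map to ordinary IR instructions...
  %sum = fadd <8 x half> %a, %b
  ; ...while min/max go through the llvm.minimum/llvm.maximum intrinsics.
  %res = call <8 x half> @llvm.minimum.v8f16(<8 x half> %sum, <8 x half> %b)
  ret <8 x half> %res
}
```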
|