Revision tags: llvmorg-21-init
416f1c46 | 20-Jan-2025 | Mats Jun Larsen <mats@jun.codes>
[IR] Replace PointerType::get(Type) with opaque version (NFC) (#123617)
In accordance with https://github.com/llvm/llvm-project/issues/123569
In order to keep the patch at a reasonable size, this PR only covers the llvm subproject, unittests excluded.
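A minimal sketch of the mechanical rewrite this commit performs, assuming the opaque overload `PointerType::get(LLVMContext &, unsigned)`; the helper function is illustrative, not taken from the patch:
```
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/LLVMContext.h"

using namespace llvm;

// Illustrative helper (hypothetical name): with opaque pointers the pointee
// type no longer matters, only the context and the address space.
PointerType *makeOpaquePtrTy(LLVMContext &Ctx) {
  // Before: PointerType::get(SomePointeeTy, /*AddressSpace=*/0)
  // After:
  return PointerType::get(Ctx, /*AddressSpace=*/0);
}
```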
Revision tags: llvmorg-19.1.7
f8d27047 | 18-Dec-2024 | Vladi Krapp <vladi.krapp@arm.com>
[ARM] Reduce loop unroll when low overhead branching is available (#120065)
For processors with low overhead branching (LOB), runtime unrolling the
innermost loop is often detrimental to performance. In these cases the
loop remainder gets unrolled into a series of compare-and-jump blocks,
which in deeply nested loops get executed multiple times, negating the
benefits of LOB.
This is particularly noticeable when the loop trip count of the innermost
loop varies within the outer loop, such as in the case of triangular
matrix decompositions.
In these cases we prefer not to unroll the innermost loop, with the
intention that it be executed as a low overhead loop.
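A hedged illustration of the shape being described: a triangular loop nest whose inner trip count changes every outer iteration, so an unrolled remainder's compare-and-jump blocks would execute again and again (example code, not from the patch):
```
// Inner trip count varies with i, as in a triangular matrix decomposition;
// keeping the inner loop whole lets it run as a low overhead loop.
void scaleLowerTriangle(float *A, int N) {
  for (int i = 0; i < N; ++i)
    for (int j = 0; j <= i; ++j) // trip count differs per outer iteration
      A[i * N + j] *= 2.0f;
}
```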
Revision tags: llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4
c3260c65 | 29-Oct-2024 | Benjamin Maxwell <benjamin.maxwell@arm.com>
[IR] Add `llvm.sincos` intrinsic (#109825)
This adds the `llvm.sincos` intrinsic, legalization, and lowering.
The `llvm.sincos` intrinsic takes a floating-point value and returns
both the sine and cosine (as a struct).
```
declare { float, float } @llvm.sincos.f32(float %Val)
declare { double, double } @llvm.sincos.f64(double %Val)
declare { x86_fp80, x86_fp80 } @llvm.sincos.f80(x86_fp80 %Val)
declare { fp128, fp128 } @llvm.sincos.f128(fp128 %Val)
declare { ppc_fp128, ppc_fp128 } @llvm.sincos.ppcf128(ppc_fp128 %Val)
declare { <4 x float>, <4 x float> } @llvm.sincos.v4f32(<4 x float> %Val)
```
The lowering is built on top of the existing FSINCOS ISD node, with
additional type legalization to allow for f16, f128, and vector values.
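For illustration, a hedged C++ sketch of emitting a call to the new intrinsic with IRBuilder and unpacking the result struct; it assumes this patch's `Intrinsic::sincos` ID, and the helper name is made up:
```
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Intrinsics.h"
#include <utility>

using namespace llvm;

// Returns {sin(X), cos(X)} as separate SSA values.
std::pair<Value *, Value *> emitSinCos(IRBuilder<> &B, Value *X) {
  CallInst *SC = B.CreateIntrinsic(Intrinsic::sincos, {X->getType()}, {X});
  return {B.CreateExtractValue(SC, 0),  // sine
          B.CreateExtractValue(SC, 1)}; // cosine
}
```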
Revision tags: llvmorg-19.1.3
e37d736d | 24-Oct-2024 | Nashe Mncube <nashe.mncube@arm.com>
Recommit: [llvm][ARM][GlobalOpt] Add widen global arrays pass (#113289)
This is a recommit of #107120. The original PR was approved but failed a
buildbot: the newly added tests should only be run for compilers that
support the ARM target. This has been resolved by adding a config file
for these tests.
- The pass optimizes memcpy calls by padding out destinations and sources
to a full word, so the ARM backend generates full-word loads instead of
single-byte (ldrb) and/or half-word (ldrh) loads. It only pads the
destination when it is a stack-allocated, constant-size array, and the
source when it is a constant string. The heuristic that decides whether
to pad is very basic and could be improved to let more cases be padded
(see the sketch after this list).
- The pass works at the mid-end level.
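A hedged sketch of the source pattern the pass targets (illustrative code, not taken from the pass or its tests):
```
#include <cstring>

void consume(const char *);

void copySmall() {
  char Buf[3];               // stack-allocated, constant-size destination
  std::memcpy(Buf, "ab", 3); // constant-string source, 3-byte copy
  consume(Buf);
}
// On ARM the 3-byte copy becomes ldrb + ldrh; conceptually, padding both
// Buf and the string out to 4 bytes allows a single word load and store.
```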
370fd743 | 17-Oct-2024 | Nashe Mncube <nashe.mncube@arm.com>
Revert "[llvm][ARM]Add widen global arrays pass" (#112701)
Reverts llvm/llvm-project#107120
Unexpected build failures in post-commit pipelines; needs investigation.
ab90d279 | 17-Oct-2024 | Nashe Mncube <nashe.mncube@arm.com>
[llvm][ARM] Add widen global arrays pass (#107120)
- The pass optimizes memcpy calls by padding out destinations and sources
to a full word, so the backend generates full-word loads instead of
single-byte (ldrb) and/or half-word (ldrh) loads. It only pads the
destination when it is a stack-allocated, constant-size array, and the
source when it is a constant array. The heuristic that decides whether
to pad is very basic and could be improved to let more cases be padded.
- The pass works within GlobalOpt but is disabled by default on all
targets except ARM.
Revision tags: llvmorg-19.1.2
853c43d0 | 09-Oct-2024 | Jeffrey Byrnes <jeffrey.byrnes@amd.com>
[TTI] NFC: Port TLI.shouldSinkOperands to TTI (#110564)
Porting to TTI provides direct access to the instruction cost model,
which can enable instruction cost based sinking without introducing code
duplication.
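A hedged sketch of how a transform might query the ported hook; it assumes the TTI side kept the TLI `shouldSinkOperands` shape under the name `isProfitableToSinkOperands`, so treat the name and signature as assumptions:
```
#include "llvm/ADT/SmallVector.h"
#include "llvm/Analysis/TargetTransformInfo.h"

using namespace llvm;

// Ask the target whether sinking I's operands next to I is profitable.
bool wantsOperandSinking(const TargetTransformInfo &TTI, Instruction *I) {
  SmallVector<Use *, 4> OpsToSink;
  // Assumed hook name; the target fills OpsToSink with uses worth sinking.
  return TTI.isProfitableToSinkOperands(I, OpsToSink) && !OpsToSink.empty();
}
```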
Revision tags: llvmorg-19.1.1
d2885743 | 25-Sep-2024 | Philip Reames <preames@rivosinc.com>
[TTI][RISCV] Model cost of loading constants arms of selects and compares (#109824)
This follows in the spirit of 7d82c99403f615f6236334e698720bf979959704,
and extends the costing API for compares and selects to provide
information about the operands passed in an analogous manner. This
allows us to model the cost of materializing the vector constant, as
some select-of-constants are significantly more expensive than others
when you account for the cost of materializing the constants involved.
This is a stepping stone towards fixing
https://github.com/llvm/llvm-project/issues/109466. A separate SLP patch
will be required to utilize the new API.
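A hedged sketch of querying the extended API so both select arms are marked as constants; the `OperandValueInfo` parameters follow the description above, and the exact enum choices here are assumptions:
```
#include "llvm/Analysis/TargetTransformInfo.h"

using namespace llvm;

// Price a vector select whose two arms are materialized constants.
InstructionCost costSelectOfConstants(const TargetTransformInfo &TTI,
                                      Type *VecTy, Type *CondTy) {
  TargetTransformInfo::OperandValueInfo ConstArm = {
      TargetTransformInfo::OK_UniformConstantValue,
      TargetTransformInfo::OP_None};
  return TTI.getCmpSelInstrCost(Instruction::Select, VecTy, CondTy,
                                CmpInst::BAD_ICMP_PREDICATE,
                                TargetTransformInfo::TCK_RecipThroughput,
                                ConstArm, ConstArm);
}
```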
Revision tags: llvmorg-19.1.0
a7697c86 | 05-Sep-2024 | Nikita Popov <npopov@redhat.com>
[ARM] Do not assume alignment in vld1xN and vst1xN intrinsics (#106984)
These intrinsics currently assume natural alignment. Instead, respect
the alignment attribute on the intrinsic. Teach InstCombine to improve
that alignment.
If desired I could also adjust the clang frontend to add alignment
annotations equivalent to the previous behavior, but I don't see any
indication that such an assumption is correct in the ARM intrinsics
docs.
Fixes https://github.com/llvm/llvm-project/issues/59081.
Revision tags: llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3
dcd246cb | 05-Aug-2024 | David Green <david.green@arm.com>
[ARM] Add scalar add_sat costs. (#100988)
These can usually generate:
- qadd / qsub for signed i32 scalars
- uqadd16 / qadd16 / uqsub16 / qsub16 with an extend for signed/unsigned
i8/i16
- an add + cmp + sel expansion otherwise
This can lead to differences in unrolling etc., but should give a better
cost for these instructions.
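For reference, the saturating-add semantics that qadd implements for signed i32, as a scalar C++ model (illustrative only):
```
#include <cstdint>
#include <limits>

// Clamp the wide sum to the i32 range instead of wrapping.
int32_t saturatingAdd(int32_t A, int32_t B) {
  int64_t Wide = int64_t(A) + int64_t(B);
  if (Wide > std::numeric_limits<int32_t>::max())
    return std::numeric_limits<int32_t>::max();
  if (Wide < std::numeric_limits<int32_t>::min())
    return std::numeric_limits<int32_t>::min();
  return int32_t(Wide); // in range: behaves like a plain add
}
```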
Revision tags: llvmorg-19.1.0-rc2
ea7cc12f | 28-Jul-2024 | David Green <david.green@arm.com>
[ARM] Add fallback fptoi_sat costs.
This makes sure that the custom operations get a fallback cost, even if they are not perfect.
Revision tags: llvmorg-19.1.0-rc1, llvmorg-20-init
11484cb8 | 18-Jun-2024 | Nikita Popov <npopov@redhat.com>
[InstCombine] Pass SimplifyQuery to SimplifyDemandedBits()
This will enable calling SimplifyDemandedBits() with a SimplifyQuery that has CondContext set in the future.
Additionally, this marginally strengthens the analysis by retaining the original context instruction for one-use chains.
34a2889e | 19-Jun-2024 | Andreas Jonson <andjo403@hotmail.com>
[InstCombine] Swap out range metadata to range attribute for arm_mve_pred_v2i (#94847)
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6
2e8d8155 | 10-May-2024 | Graham Hunter <graham.hunter@arm.com>
[TTI] Support scalable offsets in getScalingFactorCost (#88113)
Part of the work to support vscale-relative immediates in LSR.
Revision tags: llvmorg-18.1.5, llvmorg-18.1.4
4ac2721e | 09-Apr-2024 | David Green <david.green@arm.com>
[AArch64] Add costs for ST3 and ST4 instructions, modelled as store(shuffle). (#87934)
This tries to add some costs for the shuffle in a ST3/ST4 instruction,
which are represented in LLVM IR as store(interleaving shuffle). In
order to detect the store, it needs to add a CxtI context instruction to
check the users of the shuffle. LD3 and LD4 are added; LD2 should be a
zip1 shuffle, which will be added in another patch.
It should help fix some of the regressions from #87510.
Revision tags: llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3
7bc079c8 | 12-Feb-2024 | Alexey Bataev <5361294+alexey-bataev@users.noreply.github.com>
[TTI] Fall back to SingleSrcPermute shuffle kind if there is no direct
estimation for extract subvector.
Many targets do not have a cost for the extract-subvector shuffle kind,
but do have costs for single-source permutes. If there is no cost
estimation for extract subvector, it is better to switch to single-source
permute to get a better cost estimate.
Reviewers: RKSimon, davemgreen, arsenm
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/79837
Revision tags: llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1
184ca395 | 25-Jan-2024 | Nico Weber <thakis@chromium.org>
[llvm] Move CodeGenTypes library to its own directory (#79444)
Finally addresses https://reviews.llvm.org/D148769#4311232 :)
No behavior change.
Revision tags: llvmorg-19-init
586ecdf2 | 12-Dec-2023 | Kazu Hirata <kazu@google.com>
[llvm] Use StringRef::{starts,ends}_with (NFC) (#74956)
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.
I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
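The change is mechanical; a small illustrative example of the new spelling (the function and the flag it checks are made up):
```
#include "llvm/ADT/StringRef.h"

// Before this patch: Arg.startswith("-march=")
// After: the C++20-style spelling.
bool isMArchOption(llvm::StringRef Arg) {
  return Arg.starts_with("-march=");
}
```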
Revision tags: llvmorg-17.0.6, llvmorg-17.0.5
fee2953f | 02-Nov-2023 | David Green <david.green@arm.com>
[ARM] Fix for undef elements from demanded elements (#70504)
I think this is right: the undef bits should be the undef bits from
the passthrough (operand 0), with the top/bottom lanes cleared, as they
come from the second arg (operand 1). We don't yet attempt to look for
undef elements in the second operand, but this should fix the bug with
all elements being marked as undef and the instruction being optimized
away.
75b3c3d2 | 31-Oct-2023 | David Green <david.green@arm.com>
[ARM] Disable UpperBound loop unrolling for MVE tail predicated loops. (#69709)
For MVE tail predicated loops, better code can be generated by keeping
the loop whole than to unroll to an upper bound, which requires the
expansion of active lane masks that can be difficult to generate good
code for. This patch disables UpperBound unrolling when we find an
active_lane_mask in the loop.
Revision tags: llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4
233fb987 | 04-Sep-2023 | David Green <david.green@arm.com>
[ARM] Improve bitwise reduction costs
This adds some basic and/or/xor reduction costs for NEON/MVE, handling them like other reductions where vector operations are used to reduce to legal sizes, followed by an optional VREV+VAND/VORR/VEOR step and scalarization from there.
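The reduction shape being costed, as a hedged scalar stand-in; on the vector form each halving step corresponds to one full-width VAND/VORR/VEOR (illustrative only):
```
#include <cstddef>
#include <cstdint>

// In-place AND-reduction by repeated halving; N must be a power of two.
uint32_t reduceAnd(uint32_t *V, size_t N) {
  for (size_t Half = N / 2; Half > 0; Half /= 2)
    for (size_t I = 0; I < Half; ++I)
      V[I] &= V[I + Half]; // one combining op per lane pair
  return V[0];
}
```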
4cef24a8 | 04-Sep-2023 | David Green <david.green@arm.com>
[ARM] Improve reduction integer min/max costs
This adds some basic smin/smax/umin/umax reduction costs for MVE/NEON, similar to the existing Add reduction costs. They follow the same style as Add reductions, but include a higher cost, as the costs tend to be dependent on the element size for vminv/vmaxv. These costs may not be precise, but will be more in line with reality than the default, which extracts each element.
2955cc15 | 04-Sep-2023 | David Green <david.green@arm.com>
[ARM] Improve costs for FMin/Max reductions
Similar to the other reductions, this changes the cost of fmin/fmax reductions under MVE/NEON to perform vector operations until the types need to be scalarized. The fp16 vectors can perform a VREV+FMIN/FMAX to skip a step of the reduction, and otherwise need a lanewise extract from the top lanes.
4530f029 | 04-Sep-2023 | David Green <david.green@arm.com>
[ARM] Improve reduction fadd/fmul costs
This adds some basic fadd/fmul reduction costs for MVE/NEON. It reduces by halving the vector size until it gets scalarized, with some additional costs for fp16, which may require extracting the top lanes.
Differential Revision: https://reviews.llvm.org/D159367
Revision tags: llvmorg-17.0.0-rc3
9a207578 | 08-Aug-2023 | Alexey Bataev <a.bataev@outlook.com>
[TTI]Add InsertSubvector pattern in improveShuffleKindFromMask().
It improves shuffle instruction cost estimation and the vectorization outcome.
Differential Revision: https://reviews.llvm.org/D157425