History log of /llvm-project/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp (Results 1 – 25 of 352)
Revision tags: llvmorg-21-init
# 416f1c46 20-Jan-2025 Mats Jun Larsen <mats@jun.codes>

[IR] Replace of PointerType::get(Type) with opaque version (NFC) (#123617)

In accordance with https://github.com/llvm/llvm-project/issues/123569

In order to keep the patch at a reasonable size, this PR only covers
the llvm subproject, unittests excluded.


Revision tags: llvmorg-19.1.7
# f8d27047 18-Dec-2024 Vladi Krapp <vladi.krapp@arm.com>

[ARM] Reduce loop unroll when low overhead branching is available (#120065)

For processors with low overhead branching (LOB), runtime unrolling the
innermost loop is often detrimental to performance. In these cases the
loop remainder gets unrolled into a series of compare-and-jump blocks,
which in deeply nested loops get executed multiple times, negating the
benefits of LOB.

This is particularly noticeable when the loop trip count of the innermost
loop varies within the outer loop, such as in the case of triangular
matrix decompositions.

In these cases we will prefer to not unroll the innermost loop, with the
intention for it to be executed as a low overhead loop.
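
As an illustration of the loop shape described above (not code from the patch), the following C++ sketch shows a triangular nest whose innermost trip count changes on every outer iteration. The function name `sumLowerTriangle` and the row-major layout are assumptions made for this example only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example: sums the lower triangle (diagonal included) of an
// n x n matrix stored in row-major order.
double sumLowerTriangle(const std::vector<double> &m, std::size_t n) {
  assert(m.size() == n * n && "matrix must hold n * n elements");
  double sum = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    // The inner trip count is i + 1, so it differs on every outer
    // iteration. A runtime-unrolled inner loop would re-run its remainder
    // handling each time the outer loop advances.
    for (std::size_t j = 0; j <= i; ++j)
      sum += m[i * n + j];
  }
  return sum;
}
```

For a 3 x 3 matrix the inner loop runs 1, 2, then 3 times, which is the varying-trip-count pattern the commit message refers to.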


Revision tags: llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4
# c3260c65 29-Oct-2024 Benjamin Maxwell <benjamin.maxwell@arm.com>

[IR] Add `llvm.sincos` intrinsic (#109825)

This adds the `llvm.sincos` intrinsic, legalization, and lowering.

The `llvm.sincos` intrinsic takes a floating-point value and returns
both the sine and cosine (as a struct).

```
declare { float, float } @llvm.sincos.f32(float %Val)
declare { double, double } @llvm.sincos.f64(double %Val)
declare { x86_fp80, x86_fp80 } @llvm.sincos.f80(x86_fp80 %Val)
declare { fp128, fp128 } @llvm.sincos.f128(fp128 %Val)
declare { ppc_fp128, ppc_fp128 } @llvm.sincos.ppcf128(ppc_fp128 %Val)
declare { <4 x float>, <4 x float> } @llvm.sincos.v4f32(<4 x float> %Val)
```

The lowering is built on top of the existing FSINCOS ISD node, with
additional type legalization to allow for f16, f128, and vector values.
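
To make the "both results as a struct" shape concrete, here is a minimal C++ model of the semantics for a scalar double. It is only a sketch: `SinCosResult`, `sincosModel`, and `checkPythagoreanIdentity` are hypothetical names, and the real lowering goes through the FSINCOS node rather than two library calls.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the { double, double } struct the intrinsic
// returns: sine first, cosine second.
struct SinCosResult {
  double sine;
  double cosine;
};

// Models the observable result only; it says nothing about how a target
// actually computes the pair.
SinCosResult sincosModel(double val) {
  return {std::sin(val), std::cos(val)};
}

// Usage example: both fields come from the same input, so they satisfy
// sin^2 + cos^2 = 1 up to rounding.
void checkPythagoreanIdentity(double val) {
  SinCosResult r = sincosModel(val);
  assert(std::fabs(r.sine * r.sine + r.cosine * r.cosine - 1.0) < 1e-12);
}
```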


Revision tags: llvmorg-19.1.3
# e37d736d 24-Oct-2024 Nashe Mncube <nashe.mncube@arm.com>

Recommit: [llvm][ARM][GlobalOpt]Add widen global arrays pass (#113289)

This is a recommit of #107120. The original PR was approved but failed
buildbot. The newly added tests should only be run for compilers that
support the ARM target. This has been resolved by adding a config file
for these tests.

- Pass optimizes memcpys by padding out destinations and sources to a
full word so that the ARM backend generates full word loads instead of
loading a single byte (ldrb) and/or half word (ldrh). Only pads the
destination when it's a stack-allocated constant-size array and the
source when it's a constant string. The heuristic that decides whether
to pad is very basic and could be improved to allow more cases to be
padded.
- Pass works at the midend level
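
The padding arithmetic can be sketched as follows. This is not the pass's code: `kWordBytes`, `paddedSize`, and `paddingBytes` are hypothetical names, and the 4-byte word is an assumption based on 32-bit ARM.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: a "full word" is 4 bytes, as on 32-bit ARM.
constexpr std::size_t kWordBytes = 4;

// Rounds a byte count up to the next multiple of the word size.
constexpr std::size_t paddedSize(std::size_t bytes) {
  return (bytes + kWordBytes - 1) / kWordBytes * kWordBytes;
}

// Number of extra bytes that widening would append to an array of the
// given size.
std::size_t paddingBytes(std::size_t bytes) {
  std::size_t padded = paddedSize(bytes);
  assert(padded >= bytes && padded - bytes < kWordBytes);
  return padded - bytes;
}
```

Under these assumptions a 6-byte constant string would be widened to 8 bytes, so a copy of it can be done with two word loads rather than a mix of word, half-word, and byte loads.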


# 370fd743 17-Oct-2024 Nashe Mncube <nashe.mncube@arm.com>

Revert "[llvm][ARM]Add widen global arrays pass" (#112701)

Reverts llvm/llvm-project#107120

Unexpected build failures in post-commit pipelines. Needs investigation


# ab90d279 17-Oct-2024 Nashe Mncube <nashe.mncube@arm.com>

[llvm][ARM]Add widen global arrays pass (#107120)

- Pass optimizes memcpys by padding out destinations and sources to a
full word so that the backend generates full word loads instead of
loading a single byte (ldrb) and/or half word (ldrh). Only pads the
destination when it's a stack-allocated constant-size array and the
source when it's a constant array. The heuristic that decides whether to
pad is very basic and could be improved to allow more cases to be padded.
- Pass works within GlobalOpt but is disabled by default on all targets
except ARM.


Revision tags: llvmorg-19.1.2
# 853c43d0 09-Oct-2024 Jeffrey Byrnes <jeffrey.byrnes@amd.com>

[TTI] NFC: Port TLI.shouldSinkOperands to TTI (#110564)

Porting to TTI provides direct access to the instruction cost model,
which can enable instruction cost based sinking without introducing code
duplication.


Revision tags: llvmorg-19.1.1
# d2885743 25-Sep-2024 Philip Reames <preames@rivosinc.com>

[TTI][RISCV] Model cost of loading constants arms of selects and compares (#109824)

This follows in the spirit of 7d82c99403f615f6236334e698720bf979959704,
and extends the costing API for compares and selects to provide
information about the operands passed in an analogous manner. This
allows us to model the cost of materializing the vector constant, as
some select-of-constants are significantly more expensive than others
when you account for the cost of materializing the constants involved.

This is a stepping stone towards fixing
https://github.com/llvm/llvm-project/issues/109466. A separate SLP patch
will be required to utilize the new API.


Revision tags: llvmorg-19.1.0
# a7697c86 05-Sep-2024 Nikita Popov <npopov@redhat.com>

[ARM] Do not assume alignment in vld1xN and vst1xN intrinsics (#106984)

These intrinsics currently assume natural alignment. Instead, respect
the alignment attribute on the intrinsic. Teach InstCombine to improve
that alignment.

If desired I could also adjust the clang frontend to add alignment
annotations equivalent to the previous behavior, but I don't see any
indication that such an assumption is correct in the ARM intrinsics
docs.

Fixes https://github.com/llvm/llvm-project/issues/59081.


Revision tags: llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3
# dcd246cb 05-Aug-2024 David Green <david.green@arm.com>

[ARM] Add scalar add_sat costs. (#100988)

These can usually generate:
- qadd / qsub for signed i32 scalars
- uqadd16 / qadd16 / uqsub16 / qsub16 with an extend for signed/unsigned
i8/i16
- Are expanded to an add + cmp + sel otherwise

This can lead to differences in unrolling etc, but should be a better
cost for the instructions.
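
For reference, the operation being costed can be modelled in C++ as below. This is a semantic sketch of a signed 32-bit saturating add written in an add, compare, select style; `saturatingAddI32` is a hypothetical name and the code is not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical model of a signed i32 saturating add: the result is clamped
// to the int32 range instead of wrapping.
std::int32_t saturatingAddI32(std::int32_t a, std::int32_t b) {
  // Add in a wider type so the intermediate sum cannot overflow.
  std::int64_t wide = static_cast<std::int64_t>(a) + static_cast<std::int64_t>(b);
  const std::int64_t lo = std::numeric_limits<std::int32_t>::min();
  const std::int64_t hi = std::numeric_limits<std::int32_t>::max();
  // Compare and select: clamp to the representable range.
  if (wide > hi)
    wide = hi;
  if (wide < lo)
    wide = lo;
  assert(wide >= lo && wide <= hi);
  return static_cast<std::int32_t>(wide);
}
```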


Revision tags: llvmorg-19.1.0-rc2
# ea7cc12f 28-Jul-2024 David Green <david.green@arm.com>

[ARM] Add fallback fptoi_sat costs.

This makes sure that the custom operations get a fallback cost, even if they
are not perfect.


Revision tags: llvmorg-19.1.0-rc1, llvmorg-20-init
# 11484cb8 18-Jun-2024 Nikita Popov <npopov@redhat.com>

[InstCombine] Pass SimplifyQuery to SimplifyDemandedBits()

This will enable calling SimplifyDemandedBits() with a SimplifyQuery
that has CondContext set in the future.

Additionally this also marginally strengthens the analysis by
retaining the original context instruction for one-use chains.


# 34a2889e 19-Jun-2024 Andreas Jonson <andjo403@hotmail.com>

[InstCombine] Swap out range metadata to range attribute for arm_mve_pred_v2i (#94847)


Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6
# 2e8d8155 10-May-2024 Graham Hunter <graham.hunter@arm.com>

[TTI] Support scalable offsets in getScalingFactorCost (#88113)

Part of the work to support vscale-relative immediates in LSR.


Revision tags: llvmorg-18.1.5, llvmorg-18.1.4
# 4ac2721e 09-Apr-2024 David Green <david.green@arm.com>

[AArch64] Add costs for ST3 and ST4 instructions, modelled as store(shuffle). (#87934)

This tries to add some costs for the shuffle in a ST3/ST4 instruction,
which are represented in LLVM IR as store(interleaving shuffle). In
order to detect the store, it needs to add a CxtI context instruction to
check the users of the shuffle. LD3 and LD4 are added, LD2 should be a
zip1 shuffle, which will be added in another patch.

It should help fix some of the regressions from #87510.


Revision tags: llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3
# 7bc079c8 12-Feb-2024 Alexey Bataev <5361294+alexey-bataev@users.noreply.github.com>

[TTI]Fallback to SingleSrcPermute shuffle kind, if no direct estimation for extract subvector.

Many targets have no cost for the extractsubvector shuffle kind but do
have costs for single source permute. If there is no cost estimate for
extractsubvector, it is better to switch to single source permute for a
better cost estimate.

Reviewers: RKSimon, davemgreen, arsenm

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/79837
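
The fallback idea in this commit can be sketched with a toy cost table. Everything here is hypothetical (the `ShuffleKind` enum, `CostTable`, `shuffleCost`, and the use of -1 for "no estimate"); it is not the TTI interface, only an illustration of looking up one kind and falling back to another.

```cpp
#include <cassert>
#include <map>

// Hypothetical shuffle kinds for illustration only.
enum class ShuffleKind { ExtractSubvector, SingleSrcPermute };

// Hypothetical per-target table of known shuffle costs.
using CostTable = std::map<ShuffleKind, int>;

// Returns the cost for `kind`. If the table has no entry for an
// extract-subvector shuffle, the single-source-permute cost is used
// instead. Returns -1 when no estimate exists at all.
int shuffleCost(const CostTable &table, ShuffleKind kind) {
  auto it = table.find(kind);
  if (it != table.end()) {
    assert(it->second >= 0 && "costs are expected to be non-negative");
    return it->second;
  }
  if (kind == ShuffleKind::ExtractSubvector) {
    auto fallback = table.find(ShuffleKind::SingleSrcPermute);
    if (fallback != table.end())
      return fallback->second;
  }
  return -1;
}
```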


Revision tags: llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1
# 184ca395 25-Jan-2024 Nico Weber <thakis@chromium.org>

[llvm] Move CodeGenTypes library to its own directory (#79444)

Finally addresses https://reviews.llvm.org/D148769#4311232 :)

No behavior change.


Revision tags: llvmorg-19-init
# 586ecdf2 12-Dec-2023 Kazu Hirata <kazu@google.com>

[llvm] Use StringRef::{starts,ends}_with (NFC) (#74956)

This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.

I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.


Revision tags: llvmorg-17.0.6, llvmorg-17.0.5
# fee2953f 02-Nov-2023 David Green <david.green@arm.com>

[ARM] Fix for undef elements from demanded elements (#70504)

I think this is right, that the undef bits should be the undef bits from
the passthrough (operand 0), with the top/bottom lanes cleared, as they
come from the second arg (operand 1). We don't yet attempt to look for
undef elements in the second operand, but this should fix the bug with
all elements being marked as undef and the instruction being optimized
away.


# 75b3c3d2 31-Oct-2023 David Green <david.green@arm.com>

[ARM] Disable UpperBound loop unrolling for MVE tail predicated loops. (#69709)

For MVE tail predicated loops, better code can be generated by keeping
the loop whole than to unroll to an upper bound, which requires the
expansion of active lane masks that can be difficult to generate good
code for. This patch disables UpperBound unrolling when we find an
active_lane_mask in the loop.


Revision tags: llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4
# 233fb987 04-Sep-2023 David Green <david.green@arm.com>

[ARM] Improve bitwise reduction costs

This adds some basic and/or/xor reduction costs for NEON/MVE, handling them
like other reductions where vector operations are used to reduce to legal
sizes, followed by an optional VREV+VAND/VORR/VEOR step and scalarization from
there.


# 4cef24a8 04-Sep-2023 David Green <david.green@arm.com>

[ARM] Improve reduction integer min/max costs

This adds some basic smin/smax/umin/umax reduction costs for MVE/NEON, similar
to the existing Add reduction costs. They follow the same style as Add
reductions, but include a higher cost as the costs tend to be dependent on the
element size for vminv/vmaxv. These costs may not be precise, but will be more
in line than the default that extracts each element.


# 2955cc15 04-Sep-2023 David Green <david.green@arm.com>

[ARM] Improve costs for FMin/Max reductions

Similar to the other reductions, this changes the cost of fmin/fmax reductions
under MVE/NEON to perform vector operations until the types need to be
scalarized. The fp16 vectors can perform a VREV+FMIN/FMAX to skip a step of the
reduction, and otherwise need a lanewise extract from the top lanes.


# 4530f029 04-Sep-2023 David Green <david.green@arm.com>

[ARM] Improve reduction fadd/fmul costs

This adds some basic fadd/fmul reduction costs for MVE/NEON. It reduces by
halving the vector size until it gets scalarized, with some additional costs
for fp16 which may require extracting the top lanes.

Differential Revision: https://reviews.llvm.org/D159367
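
The halving scheme that the cost model assumes can be illustrated with a small C++ sketch. This models the reduction order, not the cost calculation itself; `reduceAddByHalving` and its `scalarizeAt` parameter are hypothetical, and the fp16 top-lane extraction mentioned above is not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of an fadd reduction: add the upper half of the
// vector onto the lower half until the vector is no wider than
// `scalarizeAt` lanes, then finish with scalar adds.
float reduceAddByHalving(std::vector<float> v, std::size_t scalarizeAt = 1) {
  assert(!v.empty() && (v.size() & (v.size() - 1)) == 0 &&
         "lane count must be a power of two");
  while (v.size() > scalarizeAt && v.size() > 1) {
    std::size_t half = v.size() / 2;
    for (std::size_t i = 0; i < half; ++i)
      v[i] += v[i + half]; // one vector add at half the previous width
    v.resize(half);
  }
  float sum = 0.0f;
  for (float x : v)
    sum += x; // scalarized tail
  return sum;
}
```

With eight lanes and `scalarizeAt = 4`, the sketch performs one half-width vector step followed by four scalar adds; with the default it keeps halving down to a single lane.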


Revision tags: llvmorg-17.0.0-rc3
# 9a207578 08-Aug-2023 Alexey Bataev <a.bataev@outlook.com>

[TTI]Add InsertSubvector pattern in improveShuffleKindFromMask().

It improves shuffle instructions estimation and improves vectorization
outcome.

Differential Revision: https://reviews.llvm.org/D157425
