History log of /llvm-project/llvm/test/Analysis/CostModel/X86/arith-overflow.ll (Results 26 – 50 of 53)
# 3538ee76 26-Sep-2021 Simon Pilgrim <llvm-dev@redking.me.uk>

[CostModel][X86] Improve AVX1/AVX2 v16i32->v16i16/v16i8 truncation costs (PR51972)

Based off worst case btver2 (AVX1) and haswell (AVX2) llvm-mca reports


Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4
# c931d352 23-Sep-2021 Simon Pilgrim <llvm-dev@redking.me.uk>

[CostModel][X86] Increase i64 mul cost from 1 to 2

Only the most recent CPUs really support 1cy 64-bit multiplies, and the X64 cost table represents a realistic worst case. The 1cy value was also discouraging vectorization when most vXi64 PMULDQ expansions aren't actually slower than scalarization.

Noticed while investigating PR51436.


Revision tags: llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init
# 96b4117d 12-Jul-2021 Simon Pilgrim <llvm-dev@redking.me.uk>

[CostModel][X86] Adjust truncate SSE/AVX legalized costs based on llvm-mca reports.

Update truncation costs based on the worst case costs from the script in D103695.

Move to using legalized types wherever possible, which allows us to prune the cost tables.


# 4c7e9a38 07-Jul-2021 Simon Pilgrim <llvm-dev@redking.me.uk>

[CostModel][X86] Adjust sext/zext SSE/AVX legalized costs based on llvm-mca reports.

Update costs based on the worst case costs from the script in D103695.

Move to using legalized types wherever possible, which allows us to prune the cost tables.


Revision tags: llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2
# 49d3a367 07-Jun-2021 Simon Pilgrim <llvm-dev@redking.me.uk>

[CostModel][X86] Improve AVX1/AVX2 truncation costs

Based off the worst case numbers generated by D103695, we were overestimating the cost of a number of vector truncations:

AVX2: v2i32->v2i8, v2i64->v2i16 + v4i64->v4i32
AVX1: v2i32->v2i8, v4i64->v4i16 + v16i16->v16i8

Once we have a working set of conversion costs, the intention is to cleanup the tables and use legalized types a lot more to reduce the number of entries we currently have.


# 90d25808 27-May-2021 Simon Pilgrim <llvm-dev@redking.me.uk>

[CostModel][X86] Improve accuracy of sext/zext to 256-bit vector costs on AVX1 targets

Determined from llvm-mca analysis (btver2 vs bdver2 vs sandybridge): the split+extends+concat sequence on AVX1-capable targets is cheaper than the #ops that the cost was previously based on.


Revision tags: llvmorg-12.0.1-rc1
# 243e5886 23-May-2021 Simon Pilgrim <llvm-dev@redking.me.uk>

[CostModel][X86] Improve accuracy of vXi64 MUL costs on AVX2/AVX512 targets

By llvm-mca analysis, Haswell/Broadwell has the worst v4i64 recip-throughput cost of the AVX2 targets at 6 (vs the currently used cost of 8). Similarly, SkylakeServer (our only AVX512 target model) implements PMULLQ with an average cost of 1.5 (rounded up to 2.0), and the PMULUDQ sequence (without AVX512DQ) has a cost of 6.


# e4ec5cc8 23-May-2021 Simon Pilgrim <llvm-dev@redking.me.uk>

[CostModel][X86] Align v2i64 MUL costs on SSE42+ targets with worst case

Based on worst case of sandybridge (which seems to match nehalem for this SSE sequence) (vs btver2 + bdver2) llvm-mca analysis


# fc01b9bd 22-May-2021 Simon Pilgrim <llvm-dev@redking.me.uk>

[CostModel][X86] Align v4i64 MUL costs on AVX1 targets with worst case

Based on worst case of sandybridge (vs btver2 + bdver2) llvm-mca analysis - which is a lot less than what we were predicting (I think based off total uop count).


# 9bd0dc83 22-May-2021 Simon Pilgrim <llvm-dev@redking.me.uk>

[CostModel][X86] Improve v8i32 MUL costs on AVX1 targets to account for slower btver2

BTVER2 has a 2 cycle throughput for v4i32 multiplies (same as SSE41 targets), which is only partially hidden by the subvector extracts/insert when splitting v8i32.


Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1
# e11195d0 13-Nov-2020 Simon Pilgrim <llvm-dev@redking.me.uk>

[CostModel][X86] Remove unused CHECK prefixes

Allows us to remove the "CHECK: {{^}}" hack and help simplify D91275


# 3c050a59 03-Nov-2020 Sanjay Patel <spatel@rotateright.com>

[CostModel] fix cost calc bug for sadd/ssub with overflow

As noted in D90554, there's an opcode typo in using an easily
misused cost model API: getCmpSelInstrCost(). Beyond that, the
assumed sequence of ops is questionable, but that would be
another patch.

My guess is that the x86 test diffs show that we are probably
wrong both before and after this change, so there will be no
practical difference.
As an example, I tried this test which shows a cost of '7'
either way:

define <4 x i32> @sadd(<4 x i32> %va, <4 x i32> %vb) {
  %V4I32 = call {<4 x i32>, <4 x i1>} @llvm.sadd.with.overflow.v4i32(<4 x i32> %va, <4 x i32> %vb)
  %ov = extractvalue {<4 x i32>, <4 x i1>} %V4I32, 1
  %r = extractvalue {<4 x i32>, <4 x i1>} %V4I32, 0
  %z = select <4 x i1> %ov, <4 x i32> <i32 42, i32 42, i32 42, i32 42>, <4 x i32> %r
  ret <4 x i32> %z
}

declare {<4 x i32>, <4 x i1>} @llvm.sadd.with.overflow.v4i32(<4 x i32>, <4 x i32>)

$ llc -o - sadd.ll -mattr=avx
vpaddd %xmm1, %xmm0, %xmm2
vpcmpgtd %xmm2, %xmm0, %xmm0
vpxor %xmm0, %xmm1, %xmm0
vblendvps %xmm0, LCPI0_0(%rip), %xmm2, %xmm0

Differential Revision: https://reviews.llvm.org/D90681
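
The overflow test in the AVX output above can be checked lane by lane. The following is a minimal Python sketch, not LLVM code. It assumes 32-bit two's-complement lanes, and the helper names (`to_signed32`, `sadd_with_overflow_lane`, `sadd_lane_select`) are hypothetical. It mirrors the `vpaddd`/`vpcmpgtd`/`vpxor` steps: a lane overflowed exactly when the sign bit of `(a > sum) ^ b` is set, which is the bit `vblendvps` tests.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Interpret the low 32 bits of x as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def sadd_with_overflow_lane(a, b):
    """Model one lane of llvm.sadd.with.overflow.v4i32 as lowered by llc -mattr=avx.

    Returns (wrapped sum, overflow flag).
    """
    s = to_signed32(a + b)              # vpaddd: wrapping 32-bit add
    gt_mask = MASK32 if a > s else 0    # vpcmpgtd: all-ones where a > sum
    x = gt_mask ^ (b & MASK32)          # vpxor with b
    overflow = bool(x & 0x80000000)     # vblendvps selects on the sign bit
    return s, overflow


def sadd_lane_select(a, b, fallback=42):
    """Model the IR's select: use `fallback` where the lane overflowed."""
    s, ov = sadd_with_overflow_lane(a, b)
    return fallback if ov else s
```

For example, `sadd_lane_select(2**31 - 1, 1)` returns 42. The xor works because, when `b >= 0`, overflow makes the wrapped sum smaller than `a`, and when `b < 0`, overflow makes it not smaller.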


# 7979f249 01-Nov-2020 Fangrui Song <i@maskray.me>

[test] Fix some unused check prefixes in test/Analysis/CostModel/X86


# 251dd7c0 30-Oct-2020 Sanjay Patel <spatel@rotateright.com>

[x86] add cost overrides for mul with overflow

I'm assuming the standard size integer instructions for this end up as something like:
mulq %rsi
seto %al

And the 'mul' generally has reciprocal throughput of 1 on typical implementations
(higher latency, but that's not handled here).
The default costs may end up much higher than that, and that's what we see in the test diffs.

Vector types are left as a 'TODO'.

Differential Revision: https://reviews.llvm.org/D90431
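
The `mulq`/`seto` pairing above relies on `mul` setting OF (and CF) exactly when the upper half of the 128-bit product, left in RDX, is nonzero. Below is a minimal Python model of the unsigned i64 case. The function name is hypothetical, and the sketch covers only the semantics, not the cost numbers.

```python
MASK64 = (1 << 64) - 1


def umul_with_overflow_i64(a, b):
    """Model llvm.umul.with.overflow.i64 as lowered to mulq + seto.

    Returns (wrapped low 64 bits, overflow flag).
    """
    product = (a & MASK64) * (b & MASK64)  # full 128-bit product, i.e. RDX:RAX
    low = product & MASK64                 # RAX: the wrapped 64-bit result
    high = product >> 64                   # RDX: the upper half
    return low, high != 0                  # mulq sets OF/CF iff RDX != 0; seto reads OF
```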


Revision tags: llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1
# e39c7ab2 02-May-2020 Craig Topper <craig.topper@intel.com>

[CostModel][X86][ARM] Teach default implementation of getCastInstrCost to not add a split/join cost if source type and the destination type both have a SplitVector action

If both the source and the destination need to be split then the two halves of the split operation are completely independent and don't need to be split or joined. So we don't need to assess a cost for the split or join.

Differential Revision: https://reviews.llvm.org/D79111


# b938168a 01-May-2020 Craig Topper <craig.topper@intel.com>

[X86] Lower the cost of v4i64->v4i32 truncate with avx512.

We use the vpmovqd instruction which is a single uop. So
the cost should be 1.


# cff66865 29-Apr-2020 Craig Topper <craig.topper@intel.com>

[X86] Lower the cost of v4i64->v4i32 and v8i64->v8i32 truncate with AVX

We generate much better code these days than we used to, and we use the same sequence for AVX1 and AVX2 for these truncations.

For v4i64->v4i32 we generate:
vextractf128 xmm1, ymm0, 1
vshufps xmm0, xmm0, xmm1, 136 # xmm0 = xmm0[0,2],xmm1[0,2]

And for v8i64->v8i32 we generate:
vperm2f128 ymm2, ymm0, ymm1, 49 # ymm2 = ymm0[2,3],ymm1[2,3]
vinsertf128 ymm0, ymm0, xmm1, 1
vshufps ymm0, ymm0, ymm2, 136 # ymm0 = ymm0[0,2],ymm2[0,2],ymm0[4,6],ymm2[4,6]

Differential Revision: https://reviews.llvm.org/D79109
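
The immediate 136 (0b10001000) encodes the element selectors [0,2,0,2]. As a result, `vshufps` keeps the even 32-bit elements of each source, which on little-endian x86 are the low halves of the 64-bit lanes. The Python sketch below models only the v4i64->v4i32 sequence, to show why it amounts to a truncation. The helper names are hypothetical, and vectors are plain lists of lane values.

```python
def shufps(src1, src2, imm):
    """Model the 128-bit form of (v)shufps on four 32-bit elements.

    Result elements 0-1 come from src1 and 2-3 from src2, each picked by a
    2-bit field of imm (lowest field first).
    """
    sel = [(imm >> (2 * i)) & 3 for i in range(4)]
    return [src1[sel[0]], src1[sel[1]], src2[sel[2]], src2[sel[3]]]


def as_dwords(qwords):
    """Split 64-bit lanes into 32-bit elements, low half first (little-endian)."""
    out = []
    for q in qwords:
        out += [q & 0xFFFFFFFF, (q >> 32) & 0xFFFFFFFF]
    return out


def trunc_v4i64_to_v4i32(v):
    """Model vextractf128 + vshufps $136 on a 4 x i64 vector."""
    xmm0 = as_dwords(v[:2])           # low 128 bits of ymm0
    xmm1 = as_dwords(v[2:])           # vextractf128 xmm1, ymm0, 1
    return shufps(xmm0, xmm1, 136)    # 136 = 0b10_00_10_00 -> [0,2] from each
```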


# bdbbed11 27-Apr-2020 Craig Topper <craig.topper@intel.com>

[X86][CostModel] Update costs for vector truncate with avx512f/avx512bw.

All avx512 truncate instructions except vXi64->vXi32 are 2 uops
on port 5, so raise their costs to 2, except where we have an
earlier, faster sequence such as pshufb for 128-bit input vectors.

Add a lower cost of 3 for v16i16->v16i8 with avx512f, where we can
extend to v16i32 and then truncate, and a cost of 2 for avx512bw with
and without avx512vl, where we can use vpmovwb with either a ymm
or zmm input. Both of these beat masking, splitting, and using
packuswb, which is our avx/avx2 codegen.


# 8dfb9627 15-Apr-2020 Craig Topper <craig.topper@intel.com>

[X86] Make v32i16/v64i8 legal types without avx512bw. Use custom splitting instead.

This moves v32i16/v64i8 to a model consistent with how we
treat integer types with avx1.

This does change the ABI for vXi16/vXi8 vector types larger than
512 bits, which now pass in multiple zmms instead of multiple ymms.
We'd already hacked some code to make v64i8/v32i16 pass in zmm.

The cost model is still a bit of a mess. In some places I tried to
match existing behavior, but really we need to account for
splitting and concatenation costs. The cost model for shuffles is
especially pessimistic.

Differential Revision: https://reviews.llvm.org/D76212


Revision tags: llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5
# b2da1dda 16-Mar-2020 Craig Topper <craig.topper@gmail.com>

[X86] Add a non-zero cost for truncating v32i16->v32i8 on avx512bw.


Revision tags: llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3
# eaa41e10 24-Feb-2020 Simon Pilgrim <llvm-dev@redking.me.uk>

[CostModel][X86] Try to check against common prefixes before using target-specific cpu checks

SLM/GLM is still a mess so not all of them have been updated yet.


Revision tags: llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1
# 35625464 29-Jan-2020 Craig Topper <craig.topper@intel.com>

[X86] Fix the cost model for v16i16->v16i32 zero_extend/sign_extend with AVX2

We seem to be inheriting the cost from sse4.1. But if we have 256-bit registers, we should be able to do this with just one extract to split the v16i16 and two v8i16->v8i32 operations, so our cost should be 3, not 4.

Differential Revision: https://reviews.llvm.org/D73646


Revision tags: llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1
# d7f0207d 26-Sep-2019 Simon Pilgrim <llvm-dev@redking.me.uk>

[CostModel][X86] Fix SLM <2 x i64> icmp costs

SLM is 2x slower for <2 x i64> comparison ops than for other vector types; we should account for this as we do for the SLM <2 x i64> add/sub/mul costs.

This should remove some of the SLM codegen diffs in D43582

llvm-svn: 372954


# 665ccbff 22-Sep-2019 Simon Pilgrim <llvm-dev@redking.me.uk>

[Cost][X86] Add v2i64 truncation costs

We are missing costs for a lot of truncation cases, I'm hoping to address all the 'zero cost' cases in trunc.ll

I thought this was a vector widening side effect, but even before this we had some interesting LV decisions (notably over indvars) being made due to these zero costs.

llvm-svn: 372498


Revision tags: llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3, llvmorg-9.0.0-rc2, llvmorg-9.0.0-rc1, llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3, llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1, llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3
# 42bf2dd6 25-Feb-2019 Simon Pilgrim <llvm-dev@redking.me.uk>

[TTI] Add generic cost model for smul/umul overflow intrinsics

Based off smul/umul fixed costs and the implementation in TargetLowering::expandMULO.

llvm-svn: 354784

