History log of /llvm-project/llvm/test/CodeGen/X86/extract-concat.ll (Results 1 – 24 of 24)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-21-init, llvmorg-19.1.7
# be6c752e 12-Jan-2025 Simon Pilgrim <llvm-dev@redking.me.uk>

[X86] X86FixupVectorConstantsPass - use VPMOVSX/ZX extensions for PS/PD domain moves (#122601)

For targets with free domain moves, or AVX512 support, allow the use of VPMOVSX/ZX extension loads to r

[X86] X86FixupVectorConstantsPass - use VPMOVSX/ZX extensions for PS/PD domain moves (#122601)

For targets with free domain moves, or AVX512 support, allow the use of VPMOVSX/ZX extension loads to reduce the load sizes.

I've limited this to extension to i32/i64 types as we're mostly interested in shuffle mask loading here, but we could include i16 types as well just as easily.

Inspired by a regression on #122485

show more ...


Revision tags: llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4
# 13c359aa 27-Feb-2024 Simon Pilgrim <RKSimon@users.noreply.github.com>

[X86] ReplaceNodeResults - truncate sub-128-bit vectors as shuffles directly (#83120)

We were scalarizing these truncations, but in most cases we can widen the source vector to 128-bits and perform

[X86] ReplaceNodeResults - truncate sub-128-bit vectors as shuffles directly (#83120)

We were scalarizing these truncations, but in most cases we can widen the source vector to 128-bits and perform the truncation as a shuffle directly (which will usually lower as a PACK or PSHUFB).

For the cases where the widening and shuffle isn't legal we can leave it to generic legalization to scalarize for us.

Fixes #81883

show more ...


Revision tags: llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2
# b5d35fea 02-Feb-2024 Simon Pilgrim <RKSimon@users.noreply.github.com>

[X86] X86FixupVectorConstants - load+sign-extend vector constants that can be stored in a truncated form (#79815)

Reduce the size of the vector constant by storing it in the constant pool in a trunc

[X86] X86FixupVectorConstants - load+sign-extend vector constants that can be stored in a truncated form (#79815)

Reduce the size of the vector constant by storing it in the constant pool in a truncated form, and sign-extend it as part of the load.

I've extended the existing FixupConstant functionality to support these sext constant rebuilds - we still select the smallest stored constant entry and prefer vzload/broadcast/vextload for same bitwidth to avoid domain flips.

I intend to add the matching load+zero-extend handling in a future PR, but that requires some alterations to the existing MC shuffle comments handling first.

show more ...


Revision tags: llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2
# 7f9b94c0 02-Aug-2023 Simon Pilgrim <llvm-dev@redking.me.uk>

[X86] LowerBuildVectorv16i8 - attempt to merge lowest 2 x i16 insertions into a i32 MOVD scalar_to_vectpr

Similar to D156350, if we were going to create 2 x i16 insertions (MOVD+PINSRW), try to merg

[X86] LowerBuildVectorv16i8 - attempt to merge lowest 2 x i16 insertions into a i32 MOVD scalar_to_vectpr

Similar to D156350, if we were going to create 2 x i16 insertions (MOVD+PINSRW), try to merge them into a single MOVD to reduce the amount of GPR<->VEC traffic

show more ...


Revision tags: llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3
# e9f9467d 23-Apr-2023 Simon Pilgrim <llvm-dev@redking.me.uk>

[X86] X86FixupInstTunings - add VPERMILPDri -> VSHUFPDrri mapping

Similar to the original VPERMILPSri -> VSHUFPSrri mapping added in D143787, replacing VPERMILPDri -> VSHUFPDrri should never be any

[X86] X86FixupInstTunings - add VPERMILPDri -> VSHUFPDrri mapping

Similar to the original VPERMILPSri -> VSHUFPSrri mapping added in D143787, replacing VPERMILPDri -> VSHUFPDrri should never be any slower and saves an encoding byte.

The sibling VPERMILPDmi -> VPSHUFDmi mapping is trickier as we need the same shuffle mask in every lane (and it needs to be adjusted) - I haven't attempted that yet but we can investigate it in the future if there's interest.

Fixes #61060

Differential Revision: https://reviews.llvm.org/D148999

show more ...


Revision tags: llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3
# 69a322fe 16-Feb-2023 Noah Goldstein <goldstein.w.n@gmail.com>

Add new pass `X86FixupInstTuning` for fixing up machine-instruction selection.

There are a variety of cases where we want more control over the exact
instruction emitted. This commit creates a new p

Add new pass `X86FixupInstTuning` for fixing up machine-instruction selection.

There are a variety of cases where we want more control over the exact
instruction emitted. This commit creates a new pass to fixup
instructions after the DAG has been lowered. The pass is only meant to
replace instructions that are guranteed to be interchangable, not to
do analysis for special cases.

Handling these instruction changes in in X86ISelLowering of
X86ISelDAGToDAG isn't ideal, as its liable to either break existing
patterns that expected a certain instruction or generate infinite
loops.

As well, operating as the MachineInstruction level allows us to access
scheduling/code size information for making the decisions.

Currently only implements `{v}permilps` -> `{v}shufps/{v}shufd` but
more transforms can be added.

Differential Revision: https://reviews.llvm.org/D143787

show more ...


Revision tags: llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6
# 2f448bf5 22-Jun-2022 Nikita Popov <npopov@redhat.com>

[X86] Migrate tests to use opaque pointers (NFC)

Test updates were performed using:
https://gist.github.com/nikic/98357b71fd67756b0f064c9517b62a34

These are only the test updates where the test pas

[X86] Migrate tests to use opaque pointers (NFC)

Test updates were performed using:
https://gist.github.com/nikic/98357b71fd67756b0f064c9517b62a34

These are only the test updates where the test passed without
further modification (which is almost all of them, as the backend
is largely pointer-type agnostic).

show more ...


Revision tags: llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init
# bd122f6d 22-Jan-2021 Simon Pilgrim <llvm-dev@redking.me.uk>

[X86][AVX] canonicalizeLaneShuffleWithRepeatedOps - handle vperm2x128(movddup(x),movddup(y)) cases

Fold vperm2x128(movddup(x),movddup(y)) -> movddup(vperm2x128(x,y))


# c33d36e0 22-Jan-2021 Simon Pilgrim <llvm-dev@redking.me.uk>

[X86][AVX] canonicalizeLaneShuffleWithRepeatedOps - handle unary vperm2x128(permute/shift(x,c),undef) cases

Fold vperm2x128(permute/shift(x,c),undef) -> permute/shift(vperm2x128(x,undef),c)


Revision tags: llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1
# acbc5ede 26-Apr-2020 Simon Pilgrim <llvm-dev@redking.me.uk>

[X86][SSE] getFauxShuffle - support insert(truncate/extend(extract(vec0,c0)),vec1,c1) shuffle patterns at the byte level

Followup to the PR45604 fix at rGe71dd7c011a3 where we disabled most of these

[X86][SSE] getFauxShuffle - support insert(truncate/extend(extract(vec0,c0)),vec1,c1) shuffle patterns at the byte level

Followup to the PR45604 fix at rGe71dd7c011a3 where we disabled most of these cases.

By creating the shuffle at the byte level we can handle any extension/truncation as long as we track how small the scalar got and assume that the upper bytes will need to be zero.

show more ...


# 757c7c24 23-Apr-2020 Simon Pilgrim <llvm-dev@redking.me.uk>

[X86][SSE] Add SSE2 extract-concat tests

Check pre-SSE41 codegen where we have less PEXTR*/PINSR* instructions


# e71dd7c0 19-Apr-2020 Simon Pilgrim <llvm-dev@redking.me.uk>

[X86][SSE] getFauxShuffle - don't combine shuffles with small truncated scalars (PR45604)

getFauxShuffle attempts to combine INSERT_VECTOR_ELT(TRUNCATE/EXTEND(EXTRACT_VECTOR_ELT(x))) patterns into a

[X86][SSE] getFauxShuffle - don't combine shuffles with small truncated scalars (PR45604)

getFauxShuffle attempts to combine INSERT_VECTOR_ELT(TRUNCATE/EXTEND(EXTRACT_VECTOR_ELT(x))) patterns into a target shuffle chain.

PR45604 identified an issue where the scalar was truncated to a size smaller than the destination vector element and then zero extended back, which requires the upper bits to be zero'd which we don't currently do.

To avoid the bug I've added an early out in these truncation cases, a future commit should allow us to handle this by inserting the necessary SM_SentinelZero padding.

show more ...


# e30d29eb 26-Mar-2020 Simon Pilgrim <llvm-dev@redking.me.uk>

[X86][SSE] getFauxShuffleMask - peek through TRUNCATE/AEXT/ZEXT for INSERT_VECTOR_ELT(EXTRACT_VECTOR_ELT())

As long we extract from a source vector with smaller elements and we zero-extend the eleme

[X86][SSE] getFauxShuffleMask - peek through TRUNCATE/AEXT/ZEXT for INSERT_VECTOR_ELT(EXTRACT_VECTOR_ELT())

As long we extract from a source vector with smaller elements and we zero-extend the element in the final shuffle mask then we can safely peek through truncations and any/zero-extensions to find the source extraction.

show more ...


Revision tags: llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5
# 05c0d349 13-Mar-2020 Simon Pilgrim <llvm-dev@redking.me.uk>

[X86][SSE] Prefer trunc(movd(x)) to pextrb(x,0)

If we're extracting the 0'th index of a v16i8 vector we're better off using MOVD than PEXTRB, unless we're storing the value or we require the implici

[X86][SSE] Prefer trunc(movd(x)) to pextrb(x,0)

If we're extracting the 0'th index of a v16i8 vector we're better off using MOVD than PEXTRB, unless we're storing the value or we require the implicit zero extension of PEXTRB.

The biggest perf diff is on SLM targets where MOVD (uops=1, lat=3 tp=1) is notably faster than PEXTRB (uops=2, lat=5, tp=4).

This matches what we already do for PEXTRW.

Differential Revision: https://reviews.llvm.org/D76138

show more ...


Revision tags: llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3
# e48b536b 16-Feb-2020 Sanjay Patel <spatel@rotateright.com>

[x86] form broadcast of scalar memop even with >1 use

The unseen logic diff occurs because MayFoldLoad() is defined like this:

static bool MayFoldLoad(SDValue Op) {
return Op.hasOneUse() && ISD::

[x86] form broadcast of scalar memop even with >1 use

The unseen logic diff occurs because MayFoldLoad() is defined like this:

static bool MayFoldLoad(SDValue Op) {
return Op.hasOneUse() && ISD::isNormalLoad(Op.getNode());
}

The test diffs here all seem ok to me on screen/paper, but it's hard to know
if that will lead to universally better perf for all targets. For example,
if a target implements broadcast from mem as multiple uops, we would have to
weigh the potential reduction of instructions and register pressure vs.
possible increase in number of uops. I don't know if we can make a truly
informed decision on this at compile-time.

The motivating case that I'm looking at in PR42024:
https://bugs.llvm.org/show_bug.cgi?id=42024
...resembles the diff in extract-concat.ll, but we're not going to change the
larger example there without at least 1 other fix.

Differential Revision: https://reviews.llvm.org/D74088

show more ...


Revision tags: llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init
# 31992a69 08-Jan-2020 Sanjay Patel <spatel@rotateright.com>

[x86] add test for concat-extract corner case; NFC

See D72361 for discussion.


# 6d52edeb 07-Jan-2020 Sanjay Patel <spatel@rotateright.com>

[x86] add tests for extract-of-concat; NFC


Revision tags: llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1, llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3, llvmorg-9.0.0-rc2
# 8b5f2ab2 07-Aug-2019 Craig Topper <craig.topper@intel.com>

Recommit r367901 "[X86] Enable -x86-experimental-vector-widening-legalization by default."

The assert that caused this to be reverted should be fixed now.

Original commit message:

This patch chang

Recommit r367901 "[X86] Enable -x86-experimental-vector-widening-legalization by default."

The assert that caused this to be reverted should be fixed now.

Original commit message:

This patch changes our defualt legalization behavior for 16, 32, and
64 bit vectors with i8/i16/i32/i64 scalar types from promotion to
widening. For example, v8i8 will now be widened to v16i8 instead of
promoted to v8i16. This keeps the elements widths the same and pads
with undef elements. We believe this is a better legalization strategy.
But it carries some issues due to the fragmented vector ISA. For
example, i8 shifts and multiplies get widened and then later have
to be promoted/split into vXi16 vectors.

This has the potential to cause regressions so we wanted to get
it in early in the 10.0 cycle so we have plenty of time to
address them.

Next steps will be to merge tests that explicitly test the command
line option. And then we can remove the option and its associated
code.

llvm-svn: 368183

show more ...


# bd0d97e1 06-Aug-2019 Mitch Phillips <mitchphillips@outlook.com>

Revert "[X86] Enable -x86-experimental-vector-widening-legalization by default."

This reverts commit 3de33245d2c992c9e0af60372043540b60f3a810.

This commit broke the MSan buildbots. See
https://revi

Revert "[X86] Enable -x86-experimental-vector-widening-legalization by default."

This reverts commit 3de33245d2c992c9e0af60372043540b60f3a810.

This commit broke the MSan buildbots. See
https://reviews.llvm.org/rL367901 for more information.

llvm-svn: 368107

show more ...


# 3de33245 05-Aug-2019 Craig Topper <craig.topper@intel.com>

[X86] Enable -x86-experimental-vector-widening-legalization by default.

This patch changes our defualt legalization behavior for 16, 32, and
64 bit vectors with i8/i16/i32/i64 scalar types from prom

[X86] Enable -x86-experimental-vector-widening-legalization by default.

This patch changes our defualt legalization behavior for 16, 32, and
64 bit vectors with i8/i16/i32/i64 scalar types from promotion to
widening. For example, v8i8 will now be widened to v16i8 instead of
promoted to v8i16. This keeps the elements widths the same and pads
with undef elements. We believe this is a better legalization strategy.
But it carries some issues due to the fragmented vector ISA. For
example, i8 shifts and multiplies get widened and then later have
to be promoted/split into vXi16 vectors.

This has the potential to cause regressions so we wanted to get
it in early in the 10.0 cycle so we have plenty of time to
address them.

Next steps will be to merge tests that explicitly test the command
line option. And then we can remove the option and its associated
code.

llvm-svn: 367901

show more ...


Revision tags: llvmorg-9.0.0-rc1, llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3, llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1, llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3, llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2, llvmorg-8.0.0-rc1, llvmorg-7.0.1, llvmorg-7.0.1-rc3
# 17057b52 06-Nov-2018 Craig Topper <craig.topper@intel.com>

[X86] Autogenerate complete checks. NFC

llvm-svn: 346188


Revision tags: llvmorg-7.0.1-rc2, llvmorg-7.0.1-rc1, llvmorg-7.0.0, llvmorg-7.0.0-rc3, llvmorg-7.0.0-rc2, llvmorg-7.0.0-rc1, llvmorg-6.0.1, llvmorg-6.0.1-rc3, llvmorg-6.0.1-rc2, llvmorg-6.0.1-rc1, llvmorg-5.0.2, llvmorg-5.0.2-rc2, llvmorg-5.0.2-rc1, llvmorg-6.0.0, llvmorg-6.0.0-rc3, llvmorg-6.0.0-rc2, llvmorg-6.0.0-rc1, llvmorg-5.0.1, llvmorg-5.0.1-rc3, llvmorg-5.0.1-rc2, llvmorg-5.0.1-rc1, llvmorg-5.0.0, llvmorg-5.0.0-rc5, llvmorg-5.0.0-rc4, llvmorg-5.0.0-rc3, llvmorg-5.0.0-rc2, llvmorg-5.0.0-rc1, llvmorg-4.0.1, llvmorg-4.0.1-rc3, llvmorg-4.0.1-rc2, llvmorg-4.0.1-rc1, llvmorg-4.0.0, llvmorg-4.0.0-rc4, llvmorg-4.0.0-rc3, llvmorg-4.0.0-rc2, llvmorg-4.0.0-rc1, llvmorg-3.9.1, llvmorg-3.9.1-rc3, llvmorg-3.9.1-rc2, llvmorg-3.9.1-rc1, llvmorg-3.9.0, llvmorg-3.9.0-rc3, llvmorg-3.9.0-rc2, llvmorg-3.9.0-rc1, llvmorg-3.8.1, llvmorg-3.8.1-rc1, llvmorg-3.8.0, llvmorg-3.8.0-rc3, llvmorg-3.8.0-rc2, llvmorg-3.8.0-rc1, llvmorg-3.7.1, llvmorg-3.7.1-rc2, llvmorg-3.7.1-rc1, llvmorg-3.7.0, llvmorg-3.7.0-rc4, llvmorg-3.7.0-rc3, studio-1.4, llvmorg-3.7.0-rc2, llvmorg-3.7.0-rc1, llvmorg-3.6.2, llvmorg-3.6.2-rc1, llvmorg-3.6.1, llvmorg-3.6.1-rc1, llvmorg-3.5.2, llvmorg-3.5.2-rc1, llvmorg-3.6.0, llvmorg-3.6.0-rc4, llvmorg-3.6.0-rc3, llvmorg-3.6.0-rc2, llvmorg-3.6.0-rc1, llvmorg-3.5.1, llvmorg-3.5.1-rc2, llvmorg-3.5.1-rc1, llvmorg-3.5.0, llvmorg-3.5.0-rc4, llvmorg-3.5.0-rc3, llvmorg-3.5.0-rc2, llvmorg-3.5.0-rc1, llvmorg-3.4.2, llvmorg-3.4.2-rc1, llvmorg-3.4.1, llvmorg-3.4.1-rc2, llvmorg-3.4.1-rc1, llvmorg-3.4.0, llvmorg-3.4.0-rc3, llvmorg-3.4.0-rc2, llvmorg-3.4.0-rc1, llvmorg-3.3.1-rc1, llvmorg-3.3.0, llvmorg-3.3.0-rc3, llvmorg-3.3.0-rc2, llvmorg-3.3.0-rc1, llvmorg-3.2.0, llvmorg-3.2.0-rc3, llvmorg-3.2.0-rc2, llvmorg-3.2.0-rc1
# 3ac8201e 17-Oct-2012 Michael Liao <michael.liao@intel.com>

Revert part of r166049 back and enable test case in r166125.

- Folding (trunc (concat ... X )) to (concat ... (trunc X) ...) is valid
when '...' are all 'undef's.
- r166125 relies on this transfor

Revert part of r166049 back and enable test case in r166125.

- Folding (trunc (concat ... X )) to (concat ... (trunc X) ...) is valid
when '...' are all 'undef's.
- r166125 relies on this transformation.

llvm-svn: 166155

show more ...


# f630f4ca 17-Oct-2012 Michael Liao <michael.liao@intel.com>

Disable extract-concat test case temporarily

llvm-svn: 166141


# 7a442c80 17-Oct-2012 Michael Liao <michael.liao@intel.com>

Teach DAG combine to fold (extract_subvec (concat v1, ..) i) to v_i

- If the extracted vector has the same type of all vectored being concatenated
together, it should be simplified directly into v

Teach DAG combine to fold (extract_subvec (concat v1, ..) i) to v_i

- If the extracted vector has the same type of all vectored being concatenated
together, it should be simplified directly into v_i, where i is the index of
the element being extracted.

llvm-svn: 166125

show more ...