Revision tags: llvmorg-21-init, llvmorg-19.1.7 |
|
#
be6c752e |
| 12-Jan-2025 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[X86] X86FixupVectorConstantsPass - use VPMOVSX/ZX extensions for PS/PD domain moves (#122601)
For targets with free domain moves, or AVX512 support, allow the use of VPMOVSX/ZX extension loads to r
[X86] X86FixupVectorConstantsPass - use VPMOVSX/ZX extensions for PS/PD domain moves (#122601)
For targets with free domain moves, or AVX512 support, allow the use of VPMOVSX/ZX extension loads to reduce the load sizes.
I've limited this to extension to i32/i64 types as we're mostly interested in shuffle mask loading here, but we could include i16 types as well just as easily.
Inspired by a regression on #122485
show more ...
|
Revision tags: llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4 |
|
#
13c359aa |
| 27-Feb-2024 |
Simon Pilgrim <RKSimon@users.noreply.github.com> |
[X86] ReplaceNodeResults - truncate sub-128-bit vectors as shuffles directly (#83120)
We were scalarizing these truncations, but in most cases we can widen the source vector to 128-bits and perform
[X86] ReplaceNodeResults - truncate sub-128-bit vectors as shuffles directly (#83120)
We were scalarizing these truncations, but in most cases we can widen the source vector to 128-bits and perform the truncation as a shuffle directly (which will usually lower as a PACK or PSHUFB).
For the cases where the widening and shuffle isn't legal we can leave it to generic legalization to scalarize for us.
Fixes #81883
show more ...
|
Revision tags: llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2 |
|
#
b5d35fea |
| 02-Feb-2024 |
Simon Pilgrim <RKSimon@users.noreply.github.com> |
[X86] X86FixupVectorConstants - load+sign-extend vector constants that can be stored in a truncated form (#79815)
Reduce the size of the vector constant by storing it in the constant pool in a trunc
[X86] X86FixupVectorConstants - load+sign-extend vector constants that can be stored in a truncated form (#79815)
Reduce the size of the vector constant by storing it in the constant pool in a truncated form, and sign-extend it as part of the load.
I've extended the existing FixupConstant functionality to support these sext constant rebuilds - we still select the smallest stored constant entry and prefer vzload/broadcast/vextload for same bitwidth to avoid domain flips.
I intend to add the matching load+zero-extend handling in a future PR, but that requires some alterations to the existing MC shuffle comments handling first.
show more ...
|
Revision tags: llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2 |
|
#
7f9b94c0 |
| 02-Aug-2023 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[X86] LowerBuildVectorv16i8 - attempt to merge lowest 2 x i16 insertions into a i32 MOVD scalar_to_vectpr
Similar to D156350, if we were going to create 2 x i16 insertions (MOVD+PINSRW), try to merg
[X86] LowerBuildVectorv16i8 - attempt to merge lowest 2 x i16 insertions into a i32 MOVD scalar_to_vectpr
Similar to D156350, if we were going to create 2 x i16 insertions (MOVD+PINSRW), try to merge them into a single MOVD to reduce the amount of GPR<->VEC traffic
show more ...
|
Revision tags: llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3 |
|
#
e9f9467d |
| 23-Apr-2023 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[X86] X86FixupInstTunings - add VPERMILPDri -> VSHUFPDrri mapping
Similar to the original VPERMILPSri -> VSHUFPSrri mapping added in D143787, replacing VPERMILPDri -> VSHUFPDrri should never be any
[X86] X86FixupInstTunings - add VPERMILPDri -> VSHUFPDrri mapping
Similar to the original VPERMILPSri -> VSHUFPSrri mapping added in D143787, replacing VPERMILPDri -> VSHUFPDrri should never be any slower and saves an encoding byte.
The sibling VPERMILPDmi -> VPSHUFDmi mapping is trickier as we need the same shuffle mask in every lane (and it needs to be adjusted) - I haven't attempted that yet but we can investigate it in the future if there's interest.
Fixes #61060
Differential Revision: https://reviews.llvm.org/D148999
show more ...
|
Revision tags: llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3 |
|
#
69a322fe |
| 16-Feb-2023 |
Noah Goldstein <goldstein.w.n@gmail.com> |
Add new pass `X86FixupInstTuning` for fixing up machine-instruction selection.
There are a variety of cases where we want more control over the exact instruction emitted. This commit creates a new p
Add new pass `X86FixupInstTuning` for fixing up machine-instruction selection.
There are a variety of cases where we want more control over the exact instruction emitted. This commit creates a new pass to fixup instructions after the DAG has been lowered. The pass is only meant to replace instructions that are guranteed to be interchangable, not to do analysis for special cases.
Handling these instruction changes in in X86ISelLowering of X86ISelDAGToDAG isn't ideal, as its liable to either break existing patterns that expected a certain instruction or generate infinite loops.
As well, operating as the MachineInstruction level allows us to access scheduling/code size information for making the decisions.
Currently only implements `{v}permilps` -> `{v}shufps/{v}shufd` but more transforms can be added.
Differential Revision: https://reviews.llvm.org/D143787
show more ...
|
Revision tags: llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6 |
|
#
2f448bf5 |
| 22-Jun-2022 |
Nikita Popov <npopov@redhat.com> |
[X86] Migrate tests to use opaque pointers (NFC)
Test updates were performed using: https://gist.github.com/nikic/98357b71fd67756b0f064c9517b62a34
These are only the test updates where the test pas
[X86] Migrate tests to use opaque pointers (NFC)
Test updates were performed using: https://gist.github.com/nikic/98357b71fd67756b0f064c9517b62a34
These are only the test updates where the test passed without further modification (which is almost all of them, as the backend is largely pointer-type agnostic).
show more ...
|
Revision tags: llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init |
|
#
bd122f6d |
| 22-Jan-2021 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[X86][AVX] canonicalizeLaneShuffleWithRepeatedOps - handle vperm2x128(movddup(x),movddup(y)) cases
Fold vperm2x128(movddup(x),movddup(y)) -> movddup(vperm2x128(x,y))
|
#
c33d36e0 |
| 22-Jan-2021 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[X86][AVX] canonicalizeLaneShuffleWithRepeatedOps - handle unary vperm2x128(permute/shift(x,c),undef) cases
Fold vperm2x128(permute/shift(x,c),undef) -> permute/shift(vperm2x128(x,undef),c)
|
Revision tags: llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1 |
|
#
acbc5ede |
| 26-Apr-2020 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[X86][SSE] getFauxShuffle - support insert(truncate/extend(extract(vec0,c0)),vec1,c1) shuffle patterns at the byte level
Followup to the PR45604 fix at rGe71dd7c011a3 where we disabled most of these
[X86][SSE] getFauxShuffle - support insert(truncate/extend(extract(vec0,c0)),vec1,c1) shuffle patterns at the byte level
Followup to the PR45604 fix at rGe71dd7c011a3 where we disabled most of these cases.
By creating the shuffle at the byte level we can handle any extension/truncation as long as we track how small the scalar got and assume that the upper bytes will need to be zero.
show more ...
|
#
757c7c24 |
| 23-Apr-2020 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[X86][SSE] Add SSE2 extract-concat tests
Check pre-SSE41 codegen where we have less PEXTR*/PINSR* instructions
|
#
e71dd7c0 |
| 19-Apr-2020 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[X86][SSE] getFauxShuffle - don't combine shuffles with small truncated scalars (PR45604)
getFauxShuffle attempts to combine INSERT_VECTOR_ELT(TRUNCATE/EXTEND(EXTRACT_VECTOR_ELT(x))) patterns into a
[X86][SSE] getFauxShuffle - don't combine shuffles with small truncated scalars (PR45604)
getFauxShuffle attempts to combine INSERT_VECTOR_ELT(TRUNCATE/EXTEND(EXTRACT_VECTOR_ELT(x))) patterns into a target shuffle chain.
PR45604 identified an issue where the scalar was truncated to a size smaller than the destination vector element and then zero extended back, which requires the upper bits to be zero'd which we don't currently do.
To avoid the bug I've added an early out in these truncation cases, a future commit should allow us to handle this by inserting the necessary SM_SentinelZero padding.
show more ...
|
#
e30d29eb |
| 26-Mar-2020 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[X86][SSE] getFauxShuffleMask - peek through TRUNCATE/AEXT/ZEXT for INSERT_VECTOR_ELT(EXTRACT_VECTOR_ELT())
As long we extract from a source vector with smaller elements and we zero-extend the eleme
[X86][SSE] getFauxShuffleMask - peek through TRUNCATE/AEXT/ZEXT for INSERT_VECTOR_ELT(EXTRACT_VECTOR_ELT())
As long we extract from a source vector with smaller elements and we zero-extend the element in the final shuffle mask then we can safely peek through truncations and any/zero-extensions to find the source extraction.
show more ...
|
Revision tags: llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5 |
|
#
05c0d349 |
| 13-Mar-2020 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[X86][SSE] Prefer trunc(movd(x)) to pextrb(x,0)
If we're extracting the 0'th index of a v16i8 vector we're better off using MOVD than PEXTRB, unless we're storing the value or we require the implici
[X86][SSE] Prefer trunc(movd(x)) to pextrb(x,0)
If we're extracting the 0'th index of a v16i8 vector we're better off using MOVD than PEXTRB, unless we're storing the value or we require the implicit zero extension of PEXTRB.
The biggest perf diff is on SLM targets where MOVD (uops=1, lat=3 tp=1) is notably faster than PEXTRB (uops=2, lat=5, tp=4).
This matches what we already do for PEXTRW.
Differential Revision: https://reviews.llvm.org/D76138
show more ...
|
Revision tags: llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3 |
|
#
e48b536b |
| 16-Feb-2020 |
Sanjay Patel <spatel@rotateright.com> |
[x86] form broadcast of scalar memop even with >1 use
The unseen logic diff occurs because MayFoldLoad() is defined like this:
static bool MayFoldLoad(SDValue Op) { return Op.hasOneUse() && ISD::
[x86] form broadcast of scalar memop even with >1 use
The unseen logic diff occurs because MayFoldLoad() is defined like this:
static bool MayFoldLoad(SDValue Op) { return Op.hasOneUse() && ISD::isNormalLoad(Op.getNode()); }
The test diffs here all seem ok to me on screen/paper, but it's hard to know if that will lead to universally better perf for all targets. For example, if a target implements broadcast from mem as multiple uops, we would have to weigh the potential reduction of instructions and register pressure vs. possible increase in number of uops. I don't know if we can make a truly informed decision on this at compile-time.
The motivating case that I'm looking at in PR42024: https://bugs.llvm.org/show_bug.cgi?id=42024 ...resembles the diff in extract-concat.ll, but we're not going to change the larger example there without at least 1 other fix.
Differential Revision: https://reviews.llvm.org/D74088
show more ...
|
Revision tags: llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init |
|
#
31992a69 |
| 08-Jan-2020 |
Sanjay Patel <spatel@rotateright.com> |
[x86] add test for concat-extract corner case; NFC
See D72361 for discussion.
|
#
6d52edeb |
| 07-Jan-2020 |
Sanjay Patel <spatel@rotateright.com> |
[x86] add tests for extract-of-concat; NFC
|
Revision tags: llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1, llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3, llvmorg-9.0.0-rc2 |
|
#
8b5f2ab2 |
| 07-Aug-2019 |
Craig Topper <craig.topper@intel.com> |
Recommit r367901 "[X86] Enable -x86-experimental-vector-widening-legalization by default."
The assert that caused this to be reverted should be fixed now.
Original commit message:
This patch chang
Recommit r367901 "[X86] Enable -x86-experimental-vector-widening-legalization by default."
The assert that caused this to be reverted should be fixed now.
Original commit message:
This patch changes our defualt legalization behavior for 16, 32, and 64 bit vectors with i8/i16/i32/i64 scalar types from promotion to widening. For example, v8i8 will now be widened to v16i8 instead of promoted to v8i16. This keeps the elements widths the same and pads with undef elements. We believe this is a better legalization strategy. But it carries some issues due to the fragmented vector ISA. For example, i8 shifts and multiplies get widened and then later have to be promoted/split into vXi16 vectors.
This has the potential to cause regressions so we wanted to get it in early in the 10.0 cycle so we have plenty of time to address them.
Next steps will be to merge tests that explicitly test the command line option. And then we can remove the option and its associated code.
llvm-svn: 368183
show more ...
|
#
bd0d97e1 |
| 06-Aug-2019 |
Mitch Phillips <mitchphillips@outlook.com> |
Revert "[X86] Enable -x86-experimental-vector-widening-legalization by default."
This reverts commit 3de33245d2c992c9e0af60372043540b60f3a810.
This commit broke the MSan buildbots. See https://revi
Revert "[X86] Enable -x86-experimental-vector-widening-legalization by default."
This reverts commit 3de33245d2c992c9e0af60372043540b60f3a810.
This commit broke the MSan buildbots. See https://reviews.llvm.org/rL367901 for more information.
llvm-svn: 368107
show more ...
|
#
3de33245 |
| 05-Aug-2019 |
Craig Topper <craig.topper@intel.com> |
[X86] Enable -x86-experimental-vector-widening-legalization by default.
This patch changes our defualt legalization behavior for 16, 32, and 64 bit vectors with i8/i16/i32/i64 scalar types from prom
[X86] Enable -x86-experimental-vector-widening-legalization by default.
This patch changes our defualt legalization behavior for 16, 32, and 64 bit vectors with i8/i16/i32/i64 scalar types from promotion to widening. For example, v8i8 will now be widened to v16i8 instead of promoted to v8i16. This keeps the elements widths the same and pads with undef elements. We believe this is a better legalization strategy. But it carries some issues due to the fragmented vector ISA. For example, i8 shifts and multiplies get widened and then later have to be promoted/split into vXi16 vectors.
This has the potential to cause regressions so we wanted to get it in early in the 10.0 cycle so we have plenty of time to address them.
Next steps will be to merge tests that explicitly test the command line option. And then we can remove the option and its associated code.
llvm-svn: 367901
show more ...
|
Revision tags: llvmorg-9.0.0-rc1, llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3, llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1, llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3, llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2, llvmorg-8.0.0-rc1, llvmorg-7.0.1, llvmorg-7.0.1-rc3 |
|
#
17057b52 |
| 06-Nov-2018 |
Craig Topper <craig.topper@intel.com> |
[X86] Autogenerate complete checks. NFC
llvm-svn: 346188
|
Revision tags: llvmorg-7.0.1-rc2, llvmorg-7.0.1-rc1, llvmorg-7.0.0, llvmorg-7.0.0-rc3, llvmorg-7.0.0-rc2, llvmorg-7.0.0-rc1, llvmorg-6.0.1, llvmorg-6.0.1-rc3, llvmorg-6.0.1-rc2, llvmorg-6.0.1-rc1, llvmorg-5.0.2, llvmorg-5.0.2-rc2, llvmorg-5.0.2-rc1, llvmorg-6.0.0, llvmorg-6.0.0-rc3, llvmorg-6.0.0-rc2, llvmorg-6.0.0-rc1, llvmorg-5.0.1, llvmorg-5.0.1-rc3, llvmorg-5.0.1-rc2, llvmorg-5.0.1-rc1, llvmorg-5.0.0, llvmorg-5.0.0-rc5, llvmorg-5.0.0-rc4, llvmorg-5.0.0-rc3, llvmorg-5.0.0-rc2, llvmorg-5.0.0-rc1, llvmorg-4.0.1, llvmorg-4.0.1-rc3, llvmorg-4.0.1-rc2, llvmorg-4.0.1-rc1, llvmorg-4.0.0, llvmorg-4.0.0-rc4, llvmorg-4.0.0-rc3, llvmorg-4.0.0-rc2, llvmorg-4.0.0-rc1, llvmorg-3.9.1, llvmorg-3.9.1-rc3, llvmorg-3.9.1-rc2, llvmorg-3.9.1-rc1, llvmorg-3.9.0, llvmorg-3.9.0-rc3, llvmorg-3.9.0-rc2, llvmorg-3.9.0-rc1, llvmorg-3.8.1, llvmorg-3.8.1-rc1, llvmorg-3.8.0, llvmorg-3.8.0-rc3, llvmorg-3.8.0-rc2, llvmorg-3.8.0-rc1, llvmorg-3.7.1, llvmorg-3.7.1-rc2, llvmorg-3.7.1-rc1, llvmorg-3.7.0, llvmorg-3.7.0-rc4, llvmorg-3.7.0-rc3, studio-1.4, llvmorg-3.7.0-rc2, llvmorg-3.7.0-rc1, llvmorg-3.6.2, llvmorg-3.6.2-rc1, llvmorg-3.6.1, llvmorg-3.6.1-rc1, llvmorg-3.5.2, llvmorg-3.5.2-rc1, llvmorg-3.6.0, llvmorg-3.6.0-rc4, llvmorg-3.6.0-rc3, llvmorg-3.6.0-rc2, llvmorg-3.6.0-rc1, llvmorg-3.5.1, llvmorg-3.5.1-rc2, llvmorg-3.5.1-rc1, llvmorg-3.5.0, llvmorg-3.5.0-rc4, llvmorg-3.5.0-rc3, llvmorg-3.5.0-rc2, llvmorg-3.5.0-rc1, llvmorg-3.4.2, llvmorg-3.4.2-rc1, llvmorg-3.4.1, llvmorg-3.4.1-rc2, llvmorg-3.4.1-rc1, llvmorg-3.4.0, llvmorg-3.4.0-rc3, llvmorg-3.4.0-rc2, llvmorg-3.4.0-rc1, llvmorg-3.3.1-rc1, llvmorg-3.3.0, llvmorg-3.3.0-rc3, llvmorg-3.3.0-rc2, llvmorg-3.3.0-rc1, llvmorg-3.2.0, llvmorg-3.2.0-rc3, llvmorg-3.2.0-rc2, llvmorg-3.2.0-rc1 |
|
#
3ac8201e |
| 17-Oct-2012 |
Michael Liao <michael.liao@intel.com> |
Revert part of r166049 back and enable test case in r166125.
- Folding (trunc (concat ... X )) to (concat ... (trunc X) ...) is valid when '...' are all 'undef's. - r166125 relies on this transfor
Revert part of r166049 back and enable test case in r166125.
- Folding (trunc (concat ... X )) to (concat ... (trunc X) ...) is valid when '...' are all 'undef's. - r166125 relies on this transformation.
llvm-svn: 166155
show more ...
|
#
f630f4ca |
| 17-Oct-2012 |
Michael Liao <michael.liao@intel.com> |
Disable extract-concat test case temporarily
llvm-svn: 166141
|
#
7a442c80 |
| 17-Oct-2012 |
Michael Liao <michael.liao@intel.com> |
Teach DAG combine to fold (extract_subvec (concat v1, ..) i) to v_i
- If the extracted vector has the same type of all vectored being concatenated together, it should be simplified directly into v
Teach DAG combine to fold (extract_subvec (concat v1, ..) i) to v_i
- If the extracted vector has the same type of all vectored being concatenated together, it should be simplified directly into v_i, where i is the index of the element being extracted.
llvm-svn: 166125
show more ...
|