Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2 |
|
#
2cca53c8 |
| 13-Dec-2021 |
Alexey Bataev <a.bataev@outlook.com> |
[DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type Legalizer.
We can process the long shuffles (working across several actual vector registers) in the best way if we take t
[DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type Legalizer.
We can process the long shuffles (working across several actual vector registers) in the best way if we take the actual register represantion into account. We can build more correct representation of register shuffles, improve number of recognised buildvector sequences. Also, same function can be used to improve the cost model for the shuffles. in future patches.
Part of D100486
Differential Revision: https://reviews.llvm.org/D115653
show more ...
|
#
5f7ac159 |
| 20-Apr-2022 |
Alexey Bataev <a.bataev@outlook.com> |
Revert "[DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type Legalizer."
This reverts commit 2f49163b3365e5dc046b03e422a048dd45aee3f0 to fix a buildbot failure. Reported in h
Revert "[DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type Legalizer."
This reverts commit 2f49163b3365e5dc046b03e422a048dd45aee3f0 to fix a buildbot failure. Reported in https://lab.llvm.org/buildbot#builders/105/builds/24284
show more ...
|
#
2f49163b |
| 13-Dec-2021 |
Alexey Bataev <a.bataev@outlook.com> |
[DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type Legalizer.
We can process the long shuffles (working across several actual vector registers) in the best way if we take t
[DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type Legalizer.
We can process the long shuffles (working across several actual vector registers) in the best way if we take the actual register represantion into account. We can build more correct representation of register shuffles, improve number of recognised buildvector sequences. Also, same function can be used to improve the cost model for the shuffles. in future patches.
Part of D100486
Differential Revision: https://reviews.llvm.org/D115653
show more ...
|
Revision tags: llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4 |
|
#
1da52ef2 |
| 19-Sep-2021 |
David Green <david.green@arm.com> |
[ARM] Add VGETLANEu patterns for v4f16 and v8f16
These were apparently missing, having no pattern that could convert a VGETLANEu of a v4f16 to an i32. Added bf16 whilst here, following the same code.
|
Revision tags: llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2 |
|
#
649a3d00 |
| 04-Feb-2021 |
David Green <david.green@arm.com> |
[ARM] Handle f16 in GeneratePerfectShuffle
This new f16 shuffle under Neon would hit an assert in GeneratePerfectShuffle as it would try to treat a f16 vector as an i8. Add f16 handling, treating th
[ARM] Handle f16 in GeneratePerfectShuffle
This new f16 shuffle under Neon would hit an assert in GeneratePerfectShuffle as it would try to treat a f16 vector as an i8. Add f16 handling, treating them like an i16.
Differential Revision: https://reviews.llvm.org/D95446
show more ...
|
Revision tags: llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1 |
|
#
9e2768a3 |
| 27-Jan-2021 |
David Green <david.green@arm.com> |
[ARM] Add neon FP16 scalar_to_vector patterns.
This adds some simple fp16 scalar_to_vector patterns, preventing a selection failure if this came up.
Differential Revision: https://reviews.llvm.org/
[ARM] Add neon FP16 scalar_to_vector patterns.
This adds some simple fp16 scalar_to_vector patterns, preventing a selection failure if this came up.
Differential Revision: https://reviews.llvm.org/D95427
show more ...
|
Revision tags: llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2 |
|
#
d428f881 |
| 26-Jun-2020 |
David Green <david.green@arm.com> |
[ARM] VCVTT fpround instruction selection
Similar to the recent patch for fpext, this adds vcvtb and vcvtt with insert into vector instruction selection patterns for fptruncs. This helps clear up a
[ARM] VCVTT fpround instruction selection
Similar to the recent patch for fpext, this adds vcvtb and vcvtt with insert into vector instruction selection patterns for fptruncs. This helps clear up a lot of register shuffling that we would otherwise do.
Differential Revision: https://reviews.llvm.org/D81637
show more ...
|
#
76e0e1a5 |
| 26-Jun-2020 |
David Green <david.green@arm.com> |
[ARM] VCVTT instruction selection
We current extract and convert from a top lane of a f16 vector using a VMOVX;VCVTB pair. We can simplify that to use a single VCVTT. The pattern is mostly copied fr
[ARM] VCVTT instruction selection
We current extract and convert from a top lane of a f16 vector using a VMOVX;VCVTB pair. We can simplify that to use a single VCVTT. The pattern is mostly copied from a vector extract pattern, but produces a VCVTTHS f32 directly.
This had to move some code around so that ARMInstrVFP had access to the required pattern frags that were previously part of ARMInstrNEON.
Differential Revision: https://reviews.llvm.org/D81556
show more ...
|
#
75268812 |
| 19-Jun-2020 |
Mikhail Maltsev <mikhail.maltsev@arm.com> |
[ARM][BFloat] Lowering of create/get/set/dup intrinsics
This patch adds codegen for the following BFloat operations to the ARM backend: * concatenation of bf16 vectors * bf16 vector element extracti
[ARM][BFloat] Lowering of create/get/set/dup intrinsics
This patch adds codegen for the following BFloat operations to the ARM backend: * concatenation of bf16 vectors * bf16 vector element extraction * bf16 vector element insertion * duplication of a bf16 value into each lane of a vector * duplication of a bf16 vector lane into each lane
Differential Revision: https://reviews.llvm.org/D81411
show more ...
|
#
61ef2d27 |
| 10-Jun-2020 |
David Green <david.green@arm.com> |
[ARM] Update fp16-insert-extract.ll test checks. NFC
|
Revision tags: llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init, llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1, llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3, llvmorg-9.0.0-rc2, llvmorg-9.0.0-rc1, llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3, llvmorg-8.0.1-rc2 |
|
#
08da01b4 |
| 04-Jun-2019 |
Mikhail Maltsev <mikhail.maltsev@arm.com> |
[ARM] Add FP16 vector insert/extract patterns
This change adds two FP16 extraction and two insertion patterns (one per possible vector length). Extractions are handled by copying a Q/D register into
[ARM] Add FP16 vector insert/extract patterns
This change adds two FP16 extraction and two insertion patterns (one per possible vector length). Extractions are handled by copying a Q/D register into one of VFP2 class registers, where single FP32 sub-registers can be accessed. Then the extraction of even lanes are simple sub-register extractions (because we don't care about the top parts of registers for FP16 operations). Odd lanes need an additional VMOVX instruction.
Unfortunately, insertions cannot be handled in the same way, because: * There is no instruction to insert FP16 into an even lane (VINS only works with odd lanes) * The patterns for odd lanes will have a form of a DAG (not a tree), and will not be implementable in pure tablegen
Because of this insertions are handled in the same way as 16-bit integer insertions (with conversions between FP registers and GPRs using VMOVHR instructions).
Without these patterns the ARM backend would sometimes fail during instruction selection.
This patch also adds patterns which combine: * an FP16 element extraction and a store into a single VST1 instruction * an FP16 load and insertion into a single VLD1 instruction
Differential Revision: https://reviews.llvm.org/D62651
llvm-svn: 362482
show more ...
|