Revision tags: llvmorg-21-init, llvmorg-19.1.7 |
|
#
bab7920f |
| 13-Jan-2025 |
Alexey Bataev <a.bataev@outlook.com> |
[RISCV][CG]Use processShuffleMasks for per-register shuffles
This patch uses processShuffleMasks in codegen, in lowerShuffleViaVRegSplitting. The function is already used for X86 shuffle cost estimation and in DAGTypeLegalizer::SplitVecRes_VECTOR_SHUFFLE, so this change unifies the code.
Reviewers: topperc, wangpc-pp, lukel97, preames
Reviewed By: preames
Pull Request: https://github.com/llvm/llvm-project/pull/121765
|
#
24bb180e |
| 10-Jan-2025 |
Philip Reames <preames@rivosinc.com> |
[RISCV] Attempt to widen SEW before generic shuffle lowering (#122311)
This takes inspiration from AArch64 which does the same thing to assist
with zip/trn/etc. Doing this recursion unconditionally when the mask
allows is slightly questionable, but seems to work out okay in practice.
As a bit of context, it's helpful to realize that we have existing logic
in both DAGCombine and InstCombine which mutates the element width in
an analogous manner. However, that code has two restrictions which
prevent it from handling the motivating cases here. First, it only
triggers if there is a bitcast involving a different element type.
Second, the matcher used considers a partially undef wide element to be
a non-match. I considered trying to relax those assumptions, but the
information loss for undef in mid-level opt seemed more likely to open a
can of worms than I wanted.
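As a rough illustration of the widening idea described above, the following standalone sketch tries to re-express a shuffle mask over elements twice as wide, and accepts a pair of lanes even when one half is undef. The helper name `widenMaskByTwo` and the convention that `-1` means undef are assumptions made for this example; this is not the code from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper (not an LLVM API): try to rewrite `mask` as a mask over
// elements twice as wide. A lane value of -1 means "undef".
// Returns std::nullopt when some pair of lanes cannot be merged.
std::optional<std::vector<int>> widenMaskByTwo(const std::vector<int> &mask) {
  assert(mask.size() % 2 == 0 && "need an even number of lanes");
  std::vector<int> wide;
  wide.reserve(mask.size() / 2);
  for (std::size_t i = 0; i < mask.size(); i += 2) {
    int lo = mask[i], hi = mask[i + 1];
    if (lo < 0 && hi < 0) {
      wide.push_back(-1); // fully undef wide element
    } else if (lo >= 0 && lo % 2 == 0 && (hi < 0 || hi == lo + 1)) {
      wide.push_back(lo / 2); // aligned pair, high half may be undef
    } else if (lo < 0 && hi % 2 == 1) {
      wide.push_back(hi / 2); // low half undef, high half fixes the source
    } else {
      return std::nullopt; // pair does not map to one wide element
    }
  }
  return wide;
}
```

In this sketch, `{0, -1, -1, 7}` widens to `{0, 3}`, whereas a matcher that rejects partially undef pairs would give up on it.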
|
#
45c01e8a |
| 19-Dec-2024 |
Finn Plummer <50529406+inbelic@users.noreply.github.com> |
[NFC][TargetTransformInfo][VectorUtils] Consolidate `isVectorIntrinsic...` api (#117635)
- update `VectorUtils:isVectorIntrinsicWithScalarOpAtArg` to use TTI for
all uses, to allow specification of target-specific intrinsics
- add TTI to the `isVectorIntrinsicWithStructReturnOverloadAtField` api
- update TTI api to provide `isTargetIntrinsicWith...` functions and
consistently name them
- move `isTriviallyScalarizable` to VectorUtils
- update all uses of the api and provide the TTI parameter
Resolves #117030
|
Revision tags: llvmorg-19.1.6 |
|
#
b759020c |
| 11-Dec-2024 |
LiqinWeng <liqin.weng@spacemit.com> |
[LV][EVL] Support cast instruction with EVL-vectorization (#108351)
|
#
b9aa155d |
| 06-Dec-2024 |
Alexey Bataev <a.bataev@outlook.com> |
[TTI][X86]Fix detection of the shuffles from the second shuffle operand only
If the shuffle mask uses only indices from the second shuffle operand, the processShuffleMasks function currently misses it, which prevents correct cost estimation in this corner case. To fix this, the limit needs to be raised to 2 * VF rather than just VF, with the processing adjusted correspondingly. This will allow future improvements for two-source permutations.
Reviewers: RKSimon
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/118972
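To make the corner case concrete: in a two-operand shuffle with VF lanes per operand, mask indices in [0, VF) select from the first operand and indices in [VF, 2 * VF) select from the second. The sketch below classifies a mask by which operands it reads. `classifySources` and `ShuffleSources` are hypothetical names for this example and do not reflect how processShuffleMasks is actually structured.

```cpp
#include <cassert>
#include <vector>

enum class ShuffleSources { None, FirstOnly, SecondOnly, Both };

// Hypothetical classifier (not LLVM's processShuffleMasks): report which of
// the two shuffle operands a mask reads from. Negative lanes are undef.
ShuffleSources classifySources(const std::vector<int> &mask, int vf) {
  assert(vf > 0 && "vector factor must be positive");
  bool usesFirst = false, usesSecond = false;
  for (int idx : mask) {
    if (idx < 0)
      continue; // undef lane reads from neither operand
    assert(idx < 2 * vf && "index must be below 2 * VF");
    if (idx < vf)
      usesFirst = true;
    else
      usesSecond = true; // only visible if the scan goes up to 2 * VF
  }
  if (usesFirst && usesSecond)
    return ShuffleSources::Both;
  if (usesFirst)
    return ShuffleSources::FirstOnly;
  if (usesSecond)
    return ShuffleSources::SecondOnly;
  return ShuffleSources::None;
}
```

A scan that stops at VF would see a mask such as `{4, 5, 6, 7}` with VF = 4 as reading nothing, which is the kind of miss the commit message describes.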
|
Revision tags: llvmorg-19.1.5 |
|
#
4a3f46de |
| 28-Nov-2024 |
LiqinWeng <liqin.weng@spacemit.com> |
[LV][EVL] Support call instruction with EVL-vectorization (#110412)
|
#
8663b877 |
| 21-Nov-2024 |
Finn Plummer <50529406+inbelic@users.noreply.github.com> |
[NFC][VectorUtils][TargetTransformInfo] Add `isVectorIntrinsicWithOverloadTypeAtArg` api (#114849)
This change allows target intrinsics to specify and overwrite overloaded types.
- Updates `ReplaceWithVecLib` to not provide TTI as there most probably won't be a use-case
- Updates `SLPVectorizer` to use available TTI
- Updates `VPTransformState` to pass down TTI
- Updates `VPlanRecipe` to use passed-down TTI
This change will let us add scalarization for `asdouble`: #114847
|
Revision tags: llvmorg-19.1.4 |
|
#
818d7159 |
| 09-Nov-2024 |
Tex Riddell <texr@microsoft.com> |
[Analysis] atan2: isTriviallyVectorizable; add to massv and accelerate veclibs (#113637)
This change is part of this proposal:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294
- Return true for atan2 from isTriviallyVectorizable
- Add atan2 to VecFuncs.def for massv and accelerate libraries.
- Add atan2 to hasOptimizedCodeGen
- Add atan2 support in llvm/lib/Analysis/ValueTracking.cpp
llvm::getIntrinsicForCallSite and update vectorization tests
- Add atan2 name check to isLoweredToCall in
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
- Note: there's no test coverage for these names in isLoweredToCall, except that Transforms/TailCallElim/inf-recursion.ll is impacted by the "fabs" case
Thanks to @jroelofs for the atan2 accelerate veclib and associated test
additions, plus the hasOptimizedCodeGen addition.
Part of: Implement the atan2 HLSL Function #70096.
|
#
dfb60bb9 |
| 29-Oct-2024 |
Rohit Aggarwal <44664450+rohitaggarwal007@users.noreply.github.com> |
Adding more vector calls for -fveclib=AMDLIBM (#109662)
AMD has its own implementation of vector calls.
New vector calls are introduced in the library for exp10, log10, sincos and finite asin/acos.
Please refer to https://github.com/amd/aocl-libm-ose
---------
Co-authored-by: Rohit Aggarwal <Rohit.Aggarwal@amd.com>
|
Revision tags: llvmorg-19.1.3 |
|
#
dcbf2c2c |
| 21-Oct-2024 |
Farzon Lotfi <1802579+farzonl@users.noreply.github.com> |
[Scalarizer][DirectX] support structs return types (#111569)
Based on this RFC:
https://discourse.llvm.org/t/rfc-allow-the-scalarizer-pass-to-scalarize-vectors-returned-in-structs/82306
LLVM intrinsics do not support out params. To get around this limitation
implementers will make intrinsics return structs to capture a return
type and an out param. This implementation detail should not impact
scalarization since these cases should be elementwise operations.
## Three changes are needed.
- The CallInst visitor needs to be updated to handle Structs
- A new visitor is needed for `ExtractValue` instructions
- `finish` needs to be updated to handle structs so that insert elements are
properly propagated.
## Testing changes
- Add support for `llvm.frexp`
- Add support for `llvm.dx.splitdouble`
fixes https://github.com/llvm/llvm-project/issues/111437
|
#
4ba1800b |
| 16-Oct-2024 |
Amr Hesham <amr96@programmer.net> |
[LLVM][NFC] Reduce copying of parameter in lambda (#110299)
Reduce redundant copy parameter in lambda
Fixes #95642
|
Revision tags: llvmorg-19.1.2, llvmorg-19.1.1 |
|
#
cc7b24a4 |
| 24-Sep-2024 |
Piotr Fusik <p.fusik@samsung.com> |
[NFC] Fix typos in comments (#109765)
|
Revision tags: llvmorg-19.1.0, llvmorg-19.1.0-rc4 |
|
#
a156b5a4 |
| 02-Sep-2024 |
Yingwei Zheng <dtcxzyw2333@gmail.com> |
[SLP] Add vectorization support for [u|s]cmp (#106747)
This patch adds vectorization support for [u|s]cmp intrinsic calls.
|
#
d58d105c |
| 30-Aug-2024 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[Analysis] isTriviallyVectorizable - add vectorization support for acos/asin/atan and cosh/sinh/tanh intrinsics (#106584)
Show fallback cases in amdlibm tests where it doesn't have that specific op
|
Revision tags: llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init |
|
#
b22fa909 |
| 16-Jul-2024 |
mskamp <msk@posteo.org> |
[ValueTracking][X86] Compute KnownBits for phadd/phsub (#92429)
Add KnownBits computations to ValueTracking and X86 DAG lowering.
These instructions add/subtract adjacent vector elements in their operands. Example: phadd [X1, X2] [Y1, Y2] = [X1 + X2, Y1 + Y2]. This means that, in this example, we can compute the KnownBits of the operation by computing the KnownBits of [X1, X2] + [X1, X2] and [Y1, Y2] + [Y1, Y2] and intersecting the results. This approach also generalizes to all x86 vector types.
There are also the operations phadd.sw and phsub.sw, which perform saturating addition/subtraction. Use sadd_sat and ssub_sat to compute the KnownBits of these operations.
Also adjust the existing test case pr53247.ll because it can be transformed to a constant using the new KnownBits computation.
Fixes #82516.
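The phadd example in the message above can be modelled with a small standalone function. This is only a sketch of the element pairing for a single lane group, using wrapping unsigned arithmetic; `horizontalAdd` is a hypothetical name, and neither the KnownBits computation nor the saturating variants are modelled here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative model only (hypothetical helper, not an LLVM or x86 API):
// adds adjacent elements of x, then adjacent elements of y, and concatenates
// the sums. Unsigned arithmetic is used so that overflow wraps.
std::vector<std::uint32_t> horizontalAdd(const std::vector<std::uint32_t> &x,
                                         const std::vector<std::uint32_t> &y) {
  assert(x.size() == y.size() && x.size() % 2 == 0);
  std::vector<std::uint32_t> result;
  result.reserve(x.size());
  for (const auto *src : {&x, &y})
    for (std::size_t i = 0; i < src->size(); i += 2)
      result.push_back((*src)[i] + (*src)[i + 1]);
  return result;
}
```

With two-element inputs this reduces to the `[X1 + X2, Y1 + Y2]` form quoted in the commit message.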
|
#
2d209d96 |
| 27-Jun-2024 |
Nikita Popov <npopov@redhat.com> |
[IR] Add getDataLayout() helpers to BasicBlock and Instruction (#96902)
This is a helper to avoid writing `getModule()->getDataLayout()`. I
regularly try to use this method only to remember it doesn't exist...
`getModule()->getDataLayout()` is also a common (the most common?)
reason why code has to include the Module.h header.
|
#
d42b3926 |
| 26-Jun-2024 |
Nikita Popov <npopov@redhat.com> |
[VectorUtils] Use SmallPtrSet::remove_if() (NFC)
|
#
5b4000dc |
| 26-Jun-2024 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
[VectorUtils] Add llvm::scaleShuffleMaskElts wrapper for narrowShuffleMaskElts/widenShuffleMaskElts, NFC. (#96646)
Using the target number of vector elements, scaleShuffleMaskElts will try to use narrowShuffleMaskElts/widenShuffleMaskElts to scale the shuffle mask accordingly.
Working on #58895 I didn't want to create yet another case where we have to handle both re-scaling cases.
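For readers unfamiliar with mask rescaling, the narrowing direction can be sketched in a few lines: each wide mask element M becomes `scale` consecutive narrow elements `M * scale + i`, and negative (undef) elements are simply repeated. `narrowMask` below is a hypothetical standalone stand-in written for illustration, not the LLVM function, and the widening direction (which can fail) is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in (not the LLVM API): rewrite a shuffle mask so that it
// addresses elements `scale` times narrower than the original ones.
std::vector<int> narrowMask(int scale, const std::vector<int> &mask) {
  assert(scale > 0 && "scale must be positive");
  std::vector<int> narrow;
  narrow.reserve(mask.size() * static_cast<std::size_t>(scale));
  for (int m : mask)
    for (int i = 0; i < scale; ++i)
      narrow.push_back(m < 0 ? m : m * scale + i); // keep undef lanes as-is
  return narrow;
}
```

A wrapper in the spirit of the commit would compare the current and target element counts and pick narrowing or widening accordingly.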
|
#
605e1847 |
| 24-Jun-2024 |
Nikita Popov <npopov@redhat.com> |
[VectorUtils] Use poison instead of undef in findScalarElement()
Out-of-range extractelement returns poison, and so do poison elements in the shufflevector mask.
|
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7 |
|
#
1d874335 |
| 05-Jun-2024 |
Farzon Lotfi <1802579+farzonl@users.noreply.github.com> |
[x86] Add tan intrinsic part 4 (#90503)
This change is an implementation of #87367's investigation on supporting
IEEE math operations as intrinsics.
Which was discussed in this RFC:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294
Much of this change was following how G_FSIN and G_FCOS were used.
Changes:
- `llvm/docs/GlobalISel/GenericOpcode.rst` - Document the `G_FTAN`
opcode
- `llvm/docs/LangRef.rst` - Document the tan intrinsic
- `llvm/include/llvm/Analysis/VecFuncs.def` - Associate the tan
intrinsic as a vector function similar to the tanf libcall.
- `llvm/include/llvm/CodeGen/BasicTTIImpl.h` - Map the tan intrinsic to
`ISD::FTAN`
- `llvm/include/llvm/CodeGen/ISDOpcodes.h` - Define ISD opcodes for
`FTAN` and `STRICT_FTAN`
- `llvm/include/llvm/IR/Intrinsics.td` - Create the tan intrinsic
- `llvm/include/llvm/IR/RuntimeLibcalls.def` - Define tan libcall
mappings
- `llvm/include/llvm/Target/GenericOpcodes.td` - Define the `G_FTAN`
Opcode
- `llvm/include/llvm/Support/TargetOpcodes.def` - Create a `G_FTAN`
Opcode handler
- `llvm/include/llvm/Target/GlobalISel/SelectionDAGCompat.td` - Map
`G_FTAN` to `ftan`
- `llvm/include/llvm/Target/TargetSelectionDAG.td` - Define `ftan`,
`strict_ftan`, and `any_ftan` and map them to the ISD opcodes for `FTAN`
and `STRICT_FTAN`
- `llvm/lib/Analysis/VectorUtils.cpp` - Associate the tan intrinsic as a
vector intrinsic
- `llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp` - Map the tan intrinsic
to `G_FTAN` Opcode
- `llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp` - Add `G_FTAN` to
the list of floating point math operations also associate `G_FTAN` with
the `TAN_F` runtime lib.
- `llvm/lib/CodeGen/GlobalISel/Utils.cpp` - More floating point math
operation common behaviors.
- `llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp` - List the function
expansion operations for `FTAN` and `STRICT_FTAN`. Also define both
opcodes in `PromoteNode`.
- `llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp` - More `FTAN`
and `STRICT_FTAN` handling in the legalizer
- `llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h` - Define
`SoftenFloatRes_FTAN` and `ExpandFloatRes_FTAN`.
- `llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp` - Define `FTAN`
as a legal vector operation.
- `llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp` - Define
`FTAN` as a legal vector operation.
- `llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp` - define tan as an
intrinsic that doesn't return NaN.
- `llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp` Map
`LibFunc_tan`, `LibFunc_tanf`, and `LibFunc_tanl` to `ISD::FTAN`. Map
`Intrinsic::tan` to `ISD::FTAN` and add selection dag handling for
`Intrinsic::tan`.
- `llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp` - Define `ftan`
and `strict_ftan` names for the equivalent ISD opcodes.
- `llvm/lib/CodeGen/TargetLoweringBase.cpp` -Define a Tan128 libcall and
ISD::FTAN as a target lowering action.
- `llvm/lib/Target/X86/X86ISelLowering.cpp` - Add x86_64 lowering for
tan intrinsic
resolves https://github.com/llvm/llvm-project/issues/70082
|
Revision tags: llvmorg-18.1.6, llvmorg-18.1.5 |
|
#
cf328ff9 |
| 24-Apr-2024 |
Pierre van Houtryve <pierre.vanhoutryve@amd.com> |
[IR] Memory Model Relaxation Annotations (#78569)
Implements the core/target-agnostic components of Memory Model
Relaxation Annotations.
RFC:
https://discourse.llvm.org/t/rfc-mmras-memory-model-relaxation-annotations/76361/5
|
Revision tags: llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1 |
|
#
a1a590ef |
| 05-Mar-2024 |
Yingwei Zheng <dtcxzyw2333@gmail.com> |
[InstCombine] Fix miscompilation in PR83947 (#83993)
https://github.com/llvm/llvm-project/blob/762f762504967efbe159db5c737154b989afc9bb/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp#L394-L407
Comment from @topperc:
> This transform assumes the mask is a non-zero splat. We only know it's
a splat and not provably all 0s. The mask is a constexpr that includes
the address of the global variable. We can't resolve the constant
expression to an exact value.
Fixes #83947.
|
Revision tags: llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init |
|
#
92289db8 |
| 17-Jan-2024 |
Alexandros Lamprineas <alexandros.lamprineas@arm.com> |
[VFABI] Move the Vector ABI demangling utility to LLVMCore. (#77513)
This fixes #71892, allowing us to check mangled names in the IR verifier.
|
#
e512df3e |
| 02-Jan-2024 |
Alexandros Lamprineas <alexandros.lamprineas@arm.com> |
[LV] Fix crash when vectorizing function calls with linear args. (#76274)
llvm/lib/IR/Type.cpp:694:
Assertion `isValidElementType(ElementType) && "Element type of a
VectorType must be an integer, floating point, or pointer type."'
failed.
Stack dump:
llvm::FixedVectorType::get(llvm::Type*, unsigned int)
llvm::VPWidenCallRecipe::execute(llvm::VPTransformState&)
llvm::VPBasicBlock::execute(llvm::VPTransformState*)
llvm::VPRegionBlock::execute(llvm::VPTransformState*)
llvm::VPlan::execute(llvm::VPTransformState*)
...
Happens with function calls of void return type.
show more ...
|
#
ddb6db4d |
| 19-Dec-2023 |
Paschalis Mpeis <paschalis.mpeis@arm.com> |
[VFABI] Create FunctionType for vector functions (#75058)
`createFunctionType` returns a FunctionType that may contain a mask,
which is currently placed as the last parameter to the Function.
The placement happens according to `VFParameters` of `VFInfo`, and it
should be able to handle VFABI specification changes.
Regarding the return type, it uses the scalar type of the input instruction,
as the specification does not encode in the mangled name such information.
If that ever happens, that information should be available from `VFInfo`.
|