|
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init |
|
| #
d69033d2 |
| 12-Jul-2023 |
Nikita Popov <npopov@redhat.com> |
[SCEVExpander] Fix GEP IV inc reuse logic for opaque pointers
Instead of checking the pointer type, check the element type of the GEP.
Previously we ended up reusing GEP increments that were not in expanded form, thus not respecting LSR's choice of representation.
The change in 2011-10-06-ReusePhi.ll recovers a regression that appeared when converting that test to opaque pointers.
Changes in various Thumb tests now compute the step outside the loop instead of using add.w inside the loop, which is LSR's preferred representation for this target.
|
|
Revision tags: llvmorg-16.0.6, llvmorg-16.0.5 |
|
| #
c4a60c9d |
| 25-May-2023 |
sgokhale <sgokhale@nvidia.com> |
[CodeGen][ShrinkWrap] Enable PostShrinkWrap by default
This is an attempt to reland D42600 and enable this optimisation by default.
This also resolves the issue pointed out in the context of PGO builds.
Differential Revision: https://reviews.llvm.org/D42600
|
|
Revision tags: llvmorg-16.0.4 |
|
| #
f4999d35 |
| 08-May-2023 |
Alan Zhao <ayzhao@google.com> |
Revert "[CodeGen][ShrinkWrap] Split restore point"
This reverts commit 1ddfd1c8186735c62b642df05c505dc4907ffac4.
The original commit causes a Chrome build assertion failure with ThinLTO: https://crbug.com/1443635
|
| #
1ddfd1c8 |
| 08-May-2023 |
sgokhale <sgokhale@nvidia.com> |
[CodeGen][ShrinkWrap] Split restore point
Try to reland D42600
Differential Revision: https://reviews.llvm.org/D42600
|
|
Revision tags: llvmorg-16.0.3, llvmorg-16.0.2 |
|
| #
bb5befef |
| 13-Apr-2023 |
sgokhale <sgokhale@nvidia.com> |
Revert "[CodeGen][ShrinkWrap] Split restore point"
This reverts commit 5f0bccc3d1a74111458c71f009817c9995f4bf83.
An issue has been reported here: https://github.com/ClangBuiltLinux/linux/issues/1833
|
| #
5f0bccc3 |
| 11-Apr-2023 |
sgokhale <sgokhale@nvidia.com> |
[CodeGen][ShrinkWrap] Split restore point
This patch splits a restore point so that it only post-dominates blocks reachable by a use or def of CSRs (Callee Saved Registers) or FI (Frame Index).
On SPEC2017, this gives around a 4% improvement on povray and no significant change on the other benchmarks.
Co-authored-by: junbuml
Differential Revision: https://reviews.llvm.org/D42600
|
|
Revision tags: llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2 |
|
| #
6da3cfc3 |
| 30-Jan-2023 |
Sergei Barannikov <barannikov88@gmail.com> |
[Thumb2] Convert some tests to opaque pointers (NFC)
|
|
Revision tags: llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7 |
|
| #
9ed2f14c |
| 14-Dec-2022 |
Nikita Popov <npopov@redhat.com> |
[AsmParser] Remove typed pointer auto-detection
IR is now always parsed in opaque pointer mode, unless -opaque-pointers=0 is explicitly given. There is no automatic detection of typed pointers anymore.
The -opaque-pointers=0 option is added to any remaining IR tests that haven't been migrated yet.
Differential Revision: https://reviews.llvm.org/D141912
|
|
Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4 |
|
| #
88ac25b3 |
| 27-Oct-2022 |
John Brawn <john.brawn@arm.com> |
[MachineCSE] Allow PRE of instructions that read physical registers
Currently MachineCSE forbids PRE when the instruction reads a physical register. Relax this so that it's allowed when the value being read is the same as what would be read in the place the instruction would be hoisted to.
This is being done in preparation for adding FPCR handling to the AArch64 backend, in order to prevent it from worsening the generated code, but for targets that already have a similar register it should improve things.
This patch affects code generation in several tests. The new code looks better except in Thumb2/LowOverheadLoops/memcall.ll, where we perform PRE but the LowOverheadLoops transformation then undoes it. Also, in AMDGPU/selectcc-opt.ll the CHECK lines make things look worse, but the function as a whole is actually better (as a MOV is PRE'd).
Differential Revision: https://reviews.llvm.org/D136675
|
| #
7a7b36e9 |
| 28-Oct-2022 |
John Brawn <john.brawn@arm.com> |
Revert "[MachineCSE] Allow PRE of instructions that read physical registers"
This reverts commit 628467e53f4ceecd2b5f0797f07591c66d9d9d2a.
This is causing a miscompile in ffmpeg when compiled for armv7.
|
| #
628467e5 |
| 27-Oct-2022 |
John Brawn <john.brawn@arm.com> |
[MachineCSE] Allow PRE of instructions that read physical registers
Currently MachineCSE forbids PRE when the instruction reads a physical register. Relax this so that it's allowed when the value being read is the same as what would be read in the place the instruction would be hoisted to.
This is being done in preparation for adding FPCR handling to the AArch64 backend, in order to prevent it from worsening the generated code, but for targets that already have a similar register it should improve things.
This patch affects code generation in several tests. The new code looks better except in Thumb2/LowOverheadLoops/memcall.ll, where we perform PRE but the LowOverheadLoops transformation then undoes it. Also, in AMDGPU/selectcc-opt.ll the CHECK lines make things look worse, but the function as a whole is actually better (as a MOV is PRE'd).
Differential Revision: https://reviews.llvm.org/D136675
|
|
Revision tags: llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3, llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2 |
|
| #
bee2f618 |
| 13-Jun-2021 |
David Green <david.green@arm.com> |
[ARM] Introduce t2WhileLoopStartTP
This adds t2WhileLoopStartTP, similar to the t2DoLoopStartTP added in D90591. It keeps a reference to both the tripcount register and the element count register, so that the ARMLowOverheadLoops pass in the backend can pick the correct one without having to search for it from the operand of a VCTP.
Differential Revision: https://reviews.llvm.org/D103236
|
|
Revision tags: llvmorg-12.0.1-rc1 |
|
| #
ce76093c |
| 14-May-2021 |
David Green <david.green@arm.com> |
[ARM] Expand predecessor search to multiple blocks when reverting WhileLoopStarts
We were previously only searching a single preheader for call instructions when reverting WhileLoopStarts to DoLoopStarts. This extends that to multiple blocks, which can come up when, for example, a loop is expanded from a memcpy. It also extends the instructions searched for from just calls to include other LoopStarts, to catch other low overhead loops in the preheader.
Differential Revision: https://reviews.llvm.org/D102269
|
| #
dfe3ffaa |
| 06-May-2021 |
Malhar Jajoo <malhar.jajoo@arm.com> |
[ARM] Transforming memset to Tail predicated Loop
This patch converts llvm.memset intrinsic into Tail Predicated Hardware loops for a target that supports the Arm M-profile Vector Extension (MVE).
The llvm.memset is converted to a TP loop for both constant and non-constant input sizes (of llvm.memset).
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D100435
|
| #
9ff38e2d |
| 06-May-2021 |
Malhar Jajoo <malhar.jajoo@arm.com> |
[ARM] Transforming memcpy to Tail predicated Loop
This patch converts llvm.memcpy intrinsic into Tail Predicated Hardware loops for a target that supports the Arm M-profile Vector Extension (MVE).
From an implementation point of view, the patch
- adds an ARM-specific SDAG node (to which the llvm.memcpy intrinsic is lowered during the first phase of ISel),
- adds a corresponding TableGen entry to generate a pseudo instruction, with a custom inserter, on matching the above node, and
- adds a custom inserter function that expands the pseudo instruction into MIR suitable to be converted (by later passes) into a WLSTP loop.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D99723
|
| #
fc690777 |
| 06-May-2021 |
Malhar Jajoo <malhar.jajoo@arm.com> |
Revert "[ARM] Transforming memcpy to Tail predicated Loop"
Reverting commit since it causes failure (10462). This reverts commit b856f4a232cbd43476e9b9f75c80aacfc6f5c152.
|
| #
b856f4a2 |
| 06-May-2021 |
Malhar Jajoo <malhar.jajoo@arm.com> |
[ARM] Transforming memcpy to Tail predicated Loop
This patch converts llvm.memcpy intrinsic into Tail Predicated Hardware loops for a target that supports the Arm M-profile Vector Extension (MVE).
From an implementation point of view, the patch
- adds an ARM-specific SDAG node (to which the llvm.memcpy intrinsic is lowered during the first phase of ISel),
- adds a corresponding TableGen entry to generate a pseudo instruction, with a custom inserter, on matching the above node, and
- adds a custom inserter function that expands the pseudo instruction into MIR suitable to be converted (by later passes) into a WLSTP loop.
Note: A CLI option controls the conversion of memcpy to a TP loop, and this option is currently disabled by default. It may be enabled in the future after further downstream testing.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D99723
|
|
Revision tags: llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1 |
|
| #
e4744994 |
| 03-Nov-2020 |
David Green <david.green@arm.com> |
[ARM] Treat memcpy/memset/memmove as call instructions for low overhead loops
If an instruction will be lowered to a call, there is no advantage in using a low overhead loop, as the LR register will need to be spilled and reloaded around the call and the low overhead loop will end up being reverted. This teaches our hardware loop lowering that these memory intrinsics will be calls in certain situations.
Differential Revision: https://reviews.llvm.org/D90439
|
| #
785080e3 |
| 03-Nov-2020 |
David Green <david.green@arm.com> |
[ARM] Low overhead loop memcpy lowering test. NFC
|