Revision tags: llvmorg-21-init |
|
#
3a4376b8 |
| 27-Jan-2025 |
Ramkumar Ramachandra <ramkumar.ramachandra@codasip.com> |
LAA: handle 0 return from getPtrStride correctly (#124539)
getPtrStride returns 0 when the PtrScev is loop-invariant, and this is
not an erroneous value: it returns std::nullopt to communicate that
LAA: handle 0 return from getPtrStride correctly (#124539)
getPtrStride returns 0 when the PtrScev is loop-invariant, and this is
not an erroneous value: it returns std::nullopt to communicate that it
was not able to find a valid pointer stride. In analyzeLoop, we call
getPtrStride with a value_or(0) conflating the zero return value with
std::nullopt. Fix this, handling loop-invariant loads correctly.
show more ...
|
#
b7286dbe |
| 27-Jan-2025 |
David Sherwood <david.sherwood@arm.com> |
Reland "[LoopVectorize] Add support for reverse loops in isDereferenceableAndAlignedInLoop #96752" (#123616)
The last attempt failed a sanitiser build because we were
creating a reference to a null
Reland "[LoopVectorize] Add support for reverse loops in isDereferenceableAndAlignedInLoop #96752" (#123616)
The last attempt failed a sanitiser build because we were
creating a reference to a null Predicates pointer in
isDereferenceableAndAlignedInLoop. This was exposed by
the unit test IsDerefReadOnlyLoop in
unittests/Analysis/LoadsTest.cpp. I fixed this by falling
back on getConstantMaxBackedgeTakenCount if Predicates is
null - see line 316 in llvm/lib/Analysis/Loads.cpp. There
are no other changes.
show more ...
|
#
a00938ee |
| 15-Jan-2025 |
David Sherwood <david.sherwood@arm.com> |
Revert "[LoopVectorize] Add support for reverse loops in isDereferenceableAndAlignedInLoop (#96752)" (#123057)
This reverts commit bfedf6460c2cad6e6f966b457d8d27084579dcd8.
|
#
bfedf646 |
| 15-Jan-2025 |
David Sherwood <david.sherwood@arm.com> |
[LoopVectorize] Add support for reverse loops in isDereferenceableAndAlignedInLoop (#96752)
Currently when we encounter a negative step in the induction
variable isDereferenceableAndAlignedInLoop b
[LoopVectorize] Add support for reverse loops in isDereferenceableAndAlignedInLoop (#96752)
Currently when we encounter a negative step in the induction
variable isDereferenceableAndAlignedInLoop bails out because
the element size is signed greater than the step. This patch
adds support for negative steps in cases where we detect the
start address for the load is of the form base + offset. In
this case the address decrements in each iteration so we need
to calculate the access size differently. I have done this by
caling getStartAndEndForAccess from LoopAccessAnalysis.cpp.
The motivation for this patch comes from PR #88385 where a
reviewer requested reusing isDereferenceableAndAlignedInLoop,
but that PR itself does support reverse loops.
The changed test in LoopVectorize/X86/load-deref-pred.ll now
passes because previously we were calculating the total access
size incorrectly, whereas now it is 412 bytes and fits
perfectly into the alloca.
show more ...
|
Revision tags: llvmorg-19.1.7 |
|
#
8b456146 |
| 13-Jan-2025 |
Ramkumar Ramachandra <ramkumar.ramachandra@codasip.com> |
LAA: add missed swap when inverting src, sink (#122254)
When inverting source and sink on a negative induction step, the types
of the source and sink should also be swapped. This fixes a bug in the
LAA: add missed swap when inverting src, sink (#122254)
When inverting source and sink on a negative induction step, the types
of the source and sink should also be swapped. This fixes a bug in the
code that follows, that computes properties based on these types. With
234cc40 ([LAA] Limit no-overlap check to at least one loop-invariant
accesses.), that code is guarded by a loop-invariant condition: however,
the commit did not add any new tests exercising the guarded code, and
hence the bugfix in this patch requires additional tests to exercise
that guarded codepath.
show more ...
|
#
17912f33 |
| 09-Jan-2025 |
Ramkumar Ramachandra <ramkumar.ramachandra@codasip.com> |
LAA: refactor dependence class to prep for scaled strides (NFC) (#122113)
Rearrange the DepDistanceAndSizeInfo struct in preparation to scale
strides. getDependenceDistanceStrideAndSize now returns
LAA: refactor dependence class to prep for scaled strides (NFC) (#122113)
Rearrange the DepDistanceAndSizeInfo struct in preparation to scale
strides. getDependenceDistanceStrideAndSize now returns the data of
CommonStride, MaxStride, and clarifies when to retry with runtime
checks, in place of (unscaled) strides.
show more ...
|
Revision tags: llvmorg-19.1.6 |
|
#
bc0976ed |
| 10-Dec-2024 |
Nikita Popov <npopov@redhat.com> |
[LAA] Strip non-inbounds offset in getPointerDiff() (NFC) (#118665)
I believe that this code doesn't care whether the offsets are known to
be inbounds a priori. For the same reason the change is no
[LAA] Strip non-inbounds offset in getPointerDiff() (NFC) (#118665)
I believe that this code doesn't care whether the offsets are known to
be inbounds a priori. For the same reason the change is not testable, as
the SCEV based fallback code will look through non-inbounds offsets
anyway. So make it clear that there is no special inbounds requirement
here.
show more ...
|
Revision tags: llvmorg-19.1.5 |
|
#
aa5cdcea |
| 28-Nov-2024 |
Ramkumar Ramachandra <ramkumar.ramachandra@codasip.com> |
LAA: improve code in a couple of routines (NFC) (#108092)
|
Revision tags: llvmorg-19.1.4 |
|
#
a353e258 |
| 05-Nov-2024 |
Florian Hahn <flo@fhahn.com> |
[LAA] Don't require Stride == 1/-1 for inbounds pointer AddRecs nowrap. (#113126)
If we have a pointer AddRec, the maximum increment is
2^(pointer-index-wdith - 1) - 1. This means that if increment
[LAA] Don't require Stride == 1/-1 for inbounds pointer AddRecs nowrap. (#113126)
If we have a pointer AddRec, the maximum increment is
2^(pointer-index-wdith - 1) - 1. This means that if incrementing the
AddRec wraps, the distance between the previously accessed location and
the wrapped location is > 2^(pointer-index-wdith - 1), i.e. if the GEP
for the AddRec is inbounds, this would be poison due to the object being
larger than half the pointer index type space. The poison would be
immediate UB when the memory access gets executed..
Similar reasoning can be applied for decrements.
PR: https://github.com/llvm/llvm-project/pull/113126
show more ...
|
Revision tags: llvmorg-19.1.3 |
|
#
d897ea37 |
| 22-Oct-2024 |
Ramkumar Ramachandra <ramkumar.ramachandra@codasip.com> |
LAA: check nusw on GEP in place of inbounds (#112223)
With the introduction of the nusw flag in GEPNoWrapFlags, it should be
safe to weaken the check in LoopAccessAnalysis to just check the nusw
f
LAA: check nusw on GEP in place of inbounds (#112223)
With the introduction of the nusw flag in GEPNoWrapFlags, it should be
safe to weaken the check in LoopAccessAnalysis to just check the nusw
flag on the GEP, instead of inbounds.
show more ...
|
#
f719cfa8 |
| 22-Oct-2024 |
Ramkumar Ramachandra <ramkumar.ramachandra@codasip.com> |
LAA: be less conservative in isNoWrap (#112553)
isNoWrap has exactly one caller which handles Assume = true separately,
but too conservatively. Instead, pass Assume to isNoWrap, so it is
threaded
LAA: be less conservative in isNoWrap (#112553)
isNoWrap has exactly one caller which handles Assume = true separately,
but too conservatively. Instead, pass Assume to isNoWrap, so it is
threaded into getPtrStride, which has the correct handling for the
Assume flag. Also note that the Stride == 1 check in isNoWrap is
incorrect: getPtrStride returns Strides == 1 or -1, except when
isNoWrapAddRec or Assume are true, assuming ShouldCheckWrap is true; we
can include the case of -1 Stride, and when isNoWrapAddRec is true. With
this change, passing Assume = true to getPtrStride could return a
non-unit stride, and we correctly handle that case as well.
show more ...
|
Revision tags: llvmorg-19.1.2 |
|
#
0614b3cf |
| 07-Oct-2024 |
Kazu Hirata <kazu@google.com> |
[Analysis] Simplify code with DenseMap::operator[] (NFC) (#111331)
|
#
dec4cfdb |
| 04-Oct-2024 |
Florian Hahn <flo@fhahn.com> |
[LAA] Use loop guards when checking invariant accesses.
Apply loop guards to start and end pointers like done in other places to improve results.
|
Revision tags: llvmorg-19.1.1 |
|
#
50a1ab12 |
| 23-Sep-2024 |
Benjamin Maxwell <benjamin.maxwell@arm.com> |
[LAA] Don't assume libcalls with output/input pointers can be vectorized (#108980)
LoopAccessAnalysis currently does not check/track aliasing from the
output pointers, but assumes vectorizing libra
[LAA] Don't assume libcalls with output/input pointers can be vectorized (#108980)
LoopAccessAnalysis currently does not check/track aliasing from the
output pointers, but assumes vectorizing library calls with a mapping is
safe.
This can result in incorrect codegen if something like the following is
vectorized:
```
for(int i=0; i<N; i++) {
// No aliasing between input and output pointers detected.
sincos(cos_out[0], sin_out+i, cos_out+i);
}
```
Where for VF >= 2 `cos_out[1]` to `cos_out[VF-1]` is the cosine of the
original value of `cos_out[0]` not the updated value.
show more ...
|
Revision tags: llvmorg-19.1.0, llvmorg-19.1.0-rc4 |
|
#
d43a8093 |
| 27-Aug-2024 |
Florian Hahn <flo@fhahn.com> |
Revert "[LAA] Remove loop-invariant check added in 234cc40adc61."
This reverts commit a80053322b765eec93951e21db490c55521da2d8.
The new asserts exposed an underlying issue where the expanded bounds
Revert "[LAA] Remove loop-invariant check added in 234cc40adc61."
This reverts commit a80053322b765eec93951e21db490c55521da2d8.
The new asserts exposed an underlying issue where the expanded bounds could wrap, causing the parts of the code to incorrectly determine that accesses do not overlap.
Reproducer below based on @mstorsjo's test case.
opt -passes='print<access-info>'
target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
define i32 @j(ptr %P, i32 %x, i32 %y) { entry: %gep.P.4 = getelementptr inbounds nuw i8, ptr %P, i32 4 %gep.P.8 = getelementptr inbounds nuw i8, ptr %P, i32 8 br label %loop
loop: %1 = phi i32 [ %x, %entry ], [ %sel, %loop.latch ] %iv = phi i32 [ %y, %entry ], [ %iv.next, %loop.latch ] %gep.iv = getelementptr inbounds i64, ptr %gep.P.8, i32 %iv %l = load i32, ptr %gep.iv, align 4 %c.1 = icmp eq i32 %l, 3 br i1 %c.1, label %loop.latch, label %if.then
if.then: ; preds = %for.body store i64 0, ptr %gep.iv, align 4 %l.2 = load i32, ptr %gep.P.4 br label %loop.latch
loop.latch: %sel = phi i32 [ %l.2, %if.then ], [ %1, %loop ] %iv.next = add nsw i32 %iv, 1 %c.2 = icmp slt i32 %iv.next, %sel br i1 %c.2, label %loop, label %exit
exit: %res = phi i32 [ %iv.next, %loop.latch ] ret i32 %res }
show more ...
|
#
a8005332 |
| 26-Aug-2024 |
Florian Hahn <flo@fhahn.com> |
[LAA] Remove loop-invariant check added in 234cc40adc61.
234cc40adc61 introduced a loop-invariance check to limit the compile-time impact of the newly added checks.
This patch removes the restricti
[LAA] Remove loop-invariant check added in 234cc40adc61.
234cc40adc61 introduced a loop-invariance check to limit the compile-time impact of the newly added checks.
This patch removes the restriction and avoids extra compile-time impact by sinking the check to exits where we would return an unknown dependence. This notably reduces the amount the extra checks are executed while not missing out on any improvements from them.
https://llvm-compile-time-tracker.com/compare.php?from=33e7cd6ff23f6c904314d17c68dc58168fd32d09&to=7c55e66d4f31ce8262b90c119a8e84e1f9515ff1&stat=instructions:u
show more ...
|
#
d7c84d7b |
| 21-Aug-2024 |
Florian Hahn <flo@fhahn.com> |
[LAA] Collect loop guards only once in MemoryDepChecker (NFCI).
This on its own gives small compile-time improvements in some configs and enables using loop guards at more places in the future while
[LAA] Collect loop guards only once in MemoryDepChecker (NFCI).
This on its own gives small compile-time improvements in some configs and enables using loop guards at more places in the future while keeping compile-time impact low.
https://llvm-compile-time-tracker.com/compare.php?from=c44202574ff9a8c0632aba30c2765b134557435f&to=55ffc3dd920fa9af439fd39f8f9cc13509531420&stat=instructions:u
show more ...
|
Revision tags: llvmorg-19.1.0-rc3 |
|
#
6a84af70 |
| 16-Aug-2024 |
Nikita Popov <npopov@redhat.com> |
[LAA] Use computeConstantDifference() (#103725)
Use computeConstantDifference() instead of casting getMinusSCEV() to
SCEVConstant. This can be much faster in some cases, because
computeConstantDif
[LAA] Use computeConstantDifference() (#103725)
Use computeConstantDifference() instead of casting getMinusSCEV() to
SCEVConstant. This can be much faster in some cases, because
computeConstantDifference() computes the result without creating new
SCEV expressions.
This improves LTO/ThinLTO compile-time for lencod by more than 10%.
I've verified that computeConstantDifference() does not produce worse
results than the previous code for anything in llvm-test-suite. This
required raising the iteration cutoff to 6. I ended up increasing it to
8 just to be on the safe side (for code outside llvm-test-suite), and
because this doesn't materially affect compile-time anyway (we'll almost
always bail out earlier).
show more ...
|
Revision tags: llvmorg-19.1.0-rc2 |
|
#
edf46f36 |
| 03-Aug-2024 |
Florian Hahn <flo@fhahn.com> |
[SCEV] Use const SCEV * explicitly in more places.
Use const SCEV * explicitly in more places to prepare for https://github.com/llvm/llvm-project/pull/91961. Split off as suggested.
|
#
844c188c |
| 26-Jul-2024 |
Florian Hahn <flo@fhahn.com> |
[LAA] Refine stride checks for SCEVs during dependence analysis. (#99577)
Update getDependenceDistanceStrideAndSize to reason about different
combinations of strides directly and explicitly.
Upd
[LAA] Refine stride checks for SCEVs during dependence analysis. (#99577)
Update getDependenceDistanceStrideAndSize to reason about different
combinations of strides directly and explicitly.
Update getPtrStride to return 0 for invariant pointers.
Then proceed by checking the strides.
If either source or sink are not strided by a constant (i.e. not a
non-wrapping AddRec) or invariant, the accesses may overlap
with earlier or later iterations and we cannot generate runtime
checks to disambiguate them.
Otherwise they are either loop invariant or strided. In that case, we
can generate a runtime check to disambiguate them.
If both are strided by constants, we proceed as previously.
This is an alternative to
https://github.com/llvm/llvm-project/pull/99239 and also replaces
additional checks if the underlying object is loop-invariant.
Fixes https://github.com/llvm/llvm-project/issues/87189.
PR: https://github.com/llvm/llvm-project/pull/99577
show more ...
|
Revision tags: llvmorg-19.1.0-rc1 |
|
#
3eaf9f72 |
| 25-Jul-2024 |
Ramkumar Ramachandra <ramkumar.ramachandra@codasip.com> |
LAA: fix style after cursory reading (NFC) (#100447)
|
#
2754c083 |
| 24-Jul-2024 |
Ramkumar Ramachandra <ramkumar.ramachandra@codasip.com> |
LAA: mark LoopInfo pointer const (NFC) (#100373)
|
Revision tags: llvmorg-20-init |
|
#
19c9a1c2 |
| 18-Jul-2024 |
Florian Hahn <flo@fhahn.com> |
[LAA] Include IndirectUnsafe in ::isPossiblyBackward.
Similarly to Unknown, IndirectUnsafe should also be considered possibly backward, as it may be a backwards dependency e.g. via loading different
[LAA] Include IndirectUnsafe in ::isPossiblyBackward.
Similarly to Unknown, IndirectUnsafe should also be considered possibly backward, as it may be a backwards dependency e.g. via loading different base pointers.
This also brings isPossiblyBackward in line with Dependence::isSafeForVectorization. At the moment this can't be tested, as it is not possible to write a test with an AddRec that is based on a loop varying value. But this may change in the future and may cause mis-compiles in the future.
show more ...
|
#
3ccda936 |
| 14-Jul-2024 |
Florian Hahn <flo@fhahn.com> |
[LAA] Update pointer-bounds cache to also consider access type.
The same pointer may be accessed with different types and the bound includes the size of the accessed type to compute the end. Update
[LAA] Update pointer-bounds cache to also consider access type.
The same pointer may be accessed with different types and the bound includes the size of the accessed type to compute the end. Update the cache to correctly disambiguate between different accessed types.
show more ...
|
#
22a7f6dc |
| 11-Jul-2024 |
Graham Hunter <graham.hunter@arm.com> |
Revert "[LV] Autovectorization for the all-in-one histogram intrinsic" (#98493)
Reverts llvm/llvm-project#91458 to deal with post-commit reviewer
requests.
|