extractelement.ll - OpenGrok history log for /llvm-project/llvm/test/Analysis/CostModel/RISCV/extractelement.ll

Revision (<<< Hide revision tags) (Show revision tags >>>)	Date	Author	Comments
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2
# 3d7df0a5	27-Sep-2023	Sergey Kachkov <109674256+skachkov-sc@users.noreply.github.com>	[RISCV][CostModel] Estimate cost of Extract/InsertElement with non-constant index when vector instructions are not available (#67334) This patch fixes the compilation time issue of matrix-types-spec [RISCV][CostModel] Estimate cost of Extract/InsertElement with non-constant index when vector instructions are not available (#67334) This patch fixes the compilation time issue of matrix-types-spec test from test-suite. Reproduction of the problem: ``` clang++ -DNDEBUG --target=riscv64-linux-gnu --sysroot=<sysroot path> --gcc-toolchain=<gcc path> -O2 -fenable-matrix <test-suite-path>/SingleSource/UnitTests/matrix-types-spec.cpp ``` On my machine, compilation takes 50.44s. In comparison, the same test with RVV (-march=rv64gcv) compiles in 3.06s, and for x86-64 target it takes 1.71s. It turns out that the main issue is unrolling of loop in multiplySpec function, that has extractelements with non-constant index: ``` for.body9.i: ; preds = %for.body9.i, %for.cond6.preheader.i %indvars.iv.i92 = phi i64 [ 0, %for.cond6.preheader.i ], [ %indvars.iv.next.i93, %for.body9.i ] %Elt.033.i = phi double [ 0.000000e+00, %for.cond6.preheader.i ], [ %80, %for.body9.i ] %77 = mul nuw nsw i64 %indvars.iv.i92, 25 %78 = add nuw nsw i64 %77, %indvars.iv39.i91 %matrixext.i = extractelement <475 x double> %62, i64 %78 %79 = add nuw nsw i64 %indvars.iv.i92, %74 %matrixext13.i = extractelement <209 x double> %73, i64 %79 %80 = tail call double @llvm.fmuladd.f64(double %matrixext.i, double %matrixext13.i, double %Elt.033.i) %indvars.iv.next.i93 = add nuw nsw i64 %indvars.iv.i92, 1 %exitcond.not.i94 = icmp eq i64 %indvars.iv.next.i93, 19 br i1 %exitcond.not.i94, label %for.cond.cleanup8.i, label %for.body9.i, !llvm.loop !21 ``` When RVV is supported, extractelement/insertelement with non-constant index can be lowered quite efficiently with vslidedown/vslideup; otherwise it's lowered via stack memory operations, i.e. for extractelement each vector element is stored on stack and then the needed element is loaded back; for insertelement is stores all vector elements, rewrites the required element value and then loads vector back. Currently the cost of such expensive operation is estimated as zero, so loop unroll processes the loop more aggresively. The proper estimation of cost (in a way like in X86 target) prohibits unrolling of this loop and fixes compilation time (2.77s on my machine). show more ...

Revision (<<< Hide revision tags) (Show revision tags >>>)

Date

Author

Comments

Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2

# 3d7df0a5

27-Sep-2023

Sergey Kachkov <109674256+skachkov-sc@users.noreply.github.com>

[RISCV][CostModel] Estimate cost of Extract/InsertElement with non-constant index when vector instructions are not available (#67334)

This patch fixes the compilation time issue of matrix-types-spec

[RISCV][CostModel] Estimate cost of Extract/InsertElement with non-constant index when vector instructions are not available (#67334)

This patch fixes the compilation time issue of matrix-types-spec test
from test-suite.

Reproduction of the problem:
```
clang++ -DNDEBUG --target=riscv64-linux-gnu --sysroot=<sysroot path> --gcc-toolchain=<gcc path> -O2 -fenable-matrix <test-suite-path>/SingleSource/UnitTests/matrix-types-spec.cpp
```

On my machine, compilation takes 50.44s. In comparison, the same test
with RVV (-march=rv64gcv) compiles in 3.06s, and for x86-64 target it
takes 1.71s. It turns out that the main issue is unrolling of loop in
multiplySpec function, that has extractelements with non-constant index:
```
for.body9.i: ; preds = %for.body9.i, %for.cond6.preheader.i
%indvars.iv.i92 = phi i64 [ 0, %for.cond6.preheader.i ], [ %indvars.iv.next.i93, %for.body9.i ]
%Elt.033.i = phi double [ 0.000000e+00, %for.cond6.preheader.i ], [ %80, %for.body9.i ]
%77 = mul nuw nsw i64 %indvars.iv.i92, 25
%78 = add nuw nsw i64 %77, %indvars.iv39.i91
%matrixext.i = extractelement <475 x double> %62, i64 %78
%79 = add nuw nsw i64 %indvars.iv.i92, %74
%matrixext13.i = extractelement <209 x double> %73, i64 %79
%80 = tail call double @llvm.fmuladd.f64(double %matrixext.i, double %matrixext13.i, double %Elt.033.i)
%indvars.iv.next.i93 = add nuw nsw i64 %indvars.iv.i92, 1
%exitcond.not.i94 = icmp eq i64 %indvars.iv.next.i93, 19
br i1 %exitcond.not.i94, label %for.cond.cleanup8.i, label %for.body9.i, !llvm.loop !21
```

When RVV is supported, extractelement/insertelement with non-constant
index can be lowered quite efficiently with vslidedown/vslideup;
otherwise it's lowered via stack memory operations, i.e. for
extractelement each vector element is stored on stack and then the
needed element is loaded back; for insertelement is stores all vector
elements, rewrites the required element value and then loads vector
back. Currently the cost of such expensive operation is estimated as
zero, so loop unroll processes the loop more aggresively. The proper
estimation of cost (in a way like in X86 target) prohibits unrolling of
this loop and fixes compilation time (2.77s on my machine).