.. include:: <isonum.txt>

==================================================
Performance
==================================================

High-Performance Generalized Matrix Multiplication
--------------------------------------------------

Polly automatically detects and optimizes generalized matrix multiplication,
the computation C |larr| α ⊗ C ⊕ β ⊗ A ⊗ B, where A, B, and C are appropriately sized matrices,
⊕ and ⊗ are the operations of the corresponding matrix semiring, and α and β are
constants with β not equal to zero. Polly transforms such a computation into a highly optimized form
structured similarly to the expert implementations of GEMM found in GotoBLAS and its successors.
The results of a GEMM performance evaluation are shown in the following figure.

    .. image:: images/GEMM_double.png
       :align: center

Compile Time Impact of Polly
----------------------------

Clang+LLVM+Polly is compiled with Clang on an Intel(R) Core(TM) i7-7700-based system. To measure
Polly's compile-time impact, the build is performed in two configurations: with and without Polly enabled.

The following versions are used:

- Polly (git hash 0db98a4837b6f233063307bb9184374175401922)
- Clang (git hash 3e1d04a92b51ed36163995c96c31a0e4bbb1561d)
- LLVM (git hash 0265ec7ebad69a47f5c899d95295b5eb41aba68e)

`ninja <https://ninja-build.org/>`_ is used as the build system.

For each configuration, the full build was performed five times. The compile times, in seconds, are
shown in the following table.

+-------------------+------------------+
|Polly Disabled (s) |Polly Enabled (s) |
+===================+==================+
|964                |977               |
+-------------------+------------------+
|964                |980               |
+-------------------+------------------+
|967                |981               |
+-------------------+------------------+
|967                |981               |
+-------------------+------------------+
|968                |982               |
+-------------------+------------------+

The median compile time is 967 seconds without Polly and 981 seconds with Polly enabled, so Polly adds
(981 − 967) / 967 ≈ 1.4% to the compile time.