.. include:: <isonum.txt>

==================================================
Performance
==================================================

High-Performance Generalized Matrix Multiplication
--------------------------------------------------

Polly automatically detects and optimizes generalized matrix multiplication,
the computation C |larr| α ⊗ C ⊕ β ⊗ A ⊗ B, where A, B, and C are three
appropriately sized matrices, ⊕ and ⊗ are the operations of the corresponding
matrix semiring, and α and β are constants with β not equal to zero. The
result is a highly optimized form structured similarly to the expert
implementations of GEMM found in GotoBLAS and its successors. The performance
evaluation of GEMM is shown in the following figure.

.. image:: images/GEMM_double.png
   :align: center

Compile Time Impact of Polly
----------------------------

Clang+LLVM+Polly are compiled using Clang on an Intel(R) Core(TM) i7-7700 based
system. The experiment is run in two configurations, with and without Polly
enabled, in order to measure Polly's compile time impact.

The following versions are used:

- Polly (git hash 0db98a4837b6f233063307bb9184374175401922)
- Clang (git hash 3e1d04a92b51ed36163995c96c31a0e4bbb1561d)
- LLVM (git hash 0265ec7ebad69a47f5c899d95295b5eb41aba68e)

`ninja <https://ninja-build.org/>`_ is used as the build system.

In both configurations the whole compilation was performed five times. The
compile times in seconds are shown in the following table.

+--------------+-------------+
|Polly Disabled|Polly Enabled|
+==============+=============+
|964           |977          |
+--------------+-------------+
|964           |980          |
+--------------+-------------+
|967           |981          |
+--------------+-------------+
|967           |981          |
+--------------+-------------+
|968           |982          |
+--------------+-------------+

The median compile time without Polly enabled is 967 seconds and with Polly
enabled it is 981 seconds. The overhead is 1.4%.
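
The reported medians and overhead can be checked from the table with a few lines of Python. This is only a sketch of the arithmetic: the variable names are illustrative, and it assumes the overhead is the relative difference between the two medians, which matches the 1.4% figure quoted above.

```python
from statistics import median

# Compile times in seconds, copied from the table above.
polly_disabled = [964, 964, 967, 967, 968]
polly_enabled = [977, 980, 981, 981, 982]

median_disabled = median(polly_disabled)
median_enabled = median(polly_enabled)

# Assumed definition: overhead relative to the median without Polly, in percent.
overhead_percent = (median_enabled - median_disabled) / median_disabled * 100

print(f"median without Polly: {median_disabled} s")
print(f"median with Polly:    {median_enabled} s")
print(f"overhead:             {overhead_percent:.1f}%")
```

Using the median rather than the mean keeps a single slow or fast build from skewing the comparison.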