==========================
OpenMP-Aware Optimizations
==========================

LLVM, since `version 11 <https://releases.llvm.org/download.html#11.0.0>`_ (12
Oct 2020), supports an :ref:`OpenMP-Aware optimization pass <OpenMPOpt>`. This
optimization pass will attempt to optimize the module with OpenMP-specific
domain-knowledge. This pass is enabled by default at high optimization levels
(O2 / O3) if compiling with OpenMP support enabled.

.. _OpenMPOpt:

OpenMPOpt
=========

.. contents::
   :local:
   :depth: 1

OpenMPOpt contains several OpenMP-Aware optimizations. This pass is run early on
the entire Module, and later on the entire call graph. Most optimizations done
by OpenMPOpt support remarks. Optimization remarks can be enabled by compiling
with the following flags.

.. code-block:: console

  $ clang -Rpass=openmp-opt -Rpass-missed=openmp-opt -Rpass-analysis=openmp-opt

OpenMP Runtime Call Deduplication
---------------------------------

The OpenMP runtime library contains several functions used to implement features
of the OpenMP standard. Several of the runtime calls are constant within a
parallel region. A common optimization is to replace invariant code with a
single reference, but in this case the compiler will only see an opaque call
into the runtime library. To get around this, OpenMPOpt maintains a list of
OpenMP runtime functions that are constant and will manually deduplicate them.

Globalization
-------------

The OpenMP standard requires that data can be shared between different threads.
This requirement poses a unique challenge when offloading to GPU accelerators.
Data cannot be shared between the threads in a GPU by default; to be shared, it
must be placed in either global or shared memory. This needs to be done every
time a variable may potentially be shared in order to create correct OpenMP
programs.
Unfortunately, this has significant performance implications
and is not needed in the majority of cases. For example, when Clang is
generating code for this offloading region, it will see that the variable ``x``
escapes and is potentially shared. This will require globalizing the variable,
which means it cannot reside in the registers on the device.

.. code-block:: c++

  void use(void *) { }

  void foo() {
    int x;
    use(&x);
  }

  int main() {
  #pragma omp target parallel
    foo();
  }

In many cases, this transformation is not actually necessary but still carries a
significant performance penalty. Because of this, OpenMPOpt can perform an
inter-procedural optimization that scans each known usage of the globalized
variable and determines if it is potentially captured and shared by another
thread. If it is not actually captured, it can safely be moved back to fast
register memory.

Another case is memory that is intentionally shared between the threads, but is
shared from one thread to all the others. Such variables can be moved to shared
memory when compiled without needing to go through the runtime library. This
allows users to confidently declare shared memory on the device without
needing to use custom OpenMP allocators or rely on the runtime.

.. code-block:: c++

  static void share(void *);

  static void foo() {
    int x[64];
  #pragma omp parallel
    share(x);
  }

  int main() {
  #pragma omp target
    foo();
  }

These optimizations can have very large performance implications. Both of these
optimizations rely heavily on inter-procedural analysis. Because of this,
offloading applications should ideally be contained in a single translation unit
and functions should not be externally visible unless needed. OpenMPOpt will
inform the user if any globalization calls remain if remarks are enabled. This
should be treated as a defect in the program.
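
As a minimal sketch of the advice above, the first example can be rewritten so
that every helper has internal linkage and lives in the same translation unit as
the target region. The names ``use``, ``foo``, and ``run_target_region`` are
illustrative, and the initial value and the increment are assumptions added only
so the result can be checked. Whether OpenMPOpt actually keeps ``x`` out of
global memory depends on the compiler version and optimization level.

```cpp
#include <cassert>

// Hypothetical helper. Internal linkage (static) means the optimizer can see
// every caller and every use of the pointer argument in this translation unit.
static void use(int *p) { *p += 1; }

// The address of x is taken, but it is only passed to use(), which never
// hands it to another thread.
static int foo() {
  int x = 41;
  use(&x);
  return x;
}

// Hypothetical driver: runs foo() inside a target region and copies the
// result back to the host. Without OpenMP support the pragma is ignored and
// the block simply runs on the host.
int run_target_region() {
  int result = 0;
#pragma omp target map(tofrom : result)
  { result = foo(); }
  return result;
}
```

Building a program like this for a device with the ``-Rpass`` flags shown
earlier should indicate whether any globalization remains.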

Resources
=========

- 2021 OpenMP Webinar: "A Compiler's View of OpenMP" https://youtu.be/eIMpgez61r4
- 2020 LLVM Developers’ Meeting: "(OpenMP) Parallelism-Aware Optimizations" https://youtu.be/gtxWkeLCxmU
- 2019 EuroLLVM Developers’ Meeting: "Compiler Optimizations for (OpenMP) Target Offloading to GPUs" https://youtu.be/3AbS82C3X30