==========================
OpenMP-Aware Optimizations
==========================

LLVM, since `version 11 <https://releases.llvm.org/download.html#11.0.0>`_ (12
Oct 2020), supports an :ref:`OpenMP-Aware optimization pass <OpenMPOpt>`. This
optimization pass attempts to optimize the module with OpenMP-specific domain
knowledge. The pass is enabled by default at high optimization levels (O2 / O3)
when compiling with OpenMP support enabled.

.. _OpenMPOpt:

OpenMPOpt
=========

.. contents::
   :local:
   :depth: 1

OpenMPOpt contains several OpenMP-Aware optimizations. This pass is run early on
the entire Module, and later on the entire call graph. Most optimizations done
by OpenMPOpt support remarks. Optimization remarks can be enabled by compiling
with the following flags.

.. code-block:: console

  $ clang -Rpass=openmp-opt -Rpass-missed=openmp-opt -Rpass-analysis=openmp-opt

OpenMP Runtime Call Deduplication
---------------------------------

The OpenMP runtime library contains several functions used to implement features
of the OpenMP standard. Several of the runtime calls are constant within a
parallel region. A common optimization is to replace invariant code with a
single reference, but in this case the compiler only sees an opaque call into
the runtime library. To get around this, OpenMPOpt maintains a list of OpenMP
runtime functions that are constant and manually deduplicates them.
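
As a hedged sketch (the call name is from the OpenMP API, but whether a
particular call is actually deduplicated depends on the pass and the compiler
version), consider two calls to ``omp_get_num_threads()`` inside one parallel
region; the second call can be replaced with the result of the first:

.. code-block:: c++

  #if defined(_OPENMP)
  #include <omp.h>
  #else
  // Serial fallback so this sketch also builds without OpenMP enabled.
  static int omp_get_num_threads() { return 1; }
  #endif

  int use(int x) { return x; } // stand-in for an arbitrary use of the value

  int foo() {
    int sum = 0;
  #pragma omp parallel reduction(+ : sum)
    {
      // omp_get_num_threads() returns the same value throughout this region,
      // so OpenMPOpt can reuse the result of the first call for the second.
      sum += use(omp_get_num_threads());
      sum += use(omp_get_num_threads());
    }
    return sum;
  }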

Globalization
-------------

The OpenMP standard requires that data can be shared between different threads.
This requirement poses a unique challenge when offloading to GPU accelerators.
By default, data cannot be shared between the threads of a GPU; to share it,
the data must be placed in global or shared memory. This must be done every
time a variable may potentially be shared in order to create a correct OpenMP
program. Unfortunately, this has significant performance implications and is
not needed in the majority of cases. For example, when Clang generates code for
the offloading region below, it will see that the variable ``x`` escapes and is
potentially shared. This requires globalizing the variable, which means it
cannot reside in registers on the device.

.. code-block:: c++

  void use(void *) { }

  void foo() {
    int x;
    use(&x);
  }

  int main() {
  #pragma omp target parallel
    foo();
  }

66
67In many cases, this transformation is not actually necessary but still carries a
68significant performance penalty. Because of this, OpenMPOpt can perform and
69inter-procedural optimization and scan each known usage of the globalized
70variable and determine if it is potentially captured and shared by another
71thread. If it is not actually captured, it can safely be moved back to fast
72register memory.
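
As an illustrative sketch of this analysis (the helper functions here are
hypothetical, not part of any API), the pass distinguishes pointers that are
merely read through from pointers that escape to storage another thread could
reach:

.. code-block:: c++

  void *shared_slot; // hypothetical location visible to other threads

  int read_only(int *p) { return *p; }       // does not capture p
  void capture(int *p) { shared_slot = p; }  // p escapes through shared_slot

  void foo() {
    int x = 42;
    read_only(&x); // x is never captured: globalization can be undone
  }

  void bar() {
    int y = 0;
    capture(&y);   // y may be reachable by another thread: stays globalized
  }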

Another case is memory that is intentionally shared between the threads, but
shared from one thread to all the others. Such variables can be moved to shared
memory at compile time, without needing to go through the runtime library. This
allows users to confidently declare shared memory on the device without needing
to use custom OpenMP allocators or rely on the runtime.

.. code-block:: c++

  static void share(void *);

  static void foo() {
    int x[64];
  #pragma omp parallel
    share(x);
  }

  int main() {
  #pragma omp target
    foo();
  }


These optimizations can have very large performance implications. Both of them
rely heavily on inter-procedural analysis, so offloading applications should
ideally be contained in a single translation unit and functions should not be
externally visible unless needed. If remarks are enabled, OpenMPOpt will inform
the user of any globalization calls that remain; these should be treated as a
defect in the program.
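
For example (an illustrative invocation; ``omp.cpp`` stands in for the
application source, and the flags shown are standard Clang options), functions
can be kept internal by declaring them ``static`` or by compiling with hidden
visibility:

.. code-block:: console

  $ clang++ -O2 -fopenmp -fvisibility=hidden omp.cpp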

Resources
=========

- 2021 OpenMP Webinar: "A Compiler's View of OpenMP" https://youtu.be/eIMpgez61r4
- 2020 LLVM Developers’ Meeting: "(OpenMP) Parallelism-Aware Optimizations" https://youtu.be/gtxWkeLCxmU
- 2019 EuroLLVM Developers’ Meeting: "Compiler Optimizations for (OpenMP) Target Offloading to GPUs" https://youtu.be/3AbS82C3X30