.. _omp112:

Found thread data sharing on the GPU. Expect degraded performance due to data globalization. [OMP112]
=====================================================================================================

This missed remark indicates that a globalized value was found on the target
device that was replaced neither with stack memory by :ref:`OMP110 <omp110>`
nor with shared memory by :ref:`OMP111 <omp111>`. Globalization that has not
been removed must be handled by the runtime and will significantly impact
performance.

The OpenMP standard requires that threads be able to share their data with
each other. However, this is not true by default when offloading to a target
device such as a GPU. Threads on a GPU cannot share their data unless it is
first placed in global or shared memory. In order to create standards-compliant
code, the Clang compiler will globalize any variables that could potentially be
shared between the threads. In the majority of cases, globalized variables can
either be returned to a thread-local stack or pushed to shared memory. However,
in a few cases globalization is necessary and will cause a performance penalty.

Examples
--------

This example shows legitimate data sharing on the device. It is a convoluted
example, but it is completely compliant with the OpenMP standard. If
globalization were not added, this code would produce different results on
different target devices.

.. code-block:: c++

  #include <omp.h>
  #include <cstdio>

  #pragma omp declare target
  static int *p;
  #pragma omp end declare target

  void foo() {
    int x = omp_get_thread_num();
    if (omp_get_thread_num() == 1)
      p = &x;

  #pragma omp barrier

    printf ("Thread %d: %d\n", omp_get_thread_num(), *p);
  }

  int main() {
  #pragma omp target parallel
    foo();
  }

.. code-block:: console

  $ clang++ -fopenmp -fopenmp-targets=nvptx64 -O1 -Rpass-missed=openmp-opt omp112.cpp
  omp112.cpp:9:7: remark: Found thread data sharing on the GPU. Expect degraded performance
  due to data globalization. [OMP112] [-Rpass-missed=openmp-opt]
  int x = omp_get_thread_num();
      ^

A less convoluted example of globalization that cannot be removed occurs when
calling functions that aren't visible from the current translation unit.

.. code-block:: c++

  extern void use(int *x);

  void foo() {
    int x;
    use(&x);
  }

  int main() {
  #pragma omp target parallel
    foo();
  }

.. code-block:: console

  $ clang++ -fopenmp -fopenmp-targets=nvptx64 -O1 -Rpass-missed=openmp-opt omp112.cpp
  omp112.cpp:4:7: remark: Found thread data sharing on the GPU. Expect degraded performance
  due to data globalization. [OMP112] [-Rpass-missed=openmp-opt]
  int x;
      ^

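
For contrast with the ``use(&x)`` example, the following is a minimal sketch of
a variant in which the address of the local variable never leaves ``foo``. The
helper ``use_value`` is hypothetical and not part of the original example. It
takes its argument by value and is defined in the same translation unit.
Because no pointer to ``x`` escapes, the compiler would not be expected to
need to globalize it, although whether the remark actually disappears may
depend on the compiler version and optimization level.

.. code-block:: c++

  // Hypothetical by-value stand-in for use(int *); defined in this
  // translation unit so that its body is visible to the compiler.
  int use_value(int x) { return x + 1; }

  int foo() {
    int x = 0;
    // x is passed by value and reassigned; its address is never taken.
    x = use_value(x);
    return x;
  }

  int main() {
  #pragma omp target parallel
    foo();
  }

This only applies when the callee can be changed to take and return values.
If the callee genuinely needs a pointer to thread-local data, the
globalization described above may still be required.
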
Diagnostic Scope
----------------

OpenMP target offloading missed remark.
