.. _omp110:

Moving globalized variable to the stack. [OMP110]
=================================================

This optimization remark indicates that a globalized variable was moved back to
thread-local stack memory on the device. This occurs when the optimization pass
can prove that a globalized variable cannot be shared between threads, meaning
globalization was ultimately unnecessary. Using stack memory is the best-case
scenario for data globalization because the variable can then be kept in the
device's fast register file. This optimization requires full visibility of
every use of the variable.

Globalization typically occurs when a pointer to a thread-local variable escapes
the current scope. Following the OpenMP standard, the compiler must be
pessimistic and assume that the pointer could be shared between multiple
threads. This is expensive on target offloading devices that do not allow
threads to share data by default. Instead, this data must be moved to memory
that can be shared, such as shared or global memory. This optimization moves the
data back from shared or global memory to thread-local stack memory if the data
is not actually shared between the threads.

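For contrast, the following sketch (written for illustration, not taken from a
compiler test) shows a variable that genuinely is shared. ``x`` is declared by
the initial thread of the ``target`` region and every thread of the nested
``parallel`` region updates it, so moving ``x`` to thread-local stack memory
would be incorrect and we would not expect this remark for it. The function name
``count_threads`` is hypothetical.

.. code-block:: c++

  #include <cstdio>

  // Hypothetical helper: returns how many threads incremented the shared
  // counter. If compiled without -fopenmp, the pragmas are ignored and it
  // returns 1.
  int count_threads() {
    int result = 0;
  #pragma omp target map(from : result)
    {
      // Declared by the initial thread of the target region...
      int x = 0;
      // ...and shared by default with every thread of this parallel region,
      // so it must live in memory that all of those threads can access.
  #pragma omp parallel
      {
  #pragma omp atomic
        x += 1;
      }
      result = x;
    }
    return result;
  }

  int main() { std::printf("%d\n", count_threads()); }
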
Examples
--------

A trivial example of globalization is shown below. The compiler sees that a
pointer to the thread-local variable ``x`` escapes the current scope and must
globalize it, even though this is not actually necessary. Fortunately, this
optimization can undo the globalization by examining how ``x`` is used.

.. code-block:: c++

  void use(int *x) { }

  void foo() {
    int x;
    use(&x);
  }

  int main() {
  #pragma omp target parallel
    foo();
  }

.. code-block:: console

  $ clang++ -fopenmp -fopenmp-targets=nvptx64 omp110.cpp -O1 -Rpass=openmp-opt
  omp110.cpp:6:7: remark: Moving globalized variable to the stack. [OMP110]
    int x;
        ^

A less trivial example uses C++'s ``std::complex``. In this case the overloaded
arithmetic operators cause pointers to complex values to escape the current
scope, but the globalization can again be undone once their uses are visible.

.. code-block:: c++

  #include <complex>

  using complex = std::complex<double>;

  void zaxpy(complex *X, complex *Y, const complex D, int N) {
  #pragma omp target teams distribute parallel for firstprivate(D) \
      map(to : X[0:N]) map(tofrom : Y[0:N])
    for (int i = 0; i < N; ++i)
      Y[i] = D * X[i] + Y[i];
  }

.. code-block:: console

  $ clang++ -fopenmp -fopenmp-targets=nvptx64 omp110.cpp -O1 -Rpass=openmp-opt
  In file included from omp110.cpp:1:
  In file included from /usr/bin/clang/lib/clang/13.0.0/include/openmp_wrappers/complex:27:
  /usr/include/c++/8/complex:328:20: remark: Moving globalized variable to the stack. [OMP110]
        complex<_Tp> __r = __x;
                     ^
  /usr/include/c++/8/complex:388:20: remark: Moving globalized variable to the stack. [OMP110]
        complex<_Tp> __r = __x;
                     ^

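Both remarks point at a local copy, ``complex<_Tp> __r = __x;``, inside the
header's binary operators. The sketch below is a simplified, hypothetical
stand-in for that pattern, assuming the header implements its binary operators
in terms of the compound ones (the type ``Cplx`` is ours, not ``std::complex``).
The binary operator copies its left operand into a local ``r`` and then calls a
member compound-assignment operator on it, which passes ``&r`` as ``this``. That
is how the address of the local escapes; once the body of the compound operator
is visible and shows that ``this`` is not retained, ``r`` can stay on the stack.

.. code-block:: c++

  #include <cstdio>

  // Hypothetical minimal complex type used only to illustrate the pattern;
  // it is not std::complex.
  struct Cplx {
    double re, im;

    // Member compound operators: inside them, `this` is the address of the
    // object they are called on.
    Cplx &operator*=(const Cplx &o) {
      double r = re * o.re - im * o.im;
      double i = re * o.im + im * o.re;
      re = r;
      im = i;
      return *this;
    }

    Cplx &operator+=(const Cplx &o) {
      re += o.re;
      im += o.im;
      return *this;
    }
  };

  // Binary operators written in terms of the compound ones. `r *= y` passes
  // the address of the local `r` as `this`, so without seeing the body of
  // operator*= the compiler must assume that `&r` escapes.
  inline Cplx operator*(const Cplx &x, const Cplx &y) {
    Cplx r = x;
    r *= y;
    return r;
  }

  inline Cplx operator+(const Cplx &x, const Cplx &y) {
    Cplx r = x;
    r += y;
    return r;
  }

  int main() {
    Cplx D{2.0, 1.0};
    Cplx X[2] = {{1.0, 0.0}, {0.0, 1.0}};
    Cplx Y[2] = {{1.0, 1.0}, {0.0, 0.0}};
    // Same update as the zaxpy loop, run on the host.
    for (int i = 0; i < 2; ++i)
      Y[i] = D * X[i] + Y[i];
    std::printf("%g %g %g %g\n", Y[0].re, Y[0].im, Y[1].re, Y[1].im);
    // prints: 3 2 -1 2
  }
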
Diagnostic Scope
----------------

OpenMP target offloading optimization remark.