.. _omp110:

Moving globalized variable to the stack. [OMP110]
=================================================

This optimization remark indicates that a globalized variable was moved back to
thread-local stack memory on the device. This occurs when the optimization pass
can determine that a globalized variable cannot possibly be shared between
threads, so globalization was ultimately unnecessary. Using stack memory is the
best-case scenario for data globalization, as the variable can now be stored in
fast register files on the device. This optimization requires full visibility of
each variable.

Globalization typically occurs when a pointer to a thread-local variable escapes
the current scope. The compiler needs to be pessimistic and assume that the
pointer could be shared between multiple threads according to the OpenMP
standard. This is expensive on target offloading devices that do not allow
threads to share data by default. Instead, this data must be moved to memory
that can be shared, such as shared or global memory. This optimization moves the
data back from shared or global memory to thread-local stack memory if the data
is not actually shared between the threads.

Examples
--------

The following is a trivial example of globalization. The compiler sees that a
pointer to the thread-local variable ``x`` escapes the current scope and must
globalize it, even though this is not actually necessary. Fortunately, this
optimization can undo the globalization by looking at how ``x`` is used.

.. code-block:: c++

  void use(int *x) { }

  void foo() {
    int x;
    use(&x);
  }

  int main() {
  #pragma omp target parallel
    foo();
  }

.. code-block:: console

  $ clang++ -fopenmp -fopenmp-targets=nvptx64 omp110.cpp -O1 -Rpass=openmp-opt
  omp110.cpp:6:7: remark: Moving globalized variable to the stack. [OMP110]
    int x;
        ^

A less trivial example can be seen using C++'s complex numbers. In this case the
overloaded arithmetic operators cause pointers to the complex numbers to escape
the current scope, but the globalization can again be removed once the usage is
visible.

.. code-block:: c++

  #include <complex>

  using complex = std::complex<double>;

  void zaxpy(complex *X, complex *Y, const complex D, int N) {
  #pragma omp target teams distribute parallel for firstprivate(D)
    for (int i = 0; i < N; ++i)
      Y[i] = D * X[i] + Y[i];
  }

.. code-block:: console

  $ clang++ -fopenmp -fopenmp-targets=nvptx64 omp110.cpp -O1 -Rpass=openmp-opt
  In file included from omp110.cpp:1:
  In file included from /usr/bin/clang/lib/clang/13.0.0/include/openmp_wrappers/complex:27:
  /usr/include/c++/8/complex:328:20: remark: Moving globalized variable to the stack. [OMP110]
        complex<_Tp> __r = __x;
                     ^
  /usr/include/c++/8/complex:388:20: remark: Moving globalized variable to the stack. [OMP110]
        complex<_Tp> __r = __x;
                     ^

Diagnostic Scope
----------------

OpenMP target offloading optimization remark.