.. _omp110:

Moving globalized variable to the stack. [OMP110]
=================================================

This optimization remark indicates that a globalized variable was moved back to
thread-local stack memory on the device. This occurs when the optimization pass
can determine that a globalized variable cannot possibly be shared between
threads and globalization was ultimately unnecessary. Using stack memory is the
best-case scenario for data globalization as the variable can now be stored in
fast register files on the device. This optimization requires full visibility of
each variable.

Globalization typically occurs when a pointer to a thread-local variable escapes
the current scope. The compiler needs to be pessimistic and assume that the
pointer could be shared between multiple threads according to the OpenMP
standard. This is expensive on target offloading devices that do not allow
threads to share data by default. Instead, this data must be moved to memory
that can be shared, such as shared or global memory. This optimization moves the
data back from shared or global memory to thread-local stack memory if the data
is not actually shared between the threads.

Examples
--------

A trivial example of globalization can be seen below. The compiler sees that a
pointer to the thread-local variable ``x`` escapes the current scope and must
globalize it even though it is not actually necessary. Fortunately, this
optimization can undo the globalization by looking at how ``x`` is used.

.. code-block:: c++

  void use(int *x) { }

  void foo() {
    int x;
    use(&x);
  }

  int main() {
  #pragma omp target parallel
    foo();
  }

.. code-block:: console

  $ clang++ -fopenmp -fopenmp-targets=nvptx64 omp110.cpp -O1 -Rpass=openmp-opt
  omp110.cpp:6:7: remark: Moving globalized variable to the stack. [OMP110]
    int x;
        ^

A less trivial example can be seen using C++'s complex numbers. In this case the
overloaded arithmetic operators cause pointers to the complex numbers to escape
the current scope, but they can again be removed once the usage is visible.

.. code-block:: c++

  #include <complex>

  using complex = std::complex<double>;

  void zaxpy(complex *X, complex *Y, const complex D, int N) {
  #pragma omp target teams distribute parallel for firstprivate(D)
    for (int i = 0; i < N; ++i)
      Y[i] = D * X[i] + Y[i];
  }

.. code-block:: console

  $ clang++ -fopenmp -fopenmp-targets=nvptx64 omp110.cpp -O1 -Rpass=openmp-opt
  In file included from omp110.cpp:1:
  In file included from /usr/bin/clang/lib/clang/13.0.0/include/openmp_wrappers/complex:27:
  /usr/include/c++/8/complex:328:20: remark: Moving globalized variable to the stack. [OMP110]
        complex<_Tp> __r = __x;
                     ^
  /usr/include/c++/8/complex:388:20: remark: Moving globalized variable to the stack. [OMP110]
        complex<_Tp> __r = __x;
                     ^

Diagnostic Scope
----------------

OpenMP target offloading optimization remark.
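
The remark is only emitted when the variable is provably thread-local. For
contrast with the examples above, the following is a minimal sketch of a case
where the variable really is shared, so globalization is not merely
pessimistic. The function name ``sum_with_shared_value`` is hypothetical and
chosen only for illustration. Whether any remark is emitted for this code
depends on the compiler version and flags, so no compiler output is shown.

.. code-block:: c++

  // Hypothetical contrast case, not taken from the LLVM sources.
  // `x` is declared in the target region and is shared by default in the
  // nested parallel region, so every thread reads the same object.
  int sum_with_shared_value(int n) {
    int total = 0;
  #pragma omp target map(tofrom : total)
    {
      int x = 42; // shared between the threads of the parallel region below
  #pragma omp parallel for reduction(+ : total)
      for (int i = 0; i < n; ++i)
        total += x;
    }
    return total;
  }

Here ``x`` is genuinely read by multiple threads, which is the situation that
globalization exists to support. In the earlier examples the pointer only
appeared to escape, which is why the variable could be returned to the stack.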