.. _omp111:

Replaced globalized variable with X bytes of shared memory. [OMP111]
=====================================================================

This optimization occurs when a globalized variable's data is shared between
multiple threads, but requires a constant amount of memory that can be
determined at compile time. This is the case when the memory is created by a
single thread and then shared with every other thread. The memory can then be
pushed to a static buffer of shared memory on the device. This optimization
allows users to declare shared memory on the device without using OpenMP's
custom allocators.

Globalization occurs when a pointer to a thread-local variable escapes the
current scope. If a single thread is known to be responsible for creating and
sharing the data, it can instead be mapped directly to the device's shared
memory. Checking whether only a single thread can execute an instruction
requires that the parent functions have internal linkage. Otherwise, an
external caller could invalidate this analysis by having multiple threads call
that function. For this reason the optimization pass makes internal copies of
each function it analyzes, but it is still recommended to mark such functions
as internal using keywords like ``static`` whenever possible.

Example
-------

This optimization should apply to any variable declared in an OpenMP target
region that is then shared with every thread in a parallel region. This allows
the user to declare shared memory without using OpenMP's custom allocators. A
simple stencil calculation shows how this can be used.

.. code-block:: c++

   // MC and NC are the tile sizes; 32 x 32 doubles matches the 8192-byte
   // shared-memory buffer reported by the remark below.
   #define MC 32
   #define NC 32

   void stencil(int M, int N, double *X, double *Y) {
     const double dX = 1.0; // Grid spacing (example value).
   #pragma omp target teams distribute collapse(2) \
       map(to : X [0:M * N]) map(tofrom : Y [0:M * N])
     for (int i0 = 0; i0 < M; i0 += MC) {
       for (int j0 = 0; j0 < N; j0 += NC) {
         double sX[MC][NC];

   #pragma omp parallel for collapse(2) shared(sX) default(firstprivate)
         for (int i1 = 0; i1 < MC; ++i1)
           for (int j1 = 0; j1 < NC; ++j1)
             sX[i1][j1] = X[(i0 + i1) * N + (j0 + j1)];

   #pragma omp parallel for collapse(2) shared(sX) default(firstprivate)
         for (int i1 = 1; i1 < MC - 1; ++i1)
           for (int j1 = 1; j1 < NC - 1; ++j1)
             Y[(i0 + i1) * N + (j0 + j1)] =
                 (sX[i1 + 1][j1] + sX[i1 - 1][j1] + sX[i1][j1 + 1] +
                  sX[i1][j1 - 1] - 4.0 * sX[i1][j1]) /
                 (dX * dX);
       }
     }
   }

.. code-block:: console

   $ clang++ -fopenmp -fopenmp-targets=nvptx64 -O1 -Rpass=openmp-opt -fopenmp-version=51 omp111.cpp
   omp111.cpp:10:14: remark: Replaced globalized variable with 8192 bytes of shared memory. [OMP111]
     double sX[MC][NC];
              ^

The default mapping for variables captured in an OpenMP parallel region is
``shared``. This means a pointer to the object is taken, which ultimately
results in globalization: the data is mapped to shared memory when it could
have been kept in registers. To avoid this, make sure each variable that can be
copied into the region is marked ``firstprivate``, either explicitly or using
the OpenMP 5.1 feature ``default(firstprivate)``.

Diagnostic Scope
----------------

OpenMP target offloading optimization remark.
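As a minimal sketch of the ``firstprivate`` advice above (the function name
``sketch``, the names ``buf`` and ``scale``, and the sizes are illustrative
assumptions, not part of the stencil example), a buffer created by a single
thread in a target region can be shared with the parallel threads and replaced
with shared memory, while the scalar is kept ``firstprivate`` so it never needs
to be globalized:

.. code-block:: c++

   void sketch(double *Y) {
   #pragma omp target map(from : Y [0:1024])
     {
       // 'buf' is created by a single thread and then shared with every
       // thread in the parallel region, so it is a candidate for the
       // OMP111 shared-memory replacement (1024 doubles = 8192 bytes).
       double buf[1024];
       for (int i = 0; i < 1024; ++i)
         buf[i] = i;

       double scale = 2.0;
   #pragma omp parallel for shared(buf) firstprivate(scale)
       for (int i = 0; i < 1024; ++i)
         Y[i] = scale * buf[i]; // 'scale' is copied per thread, not globalized.
     }
   }

Whether the replacement actually triggers depends on the optimization level and
the shared-memory limits of the target device, so treat this only as an
illustration of the pattern the remark describes.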