xref: /llvm-project/openmp/runtime/doc/doxygen/libomp_interface.h (revision e4f5d010330159a919c0fa0fc0c769f12a17a00d)
// This file does not contain any code; it just contains additional text and formatting
// for doxygen.


//===----------------------------------------------------------------------===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is dual licensed under the MIT and the University of Illinois Open
// Source Licenses. See LICENSE.txt for details.
//
//===----------------------------------------------------------------------===//


/*! @mainpage LLVM OpenMP* Runtime Library Interface
@section sec_intro Introduction

This document describes the interface provided by the
LLVM OpenMP\other runtime library to the compiler.
Routines that are directly called as simple functions by user code are
not described here, since they are defined by the OpenMP
specification available from http://openmp.org.

The aim here is to explain the interface from the compiler to the runtime.

The overall design is described, and each function in the interface
has its own description. (At least, that's the ambition; we may not be there yet.)
@section sec_building Quickly Building the Runtime
For the impatient, we cover building the runtime as the first topic here.

CMake is used to build the OpenMP runtime.  For details and a full list of options for the CMake build system,
see <tt>Build_With_CMake.txt</tt> inside the <tt>runtime/</tt> subdirectory.  The
instructions below cover the most typical build.

In-LLVM-tree build:
@code
$ cd where-you-want-to-live
Check out openmp into llvm/projects
$ cd where-you-want-to-build
$ mkdir build && cd build
$ cmake path/to/llvm -DCMAKE_C_COMPILER=<C compiler> -DCMAKE_CXX_COMPILER=<C++ compiler>
$ make omp
@endcode
Out-of-LLVM-tree build:
@code
$ cd where-you-want-to-live
Check out openmp
$ cd where-you-want-to-live/openmp/runtime
$ mkdir build && cd build
$ cmake path/to/openmp -DCMAKE_C_COMPILER=<C compiler> -DCMAKE_CXX_COMPILER=<C++ compiler>
$ make
@endcode

@section sec_supported Supported RTL Build Configurations

The architectures supported are IA-32 architecture, Intel&reg;&nbsp;64, and
Intel&reg;&nbsp;Many Integrated Core Architecture.  The supported build
configurations are shown in the table below.

<table border=1>
<tr><th> <th>icc/icl<th>gcc<th>clang
<tr><td>Linux\other OS<td>Yes(1,5)<td>Yes(2,4)<td>Yes(4,6,7)
<tr><td>FreeBSD\other<td>Yes(1,5)<td>Yes(2,4)<td>Yes(4,6,7,8)
<tr><td>OS X\other<td>Yes(1,3,4)<td>No<td>Yes(4,6,7)
<tr><td>Windows\other OS<td>Yes(1,4)<td>No<td>No
</table>
(1) On IA-32 architecture and Intel&reg;&nbsp;64, icc/icl versions 12.x
    are supported (12.1 is recommended).<br>
(2) gcc version 4.7 is supported.<br>
(3) For icc on OS X\other, OS X\other version 10.5.8 is supported.<br>
(4) Intel&reg;&nbsp;Many Integrated Core Architecture not supported.<br>
(5) On Intel&reg;&nbsp;Many Integrated Core Architecture, icc/icl versions 13.0 or later are required.<br>
(6) Clang\other version 3.3 is supported.<br>
(7) Clang\other currently does not offer a software-implemented 128-bit extended
    precision type.  Thus, all entry points reliant on this type are removed
    from the library and cannot be called in the user program.  The following
    functions are not available:
@code
    __kmpc_atomic_cmplx16_*
    __kmpc_atomic_float16_*
    __kmpc_atomic_*_fp
@endcode
(8) Community contribution provided AS IS, not tested by Intel.

Supported Architectures: IBM(R) Power 7 and Power 8
<table border=1>
<tr><th> <th>gcc<th>clang
<tr><td>Linux\other OS<td>Yes(1,2)<td>Yes(3,4)
</table>
(1) On Power 7, gcc version 4.8.2 is supported.<br>
(2) On Power 8, gcc version 4.8.2 is supported.<br>
(3) On Power 7, clang version 3.7 is supported.<br>
(4) On Power 8, clang version 3.7 is supported.<br>

@section sec_frontend Front-end Compilers that work with this RTL

The following compilers are known to do compatible code generation for
this RTL: icc/icl, gcc.  Code generation is discussed in more detail
later in this document.

@section sec_outlining Outlining

The runtime interface is based on the idea that the compiler
"outlines" sections of code that are to run in parallel into separate
functions that can then be invoked in multiple threads.  For instance,
simple code like this

@code
void foo()
{
#pragma omp parallel
    {
        ... do something ...
    }
}
@endcode
is converted into something that looks conceptually like this (where
the names used are merely illustrative; the real library function
names will be used later, after we've discussed some more issues...)

@code
static void outlinedFooBody()
{
    ... do something ...
}

void foo()
{
    __OMP_runtime_fork(outlinedFooBody, (void*)0);   // Not the real function name!
}
@endcode

@subsection SEC_SHAREDVARS Addressing shared variables

In real uses of the OpenMP\other API there are normally references
from the outlined code to shared variables that are in scope in the containing function.
Therefore the containing function must be able to address
these variables. The runtime supports two alternate ways of doing
this.

@subsubsection SEC_SEC_OT Current Technique
The technique currently supported by the runtime library is to receive
a separate pointer to each shared variable that can be accessed from
the outlined function.  This is what is shown in the example below.

We hope soon to provide an alternative interface to support the
alternate implementation described in the next section. The
alternative implementation has performance advantages for small
parallel regions that have many shared variables.

@subsubsection SEC_SEC_PT Future Technique
The idea is to treat the outlined function as though it
were a lexically nested function, and pass it a single argument which
is the pointer to the parent's stack frame. Provided that the compiler
knows the layout of the parent frame when it is generating the outlined
function, it can then access the up-level variables at appropriate
offsets from the parent frame.  This is a classical compiler technique
from the 1960s to support languages like Algol (and its descendants)
that support lexically nested functions.

The main benefit of this technique is that there is no code required
at the fork point to marshal the arguments to the outlined function.
Since the runtime knows statically how many arguments must be passed to the
outlined function, it can easily copy them to the thread's stack
frame.  Therefore the performance of the fork code is independent of
the number of shared variables that are accessed by the outlined
function.

If it is hard to determine the stack layout of the parent while generating the
outlined code, it is still possible to use this approach by collecting all of
the variables in the parent that are accessed from outlined functions into
a single `struct` which is placed on the stack, and whose address is passed
to the outlined functions. In this way the offsets of the shared variables
are known (since they are inside the struct) without needing to know
the complete layout of the parent stack-frame. From the point of view
of the runtime the two techniques are equivalent, since in either
case it only has to pass a single argument to the outlined function to allow
it to access shared variables.

A scheme like this is how gcc\other generates outlined functions.

@section SEC_INTERFACES Library Interfaces
The library functions used for specific parts of the OpenMP\other language implementation
are documented in different modules.

 - @ref BASIC_TYPES fundamental types used by the runtime in many places
 - @ref DEPRECATED  functions that are in the library but are no longer required
 - @ref STARTUP_SHUTDOWN functions for initializing and finalizing the runtime
 - @ref PARALLEL functions for implementing `omp parallel`
 - @ref THREAD_STATES functions for supporting thread state inquiries
 - @ref WORK_SHARING functions for work sharing constructs such as `omp for`, `omp sections`
 - @ref THREADPRIVATE functions to support thread private data, copyin etc.
 - @ref SYNCHRONIZATION functions to support `omp critical`, `omp barrier`, `omp master`, reductions etc.
 - @ref ATOMIC_OPS functions to support atomic operations
 - @ref STATS_GATHERING macros to support developer profiling of libomp
 - Documentation on tasking has still to be written...

@section SEC_EXAMPLES Examples
@subsection SEC_WORKSHARING_EXAMPLE Work Sharing Example
This example shows the code generated for a parallel for with reduction and dynamic scheduling.

@code
extern float foo( void );

int main () {
    int i;
    float r = 0.0;
    #pragma omp parallel for schedule(dynamic) reduction(+:r)
    for ( i = 0; i < 10; i ++ ) {
        r += foo();
    }
}
@endcode

The transformed code looks like this.
@code
extern float foo( void );

int main () {
    static int zero = 0;
    auto int gtid;
    auto float r = 0.0;
    __kmpc_begin( & loc3, 0 );
    // The gtid is not actually required in this example so could be omitted;
    // we show its initialization here because it is often required for calls into
    // the runtime and should be locally cached like this.
    gtid = __kmpc_global_thread_num( & loc3 );
    __kmpc_fork_call( & loc7, 1, main_7_parallel_3, & r );
    __kmpc_end( & loc0 );
    return 0;
}

struct main_10_reduction_t_5 { float r_10_rpr; };

static kmp_critical_name lck = { 0 };
static ident_t loc10; // loc10.flags should contain KMP_IDENT_ATOMIC_REDUCE bit set
                      // if the compiler has generated an atomic reduction.

void main_7_parallel_3( int *gtid, int *btid, float *r_7_shp ) {
    auto int i_7_pr;
    auto int lower, upper, liter, incr;
    auto struct main_10_reduction_t_5 reduce;
    reduce.r_10_rpr = 0.F;
    liter = 0;
    __kmpc_dispatch_init_4( & loc7, *gtid, 35, 0, 9, 1, 1 );
    while ( __kmpc_dispatch_next_4( & loc7, *gtid, & liter, & lower, & upper, & incr ) ) {
        for( i_7_pr = lower; upper >= i_7_pr; i_7_pr ++ )
          reduce.r_10_rpr += foo();
    }
    switch( __kmpc_reduce_nowait( & loc10, *gtid, 1, 4, & reduce, main_10_reduce_5, & lck ) ) {
        case 1:
           *r_7_shp += reduce.r_10_rpr;
           __kmpc_end_reduce_nowait( & loc10, *gtid, & lck );
           break;
        case 2:
           __kmpc_atomic_float4_add( & loc10, *gtid, r_7_shp, reduce.r_10_rpr );
           break;
        default:;
    }
}

void main_10_reduce_5( struct main_10_reduction_t_5 *reduce_lhs,
                       struct main_10_reduction_t_5 *reduce_rhs )
{
    reduce_lhs->r_10_rpr += reduce_rhs->r_10_rpr;
}
@endcode

@defgroup BASIC_TYPES Basic Types
Types that are used throughout the runtime.

@defgroup DEPRECATED Deprecated Functions
Functions in this group are for backwards compatibility only, and
should not be used in new code.

@defgroup STARTUP_SHUTDOWN Startup and Shutdown
These functions are for library initialization and shutdown.

@defgroup PARALLEL Parallel (fork/join)
These functions are used for implementing <tt>\#pragma omp parallel</tt>.

@defgroup THREAD_STATES Thread Information
These functions return information about the currently executing thread.

@defgroup WORK_SHARING Work Sharing
These functions are used for implementing
<tt>\#pragma omp for</tt>, <tt>\#pragma omp sections</tt>, <tt>\#pragma omp single</tt> and
<tt>\#pragma omp master</tt> constructs.

When handling loops, there are different functions for each of the signed and unsigned 32 and 64 bit integer types,
which have the name suffixes `_4`, `_4u`, `_8` and `_8u`. The semantics of each of these functions is the same,
so they are only described once.

Static loop scheduling is handled by @ref __kmpc_for_static_init_4 and friends. Only a single call is needed,
since the iterations to be executed by any given thread can be determined as soon as the loop parameters are known.

Dynamic scheduling is handled by the @ref __kmpc_dispatch_init_4 and @ref __kmpc_dispatch_next_4 functions.
The init function is called once in each thread outside the loop, while the next function is called each
time that the previous chunk of work has been exhausted.

@defgroup SYNCHRONIZATION Synchronization
These functions are used for implementing barriers.

@defgroup THREADPRIVATE Thread private data support
These functions support copyin/out and thread private data.

@defgroup STATS_GATHERING Statistics Gathering from OMPTB
These macros support profiling the libomp library.  Use --stats=on when building with build.pl to enable profiling,
and then use the KMP_* macros to profile (through counts or clock ticks) libomp during execution of an OpenMP program.

@section sec_stats_env_vars Environment Variables

This section describes the environment variables relevant to stats-gathering in libomp.

@code
KMP_STATS_FILE
@endcode
This environment variable names an output file that will be appended to, *NOT OVERWRITTEN*, if it exists.  If this environment variable is undefined, the statistics will be output to stderr.

@code
KMP_STATS_THREADS
@endcode
This environment variable requests that per-thread statistics be printed as well as aggregate statistics.  Each thread's statistics will be shown as well as the collective sum of all threads.  The values "true", "on", "1", and "yes" all indicate that per-thread statistics should be printed.

@defgroup TASKING Tasking support
These functions support tasking constructs.

@defgroup USER User visible functions
These functions can be called directly by the user, but are runtime library specific, rather than being OpenMP interfaces.

*/
