// clang-format off
// This file does not contain any code; it just contains additional text and formatting
// for doxygen.


//===----------------------------------------------------------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

/*! @mainpage LLVM OpenMP* Runtime Library Interface
@section sec_intro Introduction

This document describes the interface provided by the
LLVM OpenMP\other runtime library to the compiler.
Routines that are called directly as simple functions by user code are
not currently described here, since they are defined in the OpenMP
specification available from http://openmp.org.

The aim here is to explain the interface from the compiler to the runtime.

The overall design is described, and each function in the interface
has its own description. (At least, that is the ambition; we may not be there yet.)

@section sec_building Quickly Building the Runtime
For the impatient, we cover building the runtime as the first topic here.

CMake is used to build the OpenMP runtime.  For details and a full list of options for the CMake build system,
see <tt>README.rst</tt> in the source code repository.  The instructions below produce the most typical build.

In-LLVM-tree build:
@code
$ cd where-you-want-to-live
Check out openmp into llvm/projects
$ cd where-you-want-to-build
$ mkdir build && cd build
$ cmake path/to/llvm -DCMAKE_C_COMPILER=<C compiler> -DCMAKE_CXX_COMPILER=<C++ compiler>
$ make omp
@endcode
Out-of-LLVM-tree build:
@code
$ cd where-you-want-to-live
Check out openmp
$ cd where-you-want-to-live/openmp
$ mkdir build && cd build
$ cmake path/to/openmp -DCMAKE_C_COMPILER=<C compiler> -DCMAKE_CXX_COMPILER=<C++ compiler>
$ make
@endcode

@section sec_supported Supported RTL Build Configurations

The supported architectures are IA-32 architecture, Intel&reg;&nbsp; 64, and
Intel&reg;&nbsp; Many Integrated Core Architecture.  The supported build
configurations are shown in the table below.

<table border=1>
<tr><th> <th>icc/icl<th>gcc<th>clang
<tr><td>Linux\other OS<td>Yes(1,5)<td>Yes(2,4)<td>Yes(4,6,7)
<tr><td>FreeBSD\other<td>Yes(1,5)<td>Yes(2,4)<td>Yes(4,6,7,8)
<tr><td>OS X\other<td>Yes(1,3,4)<td>No<td>Yes(4,6,7)
<tr><td>Windows\other OS<td>Yes(1,4)<td>No<td>No
</table>
(1) On IA-32 architecture and Intel&reg;&nbsp; 64, icc/icl versions 12.x
    are supported (12.1 is recommended).<br>
(2) gcc version 4.7 is supported.<br>
(3) For icc on OS X\other, OS X\other version 10.5.8 is supported.<br>
(4) Intel&reg;&nbsp; Many Integrated Core Architecture is not supported.<br>
(5) On Intel&reg;&nbsp; Many Integrated Core Architecture, icc/icl versions 13.0 or later are required.<br>
(6) Clang\other version 3.3 is supported.<br>
(7) Clang\other currently does not offer a software-implemented 128-bit extended-precision
    type.  Thus, all entry points reliant on this type are removed
    from the library and cannot be called in the user program.  The following
    functions are not available:
@code
    __kmpc_atomic_cmplx16_*
    __kmpc_atomic_float16_*
    __kmpc_atomic_*_fp
@endcode
(8) Community contribution provided AS IS, not tested by Intel.

Supported Architectures: IBM(R) Power 7 and Power 8
<table border=1>
<tr><th> <th>gcc<th>clang
<tr><td>Linux\other OS<td>Yes(1,2)<td>Yes(3,4)
</table>
(1) On Power 7, gcc version 4.8.2 is supported.<br>
(2) On Power 8, gcc version 4.8.2 is supported.<br>
(3) On Power 7, clang version 3.7 is supported.<br>
(4) On Power 8, clang version 3.7 is supported.<br>

@section sec_frontend Front-end Compilers that work with this RTL

The following compilers are known to do compatible code generation for
this RTL: icc/icl, gcc.  Code generation is discussed in more detail
later in this document.

@section sec_outlining Outlining

The runtime interface is based on the idea that the compiler
"outlines" sections of code that are to run in parallel into separate
functions that can then be invoked in multiple threads.  For instance,
simple code like this

@code
void foo()
{
#pragma omp parallel
    {
        ... do something ...
    }
}
@endcode
is converted into something that looks conceptually like this (where
the names used are merely illustrative; the real library function
names will be used later, after we have discussed some more issues)

@code
static void outlinedFooBody()
{
    ... do something ...
}

void foo()
{
    __OMP_runtime_fork(outlinedFooBody, (void*)0);   // Not the real function name!
}
@endcode
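
The shape of this transformation can be shown as a self-contained sketch.
Here <tt>toy_runtime_fork</tt> is a hypothetical stand-in for the runtime's fork entry
point (it is not a libomp function); it invokes the outlined body once per
simulated thread, serially, where the real runtime would dispatch it to worker threads.

@code
#include <assert.h>

#define NUM_THREADS 4

// Hypothetical stand-in for the runtime's fork entry point: it invokes the
// outlined body once per simulated thread, serially.  The real runtime
// would create or wake worker threads instead.
static void toy_runtime_fork(void (*body)(int, void *), void *arg)
{
    for (int tid = 0; tid < NUM_THREADS; tid++)
        body(tid, arg);
}

static int calls = 0;          // visible side effect for the demo

// The compiler-generated outlined body of the parallel region.
static void outlinedFooBody(int tid, void *arg)
{
    (void)tid; (void)arg;
    calls++;                   // ... do something ...
}

void foo(void)
{
    toy_runtime_fork(outlinedFooBody, (void *)0);
}
@endcode

Calling <tt>foo()</tt> runs the outlined body once per simulated thread, so
<tt>calls</tt> ends up equal to <tt>NUM_THREADS</tt>.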

@subsection SEC_SHAREDVARS Addressing shared variables

In real uses of the OpenMP\other API there are normally references
from the outlined code to shared variables that are in scope in the containing function.
Therefore the outlined function must be able to address
these variables. The runtime supports two alternate ways of doing
this.

@subsubsection SEC_SEC_OT Current Technique
The technique currently supported by the runtime library is to receive
a separate pointer to each shared variable that can be accessed from
the outlined function.  This is what is shown in the example below.

We hope soon to provide an alternative interface to support the
alternate implementation described in the next section. The
alternative implementation has performance advantages for small
parallel regions that have many shared variables.
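
To make the marshalling concrete, here is a minimal self-contained sketch of the
current technique.  The names (<tt>outlined_body</tt>, <tt>containing_function</tt>) and the
direct call standing in for the runtime's per-thread invocation are illustrative only.

@code
#include <assert.h>

// Outlined body receiving a separate pointer to each shared variable.
// The gtid/btid parameters mirror the slots seen in the worked example
// later in this document.
static void outlined_body(int *gtid, int *btid, float *r_shared, int *n_shared)
{
    (void)gtid; (void)btid;
    for (int i = 0; i < *n_shared; i++)
        *r_shared += 1.0f;       // ... work that updates the shared variable ...
}

// At the fork point one pointer per shared variable must be marshalled,
// so the setup cost grows with the number of shared variables.
float containing_function(void)
{
    float r = 0.0f;              // shared
    int n = 10;                  // shared
    int gtid = 0, btid = 0;
    outlined_body(&gtid, &btid, &r, &n);  // the runtime would invoke this in each thread
    return r;
}
@endcode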

@subsubsection SEC_SEC_PT Future Technique
The idea is to treat the outlined function as though it
were a lexically nested function, and pass it a single argument which
is the pointer to the parent's stack frame. Provided that the compiler
knows the layout of the parent frame when it is generating the outlined
function, it can then access the up-level variables at appropriate
offsets from the parent frame.  This is a classical compiler technique
from the 1960s used to support languages like Algol (and its descendants)
that allow lexically nested functions.

The main benefit of this technique is that there is no code required
at the fork point to marshal the arguments to the outlined function.
Since the runtime knows statically how many arguments must be passed to the
outlined function, it can easily copy them to the thread's stack
frame.  Therefore the performance of the fork code is independent of
the number of shared variables that are accessed by the outlined
function.

If it is hard to determine the stack layout of the parent while generating the
outlined code, it is still possible to use this approach by collecting all of
the variables in the parent that are accessed from outlined functions into
a single `struct` which is placed on the stack, and whose address is passed
to the outlined functions. In this way the offsets of the shared variables
are known (since they are inside the struct) without needing to know
the complete layout of the parent stack-frame. From the point of view
of the runtime either of these techniques is equivalent, since in either
case it only has to pass a single argument to the outlined function to allow
it to access shared variables.

A scheme like this is how gcc\other generates outlined functions.
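
The struct-based variant can be sketched as follows.  The struct name and its
fields are illustrative, not part of any real ABI.

@code
#include <assert.h>

// All shared variables are collected into one stack-allocated struct;
// the outlined function receives a single pointer to it.
struct shared_frame {
    float r;
    int   n;
};

static void outlined_body(struct shared_frame *frame)
{
    for (int i = 0; i < frame->n; i++)
        frame->r += 1.0f;       // shared variables accessed at fixed offsets
}

float containing_function(void)
{
    struct shared_frame frame = { 0.0f, 10 };  // placed on the parent's stack
    outlined_body(&frame);      // fork cost: passing one pointer, regardless of
                                // how many shared variables there are
    return frame.r;
}
@endcode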

@section SEC_INTERFACES Library Interfaces
The library functions used for specific parts of the OpenMP\other language implementation
are documented in different modules.

 - @ref BASIC_TYPES fundamental types used by the runtime in many places
 - @ref DEPRECATED  functions that are in the library but are no longer required
 - @ref STARTUP_SHUTDOWN functions for initializing and finalizing the runtime
 - @ref PARALLEL functions for implementing `omp parallel`
 - @ref THREAD_STATES functions for supporting thread state inquiries
 - @ref WORK_SHARING functions for work sharing constructs such as `omp for`, `omp sections`
 - @ref THREADPRIVATE functions to support thread private data, copyin etc.
 - @ref SYNCHRONIZATION functions to support `omp critical`, `omp barrier`, `omp master`, reductions etc.
 - @ref ATOMIC_OPS functions to support atomic operations
 - @ref STATS_GATHERING macros to support developer profiling of libomp
 - Documentation on tasking has still to be written...

@section SEC_EXAMPLES Examples
@subsection SEC_WORKSHARING_EXAMPLE Work Sharing Example
This example shows the code generated for a parallel for with reduction and dynamic scheduling.

@code
extern float foo( void );

int main () {
    int i;
    float r = 0.0;
    #pragma omp parallel for schedule(dynamic) reduction(+:r)
    for ( i = 0; i < 10; i ++ ) {
        r += foo();
    }
}
@endcode

The transformed code looks like this.
@code
extern float foo( void );

int main () {
    static int zero = 0;
    auto int gtid;
    auto float r = 0.0;
    __kmpc_begin( & loc3, 0 );
    // The gtid is not actually required in this example so could be omitted;
    // we show its initialization here because it is often required for calls into
    // the runtime and should be locally cached like this.
    gtid = __kmpc_global_thread_num( & loc3 );
    __kmpc_fork_call( & loc7, 1, main_7_parallel_3, & r );
    __kmpc_end( & loc0 );
    return 0;
}

struct main_10_reduction_t_5 { float r_10_rpr; };

static kmp_critical_name lck = { 0 };
static ident_t loc10; // loc10.flags should contain KMP_IDENT_ATOMIC_REDUCE bit set
                      // if the compiler has generated an atomic reduction.

void main_7_parallel_3( int *gtid, int *btid, float *r_7_shp ) {
    auto int i_7_pr;
    auto int lower, upper, liter, incr;
    auto struct main_10_reduction_t_5 reduce;
    reduce.r_10_rpr = 0.F;
    liter = 0;
    __kmpc_dispatch_init_4( & loc7, *gtid, 35, 0, 9, 1, 1 );
    while ( __kmpc_dispatch_next_4( & loc7, *gtid, & liter, & lower, & upper, & incr ) ) {
        for( i_7_pr = lower; upper >= i_7_pr; i_7_pr ++ )
          reduce.r_10_rpr += foo();
    }
    switch( __kmpc_reduce_nowait( & loc10, *gtid, 1, 4, & reduce, main_10_reduce_5, & lck ) ) {
        case 1:
           *r_7_shp += reduce.r_10_rpr;
           __kmpc_end_reduce_nowait( & loc10, *gtid, & lck );
           break;
        case 2:
           __kmpc_atomic_float4_add( & loc10, *gtid, r_7_shp, reduce.r_10_rpr );
           break;
        default:;
    }
}

void main_10_reduce_5( struct main_10_reduction_t_5 *reduce_lhs,
                       struct main_10_reduction_t_5 *reduce_rhs )
{
    reduce_lhs->r_10_rpr += reduce_rhs->r_10_rpr;
}
@endcode

@defgroup BASIC_TYPES Basic Types
Types that are used throughout the runtime.

@defgroup DEPRECATED Deprecated Functions
Functions in this group are for backwards compatibility only, and
should not be used in new code.

@defgroup STARTUP_SHUTDOWN Startup and Shutdown
These functions are for library initialization and shutdown.

@defgroup PARALLEL Parallel (fork/join)
These functions are used for implementing <tt>\#pragma omp parallel</tt>.

@defgroup THREAD_STATES Thread Information
These functions return information about the currently executing thread.

@defgroup WORK_SHARING Work Sharing
These functions are used for implementing
<tt>\#pragma omp for</tt>, <tt>\#pragma omp sections</tt>, <tt>\#pragma omp single</tt> and
<tt>\#pragma omp master</tt> constructs.

When handling loops, there are different functions for each of the signed and unsigned 32 and 64 bit integer types,
which have the name suffixes `_4`, `_4u`, `_8` and `_8u`. The semantics of each of the functions is the same,
so they are only described once.

Static loop scheduling is handled by @ref __kmpc_for_static_init_4 and friends. Only a single call is needed,
since the iterations to be executed by any given thread can be determined as soon as the loop parameters are known.
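
The per-thread bound computation that makes a single call sufficient can be
sketched as follows.  This is an illustrative block-scheduling calculation,
not the actual algorithm used by @ref __kmpc_for_static_init_4, and the
function name is hypothetical.

@code
#include <assert.h>

// Illustrative static (block) scheduling: given the loop bounds (unit
// increment) and a thread id, compute the sub-range that thread executes.
// Returns 0 if the thread has no iterations.  This mimics what a single
// call to a static-init routine can determine up front; it is not libomp's code.
static int static_bounds(int lb, int ub, int nthreads, int tid,
                         int *my_lb, int *my_ub)
{
    int trip  = ub - lb + 1;                // total iterations
    int chunk = trip / nthreads;
    int extra = trip % nthreads;            // first 'extra' threads get one more
    int start = lb + tid * chunk + (tid < extra ? tid : extra);
    int size  = chunk + (tid < extra ? 1 : 0);
    if (size <= 0)
        return 0;
    *my_lb = start;
    *my_ub = start + size - 1;
    return 1;
}
@endcode

For a 10-iteration loop (0..9) on 4 threads, thread 0 would get 0..2 and
thread 3 would get 8..9.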

Dynamic scheduling is handled by the @ref __kmpc_dispatch_init_4 and @ref __kmpc_dispatch_next_4 functions.
The init function is called once in each thread outside the loop, while the next function is called each
time that the previous chunk of work has been exhausted.
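
The init/next protocol can be illustrated with a toy single-threaded chunk
dispatcher.  <tt>toy_dispatch_init</tt> and <tt>toy_dispatch_next</tt> are hypothetical
stand-ins for the <tt>__kmpc_dispatch_*</tt> entry points; the real ones take an
ident_t, a gtid and a schedule kind, and coordinate chunk hand-out across threads.

@code
#include <assert.h>

static int next_lb, last_ub, chunk_sz;

// Record the loop bounds and chunk size (cf. the dispatch init call).
static void toy_dispatch_init(int lb, int ub, int chunk)
{
    next_lb = lb; last_ub = ub; chunk_sz = chunk;
}

// Hand out the next chunk: returns 1 and fills [*lower, *upper] while
// chunks remain, else 0 (cf. the dispatch next call).
static int toy_dispatch_next(int *lower, int *upper)
{
    if (next_lb > last_ub)
        return 0;
    *lower = next_lb;
    *upper = (next_lb + chunk_sz - 1 < last_ub) ? next_lb + chunk_sz - 1 : last_ub;
    next_lb = *upper + 1;
    return 1;
}

// The generated loop then has the same shape as the worked example above:
// call next until it returns 0, executing each chunk it hands back.
static int run_loop(void)
{
    int lower, upper, sum = 0;
    toy_dispatch_init(0, 9, 3);
    while (toy_dispatch_next(&lower, &upper))
        for (int i = lower; i <= upper; i++)
            sum += 1;
    return sum;
}
@endcode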

@defgroup SYNCHRONIZATION Synchronization
These functions are used for implementing barriers.

@defgroup THREADPRIVATE Thread private data support
These functions support copyin/out and thread private data.

@defgroup STATS_GATHERING Statistics Gathering from OMPTB
These macros support profiling the libomp library.  Use --stats=on when building with build.pl to enable profiling,
and then use the KMP_* macros to profile (through counts or clock ticks) libomp during execution of an OpenMP program.

@section sec_stats_env_vars Environment Variables

This section describes the environment variables relevant to stats-gathering in libomp.

@code
KMP_STATS_FILE
@endcode
This environment variable is set to an output filename; if that file already exists, the statistics are appended to it, *NOT OVERWRITTEN*.  If this environment variable is undefined, the statistics are output to stderr.

@code
KMP_STATS_THREADS
@endcode
This environment variable indicates that per-thread statistics should be printed as well as aggregate statistics.  Each thread's statistics will be shown as well as the collective sum of all threads.  The values "true", "on", "1" and "yes" all indicate that per-thread statistics should be printed.

@defgroup TASKING Tasking support
These functions support tasking constructs.

@defgroup USER User visible functions
These functions can be called directly by the user, but are runtime library specific, rather than being OpenMP interfaces.

*/