// This file does not contain any code; it just contains additional text and formatting
// for doxygen.


//===----------------------------------------------------------------------===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is dual licensed under the MIT and the University of Illinois Open
// Source Licenses. See LICENSE.txt for details.
//
//===----------------------------------------------------------------------===//


/*! @mainpage LLVM OpenMP* Runtime Library Interface
@section sec_intro Introduction

This document describes the interface provided by the
LLVM OpenMP\other runtime library to the compiler.
Routines that are directly called as simple functions by user code are
not currently described here, since their definition is in the OpenMP
specification available from http://openmp.org.

The aim here is to explain the interface from the compiler to the runtime.

The overall design is described, and each function in the interface
has its own description. (At least, that's the ambition; we may not be there yet.)

@section sec_building Building the Runtime
For the impatient, we cover building the runtime as the first topic here.

A top-level Makefile is provided that attempts to derive a suitable
configuration for the most commonly used environments.  To see the
default settings, type:
@code
% make info
@endcode

You can change the Makefile's behavior with the following options:

 - <b>omp_root</b>:    The path to the top-level directory containing the top-level
             Makefile.  By default, this is the current working directory.

 - <b>omp_os</b>:      Operating system.  By default, the build will attempt to
             detect this. Currently supports "linux", "macos", and
             "windows".

 - <b>arch</b>:        Architecture. By default, the build will attempt to
             detect this if not specified by the user. Currently
             supported values are
             - "32" for IA-32 architecture
             - "32e" for Intel&reg;&nbsp;64 architecture
             - "mic" for Intel&reg;&nbsp;Many Integrated Core Architecture.
               (If "mic" is specified then "icc" will be used as the
               compiler, and the appropriate k1om binutils will be used. The
               necessary packages must be installed on the build machine
               for this to be possible, but an Intel&reg;&nbsp;Xeon Phi&trade;&nbsp;
               coprocessor is not required to build the library.)

 - <b>compiler</b>:    Which compiler to use for the build.  Defaults to "icc"
             or "icl" depending on the value of omp_os. Also supports
             "gcc" when omp_os is "linux", for gcc\other versions
             4.6.2 and higher. For icc on OS X\other, OS X\other versions
             greater than 10.6 are not currently supported, and icc
             version 13.0 is not supported. The selected compiler should be
             installed and in the user's path. The corresponding
             Fortran compiler should also be in the path.

 - <b>mode</b>:        Library mode: the default is "release".  Also supports "debug".

To use any of the options above, simply add &lt;option_name&gt;=&lt;value&gt;.  For
example, if you want to build with gcc instead of icc, type:
@code
% make compiler=gcc
@endcode

Under the hood of the top-level Makefile, the runtime is built by
a Perl script that in turn drives a detailed runtime system make.  The
script can be found at <tt>tools/build.pl</tt>, and will print
information about all its flags and controls if invoked as
@code
% tools/build.pl --help
@endcode

If invoked with no arguments, it will try to build a set of libraries
that are appropriate for the machine on which the build is happening.
There are also many options for building out of tree and for configuring
library features. Consult the <tt>--help</tt> output for details.

@section sec_supported Supported RTL Build Configurations

The architectures supported are IA-32 architecture, Intel&reg;&nbsp;64, and
Intel&reg;&nbsp;Many Integrated Core Architecture.  The build configurations
supported are shown in the table below.

<table border=1>
<tr><th> <th>icc/icl<th>gcc
<tr><td>Linux\other OS<td>Yes(1,5)<td>Yes(2,4)
<tr><td>OS X\other<td>Yes(1,3,4)<td>No
<tr><td>Windows\other OS<td>Yes(1,4)<td>No
</table>
(1) On IA-32 architecture and Intel&reg;&nbsp;64, icc/icl versions 12.x
    are supported (12.1 is recommended).<br>
(2) gcc version 4.6.2 is supported.<br>
(3) For icc on OS X\other, OS X\other version 10.5.8 is supported.<br>
(4) Intel&reg;&nbsp;Many Integrated Core Architecture not supported.<br>
(5) On Intel&reg;&nbsp;Many Integrated Core Architecture, icc/icl versions 13.0 or later are required.

@section sec_frontend Front-end Compilers that work with this RTL

The following compilers are known to do compatible code generation for
this RTL: icc/icl, gcc.  Code generation is discussed in more detail
later in this document.

@section sec_outlining Outlining

The runtime interface is based on the idea that the compiler
"outlines" sections of code that are to run in parallel into separate
functions that can then be invoked in multiple threads.  For instance,
simple code like this

@code
void foo()
{
#pragma omp parallel
    {
        ... do something ...
    }
}
@endcode
is converted into something that looks conceptually like this (where
the names used are merely illustrative; the real library function
names will be used later after we've discussed some more issues...)

@code
static void outlinedFooBody()
{
    ... do something ...
}

void foo()
{
    __OMP_runtime_fork(outlinedFooBody, (void*)0);   // Not the real function name!
}
@endcode

@subsection SEC_SHAREDVARS Addressing shared variables

In real uses of the OpenMP\other API there are normally references
from the outlined code to shared variables that are in scope in the containing function.
Therefore the containing function must be able to address
these variables. The runtime supports two alternate ways of doing
this.

@subsubsection SEC_SEC_OT Current Technique
The technique currently supported by the runtime library is to receive
a separate pointer to each shared variable that can be accessed from
the outlined function.  This is what is shown in the example below.
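
In outline, the convention looks like this. This is only a sketch:
<tt>__kmpc_fork_call</tt> and <tt>kmpc_micro</tt> are the real runtime names
(described later), while the other names are illustrative; the Work Sharing
Example at the end of this document shows a complete lowering.

@code
// Sketch: each shared variable arrives as its own pointer argument,
// after the two standard gtid/btid arguments.
static void outlinedFooBody( int *gtid, int *btid, int *a_shp, float *b_shp )
{
    *a_shp += 1;       // access to shared variable a
    *b_shp *= 2.0f;    // access to shared variable b
}

void foo()
{
    int   a = 0;
    float b = 1.0f;
    // One trailing pointer is passed per shared variable (argc == 2 here),
    // so the marshalling cost at the fork grows with the number of
    // shared variables.
    __kmpc_fork_call( & loc, 2, (kmpc_micro)outlinedFooBody, & a, & b );
}
@endcode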

We hope soon to provide an alternative interface to support the
alternate implementation described in the next section. The
alternative implementation has performance advantages for small
parallel regions that have many shared variables.

@subsubsection SEC_SEC_PT Future Technique
The idea is to treat the outlined function as though it
were a lexically nested function, and pass it a single argument which
is the pointer to the parent's stack frame. Provided that the compiler
knows the layout of the parent frame when it is generating the outlined
function, it can then access the up-level variables at appropriate
offsets from the parent frame.  This is a classical compiler technique
from the 1960s to support languages like Algol (and its descendants)
that support lexically nested functions.

The main benefit of this technique is that there is no code required
at the fork point to marshal the arguments to the outlined function.
Since the outlined function takes exactly one argument, the runtime
knows statically how much must be passed and can easily copy it to
the thread's stack frame.  Therefore the performance of the fork code
is independent of the number of shared variables that are accessed by
the outlined function.

If it is hard to determine the stack layout of the parent while generating the
outlined code, it is still possible to use this approach by collecting all of
the variables in the parent that are accessed from outlined functions into
a single `struct` which is placed on the stack, and whose address is passed
to the outlined functions. In this way the offsets of the shared variables
are known (since they are inside the struct) without needing to know
the complete layout of the parent stack-frame. From the point of view
of the runtime either of these techniques is equivalent, since in either
case it only has to pass a single argument to the outlined function to allow
it to access shared variables.
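
For illustration, here is a sketch of the struct-based variant. All names are
hypothetical, and the illustrative fork function from the earlier outlining
example is reused, since this is not an interface the runtime currently
provides.

@code
// All shared variables are collected into one struct in the parent's frame.
struct foo_shared_t { int a; float b; };

static void outlinedFooBody( struct foo_shared_t *shared )
{
    // The offsets of a and b are known from the struct layout alone,
    // without knowing the layout of the rest of foo's stack frame.
    shared->a += 1;
    shared->b *= 2.0f;
}

void foo()
{
    struct foo_shared_t shared = { 0, 1.0f };    // placed on foo's stack
    // Only one argument is marshalled, however many variables are shared.
    __OMP_runtime_fork( outlinedFooBody, & shared );  // Not the real function name!
}
@endcode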

A scheme like this is how gcc\other generates outlined functions.

@section SEC_INTERFACES Library Interfaces
The library functions used for specific parts of the OpenMP\other language implementation
are documented in different modules.

 - @ref BASIC_TYPES fundamental types used by the runtime in many places
 - @ref DEPRECATED  functions that are in the library but are no longer required
 - @ref STARTUP_SHUTDOWN functions for initializing and finalizing the runtime
 - @ref PARALLEL functions for implementing `omp parallel`
 - @ref THREAD_STATES functions for supporting thread state inquiries
 - @ref WORK_SHARING functions for work sharing constructs such as `omp for`, `omp sections`
 - @ref THREADPRIVATE functions to support thread private data, copyin, etc.
 - @ref SYNCHRONIZATION functions to support `omp critical`, `omp barrier`, `omp master`, reductions, etc.
 - @ref ATOMIC_OPS functions to support atomic operations
 - @ref STATS_GATHERING macros to support developer profiling of libomp
 - Documentation on tasking is still to be written...

@section SEC_EXAMPLES Examples
@subsection SEC_WORKSHARING_EXAMPLE Work Sharing Example
This example shows the code generated for a parallel for with reduction and dynamic scheduling.

@code
extern float foo( void );

int main () {
    int i;
    float r = 0.0;
    #pragma omp parallel for schedule(dynamic) reduction(+:r)
    for ( i = 0; i < 10; i ++ ) {
        r += foo();
    }
}
@endcode

The transformed code looks like this.
@code
extern float foo( void );

int main () {
    static int zero = 0;
    auto int gtid;
    auto float r = 0.0;
    __kmpc_begin( & loc3, 0 );
    // The gtid is not actually required in this example, so it could be omitted;
    // we show its initialization here because it is often required for calls into
    // the runtime and should be locally cached like this.
    gtid = __kmpc_global_thread_num( & loc3 );
    __kmpc_fork_call( & loc7, 1, main_7_parallel_3, & r );
    __kmpc_end( & loc0 );
    return 0;
}

struct main_10_reduction_t_5 { float r_10_rpr; };

static kmp_critical_name lck = { 0 };
static ident_t loc10; // loc10.flags should contain KMP_IDENT_ATOMIC_REDUCE bit set
                      // if compiler has generated an atomic reduction.

void main_7_parallel_3( int *gtid, int *btid, float *r_7_shp ) {
    auto int i_7_pr;
    auto int lower, upper, liter, incr;
    auto struct main_10_reduction_t_5 reduce;
    reduce.r_10_rpr = 0.F;
    liter = 0;
    // 35 == kmp_sch_dynamic_chunked; iterate over 0..9 inclusive, stride 1, chunk 1.
    __kmpc_dispatch_init_4( & loc7, *gtid, 35, 0, 9, 1, 1 );
    while ( __kmpc_dispatch_next_4( & loc7, *gtid, & liter, & lower, & upper, & incr ) ) {
        for( i_7_pr = lower; upper >= i_7_pr; i_7_pr ++ )
          reduce.r_10_rpr += foo();
    }
    // One reduction variable of 4 bytes; main_10_reduce_5 combines two partial results.
    switch( __kmpc_reduce_nowait( & loc10, *gtid, 1, 4, & reduce, main_10_reduce_5, & lck ) ) {
        case 1:
           *r_7_shp += reduce.r_10_rpr;
           __kmpc_end_reduce_nowait( & loc10, *gtid, & lck );
           break;
        case 2:
           __kmpc_atomic_float4_add( & loc10, *gtid, r_7_shp, reduce.r_10_rpr );
           break;
        default:;
    }
}

void main_10_reduce_5( struct main_10_reduction_t_5 *reduce_lhs,
                       struct main_10_reduction_t_5 *reduce_rhs )
{
    reduce_lhs->r_10_rpr += reduce_rhs->r_10_rpr;
}
@endcode

@defgroup BASIC_TYPES Basic Types
Types that are used throughout the runtime.

@defgroup DEPRECATED Deprecated Functions
Functions in this group are for backwards compatibility only, and
should not be used in new code.

@defgroup STARTUP_SHUTDOWN Startup and Shutdown
These functions are for library initialization and shutdown.

@defgroup PARALLEL Parallel (fork/join)
These functions are used for implementing <tt>\#pragma omp parallel</tt>.

@defgroup THREAD_STATES Thread Information
These functions return information about the currently executing thread.
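
For example, the global thread number is obtained (and, as noted in the Work
Sharing Example above, typically cached locally) like this, where loc is an
ident_t source location as elsewhere in this document:

@code
// Obtain this thread's global thread number once and cache it locally;
// most runtime entry points take it (or a pointer to it) as an argument.
kmp_int32 gtid = __kmpc_global_thread_num( & loc );
@endcode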

@defgroup WORK_SHARING Work Sharing
These functions are used for implementing
<tt>\#pragma omp for</tt>, <tt>\#pragma omp sections</tt>, <tt>\#pragma omp single</tt> and
<tt>\#pragma omp master</tt> constructs.

When handling loops, there are different functions for each of the signed and unsigned 32 and 64 bit integer types
which have the name suffixes `_4`, `_4u`, `_8` and `_8u`. The semantics of each of the functions is the same,
so they are only described once.

Static loop scheduling is handled by @ref __kmpc_for_static_init_4 and friends. Only a single call is needed,
since the iterations to be executed by any given thread can be determined as soon as the loop parameters are known.
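
As an illustration, a loop using <tt>schedule(static)</tt> over ten iterations
might lower to something like the sketch below inside the outlined function
(which, as usual, received <tt>int *gtid</tt>); the value 34 is the schedule
type kmp_sch_static.

@code
// Sketch of the lowering of
//     #pragma omp for schedule(static)
//     for ( i = 0; i < 10; i++ ) body( i );
int lower = 0, upper = 9, stride = 1, lastiter = 0;
__kmpc_for_static_init_4( & loc, *gtid, 34,    // 34 == kmp_sch_static
                          & lastiter, & lower, & upper, & stride, 1, 1 );
// On return, [lower, upper] is this thread's (inclusive) share of the
// iteration space; it may be empty.
for ( int i = lower; i <= upper; i++ )
    body( i );
__kmpc_for_static_fini( & loc, *gtid );
@endcode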

Dynamic scheduling is handled by the @ref __kmpc_dispatch_init_4 and @ref __kmpc_dispatch_next_4 functions.
The init function is called once in each thread outside the loop, while the next function is called each
time that the previous chunk of work has been exhausted.

@defgroup SYNCHRONIZATION Synchronization
These functions are used for implementing barriers and other synchronization
constructs such as <tt>\#pragma omp critical</tt> and <tt>\#pragma omp master</tt>.
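
For instance, explicit barrier and critical constructs lower to calls like
these (a sketch, with gtid cached as described under Thread Information):

@code
// #pragma omp barrier
__kmpc_barrier( & loc, gtid );

// #pragma omp critical
static kmp_critical_name crit_lock = { 0 };
__kmpc_critical( & loc, gtid, & crit_lock );
// ... critical section body ...
__kmpc_end_critical( & loc, gtid, & crit_lock );
@endcode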

@defgroup THREADPRIVATE Thread private data support
These functions support copyin/out and thread private data.
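
As a sketch, access to a variable declared threadprivate can be lowered through
__kmpc_threadprivate_cached, which returns the address of the calling thread's
private copy (illustrative only; real code generation must also handle copyin,
constructors, and so on):

@code
// For
//     static int counter;
//     #pragma omp threadprivate(counter)
// each access fetches the current thread's copy.
static int counter;              // the original ("master") copy
static void **counter_cache;     // per-variable cache used by the runtime

void bump( ident_t *loc, kmp_int32 gtid )
{
    int *my_counter = (int *)__kmpc_threadprivate_cached(
            loc, gtid, & counter, sizeof( counter ), & counter_cache );
    (*my_counter)++;
}
@endcode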

@defgroup STATS_GATHERING Statistics Gathering from OMPTB
These macros support profiling the libomp library.  Use <tt>--stats=on</tt> when building with build.pl to enable
them, and then use the KMP_* macros to profile (through counts or clock ticks) libomp during execution of an OpenMP program.

@section sec_stats_env_vars Environment Variables

This section describes the environment variables relevant to stats-gathering in libomp.

@code
KMP_STATS_FILE
@endcode
This environment variable names the output file for statistics. If the file already exists it is appended to, *not* overwritten. If this environment variable is undefined, the statistics are output to stderr.

@code
KMP_STATS_THREADS
@endcode
This environment variable requests that per-thread statistics be printed in addition to the aggregate statistics: each thread's statistics are shown as well as the collective sum over all threads. The values "true", "on", "1", and "yes" all enable per-thread statistics.

@defgroup TASKING Tasking support
These functions support tasking constructs.

@defgroup USER User visible functions
These functions can be called directly by the user, but are runtime library specific, rather than being OpenMP interfaces.

*/