// This file does not contain any code; it just contains additional text and formatting
// for doxygen.


//===----------------------------------------------------------------------===//
//
// The LLVM Compiler Infrastructure
//
// This file is dual licensed under the MIT and the University of Illinois Open
// Source Licenses. See LICENSE.txt for details.
//
//===----------------------------------------------------------------------===//


/*! @mainpage LLVM OpenMP* Runtime Library Interface
@section sec_intro Introduction

This document describes the interface provided by the
LLVM OpenMP\other runtime library to the compiler.
Routines that are directly called as simple functions by user code are
not currently described here, since their definition is in the OpenMP
specification available from http://openmp.org.

The aim here is to explain the interface from the compiler to the runtime.

The overall design is described, and each function in the interface
has its own description. (At least, that's the ambition; we may not be
there yet.)

@section sec_building Building the Runtime
For the impatient, we cover building the runtime as the first topic here.

A top-level Makefile is provided that attempts to derive a suitable
configuration for the most commonly used environments. To see the
default settings, type:
@code
% make info
@endcode

You can change the Makefile's behavior with the following options:

- <b>omp_root</b>: The path to the top-level directory containing the
  top-level Makefile. By default, this is the current working directory.

- <b>omp_os</b>: Operating system. By default, the build will attempt to
  detect this. Currently supported values are "linux", "macos", and
  "windows".

- <b>arch</b>: Architecture. By default, the build will attempt to detect
  this if it is not specified by the user. Currently supported values are
  - "32" for IA-32 architecture
  - "32e" for Intel® 64 architecture
  - "mic" for Intel® Many Integrated Core Architecture
  (If "mic" is specified then "icc" will be used as the compiler, and the
  appropriate k1om binutils will be used. The necessary packages must be
  installed on the build machine for this to be possible, but an
  Intel® Xeon Phi™ coprocessor is not required to build the library.)

- <b>compiler</b>: Which compiler to use for the build. Defaults to "icc"
  or "icl" depending on the value of omp_os. Also supports "gcc" when
  omp_os is "linux", for gcc\other versions 4.6.2 and higher. For icc on
  OS X\other, OS X\other versions greater than 10.6 are not currently
  supported. Also, icc version 13.0 is not supported. The selected
  compiler should be installed and in the user's path. The corresponding
  Fortran compiler should also be in the path.

- <b>mode</b>: Library mode: the default is "release". Also supports
  "debug".

To use any of the options above, simply add <tt>option_name=value</tt> to
the make command line. For example, if you want to build with gcc instead
of icc, type:
@code
% make compiler=gcc
@endcode
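Options can be combined on one command line. For example, to build a debug
version of the library with gcc (a sketch; the exact set of libraries
produced depends on the machine and the installed toolchain), you might
type:
@code
% make compiler=gcc mode=debug
@endcode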
Underneath the hood of the top-level Makefile, the runtime is built by
a perl script that in turn drives a detailed runtime system make. The
script can be found at <tt>tools/build.pl</tt>, and will print
information about all its flags and controls if invoked as
@code
% tools/build.pl --help
@endcode

If invoked with no arguments, it will try to build a set of libraries
that are appropriate for the machine on which the build is happening.
There are many options for building out of tree and for configuring
library features; consult the <tt>--help</tt> output for details.

@section sec_supported Supported RTL Build Configurations

The architectures supported are IA-32 architecture, Intel® 64, and
Intel® Many Integrated Core Architecture. The build configurations
supported are shown in the table below.

<table border=1>
<tr><th> <th>icc/icl <th>gcc
<tr><td>Linux\other OS <td>Yes(1,5) <td>Yes(2,4)
<tr><td>OS X\other <td>Yes(1,3,4) <td>No
<tr><td>Windows\other OS <td>Yes(1,4) <td>No
</table>
(1) On IA-32 architecture and Intel® 64, icc/icl versions 12.x
are supported (12.1 is recommended).<br>
(2) gcc version 4.6.2 is supported.<br>
(3) For icc on OS X\other, OS X\other version 10.5.8 is supported.<br>
(4) Intel® Many Integrated Core Architecture is not supported.<br>
(5) On Intel® Many Integrated Core Architecture, icc/icl versions 13.0
or later are required.

@section sec_frontend Front-end Compilers that work with this RTL

The following compilers are known to do compatible code generation for
this RTL: icc/icl and gcc. Code generation is discussed in more detail
later in this document.

@section sec_outlining Outlining

The runtime interface is based on the idea that the compiler
"outlines" sections of code that are to run in parallel into separate
functions that can then be invoked in multiple threads. For instance,
simple code like this

@code
void foo()
{
    #pragma omp parallel
    {
        ... do something ...
    }
}
@endcode
is converted into something that looks conceptually like this (where
the names used are merely illustrative; the real library function
names will be used later, after we've discussed some more issues...)

@code
static void outlinedFooBody()
{
    ... do something ...
}

void foo()
{
    __OMP_runtime_fork(outlinedFooBody, (void*)0); // Not the real function name!
}
@endcode

@subsection SEC_SHAREDVARS Addressing shared variables

In real uses of the OpenMP\other API there are normally references
from the outlined code to shared variables that are in scope in the
containing function. Therefore the containing function must be able
to address these variables. The runtime supports two alternative ways
of doing this.

@subsubsection SEC_SEC_OT Current Technique
The technique currently supported by the runtime library is for the
outlined function to receive a separate pointer to each shared variable
that it can access. This is what is shown in the example below.

We hope soon to provide an alternative interface to support the
alternate implementation described in the next section. The
alternative implementation has performance advantages for small
parallel regions that have many shared variables.
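To make the current, one-pointer-per-variable scheme concrete, here is a
minimal sketch of the earlier example extended with two shared variables,
each of which is passed by address to the outlined function. (As before,
the fork routine's name is illustrative, not a real runtime entry point.)

@code
static void outlinedFooBody(int *a_shp, float *b_shp)
{
    *a_shp += 1;      // access the parent's 'a' through its own pointer
    *b_shp *= 2.0F;   // access the parent's 'b' through its own pointer
}

void foo()
{
    int   a = 0;
    float b = 1.0F;
    // The fork point marshals one pointer argument for each shared
    // variable that the outlined function references.
    __OMP_runtime_fork(outlinedFooBody, &a, &b); // Not the real function name!
}
@endcode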
@subsubsection SEC_SEC_PT Future Technique
The idea is to treat the outlined function as though it
were a lexically nested function, and pass it a single argument which
is the pointer to the parent's stack frame. Provided that the compiler
knows the layout of the parent frame when it is generating the outlined
function, it can then access the up-level variables at appropriate
offsets from the parent frame. This is a classical compiler technique
from the 1960s, used to support languages like Algol (and its
descendants) that allow lexically nested functions.

The main benefit of this technique is that there is no code required
at the fork point to marshal the arguments to the outlined function.
Since the runtime knows statically how many arguments must be passed to
the outlined function, it can easily copy them to the thread's stack
frame. Therefore the performance of the fork code is independent of
the number of shared variables that are accessed by the outlined
function.

If it is hard to determine the stack layout of the parent while generating
the outlined code, it is still possible to use this approach by collecting
all of the variables in the parent that are accessed from outlined functions
into a single `struct` which is placed on the stack, and whose address is
passed to the outlined functions. In this way the offsets of the shared
variables are known (since they are inside the struct) without needing to
know the complete layout of the parent stack-frame. From the point of view
of the runtime the two techniques are equivalent, since in either case it
only has to pass a single argument to the outlined function to allow it to
access shared variables.

A scheme like this is how gcc\other generates outlined functions.
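A minimal sketch of the struct-based variant might look like this (again
with invented names rather than the real runtime interface):

@code
struct foo_frame { int a; float b; }; // shared locals collected into one struct

static void outlinedFooBody(struct foo_frame *frame)
{
    // The offsets of 'a' and 'b' are known from the struct layout, so
    // the complete layout of the parent's stack frame is not needed.
    frame->a += 1;
    frame->b *= 2.0F;
}

void foo()
{
    struct foo_frame frame = { 0, 1.0F }; // placed on the parent's stack
    // A single argument suffices, however many variables are shared.
    __OMP_runtime_fork(outlinedFooBody, &frame); // Not the real function name!
}
@endcode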
@section SEC_INTERFACES Library Interfaces
The library functions used for specific parts of the OpenMP\other language
implementation are documented in different modules.

 - @ref BASIC_TYPES fundamental types used by the runtime in many places
 - @ref DEPRECATED functions that are in the library but are no longer required
 - @ref STARTUP_SHUTDOWN functions for initializing and finalizing the runtime
 - @ref PARALLEL functions for implementing `omp parallel`
 - @ref THREAD_STATES functions for supporting thread state inquiries
 - @ref WORK_SHARING functions for work sharing constructs such as `omp for`, `omp sections`
 - @ref THREADPRIVATE functions to support thread private data, copyin, etc.
 - @ref SYNCHRONIZATION functions to support `omp critical`, `omp barrier`, `omp master`, reductions, etc.
 - @ref ATOMIC_OPS functions to support atomic operations
 - @ref STATS_GATHERING macros to support developer profiling of libomp
 - Documentation on tasking has still to be written...

@section SEC_EXAMPLES Examples
@subsection SEC_WORKSHARING_EXAMPLE Work Sharing Example
This example shows the code generated for a parallel for loop with a
reduction and dynamic scheduling.

@code
extern float foo( void );

int main () {
    int i;
    float r = 0.0;
    #pragma omp parallel for schedule(dynamic) reduction(+:r)
    for ( i = 0; i < 10; i ++ ) {
        r += foo();
    }
}
@endcode

The transformed code looks like this.
@code
extern float foo( void );

int main () {
    static int zero = 0;
    auto int gtid;
    auto float r = 0.0;
    __kmpc_begin( & loc3, 0 );
    // The gtid is not actually required in this example, so it could be
    // omitted; we show its initialization here because it is often required
    // for calls into the runtime and should be locally cached like this.
    gtid = __kmpc_global_thread_num( & loc3 );
    __kmpc_fork_call( & loc7, 1, main_7_parallel_3, & r );
    __kmpc_end( & loc0 );
    return 0;
}

struct main_10_reduction_t_5 { float r_10_rpr; };

static kmp_critical_name lck = { 0 };
static ident_t loc10; // loc10.flags should contain KMP_IDENT_ATOMIC_REDUCE bit set
                      // if the compiler has generated an atomic reduction.

void main_7_parallel_3( int *gtid, int *btid, float *r_7_shp ) {
    auto int i_7_pr;
    auto int lower, upper, liter, incr;
    auto struct main_10_reduction_t_5 reduce;
    reduce.r_10_rpr = 0.F;
    liter = 0;
    __kmpc_dispatch_init_4( & loc7, *gtid, 35, 0, 9, 1, 1 );
    while ( __kmpc_dispatch_next_4( & loc7, *gtid, & liter, & lower, & upper, & incr ) ) {
        for ( i_7_pr = lower; upper >= i_7_pr; i_7_pr ++ )
            reduce.r_10_rpr += foo();
    }
    switch ( __kmpc_reduce_nowait( & loc10, *gtid, 1, 4, & reduce, main_10_reduce_5, & lck ) ) {
    case 1:
        *r_7_shp += reduce.r_10_rpr;
        __kmpc_end_reduce_nowait( & loc10, *gtid, & lck );
        break;
    case 2:
        __kmpc_atomic_float4_add( & loc10, *gtid, r_7_shp, reduce.r_10_rpr );
        break;
    default:;
    }
}

void main_10_reduce_5( struct main_10_reduction_t_5 *reduce_lhs,
                       struct main_10_reduction_t_5 *reduce_rhs )
{
    reduce_lhs->r_10_rpr += reduce_rhs->r_10_rpr;
}
@endcode

@defgroup BASIC_TYPES Basic Types
Types that are used throughout the runtime.

@defgroup DEPRECATED Deprecated Functions
Functions in this group are for backwards compatibility only, and
should not be used in new code.

@defgroup STARTUP_SHUTDOWN Startup and Shutdown
These functions are for library initialization and shutdown.

@defgroup PARALLEL Parallel (fork/join)
These functions are used for implementing <tt>\#pragma omp parallel</tt>.

@defgroup THREAD_STATES Thread Information
These functions return information about the currently executing thread.

@defgroup WORK_SHARING Work Sharing
These functions are used for implementing
<tt>\#pragma omp for</tt>, <tt>\#pragma omp sections</tt>, <tt>\#pragma omp single</tt> and
<tt>\#pragma omp master</tt> constructs.

When handling loops, there are different functions for each of the signed
and unsigned 32 and 64 bit integer types, with the name suffixes `_4`,
`_4u`, `_8` and `_8u`. The semantics of each of the functions is the same,
so they are only described once.

Static loop scheduling is handled by @ref __kmpc_for_static_init_4 and
friends. Only a single call is needed, since the iterations to be executed
by any given thread can be determined as soon as the loop parameters are
known.

Dynamic scheduling is handled by the @ref __kmpc_dispatch_init_4 and
@ref __kmpc_dispatch_next_4 functions. The init function is called once in
each thread outside the loop, while the next function is called each time
that the previous chunk of work has been exhausted.

@defgroup SYNCHRONIZATION Synchronization
These functions are used for implementing barriers.
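For example, an explicit barrier is among the simplest constructs to lower:
conceptually the compiler only has to fetch the global thread ID and call
<tt>__kmpc_barrier</tt>. A sketch of the transformation (eliding the
initialization of the <tt>ident_t</tt> source-location argument):

@code
void relax()
{
    #pragma omp barrier
}
@endcode
conceptually becomes
@code
static ident_t loc1; // source location information; initialization elided

void relax()
{
    auto int gtid;
    gtid = __kmpc_global_thread_num( & loc1 );
    __kmpc_barrier( & loc1, gtid );
}
@endcode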
@defgroup THREADPRIVATE Thread private data support
These functions support copyin/out and thread private data.

@defgroup STATS_GATHERING Statistics Gathering from OMPTB
These macros support profiling the libomp library. Use <tt>--stats=on</tt>
when building with <tt>build.pl</tt> to enable them, and then use the
KMP_* macros to profile (through counts or clock ticks) libomp during
execution of an OpenMP program.

@section sec_stats_env_vars Environment Variables

This section describes the environment variables relevant to
stats-gathering in libomp.

@code
KMP_STATS_FILE
@endcode
This environment variable names the output file for statistics. If the
file already exists, the statistics are appended to it, <em>not</em>
overwritten. If this environment variable is undefined, the statistics
are output to stderr.

@code
KMP_STATS_THREADS
@endcode
This environment variable requests that per-thread statistics be printed
in addition to the aggregate statistics: each thread's statistics are
shown as well as the collective sum over all threads. The values "true",
"on", "1", and "yes" all enable per-thread statistics.

@defgroup TASKING Tasking support
These functions support tasking constructs.

@defgroup USER User visible functions
These functions can be called directly by the user, but are runtime
library specific, rather than being OpenMP interfaces.

*/