// clang-format off
// This file does not contain any code; it just contains additional text and formatting
// for doxygen.


//===----------------------------------------------------------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//


/*! @mainpage LLVM OpenMP* Runtime Library Interface
@section sec_intro Introduction

This document describes the interface provided by the
LLVM OpenMP\other runtime library to the compiler.
Routines that are directly called as simple functions by user code are
not currently described here, since their definition is in the OpenMP
specification available from http://openmp.org

The aim here is to explain the interface from the compiler to the runtime.

The overall design is described, and each function in the interface
has its own description. (At least, that's the ambition; we may not be there yet.)

@section sec_building Quickly Building the Runtime
For the impatient, we cover building the runtime as the first topic here.

CMake is used to build the OpenMP runtime. For details and a full list of options for the CMake build system,
see <tt>README.rst</tt> in the source code repository. The instructions below cover the most typical builds.

In-LLVM-tree build:
@code
$ cd where-you-want-to-live
Check out openmp into llvm/projects
$ cd where-you-want-to-build
$ mkdir build && cd build
$ cmake path/to/llvm -DCMAKE_C_COMPILER=<C compiler> -DCMAKE_CXX_COMPILER=<C++ compiler>
$ make omp
@endcode
Out-of-LLVM-tree build:
@code
$ cd where-you-want-to-live
Check out openmp
$ cd where-you-want-to-live/openmp
$ mkdir build && cd build
$ cmake path/to/openmp -DCMAKE_C_COMPILER=<C compiler> -DCMAKE_CXX_COMPILER=<C++ compiler>
$ make
@endcode

@section sec_supported Supported RTL Build Configurations

The architectures supported are IA-32 architecture, Intel® 64, and
Intel® Many Integrated Core Architecture. The build configurations
supported are shown in the table below.

<table border=1>
<tr><th> <th>icc/icl<th>gcc<th>clang
<tr><td>Linux\other OS<td>Yes(1,5)<td>Yes(2,4)<td>Yes(4,6,7)
<tr><td>FreeBSD\other<td>Yes(1,5)<td>Yes(2,4)<td>Yes(4,6,7,8)
<tr><td>OS X\other<td>Yes(1,3,4)<td>No<td>Yes(4,6,7)
<tr><td>Windows\other OS<td>Yes(1,4)<td>No<td>No
</table>
(1) On IA-32 architecture and Intel® 64, icc/icl versions 12.x
are supported (12.1 is recommended).<br>
(2) gcc version 4.7 is supported.<br>
(3) For icc on OS X\other, OS X\other version 10.5.8 is supported.<br>
(4) Intel® Many Integrated Core Architecture not supported.<br>
(5) On Intel® Many Integrated Core Architecture, icc/icl versions 13.0 or later are required.<br>
(6) Clang\other version 3.3 is supported.<br>
(7) Clang\other currently does not offer a software-implemented 128 bit extended
precision type. Thus, all entry points reliant on this type are removed
from the library and cannot be called in the user program. The following
functions are not available:
@code
__kmpc_atomic_cmplx16_*
__kmpc_atomic_float16_*
__kmpc_atomic_*_fp
@endcode
(8) Community contribution provided AS IS, not tested by Intel.

Supported Architectures: IBM(R) Power 7 and Power 8
<table border=1>
<tr><th> <th>gcc<th>clang
<tr><td>Linux\other OS<td>Yes(1,2)<td>Yes(3,4)
</table>
(1) On Power 7, gcc version 4.8.2 is supported.<br>
(2) On Power 8, gcc version 4.8.2 is supported.<br>
(3) On Power 7, clang version 3.7 is supported.<br>
(4) On Power 8, clang version 3.7 is supported.<br>

@section sec_frontend Front-end Compilers that work with this RTL

The following compilers are known to do compatible code generation for
this RTL: icc/icl, gcc. Code generation is discussed in more detail
later in this document.

@section sec_outlining Outlining

The runtime interface is based on the idea that the compiler
"outlines" sections of code that are to run in parallel into separate
functions that can then be invoked in multiple threads. For instance,
simple code like this

@code
void foo()
{
#pragma omp parallel
    {
        ... do something ...
    }
}
@endcode
is converted into something that looks conceptually like this (where
the names used are merely illustrative; the real library function
names will be used later after we've discussed some more issues...)

@code
static void outlinedFooBody()
{
    ... do something ...
}

void foo()
{
    __OMP_runtime_fork(outlinedFooBody, (void*)0); // Not the real function name!
}
@endcode

@subsection SEC_SHAREDVARS Addressing shared variables

In real uses of the OpenMP\other API there are normally references
from the outlined code to shared variables that are in scope in the containing function.
Therefore the outlined function must be able to address
these variables. There are two alternative ways of doing
this.

@subsubsection SEC_SEC_OT Current Technique
The technique currently supported by the runtime library is for the
outlined function to receive a separate pointer to each shared variable
that it can access. This is what is shown in the work sharing example
later in this document.

We hope soon to provide an alternative interface to support the
alternative implementation described in the next section. That
implementation has performance advantages for small
parallel regions that have many shared variables.
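
The pointer-per-variable technique can be sketched in plain C as follows. This is
only an illustration under stated assumptions: `toy_fork2`, `toy_outlined2_t`,
`toy_parallel_body` and `toy_containing_function` are hypothetical names that are
not part of the runtime, and the toy fork runs its "threads" one after another
instead of creating real threads.

```c
#include <assert.h>

// Hypothetical outlined-function type for a region with two shared
// variables: two thread-id pointers, then one pointer per shared variable.
typedef void (*toy_outlined2_t)(int *gtid, int *btid,
                                void *shared0, void *shared1);

// Toy stand-in for the runtime's fork entry point (NOT the real API).
// It invokes the outlined function once per "thread", sequentially, so the
// sketch stays deterministic and needs no threading library.
static void toy_fork2(int nthreads, toy_outlined2_t fn,
                      void *shared0, void *shared1)
{
    assert(nthreads > 0);
    for (int tid = 0; tid < nthreads; ++tid) {
        int gtid = tid, btid = tid;
        fn(&gtid, &btid, shared0, shared1);
    }
}

// Outlined body of a parallel region that does
//     total += scale * thread_number;
// Each shared variable is reached through its own pointer argument.
static void toy_parallel_body(int *gtid, int *btid,
                              void *total_shp, void *scale_shp)
{
    (void)btid; // unused in this sketch
    int *total = total_shp;
    const int *scale = scale_shp;
    // With real threads this update would have to be atomic or a reduction.
    *total += *scale * *gtid;
}

// The containing function marshals one pointer per shared variable
// at the fork point.
static int toy_containing_function(int nthreads, int scale)
{
    int total = 0;
    toy_fork2(nthreads, toy_parallel_body, &total, &scale);
    return total;
}
```

Calling `toy_containing_function(4, 10)` accumulates `10 * (0 + 1 + 2 + 3)` into
`total`. Note that the fork site passes one argument per shared variable, so its
cost grows with the number of shared variables.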

@subsubsection SEC_SEC_PT Future Technique
The idea is to treat the outlined function as though it
were a lexically nested function, and pass it a single argument which
is the pointer to the parent's stack frame. Provided that the compiler
knows the layout of the parent frame when it is generating the outlined
function it can then access the up-level variables at appropriate
offsets from the parent frame. This is a classical compiler technique
from the 1960s to support languages like Algol (and its descendants)
that support lexically nested functions.

The main benefit of this technique is that there is no code required
at the fork point to marshal the arguments to the outlined function.
Since the runtime knows statically how many arguments must be passed to the
outlined function, it can easily copy them to the thread's stack
frame. Therefore the performance of the fork code is independent of
the number of shared variables that are accessed by the outlined
function.

If it is hard to determine the stack layout of the parent while generating the
outlined code, it is still possible to use this approach by collecting all of
the variables in the parent that are accessed from outlined functions into
a single `struct` which is placed on the stack, and whose address is passed
to the outlined functions.
In this way the offsets of the shared variables
are known (since they are inside the struct) without needing to know
the complete layout of the parent stack-frame. From the point of view
of the runtime the two techniques are equivalent, since in either
case it only has to pass a single argument to the outlined function to allow
it to access shared variables.

A scheme like this is how gcc\other generates outlined functions.

@section SEC_INTERFACES Library Interfaces
The library functions used for specific parts of the OpenMP\other language implementation
are documented in different modules.

 - @ref BASIC_TYPES fundamental types used by the runtime in many places
 - @ref DEPRECATED functions that are in the library but are no longer required
 - @ref STARTUP_SHUTDOWN functions for initializing and finalizing the runtime
 - @ref PARALLEL functions for implementing `omp parallel`
 - @ref THREAD_STATES functions for supporting thread state inquiries
 - @ref WORK_SHARING functions for work sharing constructs such as `omp for`, `omp sections`
 - @ref THREADPRIVATE functions to support thread private data, copyin etc
 - @ref SYNCHRONIZATION functions to support `omp critical`, `omp barrier`, `omp master`, reductions etc
 - @ref ATOMIC_OPS functions to support atomic operations
 - @ref STATS_GATHERING macros to support developer profiling of libomp
 - Documentation on tasking has still to be written...

@section SEC_EXAMPLES Examples
@subsection SEC_WORKSHARING_EXAMPLE Work Sharing Example
This example shows the code generated for a parallel for with reduction and dynamic scheduling.

@code
extern float foo( void );

int main () {
    int i;
    float r = 0.0;
    #pragma omp parallel for schedule(dynamic) reduction(+:r)
    for ( i = 0; i < 10; i ++ ) {
        r += foo();
    }
}
@endcode

The transformed code looks like this.
@code
extern float foo( void );

int main () {
    static int zero = 0;
    auto int gtid;
    auto float r = 0.0;
    __kmpc_begin( & loc3, 0 );
    // The gtid is not actually required in this example so could be omitted;
    // We show its initialization here because it is often required for calls into
    // the runtime and should be locally cached like this.
    gtid = __kmpc_global_thread_num( & loc3 );
    __kmpc_fork_call( & loc7, 1, main_7_parallel_3, & r );
    __kmpc_end( & loc0 );
    return 0;
}

struct main_10_reduction_t_5 { float r_10_rpr; };

static kmp_critical_name lck = { 0 };
static ident_t loc10; // loc10.flags should contain KMP_IDENT_ATOMIC_REDUCE bit set
                      // if compiler has generated an atomic reduction.

void main_7_parallel_3( int *gtid, int *btid, float *r_7_shp ) {
    auto int i_7_pr;
    auto int lower, upper, liter, incr;
    auto struct main_10_reduction_t_5 reduce;
    reduce.r_10_rpr = 0.F;
    liter = 0;
    __kmpc_dispatch_init_4( & loc7, *gtid, 35, 0, 9, 1, 1 );
    while ( __kmpc_dispatch_next_4( & loc7, *gtid, & liter, & lower, & upper, & incr ) ) {
        for( i_7_pr = lower; upper >= i_7_pr; i_7_pr ++ )
            reduce.r_10_rpr += foo();
    }
    switch( __kmpc_reduce_nowait( & loc10, *gtid, 1, 4, & reduce, main_10_reduce_5, & lck ) ) {
        case 1:
            *r_7_shp += reduce.r_10_rpr;
            __kmpc_end_reduce_nowait( & loc10, *gtid, & lck );
            break;
        case 2:
            __kmpc_atomic_float4_add( & loc10, *gtid, r_7_shp, reduce.r_10_rpr );
            break;
        default:;
    }
}

void main_10_reduce_5( struct main_10_reduction_t_5 *reduce_lhs,
                       struct main_10_reduction_t_5 *reduce_rhs )
{
    reduce_lhs->r_10_rpr += reduce_rhs->r_10_rpr;
}
@endcode

@defgroup BASIC_TYPES Basic Types
Types that are used throughout the runtime.

@defgroup DEPRECATED Deprecated Functions
Functions in this group are for backwards compatibility only, and
should not be used in new code.

@defgroup STARTUP_SHUTDOWN Startup and Shutdown
These functions are for library initialization and shutdown.

@defgroup PARALLEL Parallel (fork/join)
These functions are used for implementing <tt>\#pragma omp parallel</tt>.

@defgroup THREAD_STATES Thread Information
These functions return information about the currently executing thread.

@defgroup WORK_SHARING Work Sharing
These functions are used for implementing
<tt>\#pragma omp for</tt>, <tt>\#pragma omp sections</tt>, <tt>\#pragma omp single</tt> and
<tt>\#pragma omp master</tt> constructs.

When handling loops, there are different functions for each of the signed and unsigned 32 and 64 bit integer types
which have the name suffixes `_4`, `_4u`, `_8` and `_8u`. The semantics of each of the functions is the same,
so they are only described once.

Static loop scheduling is handled by @ref __kmpc_for_static_init_4 and friends.
Only a single call is needed,
since the iterations to be executed by any given thread can be determined as soon as the loop parameters are known.

Dynamic scheduling is handled by the @ref __kmpc_dispatch_init_4 and @ref __kmpc_dispatch_next_4 functions.
The init function is called once in each thread outside the loop, while the next function is called each
time that the previous chunk of work has been exhausted.

@defgroup SYNCHRONIZATION Synchronization
These functions are used for implementing barriers.

@defgroup THREADPRIVATE Thread private data support
These functions support copyin/out and thread private data.

@defgroup STATS_GATHERING Statistics Gathering from OMPTB
These macros support profiling the libomp library. Use --stats=on when building with build.pl to enable them,
and then use the KMP_* macros to profile (through counts or clock ticks) libomp during execution of an OpenMP program.

@section sec_stats_env_vars Environment Variables

This section describes the environment variables relevant to stats-gathering in libomp.

@code
KMP_STATS_FILE
@endcode
This environment variable is set to an output filename, which will be appended to, *NOT OVERWRITTEN*, if it exists.
If this environment variable is undefined, the statistics will be output to stderr.

@code
KMP_STATS_THREADS
@endcode
This environment variable requests that thread-specific statistics be printed in addition to aggregate statistics.
Each thread's statistics will be shown as well as the collective sum of all threads.
The values "true", "on", "1", "yes" will all enable per-thread statistics.

@defgroup TASKING Tasking support
These functions support tasking constructs.

@defgroup USER User visible functions
These functions can be called directly by the user, but are runtime library specific, rather than being OpenMP interfaces.

*/