xref: /llvm-project/openmp/runtime/doc/doxygen/libomp_interface.h (revision 309b00a42e902e816dad03c2c2f1a9e41ba130bc)
1*309b00a4SShilei Tian // clang-format off
25e8470afSJim Cownie // This file does not contain any code; it just contains additional text and formatting
35e8470afSJim Cownie // for doxygen.
45e8470afSJim Cownie 
55e8470afSJim Cownie 
65e8470afSJim Cownie //===----------------------------------------------------------------------===//
75e8470afSJim Cownie //
857b08b09SChandler Carruth // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
957b08b09SChandler Carruth // See https://llvm.org/LICENSE.txt for license information.
1057b08b09SChandler Carruth // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
115e8470afSJim Cownie //
125e8470afSJim Cownie //===----------------------------------------------------------------------===//
135e8470afSJim Cownie 
145e8470afSJim Cownie 
15820b2555SAndrey Churbanov /*! @mainpage LLVM  OpenMP* Runtime Library Interface
165e8470afSJim Cownie @section sec_intro Introduction
175e8470afSJim Cownie 
185e8470afSJim Cownie This document describes the interface provided by the
19820b2555SAndrey Churbanov LLVM  OpenMP\other runtime library to the compiler.
205e8470afSJim Cownie Routines that are directly called as simple functions by user code are
215e8470afSJim Cownie not currently described here, since their definition is in the OpenMP
225e8470afSJim Cownie specification available from http://openmp.org
235e8470afSJim Cownie 
245e8470afSJim Cownie The aim here is to explain the interface from the compiler to the runtime.
255e8470afSJim Cownie 
265e8470afSJim Cownie The overall design is described, and each function in the interface
275e8470afSJim Cownie has its own description. (At least, that's the ambition, we may not be there yet).
285e8470afSJim Cownie 
291acc2dbfSJonathan Peyton @section sec_building Quickly Building the Runtime
305e8470afSJim Cownie For the impatient, we cover building the runtime as the first topic here.
315e8470afSJim Cownie 
321acc2dbfSJonathan Peyton CMake is used to build the OpenMP runtime.  For details and a full list of options for the CMake build system,
332e809acdSJonas Hahnfeld see <tt>README.rst</tt> in the source code repository.  These instructions will provide the most typical build.
341acc2dbfSJonathan Peyton 
351acc2dbfSJonathan Peyton In-LLVM-tree build:.
365e8470afSJim Cownie @code
371acc2dbfSJonathan Peyton $ cd where-you-want-to-live
381acc2dbfSJonathan Peyton Check out openmp into llvm/projects
391acc2dbfSJonathan Peyton $ cd where-you-want-to-build
401acc2dbfSJonathan Peyton $ mkdir build && cd build
411acc2dbfSJonathan Peyton $ cmake path/to/llvm -DCMAKE_C_COMPILER=<C compiler> -DCMAKE_CXX_COMPILER=<C++ compiler>
421acc2dbfSJonathan Peyton $ make omp
435e8470afSJim Cownie @endcode
441acc2dbfSJonathan Peyton Out-of-LLVM-tree build:
455e8470afSJim Cownie @code
461acc2dbfSJonathan Peyton $ cd where-you-want-to-live
471acc2dbfSJonathan Peyton Check out openmp
482e809acdSJonas Hahnfeld $ cd where-you-want-to-live/openmp
491acc2dbfSJonathan Peyton $ mkdir build && cd build
501acc2dbfSJonathan Peyton $ cmake path/to/openmp -DCMAKE_C_COMPILER=<C compiler> -DCMAKE_CXX_COMPILER=<C++ compiler>
511acc2dbfSJonathan Peyton $ make
525e8470afSJim Cownie @endcode
535e8470afSJim Cownie 
545e8470afSJim Cownie @section sec_supported Supported RTL Build Configurations
555e8470afSJim Cownie 
565e8470afSJim Cownie The architectures supported are IA-32 architecture, Intel&reg;&nbsp; 64, and
575e8470afSJim Cownie Intel&reg;&nbsp; Many Integrated Core Architecture.  The build configurations
585e8470afSJim Cownie supported are shown in the table below.
595e8470afSJim Cownie 
605e8470afSJim Cownie <table border=1>
611acc2dbfSJonathan Peyton <tr><th> <th>icc/icl<th>gcc<th>clang
621acc2dbfSJonathan Peyton <tr><td>Linux\other OS<td>Yes(1,5)<td>Yes(2,4)<td>Yes(4,6,7)
631acc2dbfSJonathan Peyton <tr><td>FreeBSD\other<td>Yes(1,5)<td>Yes(2,4)<td>Yes(4,6,7,8)
641acc2dbfSJonathan Peyton <tr><td>OS X\other<td>Yes(1,3,4)<td>No<td>Yes(4,6,7)
651acc2dbfSJonathan Peyton <tr><td>Windows\other OS<td>Yes(1,4)<td>No<td>No
665e8470afSJim Cownie </table>
675e8470afSJim Cownie (1) On IA-32 architecture and Intel&reg;&nbsp; 64, icc/icl versions 12.x
685e8470afSJim Cownie     are supported (12.1 is recommended).<br>
691acc2dbfSJonathan Peyton (2) gcc version 4.7 is supported.<br>
705e8470afSJim Cownie (3) For icc on OS X\other, OS X\other version 10.5.8 is supported.<br>
715e8470afSJim Cownie (4) Intel&reg;&nbsp; Many Integrated Core Architecture not supported.<br>
721acc2dbfSJonathan Peyton (5) On Intel&reg;&nbsp; Many Integrated Core Architecture, icc/icl versions 13.0 or later are required.<br>
731acc2dbfSJonathan Peyton (6) Clang\other version 3.3 is supported.<br>
741acc2dbfSJonathan Peyton (7) Clang\other currently does not offer a software-implemented 128 bit extended
751acc2dbfSJonathan Peyton     precision type.  Thus, all entry points reliant on this type are removed
761acc2dbfSJonathan Peyton     from the library and cannot be called in the user program.  The following
771acc2dbfSJonathan Peyton     functions are not available:
781acc2dbfSJonathan Peyton @code
791acc2dbfSJonathan Peyton     __kmpc_atomic_cmplx16_*
801acc2dbfSJonathan Peyton     __kmpc_atomic_float16_*
811acc2dbfSJonathan Peyton     __kmpc_atomic_*_fp
821acc2dbfSJonathan Peyton @endcode
831acc2dbfSJonathan Peyton (8) Community contribution provided AS IS, not tested by Intel.
841acc2dbfSJonathan Peyton 
851acc2dbfSJonathan Peyton Supported Architectures: IBM(R) Power 7 and Power 8
861acc2dbfSJonathan Peyton <table border=1>
871acc2dbfSJonathan Peyton <tr><th> <th>gcc<th>clang
881acc2dbfSJonathan Peyton <tr><td>Linux\other OS<td>Yes(1,2)<td>Yes(3,4)
891acc2dbfSJonathan Peyton </table>
901acc2dbfSJonathan Peyton (1) On Power 7, gcc version 4.8.2 is supported.<br>
911acc2dbfSJonathan Peyton (2) On Power 8, gcc version 4.8.2 is supported.<br>
921acc2dbfSJonathan Peyton (3) On Power 7, clang version 3.7 is supported.<br>
931acc2dbfSJonathan Peyton (4) On Power 8, clang version 3.7 is supported.<br>
945e8470afSJim Cownie 
955e8470afSJim Cownie @section sec_frontend Front-end Compilers that work with this RTL
965e8470afSJim Cownie 
975e8470afSJim Cownie The following compilers are known to do compatible code generation for
985e8470afSJim Cownie this RTL: icc/icl, gcc.  Code generation is discussed in more detail
995e8470afSJim Cownie later in this document.
1005e8470afSJim Cownie 
1015e8470afSJim Cownie @section sec_outlining Outlining
1025e8470afSJim Cownie 
1035e8470afSJim Cownie The runtime interface is based on the idea that the compiler
1045e8470afSJim Cownie "outlines" sections of code that are to run in parallel into separate
1055e8470afSJim Cownie functions that can then be invoked in multiple threads.  For instance,
1065e8470afSJim Cownie simple code like this
1075e8470afSJim Cownie 
1085e8470afSJim Cownie @code
1095e8470afSJim Cownie void foo()
1105e8470afSJim Cownie {
1115e8470afSJim Cownie #pragma omp parallel
1125e8470afSJim Cownie     {
1135e8470afSJim Cownie         ... do something ...
1145e8470afSJim Cownie     }
1155e8470afSJim Cownie }
1165e8470afSJim Cownie @endcode
1175e8470afSJim Cownie is converted into something that looks conceptually like this (where
1185e8470afSJim Cownie the names used are merely illustrative; the real library function
1195e8470afSJim Cownie names will be used later after we've discussed some more issues...)
1205e8470afSJim Cownie 
1215e8470afSJim Cownie @code
1225e8470afSJim Cownie static void outlinedFooBody()
1235e8470afSJim Cownie {
1245e8470afSJim Cownie     ... do something ...
1255e8470afSJim Cownie }
1265e8470afSJim Cownie 
1275e8470afSJim Cownie void foo()
1285e8470afSJim Cownie {
1295e8470afSJim Cownie     __OMP_runtime_fork(outlinedFooBody, (void*)0);   // Not the real function name!
1305e8470afSJim Cownie }
1315e8470afSJim Cownie @endcode
1325e8470afSJim Cownie 
1335e8470afSJim Cownie @subsection SEC_SHAREDVARS Addressing shared variables
1345e8470afSJim Cownie 
1355e8470afSJim Cownie In real uses of the OpenMP\other API there are normally references
1365e8470afSJim Cownie from the outlined code  to shared variables that are in scope in the containing function.
1375e8470afSJim Cownie Therefore the containing function must be able to address
1385e8470afSJim Cownie these variables. The runtime supports two alternate ways of doing
1395e8470afSJim Cownie this.
1405e8470afSJim Cownie 
1415e8470afSJim Cownie @subsubsection SEC_SEC_OT Current Technique
1425e8470afSJim Cownie The technique currently supported by the runtime library is to receive
1435e8470afSJim Cownie a separate pointer to each shared variable that can be accessed from
1445e8470afSJim Cownie the outlined function.  This is what is shown in the example below.
1455e8470afSJim Cownie 
1465e8470afSJim Cownie We hope soon to provide an alternative interface to support the
1475e8470afSJim Cownie alternate implementation described in the next section. The
1485e8470afSJim Cownie alternative implementation has performance advantages for small
1495e8470afSJim Cownie parallel regions that have many shared variables.
1505e8470afSJim Cownie 
1515e8470afSJim Cownie @subsubsection SEC_SEC_PT Future Technique
1525e8470afSJim Cownie The idea is to treat the outlined function as though it
1535e8470afSJim Cownie were a lexically nested function, and pass it a single argument which
1545e8470afSJim Cownie is the pointer to the parent's stack frame. Provided that the compiler
1555e8470afSJim Cownie knows the layout of the parent frame when it is generating the outlined
1565e8470afSJim Cownie function it can then access the up-level variables at appropriate
1575e8470afSJim Cownie offsets from the parent frame.  This is a classical compiler technique
1585e8470afSJim Cownie from the 1960s to support languages like Algol (and its descendants)
1595e8470afSJim Cownie that support lexically nested functions.
1605e8470afSJim Cownie 
1615e8470afSJim Cownie The main benefit of this technique is that there is no code required
1625e8470afSJim Cownie at the fork point to marshal the arguments to the outlined function.
1635e8470afSJim Cownie Since the runtime knows statically how many arguments must be passed to the
1645e8470afSJim Cownie outlined function, it can easily copy them to the thread's stack
1655e8470afSJim Cownie frame.  Therefore the performance of the fork code is independent of
1665e8470afSJim Cownie the number of shared variables that are accessed by the outlined
1675e8470afSJim Cownie function.
1685e8470afSJim Cownie 
1695e8470afSJim Cownie If it is hard to determine the stack layout of the parent while generating the
1705e8470afSJim Cownie outlined code, it is still possible to use this approach by collecting all of
1715e8470afSJim Cownie the variables in the parent that are accessed from outlined functions into
1725e8470afSJim Cownie a single `struct` which is placed on the stack, and whose address is passed
1735e8470afSJim Cownie to the outlined functions. In this way the offsets of the shared variables
1745e8470afSJim Cownie are known (since they are inside the struct) without needing to know
1755e8470afSJim Cownie the complete layout of the parent stack-frame. From the point of view
1765e8470afSJim Cownie of the runtime either of these techniques is equivalent, since in either
1775e8470afSJim Cownie case it only has to pass a single argument to the outlined function to allow
1785e8470afSJim Cownie it to access shared variables.
1795e8470afSJim Cownie 
1805e8470afSJim Cownie A scheme like this is how gcc\other generates outlined functions.
1815e8470afSJim Cownie 
1825e8470afSJim Cownie @section SEC_INTERFACES Library Interfaces
1835e8470afSJim Cownie The library functions used for specific parts of the OpenMP\other language implementation
1845e8470afSJim Cownie are documented in different modules.
1855e8470afSJim Cownie 
1865e8470afSJim Cownie  - @ref BASIC_TYPES fundamental types used by the runtime in many places
1875e8470afSJim Cownie  - @ref DEPRECATED  functions that are in the library but are no longer required
1885e8470afSJim Cownie  - @ref STARTUP_SHUTDOWN functions for initializing and finalizing the runtime
1895e8470afSJim Cownie  - @ref PARALLEL functions for implementing `omp parallel`
1905e8470afSJim Cownie  - @ref THREAD_STATES functions for supporting thread state inquiries
1915e8470afSJim Cownie  - @ref WORK_SHARING functions for work sharing constructs such as `omp for`, `omp sections`
1925e8470afSJim Cownie  - @ref THREADPRIVATE functions to support thread private data, copyin etc
1935e8470afSJim Cownie  - @ref SYNCHRONIZATION functions to support `omp critical`, `omp barrier`, `omp master`, reductions etc
1945e8470afSJim Cownie  - @ref ATOMIC_OPS functions to support atomic operations
195469dcc63SJonathan Peyton  - @ref STATS_GATHERING macros to support developer profiling of libomp
1965e8470afSJim Cownie  - Documentation on tasking has still to be written...
1975e8470afSJim Cownie 
1985e8470afSJim Cownie @section SEC_EXAMPLES Examples
1995e8470afSJim Cownie @subsection SEC_WORKSHARING_EXAMPLE Work Sharing Example
2005e8470afSJim Cownie This example shows the code generated for a parallel for with reduction and dynamic scheduling.
2015e8470afSJim Cownie 
2025e8470afSJim Cownie @code
2035e8470afSJim Cownie extern float foo( void );
2045e8470afSJim Cownie 
2055e8470afSJim Cownie int main () {
2065e8470afSJim Cownie     int i;
2075e8470afSJim Cownie     float r = 0.0;
2085e8470afSJim Cownie     #pragma omp parallel for schedule(dynamic) reduction(+:r)
2095e8470afSJim Cownie     for ( i = 0; i < 10; i ++ ) {
2105e8470afSJim Cownie         r += foo();
2115e8470afSJim Cownie     }
2125e8470afSJim Cownie }
2135e8470afSJim Cownie @endcode
2145e8470afSJim Cownie 
2155e8470afSJim Cownie The transformed code looks like this.
2165e8470afSJim Cownie @code
2175e8470afSJim Cownie extern float foo( void );
2185e8470afSJim Cownie 
2195e8470afSJim Cownie int main () {
2205e8470afSJim Cownie     static int zero = 0;
2215e8470afSJim Cownie     auto int gtid;
2225e8470afSJim Cownie     auto float r = 0.0;
2235e8470afSJim Cownie     __kmpc_begin( & loc3, 0 );
2245e8470afSJim Cownie     // The gtid is not actually required in this example so could be omitted;
2255e8470afSJim Cownie     // We show its initialization here because it is often required for calls into
2265e8470afSJim Cownie     // the runtime and should be locally cached like this.
2275e8470afSJim Cownie     gtid = __kmpc_global thread num( & loc3 );
2285e8470afSJim Cownie     __kmpc_fork call( & loc7, 1, main_7_parallel_3, & r );
2295e8470afSJim Cownie     __kmpc_end( & loc0 );
2305e8470afSJim Cownie     return 0;
2315e8470afSJim Cownie }
2325e8470afSJim Cownie 
2335e8470afSJim Cownie struct main_10_reduction_t_5 { float r_10_rpr; };
2345e8470afSJim Cownie 
2355e8470afSJim Cownie static kmp_critical_name lck = { 0 };
2365e8470afSJim Cownie static ident_t loc10; // loc10.flags should contain KMP_IDENT_ATOMIC_REDUCE bit set
2375e8470afSJim Cownie                       // if compiler has generated an atomic reduction.
2385e8470afSJim Cownie 
2395e8470afSJim Cownie void main_7_parallel_3( int *gtid, int *btid, float *r_7_shp ) {
2405e8470afSJim Cownie     auto int i_7_pr;
2415e8470afSJim Cownie     auto int lower, upper, liter, incr;
2425e8470afSJim Cownie     auto struct main_10_reduction_t_5 reduce;
2435e8470afSJim Cownie     reduce.r_10_rpr = 0.F;
2445e8470afSJim Cownie     liter = 0;
2455e8470afSJim Cownie     __kmpc_dispatch_init_4( & loc7,*gtid, 35, 0, 9, 1, 1 );
2465e8470afSJim Cownie     while ( __kmpc_dispatch_next_4( & loc7, *gtid, & liter, & lower, & upper, & incr ) ) {
2475e8470afSJim Cownie         for( i_7_pr = lower; upper >= i_7_pr; i_7_pr ++ )
2485e8470afSJim Cownie           reduce.r_10_rpr += foo();
2495e8470afSJim Cownie     }
2505e8470afSJim Cownie     switch( __kmpc_reduce_nowait( & loc10, *gtid, 1, 4, & reduce, main_10_reduce_5, & lck ) ) {
2515e8470afSJim Cownie         case 1:
2525e8470afSJim Cownie            *r_7_shp += reduce.r_10_rpr;
2535e8470afSJim Cownie            __kmpc_end_reduce_nowait( & loc10, *gtid, & lck );
2545e8470afSJim Cownie            break;
2555e8470afSJim Cownie         case 2:
2565e8470afSJim Cownie            __kmpc_atomic_float4_add( & loc10, *gtid, r_7_shp, reduce.r_10_rpr );
2575e8470afSJim Cownie            break;
2585e8470afSJim Cownie         default:;
2595e8470afSJim Cownie     }
2605e8470afSJim Cownie }
2615e8470afSJim Cownie 
2625e8470afSJim Cownie void main_10_reduce_5( struct main_10_reduction_t_5 *reduce_lhs,
2635e8470afSJim Cownie                        struct main_10_reduction_t_5 *reduce_rhs )
2645e8470afSJim Cownie {
2655e8470afSJim Cownie     reduce_lhs->r_10_rpr += reduce_rhs->r_10_rpr;
2665e8470afSJim Cownie }
2675e8470afSJim Cownie @endcode
2685e8470afSJim Cownie 
2695e8470afSJim Cownie @defgroup BASIC_TYPES Basic Types
2705e8470afSJim Cownie Types that are used throughout the runtime.
2715e8470afSJim Cownie 
2725e8470afSJim Cownie @defgroup DEPRECATED Deprecated Functions
2735e8470afSJim Cownie Functions in this group are for backwards compatibility only, and
2745e8470afSJim Cownie should not be used in new code.
2755e8470afSJim Cownie 
2765e8470afSJim Cownie @defgroup STARTUP_SHUTDOWN Startup and Shutdown
2775e8470afSJim Cownie These functions are for library initialization and shutdown.
2785e8470afSJim Cownie 
2795e8470afSJim Cownie @defgroup PARALLEL Parallel (fork/join)
2805e8470afSJim Cownie These functions are used for implementing <tt>\#pragma omp parallel</tt>.
2815e8470afSJim Cownie 
2825e8470afSJim Cownie @defgroup THREAD_STATES Thread Information
2835e8470afSJim Cownie These functions return information about the currently executing thread.
2845e8470afSJim Cownie 
2855e8470afSJim Cownie @defgroup WORK_SHARING Work Sharing
2865e8470afSJim Cownie These functions are used for implementing
2875e8470afSJim Cownie <tt>\#pragma omp for</tt>, <tt>\#pragma omp sections</tt>, <tt>\#pragma omp single</tt> and
2885e8470afSJim Cownie <tt>\#pragma omp master</tt> constructs.
2895e8470afSJim Cownie 
2905e8470afSJim Cownie When handling loops, there are different functions for each of the signed and unsigned 32 and 64 bit integer types
2915e8470afSJim Cownie which have the name suffixes `_4`, `_4u`, `_8` and `_8u`. The semantics of each of the functions is the same,
2925e8470afSJim Cownie so they are only described once.
2935e8470afSJim Cownie 
2945e8470afSJim Cownie Static loop scheduling is handled by  @ref __kmpc_for_static_init_4 and friends. Only a single call is needed,
2955e8470afSJim Cownie since the iterations to be executed by any give thread can be determined as soon as the loop parameters are known.
2965e8470afSJim Cownie 
2975e8470afSJim Cownie Dynamic scheduling is handled by the @ref __kmpc_dispatch_init_4 and @ref __kmpc_dispatch_next_4 functions.
2985e8470afSJim Cownie The init function is called once in each thread outside the loop, while the next function is called each
2995e8470afSJim Cownie time that the previous chunk of work has been exhausted.
3005e8470afSJim Cownie 
3015e8470afSJim Cownie @defgroup SYNCHRONIZATION Synchronization
3025e8470afSJim Cownie These functions are used for implementing barriers.
3035e8470afSJim Cownie 
3045e8470afSJim Cownie @defgroup THREADPRIVATE Thread private data support
3055e8470afSJim Cownie These functions support copyin/out and thread private data.
3065e8470afSJim Cownie 
3074cc4bb4cSJim Cownie @defgroup STATS_GATHERING Statistics Gathering from OMPTB
308469dcc63SJonathan Peyton These macros support profiling the libomp library.  Use --stats=on when building with build.pl to enable
309469dcc63SJonathan Peyton and then use the KMP_* macros to profile (through counts or clock ticks) libomp during execution of an OpenMP program.
3104cc4bb4cSJim Cownie 
3114cc4bb4cSJim Cownie @section sec_stats_env_vars Environment Variables
3124cc4bb4cSJim Cownie 
313bb02c254SJonathan Peyton This section describes the environment variables relevant to stats-gathering in libomp
3144cc4bb4cSJim Cownie 
3154cc4bb4cSJim Cownie @code
3164cc4bb4cSJim Cownie KMP_STATS_FILE
3174cc4bb4cSJim Cownie @endcode
3184cc4bb4cSJim Cownie This environment variable is set to an output filename that will be appended *NOT OVERWRITTEN* if it exists.  If this environment variable is undefined, the statistics will be output to stderr
3194cc4bb4cSJim Cownie 
3204cc4bb4cSJim Cownie @code
3214cc4bb4cSJim Cownie KMP_STATS_THREADS
3224cc4bb4cSJim Cownie @endcode
3234cc4bb4cSJim Cownie This environment variable indicates to print thread-specific statistics as well as aggregate statistics.  Each thread's statistics will be shown as well as the collective sum of all threads.  The values "true", "on", "1", "yes" will all indicate to print per thread statistics.
3244cc4bb4cSJim Cownie 
3255e8470afSJim Cownie @defgroup TASKING Tasking support
3264cc4bb4cSJim Cownie These functions support tasking constructs.
3274cc4bb4cSJim Cownie 
3284cc4bb4cSJim Cownie @defgroup USER User visible functions
3294cc4bb4cSJim Cownie These functions can be called directly by the user, but are runtime library specific, rather than being OpenMP interfaces.
3305e8470afSJim Cownie 
3315e8470afSJim Cownie */
3325e8470afSJim Cownie 
333