\input texinfo @c -*-texinfo-*-

@c %**start of header
@setfilename libgomp.info
@settitle GNU libgomp
@c %**end of header


@copying
Copyright @copyright{} 2006-2022 Free Software Foundation, Inc.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with the
Invariant Sections being ``Funding Free Software'', the Front-Cover
texts being (a) (see below), and with the Back-Cover Texts being (b)
(see below).  A copy of the license is included in the section entitled
``GNU Free Documentation License''.

(a) The FSF's Front-Cover Text is:

     A GNU Manual

(b) The FSF's Back-Cover Text is:

     You have freedom to copy and modify this GNU Manual, like GNU
     software.  Copies published by the Free Software Foundation raise
     funds for GNU development.
@end copying

@ifinfo
@dircategory GNU Libraries
@direntry
* libgomp: (libgomp).          GNU Offloading and Multi Processing Runtime Library.
@end direntry

This manual documents libgomp, the GNU Offloading and Multi Processing
Runtime library.  This is the GNU implementation of the OpenMP and
OpenACC APIs for parallel and accelerator programming in C/C++ and
Fortran.

Published by the Free Software Foundation
51 Franklin Street, Fifth Floor
Boston, MA 02110-1301 USA

@insertcopying
@end ifinfo


@setchapternewpage odd

@titlepage
@title GNU Offloading and Multi Processing Runtime Library
@subtitle The GNU OpenMP and OpenACC Implementation
@page
@vskip 0pt plus 1filll
@comment For the @value{version-GCC} Version*
@sp 1
Published by the Free Software Foundation @*
51 Franklin Street, Fifth Floor@*
Boston, MA 02110-1301, USA@*
@sp 1
@insertcopying
@end titlepage

@summarycontents
@contents
@page


@node Top, Enabling OpenMP
@top Introduction
@cindex Introduction

This manual documents the usage of libgomp, the GNU Offloading and
Multi Processing Runtime Library.  This includes the GNU
implementation of the @uref{https://www.openmp.org, OpenMP} Application
Programming Interface (API) for multi-platform shared-memory parallel
programming in C/C++ and Fortran, and the GNU implementation of the
@uref{https://www.openacc.org, OpenACC} Application Programming
Interface (API) for offloading of code to accelerator devices in C/C++
and Fortran.

Originally, libgomp implemented the GNU OpenMP Runtime Library.  Based
on this, support for OpenACC and offloading (both OpenACC and OpenMP
4's target construct) was added later on, and the library's name
changed to GNU Offloading and Multi Processing Runtime Library.



@comment
@comment  When you add a new menu item, please keep the right hand
@comment  aligned to the same column.  Do not use tabs.  This provides
@comment  better formatting.
@comment
@menu
* Enabling OpenMP::            How to enable OpenMP for your applications.
* OpenMP Implementation Status::
                               List of implemented features by OpenMP version
* OpenMP Runtime Library Routines: Runtime Library Routines.
                               The OpenMP runtime application programming
                               interface.
* OpenMP Environment Variables: Environment Variables.
                               Influencing OpenMP runtime behavior with
                               environment variables.
* Enabling OpenACC::           How to enable OpenACC for your
                               applications.
* OpenACC Runtime Library Routines::
                               The OpenACC runtime application
                               programming interface.
* OpenACC Environment Variables::
                               Influencing OpenACC runtime behavior with
                               environment variables.
* CUDA Streams Usage::         Notes on the implementation of
                               asynchronous operations.
* OpenACC Library Interoperability::
                               OpenACC library interoperability with the
                               NVIDIA CUBLAS library.
* OpenACC Profiling Interface::
* The libgomp ABI::            Notes on the external ABI presented by libgomp.
* Reporting Bugs::             How to report bugs in the GNU Offloading and
                               Multi Processing Runtime Library.
* Copying::                    GNU general public license says
                               how you can copy and share libgomp.
* GNU Free Documentation License::
                               How you can copy and share this manual.
* Funding::                    How to help assure continued work for free
                               software.
* Library Index::              Index of this documentation.
@end menu


@c ---------------------------------------------------------------------
@c Enabling OpenMP
@c ---------------------------------------------------------------------

@node Enabling OpenMP
@chapter Enabling OpenMP

To activate the OpenMP extensions for C/C++ and Fortran, the compile-time
flag @command{-fopenmp} must be specified.  This enables the OpenMP directive
@code{#pragma omp} in C/C++ and, for Fortran, the @code{!$omp} directive in
free source form and the @code{c$omp}, @code{*$omp} and @code{!$omp}
directives in fixed source form, as well as the @code{!$} conditional
compilation sentinel in free source form and the @code{c$}, @code{*$} and
@code{!$} sentinels in fixed source form.  The flag also
arranges for automatic linking of the OpenMP runtime library
(@ref{Runtime Library Routines}).
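As an illustrative sketch (not part of the library itself; the helper name
is hypothetical), the following C translation unit uses the @code{_OPENMP}
preprocessor macro for conditional compilation, so the same source builds
and runs both with and without @command{-fopenmp}:

```c
#include <stdio.h>
#ifdef _OPENMP            /* defined only when compiling with -fopenmp */
#include <omp.h>
#endif

/* Hypothetical helper: report the size of the team a parallel region
   obtains.  Without -fopenmp the guarded block is skipped entirely and
   the function is plain serial C, always reporting one thread.  */
static int team_size (void)
{
  int n = 1;
#ifdef _OPENMP
#pragma omp parallel
#pragma omp master
  n = omp_get_num_threads ();
#endif
  return n;
}
```

Compiled as @command{gcc -fopenmp file.c} the pragma spawns a team of
threads and links against libgomp automatically; compiled without the flag,
the same file still builds and reports a team size of one.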

A complete description of all OpenMP directives may be found in the
@uref{https://www.openmp.org, OpenMP Application Program Interface} manuals.
See also @ref{OpenMP Implementation Status}.


@c ---------------------------------------------------------------------
@c OpenMP Implementation Status
@c ---------------------------------------------------------------------

@node OpenMP Implementation Status
@chapter OpenMP Implementation Status

@menu
* OpenMP 4.5::   Feature completion status to 4.5 specification
* OpenMP 5.0::   Feature completion status to 5.0 specification
* OpenMP 5.1::   Feature completion status to 5.1 specification
@end menu

The @code{_OPENMP} preprocessor macro and Fortran's @code{openmp_version}
parameter, provided by @code{omp_lib.h} and the @code{omp_lib} module, have
the value @code{201511} (i.e.@: OpenMP 4.5).

@node OpenMP 4.5
@section OpenMP 4.5

The OpenMP 4.5 specification is fully supported.

@node OpenMP 5.0
@section OpenMP 5.0

@unnumberedsubsec New features listed in Appendix B of the OpenMP specification
@c This list is sorted as in OpenMP 5.1's B.3 not as in OpenMP 5.0's B.2

@multitable @columnfractions .60 .10 .25
@headitem Description @tab Status @tab Comments
@item Array shaping @tab N @tab
@item Array sections with non-unit strides in C and C++ @tab N @tab
@item Iterators @tab Y @tab
@item @code{metadirective} directive @tab N @tab
@item @code{declare variant} directive
      @tab P @tab simd traits not handled correctly
@item @emph{target-offload-var} ICV and @code{OMP_TARGET_OFFLOAD}
      env variable @tab Y @tab
@item Nested-parallel changes to @emph{max-active-levels-var} ICV @tab Y @tab
@item @code{requires} directive @tab P
      @tab Only fulfillable requirements are @code{atomic_default_mem_order}
      and @code{dynamic_allocators}
@item @code{teams} construct outside an enclosing target region @tab Y @tab
@item Non-rectangular loop nests @tab P @tab Only C/C++
@item @code{!=} as relational-op in canonical loop form for C/C++ @tab Y @tab
@item @code{nonmonotonic} as default loop schedule modifier for worksharing-loop
      constructs @tab Y @tab
@item Collapse of associated loops that are imperfectly nested loops @tab N @tab
@item Clauses @code{if}, @code{nontemporal} and @code{order(concurrent)} in
      @code{simd} construct @tab Y @tab
@item @code{atomic} constructs in @code{simd} @tab Y @tab
@item @code{loop} construct @tab Y @tab
@item @code{order(concurrent)} clause @tab Y @tab
@item @code{scan} directive and @code{in_scan} modifier for the
      @code{reduction} clause @tab Y @tab
@item @code{in_reduction} clause on @code{task} constructs @tab Y @tab
@item @code{in_reduction} clause on @code{target} constructs @tab P
      @tab @code{nowait} only stub
@item @code{task_reduction} clause with @code{taskgroup} @tab Y @tab
@item @code{task} modifier to
@code{reduction} clause @tab Y @tab
@item @code{affinity} clause to @code{task} construct @tab Y @tab Stub only
@item @code{detach} clause to @code{task} construct @tab Y @tab
@item @code{omp_fulfill_event} runtime routine @tab Y @tab
@item @code{reduction} and @code{in_reduction} clauses on @code{taskloop}
      and @code{taskloop simd} constructs @tab Y @tab
@item @code{taskloop} construct cancelable by @code{cancel} construct
      @tab Y @tab
@item @code{mutexinoutset} @emph{dependence-type} for @code{depend} clause
      @tab Y @tab
@item Predefined memory spaces, memory allocators, allocator traits
      @tab Y @tab Some are only stubs
@item Memory management routines @tab Y @tab
@item @code{allocate} directive @tab N @tab
@item @code{allocate} clause @tab P @tab Initial support
@item @code{use_device_addr} clause on @code{target data} @tab Y @tab
@item @code{ancestor} modifier on @code{device} clause
      @tab P @tab Reverse offload unsupported
@item Implicit declare target directive @tab Y @tab
@item Discontiguous array section with @code{target update} construct
      @tab N @tab
@item C/C++'s lvalue expressions in @code{to}, @code{from}
      and @code{map} clauses @tab N @tab
@item C/C++'s lvalue expressions in @code{depend} clauses @tab Y @tab
@item Nested @code{declare target} directive @tab Y @tab
@item Combined @code{master} constructs @tab Y @tab
@item @code{depend} clause on @code{taskwait} @tab Y @tab
@item Weak memory ordering clauses on @code{atomic} and @code{flush} construct
      @tab Y @tab
@item @code{hint} clause on the @code{atomic} construct @tab Y @tab Stub only
@item @code{depobj} construct and depend objects @tab Y @tab
@item Lock hints were renamed to synchronization hints @tab Y @tab
@item @code{conditional} modifier to @code{lastprivate} clause @tab Y @tab
@item Map-order clarifications @tab P @tab
@item @code{close} @emph{map-type-modifier} @tab Y @tab
@item Mapping
C/C++ pointer variables and to assign the address of
      device memory mapped by an array section @tab P @tab
@item Mapping of Fortran pointer and allocatable variables, including pointer
      and allocatable components of variables
      @tab P @tab Mapping of vars with allocatable components unsupported
@item @code{defaultmap} extensions @tab Y @tab
@item @code{declare mapper} directive @tab N @tab
@item @code{omp_get_supported_active_levels} routine @tab Y @tab
@item Runtime routines and environment variables to display runtime thread
      affinity information @tab Y @tab
@item @code{omp_pause_resource} and @code{omp_pause_resource_all} runtime
      routines @tab Y @tab
@item @code{omp_get_device_num} runtime routine @tab Y @tab
@item OMPT interface @tab N @tab
@item OMPD interface @tab N @tab
@end multitable

@unnumberedsubsec Other new OpenMP 5.0 features

@multitable @columnfractions .60 .10 .25
@headitem Description @tab Status @tab Comments
@item Supporting C++'s range-based for loop @tab Y @tab
@end multitable


@node OpenMP 5.1
@section OpenMP 5.1

@unnumberedsubsec New features listed in Appendix B of the OpenMP specification

@multitable @columnfractions .60 .10 .25
@headitem Description @tab Status @tab Comments
@item OpenMP directive as C++ attribute specifiers @tab Y @tab
@item @code{omp_all_memory} reserved locator @tab N @tab
@item @emph{target_device trait} in OpenMP Context @tab N @tab
@item @code{target_device} selector set in context selectors @tab N @tab
@item C/C++'s @code{declare variant} directive: elision support of
      preprocessed code @tab N @tab
@item @code{declare variant}: new clauses @code{adjust_args} and
      @code{append_args} @tab N @tab
@item @code{dispatch} construct @tab N @tab
@item Device-specific ICV settings with environment variables @tab N @tab
@item @code{assume} directive @tab N @tab
@item @code{nothing} directive @tab Y @tab
@item @code{error} directive @tab Y @tab
@item @code{masked} construct @tab Y @tab
@item @code{scope} directive @tab Y @tab
@item Loop transformation constructs @tab N @tab
@item @code{strict} modifier in the @code{grainsize} and @code{num_tasks}
      clauses of the @code{taskloop} construct @tab Y @tab
@item @code{align} clause/modifier in @code{allocate} directive/clause
      and @code{allocator} directive @tab P @tab C/C++ on clause only
@item @code{thread_limit} clause to @code{target} construct @tab Y @tab
@item @code{has_device_addr} clause to @code{target} construct @tab Y @tab
@item Iterators in @code{target update} motion clauses and @code{map}
      clauses @tab N @tab
@item Indirect calls to the device version of a procedure or function in
      @code{target} regions @tab N @tab
@item @code{interop} directive @tab N @tab
@item @code{omp_interop_t} object support in runtime routines @tab N @tab
@item @code{nowait} clause in @code{taskwait} directive @tab N @tab
@item Extensions to the @code{atomic} directive @tab Y @tab
@item @code{seq_cst} clause on a @code{flush} construct @tab Y @tab
@item @code{inoutset} argument to the @code{depend} clause @tab N @tab
@item @code{private} and @code{firstprivate} argument to @code{default}
      clause in C and C++ @tab Y @tab
@item @code{present} argument to @code{defaultmap} clause @tab N @tab
@item @code{omp_set_num_teams}, @code{omp_set_teams_thread_limit},
      @code{omp_get_max_teams}, @code{omp_get_teams_thread_limit} runtime
      routines @tab Y @tab
@item @code{omp_target_is_accessible} runtime routine @tab N @tab
@item @code{omp_target_memcpy_async} and @code{omp_target_memcpy_rect_async}
      runtime routines @tab N @tab
@item @code{omp_get_mapped_ptr} runtime routine @tab N @tab
@item @code{omp_calloc}, @code{omp_realloc}, @code{omp_aligned_alloc} and
      @code{omp_aligned_calloc} runtime routines @tab Y @tab
@item @code{omp_alloctrait_key_t} enum:
@code{omp_atv_serialized} added,
      @code{omp_atv_default} changed @tab Y @tab
@item @code{omp_display_env} runtime routine @tab Y @tab
@item @code{ompt_scope_endpoint_t} enum: @code{ompt_scope_beginend} @tab N @tab
@item @code{ompt_sync_region_t} enum additions @tab N @tab
@item @code{ompt_state_t} enum: @code{ompt_state_wait_barrier_implementation}
      and @code{ompt_state_wait_barrier_teams} @tab N @tab
@item @code{ompt_callback_target_data_op_emi_t},
      @code{ompt_callback_target_emi_t}, @code{ompt_callback_target_map_emi_t}
      and @code{ompt_callback_target_submit_emi_t} @tab N @tab
@item @code{ompt_callback_error_t} type @tab N @tab
@item @code{OMP_PLACES} syntax extensions @tab Y @tab
@item @code{OMP_NUM_TEAMS} and @code{OMP_TEAMS_THREAD_LIMIT} environment
      variables @tab Y @tab
@end multitable

@unnumberedsubsec Other new OpenMP 5.1 features

@multitable @columnfractions .60 .10 .25
@headitem Description @tab Status @tab Comments
@item Support of strictly structured blocks in Fortran @tab Y @tab
@item Support of structured block sequences in C/C++ @tab Y @tab
@item @code{unconstrained} and @code{reproducible} modifiers on @code{order}
      clause @tab Y @tab
@end multitable


@c ---------------------------------------------------------------------
@c OpenMP Runtime Library Routines
@c ---------------------------------------------------------------------

@node Runtime Library Routines
@chapter OpenMP Runtime Library Routines

The runtime routines described here are defined by Section 3 of the OpenMP
specification in version 4.5.  The routines are structured in the following
three parts:

@menu
Control threads, processors and the parallel environment.  They have C
linkage, and do not throw exceptions.

* omp_get_active_level::        Number of active parallel regions
* omp_get_ancestor_thread_num:: Ancestor thread ID
* omp_get_cancellation::        Whether cancellation support is enabled
* omp_get_default_device::      Get the default device for target regions
* omp_get_device_num::          Get device that current thread is running on
* omp_get_dynamic::             Dynamic teams setting
* omp_get_initial_device::      Device number of host device
* omp_get_level::               Number of parallel regions
* omp_get_max_active_levels::   Current maximum number of active regions
* omp_get_max_task_priority::   Maximum task priority value that can be set
* omp_get_max_teams::           Maximum number of teams for teams region
* omp_get_max_threads::         Maximum number of threads of parallel region
* omp_get_nested::              Nested parallel regions
* omp_get_num_devices::         Number of target devices
* omp_get_num_procs::           Number of processors online
* omp_get_num_teams::           Number of teams
* omp_get_num_threads::         Size of the active team
* omp_get_proc_bind::           Whether threads may be moved between CPUs
* omp_get_schedule::            Obtain the runtime scheduling method
* omp_get_supported_active_levels:: Maximum number of active regions supported
* omp_get_team_num::            Get team number
* omp_get_team_size::           Number of threads in a team
* omp_get_teams_thread_limit::  Maximum number of threads imposed by teams
* omp_get_thread_limit::        Maximum number of threads
* omp_get_thread_num::          Current thread ID
* omp_in_parallel::             Whether a parallel region is active
* omp_in_final::                Whether in final or included task region
* omp_is_initial_device::       Whether executing on the host device
* omp_set_default_device::      Set the default device for target regions
* omp_set_dynamic::             Enable/disable dynamic teams
* omp_set_max_active_levels::   Limits the number of active parallel regions
* omp_set_nested::              Enable/disable nested parallel regions
* omp_set_num_teams::           Set upper teams limit for teams
                                region
* omp_set_num_threads::         Set upper team size limit
* omp_set_schedule::            Set the runtime scheduling method
* omp_set_teams_thread_limit::  Set upper thread limit for teams construct

Initialize, set, test, unset and destroy simple and nested locks.

* omp_init_lock::            Initialize simple lock
* omp_set_lock::             Wait for and set simple lock
* omp_test_lock::            Test and set simple lock if available
* omp_unset_lock::           Unset simple lock
* omp_destroy_lock::         Destroy simple lock
* omp_init_nest_lock::       Initialize nested lock
* omp_set_nest_lock::        Wait for and set nested lock
* omp_test_nest_lock::       Test and set nested lock if available
* omp_unset_nest_lock::      Unset nested lock
* omp_destroy_nest_lock::    Destroy nested lock

Portable, thread-based, wall clock timer.

* omp_get_wtick::            Get timer precision.
* omp_get_wtime::            Elapsed wall clock time.

Support for event objects.

* omp_fulfill_event::        Fulfill and destroy an OpenMP event.
@end menu



@node omp_get_active_level
@section @code{omp_get_active_level} -- Number of active parallel regions
@table @asis
@item @emph{Description}:
This function returns the nesting level of the active parallel blocks
that enclose the calling routine.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_active_level(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_active_level()}
@end multitable

@item @emph{See also}:
@ref{omp_get_level}, @ref{omp_get_max_active_levels}, @ref{omp_set_max_active_levels}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.20.
@end table



@node omp_get_ancestor_thread_num
@section @code{omp_get_ancestor_thread_num} -- Ancestor thread ID
@table @asis
@item @emph{Description}:
This function returns the thread identification number for the given
nesting level of the current thread.  For values of @var{level} outside
zero to @code{omp_get_level}, -1 is returned; if @var{level} is
@code{omp_get_level}, the result is identical to @code{omp_get_thread_num}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_ancestor_thread_num(int level);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_ancestor_thread_num(level)}
@item                   @tab @code{integer level}
@end multitable

@item @emph{See also}:
@ref{omp_get_level}, @ref{omp_get_thread_num}, @ref{omp_get_team_size}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.18.
@end table



@node omp_get_cancellation
@section @code{omp_get_cancellation} -- Whether cancellation support is enabled
@table @asis
@item @emph{Description}:
This function returns @code{true} if cancellation is activated, @code{false}
otherwise.  Here, @code{true} and @code{false} represent their language-specific
counterparts.  Unless @env{OMP_CANCELLATION} is set true, cancellations are
deactivated.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_cancellation(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_get_cancellation()}
@end multitable

@item @emph{See also}:
@ref{OMP_CANCELLATION}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.9.
@end table



@node omp_get_default_device
@section @code{omp_get_default_device} -- Get the default device for target regions
@table @asis
@item @emph{Description}:
Get the default device for target regions without a device clause.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_default_device(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_default_device()}
@end multitable

@item @emph{See also}:
@ref{OMP_DEFAULT_DEVICE}, @ref{omp_set_default_device}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.30.
@end table



@node omp_get_device_num
@section @code{omp_get_device_num} -- Return device number of current device
@table @asis
@item @emph{Description}:
This function returns a device number that represents the device that the
current thread is executing on.  For OpenMP 5.0, this must be equal to the
value returned by the @code{omp_get_initial_device} function when called
from the host.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_device_num(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_device_num()}
@end multitable

@item @emph{See also}:
@ref{omp_get_initial_device}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.2.37.
@end table



@node omp_get_dynamic
@section @code{omp_get_dynamic} -- Dynamic teams setting
@table @asis
@item @emph{Description}:
This function returns @code{true} if dynamic adjustment of the number of
threads is enabled, @code{false} otherwise.
Here, @code{true} and @code{false} represent their language-specific
counterparts.

The dynamic team setting may be initialized at startup by the
@env{OMP_DYNAMIC} environment variable or at runtime using
@code{omp_set_dynamic}.  If undefined, dynamic adjustment is
disabled by default.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_dynamic(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_get_dynamic()}
@end multitable

@item @emph{See also}:
@ref{omp_set_dynamic}, @ref{OMP_DYNAMIC}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.8.
@end table



@node omp_get_initial_device
@section @code{omp_get_initial_device} -- Return device number of initial device
@table @asis
@item @emph{Description}:
This function returns a device number that represents the host device.
For OpenMP 5.1, this must be equal to the value returned by the
@code{omp_get_num_devices} function.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_initial_device(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_initial_device()}
@end multitable

@item @emph{See also}:
@ref{omp_get_num_devices}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.35.
@end table



@node omp_get_level
@section @code{omp_get_level} -- Obtain the current nesting level
@table @asis
@item @emph{Description}:
This function returns the nesting level of the parallel blocks, active
or not, that enclose the calling routine.
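A minimal sketch of the relation to @code{omp_get_active_level} (the serial
fallback definitions are hypothetical stand-ins used only so the fragment
also compiles without @command{-fopenmp}; they are not part of libgomp):

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Hypothetical serial fallbacks: with no parallel regions at all,
   both nesting levels are trivially zero.  */
static int omp_get_level (void)        { return 0; }
static int omp_get_active_level (void) { return 0; }
#endif

/* The active nesting level counts only active parallel regions, so it
   can never exceed the total nesting level reported by omp_get_level.  */
static int level_invariant_holds (void)
{
  return omp_get_active_level () <= omp_get_level ();
}
```

In a sequential part of the program both queries return 0.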

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_level(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_level()}
@end multitable

@item @emph{See also}:
@ref{omp_get_active_level}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.17.
@end table



@node omp_get_max_active_levels
@section @code{omp_get_max_active_levels} -- Current maximum number of active regions
@table @asis
@item @emph{Description}:
This function obtains the maximum allowed number of nested, active parallel regions.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_max_active_levels(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_max_active_levels()}
@end multitable

@item @emph{See also}:
@ref{omp_set_max_active_levels}, @ref{omp_get_active_level}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.16.
@end table


@node omp_get_max_task_priority
@section @code{omp_get_max_task_priority} -- Maximum priority value
that can be set for tasks.
@table @asis
@item @emph{Description}:
This function obtains the maximum allowed priority number for tasks.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_max_task_priority(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_max_task_priority()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.29.
@end table


@node omp_get_max_teams
@section @code{omp_get_max_teams} -- Maximum number of teams of teams region
@table @asis
@item @emph{Description}:
Return the maximum number of teams used for a teams region
that does not use the @code{num_teams} clause.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_max_teams(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_max_teams()}
@end multitable

@item @emph{See also}:
@ref{omp_set_num_teams}, @ref{omp_get_num_teams}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.4.4.
@end table



@node omp_get_max_threads
@section @code{omp_get_max_threads} -- Maximum number of threads of parallel region
@table @asis
@item @emph{Description}:
Return the maximum number of threads used for the current parallel region
that does not use the @code{num_threads} clause.
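The interplay with @code{omp_set_num_threads} can be sketched as follows
(the serial stub definitions are hypothetical, modeling only the
@emph{nthreads-var} ICV so the fragment also builds without
@command{-fopenmp}):

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Hypothetical serial stubs modeling the nthreads-var ICV.  */
static int nthreads_var = 1;
static void omp_set_num_threads (int n) { nthreads_var = n; }
static int  omp_get_max_threads (void)  { return nthreads_var; }
#endif
```

After @code{omp_set_num_threads (4)}, @code{omp_get_max_threads} reports 4:
the team size the next parallel region without a @code{num_threads} clause
would obtain.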

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_max_threads(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_max_threads()}
@end multitable

@item @emph{See also}:
@ref{omp_set_num_threads}, @ref{omp_set_dynamic}, @ref{omp_get_thread_limit}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.3.
@end table



@node omp_get_nested
@section @code{omp_get_nested} -- Nested parallel regions
@table @asis
@item @emph{Description}:
This function returns @code{true} if nested parallel regions are
enabled, @code{false} otherwise.  Here, @code{true} and @code{false}
represent their language-specific counterparts.

The state of nested parallel regions at startup depends on several
environment variables.  If @env{OMP_MAX_ACTIVE_LEVELS} is defined
and is set to greater than one, then nested parallel regions will be
enabled.  If not defined, then the value of the @env{OMP_NESTED}
environment variable will be followed if defined.  If neither are
defined, then if either @env{OMP_NUM_THREADS} or @env{OMP_PROC_BIND}
are defined with a list of more than one value, then nested parallel
regions are enabled.  If none of these are defined, then nested parallel
regions are disabled by default.

Nested parallel regions can be enabled or disabled at runtime using
@code{omp_set_nested}, or by setting the maximum number of nested
regions with @code{omp_set_max_active_levels}: a value of one disables
nested parallel regions, a value above one enables them.
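The coupling between the maximum active levels and the nesting setting
described above can be sketched as follows (the serial stubs are
hypothetical models of the @emph{max-active-levels-var} ICV, present only
so the fragment also compiles without @command{-fopenmp}):

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Hypothetical serial stubs: nesting is considered enabled exactly
   when more than one active level is allowed.  */
static int max_levels = 1;
static void omp_set_max_active_levels (int n) { if (n > 0) max_levels = n; }
static int  omp_get_max_active_levels (void)  { return max_levels; }
static int  omp_get_nested (void)             { return max_levels > 1; }
#endif
```

Raising the limit above one, e.g.@: @code{omp_set_max_active_levels (2)},
therefore makes @code{omp_get_nested} report @code{true}.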

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_nested(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_get_nested()}
@end multitable

@item @emph{See also}:
@ref{omp_set_max_active_levels}, @ref{omp_set_nested},
@ref{OMP_MAX_ACTIVE_LEVELS}, @ref{OMP_NESTED}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.11.
@end table



@node omp_get_num_devices
@section @code{omp_get_num_devices} -- Number of target devices
@table @asis
@item @emph{Description}:
Returns the number of target devices.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_num_devices(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_num_devices()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.31.
@end table



@node omp_get_num_procs
@section @code{omp_get_num_procs} -- Number of processors online
@table @asis
@item @emph{Description}:
Returns the number of processors online on the device the calling
thread is executing on.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_num_procs(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_num_procs()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.5.
@end table



@node omp_get_num_teams
@section @code{omp_get_num_teams} -- Number of teams
@table @asis
@item @emph{Description}:
Returns the number of teams in the current teams region.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_num_teams(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_num_teams()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.32.
@end table



@node omp_get_num_threads
@section @code{omp_get_num_threads} -- Size of the active team
@table @asis
@item @emph{Description}:
Returns the number of threads in the current team.  In a sequential
section of the program @code{omp_get_num_threads} returns 1.

The default team size may be initialized at startup by the
@env{OMP_NUM_THREADS} environment variable.  At runtime, the size
of the current team may be set either by the @code{num_threads}
clause or by @code{omp_set_num_threads}.  If none of the above were
used to define a specific value and @env{OMP_DYNAMIC} is disabled,
one thread per CPU online is used.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_num_threads(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_num_threads()}
@end multitable

@item @emph{See also}:
@ref{omp_get_max_threads}, @ref{omp_set_num_threads}, @ref{OMP_NUM_THREADS}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.2.
@end table



@node omp_get_proc_bind
@section @code{omp_get_proc_bind} -- Whether threads may be moved between CPUs
@table @asis
@item @emph{Description}:
This function returns the currently active thread affinity policy, which
is set via @env{OMP_PROC_BIND}.  Possible values are
@code{omp_proc_bind_false}, @code{omp_proc_bind_true},
@code{omp_proc_bind_primary}, @code{omp_proc_bind_master},
@code{omp_proc_bind_close} and @code{omp_proc_bind_spread}, where
@code{omp_proc_bind_master} is an alias for @code{omp_proc_bind_primary}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{omp_proc_bind_t omp_get_proc_bind(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer(kind=omp_proc_bind_kind) function omp_get_proc_bind()}
@end multitable

@item @emph{See also}:
@ref{OMP_PROC_BIND}, @ref{OMP_PLACES}, @ref{GOMP_CPU_AFFINITY}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.22.
@end table



@node omp_get_schedule
@section @code{omp_get_schedule} -- Obtain the runtime scheduling method
@table @asis
@item @emph{Description}:
Obtains the runtime scheduling method.  The @var{kind} argument is
set to the value @code{omp_sched_static}, @code{omp_sched_dynamic},
@code{omp_sched_guided} or @code{omp_sched_auto}.  The second argument,
@var{chunk_size}, is set to the chunk size.
922 923@item @emph{C/C++} 924@multitable @columnfractions .20 .80 925@item @emph{Prototype}: @tab @code{void omp_get_schedule(omp_sched_t *kind, int *chunk_size);} 926@end multitable 927 928@item @emph{Fortran}: 929@multitable @columnfractions .20 .80 930@item @emph{Interface}: @tab @code{subroutine omp_get_schedule(kind, chunk_size)} 931@item @tab @code{integer(kind=omp_sched_kind) kind} 932@item @tab @code{integer chunk_size} 933@end multitable 934 935@item @emph{See also}: 936@ref{omp_set_schedule}, @ref{OMP_SCHEDULE} 937 938@item @emph{Reference}: 939@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.13. 940@end table 941 942 943@node omp_get_supported_active_levels 944@section @code{omp_get_supported_active_levels} -- Maximum number of active regions supported 945@table @asis 946@item @emph{Description}: 947This function returns the maximum number of nested, active parallel regions 948supported by this implementation. 949 950@item @emph{C/C++} 951@multitable @columnfractions .20 .80 952@item @emph{Prototype}: @tab @code{int omp_get_supported_active_levels(void);} 953@end multitable 954 955@item @emph{Fortran}: 956@multitable @columnfractions .20 .80 957@item @emph{Interface}: @tab @code{integer function omp_get_supported_active_levels()} 958@end multitable 959 960@item @emph{See also}: 961@ref{omp_get_max_active_levels}, @ref{omp_set_max_active_levels} 962 963@item @emph{Reference}: 964@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.2.15. 965@end table 966 967 968 969@node omp_get_team_num 970@section @code{omp_get_team_num} -- Get team number 971@table @asis 972@item @emph{Description}: 973Returns the team number of the calling thread. 
@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_team_num(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_team_num()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.33.
@end table



@node omp_get_team_size
@section @code{omp_get_team_size} -- Number of threads in a team
@table @asis
@item @emph{Description}:
This function returns the number of threads in a thread team to which
either the current thread or its ancestor belongs.  For values of
@var{level} outside the range 0 to @code{omp_get_level}, -1 is returned;
for @var{level} 0, 1 is returned; and for @var{level} equal to
@code{omp_get_level}, the result is identical to that of
@code{omp_get_num_threads}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_team_size(int level);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_team_size(level)}
@item @tab @code{integer level}
@end multitable

@item @emph{See also}:
@ref{omp_get_num_threads}, @ref{omp_get_level}, @ref{omp_get_ancestor_thread_num}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.19.
@end table



@node omp_get_teams_thread_limit
@section @code{omp_get_teams_thread_limit} -- Maximum number of threads imposed by teams
@table @asis
@item @emph{Description}:
Returns the maximum number of threads that can participate in each
team created by a @code{teams} construct.
@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_teams_thread_limit(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_teams_thread_limit()}
@end multitable

@item @emph{See also}:
@ref{omp_set_teams_thread_limit}, @ref{OMP_TEAMS_THREAD_LIMIT}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.4.6.
@end table



@node omp_get_thread_limit
@section @code{omp_get_thread_limit} -- Maximum number of threads
@table @asis
@item @emph{Description}:
Returns the maximum number of threads available to the program.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_thread_limit(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_thread_limit()}
@end multitable

@item @emph{See also}:
@ref{omp_get_max_threads}, @ref{OMP_THREAD_LIMIT}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.14.
@end table



@node omp_get_thread_num
@section @code{omp_get_thread_num} -- Current thread ID
@table @asis
@item @emph{Description}:
Returns a unique thread identification number within the current team.
In sequential parts of the program, @code{omp_get_thread_num}
always returns 0.  In parallel regions the return value varies
from 0 to @code{omp_get_num_threads}-1 inclusive.  The return
value of the primary thread of a team is always 0.
1081 1082@item @emph{C/C++}: 1083@multitable @columnfractions .20 .80 1084@item @emph{Prototype}: @tab @code{int omp_get_thread_num(void);} 1085@end multitable 1086 1087@item @emph{Fortran}: 1088@multitable @columnfractions .20 .80 1089@item @emph{Interface}: @tab @code{integer function omp_get_thread_num()} 1090@end multitable 1091 1092@item @emph{See also}: 1093@ref{omp_get_num_threads}, @ref{omp_get_ancestor_thread_num} 1094 1095@item @emph{Reference}: 1096@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.4. 1097@end table 1098 1099 1100 1101@node omp_in_parallel 1102@section @code{omp_in_parallel} -- Whether a parallel region is active 1103@table @asis 1104@item @emph{Description}: 1105This function returns @code{true} if currently running in parallel, 1106@code{false} otherwise. Here, @code{true} and @code{false} represent 1107their language-specific counterparts. 1108 1109@item @emph{C/C++}: 1110@multitable @columnfractions .20 .80 1111@item @emph{Prototype}: @tab @code{int omp_in_parallel(void);} 1112@end multitable 1113 1114@item @emph{Fortran}: 1115@multitable @columnfractions .20 .80 1116@item @emph{Interface}: @tab @code{logical function omp_in_parallel()} 1117@end multitable 1118 1119@item @emph{Reference}: 1120@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.6. 1121@end table 1122 1123 1124@node omp_in_final 1125@section @code{omp_in_final} -- Whether in final or included task region 1126@table @asis 1127@item @emph{Description}: 1128This function returns @code{true} if currently running in a final 1129or included task region, @code{false} otherwise. Here, @code{true} 1130and @code{false} represent their language-specific counterparts. 
@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_in_final(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_in_final()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.21.
@end table



@node omp_is_initial_device
@section @code{omp_is_initial_device} -- Whether executing on the host device
@table @asis
@item @emph{Description}:
This function returns @code{true} if currently running on the host device,
@code{false} otherwise.  Here, @code{true} and @code{false} represent
their language-specific counterparts.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_is_initial_device(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_is_initial_device()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.34.
@end table



@node omp_set_default_device
@section @code{omp_set_default_device} -- Set the default device for target regions
@table @asis
@item @emph{Description}:
Sets the default device for target regions without a @code{device}
clause.  The argument shall be a nonnegative device number.
1178 1179@item @emph{C/C++}: 1180@multitable @columnfractions .20 .80 1181@item @emph{Prototype}: @tab @code{void omp_set_default_device(int device_num);} 1182@end multitable 1183 1184@item @emph{Fortran}: 1185@multitable @columnfractions .20 .80 1186@item @emph{Interface}: @tab @code{subroutine omp_set_default_device(device_num)} 1187@item @tab @code{integer device_num} 1188@end multitable 1189 1190@item @emph{See also}: 1191@ref{OMP_DEFAULT_DEVICE}, @ref{omp_get_default_device} 1192 1193@item @emph{Reference}: 1194@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.29. 1195@end table 1196 1197 1198 1199@node omp_set_dynamic 1200@section @code{omp_set_dynamic} -- Enable/disable dynamic teams 1201@table @asis 1202@item @emph{Description}: 1203Enable or disable the dynamic adjustment of the number of threads 1204within a team. The function takes the language-specific equivalent 1205of @code{true} and @code{false}, where @code{true} enables dynamic 1206adjustment of team sizes and @code{false} disables it. 1207 1208@item @emph{C/C++}: 1209@multitable @columnfractions .20 .80 1210@item @emph{Prototype}: @tab @code{void omp_set_dynamic(int dynamic_threads);} 1211@end multitable 1212 1213@item @emph{Fortran}: 1214@multitable @columnfractions .20 .80 1215@item @emph{Interface}: @tab @code{subroutine omp_set_dynamic(dynamic_threads)} 1216@item @tab @code{logical, intent(in) :: dynamic_threads} 1217@end multitable 1218 1219@item @emph{See also}: 1220@ref{OMP_DYNAMIC}, @ref{omp_get_dynamic} 1221 1222@item @emph{Reference}: 1223@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.7. 1224@end table 1225 1226 1227 1228@node omp_set_max_active_levels 1229@section @code{omp_set_max_active_levels} -- Limits the number of active parallel regions 1230@table @asis 1231@item @emph{Description}: 1232This function limits the maximum allowed number of nested, active 1233parallel regions. 
@var{max_levels} must be less than or equal to
the value returned by @code{omp_get_supported_active_levels}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_max_active_levels(int max_levels);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_max_active_levels(max_levels)}
@item @tab @code{integer max_levels}
@end multitable

@item @emph{See also}:
@ref{omp_get_max_active_levels}, @ref{omp_get_active_level},
@ref{omp_get_supported_active_levels}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.15.
@end table



@node omp_set_nested
@section @code{omp_set_nested} -- Enable/disable nested parallel regions
@table @asis
@item @emph{Description}:
Enable or disable nested parallel regions, i.e., whether team members
are allowed to create new teams.  The function takes the language-specific
equivalent of @code{true} and @code{false}, where @code{true} enables
nested parallel regions and @code{false} disables them.

Enabling nested parallel regions also sets the maximum number of
active nested regions to the maximum supported.  Disabling nested
parallel regions sets the maximum number of active nested regions to one.
@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_nested(int nested);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_nested(nested)}
@item @tab @code{logical, intent(in) :: nested}
@end multitable

@item @emph{See also}:
@ref{omp_get_nested}, @ref{omp_set_max_active_levels},
@ref{OMP_MAX_ACTIVE_LEVELS}, @ref{OMP_NESTED}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.10.
@end table



@node omp_set_num_teams
@section @code{omp_set_num_teams} -- Set upper teams limit for teams construct
@table @asis
@item @emph{Description}:
Specifies the upper bound for the number of teams created by a
@code{teams} construct that does not specify a @code{num_teams}
clause.  The argument of @code{omp_set_num_teams} shall be a
positive integer.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_num_teams(int num_teams);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_num_teams(num_teams)}
@item @tab @code{integer, intent(in) :: num_teams}
@end multitable

@item @emph{See also}:
@ref{OMP_NUM_TEAMS}, @ref{omp_get_num_teams}, @ref{omp_get_max_teams}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.4.3.
@end table



@node omp_set_num_threads
@section @code{omp_set_num_threads} -- Set upper team size limit
@table @asis
@item @emph{Description}:
Specifies the number of threads used by default in subsequent parallel
regions, if those do not specify a @code{num_threads} clause.
The 1325argument of @code{omp_set_num_threads} shall be a positive integer. 1326 1327@item @emph{C/C++}: 1328@multitable @columnfractions .20 .80 1329@item @emph{Prototype}: @tab @code{void omp_set_num_threads(int num_threads);} 1330@end multitable 1331 1332@item @emph{Fortran}: 1333@multitable @columnfractions .20 .80 1334@item @emph{Interface}: @tab @code{subroutine omp_set_num_threads(num_threads)} 1335@item @tab @code{integer, intent(in) :: num_threads} 1336@end multitable 1337 1338@item @emph{See also}: 1339@ref{OMP_NUM_THREADS}, @ref{omp_get_num_threads}, @ref{omp_get_max_threads} 1340 1341@item @emph{Reference}: 1342@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.1. 1343@end table 1344 1345 1346 1347@node omp_set_schedule 1348@section @code{omp_set_schedule} -- Set the runtime scheduling method 1349@table @asis 1350@item @emph{Description}: 1351Sets the runtime scheduling method. The @var{kind} argument can have the 1352value @code{omp_sched_static}, @code{omp_sched_dynamic}, 1353@code{omp_sched_guided} or @code{omp_sched_auto}. Except for 1354@code{omp_sched_auto}, the chunk size is set to the value of 1355@var{chunk_size} if positive, or to the default value if zero or negative. 1356For @code{omp_sched_auto} the @var{chunk_size} argument is ignored. 1357 1358@item @emph{C/C++} 1359@multitable @columnfractions .20 .80 1360@item @emph{Prototype}: @tab @code{void omp_set_schedule(omp_sched_t kind, int chunk_size);} 1361@end multitable 1362 1363@item @emph{Fortran}: 1364@multitable @columnfractions .20 .80 1365@item @emph{Interface}: @tab @code{subroutine omp_set_schedule(kind, chunk_size)} 1366@item @tab @code{integer(kind=omp_sched_kind) kind} 1367@item @tab @code{integer chunk_size} 1368@end multitable 1369 1370@item @emph{See also}: 1371@ref{omp_get_schedule} 1372@ref{OMP_SCHEDULE} 1373 1374@item @emph{Reference}: 1375@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.12. 
@end table



@node omp_set_teams_thread_limit
@section @code{omp_set_teams_thread_limit} -- Set upper thread limit for teams construct
@table @asis
@item @emph{Description}:
Specifies the upper bound for the number of threads available to each
team created by a @code{teams} construct that does not specify a
@code{thread_limit} clause.  The argument of
@code{omp_set_teams_thread_limit} shall be a positive integer.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_teams_thread_limit(int thread_limit);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_teams_thread_limit(thread_limit)}
@item @tab @code{integer, intent(in) :: thread_limit}
@end multitable

@item @emph{See also}:
@ref{OMP_TEAMS_THREAD_LIMIT}, @ref{omp_get_teams_thread_limit}, @ref{omp_get_thread_limit}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.4.5.
@end table



@node omp_init_lock
@section @code{omp_init_lock} -- Initialize simple lock
@table @asis
@item @emph{Description}:
Initialize a simple lock.  After initialization, the lock is in
an unlocked state.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_init_lock(omp_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_init_lock(svar)}
@item @tab @code{integer(omp_lock_kind), intent(out) :: svar}
@end multitable

@item @emph{See also}:
@ref{omp_destroy_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.1.
1432@end table 1433 1434 1435 1436@node omp_set_lock 1437@section @code{omp_set_lock} -- Wait for and set simple lock 1438@table @asis 1439@item @emph{Description}: 1440Before setting a simple lock, the lock variable must be initialized by 1441@code{omp_init_lock}. The calling thread is blocked until the lock 1442is available. If the lock is already held by the current thread, 1443a deadlock occurs. 1444 1445@item @emph{C/C++}: 1446@multitable @columnfractions .20 .80 1447@item @emph{Prototype}: @tab @code{void omp_set_lock(omp_lock_t *lock);} 1448@end multitable 1449 1450@item @emph{Fortran}: 1451@multitable @columnfractions .20 .80 1452@item @emph{Interface}: @tab @code{subroutine omp_set_lock(svar)} 1453@item @tab @code{integer(omp_lock_kind), intent(inout) :: svar} 1454@end multitable 1455 1456@item @emph{See also}: 1457@ref{omp_init_lock}, @ref{omp_test_lock}, @ref{omp_unset_lock} 1458 1459@item @emph{Reference}: 1460@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.4. 1461@end table 1462 1463 1464 1465@node omp_test_lock 1466@section @code{omp_test_lock} -- Test and set simple lock if available 1467@table @asis 1468@item @emph{Description}: 1469Before setting a simple lock, the lock variable must be initialized by 1470@code{omp_init_lock}. Contrary to @code{omp_set_lock}, @code{omp_test_lock} 1471does not block if the lock is not available. This function returns 1472@code{true} upon success, @code{false} otherwise. Here, @code{true} and 1473@code{false} represent their language-specific counterparts. 
@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_test_lock(omp_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_test_lock(svar)}
@item @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
@end multitable

@item @emph{See also}:
@ref{omp_init_lock}, @ref{omp_set_lock}, @ref{omp_unset_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.6.
@end table



@node omp_unset_lock
@section @code{omp_unset_lock} -- Unset simple lock
@table @asis
@item @emph{Description}:
A simple lock about to be unset must have been set by @code{omp_set_lock}
or @code{omp_test_lock} before.  In addition, the lock must be held by the
thread calling @code{omp_unset_lock}.  After this call, the lock becomes
unlocked.  If one or more threads attempted to set the lock before, one
of them is chosen to set the lock to itself.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_unset_lock(omp_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_unset_lock(svar)}
@item @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
@end multitable

@item @emph{See also}:
@ref{omp_set_lock}, @ref{omp_test_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.5.
@end table



@node omp_destroy_lock
@section @code{omp_destroy_lock} -- Destroy simple lock
@table @asis
@item @emph{Description}:
Destroy a simple lock.  In order to be destroyed, a simple lock must be
in the unlocked state.
1531 1532@item @emph{C/C++}: 1533@multitable @columnfractions .20 .80 1534@item @emph{Prototype}: @tab @code{void omp_destroy_lock(omp_lock_t *lock);} 1535@end multitable 1536 1537@item @emph{Fortran}: 1538@multitable @columnfractions .20 .80 1539@item @emph{Interface}: @tab @code{subroutine omp_destroy_lock(svar)} 1540@item @tab @code{integer(omp_lock_kind), intent(inout) :: svar} 1541@end multitable 1542 1543@item @emph{See also}: 1544@ref{omp_init_lock} 1545 1546@item @emph{Reference}: 1547@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.3. 1548@end table 1549 1550 1551 1552@node omp_init_nest_lock 1553@section @code{omp_init_nest_lock} -- Initialize nested lock 1554@table @asis 1555@item @emph{Description}: 1556Initialize a nested lock. After initialization, the lock is in 1557an unlocked state and the nesting count is set to zero. 1558 1559@item @emph{C/C++}: 1560@multitable @columnfractions .20 .80 1561@item @emph{Prototype}: @tab @code{void omp_init_nest_lock(omp_nest_lock_t *lock);} 1562@end multitable 1563 1564@item @emph{Fortran}: 1565@multitable @columnfractions .20 .80 1566@item @emph{Interface}: @tab @code{subroutine omp_init_nest_lock(nvar)} 1567@item @tab @code{integer(omp_nest_lock_kind), intent(out) :: nvar} 1568@end multitable 1569 1570@item @emph{See also}: 1571@ref{omp_destroy_nest_lock} 1572 1573@item @emph{Reference}: 1574@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.1. 1575@end table 1576 1577 1578@node omp_set_nest_lock 1579@section @code{omp_set_nest_lock} -- Wait for and set nested lock 1580@table @asis 1581@item @emph{Description}: 1582Before setting a nested lock, the lock variable must be initialized by 1583@code{omp_init_nest_lock}. The calling thread is blocked until the lock 1584is available. If the lock is already held by the current thread, the 1585nesting count for the lock is incremented. 
@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_nest_lock(omp_nest_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_nest_lock(nvar)}
@item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
@end multitable

@item @emph{See also}:
@ref{omp_init_nest_lock}, @ref{omp_unset_nest_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.4.
@end table



@node omp_test_nest_lock
@section @code{omp_test_nest_lock} -- Test and set nested lock if available
@table @asis
@item @emph{Description}:
Before setting a nested lock, the lock variable must be initialized by
@code{omp_init_nest_lock}.  Contrary to @code{omp_set_nest_lock},
@code{omp_test_nest_lock} does not block if the lock is not available.
If the calling thread acquires the lock or already holds it, the new
nesting count is returned.  Otherwise, the return value equals zero.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_test_nest_lock(omp_nest_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_test_nest_lock(nvar)}
@item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
@end multitable

@item @emph{See also}:
@ref{omp_init_nest_lock}, @ref{omp_set_nest_lock}, @ref{omp_unset_nest_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.6.
@end table



@node omp_unset_nest_lock
@section @code{omp_unset_nest_lock} -- Unset nested lock
@table @asis
@item @emph{Description}:
A nested lock about to be unset must have been set by
@code{omp_set_nest_lock} or @code{omp_test_nest_lock} before.  In
addition, the lock must be held by the thread calling
@code{omp_unset_nest_lock}.  The call decrements the nesting count;
if it drops to zero, the lock becomes unlocked.  If one or more
threads attempted to set the lock before, one of them is chosen to
set the lock to itself.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_unset_nest_lock(omp_nest_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_unset_nest_lock(nvar)}
@item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
@end multitable

@item @emph{See also}:
@ref{omp_set_nest_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.5.
@end table



@node omp_destroy_nest_lock
@section @code{omp_destroy_nest_lock} -- Destroy nested lock
@table @asis
@item @emph{Description}:
Destroy a nested lock.  In order to be destroyed, a nested lock must be
in the unlocked state and its nesting count must equal zero.
@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_destroy_nest_lock(omp_nest_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_destroy_nest_lock(nvar)}
@item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
@end multitable

@item @emph{See also}:
@ref{omp_init_nest_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.3.
@end table



@node omp_get_wtick
@section @code{omp_get_wtick} -- Get timer precision
@table @asis
@item @emph{Description}:
Gets the timer precision, i.e., the number of seconds between two
successive clock ticks.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{double omp_get_wtick(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{double precision function omp_get_wtick()}
@end multitable

@item @emph{See also}:
@ref{omp_get_wtime}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.4.2.
@end table



@node omp_get_wtime
@section @code{omp_get_wtime} -- Elapsed wall clock time
@table @asis
@item @emph{Description}:
Elapsed wall clock time in seconds.  The time is measured per thread; no
guarantee can be made that two distinct threads measure the same time.
Time is measured from some ``time in the past'', which is an arbitrary
point in time guaranteed not to change during the execution of the
program.
@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{double omp_get_wtime(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{double precision function omp_get_wtime()}
@end multitable

@item @emph{See also}:
@ref{omp_get_wtick}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.4.1.
@end table



@node omp_fulfill_event
@section @code{omp_fulfill_event} -- Fulfill and destroy an OpenMP event
@table @asis
@item @emph{Description}:
Fulfill the event associated with the event handle argument. Currently, it
is only used to fulfill events generated by @code{detach} clauses on task
constructs; the effect of fulfilling the event is to allow the task to
complete.

The result of calling @code{omp_fulfill_event} with an event handle other
than that generated by a @code{detach} clause is undefined. Calling it with
an event handle that has already been fulfilled is also undefined.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_fulfill_event(omp_event_handle_t event);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_fulfill_event(event)}
@item @tab @code{integer (kind=omp_event_handle_kind) :: event}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.5.1.
@end table



@c ---------------------------------------------------------------------
@c OpenMP Environment Variables
@c ---------------------------------------------------------------------

@node Environment Variables
@chapter OpenMP Environment Variables

The environment variables beginning with @env{OMP_} are defined by
Section 4 of the OpenMP specification in version 4.5, while those
beginning with @env{GOMP_} are GNU extensions.

@menu
* OMP_CANCELLATION::        Set whether cancellation is activated
* OMP_DISPLAY_ENV::         Show OpenMP version and environment variables
* OMP_DEFAULT_DEVICE::      Set the device used in target regions
* OMP_DYNAMIC::             Dynamic adjustment of threads
* OMP_MAX_ACTIVE_LEVELS::   Set the maximum number of nested parallel regions
* OMP_MAX_TASK_PRIORITY::   Set the maximum task priority value
* OMP_NESTED::              Nested parallel regions
* OMP_NUM_TEAMS::           Specifies the number of teams to use by teams region
* OMP_NUM_THREADS::         Specifies the number of threads to use
* OMP_PROC_BIND::           Whether threads may be moved between CPUs
* OMP_PLACES::              Specifies on which CPUs the threads should be placed
* OMP_STACKSIZE::           Set default thread stack size
* OMP_SCHEDULE::            How threads are scheduled
* OMP_TARGET_OFFLOAD::      Controls offloading behaviour
* OMP_TEAMS_THREAD_LIMIT::  Set the maximum number of threads imposed by teams
* OMP_THREAD_LIMIT::        Set the maximum number of threads
* OMP_WAIT_POLICY::         How waiting threads are handled
* GOMP_CPU_AFFINITY::       Bind threads to specific CPUs
* GOMP_DEBUG::              Enable debugging output
* GOMP_STACKSIZE::          Set default thread stack size
* GOMP_SPINCOUNT::          Set the busy-wait spin count
* GOMP_RTEMS_THREAD_POOLS:: Set the RTEMS specific thread pools
@end menu


@node OMP_CANCELLATION
@section @env{OMP_CANCELLATION} -- Set whether cancellation is activated
@cindex Environment Variable
@table @asis
@item @emph{Description}:
If set to @code{TRUE}, cancellation is activated. If set to @code{FALSE} or
if unset, cancellation is disabled and the @code{cancel} construct is ignored.

@item @emph{See also}:
@ref{omp_get_cancellation}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.11
@end table



@node OMP_DISPLAY_ENV
@section @env{OMP_DISPLAY_ENV} -- Show OpenMP version and environment variables
@cindex Environment Variable
@table @asis
@item @emph{Description}:
If set to @code{TRUE}, the OpenMP version number and the values
associated with the OpenMP environment variables are printed to @code{stderr}.
If set to @code{VERBOSE}, it additionally shows the value of the environment
variables which are GNU extensions. If undefined or set to @code{FALSE},
this information is not shown.

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.12
@end table



@node OMP_DEFAULT_DEVICE
@section @env{OMP_DEFAULT_DEVICE} -- Set the device used in target regions
@cindex Environment Variable
@table @asis
@item @emph{Description}:
Set to choose the device which is used in a @code{target} region, unless the
value is overridden by @code{omp_set_default_device} or by a @code{device}
clause. The value shall be the nonnegative device number. If no device with
the given device number exists, the code is executed on the host. If unset,
device number 0 will be used.
@item @emph{See also}:
@ref{omp_get_default_device}, @ref{omp_set_default_device}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.13
@end table



@node OMP_DYNAMIC
@section @env{OMP_DYNAMIC} -- Dynamic adjustment of threads
@cindex Environment Variable
@table @asis
@item @emph{Description}:
Enable or disable the dynamic adjustment of the number of threads
within a team. The value of this environment variable shall be
@code{TRUE} or @code{FALSE}. If undefined, dynamic adjustment is
disabled by default.

@item @emph{See also}:
@ref{omp_set_dynamic}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.3
@end table



@node OMP_MAX_ACTIVE_LEVELS
@section @env{OMP_MAX_ACTIVE_LEVELS} -- Set the maximum number of nested parallel regions
@cindex Environment Variable
@table @asis
@item @emph{Description}:
Specifies the initial value for the maximum number of nested parallel
regions. The value of this variable shall be a positive integer.
If undefined, then if @env{OMP_NESTED} is defined and set to true, or
if @env{OMP_NUM_THREADS} or @env{OMP_PROC_BIND} are defined and set to
a list with more than one item, the maximum number of nested parallel
regions will be initialized to the largest number supported, otherwise
it will be set to one.

@item @emph{See also}:
@ref{omp_set_max_active_levels}, @ref{OMP_NESTED}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.9
@end table



@node OMP_MAX_TASK_PRIORITY
@section @env{OMP_MAX_TASK_PRIORITY} -- Set the maximum priority number that can be set for a task
@cindex Environment Variable
@table @asis
@item @emph{Description}:
Specifies the initial value for the maximum priority value that can be
set for a task. The value of this variable shall be a non-negative
integer. If undefined, the default priority is 0.

@item @emph{See also}:
@ref{omp_get_max_task_priority}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.14
@end table



@node OMP_NESTED
@section @env{OMP_NESTED} -- Nested parallel regions
@cindex Environment Variable
@cindex Implementation specific setting
@table @asis
@item @emph{Description}:
Enable or disable nested parallel regions, i.e., whether team members
are allowed to create new teams. The value of this environment variable
shall be @code{TRUE} or @code{FALSE}. If set to @code{TRUE}, the maximum
number of active nested regions supported will by default be set to the
maximum supported, otherwise it will be set to one. If
@env{OMP_MAX_ACTIVE_LEVELS} is defined, its setting will override this
setting. If both are undefined, nested parallel regions are enabled if
@env{OMP_NUM_THREADS} or @env{OMP_PROC_BIND} are defined to a list with
more than one item, otherwise they are disabled by default.

@item @emph{See also}:
@ref{omp_set_max_active_levels}, @ref{omp_set_nested}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.6
@end table



@node OMP_NUM_TEAMS
@section @env{OMP_NUM_TEAMS} -- Specifies the number of teams to use by teams region
@cindex Environment Variable
@table @asis
@item @emph{Description}:
Specifies the upper bound for the number of teams to use in teams regions
without an explicit @code{num_teams} clause. The value of this variable
shall be a positive integer. If undefined, it defaults to 0, which means an
implementation-defined upper bound.

@item @emph{See also}:
@ref{omp_set_num_teams}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v5.1}, Section 6.23
@end table



@node OMP_NUM_THREADS
@section @env{OMP_NUM_THREADS} -- Specifies the number of threads to use
@cindex Environment Variable
@cindex Implementation specific setting
@table @asis
@item @emph{Description}:
Specifies the default number of threads to use in parallel regions. The
value of this variable shall be a comma-separated list of positive integers;
the value specifies the number of threads to use for the corresponding nested
level. Specifying more than one item in the list will automatically enable
nesting by default. If undefined, one thread per CPU is used.

@item @emph{See also}:
@ref{omp_set_num_threads}, @ref{OMP_NESTED}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.2
@end table



@node OMP_PROC_BIND
@section @env{OMP_PROC_BIND} -- Whether threads may be moved between CPUs
@cindex Environment Variable
@table @asis
@item @emph{Description}:
Specifies whether threads may be moved between processors. If set to
@code{TRUE}, OpenMP threads should not be moved; if set to @code{FALSE}
they may be moved. Alternatively, a comma separated list with the
values @code{PRIMARY}, @code{MASTER}, @code{CLOSE} and @code{SPREAD} can
be used to specify the thread affinity policy for the corresponding nesting
level. With @code{PRIMARY} and @code{MASTER} the worker threads are in the
same place partition as the primary thread. With @code{CLOSE} they are
kept close to the primary thread in contiguous place partitions. With
@code{SPREAD} a sparse distribution across the place partitions is used.
Specifying more than one item in the list will automatically enable
nesting by default.

When undefined, @env{OMP_PROC_BIND} defaults to @code{TRUE} when
@env{OMP_PLACES} or @env{GOMP_CPU_AFFINITY} is set and @code{FALSE} otherwise.

@item @emph{See also}:
@ref{omp_get_proc_bind}, @ref{GOMP_CPU_AFFINITY},
@ref{OMP_NESTED}, @ref{OMP_PLACES}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.4
@end table



@node OMP_PLACES
@section @env{OMP_PLACES} -- Specifies on which CPUs the threads should be placed
@cindex Environment Variable
@table @asis
@item @emph{Description}:
The thread placement can be either specified using an abstract name or by an
explicit list of the places. The abstract names @code{threads}, @code{cores},
@code{sockets}, @code{ll_caches} and @code{numa_domains} can be optionally
followed by a positive number in parentheses, which denotes how many places
shall be created. With @code{threads} each place corresponds to a single
hardware thread; with @code{cores} to a single core with the corresponding
number of hardware threads; with @code{sockets} the place corresponds to a
single socket; with @code{ll_caches} to a set of cores that share the last
level cache on the device; and with @code{numa_domains} to a set of cores for
which their closest memory on the device is the same memory and at a similar
distance from the cores. The resulting placement can be shown by setting the
@env{OMP_DISPLAY_ENV} environment variable.

Alternatively, the placement can be specified explicitly as a comma-separated
list of places. A place is specified by a set of nonnegative numbers in curly
braces, denoting the hardware threads. The curly braces can be omitted
when only a single number has been specified. The hardware threads
belonging to a place can either be specified as a comma-separated list of
nonnegative thread numbers or using an interval. Multiple places can also be
either specified by a comma-separated list of places or by an interval. To
specify an interval, a colon followed by the count is placed after
the hardware thread number or the place. Optionally, the length can be
followed by a colon and the stride number -- otherwise a unit stride is
assumed. Placing an exclamation mark (@code{!}) directly before a curly
brace or numbers inside the curly braces (excluding intervals) will
exclude those hardware threads.

For instance, the following specify the same places list:
@code{"@{0,1,2@}, @{3,4,5@}, @{6,7,8@}, @{9,10,11@}"};
@code{"@{0:3@}, @{3:3@}, @{6:3@}, @{9:3@}"}; and @code{"@{0:3@}:4:3"}.

If @env{OMP_PLACES} and @env{GOMP_CPU_AFFINITY} are unset and
@env{OMP_PROC_BIND} is either unset or @code{false}, threads may be moved
between CPUs following no placement policy.

@item @emph{See also}:
@ref{OMP_PROC_BIND}, @ref{GOMP_CPU_AFFINITY}, @ref{omp_get_proc_bind},
@ref{OMP_DISPLAY_ENV}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.5
@end table



@node OMP_STACKSIZE
@section @env{OMP_STACKSIZE} -- Set default thread stack size
@cindex Environment Variable
@table @asis
@item @emph{Description}:
Set the default thread stack size in kilobytes, unless the number
is suffixed by @code{B}, @code{K}, @code{M} or @code{G}, in which
case the size is, respectively, in bytes, kilobytes, megabytes
or gigabytes. This is different from @code{pthread_attr_setstacksize},
which gets the number of bytes as an argument. If the stack size cannot
be set due to system constraints, an error is reported and the initial
stack size is left unchanged. If undefined, the stack size is system
dependent.

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.7
@end table



@node OMP_SCHEDULE
@section @env{OMP_SCHEDULE} -- How threads are scheduled
@cindex Environment Variable
@cindex Implementation specific setting
@table @asis
@item @emph{Description}:
Allows specifying the @code{schedule type} and @code{chunk size}.
The value of the variable shall have the form @code{type[,chunk]}, where
@code{type} is one of @code{static}, @code{dynamic}, @code{guided} or
@code{auto}. The optional @code{chunk} size shall be a positive integer.
If undefined, dynamic scheduling and a chunk size of 1 are used.

@item @emph{See also}:
@ref{omp_set_schedule}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Sections 2.7.1.1 and 4.1
@end table



@node OMP_TARGET_OFFLOAD
@section @env{OMP_TARGET_OFFLOAD} -- Controls offloading behaviour
@cindex Environment Variable
@cindex Implementation specific setting
@table @asis
@item @emph{Description}:
Specifies the behaviour with regard to offloading code to a device. This
variable can be set to one of three values: @code{MANDATORY}, @code{DISABLED}
or @code{DEFAULT}.

If set to @code{MANDATORY}, the program will terminate with an error if
the offload device is not present or is not supported. If set to
@code{DISABLED}, then offloading is disabled and all code will run on the
host. If set to @code{DEFAULT}, the program will try offloading to the
device first, then fall back to running code on the host if it cannot.

If undefined, then the program will behave as if @code{DEFAULT} was set.
@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 6.17
@end table



@node OMP_TEAMS_THREAD_LIMIT
@section @env{OMP_TEAMS_THREAD_LIMIT} -- Set the maximum number of threads imposed by teams
@cindex Environment Variable
@table @asis
@item @emph{Description}:
Specifies an upper bound for the number of threads to use by each contention
group created by a teams construct without an explicit @code{thread_limit}
clause. The value of this variable shall be a positive integer. If undefined,
the value of 0 is used, which stands for an implementation-defined upper
limit.

@item @emph{See also}:
@ref{OMP_THREAD_LIMIT}, @ref{omp_set_teams_thread_limit}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v5.1}, Section 6.24
@end table



@node OMP_THREAD_LIMIT
@section @env{OMP_THREAD_LIMIT} -- Set the maximum number of threads
@cindex Environment Variable
@table @asis
@item @emph{Description}:
Specifies the number of threads to use for the whole program. The
value of this variable shall be a positive integer. If undefined,
the number of threads is not limited.

@item @emph{See also}:
@ref{OMP_NUM_THREADS}, @ref{omp_get_thread_limit}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.10
@end table



@node OMP_WAIT_POLICY
@section @env{OMP_WAIT_POLICY} -- How waiting threads are handled
@cindex Environment Variable
@table @asis
@item @emph{Description}:
Specifies whether waiting threads should be active or passive. If
the value is @code{PASSIVE}, waiting threads should not consume CPU
power while waiting; the value @code{ACTIVE} specifies that they
should. If undefined, threads wait actively for a short time before
waiting passively.

@item @emph{See also}:
@ref{GOMP_SPINCOUNT}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.8
@end table



@node GOMP_CPU_AFFINITY
@section @env{GOMP_CPU_AFFINITY} -- Bind threads to specific CPUs
@cindex Environment Variable
@table @asis
@item @emph{Description}:
Binds threads to specific CPUs. The variable should contain a space-separated
or comma-separated list of CPUs. This list may contain different kinds of
entries: either single CPU numbers in any order, a range of CPUs (M-N)
or a range with some stride (M-N:S). CPU numbers are zero based. For example,
@code{GOMP_CPU_AFFINITY="0 3 1-2 4-15:2"} will bind the initial thread
to CPU 0, the second to CPU 3, the third to CPU 1, the fourth to
CPU 2, the fifth to CPU 4, the sixth through tenth to CPUs 6, 8, 10, 12,
and 14 respectively, and then start assigning back from the beginning of
the list. @code{GOMP_CPU_AFFINITY=0} binds all threads to CPU 0.

There is no libgomp library routine to determine whether a CPU affinity
specification is in effect. As a workaround, language-specific library
functions, e.g., @code{getenv} in C or @code{GET_ENVIRONMENT_VARIABLE} in
Fortran, may be used to query the setting of the @code{GOMP_CPU_AFFINITY}
environment variable. A CPU affinity defined at startup cannot be changed
or disabled during the runtime of the application.

If both @env{GOMP_CPU_AFFINITY} and @env{OMP_PROC_BIND} are set,
@env{OMP_PROC_BIND} has a higher precedence. If @env{GOMP_CPU_AFFINITY} is
unset and @env{OMP_PROC_BIND} is either unset or set to @code{FALSE}, the
host system will handle the assignment of threads to CPUs.
@item @emph{See also}:
@ref{OMP_PLACES}, @ref{OMP_PROC_BIND}
@end table



@node GOMP_DEBUG
@section @env{GOMP_DEBUG} -- Enable debugging output
@cindex Environment Variable
@table @asis
@item @emph{Description}:
Enable debugging output. The variable should be set to @code{0}
(disabled, also the default if not set), or @code{1} (enabled).

If enabled, some debugging output will be printed during execution.
This is currently not specified in more detail, and subject to change.
@end table



@node GOMP_STACKSIZE
@section @env{GOMP_STACKSIZE} -- Set default thread stack size
@cindex Environment Variable
@cindex Implementation specific setting
@table @asis
@item @emph{Description}:
Set the default thread stack size in kilobytes. This is different from
@code{pthread_attr_setstacksize}, which gets the number of bytes as an
argument. If the stack size cannot be set due to system constraints, an
error is reported and the initial stack size is left unchanged. If undefined,
the stack size is system dependent.

@item @emph{See also}:
@ref{OMP_STACKSIZE}

@item @emph{Reference}:
@uref{https://gcc.gnu.org/ml/gcc-patches/2006-06/msg00493.html,
GCC Patches Mailinglist},
@uref{https://gcc.gnu.org/ml/gcc-patches/2006-06/msg00496.html,
GCC Patches Mailinglist}
@end table



@node GOMP_SPINCOUNT
@section @env{GOMP_SPINCOUNT} -- Set the busy-wait spin count
@cindex Environment Variable
@cindex Implementation specific setting
@table @asis
@item @emph{Description}:
Determines how long a thread waits actively, consuming CPU power,
before waiting passively without consuming CPU power. The value may be
either @code{INFINITE} or @code{INFINITY} to always wait actively, or an
integer which gives the number of spins of the busy-wait loop. The
integer may optionally be followed by the following suffixes acting
as multiplication factors: @code{k} (kilo, thousand), @code{M} (mega,
million), @code{G} (giga, billion), or @code{T} (tera, trillion).
If undefined, 0 is used when @env{OMP_WAIT_POLICY} is @code{PASSIVE},
300,000 is used when @env{OMP_WAIT_POLICY} is undefined and
30 billion is used when @env{OMP_WAIT_POLICY} is @code{ACTIVE}.
If there are more OpenMP threads than available CPUs, 1000 and 100
spins are used for @env{OMP_WAIT_POLICY} being @code{ACTIVE} or
undefined, respectively; unless the @env{GOMP_SPINCOUNT} is lower
or @env{OMP_WAIT_POLICY} is @code{PASSIVE}.

@item @emph{See also}:
@ref{OMP_WAIT_POLICY}
@end table



@node GOMP_RTEMS_THREAD_POOLS
@section @env{GOMP_RTEMS_THREAD_POOLS} -- Set the RTEMS specific thread pools
@cindex Environment Variable
@cindex Implementation specific setting
@table @asis
@item @emph{Description}:
This environment variable is only used on the RTEMS real-time operating system.
It determines the scheduler instance specific thread pools. The format for
@env{GOMP_RTEMS_THREAD_POOLS} is a list of optional
@code{<thread-pool-count>[$<priority>]@@<scheduler-name>} configurations
separated by @code{:} where:
@itemize @bullet
@item @code{<thread-pool-count>} is the thread pool count for this scheduler
instance.
@item @code{$<priority>} is an optional priority for the worker threads of a
thread pool according to @code{pthread_setschedparam}. In case a priority
value is omitted, then a worker thread will inherit the priority of the OpenMP
primary thread that created it. The priority of the worker thread is not
changed after creation, even if a new OpenMP primary thread using the worker has
a different priority.
@item @code{@@<scheduler-name>} is the scheduler instance name according to the
RTEMS application configuration.
@end itemize
In case no thread pool configuration is specified for a scheduler instance,
then each OpenMP primary thread of this scheduler instance will use its own
dynamically allocated thread pool. To limit the worker thread count of the
thread pools, each OpenMP primary thread must call @code{omp_set_num_threads}.
@item @emph{Example}:
Suppose we have three scheduler instances @code{IO}, @code{WRK0}, and
@code{WRK1} with @env{GOMP_RTEMS_THREAD_POOLS} set to
@code{"1@@WRK0:3$4@@WRK1"}. Then there are no thread pool restrictions for
scheduler instance @code{IO}. In the scheduler instance @code{WRK0} there is
one thread pool available. Since no priority is specified for this scheduler
instance, the worker thread inherits the priority of the OpenMP primary thread
that created it. In the scheduler instance @code{WRK1} there are three thread
pools available and their worker threads run at priority four.
@end table



@c ---------------------------------------------------------------------
@c Enabling OpenACC
@c ---------------------------------------------------------------------

@node Enabling OpenACC
@chapter Enabling OpenACC

To activate the OpenACC extensions for C/C++ and Fortran, the compile-time
flag @option{-fopenacc} must be specified. This enables the OpenACC directive
@code{#pragma acc} in C/C++ and, for Fortran, the @code{!$acc} directive in
free form; the @code{c$acc}, @code{*$acc} and @code{!$acc} directives in fixed
form; the @code{!$} conditional compilation sentinel in free form; and the
@code{c$}, @code{*$} and @code{!$} sentinels in fixed form. The flag also
arranges for automatic linking of the OpenACC runtime library
(@ref{OpenACC Runtime Library Routines}).
2354 2355See @uref{https://gcc.gnu.org/wiki/OpenACC} for more information. 2356 2357A complete description of all OpenACC directives accepted may be found in 2358the @uref{https://www.openacc.org, OpenACC} Application Programming 2359Interface manual, version 2.6. 2360 2361 2362 2363@c --------------------------------------------------------------------- 2364@c OpenACC Runtime Library Routines 2365@c --------------------------------------------------------------------- 2366 2367@node OpenACC Runtime Library Routines 2368@chapter OpenACC Runtime Library Routines 2369 2370The runtime routines described here are defined by section 3 of the OpenACC 2371specifications in version 2.6. 2372They have C linkage, and do not throw exceptions. 2373Generally, they are available only for the host, with the exception of 2374@code{acc_on_device}, which is available for both the host and the 2375acceleration device. 2376 2377@menu 2378* acc_get_num_devices:: Get number of devices for the given device 2379 type. 2380* acc_set_device_type:: Set type of device accelerator to use. 2381* acc_get_device_type:: Get type of device accelerator to be used. 2382* acc_set_device_num:: Set device number to use. 2383* acc_get_device_num:: Get device number to be used. 2384* acc_get_property:: Get device property. 2385* acc_async_test:: Tests for completion of a specific asynchronous 2386 operation. 2387* acc_async_test_all:: Tests for completion of all asynchronous 2388 operations. 2389* acc_wait:: Wait for completion of a specific asynchronous 2390 operation. 2391* acc_wait_all:: Waits for completion of all asynchronous 2392 operations. 2393* acc_wait_all_async:: Wait for completion of all asynchronous 2394 operations. 2395* acc_wait_async:: Wait for completion of asynchronous operations. 2396* acc_init:: Initialize runtime for a specific device type. 2397* acc_shutdown:: Shuts down the runtime for a specific device 2398 type. 
2399* acc_on_device:: Whether executing on a particular device 2400* acc_malloc:: Allocate device memory. 2401* acc_free:: Free device memory. 2402* acc_copyin:: Allocate device memory and copy host memory to 2403 it. 2404* acc_present_or_copyin:: If the data is not present on the device, 2405 allocate device memory and copy from host 2406 memory. 2407* acc_create:: Allocate device memory and map it to host 2408 memory. 2409* acc_present_or_create:: If the data is not present on the device, 2410 allocate device memory and map it to host 2411 memory. 2412* acc_copyout:: Copy device memory to host memory. 2413* acc_delete:: Free device memory. 2414* acc_update_device:: Update device memory from mapped host memory. 2415* acc_update_self:: Update host memory from mapped device memory. 2416* acc_map_data:: Map previously allocated device memory to host 2417 memory. 2418* acc_unmap_data:: Unmap device memory from host memory. 2419* acc_deviceptr:: Get device pointer associated with specific 2420 host address. 2421* acc_hostptr:: Get host pointer associated with specific 2422 device address. 2423* acc_is_present:: Indicate whether host variable / array is 2424 present on device. 2425* acc_memcpy_to_device:: Copy host memory to device memory. 2426* acc_memcpy_from_device:: Copy device memory to host memory. 2427* acc_attach:: Let device pointer point to device-pointer target. 2428* acc_detach:: Let device pointer point to host-pointer target. 2429 2430API routines for target platforms. 2431 2432* acc_get_current_cuda_device:: Get CUDA device handle. 2433* acc_get_current_cuda_context::Get CUDA context handle. 2434* acc_get_cuda_stream:: Get CUDA stream handle. 2435* acc_set_cuda_stream:: Set CUDA stream handle. 2436 2437API routines for the OpenACC Profiling Interface. 2438 2439* acc_prof_register:: Register callbacks. 2440* acc_prof_unregister:: Unregister callbacks. 2441* acc_prof_lookup:: Obtain inquiry functions. 2442* acc_register_library:: Library registration. 
2443@end menu 2444 2445 2446 2447@node acc_get_num_devices 2448@section @code{acc_get_num_devices} -- Get number of devices for given device type 2449@table @asis 2450@item @emph{Description} 2451This function returns a value indicating the number of devices available 2452for the device type specified in @var{devicetype}. 2453 2454@item @emph{C/C++}: 2455@multitable @columnfractions .20 .80 2456@item @emph{Prototype}: @tab @code{int acc_get_num_devices(acc_device_t devicetype);} 2457@end multitable 2458 2459@item @emph{Fortran}: 2460@multitable @columnfractions .20 .80 2461@item @emph{Interface}: @tab @code{integer function acc_get_num_devices(devicetype)} 2462@item @tab @code{integer(kind=acc_device_kind) devicetype} 2463@end multitable 2464 2465@item @emph{Reference}: 2466@uref{https://www.openacc.org, OpenACC specification v2.6}, section 24673.2.1. 2468@end table 2469 2470 2471 2472@node acc_set_device_type 2473@section @code{acc_set_device_type} -- Set type of device accelerator to use. 2474@table @asis 2475@item @emph{Description} 2476This function indicates to the runtime library which device type, specified 2477in @var{devicetype}, to use when executing a parallel or kernels region. 2478 2479@item @emph{C/C++}: 2480@multitable @columnfractions .20 .80 2481@item @emph{Prototype}: @tab @code{acc_set_device_type(acc_device_t devicetype);} 2482@end multitable 2483 2484@item @emph{Fortran}: 2485@multitable @columnfractions .20 .80 2486@item @emph{Interface}: @tab @code{subroutine acc_set_device_type(devicetype)} 2487@item @tab @code{integer(kind=acc_device_kind) devicetype} 2488@end multitable 2489 2490@item @emph{Reference}: 2491@uref{https://www.openacc.org, OpenACC specification v2.6}, section 24923.2.2. 2493@end table 2494 2495 2496 2497@node acc_get_device_type 2498@section @code{acc_get_device_type} -- Get type of device accelerator to be used. 
2499@table @asis 2500@item @emph{Description} 2501This function returns what device type will be used when executing a 2502parallel or kernels region. 2503 2504This function returns @code{acc_device_none} if 2505@code{acc_get_device_type} is called from 2506@code{acc_ev_device_init_start}, @code{acc_ev_device_init_end} 2507callbacks of the OpenACC Profiling Interface (@ref{OpenACC Profiling 2508Interface}), that is, if the device is currently being initialized. 2509 2510@item @emph{C/C++}: 2511@multitable @columnfractions .20 .80 2512@item @emph{Prototype}: @tab @code{acc_device_t acc_get_device_type(void);} 2513@end multitable 2514 2515@item @emph{Fortran}: 2516@multitable @columnfractions .20 .80 2517@item @emph{Interface}: @tab @code{function acc_get_device_type(void)} 2518@item @tab @code{integer(kind=acc_device_kind) acc_get_device_type} 2519@end multitable 2520 2521@item @emph{Reference}: 2522@uref{https://www.openacc.org, OpenACC specification v2.6}, section 25233.2.3. 2524@end table 2525 2526 2527 2528@node acc_set_device_num 2529@section @code{acc_set_device_num} -- Set device number to use. 2530@table @asis 2531@item @emph{Description} 2532This function will indicate to the runtime which device number, 2533specified by @var{devicenum}, associated with the specified device 2534type @var{devicetype}. 2535 2536@item @emph{C/C++}: 2537@multitable @columnfractions .20 .80 2538@item @emph{Prototype}: @tab @code{acc_set_device_num(int devicenum, acc_device_t devicetype);} 2539@end multitable 2540 2541@item @emph{Fortran}: 2542@multitable @columnfractions .20 .80 2543@item @emph{Interface}: @tab @code{subroutine acc_set_device_num(devicenum, devicetype)} 2544@item @tab @code{integer devicenum} 2545@item @tab @code{integer(kind=acc_device_kind) devicetype} 2546@end multitable 2547 2548@item @emph{Reference}: 2549@uref{https://www.openacc.org, OpenACC specification v2.6}, section 25503.2.4. 
@end table



@node acc_get_device_num
@section @code{acc_get_device_num} -- Get device number to be used.
@table @asis
@item @emph{Description}
This function returns the device number, associated with the specified
device type @var{devicetype}, that will be used when executing a
parallel or kernels region.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_get_device_num(acc_device_t devicetype);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_get_device_num(devicetype)}
@item @tab @code{integer(kind=acc_device_kind) devicetype}
@item @tab @code{integer acc_get_device_num}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.5.
@end table



@node acc_get_property
@section @code{acc_get_property} -- Get device property.
@cindex acc_get_property
@cindex acc_get_property_string
@table @asis
@item @emph{Description}
These routines return the value of the specified @var{property} for the
device being queried according to @var{devicenum} and @var{devicetype}.
Integer-valued and string-valued properties are returned by
@code{acc_get_property} and @code{acc_get_property_string} respectively.
The Fortran @code{acc_get_property_string} subroutine returns the string
retrieved in its fourth argument, while the remaining entry points are
functions, which pass the return value as their result.

Note, for Fortran only: the OpenACC technical committee corrected and,
hence, modified the interface introduced in OpenACC 2.6.
The kind-value parameter @code{acc_device_property} has been renamed to
@code{acc_device_property_kind} for consistency, and the return type of
the @code{acc_get_property} function is now a @code{c_size_t} integer
instead of an @code{acc_device_property} integer.
The parameter @code{acc_device_property} will continue to be provided,
but might be removed in a future version of GCC.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{size_t acc_get_property(int devicenum, acc_device_t devicetype, acc_device_property_t property);}
@item @emph{Prototype}: @tab @code{const char *acc_get_property_string(int devicenum, acc_device_t devicetype, acc_device_property_t property);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_get_property(devicenum, devicetype, property)}
@item @emph{Interface}: @tab @code{subroutine acc_get_property_string(devicenum, devicetype, property, string)}
@item @tab @code{use ISO_C_Binding, only: c_size_t}
@item @tab @code{integer devicenum}
@item @tab @code{integer(kind=acc_device_kind) devicetype}
@item @tab @code{integer(kind=acc_device_property_kind) property}
@item @tab @code{integer(kind=c_size_t) acc_get_property}
@item @tab @code{character(*) string}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.6.
@end table



@node acc_async_test
@section @code{acc_async_test} -- Test for completion of a specific asynchronous operation.
@table @asis
@item @emph{Description}
This function tests for completion of the asynchronous operation
specified in @var{arg}. In C/C++, a non-zero value is returned to
indicate that the specified asynchronous operation has completed, while
Fortran returns @code{true}.
If the asynchronous operation has not completed, C/C++ returns zero and
Fortran returns @code{false}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_async_test(int arg);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_async_test(arg)}
@item @tab @code{integer(kind=acc_handle_kind) arg}
@item @tab @code{logical acc_async_test}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.9.
@end table



@node acc_async_test_all
@section @code{acc_async_test_all} -- Tests for completion of all asynchronous operations.
@table @asis
@item @emph{Description}
This function tests for completion of all asynchronous operations.
In C/C++, a non-zero value is returned to indicate that all asynchronous
operations have completed, while Fortran returns @code{true}. If any
asynchronous operation has not completed, C/C++ returns zero and Fortran
returns @code{false}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_async_test_all(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_async_test_all()}
@item @tab @code{logical acc_async_test_all}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.10.
@end table



@node acc_wait
@section @code{acc_wait} -- Wait for completion of a specific asynchronous operation.
@table @asis
@item @emph{Description}
This function waits for completion of the asynchronous operation
specified in @var{arg}.
@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_wait(int arg);}
@item @emph{Prototype (OpenACC 1.0 compatibility)}: @tab @code{acc_async_wait(int arg);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_wait(arg)}
@item @tab @code{integer(acc_handle_kind) arg}
@item @emph{Interface (OpenACC 1.0 compatibility)}: @tab @code{subroutine acc_async_wait(arg)}
@item @tab @code{integer(acc_handle_kind) arg}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.11.
@end table



@node acc_wait_all
@section @code{acc_wait_all} -- Waits for completion of all asynchronous operations.
@table @asis
@item @emph{Description}
This function waits for the completion of all asynchronous operations.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_wait_all(void);}
@item @emph{Prototype (OpenACC 1.0 compatibility)}: @tab @code{acc_async_wait_all(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_wait_all()}
@item @emph{Interface (OpenACC 1.0 compatibility)}: @tab @code{subroutine acc_async_wait_all()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.13.
@end table



@node acc_wait_all_async
@section @code{acc_wait_all_async} -- Wait for completion of all asynchronous operations.
@table @asis
@item @emph{Description}
This function enqueues a wait operation on the queue @var{async} for any
and all asynchronous operations that have been previously enqueued on
any queue.
2746 2747@item @emph{C/C++}: 2748@multitable @columnfractions .20 .80 2749@item @emph{Prototype}: @tab @code{acc_wait_all_async(int async);} 2750@end multitable 2751 2752@item @emph{Fortran}: 2753@multitable @columnfractions .20 .80 2754@item @emph{Interface}: @tab @code{subroutine acc_wait_all_async(async)} 2755@item @tab @code{integer(acc_handle_kind) async} 2756@end multitable 2757 2758@item @emph{Reference}: 2759@uref{https://www.openacc.org, OpenACC specification v2.6}, section 27603.2.14. 2761@end table 2762 2763 2764 2765@node acc_wait_async 2766@section @code{acc_wait_async} -- Wait for completion of asynchronous operations. 2767@table @asis 2768@item @emph{Description} 2769This function enqueues a wait operation on queue @var{async} for any and all 2770asynchronous operations enqueued on queue @var{arg}. 2771 2772@item @emph{C/C++}: 2773@multitable @columnfractions .20 .80 2774@item @emph{Prototype}: @tab @code{acc_wait_async(int arg, int async);} 2775@end multitable 2776 2777@item @emph{Fortran}: 2778@multitable @columnfractions .20 .80 2779@item @emph{Interface}: @tab @code{subroutine acc_wait_async(arg, async)} 2780@item @tab @code{integer(acc_handle_kind) arg, async} 2781@end multitable 2782 2783@item @emph{Reference}: 2784@uref{https://www.openacc.org, OpenACC specification v2.6}, section 27853.2.12. 2786@end table 2787 2788 2789 2790@node acc_init 2791@section @code{acc_init} -- Initialize runtime for a specific device type. 2792@table @asis 2793@item @emph{Description} 2794This function initializes the runtime for the device type specified in 2795@var{devicetype}. 
@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_init(acc_device_t devicetype);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_init(devicetype)}
@item @tab @code{integer(acc_device_kind) devicetype}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.7.
@end table



@node acc_shutdown
@section @code{acc_shutdown} -- Shuts down the runtime for a specific device type.
@table @asis
@item @emph{Description}
This function shuts down the runtime for the device type specified in
@var{devicetype}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_shutdown(acc_device_t devicetype);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_shutdown(devicetype)}
@item @tab @code{integer(acc_device_kind) devicetype}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.8.
@end table



@node acc_on_device
@section @code{acc_on_device} -- Whether executing on a particular device
@table @asis
@item @emph{Description}:
This function returns whether the program is executing on a particular
device specified in @var{devicetype}. In C/C++, a non-zero value is
returned to indicate that the program is executing on the specified
device type; in Fortran, @code{true} is returned. If the program is not
executing on the specified device type, C/C++ returns zero and Fortran
returns @code{false}.
@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_on_device(acc_device_t devicetype);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_on_device(devicetype)}
@item @tab @code{integer(acc_device_kind) devicetype}
@item @tab @code{logical acc_on_device}
@end multitable


@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.17.
@end table



@node acc_malloc
@section @code{acc_malloc} -- Allocate device memory.
@table @asis
@item @emph{Description}
This function allocates @var{len} bytes of device memory. It returns
the device address of the allocated memory.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{d_void* acc_malloc(size_t len);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.18.
@end table



@node acc_free
@section @code{acc_free} -- Free device memory.
@table @asis
@item @emph{Description}
This function frees previously allocated device memory at the device
address @var{a}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_free(d_void *a);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.19.
@end table



@node acc_copyin
@section @code{acc_copyin} -- Allocate device memory and copy host memory to it.
@table @asis
@item @emph{Description}
In C/C++, this function allocates @var{len} bytes of device memory
and maps it to the specified host address in @var{a}. The device
address of the newly allocated device memory is returned.
In Fortran, two forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a
variable or array element and @var{len} specifies the length in bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_copyin(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{void *acc_copyin_async(h_void *a, size_t len, int async);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_copyin(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_copyin(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_copyin_async(a, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer(acc_handle_kind) :: async}
@item @emph{Interface}: @tab @code{subroutine acc_copyin_async(a, len, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @tab @code{integer(acc_handle_kind) :: async}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.20.
@end table



@node acc_present_or_copyin
@section @code{acc_present_or_copyin} -- If the data is not present on the device, allocate device memory and copy from host memory.
@table @asis
@item @emph{Description}
This function tests whether the host data specified by @var{a} and of
length @var{len} is present on the device. If it is not present, device
memory is allocated and the host memory copied. The device address of
the newly allocated device memory is returned.

In Fortran, two forms are supported.
In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a
variable or array element and @var{len} specifies the length in bytes.

Note that @code{acc_present_or_copyin} and @code{acc_pcopyin} exist for
backward compatibility with OpenACC 2.0; use @ref{acc_copyin} instead.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_present_or_copyin(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{void *acc_pcopyin(h_void *a, size_t len);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_present_or_copyin(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_present_or_copyin(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_pcopyin(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_pcopyin(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.20.
@end table



@node acc_create
@section @code{acc_create} -- Allocate device memory and map it to host memory.
@table @asis
@item @emph{Description}
This function allocates device memory and maps it to host memory specified
by the host address @var{a} with a length of @var{len} bytes. In C/C++,
the function returns the device address of the allocated device memory.

In Fortran, two forms are supported. In the first form, @var{a} specifies
a contiguous array section.
In the second form, @var{a} specifies a variable or
array element and @var{len} specifies the length in bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_create(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{void *acc_create_async(h_void *a, size_t len, int async);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_create(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_create(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_create_async(a, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer(acc_handle_kind) :: async}
@item @emph{Interface}: @tab @code{subroutine acc_create_async(a, len, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @tab @code{integer(acc_handle_kind) :: async}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.21.
@end table



@node acc_present_or_create
@section @code{acc_present_or_create} -- If the data is not present on the device, allocate device memory and map it to host memory.
@table @asis
@item @emph{Description}
This function tests whether the host data specified by @var{a} and of
length @var{len} is present on the device. If it is not present, device
memory is allocated and mapped to host memory. In C/C++, the device
address of the newly allocated device memory is returned.

In Fortran, two forms are supported. In the first form, @var{a} specifies
a contiguous array section.
In the second form, @var{a} specifies a variable or
array element and @var{len} specifies the length in bytes.

Note that @code{acc_present_or_create} and @code{acc_pcreate} exist for
backward compatibility with OpenACC 2.0; use @ref{acc_create} instead.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_present_or_create(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{void *acc_pcreate(h_void *a, size_t len);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_present_or_create(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_present_or_create(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_pcreate(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_pcreate(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.21.
@end table



@node acc_copyout
@section @code{acc_copyout} -- Copy device memory to host memory.
@table @asis
@item @emph{Description}
In C/C++, this function copies mapped device memory to the host memory
specified by the host address @var{a} for a length of @var{len} bytes.

In Fortran, two forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a
variable or array element and @var{len} specifies the length in bytes.
3086 3087@item @emph{C/C++}: 3088@multitable @columnfractions .20 .80 3089@item @emph{Prototype}: @tab @code{acc_copyout(h_void *a, size_t len);} 3090@item @emph{Prototype}: @tab @code{acc_copyout_async(h_void *a, size_t len, int async);} 3091@item @emph{Prototype}: @tab @code{acc_copyout_finalize(h_void *a, size_t len);} 3092@item @emph{Prototype}: @tab @code{acc_copyout_finalize_async(h_void *a, size_t len, int async);} 3093@end multitable 3094 3095@item @emph{Fortran}: 3096@multitable @columnfractions .20 .80 3097@item @emph{Interface}: @tab @code{subroutine acc_copyout(a)} 3098@item @tab @code{type, dimension(:[,:]...) :: a} 3099@item @emph{Interface}: @tab @code{subroutine acc_copyout(a, len)} 3100@item @tab @code{type, dimension(:[,:]...) :: a} 3101@item @tab @code{integer len} 3102@item @emph{Interface}: @tab @code{subroutine acc_copyout_async(a, async)} 3103@item @tab @code{type, dimension(:[,:]...) :: a} 3104@item @tab @code{integer(acc_handle_kind) :: async} 3105@item @emph{Interface}: @tab @code{subroutine acc_copyout_async(a, len, async)} 3106@item @tab @code{type, dimension(:[,:]...) :: a} 3107@item @tab @code{integer len} 3108@item @tab @code{integer(acc_handle_kind) :: async} 3109@item @emph{Interface}: @tab @code{subroutine acc_copyout_finalize(a)} 3110@item @tab @code{type, dimension(:[,:]...) :: a} 3111@item @emph{Interface}: @tab @code{subroutine acc_copyout_finalize(a, len)} 3112@item @tab @code{type, dimension(:[,:]...) :: a} 3113@item @tab @code{integer len} 3114@item @emph{Interface}: @tab @code{subroutine acc_copyout_finalize_async(a, async)} 3115@item @tab @code{type, dimension(:[,:]...) :: a} 3116@item @tab @code{integer(acc_handle_kind) :: async} 3117@item @emph{Interface}: @tab @code{subroutine acc_copyout_finalize_async(a, len, async)} 3118@item @tab @code{type, dimension(:[,:]...) 
:: a}
@item @tab @code{integer len}
@item @tab @code{integer(acc_handle_kind) :: async}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.22.
@end table



@node acc_delete
@section @code{acc_delete} -- Free device memory.
@table @asis
@item @emph{Description}
This function frees previously allocated device memory, specified by
the host address @var{a} and a length of @var{len} bytes.

In Fortran, two forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a
variable or array element and @var{len} specifies the length in bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_delete(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{acc_delete_async(h_void *a, size_t len, int async);}
@item @emph{Prototype}: @tab @code{acc_delete_finalize(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{acc_delete_finalize_async(h_void *a, size_t len, int async);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_delete(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_delete(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_delete_async(a, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer(acc_handle_kind) :: async}
@item @emph{Interface}: @tab @code{subroutine acc_delete_async(a, len, async)}
@item @tab @code{type, dimension(:[,:]...)
:: a}
@item @tab @code{integer len}
@item @tab @code{integer(acc_handle_kind) :: async}
@item @emph{Interface}: @tab @code{subroutine acc_delete_finalize(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_delete_finalize(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_delete_finalize_async(a, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer(acc_handle_kind) :: async}
@item @emph{Interface}: @tab @code{subroutine acc_delete_finalize_async(a, len, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @tab @code{integer(acc_handle_kind) :: async}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.23.
@end table



@node acc_update_device
@section @code{acc_update_device} -- Update device memory from mapped host memory.
@table @asis
@item @emph{Description}
This function updates the device copy from the previously mapped host memory.
The host memory is specified with the host address @var{a} and a length of
@var{len} bytes.

In Fortran, two forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a
variable or array element and @var{len} specifies the length in bytes.
@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_update_device(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{acc_update_device_async(h_void *a, size_t len, int async);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_update_device(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_update_device(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_update_device_async(a, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer(acc_handle_kind) :: async}
@item @emph{Interface}: @tab @code{subroutine acc_update_device_async(a, len, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @tab @code{integer(acc_handle_kind) :: async}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.24.
@end table



@node acc_update_self
@section @code{acc_update_self} -- Update host memory from mapped device memory.
@table @asis
@item @emph{Description}
This function updates the host copy from the previously mapped device memory.
The host memory is specified with the host address @var{a} and a length of
@var{len} bytes.

In Fortran, two forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a
variable or array element and @var{len} specifies the length in bytes.
3236 3237@item @emph{C/C++}: 3238@multitable @columnfractions .20 .80 3239@item @emph{Prototype}: @tab @code{acc_update_self(h_void *a, size_t len);} 3240@item @emph{Prototype}: @tab @code{acc_update_self_async(h_void *a, size_t len, int async);} 3241@end multitable 3242 3243@item @emph{Fortran}: 3244@multitable @columnfractions .20 .80 3245@item @emph{Interface}: @tab @code{subroutine acc_update_self(a)} 3246@item @tab @code{type, dimension(:[,:]...) :: a} 3247@item @emph{Interface}: @tab @code{subroutine acc_update_self(a, len)} 3248@item @tab @code{type, dimension(:[,:]...) :: a} 3249@item @tab @code{integer len} 3250@item @emph{Interface}: @tab @code{subroutine acc_update_self_async(a, async)} 3251@item @tab @code{type, dimension(:[,:]...) :: a} 3252@item @tab @code{integer(acc_handle_kind) :: async} 3253@item @emph{Interface}: @tab @code{subroutine acc_update_self_async(a, len, async)} 3254@item @tab @code{type, dimension(:[,:]...) :: a} 3255@item @tab @code{integer len} 3256@item @tab @code{integer(acc_handle_kind) :: async} 3257@end multitable 3258 3259@item @emph{Reference}: 3260@uref{https://www.openacc.org, OpenACC specification v2.6}, section 32613.2.25. 3262@end table 3263 3264 3265 3266@node acc_map_data 3267@section @code{acc_map_data} -- Map previously allocated device memory to host memory. 3268@table @asis 3269@item @emph{Description} 3270This function maps previously allocated device and host memory. The device 3271memory is specified with the device address @var{d}. The host memory is 3272specified with the host address @var{h} and a length of @var{len}. 3273 3274@item @emph{C/C++}: 3275@multitable @columnfractions .20 .80 3276@item @emph{Prototype}: @tab @code{acc_map_data(h_void *h, d_void *d, size_t len);} 3277@end multitable 3278 3279@item @emph{Reference}: 3280@uref{https://www.openacc.org, OpenACC specification v2.6}, section 32813.2.26. 
@end table



@node acc_unmap_data
@section @code{acc_unmap_data} -- Unmap device memory from host memory.
@table @asis
@item @emph{Description}
This function unmaps previously mapped device and host memory. The latter
is specified by @var{h}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_unmap_data(h_void *h);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.27.
@end table



@node acc_deviceptr
@section @code{acc_deviceptr} -- Get device pointer associated with specific host address.
@table @asis
@item @emph{Description}
This function returns the device address that has been mapped to the
host address specified by @var{h}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_deviceptr(h_void *h);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.28.
@end table



@node acc_hostptr
@section @code{acc_hostptr} -- Get host pointer associated with specific device address.
@table @asis
@item @emph{Description}
This function returns the host address that has been mapped to the
device address specified by @var{d}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_hostptr(d_void *d);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.29.
@end table



@node acc_is_present
@section @code{acc_is_present} -- Indicate whether host variable / array is present on device.
@table @asis
@item @emph{Description}
This function indicates whether the host memory specified by the host
address @var{a} and a length of @var{len} bytes is present on the device.
In C/C++, a non-zero value is returned to indicate the presence of the
mapped memory on the device. A zero is returned to indicate the memory
is not mapped on the device.

In Fortran, two forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a
variable or array element and @var{len} specifies the length in bytes.
If the host memory is mapped to device memory, @code{true} is returned;
otherwise, @code{false} is returned to indicate that the mapped memory
is not present.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_is_present(h_void *a, size_t len);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_is_present(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{logical acc_is_present}
@item @emph{Interface}: @tab @code{function acc_is_present(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @tab @code{logical acc_is_present}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.30.
@end table



@node acc_memcpy_to_device
@section @code{acc_memcpy_to_device} -- Copy host memory to device memory.
@table @asis
@item @emph{Description}
This function copies host memory specified by the host address @var{src}
to device memory specified by the device address @var{dest} for a length
of @var{bytes} bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_memcpy_to_device(d_void *dest, h_void *src, size_t bytes);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.31.
@end table



@node acc_memcpy_from_device
@section @code{acc_memcpy_from_device} -- Copy device memory to host memory.
@table @asis
@item @emph{Description}
This function copies device memory specified by the device address
@var{src} to host memory specified by the host address @var{dest} for a
length of @var{bytes} bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_memcpy_from_device(h_void *dest, d_void *src, size_t bytes);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.32.
@end table



@node acc_attach
@section @code{acc_attach} -- Let device pointer point to device-pointer target.
@table @asis
@item @emph{Description}
This function updates a pointer on the device from pointing to a host-pointer
address to pointing to the corresponding device data.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_attach(h_void **ptr);}
@item @emph{Prototype}: @tab @code{acc_attach_async(h_void **ptr, int async);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.34.
@end table



@node acc_detach
@section @code{acc_detach} -- Let device pointer point to host-pointer target.
@table @asis
@item @emph{Description}
This function updates a pointer on the device from pointing to a device-pointer
address to pointing to the corresponding host data.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_detach(h_void **ptr);}
@item @emph{Prototype}: @tab @code{acc_detach_async(h_void **ptr, int async);}
@item @emph{Prototype}: @tab @code{acc_detach_finalize(h_void **ptr);}
@item @emph{Prototype}: @tab @code{acc_detach_finalize_async(h_void **ptr, int async);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.35.
@end table



@node acc_get_current_cuda_device
@section @code{acc_get_current_cuda_device} -- Get CUDA device handle.
@table @asis
@item @emph{Description}
This function returns the CUDA device handle. This handle is the same
as used by the CUDA Runtime and Driver APIs.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_get_current_cuda_device(void);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
A.2.1.1.
@end table



@node acc_get_current_cuda_context
@section @code{acc_get_current_cuda_context} -- Get CUDA context handle.
@table @asis
@item @emph{Description}
This function returns the CUDA context handle. This handle is the same
as used by the CUDA Runtime and Driver APIs.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_get_current_cuda_context(void);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
A.2.1.2.
@end table



@node acc_get_cuda_stream
@section @code{acc_get_cuda_stream} -- Get CUDA stream handle.
@table @asis
@item @emph{Description}
This function returns the CUDA stream handle for the queue @var{async}.
This handle is the same as used by the CUDA Runtime and Driver APIs.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_get_cuda_stream(int async);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
A.2.1.3.
@end table



@node acc_set_cuda_stream
@section @code{acc_set_cuda_stream} -- Set CUDA stream handle.
@table @asis
@item @emph{Description}
This function associates the stream handle specified by @var{stream} with
the queue @var{async}.

This cannot be used to change the stream handle associated with
@code{acc_async_sync}.

The return value is not specified.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_set_cuda_stream(int async, void *stream);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
A.2.1.4.
@end table



@node acc_prof_register
@section @code{acc_prof_register} -- Register callbacks.
@table @asis
@item @emph{Description}:
This function registers callbacks.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void acc_prof_register (acc_event_t, acc_prof_callback, acc_register_t);}
@end multitable

@item @emph{See also}:
@ref{OpenACC Profiling Interface}

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
5.3.
@end table



@node acc_prof_unregister
@section @code{acc_prof_unregister} -- Unregister callbacks.
@table @asis
@item @emph{Description}:
This function unregisters callbacks.
3571 3572@item @emph{C/C++}: 3573@multitable @columnfractions .20 .80 3574@item @emph{Prototype}: @tab @code{void acc_prof_unregister (acc_event_t, acc_prof_callback, acc_register_t);} 3575@end multitable 3576 3577@item @emph{See also}: 3578@ref{OpenACC Profiling Interface} 3579 3580@item @emph{Reference}: 3581@uref{https://www.openacc.org, OpenACC specification v2.6}, section 35825.3. 3583@end table 3584 3585 3586 3587@node acc_prof_lookup 3588@section @code{acc_prof_lookup} -- Obtain inquiry functions. 3589@table @asis 3590@item @emph{Description}: 3591Function to obtain inquiry functions. 3592 3593@item @emph{C/C++}: 3594@multitable @columnfractions .20 .80 3595@item @emph{Prototype}: @tab @code{acc_query_fn acc_prof_lookup (const char *);} 3596@end multitable 3597 3598@item @emph{See also}: 3599@ref{OpenACC Profiling Interface} 3600 3601@item @emph{Reference}: 3602@uref{https://www.openacc.org, OpenACC specification v2.6}, section 36035.3. 3604@end table 3605 3606 3607 3608@node acc_register_library 3609@section @code{acc_register_library} -- Library registration. 3610@table @asis 3611@item @emph{Description}: 3612Function for library registration. 3613 3614@item @emph{C/C++}: 3615@multitable @columnfractions .20 .80 3616@item @emph{Prototype}: @tab @code{void acc_register_library (acc_prof_reg, acc_prof_reg, acc_prof_lookup_func);} 3617@end multitable 3618 3619@item @emph{See also}: 3620@ref{OpenACC Profiling Interface}, @ref{ACC_PROFLIB} 3621 3622@item @emph{Reference}: 3623@uref{https://www.openacc.org, OpenACC specification v2.6}, section 36245.3. 
@end table



@c ---------------------------------------------------------------------
@c OpenACC Environment Variables
@c ---------------------------------------------------------------------

@node OpenACC Environment Variables
@chapter OpenACC Environment Variables

The variables @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM}
are defined by section 4 of the OpenACC specification in version 2.0.
The variable @env{ACC_PROFLIB}
is defined by section 4 of the OpenACC specification in version 2.6.
The variable @env{GCC_ACC_NOTIFY} is used for diagnostic purposes.

@menu
* ACC_DEVICE_TYPE::
* ACC_DEVICE_NUM::
* ACC_PROFLIB::
* GCC_ACC_NOTIFY::
@end menu



@node ACC_DEVICE_TYPE
@section @code{ACC_DEVICE_TYPE}
@table @asis
@item @emph{Description}:
Specifies the default device type to use when executing accelerator
regions, if the program has not selected one with
@code{acc_set_device_type}.

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
4.1.
@end table



@node ACC_DEVICE_NUM
@section @code{ACC_DEVICE_NUM}
@table @asis
@item @emph{Description}:
Specifies the default device number to use, if the program has not
selected one with @code{acc_set_device_num}.

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
4.2.
@end table



@node ACC_PROFLIB
@section @code{ACC_PROFLIB}
@table @asis
@item @emph{Description}:
Specifies a profiling library to load and register via its
@code{acc_register_library} routine.

@item @emph{See also}:
@ref{acc_register_library}, @ref{OpenACC Profiling Interface}

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
4.3.
@end table



@node GCC_ACC_NOTIFY
@section @code{GCC_ACC_NOTIFY}
@table @asis
@item @emph{Description}:
Print debug information pertaining to the accelerator.
@end table



@c ---------------------------------------------------------------------
@c CUDA Streams Usage
@c ---------------------------------------------------------------------

@node CUDA Streams Usage
@chapter CUDA Streams Usage

This applies to the @code{nvptx} plugin only.

The library provides elements that perform asynchronous movement of
data and asynchronous operation of computing constructs. This
asynchronous functionality is implemented by making use of CUDA
streams@footnote{See "Stream Management" in "CUDA Driver API",
TRM-06703-001, Version 5.5, for additional information}.

The primary means by which the asynchronous functionality is accessed
is through the OpenACC directives that make use of the @code{async}
and @code{wait} clauses. When the @code{async} clause is
first used with a directive, it creates a CUDA stream. If an
@code{async-argument} is used with the @code{async} clause, then the
stream is associated with the specified @code{async-argument}.

Following the creation of an association between a CUDA stream and the
@code{async-argument} of an @code{async} clause, both the @code{wait}
clause and the @code{wait} directive can be used. When either the
clause or directive is used after stream creation, it creates a
rendezvous point whereby execution waits until all operations
associated with the @code{async-argument}, that is, stream, have
completed.

Normally, the management of the streams that are created as a result of
using the @code{async} clause is done without any intervention by the
caller. This implies that the association between the @code{async-argument}
and the CUDA stream will be maintained for the lifetime of the program.
However, this association can be changed through the use of the library
function @code{acc_set_cuda_stream}.
When the function 3729@code{acc_set_cuda_stream} is called, the CUDA stream that was 3730originally associated with the @code{async} clause will be destroyed. 3731Caution should be taken when changing the association as subsequent 3732references to the @code{async-argument} refer to a different 3733CUDA stream. 3734 3735 3736 3737@c --------------------------------------------------------------------- 3738@c OpenACC Library Interoperability 3739@c --------------------------------------------------------------------- 3740 3741@node OpenACC Library Interoperability 3742@chapter OpenACC Library Interoperability 3743 3744@section Introduction 3745 3746The OpenACC library uses the CUDA Driver API, and may interact with 3747programs that use the Runtime library directly, or another library 3748based on the Runtime library, e.g., CUBLAS@footnote{See section 2.26, 3749"Interactions with the CUDA Driver API" in 3750"CUDA Runtime API", Version 5.5, and section 2.27, "VDPAU 3751Interoperability", in "CUDA Driver API", TRM-06703-001, Version 5.5, 3752for additional information on library interoperability.}. 3753This chapter describes the use cases and what changes are 3754required in order to use both the OpenACC library and the CUBLAS and Runtime 3755libraries within a program. 3756 3757@section First invocation: NVIDIA CUBLAS library API 3758 3759In this first use case (see below), a function in the CUBLAS library is called 3760prior to any of the functions in the OpenACC library. More specifically, the 3761function @code{cublasCreate()}. 3762 3763When invoked, the function initializes the library and allocates the 3764hardware resources on the host and the device on behalf of the caller. Once 3765the initialization and allocation has completed, a handle is returned to the 3766caller. The OpenACC library also requires initialization and allocation of 3767hardware resources. 
Since the CUBLAS library has already allocated the
hardware resources for the device, all that is left to do is to initialize
the OpenACC library and acquire the hardware resources on the host.

Prior to calling the OpenACC function that initializes the library and
allocates the host hardware resources, you need to acquire the device number
that was allocated during the call to @code{cublasCreate()}. This is
accomplished by invoking the runtime library function
@code{cudaGetDevice()}. Once acquired, the device number is passed along
with the device type as parameters to the OpenACC library function
@code{acc_set_device_num()}.

Once the call to @code{acc_set_device_num()} has completed, the OpenACC
library uses the context that was created during the call to
@code{cublasCreate()}. In other words, both libraries will be sharing the
same context.

@smallexample
    /* Create the handle */
    s = cublasCreate(&h);
    if (s != CUBLAS_STATUS_SUCCESS)
    @{
        fprintf(stderr, "cublasCreate failed %d\n", s);
        exit(EXIT_FAILURE);
    @}

    /* Get the device number */
    e = cudaGetDevice(&dev);
    if (e != cudaSuccess)
    @{
        fprintf(stderr, "cudaGetDevice failed %d\n", e);
        exit(EXIT_FAILURE);
    @}

    /* Initialize OpenACC library and use device 'dev' */
    acc_set_device_num(dev, acc_device_nvidia);

@end smallexample
@center Use Case 1

@section First invocation: OpenACC library API

In this second use case (see below), a function in the OpenACC library is
called prior to any of the functions in the CUBLAS library. More
specifically, the function @code{acc_set_device_num()}.

In the use case presented here, the function @code{acc_set_device_num()}
is used to both initialize the OpenACC library and allocate the hardware
resources on the host and the device.
In the call to the function, the
call parameters specify which device to use and what device
type to use, i.e., @code{acc_device_nvidia}. It should be noted that this
is but one method to initialize the OpenACC library and allocate the
appropriate hardware resources. Other methods are available through the
use of environment variables; these are discussed in the next section.

Once the call to @code{acc_set_device_num()} has completed, other OpenACC
functions can be called, as seen with the multiple calls made to
@code{acc_copyin()}. In addition, calls can be made to functions in the
CUBLAS library. In the use case, a call to @code{cublasCreate()} is made
subsequent to the calls to @code{acc_copyin()}.
As seen in the previous use case, a call to @code{cublasCreate()}
initializes the CUBLAS library and allocates the hardware resources on the
host and the device. However, since the device has already been allocated,
@code{cublasCreate()} will only initialize the CUBLAS library and allocate
the appropriate hardware resources on the host. The context that was created
as part of the OpenACC initialization is shared with the CUBLAS library,
similarly to the first use case.

@smallexample
    dev = 0;

    acc_set_device_num(dev, acc_device_nvidia);

    /* Copy the first set to the device */
    d_X = acc_copyin(&h_X[0], N * sizeof (float));
    if (d_X == NULL)
    @{
        fprintf(stderr, "copyin error h_X\n");
        exit(EXIT_FAILURE);
    @}

    /* Copy the second set to the device */
    d_Y = acc_copyin(&h_Y1[0], N * sizeof (float));
    if (d_Y == NULL)
    @{
        fprintf(stderr, "copyin error h_Y1\n");
        exit(EXIT_FAILURE);
    @}

    /* Create the handle */
    s = cublasCreate(&h);
    if (s != CUBLAS_STATUS_SUCCESS)
    @{
        fprintf(stderr, "cublasCreate failed %d\n", s);
        exit(EXIT_FAILURE);
    @}

    /* Perform saxpy using CUBLAS library function */
    s = cublasSaxpy(h, N, &alpha, d_X, 1, d_Y, 1);
    if (s != CUBLAS_STATUS_SUCCESS)
    @{
        fprintf(stderr, "cublasSaxpy failed %d\n", s);
        exit(EXIT_FAILURE);
    @}

    /* Copy the results from the device */
    acc_memcpy_from_device(&h_Y1[0], d_Y, N * sizeof (float));

@end smallexample
@center Use Case 2

@section OpenACC library and environment variables

There are two environment variables associated with the OpenACC library
that may be used to control the device type and device number:
@env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM}, respectively. These two
environment variables can be used as an alternative to calling
@code{acc_set_device_num()}. As seen in the second use case, the device
type and device number were specified using @code{acc_set_device_num()}.
If, however, the aforementioned environment variables were set, then the
call to @code{acc_set_device_num()} would not be required.


The use of the environment variables is only relevant when an OpenACC function
is called prior to a call to @code{cublasCreate()}.
If @code{cublasCreate()}
is called prior to a call to an OpenACC function, then you must call
@code{acc_set_device_num()}.@footnote{More complete information
about @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM} can be found in
sections 4.1 and 4.2 of the @uref{https://www.openacc.org, OpenACC
Application Programming Interface}, Version 2.6.}



@c ---------------------------------------------------------------------
@c OpenACC Profiling Interface
@c ---------------------------------------------------------------------

@node OpenACC Profiling Interface
@chapter OpenACC Profiling Interface

@section Implementation Status and Implementation-Defined Behavior

We're implementing the OpenACC Profiling Interface as defined by the
OpenACC 2.6 specification. We're clarifying some aspects here as
@emph{implementation-defined behavior}, while they're still under
discussion within the OpenACC Technical Committee.

This implementation is tuned to keep the performance impact as low as
possible for the (very common) case that the Profiling Interface is
not enabled. This is relevant, as the Profiling Interface affects all
the @emph{hot} code paths (in the target code, not in the offloaded
code). Users of the OpenACC Profiling Interface can be expected to
understand that performance will be impacted to some degree once the
Profiling Interface has been enabled: for example, because of the
@emph{runtime} (libgomp) calling into a third-party @emph{library} for
every event that has been registered.

We're not yet accounting for the fact that @cite{OpenACC events may
occur during event processing}.
We just handle one case specially, as required by CUDA 9.0
@command{nvprof}: @code{acc_get_device_type}
(@ref{acc_get_device_type}) may be called from
@code{acc_ev_device_init_start} and @code{acc_ev_device_init_end}
callbacks.
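The register/unregister/dispatch machinery described in this chapter can
be pictured as a small per-event-type table of callbacks that the runtime
walks whenever an event occurs, which is why the common
``profiling not enabled'' case stays cheap on the hot paths. The
following stand-alone C sketch models only that scheme; the names
(@code{prof_register}, @code{prof_dispatch}, and so on) and the data
layout are invented for illustration and do not correspond to libgomp
internals, and real callbacks receive @code{acc_prof_info},
@code{acc_event_info}, and @code{acc_api_info} arguments.

```c
#include <assert.h>
#include <stddef.h>

/* Schematic model of per-event callback registration and dispatch.
   NOT libgomp's internal API; names and layout are illustrative.  */

typedef void (*prof_cb) (int event_kind, void *data);

#define EV_KINDS 4   /* stand-ins for acc_event_t values */
#define MAX_CBS  8

static prof_cb callbacks[EV_KINDS][MAX_CBS];

/* Add CB to the list for EVENT_KIND (role of acc_prof_register).  */
static void
prof_register (int event_kind, prof_cb cb)
{
  for (int i = 0; i < MAX_CBS; i++)
    if (callbacks[event_kind][i] == NULL)
      {
        callbacks[event_kind][i] = cb;
        return;
      }
}

/* Remove CB again (role of acc_prof_unregister).  */
static void
prof_unregister (int event_kind, prof_cb cb)
{
  for (int i = 0; i < MAX_CBS; i++)
    if (callbacks[event_kind][i] == cb)
      callbacks[event_kind][i] = NULL;
}

/* Runtime side: when EVENT_KIND occurs, walk the (usually empty)
   list; with no callbacks registered this is a cheap scan.  */
static void
prof_dispatch (int event_kind, void *data)
{
  for (int i = 0; i < MAX_CBS; i++)
    if (callbacks[event_kind][i] != NULL)
      callbacks[event_kind][i] (event_kind, data);
}

static int hits;

static void
count_cb (int event_kind, void *data)
{
  (void) event_kind;
  (void) data;
  hits++;
}

/* Scripted sequence: two events while registered, one after
   unregistering; returns the number of callback invocations.  */
int
prof_demo (void)
{
  hits = 0;
  prof_register (1, count_cb);
  prof_dispatch (1, NULL);   /* observed */
  prof_dispatch (1, NULL);   /* observed */
  prof_unregister (1, count_cb);
  prof_dispatch (1, NULL);   /* no longer observed */
  return hits;
}
```

Under this model, events raised after @code{prof_unregister} are simply
no longer seen by the callback, mirroring the intended behavior of
@code{acc_prof_unregister}.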

We're not yet implementing initialization via an
@code{acc_register_library} function that is either statically linked
in or dynamically loaded via @env{LD_PRELOAD}.
Initialization via @code{acc_register_library} functions dynamically
loaded via the @env{ACC_PROFLIB} environment variable does work, as
does directly calling @code{acc_prof_register},
@code{acc_prof_unregister}, and @code{acc_prof_lookup}.

As currently there are no inquiry functions defined, calls to
@code{acc_prof_lookup} will always return @code{NULL}.

There aren't separate @emph{start} and @emph{stop} events defined for the
event types @code{acc_ev_create}, @code{acc_ev_delete},
@code{acc_ev_alloc}, and @code{acc_ev_free}. It's not clear if these
should be triggered before or after the actual device-specific call is
made. We trigger them after.

Remarks about data provided to callbacks:

@table @asis

@item @code{acc_prof_info.event_type}
It's not clear if for @emph{nested} event callbacks (for example,
@code{acc_ev_enqueue_launch_start} as part of a parent compute
construct), this should be set for the nested event
(@code{acc_ev_enqueue_launch_start}), or if the value of the parent
construct should remain (@code{acc_ev_compute_construct_start}). In
this implementation, the value will generally correspond to the
innermost nested event type.

@item @code{acc_prof_info.device_type}
@itemize

@item
For @code{acc_ev_compute_construct_start}, and in presence of an
@code{if} clause with @emph{false} argument, this will still refer to
the offloading device type.
It's not clear if that's the expected behavior.

@item
Complementary to the item before, for
@code{acc_ev_compute_construct_end}, this is set to
@code{acc_device_host} in presence of an @code{if} clause with
@emph{false} argument.
It's not clear if that's the expected behavior.
3976 3977@end itemize 3978 3979@item @code{acc_prof_info.thread_id} 3980Always @code{-1}; not yet implemented. 3981 3982@item @code{acc_prof_info.async} 3983@itemize 3984 3985@item 3986Not yet implemented correctly for 3987@code{acc_ev_compute_construct_start}. 3988 3989@item 3990In a compute construct, for host-fallback 3991execution/@code{acc_device_host} it will always be 3992@code{acc_async_sync}. 3993It's not clear if that's the expected behavior. 3994 3995@item 3996For @code{acc_ev_device_init_start} and @code{acc_ev_device_init_end}, 3997it will always be @code{acc_async_sync}. 3998It's not clear if that's the expected behavior. 3999 4000@end itemize 4001 4002@item @code{acc_prof_info.async_queue} 4003There is no @cite{limited number of asynchronous queues} in libgomp. 4004This will always have the same value as @code{acc_prof_info.async}. 4005 4006@item @code{acc_prof_info.src_file} 4007Always @code{NULL}; not yet implemented. 4008 4009@item @code{acc_prof_info.func_name} 4010Always @code{NULL}; not yet implemented. 4011 4012@item @code{acc_prof_info.line_no} 4013Always @code{-1}; not yet implemented. 4014 4015@item @code{acc_prof_info.end_line_no} 4016Always @code{-1}; not yet implemented. 4017 4018@item @code{acc_prof_info.func_line_no} 4019Always @code{-1}; not yet implemented. 4020 4021@item @code{acc_prof_info.func_end_line_no} 4022Always @code{-1}; not yet implemented. 4023 4024@item @code{acc_event_info.event_type}, @code{acc_event_info.*.event_type} 4025Relating to @code{acc_prof_info.event_type} discussed above, in this 4026implementation, this will always be the same value as 4027@code{acc_prof_info.event_type}. 4028 4029@item @code{acc_event_info.*.parent_construct} 4030@itemize 4031 4032@item 4033Will be @code{acc_construct_parallel} for all OpenACC compute 4034constructs as well as many OpenACC Runtime API calls; should be the 4035one matching the actual construct, or 4036@code{acc_construct_runtime_api}, respectively. 
4037 4038@item 4039Will be @code{acc_construct_enter_data} or 4040@code{acc_construct_exit_data} when processing variable mappings 4041specified in OpenACC @emph{declare} directives; should be 4042@code{acc_construct_declare}. 4043 4044@item 4045For implicit @code{acc_ev_device_init_start}, 4046@code{acc_ev_device_init_end}, and explicit as well as implicit 4047@code{acc_ev_alloc}, @code{acc_ev_free}, 4048@code{acc_ev_enqueue_upload_start}, @code{acc_ev_enqueue_upload_end}, 4049@code{acc_ev_enqueue_download_start}, and 4050@code{acc_ev_enqueue_download_end}, will be 4051@code{acc_construct_parallel}; should reflect the real parent 4052construct. 4053 4054@end itemize 4055 4056@item @code{acc_event_info.*.implicit} 4057For @code{acc_ev_alloc}, @code{acc_ev_free}, 4058@code{acc_ev_enqueue_upload_start}, @code{acc_ev_enqueue_upload_end}, 4059@code{acc_ev_enqueue_download_start}, and 4060@code{acc_ev_enqueue_download_end}, this currently will be @code{1} 4061also for explicit usage. 4062 4063@item @code{acc_event_info.data_event.var_name} 4064Always @code{NULL}; not yet implemented. 4065 4066@item @code{acc_event_info.data_event.host_ptr} 4067For @code{acc_ev_alloc}, and @code{acc_ev_free}, this is always 4068@code{NULL}. 4069 4070@item @code{typedef union acc_api_info} 4071@dots{} as printed in @cite{5.2.3. Third Argument: API-Specific 4072Information}. This should obviously be @code{typedef @emph{struct} 4073acc_api_info}. 4074 4075@item @code{acc_api_info.device_api} 4076Possibly not yet implemented correctly for 4077@code{acc_ev_compute_construct_start}, 4078@code{acc_ev_device_init_start}, @code{acc_ev_device_init_end}: 4079will always be @code{acc_device_api_none} for these event types. 4080For @code{acc_ev_enter_data_start}, it will be 4081@code{acc_device_api_none} in some cases. 4082 4083@item @code{acc_api_info.device_type} 4084Always the same as @code{acc_prof_info.device_type}. 

@item @code{acc_api_info.vendor}
Always @code{-1}; not yet implemented.

@item @code{acc_api_info.device_handle}
Always @code{NULL}; not yet implemented.

@item @code{acc_api_info.context_handle}
Always @code{NULL}; not yet implemented.

@item @code{acc_api_info.async_handle}
Always @code{NULL}; not yet implemented.

@end table

Remarks about certain event types:

@table @asis

@item @code{acc_ev_device_init_start}, @code{acc_ev_device_init_end}
@itemize

@item
@c See 'DEVICE_INIT_INSIDE_COMPUTE_CONSTRUCT' in
@c 'libgomp.oacc-c-c++-common/acc_prof-kernels-1.c',
@c 'libgomp.oacc-c-c++-common/acc_prof-parallel-1.c'.
When a compute construct triggers implicit
@code{acc_ev_device_init_start} and @code{acc_ev_device_init_end}
events, they currently aren't @emph{nested within} the corresponding
@code{acc_ev_compute_construct_start} and
@code{acc_ev_compute_construct_end}, but are instead observed
@emph{before} @code{acc_ev_compute_construct_start}.
It's not clear what to do here: the standard asks us to provide a lot
of details to the @code{acc_ev_compute_construct_start} callback, but
how can we do that without (implicitly) initializing a device first?

@item
Callbacks for these event types will not be invoked for calls to the
@code{acc_set_device_type} and @code{acc_set_device_num} functions.
It's not clear if they should be.

@end itemize

@item @code{acc_ev_enter_data_start}, @code{acc_ev_enter_data_end}, @code{acc_ev_exit_data_start}, @code{acc_ev_exit_data_end}
@itemize

@item
Callbacks for these event types will also be invoked for OpenACC
@emph{host_data} constructs.
It's not clear if they should be.

@item
Callbacks for these event types will also be invoked when processing
variable mappings specified in OpenACC @emph{declare} directives.
4139It's not clear if they should be. 4140 4141@end itemize 4142 4143@end table 4144 4145Callbacks for the following event types will be invoked, but dispatch 4146and information provided therein has not yet been thoroughly reviewed: 4147 4148@itemize 4149@item @code{acc_ev_alloc} 4150@item @code{acc_ev_free} 4151@item @code{acc_ev_update_start}, @code{acc_ev_update_end} 4152@item @code{acc_ev_enqueue_upload_start}, @code{acc_ev_enqueue_upload_end} 4153@item @code{acc_ev_enqueue_download_start}, @code{acc_ev_enqueue_download_end} 4154@end itemize 4155 4156During device initialization, and finalization, respectively, 4157callbacks for the following event types will not yet be invoked: 4158 4159@itemize 4160@item @code{acc_ev_alloc} 4161@item @code{acc_ev_free} 4162@end itemize 4163 4164Callbacks for the following event types have not yet been implemented, 4165so currently won't be invoked: 4166 4167@itemize 4168@item @code{acc_ev_device_shutdown_start}, @code{acc_ev_device_shutdown_end} 4169@item @code{acc_ev_runtime_shutdown} 4170@item @code{acc_ev_create}, @code{acc_ev_delete} 4171@item @code{acc_ev_wait_start}, @code{acc_ev_wait_end} 4172@end itemize 4173 4174For the following runtime library functions, not all expected 4175callbacks will be invoked (mostly concerning implicit device 4176initialization): 4177 4178@itemize 4179@item @code{acc_get_num_devices} 4180@item @code{acc_set_device_type} 4181@item @code{acc_get_device_type} 4182@item @code{acc_set_device_num} 4183@item @code{acc_get_device_num} 4184@item @code{acc_init} 4185@item @code{acc_shutdown} 4186@end itemize 4187 4188Aside from implicit device initialization, for the following runtime 4189library functions, no callbacks will be invoked for shared-memory 4190offloading devices (it's not clear if they should be): 4191 4192@itemize 4193@item @code{acc_malloc} 4194@item @code{acc_free} 4195@item @code{acc_copyin}, @code{acc_present_or_copyin}, @code{acc_copyin_async} 4196@item @code{acc_create}, 
@code{acc_present_or_create}, @code{acc_create_async} 4197@item @code{acc_copyout}, @code{acc_copyout_async}, @code{acc_copyout_finalize}, @code{acc_copyout_finalize_async} 4198@item @code{acc_delete}, @code{acc_delete_async}, @code{acc_delete_finalize}, @code{acc_delete_finalize_async} 4199@item @code{acc_update_device}, @code{acc_update_device_async} 4200@item @code{acc_update_self}, @code{acc_update_self_async} 4201@item @code{acc_map_data}, @code{acc_unmap_data} 4202@item @code{acc_memcpy_to_device}, @code{acc_memcpy_to_device_async} 4203@item @code{acc_memcpy_from_device}, @code{acc_memcpy_from_device_async} 4204@end itemize 4205 4206 4207 4208@c --------------------------------------------------------------------- 4209@c The libgomp ABI 4210@c --------------------------------------------------------------------- 4211 4212@node The libgomp ABI 4213@chapter The libgomp ABI 4214 4215The following sections present notes on the external ABI as 4216presented by libgomp. Only maintainers should need them. 
4217 4218@menu 4219* Implementing MASTER construct:: 4220* Implementing CRITICAL construct:: 4221* Implementing ATOMIC construct:: 4222* Implementing FLUSH construct:: 4223* Implementing BARRIER construct:: 4224* Implementing THREADPRIVATE construct:: 4225* Implementing PRIVATE clause:: 4226* Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses:: 4227* Implementing REDUCTION clause:: 4228* Implementing PARALLEL construct:: 4229* Implementing FOR construct:: 4230* Implementing ORDERED construct:: 4231* Implementing SECTIONS construct:: 4232* Implementing SINGLE construct:: 4233* Implementing OpenACC's PARALLEL construct:: 4234@end menu 4235 4236 4237@node Implementing MASTER construct 4238@section Implementing MASTER construct 4239 4240@smallexample 4241if (omp_get_thread_num () == 0) 4242 block 4243@end smallexample 4244 4245Alternately, we generate two copies of the parallel subfunction 4246and only include this in the version run by the primary thread. 4247Surely this is not worthwhile though... 4248 4249 4250 4251@node Implementing CRITICAL construct 4252@section Implementing CRITICAL construct 4253 4254Without a specified name, 4255 4256@smallexample 4257 void GOMP_critical_start (void); 4258 void GOMP_critical_end (void); 4259@end smallexample 4260 4261so that we don't get COPY relocations from libgomp to the main 4262application. 4263 4264With a specified name, use omp_set_lock and omp_unset_lock with 4265name being transformed into a variable declared like 4266 4267@smallexample 4268 omp_lock_t gomp_critical_user_<name> __attribute__((common)) 4269@end smallexample 4270 4271Ideally the ABI would specify that all zero is a valid unlocked 4272state, and so we wouldn't need to initialize this at 4273startup. 4274 4275 4276 4277@node Implementing ATOMIC construct 4278@section Implementing ATOMIC construct 4279 4280The target should implement the @code{__sync} builtins. 
Failing that, we could add

@smallexample
  void GOMP_atomic_enter (void)
  void GOMP_atomic_exit (void)
@end smallexample

which reuses the regular lock code, but with yet another lock
object private to the library.



@node Implementing FLUSH construct
@section Implementing FLUSH construct

Expands to the @code{__sync_synchronize} builtin.



@node Implementing BARRIER construct
@section Implementing BARRIER construct

@smallexample
  void GOMP_barrier (void)
@end smallexample


@node Implementing THREADPRIVATE construct
@section Implementing THREADPRIVATE construct

In @emph{most} cases we can map this directly to @code{__thread},
except that OpenMP allows constructors for C++ objects.  We can
either refuse to support this (how often is it used?) or implement
something akin to @code{.ctors}.

Ideally, this constructor feature would be handled by extensions
to the main pthreads library.  Failing that, we can provide a set
of entry points to register constructor functions to be called.



@node Implementing PRIVATE clause
@section Implementing PRIVATE clause

In association with a PARALLEL, or within the lexical extent
of a PARALLEL block, the variable becomes a local variable in
the parallel subfunction.

In association with FOR or SECTIONS blocks, create a new
automatic variable within the current function.  This preserves
the semantics of new variable creation.



@node Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses
@section Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses

This seems simple enough for PARALLEL blocks.  Create a private
struct for communicating between the parent and subfunction.
In the parent, copy in values for scalar and "small" structs;
copy in addresses for other TREE_ADDRESSABLE types.  In the
subfunction, copy the value into the local variable.

It is not clear what to do with bare FOR or SECTION blocks.
The only thing I can figure is that we do something like:

@smallexample
#pragma omp for firstprivate(x) lastprivate(y)
for (int i = 0; i < n; ++i)
  body;
@end smallexample

which becomes

@smallexample
@{
  int x = x, y;

  // for stuff

  if (i == n)
    y = y;
@}
@end smallexample

where the "x=x" and "y=y" assignments actually have different
uids for the two variables, i.e. not something you could write
directly in C.  Presumably this only makes sense if the "outer"
x and y are global variables.

COPYPRIVATE would work the same way, except the structure
broadcast would have to happen via SINGLE machinery instead.



@node Implementing REDUCTION clause
@section Implementing REDUCTION clause

The private struct mentioned in the previous section should have
a pointer to an array of the type of the variable, indexed by the
thread's @var{team_id}.  The thread stores its final value into the
array, and after the barrier, the primary thread iterates over the
array to collect the values.
@node Implementing PARALLEL construct
@section Implementing PARALLEL construct

@smallexample
  #pragma omp parallel
  @{
    body;
  @}
@end smallexample

becomes

@smallexample
  void subfunction (void *data)
  @{
    use data;
    body;
  @}

  setup data;
  GOMP_parallel_start (subfunction, &data, num_threads);
  subfunction (&data);
  GOMP_parallel_end ();
@end smallexample

@smallexample
  void GOMP_parallel_start (void (*fn)(void *), void *data, unsigned num_threads)
@end smallexample

The @var{FN} argument is the subfunction to be run in parallel.

The @var{DATA} argument is a pointer to a structure used to
communicate data in and out of the subfunction, as discussed
above with respect to FIRSTPRIVATE et al.

The @var{NUM_THREADS} argument is 1 if an IF clause is present
and false, or the value of the NUM_THREADS clause, if
present, or 0.

The function needs to create the appropriate number of
threads and/or launch them from the dock.  It needs to
create the team structure and assign team ids.

@smallexample
  void GOMP_parallel_end (void)
@end smallexample

Tears down the team and returns us to the previous @code{omp_in_parallel()} state.
@node Implementing FOR construct
@section Implementing FOR construct

@smallexample
  #pragma omp parallel for
  for (i = lb; i <= ub; i++)
    body;
@end smallexample

becomes

@smallexample
  void subfunction (void *data)
  @{
    long _s0, _e0;
    while (GOMP_loop_static_next (&_s0, &_e0))
    @{
      long _e1 = _e0, i;
      for (i = _s0; i < _e1; i++)
        body;
    @}
    GOMP_loop_end_nowait ();
  @}

  GOMP_parallel_loop_static (subfunction, NULL, 0, lb, ub+1, 1, 0);
  subfunction (NULL);
  GOMP_parallel_end ();
@end smallexample

@smallexample
  #pragma omp for schedule(runtime)
  for (i = 0; i < n; i++)
    body;
@end smallexample

becomes

@smallexample
  @{
    long i, _s0, _e0;
    if (GOMP_loop_runtime_start (0, n, 1, &_s0, &_e0))
      do @{
        long _e1 = _e0;
        for (i = _s0; i < _e1; i++)
          body;
      @} while (GOMP_loop_runtime_next (&_s0, &_e0));
    GOMP_loop_end ();
  @}
@end smallexample

Note that while it looks like there is trickiness to propagating
a non-constant STEP, there isn't really.  We're explicitly allowed
to evaluate it as many times as we want, and any variables involved
should automatically be handled as PRIVATE or SHARED like any other
variables.  So the expression should remain evaluable in the
subfunction.  We can also pull it into a local variable if we like,
but since it's supposed to remain unchanged, we don't have to.

If we have SCHEDULE(STATIC), and no ORDERED, then we ought to be
able to get away with no work-sharing context at all, since we can
simply perform the arithmetic directly in each thread to divide up
the iterations.  That would mean we wouldn't need to call any of
these routines.

There are separate routines for handling loops with an ORDERED
clause.  Bookkeeping for that is non-trivial...
@node Implementing ORDERED construct
@section Implementing ORDERED construct

@smallexample
  void GOMP_ordered_start (void)
  void GOMP_ordered_end (void)
@end smallexample



@node Implementing SECTIONS construct
@section Implementing SECTIONS construct

A block such as

@smallexample
  #pragma omp sections
  @{
    #pragma omp section
    stmt1;
    #pragma omp section
    stmt2;
    #pragma omp section
    stmt3;
  @}
@end smallexample

becomes

@smallexample
  for (i = GOMP_sections_start (3); i != 0; i = GOMP_sections_next ())
    switch (i)
      @{
      case 1:
        stmt1;
        break;
      case 2:
        stmt2;
        break;
      case 3:
        stmt3;
        break;
      @}
  GOMP_barrier ();
@end smallexample


@node Implementing SINGLE construct
@section Implementing SINGLE construct

A block like

@smallexample
  #pragma omp single
  @{
    body;
  @}
@end smallexample

becomes

@smallexample
  if (GOMP_single_start ())
    body;
  GOMP_barrier ();
@end smallexample

while

@smallexample
  #pragma omp single copyprivate(x)
    body;
@end smallexample

becomes

@smallexample
  datap = GOMP_single_copy_start ();
  if (datap == NULL)
    @{
      body;
      data.x = x;
      GOMP_single_copy_end (&data);
    @}
  else
    x = datap->x;
  GOMP_barrier ();
@end smallexample



@node Implementing OpenACC's PARALLEL construct
@section Implementing OpenACC's PARALLEL construct

@smallexample
  void GOACC_parallel ()
@end smallexample



@c ---------------------------------------------------------------------
@c Reporting Bugs
@c ---------------------------------------------------------------------

@node Reporting Bugs
@chapter Reporting Bugs

Bugs in
the GNU Offloading and Multi Processing Runtime Library should
be reported via @uref{https://gcc.gnu.org/bugzilla/, Bugzilla}.  Please add
"openacc", or "openmp", or both to the keywords field in the bug
report, as appropriate.



@c ---------------------------------------------------------------------
@c GNU General Public License
@c ---------------------------------------------------------------------

@include gpl_v3.texi



@c ---------------------------------------------------------------------
@c GNU Free Documentation License
@c ---------------------------------------------------------------------

@include fdl.texi



@c ---------------------------------------------------------------------
@c Funding Free Software
@c ---------------------------------------------------------------------

@include funding.texi

@c ---------------------------------------------------------------------
@c Index
@c ---------------------------------------------------------------------

@node Library Index
@unnumbered Library Index

@printindex cp

@bye