\input texinfo @c -*-texinfo-*-

@c %**start of header
@setfilename libgomp.info
@settitle GNU libgomp
@c %**end of header


@copying
Copyright @copyright{} 2006-2019 Free Software Foundation, Inc.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with the
Invariant Sections being ``Funding Free Software'', the Front-Cover
texts being (a) (see below), and with the Back-Cover Texts being (b)
(see below).  A copy of the license is included in the section entitled
``GNU Free Documentation License''.

(a) The FSF's Front-Cover Text is:

     A GNU Manual

(b) The FSF's Back-Cover Text is:

     You have freedom to copy and modify this GNU Manual, like GNU
     software.  Copies published by the Free Software Foundation raise
     funds for GNU development.
@end copying

@ifinfo
@dircategory GNU Libraries
@direntry
* libgomp: (libgomp).          GNU Offloading and Multi Processing Runtime Library.
@end direntry

This manual documents libgomp, the GNU Offloading and Multi Processing
Runtime library.  This is the GNU implementation of the OpenMP and
OpenACC APIs for parallel and accelerator programming in C/C++ and
Fortran.

Published by the Free Software Foundation
51 Franklin Street, Fifth Floor
Boston, MA 02110-1301 USA

@insertcopying
@end ifinfo


@setchapternewpage odd

@titlepage
@title GNU Offloading and Multi Processing Runtime Library
@subtitle The GNU OpenMP and OpenACC Implementation
@page
@vskip 0pt plus 1filll
@comment For the @value{version-GCC} Version*
@sp 1
Published by the Free Software Foundation @*
51 Franklin Street, Fifth Floor@*
Boston, MA 02110-1301, USA@*
@sp 1
@insertcopying
@end titlepage

@summarycontents
@contents
@page


@node Top
@top Introduction
@cindex Introduction

This manual documents the usage of libgomp, the GNU Offloading and
Multi Processing Runtime Library.  This includes the GNU
implementation of the @uref{https://www.openmp.org, OpenMP} Application
Programming Interface (API) for multi-platform shared-memory parallel
programming in C/C++ and Fortran, and the GNU implementation of the
@uref{https://www.openacc.org, OpenACC} Application Programming
Interface (API) for offloading of code to accelerator devices in C/C++
and Fortran.

Originally, libgomp implemented the GNU OpenMP Runtime Library.  Based
on this, support for OpenACC and offloading (both OpenACC and OpenMP
4's target construct) was added later, and the library was renamed to
the GNU Offloading and Multi Processing Runtime Library.



@comment
@comment  When you add a new menu item, please keep the right hand
@comment  aligned to the same column.  Do not use tabs.  This provides
@comment  better formatting.
@comment
@menu
* Enabling OpenMP::            How to enable OpenMP for your applications.
* OpenMP Runtime Library Routines: Runtime Library Routines.
                               The OpenMP runtime application programming
                               interface.
* OpenMP Environment Variables: Environment Variables.
                               Influencing OpenMP runtime behavior with
                               environment variables.
* Enabling OpenACC::           How to enable OpenACC for your
                               applications.
* OpenACC Runtime Library Routines:: The OpenACC runtime application
                               programming interface.
* OpenACC Environment Variables:: Influencing OpenACC runtime behavior with
                               environment variables.
* CUDA Streams Usage::         Notes on the implementation of
                               asynchronous operations.
* OpenACC Library Interoperability:: OpenACC library interoperability with the
                               NVIDIA CUBLAS library.
* The libgomp ABI::            Notes on the external ABI presented by libgomp.
* Reporting Bugs::             How to report bugs in the GNU Offloading and
                               Multi Processing Runtime Library.
* Copying::                    GNU General Public License says
                               how you can copy and share libgomp.
* GNU Free Documentation License::
                               How you can copy and share this manual.
* Funding::                    How to help assure continued work for free
                               software.
* Library Index::              Index of this documentation.
@end menu


@c ---------------------------------------------------------------------
@c Enabling OpenMP
@c ---------------------------------------------------------------------

@node Enabling OpenMP
@chapter Enabling OpenMP

To activate the OpenMP extensions for C/C++ and Fortran, the compile-time
flag @command{-fopenmp} must be specified.  In C/C++ this enables the
@code{#pragma omp} directive.  In Fortran it enables the @code{!$omp}
directive in free source form; the @code{c$omp}, @code{*$omp} and
@code{!$omp} directives in fixed source form; the @code{!$} conditional
compilation sentinel in free form; and the @code{c$}, @code{*$} and
@code{!$} sentinels in fixed form.  The flag also arranges for automatic
linking of the OpenMP runtime library (@ref{Runtime Library Routines}).

A complete description of all OpenMP directives accepted may be found in
the @uref{https://www.openmp.org, OpenMP Application Program Interface} manual,
version 4.5.


@c ---------------------------------------------------------------------
@c OpenMP Runtime Library Routines
@c ---------------------------------------------------------------------

@node Runtime Library Routines
@chapter OpenMP Runtime Library Routines

The runtime routines described here are defined by Section 3 of the OpenMP
specification in version 4.5.  The routines are structured in the following
three parts:

@menu
Control threads, processors and the parallel environment.  They have C
linkage, and do not throw exceptions.

* omp_get_active_level::        Number of active parallel regions
* omp_get_ancestor_thread_num:: Ancestor thread ID
* omp_get_cancellation::        Whether cancellation support is enabled
* omp_get_default_device::      Get the default device for target regions
* omp_get_dynamic::             Dynamic teams setting
* omp_get_level::               Number of parallel regions
* omp_get_max_active_levels::   Maximum number of active regions
* omp_get_max_task_priority::   Maximum task priority value that can be set
* omp_get_max_threads::         Maximum number of threads of parallel region
* omp_get_nested::              Nested parallel regions
* omp_get_num_devices::         Number of target devices
* omp_get_num_procs::           Number of processors online
* omp_get_num_teams::           Number of teams
* omp_get_num_threads::         Size of the active team
* omp_get_proc_bind::           Whether threads may be moved between CPUs
* omp_get_schedule::            Obtain the runtime scheduling method
* omp_get_team_num::            Get team number
* omp_get_team_size::           Number of threads in a team
* omp_get_thread_limit::        Maximum number of threads
* omp_get_thread_num::          Current thread ID
* omp_in_parallel::             Whether a parallel region is active
* omp_in_final::                Whether in final or included task region
* omp_is_initial_device::       Whether executing on the host device
* omp_set_default_device::      Set the default device for target regions
* omp_set_dynamic::             Enable/disable dynamic teams
* omp_set_max_active_levels::   Limits the number of active parallel regions
* omp_set_nested::              Enable/disable nested parallel regions
* omp_set_num_threads::         Set upper team size limit
* omp_set_schedule::            Set the runtime scheduling method

Initialize, set, test, unset and destroy simple and nested locks.

* omp_init_lock::               Initialize simple lock
* omp_set_lock::                Wait for and set simple lock
* omp_test_lock::               Test and set simple lock if available
* omp_unset_lock::              Unset simple lock
* omp_destroy_lock::            Destroy simple lock
* omp_init_nest_lock::          Initialize nested lock
* omp_set_nest_lock::           Wait for and set nested lock
* omp_test_nest_lock::          Test and set nested lock if available
* omp_unset_nest_lock::         Unset nested lock
* omp_destroy_nest_lock::       Destroy nested lock

Portable, thread-based, wall clock timer.

* omp_get_wtick::               Get timer precision.
* omp_get_wtime::               Elapsed wall clock time.
@end menu



@node omp_get_active_level
@section @code{omp_get_active_level} -- Number of parallel regions
@table @asis
@item @emph{Description}:
This function returns the nesting level of the active parallel blocks
that enclose the call.

@item @emph{C/C++}
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_active_level(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_active_level()}
@end multitable

@item @emph{See also}:
@ref{omp_get_level}, @ref{omp_get_max_active_levels}, @ref{omp_set_max_active_levels}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.20.
@end table



@node omp_get_ancestor_thread_num
@section @code{omp_get_ancestor_thread_num} -- Ancestor thread ID
@table @asis
@item @emph{Description}:
This function returns the thread identification number for the given
nesting level of the current thread.  For values of @var{level} outside
the range 0 to @code{omp_get_level}, -1 is returned; if @var{level} is
@code{omp_get_level}, the result is identical to @code{omp_get_thread_num}.

@item @emph{C/C++}
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_ancestor_thread_num(int level);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_ancestor_thread_num(level)}
@item @tab @code{integer level}
@end multitable

@item @emph{See also}:
@ref{omp_get_level}, @ref{omp_get_thread_num}, @ref{omp_get_team_size}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.18.
@end table



@node omp_get_cancellation
@section @code{omp_get_cancellation} -- Whether cancellation support is enabled
@table @asis
@item @emph{Description}:
This function returns @code{true} if cancellation is activated, @code{false}
otherwise.  Here, @code{true} and @code{false} represent their language-specific
counterparts.  Unless @env{OMP_CANCELLATION} is set true, cancellations are
deactivated.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_cancellation(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_get_cancellation()}
@end multitable

@item @emph{See also}:
@ref{OMP_CANCELLATION}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.9.
@end table



@node omp_get_default_device
@section @code{omp_get_default_device} -- Get the default device for target regions
@table @asis
@item @emph{Description}:
Get the default device for target regions without device clause.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_default_device(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_default_device()}
@end multitable

@item @emph{See also}:
@ref{OMP_DEFAULT_DEVICE}, @ref{omp_set_default_device}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.30.
@end table



@node omp_get_dynamic
@section @code{omp_get_dynamic} -- Dynamic teams setting
@table @asis
@item @emph{Description}:
This function returns @code{true} if enabled, @code{false} otherwise.
Here, @code{true} and @code{false} represent their language-specific
counterparts.

The dynamic team setting may be initialized at startup by the
@env{OMP_DYNAMIC} environment variable or at runtime using
@code{omp_set_dynamic}.  If undefined, dynamic adjustment is
disabled by default.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_dynamic(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_get_dynamic()}
@end multitable

@item @emph{See also}:
@ref{omp_set_dynamic}, @ref{OMP_DYNAMIC}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.8.
@end table



@node omp_get_level
@section @code{omp_get_level} -- Obtain the current nesting level
@table @asis
@item @emph{Description}:
This function returns the nesting level of the parallel blocks
that enclose the call.

@item @emph{C/C++}
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_level(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_level()}
@end multitable

@item @emph{See also}:
@ref{omp_get_active_level}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.17.
@end table



@node omp_get_max_active_levels
@section @code{omp_get_max_active_levels} -- Maximum number of active regions
@table @asis
@item @emph{Description}:
This function obtains the maximum allowed number of nested, active parallel
regions.

@item @emph{C/C++}
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_max_active_levels(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_max_active_levels()}
@end multitable

@item @emph{See also}:
@ref{omp_set_max_active_levels}, @ref{omp_get_active_level}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.16.
@end table


@node omp_get_max_task_priority
@section @code{omp_get_max_task_priority} -- Maximum priority value
that can be set for tasks.
@table @asis
@item @emph{Description}:
This function obtains the maximum allowed priority number for tasks.

@item @emph{C/C++}
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_max_task_priority(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_max_task_priority()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.29.
@end table


@node omp_get_max_threads
@section @code{omp_get_max_threads} -- Maximum number of threads of parallel region
@table @asis
@item @emph{Description}:
Return the maximum number of threads used for the current parallel region
that does not use the clause @code{num_threads}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_max_threads(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_max_threads()}
@end multitable

@item @emph{See also}:
@ref{omp_set_num_threads}, @ref{omp_set_dynamic}, @ref{omp_get_thread_limit}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.3.
@end table



@node omp_get_nested
@section @code{omp_get_nested} -- Nested parallel regions
@table @asis
@item @emph{Description}:
This function returns @code{true} if nested parallel regions are
enabled, @code{false} otherwise.  Here, @code{true} and @code{false}
represent their language-specific counterparts.

Nested parallel regions may be initialized at startup by the
@env{OMP_NESTED} environment variable or at runtime using
@code{omp_set_nested}.  If undefined, nested parallel regions are
disabled by default.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_nested(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_get_nested()}
@end multitable

@item @emph{See also}:
@ref{omp_set_nested}, @ref{OMP_NESTED}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.11.
@end table



@node omp_get_num_devices
@section @code{omp_get_num_devices} -- Number of target devices
@table @asis
@item @emph{Description}:
Returns the number of target devices.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_num_devices(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_num_devices()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.31.
@end table



@node omp_get_num_procs
@section @code{omp_get_num_procs} -- Number of processors online
@table @asis
@item @emph{Description}:
Returns the number of processors online on that device.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_num_procs(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_num_procs()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.5.
@end table



@node omp_get_num_teams
@section @code{omp_get_num_teams} -- Number of teams
@table @asis
@item @emph{Description}:
Returns the number of teams in the current team region.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_num_teams(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_num_teams()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.32.
@end table



@node omp_get_num_threads
@section @code{omp_get_num_threads} -- Size of the active team
@table @asis
@item @emph{Description}:
Returns the number of threads in the current team.  In a sequential section of
the program @code{omp_get_num_threads} returns 1.

The default team size may be initialized at startup by the
@env{OMP_NUM_THREADS} environment variable.  At runtime, the size
of the current team may be set either by the @code{NUM_THREADS}
clause or by @code{omp_set_num_threads}.  If none of the above were
used to define a specific value and @env{OMP_DYNAMIC} is disabled,
one thread per CPU online is used.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_num_threads(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_num_threads()}
@end multitable

@item @emph{See also}:
@ref{omp_get_max_threads}, @ref{omp_set_num_threads}, @ref{OMP_NUM_THREADS}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.2.
@end table



@node omp_get_proc_bind
@section @code{omp_get_proc_bind} -- Whether threads may be moved between CPUs
@table @asis
@item @emph{Description}:
This function returns the currently active thread affinity policy, which is
set via @env{OMP_PROC_BIND}.
Possible values are @code{omp_proc_bind_false},
@code{omp_proc_bind_true}, @code{omp_proc_bind_master},
@code{omp_proc_bind_close} and @code{omp_proc_bind_spread}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{omp_proc_bind_t omp_get_proc_bind(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer(kind=omp_proc_bind_kind) function omp_get_proc_bind()}
@end multitable

@item @emph{See also}:
@ref{OMP_PROC_BIND}, @ref{OMP_PLACES}, @ref{GOMP_CPU_AFFINITY}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.22.
@end table



@node omp_get_schedule
@section @code{omp_get_schedule} -- Obtain the runtime scheduling method
@table @asis
@item @emph{Description}:
Obtain the runtime scheduling method.  The @var{kind} argument will be
set to the value @code{omp_sched_static}, @code{omp_sched_dynamic},
@code{omp_sched_guided} or @code{omp_sched_auto}.  The second argument,
@var{chunk_size}, is set to the chunk size.

@item @emph{C/C++}
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_get_schedule(omp_sched_t *kind, int *chunk_size);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_get_schedule(kind, chunk_size)}
@item @tab @code{integer(kind=omp_sched_kind) kind}
@item @tab @code{integer chunk_size}
@end multitable

@item @emph{See also}:
@ref{omp_set_schedule}, @ref{OMP_SCHEDULE}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.13.
@end table



@node omp_get_team_num
@section @code{omp_get_team_num} -- Get team number
@table @asis
@item @emph{Description}:
Returns the team number of the calling thread.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_team_num(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_team_num()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.33.
@end table



@node omp_get_team_size
@section @code{omp_get_team_size} -- Number of threads in a team
@table @asis
@item @emph{Description}:
This function returns the number of threads in a thread team to which
either the current thread or its ancestor belongs.  For values of @var{level}
outside the range 0 to @code{omp_get_level}, -1 is returned; if @var{level}
is zero, 1 is returned, and for @var{level} equal to @code{omp_get_level},
the result is identical to @code{omp_get_num_threads}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_team_size(int level);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_team_size(level)}
@item @tab @code{integer level}
@end multitable

@item @emph{See also}:
@ref{omp_get_num_threads}, @ref{omp_get_level}, @ref{omp_get_ancestor_thread_num}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.19.
@end table



@node omp_get_thread_limit
@section @code{omp_get_thread_limit} -- Maximum number of threads
@table @asis
@item @emph{Description}:
Return the maximum number of threads of the program.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_thread_limit(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_thread_limit()}
@end multitable

@item @emph{See also}:
@ref{omp_get_max_threads}, @ref{OMP_THREAD_LIMIT}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.14.
@end table



@node omp_get_thread_num
@section @code{omp_get_thread_num} -- Current thread ID
@table @asis
@item @emph{Description}:
Returns a unique thread identification number within the current team.
In sequential parts of the program, @code{omp_get_thread_num}
always returns 0.  In parallel regions the return value varies
from 0 to @code{omp_get_num_threads}-1 inclusive.  The return
value of the master thread of a team is always 0.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_thread_num(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_thread_num()}
@end multitable

@item @emph{See also}:
@ref{omp_get_num_threads}, @ref{omp_get_ancestor_thread_num}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.4.
@end table



@node omp_in_parallel
@section @code{omp_in_parallel} -- Whether a parallel region is active
@table @asis
@item @emph{Description}:
This function returns @code{true} if currently running in parallel,
@code{false} otherwise.  Here, @code{true} and @code{false} represent
their language-specific counterparts.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_in_parallel(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_in_parallel()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.6.
@end table


@node omp_in_final
@section @code{omp_in_final} -- Whether in final or included task region
@table @asis
@item @emph{Description}:
This function returns @code{true} if currently running in a final
or included task region, @code{false} otherwise.  Here, @code{true}
and @code{false} represent their language-specific counterparts.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_in_final(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_in_final()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.21.
@end table



@node omp_is_initial_device
@section @code{omp_is_initial_device} -- Whether executing on the host device
@table @asis
@item @emph{Description}:
This function returns @code{true} if currently running on the host device,
@code{false} otherwise.  Here, @code{true} and @code{false} represent
their language-specific counterparts.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_is_initial_device(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_is_initial_device()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.34.
@end table



@node omp_set_default_device
@section @code{omp_set_default_device} -- Set the default device for target regions
@table @asis
@item @emph{Description}:
Set the default device for target regions without device clause.  The argument
shall be a nonnegative device number.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_default_device(int device_num);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_default_device(device_num)}
@item @tab @code{integer device_num}
@end multitable

@item @emph{See also}:
@ref{OMP_DEFAULT_DEVICE}, @ref{omp_get_default_device}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.29.
@end table



@node omp_set_dynamic
@section @code{omp_set_dynamic} -- Enable/disable dynamic teams
@table @asis
@item @emph{Description}:
Enable or disable the dynamic adjustment of the number of threads
within a team.  The function takes the language-specific equivalent
of @code{true} and @code{false}, where @code{true} enables dynamic
adjustment of team sizes and @code{false} disables it.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_dynamic(int dynamic_threads);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_dynamic(dynamic_threads)}
@item @tab @code{logical, intent(in) :: dynamic_threads}
@end multitable

@item @emph{See also}:
@ref{OMP_DYNAMIC}, @ref{omp_get_dynamic}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.7.
@end table



@node omp_set_max_active_levels
@section @code{omp_set_max_active_levels} -- Limits the number of active parallel regions
@table @asis
@item @emph{Description}:
This function limits the maximum allowed number of nested, active
parallel regions.

@item @emph{C/C++}
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_max_active_levels(int max_levels);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_max_active_levels(max_levels)}
@item @tab @code{integer max_levels}
@end multitable

@item @emph{See also}:
@ref{omp_get_max_active_levels}, @ref{omp_get_active_level}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.15.
@end table



@node omp_set_nested
@section @code{omp_set_nested} -- Enable/disable nested parallel regions
@table @asis
@item @emph{Description}:
Enable or disable nested parallel regions, i.e., whether team members
are allowed to create new teams.  The function takes the language-specific
equivalent of @code{true} and @code{false}, where @code{true} enables
nested parallel regions and @code{false} disables them.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_nested(int nested);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_nested(nested)}
@item @tab @code{logical, intent(in) :: nested}
@end multitable

@item @emph{See also}:
@ref{OMP_NESTED}, @ref{omp_get_nested}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.10.
@end table



@node omp_set_num_threads
@section @code{omp_set_num_threads} -- Set upper team size limit
@table @asis
@item @emph{Description}:
Specifies the number of threads used by default in subsequent parallel
sections, if those do not specify a @code{num_threads} clause.  The
argument of @code{omp_set_num_threads} shall be a positive integer.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_num_threads(int num_threads);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_num_threads(num_threads)}
@item @tab @code{integer, intent(in) :: num_threads}
@end multitable

@item @emph{See also}:
@ref{OMP_NUM_THREADS}, @ref{omp_get_num_threads}, @ref{omp_get_max_threads}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.1.
@end table



@node omp_set_schedule
@section @code{omp_set_schedule} -- Set the runtime scheduling method
@table @asis
@item @emph{Description}:
Sets the runtime scheduling method.  The @var{kind} argument can have the
value @code{omp_sched_static}, @code{omp_sched_dynamic},
@code{omp_sched_guided} or @code{omp_sched_auto}.
Except for
@code{omp_sched_auto}, the chunk size is set to the value of
@var{chunk_size} if positive, or to the default value if zero or negative.
For @code{omp_sched_auto} the @var{chunk_size} argument is ignored.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_schedule(omp_sched_t kind, int chunk_size);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_schedule(kind, chunk_size)}
@item @tab @code{integer(kind=omp_sched_kind) kind}
@item @tab @code{integer chunk_size}
@end multitable

@item @emph{See also}:
@ref{omp_get_schedule},
@ref{OMP_SCHEDULE}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.12.
@end table



@node omp_init_lock
@section @code{omp_init_lock} -- Initialize simple lock
@table @asis
@item @emph{Description}:
Initialize a simple lock. After initialization, the lock is in
an unlocked state.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_init_lock(omp_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_init_lock(svar)}
@item @tab @code{integer(omp_lock_kind), intent(out) :: svar}
@end multitable

@item @emph{See also}:
@ref{omp_destroy_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.1.
@end table



@node omp_set_lock
@section @code{omp_set_lock} -- Wait for and set simple lock
@table @asis
@item @emph{Description}:
Before setting a simple lock, the lock variable must be initialized by
@code{omp_init_lock}. The calling thread is blocked until the lock
is available.
If the lock is already held by the current thread,
a deadlock occurs.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_lock(omp_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_lock(svar)}
@item @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
@end multitable

@item @emph{See also}:
@ref{omp_init_lock}, @ref{omp_test_lock}, @ref{omp_unset_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.4.
@end table



@node omp_test_lock
@section @code{omp_test_lock} -- Test and set simple lock if available
@table @asis
@item @emph{Description}:
Before setting a simple lock, the lock variable must be initialized by
@code{omp_init_lock}. Contrary to @code{omp_set_lock}, @code{omp_test_lock}
does not block if the lock is not available. This function returns
@code{true} upon success, @code{false} otherwise. Here, @code{true} and
@code{false} represent their language-specific counterparts.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_test_lock(omp_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_test_lock(svar)}
@item @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
@end multitable

@item @emph{See also}:
@ref{omp_init_lock}, @ref{omp_set_lock}, @ref{omp_unset_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.6.
@end table



@node omp_unset_lock
@section @code{omp_unset_lock} -- Unset simple lock
@table @asis
@item @emph{Description}:
A simple lock about to be unset must have been locked by @code{omp_set_lock}
or @code{omp_test_lock} before. In addition, the lock must be held by the
thread calling @code{omp_unset_lock}. Then, the lock becomes unlocked. If one
or more threads attempted to set the lock before, one of them is then chosen
to set the lock again.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_unset_lock(omp_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_unset_lock(svar)}
@item @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
@end multitable

@item @emph{See also}:
@ref{omp_set_lock}, @ref{omp_test_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.5.
@end table



@node omp_destroy_lock
@section @code{omp_destroy_lock} -- Destroy simple lock
@table @asis
@item @emph{Description}:
Destroy a simple lock. In order to be destroyed, a simple lock must be
in the unlocked state.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_destroy_lock(omp_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_destroy_lock(svar)}
@item @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
@end multitable

@item @emph{See also}:
@ref{omp_init_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.3.
@end table



@node omp_init_nest_lock
@section @code{omp_init_nest_lock} -- Initialize nested lock
@table @asis
@item @emph{Description}:
Initialize a nested lock. After initialization, the lock is in
an unlocked state and the nesting count is set to zero.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_init_nest_lock(omp_nest_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_init_nest_lock(nvar)}
@item @tab @code{integer(omp_nest_lock_kind), intent(out) :: nvar}
@end multitable

@item @emph{See also}:
@ref{omp_destroy_nest_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.1.
@end table


@node omp_set_nest_lock
@section @code{omp_set_nest_lock} -- Wait for and set nested lock
@table @asis
@item @emph{Description}:
Before setting a nested lock, the lock variable must be initialized by
@code{omp_init_nest_lock}. The calling thread is blocked until the lock
is available. If the lock is already held by the current thread, the
nesting count for the lock is incremented.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_nest_lock(omp_nest_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_nest_lock(nvar)}
@item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
@end multitable

@item @emph{See also}:
@ref{omp_init_nest_lock}, @ref{omp_unset_nest_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.4.
@end table



@node omp_test_nest_lock
@section @code{omp_test_nest_lock} -- Test and set nested lock if available
@table @asis
@item @emph{Description}:
Before setting a nested lock, the lock variable must be initialized by
@code{omp_init_nest_lock}. Contrary to @code{omp_set_nest_lock},
@code{omp_test_nest_lock} does not block if the lock is not available.
If the lock is already held by the current thread, the new nesting count
is returned. Otherwise, the return value equals zero.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_test_nest_lock(omp_nest_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_test_nest_lock(nvar)}
@item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
@end multitable


@item @emph{See also}:
@ref{omp_init_nest_lock}, @ref{omp_set_nest_lock}, @ref{omp_unset_nest_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.6.
@end table



@node omp_unset_nest_lock
@section @code{omp_unset_nest_lock} -- Unset nested lock
@table @asis
@item @emph{Description}:
A nested lock about to be unset must have been locked by @code{omp_set_nest_lock}
or @code{omp_test_nest_lock} before. In addition, the lock must be held by the
thread calling @code{omp_unset_nest_lock}. If the nesting count drops to zero, the
lock becomes unlocked. If one or more threads attempted to set the lock before,
one of them is then chosen to set the lock again.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_unset_nest_lock(omp_nest_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_unset_nest_lock(nvar)}
@item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
@end multitable

@item @emph{See also}:
@ref{omp_set_nest_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.5.
@end table



@node omp_destroy_nest_lock
@section @code{omp_destroy_nest_lock} -- Destroy nested lock
@table @asis
@item @emph{Description}:
Destroy a nested lock. In order to be destroyed, a nested lock must be
in the unlocked state and its nesting count must equal zero.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_destroy_nest_lock(omp_nest_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_destroy_nest_lock(nvar)}
@item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
@end multitable

@item @emph{See also}:
@ref{omp_init_nest_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.3.
@end table



@node omp_get_wtick
@section @code{omp_get_wtick} -- Get timer precision
@table @asis
@item @emph{Description}:
Gets the timer precision, i.e., the number of seconds between two
successive clock ticks.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{double omp_get_wtick(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{double precision function omp_get_wtick()}
@end multitable

@item @emph{See also}:
@ref{omp_get_wtime}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.4.2.
@end table



@node omp_get_wtime
@section @code{omp_get_wtime} -- Elapsed wall clock time
@table @asis
@item @emph{Description}:
Elapsed wall clock time in seconds. The time is measured per thread; no
guarantee can be made that two distinct threads measure the same time.
Time is measured from ``some time in the past'', which is an arbitrary time
guaranteed not to change during the execution of the program.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{double omp_get_wtime(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{double precision function omp_get_wtime()}
@end multitable

@item @emph{See also}:
@ref{omp_get_wtick}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.4.1.
@end table



@c ---------------------------------------------------------------------
@c OpenMP Environment Variables
@c ---------------------------------------------------------------------

@node Environment Variables
@chapter OpenMP Environment Variables

The environment variables which begin with @env{OMP_} are defined by
section 4 of the OpenMP specification in version 4.5, while those
beginning with @env{GOMP_} are GNU extensions.

@menu
* OMP_CANCELLATION::        Set whether cancellation is activated
* OMP_DISPLAY_ENV::         Show OpenMP version and environment variables
* OMP_DEFAULT_DEVICE::      Set the device used in target regions
* OMP_DYNAMIC::             Dynamic adjustment of threads
* OMP_MAX_ACTIVE_LEVELS::   Set the maximum number of nested parallel regions
* OMP_MAX_TASK_PRIORITY::   Set the maximum task priority value
* OMP_NESTED::              Nested parallel regions
* OMP_NUM_THREADS::         Specifies the number of threads to use
* OMP_PROC_BIND::           Whether threads may be moved between CPUs
* OMP_PLACES::              Specifies on which CPUs the threads should be placed
* OMP_STACKSIZE::           Set default thread stack size
* OMP_SCHEDULE::            How threads are scheduled
* OMP_THREAD_LIMIT::        Set the maximum number of threads
* OMP_WAIT_POLICY::         How waiting threads are handled
* GOMP_CPU_AFFINITY::       Bind threads to specific CPUs
* GOMP_DEBUG::              Enable debugging output
* GOMP_STACKSIZE::          Set default thread stack size
* GOMP_SPINCOUNT::          Set the busy-wait spin count
* GOMP_RTEMS_THREAD_POOLS:: Set the RTEMS specific thread pools
@end menu


@node OMP_CANCELLATION
@section @env{OMP_CANCELLATION} -- Set whether cancellation is activated
@cindex Environment Variable
@table @asis
@item @emph{Description}:
If set to @code{TRUE}, cancellation is activated. If set to @code{FALSE} or
if unset, cancellation is disabled and the @code{cancel} construct is ignored.

@item @emph{See also}:
@ref{omp_get_cancellation}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.11
@end table



@node OMP_DISPLAY_ENV
@section @env{OMP_DISPLAY_ENV} -- Show OpenMP version and environment variables
@cindex Environment Variable
@table @asis
@item @emph{Description}:
If set to @code{TRUE}, the OpenMP version number and the values
associated with the OpenMP environment variables are printed to @code{stderr}.
If set to @code{VERBOSE}, it additionally shows the value of the environment
variables which are GNU extensions. If undefined or set to @code{FALSE},
this information is not shown.


@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.12
@end table



@node OMP_DEFAULT_DEVICE
@section @env{OMP_DEFAULT_DEVICE} -- Set the device used in target regions
@cindex Environment Variable
@table @asis
@item @emph{Description}:
Set to choose the device which is used in a @code{target} region, unless the
value is overridden by @code{omp_set_default_device} or by a @code{device}
clause. The value shall be the nonnegative device number. If no device with
the given device number exists, the code is executed on the host. If unset,
device number 0 is used.


@item @emph{See also}:
@ref{omp_get_default_device}, @ref{omp_set_default_device}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.13
@end table



@node OMP_DYNAMIC
@section @env{OMP_DYNAMIC} -- Dynamic adjustment of threads
@cindex Environment Variable
@table @asis
@item @emph{Description}:
Enable or disable the dynamic adjustment of the number of threads
within a team. The value of this environment variable shall be
@code{TRUE} or @code{FALSE}.
If undefined, dynamic adjustment is
disabled by default.

@item @emph{See also}:
@ref{omp_set_dynamic}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.3
@end table



@node OMP_MAX_ACTIVE_LEVELS
@section @env{OMP_MAX_ACTIVE_LEVELS} -- Set the maximum number of nested parallel regions
@cindex Environment Variable
@table @asis
@item @emph{Description}:
Specifies the initial value for the maximum number of nested parallel
regions. The value of this variable shall be a positive integer.
If undefined, the number of active levels is unlimited.

@item @emph{See also}:
@ref{omp_set_max_active_levels}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.9
@end table



@node OMP_MAX_TASK_PRIORITY
@section @env{OMP_MAX_TASK_PRIORITY} -- Set the maximum priority
number that can be set for a task.
@cindex Environment Variable
@table @asis
@item @emph{Description}:
Specifies the initial value for the maximum priority value that can be
set for a task. The value of this variable shall be a non-negative
integer. If undefined, the default priority is 0.

@item @emph{See also}:
@ref{omp_get_max_task_priority}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.14
@end table



@node OMP_NESTED
@section @env{OMP_NESTED} -- Nested parallel regions
@cindex Environment Variable
@cindex Implementation specific setting
@table @asis
@item @emph{Description}:
Enable or disable nested parallel regions, i.e., whether team members
are allowed to create new teams. The value of this environment variable
shall be @code{TRUE} or @code{FALSE}. If undefined, nested parallel
regions are disabled by default.

@item @emph{See also}:
@ref{omp_set_nested}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.6
@end table



@node OMP_NUM_THREADS
@section @env{OMP_NUM_THREADS} -- Specifies the number of threads to use
@cindex Environment Variable
@cindex Implementation specific setting
@table @asis
@item @emph{Description}:
Specifies the default number of threads to use in parallel regions. The
value of this variable shall be a comma-separated list of positive integers;
each value specifies the number of threads to use for the corresponding nested
level. If undefined, one thread per CPU is used.

@item @emph{See also}:
@ref{omp_set_num_threads}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.2
@end table



@node OMP_PROC_BIND
@section @env{OMP_PROC_BIND} -- Whether threads may be moved between CPUs
@cindex Environment Variable
@table @asis
@item @emph{Description}:
Specifies whether threads may be moved between processors. If set to
@code{TRUE}, OpenMP threads should not be moved; if set to @code{FALSE}
they may be moved. Alternatively, a comma separated list with the
values @code{MASTER}, @code{CLOSE} and @code{SPREAD} can be used to specify
the thread affinity policy for the corresponding nesting level. With
@code{MASTER} the worker threads are in the same place partition as the
master thread. With @code{CLOSE} those are kept close to the master thread
in contiguous place partitions. And with @code{SPREAD} a sparse distribution
across the place partitions is used.

When undefined, @env{OMP_PROC_BIND} defaults to @code{TRUE} when
@env{OMP_PLACES} or @env{GOMP_CPU_AFFINITY} is set and @code{FALSE} otherwise.

@item @emph{See also}:
@ref{OMP_PLACES}, @ref{GOMP_CPU_AFFINITY}, @ref{omp_get_proc_bind}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.4
@end table



@node OMP_PLACES
@section @env{OMP_PLACES} -- Specifies on which CPUs the threads should be placed
@cindex Environment Variable
@table @asis
@item @emph{Description}:
The thread placement can be either specified using an abstract name or by an
explicit list of the places. The abstract names @code{threads}, @code{cores}
and @code{sockets} can be optionally followed by a positive number in
parentheses, which denotes how many places shall be created. With
@code{threads} each place corresponds to a single hardware thread; @code{cores}
to a single core with the corresponding number of hardware threads; and with
@code{sockets} the place corresponds to a single socket. The resulting
placement can be shown by setting the @env{OMP_DISPLAY_ENV} environment
variable.

Alternatively, the placement can be specified explicitly as comma-separated
list of places. A place is specified by a set of nonnegative numbers in curly
braces, denoting the hardware threads. The hardware threads
belonging to a place can either be specified as comma-separated list of
nonnegative thread numbers or using an interval. Multiple places can also be
either specified by a comma-separated list of places or by an interval. To
specify an interval, a colon followed by the count is placed after
the hardware thread number or the place. Optionally, the length can be
followed by a colon and the stride number -- otherwise a unit stride is
assumed. For instance, the following specifies the same places list:
@code{"@{0,1,2@}, @{3,4,5@}, @{6,7,8@}, @{9,10,11@}"};
@code{"@{0:3@}, @{3:3@}, @{6:3@}, @{9:3@}"}; and @code{"@{0:3@}:4:3"}.

If @env{OMP_PLACES} and @env{GOMP_CPU_AFFINITY} are unset and
@env{OMP_PROC_BIND} is either unset or @code{false}, threads may be moved
between CPUs following no placement policy.

@item @emph{See also}:
@ref{OMP_PROC_BIND}, @ref{GOMP_CPU_AFFINITY}, @ref{omp_get_proc_bind},
@ref{OMP_DISPLAY_ENV}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.5
@end table



@node OMP_STACKSIZE
@section @env{OMP_STACKSIZE} -- Set default thread stack size
@cindex Environment Variable
@table @asis
@item @emph{Description}:
Set the default thread stack size in kilobytes, unless the number
is suffixed by @code{B}, @code{K}, @code{M} or @code{G}, in which
case the size is, respectively, in bytes, kilobytes, megabytes
or gigabytes. This is different from @code{pthread_attr_setstacksize}
which gets the number of bytes as an argument. If the stack size cannot
be set due to system constraints, an error is reported and the initial
stack size is left unchanged. If undefined, the stack size is system
dependent.

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.7
@end table



@node OMP_SCHEDULE
@section @env{OMP_SCHEDULE} -- How threads are scheduled
@cindex Environment Variable
@cindex Implementation specific setting
@table @asis
@item @emph{Description}:
Allows specifying the @code{schedule type} and @code{chunk size}.
The value of the variable shall have the form @code{type[,chunk]}, where
@code{type} is one of @code{static}, @code{dynamic}, @code{guided} or @code{auto}.
The optional @code{chunk} size shall be a positive integer. If undefined,
dynamic scheduling and a chunk size of 1 are used.

@item @emph{See also}:
@ref{omp_set_schedule}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Sections 2.7.1.1 and 4.1
@end table



@node OMP_THREAD_LIMIT
@section @env{OMP_THREAD_LIMIT} -- Set the maximum number of threads
@cindex Environment Variable
@table @asis
@item @emph{Description}:
Specifies the number of threads to use for the whole program. The
value of this variable shall be a positive integer. If undefined,
the number of threads is not limited.

@item @emph{See also}:
@ref{OMP_NUM_THREADS}, @ref{omp_get_thread_limit}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.10
@end table



@node OMP_WAIT_POLICY
@section @env{OMP_WAIT_POLICY} -- How waiting threads are handled
@cindex Environment Variable
@table @asis
@item @emph{Description}:
Specifies whether waiting threads should be active or passive. If
the value is @code{PASSIVE}, waiting threads should not consume CPU
power while waiting; the value @code{ACTIVE} specifies that they
should. If undefined, threads wait actively for a short time
before waiting passively.

@item @emph{See also}:
@ref{GOMP_SPINCOUNT}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.8
@end table



@node GOMP_CPU_AFFINITY
@section @env{GOMP_CPU_AFFINITY} -- Bind threads to specific CPUs
@cindex Environment Variable
@table @asis
@item @emph{Description}:
Binds threads to specific CPUs. The variable should contain a space-separated
or comma-separated list of CPUs. This list may contain different kinds of
entries: either single CPU numbers in any order, a range of CPUs (M-N)
or a range with some stride (M-N:S). CPU numbers are zero based.
For example,
@code{GOMP_CPU_AFFINITY="0 3 1-2 4-15:2"} will bind the initial thread
to CPU 0, the second to CPU 3, the third to CPU 1, the fourth to
CPU 2, the fifth to CPU 4, the sixth through tenth to CPUs 6, 8, 10, 12,
and 14 respectively and then start assigning back from the beginning of
the list. @code{GOMP_CPU_AFFINITY=0} binds all threads to CPU 0.

There is no libgomp library routine to determine whether a CPU affinity
specification is in effect. As a workaround, language-specific library
functions, e.g., @code{getenv} in C or @code{GET_ENVIRONMENT_VARIABLE} in
Fortran, may be used to query the setting of the @code{GOMP_CPU_AFFINITY}
environment variable. A defined CPU affinity on startup cannot be changed
or disabled during the runtime of the application.

If both @env{GOMP_CPU_AFFINITY} and @env{OMP_PROC_BIND} are set,
@env{OMP_PROC_BIND} takes precedence. If @env{GOMP_CPU_AFFINITY} is unset
and @env{OMP_PROC_BIND} is either unset or set to @code{FALSE}, the host
system handles the assignment of threads to CPUs.

@item @emph{See also}:
@ref{OMP_PLACES}, @ref{OMP_PROC_BIND}
@end table



@node GOMP_DEBUG
@section @env{GOMP_DEBUG} -- Enable debugging output
@cindex Environment Variable
@table @asis
@item @emph{Description}:
Enable debugging output. The variable should be set to @code{0}
(disabled, also the default if not set), or @code{1} (enabled).

If enabled, some debugging output will be printed during execution.
This is currently not specified in more detail, and subject to change.
@end table



@node GOMP_STACKSIZE
@section @env{GOMP_STACKSIZE} -- Set default thread stack size
@cindex Environment Variable
@cindex Implementation specific setting
@table @asis
@item @emph{Description}:
Set the default thread stack size in kilobytes.
This is different from
@code{pthread_attr_setstacksize} which gets the number of bytes as an
argument. If the stack size cannot be set due to system constraints, an
error is reported and the initial stack size is left unchanged. If undefined,
the stack size is system dependent.

@item @emph{See also}:
@ref{OMP_STACKSIZE}

@item @emph{Reference}:
@uref{http://gcc.gnu.org/ml/gcc-patches/2006-06/msg00493.html,
GCC Patches Mailinglist},
@uref{http://gcc.gnu.org/ml/gcc-patches/2006-06/msg00496.html,
GCC Patches Mailinglist}
@end table



@node GOMP_SPINCOUNT
@section @env{GOMP_SPINCOUNT} -- Set the busy-wait spin count
@cindex Environment Variable
@cindex Implementation specific setting
@table @asis
@item @emph{Description}:
Determines how long a thread waits actively while consuming CPU power
before waiting passively without consuming CPU power. The value may be
either @code{INFINITE} or @code{INFINITY} to always wait actively, or an
integer which gives the number of spins of the busy-wait loop. The
integer may optionally be followed by the following suffixes acting
as multiplication factors: @code{k} (kilo, thousand), @code{M} (mega,
million), @code{G} (giga, billion), or @code{T} (tera, trillion).
If undefined, 0 is used when @env{OMP_WAIT_POLICY} is @code{PASSIVE},
300,000 is used when @env{OMP_WAIT_POLICY} is undefined and
30 billion is used when @env{OMP_WAIT_POLICY} is @code{ACTIVE}.
If there are more OpenMP threads than available CPUs, 1000 and 100
spins are used for @env{OMP_WAIT_POLICY} being @code{ACTIVE} or
undefined, respectively; unless the @env{GOMP_SPINCOUNT} is lower
or @env{OMP_WAIT_POLICY} is @code{PASSIVE}.

@item @emph{See also}:
@ref{OMP_WAIT_POLICY}
@end table



@node GOMP_RTEMS_THREAD_POOLS
@section @env{GOMP_RTEMS_THREAD_POOLS} -- Set the RTEMS specific thread pools
@cindex Environment Variable
@cindex Implementation specific setting
@table @asis
@item @emph{Description}:
This environment variable is only used on the RTEMS real-time operating system.
It determines the scheduler instance specific thread pools. The format for
@env{GOMP_RTEMS_THREAD_POOLS} is a list of optional
@code{<thread-pool-count>[$<priority>]@@<scheduler-name>} configurations
separated by @code{:} where:
@itemize @bullet
@item @code{<thread-pool-count>} is the thread pool count for this scheduler
instance.
@item @code{$<priority>} is an optional priority for the worker threads of a
thread pool according to @code{pthread_setschedparam}. In case a priority
value is omitted, then a worker thread will inherit the priority of the OpenMP
master thread that created it. The priority of the worker thread is not
changed after creation, even if a new OpenMP master thread using the worker has
a different priority.
@item @code{@@<scheduler-name>} is the scheduler instance name according to the
RTEMS application configuration.
@end itemize
In case no thread pool configuration is specified for a scheduler instance,
then each OpenMP master thread of this scheduler instance will use its own
dynamically allocated thread pool. To limit the worker thread count of the
thread pools, each OpenMP master thread must call @code{omp_set_num_threads}.
@item @emph{Example}:
Let's suppose we have three scheduler instances @code{IO}, @code{WRK0}, and
@code{WRK1} with @env{GOMP_RTEMS_THREAD_POOLS} set to
@code{"1@@WRK0:3$4@@WRK1"}. Then there are no thread pool restrictions for
scheduler instance @code{IO}.
In the scheduler instance @code{WRK0} there is
one thread pool available.  Since no priority is specified for this scheduler
instance, the worker thread inherits the priority of the OpenMP master thread
that created it.  In the scheduler instance @code{WRK1} there are three thread
pools available and their worker threads run at priority four.
@end table



@c ---------------------------------------------------------------------
@c Enabling OpenACC
@c ---------------------------------------------------------------------

@node Enabling OpenACC
@chapter Enabling OpenACC

To activate the OpenACC extensions for C/C++ and Fortran, the compile-time
flag @option{-fopenacc} must be specified.  This enables the OpenACC directive
@code{#pragma acc} in C/C++ and, for Fortran, @code{!$acc} directives in
free form, @code{c$acc}, @code{*$acc} and @code{!$acc} directives in fixed
form, @code{!$} conditional compilation sentinels in free form and @code{c$},
@code{*$} and @code{!$} sentinels in fixed form.  The flag also
arranges for automatic linking of the OpenACC runtime library
(@ref{OpenACC Runtime Library Routines}).

A complete description of all OpenACC directives accepted may be found in
the @uref{https://www.openacc.org, OpenACC} Application Programming
Interface manual, version 2.0.

Note that this is an experimental feature and subject to
change in future versions of GCC.  See
@uref{https://gcc.gnu.org/wiki/OpenACC} for more information.



@c ---------------------------------------------------------------------
@c OpenACC Runtime Library Routines
@c ---------------------------------------------------------------------

@node OpenACC Runtime Library Routines
@chapter OpenACC Runtime Library Routines

The runtime routines described here are defined by section 3 of the OpenACC
specification in version 2.0.
They have C linkage, and do not throw exceptions.
Generally, they are available only for the host, with the exception of
@code{acc_on_device}, which is available for both the host and the
accelerator device.

@menu
* acc_get_num_devices::         Get number of devices for the given device
                                type.
* acc_set_device_type::         Set type of device accelerator to use.
* acc_get_device_type::         Get type of device accelerator to be used.
* acc_set_device_num::          Set device number to use.
* acc_get_device_num::          Get device number to be used.
* acc_async_test::              Tests for completion of a specific asynchronous
                                operation.
* acc_async_test_all::          Tests for completion of all asynchronous
                                operations.
* acc_wait::                    Wait for completion of a specific asynchronous
                                operation.
* acc_wait_all::                Waits for completion of all asynchronous
                                operations.
* acc_wait_all_async::          Wait for completion of all asynchronous
                                operations.
* acc_wait_async::              Wait for completion of asynchronous operations.
* acc_init::                    Initialize runtime for a specific device type.
* acc_shutdown::                Shuts down the runtime for a specific device
                                type.
* acc_on_device::               Whether executing on a particular device
* acc_malloc::                  Allocate device memory.
* acc_free::                    Free device memory.
* acc_copyin::                  Allocate device memory and copy host memory to
                                it.
* acc_present_or_copyin::       If the data is not present on the device,
                                allocate device memory and copy from host
                                memory.
* acc_create::                  Allocate device memory and map it to host
                                memory.
* acc_present_or_create::       If the data is not present on the device,
                                allocate device memory and map it to host
                                memory.
* acc_copyout::                 Copy device memory to host memory.
* acc_delete::                  Free device memory.
* acc_update_device::           Update device memory from mapped host memory.
* acc_update_self::             Update host memory from mapped device memory.
* acc_map_data::                Map previously allocated device memory to host
                                memory.
* acc_unmap_data::              Unmap device memory from host memory.
* acc_deviceptr::               Get device pointer associated with specific
                                host address.
* acc_hostptr::                 Get host pointer associated with specific
                                device address.
* acc_is_present::              Indicate whether host variable / array is
                                present on device.
* acc_memcpy_to_device::        Copy host memory to device memory.
* acc_memcpy_from_device::      Copy device memory to host memory.

API routines for target platforms.

* acc_get_current_cuda_device:: Get CUDA device handle.
* acc_get_current_cuda_context:: Get CUDA context handle.
* acc_get_cuda_stream::         Get CUDA stream handle.
* acc_set_cuda_stream::         Set CUDA stream handle.
@end menu



@node acc_get_num_devices
@section @code{acc_get_num_devices} -- Get number of devices for given device type
@table @asis
@item @emph{Description}
This function returns a value indicating the number of devices available
for the device type specified in @var{devicetype}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_get_num_devices(acc_device_t devicetype);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function acc_get_num_devices(devicetype)}
@item @tab @code{integer(kind=acc_device_kind) devicetype}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.1.
@end table



@node acc_set_device_type
@section @code{acc_set_device_type} -- Set type of device accelerator to use.
@table @asis
@item @emph{Description}
This function indicates to the runtime library which device type, specified
in @var{devicetype}, to use when executing a parallel or kernels region.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_set_device_type(acc_device_t devicetype);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_set_device_type(devicetype)}
@item @tab @code{integer(kind=acc_device_kind) devicetype}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.2.
@end table



@node acc_get_device_type
@section @code{acc_get_device_type} -- Get type of device accelerator to be used.
@table @asis
@item @emph{Description}
This function returns the device type that will be used when executing a
parallel or kernels region.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_device_t acc_get_device_type(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_get_device_type()}
@item @tab @code{integer(kind=acc_device_kind) acc_get_device_type}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.3.
@end table



@node acc_set_device_num
@section @code{acc_set_device_num} -- Set device number to use.
@table @asis
@item @emph{Description}
This function indicates to the runtime which device number, specified by
@var{num} and associated with the specified device type @var{devicetype},
is to be used.
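@item @emph{Example}:
A minimal C sketch of selecting a device (not taken from the OpenACC
specification; the function name is invented, and @code{acc_device_nvidia}
is used purely for illustration).  The standard @code{_OPENACC} macro
guard lets the file also compile without @option{-fopenacc}:

```c
#ifdef _OPENACC
#include <openacc.h>
#endif

/* Select the first device of the illustrative NVIDIA type, if any is
   available, and return the device number now in use.  Without
   -fopenacc this degrades to a stub that reports device 0.  */
int pick_first_device (void)
{
#ifdef _OPENACC
  if (acc_get_num_devices (acc_device_nvidia) > 0)
    {
      acc_set_device_num (0, acc_device_nvidia);
      return acc_get_device_num (acc_device_nvidia);
    }
#endif
  return 0;  /* no such device type available */
}
```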
@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_set_device_num(int num, acc_device_t devicetype);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_set_device_num(devicenum, devicetype)}
@item @tab @code{integer devicenum}
@item @tab @code{integer(kind=acc_device_kind) devicetype}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.4.
@end table



@node acc_get_device_num
@section @code{acc_get_device_num} -- Get device number to be used.
@table @asis
@item @emph{Description}
This function returns the device number, associated with the specified device
type @var{devicetype}, that will be used when executing a parallel or kernels
region.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_get_device_num(acc_device_t devicetype);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_get_device_num(devicetype)}
@item @tab @code{integer(kind=acc_device_kind) devicetype}
@item @tab @code{integer acc_get_device_num}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.5.
@end table



@node acc_async_test
@section @code{acc_async_test} -- Test for completion of a specific asynchronous operation.
@table @asis
@item @emph{Description}
This function tests for completion of the asynchronous operation specified
in @var{arg}.  In C/C++, a non-zero value is returned to indicate that the
specified asynchronous operation has completed, while Fortran returns
@code{true}.
If the asynchronous operation has not completed, C/C++ returns
zero and Fortran returns @code{false}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_async_test(int arg);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_async_test(arg)}
@item @tab @code{integer(kind=acc_handle_kind) arg}
@item @tab @code{logical acc_async_test}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.6.
@end table



@node acc_async_test_all
@section @code{acc_async_test_all} -- Tests for completion of all asynchronous operations.
@table @asis
@item @emph{Description}
This function tests for completion of all asynchronous operations.
In C/C++, a non-zero value is returned to indicate that all asynchronous
operations have completed, while Fortran returns @code{true}.  If
any asynchronous operation has not completed, C/C++ returns zero and
Fortran returns @code{false}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_async_test_all(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_async_test_all()}
@item @tab @code{logical acc_async_test_all}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.7.
@end table



@node acc_wait
@section @code{acc_wait} -- Wait for completion of a specific asynchronous operation.
@table @asis
@item @emph{Description}
This function waits for completion of the asynchronous operation
specified in @var{arg}.
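@item @emph{Example}:
A minimal C sketch of the usual pairing of an @code{async} clause with
@code{acc_wait} (not from the OpenACC specification; the function name
and queue number are invented).  Guarded with the standard
@code{_OPENACC} macro, the same file compiles without @option{-fopenacc},
in which case the pragma is ignored and the loop runs synchronously:

```c
#include <stddef.h>
#ifdef _OPENACC
#include <openacc.h>
#endif

/* Triple each element.  With -fopenacc the loop is queued on async
   queue 1 and acc_wait (1) blocks until that queue has drained.  */
void triple_async (float *a, size_t n)
{
#pragma acc parallel loop async(1) copy(a[0:n])
  for (size_t i = 0; i < n; i++)
    a[i] *= 3.0f;
#ifdef _OPENACC
  acc_wait (1);   /* results in a[] are only valid after this call */
#endif
}
```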
@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_wait(arg);}
@item @emph{Prototype (OpenACC 1.0 compatibility)}: @tab @code{acc_async_wait(arg);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_wait(arg)}
@item @tab @code{integer(acc_handle_kind) arg}
@item @emph{Interface (OpenACC 1.0 compatibility)}: @tab @code{subroutine acc_async_wait(arg)}
@item @tab @code{integer(acc_handle_kind) arg}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.8.
@end table



@node acc_wait_all
@section @code{acc_wait_all} -- Waits for completion of all asynchronous operations.
@table @asis
@item @emph{Description}
This function waits for the completion of all asynchronous operations.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_wait_all(void);}
@item @emph{Prototype (OpenACC 1.0 compatibility)}: @tab @code{acc_async_wait_all(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_wait_all()}
@item @emph{Interface (OpenACC 1.0 compatibility)}: @tab @code{subroutine acc_async_wait_all()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.10.
@end table



@node acc_wait_all_async
@section @code{acc_wait_all_async} -- Wait for completion of all asynchronous operations.
@table @asis
@item @emph{Description}
This function enqueues a wait operation on the queue @var{async} for any
and all asynchronous operations that have been previously enqueued on
any queue.
@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_wait_all_async(int async);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_wait_all_async(async)}
@item @tab @code{integer(acc_handle_kind) async}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.11.
@end table



@node acc_wait_async
@section @code{acc_wait_async} -- Wait for completion of asynchronous operations.
@table @asis
@item @emph{Description}
This function enqueues a wait operation on queue @var{async} for any and all
asynchronous operations enqueued on queue @var{arg}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_wait_async(int arg, int async);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_wait_async(arg, async)}
@item @tab @code{integer(acc_handle_kind) arg, async}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.9.
@end table



@node acc_init
@section @code{acc_init} -- Initialize runtime for a specific device type.
@table @asis
@item @emph{Description}
This function initializes the runtime for the device type specified in
@var{devicetype}.
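@item @emph{Example}:
A minimal C sketch of paying the device start-up cost up front rather
than at the first compute region (not from the OpenACC specification;
the function name is invented).  The @code{_OPENACC} guard makes the
code a no-op when built without @option{-fopenacc}:

```c
#ifdef _OPENACC
#include <openacc.h>
#endif

/* Initialize and immediately shut down the runtime for the default
   device type; returns 0 to signal completion.  */
int warm_up (void)
{
#ifdef _OPENACC
  acc_init (acc_device_default);
  /* ... offloaded work would go here ... */
  acc_shutdown (acc_device_default);
#endif
  return 0;
}
```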
@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_init(acc_device_t devicetype);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_init(devicetype)}
@item @tab @code{integer(acc_device_kind) devicetype}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.12.
@end table



@node acc_shutdown
@section @code{acc_shutdown} -- Shuts down the runtime for a specific device type.
@table @asis
@item @emph{Description}
This function shuts down the runtime for the device type specified in
@var{devicetype}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_shutdown(acc_device_t devicetype);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_shutdown(devicetype)}
@item @tab @code{integer(acc_device_kind) devicetype}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.13.
@end table



@node acc_on_device
@section @code{acc_on_device} -- Whether executing on a particular device
@table @asis
@item @emph{Description}:
This function returns whether the program is executing on a particular
device specified in @var{devicetype}.  In C/C++, a non-zero value is
returned to indicate the program is executing on the specified device
type; in Fortran, @code{true} is returned.  If the program is not
executing on the specified device type, C/C++ returns zero, while
Fortran returns @code{false}.
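@item @emph{Example}:
A minimal C sketch (not from the OpenACC specification; the function
name is invented).  Guarding with the standard @code{_OPENACC} macro is
the usual idiom, since without @option{-fopenacc} code can only be
running on the host anyway:

```c
#ifdef _OPENACC
#include <openacc.h>
#endif

/* Return 1 when executing on the host.  Without -fopenacc the answer
   is trivially 1; with it, acc_on_device supplies the real answer.  */
int running_on_host (void)
{
#ifdef _OPENACC
  return acc_on_device (acc_device_host) != 0;
#else
  return 1;
#endif
}
```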
@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_on_device(acc_device_t devicetype);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_on_device(devicetype)}
@item @tab @code{integer(acc_device_kind) devicetype}
@item @tab @code{logical acc_on_device}
@end multitable


@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.14.
@end table



@node acc_malloc
@section @code{acc_malloc} -- Allocate device memory.
@table @asis
@item @emph{Description}
This function allocates @var{len} bytes of device memory.  It returns
the device address of the allocated memory.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{d_void* acc_malloc(size_t len);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.15.
@end table



@node acc_free
@section @code{acc_free} -- Free device memory.
@table @asis
@item @emph{Description}
Free previously allocated device memory at the device address @var{a}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_free(d_void *a);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.16.
@end table



@node acc_copyin
@section @code{acc_copyin} -- Allocate device memory and copy host memory to it.
@table @asis
@item @emph{Description}
In C/C++, this function allocates @var{len} bytes of device memory
and maps it to the specified host address in @var{a}.  The device
address of the newly allocated device memory is returned.
In Fortran, two (2) forms are supported.  In the first form, @var{a} specifies
a contiguous array section.  In the second form, @var{a} specifies a variable
or array element and @var{len} specifies the length in bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_copyin(h_void *a, size_t len);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_copyin(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_copyin(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.17.
@end table



@node acc_present_or_copyin
@section @code{acc_present_or_copyin} -- If the data is not present on the device, allocate device memory and copy from host memory.
@table @asis
@item @emph{Description}
This function tests whether the host data specified by @var{a} and of length
@var{len} is present on the device.  If it is not present, device memory
is allocated and the host memory copied.  The device address of
the newly allocated device memory is returned.

In Fortran, two (2) forms are supported.  In the first form, @var{a} specifies
a contiguous array section.  In the second form, @var{a} specifies a variable
or array element and @var{len} specifies the length in bytes.
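@item @emph{Example}:
A minimal C sketch of the "present-or" semantics (not from the OpenACC
specification; the function and buffer names are invented).  The second
call finds the data already present and returns the existing device
address instead of allocating again; the @code{_OPENACC} guard makes
the function a no-op without @option{-fopenacc}:

```c
#include <stddef.h>
#ifdef _OPENACC
#include <openacc.h>
#endif

/* Mirror a host buffer on the device exactly once, then release it.  */
void mirror_once (float *buf, size_t n)
{
#ifdef _OPENACC
  void *d1 = acc_present_or_copyin (buf, n * sizeof (float));
  void *d2 = acc_present_or_copyin (buf, n * sizeof (float));
  /* d1 == d2: the second call performed no new allocation.  */
  (void) d1;
  (void) d2;
  acc_delete (buf, n * sizeof (float));
#else
  (void) buf;
  (void) n;
#endif
}
```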
@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_present_or_copyin(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{void *acc_pcopyin(h_void *a, size_t len);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_present_or_copyin(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_present_or_copyin(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_pcopyin(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_pcopyin(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.18.
@end table



@node acc_create
@section @code{acc_create} -- Allocate device memory and map it to host memory.
@table @asis
@item @emph{Description}
This function allocates device memory and maps it to host memory specified
by the host address @var{a} with a length of @var{len} bytes.  In C/C++,
the function returns the device address of the allocated device memory.

In Fortran, two (2) forms are supported.  In the first form, @var{a} specifies
a contiguous array section.  In the second form, @var{a} specifies a variable
or array element and @var{len} specifies the length in bytes.
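@item @emph{Example}:
A minimal C sketch of the usual @code{acc_create} pattern: reserve the
device copy first, fill the host copy, then push the values over with
@code{acc_update_device} (not from the OpenACC specification; the
function and buffer names are invented).  Without @option{-fopenacc}
only the host initialization remains:

```c
#include <stddef.h>
#ifdef _OPENACC
#include <openacc.h>
#endif

/* Reserve a device mirror, initialize the host copy, mirror it to the
   device, then release the device copy again.  */
void fill_and_push (int *buf, size_t n)
{
#ifdef _OPENACC
  acc_create (buf, n * sizeof (int));        /* allocate, no copy */
#endif
  for (size_t i = 0; i < n; i++)
    buf[i] = (int) i;                        /* initialize host copy */
#ifdef _OPENACC
  acc_update_device (buf, n * sizeof (int)); /* mirror to the device */
  acc_delete (buf, n * sizeof (int));        /* release device copy */
#endif
}
```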
@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_create(h_void *a, size_t len);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_create(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_create(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.19.
@end table



@node acc_present_or_create
@section @code{acc_present_or_create} -- If the data is not present on the device, allocate device memory and map it to host memory.
@table @asis
@item @emph{Description}
This function tests whether the host data specified by @var{a} and of length
@var{len} is present on the device.  If it is not present, device memory
is allocated and mapped to host memory.  In C/C++, the device address
of the newly allocated device memory is returned.

In Fortran, two (2) forms are supported.  In the first form, @var{a} specifies
a contiguous array section.  In the second form, @var{a} specifies a variable
or array element and @var{len} specifies the length in bytes.


@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_present_or_create(h_void *a, size_t len)}
@item @emph{Prototype}: @tab @code{void *acc_pcreate(h_void *a, size_t len)}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_present_or_create(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_present_or_create(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_pcreate(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_pcreate(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.20.
@end table



@node acc_copyout
@section @code{acc_copyout} -- Copy device memory to host memory.
@table @asis
@item @emph{Description}
In C/C++, this function copies mapped device memory to the host memory
specified by the host address @var{a} for a length of @var{len} bytes.

In Fortran, two (2) forms are supported.  In the first form, @var{a} specifies
a contiguous array section.  In the second form, @var{a} specifies a variable
or array element and @var{len} specifies the length in bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_copyout(h_void *a, size_t len);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_copyout(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_copyout(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.21.
@end table



@node acc_delete
@section @code{acc_delete} -- Free device memory.
@table @asis
@item @emph{Description}
This function frees previously allocated device memory specified by
the device address @var{a} and a length of @var{len} bytes.

In Fortran, two (2) forms are supported.  In the first form, @var{a} specifies
a contiguous array section.  In the second form, @var{a} specifies a variable
or array element and @var{len} specifies the length in bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_delete(h_void *a, size_t len);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_delete(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_delete(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.22.
@end table



@node acc_update_device
@section @code{acc_update_device} -- Update device memory from mapped host memory.
@table @asis
@item @emph{Description}
This function updates the device copy from the previously mapped host memory.
The host memory is specified with the host address @var{a} and a length of
@var{len} bytes.

In Fortran, two (2) forms are supported.  In the first form, @var{a} specifies
a contiguous array section.  In the second form, @var{a} specifies a variable
or array element and @var{len} specifies the length in bytes.
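@item @emph{Example}:
A minimal C sketch of pushing a host-side change to an existing device
mapping (not from the OpenACC specification; the function and variable
names are invented).  The @code{_OPENACC} guard leaves only the host
store when built without @option{-fopenacc}:

```c
#include <stddef.h>
#ifdef _OPENACC
#include <openacc.h>
#endif

/* Establish a mapping, modify the host copy, then mirror the change
   to the device before dropping the device copy again.  */
void push_host_change (float *a, size_t n)
{
#ifdef _OPENACC
  acc_copyin (a, n * sizeof (float));        /* establish the mapping */
#endif
  a[0] = 42.0f;                              /* host-side modification */
#ifdef _OPENACC
  acc_update_device (a, n * sizeof (float)); /* device copy now matches */
  acc_delete (a, n * sizeof (float));
#endif
}
```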
@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_update_device(h_void *a, size_t len);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_update_device(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_update_device(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.23.
@end table



@node acc_update_self
@section @code{acc_update_self} -- Update host memory from mapped device memory.
@table @asis
@item @emph{Description}
This function updates the host copy from the previously mapped device memory.
The host memory is specified with the host address @var{a} and a length of
@var{len} bytes.

In Fortran, two (2) forms are supported.  In the first form, @var{a} specifies
a contiguous array section.  In the second form, @var{a} specifies a variable
or array element and @var{len} specifies the length in bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_update_self(h_void *a, size_t len);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_update_self(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_update_self(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.24.
@end table



@node acc_map_data
@section @code{acc_map_data} -- Map previously allocated device memory to host memory.
@table @asis
@item @emph{Description}
This function maps previously allocated device and host memory.  The device
memory is specified with the device address @var{d}.  The host memory is
specified with the host address @var{h} and a length of @var{len} bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_map_data(h_void *h, d_void *d, size_t len);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.25.
@end table



@node acc_unmap_data
@section @code{acc_unmap_data} -- Unmap device memory from host memory.
@table @asis
@item @emph{Description}
This function unmaps previously mapped device and host memory.  The latter
is specified by the host address @var{h}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_unmap_data(h_void *h);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.26.
@end table



@node acc_deviceptr
@section @code{acc_deviceptr} -- Get device pointer associated with specific host address.
@table @asis
@item @emph{Description}
This function returns the device address that has been mapped to the
host address specified by @var{h}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_deviceptr(h_void *h);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.27.
@end table



@node acc_hostptr
@section @code{acc_hostptr} -- Get host pointer associated with specific device address.
@table @asis
@item @emph{Description}
This function returns the host address that has been mapped to the
device address specified by @var{d}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_hostptr(d_void *d);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.28.
@end table



@node acc_is_present
@section @code{acc_is_present} -- Indicate whether host variable / array is present on device.
@table @asis
@item @emph{Description}
This function indicates whether the host memory specified by the address
@var{a} and a length of @var{len} bytes is present on the device.  In C/C++,
a non-zero value is returned to indicate the presence of the mapped memory
on the device.  A zero is returned to indicate the memory is not mapped on
the device.

In Fortran, two (2) forms are supported.  In the first form, @var{a} specifies
a contiguous array section.  In the second form, @var{a} specifies a variable
or array element and @var{len} specifies the length in bytes.  If the host
memory is mapped to device memory, then @code{true} is returned.  Otherwise,
@code{false} is returned to indicate the mapped memory is not present.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_is_present(h_void *a, size_t len);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_is_present(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{logical acc_is_present}
@item @emph{Interface}: @tab @code{function acc_is_present(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @tab @code{logical acc_is_present}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.29.
@end table



@node acc_memcpy_to_device
@section @code{acc_memcpy_to_device} -- Copy host memory to device memory.
@table @asis
@item @emph{Description}
This function copies host memory specified by the host address @var{src} to
device memory specified by the device address @var{dest} for a length of
@var{bytes} bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_memcpy_to_device(d_void *dest, h_void *src, size_t bytes);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.30.
@end table



@node acc_memcpy_from_device
@section @code{acc_memcpy_from_device} -- Copy device memory to host memory.
@table @asis
@item @emph{Description}
This function copies device memory specified by the device address @var{src}
to host memory specified by the host address @var{dest} for a length of
@var{bytes} bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_memcpy_from_device(h_void *dest, d_void *src, size_t bytes);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.31.
@end table



@node acc_get_current_cuda_device
@section @code{acc_get_current_cuda_device} -- Get CUDA device handle.
@table @asis
@item @emph{Description}
This function returns the CUDA device handle.  This handle is the same
as used by the CUDA Runtime or Driver APIs.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_get_current_cuda_device(void);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
A.2.1.1.
@end table



@node acc_get_current_cuda_context
@section @code{acc_get_current_cuda_context} -- Get CUDA context handle.
@table @asis
@item @emph{Description}
This function returns the CUDA context handle. This handle is the same
as used by the CUDA Runtime or Driver APIs.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_get_current_cuda_context(void);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
A.2.1.2.
@end table



@node acc_get_cuda_stream
@section @code{acc_get_cuda_stream} -- Get CUDA stream handle.
@table @asis
@item @emph{Description}
This function returns the CUDA stream handle for the queue @var{async}.
This handle is the same as used by the CUDA Runtime or Driver APIs.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_get_cuda_stream(int async);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
A.2.1.3.
@end table



@node acc_set_cuda_stream
@section @code{acc_set_cuda_stream} -- Set CUDA stream handle.
@table @asis
@item @emph{Description}
This function associates the stream handle specified by @var{stream} with
the queue @var{async}.

This cannot be used to change the stream handle associated with
@code{acc_async_sync}.

The return value is not specified.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_set_cuda_stream(int async, void *stream);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
A.2.1.4.
@end table



@c ---------------------------------------------------------------------
@c OpenACC Environment Variables
@c ---------------------------------------------------------------------

@node OpenACC Environment Variables
@chapter OpenACC Environment Variables

The variables @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM}
are defined by section 4 of the OpenACC specification in version 2.0.
The variable @env{GCC_ACC_NOTIFY} is used for diagnostic purposes.

@menu
* ACC_DEVICE_TYPE::
* ACC_DEVICE_NUM::
* GCC_ACC_NOTIFY::
@end menu



@node ACC_DEVICE_TYPE
@section @code{ACC_DEVICE_TYPE}
@table @asis
@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
4.1.
@end table



@node ACC_DEVICE_NUM
@section @code{ACC_DEVICE_NUM}
@table @asis
@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
4.2.
@end table



@node GCC_ACC_NOTIFY
@section @code{GCC_ACC_NOTIFY}
@table @asis
@item @emph{Description}:
Print debug information pertaining to the accelerator.
@end table



@c ---------------------------------------------------------------------
@c CUDA Streams Usage
@c ---------------------------------------------------------------------

@node CUDA Streams Usage
@chapter CUDA Streams Usage

This applies to the @code{nvptx} plugin only.

The library provides elements that perform asynchronous movement of
data and asynchronous operation of computing constructs. This
asynchronous functionality is implemented by making use of CUDA
streams@footnote{See "Stream Management" in "CUDA Driver API",
TRM-06703-001, Version 5.5, for additional information}.

The primary means by which the asynchronous functionality is accessed
is through the use of those OpenACC directives which make use of the
@code{async} and @code{wait} clauses. When the @code{async} clause is
first used with a directive, it creates a CUDA stream. If an
@code{async-argument} is used with the @code{async} clause, then the
stream is associated with the specified @code{async-argument}.

Following the creation of an association between a CUDA stream and the
@code{async-argument} of an @code{async} clause, both the @code{wait}
clause and the @code{wait} directive can be used. When either the
clause or directive is used after stream creation, it creates a
rendezvous point whereby execution waits until all operations
associated with the @code{async-argument}, that is, the stream, have
completed.

Normally, the streams that are created as a result of using the
@code{async} clause are managed without any intervention by the
caller. This implies that the association between the @code{async-argument}
and the CUDA stream will be maintained for the lifetime of the program.
However, this association can be changed through the use of the library
function @code{acc_set_cuda_stream}. When the function
@code{acc_set_cuda_stream} is called, the CUDA stream that was
originally associated with the @code{async} clause will be destroyed.
Caution should be taken when changing the association, as subsequent
references to the @code{async-argument} then refer to a different
CUDA stream.



@c ---------------------------------------------------------------------
@c OpenACC Library Interoperability
@c ---------------------------------------------------------------------

@node OpenACC Library Interoperability
@chapter OpenACC Library Interoperability

@section Introduction

The OpenACC library uses the CUDA Driver API, and may interact with
programs that use the Runtime library directly, or another library
based on the Runtime library, e.g., CUBLAS@footnote{See section 2.26,
"Interactions with the CUDA Driver API" in
"CUDA Runtime API", Version 5.5, and section 2.27, "VDPAU
Interoperability", in "CUDA Driver API", TRM-06703-001, Version 5.5,
for additional information on library interoperability.}.
This chapter describes the use cases and what changes are
required in order to use both the OpenACC library and the CUBLAS and Runtime
libraries within a program.

@section First invocation: NVIDIA CUBLAS library API

In this first use case (see below), a function in the CUBLAS library is called
prior to any of the functions in the OpenACC library. More specifically, the
function @code{cublasCreate()}.

When invoked, the function initializes the library and allocates the
hardware resources on the host and the device on behalf of the caller. Once
the initialization and allocation have completed, a handle is returned to the
caller. The OpenACC library also requires initialization and allocation of
hardware resources. Since the CUBLAS library has already allocated the
hardware resources for the device, all that is left to do is to initialize
the OpenACC library and acquire the hardware resources on the host.

Prior to calling the OpenACC function that initializes the library and
allocates the host hardware resources, you need to acquire the device number
that was allocated during the call to @code{cublasCreate()}. Invoking the
runtime library function @code{cudaGetDevice()} accomplishes this. Once
acquired, the device number is passed along with the device type as
parameters to the OpenACC library function @code{acc_set_device_num()}.

Once the call to @code{acc_set_device_num()} has completed, the OpenACC
library uses the context that was created during the call to
@code{cublasCreate()}. In other words, both libraries will be sharing the
same context.

@smallexample
    /* Create the handle */
    s = cublasCreate(&h);
    if (s != CUBLAS_STATUS_SUCCESS)
    @{
        fprintf(stderr, "cublasCreate failed %d\n", s);
        exit(EXIT_FAILURE);
    @}

    /* Get the device number */
    e = cudaGetDevice(&dev);
    if (e != cudaSuccess)
    @{
        fprintf(stderr, "cudaGetDevice failed %d\n", e);
        exit(EXIT_FAILURE);
    @}

    /* Initialize OpenACC library and use device 'dev' */
    acc_set_device_num(dev, acc_device_nvidia);

@end smallexample
@center Use Case 1

@section First invocation: OpenACC library API

In this second use case (see below), a function in the OpenACC library is
called prior to any of the functions in the CUBLAS library. More specifically,
the function @code{acc_set_device_num()}.

In the use case presented here, the function @code{acc_set_device_num()}
is used to both initialize the OpenACC library and allocate the hardware
resources on the host and the device. In the call to the function, the
call parameters specify which device to use and what device
type to use, i.e., @code{acc_device_nvidia}.
It should be noted that this
is but one method to initialize the OpenACC library and allocate the
appropriate hardware resources. Other methods are available through the
use of environment variables, and these will be discussed in the next section.

Once the call to @code{acc_set_device_num()} has completed, other OpenACC
functions can be called, as seen with the multiple calls being made to
@code{acc_copyin()}. In addition, calls can be made to functions in the
CUBLAS library. In the use case, a call to @code{cublasCreate()} is made
subsequent to the calls to @code{acc_copyin()}.
As seen in the previous use case, a call to @code{cublasCreate()}
initializes the CUBLAS library and allocates the hardware resources on the
host and the device. However, since the device has already been allocated,
@code{cublasCreate()} will only initialize the CUBLAS library and allocate
the appropriate hardware resources on the host. The context that was created
as part of the OpenACC initialization is shared with the CUBLAS library,
similarly to the first use case.

@smallexample
    dev = 0;

    acc_set_device_num(dev, acc_device_nvidia);

    /* Copy the first set to the device */
    d_X = acc_copyin(&h_X[0], N * sizeof (float));
    if (d_X == NULL)
    @{
        fprintf(stderr, "copyin error h_X\n");
        exit(EXIT_FAILURE);
    @}

    /* Copy the second set to the device */
    d_Y = acc_copyin(&h_Y1[0], N * sizeof (float));
    if (d_Y == NULL)
    @{
        fprintf(stderr, "copyin error h_Y1\n");
        exit(EXIT_FAILURE);
    @}

    /* Create the handle */
    s = cublasCreate(&h);
    if (s != CUBLAS_STATUS_SUCCESS)
    @{
        fprintf(stderr, "cublasCreate failed %d\n", s);
        exit(EXIT_FAILURE);
    @}

    /* Perform saxpy using CUBLAS library function */
    s = cublasSaxpy(h, N, &alpha, d_X, 1, d_Y, 1);
    if (s != CUBLAS_STATUS_SUCCESS)
    @{
        fprintf(stderr, "cublasSaxpy failed %d\n", s);
        exit(EXIT_FAILURE);
    @}

    /* Copy the results from the device */
    acc_memcpy_from_device(&h_Y1[0], d_Y, N * sizeof (float));

@end smallexample
@center Use Case 2

@section OpenACC library and environment variables

There are two environment variables associated with the OpenACC library
that may be used to control the device type and device number:
@env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM}, respectively. These two
environment variables can be used as an alternative to calling
@code{acc_set_device_num()}. As seen in the second use case, the device
type and device number were specified using @code{acc_set_device_num()}.
If, however, the aforementioned environment variables were set, then the
call to @code{acc_set_device_num()} would not be required.


The use of the environment variables is only relevant when an OpenACC function
is called prior to a call to @code{cublasCreate()}.
If @code{cublasCreate()}
is called prior to a call to an OpenACC function, then you must call
@code{acc_set_device_num()}@footnote{More complete information
about @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM} can be found in
sections 4.1 and 4.2 of the @uref{https://www.openacc.org, OpenACC}
Application Programming Interface, Version 2.0.}.



@c ---------------------------------------------------------------------
@c The libgomp ABI
@c ---------------------------------------------------------------------

@node The libgomp ABI
@chapter The libgomp ABI

The following sections present notes on the external ABI as
presented by libgomp. Only maintainers should need them.

@menu
* Implementing MASTER construct::
* Implementing CRITICAL construct::
* Implementing ATOMIC construct::
* Implementing FLUSH construct::
* Implementing BARRIER construct::
* Implementing THREADPRIVATE construct::
* Implementing PRIVATE clause::
* Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses::
* Implementing REDUCTION clause::
* Implementing PARALLEL construct::
* Implementing FOR construct::
* Implementing ORDERED construct::
* Implementing SECTIONS construct::
* Implementing SINGLE construct::
* Implementing OpenACC's PARALLEL construct::
@end menu


@node Implementing MASTER construct
@section Implementing MASTER construct

@smallexample
if (omp_get_thread_num () == 0)
  block
@end smallexample

Alternatively, we generate two copies of the parallel subfunction
and only include this in the version run by the master thread.
Surely this is not worthwhile though...



@node Implementing CRITICAL construct
@section Implementing CRITICAL construct

Without a specified name,

@smallexample
  void GOMP_critical_start (void);
  void GOMP_critical_end (void);
@end smallexample

so that we don't get COPY relocations from libgomp to the main
application.

With a specified name, use omp_set_lock and omp_unset_lock with
name being transformed into a variable declared like

@smallexample
  omp_lock_t gomp_critical_user_<name> __attribute__((common))
@end smallexample

Ideally the ABI would specify that all zero is a valid unlocked
state, and so we wouldn't need to initialize this at
startup.



@node Implementing ATOMIC construct
@section Implementing ATOMIC construct

The target should implement the @code{__sync} builtins.

Failing that we could add

@smallexample
  void GOMP_atomic_enter (void)
  void GOMP_atomic_exit (void)
@end smallexample

which reuses the regular lock code, but with yet another lock
object private to the library.



@node Implementing FLUSH construct
@section Implementing FLUSH construct

Expands to the @code{__sync_synchronize} builtin.



@node Implementing BARRIER construct
@section Implementing BARRIER construct

@smallexample
  void GOMP_barrier (void)
@end smallexample


@node Implementing THREADPRIVATE construct
@section Implementing THREADPRIVATE construct

In _most_ cases we can map this directly to @code{__thread}. Except
that OMP allows constructors for C++ objects. We can either
refuse to support this (how often is it used?) or we can
implement something akin to .ctors.

Even more ideally, this ctor feature is handled by extensions
to the main pthreads library.
Failing that, we can have a set
of entry points to register ctor functions to be called.



@node Implementing PRIVATE clause
@section Implementing PRIVATE clause

In association with a PARALLEL, or within the lexical extent
of a PARALLEL block, the variable becomes a local variable in
the parallel subfunction.

In association with FOR or SECTIONS blocks, create a new
automatic variable within the current function. This preserves
the semantics of new variable creation.



@node Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses
@section Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses

This seems simple enough for PARALLEL blocks. Create a private
struct for communicating between the parent and subfunction.
In the parent, copy in values for scalars and "small" structs;
copy in addresses for other TREE_ADDRESSABLE types. In the
subfunction, copy the value into the local variable.

It is not clear what to do with bare FOR or SECTION blocks.
The only thing I can figure is that we do something like:

@smallexample
#pragma omp for firstprivate(x) lastprivate(y)
for (int i = 0; i < n; ++i)
  body;
@end smallexample

which becomes

@smallexample
@{
  int x = x, y;

  // for stuff

  if (i == n)
    y = y;
@}
@end smallexample

where the "x=x" and "y=y" assignments actually have different
uids for the two variables, i.e. not something you could write
directly in C. Presumably this only makes sense if the "outer"
x and y are global variables.

COPYPRIVATE would work the same way, except the structure
broadcast would have to happen via SINGLE machinery instead.



@node Implementing REDUCTION clause
@section Implementing REDUCTION clause

The private struct mentioned in the previous section should have
a pointer to an array of the type of the variable, indexed by the
thread's @var{team_id}. The thread stores its final value into the
array, and after the barrier, the master thread iterates over the
array to collect the values.


@node Implementing PARALLEL construct
@section Implementing PARALLEL construct

@smallexample
  #pragma omp parallel
  @{
    body;
  @}
@end smallexample

becomes

@smallexample
  void subfunction (void *data)
  @{
    use data;
    body;
  @}

  setup data;
  GOMP_parallel_start (subfunction, &data, num_threads);
  subfunction (&data);
  GOMP_parallel_end ();
@end smallexample

@smallexample
  void GOMP_parallel_start (void (*fn)(void *), void *data, unsigned num_threads)
@end smallexample

The @var{FN} argument is the subfunction to be run in parallel.

The @var{DATA} argument is a pointer to a structure used to
communicate data in and out of the subfunction, as discussed
above with respect to FIRSTPRIVATE et al.

The @var{NUM_THREADS} argument is 1 if an IF clause is present
and false, or the value of the NUM_THREADS clause, if
present, or 0.

The function needs to create the appropriate number of
threads and/or launch them from the dock. It needs to
create the team structure and assign team ids.

@smallexample
  void GOMP_parallel_end (void)
@end smallexample

Tears down the team and returns us to the previous @code{omp_in_parallel()} state.



@node Implementing FOR construct
@section Implementing FOR construct

@smallexample
  #pragma omp parallel for
  for (i = lb; i <= ub; i++)
    body;
@end smallexample

becomes

@smallexample
  void subfunction (void *data)
  @{
    long _s0, _e0;
    while (GOMP_loop_static_next (&_s0, &_e0))
      @{
        long _e1 = _e0, i;
        for (i = _s0; i < _e1; i++)
          body;
      @}
    GOMP_loop_end_nowait ();
  @}

  GOMP_parallel_loop_static (subfunction, NULL, 0, lb, ub+1, 1, 0);
  subfunction (NULL);
  GOMP_parallel_end ();
@end smallexample

@smallexample
  #pragma omp for schedule(runtime)
  for (i = 0; i < n; i++)
    body;
@end smallexample

becomes

@smallexample
  @{
    long i, _s0, _e0;
    if (GOMP_loop_runtime_start (0, n, 1, &_s0, &_e0))
      do @{
        long _e1 = _e0;
        for (i = _s0; i < _e1; i++)
          body;
      @} while (GOMP_loop_runtime_next (&_s0, &_e0));
    GOMP_loop_end ();
  @}
@end smallexample

Note that while it looks like there is trickiness to propagating
a non-constant STEP, there isn't really. We're explicitly allowed
to evaluate it as many times as we want, and any variables involved
should automatically be handled as PRIVATE or SHARED like any other
variables. So the expression should remain evaluable in the
subfunction. We can also pull it into a local variable if we like,
but since it's supposed to remain unchanged, we can also not if we like.

If we have SCHEDULE(STATIC), and no ORDERED, then we ought to be
able to get away with no work-sharing context at all, since we can
simply perform the arithmetic directly in each thread to divide up
the iterations. Which would mean that we wouldn't need to call any
of these routines.

There are separate routines for handling loops with an ORDERED
clause. Bookkeeping for that is non-trivial...



@node Implementing ORDERED construct
@section Implementing ORDERED construct

@smallexample
  void GOMP_ordered_start (void)
  void GOMP_ordered_end (void)
@end smallexample



@node Implementing SECTIONS construct
@section Implementing SECTIONS construct

A block such as

@smallexample
  #pragma omp sections
  @{
    #pragma omp section
    stmt1;
    #pragma omp section
    stmt2;
    #pragma omp section
    stmt3;
  @}
@end smallexample

becomes

@smallexample
  for (i = GOMP_sections_start (3); i != 0; i = GOMP_sections_next ())
    switch (i)
      @{
      case 1:
        stmt1;
        break;
      case 2:
        stmt2;
        break;
      case 3:
        stmt3;
        break;
      @}
  GOMP_barrier ();
@end smallexample


@node Implementing SINGLE construct
@section Implementing SINGLE construct

A block like

@smallexample
  #pragma omp single
  @{
    body;
  @}
@end smallexample

becomes

@smallexample
  if (GOMP_single_start ())
    body;
  GOMP_barrier ();
@end smallexample

while

@smallexample
  #pragma omp single copyprivate(x)
    body;
@end smallexample

becomes

@smallexample
  datap = GOMP_single_copy_start ();
  if (datap == NULL)
    @{
      body;
      data.x = x;
      GOMP_single_copy_end (&data);
    @}
  else
    x = datap->x;
  GOMP_barrier ();
@end smallexample



@node Implementing OpenACC's PARALLEL construct
@section Implementing OpenACC's PARALLEL construct

@smallexample
  void GOACC_parallel ()
@end smallexample



@c ---------------------------------------------------------------------
@c Reporting Bugs
@c ---------------------------------------------------------------------

@node Reporting Bugs
@chapter Reporting Bugs

Bugs in
the GNU Offloading and Multi Processing Runtime Library should
be reported via @uref{http://gcc.gnu.org/bugzilla/, Bugzilla}. Please add
"openacc", "openmp", or both to the keywords field in the bug
report, as appropriate.



@c ---------------------------------------------------------------------
@c GNU General Public License
@c ---------------------------------------------------------------------

@include gpl_v3.texi



@c ---------------------------------------------------------------------
@c GNU Free Documentation License
@c ---------------------------------------------------------------------

@include fdl.texi



@c ---------------------------------------------------------------------
@c Funding Free Software
@c ---------------------------------------------------------------------

@include funding.texi

@c ---------------------------------------------------------------------
@c Index
@c ---------------------------------------------------------------------

@node Library Index
@unnumbered Library Index

@printindex cp

@bye