\input texinfo @c -*-texinfo-*-

@c %**start of header
@setfilename libgomp.info
@settitle GNU libgomp
@c %**end of header


@copying
Copyright @copyright{} 2006-2019 Free Software Foundation, Inc.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with the
Invariant Sections being ``Funding Free Software'', the Front-Cover
texts being (a) (see below), and with the Back-Cover Texts being (b)
(see below).  A copy of the license is included in the section entitled
``GNU Free Documentation License''.

(a) The FSF's Front-Cover Text is:

     A GNU Manual

(b) The FSF's Back-Cover Text is:

     You have freedom to copy and modify this GNU Manual, like GNU
     software.  Copies published by the Free Software Foundation raise
     funds for GNU development.
@end copying

@ifinfo
@dircategory GNU Libraries
@direntry
* libgomp: (libgomp).          GNU Offloading and Multi Processing Runtime Library.
@end direntry

This manual documents libgomp, the GNU Offloading and Multi Processing
Runtime Library.  This is the GNU implementation of the OpenMP and
OpenACC APIs for parallel and accelerator programming in C/C++ and
Fortran.

Published by the Free Software Foundation
51 Franklin Street, Fifth Floor
Boston, MA 02110-1301 USA

@insertcopying
@end ifinfo



@setchapternewpage odd

@titlepage
@title GNU Offloading and Multi Processing Runtime Library
@subtitle The GNU OpenMP and OpenACC Implementation
@page
@vskip 0pt plus 1filll
@comment For the @value{version-GCC} Version*
@sp 1
Published by the Free Software Foundation @*
51 Franklin Street, Fifth Floor@*
Boston, MA 02110-1301, USA@*
@sp 1
@insertcopying
@end titlepage

@summarycontents
@contents
@page


@node Top
@top Introduction
@cindex Introduction

This manual documents the usage of libgomp, the GNU Offloading and
Multi Processing Runtime Library.  This includes the GNU
implementation of the @uref{https://www.openmp.org, OpenMP} Application
Programming Interface (API) for multi-platform shared-memory parallel
programming in C/C++ and Fortran, and the GNU implementation of the
@uref{https://www.openacc.org, OpenACC} Application Programming
Interface (API) for offloading of code to accelerator devices in C/C++
and Fortran.

Originally, libgomp implemented the GNU OpenMP Runtime Library.  Support
for OpenACC and offloading (both via OpenACC and via OpenMP 4's
@code{target} construct) was added later, and the library was renamed
the GNU Offloading and Multi Processing Runtime Library.



@comment
@comment  When you add a new menu item, please keep the right hand
@comment  aligned to the same column.  Do not use tabs.  This provides
@comment  better formatting.
@comment
@menu
* Enabling OpenMP::            How to enable OpenMP for your applications.
* OpenMP Runtime Library Routines: Runtime Library Routines.
                               The OpenMP runtime application programming
                               interface.
* OpenMP Environment Variables: Environment Variables.
                               Influencing OpenMP runtime behavior with
                               environment variables.
* Enabling OpenACC::           How to enable OpenACC for your
                               applications.
* OpenACC Runtime Library Routines:: The OpenACC runtime application
                               programming interface.
* OpenACC Environment Variables:: Influencing OpenACC runtime behavior with
                               environment variables.
* CUDA Streams Usage::         Notes on the implementation of
                               asynchronous operations.
* OpenACC Library Interoperability:: OpenACC library interoperability with the
                               NVIDIA CUBLAS library.
* The libgomp ABI::            Notes on the external ABI presented by libgomp.
* Reporting Bugs::             How to report bugs in the GNU Offloading and
                               Multi Processing Runtime Library.
* Copying::                    GNU general public license says
                               how you can copy and share libgomp.
* GNU Free Documentation License::
                               How you can copy and share this manual.
* Funding::                    How to help assure continued work for free
                               software.
* Library Index::              Index of this documentation.
@end menu


@c ---------------------------------------------------------------------
@c Enabling OpenMP
@c ---------------------------------------------------------------------

@node Enabling OpenMP
@chapter Enabling OpenMP

To activate the OpenMP extensions for C/C++ and Fortran, the compile-time
flag @command{-fopenmp} must be specified.  In C/C++ this enables the
@code{#pragma omp} directives.  In Fortran it enables the @code{!$omp}
directive in free form; the @code{c$omp}, @code{*$omp} and @code{!$omp}
directives in fixed form; the @code{!$} conditional compilation sentinel
in free form; and the @code{c$}, @code{*$} and @code{!$} sentinels in
fixed form.  The flag also arranges for automatic linking of the OpenMP
runtime library (@ref{Runtime Library Routines}).

A complete description of all OpenMP directives accepted may be found in
the @uref{https://www.openmp.org, OpenMP Application Program Interface} manual,
version 4.5.


@c ---------------------------------------------------------------------
@c OpenMP Runtime Library Routines
@c ---------------------------------------------------------------------

@node Runtime Library Routines
@chapter OpenMP Runtime Library Routines

The runtime routines described here are defined by Section 3 of the OpenMP
specification in version 4.5.  The routines are structured in the following
three parts:

@menu
Control threads, processors and the parallel environment.  They have C
linkage and do not throw exceptions.

* omp_get_active_level::        Number of active parallel regions
* omp_get_ancestor_thread_num:: Ancestor thread ID
* omp_get_cancellation::        Whether cancellation support is enabled
* omp_get_default_device::      Get the default device for target regions
* omp_get_dynamic::             Dynamic teams setting
* omp_get_level::               Number of parallel regions
* omp_get_max_active_levels::   Maximum number of active regions
* omp_get_max_task_priority::   Maximum task priority value that can be set
* omp_get_max_threads::         Maximum number of threads of parallel region
* omp_get_nested::              Nested parallel regions
* omp_get_num_devices::         Number of target devices
* omp_get_num_procs::           Number of processors online
* omp_get_num_teams::           Number of teams
* omp_get_num_threads::         Size of the active team
* omp_get_proc_bind::           Whether threads may be moved between CPUs
* omp_get_schedule::            Obtain the runtime scheduling method
* omp_get_team_num::            Get team number
* omp_get_team_size::           Number of threads in a team
* omp_get_thread_limit::        Maximum number of threads
* omp_get_thread_num::          Current thread ID
* omp_in_parallel::             Whether a parallel region is active
* omp_in_final::                Whether in final or included task region
* omp_is_initial_device::       Whether executing on the host device
* omp_set_default_device::      Set the default device for target regions
* omp_set_dynamic::             Enable/disable dynamic teams
* omp_set_max_active_levels::   Limits the number of active parallel regions
* omp_set_nested::              Enable/disable nested parallel regions
* omp_set_num_threads::         Set upper team size limit
* omp_set_schedule::            Set the runtime scheduling method

Initialize, set, test, unset and destroy simple and nested locks.

* omp_init_lock::            Initialize simple lock
* omp_set_lock::             Wait for and set simple lock
* omp_test_lock::            Test and set simple lock if available
* omp_unset_lock::           Unset simple lock
* omp_destroy_lock::         Destroy simple lock
* omp_init_nest_lock::       Initialize nested lock
* omp_set_nest_lock::        Wait for and set nested lock
* omp_test_nest_lock::       Test and set nested lock if available
* omp_unset_nest_lock::      Unset nested lock
* omp_destroy_nest_lock::    Destroy nested lock

Portable, thread-based, wall clock timer.

* omp_get_wtick::            Get timer precision
* omp_get_wtime::            Elapsed wall clock time
@end menu



@node omp_get_active_level
@section @code{omp_get_active_level} -- Number of active parallel regions
@table @asis
@item @emph{Description}:
This function returns the nesting level of the active parallel blocks
that enclose the call.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_active_level(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_active_level()}
@end multitable

@item @emph{See also}:
@ref{omp_get_level}, @ref{omp_get_max_active_levels}, @ref{omp_set_max_active_levels}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.20.
@end table



@node omp_get_ancestor_thread_num
@section @code{omp_get_ancestor_thread_num} -- Ancestor thread ID
@table @asis
@item @emph{Description}:
This function returns the thread identification number for the given
nesting level of the current thread.  For values of @var{level} outside
the range zero to @code{omp_get_level}, -1 is returned; if @var{level} is
@code{omp_get_level}, the result is identical to @code{omp_get_thread_num}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_ancestor_thread_num(int level);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_ancestor_thread_num(level)}
@item                   @tab @code{integer level}
@end multitable

@item @emph{See also}:
@ref{omp_get_level}, @ref{omp_get_thread_num}, @ref{omp_get_team_size}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.18.
@end table



@node omp_get_cancellation
@section @code{omp_get_cancellation} -- Whether cancellation support is enabled
@table @asis
@item @emph{Description}:
This function returns @code{true} if cancellation is activated, @code{false}
otherwise.  Here, @code{true} and @code{false} represent their language-specific
counterparts.  Unless @env{OMP_CANCELLATION} is set true, cancellations are
deactivated.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_cancellation(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_get_cancellation()}
@end multitable

@item @emph{See also}:
@ref{OMP_CANCELLATION}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.9.
@end table



@node omp_get_default_device
@section @code{omp_get_default_device} -- Get the default device for target regions
@table @asis
@item @emph{Description}:
Get the default device for target regions without a device clause.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_default_device(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_default_device()}
@end multitable

@item @emph{See also}:
@ref{OMP_DEFAULT_DEVICE}, @ref{omp_set_default_device}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.30.
@end table



@node omp_get_dynamic
@section @code{omp_get_dynamic} -- Dynamic teams setting
@table @asis
@item @emph{Description}:
This function returns @code{true} if enabled, @code{false} otherwise.
Here, @code{true} and @code{false} represent their language-specific
counterparts.

The dynamic team setting may be initialized at startup by the
@env{OMP_DYNAMIC} environment variable or at runtime using
@code{omp_set_dynamic}.  If undefined, dynamic adjustment is
disabled by default.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_dynamic(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_get_dynamic()}
@end multitable

@item @emph{See also}:
@ref{omp_set_dynamic}, @ref{OMP_DYNAMIC}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.8.
@end table



@node omp_get_level
@section @code{omp_get_level} -- Obtain the current nesting level
@table @asis
@item @emph{Description}:
This function returns the nesting level of the parallel blocks
that enclose the call.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_level(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_level()}
@end multitable

@item @emph{See also}:
@ref{omp_get_active_level}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.17.
@end table



@node omp_get_max_active_levels
@section @code{omp_get_max_active_levels} -- Maximum number of active regions
@table @asis
@item @emph{Description}:
This function obtains the maximum allowed number of nested, active parallel regions.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_max_active_levels(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_max_active_levels()}
@end multitable

@item @emph{See also}:
@ref{omp_set_max_active_levels}, @ref{omp_get_active_level}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.16.
@end table


@node omp_get_max_task_priority
@section @code{omp_get_max_task_priority} -- Maximum priority value that can be set for tasks
@table @asis
@item @emph{Description}:
This function obtains the maximum allowed priority number for tasks.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_max_task_priority(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_max_task_priority()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.29.
@end table


@node omp_get_max_threads
@section @code{omp_get_max_threads} -- Maximum number of threads of parallel region
@table @asis
@item @emph{Description}:
Return the maximum number of threads used for a parallel region that
does not use the clause @code{num_threads}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_max_threads(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_max_threads()}
@end multitable

@item @emph{See also}:
@ref{omp_set_num_threads}, @ref{omp_set_dynamic}, @ref{omp_get_thread_limit}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.3.
@end table



@node omp_get_nested
@section @code{omp_get_nested} -- Nested parallel regions
@table @asis
@item @emph{Description}:
This function returns @code{true} if nested parallel regions are
enabled, @code{false} otherwise.  Here, @code{true} and @code{false}
represent their language-specific counterparts.

Nested parallel regions may be initialized at startup by the
@env{OMP_NESTED} environment variable or at runtime using
@code{omp_set_nested}.  If undefined, nested parallel regions are
disabled by default.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_nested(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_get_nested()}
@end multitable

@item @emph{See also}:
@ref{omp_set_nested}, @ref{OMP_NESTED}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.11.
@end table



@node omp_get_num_devices
@section @code{omp_get_num_devices} -- Number of target devices
@table @asis
@item @emph{Description}:
Returns the number of target devices.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_num_devices(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_num_devices()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.31.
@end table



@node omp_get_num_procs
@section @code{omp_get_num_procs} -- Number of processors online
@table @asis
@item @emph{Description}:
Returns the number of processors online on the current device.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_num_procs(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_num_procs()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.5.
@end table



@node omp_get_num_teams
@section @code{omp_get_num_teams} -- Number of teams
@table @asis
@item @emph{Description}:
Returns the number of teams in the current team region.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_num_teams(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_num_teams()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.32.
@end table



@node omp_get_num_threads
@section @code{omp_get_num_threads} -- Size of the active team
@table @asis
@item @emph{Description}:
Returns the number of threads in the current team.  In a sequential section of
the program @code{omp_get_num_threads} returns 1.

The default team size may be initialized at startup by the
@env{OMP_NUM_THREADS} environment variable.  At runtime, the size
of the current team may be set either by the @code{num_threads}
clause or by @code{omp_set_num_threads}.  If none of the above were
used to define a specific value and @env{OMP_DYNAMIC} is disabled,
one thread per CPU online is used.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_num_threads(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_num_threads()}
@end multitable

@item @emph{See also}:
@ref{omp_get_max_threads}, @ref{omp_set_num_threads}, @ref{OMP_NUM_THREADS}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.2.
@end table



@node omp_get_proc_bind
@section @code{omp_get_proc_bind} -- Whether threads may be moved between CPUs
@table @asis
@item @emph{Description}:
This function returns the currently active thread affinity policy, which is
set via @env{OMP_PROC_BIND}.  Possible values are @code{omp_proc_bind_false},
@code{omp_proc_bind_true}, @code{omp_proc_bind_master},
@code{omp_proc_bind_close} and @code{omp_proc_bind_spread}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{omp_proc_bind_t omp_get_proc_bind(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer(kind=omp_proc_bind_kind) function omp_get_proc_bind()}
@end multitable

@item @emph{See also}:
@ref{OMP_PROC_BIND}, @ref{OMP_PLACES}, @ref{GOMP_CPU_AFFINITY}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.22.
@end table



@node omp_get_schedule
@section @code{omp_get_schedule} -- Obtain the runtime scheduling method
@table @asis
@item @emph{Description}:
Obtain the runtime scheduling method.  The @var{kind} argument will be
set to the value @code{omp_sched_static}, @code{omp_sched_dynamic},
@code{omp_sched_guided} or @code{omp_sched_auto}.  The second argument,
@var{chunk_size}, is set to the chunk size.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_get_schedule(omp_sched_t *kind, int *chunk_size);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_get_schedule(kind, chunk_size)}
@item                   @tab @code{integer(kind=omp_sched_kind) kind}
@item                   @tab @code{integer chunk_size}
@end multitable

@item @emph{See also}:
@ref{omp_set_schedule}, @ref{OMP_SCHEDULE}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.13.
@end table



@node omp_get_team_num
@section @code{omp_get_team_num} -- Get team number
@table @asis
@item @emph{Description}:
Returns the team number of the calling thread.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_team_num(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_team_num()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.33.
@end table



@node omp_get_team_size
@section @code{omp_get_team_size} -- Number of threads in a team
@table @asis
@item @emph{Description}:
This function returns the number of threads in the thread team to which
either the current thread or its ancestor belongs.  For values of
@var{level} outside the range zero to @code{omp_get_level}, -1 is
returned; if @var{level} is zero, 1 is returned; and if @var{level} is
@code{omp_get_level}, the result is identical to
@code{omp_get_num_threads}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_team_size(int level);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_team_size(level)}
@item                   @tab @code{integer level}
@end multitable

@item @emph{See also}:
@ref{omp_get_num_threads}, @ref{omp_get_level}, @ref{omp_get_ancestor_thread_num}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.19.
@end table



@node omp_get_thread_limit
@section @code{omp_get_thread_limit} -- Maximum number of threads
@table @asis
@item @emph{Description}:
Return the maximum number of threads of the program.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_thread_limit(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_thread_limit()}
@end multitable

@item @emph{See also}:
@ref{omp_get_max_threads}, @ref{OMP_THREAD_LIMIT}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.14.
@end table



@node omp_get_thread_num
@section @code{omp_get_thread_num} -- Current thread ID
@table @asis
@item @emph{Description}:
Returns a unique thread identification number within the current team.
In sequential parts of the program, @code{omp_get_thread_num}
always returns 0.  In parallel regions the return value varies
from 0 to @code{omp_get_num_threads}-1 inclusive.  The return
value of the master thread of a team is always 0.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_thread_num(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_thread_num()}
@end multitable

@item @emph{See also}:
@ref{omp_get_num_threads}, @ref{omp_get_ancestor_thread_num}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.4.
@end table



@node omp_in_parallel
@section @code{omp_in_parallel} -- Whether a parallel region is active
@table @asis
@item @emph{Description}:
This function returns @code{true} if currently running in parallel,
@code{false} otherwise.  Here, @code{true} and @code{false} represent
their language-specific counterparts.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_in_parallel(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_in_parallel()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.6.
@end table


@node omp_in_final
@section @code{omp_in_final} -- Whether in final or included task region
@table @asis
@item @emph{Description}:
This function returns @code{true} if currently running in a final
or included task region, @code{false} otherwise.  Here, @code{true}
and @code{false} represent their language-specific counterparts.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_in_final(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_in_final()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.21.
@end table



@node omp_is_initial_device
@section @code{omp_is_initial_device} -- Whether executing on the host device
@table @asis
@item @emph{Description}:
This function returns @code{true} if currently running on the host device,
@code{false} otherwise.  Here, @code{true} and @code{false} represent
their language-specific counterparts.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_is_initial_device(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_is_initial_device()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.34.
@end table



@node omp_set_default_device
@section @code{omp_set_default_device} -- Set the default device for target regions
@table @asis
@item @emph{Description}:
Set the default device for target regions without a device clause.  The
argument shall be a nonnegative device number.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_default_device(int device_num);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_default_device(device_num)}
@item                   @tab @code{integer device_num}
@end multitable

@item @emph{See also}:
@ref{OMP_DEFAULT_DEVICE}, @ref{omp_get_default_device}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.29.
@end table



@node omp_set_dynamic
@section @code{omp_set_dynamic} -- Enable/disable dynamic teams
@table @asis
@item @emph{Description}:
Enable or disable the dynamic adjustment of the number of threads
within a team.  The function takes the language-specific equivalent
of @code{true} and @code{false}, where @code{true} enables dynamic
adjustment of team sizes and @code{false} disables it.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_dynamic(int dynamic_threads);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_dynamic(dynamic_threads)}
@item                   @tab @code{logical, intent(in) :: dynamic_threads}
@end multitable

@item @emph{See also}:
@ref{OMP_DYNAMIC}, @ref{omp_get_dynamic}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.7.
@end table

872
873
874@node omp_set_max_active_levels
875@section @code{omp_set_max_active_levels} -- Limits the number of active parallel regions
876@table @asis
877@item @emph{Description}:
878This function limits the maximum allowed number of nested, active
879parallel regions.
880
881@item @emph{C/C++}
882@multitable @columnfractions .20 .80
883@item @emph{Prototype}: @tab @code{void omp_set_max_active_levels(int max_levels);}
884@end multitable
885
886@item @emph{Fortran}:
887@multitable @columnfractions .20 .80
888@item @emph{Interface}: @tab @code{subroutine omp_set_max_active_levels(max_levels)}
889@item                   @tab @code{integer max_levels}
890@end multitable
891
892@item @emph{See also}:
893@ref{omp_get_max_active_levels}, @ref{omp_get_active_level}
894
895@item @emph{Reference}:
896@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.15.
897@end table



@node omp_set_nested
@section @code{omp_set_nested} -- Enable/disable nested parallel regions
@table @asis
@item @emph{Description}:
Enable or disable nested parallel regions, i.e., whether team members
are allowed to create new teams.  The function takes the language-specific
equivalent of @code{true} and @code{false}, where @code{true} enables
nested parallel regions and @code{false} disables them.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_nested(int nested);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_nested(nested)}
@item                   @tab @code{logical, intent(in) :: nested}
@end multitable

@item @emph{See also}:
@ref{OMP_NESTED}, @ref{omp_get_nested}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.10.
@end table



@node omp_set_num_threads
@section @code{omp_set_num_threads} -- Set upper team size limit
@table @asis
@item @emph{Description}:
Specifies the number of threads used by default in subsequent parallel
regions that do not specify a @code{num_threads} clause.  The
argument of @code{omp_set_num_threads} shall be a positive integer.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_num_threads(int num_threads);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_num_threads(num_threads)}
@item                   @tab @code{integer, intent(in) :: num_threads}
@end multitable

@item @emph{See also}:
@ref{OMP_NUM_THREADS}, @ref{omp_get_num_threads}, @ref{omp_get_max_threads}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.1.
@end table



@node omp_set_schedule
@section @code{omp_set_schedule} -- Set the runtime scheduling method
@table @asis
@item @emph{Description}:
Sets the runtime scheduling method.  The @var{kind} argument can have the
value @code{omp_sched_static}, @code{omp_sched_dynamic},
@code{omp_sched_guided} or @code{omp_sched_auto}.  Except for
@code{omp_sched_auto}, the chunk size is set to the value of
@var{chunk_size} if positive, or to the default value if zero or negative.
For @code{omp_sched_auto} the @var{chunk_size} argument is ignored.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_schedule(omp_sched_t kind, int chunk_size);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_schedule(kind, chunk_size)}
@item                   @tab @code{integer(kind=omp_sched_kind) kind}
@item                   @tab @code{integer chunk_size}
@end multitable

@item @emph{See also}:
@ref{omp_get_schedule}, @ref{OMP_SCHEDULE}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.12.
@end table



@node omp_init_lock
@section @code{omp_init_lock} -- Initialize simple lock
@table @asis
@item @emph{Description}:
Initialize a simple lock.  After initialization, the lock is in
an unlocked state.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_init_lock(omp_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_init_lock(svar)}
@item                   @tab @code{integer(omp_lock_kind), intent(out) :: svar}
@end multitable

@item @emph{See also}:
@ref{omp_destroy_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.1.
@end table



@node omp_set_lock
@section @code{omp_set_lock} -- Wait for and set simple lock
@table @asis
@item @emph{Description}:
Before setting a simple lock, the lock variable must be initialized by
@code{omp_init_lock}.  The calling thread is blocked until the lock
is available.  If the lock is already held by the current thread,
a deadlock occurs.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_lock(omp_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_lock(svar)}
@item                   @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
@end multitable

@item @emph{See also}:
@ref{omp_init_lock}, @ref{omp_test_lock}, @ref{omp_unset_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.4.
@end table



@node omp_test_lock
@section @code{omp_test_lock} -- Test and set simple lock if available
@table @asis
@item @emph{Description}:
Before setting a simple lock, the lock variable must be initialized by
@code{omp_init_lock}.  Contrary to @code{omp_set_lock}, @code{omp_test_lock}
does not block if the lock is not available.  This function returns
@code{true} upon success, @code{false} otherwise.  Here, @code{true} and
@code{false} represent their language-specific counterparts.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_test_lock(omp_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_test_lock(svar)}
@item                   @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
@end multitable

@item @emph{See also}:
@ref{omp_init_lock}, @ref{omp_set_lock}, @ref{omp_unset_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.6.
@end table



@node omp_unset_lock
@section @code{omp_unset_lock} -- Unset simple lock
@table @asis
@item @emph{Description}:
A simple lock about to be unset must have been locked by @code{omp_set_lock}
or @code{omp_test_lock} before.  In addition, the lock must be held by the
thread calling @code{omp_unset_lock}.  Then, the lock becomes unlocked.  If one
or more threads attempted to set the lock before, one of them is chosen to
acquire it.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_unset_lock(omp_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_unset_lock(svar)}
@item                   @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
@end multitable

@item @emph{See also}:
@ref{omp_set_lock}, @ref{omp_test_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.5.
@end table



@node omp_destroy_lock
@section @code{omp_destroy_lock} -- Destroy simple lock
@table @asis
@item @emph{Description}:
Destroy a simple lock.  In order to be destroyed, a simple lock must be
in the unlocked state.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_destroy_lock(omp_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_destroy_lock(svar)}
@item                   @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
@end multitable

@item @emph{See also}:
@ref{omp_init_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.3.
@end table



@node omp_init_nest_lock
@section @code{omp_init_nest_lock} -- Initialize nested lock
@table @asis
@item @emph{Description}:
Initialize a nested lock.  After initialization, the lock is in
an unlocked state and the nesting count is set to zero.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_init_nest_lock(omp_nest_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_init_nest_lock(nvar)}
@item                   @tab @code{integer(omp_nest_lock_kind), intent(out) :: nvar}
@end multitable

@item @emph{See also}:
@ref{omp_destroy_nest_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.1.
@end table


@node omp_set_nest_lock
@section @code{omp_set_nest_lock} -- Wait for and set nested lock
@table @asis
@item @emph{Description}:
Before setting a nested lock, the lock variable must be initialized by
@code{omp_init_nest_lock}.  The calling thread is blocked until the lock
is available.  If the lock is already held by the current thread, the
nesting count for the lock is incremented.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_nest_lock(omp_nest_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_nest_lock(nvar)}
@item                   @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
@end multitable

@item @emph{See also}:
@ref{omp_init_nest_lock}, @ref{omp_unset_nest_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.4.
@end table



@node omp_test_nest_lock
@section @code{omp_test_nest_lock} -- Test and set nested lock if available
@table @asis
@item @emph{Description}:
Before setting a nested lock, the lock variable must be initialized by
@code{omp_init_nest_lock}.  Contrary to @code{omp_set_nest_lock},
@code{omp_test_nest_lock} does not block if the lock is not available.
If the lock is already held by the current thread, the new nesting count
is returned.  Otherwise, the return value equals zero.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_test_nest_lock(omp_nest_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_test_nest_lock(nvar)}
@item                   @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
@end multitable

@item @emph{See also}:
@ref{omp_init_nest_lock}, @ref{omp_set_nest_lock}, @ref{omp_unset_nest_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.6.
@end table



@node omp_unset_nest_lock
@section @code{omp_unset_nest_lock} -- Unset nested lock
@table @asis
@item @emph{Description}:
A nested lock about to be unset must have been locked by @code{omp_set_nest_lock}
or @code{omp_test_nest_lock} before.  In addition, the lock must be held by the
thread calling @code{omp_unset_nest_lock}.  If the nesting count drops to zero, the
lock becomes unlocked.  If one or more threads attempted to set the lock before,
one of them is chosen to acquire it.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_unset_nest_lock(omp_nest_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_unset_nest_lock(nvar)}
@item                   @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
@end multitable

@item @emph{See also}:
@ref{omp_set_nest_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.5.
@end table



@node omp_destroy_nest_lock
@section @code{omp_destroy_nest_lock} -- Destroy nested lock
@table @asis
@item @emph{Description}:
Destroy a nested lock.  In order to be destroyed, a nested lock must be
in the unlocked state and its nesting count must equal zero.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_destroy_nest_lock(omp_nest_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_destroy_nest_lock(nvar)}
@item                   @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
@end multitable

@item @emph{See also}:
@ref{omp_init_nest_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.3.
@end table



@node omp_get_wtick
@section @code{omp_get_wtick} -- Get timer precision
@table @asis
@item @emph{Description}:
Gets the timer precision, i.e., the number of seconds between two
successive clock ticks.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{double omp_get_wtick(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{double precision function omp_get_wtick()}
@end multitable

@item @emph{See also}:
@ref{omp_get_wtime}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.4.2.
@end table



@node omp_get_wtime
@section @code{omp_get_wtime} -- Elapsed wall clock time
@table @asis
@item @emph{Description}:
Elapsed wall clock time in seconds.  The time is measured per thread; no
guarantee can be made that two distinct threads measure the same time.
Time is measured from some ``time in the past'', which is an arbitrary time
guaranteed not to change during the execution of the program.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{double omp_get_wtime(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{double precision function omp_get_wtime()}
@end multitable

@item @emph{See also}:
@ref{omp_get_wtick}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.4.1.
@end table



@c ---------------------------------------------------------------------
@c OpenMP Environment Variables
@c ---------------------------------------------------------------------

@node Environment Variables
@chapter OpenMP Environment Variables

The environment variables beginning with @env{OMP_} are defined by
section 4 of the OpenMP specification in version 4.5, while those
beginning with @env{GOMP_} are GNU extensions.

@menu
* OMP_CANCELLATION::        Set whether cancellation is activated
* OMP_DISPLAY_ENV::         Show OpenMP version and environment variables
* OMP_DEFAULT_DEVICE::      Set the device used in target regions
* OMP_DYNAMIC::             Dynamic adjustment of threads
* OMP_MAX_ACTIVE_LEVELS::   Set the maximum number of nested parallel regions
* OMP_MAX_TASK_PRIORITY::   Set the maximum task priority value
* OMP_NESTED::              Nested parallel regions
* OMP_NUM_THREADS::         Specifies the number of threads to use
* OMP_PROC_BIND::           Whether threads may be moved between CPUs
* OMP_PLACES::              Specifies on which CPUs the threads should be placed
* OMP_STACKSIZE::           Set default thread stack size
* OMP_SCHEDULE::            How threads are scheduled
* OMP_THREAD_LIMIT::        Set the maximum number of threads
* OMP_WAIT_POLICY::         How waiting threads are handled
* GOMP_CPU_AFFINITY::       Bind threads to specific CPUs
* GOMP_DEBUG::              Enable debugging output
* GOMP_STACKSIZE::          Set default thread stack size
* GOMP_SPINCOUNT::          Set the busy-wait spin count
* GOMP_RTEMS_THREAD_POOLS:: Set the RTEMS specific thread pools
@end menu


@node OMP_CANCELLATION
@section @env{OMP_CANCELLATION} -- Set whether cancellation is activated
@cindex Environment Variable
@table @asis
@item @emph{Description}:
If set to @code{TRUE}, cancellation is activated.  If set to @code{FALSE} or
if unset, cancellation is disabled and the @code{cancel} construct is ignored.

@item @emph{See also}:
@ref{omp_get_cancellation}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.11
@end table



@node OMP_DISPLAY_ENV
@section @env{OMP_DISPLAY_ENV} -- Show OpenMP version and environment variables
@cindex Environment Variable
@table @asis
@item @emph{Description}:
If set to @code{TRUE}, the OpenMP version number and the values
associated with the OpenMP environment variables are printed to @code{stderr}.
If set to @code{VERBOSE}, it additionally shows the value of the environment
variables which are GNU extensions.  If undefined or set to @code{FALSE},
this information will not be shown.

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.12
@end table



@node OMP_DEFAULT_DEVICE
@section @env{OMP_DEFAULT_DEVICE} -- Set the device used in target regions
@cindex Environment Variable
@table @asis
@item @emph{Description}:
Set to choose the device which is used in a @code{target} region, unless the
value is overridden by @code{omp_set_default_device} or by a @code{device}
clause.  The value shall be the nonnegative device number.  If no device with
the given device number exists, the code is executed on the host.  If unset,
device number 0 will be used.

@item @emph{See also}:
@ref{omp_get_default_device}, @ref{omp_set_default_device}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.13
@end table



@node OMP_DYNAMIC
@section @env{OMP_DYNAMIC} -- Dynamic adjustment of threads
@cindex Environment Variable
@table @asis
@item @emph{Description}:
Enable or disable the dynamic adjustment of the number of threads
within a team.  The value of this environment variable shall be
@code{TRUE} or @code{FALSE}.  If undefined, dynamic adjustment is
disabled by default.

@item @emph{See also}:
@ref{omp_set_dynamic}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.3
@end table



@node OMP_MAX_ACTIVE_LEVELS
@section @env{OMP_MAX_ACTIVE_LEVELS} -- Set the maximum number of nested parallel regions
@cindex Environment Variable
@table @asis
@item @emph{Description}:
Specifies the initial value for the maximum number of nested parallel
regions.  The value of this variable shall be a positive integer.
If undefined, the number of active levels is unlimited.

@item @emph{See also}:
@ref{omp_set_max_active_levels}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.9
@end table



@node OMP_MAX_TASK_PRIORITY
@section @env{OMP_MAX_TASK_PRIORITY} -- Set the maximum task priority value
@cindex Environment Variable
@table @asis
@item @emph{Description}:
Specifies the initial value for the maximum priority value that can be
set for a task.  The value of this variable shall be a non-negative
integer.  If undefined, the default priority is 0.

@item @emph{See also}:
@ref{omp_get_max_task_priority}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.14
@end table



@node OMP_NESTED
@section @env{OMP_NESTED} -- Nested parallel regions
@cindex Environment Variable
@cindex Implementation specific setting
@table @asis
@item @emph{Description}:
Enable or disable nested parallel regions, i.e., whether team members
are allowed to create new teams.  The value of this environment variable
shall be @code{TRUE} or @code{FALSE}.  If undefined, nested parallel
regions are disabled by default.

@item @emph{See also}:
@ref{omp_set_nested}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.6
@end table



@node OMP_NUM_THREADS
@section @env{OMP_NUM_THREADS} -- Specifies the number of threads to use
@cindex Environment Variable
@cindex Implementation specific setting
@table @asis
@item @emph{Description}:
Specifies the default number of threads to use in parallel regions.  The
value of this variable shall be a comma-separated list of positive integers;
each value specifies the number of threads to use for the corresponding
nesting level.  If undefined, one thread per CPU is used.

@item @emph{See also}:
@ref{omp_set_num_threads}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.2
@end table



@node OMP_PROC_BIND
@section @env{OMP_PROC_BIND} -- Whether threads may be moved between CPUs
@cindex Environment Variable
@table @asis
@item @emph{Description}:
Specifies whether threads may be moved between processors.  If set to
@code{TRUE}, OpenMP threads should not be moved; if set to @code{FALSE}
they may be moved.  Alternatively, a comma separated list with the
values @code{MASTER}, @code{CLOSE} and @code{SPREAD} can be used to specify
the thread affinity policy for the corresponding nesting level.  With
@code{MASTER} the worker threads are in the same place partition as the
master thread.  With @code{CLOSE} those are kept close to the master thread
in contiguous place partitions.  And with @code{SPREAD} a sparse distribution
across the place partitions is used.

When undefined, @env{OMP_PROC_BIND} defaults to @code{TRUE} when
@env{OMP_PLACES} or @env{GOMP_CPU_AFFINITY} is set and @code{FALSE} otherwise.

@item @emph{See also}:
@ref{OMP_PLACES}, @ref{GOMP_CPU_AFFINITY}, @ref{omp_get_proc_bind}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.4
@end table



@node OMP_PLACES
@section @env{OMP_PLACES} -- Specifies on which CPUs the threads should be placed
@cindex Environment Variable
@table @asis
@item @emph{Description}:
The thread placement can be either specified using an abstract name or by an
explicit list of the places.  The abstract names @code{threads}, @code{cores}
and @code{sockets} can be optionally followed by a positive number in
parentheses, which denotes how many places shall be created.  With
@code{threads} each place corresponds to a single hardware thread; with
@code{cores} to a single core with the corresponding number of hardware
threads; and with @code{sockets} the place corresponds to a single socket.
The resulting placement can be shown by setting the @env{OMP_DISPLAY_ENV}
environment variable.

Alternatively, the placement can be specified explicitly as a comma-separated
list of places.  A place is specified by a set of nonnegative numbers in curly
braces, denoting the hardware threads.  The hardware threads belonging to a
place can either be specified as a comma-separated list of nonnegative thread
numbers or using an interval.  Multiple places can likewise be specified
either by a comma-separated list of places or by an interval.  To specify an
interval, a colon followed by the count is placed after the hardware thread
number or the place.  Optionally, the count can be followed by a colon and
the stride number -- otherwise a unit stride is assumed.  For instance, the
following all specify the same places list:
@code{"@{0,1,2@}, @{3,4,5@}, @{6,7,8@}, @{9,10,11@}"};
@code{"@{0:3@}, @{3:3@}, @{6:3@}, @{9:3@}"}; and @code{"@{0:3@}:4:3"}.

If @env{OMP_PLACES} and @env{GOMP_CPU_AFFINITY} are unset and
@env{OMP_PROC_BIND} is either unset or @code{false}, threads may be moved
between CPUs following no placement policy.

@item @emph{See also}:
@ref{OMP_PROC_BIND}, @ref{GOMP_CPU_AFFINITY}, @ref{omp_get_proc_bind},
@ref{OMP_DISPLAY_ENV}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.5
@end table



@node OMP_STACKSIZE
@section @env{OMP_STACKSIZE} -- Set default thread stack size
@cindex Environment Variable
@table @asis
@item @emph{Description}:
Set the default thread stack size in kilobytes, unless the number
is suffixed by @code{B}, @code{K}, @code{M} or @code{G}, in which
case the size is, respectively, in bytes, kilobytes, megabytes
or gigabytes.  This is different from @code{pthread_attr_setstacksize},
which takes the size in bytes as argument.  If the stack size cannot
be set due to system constraints, an error is reported and the initial
stack size is left unchanged.  If undefined, the stack size is system
dependent.

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.7
@end table



@node OMP_SCHEDULE
@section @env{OMP_SCHEDULE} -- How threads are scheduled
@cindex Environment Variable
@cindex Implementation specific setting
@table @asis
@item @emph{Description}:
Allows specifying the schedule type and chunk size.
The value of the variable shall have the form @code{type[,chunk]}, where
@code{type} is one of @code{static}, @code{dynamic}, @code{guided} or
@code{auto}.  The optional @code{chunk} size shall be a positive integer.
If undefined, dynamic scheduling and a chunk size of 1 is used.

@item @emph{See also}:
@ref{omp_set_schedule}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Sections 2.7.1.1 and 4.1
@end table
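For example, a shell fragment (the program name @code{./my_omp_program} is
only a placeholder) selecting the schedule for @code{schedule(runtime)}
loops at launch time:

```shell
# Request guided scheduling with a chunk size of 16 for all
# schedule(runtime) loops in the program.
OMP_SCHEDULE="guided,16" ./my_omp_program

# The schedule type alone is also valid; the default chunk size applies.
OMP_SCHEDULE="dynamic" ./my_omp_program
```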



@node OMP_THREAD_LIMIT
@section @env{OMP_THREAD_LIMIT} -- Set the maximum number of threads
@cindex Environment Variable
@table @asis
@item @emph{Description}:
Specifies the number of threads to use for the whole program.  The
value of this variable shall be a positive integer.  If undefined,
the number of threads is not limited.

@item @emph{See also}:
@ref{OMP_NUM_THREADS}, @ref{omp_get_thread_limit}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.10
@end table



@node OMP_WAIT_POLICY
@section @env{OMP_WAIT_POLICY} -- How waiting threads are handled
@cindex Environment Variable
@table @asis
@item @emph{Description}:
Specifies whether waiting threads should be active or passive.  If
the value is @code{PASSIVE}, waiting threads should not consume CPU
power while waiting; the value @code{ACTIVE} specifies that
they should.  If undefined, threads wait actively for a short time
before waiting passively.

@item @emph{See also}:
@ref{GOMP_SPINCOUNT}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.8
@end table



@node GOMP_CPU_AFFINITY
@section @env{GOMP_CPU_AFFINITY} -- Bind threads to specific CPUs
@cindex Environment Variable
@table @asis
@item @emph{Description}:
Binds threads to specific CPUs.  The variable should contain a space-separated
or comma-separated list of CPUs.  This list may contain different kinds of
entries: either single CPU numbers in any order, a range of CPUs (M-N)
or a range with some stride (M-N:S).  CPU numbers are zero based.  For example,
@code{GOMP_CPU_AFFINITY="0 3 1-2 4-15:2"} will bind the initial thread
to CPU 0, the second to CPU 3, the third to CPU 1, the fourth to
CPU 2, the fifth to CPU 4, the sixth through tenth to CPUs 6, 8, 10, 12,
and 14 respectively and then start assigning back from the beginning of
the list.  @code{GOMP_CPU_AFFINITY=0} binds all threads to CPU 0.

There is no libgomp library routine to determine whether a CPU affinity
specification is in effect.  As a workaround, language-specific library
functions, e.g., @code{getenv} in C or @code{GET_ENVIRONMENT_VARIABLE} in
Fortran, may be used to query the setting of the @code{GOMP_CPU_AFFINITY}
environment variable.  A defined CPU affinity on startup cannot be changed
or disabled during the runtime of the application.

If both @env{GOMP_CPU_AFFINITY} and @env{OMP_PROC_BIND} are set,
@env{OMP_PROC_BIND} has a higher precedence.  If neither is set, or if
@env{OMP_PROC_BIND} is set to @code{FALSE}, the host system will handle
the assignment of threads to CPUs.

@item @emph{See also}:
@ref{OMP_PLACES}, @ref{OMP_PROC_BIND}
@end table



@node GOMP_DEBUG
@section @env{GOMP_DEBUG} -- Enable debugging output
@cindex Environment Variable
@table @asis
@item @emph{Description}:
Enable debugging output.  The variable should be set to @code{0}
(disabled, also the default if not set), or @code{1} (enabled).

If enabled, some debugging output will be printed during execution.
This is currently not specified in more detail, and subject to change.
@end table



@node GOMP_STACKSIZE
@section @env{GOMP_STACKSIZE} -- Set default thread stack size
@cindex Environment Variable
@cindex Implementation specific setting
@table @asis
@item @emph{Description}:
Set the default thread stack size in kilobytes.  This is different from
@code{pthread_attr_setstacksize}, which takes the size in bytes as
argument.  If the stack size cannot be set due to system constraints, an
error is reported and the initial stack size is left unchanged.  If undefined,
the stack size is system dependent.

@item @emph{See also}:
@ref{OMP_STACKSIZE}

@item @emph{Reference}:
@uref{http://gcc.gnu.org/ml/gcc-patches/2006-06/msg00493.html,
GCC Patches Mailinglist},
@uref{http://gcc.gnu.org/ml/gcc-patches/2006-06/msg00496.html,
GCC Patches Mailinglist}
@end table



@node GOMP_SPINCOUNT
@section @env{GOMP_SPINCOUNT} -- Set the busy-wait spin count
@cindex Environment Variable
@cindex Implementation specific setting
@table @asis
@item @emph{Description}:
Determines how long a thread waits actively, consuming CPU power,
before waiting passively without consuming CPU power.  The value may be
either @code{INFINITE} or @code{INFINITY} to always wait actively, or an
integer which gives the number of spins of the busy-wait loop.  The
integer may optionally be followed by the following suffixes acting
as multiplication factors: @code{k} (kilo, thousand), @code{M} (mega,
million), @code{G} (giga, billion), or @code{T} (tera, trillion).
If undefined, 0 is used when @env{OMP_WAIT_POLICY} is @code{PASSIVE},
300,000 is used when @env{OMP_WAIT_POLICY} is undefined and
30 billion is used when @env{OMP_WAIT_POLICY} is @code{ACTIVE}.
If there are more OpenMP threads than available CPUs, 1000 and 100
spins are used for @env{OMP_WAIT_POLICY} being @code{ACTIVE} or
undefined, respectively; unless @env{GOMP_SPINCOUNT} is lower
or @env{OMP_WAIT_POLICY} is @code{PASSIVE}.

@item @emph{See also}:
@ref{OMP_WAIT_POLICY}
@end table
1761
1762
1763
@node GOMP_RTEMS_THREAD_POOLS
@section @env{GOMP_RTEMS_THREAD_POOLS} -- Set the RTEMS specific thread pools
@cindex Environment Variable
@cindex Implementation specific setting
@table @asis
@item @emph{Description}:
This environment variable is only used on the RTEMS real-time operating system.
It determines the scheduler instance specific thread pools.  The format for
@env{GOMP_RTEMS_THREAD_POOLS} is a list of optional
@code{<thread-pool-count>[$<priority>]@@<scheduler-name>} configurations
separated by @code{:} where:
@itemize @bullet
@item @code{<thread-pool-count>} is the thread pool count for this scheduler
instance.
@item @code{$<priority>} is an optional priority for the worker threads of a
thread pool according to @code{pthread_setschedparam}.  If a priority
value is omitted, a worker thread inherits the priority of the OpenMP
master thread that created it.  The priority of the worker thread is not
changed after creation, even if a new OpenMP master thread using the worker has
a different priority.
@item @code{@@<scheduler-name>} is the scheduler instance name according to the
RTEMS application configuration.
@end itemize
If no thread pool configuration is specified for a scheduler instance,
each OpenMP master thread of this scheduler instance will use its own
dynamically allocated thread pool.  To limit the worker thread count of the
thread pools, each OpenMP master thread must call @code{omp_set_num_threads}.
@item @emph{Example}:
Let's suppose we have three scheduler instances @code{IO}, @code{WRK0}, and
@code{WRK1} with @env{GOMP_RTEMS_THREAD_POOLS} set to
@code{"1@@WRK0:3$4@@WRK1"}.  Then there are no thread pool restrictions for
scheduler instance @code{IO}.  In the scheduler instance @code{WRK0} there is
one thread pool available.  Since no priority is specified for this scheduler
instance, the worker thread inherits the priority of the OpenMP master thread
that created it.  In the scheduler instance @code{WRK1} there are three thread
pools available and their worker threads run at priority four.
@end table


@c ---------------------------------------------------------------------
@c Enabling OpenACC
@c ---------------------------------------------------------------------

@node Enabling OpenACC
@chapter Enabling OpenACC

To activate the OpenACC extensions for C/C++ and Fortran, the compile-time
flag @option{-fopenacc} must be specified.  This enables the OpenACC directive
@code{#pragma acc} in C/C++ and, for Fortran, @code{!$acc} directives in free
form, @code{c$acc}, @code{*$acc} and @code{!$acc} directives in fixed form,
@code{!$} conditional compilation sentinels in free form and @code{c$},
@code{*$} and @code{!$} sentinels in fixed form.  The flag also
arranges for automatic linking of the OpenACC runtime library
(@ref{OpenACC Runtime Library Routines}).

A complete description of all OpenACC directives accepted may be found in
the @uref{https://www.openacc.org, OpenACC} Application Programming
Interface manual, version 2.0.

Note that this is an experimental feature and subject to
change in future versions of GCC.  See
@uref{https://gcc.gnu.org/wiki/OpenACC} for more information.


@c ---------------------------------------------------------------------
@c OpenACC Runtime Library Routines
@c ---------------------------------------------------------------------

@node OpenACC Runtime Library Routines
@chapter OpenACC Runtime Library Routines

The runtime routines described here are defined by section 3 of the OpenACC
specification, version 2.0.
They have C linkage, and do not throw exceptions.
Generally, they are available only for the host, with the exception of
@code{acc_on_device}, which is available for both the host and the
acceleration device.

@menu
* acc_get_num_devices::         Get number of devices for the given device
                                type.
* acc_set_device_type::         Set type of device accelerator to use.
* acc_get_device_type::         Get type of device accelerator to be used.
* acc_set_device_num::          Set device number to use.
* acc_get_device_num::          Get device number to be used.
* acc_async_test::              Tests for completion of a specific asynchronous
                                operation.
* acc_async_test_all::          Tests for completion of all asynchronous
                                operations.
* acc_wait::                    Wait for completion of a specific asynchronous
                                operation.
* acc_wait_all::                Waits for completion of all asynchronous
                                operations.
* acc_wait_all_async::          Wait for completion of all asynchronous
                                operations.
* acc_wait_async::              Wait for completion of asynchronous operations.
* acc_init::                    Initialize runtime for a specific device type.
* acc_shutdown::                Shuts down the runtime for a specific device
                                type.
* acc_on_device::               Whether executing on a particular device
* acc_malloc::                  Allocate device memory.
* acc_free::                    Free device memory.
* acc_copyin::                  Allocate device memory and copy host memory to
                                it.
* acc_present_or_copyin::       If the data is not present on the device,
                                allocate device memory and copy from host
                                memory.
* acc_create::                  Allocate device memory and map it to host
                                memory.
* acc_present_or_create::       If the data is not present on the device,
                                allocate device memory and map it to host
                                memory.
* acc_copyout::                 Copy device memory to host memory.
* acc_delete::                  Free device memory.
* acc_update_device::           Update device memory from mapped host memory.
* acc_update_self::             Update host memory from mapped device memory.
* acc_map_data::                Map previously allocated device memory to host
                                memory.
* acc_unmap_data::              Unmap device memory from host memory.
* acc_deviceptr::               Get device pointer associated with specific
                                host address.
* acc_hostptr::                 Get host pointer associated with specific
                                device address.
* acc_is_present::              Indicate whether host variable / array is
                                present on device.
* acc_memcpy_to_device::        Copy host memory to device memory.
* acc_memcpy_from_device::      Copy device memory to host memory.

API routines for target platforms.

* acc_get_current_cuda_device:: Get CUDA device handle.
* acc_get_current_cuda_context:: Get CUDA context handle.
* acc_get_cuda_stream::         Get CUDA stream handle.
* acc_set_cuda_stream::         Set CUDA stream handle.
@end menu


@node acc_get_num_devices
@section @code{acc_get_num_devices} -- Get number of devices for given device type
@table @asis
@item @emph{Description}
This function returns a value indicating the number of devices available
for the device type specified in @var{devicetype}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_get_num_devices(acc_device_t devicetype);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function acc_get_num_devices(devicetype)}
@item                   @tab @code{integer(kind=acc_device_kind) devicetype}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.1.
@end table


@node acc_set_device_type
@section @code{acc_set_device_type} -- Set type of device accelerator to use.
@table @asis
@item @emph{Description}
This function indicates to the runtime library which device type, specified
in @var{devicetype}, to use when executing a parallel or kernels region.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_set_device_type(acc_device_t devicetype);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_set_device_type(devicetype)}
@item                   @tab @code{integer(kind=acc_device_kind) devicetype}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.2.
@end table


@node acc_get_device_type
@section @code{acc_get_device_type} -- Get type of device accelerator to be used.
@table @asis
@item @emph{Description}
This function returns what device type will be used when executing a
parallel or kernels region.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_device_t acc_get_device_type(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_get_device_type()}
@item                   @tab @code{integer(kind=acc_device_kind) acc_get_device_type}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.3.
@end table


@node acc_set_device_num
@section @code{acc_set_device_num} -- Set device number to use.
@table @asis
@item @emph{Description}
This function indicates to the runtime which device number, specified
by @var{num}, of the specified device type @var{devicetype} to use.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_set_device_num(int num, acc_device_t devicetype);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_set_device_num(devicenum, devicetype)}
@item                   @tab @code{integer devicenum}
@item                   @tab @code{integer(kind=acc_device_kind) devicetype}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.4.
@end table


@node acc_get_device_num
@section @code{acc_get_device_num} -- Get device number to be used.
@table @asis
@item @emph{Description}
This function returns which device number, associated with the specified
device type @var{devicetype}, will be used when executing a parallel or
kernels region.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_get_device_num(acc_device_t devicetype);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_get_device_num(devicetype)}
@item                   @tab @code{integer(kind=acc_device_kind) devicetype}
@item                   @tab @code{integer acc_get_device_num}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.5.
@end table


@node acc_async_test
@section @code{acc_async_test} -- Test for completion of a specific asynchronous operation.
@table @asis
@item @emph{Description}
This function tests for completion of the asynchronous operation specified
in @var{arg}.  In C/C++, a non-zero value is returned to indicate that the
specified asynchronous operation has completed, while Fortran returns
@code{true}.  If the asynchronous operation has not completed, C/C++ returns
zero and Fortran returns @code{false}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_async_test(int arg);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_async_test(arg)}
@item                   @tab @code{integer(kind=acc_handle_kind) arg}
@item                   @tab @code{logical acc_async_test}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.6.
@end table


@node acc_async_test_all
@section @code{acc_async_test_all} -- Tests for completion of all asynchronous operations.
@table @asis
@item @emph{Description}
This function tests for completion of all asynchronous operations.
In C/C++, a non-zero value is returned to indicate that all asynchronous
operations have completed, while Fortran returns @code{true}.  If
any asynchronous operation has not completed, C/C++ returns zero and
Fortran returns @code{false}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_async_test_all(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_async_test_all()}
@item                   @tab @code{logical acc_async_test_all}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.7.
@end table


@node acc_wait
@section @code{acc_wait} -- Wait for completion of a specific asynchronous operation.
@table @asis
@item @emph{Description}
This function waits for completion of the asynchronous operation
specified in @var{arg}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_wait(int arg);}
@item @emph{Prototype (OpenACC 1.0 compatibility)}: @tab @code{acc_async_wait(int arg);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_wait(arg)}
@item                   @tab @code{integer(acc_handle_kind) arg}
@item @emph{Interface (OpenACC 1.0 compatibility)}: @tab @code{subroutine acc_async_wait(arg)}
@item                                               @tab @code{integer(acc_handle_kind) arg}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.8.
@end table


@node acc_wait_all
@section @code{acc_wait_all} -- Waits for completion of all asynchronous operations.
@table @asis
@item @emph{Description}
This function waits for the completion of all asynchronous operations.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_wait_all(void);}
@item @emph{Prototype (OpenACC 1.0 compatibility)}: @tab @code{acc_async_wait_all(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_wait_all()}
@item @emph{Interface (OpenACC 1.0 compatibility)}: @tab @code{subroutine acc_async_wait_all()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.10.
@end table


@node acc_wait_all_async
@section @code{acc_wait_all_async} -- Wait for completion of all asynchronous operations.
@table @asis
@item @emph{Description}
This function enqueues a wait operation on the queue @var{async} for any
and all asynchronous operations that have been previously enqueued on
any queue.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_wait_all_async(int async);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_wait_all_async(async)}
@item                   @tab @code{integer(acc_handle_kind) async}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.11.
@end table


@node acc_wait_async
@section @code{acc_wait_async} -- Wait for completion of asynchronous operations.
@table @asis
@item @emph{Description}
This function enqueues a wait operation on queue @var{async} for any and all
asynchronous operations enqueued on queue @var{arg}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_wait_async(int arg, int async);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_wait_async(arg, async)}
@item                   @tab @code{integer(acc_handle_kind) arg, async}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.9.
@end table


@node acc_init
@section @code{acc_init} -- Initialize runtime for a specific device type.
@table @asis
@item @emph{Description}
This function initializes the runtime for the device type specified in
@var{devicetype}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_init(acc_device_t devicetype);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_init(devicetype)}
@item                   @tab @code{integer(acc_device_kind) devicetype}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.12.
@end table


@node acc_shutdown
@section @code{acc_shutdown} -- Shuts down the runtime for a specific device type.
@table @asis
@item @emph{Description}
This function shuts down the runtime for the device type specified in
@var{devicetype}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_shutdown(acc_device_t devicetype);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_shutdown(devicetype)}
@item                   @tab @code{integer(acc_device_kind) devicetype}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.13.
@end table


@node acc_on_device
@section @code{acc_on_device} -- Whether executing on a particular device
@table @asis
@item @emph{Description}:
This function returns whether the program is executing on a particular
device specified in @var{devicetype}.  In C/C++, a non-zero value is
returned to indicate that the program is executing on the specified device
type; in Fortran, @code{true} is returned.  If the program is not executing
on the specified device type, C/C++ returns zero and Fortran returns
@code{false}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_on_device(acc_device_t devicetype);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_on_device(devicetype)}
@item                   @tab @code{integer(acc_device_kind) devicetype}
@item                   @tab @code{logical acc_on_device}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.14.
@end table


@node acc_malloc
@section @code{acc_malloc} -- Allocate device memory.
@table @asis
@item @emph{Description}
This function allocates @var{len} bytes of device memory.  It returns
the device address of the allocated memory.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{d_void* acc_malloc(size_t len);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.15.
@end table


@node acc_free
@section @code{acc_free} -- Free device memory.
@table @asis
@item @emph{Description}
Free previously allocated device memory at the device address @code{a}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_free(d_void *a);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.16.
@end table


@node acc_copyin
@section @code{acc_copyin} -- Allocate device memory and copy host memory to it.
@table @asis
@item @emph{Description}
In C/C++, this function allocates @var{len} bytes of device memory
and maps it to the specified host address in @var{a}.  The device
address of the newly allocated device memory is returned.

In Fortran, two forms are supported.  In the first form, @var{a} specifies
a contiguous array section.  In the second form, @var{a} specifies a
variable or array element and @var{len} specifies the length in bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_copyin(h_void *a, size_t len);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_copyin(a)}
@item                   @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_copyin(a, len)}
@item                   @tab @code{type, dimension(:[,:]...) :: a}
@item                   @tab @code{integer len}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.17.
@end table


@node acc_present_or_copyin
@section @code{acc_present_or_copyin} -- If the data is not present on the device, allocate device memory and copy from host memory.
@table @asis
@item @emph{Description}
This function tests if the host data specified by @var{a} and of length
@var{len} is present on the device.  If it is not present, device memory
is allocated and the host memory copied.  The device address of
the newly allocated device memory is returned.

In Fortran, two forms are supported.  In the first form, @var{a} specifies
a contiguous array section.  In the second form, @var{a} specifies a variable
or array element and @var{len} specifies the length in bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_present_or_copyin(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{void *acc_pcopyin(h_void *a, size_t len);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_present_or_copyin(a)}
@item                   @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_present_or_copyin(a, len)}
@item                   @tab @code{type, dimension(:[,:]...) :: a}
@item                   @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_pcopyin(a)}
@item                   @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_pcopyin(a, len)}
@item                   @tab @code{type, dimension(:[,:]...) :: a}
@item                   @tab @code{integer len}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.18.
@end table


@node acc_create
@section @code{acc_create} -- Allocate device memory and map it to host memory.
@table @asis
@item @emph{Description}
This function allocates device memory and maps it to host memory specified
by the host address @var{a} with a length of @var{len} bytes.  In C/C++,
the function returns the device address of the allocated device memory.

In Fortran, two forms are supported.  In the first form, @var{a} specifies
a contiguous array section.  In the second form, @var{a} specifies a variable
or array element and @var{len} specifies the length in bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_create(h_void *a, size_t len);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_create(a)}
@item                   @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_create(a, len)}
@item                   @tab @code{type, dimension(:[,:]...) :: a}
@item                   @tab @code{integer len}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.19.
@end table


@node acc_present_or_create
@section @code{acc_present_or_create} -- If the data is not present on the device, allocate device memory and map it to host memory.
@table @asis
@item @emph{Description}
This function tests if the host data specified by @var{a} and of length
@var{len} is present on the device.  If it is not present, device memory
is allocated and mapped to host memory.  In C/C++, the device address
of the newly allocated device memory is returned.

In Fortran, two forms are supported.  In the first form, @var{a} specifies
a contiguous array section.  In the second form, @var{a} specifies a variable
or array element and @var{len} specifies the length in bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_present_or_create(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{void *acc_pcreate(h_void *a, size_t len);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_present_or_create(a)}
@item                   @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_present_or_create(a, len)}
@item                   @tab @code{type, dimension(:[,:]...) :: a}
@item                   @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_pcreate(a)}
@item                   @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_pcreate(a, len)}
@item                   @tab @code{type, dimension(:[,:]...) :: a}
@item                   @tab @code{integer len}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.20.
@end table


@node acc_copyout
@section @code{acc_copyout} -- Copy device memory to host memory.
@table @asis
@item @emph{Description}
In C/C++, this function copies mapped device memory to the host memory
specified by host address @var{a} for a length of @var{len} bytes.

In Fortran, two forms are supported.  In the first form, @var{a} specifies
a contiguous array section.  In the second form, @var{a} specifies a variable
or array element and @var{len} specifies the length in bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_copyout(h_void *a, size_t len);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_copyout(a)}
@item                   @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_copyout(a, len)}
@item                   @tab @code{type, dimension(:[,:]...) :: a}
@item                   @tab @code{integer len}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.21.
@end table


@node acc_delete
@section @code{acc_delete} -- Free device memory.
@table @asis
@item @emph{Description}
This function frees previously allocated device memory, specified by
the host address @var{a} and a length of @var{len} bytes.

In Fortran, two forms are supported.  In the first form, @var{a} specifies
a contiguous array section.  In the second form, @var{a} specifies a variable
or array element and @var{len} specifies the length in bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_delete(h_void *a, size_t len);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_delete(a)}
@item                   @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_delete(a, len)}
@item                   @tab @code{type, dimension(:[,:]...) :: a}
@item                   @tab @code{integer len}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.22.
@end table


2523@node acc_update_device
2524@section @code{acc_update_device} -- Update device memory from mapped host memory.
2525@table @asis
2526@item @emph{Description}
2527This function updates the device copy from the previously mapped host memory.
2528The host memory is specified with the host address @var{a} and a length of
2529@var{len} bytes.
2530
2531In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
2532a contiguous array section. The second form @var{a} specifies a variable or
2533array element and @var{len} specifies the length in bytes.
2534
2535@item @emph{C/C++}:
2536@multitable @columnfractions .20 .80
2537@item @emph{Prototype}: @tab @code{acc_update_device(h_void *a, size_t len);}
2538@end multitable
2539
2540@item @emph{Fortran}:
2541@multitable @columnfractions .20 .80
2542@item @emph{Interface}: @tab @code{subroutine acc_update_device(a)}
2543@item                   @tab @code{type, dimension(:[,:]...) :: a}
2544@item @emph{Interface}: @tab @code{subroutine acc_update_device(a, len)}
2545@item                   @tab @code{type, dimension(:[,:]...) :: a}
2546@item                   @tab @code{integer len}
2547@end multitable
2548
2549@item @emph{Reference}:
2550@uref{https://www.openacc.org, OpenACC specification v2.0}, section
25513.2.23.
2552@end table



@node acc_update_self
@section @code{acc_update_self} -- Update host memory from mapped device memory.
@table @asis
@item @emph{Description}
This function updates the host copy from the previously mapped device memory.
The host memory is specified with the host address @var{a} and a length of
@var{len} bytes.

In Fortran, two forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a
variable or array element and @var{len} specifies the length in bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void acc_update_self(h_void *a, size_t len);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_update_self(a)}
@item                   @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_update_self(a, len)}
@item                   @tab @code{type, dimension(:[,:]...) :: a}
@item                   @tab @code{integer len}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.24.
@end table



@node acc_map_data
@section @code{acc_map_data} -- Map previously allocated device memory to host memory.
@table @asis
@item @emph{Description}
This function maps previously allocated device and host memory. The device
memory is specified with the device address @var{d}. The host memory is
specified with the host address @var{h} and a length of @var{len} bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void acc_map_data(h_void *h, d_void *d, size_t len);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.25.
@end table



@node acc_unmap_data
@section @code{acc_unmap_data} -- Unmap device memory from host memory.
@table @asis
@item @emph{Description}
This function unmaps previously mapped device and host memory. The latter
is specified by @var{h}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void acc_unmap_data(h_void *h);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.26.
@end table



@node acc_deviceptr
@section @code{acc_deviceptr} -- Get device pointer associated with specific host address.
@table @asis
@item @emph{Description}
This function returns the device address that has been mapped to the
host address specified by @var{h}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_deviceptr(h_void *h);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.27.
@end table



@node acc_hostptr
@section @code{acc_hostptr} -- Get host pointer associated with specific device address.
@table @asis
@item @emph{Description}
This function returns the host address that has been mapped to the
device address specified by @var{d}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_hostptr(d_void *d);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.28.
@end table



@node acc_is_present
@section @code{acc_is_present} -- Indicate whether host variable / array is present on device.
@table @asis
@item @emph{Description}
This function indicates whether the host memory specified by the host
address @var{a} and a length of @var{len} bytes is present on the device.
In C/C++, a non-zero value is returned to indicate the presence of the
mapped memory on the device. A zero is returned to indicate the memory
is not mapped on the device.

In Fortran, two forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a
variable or array element and @var{len} specifies the length in bytes. If
the host memory is mapped to device memory, then @code{true} is returned.
Otherwise, @code{false} is returned to indicate the mapped memory is not
present.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_is_present(h_void *a, size_t len);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_is_present(a)}
@item                   @tab @code{type, dimension(:[,:]...) :: a}
@item                   @tab @code{logical acc_is_present}
@item @emph{Interface}: @tab @code{function acc_is_present(a, len)}
@item                   @tab @code{type, dimension(:[,:]...) :: a}
@item                   @tab @code{integer len}
@item                   @tab @code{logical acc_is_present}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.29.
@end table



@node acc_memcpy_to_device
@section @code{acc_memcpy_to_device} -- Copy host memory to device memory.
@table @asis
@item @emph{Description}
This function copies host memory specified by the host address @var{src}
to device memory specified by the device address @var{dest} for a length
of @var{bytes} bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void acc_memcpy_to_device(d_void *dest, h_void *src, size_t bytes);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.30.
@end table



@node acc_memcpy_from_device
@section @code{acc_memcpy_from_device} -- Copy device memory to host memory.
@table @asis
@item @emph{Description}
This function copies device memory specified by the device address @var{src}
to host memory specified by the host address @var{dest} for a length of
@var{bytes} bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void acc_memcpy_from_device(h_void *dest, d_void *src, size_t bytes);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
3.2.31.
@end table



@node acc_get_current_cuda_device
@section @code{acc_get_current_cuda_device} -- Get CUDA device handle.
@table @asis
@item @emph{Description}
This function returns the CUDA device handle. This handle is the same
as used by the CUDA Runtime or Driver APIs.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_get_current_cuda_device(void);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
A.2.1.1.
@end table



@node acc_get_current_cuda_context
@section @code{acc_get_current_cuda_context} -- Get CUDA context handle.
@table @asis
@item @emph{Description}
This function returns the CUDA context handle. This handle is the same
as used by the CUDA Runtime or Driver APIs.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_get_current_cuda_context(void);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
A.2.1.2.
@end table



@node acc_get_cuda_stream
@section @code{acc_get_cuda_stream} -- Get CUDA stream handle.
@table @asis
@item @emph{Description}
This function returns the CUDA stream handle for the queue @var{async}.
This handle is the same as used by the CUDA Runtime or Driver APIs.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_get_cuda_stream(int async);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
A.2.1.3.
@end table



@node acc_set_cuda_stream
@section @code{acc_set_cuda_stream} -- Set CUDA stream handle.
@table @asis
@item @emph{Description}
This function associates the stream handle specified by @var{stream} with
the queue @var{async}.

This cannot be used to change the stream handle associated with
@code{acc_async_sync}.

The return value is not specified.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_set_cuda_stream(int async, void *stream);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
A.2.1.4.
@end table



@c ---------------------------------------------------------------------
@c OpenACC Environment Variables
@c ---------------------------------------------------------------------

@node OpenACC Environment Variables
@chapter OpenACC Environment Variables

The variables @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM}
are defined by section 4 of the OpenACC specification in version 2.0.
The variable @env{GCC_ACC_NOTIFY} is used for diagnostic purposes.

@menu
* ACC_DEVICE_TYPE::
* ACC_DEVICE_NUM::
* GCC_ACC_NOTIFY::
@end menu



@node ACC_DEVICE_TYPE
@section @code{ACC_DEVICE_TYPE}
@table @asis
@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
4.1.
@end table



@node ACC_DEVICE_NUM
@section @code{ACC_DEVICE_NUM}
@table @asis
@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.0}, section
4.2.
@end table



@node GCC_ACC_NOTIFY
@section @code{GCC_ACC_NOTIFY}
@table @asis
@item @emph{Description}:
Print debug information pertaining to the accelerator.
@end table



@c ---------------------------------------------------------------------
@c CUDA Streams Usage
@c ---------------------------------------------------------------------

@node CUDA Streams Usage
@chapter CUDA Streams Usage

This applies to the @code{nvptx} plugin only.

The library provides elements that perform asynchronous movement of
data and asynchronous operation of computing constructs.  This
asynchronous functionality is implemented by making use of CUDA
streams@footnote{See "Stream Management" in "CUDA Driver API",
TRM-06703-001, Version 5.5, for additional information}.

The primary means by which the asynchronous functionality is accessed
is through the use of those OpenACC directives which make use of the
@code{async} and @code{wait} clauses.  When the @code{async} clause is
first used with a directive, it creates a CUDA stream.  If an
@code{async-argument} is used with the @code{async} clause, then the
stream is associated with the specified @code{async-argument}.

Following the creation of an association between a CUDA stream and the
@code{async-argument} of an @code{async} clause, both the @code{wait}
clause and the @code{wait} directive can be used.  When either the
clause or directive is used after stream creation, it creates a
rendezvous point whereby execution waits until all operations
associated with the @code{async-argument}, that is, stream, have
completed.

Normally, the management of the streams that are created as a result of
using the @code{async} clause is done without any intervention by the
caller.  This implies that the association between the @code{async-argument}
and the CUDA stream will be maintained for the lifetime of the program.
However, this association can be changed through the use of the library
function @code{acc_set_cuda_stream}.  When the function
@code{acc_set_cuda_stream} is called, the CUDA stream that was
originally associated with the @code{async} clause will be destroyed.
Caution should be taken when changing the association, as subsequent
references to the @code{async-argument} will refer to a different
CUDA stream.



@c ---------------------------------------------------------------------
@c OpenACC Library Interoperability
@c ---------------------------------------------------------------------

@node OpenACC Library Interoperability
@chapter OpenACC Library Interoperability

@section Introduction

The OpenACC library uses the CUDA Driver API, and may interact with
programs that use the Runtime library directly, or another library
based on the Runtime library, e.g., CUBLAS@footnote{See section 2.26,
"Interactions with the CUDA Driver API" in
"CUDA Runtime API", Version 5.5, and section 2.27, "VDPAU
Interoperability", in "CUDA Driver API", TRM-06703-001, Version 5.5,
for additional information on library interoperability.}.
This chapter describes the use cases and what changes are
required in order to use both the OpenACC library and the CUBLAS and Runtime
libraries within a program.

@section First invocation: NVIDIA CUBLAS library API

In this first use case (see below), a function in the CUBLAS library is called
prior to any of the functions in the OpenACC library. More specifically, the
function @code{cublasCreate()}.

When invoked, the function initializes the library and allocates the
hardware resources on the host and the device on behalf of the caller. Once
the initialization and allocation have completed, a handle is returned to the
caller. The OpenACC library also requires initialization and allocation of
hardware resources. Since the CUBLAS library has already allocated the
hardware resources for the device, all that is left to do is to initialize
the OpenACC library and acquire the hardware resources on the host.

Prior to calling the OpenACC function that initializes the library and
allocates the host hardware resources, you need to acquire the device number
that was allocated during the call to @code{cublasCreate()}. Invoking the
runtime library function @code{cudaGetDevice()} accomplishes this. Once
acquired, the device number is passed along with the device type as
parameters to the OpenACC library function @code{acc_set_device_num()}.

Once the call to @code{acc_set_device_num()} has completed, the OpenACC
library uses the context that was created during the call to
@code{cublasCreate()}. In other words, both libraries will be sharing the
same context.

@smallexample
    /* Create the handle */
    s = cublasCreate(&h);
    if (s != CUBLAS_STATUS_SUCCESS)
    @{
        fprintf(stderr, "cublasCreate failed %d\n", s);
        exit(EXIT_FAILURE);
    @}

    /* Get the device number */
    e = cudaGetDevice(&dev);
    if (e != cudaSuccess)
    @{
        fprintf(stderr, "cudaGetDevice failed %d\n", e);
        exit(EXIT_FAILURE);
    @}

    /* Initialize OpenACC library and use device 'dev' */
    acc_set_device_num(dev, acc_device_nvidia);

@end smallexample
@center Use Case 1

@section First invocation: OpenACC library API

In this second use case (see below), a function in the OpenACC library is
called prior to any of the functions in the CUBLAS library. More
specifically, the function @code{acc_set_device_num()}.

In the use case presented here, the function @code{acc_set_device_num()}
is used to both initialize the OpenACC library and allocate the hardware
resources on the host and the device. In the call to the function, the
call parameters specify which device to use and what device
type to use, i.e., @code{acc_device_nvidia}. It should be noted that this
is but one method to initialize the OpenACC library and allocate the
appropriate hardware resources. Other methods are available through the
use of environment variables and these will be discussed in the next section.

Once the call to @code{acc_set_device_num()} has completed, other OpenACC
functions can be called as seen with multiple calls being made to
@code{acc_copyin()}. In addition, calls can be made to functions in the
CUBLAS library. In the use case a call to @code{cublasCreate()} is made
subsequent to the calls to @code{acc_copyin()}.
As seen in the previous use case, a call to @code{cublasCreate()}
initializes the CUBLAS library and allocates the hardware resources on the
host and the device.  However, since the device has already been allocated,
@code{cublasCreate()} will only initialize the CUBLAS library and allocate
the appropriate hardware resources on the host. The context that was created
as part of the OpenACC initialization is shared with the CUBLAS library,
similarly to the first use case.

@smallexample
    dev = 0;

    acc_set_device_num(dev, acc_device_nvidia);

    /* Copy the first set to the device */
    d_X = acc_copyin(&h_X[0], N * sizeof (float));
    if (d_X == NULL)
    @{
        fprintf(stderr, "copyin error h_X\n");
        exit(EXIT_FAILURE);
    @}

    /* Copy the second set to the device */
    d_Y = acc_copyin(&h_Y1[0], N * sizeof (float));
    if (d_Y == NULL)
    @{
        fprintf(stderr, "copyin error h_Y1\n");
        exit(EXIT_FAILURE);
    @}

    /* Create the handle */
    s = cublasCreate(&h);
    if (s != CUBLAS_STATUS_SUCCESS)
    @{
        fprintf(stderr, "cublasCreate failed %d\n", s);
        exit(EXIT_FAILURE);
    @}

    /* Perform saxpy using CUBLAS library function */
    s = cublasSaxpy(h, N, &alpha, d_X, 1, d_Y, 1);
    if (s != CUBLAS_STATUS_SUCCESS)
    @{
        fprintf(stderr, "cublasSaxpy failed %d\n", s);
        exit(EXIT_FAILURE);
    @}

    /* Copy the results from the device */
    acc_memcpy_from_device(&h_Y1[0], d_Y, N * sizeof (float));

@end smallexample
@center Use Case 2

@section OpenACC library and environment variables

There are two environment variables associated with the OpenACC library
that may be used to control the device type and device number:
@env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM}, respectively. These two
environment variables can be used as an alternative to calling
@code{acc_set_device_num()}. As seen in the second use case, the device
type and device number were specified using @code{acc_set_device_num()}.
If, however, the aforementioned environment variables were set, then the
call to @code{acc_set_device_num()} would not be required.


The use of the environment variables is only relevant when an OpenACC function
is called prior to a call to @code{cublasCreate()}. If @code{cublasCreate()}
is called prior to a call to an OpenACC function, then you must call
@code{acc_set_device_num()}@footnote{More complete information
about @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM} can be found in
sections 4.1 and 4.2 of the @uref{https://www.openacc.org, OpenACC
Application Programming Interface}, Version 2.0.}.



@c ---------------------------------------------------------------------
@c The libgomp ABI
@c ---------------------------------------------------------------------

@node The libgomp ABI
@chapter The libgomp ABI

The following sections present notes on the external ABI as
presented by libgomp.  Only maintainers should need them.

@menu
* Implementing MASTER construct::
* Implementing CRITICAL construct::
* Implementing ATOMIC construct::
* Implementing FLUSH construct::
* Implementing BARRIER construct::
* Implementing THREADPRIVATE construct::
* Implementing PRIVATE clause::
* Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses::
* Implementing REDUCTION clause::
* Implementing PARALLEL construct::
* Implementing FOR construct::
* Implementing ORDERED construct::
* Implementing SECTIONS construct::
* Implementing SINGLE construct::
* Implementing OpenACC's PARALLEL construct::
@end menu


@node Implementing MASTER construct
@section Implementing MASTER construct

@smallexample
if (omp_get_thread_num () == 0)
  block
@end smallexample

Alternatively, we generate two copies of the parallel subfunction
and only include this in the version run by the master thread.
Surely this is not worthwhile though...



@node Implementing CRITICAL construct
@section Implementing CRITICAL construct

Without a specified name,

@smallexample
  void GOMP_critical_start (void);
  void GOMP_critical_end (void);
@end smallexample

so that we don't get COPY relocations from libgomp to the main
application.

With a specified name, use omp_set_lock and omp_unset_lock with
name being transformed into a variable declared like

@smallexample
  omp_lock_t gomp_critical_user_<name> __attribute__((common))
@end smallexample

Ideally the ABI would specify that all zero is a valid unlocked
state, and so we wouldn't need to initialize this at
startup.


@node Implementing ATOMIC construct
@section Implementing ATOMIC construct

The target should implement the @code{__sync} builtins.

Failing that we could add

@smallexample
  void GOMP_atomic_enter (void)
  void GOMP_atomic_exit (void)
@end smallexample

which reuses the regular lock code, but with yet another lock
object private to the library.



@node Implementing FLUSH construct
@section Implementing FLUSH construct

Expands to the @code{__sync_synchronize} builtin.



@node Implementing BARRIER construct
@section Implementing BARRIER construct

@smallexample
  void GOMP_barrier (void)
@end smallexample


@node Implementing THREADPRIVATE construct
@section Implementing THREADPRIVATE construct

In @emph{most} cases we can map this directly to @code{__thread}.  Except
that OMP allows constructors for C++ objects.  We can either
refuse to support this (how often is it used?) or we can
implement something akin to @code{.ctors}.

Even more ideally, this ctor feature is handled by extensions
to the main pthreads library.  Failing that, we can have a set
of entry points to register ctor functions to be called.



@node Implementing PRIVATE clause
@section Implementing PRIVATE clause

In association with a PARALLEL, or within the lexical extent
of a PARALLEL block, the variable becomes a local variable in
the parallel subfunction.

In association with FOR or SECTIONS blocks, create a new
automatic variable within the current function.  This preserves
the semantics of new variable creation.


@node Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses
@section Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses

This seems simple enough for PARALLEL blocks.  Create a private
struct for communicating between the parent and subfunction.
In the parent, copy in values for scalar and "small" structs;
copy in addresses for other TREE_ADDRESSABLE types.  In the
subfunction, copy the value into the local variable.

It is not clear what to do with bare FOR or SECTION blocks.
The only thing I can figure is that we do something like:

@smallexample
#pragma omp for firstprivate(x) lastprivate(y)
for (int i = 0; i < n; ++i)
  body;
@end smallexample

which becomes

@smallexample
@{
  int x = x, y;

  // for stuff

  if (i == n)
    y = y;
@}
@end smallexample

where the "x=x" and "y=y" assignments actually have different
uids for the two variables, i.e. not something you could write
directly in C.  Presumably this only makes sense if the "outer"
x and y are global variables.

COPYPRIVATE would work the same way, except the structure
broadcast would have to happen via SINGLE machinery instead.



@node Implementing REDUCTION clause
@section Implementing REDUCTION clause

The private struct mentioned in the previous section should have
a pointer to an array of the type of the variable, indexed by the
thread's @var{team_id}.  The thread stores its final value into the
array, and after the barrier, the master thread iterates over the
array to collect the values.


@node Implementing PARALLEL construct
@section Implementing PARALLEL construct

@smallexample
  #pragma omp parallel
  @{
    body;
  @}
@end smallexample

becomes

@smallexample
  void subfunction (void *data)
  @{
    use data;
    body;
  @}

  setup data;
  GOMP_parallel_start (subfunction, &data, num_threads);
  subfunction (&data);
  GOMP_parallel_end ();
@end smallexample

@smallexample
  void GOMP_parallel_start (void (*fn)(void *), void *data, unsigned num_threads)
@end smallexample

The @var{FN} argument is the subfunction to be run in parallel.

The @var{DATA} argument is a pointer to a structure used to
communicate data in and out of the subfunction, as discussed
above with respect to FIRSTPRIVATE et al.

The @var{NUM_THREADS} argument is 1 if an IF clause is present
and false, or the value of the NUM_THREADS clause, if
present, or 0.

The function needs to create the appropriate number of
threads and/or launch them from the dock.  It needs to
create the team structure and assign team ids.

@smallexample
  void GOMP_parallel_end (void)
@end smallexample

Tears down the team and returns us to the previous @code{omp_in_parallel()} state.



@node Implementing FOR construct
@section Implementing FOR construct

@smallexample
  #pragma omp parallel for
  for (i = lb; i <= ub; i++)
    body;
@end smallexample

becomes

@smallexample
  void subfunction (void *data)
  @{
    long _s0, _e0;
    while (GOMP_loop_static_next (&_s0, &_e0))
    @{
      long _e1 = _e0, i;
      for (i = _s0; i < _e1; i++)
        body;
    @}
    GOMP_loop_end_nowait ();
  @}

  GOMP_parallel_loop_static (subfunction, NULL, 0, lb, ub+1, 1, 0);
  subfunction (NULL);
  GOMP_parallel_end ();
@end smallexample

@smallexample
  #pragma omp for schedule(runtime)
  for (i = 0; i < n; i++)
    body;
@end smallexample

becomes

@smallexample
  @{
    long i, _s0, _e0;
    if (GOMP_loop_runtime_start (0, n, 1, &_s0, &_e0))
      do @{
        long _e1 = _e0;
        for (i = _s0; i < _e1; i++)
          body;
      @} while (GOMP_loop_runtime_next (&_s0, &_e0));
    GOMP_loop_end ();
  @}
@end smallexample

Note that while it looks like there is trickiness to propagating
a non-constant STEP, there isn't really.  We're explicitly allowed
to evaluate it as many times as we want, and any variables involved
should automatically be handled as PRIVATE or SHARED like any other
variables.  So the expression should remain evaluable in the
subfunction.  We can also pull it into a local variable if we like,
but since it's supposed to remain unchanged, we can also not if we like.

If we have SCHEDULE(STATIC), and no ORDERED, then we ought to be
able to get away with no work-sharing context at all, since we can
simply perform the arithmetic directly in each thread to divide up
the iterations.  Which would mean that we wouldn't need to call any
of these routines.

There are separate routines for handling loops with an ORDERED
clause.  Bookkeeping for that is non-trivial...



@node Implementing ORDERED construct
@section Implementing ORDERED construct

@smallexample
  void GOMP_ordered_start (void)
  void GOMP_ordered_end (void)
@end smallexample



@node Implementing SECTIONS construct
@section Implementing SECTIONS construct

A block such as

@smallexample
  #pragma omp sections
  @{
    #pragma omp section
    stmt1;
    #pragma omp section
    stmt2;
    #pragma omp section
    stmt3;
  @}
@end smallexample

becomes

@smallexample
  for (i = GOMP_sections_start (3); i != 0; i = GOMP_sections_next ())
    switch (i)
      @{
      case 1:
        stmt1;
        break;
      case 2:
        stmt2;
        break;
      case 3:
        stmt3;
        break;
      @}
  GOMP_barrier ();
@end smallexample


@node Implementing SINGLE construct
@section Implementing SINGLE construct

A block like

@smallexample
  #pragma omp single
  @{
    body;
  @}
@end smallexample

becomes

@smallexample
  if (GOMP_single_start ())
    body;
  GOMP_barrier ();
@end smallexample

while

@smallexample
  #pragma omp single copyprivate(x)
    body;
@end smallexample

becomes

@smallexample
  datap = GOMP_single_copy_start ();
  if (datap == NULL)
    @{
      body;
      data.x = x;
      GOMP_single_copy_end (&data);
    @}
  else
    x = datap->x;
  GOMP_barrier ();
@end smallexample



@node Implementing OpenACC's PARALLEL construct
@section Implementing OpenACC's PARALLEL construct

@smallexample
  void GOACC_parallel ()
@end smallexample



@c ---------------------------------------------------------------------
@c Reporting Bugs
@c ---------------------------------------------------------------------

@node Reporting Bugs
@chapter Reporting Bugs

Bugs in the GNU Offloading and Multi Processing Runtime Library should
be reported via @uref{http://gcc.gnu.org/bugzilla/, Bugzilla}.  Please add
"openacc", or "openmp", or both to the keywords field in the bug
report, as appropriate.



@c ---------------------------------------------------------------------
@c GNU General Public License
@c ---------------------------------------------------------------------

@include gpl_v3.texi



@c ---------------------------------------------------------------------
@c GNU Free Documentation License
@c ---------------------------------------------------------------------

@include fdl.texi



@c ---------------------------------------------------------------------
@c Funding Free Software
@c ---------------------------------------------------------------------

@include funding.texi

@c ---------------------------------------------------------------------
@c Index
@c ---------------------------------------------------------------------

@node Library Index
@unnumbered Library Index

@printindex cp

@bye