<!--===- docs/OpenMP-declare-target.md

   Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
   See https://llvm.org/LICENSE.txt for license information.
   SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

-->

# Introduction to Declare Target

In OpenMP, `declare target` is a directive that can be applied to a function or
variable (primarily global) to tell the compiler that it should be
generated in a particular device's environment; in essence, whether it
should be emitted for the host, the device, or both. Examples of its usage for
both data and functions can be seen below.

```Fortran
module test_0
  integer :: sp = 0
!$omp declare target link(sp)
end module test_0

program main
  use test_0
!$omp target map(tofrom:sp)
  sp = 1
!$omp end target
end program
```

In the above example, we create a variable in a separate module, mark it
as `declare target` and then map it, embedding it into the device IR and
assigning to it.

```Fortran
function func_t_device() result(i)
  !$omp declare target to(func_t_device) device_type(nohost)
  integer :: i
  i = 1
end function func_t_device

program main
  ! Declare the external function's result type so it can be referenced.
  integer :: func_t_device, res
!$omp target map(from:res)
  res = func_t_device()
!$omp end target
end program
```

In the above example, we use `declare target` to state that a function is
required on the device and that we will not be utilising it on the host,
so we are in theory free to remove or ignore it there. A user could also,
in this case, leave off the `declare target` from the function and it
would be implicitly marked `declare target any` (for both host and device),
as it is utilised within a target region.

# Declare Target as represented in the OpenMP Dialect

In the OpenMP Dialect, `declare target` is not represented by a specific
`operation`.
Instead, it is an OpenMP dialect specific `attribute` that can be
applied to any operation in any dialect, which helps to simplify its
utilisation. Rather than replacing or modifying existing global or
function `operations` in a dialect, it is attached to them as extra metadata
that the lowering can use in different ways as necessary.

The `attribute` is composed of multiple fields representing the clauses you
would find on the `declare target` directive, i.e. the device type (`nohost`,
`any`, `host`) and the capture clause (`link` or `to`). A small example of
`declare target` applied to a Fortran `real` can be found below:

```
fir.global internal @_QFEi {omp.declare_target =
#omp.declaretarget<device_type = (any), capture_clause = (to)>} : f32 {
  %0 = fir.undefined f32
  fir.has_value %0 : f32
}
```

This would look similar for function style `operations`.

Applying and accessing this attribute is aided by an OpenMP Dialect
MLIR Interface named `DeclareTargetInterface`, which can be utilised on
operations to access the appropriate interface functions, e.g.:

```C++
auto declareTargetGlobal =
    llvm::dyn_cast<mlir::omp::DeclareTargetInterface>(Op.getOperation());
// dyn_cast yields a null interface if the operation does not implement it.
if (declareTargetGlobal && declareTargetGlobal.isDeclareTarget()) {
  // The operation carries the declare target attribute.
}
```

# Declare Target Fortran OpenMP Lowering

The initial lowering of `declare target` to MLIR for both use-cases is done
inside of the usual OpenMP lowering in flang/lib/Lower/OpenMP.cpp. However,
some direct calls to `declare target` related functions are made from Flang's
lowering bridge in flang/lib/Lower/Bridge.cpp.

The marking of operations with the declare target attribute happens in two
phases, the second one optional and contingent on the first failing. The
initial phase happens when the declare target directive and its clauses
are initially processed, with the primary data gathering for the directive and
clause happening in a function called `getDeclareTargetInfo`.
This is then used
to feed the `markDeclareTarget` function, which does the actual marking
utilising the `DeclareTargetInterface`. If it encounters a variable or function
that has been marked twice over multiple directives with two differing device
types (e.g. `host`, `nohost`), then it will swap the device type to `any`.

Whenever we invoke `genFIR` on an `OpenMPDeclarativeConstruct` from the
lowering bridge, we also invoke another function called
`gatherOpenMPDeferredDeclareTargets`, which gathers information relevant to the
application of the `declare target` attribute. This information
includes the symbol that it should be applied to, the device type clause,
and the capture clause, and it is stored in a vector that is part of the
lowering bridge's instantiation of the `AbstractConverter`. It is only stored
if we encounter a function or variable symbol that does not have an operation
instantiated for it yet. This cannot happen as part of the
initial marking, as we must store this data in the lowering bridge and we
only have access to the abstract version of the converter via the OpenMP
lowering.

The information produced by the first phase is used in the second phase,
which is a form of deferred processing of the `declare target` marked
operations that have delayed generation and cannot be processed in the
first phase. The main case where this currently occurs is when a
Fortran function interface has been marked. This is
done via the function
`markOpenMPDeferredDeclareTargetFunctions`, which is called from the lowering
bridge at the end of the lowering process, allowing us to mark those where
possible. It iterates over the data previously gathered by
`gatherOpenMPDeferredDeclareTargets`,
checking if any of the recorded symbols have now had their corresponding
operations instantiated and applying the declare target attribute where
possible utilising `markDeclareTarget`.
However, it must be noted that it
is still possible for operations not to be generated for certain symbols,
in particular in the case of function interfaces that are not directly used
or defined within the current module. This means we cannot emit errors in
the case of left-over unmarked symbols. These must (and should) be caught
by the initial semantic analysis.

NOTE: `declare target` can be applied to implicit `SAVE` attributed variables.
However, by default Flang does not represent these as `GlobalOp`s, which means
we cannot tag and lower them as `declare target` normally. Instead, similarly
to the way `threadprivate` handles these cases, we raise and initialize the
variable as an internal `GlobalOp` and apply the attribute. This occurs in the
flang/lib/Lower/OpenMP.cpp function `genDeclareTargetIntGlobal`.

# Declare Target Transformation Passes for Flang

There are currently two passes within Flang that are related to the processing
of `declare target`:
* `MarkDeclareTarget` - This pass is in charge of marking functions captured
in (called from) `target` regions or other `declare target` marked functions as
`declare target`. It does so recursively, i.e. nested calls will also be
implicitly marked. It currently tries to mark things as conservatively as
possible, e.g. if captured in a `target` region it will apply `nohost`, unless
it encounters a `host` `declare target`, in which case it will apply the `any`
device type. Functions are handled similarly, except we utilise the parent's
device type where possible.
* `FunctionFiltering` - This is executed after the `MarkDeclareTarget`
pass, and its job is to conservatively remove host functions from
the module where possible when compiling for the device. This helps make
sure that most host code that is incompatible with the device is not lowered
for it. Host functions with `target` regions in them need to be preserved
(e.g.
for lowering the `target` region(s) inside). Otherwise, it removes
any function marked as a `declare target host` function and any uses will be
replaced with `undef`s so that the remaining host code doesn't become broken.
Host functions with `target` regions are marked with a `declare target host`
attribute so they will be removed after outlining the target regions contained
inside.

While this infrastructure could be generally applicable to more than just Flang,
it is only utilised in the Flang frontend, so it resides there rather than in
the OpenMP dialect codebase.

# Declare Target OpenMP Dialect To LLVM-IR Lowering

The OpenMP dialect lowering of `declare target` is done through the
`amendOperation` flow, as it is not an `operation` but rather an
`attribute`. This is triggered immediately after the corresponding
operation has been lowered to LLVM-IR. As it is applicable to
different types of operations, we must specialise this function for
each operation type that we may encounter. Currently, these are
`GlobalOp`s and `FuncOp`s.

`FuncOp` processing is fairly simple. When compiling for the device,
`host` marked functions are removed, including those that could not
be removed earlier due to having `target` directives within. This
leaves `any`, `nohost` or indeterminable functions in the
module to lower further. When compiling for the host, no filtering is
done because `nohost` functions must be available as a fallback
implementation.

For `GlobalOp`s, the processing is a little more complex. We
currently leverage the `registerTargetGlobalVariable` and
`getAddrOfDeclareTargetVar` `OMPIRBuilder` functions shared with Clang.
These two functions invoke each other depending on the clauses and options
provided to the `OMPIRBuilder` (in particular, unified shared memory).
Their
main purposes are the generation of a new global device pointer with a
"ref_" prefix on the device and enqueuing metadata generation by the
`OMPIRBuilder` to be produced at module finalization time. This is done
for both host and device and it links the newly generated device global
pointer and the host pointer together across the two modules.

Similarly to other metadata (e.g. for `TargetOp`) that is shared across
both host and device modules, processing of `GlobalOp`s in the device
needs access to the previously generated host IR file, which is done
through another `attribute` applied to the `ModuleOp` by the compiler
frontend. The file is loaded in and consumed by the `OMPIRBuilder` to
populate its `OffloadInfoManager` data structures, keeping host and
device appropriately synchronised.

Another (and more important) point to remember is that, as we effectively
replace the original LLVM-IR generated for the `declare target` marked
`GlobalOp`, we need to make some corrections for `TargetOp`s (or other region
operations that use them directly) which still refer to the original lowered
global operation. This is done via `handleDeclareTargetMapVar`, which is invoked
as the final function and alteration to the lowered `target` region. It is only
invoked for the device, as it is only required in the case where we have emitted
the "ref" pointer, and it effectively replaces all uses of the originally
lowered global symbol with our new global ref pointer's symbol. Currently we do
not remove or delete the old symbol. This is because the same symbol
can be utilised across multiple target regions; if we remove it, we risk
breaking lowerings of target regions that will be processed at a later time.
To appropriately delete these no longer necessary symbols we would need a
deferred removal process at the end of the module, which is currently not in
place.
It may be possible to store this information in the `OMPIRBuilder` and
then perform this cleanup process on finalization, but this is still open for
discussion and implementation.

# Current Support

For the moment, `declare target` should work for:
* Marking functions/subroutines and function/subroutine interfaces for
generation on host, device or both.
* Implicit function/subroutine capture for calls emitted in a `target` region
or explicitly marked `declare target` function/subroutine. Note: functions
invoked via arguments passed to other functions must still themselves be marked
`declare target`. For example, when passing a `C` function pointer and invoking
it, both the interface and the `C` function in the other module must be marked
`declare target`, with the same type of marking as indicated by the
specification.
* Marking global variables with `declare target`'s `link` clause and mapping
the data to the device data environment utilising `declare target`. This may
not work for all types yet, but for scalars and arrays of scalars, it
should.

Doesn't work for, or needs further testing for:
* Marking the following types with `declare target link` (needs further
testing):
  * Descriptor based types, e.g. pointers/allocatables.
  * Derived types.
  * Members of derived types (use-case needs legality checking with OpenMP
specification).
* Marking global variables with `declare target`'s `to` clause. A lot of the
lowering should exist, but it needs further testing and likely some further
changes to fully function.
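
As an illustration of the `link` case for arrays of scalars, which is listed
among the cases that should currently work, a minimal sketch might look like
the following. The module name `test_1` and the variable `arr` are hypothetical
and chosen only for illustration, and the example has not been verified against
any particular Flang build.

```Fortran
module test_1
  ! Hypothetical global array of scalars, marked with the link clause.
  integer :: arr(4) = 0
!$omp declare target link(arr)
end module test_1

program main
  use test_1
  integer :: i
  ! Map the linked array into the device data environment and assign to it.
!$omp target map(tofrom:arr)
  do i = 1, 4
    arr(i) = i
  end do
!$omp end target
  print *, arr
end program
```

This mirrors the scalar example from the introduction, only with an array in
place of the scalar `sp`.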