===============
Opaque Pointers
===============

The Opaque Pointer Type
=======================

Traditionally, LLVM IR pointer types have contained a pointee type. For example,
``i32*`` is a pointer that points to an ``i32`` somewhere in memory. However,
due to a lack of pointee type semantics and various issues with having pointee
types, there is a desire to remove pointee types from pointers.

The opaque pointer type project aims to replace all pointer types containing
pointee types in LLVM with an opaque pointer type. The new pointer type is
represented textually as ``ptr``.

Some instructions still need to know what type to treat the memory pointed to by
the pointer as. For example, a load needs to know how many bytes to load from
memory and what type to treat the resulting value as. In these cases,
instructions themselves contain a type argument. For example the load
instruction from older versions of LLVM

.. code-block:: llvm

  load i64* %p

becomes

.. code-block:: llvm

  load i64, ptr %p

Address spaces are still used to distinguish between different kinds of pointers
where the distinction is relevant for lowering (e.g. data vs function pointers
have different sizes on some architectures). Opaque pointers are not changing
anything related to address spaces and lowering. For more information, see
`DataLayout <LangRef.html#langref-datalayout>`_. Opaque pointers in non-default
address space are spelled ``ptr addrspace(N)``.

This was proposed all the way back in
`2015 <https://lists.llvm.org/pipermail/llvm-dev/2015-February/081822.html>`_.

Issues with explicit pointee types
==================================

LLVM IR pointers can be cast back and forth between pointers with different
pointee types. The pointee type does not necessarily represent the actual
underlying type in memory. In other words, the pointee type carries no real
semantics.
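
As a minimal sketch of this point, using the older typed-pointer syntax (the
function and value names here are hypothetical), the same address can be
reinterpreted simply by bitcasting the pointer:

.. code-block:: llvm

  ; Typed-pointer IR: %p and %fp refer to the same address.
  define float @reinterpret(i32* %p) {
    store i32 0, i32* %p
    %fp = bitcast i32* %p to float*
    %v = load float, float* %fp
    ret float %v
  }

Both ``i32*`` and ``float*`` name the same memory here, so neither pointee type
says anything about what is actually stored at that address.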

Historically LLVM was some sort of type-safe subset of C. Having pointee types
provided an extra layer of checks to make sure that the Clang frontend matched
its frontend values/operations with the corresponding LLVM IR. However, as other
languages like C++ adopted LLVM, the community realized that pointee types were
more of a hindrance for LLVM development and that the extra type checking with
some frontends wasn't worth it.

LLVM's type system was `originally designed
<https://llvm.org/pubs/2003-05-01-GCCSummit2003.html>`_ to support high-level
optimization. However, years of LLVM implementation experience have demonstrated
that the pointee type system design does not effectively support
optimization. Memory optimization algorithms, such as SROA, GVN, and AA,
generally need to look through LLVM's struct types and reason about the
underlying memory offsets. The community realized that pointee types hinder LLVM
development, rather than helping it. Some of the initially proposed high-level
optimizations have evolved into `TBAA
<https://llvm.org/docs/LangRef.html#tbaa-metadata>`_ due to limitations with
representing higher-level language information directly via SSA values.

Pointee types provide some value to frontends because the IR verifier uses types
to detect straightforward type confusion bugs. However, frontends also have to
deal with the complexity of inserting bitcasts everywhere that they might be
required. The community consensus is that the costs of pointee types
outweigh the benefits, and that they should be removed.

Many operations do not actually care about the underlying type. These
operations, typically intrinsics, usually end up taking an arbitrary pointer
type ``i8*`` and sometimes a size. This causes lots of redundant no-op bitcasts
in the IR to and from a pointer with a different pointee type.
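
For illustration, here is a sketch of how this plays out for the
``llvm.memcpy`` intrinsic (the ``@copy`` function and its value names are
hypothetical). With typed pointers, both operands first have to be cast to
``i8*``; with opaque pointers they can be passed directly:

.. code-block:: llvm

  ; Typed pointers: two no-op bitcasts just to satisfy the i8* signature.
  declare void @llvm.memcpy.p0i8.p0i8.i64(i8*, i8*, i64, i1)

  define void @copy(i32* %dst, i32* %src) {
    %d = bitcast i32* %dst to i8*
    %s = bitcast i32* %src to i8*
    call void @llvm.memcpy.p0i8.p0i8.i64(i8* %d, i8* %s, i64 16, i1 false)
    ret void
  }

  ; Opaque pointers: the same copy, with no casts needed.
  declare void @llvm.memcpy.p0.p0.i64(ptr, ptr, i64, i1)

  define void @copy(ptr %dst, ptr %src) {
    call void @llvm.memcpy.p0.p0.i64(ptr %dst, ptr %src, i64 16, i1 false)
    ret void
  }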

No-op bitcasts take up memory/disk space and also take up compile time to look
through. However, perhaps the biggest issue is the code complexity required to
deal with bitcasts. When looking up through def-use chains for pointers it's
easy to forget to call ``Value::stripPointerCasts()`` to find the true
underlying pointer obfuscated by bitcasts. And when looking down through def-use
chains, passes need to iterate through bitcasts to handle uses. Removing no-op
pointer bitcasts prevents a category of missed optimizations and makes writing
LLVM passes a little bit easier.

Fewer no-op pointer bitcasts also reduce the chances of incorrect bitcasts with
regard to address spaces. People maintaining backends that care a lot about
address spaces have complained that frontends like Clang often incorrectly
bitcast pointers, losing address space information.

An analogous transition that happened earlier in LLVM is integer signedness.
Currently there is no distinction between signed and unsigned integer types, but
rather each integer operation (e.g. add) contains flags to signal how to treat
the integer. Previously LLVM IR distinguished between unsigned and signed
integer types and ran into similar issues of no-op casts. The transition from
manifesting signedness in types to instructions happened early on in LLVM's
timeline to make LLVM easier to work with.

Opaque Pointers Mode
====================

During the transition phase, LLVM can be used in two modes: In typed pointer
mode all pointer types have a pointee type and opaque pointers cannot be used.
In opaque pointers mode (the default), all pointers are opaque. The opaque
pointer mode can be disabled using ``-opaque-pointers=0`` in
LLVM tools like ``opt``, or ``-Xclang -no-opaque-pointers`` in clang.
Additionally, opaque pointer mode is automatically disabled for IR and bitcode
files that explicitly mention ``i8*`` style typed pointers.

In opaque pointer mode, all typed pointers used in IR, bitcode, or created
using ``PointerType::get()`` and similar APIs are automatically converted into
opaque pointers. This simplifies migration and allows testing existing IR with
opaque pointers.

.. code-block:: llvm

  define i8* @test(i8* %p) {
    %p2 = getelementptr i8, i8* %p, i64 1
    ret i8* %p2
  }

  ; Is automatically converted into the following if -opaque-pointers
  ; is enabled:

  define ptr @test(ptr %p) {
    %p2 = getelementptr i8, ptr %p, i64 1
    ret ptr %p2
  }

Migration Instructions
======================

In order to support opaque pointers, two types of changes tend to be necessary.
The first is the removal of all calls to ``PointerType::getElementType()`` and
``Type::getPointerElementType()``.

In the LLVM middle-end and backend, this is usually accomplished by inspecting
the type of relevant operations instead. For example, memory access related
analyses and optimizations should use the types encoded in the load and store
instructions instead of querying the pointer type.

Here are some common ways to avoid pointer element type accesses:

* For loads, use ``getType()``.
* For stores, use ``getValueOperand()->getType()``.
* Use ``getLoadStoreType()`` to handle both of the above in one call.
* For getelementptr instructions, use ``getSourceElementType()``.
* For calls, use ``getFunctionType()``.
* For allocas, use ``getAllocatedType()``.
* For globals, use ``getValueType()``.
* For consistency assertions, use
  ``PointerType::isOpaqueOrPointeeTypeEquals()``.
* To create a pointer type in a different address space, use
  ``PointerType::getWithSamePointeeType()``.
* To check that two pointers have the same element type, use
  ``PointerType::hasSameElementTypeAs()``.
* While it is preferred to write code in a way that accepts both typed and
  opaque pointers, ``Type::isOpaquePointerTy()`` and
  ``PointerType::isOpaque()`` can be used to handle opaque pointers specially.
  ``PointerType::getNonOpaquePointerElementType()`` can be used as a marker in
  code-paths where opaque pointers have been explicitly excluded.
* To get the type of a byval argument, use ``getParamByValType()``. Similar
  methods exist for other ABI-affecting attributes that need to know the
  element type, such as byref, sret, inalloca and preallocated.
* Some intrinsics require an ``elementtype`` attribute, which can be retrieved
  using ``getParamElementType()``. This attribute is required in cases where
  the intrinsic does not naturally encode a needed element type. This is also
  used for inline assembly.

Note that some of the methods mentioned above only exist to support both typed
and opaque pointers at the same time, and will be dropped once the migration
has completed. For example, ``isOpaqueOrPointeeTypeEquals()`` becomes
meaningless once all pointers are opaque.

While direct usage of pointer element types is immediately apparent in code,
there is a more subtle issue that opaque pointers need to contend with: A lot
of code assumes that pointer equality also implies that the used load/store
type or GEP source element type is the same. Consider the following examples
with typed and opaque pointers:

.. code-block:: llvm

  define i64 @test(i32* %p) {
    store i32 0, i32* %p
    %bc = bitcast i32* %p to i64*
    %v = load i64, i64* %bc
    ret i64 %v
  }

  define i64 @test(ptr %p) {
    store i32 0, ptr %p
    %v = load i64, ptr %p
    ret i64 %v
  }

Without opaque pointers, a check that the pointer operand of the load and
store are the same also ensures that the accessed type is the same.
Using a different type requires a bitcast, which will result in distinct
pointer operands.

With opaque pointers, the bitcast is not present, and this check is no longer
sufficient. In the above example, it could result in store to load forwarding
of an incorrect type. Code making such assumptions needs to be adjusted to
check the accessed type explicitly:
``LI->getType() == SI->getValueOperand()->getType()``.

Frontends
---------

Frontends need to be adjusted to track pointee types independently of LLVM,
insofar as they are necessary for lowering. For example, clang now tracks the
pointee type in the ``Address`` structure.

Frontends using the C API through an FFI interface should be aware that a
number of C API functions are deprecated and will be removed as part of the
opaque pointer transition::

    LLVMBuildLoad -> LLVMBuildLoad2
    LLVMBuildCall -> LLVMBuildCall2
    LLVMBuildInvoke -> LLVMBuildInvoke2
    LLVMBuildGEP -> LLVMBuildGEP2
    LLVMBuildInBoundsGEP -> LLVMBuildInBoundsGEP2
    LLVMBuildStructGEP -> LLVMBuildStructGEP2
    LLVMBuildPtrDiff -> LLVMBuildPtrDiff2
    LLVMConstGEP -> LLVMConstGEP2
    LLVMConstInBoundsGEP -> LLVMConstInBoundsGEP2
    LLVMAddAlias -> LLVMAddAlias2

Additionally, it will no longer be possible to call ``LLVMGetElementType()``
on a pointer type.

It is possible to control whether opaque pointers are used (if you want to
override the default) using ``LLVMContext::setOpaquePointers``.

Temporarily disabling opaque pointers
=====================================

In LLVM 15, opaque pointers are enabled by default, but it is still possible to
use typed pointers using a number of opt-in flags.

For users of the clang driver interface, it is possible to temporarily restore
the old default using the ``-DCLANG_ENABLE_OPAQUE_POINTERS=OFF`` cmake option,
or by passing ``-Xclang -no-opaque-pointers`` to a single clang invocation.

For users of the clang cc1 interface, ``-no-opaque-pointers`` can be passed.
Note that the ``CLANG_ENABLE_OPAQUE_POINTERS`` cmake option has no effect on
the cc1 interface.

Usage for LTO can be disabled by passing ``-Wl,-plugin-opt=no-opaque-pointers``
to the clang driver.

For users of LLVM as a library, opaque pointers can be disabled by calling
``setOpaquePointers(false)`` on the ``LLVMContext``.

For users of LLVM tools like opt, opaque pointers can be disabled by passing
``-opaque-pointers=0``.

Version Support
===============

**LLVM 14:** Supports all necessary APIs for migrating to opaque pointers and
deprecates/removes incompatible APIs. However, using opaque pointers in the
optimization pipeline is **not** fully supported. This release can be used to
make out-of-tree code compatible with opaque pointers, but opaque pointers
should **not** be enabled in production.

**LLVM 15:** Opaque pointers are enabled by default. Typed pointers are still
supported.

**LLVM 16:** Opaque pointers are enabled by default. Typed pointers are
supported on a best-effort basis only and not tested.

**LLVM 17:** Only opaque pointers are supported. Typed pointers are not
supported.

Transition State
================

As of July 2023:

Typed pointers are **not** supported on the ``main`` branch.

The following typed pointer functionality has been removed:

* The ``CLANG_ENABLE_OPAQUE_POINTERS`` cmake flag is no longer supported.
* The ``-no-opaque-pointers`` cc1 clang flag is no longer supported.
* The ``-opaque-pointers`` opt flag is no longer supported.
* The ``-plugin-opt=no-opaque-pointers`` LTO flag is no longer supported.
* C APIs that do not support opaque pointers (like ``LLVMBuildLoad``) are no
  longer supported.

The following typed pointer functionality is still to be removed:

* Various APIs that are no longer relevant with opaque pointers.