=============
Type Metadata
=============

Type metadata is a mechanism that allows IR modules to co-operatively build
pointer sets corresponding to addresses within a given set of globals. LLVM's
`control flow integrity`_ implementation uses this metadata to efficiently
check (at each call site) that a given address corresponds to either a
valid vtable or function pointer for a given class or function type, and its
whole-program devirtualization pass uses the metadata to identify potential
callees for a given virtual call.

To use the mechanism, a client creates metadata nodes with two elements:

1. a byte offset into the global (generally zero for functions)
2. a metadata object representing an identifier for the type

These metadata nodes are associated with globals by using global object
metadata attachments with the ``!type`` metadata kind.

Each type identifier must exclusively identify either global variables
or functions.

.. admonition:: Limitation

  The current implementation only supports attaching metadata to functions on
  the x86-32 and x86-64 architectures.
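
The two-element nodes and the exclusivity rule above can be modeled outside
LLVM in a few lines of C++. The following is a minimal sketch, not LLVM's API:
``GlobalKind``, ``TypeMD``, ``Global`` and ``typeIdsAreExclusive`` are
hypothetical names, and a real client attaches ``!type`` metadata through the
IR rather than through structures like these.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>
  #include <map>
  #include <string>
  #include <vector>

  // Hypothetical, stripped-down model of a global object and its !type
  // attachments; this is not LLVM's GlobalObject API.
  enum class GlobalKind { Variable, Function };

  // One !type node: {byte offset into the global, type identifier}.
  struct TypeMD {
    uint64_t Offset;
    std::string TypeId;
  };

  struct Global {
    std::string Name;
    GlobalKind Kind;
    std::vector<TypeMD> Types; // zero or more !type attachments
  };

  // Checks the rule that each type identifier is attached either only to
  // global variables or only to functions.
  bool typeIdsAreExclusive(const std::vector<Global> &Globals) {
    std::map<std::string, GlobalKind> KindOfTypeId;
    for (const Global &G : Globals)
      for (const TypeMD &T : G.Types) {
        // The first global seen with a given identifier fixes its kind.
        auto Res = KindOfTypeId.emplace(T.TypeId, G.Kind);
        if (!Res.second && Res.first->second != G.Kind)
          return false;
      }
    return true;
  }

For instance, attaching ``!"typeid1"`` to both a global variable and a
function makes the check fail, which is exactly the situation the rule forbids.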

An intrinsic, :ref:`llvm.type.test <type.test>`, is used to test whether a
given pointer is associated with a type identifier.

.. _control flow integrity: https://clang.llvm.org/docs/ControlFlowIntegrity.html

Representing Type Information using Type Metadata
=================================================

This section describes how Clang represents C++ type information associated with
virtual tables using type metadata.

Consider the following inheritance hierarchy:

.. code-block:: c++

  struct A {
    virtual void f();
  };

  struct B : A {
    virtual void f();
    virtual void g();
  };

  struct C {
    virtual void h();
  };

  struct D : A, C {
    virtual void f();
    virtual void h();
  };

The virtual table objects for A, B, C and D look like this (under the Itanium ABI):

.. csv-table:: Virtual Table Layout for A, B, C, D
  :header: Class, 0, 1, 2, 3, 4, 5, 6

  A, A::offset-to-top, &A::rtti, &A::f
  B, B::offset-to-top, &B::rtti, &B::f, &B::g
  C, C::offset-to-top, &C::rtti, &C::h
  D, D::offset-to-top, &D::rtti, &D::f, &D::h, D::offset-to-top, &D::rtti, thunk for &D::h

When an object of type A is constructed, the address of ``&A::f`` in A's
virtual table object is stored in the object's vtable pointer. In ABI parlance
this address is known as an `address point`_. Similarly, when an object of type
B is constructed, the address of ``&B::f`` is stored in the vtable pointer. In
this way, the vtable in B's virtual table object is compatible with A's vtable.

D is a little more complicated, due to the use of multiple inheritance. Its
virtual table object contains two vtables, one compatible with A's vtable and
the other compatible with C's vtable. Objects of type D contain two virtual
pointers, one belonging to the A subobject and containing the address of
the vtable compatible with A's vtable, and the other belonging to the C
subobject and containing the address of the vtable compatible with C's vtable.

The full set of compatibility information for the above class hierarchy is
shown below.
The following table shows the name of a class, the byte offset of an
address point within that class's vtable and the name of one of the classes
with which that address point is compatible.

.. csv-table:: Type Offsets for A, B, C, D
  :header: VTable for, Offset (bytes), Compatible Class

  A, 16, A
  B, 16, A
  , , B
  C, 16, C
  D, 16, A
  , , D
  , 48, C

The next step is to encode this compatibility information into the IR. This
is done by creating type metadata named after each compatible class and
associating it with every compatible address point in each vtable.
For example, these type metadata entries encode the compatibility
information for the above hierarchy:

::

  @_ZTV1A = constant [...], !type !0
  @_ZTV1B = constant [...], !type !0, !type !1
  @_ZTV1C = constant [...], !type !2
  @_ZTV1D = constant [...], !type !0, !type !3, !type !4

  !0 = !{i64 16, !"_ZTS1A"}
  !1 = !{i64 16, !"_ZTS1B"}
  !2 = !{i64 16, !"_ZTS1C"}
  !3 = !{i64 16, !"_ZTS1D"}
  !4 = !{i64 48, !"_ZTS1C"}

With this type metadata, we can now use the ``llvm.type.test`` intrinsic to
test whether a given pointer is compatible with a type identifier. Working
backwards, if ``llvm.type.test`` returns true for a particular pointer,
we can also statically determine the identities of the virtual functions
that a particular virtual call may call. For example, if a program assumes
a pointer to be a member of ``!"_ZTS1A"``, we know that the address can
only be one of ``_ZTV1A+16``, ``_ZTV1B+16`` or ``_ZTV1D+16`` (i.e. the
address points of the vtables of A, B and D respectively). If we then load
an address from that pointer, we know that the address can only be one of
``&A::f``, ``&B::f`` or ``&D::f``.

.. _address point: https://itanium-cxx-abi.github.io/cxx-abi/abi.html#vtable-general

Testing Addresses For Type Membership
=====================================

If a program tests an address using ``llvm.type.test``, this will cause
a link-time optimization pass, ``LowerTypeTests``, to replace calls to this
intrinsic with efficient code to perform type member tests. At a high level,
the pass will lay out referenced globals in a consecutive memory region in
the object file, construct bit vectors that map onto that memory region,
and generate code at each of the ``llvm.type.test`` call sites to test
pointers against those bit vectors. Because of the layout manipulation, the
globals' definitions must be available at LTO time. For more information,
see the `control flow integrity design document`_.

A type identifier that identifies functions is transformed into a jump table,
which is a block of code containing one branch instruction per function
associated with the type identifier, each branching to its target function.
The pass will redirect any taken function addresses to the corresponding jump
table entry. In the object file's symbol table, the jump table entries take
the identities of the original functions, so that addresses taken outside the
module will pass any verification done inside the module.
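
To make the bit-vector test concrete, here is a minimal sketch that applies it
to the vtables of A, B, C and D from the previous section. It assumes, beyond
what the layout tables state, that the four vtables are placed back to back
starting at a made-up address ``0x1000`` with 8-byte slots (as on x86-64), so
they occupy 24, 32, 24 and 56 bytes. ``TypeMember``, ``BitSet``,
``buildBitSet`` and ``testMember`` are hypothetical names; this is not the
code or data layout that ``LowerTypeTests`` actually uses, and it ignores any
reordering a real layout pass might perform. It folds the alignment and range
checks into a single rotate-and-compare.

.. code-block:: c++

  #include <algorithm>
  #include <cassert>
  #include <cstdint>
  #include <vector>

  // Hypothetical model of one member of a type identifier after layout: the
  // address at which a global was placed plus its !type byte offset.
  struct TypeMember {
    uint64_t GlobalStart;
    uint64_t Offset;
  };

  // Hypothetical constants a lowered llvm.type.test call site would need for
  // one type identifier.
  struct BitSet {
    uint64_t Base = 0;      // lowest member address
    unsigned AlignLog2 = 0; // members are spaced at multiples of 1 << AlignLog2
    std::vector<bool> Bits; // Bits[I] is set iff Base + (I << AlignLog2) is a member
  };

  static uint64_t rotr64(uint64_t X, unsigned R) {
    return R == 0 ? X : (X >> R) | (X << (64 - R));
  }

  BitSet buildBitSet(const std::vector<TypeMember> &Members) {
    assert(!Members.empty() && "a type identifier needs at least one member");
    std::vector<uint64_t> Addrs;
    for (const TypeMember &M : Members)
      Addrs.push_back(M.GlobalStart + M.Offset);

    BitSet BS;
    BS.Base = *std::min_element(Addrs.begin(), Addrs.end());
    uint64_t Max = *std::max_element(Addrs.begin(), Addrs.end());

    // The largest power of two dividing every (Addr - Base) becomes the stride.
    uint64_t Diffs = 0;
    for (uint64_t A : Addrs)
      Diffs |= A - BS.Base;
    while (Diffs != 0 && (Diffs & 1) == 0) {
      Diffs >>= 1;
      ++BS.AlignLog2;
    }

    BS.Bits.assign(((Max - BS.Base) >> BS.AlignLog2) + 1, false);
    for (uint64_t A : Addrs)
      BS.Bits[(A - BS.Base) >> BS.AlignLog2] = true;
    return BS;
  }

  // The per-call-site check. Rotating right by AlignLog2 moves any misaligned
  // low bits into the top of the word, so misaligned addresses and addresses
  // below Base (which wrap around) all fail the single range comparison.
  bool testMember(const BitSet &BS, uint64_t Addr) {
    uint64_t Index = rotr64(Addr - BS.Base, BS.AlignLog2);
    return Index < BS.Bits.size() && BS.Bits[Index];
  }

Under these assumptions, ``_ZTS1A`` needs an 11-bit vector with an 8-byte
stride (bits set at indices 0, 3 and 10), and C's address point lands on a
clear bit, so it is rejected.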

Jump tables may call external functions, so their definitions need not
be available at LTO time. Note that if an externally defined function is
associated with a type identifier, there is no guarantee that its identity
within the module will be the same as its identity outside of the module,
as the former will be the jump table entry if a jump table is necessary.

The `GlobalLayoutBuilder`_ class is responsible for laying out the globals
efficiently to minimize the sizes of the underlying bitsets.

.. _control flow integrity design document: https://clang.llvm.org/docs/ControlFlowIntegrityDesign.html

:Example:

::

  target datalayout = "e-p:32:32"

  @a = internal global i32 0, !type !0
  @b = internal global i32 0, !type !0, !type !1
  @c = internal global i32 0, !type !1
  @d = internal global [2 x i32] [i32 0, i32 0], !type !2

  define void @e() !type !3 {
    ret void
  }

  define void @f() {
    ret void
  }

  declare void @g() !type !3

  !0 = !{i32 0, !"typeid1"}
  !1 = !{i32 0, !"typeid2"}
  !2 = !{i32 4, !"typeid2"}
  !3 = !{i32 0, !"typeid3"}

  declare i1 @llvm.type.test(i8* %ptr, metadata %typeid) nounwind readnone

  define i1 @foo(i32* %p) {
    %pi8 = bitcast i32* %p to i8*
    %x = call i1 @llvm.type.test(i8* %pi8, metadata !"typeid1")
    ret i1 %x
  }

  define i1 @bar(i32* %p) {
    %pi8 = bitcast i32* %p to i8*
    %x = call i1 @llvm.type.test(i8* %pi8, metadata !"typeid2")
    ret i1 %x
  }

  define i1 @baz(void ()* %p) {
    %pi8 = bitcast void ()* %p to i8*
    %x = call i1 @llvm.type.test(i8* %pi8, metadata !"typeid3")
    ret i1 %x
  }

  define void @main() {
    %a1 = call i1 @foo(i32* @a) ; returns 1
    %b1 = call i1 @foo(i32* @b) ; returns 1
    %c1 = call i1 @foo(i32* @c) ; returns 0
    %a2 = call i1 @bar(i32* @a) ; returns 0
    %b2 = call i1 @bar(i32* @b) ; returns 1
    %c2 = call i1 @bar(i32* @c) ; returns 1
    %d02 = call i1 @bar(i32* getelementptr ([2 x i32], [2 x i32]* @d, i32 0, i32 0)) ; returns 0
    %d12 = call i1 @bar(i32* getelementptr ([2 x i32], [2 x i32]* @d, i32 0, i32 1)) ; returns 1
    %e = call i1 @baz(void ()* @e) ; returns 1
    %f = call i1 @baz(void ()* @f) ; returns 0
    %g = call i1 @baz(void ()* @g) ; returns 1
    ret void
  }

.. _GlobalLayoutBuilder: https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/Transforms/IPO/LowerTypeTests.h

``!vcall_visibility`` Metadata
==============================

In order to allow removing unused function pointers from vtables, we need to
know whether every virtual call that could use a given vtable is known to the
compiler, or whether another translation unit could introduce more calls
through the vtable. This is not the same as the linkage of the vtable, because
call sites could be using a pointer of a more widely-visible base class. For
example, consider this code:

.. code-block:: c++

  struct __attribute__((visibility("default"))) A {
    virtual void f();
  };

  struct __attribute__((visibility("hidden"))) B : A {
    virtual void f();
  };

With LTO, we know that all code which can see the declaration of ``B`` is
visible to us.
However, a pointer to a ``B`` could be cast to ``A*`` and passed
to another linkage unit, which could then call ``f`` on it. This call would
load from the vtable for ``B`` (using the object pointer), and then call
``B::f``. This means we can remove neither the function pointer from ``B``'s
vtable nor the implementation of ``B::f``. By contrast, if we can see all code
which knows about any dynamic base class (which would be the case if ``B`` only
inherited from classes with hidden visibility), then this optimisation would be
valid.

This concept is represented in IR by the ``!vcall_visibility`` metadata
attached to vtable objects, with the following values:

.. list-table::
   :header-rows: 1
   :widths: 10 90

   * - Value
     - Behavior

   * - 0 (or omitted)
     - **Public**

       Virtual function calls using this vtable could be made from external
       code.

   * - 1
     - **Linkage Unit**

       All virtual function calls which might use this vtable are in the
       current LTO unit, meaning they will be in the current module once
       LTO linking has been performed.

   * - 2
     - **Translation Unit**

       All virtual function calls which might use this vtable are in the
       current module.

In addition, all function pointer loads from a vtable marked with the
``!vcall_visibility`` metadata (with a non-zero value) must be done using the
:ref:`llvm.type.checked.load <type.checked.load>` intrinsic, so that virtual
call sites can be correlated with the vtables which they might load from.
Other parts of the vtable (RTTI, offset-to-top, ...) can still be accessed with
normal loads.
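
As a rough illustration, the question a whole-program pass has to answer from
these values (can it enumerate every virtual call that might load from this
vtable?) can be written as a small predicate. This is a hypothetical sketch:
the ``VCallVisibility`` enum and ``allVirtualCallsVisible`` are not LLVM
declarations, and the ``InFullLTOUnit`` flag is an assumed stand-in for "LTO
linking has been performed". The answer is only useful when the loads go
through ``llvm.type.checked.load`` as required above.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>

  // Mirrors the three !vcall_visibility values in the table above; an omitted
  // attachment behaves like Public. Hypothetical, not LLVM's own enum.
  enum class VCallVisibility : uint64_t {
    Public = 0,          // calls may come from external code
    LinkageUnit = 1,     // all calls are in the current LTO unit
    TranslationUnit = 2, // all calls are in the current module
  };

  // Returns true if every virtual call that might load from the vtable is
  // visible in the current module, the precondition for removing unused
  // function pointers from it. InFullLTOUnit says whether the current module
  // already contains the whole LTO unit.
  bool allVirtualCallsVisible(VCallVisibility V, bool InFullLTOUnit) {
    switch (V) {
    case VCallVisibility::Public:
      return false;
    case VCallVisibility::LinkageUnit:
      return InFullLTOUnit;
    case VCallVisibility::TranslationUnit:
      return true;
    }
    return false; // treat unknown values conservatively
  }

In the ``A``/``B`` example above, ``B`` has a default-visibility dynamic base
class, so calls through ``A*`` may come from other linkage units; ``B``'s
vtable can then only safely be treated as **Public**, for which the predicate
returns false.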