=============
Type Metadata
=============

Type metadata is a mechanism that allows IR modules to co-operatively build
pointer sets corresponding to addresses within a given set of globals. LLVM's
`control flow integrity`_ implementation uses this metadata to efficiently
check (at each call site) that a given address corresponds to either a
valid vtable or function pointer for a given class or function type, and its
whole-program devirtualization pass uses the metadata to identify potential
callees for a given virtual call.

To use the mechanism, a client creates metadata nodes with two elements:

1. a byte offset into the global (generally zero for functions)
2. a metadata object representing an identifier for the type

These metadata nodes are associated with globals by using global object
metadata attachments with the ``!type`` metadata kind.

Each type identifier must exclusively identify either global variables
or functions.

.. admonition:: Limitation

  The current implementation only supports attaching metadata to functions on
  the x86-32 and x86-64 architectures.

An intrinsic, :ref:`llvm.type.test <type.test>`, is used to test whether a
given pointer is associated with a type identifier.

.. _control flow integrity: https://clang.llvm.org/docs/ControlFlowIntegrity.html

Representing Type Information using Type Metadata
=================================================

This section describes how Clang represents C++ type information associated with
virtual tables using type metadata.

Consider the following inheritance hierarchy:

.. code-block:: c++

  struct A {
    virtual void f();
  };

  struct B : A {
    virtual void f();
    virtual void g();
  };

  struct C {
    virtual void h();
  };

  struct D : A, C {
    virtual void f();
    virtual void h();
  };

The virtual table objects for A, B, C and D look like this (under the Itanium ABI):

.. csv-table:: Virtual Table Layout for A, B, C, D
  :header: Class, 0, 1, 2, 3, 4, 5, 6

  A, A::offset-to-top, &A::rtti, &A::f
  B, B::offset-to-top, &B::rtti, &B::f, &B::g
  C, C::offset-to-top, &C::rtti, &C::h
  D, D::offset-to-top, &D::rtti, &D::f, &D::h, D::offset-to-top, &D::rtti, thunk for &D::h

When an object of type A is constructed, the address of ``&A::f`` in A's
virtual table object is stored in the object's vtable pointer.  In ABI parlance
this address is known as an `address point`_. Similarly, when an object of type
B is constructed, the address of ``&B::f`` is stored in the vtable pointer. In
this way, the vtable in B's virtual table object is compatible with A's vtable.

D is a little more complicated, due to the use of multiple inheritance. Its
virtual table object contains two vtables, one compatible with A's vtable and
the other compatible with C's vtable. Objects of type D contain two vtable
pointers, one belonging to the A subobject and containing the address of
the vtable compatible with A's vtable, and the other belonging to the C
subobject and containing the address of the vtable compatible with C's vtable.

The full set of compatibility information for the above class hierarchy is
shown below. The following table shows the name of a class, the offset of an
address point within that class's vtable and the name of one of the classes
with which that address point is compatible.

.. csv-table:: Type Offsets for A, B, C, D
  :header: VTable for, Offset, Compatible Class

  A, 16, A
  B, 16, A
   ,   , B
  C, 16, C
  D, 16, A
   ,   , D
   , 48, C

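As a rough check of the offsets in the table above, the following sketch
derives them from the slot indices in the virtual table layout. It assumes
8-byte pointers, as on a typical 64-bit Itanium ABI target; the names
``kSlotSize`` and ``addressPointOffset`` are hypothetical and exist only for
this illustration.

.. code-block:: c++

  #include <cstddef>

  // Assumption: each virtual table slot is one 8-byte pointer (64-bit target).
  constexpr std::size_t kSlotSize = 8;

  // Byte offset of an address point, given the index of the slot it refers to.
  constexpr std::size_t addressPointOffset(std::size_t slotIndex) {
    return slotIndex * kSlotSize;
  }

  // In every virtual table above, slot 2 holds the first virtual function
  // pointer, after offset-to-top (slot 0) and RTTI (slot 1).
  static_assert(addressPointOffset(2) == 16, "primary address point");

  // D's second vtable starts at slot 4, so its first virtual function pointer
  // (the thunk for D::h) is in slot 6.
  static_assert(addressPointOffset(6) == 48, "C-compatible address point in D");

On a target with 4-byte pointers the same slot indices would give different
byte offsets, so the values 16 and 48 are specific to this assumption.
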
The next step is to encode this compatibility information into the IR. The way
this is done is to create type metadata named after each of the compatible
classes, with which we associate each of the compatible address points in
each vtable. For example, these type metadata entries encode the compatibility
information for the above hierarchy:

::

  @_ZTV1A = constant [...], !type !0
  @_ZTV1B = constant [...], !type !0, !type !1
  @_ZTV1C = constant [...], !type !2
  @_ZTV1D = constant [...], !type !0, !type !3, !type !4

  !0 = !{i64 16, !"_ZTS1A"}
  !1 = !{i64 16, !"_ZTS1B"}
  !2 = !{i64 16, !"_ZTS1C"}
  !3 = !{i64 16, !"_ZTS1D"}
  !4 = !{i64 48, !"_ZTS1C"}

With this type metadata, we can now use the ``llvm.type.test`` intrinsic to
test whether a given pointer is compatible with a type identifier. Working
backwards, if ``llvm.type.test`` returns true for a particular pointer,
we can also statically determine the identities of the virtual functions
that a particular virtual call may call. For example, if a program assumes
a pointer to be a member of ``!"_ZTS1A"``, we know that the address can
only be one of ``_ZTV1A+16``, ``_ZTV1B+16`` or ``_ZTV1D+16`` (i.e. the
address points of the vtables of A, B and D respectively). If we then load
an address from that pointer, we know that the address can only be one of
``&A::f``, ``&B::f`` or ``&D::f``.

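The reasoning above can be sketched as a small model in C++. This is only an
illustration of the semantics, not how LLVM implements ``llvm.type.test``; the
``TypeEntry`` struct and the ``members`` function are hypothetical names, and
the table simply restates the ``!type`` attachments from the IR above.

.. code-block:: c++

  #include <cstdint>
  #include <set>
  #include <string>
  #include <utility>
  #include <vector>

  // One !type attachment: the global it is attached to, the byte offset into
  // that global, and the type identifier.
  struct TypeEntry {
    std::string global;
    std::uint64_t offset;
    std::string typeId;
  };

  // The attachments from the IR above (!0 through !4).
  const std::vector<TypeEntry> kTypeMetadata = {
      {"_ZTV1A", 16, "_ZTS1A"},
      {"_ZTV1B", 16, "_ZTS1A"},
      {"_ZTV1B", 16, "_ZTS1B"},
      {"_ZTV1C", 16, "_ZTS1C"},
      {"_ZTV1D", 16, "_ZTS1A"},
      {"_ZTV1D", 16, "_ZTS1D"},
      {"_ZTV1D", 48, "_ZTS1C"},
  };

  // An address is modelled as a (global, byte offset) pair.
  using Address = std::pair<std::string, std::uint64_t>;

  // Returns the set of addresses that are members of the given type identifier.
  std::set<Address> members(const std::string &typeId) {
    std::set<Address> result;
    for (const TypeEntry &e : kTypeMetadata)
      if (e.typeId == typeId)
        result.emplace(e.global, e.offset);
    return result;
  }

In this model, ``members("_ZTS1A")`` contains ``_ZTV1A+16``, ``_ZTV1B+16`` and
``_ZTV1D+16``, matching the set described above, while ``members("_ZTS1C")``
contains ``_ZTV1C+16`` and ``_ZTV1D+48``.
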
.. _address point: https://itanium-cxx-abi.github.io/cxx-abi/abi.html#vtable-general

Testing Addresses For Type Membership
=====================================

If a program tests an address using ``llvm.type.test``, this will cause
a link-time optimization pass, ``LowerTypeTests``, to replace calls to this
intrinsic with efficient code to perform type member tests. At a high level,
the pass will lay out referenced globals in a consecutive memory region in
the object file, construct bit vectors that map onto that memory region,
and generate code at each of the ``llvm.type.test`` call sites to test
pointers against those bit vectors. Because of the layout manipulation, the
globals' definitions must be available at LTO time. For more information,
see the `control flow integrity design document`_.

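To make the bit vector idea concrete, here is a minimal sketch of the kind of
check that could stand in for an ``llvm.type.test`` call. It assumes that the
globals were laid out consecutively starting at a hypothetical address
``base``, that every candidate address is a multiple of ``alignment`` bytes
from ``base``, and that bit ``i`` is set when ``base + i * alignment`` is a
member of the type identifier. The struct and function names are made up for
illustration; the code actually generated by ``LowerTypeTests`` differs in
detail.

.. code-block:: c++

  #include <cstdint>
  #include <vector>

  // Hypothetical description of one type identifier after layout.
  struct TypeBitVector {
    std::uintptr_t base;       // address of the start of the laid-out region
    std::uintptr_t alignment;  // distance in bytes between candidate addresses
    std::vector<bool> bits;    // bits[i]: is base + i * alignment a member?
  };

  // Returns true if addr is a member of the type identifier described by bv.
  bool typeTest(const TypeBitVector &bv, std::uintptr_t addr) {
    if (addr < bv.base)
      return false;  // below the region
    std::uintptr_t delta = addr - bv.base;
    if (delta % bv.alignment != 0)
      return false;  // not at a candidate address
    std::uintptr_t index = delta / bv.alignment;
    if (index >= bv.bits.size())
      return false;  // above the region
    return bv.bits[index];
  }

For instance, if three 4-byte globals were placed at the hypothetical addresses
``0x1000``, ``0x1004`` and ``0x1008`` and only the first two were members, the
bit vector would be ``{1, 1, 0}`` with ``base = 0x1000`` and ``alignment = 4``.
The check then accepts ``0x1000`` and ``0x1004`` and rejects ``0x1008``,
``0x1002`` and any address outside the region.
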
A type identifier that identifies functions is transformed into a jump table,
which is a block of code consisting of one branch instruction for each
of the functions associated with the type identifier that branches to the
target function. The pass will redirect any taken function addresses to the
corresponding jump table entry. In the object file's symbol table, the jump
table entries take the identities of the original functions, so that addresses
taken outside the module will pass any verification done inside the module.

Jump tables may call external functions, so their definitions need not
be available at LTO time. Note that if an externally defined function is
associated with a type identifier, there is no guarantee that its identity
within the module will be the same as its identity outside of the module,
as the former will be the jump table entry if a jump table is necessary.

The `GlobalLayoutBuilder`_ class is responsible for laying out the globals
efficiently to minimize the sizes of the underlying bitsets.

.. _control flow integrity design document: https://clang.llvm.org/docs/ControlFlowIntegrityDesign.html

:Example:

::

    target datalayout = "e-p:32:32"

    @a = internal global i32 0, !type !0
    @b = internal global i32 0, !type !0, !type !1
    @c = internal global i32 0, !type !1
    @d = internal global [2 x i32] [i32 0, i32 0], !type !2

    define void @e() !type !3 {
      ret void
    }

    define void @f() {
      ret void
    }

    declare void @g() !type !3

    !0 = !{i32 0, !"typeid1"}
    !1 = !{i32 0, !"typeid2"}
    !2 = !{i32 4, !"typeid2"}
    !3 = !{i32 0, !"typeid3"}

    declare i1 @llvm.type.test(i8* %ptr, metadata %typeid) nounwind readnone

    define i1 @foo(i32* %p) {
      %pi8 = bitcast i32* %p to i8*
      %x = call i1 @llvm.type.test(i8* %pi8, metadata !"typeid1")
      ret i1 %x
    }

    define i1 @bar(i32* %p) {
      %pi8 = bitcast i32* %p to i8*
      %x = call i1 @llvm.type.test(i8* %pi8, metadata !"typeid2")
      ret i1 %x
    }

    define i1 @baz(void ()* %p) {
      %pi8 = bitcast void ()* %p to i8*
      %x = call i1 @llvm.type.test(i8* %pi8, metadata !"typeid3")
      ret i1 %x
    }

    define void @main() {
      %a1 = call i1 @foo(i32* @a) ; returns 1
      %b1 = call i1 @foo(i32* @b) ; returns 1
      %c1 = call i1 @foo(i32* @c) ; returns 0
      %a2 = call i1 @bar(i32* @a) ; returns 0
      %b2 = call i1 @bar(i32* @b) ; returns 1
      %c2 = call i1 @bar(i32* @c) ; returns 1
      %d02 = call i1 @bar(i32* getelementptr ([2 x i32], [2 x i32]* @d, i32 0, i32 0)) ; returns 0
      %d12 = call i1 @bar(i32* getelementptr ([2 x i32], [2 x i32]* @d, i32 0, i32 1)) ; returns 1
      %e = call i1 @baz(void ()* @e) ; returns 1
      %f = call i1 @baz(void ()* @f) ; returns 0
      %g = call i1 @baz(void ()* @g) ; returns 1
      ret void
    }

.. _GlobalLayoutBuilder: https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/Transforms/IPO/LowerTypeTests.h

``!vcall_visibility`` Metadata
==============================

In order to allow removing unused function pointers from vtables, we need to
know whether every virtual call which could use it is known to the compiler, or
whether another translation unit could introduce more calls through the vtable.
This is not the same as the linkage of the vtable, because call sites could be
using a pointer of a more widely-visible base class. For example, consider this
code:

.. code-block:: c++

  // The attribute goes after the class-key so that it applies to the type.
  struct __attribute__((visibility("default"))) A {
    virtual void f();
  };

  struct __attribute__((visibility("hidden"))) B : A {
    virtual void f();
  };

With LTO, we know that all code which can see the declaration of ``B`` is
visible to us. However, a pointer to a ``B`` could be cast to ``A*`` and passed
to another linkage unit, which could then call ``f`` on it. This call would
load from the vtable for ``B`` (using the object pointer), and then call
``B::f``. This means we can't remove the function pointer from ``B``'s vtable,
or the implementation of ``B::f``. However, if we can see all code which knows
about any dynamic base class (which would be the case if ``B`` only inherited
from classes with hidden visibility), then this optimisation would be valid.

This concept is represented in IR by the ``!vcall_visibility`` metadata
attached to vtable objects, with the following values:

.. list-table::
   :header-rows: 1
   :widths: 10 90

   * - Value
     - Behavior

   * - 0 (or omitted)
     - **Public**
           Virtual function calls using this vtable could be made from external
           code.

   * - 1
     - **Linkage Unit**
           All virtual function calls which might use this vtable are in the
           current LTO unit, meaning they will be in the current module once
           LTO linking has been performed.

   * - 2
     - **Translation Unit**
           All virtual function calls which might use this vtable are in the
           current module.

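The table can be summarised by a small C++ sketch. The enum and function names
below are hypothetical and only mirror the values above; they are not meant to
describe LLVM's own data structures. The predicate captures when all virtual
calls through a vtable are visible in the current module: always for
translation unit visibility, and only after LTO linking for linkage unit
visibility.

.. code-block:: c++

  // Hypothetical mirror of the !vcall_visibility values.
  enum class VCallVisibility : unsigned {
    Public = 0,           // calls may come from external code
    LinkageUnit = 1,      // all calls are in the current LTO unit
    TranslationUnit = 2,  // all calls are in the current module
  };

  // Returns true if every virtual call that might use the vtable is visible in
  // the current module. afterLTOLinking says whether the modules of the LTO
  // unit have already been merged into the current module.
  bool allCallSitesVisible(VCallVisibility v, bool afterLTOLinking) {
    switch (v) {
    case VCallVisibility::TranslationUnit:
      return true;
    case VCallVisibility::LinkageUnit:
      return afterLTOLinking;
    case VCallVisibility::Public:
      return false;
    }
    return false;  // unknown values are treated conservatively
  }

Under this reading, removing an unused function pointer from a vtable would
only be considered when the predicate holds for that vtable.
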
In addition, all function pointer loads from a vtable marked with the
``!vcall_visibility`` metadata (with a non-zero value) must be done using the
:ref:`llvm.type.checked.load <type.checked.load>` intrinsic, so that virtual
call sites can be correlated with the vtables which they might load from.
Other parts of the vtable (RTTI, offset-to-top, ...) can still be accessed with
normal loads.