=====================================
The PDB TPI and IPI Streams
=====================================

.. contents::
   :local:

.. _tpi_intro:

Introduction
============

The PDB TPI Stream (Index 2) and IPI Stream (Index 4) contain information about
all types used in the program.  Each stream is organized as a
:ref:`header <tpi_header>` followed by a list of
:doc:`CodeView Type Records <CodeViewTypes>`.  Types are
referenced from various streams and records throughout the PDB by their
:ref:`type index <type_indices>`.  In general, the sequence of type records
following the :ref:`header <tpi_header>` forms a topologically sorted DAG
(directed acyclic graph), which means that a type record B can only refer to
the type A if ``A.TypeIndex < B.TypeIndex``.  While there are rare cases where
this property will not hold (particularly when dealing with object files
compiled with MASM), an implementation should try very hard to make this
property hold, as it means the entire type graph can be constructed in a single
pass.

.. important::
   Type records form a topologically sorted DAG (directed acyclic graph).
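
The ordering property is cheap to check while records are being read.  The
following is a minimal sketch, not LLVM code: ``ParsedRecord`` and
``isTopologicallySorted`` are hypothetical names, and the sketch assumes that
the type indices each record refers to have already been extracted and that
the first record has type index ``0x1000``.

.. code-block:: c++

  #include <cstdint>
  #include <vector>

  // Hypothetical, already-parsed view of one type record: only the type
  // indices it refers to are kept.
  struct ParsedRecord {
    std::vector<uint32_t> ReferencedTypes;
  };

  // Returns true if every record refers only to simple types or to records
  // that appear earlier in the stream.  Record I is assumed to have type
  // index TypeIndexBegin + I.
  inline bool isTopologicallySorted(const std::vector<ParsedRecord> &Records,
                                    uint32_t TypeIndexBegin = 0x1000) {
    for (uint32_t I = 0; I < Records.size(); ++I) {
      const uint32_t Self = TypeIndexBegin + I;
      for (uint32_t Ref : Records[I].ReferencedTypes)
        if (Ref >= Self) // forward reference or self reference
          return false;
    }
    return true;
  }

A consumer that gets ``false`` here cannot rely on a single forward pass and
would need some fallback, such as resolving references lazily.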

.. _tpi_ipi:

TPI vs IPI Stream
=================

Recent versions of the PDB format (that is, all versions covered by this
document) have two streams with identical layout, henceforth referred to as the
TPI stream and IPI stream.  The on-disk format described in the rest of this
document applies equally to the TPI Stream and the IPI Stream.  The
only difference between the two is in *which* CodeView records are allowed to
appear in each one, summarized by the following table:

+----------------------+---------------------+
|    TPI Stream        |    IPI Stream       |
+======================+=====================+
|  LF_POINTER          | LF_FUNC_ID          |
+----------------------+---------------------+
|  LF_MODIFIER         | LF_MFUNC_ID         |
+----------------------+---------------------+
|  LF_PROCEDURE        | LF_BUILDINFO        |
+----------------------+---------------------+
|  LF_MFUNCTION        | LF_SUBSTR_LIST      |
+----------------------+---------------------+
|  LF_LABEL            | LF_STRING_ID        |
+----------------------+---------------------+
|  LF_ARGLIST          | LF_UDT_SRC_LINE     |
+----------------------+---------------------+
|  LF_FIELDLIST        | LF_UDT_MOD_SRC_LINE |
+----------------------+---------------------+
|  LF_ARRAY            |                     |
+----------------------+---------------------+
|  LF_CLASS            |                     |
+----------------------+---------------------+
|  LF_STRUCTURE        |                     |
+----------------------+---------------------+
|  LF_INTERFACE        |                     |
+----------------------+---------------------+
|  LF_UNION            |                     |
+----------------------+---------------------+
|  LF_ENUM             |                     |
+----------------------+---------------------+
|  LF_TYPESERVER2      |                     |
+----------------------+---------------------+
|  LF_VFTABLE          |                     |
+----------------------+---------------------+
|  LF_VTSHAPE          |                     |
+----------------------+---------------------+
|  LF_BITFIELD         |                     |
+----------------------+---------------------+
|  LF_METHODLIST       |                     |
+----------------------+---------------------+
|  LF_PRECOMP          |                     |
+----------------------+---------------------+
|  LF_ENDPRECOMP       |                     |
+----------------------+---------------------+

The usage of these records is described in more detail in
:doc:`CodeView Type Records <CodeViewTypes>`.
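
A tool that routes records to the right stream could test the leaf kind
against the IPI column of the table.  The sketch below is illustrative only:
``isIdRecordKind`` is a hypothetical helper, and the numeric leaf values are
not given in this document; they are assumed from Microsoft's ``cvinfo.h``.

.. code-block:: c++

  #include <cstdint>

  // Numeric values assumed from cvinfo.h; this document lists only the names.
  enum IdLeaf : uint16_t {
    LF_FUNC_ID = 0x1601,
    LF_MFUNC_ID = 0x1602,
    LF_BUILDINFO = 0x1603,
    LF_SUBSTR_LIST = 0x1604,
    LF_STRING_ID = 0x1605,
    LF_UDT_SRC_LINE = 0x1606,
    LF_UDT_MOD_SRC_LINE = 0x1607,
  };

  // True for the record kinds listed in the IPI column of the table.
  // The seven values are contiguous, so a range check is enough.
  constexpr bool isIdRecordKind(uint16_t Kind) {
    return Kind >= LF_FUNC_ID && Kind <= LF_UDT_MOD_SRC_LINE;
  }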

.. _type_indices:

Type Indices
============

A type index is a 32-bit integer that uniquely identifies a type inside of an
object file's ``.debug$T`` section or a PDB file's TPI or IPI stream.  The
value of the type index for the first type record from the TPI stream is given
by the ``TypeIndexBegin`` member of the :ref:`TPI Stream Header <tpi_header>`,
although in practice this value is always equal to 0x1000 (4096).

Any type index with the high bit set is considered to come from the IPI stream,
although this appears to be more of a hack, and LLVM does not generate type
indices of this nature.  They can, however, be observed in Microsoft PDBs
occasionally, so one should be prepared to handle them.  Note that having the
high bit set is a sufficient condition for a type index to come from the IPI
stream, but not a necessary one.

Once the high bit is cleared, any type index >= ``TypeIndexBegin`` is presumed
to come from the appropriate stream, and any type index less than this is a
bitmask which can be decomposed as follows:

.. code-block:: none

  .---------------------------.------.----------.
  |           Unused          | Mode |   Kind   |
  '---------------------------'------'----------'
  |+32                        |+12   |+8        |+0


- **Kind** - A value from the following enum:

.. code-block:: c++

  enum class SimpleTypeKind : uint32_t {
    None = 0x0000,          // uncharacterized type (no type)
    Void = 0x0003,          // void
    NotTranslated = 0x0007, // type not translated by cvpack
    HResult = 0x0008,       // OLE/COM HRESULT

    SignedCharacter = 0x0010,   // 8 bit signed
    UnsignedCharacter = 0x0020, // 8 bit unsigned
    NarrowCharacter = 0x0070,   // really a char
    WideCharacter = 0x0071,     // wide char
    Character16 = 0x007a,       // char16_t
    Character32 = 0x007b,       // char32_t
    Character8 = 0x007c,        // char8_t

    SByte = 0x0068,       // 8 bit signed int
    Byte = 0x0069,        // 8 bit unsigned int
    Int16Short = 0x0011,  // 16 bit signed
    UInt16Short = 0x0021, // 16 bit unsigned
    Int16 = 0x0072,       // 16 bit signed int
    UInt16 = 0x0073,      // 16 bit unsigned int
    Int32Long = 0x0012,   // 32 bit signed
    UInt32Long = 0x0022,  // 32 bit unsigned
    Int32 = 0x0074,       // 32 bit signed int
    UInt32 = 0x0075,      // 32 bit unsigned int
    Int64Quad = 0x0013,   // 64 bit signed
    UInt64Quad = 0x0023,  // 64 bit unsigned
    Int64 = 0x0076,       // 64 bit signed int
    UInt64 = 0x0077,      // 64 bit unsigned int
    Int128Oct = 0x0014,   // 128 bit signed int
    UInt128Oct = 0x0024,  // 128 bit unsigned int
    Int128 = 0x0078,      // 128 bit signed int
    UInt128 = 0x0079,     // 128 bit unsigned int

    Float16 = 0x0046,                 // 16 bit real
    Float32 = 0x0040,                 // 32 bit real
    Float32PartialPrecision = 0x0045, // 32 bit PP real
    Float48 = 0x0044,                 // 48 bit real
    Float64 = 0x0041,                 // 64 bit real
    Float80 = 0x0042,                 // 80 bit real
    Float128 = 0x0043,                // 128 bit real

    Complex16 = 0x0056,                 // 16 bit complex
    Complex32 = 0x0050,                 // 32 bit complex
    Complex32PartialPrecision = 0x0055, // 32 bit PP complex
    Complex48 = 0x0054,                 // 48 bit complex
    Complex64 = 0x0051,                 // 64 bit complex
    Complex80 = 0x0052,                 // 80 bit complex
    Complex128 = 0x0053,                // 128 bit complex

    Boolean8 = 0x0030,   // 8 bit boolean
    Boolean16 = 0x0031,  // 16 bit boolean
    Boolean32 = 0x0032,  // 32 bit boolean
    Boolean64 = 0x0033,  // 64 bit boolean
    Boolean128 = 0x0034, // 128 bit boolean
  };

- **Mode** - A value from the following enum:

.. code-block:: c++

  enum class SimpleTypeMode : uint32_t {
    Direct = 0,        // Not a pointer
    NearPointer = 1,   // Near pointer
    FarPointer = 2,    // Far pointer
    HugePointer = 3,   // Huge pointer
    NearPointer32 = 4, // 32 bit near pointer
    FarPointer32 = 5,  // 32 bit far pointer
    NearPointer64 = 6, // 64 bit near pointer
    NearPointer128 = 7 // 128 bit near pointer
  };

Note that for pointers, the bitness is represented in the mode.  So a ``void*``
would have a type index with ``Mode=NearPointer32, Kind=Void`` if built for
32-bits but a type index with ``Mode=NearPointer64, Kind=Void`` if built for
64-bits.

By convention, the type index for ``std::nullptr_t`` is constructed the same
way as the type index for ``void*``, but using the bitless enumeration value
``NearPointer``.
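
The decomposition can be sketched in a few lines.  This is an illustration
under the bit layout shown in the diagram (``Kind`` in bits 0-7, ``Mode`` in
bits 8-11), not LLVM's implementation; ``DecodedTypeIndex``,
``decodeTypeIndex`` and ``makeSimpleTypeIndex`` are hypothetical names, and
``TypeIndexBegin`` is assumed to default to ``0x1000``.

.. code-block:: c++

  #include <cstdint>

  constexpr uint32_t HighBit = 0x80000000u;

  struct DecodedTypeIndex {
    bool HighBitSet; // set: the index is considered to come from the IPI stream
    bool IsSimple;   // below TypeIndexBegin: Kind and Mode are meaningful
    uint32_t Index;  // the index with the high bit cleared
    uint32_t Kind;   // a SimpleTypeKind value (0 if not simple)
    uint32_t Mode;   // a SimpleTypeMode value (0 if not simple)
  };

  inline DecodedTypeIndex decodeTypeIndex(uint32_t Raw,
                                          uint32_t TypeIndexBegin = 0x1000) {
    DecodedTypeIndex D{};
    D.HighBitSet = (Raw & HighBit) != 0;
    D.Index = Raw & ~HighBit;
    D.IsSimple = D.Index < TypeIndexBegin;
    if (D.IsSimple) {
      D.Kind = D.Index & 0xFFu;       // bits 0-7
      D.Mode = (D.Index >> 8) & 0xFu; // bits 8-11
    }
    return D;
  }

  // Inverse direction: build a simple type index from a kind and a mode.
  constexpr uint32_t makeSimpleTypeIndex(uint32_t Kind, uint32_t Mode) {
    return ((Mode & 0xFu) << 8) | (Kind & 0xFFu);
  }

Under these assumptions a 64-bit ``void*`` is
``makeSimpleTypeIndex(0x03, 6)``, which is ``0x0603``, and ``std::nullptr_t``
is ``makeSimpleTypeIndex(0x03, 1)``, which is ``0x0103``.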

.. _tpi_header:

Stream Header
=============
At offset 0 of the TPI Stream is a header with the following layout:

.. code-block:: c++

  struct TpiStreamHeader {
    uint32_t Version;
    uint32_t HeaderSize;
    uint32_t TypeIndexBegin;
    uint32_t TypeIndexEnd;
    uint32_t TypeRecordBytes;

    uint16_t HashStreamIndex;
    uint16_t HashAuxStreamIndex;
    uint32_t HashKeySize;
    uint32_t NumHashBuckets;

    int32_t HashValueBufferOffset;
    uint32_t HashValueBufferLength;

    int32_t IndexOffsetBufferOffset;
    uint32_t IndexOffsetBufferLength;

    int32_t HashAdjBufferOffset;
    uint32_t HashAdjBufferLength;
  };

- **Version** - A value from the following enum.

.. code-block:: c++

  enum class TpiStreamVersion : uint32_t {
    V40 = 19950410,
    V41 = 19951122,
    V50 = 19961031,
    V70 = 19990903,
    V80 = 20040203,
  };

Similar to the :doc:`PDB Stream <PdbStream>`, this value always appears to be
``V80``, and no other values have been observed.  It is assumed that should
another value be observed, the layout described by this document may not be
accurate.

- **HeaderSize** - ``sizeof(TpiStreamHeader)``

- **TypeIndexBegin** - The numeric value of the type index representing the
  first type record in the TPI stream.  This is usually the value 0x1000 as
  type indices lower than this are reserved (see :ref:`Type Indices
  <type_indices>` for a discussion of reserved type indices).

- **TypeIndexEnd** - One greater than the numeric value of the type index
  representing the last type record in the TPI stream.  The total number of
  type records in the TPI stream can be computed as ``TypeIndexEnd -
  TypeIndexBegin``.

- **TypeRecordBytes** - The number of bytes of type record data following the
  header.

- **HashStreamIndex** - The index of a stream which contains a list of hashes
  for every type record.  This value may be -1 (``0xFFFF`` when read as a
  ``uint16_t``), indicating that hash information is not present.  In practice
  a valid stream index is always observed, so any producer implementation
  should be prepared to emit this stream to ensure compatibility with tools
  which may expect it to be present.

- **HashAuxStreamIndex** - Presumably the index of a stream which contains a
  separate hash table, although this has not been observed in practice and it's
  unclear what it might be used for.

- **HashKeySize** - The size of a hash value (usually 4 bytes).

- **NumHashBuckets** - The number of buckets used to generate the hash values
  in the aforementioned hash streams.

- **HashValueBufferOffset / HashValueBufferLength** - The offset and size within
  the TPI Hash Stream of the list of hash values.  It should be assumed that
  there are either 0 hash values, or a number equal to the number of type
  records in the TPI stream (``TypeIndexEnd - TypeIndexBegin``).  Thus, if
  ``HashValueBufferLength`` is neither 0 nor equal to ``(TypeIndexEnd -
  TypeIndexBegin) * HashKeySize``, we can consider the PDB malformed.

- **IndexOffsetBufferOffset / IndexOffsetBufferLength** - The offset and size
  within the TPI Hash Stream of the Type Index Offsets Buffer.  This is a list
  of pairs of ``uint32_t`` values where the first value is a :ref:`Type Index
  <type_indices>` and the second value is the offset in the type record data of
  the type with this index.  This can be used to do a binary search followed by
  a linear search to get O(log n) lookup by type index.

- **HashAdjBufferOffset / HashAdjBufferLength** - The offset and size within
  the TPI hash stream of a serialized hash table whose keys are the hash values
  in the hash value buffer and whose values are type indices.  This appears to
  be useful in incremental linking scenarios: if a type is modified, an entry
  can be created mapping the old hash value to the new type index.  A PDB file
  consumer can then always find the most up to date version of the type,
  without forcing the incremental linker to garbage collect and update
  references that point to the old version to now point to the new version.
  The layout of this hash table is described in :doc:`HashTable`.
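
The consistency rules stated in the field descriptions can be collected into
a single check.  The following is a sketch rather than LLVM's validation
logic: ``isPlausibleHeader`` is a hypothetical helper, and rejecting every
version other than ``V80`` and requiring the index offset buffer to hold whole
8-byte pairs are assumptions derived from the descriptions above.

.. code-block:: c++

  #include <cstdint>

  struct TpiStreamHeader {
    uint32_t Version;
    uint32_t HeaderSize;
    uint32_t TypeIndexBegin;
    uint32_t TypeIndexEnd;
    uint32_t TypeRecordBytes;
    uint16_t HashStreamIndex;
    uint16_t HashAuxStreamIndex;
    uint32_t HashKeySize;
    uint32_t NumHashBuckets;
    int32_t HashValueBufferOffset;
    uint32_t HashValueBufferLength;
    int32_t IndexOffsetBufferOffset;
    uint32_t IndexOffsetBufferLength;
    int32_t HashAdjBufferOffset;
    uint32_t HashAdjBufferLength;
  };
  // All members are naturally aligned, so no padding is expected.
  static_assert(sizeof(TpiStreamHeader) == 56, "unexpected padding");

  inline bool isPlausibleHeader(const TpiStreamHeader &H) {
    if (H.Version != 20040203u) // V80, the only value observed
      return false;
    if (H.HeaderSize != sizeof(TpiStreamHeader))
      return false;
    if (H.TypeIndexEnd < H.TypeIndexBegin)
      return false;
    const uint64_t NumRecords = H.TypeIndexEnd - H.TypeIndexBegin;
    // Either no hash values at all, or exactly one per type record.
    if (H.HashValueBufferLength != 0 &&
        H.HashValueBufferLength != NumRecords * H.HashKeySize)
      return false;
    // The index offset buffer is a list of (uint32_t, uint32_t) pairs.
    if (H.IndexOffsetBufferLength % 8 != 0)
      return false;
    return true;
  }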

.. _tpi_records:

CodeView Type Record List
=========================
Following the header, there are ``TypeRecordBytes`` bytes of data that
represent a variable length array of :doc:`CodeView type records
<CodeViewTypes>`.  The number of such records (i.e. the length of the array)
can be determined by computing the value ``Header.TypeIndexEnd -
Header.TypeIndexBegin``.
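
Because the records are variable length, finding the start of each one means
walking the data.  The sketch below is illustrative, not LLVM code.  It relies
on a detail of the CodeView record format that this document does not state:
each record is assumed to begin with a little-endian ``uint16_t`` length that
counts the bytes following the length field, followed by a ``uint16_t`` record
kind.  ``readU16`` and ``collectRecordOffsets`` are hypothetical names.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>
  #include <vector>

  // Reads a little-endian uint16_t from two bytes.
  inline uint16_t readU16(const uint8_t *P) {
    return static_cast<uint16_t>(P[0] | (P[1] << 8));
  }

  // Fills Offsets with the byte offset of every record in Data.  Returns
  // false if a record is truncated or its length is too small to hold a kind.
  inline bool collectRecordOffsets(const std::vector<uint8_t> &Data,
                                   std::vector<uint32_t> &Offsets) {
    Offsets.clear();
    size_t Pos = 0;
    while (Pos < Data.size()) {
      if (Data.size() - Pos < 4) // need a length and a kind
        return false;
      const size_t Len = readU16(&Data[Pos]); // excludes the length field
      if (Len < 2 || Len > Data.size() - Pos - 2)
        return false;
      Offsets.push_back(static_cast<uint32_t>(Pos));
      Pos += 2 + Len;
    }
    return true;
  }

In a well-formed stream, ``Offsets.size()`` should equal
``Header.TypeIndexEnd - Header.TypeIndexBegin``, and the record at
``Offsets[I]`` has type index ``Header.TypeIndexBegin + I``.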

O(log(n)) access is provided by way of the Type Index Offsets array (if
present) described previously.
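
A lookup along these lines could be sketched as follows.  This is not LLVM's
implementation: ``TypeIndexOffset`` and ``findRecordOffset`` are hypothetical
names.  The sketch assumes that the array is sorted by type index, that
offsets are relative to the start of the type record data, and that each
record starts with a little-endian ``uint16_t`` length that excludes the
length field itself.

.. code-block:: c++

  #include <algorithm>
  #include <cstddef>
  #include <cstdint>
  #include <vector>

  struct TypeIndexOffset {
    uint32_t Type;   // type index
    uint32_t Offset; // offset of that record in the type record data
  };

  // Returns the byte offset of the record with type index Target, or -1 if
  // it cannot be found.
  inline int64_t findRecordOffset(const std::vector<TypeIndexOffset> &Table,
                                  const std::vector<uint8_t> &Records,
                                  uint32_t Target) {
    // Binary search: first entry whose type index is greater than Target.
    auto It = std::upper_bound(
        Table.begin(), Table.end(), Target,
        [](uint32_t T, const TypeIndexOffset &E) { return T < E.Type; });
    if (It == Table.begin())
      return -1; // Target precedes every entry
    --It;        // last entry with Type <= Target

    // Linear search: walk records forward from the entry's offset.
    uint32_t Index = It->Type;
    size_t Pos = It->Offset;
    while (Records.size() >= 4 && Pos <= Records.size() - 4) {
      if (Index == Target)
        return static_cast<int64_t>(Pos);
      const size_t Len = Records[Pos] | (Records[Pos + 1] << 8);
      Pos += 2 + Len;
      ++Index;
    }
    return -1;
  }

The sketch does not check that the record it returns lies entirely within the
data; a real consumer would validate the record length before reading it.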