========================================
The PDB Info Stream (aka the PDB Stream)
========================================

.. contents::
   :local:

.. _pdb_stream_header:

Stream Header
=============
At offset 0 of the PDB Stream is a header with the following layout:

.. code-block:: c++

  struct PdbStreamHeader {
    ulittle32_t Version;
    ulittle32_t Signature;
    ulittle32_t Age;
    Guid UniqueId;
  };

- **Version** - A value from the following enum:

.. code-block:: c++

  enum class PdbStreamVersion : uint32_t {
    VC2 = 19941610,
    VC4 = 19950623,
    VC41 = 19950814,
    VC50 = 19960307,
    VC98 = 19970604,
    VC70Dep = 19990604,
    VC70 = 20000404,
    VC80 = 20030901,
    VC110 = 20091201,
    VC140 = 20140508,
  };

While the meaning of this field appears to be obvious, in practice we have
never observed a value other than ``VC70``, even with modern versions of
the toolchain, and it is unclear why the other values exist.  It is assumed
that certain aspects of the PDB stream's layout, and perhaps even that of
the other streams, will change if the value is something other than ``VC70``.

- **Signature** - A 32-bit timestamp generated with a call to ``time()`` at
  the time the PDB file is written.  Because a timestamp with 1-second
  granularity cannot guarantee uniqueness, this field does not really serve
  its intended purpose, and is typically ignored in favor of the ``Guid``
  field, described below.

- **Age** - The number of times the PDB file has been written.  This can be used
  along with ``Guid`` to match the PDB to its corresponding executable.

- **Guid** - A 128-bit identifier guaranteed to be unique across space and time,
  stored in the ``UniqueId`` member of the header.
  In general, this can be thought of as the result of calling the Win32 API
  `UuidCreate <https://msdn.microsoft.com/en-us/library/windows/desktop/aa379205(v=vs.85).aspx>`__,
  although LLVM cannot rely on that, as it must work on non-Windows platforms.
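
As an illustration of the layout above, the following is a minimal sketch of
reading the header out of a raw byte buffer.  It is not LLVM's reader: the
names ``ParsedPdbStreamHeader``, ``readLE32`` and ``parsePdbStreamHeader`` are
hypothetical, and the GUID is simply copied as 16 opaque bytes without
interpreting its internal fields.

.. code-block:: c++

  #include <algorithm>
  #include <array>
  #include <cstddef>
  #include <cstdint>

  // Hypothetical host-side mirror of PdbStreamHeader; not an LLVM type.
  struct ParsedPdbStreamHeader {
    std::uint32_t Version = 0;
    std::uint32_t Signature = 0;
    std::uint32_t Age = 0;
    std::array<std::uint8_t, 16> Guid{};  // raw bytes of UniqueId
  };

  // Three 32-bit fields followed by a 128-bit GUID.
  constexpr std::size_t PdbStreamHeaderSize = 4 + 4 + 4 + 16;

  // Assemble a little-endian 32-bit value regardless of host byte order.
  inline std::uint32_t readLE32(const std::uint8_t *P) {
    return std::uint32_t(P[0]) | (std::uint32_t(P[1]) << 8) |
           (std::uint32_t(P[2]) << 16) | (std::uint32_t(P[3]) << 24);
  }

  // Returns false if the buffer is too small to hold the header.
  inline bool parsePdbStreamHeader(const std::uint8_t *Data, std::size_t Size,
                                   ParsedPdbStreamHeader &Out) {
    if (Size < PdbStreamHeaderSize)
      return false;
    Out.Version = readLE32(Data);
    Out.Signature = readLE32(Data + 4);
    Out.Age = readLE32(Data + 8);
    std::copy(Data + 12, Data + 28, Out.Guid.begin());
    return true;
  }

A caller would typically compare ``Version`` against ``VC70`` (20000404)
before trusting the rest of the stream, since the layout is only assumed to
hold for that value.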

.. _pdb_named_stream_map:

Named Stream Map
================

Following the header is a serialized hash table whose key type is a string, and
whose value type is an integer.  The existence of a mapping ``X -> Y`` means
that the stream with the name ``X`` has stream index ``Y`` in the underlying MSF
file.  Note that not all streams are named (for example, the
:doc:`TPI Stream <TpiStream>` has a fixed index and as such there is no need to
look up its index by name).  In practice, there are usually only a small number
of named streams and these are enumerated in the table of streams in :doc:`index`.
A corollary of this is that if a stream does have a name (and as such is in the
named stream map), then consulting the Named Stream Map is likely to be the only
way to discover the stream's MSF stream index.  Several important streams (such
as the global string table, which is called ``/names``) can only be located this
way, so it is important to both produce and consume the map correctly: tools
will not function correctly without it.

.. important::
   Some streams are located by fixed indices (e.g. the TPI Stream has index 2), but
   other streams are located by fixed names (e.g. the string table is called
   ``/names``) and can only be located by consulting the Named Stream Map.

The on-disk layout of the Named Stream Map consists of two components.  The first
is a buffer of string data prefixed by a 32-bit length.  The second is a serialized
hash table whose key and value types are both ``uint32_t``.  The key is the offset
of a null-terminated string in the string data buffer specifying the name of the
stream, and the value is the MSF stream index of the stream with said name.
Note that although the key is an integer, the hash function used to find the right
bucket hashes the string at the corresponding offset in the string data buffer.

The on-disk layout of the serialized hash table is described at :doc:`HashTable`.

Note that the Named Stream Map as a whole is not length-prefixed, so the only way
to get to the data following it is to deserialize it in its entirety.
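
The relationship between the two components can be sketched as follows.  This
is an illustrative example rather than LLVM's implementation: it reads only the
length-prefixed string buffer, assumes the ``(key, value)`` pairs have already
been deserialized from the hash table by other code, and resolves a name with a
plain linear scan instead of the hash function.  The names
``NamedStreamEntry``, ``readStringBuffer`` and ``findNamedStream`` are
hypothetical.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>
  #include <string>
  #include <vector>

  // Hypothetical pair as it would come out of the deserialized hash table.
  struct NamedStreamEntry {
    std::uint32_t NameOffset;   // key: offset into the string data buffer
    std::uint32_t StreamIndex;  // value: MSF stream index
  };

  // Assemble a little-endian 32-bit value regardless of host byte order.
  inline std::uint32_t readLE32(const std::uint8_t *P) {
    return std::uint32_t(P[0]) | (std::uint32_t(P[1]) << 8) |
           (std::uint32_t(P[2]) << 16) | (std::uint32_t(P[3]) << 24);
  }

  // Reads the 32-bit length prefix and the string data that follows it.
  // On success, Consumed is the number of bytes used, i.e. the position at
  // which the serialized hash table begins.
  inline bool readStringBuffer(const std::uint8_t *Data, std::size_t Size,
                               std::string &Buffer, std::size_t &Consumed) {
    if (Size < 4)
      return false;
    std::uint32_t Length = readLE32(Data);
    if (Length > Size - 4)
      return false;
    Buffer.assign(reinterpret_cast<const char *>(Data + 4), Length);
    Consumed = 4 + static_cast<std::size_t>(Length);
    return true;
  }

  // Linear scan over the entries, comparing the null-terminated string at
  // each key offset with Name.  A real reader would hash Name instead.
  inline bool findNamedStream(const std::string &Buffer,
                              const std::vector<NamedStreamEntry> &Entries,
                              const std::string &Name,
                              std::uint32_t &StreamIndex) {
    for (const NamedStreamEntry &E : Entries) {
      if (E.NameOffset >= Buffer.size())
        continue;  // key points outside the buffer
      std::size_t End = Buffer.find('\0', E.NameOffset);
      if (End == std::string::npos)
        continue;  // string is not null-terminated
      if (Buffer.compare(E.NameOffset, End - E.NameOffset, Name) == 0) {
        StreamIndex = E.StreamIndex;
        return true;
      }
    }
    return false;
  }

With a buffer containing ``/names`` at offset 0 and an entry ``{0, 5}``, a
lookup of ``/names`` would yield stream index 5; the index here is made up for
the example.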


.. _pdb_stream_features:

PDB Feature Codes
=================
Following the Named Stream Map, and consuming all remaining bytes of the PDB
Stream, is a list of values from the following enumeration:

.. code-block:: c++

  enum class PdbRaw_FeatureSig : uint32_t {
    VC110 = 20091201,
    VC140 = 20140508,
    NoTypeMerge = 0x4D544F4E,
    MinimalDebugInfo = 0x494E494D,
  };

The meaning of these values is summarized by the following table:

+------------------+-------------------------------------------------+
| Flag             | Meaning                                         |
+==================+=================================================+
| VC110            | - No other feature flags are present            |
|                  | - PDB contains an :doc:`IPI Stream <TpiStream>` |
+------------------+-------------------------------------------------+
| VC140            | - Other feature flags may be present            |
|                  | - PDB contains an :doc:`IPI Stream <TpiStream>` |
+------------------+-------------------------------------------------+
| NoTypeMerge      | - Presumably duplicate types can appear in the  |
|                  |   TPI Stream, although it's unclear why this    |
|                  |   might happen.                                 |
+------------------+-------------------------------------------------+
| MinimalDebugInfo | - Program was linked with /DEBUG:FASTLINK       |
|                  | - There is no TPI / IPI stream, all type info   |
|                  |   is contained in the original object files.    |
+------------------+-------------------------------------------------+
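
A reader consuming this list might look like the sketch below.  The struct
``PdbFeatures`` and the function ``parseFeatureCodes`` are hypothetical names,
and two choices are assumptions of this example rather than documented
behavior: parsing stops at ``VC110`` (because the table says no other flags
are present), and unrecognized values are silently skipped.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>

  // Hypothetical summary of the feature list; not an LLVM type.
  struct PdbFeatures {
    bool ContainsIpiStream = false;
    bool NoTypeMerge = false;
    bool MinimalDebugInfo = false;
  };

  // Values from the PdbRaw_FeatureSig enumeration.
  constexpr std::uint32_t FeatureVC110 = 20091201;
  constexpr std::uint32_t FeatureVC140 = 20140508;
  constexpr std::uint32_t FeatureNoTypeMerge = 0x4D544F4E;
  constexpr std::uint32_t FeatureMinimalDebugInfo = 0x494E494D;

  // Assemble a little-endian 32-bit value regardless of host byte order.
  inline std::uint32_t readLE32(const std::uint8_t *P) {
    return std::uint32_t(P[0]) | (std::uint32_t(P[1]) << 8) |
           (std::uint32_t(P[2]) << 16) | (std::uint32_t(P[3]) << 24);
  }

  // Data/Size cover the bytes that remain after the Named Stream Map.
  inline PdbFeatures parseFeatureCodes(const std::uint8_t *Data,
                                       std::size_t Size) {
    PdbFeatures F;
    for (std::size_t Off = 0; Off + 4 <= Size; Off += 4) {
      switch (readLE32(Data + Off)) {
      case FeatureVC110:
        F.ContainsIpiStream = true;
        return F;  // assumption: nothing meaningful follows VC110
      case FeatureVC140:
        F.ContainsIpiStream = true;
        break;
      case FeatureNoTypeMerge:
        F.NoTypeMerge = true;
        break;
      case FeatureMinimalDebugInfo:
        F.MinimalDebugInfo = true;
        break;
      default:
        break;  // assumption: ignore values this sketch does not know
      }
    }
    return F;
  }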

Matching a PDB to its executable
================================
The linker is responsible for writing both the PDB and the final executable, and
as a result is the only entity capable of writing the information necessary to
match the PDB to the executable.

In order to accomplish this, the linker generates a GUID for the PDB (or
re-uses the existing GUID if it is linking incrementally) and increments the Age
field.

The executable is a PE/COFF file, and a PE/COFF file contains a number of
"directories".  For our purposes here, we are interested in the "debug
directory".  The exact format of a debug directory is described by the
`IMAGE_DEBUG_DIRECTORY structure <https://msdn.microsoft.com/en-us/library/windows/desktop/ms680307(v=vs.85).aspx>`__.
For this particular case, the linker emits a debug directory of type
``IMAGE_DEBUG_TYPE_CODEVIEW``.  The format of this record is defined in
``llvm/DebugInfo/CodeView/CVDebugRecord.h``, but it suffices to say here
that it includes the same ``Guid`` and ``Age`` fields.  At runtime, a
debugger or tool can scan the COFF executable image for the presence of
a debug directory of the correct type and verify that the Guid and Age match.
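
The final comparison can be sketched as follows.  The record layout used here
(a 4-byte ``RSDS`` signature, a 16-byte GUID, a 4-byte Age, then a
null-terminated PDB path) is an assumption of this example and should be
checked against ``CVDebugRecord.h``; locating the debug directory inside the
PE/COFF image is not shown.  The names ``DebugRecordIdentity``,
``parseCodeViewRecord`` and ``pdbMatchesExecutable`` are hypothetical.

.. code-block:: c++

  #include <algorithm>
  #include <array>
  #include <cstddef>
  #include <cstdint>

  using GuidBytes = std::array<std::uint8_t, 16>;

  // Guid and Age as read from the executable's debug directory record.
  struct DebugRecordIdentity {
    GuidBytes Guid{};
    std::uint32_t Age = 0;
  };

  // Assemble a little-endian 32-bit value regardless of host byte order.
  inline std::uint32_t readLE32(const std::uint8_t *P) {
    return std::uint32_t(P[0]) | (std::uint32_t(P[1]) << 8) |
           (std::uint32_t(P[2]) << 16) | (std::uint32_t(P[3]) << 24);
  }

  // Assumed layout: "RSDS" signature, 16-byte GUID, 4-byte Age, then a
  // null-terminated PDB path that this sketch ignores.
  inline bool parseCodeViewRecord(const std::uint8_t *Data, std::size_t Size,
                                  DebugRecordIdentity &Out) {
    constexpr std::uint32_t RsdsMagic = 0x53445352;  // bytes 'R','S','D','S'
    if (Size < 24 || readLE32(Data) != RsdsMagic)
      return false;
    std::copy(Data + 4, Data + 20, Out.Guid.begin());
    Out.Age = readLE32(Data + 20);
    return true;
  }

  // True when the PDB Stream header and the executable agree on both fields.
  inline bool pdbMatchesExecutable(const GuidBytes &PdbGuid,
                                   std::uint32_t PdbAge,
                                   const DebugRecordIdentity &Exe) {
    return PdbGuid == Exe.Guid && PdbAge == Exe.Age;
  }

Comparing both fields, rather than the GUID alone, is what distinguishes a PDB
that has been rewritten since the executable was linked.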
155