=====================================
The PDB File Format
=====================================

.. contents::
   :local:

.. _pdb_intro:

Introduction
============

PDB (Program Database) is a file format invented by Microsoft that contains
debug information which can be consumed by debuggers and other tools. Because
officially supported APIs exist on Windows for querying debug information from
PDBs without requiring any understanding of the file format's internals, a
large ecosystem of tools has been built for Windows to consume this format. In
order for Clang to be able to generate programs that can interoperate with these
tools, it is necessary for us to generate PDB files ourselves.

At the same time, LLVM has a long history of being able to cross-compile from
any platform to any platform, and we wish for the same to be true here. So it
is necessary for us to understand the PDB file format at the byte level so that
we can generate PDB files entirely on our own.

This manual describes what we know about the PDB file format today: the layout
of the file, the various streams contained within, the format of individual
records within, and more.

We would like to extend our heartfelt gratitude to Microsoft, without whom we
would not be where we are today. Much of the knowledge contained within this
manual was learned through reading code published by Microsoft on their `GitHub
repo <https://github.com/Microsoft/microsoft-pdb>`__.

.. _pdb_layout:

File Layout
===========

.. important::
   Unless otherwise specified, all numeric values are encoded in little endian.
   If you see a type such as ``uint16_t`` or ``uint64_t`` going forward, always
   assume it is little endian!

.. toctree::
   :hidden:

   MsfFile
   PdbStream
   TpiStream
   DbiStream
   ModiStream
   PublicStream
   GlobalStream
   HashTable
   CodeViewSymbols
   CodeViewTypes

.. _msf:

The MSF Container
-----------------
A PDB file is an MSF (Multi-Stream Format) file. An MSF file is a "file system
within a file". It contains multiple streams (aka files) which can represent
arbitrary data, and these streams are divided into blocks which may not
necessarily be contiguously laid out within the MSF container file.
Additionally, the MSF contains a stream directory (aka MFT) which describes how
the streams (files) are laid out within the MSF.

For more information about the MSF container format, stream directory, and
block layout, see :doc:`MsfFile`.

.. _streams:

Streams
-------
The PDB format contains a number of streams that describe the types, symbols,
source files, and compilands (e.g. object files) of a program. Additional
streams contain hash tables that debuggers and other tools use for fast lookup
of records and types by name, as well as other information about how the
program was compiled, such as the specific toolchain used. A summary of the
streams contained in a PDB file is as follows:

+--------------------+------------------------------+-------------------------------------------+
| Name               | Stream Index                 | Contents                                  |
+====================+==============================+===========================================+
| Old Directory      | - Fixed Stream Index 0       | - Previous MSF Stream Directory           |
+--------------------+------------------------------+-------------------------------------------+
| PDB Stream         | - Fixed Stream Index 1       | - Basic File Information                  |
|                    |                              | - Fields to match EXE to this PDB         |
|                    |                              | - Map of named streams to stream indices  |
+--------------------+------------------------------+-------------------------------------------+
| TPI Stream         | - Fixed Stream Index 2       | - CodeView Type Records                   |
|                    |                              | - Index of TPI Hash Stream                |
+--------------------+------------------------------+-------------------------------------------+
| DBI Stream         | - Fixed Stream Index 3       | - Module/Compiland Information            |
|                    |                              | - Indices of individual module streams    |
|                    |                              | - Indices of public / global streams      |
|                    |                              | - Section Contribution Information        |
|                    |                              | - Source File Information                 |
|                    |                              | - References to streams containing        |
|                    |                              |   FPO / PGO Data                          |
+--------------------+------------------------------+-------------------------------------------+
| IPI Stream         | - Fixed Stream Index 4       | - CodeView Type Records                   |
|                    |                              | - Index of IPI Hash Stream                |
+--------------------+------------------------------+-------------------------------------------+
| /LinkInfo          | - Contained in PDB Stream    | - Unknown                                 |
|                    |   Named Stream map           |                                           |
+--------------------+------------------------------+-------------------------------------------+
| /src/headerblock   | - Contained in PDB Stream    | - Summary of embedded source file content |
|                    |   Named Stream map           |   (e.g. natvis files)                     |
+--------------------+------------------------------+-------------------------------------------+
| /names             | - Contained in PDB Stream    | - PDB-wide global string table used for   |
|                    |   Named Stream map           |   string de-duplication                   |
+--------------------+------------------------------+-------------------------------------------+
| Module Info Stream | - Contained in DBI Stream    | - CodeView Symbol Records for this module |
|                    | - One for each compiland     | - Line Number Information                 |
+--------------------+------------------------------+-------------------------------------------+
| Public Stream      | - Contained in DBI Stream    | - Public (Exported) Symbol Records        |
|                    |                              | - Index of Public Hash Stream             |
+--------------------+------------------------------+-------------------------------------------+
| Global Stream      | - Contained in DBI Stream    | - Single combined symbol-table            |
|                    |                              | - Index of Global Hash Stream             |
+--------------------+------------------------------+-------------------------------------------+
| TPI Hash Stream    | - Contained in TPI Stream    | - Hash table for looking up TPI records   |
|                    |                              |   by name                                 |
+--------------------+------------------------------+-------------------------------------------+
| IPI Hash Stream    | - Contained in IPI Stream    | - Hash table for looking up IPI records   |
|                    |                              |   by name                                 |
+--------------------+------------------------------+-------------------------------------------+

More information about the structure of each of these can be found on the
following pages:

:doc:`PdbStream`
   Information about the PDB Info Stream and how it is used to match PDBs to EXEs.

:doc:`TpiStream`
   Information about the TPI stream and the CodeView records contained within.

:doc:`DbiStream`
   Information about the DBI stream and relevant substreams including the
   Module Substreams, source file information, and CodeView symbol records
   contained within.

:doc:`ModiStream`
   Information about the Module Information Stream, of which there is one for
   each compilation unit, and the format of symbols contained within.

:doc:`PublicStream`
   Information about the Public Symbol Stream.

:doc:`GlobalStream`
   Information about the Global Symbol Stream.

:doc:`HashTable`
   Information about the serialized hash table format used internally to
   represent things such as the Named Stream Map and the Hash Adjusters in the
   :doc:`TPI/IPI Stream <TpiStream>`.

CodeView
========
CodeView is another format which comes into the picture. While MSF defines
the structure of the overall file, and PDB defines the set of streams that
appear within the MSF file and the format of those streams, CodeView defines
the format of **symbol and type records** that appear within specific streams.
Refer to the pages on :doc:`CodeViewSymbols` and :doc:`CodeViewTypes` for
more information about the CodeView format.
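
As a recap of three conventions from the File Layout section, the following
Python sketch shows the little-endian rule, the fixed stream indices from the
stream summary table, and the idea that a stream is assembled from blocks that
need not be contiguous. It is only an illustration: the names ``FixedStream``,
``read_u16_le``, ``read_u64_le``, and ``gather_stream`` are hypothetical and
belong to neither LLVM nor any Microsoft API, and the block size and block list
passed to ``gather_stream`` are assumed inputs. In a real file they come from
the MSF structures described in :doc:`MsfFile`.

.. code-block:: python

   import struct
   from enum import IntEnum
   from typing import Sequence


   class FixedStream(IntEnum):
       """Fixed stream indices from the summary table (names are illustrative)."""
       OLD_DIRECTORY = 0
       PDB = 1
       TPI = 2
       DBI = 3
       IPI = 4


   def read_u16_le(buf: bytes, offset: int = 0) -> int:
       """Decode a uint16_t stored in little-endian byte order."""
       return struct.unpack_from("<H", buf, offset)[0]


   def read_u64_le(buf: bytes, offset: int = 0) -> int:
       """Decode a uint64_t stored in little-endian byte order."""
       return struct.unpack_from("<Q", buf, offset)[0]


   def gather_stream(file_bytes: bytes, block_size: int,
                     block_indices: Sequence[int], stream_length: int) -> bytes:
       """Concatenate a stream's blocks in order, then trim to its length.

       Assumption: the caller already knows the block size and the ordered
       list of block indices that make up the stream.
       """
       chunks = [file_bytes[i * block_size:(i + 1) * block_size]
                 for i in block_indices]
       return b"".join(chunks)[:stream_length]

For instance, ``read_u16_le(b"\x34\x12")`` yields ``0x1234`` because the least
significant byte is stored first, and ``int(FixedStream.DBI)`` is ``3``, which
matches the DBI Stream row of the table.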