===================================
Instrumentation Profile Format
===================================

.. contents::
  :local:


Overview
=========

Clang supports two types of profiling via instrumentation [1]_: frontend-based
and IR-based. Both support a variety of use cases [2]_.
This document describes the two binary serialization formats (raw and indexed)
used to store instrumented profiles, with an emphasis on the IRPGO use case:
where a header field or payload section is interpreted differently across use
cases, the IRPGO interpretation is the one documented here.

.. note::
  Frontend-generated profiles are used together with coverage mapping for
  `source-based code coverage`_. The `coverage mapping format`_ is different from
  the profile format.

.. _`source-based code coverage`: https://clang.llvm.org/docs/SourceBasedCodeCoverage.html
.. _`coverage mapping format`: https://llvm.org/docs/CoverageMappingFormat.html

Raw Profile Format
===================

The raw profile is generated by running the instrumented binary. The raw profile
data from an executable or a shared library [3]_ consists of a header and
multiple sections, with each section as a memory dump. The raw profile data needs
to be reasonably compact and fast to generate.

There are no backward or forward version compatibility guarantees for the raw profile
format. That is, compilers and tools `require`_ a specific raw profile version
to parse the profiles.

.. _`require`: https://github.com/llvm/llvm-project/blob/bffdde8b8e5d9a76a47949cd0f574f3ce656e181/llvm/lib/ProfileData/InstrProfReader.cpp#L551-L558

To feed profiles back into compilers for an optimized build (e.g., via
``-fprofile-use`` for IR instrumentation), a raw profile must be converted into
the indexed format.

General Storage Layout
-----------------------

The storage layout of the raw profile data format is illustrated below. Basically,
when the raw profile is read into a memory buffer, the actual byte offset of a
section is inferred from the section's order in the layout and the size information
of all the sections ahead of it.

::

  +----+-----------------------+
  |    |        Magic          |
  |    +-----------------------+
  |    |        Version        |
  |    +-----------------------+
  H    |   Size Info for       |
  E    |      Section 1        |
  A    +-----------------------+
  D    |   Size Info for       |
  E    |      Section 2        |
  R    +-----------------------+
  |    |          ...          |
  |    +-----------------------+
  |    |   Size Info for       |
  |    |      Section N        |
  +----+-----------------------+
  P    |       Section 1       |
  A    +-----------------------+
  Y    |       Section 2       |
  L    +-----------------------+
  O    |          ...          |
  A    +-----------------------+
  D    |       Section N       |
  +----+-----------------------+


.. note::
  Sections might be padded to meet specific alignment requirements. For
  simplicity, header fields and data sections that exist solely for padding are
  omitted from the layout graph above and from the rest of this document.

Header
-------

``Magic``
  The magic number encodes the profile format (raw, indexed or text). For the raw
  format, the magic number also encodes the endianness (big or little) and C pointer
  size (4 or 8 bytes) of the platform on which the profile is generated.

  A factory method reads the magic number to construct the proper reader and returns
  an error upon an unrecognized format. Specifically, the factory method and raw profile
  reader implementation make sure that a raw profile file can be read back on
  a platform with the opposite endianness and/or the other C pointer size.

``Version``
  The lower 32 bits specify the actual version and the most significant 32 bits
  specify the variant types of the profile. IR-based instrumentation PGO and
  context-sensitive IR-based instrumentation PGO are two variant types.

``BinaryIdsSize``
  The byte size of the `binary id`_ section.

``NumData``
  The number of profile metadata entries. The byte size of the `profile metadata`_
  section can be computed from this field.

``NumCounter``
  The number of entries in the profile counter section. The byte size of the `counter`_
  section can be computed from this field.

``NumBitmapBytes``
  The number of bytes in the profile `bitmap`_ section.

``NamesSize``
  The number of bytes in the name section.

.. _`CountersDelta`:

``CountersDelta``
  This field records the in-memory address difference between the `profile metadata`_
  and counter section in the instrumented binary, i.e., ``start(__llvm_prf_cnts) - start(__llvm_prf_data)``.

  It's used jointly with the `CounterPtr`_ field to compute the counter offset
  relative to ``start(__llvm_prf_cnts)``. Check out calculation-of-counter-offset_
  for a visualized explanation.

  .. note::
    The ``__llvm_prf_data`` object file section might not be loaded into memory
    when the instrumented binary runs or might not get generated in the instrumented
    binary in the first place. In those cases, ``CountersDelta`` is not used and
    other mechanisms are used to match counters with instrumented code. See
    `lightweight instrumentation`_ and `binary profile correlation`_ for examples.

``BitmapDelta``
  This field records the in-memory address difference between the `profile metadata`_
  and bitmap section in the instrumented binary, i.e., ``start(__llvm_prf_bits) - start(__llvm_prf_data)``.

  It's used jointly with the `BitmapPtr`_ to find the bitmap of a profile data
  record, in a similar way to how counters are referenced as explained by
  calculation-of-counter-offset_ .

  Similar to the `CountersDelta`_ field, this field may not be used in non-PGO variants
  of profiles.

``NamesDelta``
  Records the in-memory address of the name section. Not used except for raw profile
  reader error checking.

``NumVTables``
  Records the number of instrumented vtable entries in the binary. Used for
  `type profiling`_.

``VNamesSize``
  Records the byte size of the virtual table names section. Used for `type profiling`_.

``ValueKindLast``
  Records the number of value kinds. Macro `VALUE_PROF_KIND`_ defines the value
  kinds with a description of each kind.

.. _`VALUE_PROF_KIND`: https://github.com/llvm/llvm-project/blob/7e405eb722e40c79b7726201d0f76b5dab34ba0f/compiler-rt/include/profile/InstrProfData.inc#L184-L186

Payload Sections
------------------

Binary Ids
^^^^^^^^^^^
Stores the binary ids of the instrumented binaries to associate binaries with
profiles for source code coverage. See the `binary id`_ RFC for the design.

.. _`profile metadata`:

Profile Metadata
^^^^^^^^^^^^^^^^^^

This section stores the metadata to map counters and value profiles back to
instrumented code regions (e.g., LLVM IR for IRPGO).

The in-memory representation of the metadata is `__llvm_profile_data`_.
Some fields are used to reference data from other sections in the profile.
The fields are documented as follows:

.. _`__llvm_profile_data`: https://github.com/llvm/llvm-project/blob/7c3b67d2038cfb48a80299089f6a1308eee1df7f/compiler-rt/include/profile/InstrProfData.inc#L65-L95

``NameRef``
  The MD5 of the function's PGO name. The PGO name has the format
  ``[<filepath><delimiter>]<mangled-name>`` where ``<filepath>`` and
  ``<delimiter>`` are provided for local-linkage functions to tell apart
  possibly identically named functions.

.. _FuncHash:

``FuncHash``
  A checksum of the function's IR, taking the control flow graph and instrumented
  value sites into account. See `computeCFGHash`_ for details.

.. _`computeCFGHash`: https://github.com/llvm/llvm-project/blob/7c3b67d2038cfb48a80299089f6a1308eee1df7f/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp#L616-L685

.. _`CounterPtr`:

``CounterPtr``
  The in-memory address difference between profile data and the start of the corresponding
  counters. Counter position is stored this way (as a link-time constant) to reduce
  instrumented binary size compared with snapshotting the address of symbols directly.
  See `commit a1532ed`_ for further information.

.. _`commit a1532ed`: https://github.com/llvm/llvm-project/commit/a1532ed27582038e2d9588108ba0fe8237f01844

  .. note::
    ``CounterPtr`` might represent a different value for non-IRPGO use cases. For
    example, for `binary profile correlation`_, it represents the absolute address of the counter.
    When in doubt, check the source code.

.. _`BitmapPtr`:

``BitmapPtr``
  The in-memory address difference between profile data and the start address of
  the corresponding bitmap.

  .. note::
    Similar to `CounterPtr`_, this field may represent a different value for non-IRPGO use cases.

``FunctionPointer``
  Records the function address when the instrumented binary runs. This is used to
  map the profiled callee address of indirect calls to the ``NameRef`` during
  conversion from raw to indexed profiles.

``Values``
  Represents value profiles in a two dimensional array. The number of elements
  in the first dimension is the number of instrumented value sites across all
  kinds. Each element in the first dimension is the head of a linked list, and
  each element in the second dimension is a linked list element, carrying
  ``<profiled-value, count>`` as payload. This is used by the compiler runtime when
  writing out value profiles.

  .. note::
    Value profiling is supported by frontend and IR PGO instrumentation,
    but it's not supported in all cases (e.g., `lightweight instrumentation`_).

``NumCounters``
  The number of counters for the instrumented function.

``NumValueSites``
  An array of counters; each counter records the number of instrumented sites
  for one kind of value in the function.

``NumBitmapBytes``
  The number of bitmap bytes for the function.

.. _`counter`:

Profile Counters
^^^^^^^^^^^^^^^^^

For PGO [4]_, the counters within an instrumented function of a specific `FuncHash`_
are stored contiguously and in an order that is consistent with instrumentation point selection.

.. _calculation-of-counter-offset:

As mentioned above, the recorded counter offset is relative to the profile metadata.
So how are function counters located in the raw profile data?

Basically, the profile reader iterates over profile metadata (from the `profile metadata`_
section) and makes use of the recorded relative distances, as illustrated below.

::

          + --> start(__llvm_prf_data) --> +---------------------+ ------------+
          |                                |       Data 1        |             |
          |                                +---------------------+  =====||    |
          |                                |       Data 2        |       ||    |
          |                                +---------------------+       ||    |
          |                                |        ...          |       ||    |
  Counter |                                +---------------------+       ||    |
   Delta  |                                |       Data N        |       ||    |
          |                                +---------------------+       ||    |   CounterPtr1
          |                                                              ||    |
          |                                            CounterPtr2       ||    |
          |                                                              ||    |
          |                                                              ||    |
          + --> start(__llvm_prf_cnts) --> +---------------------+       ||    |
                                           |        ...          |       ||    |
                                           +---------------------+  -----||----+
                                           |    Counter for      |       ||
                                           |       Data 1        |       ||
                                           +---------------------+       ||
                                           |        ...          |       ||
                                           +---------------------+  =====||
                                           |    Counter for      |
                                           |       Data 2        |
                                           +---------------------+
                                           |        ...          |
                                           +---------------------+
                                           |    Counter for      |
                                           |       Data N        |
                                           +---------------------+


In the graph,

* The profile header records ``CountersDelta`` with the value
  ``start(__llvm_prf_cnts) - start(__llvm_prf_data)``.
  We will call it ``CounterDeltaInitVal`` below for convenience.
* For each profile data record ``ProfileDataN``, ``CounterPtr`` is recorded as
  ``start(CounterN) - start(ProfileDataN)``, where ``ProfileDataN`` is the N-th
  entry in ``__llvm_prf_data``, and ``CounterN`` represents the corresponding
  profile counters.

Each time the reader advances to the next data record, it `updates`_ ``CountersDelta``
by subtracting the size of one ``ProfileData`` record.

.. _`updates`: https://github.com/llvm/llvm-project/blob/17ff25a58ee4f29816d932fdb75f0d305718069f/llvm/include/llvm/ProfileData/InstrProfReader.h#L439-L444

For the counter corresponding to the first data record, the byte offset
relative to the start of the counter section is calculated as ``CounterPtr1 - CounterDeltaInitVal``.
When the profile reader advances to the second data record, ``CountersDelta``
has been updated to ``CounterDeltaInitVal - sizeof(ProfileData)``.
Thus the byte offset relative to the start of the counter section is calculated
as ``CounterPtr2 - (CounterDeltaInitVal - sizeof(ProfileData))``.

.. _`bitmap`:

Bitmap
^^^^^^^
This section is used for source-based `Modified Condition/Decision Coverage`_ code coverage. Check out the `Bitmap RFC`_
for the design.

.. _`Modified Condition/Decision Coverage`: https://en.wikipedia.org/wiki/Modified_condition/decision_coverage
.. _`Bitmap RFC`: https://discourse.llvm.org/t/rfc-source-based-mc-dc-code-coverage/59244

.. _`function names`:

Names
^^^^^^

This section contains the concatenated string of functions' PGO names, possibly
compressed. If compressed, the zlib library is used.

Function names serve as keys in the PGO data hash table when raw profiles are
converted into indexed profiles. They are also crucial for ``llvm-profdata`` to
show the profiles in a human-readable way.

Virtual Table Profile Data
^^^^^^^^^^^^^^^^^^^^^^^^^^^

This section is used for `type profiling`_. Each entry corresponds to one virtual
table and is defined by the following C++ struct:

.. code-block:: c++

  struct VTableProfData {
    // The start address of the vtable, collected at runtime.
    uint64_t StartAddress;
    // The byte size of the vtable. `StartAddress` and `ByteSize` specify an
    // address range to look up.
    uint32_t ByteSize;
    // The hash of the vtable's (PGO) name.
    uint64_t MD5HashOfName;
  };

At profile use time, the compiler looks up a profiled address in the sorted vtable
address ranges and maps the address to a specific vtable through the hashed name.

Virtual Table Names
^^^^^^^^^^^^^^^^^^^^

This section is similar to the `function names`_ section above, except it contains the PGO
names of profiled virtual tables. It's a standalone section such that raw profile
readers can directly find each name set by accessing the corresponding profile
data section.

This section is stored in raw profiles such that ``llvm-profdata`` can show the
profiles in a human-readable way.

Value Profile Data
^^^^^^^^^^^^^^^^^^^^

This section contains the profile data for value profiling.

The value profiles corresponding to a profile metadata entry are serialized contiguously
as one record, and value profile records are stored in the same order as the
respective profile data, such that a raw profile reader `advances`_ the pointer to
profile data and the pointer to value profile records simultaneously [5]_ to find
the value profiles for a per-function, per `FuncHash`_ profile data record.

.. _`advances`: https://github.com/llvm/llvm-project/blob/7e15fa9161eda7497a5d6abf0d951a1d12d86550/llvm/include/llvm/ProfileData/InstrProfReader.h#L456-L457

Indexed Profile Format
===========================

Indexed profiles are generated by ``llvm-profdata``. In the indexed profiles,
function data are organized as an on-disk hash table such that compilers can
look up profile data for functions in an IR module.

Compilers and tools must retain backward compatibility with indexed profiles.
That is, a tool or a compiler built at newer versions of code must understand
profiles generated by older tools or compilers.

General Storage Layout
-----------------------

The ASCII art depicts the general storage layout of indexed profiles.
Specifically, the indexed profile header describes the byte offset of individual
payload sections.

::

                  +-----------------------+---+
                  |        Magic          |   |
                  +-----------------------+   |
                  |        Version        |   |
                  +-----------------------+   |
                  |        HashType       |   H
                  +-----------------------+   E
                  |       Byte Offset     |   A
          +------ |      of section A     |   D
          |       +-----------------------+   E
          |       |       Byte Offset     |   R
      +-----------|      of section B     |   |
      |   |       +-----------------------+   |
      |   |       |         ...           |   |
      |   |       +-----------------------+   |
      |   |       |       Byte Offset     |   |
  +---------------|      of section Z     |   |
  |   |   |       +-----------------------+---+
  |   |   |       |      Profile Summary  |   |
  |   |   |       +-----------------------+   P
  |   |   +------>|      Section A        |   A
  |   |           +-----------------------+   Y
  |   +---------->|      Section B        |   L
  |               +-----------------------+   O
  |               |         ...           |   A
  |               +-----------------------+   D
  +-------------->|      Section Z        |   |
                  +-----------------------+---+

.. note::

  The profile summary section is at the beginning of the payload. It's right after
  the header, so its position is implicitly known after reading the header.

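
The way a reader uses the header offsets can be sketched in C++ as follows.
This is only an illustration under stated assumptions, not the actual reader:
``ToyIndexedHeader``, its two offset fields, ``readToyHeader``, ``summaryStart``
and ``locateSection`` are hypothetical names, offsets are assumed to be measured
from the start of the profile, and the buffer is assumed to be in host byte
order.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical, simplified header. The real field list is defined by the
// indexed profile header struct; the two offset fields are made up here.
struct ToyIndexedHeader {
  uint64_t Magic;
  uint64_t Version;
  uint64_t HashType;
  uint64_t SectionAOffset; // assumed: byte offset from the start of the profile
  uint64_t SectionBOffset; // assumed: byte offset from the start of the profile
};

// Copies the toy header out of the buffer. Returns false if the buffer is too
// small to hold a header. Host byte order is assumed.
bool readToyHeader(const std::vector<uint8_t> &Buf, ToyIndexedHeader &Out) {
  if (Buf.size() < sizeof(ToyIndexedHeader))
    return false;
  std::memcpy(&Out, Buf.data(), sizeof(Out));
  return true;
}

// The profile summary follows the header directly, so its position needs no
// dedicated offset field.
size_t summaryStart() { return sizeof(ToyIndexedHeader); }

// Returns a pointer to the section at `Offset`, or nullptr when the offset
// points into the header or past the end of the buffer.
const uint8_t *locateSection(const std::vector<uint8_t> &Buf, uint64_t Offset) {
  if (Offset < sizeof(ToyIndexedHeader) || Offset >= Buf.size())
    return nullptr;
  return Buf.data() + Offset;
}
```

With this shape, a reader that only cares about section B calls
``locateSection(Buf, H.SectionBOffset)`` and never touches the bytes of
section A.
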
Header
--------

The `Header struct`_ is the source of truth, and its fields should explain
what's in the header. At a high level, ``*Offset`` fields record section byte
offsets, which are used by readers to locate interesting sections and skip
uninteresting ones.

.. note::

  To maintain backward compatibility of the indexed profiles, existing fields
  shouldn't be deleted from the struct definition, and the field order shouldn't be
  modified. New fields should be appended.

.. _`Header struct`: https://github.com/llvm/llvm-project/blob/1a2960bab6381f2b288328e2371829b460ac020c/llvm/include/llvm/ProfileData/InstrProf.h#L1053-L1080


Payload Sections
------------------

(CS) Profile Summary
^^^^^^^^^^^^^^^^^^^^^
This section is right after the profile header. It stores the serialized profile
summary. For context-sensitive IR-based instrumentation PGO, this section stores
an additional profile summary corresponding to the context-sensitive profiles.

.. _`function data`:

Function data
^^^^^^^^^^^^^^^^^^
This section stores functions and their profiling data as an on-disk hash table.
Profile data for functions with the same name are grouped together and share one
hash table entry (the functions may come from different shared libraries, for
instance). The profile data for them are organized as a sequence of key-value
pairs where the key is `FuncHash`_, and the value is the profiled information (represented
by `InstrProfRecord`_) for the function.

.. _`InstrProfRecord`: https://github.com/llvm/llvm-project/blob/7e405eb722e40c79b7726201d0f76b5dab34ba0f/llvm/include/llvm/ProfileData/InstrProf.h#L693

MemProf Profile data
^^^^^^^^^^^^^^^^^^^^^^
This section stores functions' memory profiling data. See the
`MemProf binary serialization format RFC`_ for the design.

.. _`MemProf binary serialization format RFC`: https://lists.llvm.org/pipermail/llvm-dev/2021-September/153007.html

Binary Ids
^^^^^^^^^^^^^^^^^^^^^^
This section is used to carry over `binary id`_ information from raw profiles.

Temporal Profile Traces
^^^^^^^^^^^^^^^^^^^^^^^^
This section is used to carry over temporal profile information from raw profiles.
See `temporal profiling`_ for the design.

Virtual Table Names
^^^^^^^^^^^^^^^^^^^^
This section is used to store the names of vtables from the raw profile in the indexed
profile.

Unlike function names, which are stored as keys of the `function data`_ hash table,
vtable names need to be stored in a standalone section in indexed profiles.
This way, ``llvm-profdata`` can show the profiled vtable information in a
human-readable way.

Profile Data Usage
=======================================

``llvm-profdata`` is the command line tool to display and process
instrumentation-based profile data. For supported usages, check out the `llvm-profdata documentation <https://llvm.org/docs/CommandGuide/llvm-profdata.html>`_.

.. [1] For usage, see https://clang.llvm.org/docs/UsersManual.html#profiling-with-instrumentation
.. [2] For example, IR-based instrumentation supports `lightweight instrumentation`_
   and `temporal profiling`_. Frontend instrumentation could support `single-byte counters`_.
.. [3] A raw profile file could contain the concatenation of multiple raw
   profiles, for example, from an executable and its shared libraries. The raw
   profile reader can parse all raw profiles from the file correctly.
.. [4] The counter section is used by a few variant types (like temporal
   profiling) and might have different semantics there.
.. [5] The step size of the data pointer is ``sizeof(ProfileData)``, and the step
   size of the value profile pointer is calculated based on the number of collected
   values.

.. _`lightweight instrumentation`: https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4
.. _`temporal profiling`: https://discourse.llvm.org/t/rfc-temporal-profiling-extension-for-irpgo/68068
.. _`single-byte counters`: https://discourse.llvm.org/t/rfc-single-byte-counters-for-source-based-code-coverage/75685
.. _`binary profile correlation`: https://discourse.llvm.org/t/rfc-add-binary-profile-correlation-to-not-load-profile-metadata-sections-into-memory-at-runtime/74565
.. _`binary id`: https://lists.llvm.org/pipermail/llvm-dev/2021-June/151154.html
.. _`type profiling`: https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600