===================================
Instrumentation Profile Format
===================================

.. contents::
   :local:


Overview
=========

Clang supports two types of profiling via instrumentation [1]_: frontend-based
and IR-based. Both support a variety of use cases [2]_.
This document describes the two binary serialization formats (raw and indexed)
that store instrumented profiles, with an emphasis on the IRPGO use case: where
a header field or payload section is interpreted differently across use cases,
the documentation follows the IRPGO interpretation.

.. note::
  Frontend-generated profiles are used together with coverage mapping for
  `source-based code coverage`_. The `coverage mapping format`_ is different from
  the profile format.

.. _`source-based code coverage`: https://clang.llvm.org/docs/SourceBasedCodeCoverage.html
.. _`coverage mapping format`: https://llvm.org/docs/CoverageMappingFormat.html

Raw Profile Format
===================

The raw profile is generated by running the instrumented binary. The raw profile
data from an executable or a shared library [3]_ consists of a header and
multiple sections, with each section as a memory dump. The raw profile data needs
to be reasonably compact and fast to generate.

There are no backward or forward version compatibility guarantees for the raw profile
format. That is, compilers and tools `require`_ a specific raw profile version
to parse the profiles.

.. _`require`: https://github.com/llvm/llvm-project/blob/bffdde8b8e5d9a76a47949cd0f574f3ce656e181/llvm/lib/ProfileData/InstrProfReader.cpp#L551-L558

To feed profiles back into compilers for an optimized build (e.g., via
``-fprofile-use`` for IR instrumentation), a raw profile must be converted into
the indexed format.

General Storage Layout
-----------------------

The storage layout of the raw profile data format is illustrated below. When
the raw profile is read into a memory buffer, the actual byte offset of a
section is inferred from the section's order in the layout and the size information
of all the sections ahead of it.

::

  +----+-----------------------+
  |    |        Magic          |
  |    +-----------------------+
  |    |        Version        |
  |    +-----------------------+
  H    |   Size Info for       |
  E    |      Section 1        |
  A    +-----------------------+
  D    |   Size Info for       |
  E    |      Section 2        |
  R    +-----------------------+
  |    |          ...          |
  |    +-----------------------+
  |    |   Size Info for       |
  |    |      Section N        |
  +----+-----------------------+
  P    |       Section 1       |
  A    +-----------------------+
  Y    |       Section 2       |
  L    +-----------------------+
  O    |          ...          |
  A    +-----------------------+
  D    |       Section N       |
  +----+-----------------------+


.. note::
   Sections might be padded to meet specific alignment requirements. For
   simplicity, header fields and data sections that exist solely for padding
   are omitted from the data layout graph above and from the rest of this document.
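
The offset inference described in this section can be sketched as a running sum
over section sizes. This is a minimal illustration, not the reader's actual
code: the function name ``computeSectionOffsets`` and its parameters are
hypothetical, and padding is ignored.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>
  #include <vector>

  // Hypothetical helper: given the header size and the byte size of each
  // payload section (in layout order), return the byte offset of every section
  // from the start of the raw profile buffer. Padding is ignored in this sketch.
  std::vector<uint64_t>
  computeSectionOffsets(uint64_t HeaderSize,
                        const std::vector<uint64_t> &SectionSizes) {
    std::vector<uint64_t> Offsets;
    Offsets.reserve(SectionSizes.size());
    uint64_t Offset = HeaderSize; // The first section starts after the header.
    for (uint64_t Size : SectionSizes) {
      assert(Offset + Size >= Offset && "section offset overflow");
      Offsets.push_back(Offset);
      Offset += Size; // The next section starts where this one ends.
    }
    return Offsets;
  }

With a 16-byte header and sections of 8, 24 and 4 bytes, the sketch places the
sections at byte offsets 16, 24 and 48.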

Header
-------

``Magic``
  The magic number encodes the profile format (raw, indexed or text). For the raw
  format, the magic number also encodes the endianness (big or little) and C pointer
  size (4 or 8 bytes) of the platform on which the profile is generated.

  A factory method reads the magic number to construct the proper reader and returns
  an error upon an unrecognized format. Specifically, the factory method and the raw
  profile reader implementation make sure that a raw profile file can be read back on
  a platform with the opposite endianness and/or the other C pointer size.

``Version``
  The lower 32 bits specify the actual version and the most significant 32 bits
  specify the variant types of the profile. IR-based instrumentation PGO and
  context-sensitive IR-based instrumentation PGO are two variant types.

``BinaryIdsSize``
  The byte size of the `binary id`_ section.

``NumData``
  The number of profile metadata records. The byte size of the `profile metadata`_
  section can be computed from this field.

``NumCounter``
  The number of entries in the profile counter section. The byte size of the
  `counter`_ section can be computed from this field.

``NumBitmapBytes``
  The number of bytes in the profile `bitmap`_ section.

``NamesSize``
  The number of bytes in the name section.

.. _`CountersDelta`:

``CountersDelta``
  This field records the in-memory address difference between the `profile metadata`_
  and counter section in the instrumented binary, i.e., ``start(__llvm_prf_cnts) - start(__llvm_prf_data)``.

  It's used jointly with the `CounterPtr`_ field to compute the counter offset
  relative to ``start(__llvm_prf_cnts)``. Check out calculation-of-counter-offset_
  for a visualized explanation.

  .. note::
    The ``__llvm_prf_data`` object file section might not be loaded into memory
    when the instrumented binary runs, or might not be generated in the instrumented
    binary in the first place. In those cases, ``CountersDelta`` is not used and
    other mechanisms are used to match counters with instrumented code. See
    `lightweight instrumentation`_ and `binary profile correlation`_ for examples.

``BitmapDelta``
  This field records the in-memory address difference between the `profile metadata`_
  and bitmap section in the instrumented binary, i.e., ``start(__llvm_prf_bits) - start(__llvm_prf_data)``.

  It's used jointly with the `BitmapPtr`_ field to find the bitmap of a profile data
  record, in a similar way to how counters are referenced as explained by
  calculation-of-counter-offset_.

  Similar to the `CountersDelta`_ field, this field may not be used in non-PGO
  variants of profiles.

``NamesDelta``
  Records the in-memory address of the name section. Not used except for raw
  profile reader error checking.

``NumVTables``
  Records the number of instrumented vtable entries in the binary. Used for
  `type profiling`_.

``VNamesSize``
  Records the byte size of the virtual table names section. Used for `type profiling`_.

``ValueKindLast``
  Records the number of value kinds. The macro `VALUE_PROF_KIND`_ defines the value
  kinds with a description of each kind.

.. _`VALUE_PROF_KIND`: https://github.com/llvm/llvm-project/blob/7e405eb722e40c79b7726201d0f76b5dab34ba0f/compiler-rt/include/profile/InstrProfData.inc#L184-L186
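
As an illustration of the ``Version`` field described above, the following
sketch splits the 64-bit value into its two halves. The struct and function
names are hypothetical, and the meaning of individual variant bits is left to
the source code.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>

  // Hypothetical view of the 64-bit ``Version`` header field.
  struct DecodedVersion {
    uint32_t Version;     // Lower 32 bits: the actual format version.
    uint32_t VariantBits; // Upper 32 bits: flags for the profile variant types.
  };

  // Split the raw header value into the version and the variant bits.
  DecodedVersion decodeVersion(uint64_t RawVersion) {
    DecodedVersion D;
    D.Version = static_cast<uint32_t>(RawVersion & 0xFFFFFFFFu);
    D.VariantBits = static_cast<uint32_t>(RawVersion >> 32);
    return D;
  }

A reader would typically compare ``Version`` against the version it supports
and then test individual bits of ``VariantBits`` to learn the profile variant.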

Payload Sections
------------------

Binary Ids
^^^^^^^^^^^
Stores the binary ids of the instrumented binaries to associate binaries with
profiles for source code coverage. See the `binary id`_ RFC for the design.

.. _`profile metadata`:

Profile Metadata
^^^^^^^^^^^^^^^^^^

This section stores the metadata to map counters and value profiles back to
instrumented code regions (e.g., LLVM IR for IRPGO).

The in-memory representation of the metadata is `__llvm_profile_data`_.
Some fields are used to reference data from other sections in the profile.
The fields are documented as follows:

.. _`__llvm_profile_data`: https://github.com/llvm/llvm-project/blob/7c3b67d2038cfb48a80299089f6a1308eee1df7f/compiler-rt/include/profile/InstrProfData.inc#L65-L95

``NameRef``
  The MD5 of the function's PGO name. The PGO name has the format
  ``[<filepath><delimiter>]<mangled-name>`` where ``<filepath>`` and
  ``<delimiter>`` are provided for local-linkage functions to tell apart
  possibly identical function names.

.. _FuncHash:

``FuncHash``
  A checksum of the function's IR, taking the control flow graph and instrumented
  value sites into account. See `computeCFGHash`_ for details.

.. _`computeCFGHash`: https://github.com/llvm/llvm-project/blob/7c3b67d2038cfb48a80299089f6a1308eee1df7f/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp#L616-L685

.. _`CounterPtr`:

``CounterPtr``
  The in-memory address difference between profile data and the start of the
  corresponding counters. The counter position is stored this way (as a link-time
  constant) to reduce instrumented binary size compared with snapshotting the
  address of symbols directly. See `commit a1532ed`_ for further information.

  .. note::
    ``CounterPtr`` might represent a different value for non-IRPGO use cases. For
    example, for `binary profile correlation`_, it represents the absolute address
    of the counter. When in doubt, check the source code.

.. _`commit a1532ed`: https://github.com/llvm/llvm-project/commit/a1532ed27582038e2d9588108ba0fe8237f01844

.. _`BitmapPtr`:

``BitmapPtr``
  The in-memory address difference between profile data and the start address of
  the corresponding bitmap.

  .. note::
    Similar to `CounterPtr`_, this field may represent a different value for
    non-IRPGO use cases.

``FunctionPointer``
  Records the function address when the instrumented binary runs. This is used to
  map the profiled callee address of indirect calls to the ``NameRef`` during
  conversion from raw to indexed profiles.

``Values``
  Represents value profiles in a two-dimensional array. The number of elements
  in the first dimension is the number of instrumented value sites across all
  kinds. Each element in the first dimension is the head of a linked list, and
  each element in the second dimension is a linked list element carrying
  ``<profiled-value, count>`` as payload. This is used by the compiler runtime
  when writing out value profiles.

  .. note::
    Value profiling is supported by frontend and IR PGO instrumentation,
    but it's not supported in all cases (e.g., `lightweight instrumentation`_).

``NumCounters``
  The number of counters for the instrumented function.

``NumValueSites``
  This is an array of counters, and each counter represents the number of
  instrumented sites for a kind of value in the function.

``NumBitmapBytes``
  The number of bitmap bytes for the function.
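
To make the shape of the ``Values`` field more concrete, here is a minimal
sketch of the structure described above: one list head per value site, each
list carrying ``<profiled-value, count>`` pairs. The type ``ValueNode`` and the
helper ``collectSite`` are hypothetical names chosen for this example and do not
mirror the runtime's definitions.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <cstdint>
  #include <utility>
  #include <vector>

  // Hypothetical node type: one element of the second dimension.
  struct ValueNode {
    uint64_t Value;  // The profiled value (e.g., a callee address).
    uint64_t Count;  // How many times the value was observed.
    ValueNode *Next; // Next node for the same value site, or nullptr.
  };

  // Collect the <profiled-value, count> pairs of one value site by walking its
  // linked list. `Sites` is the first dimension: one list head per value site.
  std::vector<std::pair<uint64_t, uint64_t>>
  collectSite(const std::vector<ValueNode *> &Sites, size_t SiteIndex) {
    assert(SiteIndex < Sites.size() && "value site index out of range");
    std::vector<std::pair<uint64_t, uint64_t>> Result;
    for (const ValueNode *N = Sites[SiteIndex]; N != nullptr; N = N->Next)
      Result.emplace_back(N->Value, N->Count);
    return Result;
  }

A site that never observed a value simply has a null list head, so
``collectSite`` returns an empty vector for it.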

.. _`counter`:

Profile Counters
^^^^^^^^^^^^^^^^^

For PGO [4]_, the counters within an instrumented function of a specific `FuncHash`_
are stored contiguously and in an order that is consistent with instrumentation
point selection.

.. _calculation-of-counter-offset:

As mentioned above, the recorded counter offset is relative to the profile metadata.
So how are function counters located in the raw profile data?

The profile reader iterates over the profile metadata (from the `profile metadata`_
section) and makes use of the recorded relative distances, as illustrated below.

::

        + --> start(__llvm_prf_data) --> +---------------------+ ------------+
        |                                |       Data 1        |             |
        |                                +---------------------+  =====||    |
        |                                |       Data 2        |       ||    |
        |                                +---------------------+       ||    |
        |                                |        ...          |       ||    |
 Counter|                                +---------------------+       ||    |
  Delta |                                |       Data N        |       ||    |
        |                                +---------------------+       ||    |   CounterPtr1
        |                                                              ||    |
        |                                              CounterPtr2     ||    |
        |                                                              ||    |
        |                                                              ||    |
        + --> start(__llvm_prf_cnts) --> +---------------------+       ||    |
                                         |        ...          |       ||    |
                                         +---------------------+  -----||----+
                                         |    Counter for      |       ||
                                         |       Data 1        |       ||
                                         +---------------------+       ||
                                         |        ...          |       ||
                                         +---------------------+  =====||
                                         |    Counter for      |
                                         |       Data 2        |
                                         +---------------------+
                                         |        ...          |
                                         +---------------------+
                                         |    Counter for      |
                                         |       Data N        |
                                         +---------------------+


In the graph,

* The profile header records ``CountersDelta`` with the value ``start(__llvm_prf_cnts) - start(__llvm_prf_data)``.
  We will call it ``CounterDeltaInitVal`` below for convenience.
* For each profile data record ``ProfileDataN``, ``CounterPtr`` is recorded as
  ``start(CounterN) - start(ProfileDataN)``, where ``ProfileDataN`` is the N-th
  entry in ``__llvm_prf_data``, and ``CounterN`` represents the corresponding
  profile counters.

Each time the reader advances to the next data record, it `updates`_ ``CountersDelta``
by subtracting the size of one ``ProfileData`` record.

.. _`updates`: https://github.com/llvm/llvm-project/blob/17ff25a58ee4f29816d932fdb75f0d305718069f/llvm/include/llvm/ProfileData/InstrProfReader.h#L439-L444

For the counter corresponding to the first data record, the byte offset
relative to the start of the counter section is calculated as ``CounterPtr1 - CounterDeltaInitVal``.
When the profile reader advances to the second data record, ``CountersDelta``
has been updated to ``CounterDeltaInitVal - sizeof(ProfileData)``.
Thus the byte offset relative to the start of the counter section is calculated
as ``CounterPtr2 - (CounterDeltaInitVal - sizeof(ProfileData))``.
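
The calculation above generalizes to every record. The following sketch
applies it in a loop; the function name ``counterOffsets`` and the parameter
``ProfileDataSize`` (standing in for ``sizeof(ProfileData)``) are assumptions
made for illustration, and signed 64-bit integers are used to keep the
arithmetic simple.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>
  #include <vector>

  // For every data record, compute the byte offset of its counters relative to
  // the start of the counter section. `CounterPtrs[I]` is the CounterPtr of the
  // I-th record and `CounterDeltaInitVal` is the value from the header.
  std::vector<int64_t> counterOffsets(const std::vector<int64_t> &CounterPtrs,
                                      int64_t CounterDeltaInitVal,
                                      int64_t ProfileDataSize) {
    assert(ProfileDataSize > 0 && "record size must be positive");
    std::vector<int64_t> Offsets;
    int64_t CountersDelta = CounterDeltaInitVal;
    for (int64_t CounterPtr : CounterPtrs) {
      Offsets.push_back(CounterPtr - CountersDelta);
      // Advancing to the next record shrinks the delta by one record size.
      CountersDelta -= ProfileDataSize;
    }
    return Offsets;
  }

For example, suppose the data section starts at address 1000 with 64-byte
records, the counter section starts at 5000, and the counters of three records
start at 5000, 5016 and 5040. Then ``CounterDeltaInitVal`` is 4000, the
``CounterPtr`` values are 4000, 3952 and 3912, and the sketch recovers the
offsets 0, 16 and 40.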

.. _`bitmap`:

Bitmap
^^^^^^^
This section is used for source-based `Modified Condition/Decision Coverage`_
code coverage. Check out the `Bitmap RFC`_ for the design.

.. _`Modified Condition/Decision Coverage`: https://en.wikipedia.org/wiki/Modified_condition/decision_coverage
.. _`Bitmap RFC`: https://discourse.llvm.org/t/rfc-source-based-mc-dc-code-coverage/59244

.. _`function names`:

Names
^^^^^^

This section contains the possibly compressed, concatenated string of functions'
PGO names. If compressed, the zlib library is used.

Function names serve as keys in the PGO data hash table when raw profiles are
converted into indexed profiles. They are also crucial for ``llvm-profdata`` to
show the profiles in a human-readable way.
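
As a rough illustration of how a reader might recover individual names from
the uncompressed, concatenated string, consider the sketch below. The helper
``splitNames`` is hypothetical, and the separator byte is passed in by the
caller because this document does not specify it; check the source code for the
actual value.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <string>
  #include <vector>

  // Split an uncompressed, concatenated name string into individual PGO names.
  // `Separator` is an assumption of this sketch, supplied by the caller.
  std::vector<std::string> splitNames(const std::string &Concatenated,
                                      char Separator) {
    std::vector<std::string> Names;
    if (Concatenated.empty())
      return Names;
    size_t Begin = 0;
    while (true) {
      size_t End = Concatenated.find(Separator, Begin);
      if (End == std::string::npos) {
        Names.push_back(Concatenated.substr(Begin)); // Last name.
        break;
      }
      Names.push_back(Concatenated.substr(Begin, End - Begin));
      Begin = End + 1; // Skip the separator.
    }
    return Names;
  }

Each recovered name can then be hashed and matched against the ``NameRef``
field of the profile metadata records.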

Virtual Table Profile Data
^^^^^^^^^^^^^^^^^^^^^^^^^^^

This section is used for `type profiling`_. Each entry corresponds to one virtual
table and is defined by the following C++ struct:

.. code-block:: c++

  struct VTableProfData {
    // The start address of the vtable, collected at runtime.
    uint64_t StartAddress;
    // The byte size of the vtable. `StartAddress` and `ByteSize` together
    // specify an address range to look up.
    uint32_t ByteSize;
    // The hash of the vtable's (PGO) name.
    uint64_t MD5HashOfName;
  };

At profile use time, the compiler looks up a profiled address in the sorted vtable
address ranges and maps the address to a specific vtable through the hashed name.
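
The range lookup described above can be sketched with a binary search. This is
an illustrative example under the assumption that the ranges are sorted by
start address and do not overlap; ``VTableRange`` and ``lookupVTable`` are
hypothetical names, not the compiler's implementation.

.. code-block:: c++

  #include <algorithm>
  #include <cassert>
  #include <cstdint>
  #include <vector>

  // Hypothetical in-memory copy of one vtable profile data entry.
  struct VTableRange {
    uint64_t StartAddress;
    uint32_t ByteSize;
    uint64_t MD5HashOfName;
  };

  // Return the range containing `Address`, or nullptr if there is none.
  // `Sorted` must be ordered by StartAddress with non-overlapping ranges.
  const VTableRange *lookupVTable(const std::vector<VTableRange> &Sorted,
                                  uint64_t Address) {
    // Find the first range starting after Address; the only candidate is the
    // range immediately before it.
    auto It = std::upper_bound(
        Sorted.begin(), Sorted.end(), Address,
        [](uint64_t Addr, const VTableRange &R) { return Addr < R.StartAddress; });
    if (It == Sorted.begin())
      return nullptr;
    --It;
    if (Address - It->StartAddress < It->ByteSize)
      return &*It;
    return nullptr;
  }

The ``MD5HashOfName`` of the returned range then identifies the vtable.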

Virtual Table Names
^^^^^^^^^^^^^^^^^^^^

This section is similar to the `function names`_ section above, except that it
contains the PGO names of profiled virtual tables. It's a standalone section such
that raw profile readers can directly find each name set by accessing the
corresponding profile data section.

This section is stored in raw profiles such that ``llvm-profdata`` can show the
profiles in a human-readable way.

Value Profile Data
^^^^^^^^^^^^^^^^^^^^

This section contains the profile data for value profiling.

The value profiles corresponding to one profile metadata record are serialized
contiguously as one record, and value profile records are stored in the same order
as the respective profile data, such that a raw profile reader `advances`_ the
pointer to profile data and the pointer to value profile records simultaneously [5]_
to find the value profiles for each per-function, per `FuncHash`_ profile data record.

.. _`advances`: https://github.com/llvm/llvm-project/blob/7e15fa9161eda7497a5d6abf0d951a1d12d86550/llvm/include/llvm/ProfileData/InstrProfReader.h#L456-L457
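
The lockstep iteration can be sketched as follows. The sketch makes two
simplifying assumptions that are not taken from this document: every data
record has a value profile record, and each value profile record begins with
its own total byte size as a 32-bit integer in host byte order. The function
name ``valueRecordOffsets`` is hypothetical.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <cstdint>
  #include <cstring>
  #include <vector>

  // Return the byte offset of the value profile record that belongs to each of
  // the `NumData` data records. The data pointer advances by a fixed step (one
  // index here), while the value pointer advances by a per-record size.
  std::vector<size_t> valueRecordOffsets(const std::vector<uint8_t> &ValueData,
                                         size_t NumData) {
    std::vector<size_t> Offsets;
    size_t Pos = 0;
    for (size_t I = 0; I < NumData; ++I) {
      assert(Pos + sizeof(uint32_t) <= ValueData.size() && "truncated data");
      uint32_t TotalSize; // Assumed size prefix of the record.
      std::memcpy(&TotalSize, ValueData.data() + Pos, sizeof(TotalSize));
      assert(TotalSize >= sizeof(uint32_t) && "record must cover its prefix");
      Offsets.push_back(Pos);
      Pos += TotalSize; // Variable step for the value profile pointer.
    }
    return Offsets;
  }

The I-th returned offset pairs with the I-th data record, which is how the
reader associates value profiles with a function without storing an explicit
index.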

Indexed Profile Format
===========================

Indexed profiles are generated by ``llvm-profdata``. In indexed profiles,
function data are organized as an on-disk hash table such that compilers can
look up profile data for functions in an IR module.

Compilers and tools must retain backward compatibility with indexed profiles.
That is, a tool or a compiler built from a newer version of the code must
understand profiles generated by older tools or compilers.

General Storage Layout
-----------------------

The ASCII art below depicts the general storage layout of indexed profiles.
Specifically, the indexed profile header describes the byte offset of individual
payload sections.

::

                            +-----------------------+---+
                            |        Magic          |   |
                            +-----------------------+   |
                            |        Version        |   |
                            +-----------------------+   |
                            |        HashType       |   H
                            +-----------------------+   E
                            |       Byte Offset     |   A
                    +------ |      of section A     |   D
                    |       +-----------------------+   E
                    |       |       Byte Offset     |   R
                +-----------|      of section B     |   |
                |   |       +-----------------------+   |
                |   |       |         ...           |   |
                |   |       +-----------------------+   |
                |   |       |      Byte Offset      |   |
            +---------------|     of section Z      |   |
            |   |   |       +-----------------------+---+
            |   |   |       |    Profile Summary    |   |
            |   |   |       +-----------------------+   P
            |   |   +------>|      Section A        |   A
            |   |           +-----------------------+   Y
            |   +---------->|      Section B        |   L
            |               +-----------------------+   O
            |               |         ...           |   A
            |               +-----------------------+   D
            +-------------->|      Section Z        |   |
                            +-----------------------+---+

.. note::

  The profile summary section is at the beginning of the payload. It's right after
  the header, so its position is implicitly known after reading the header.

Header
--------

The `Header struct`_ is the source of truth, and its fields should explain
what's in the header. At a high level, the ``*Offset`` fields record section byte
offsets, which readers use to locate interesting sections and skip
uninteresting ones.

.. note::

  To maintain backward compatibility of indexed profiles, existing fields
  shouldn't be deleted from the struct definition, and the field order shouldn't
  be modified. New fields should be appended.

.. _`Header struct`: https://github.com/llvm/llvm-project/blob/1a2960bab6381f2b288328e2371829b460ac020c/llvm/include/llvm/ProfileData/InstrProf.h#L1053-L1080
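
The following sketch shows how a reader might follow an offset field to a
section. It assumes, for illustration only, that header fields are 64-bit
little-endian integers laid out back to back; the helper names and the field
index are hypothetical, and the `Header struct`_ remains the authoritative
definition.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <cstdint>
  #include <vector>

  // Read the Index-th header field, assumed to be a 64-bit little-endian value.
  uint64_t readHeaderField(const std::vector<uint8_t> &Buffer, size_t Index) {
    size_t Pos = Index * sizeof(uint64_t);
    assert(Pos + sizeof(uint64_t) <= Buffer.size() && "field out of bounds");
    uint64_t Value = 0;
    for (size_t I = 0; I < sizeof(uint64_t); ++I)
      Value |= static_cast<uint64_t>(Buffer[Pos + I]) << (8 * I);
    return Value;
  }

  // Return a pointer to the section whose byte offset is stored in the header
  // field at `OffsetFieldIndex`.
  const uint8_t *locateSection(const std::vector<uint8_t> &Buffer,
                               size_t OffsetFieldIndex) {
    uint64_t Offset = readHeaderField(Buffer, OffsetFieldIndex);
    assert(Offset <= Buffer.size() && "section offset out of bounds");
    return Buffer.data() + Offset;
  }

Because fields are only ever appended, a reader like this can keep using the
same field indices for older profiles and consult newer fields only when the
profile version says they exist.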


Payload Sections
------------------

(CS) Profile Summary
^^^^^^^^^^^^^^^^^^^^^
This section is right after the profile header. It stores the serialized profile
summary. For context-sensitive IR-based instrumentation PGO, this section stores
an additional profile summary corresponding to the context-sensitive profiles.

.. _`function data`:

Function data
^^^^^^^^^^^^^^^^^^
This section stores functions and their profiling data as an on-disk hash table.
Profile data for functions with the same name are grouped together and share one
hash table entry (the functions may come from different shared libraries, for
instance). The profile data for them are organized as a sequence of key-value
pairs where the key is `FuncHash`_ and the value is the profiled information
(represented by `InstrProfRecord`_) for the function.

.. _`InstrProfRecord`: https://github.com/llvm/llvm-project/blob/7e405eb722e40c79b7726201d0f76b5dab34ba0f/llvm/include/llvm/ProfileData/InstrProf.h#L693
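
The two-level lookup described above can be modeled in memory as shown below.
This is only a conceptual sketch: the real table is an on-disk hash table, and
``FunctionRecord``, ``FunctionTable`` and ``findRecord`` are hypothetical
stand-ins introduced for this example.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>
  #include <string>
  #include <unordered_map>
  #include <utility>
  #include <vector>

  // Hypothetical stand-in for the profiled information of one function.
  struct FunctionRecord {
    std::vector<uint64_t> Counts;
  };

  // One entry per function name; each entry holds <FuncHash, record> pairs.
  using FunctionTable =
      std::unordered_map<std::string,
                         std::vector<std::pair<uint64_t, FunctionRecord>>>;

  // Look up the record for `Name` whose hash matches `FuncHash`, or return
  // nullptr when either the name or the hash is not found.
  const FunctionRecord *findRecord(const FunctionTable &Table,
                                   const std::string &Name, uint64_t FuncHash) {
    auto It = Table.find(Name);
    if (It == Table.end())
      return nullptr;
    for (const auto &Entry : It->second)
      if (Entry.first == FuncHash)
        return &Entry.second;
    return nullptr;
  }

Matching on ``FuncHash`` as well as the name lets a consumer reject a profile
that was collected for a different version of the function.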

MemProf Profile data
^^^^^^^^^^^^^^^^^^^^^^
This section stores functions' memory profiling data. See the
`MemProf binary serialization format RFC`_ for the design.

.. _`MemProf binary serialization format RFC`: https://lists.llvm.org/pipermail/llvm-dev/2021-September/153007.html

Binary Ids
^^^^^^^^^^^^^^^^^^^^^^
This section carries over the `binary id`_ information from raw profiles.

Temporal Profile Traces
^^^^^^^^^^^^^^^^^^^^^^^^
This section carries over the temporal profile information from raw profiles.
See `temporal profiling`_ for the design.

Virtual Table Names
^^^^^^^^^^^^^^^^^^^^
This section stores the names of vtables from the raw profile in the indexed
profile.

Unlike function names, which are stored as keys of the `function data`_ hash table,
vtable names need to be stored in a standalone section in indexed profiles.
This way, ``llvm-profdata`` can show the profiled vtable information in a
human-readable way.

Profile Data Usage
=======================================

``llvm-profdata`` is the command line tool to display and process
instrumentation-based profile data. For supported usages, check out the
`llvm-profdata documentation <https://llvm.org/docs/CommandGuide/llvm-profdata.html>`_.

.. [1] For usage, see https://clang.llvm.org/docs/UsersManual.html#profiling-with-instrumentation
.. [2] For example, IR-based instrumentation supports `lightweight instrumentation`_
   and `temporal profiling`_. Frontend instrumentation could support `single-byte counters`_.
.. [3] A raw profile file could contain the concatenation of multiple raw
   profiles, for example, from an executable and its shared libraries. The raw
   profile reader can parse all raw profiles from the file correctly.
.. [4] The counter section is used by a few variant types (like temporal
   profiling) and might have different semantics there.
.. [5] The step size of the data pointer is ``sizeof(ProfileData)``, and the step
   size of the value profile pointer is calculated based on the number of collected
   values.

.. _`lightweight instrumentation`: https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4
.. _`temporal profiling`:  https://discourse.llvm.org/t/rfc-temporal-profiling-extension-for-irpgo/68068
.. _`single-byte counters`: https://discourse.llvm.org/t/rfc-single-byte-counters-for-source-based-code-coverage/75685
.. _`binary profile correlation`: https://discourse.llvm.org/t/rfc-add-binary-profile-correlation-to-not-load-profile-metadata-sections-into-memory-at-runtime/74565
.. _`binary id`: https://lists.llvm.org/pipermail/llvm-dev/2021-June/151154.html
.. _`type profiling`: https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600
532