# BOLT - a post-link optimizer developed to speed up large applications

## SYNOPSIS

`llvm-bolt <executable> [-o outputfile] <executable>.bolt [-data=perf.fdata] [options]`
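
As a minimal sketch of the synopsis, the fragment below assembles a typical
invocation and prints it instead of running it. The file names `./app`,
`./app.bolt`, and `perf.fdata` are placeholders chosen for illustration.

```shell
#!/bin/sh
# Placeholder paths (assumptions for illustration only).
input=./app
output=./app.bolt
profile=perf.fdata

# Assemble the command line described by the synopsis.
cmd="llvm-bolt $input -o $output -data=$profile"

# Dry run: print the command rather than executing it.
echo "$cmd"
```

Once the paths point at real files, the printed command can be run as is.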

## OPTIONS

### Generic options:

- `-h`

  Alias for --help

- `--help`

  Display available options (--help-hidden for more)

- `--help-hidden`

  Display all available options

- `--help-list`

  Display list of available options (--help-list-hidden for more)

- `--help-list-hidden`

  Display list of all available options

- `--version`

  Display the version of this program

### Output options:

- `--bolt-info`

  Write bolt info section in the output binary

- `-o <string>`

  Output file

- `-w <string>`

  Save recorded profile to a file

### BOLT generic options:

- `--align-text=<uint>`

  Alignment of .text section

- `--allow-stripped`

  Allow processing of stripped binaries

- `--alt-inst-feature-size=<uint>`

  Size of feature field in .altinstructions

- `--alt-inst-has-padlen`

  Specify that .altinstructions has padlen field

- `--asm-dump[=<dump folder>]`

  Dump function into assembly

- `-b`

  Alias for -data

- `--bolt-id=<string>`

  Add any string to tag this execution in the output binary via bolt info section

- `--break-funcs=<func1,func2,func3,...>`

  List of functions to core dump on (debugging)

- `--check-encoding`

  Perform verification of LLVM instruction encoding/decoding. Every instruction
  in the input is decoded and re-encoded. If the resulting bytes do not match
  the input, a warning message is printed.

- `--comp-dir-override=<string>`

  Overrides DW_AT_comp_dir, and provides an alternative base location, which is
  used with DW_AT_dwo_name to construct a path to *.dwo files.

- `--create-debug-names-section`

  Creates .debug_names section, if the input binary doesn't have it already, for
  DWARF5 CU/TUs.

- `--cu-processing-batch-size=<uint>`

  Specifies the size of batches for processing CUs. A higher number gives better
  performance but uses more memory. Default value is 1.

- `--data=<string>`

  Data file

- `--data2=<string>`

  Data file

- `--debug-skeleton-cu`

  Prints out offsets for abbrev and debug_info of Skeleton CUs that get patched.

- `--debug-thread-count=<uint>`

  Specifies the number of threads to be used when processing DWO debug information.

- `--dot-tooltip-code`

  Add basic block instructions as tool tips on nodes

- `--dump-alt-instructions`

  Dump Linux alternative instructions info

- `--dump-cg=<string>`

  Dump callgraph to the given file

- `--dump-data`

  Dump parsed bolt data for debugging

- `--dump-dot-all`

  Dump function CFGs to graphviz format after each stage; enable '-print-loops'
  for color-coded blocks

- `--dump-linux-exceptions`

  Dump Linux kernel exception table

- `--dump-orc`

  Dump raw ORC unwind information (sorted)

- `--dump-para-sites`

  Dump Linux kernel paravirtual patch sites

- `--dump-pci-fixups`

  Dump Linux kernel PCI fixup table

- `--dump-smp-locks`

  Dump Linux kernel SMP locks

- `--dump-static-calls`

  Dump Linux kernel static calls

- `--dump-static-keys`

  Dump Linux kernel static keys jump table

- `--dwarf-output-path=<string>`

  Path where .dwo files or the dwp file will be written.

- `--dwp=<string>`

  Path and name to DWP file.

- `--dyno-stats`

  Print execution info based on profile

- `--dyno-stats-all`

  Print dyno stats after each stage

- `--dyno-stats-scale=<uint>`

  Scale to be applied while reporting dyno stats

- `--enable-bat`

  Write BOLT Address Translation tables

- `--force-data-relocations`

  Force relocations to data sections to always be processed

- `--force-patch`

  Force patching of original entry points

- `--funcs=<func1,func2,func3,...>`

  Limit optimizations to functions from the list

- `--funcs-file=<string>`

  File with list of functions to optimize

- `--funcs-file-no-regex=<string>`

  File with list of functions to optimize (non-regex)

- `--funcs-no-regex=<func1,func2,func3,...>`

  Limit optimizations to functions from the list (non-regex)

- `--hot-data`

  Hot data symbols support (relocation mode)

- `--hot-functions-at-end`

  If reorder-functions is used, order functions putting hottest last

- `--hot-text`

  Generate hot text symbols. Apply this option to a precompiled binary that
  manually calls into hugify, such that at runtime hugify call will put hot code
  into 2M pages. This requires relocation.

- `--hot-text-move-sections=<sec1,sec2,sec3,...>`

  List of sections containing functions used for hugifying hot text. BOLT makes
  sure these functions are not placed on the same page as the hot text.
  (default='.stub,.mover').

- `--insert-retpolines`

  Run retpoline insertion pass

- `--keep-aranges`

  Keep or generate .debug_aranges section if .gdb_index is written

- `--keep-tmp`

  Preserve intermediate .o file

- `--lite`

  Skip processing of cold functions

- `--log-file=<string>`

  Redirect journaling to a file instead of stdout/stderr

- `--long-jump-labels`

  Always use long jumps/nops for Linux kernel static keys

- `--match-profile-with-function-hash`

  Match profile with function hash

- `--max-data-relocations=<uint>`

  Maximum number of data relocations to process

- `--max-funcs=<uint>`

  Maximum number of functions to process

- `--no-huge-pages`

  Use regular size pages for code alignment

- `--no-threads`

  Disable multithreading

- `--pad-funcs=<func1:pad1,func2:pad2,func3:pad3,...>`

  List of functions to pad with amount of bytes

- `--print-mappings`

  Print mappings in the legend, between characters/blocks and text sections
  (default false).

- `--profile-format=<value>`

  Format to dump profile output in aggregation mode, default is fdata
  - `fdata`: offset-based plaintext format
  - `yaml`: dense YAML representation

- `--r11-availability=<value>`

  Determine the availability of r11 before indirect branches
  - `never`: r11 not available
  - `always`: r11 available before calls and jumps
  - `abi`: r11 available before calls but not before jumps

- `--relocs`

  Use relocations in the binary (default=autodetect)

- `--remove-symtab`

  Remove .symtab section

- `--reorder-skip-symbols=<symbol1,symbol2,symbol3,...>`

  List of symbol names that cannot be reordered

- `--reorder-symbols=<symbol1,symbol2,symbol3,...>`

  List of symbol names that can be reordered

- `--retpoline-lfence`

  Determine if lfence instruction should exist in the retpoline

- `--skip-funcs=<func1,func2,func3,...>`

  List of functions to skip

- `--skip-funcs-file=<string>`

  File with list of functions to skip

- `--strict`

  Trust the input to be from a well-formed source

- `--tasks-per-thread=<uint>`

  Number of tasks to be created per thread

- `--terminal-trap`

  Assume that execution stops at trap instruction

- `--thread-count=<uint>`

  Number of threads

- `--top-called-limit=<uint>`

  Maximum number of functions to print in top called functions section

- `--trap-avx512`

  In relocation mode trap upon entry to any function that uses AVX-512
  instructions

- `--trap-old-code`

  Insert traps in old function bodies (relocation mode)

- `--update-debug-sections`

  Update DWARF debug sections of the executable

- `--use-gnu-stack`

  Use GNU_STACK program header for new segment (workaround for issues with
  strip/objcopy)

- `--use-old-text`

  Re-use space in old .text if possible (relocation mode)

- `-v <uint>`

  Set verbosity level for diagnostic output

- `--write-dwp`

  Output a single dwarf package file (dwp) instead of multiple non-relocatable
  dwarf object files (dwo).
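
Several of the generic options above are often combined to narrow a run. The
sketch below limits optimization to a list of functions, fixes the thread
count, and redirects the log. The function names `func1,func2`, the thread
count, and all file names are hypothetical values chosen for illustration.

```shell
#!/bin/sh
# Hypothetical function names and paths, for illustration only.
funcs="func1,func2"
threads=4
log=bolt.log

# Start from a basic invocation and append the generic options.
cmd="llvm-bolt ./app -o ./app.bolt --data=perf.fdata"
cmd="$cmd --funcs=$funcs --thread-count=$threads --log-file=$log"

# Dry run: print the command rather than executing it.
echo "$cmd"
```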

### BOLT optimization options:

- `--align-blocks`

  Align basic blocks

- `--align-blocks-min-size=<uint>`

  Minimal size of the basic block that should be aligned

- `--align-blocks-threshold=<uint>`

  Align only blocks whose execution frequency exceeds the given percentage of
  the containing function's execution frequency. E.g. 1000 means aligning blocks
  that are executed 10 times more frequently than the containing function.

- `--align-functions=<uint>`

  Align functions at a given value (relocation mode)

- `--align-functions-max-bytes=<uint>`

  Maximum number of bytes to use to align functions

- `--assume-abi`

  Assume the ABI is never violated

- `--block-alignment=<uint>`

  Boundary to use for alignment of basic blocks

- `--bolt-seed=<uint>`

  Seed for randomization

- `--cg-from-perf-data`

  Use perf data directly when constructing the call graph for stale functions

- `--cg-ignore-recursive-calls`

  Ignore recursive calls when constructing the call graph

- `--cg-use-split-hot-size`

  Use hot/cold data on basic blocks to determine hot sizes for call graph
  functions

- `--cold-threshold=<uint>`

  Tenths of percents of main entry frequency to use as a threshold when
  evaluating whether a basic block is cold (0 means it is only considered cold
  if the block has zero samples). Default: 0

- `--elim-link-veneers`

  Run veneer elimination pass

- `--eliminate-unreachable`

  Eliminate unreachable code

- `--equalize-bb-counts`

  Use same count for BBs that should have equivalent count (used in non-LBR and
  shrink wrapping)

- `--execution-count-threshold=<uint>`

  Perform profiling accuracy-sensitive optimizations only if function execution
  count >= the threshold (default: 0)

- `--fix-block-counts`

  Adjust block counts based on outgoing branch counts

- `--fix-func-counts`

  Adjust function counts based on basic blocks execution count

- `--force-inline=<func1,func2,func3,...>`

  List of functions to always consider for inlining

- `--frame-opt=<value>`

  Optimize stack frame accesses
  - `none`: do not perform frame optimization
  - `hot`: perform FOP on hot functions
  - `all`: perform FOP on all functions

- `--frame-opt-rm-stores`

  Apply additional analysis to remove stores (experimental)

- `--function-order=<string>`

  File containing an ordered list of functions to use for function reordering

- `--generate-function-order=<string>`

  File to dump the ordered list of functions to use for function reordering

- `--generate-link-sections=<string>`

  Generate a list of function sections in a format suitable for inclusion in a
  linker script

- `--group-stubs`

  Share stubs across functions

- `--hugify`

  Automatically put hot code on 2MB page(s) (hugify) at runtime. No manual call
  to hugify is needed in the binary (which is what --hot-text relies on).

- `--icf=<value>`

  Fold functions with identical code
  - `all`: Enable identical code folding
  - `none`: Disable identical code folding (default)
  - `safe`: Enable safe identical code folding

- `--icp`

  Alias for --indirect-call-promotion

- `--icp-calls-remaining-percent-threshold=<uint>`

  The percentage threshold against remaining unpromoted indirect call count for
  the promotion for calls

- `--icp-calls-topn`

  Alias for --indirect-call-promotion-calls-topn

- `--icp-calls-total-percent-threshold=<uint>`

  The percentage threshold against total count for the promotion for calls

- `--icp-eliminate-loads`

  Enable load elimination using memory profiling data when performing ICP

- `--icp-funcs=<func1,func2,func3,...>`

  List of functions to enable ICP for

- `--icp-inline`

  Only promote call targets eligible for inlining

- `--icp-jt-remaining-percent-threshold=<uint>`

  The percentage threshold against remaining unpromoted indirect call count for
  the promotion for jump tables

- `--icp-jt-targets`

  Alias for --icp-jump-tables-targets

- `--icp-jt-topn`

  Alias for --indirect-call-promotion-jump-tables-topn

- `--icp-jt-total-percent-threshold=<uint>`

  The percentage threshold against total count for the promotion for jump tables

- `--icp-jump-tables-targets`

  For jump tables, optimize indirect jmp targets instead of indices

- `--icp-mp-threshold`

  Alias for --indirect-call-promotion-mispredict-threshold

- `--icp-old-code-sequence`

  Use old code sequence for promoted calls

- `--icp-top-callsites=<uint>`

  Optimize hottest calls until at least this percentage of all indirect calls
  frequency is covered. 0 = all callsites

- `--icp-topn`

  Alias for --indirect-call-promotion-topn

- `--icp-use-mp`

  Alias for --indirect-call-promotion-use-mispredicts

- `--indirect-call-promotion=<value>`

  Indirect call promotion
  - `none`: do not perform indirect call promotion
  - `calls`: perform ICP on indirect calls
  - `jump-tables`: perform ICP on jump tables
  - `all`: perform ICP on calls and jump tables

- `--indirect-call-promotion-calls-topn=<uint>`

  Limit number of targets to consider when doing indirect call promotion on
  calls. 0 = no limit

- `--indirect-call-promotion-jump-tables-topn=<uint>`

  Limit number of targets to consider when doing indirect call promotion on jump
  tables. 0 = no limit

- `--indirect-call-promotion-topn=<uint>`

  Limit number of targets to consider when doing indirect call promotion. 0 = no
  limit

- `--indirect-call-promotion-use-mispredicts`

  Use misprediction frequency for determining whether or not ICP should be
  applied at a callsite. The -indirect-call-promotion-mispredict-threshold
  value will be used by this heuristic

- `--infer-fall-throughs`

  Infer execution count for fall-through blocks

- `--infer-stale-profile`

  Infer counts from stale profile data.

- `--inline-all`

  Inline all functions

- `--inline-ap`

  Adjust function profile after inlining

- `--inline-limit=<uint>`

  Maximum number of call sites to inline

- `--inline-max-iters=<uint>`

  Maximum number of inline iterations

- `--inline-memcpy`

  Inline memcpy using 'rep movsb' instruction (X86-only)

- `--inline-small-functions`

  Inline functions if increase in size is less than defined by
  -inline-small-functions-bytes

- `--inline-small-functions-bytes=<uint>`

  Max number of bytes for the function to be considered small for inlining
  purposes

- `--instrument`

  Instrument code to generate accurate profile data

- `--iterative-guess`

  In non-LBR mode, guess edge counts using iterative technique

- `--jt-footprint-optimize-for-icache`

  With jt-footprint-reduction, only process PIC jumptables and turn off other
  transformations that increase code size

- `--jt-footprint-reduction`

  Make jump tables size smaller at the cost of using more instructions at jump
  sites

- `--jump-tables=<value>`

  Jump tables support (default=basic)
  - `none`: do not optimize functions with jump tables
  - `basic`: optimize functions with jump tables
  - `move`: move jump tables to a separate section
  - `split`: split jump tables section into hot and cold based on function
    execution frequency
  - `aggressive`: aggressively split jump tables section based on usage of the
    tables

- `--keep-nops`

  Keep no-op instructions. By default they are removed.

- `--lite-threshold-count=<uint>`

  Similar to '-lite-threshold-pct' but specify threshold using absolute function
  call count. I.e. limit processing to functions executed at least the specified
  number of times.

- `--lite-threshold-pct=<uint>`

  Threshold (in percent) for selecting functions to process in lite mode. Higher
  threshold means fewer functions to process. E.g. a threshold of 90 means only
  the top 10 percent of functions with profile will be processed.

- `--match-with-call-graph`

  Match functions with call graph

- `--memcpy1-spec=<func1,func2:cs1:cs2,func3:cs1,...>`

  List of functions with call sites for which to specialize memcpy() for size 1

- `--min-branch-clusters`

  Use a modified clustering algorithm geared towards minimizing branches

- `--name-similarity-function-matching-threshold=<uint>`

  Match functions using namespace and edit distance.

- `--no-inline`

  Disable all inlining (overrides other inlining options)

- `--no-scan`

  Do not scan cold functions for external references (may result in slower binary)

- `--peepholes=<value>`

  Enable peephole optimizations
  - `none`: disable peepholes
  - `double-jumps`: remove double jumps when able
  - `tailcall-traps`: insert tail call traps
  - `useless-branches`: remove useless conditional branches
  - `all`: enable all peephole optimizations

- `--plt=<value>`

  Optimize PLT calls (requires linking with -znow)
  - `none`: do not optimize PLT calls
  - `hot`: optimize executed (hot) PLT calls
  - `all`: optimize all PLT calls

- `--preserve-blocks-alignment`

  Try to preserve basic block alignment

- `--profile-ignore-hash`

  Ignore hash while reading function profile

- `--profile-use-dfs`

  Use DFS order for YAML profile

- `--reg-reassign`

  Reassign registers so as to avoid using REX prefixes in hot code

- `--reorder-blocks=<value>`

  Change layout of basic blocks in a function
  - `none`: do not reorder basic blocks
  - `reverse`: layout blocks in reverse order
  - `normal`: perform optimal layout based on profile
  - `branch-predictor`: perform optimal layout prioritizing branch predictions
  - `cache`: perform optimal layout prioritizing I-cache behavior
  - `cache+`: perform layout optimizing I-cache behavior
  - `ext-tsp`: perform layout optimizing I-cache behavior
  - `cluster-shuffle`: perform random layout of clusters

- `--reorder-data=<section1,section2,section3,...>`

  List of sections to reorder

- `--reorder-data-algo=<value>`

  Algorithm used to reorder data sections
  - `count`: sort hot data by read counts
  - `funcs`: sort hot data by hot function usage and count

- `--reorder-data-inplace`

  Reorder data sections in place

- `--reorder-data-max-bytes=<uint>`

  Maximum number of bytes to reorder

- `--reorder-data-max-symbols=<uint>`

  Maximum number of symbols to reorder

- `--reorder-functions=<value>`

  Reorder and cluster functions (works only with relocations)
  - `none`: do not reorder functions
  - `exec-count`: order by execution count
  - `hfsort`: use hfsort algorithm
  - `hfsort+`: use cache-directed sort
  - `cdsort`: use cache-directed sort
  - `pettis-hansen`: use Pettis-Hansen algorithm
  - `random`: reorder functions randomly
  - `user`: use function order specified by -function-order

- `--reorder-functions-use-hot-size`

  Use a function's hot size when doing clustering

- `--report-bad-layout=<uint>`

  Print top <uint> functions with suboptimal code layout on input

- `--report-stale`

  Print the list of functions with stale profile

- `--runtime-hugify-lib=<string>`

  Specify file name of the runtime hugify library

- `--runtime-instrumentation-lib=<string>`

  Specify file name of the runtime instrumentation library

- `--sctc-mode=<value>`

  Mode for simplify conditional tail calls
  - `always`: always perform sctc
  - `preserve`: only perform sctc when branch direction is preserved
  - `heuristic`: use branch prediction data to control sctc

- `--sequential-disassembly`

  Performs disassembly sequentially

- `--shrink-wrapping-threshold=<uint>`

  Percentage of prologue execution count to use as threshold when evaluating
  whether a block is cold enough to be profitable to move eligible spills there

- `--simplify-conditional-tail-calls`

  Simplify conditional tail calls by removing unnecessary jumps

- `--simplify-rodata-loads`

  Simplify loads from read-only sections by replacing the memory operand with
  the constant found in the corresponding section

- `--split-align-threshold=<uint>`

  When deciding to split a function, apply this alignment while doing the size
  comparison (see -split-threshold). Default value: 2.

- `--split-all-cold`

  Outline as many cold basic blocks as possible

- `--split-eh`

  Split C++ exception handling code

- `--split-functions`

  Split functions into fragments

- `--split-strategy=<value>`

  Strategy used to partition blocks into fragments
  - `profile2`: split each function into a hot and cold fragment using profiling
    information
  - `cdsplit`: split each function into a hot, warm, and cold fragment using
    profiling information
  - `random2`: split each function into a hot and cold fragment at a randomly
    chosen split point (ignoring any available profiling information)
  - `randomN`: split each function into N fragments at randomly chosen split
    points (ignoring any available profiling information)
  - `all`: split all basic blocks of each function into fragments such that each
    fragment contains exactly a single basic block

- `--split-threshold=<uint>`

  Split function only if its main size is reduced by more than given amount of
  bytes. Default value: 0, i.e. split iff the size is reduced. Note that on some
  architectures the size can increase after splitting.

- `--stale-matching-max-func-size=<uint>`

  The maximum size of a function to consider for inference.

- `--stale-matching-min-matched-block=<uint>`

  Percentage threshold of matched basic blocks at which stale profile inference
  is executed.

- `--stale-threshold=<uint>`

  Maximum percentage of stale functions to tolerate (default: 100)

- `--stoke`

  Turn on the stoke analysis

- `--strip-rep-ret`

  Strip 'repz' prefix from 'repz retq' sequence (on by default)

- `--tail-duplication=<value>`

  Duplicate unconditional branches that cross a cache line
  - `none`: do not apply
  - `aggressive`: aggressive strategy
  - `moderate`: moderate strategy
  - `cache`: cache-aware duplication strategy

- `--tsp-threshold=<uint>`

  Maximum number of hot basic blocks in a function for which to use a precise
  TSP solution while re-ordering basic blocks

- `--use-aggr-reg-reassign`

  Use register liveness analysis to try to find more opportunities for
  -reg-reassign optimization

- `--use-compact-aligner`

  Use compact approach for aligning functions

- `--use-edge-counts`

  Use edge count data when doing clustering

- `--verify-cfg`

  Verify the CFG after every pass

- `--x86-align-branch-boundary-hot-only`

  Only apply branch boundary alignment in hot code

- `--x86-strip-redundant-address-size`

  Remove redundant Address-Size override prefix
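
The sketch below combines a few of the layout options listed above and works
through the arithmetic behind two of the percentage thresholds. The particular
combination of flags and all file names are assumptions made for illustration,
not a recommended configuration; `--reorder-functions` additionally requires
relocations, as noted in its description.

```shell
#!/bin/sh
# Layout-related options taken from the list above (illustrative choice).
opts="--reorder-blocks=ext-tsp --reorder-functions=cdsort"
opts="$opts --split-functions --split-all-cold"

# --align-blocks-threshold is given in percent of the containing function's
# execution frequency, so 1000 corresponds to 10 times that frequency.
align_threshold=1000
multiplier=$((align_threshold / 100))

# --lite-threshold-pct=90 leaves the top (100 - 90) percent of functions
# with profile to be processed.
lite_pct=90
processed_pct=$((100 - lite_pct))

# Dry run: print the command and the derived values.
echo "llvm-bolt ./app -o ./app.bolt --data=perf.fdata $opts"
echo "align-blocks-threshold=$align_threshold -> ${multiplier}x function frequency"
echo "lite-threshold-pct=$lite_pct -> top ${processed_pct}% of profiled functions"
```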

### BOLT instrumentation options:

`llvm-bolt <executable> -instrument [-o outputfile] <instrumented-executable>`

- `--conservative-instrumentation`

  Disable instrumentation optimizations that sacrifice profile accuracy (for
  debugging, default: false)

- `--instrument-calls`

  Record profile for inter-function control flow activity (default: true)

- `--instrument-hot-only`

  Only insert instrumentation on hot functions (needs profile, default: false)

- `--instrumentation-binpath=<string>`

  Path to instrumented binary in case /proc/self/map_files is not accessible
  due to access restriction issues

- `--instrumentation-file=<string>`

  File name where instrumented profile will be saved (default: /tmp/prof.fdata)

- `--instrumentation-file-append-pid`

  Append PID to saved profile file name (default: false)

- `--instrumentation-no-counters-clear`

  Don't clear counters across dumps (use with instrumentation-sleep-time option)

- `--instrumentation-sleep-time=<uint>`

  Interval between profile writes (default: 0 = write only at program end).
  This is useful for service workloads when you want to dump profile every X
  minutes or if you are killing the program and the profile is not being dumped
  at the end.

- `--instrumentation-wait-forks`

  Wait until all forks of the instrumented process finish (use with
  instrumentation-sleep-time option)
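
The instrumentation options above fit into a three-step workflow: instrument,
run, then optimize with the collected profile. The sketch below prints the
three commands without running them. The binary names are placeholders, and
the final step assumes the profile written by the instrumented binary is
passed back through `--data`.

```shell
#!/bin/sh
# Placeholder paths (assumptions for illustration only).
app=./app
instrumented=./app.instrumented
profile=/tmp/prof.fdata   # documented default of --instrumentation-file

# Step 1: produce an instrumented binary.
step1="llvm-bolt $app -instrument -o $instrumented --instrumentation-file=$profile"
# Step 2: run the instrumented binary on a representative workload; by
# default the profile is written only at program end.
step2="$instrumented"
# Step 3: optimize the original binary using the collected profile.
step3="llvm-bolt $app -o ${app}.bolt --data=$profile"

# Dry run: print the three steps, one per line.
printf '%s\n' "$step1" "$step2" "$step3"
```

For long-running services, `--instrumentation-sleep-time` can be added to the
first step so that the profile is written periodically rather than only at
exit.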

### BOLT printing options:

- `--print-aliases`

  Print aliases when printing objects

- `--print-all`

  Print functions after each stage

- `--print-cfg`

  Print functions after CFG construction

- `--print-debug-info`

  Print debug info when printing functions

- `--print-disasm`

  Print function after disassembly

- `--print-dyno-opcode-stats=<uint>`

  Print per instruction opcode dyno stats and the function names:BB offsets of
  the nth highest execution counts

- `--print-dyno-stats-only`

  While printing functions output dyno-stats and skip instructions

- `--print-exceptions`

  Print exception handling data

- `--print-globals`

  Print global symbols after disassembly

- `--print-jump-tables`

  Print jump tables

- `--print-loops`

  Print loop related information

- `--print-mem-data`

  Print memory data annotations when printing functions

- `--print-normalized`

  Print functions after CFG is normalized

- `--print-only=<func1,func2,func3,...>`

  List of functions to print

- `--print-orc`

  Print ORC unwind information for instructions

- `--print-profile`

  Print functions after attaching profile

- `--print-profile-stats`

  Print profile quality/bias analysis

- `--print-pseudo-probes=<value>`

  Print pseudo probe info
  - `decode`: decode probes section from binary
  - `address_conversion`: update address2ProbesMap with output block address
  - `encoded_probes`: display the encoded probes in binary section
  - `all`: enable all debugging printout

- `--print-relocations`

  Print relocations when printing functions/objects

- `--print-reordered-data`

  Print section contents after reordering

- `--print-retpoline-insertion`

  Print functions after retpoline insertion pass

- `--print-sdt`

  Print all SDT markers

- `--print-sections`

  Print all registered sections

- `--print-unknown`

  Print names of functions with unknown control flow

- `--time-build`

  Print time spent constructing binary functions

- `--time-rewrite`

  Print time spent in rewriting passes

- `--print-after-branch-fixup`

  Print function after fixing local branches

- `--print-after-jt-footprint-reduction`

  Print function after jt-footprint-reduction pass

- `--print-after-lowering`

  Print function after instruction lowering

- `--print-cache-metrics`

  Calculate and print various metrics for instruction cache

- `--print-clusters`

  Print clusters

- `--print-estimate-edge-counts`

  Print function after edge counts are set for no-LBR profile

- `--print-finalized`

  Print function after CFG is finalized

- `--print-fix-relaxations`

  Print functions after fix relaxations pass

- `--print-fix-riscv-calls`

  Print functions after fix RISCV calls pass

- `--print-fop`

  Print functions after frame optimizer pass

- `--print-function-statistics=<uint>`

  Print statistics about basic block ordering

- `--print-icf`

  Print functions after ICF optimization

- `--print-icp`

  Print functions after indirect call promotion

- `--print-inline`

  Print functions after inlining optimization

- `--print-large-functions`

  Print functions that could not be overwritten due to excessive size

- `--print-longjmp`

  Print functions after longjmp pass

- `--print-optimize-bodyless`

  Print functions after bodyless optimization

- `--print-output-address-range`

  Print output address range for each basic block in the function
  when BinaryFunction::print is called

- `--print-peepholes`

  Print functions after peephole optimization

- `--print-plt`

  Print functions after PLT optimization

- `--print-regreassign`

  Print functions after regreassign pass

- `--print-reordered`

  Print functions after layout optimization

- `--print-reordered-functions`

  Print functions after clustering

- `--print-sctc`

  Print functions after conditional tail call simplification

- `--print-simplify-rodata-loads`

  Print functions after simplification of RO data loads

- `--print-sorted-by=<value>`

  Print functions sorted by order of dyno stats
  - `executed-forward-branches`: executed forward branches
  - `taken-forward-branches`: taken forward branches
  - `executed-backward-branches`: executed backward branches
  - `taken-backward-branches`: taken backward branches
  - `executed-unconditional-branches`: executed unconditional branches
  - `all-function-calls`: all function calls
  - `indirect-calls`: indirect calls
  - `PLT-calls`: PLT calls
  - `executed-instructions`: executed instructions
  - `executed-load-instructions`: executed load instructions
  - `executed-store-instructions`: executed store instructions
  - `taken-jump-table-branches`: taken jump table branches
  - `taken-unknown-indirect-branches`: taken unknown indirect branches
  - `total-branches`: total branches
  - `taken-branches`: taken branches
  - `non-taken-conditional-branches`: non-taken conditional branches
  - `taken-conditional-branches`: taken conditional branches
  - `all-conditional-branches`: all conditional branches
  - `linker-inserted-veneer-calls`: linker-inserted veneer calls
  - `all`: sorted by all names

- `--print-sorted-by-order=<value>`

  Use ascending or descending order when printing functions ordered by dyno stats

- `--print-split`

  Print functions after code splitting

- `--print-stoke`

  Print functions after stoke analysis

- `--print-uce`

  Print functions after unreachable code elimination

- `--print-veneer-elimination`

  Print functions after veneer elimination pass

- `--time-opts`

  Print time spent in each optimization

- `--print-all-options`

  Print all option values after command line parsing

- `--print-options`

  Print non-default options after command line parsing
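
The printing options are usually narrowed with `--print-only` so that output
stays readable. The sketch below prints the CFG of a single function together
with its debug info. The function name `func1` and the file names are
hypothetical.

```shell
#!/bin/sh
# Hypothetical function name, for illustration only.
func=func1

# Restrict printing to one function and show it after CFG construction.
cmd="llvm-bolt ./app -o ./app.bolt --print-only=$func --print-cfg --print-debug-info"

# Dry run: print the command rather than executing it.
echo "$cmd"
```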
1247