# BOLT - a post-link optimizer developed to speed up large applications

## SYNOPSIS

`llvm-bolt <executable> [-o outputfile] <executable>.bolt [-data=perf.fdata] [options]`

## OPTIONS

### Generic options:

- `-h`

  Alias for --help

- `--help`

  Display available options (--help-hidden for more)

- `--help-hidden`

  Display all available options

- `--help-list`

  Display list of available options (--help-list-hidden for more)

- `--help-list-hidden`

  Display list of all available options

- `--version`

  Display the version of this program

### Output options:

- `--bolt-info`

  Write bolt info section in the output binary

- `-o <string>`

  output file

- `-w <string>`

  Save recorded profile to a file

### BOLT generic options:

- `--align-text=<uint>`

  Alignment of .text section

- `--allow-stripped`

  Allow processing of stripped binaries

- `--alt-inst-feature-size=<uint>`

  Size of feature field in .altinstructions

- `--alt-inst-has-padlen`

  Specify that .altinstructions has padlen field

- `--asm-dump[=<dump folder>]`

  Dump function into assembly

- `-b`

  Alias for -data

- `--bolt-id=<string>`

  Add any string to tag this execution in the output binary via bolt info section

- `--break-funcs=<func1,func2,func3,...>`

  List of functions to core dump on (debugging)

- `--check-encoding`

  Perform verification of LLVM instruction encoding/decoding. Every instruction
  in the input is decoded and re-encoded. If the resulting bytes do not match
  the input, a warning message is printed.

- `--comp-dir-override=<string>`

  Overrides DW_AT_comp_dir, and provides an alternative base location, which is
  used with DW_AT_dwo_name to construct a path to *.dwo files.

- `--create-debug-names-section`

  Creates .debug_names section, if the input binary doesn't have it already, for
  DWARF5 CU/TUs.

- `--cu-processing-batch-size=<uint>`

  Specifies the size of batches for processing CUs. A higher number gives better
  performance but uses more memory. Default value is 1.

- `--data=<string>`

  data file

- `--data2=<string>`

  data file

- `--debug-skeleton-cu`

  Prints out offsets for abbrev and debug_info of Skeleton CUs that get patched.

- `--debug-thread-count=<uint>`

  Specifies the number of threads to be used when processing DWO debug information.

- `--dot-tooltip-code`

  Add basic block instructions as tool tips on nodes

- `--dump-alt-instructions`

  Dump Linux alternative instructions info

- `--dump-cg=<string>`

  Dump callgraph to the given file

- `--dump-data`

  Dump parsed bolt data for debugging

- `--dump-dot-all`

  Dump function CFGs to graphviz format after each stage; enable '-print-loops'
  for color-coded blocks

- `--dump-linux-exceptions`

  Dump Linux kernel exception table

- `--dump-orc`

  Dump raw ORC unwind information (sorted)

- `--dump-para-sites`

  Dump Linux kernel paravirtual patch sites

- `--dump-pci-fixups`

  Dump Linux kernel PCI fixup table

- `--dump-smp-locks`

  Dump Linux kernel SMP locks

- `--dump-static-calls`

  Dump Linux kernel static calls

- `--dump-static-keys`

  Dump Linux kernel static keys jump table

- `--dwarf-output-path=<string>`

  Path to where .dwo files or dwp file will be written out to.

- `--dwp=<string>`

  Path and name to DWP file.

- `--dyno-stats`

  Print execution info based on profile

- `--dyno-stats-all`

  Print dyno stats after each stage

- `--dyno-stats-scale=<uint>`

  Scale to be applied while reporting dyno stats

- `--enable-bat`

  Write BOLT Address Translation tables

- `--force-data-relocations`

  Force relocations to data sections to always be processed

- `--force-patch`

  Force patching of original entry points

- `--funcs=<func1,func2,func3,...>`

  Limit optimizations to functions from the list

- `--funcs-file=<string>`

  File with list of functions to optimize

- `--funcs-file-no-regex=<string>`

  File with list of functions to optimize (non-regex)

- `--funcs-no-regex=<func1,func2,func3,...>`

  Limit optimizations to functions from the list (non-regex)

- `--hot-data`

  Hot data symbols support (relocation mode)

- `--hot-functions-at-end`

  If reorder-functions is used, order functions putting hottest last

- `--hot-text`

  Generate hot text symbols. Apply this option to a precompiled binary that
  manually calls into hugify, such that at runtime hugify call will put hot code
  into 2M pages. This requires relocation.

- `--hot-text-move-sections=<sec1,sec2,sec3,...>`

  List of sections containing functions used for hugifying hot text. BOLT makes
  sure these functions are not placed on the same page as the hot text.
  (default='.stub,.mover').

- `--insert-retpolines`

  Run retpoline insertion pass

- `--keep-aranges`

  Keep or generate .debug_aranges section if .gdb_index is written

- `--keep-tmp`

  Preserve intermediate .o file

- `--lite`

  Skip processing of cold functions

- `--log-file=<string>`

  Redirect journaling to a file instead of stdout/stderr

- `--long-jump-labels`

  Always use long jumps/nops for Linux kernel static keys

- `--match-profile-with-function-hash`

  Match profile with function hash

- `--max-data-relocations=<uint>`

  Maximum number of data relocations to process

- `--max-funcs=<uint>`

  Maximum number of functions to process

- `--no-huge-pages`

  Use regular size pages for code alignment

- `--no-threads`

  Disable multithreading

- `--pad-funcs=<func1:pad1,func2:pad2,func3:pad3,...>`

  List of functions to pad with amount of bytes

- `--print-mappings`

  Print mappings in the legend, between characters/blocks and text sections
  (default false).

- `--profile-format=<value>`

  Format to dump profile output in aggregation mode, default is fdata
  - `fdata`: offset-based plaintext format
  - `yaml`: dense YAML representation

- `--r11-availability=<value>`

  Determine the availability of r11 before indirect branches
  - `never`: r11 not available
  - `always`: r11 available before calls and jumps
  - `abi`: r11 available before calls but not before jumps

- `--relocs`

  Use relocations in the binary (default=autodetect)

- `--remove-symtab`

  Remove .symtab section

- `--reorder-skip-symbols=<symbol1,symbol2,symbol3,...>`

  List of symbol names that cannot be reordered

- `--reorder-symbols=<symbol1,symbol2,symbol3,...>`

  List of symbol names that can be reordered

- `--retpoline-lfence`

  Determine if lfence instruction should exist in the retpoline

- `--skip-funcs=<func1,func2,func3,...>`

  List of functions to skip

- `--skip-funcs-file=<string>`

  File with list of functions to skip

- `--strict`

  Trust the input to be from a well-formed source

- `--tasks-per-thread=<uint>`

  Number of tasks to be created per thread

- `--terminal-trap`

  Assume that execution stops at trap instruction

- `--thread-count=<uint>`

  Number of threads

- `--top-called-limit=<uint>`

  Maximum number of functions to print in top called functions section

- `--trap-avx512`

  In relocation mode trap upon entry to any function that uses AVX-512
  instructions

- `--trap-old-code`

  Insert traps in old function bodies (relocation mode)

- `--update-debug-sections`

  Update DWARF debug sections of the executable

- `--use-gnu-stack`

  Use GNU_STACK program header for new segment (workaround for issues with
  strip/objcopy)

- `--use-old-text`

  Re-use space in old .text if possible (relocation mode)

- `-v <uint>`

  Set verbosity level for diagnostic output

- `--write-dwp`

  Output a single dwarf package file (dwp) instead of multiple non-relocatable
  dwarf object files (dwo).

### BOLT optimization options:

- `--align-blocks`

  Align basic blocks

- `--align-blocks-min-size=<uint>`

  Minimal size of the basic block that should be aligned

- `--align-blocks-threshold=<uint>`

  Align only blocks with frequency larger than containing function execution
  frequency specified in percent. E.g. 1000 means aligning blocks that are 10
  times more frequently executed than the containing function.

- `--align-functions=<uint>`

  Align functions at a given value (relocation mode)

- `--align-functions-max-bytes=<uint>`

  Maximum number of bytes to use to align functions

- `--assume-abi`

  Assume the ABI is never violated

- `--block-alignment=<uint>`

  Boundary to use for alignment of basic blocks

- `--bolt-seed=<uint>`

  Seed for randomization

- `--cg-from-perf-data`

  Use perf data directly when constructing the call graph for stale functions

- `--cg-ignore-recursive-calls`

  Ignore recursive calls when constructing the call graph

- `--cg-use-split-hot-size`

  Use hot/cold data on basic blocks to determine hot sizes for call graph
  functions

- `--cold-threshold=<uint>`

  Tenths of percents of main entry frequency to use as a threshold when
  evaluating whether a basic block is cold (0 means it is only considered cold
  if the block has zero samples). Default: 0

- `--elim-link-veneers`

  Run veneer elimination pass

- `--eliminate-unreachable`

  Eliminate unreachable code

- `--equalize-bb-counts`

  Use same count for BBs that should have equivalent count (used in non-LBR and
  shrink wrapping)

- `--execution-count-threshold=<uint>`

  Perform profiling accuracy-sensitive optimizations only if function execution
  count >= the threshold (default: 0)

- `--fix-block-counts`

  Adjust block counts based on outgoing branch counts

- `--fix-func-counts`

  Adjust function counts based on basic blocks execution count

- `--force-inline=<func1,func2,func3,...>`

  List of functions to always consider for inlining

- `--frame-opt=<value>`

  Optimize stack frame accesses
  - `none`: do not perform frame optimization
  - `hot`: perform FOP on hot functions
  - `all`: perform FOP on all functions

- `--frame-opt-rm-stores`

  Apply additional analysis to remove stores (experimental)

- `--function-order=<string>`

  File containing an ordered list of functions to use for function reordering

- `--generate-function-order=<string>`

  File to dump the ordered list of functions to use for function reordering

- `--generate-link-sections=<string>`

  Generate a list of function sections in a format suitable for inclusion in a
  linker script

- `--group-stubs`

  Share stubs across functions

- `--hugify`

  Automatically put hot code on 2MB page(s) (hugify) at runtime. No manual call
  to hugify is needed in the binary (which is what --hot-text relies on).

- `--icf=<value>`

  Fold functions with identical code
  - `all`: Enable identical code folding
  - `none`: Disable identical code folding (default)
  - `safe`: Enable safe identical code folding

- `--icp`

  Alias for --indirect-call-promotion

- `--icp-calls-remaining-percent-threshold=<uint>`

  The percentage threshold against remaining unpromoted indirect call count for
  the promotion for calls

- `--icp-calls-topn`

  Alias for --indirect-call-promotion-calls-topn

- `--icp-calls-total-percent-threshold=<uint>`

  The percentage threshold against total count for the promotion for calls

- `--icp-eliminate-loads`

  Enable load elimination using memory profiling data when performing ICP

- `--icp-funcs=<func1,func2,func3,...>`

  List of functions to enable ICP for

- `--icp-inline`

  Only promote call targets eligible for inlining

- `--icp-jt-remaining-percent-threshold=<uint>`

  The percentage threshold against remaining unpromoted indirect call count for
  the promotion for jump tables

- `--icp-jt-targets`

  Alias for --icp-jump-tables-targets

- `--icp-jt-topn`

  Alias for --indirect-call-promotion-jump-tables-topn

- `--icp-jt-total-percent-threshold=<uint>`

  The percentage threshold against total count for the promotion for jump tables

- `--icp-jump-tables-targets`

  For jump tables, optimize indirect jmp targets instead of indices

- `--icp-mp-threshold`

  Alias for --indirect-call-promotion-mispredict-threshold

- `--icp-old-code-sequence`

  Use old code sequence for promoted calls

- `--icp-top-callsites=<uint>`

  Optimize hottest calls until at least this percentage of all indirect calls
  frequency is covered. 0 = all callsites

- `--icp-topn`

  Alias for --indirect-call-promotion-topn

- `--icp-use-mp`

  Alias for --indirect-call-promotion-use-mispredicts

- `--indirect-call-promotion=<value>`

  Indirect call promotion
  - `none`: do not perform indirect call promotion
  - `calls`: perform ICP on indirect calls
  - `jump-tables`: perform ICP on jump tables
  - `all`: perform ICP on calls and jump tables

- `--indirect-call-promotion-calls-topn=<uint>`

  Limit number of targets to consider when doing indirect call promotion on
  calls. 0 = no limit

- `--indirect-call-promotion-jump-tables-topn=<uint>`

  Limit number of targets to consider when doing indirect call promotion on jump
  tables. 0 = no limit

- `--indirect-call-promotion-topn=<uint>`

  Limit number of targets to consider when doing indirect call promotion. 0 = no
  limit

- `--indirect-call-promotion-use-mispredicts`

  Use misprediction frequency for determining whether or not ICP should be
  applied at a callsite. The -indirect-call-promotion-mispredict-threshold
  value will be used by this heuristic

- `--infer-fall-throughs`

  Infer execution count for fall-through blocks

- `--infer-stale-profile`

  Infer counts from stale profile data.

- `--inline-all`

  Inline all functions

- `--inline-ap`

  Adjust function profile after inlining

- `--inline-limit=<uint>`

  Maximum number of call sites to inline

- `--inline-max-iters=<uint>`

  Maximum number of inline iterations

- `--inline-memcpy`

  Inline memcpy using 'rep movsb' instruction (X86-only)

- `--inline-small-functions`

  Inline functions if increase in size is less than defined by
  -inline-small-functions-bytes

- `--inline-small-functions-bytes=<uint>`

  Max number of bytes for the function to be considered small for inlining
  purposes

- `--instrument`

  Instrument code to generate accurate profile data

- `--iterative-guess`

  In non-LBR mode, guess edge counts using iterative technique

- `--jt-footprint-optimize-for-icache`

  With jt-footprint-reduction, only process PIC jumptables and turn off other
  transformations that increase code size

- `--jt-footprint-reduction`

  Make jump tables size smaller at the cost of using more instructions at jump
  sites

- `--jump-tables=<value>`

  Jump tables support (default=basic)
  - `none`: do not optimize functions with jump tables
  - `basic`: optimize functions with jump tables
  - `move`: move jump tables to a separate section
  - `split`: split jump tables section into hot and cold based on function
    execution frequency
  - `aggressive`: aggressively split jump tables section based on usage of the
    tables

- `--keep-nops`

  Keep no-op instructions. By default they are removed.

- `--lite-threshold-count=<uint>`

  Similar to '-lite-threshold-pct' but specify threshold using absolute function
  call count. I.e. limit processing to functions executed at least the specified
  number of times.

- `--lite-threshold-pct=<uint>`

  Threshold (in percent) for selecting functions to process in lite mode. Higher
  threshold means fewer functions to process. E.g. a threshold of 90 means only
  the top 10 percent of functions with profile will be processed.

- `--match-with-call-graph`

  Match functions with call graph

- `--memcpy1-spec=<func1,func2:cs1:cs2,func3:cs1,...>`

  List of functions with call sites for which to specialize memcpy() for size 1

- `--min-branch-clusters`

  Use a modified clustering algorithm geared towards minimizing branches

- `--name-similarity-function-matching-threshold=<uint>`

  Match functions using namespace and edit distance.

- `--no-inline`

  Disable all inlining (overrides other inlining options)

- `--no-scan`

  Do not scan cold functions for external references (may result in slower binary)

- `--peepholes=<value>`

  Enable peephole optimizations
  - `none`: disable peepholes
  - `double-jumps`: remove double jumps when able
  - `tailcall-traps`: insert tail call traps
  - `useless-branches`: remove useless conditional branches
  - `all`: enable all peephole optimizations

- `--plt=<value>`

  Optimize PLT calls (requires linking with -znow)
  - `none`: do not optimize PLT calls
  - `hot`: optimize executed (hot) PLT calls
  - `all`: optimize all PLT calls

- `--preserve-blocks-alignment`

  Try to preserve basic block alignment

- `--profile-ignore-hash`

  Ignore hash while reading function profile

- `--profile-use-dfs`

  Use DFS order for YAML profile

- `--reg-reassign`

  Reassign registers so as to avoid using REX prefixes in hot code

- `--reorder-blocks=<value>`

  Change layout of basic blocks in a function
  - `none`: do not reorder basic blocks
  - `reverse`: layout blocks in reverse order
  - `normal`: perform optimal layout based on profile
  - `branch-predictor`: perform optimal layout prioritizing branch predictions
  - `cache`: perform optimal layout prioritizing I-cache behavior
  - `cache+`: perform layout optimizing I-cache behavior
  - `ext-tsp`: perform layout optimizing I-cache behavior
  - `cluster-shuffle`: perform random layout of clusters

- `--reorder-data=<section1,section2,section3,...>`

  List of sections to reorder

- `--reorder-data-algo=<value>`

  Algorithm used to reorder data sections
  - `count`: sort hot data by read counts
  - `funcs`: sort hot data by hot function usage and count

- `--reorder-data-inplace`

  Reorder data sections in place

- `--reorder-data-max-bytes=<uint>`

  Maximum number of bytes to reorder

- `--reorder-data-max-symbols=<uint>`

  Maximum number of symbols to reorder

- `--reorder-functions=<value>`

  Reorder and cluster functions (works only with relocations)
  - `none`: do not reorder functions
  - `exec-count`: order by execution count
  - `hfsort`: use hfsort algorithm
  - `hfsort+`: use cache-directed sort
  - `cdsort`: use cache-directed sort
  - `pettis-hansen`: use Pettis-Hansen algorithm
  - `random`: reorder functions randomly
  - `user`: use function order specified by -function-order

- `--reorder-functions-use-hot-size`

  Use a function's hot size when doing clustering

- `--report-bad-layout=<uint>`

  Print top <uint> functions with suboptimal code layout on input

- `--report-stale`

  Print the list of functions with stale profile

- `--runtime-hugify-lib=<string>`

  Specify file name of the runtime hugify library

- `--runtime-instrumentation-lib=<string>`

  Specify file name of the runtime instrumentation library

- `--sctc-mode=<value>`

  Mode for simplify conditional tail calls
  - `always`: always perform sctc
  - `preserve`: only perform sctc when branch direction is preserved
  - `heuristic`: use branch prediction data to control sctc

- `--sequential-disassembly`

  Performs disassembly sequentially

- `--shrink-wrapping-threshold=<uint>`

  Percentage of prologue execution count to use as threshold when evaluating
  whether a block is cold enough to be profitable to move eligible spills there

- `--simplify-conditional-tail-calls`

  Simplify conditional tail calls by removing unnecessary jumps

- `--simplify-rodata-loads`

  Simplify loads from read-only sections by replacing the memory operand with
  the constant found in the corresponding section

- `--split-align-threshold=<uint>`

  When deciding to split a function, apply this alignment while doing the size
  comparison (see -split-threshold). Default value: 2.

- `--split-all-cold`

  Outline as many cold basic blocks as possible

- `--split-eh`

  Split C++ exception handling code

- `--split-functions`

  Split functions into fragments

- `--split-strategy=<value>`

  Strategy used to partition blocks into fragments
  - `profile2`: split each function into a hot and cold fragment using profiling
    information
  - `cdsplit`: split each function into a hot, warm, and cold fragment using
    profiling information
  - `random2`: split each function into a hot and cold fragment at a randomly
    chosen split point (ignoring any available profiling information)
  - `randomN`: split each function into N fragments at randomly chosen split
    points (ignoring any available profiling information)
  - `all`: split all basic blocks of each function into fragments such that each
    fragment contains exactly a single basic block

- `--split-threshold=<uint>`

  Split function only if its main size is reduced by more than given amount of
  bytes. Default value: 0, i.e. split iff the size is reduced. Note that on some
  architectures the size can increase after splitting.

- `--stale-matching-max-func-size=<uint>`

  The maximum size of a function to consider for inference.

- `--stale-matching-min-matched-block=<uint>`

  Percentage threshold of matched basic blocks at which stale profile inference
  is executed.

- `--stale-threshold=<uint>`

  Maximum percentage of stale functions to tolerate (default: 100)

- `--stoke`

  Turn on the stoke analysis

- `--strip-rep-ret`

  Strip 'repz' prefix from 'repz retq' sequence (on by default)

- `--tail-duplication=<value>`

  Duplicate unconditional branches that cross a cache line
  - `none`: do not apply
  - `aggressive`: aggressive strategy
  - `moderate`: moderate strategy
  - `cache`: cache-aware duplication strategy

- `--tsp-threshold=<uint>`

  Maximum number of hot basic blocks in a function for which to use a precise
  TSP solution while re-ordering basic blocks

- `--use-aggr-reg-reassign`

  Use register liveness analysis to try to find more opportunities for
  -reg-reassign optimization

- `--use-compact-aligner`

  Use compact approach for aligning functions

- `--use-edge-counts`

  Use edge count data when doing clustering

- `--verify-cfg`

  Verify the CFG after every pass

- `--x86-align-branch-boundary-hot-only`

  Only apply branch boundary alignment in hot code

- `--x86-strip-redundant-address-size`

  Remove redundant Address-Size override prefix

### BOLT instrumentation options:

`llvm-bolt <executable> -instrument [-o outputfile] <instrumented-executable>`

- `--conservative-instrumentation`

  Disable instrumentation optimizations that sacrifice profile accuracy (for
  debugging, default: false)

- `--instrument-calls`

  Record profile for inter-function control flow activity (default: true)

- `--instrument-hot-only`

  Only insert instrumentation on hot functions (needs profile, default: false)

- `--instrumentation-binpath=<string>`

  Path to instrumented binary in case /proc/self/map_files is not accessible
  due to access restriction issues

- `--instrumentation-file=<string>`

  File name where instrumented profile will be saved (default: /tmp/prof.fdata)

- `--instrumentation-file-append-pid`

  Append PID to saved profile file name (default: false)

- `--instrumentation-no-counters-clear`

  Don't clear counters across dumps (use with instrumentation-sleep-time option)

- `--instrumentation-sleep-time=<uint>`

  Interval between profile writes (default: 0 = write only at program end).
  This is useful for service workloads when you want to dump profile every X
  minutes or if you are killing the program and the profile is not being dumped
  at the end.

- `--instrumentation-wait-forks`

  Wait until all forks of the instrumented process finish (use with
  instrumentation-sleep-time option)

### BOLT printing options:

- `--print-aliases`

  Print aliases when printing objects

- `--print-all`

  Print functions after each stage

- `--print-cfg`

  Print functions after CFG construction

- `--print-debug-info`

  Print debug info when printing functions

- `--print-disasm`

  Print function after disassembly

- `--print-dyno-opcode-stats=<uint>`

  Print per instruction opcode dyno stats and the function names:BB offsets of
  the nth highest execution counts

- `--print-dyno-stats-only`

  While printing functions output dyno-stats and skip instructions

- `--print-exceptions`

  Print exception handling data

- `--print-globals`

  Print global symbols after disassembly

- `--print-jump-tables`

  Print jump tables

- `--print-loops`

  Print loop related information

- `--print-mem-data`

  Print memory data annotations when printing functions

- `--print-normalized`

  Print functions after CFG is normalized

- `--print-only=<func1,func2,func3,...>`

  List of functions to print

- `--print-orc`

  Print ORC unwind information for instructions

- `--print-profile`

  Print functions after attaching profile

- `--print-profile-stats`

  Print profile quality/bias analysis

- `--print-pseudo-probes=<value>`

  Print pseudo probe info
  - `decode`: decode probes section from binary
  - `address_conversion`: update address2ProbesMap with output block address
  - `encoded_probes`: display the encoded probes in binary section
  - `all`: enable all debugging printout

- `--print-relocations`

  Print relocations when printing functions/objects

- `--print-reordered-data`

  Print section contents after reordering

- `--print-retpoline-insertion`

  Print functions after retpoline insertion pass

- `--print-sdt`

  Print all SDT markers

- `--print-sections`

  Print all registered sections

- `--print-unknown`

  Print names of functions with unknown control flow

- `--time-build`

  Print time spent constructing binary functions

- `--time-rewrite`

  Print time spent in rewriting passes

- `--print-after-branch-fixup`

  Print function after fixing local branches

- `--print-after-jt-footprint-reduction`

  Print function after jt-footprint-reduction pass

- `--print-after-lowering`

  Print function after instruction lowering

- `--print-cache-metrics`

  Calculate and print various metrics for instruction cache

- `--print-clusters`

  Print clusters

- `--print-estimate-edge-counts`

  Print function after edge counts are set for no-LBR profile

- `--print-finalized`

  Print function after CFG is finalized

- `--print-fix-relaxations`

  Print functions after fix relaxations pass

- `--print-fix-riscv-calls`

  Print functions after fix RISCV calls pass

- `--print-fop`

  Print functions after frame optimizer pass

- `--print-function-statistics=<uint>`

  Print statistics about basic block ordering

- `--print-icf`

  Print functions after ICF optimization

- `--print-icp`

  Print functions after indirect call promotion

- `--print-inline`

  Print functions after inlining optimization

- `--print-large-functions`

  Print functions that could not be overwritten due to excessive size

- `--print-longjmp`

  Print functions after longjmp pass

- `--print-optimize-bodyless`

  Print functions after bodyless optimization

- `--print-output-address-range`

  Print output address range for each basic block in the function
  when BinaryFunction::print is called

- `--print-peepholes`

  Print functions after peephole optimization

- `--print-plt`

  Print functions after PLT optimization

- `--print-regreassign`

  Print functions after regreassign pass

- `--print-reordered`

  Print functions after layout optimization

- `--print-reordered-functions`

  Print functions after clustering

- `--print-sctc`

  Print functions after conditional tail call simplification

- `--print-simplify-rodata-loads`

  Print functions after simplification of RO data loads

- `--print-sorted-by=<value>`

  Print functions sorted by order of dyno stats
  - `executed-forward-branches`: executed forward branches
  - `taken-forward-branches`: taken forward branches
  - `executed-backward-branches`: executed backward branches
  - `taken-backward-branches`: taken backward branches
  - `executed-unconditional-branches`: executed unconditional branches
  - `all-function-calls`: all function calls
  - `indirect-calls`: indirect calls
  - `PLT-calls`: PLT calls
  - `executed-instructions`: executed instructions
  - `executed-load-instructions`: executed load instructions
  - `executed-store-instructions`: executed store instructions
  - `taken-jump-table-branches`: taken jump table branches
  - `taken-unknown-indirect-branches`: taken unknown indirect branches
  - `total-branches`: total branches
  - `taken-branches`: taken branches
  - `non-taken-conditional-branches`: non-taken conditional branches
  - `taken-conditional-branches`: taken conditional branches
  - `all-conditional-branches`: all conditional branches
  - `linker-inserted-veneer-calls`: linker-inserted veneer calls
  - `all`: sorted by all names

- `--print-sorted-by-order=<value>`

  Use ascending or descending order when printing functions ordered by dyno stats

- `--print-split`

  Print functions after code splitting

- `--print-stoke`

  Print functions after stoke analysis

- `--print-uce`

  Print functions after unreachable code elimination

- `--print-veneer-elimination`

  Print functions after veneer elimination pass

- `--time-opts`

  Print time spent in each optimization

- `--print-all-options`

  Print all option values after command line parsing

- `--print-options`

  Print non-default options after command line parsing
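
## EXAMPLES

The instrumentation synopsis and the main synopsis can be combined into an instrument-then-optimize workflow. The script below is a minimal dry-run sketch of that idea: it only prints the two `llvm-bolt` command lines instead of executing them. The file names (`app`, `app.inst`, `app.bolt`, `app.fdata`) are placeholders, and the particular mix of optimization options is an illustrative assumption, not a recommendation made by this reference.

```shell
#!/bin/sh
# Dry-run sketch: prints llvm-bolt command lines, does not execute them.
# APP and PROFILE are placeholder names chosen for this example.
set -eu

APP=app
PROFILE=app.fdata

# Step 1: produce an instrumented binary that saves its profile to $PROFILE
# (overrides the documented default of /tmp/prof.fdata).
instrument_cmd() {
  echo llvm-bolt "$APP" -instrument -o "$APP.inst" \
    "--instrumentation-file=$PROFILE"
}

# Step 2: optimize the original binary with the collected profile.
# The layout and splitting options here are one possible selection.
optimize_cmd() {
  echo llvm-bolt "$APP" -o "$APP.bolt" "--data=$PROFILE" \
    --reorder-blocks=ext-tsp --reorder-functions=hfsort \
    --split-functions --split-all-cold --dyno-stats
}

instrument_cmd
optimize_cmd
```

To run the commands for real, remove the `echo` from each function and execute the instrumented binary on a representative workload between the two steps, so that the profile file exists before the optimization step reads it.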