#
46bc197d |
| 25-Oct-2021 |
Marius Wachtler <undingen@gmail.com> |
[PR] bolt_rt: getBinaryPath() increase max file path
Summary: Increase the hard limit from 256 to 4096. This fixes the 'Assertion failed: failed to open binary path' error I'm seeing.
(cherry picked from FBD31911946)
|
#
cb8d701b |
| 16-Oct-2021 |
Vladislav Khmelevsky <vladislav.khmelevskyi@huawei.com> |
[PR] Disable instrumentation and hugify build for aarch64
Summary: This patch temporarily disables the instrumentation and hugify builds on non-x86 platforms so that the llvm-bolt tool can be built on aarch64.
Vladislav Khmelevsky, Advanced Software Technology Lab, Huawei
(cherry picked from FBD31738306)
|
#
dcdd37fd |
| 15-Oct-2021 |
Vladislav Khmelevsky <vladislav.khmelevskyi@huawei.com> |
[PR] Instrumentation: Sync file on dump
Summary: Sync the file with the storage device on data dump to stabilize instrumentation testing.
Vladislav Khmelevsky, Advanced Software Technology Lab, Huawei
(cherry picked from FBD31738021)
|
#
9aa134dc |
| 07-Aug-2021 |
Vasily Leonenko <vasily.leonenko@huawei.com> |
[PR] Instrumentation: use TryLock for SimpleHashTable getter
Summary: This commit introduces TryLock usage in the SimpleHashTable getter to avoid deadlock and to reduce syscall usage, which causes significant runtime overhead. The old behavior remains available under the -conservative-instrumentation option passed to the instrumentation library. This commit also includes a corresponding test case: instrumentation of an executable that performs indirect calls from both common code and a signal handler.
Note: if TryLock fails to acquire the lock, the indirect call will not be accounted for in the resulting profile.
Vasily Leonenko, Advanced Software Technology Lab, Huawei
(cherry picked from FBD30821949)
|
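The try-lock behaviour described in commit 9aa134dc can be illustrated with a minimal sketch. The names `TrySpinLock` and `GuardedCounter` are hypothetical and are not taken from the instrumentation library. The sketch only shows the general idea: a caller that cannot take the lock immediately (for example a signal handler that interrupted the lock holder) gives up instead of blocking, and the event is not counted.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical non-blocking lock; not BOLT's actual implementation.
class TrySpinLock {
  std::atomic_flag Flag = ATOMIC_FLAG_INIT;

public:
  // Returns true if the lock was acquired, false if it is already held.
  bool tryLock() { return !Flag.test_and_set(std::memory_order_acquire); }
  void unlock() { Flag.clear(std::memory_order_release); }
};

// Hypothetical stand-in for a profile counter protected by the lock.
struct GuardedCounter {
  TrySpinLock Lock;
  uint64_t Count = 0;

  // Records one event. Returns false, leaving Count unchanged, when the
  // lock is busy, so the caller never waits and cannot deadlock.
  bool record() {
    if (!Lock.tryLock())
      return false; // this event is lost from the profile
    ++Count;
    Lock.unlock();
    return true;
  }
};
```

The trade-off matches the note in the commit message: dropping an event under contention makes the profile slightly less exact, but a handler that re-enters the getter can no longer wait on a lock held by the code it interrupted.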
#
af58da4e |
| 21-Jul-2021 |
Vladislav Khmelevsky <vladislav.khmelevskyi@huawei.com> |
[PR] Instrumentation: Avoid generating GOT table in instrumentation library
Summary: To avoid RELATIVE relocations, avoid using the GOT by giving all symbols in the library hidden visibility.
Vladislav Khmelevsky, Advanced Software Technology Lab, Huawei
(cherry picked from FBD30092712)
|
#
553f28e9 |
| 30-Jul-2021 |
Vladislav Khmelevsky <Vladislav.Khmelevskyi@huawei.com> |
[PR] Instrumentation: Fix start and fini trampoline pointers
Summary: The trampolines are no longer pointers to the functions. For proper name resolution by BOLT, use extern "C" for all external symbols in instr.cpp.
Vladislav Khmelevsky, Advanced Software Technology Lab, Huawei
(cherry picked from FBD30092698)
|
#
519cbbaa |
| 30-Jul-2021 |
Vasily Leonenko <vasily.leonenko@huawei.com> |
[PR] Instrumentation: Introduce instrumentation-binpath argument
Summary: This commit introduces the -instrumentation-binpath argument, used to point to the instrumented binary at runtime in case the /proc/self/map_files path is not accessible due to access restrictions.
Vasily Leonenko, Advanced Software Technology Lab, Huawei
(cherry picked from FBD30092681)
|
#
361f3b55 |
| 23-Jun-2021 |
Vladislav Khmelevsky <Vladislav.Khmelevskyi@huawei.com> |
[PR] Instrumentation: Fix runtime handlers for PIE files
Summary: This commit fixes the runtime instrumentation handlers for PIE binaries.
Vladislav Khmelevsky, Advanced Software Technology Lab, Huawei
(cherry picked from FBD30092522)
|
#
2ffd6e2b |
| 18-Jan-2021 |
Elvina Yakubova <elvina.yakubova@huawei.com> |
[PR] Instrumentation: Add support for opening libs based on links /proc/self/map_files
Summary: This commit adds support for opening libs based on the links in /proc/self/map_files. To do this, we take the current virtual address and search the directory for the entry whose address range contains it. We then obtain the full path to the binary with the readlink function. Reading directly through the links in /proc/self/map_files is not possible because of insufficient permissions.
Elvina Yakubova, Advanced Software Technology Lab, Huawei
(cherry picked from FBD30092422)
|
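The lookup described in commit 2ffd6e2b relies on entries in /proc/self/map_files being named after their address range, in the form `<hex start>-<hex end>`. The following is a minimal sketch of the matching step only. The function name `entryContains` is hypothetical, and the sketch uses the C standard library for brevity, which the runtime library itself may not be able to do.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: returns true if Name has the form
// "<hex start>-<hex end>" and Addr lies in the half-open range [start, end).
bool entryContains(const char *Name, uint64_t Addr) {
  assert(Name != nullptr && "entry name must not be null");
  char *End = nullptr;
  // Parse the start address; fail if no digits were read or no '-' follows.
  uint64_t Start = std::strtoull(Name, &End, 16);
  if (End == Name || *End != '-')
    return false;
  // Parse the end address; it must consume the rest of the name.
  const char *Second = End + 1;
  uint64_t Stop = std::strtoull(Second, &End, 16);
  if (End == Second || *End != '\0')
    return false;
  return Start <= Addr && Addr < Stop;
}
```

A caller would apply this test to each directory entry and then pass the matching entry's path to readlink to obtain the path of the mapped file, as the commit message describes.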
#
ad79d517 |
| 18-Jun-2021 |
Vasily Leonenko <vasily.leonenko@huawei.com> |
[PR] Instrumentation: Generate and use _start and _fini trampolines
Summary: This commit implements a new method for hooking the _start and _fini functions that allows the use of relative jumps, for future PIE and .so library support. Instead of using the absolute addresses of the _start and _fini functions known at the linking stage, we use dynamically created trampoline functions and the corresponding symbols in the instrumentation runtime library.
Because we would like to use instrumentation for dynamically loaded binaries (PIE and .so), we need to compile the instrumentation library with the "-fPIC" flag to support relative address resolution for functions and data.
For shared libraries, we need to handle initialization of the instrumentation library by using the DT_INIT section entry point.
This commit also adds detection of whether the binary is an executable or a shared library, based on the existence of the PT_INTERP header. For a shared library, we save the address of the library's real init function, for later use when creating the instrumentation library's init trampoline function, and update DT_INIT to point to the instrumentation library's init function.
Functions called from the init/fini functions should be called with forced stack alignment to avoid issues with instructions that rely on it, e.g. optimized string operations.
Vasily Leonenko, Advanced Software Technology Lab, Huawei
(cherry picked from FBD30092316)
|
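The executable-versus-shared-library check mentioned in commit ad79d517 can be sketched as a scan over program header types. The function name `hasInterpreterSegment` and the flat array of types are assumptions made for illustration, since BOLT reads the headers through its own ELF handling. Only the constant is taken from the ELF specification, where PT_INTERP has the value 3.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// PT_INTERP as defined by the ELF specification.
constexpr uint32_t kPtInterp = 3;

// Hypothetical helper: returns true if any of the Num program header
// types is PT_INTERP, which the commit treats as the mark of an
// executable rather than a shared library.
bool hasInterpreterSegment(const uint32_t *Types, std::size_t Num) {
  assert((Types != nullptr || Num == 0) && "null array must be empty");
  for (std::size_t I = 0; I < Num; ++I)
    if (Types[I] == kPtInterp)
      return true;
  return false;
}
```

For example, a header list with the types PT_PHDR, PT_INTERP, PT_LOAD, PT_LOAD, PT_DYNAMIC (6, 3, 1, 1, 2) would be classified as an executable, while one with only PT_LOAD and PT_DYNAMIC entries would be classified as a shared library.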
#
c7306cc2 |
| 08-Apr-2021 |
Amir Ayupov <aaupov@fb.com> |
Rebase: [BOLT][NFC] Expand auto types
Summary: Expanded auto types across BOLT semi-automatically with the aid of clangd LSP
(cherry picked from FBD33289309)
|
#
76d346ca |
| 10-Mar-2021 |
Vladislav Khmelevsky <Vladislav.Khmelevskyi@huawei.com> |
[BOLT][PR] Instrumentation: Introduce -no-counters-clear and -wait-forks options
Summary: This PR introduces 2 new instrumentation options:
1. instrumentation-no-counters-clear: Discussed at https://github.com/facebookincubator/BOLT/issues/121
2. instrumentation-wait-forks: Since the instrumentation counters are mapped as MAP_SHARED, it is useful to be able to wait until all forks of the parent process have died, by tracking the process group.
The last patch is just an emitBinary code refactor.
Vladislav Khmelevsky, Advanced Software Technology Lab, Huawei
Pull Request resolved: https://github.com/facebookincubator/BOLT/pull/125
GitHub Author: Vladislav Khmelevskyi <Vladislav.Khmelevskyi@huawei.com>
(cherry picked from FBD26919011)
|
#
da752c9c |
| 17-Mar-2021 |
Rafael Auler <rafaelauler@fb.com> |
Fix license for a few remaining files
Summary: As titled.
(cherry picked from FBD28112137)
|
#
a0dd5b05 |
| 28-Jan-2021 |
Alexander Shaposhnikov <alexshap@fb.com> |
[BOLT] Add support for dumping profile on MacOS
Summary: Add support for dumping profile on MacOS.
(cherry picked from FBD25751363)
|
#
3b876cc3 |
| 28-Jan-2021 |
Alexander Shaposhnikov <alexshap@fb.com> |
[BOLT] Add support for dumping counters on MacOS
Summary: Add support for dumping counters on MacOS
(cherry picked from FBD25750516)
|
#
faaefff6 |
| 20-Jan-2021 |
Alexander Shaposhnikov <alexshap@fb.com> |
[BOLT] Fix operator new signature
Summary: Use size_t for the first parameter of operator new. https://en.cppreference.com/w/cpp/memory/new/operator_new
(cherry picked from FBD25750921)
|
#
e067f2ad |
| 20-Nov-2020 |
Alexander Shaposhnikov <alexshap@fb.com> |
Inject instrumentation's global dtor on MachO
Summary: This diff is a preparation for dumping the profile generated by BOLT's instrumentation on MachO.
1/ Function "bolt_instr_fini" is placed into the predefined section "__fini"
2/ In the instrumentation pass we create a symbol "bolt_instr_fini" and replace the last global destructor with it.
This is a temporary solution; in the future we need to register bolt_instr_fini in addition to the existing destructors without dropping the last one.
(cherry picked from FBD25071864)
|
#
1cf23e5e |
| 17-Nov-2020 |
Alexander Shaposhnikov <alexshap@fb.com> |
Link the instrumentation runtime on OSX
Summary: Link the instrumentation runtime on OSX.
(cherry picked from FBD24390019)
|
#
bbd9d610 |
| 15-Oct-2020 |
Alexander Shaposhnikov <alexshap@fb.com> |
Add first bits to cross-compile the runtime for OSX
Summary: Add first bits to cross-compile the runtime for OSX.
(cherry picked from FBD24330977)
|
#
c6799a68 |
| 27-Jul-2020 |
Rafael Auler <rafaelauler@fb.com> |
[BOLT] Fix stack alignment for runtime lib
Summary: Right now, the SAVE_ALL sequence executed upon entry of both of our runtime libs (hugify and instrumentation) will cause the stack to not be aligned at a 16B boundary because it saves 15 8-byte regs. Change the code sequence to adjust for that. The compiler may generate code that assumes the stack is aligned by using movaps instructions, which will crash.
(cherry picked from FBD22744307)
|
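The arithmetic behind commit c6799a68 can be checked with a short sketch. It assumes the stack pointer is 16-byte aligned before the registers are saved. The helper names are hypothetical and do not correspond to the actual SAVE_ALL macro.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kRegSize = 8;     // bytes per saved register
constexpr std::size_t kStackAlign = 16; // required stack alignment in bytes

// Extra bytes needed so that saving NumRegs registers moves the stack
// pointer by a multiple of 16 bytes.
constexpr std::size_t alignmentPadding(std::size_t NumRegs) {
  return (kStackAlign - (NumRegs * kRegSize) % kStackAlign) % kStackAlign;
}

// Saving 15 registers uses 120 bytes, which is 8 bytes short of a
// multiple of 16, so 8 bytes of padding are required.
static_assert(alignmentPadding(15) == 8, "15 regs need 8 bytes of padding");

// Total size of the save area including the padding.
inline std::size_t paddedSaveAreaSize(std::size_t NumRegs) {
  std::size_t Size = NumRegs * kRegSize + alignmentPadding(NumRegs);
  assert(Size % kStackAlign == 0 && "save area must keep 16-byte alignment");
  return Size;
}
```

With the padding included, the save area for 15 registers is 128 bytes, so code that uses aligned accesses such as movaps after the save sequence sees a correctly aligned stack.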
#
9bd71615 |
| 02-May-2020 |
Xun Li <xun@fb.com> |
Adding automatic huge page support
Summary: This patch enables automated hugify for BOLT. When BOLT is run against a binary with -hugify specified, it injects a call to a runtime library function at the entry of the binary. The runtime library calls madvise to map the hot code region into a 2M huge page. We support both new kernels with THP support and old kernels. For kernels with THP support we simply make a madvise call, while for old kernels we first copy the code out, remap the memory with huge pages, and then copy the code back. With this change, we no longer need to manually call into hugify_self and precompile it with --hot-text. Instead, we can simply combine the --hugify option with existing optimizations, and at runtime it will automatically move hot code into 2M pages.
Some details around the changes made:
1. Added a command line option to support --hugify. --hugify automatically turns on --hot-text to get the proper hot code symbols. However, running with both --hugify and --hot-text is not allowed, since --hot-text is used on binaries that have a precompiled call to hugify_self, which contradicts the purpose of --hugify.
2. Moved the common utility functions out of instr.cpp into common.h, which is also used by hugify.cpp. Added a few new system call definitions.
3. Added a new class that inherits from RuntimeLibrary and implements the necessary emit and link logic for hugify.
4. Added a simple test for hugify.
(cherry picked from FBD21384529)
|
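Mapping hot code into 2M pages, as described in commit 9bd71615, requires a region whose bounds fall on 2 MiB boundaries. The sketch below shows one way to compute such a region by expanding the hot range outward. The names `Range` and `alignToHugePages` are hypothetical, and the outward expansion is an assumption for illustration rather than a statement of what hugify.cpp does.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kHugePageSize = 2ULL * 1024 * 1024; // 2 MiB

// Half-open address range [Begin, End).
struct Range {
  uint64_t Begin;
  uint64_t End;
};

// Hypothetical helper: rounds Begin down and End up to 2 MiB boundaries so
// that the result covers the whole hot range with complete huge pages.
inline Range alignToHugePages(Range Hot) {
  assert(Hot.Begin <= Hot.End && "range must not be inverted");
  const uint64_t Mask = kHugePageSize - 1;
  return Range{Hot.Begin & ~Mask, (Hot.End + Mask) & ~Mask};
}
```

On a kernel with THP support, the aligned range would then be the one handed to madvise with the MADV_HUGEPAGE advice. On older kernels the commit describes a copy, remap, and copy-back sequence over the region instead.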
#
8c7f524a |
| 03-Jan-2020 |
Alexander Shaposhnikov <alexshap@fb.com> |
[BOLT] Fix build of the runtime on OSX
Summary: Fix the compilation error on OSX
(cherry picked from FBD19269806)
|
#
16a497c6 |
| 14-Dec-2019 |
Rafael Auler <rafaelauler@fb.com> |
[BOLT] Support full instrumentation
Summary: Add full instrumentation support (branches, direct and indirect calls). Add output statistics to show how many hot bytes were split from cold ones in functions. Add -cold-threshold option to allow splitting warm code (non-zero count). Add option in bolt-diff to report missing functions in profile 2.
In instrumentation, fini hooks are fixed to run proper finalization code after the program finishes. Hooks for startup are added to set up the runtime structures that need initialization, such as indirect call hash tables.
Add support for automatically dumping profile data every N seconds by forking a watcher process during runtime.
(cherry picked from FBD17644396)
|
#
ba31344f |
| 20-Sep-2019 |
Rafael Auler <rafaelauler@fb.com> |
[BOLT] Fix build for Mac
Summary: Change our CMake config for the standalone runtime instrumentation library to check for the elf.h header before using it, so the build doesn't break on systems lacking it. Also fix a SmallPtrSet usage where its elements are not really pointers, but uint64_t, breaking the build in Apple's Clang.
(cherry picked from FBD17505759)
|
#
cc4b2fb6 |
| 07-Aug-2019 |
Rafael Auler <rafaelauler@fb.com> |
[BOLT] Efficient edge profiling in instrumented mode
Summary: Change our edge profiling technique when using instrumentation so that not every edge is instrumented. Instead, build a spanning tree for the CFG and omit instrumentation for edges in the spanning tree. Infer the edge counts for these edges when writing the profile at run time. The inference works with a bottom-up traversal of the spanning tree and establishes the value of the edge connecting to the parent based on a simple flow equation involving output and input edges, where the only unknown variable is the parent edge.
This requires some engineering in the runtime lib to support dynamic allocation for building these graphs at runtime.
(cherry picked from FBD17062773)
|
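The inference step described in commit cc4b2fb6 can be sketched as follows. This is a simplified illustration and not the runtime library's code: the `Edge` layout and the function names are hypothetical, and the sketch assumes that flow is conserved at every node (for instance through a measured virtual edge from the exit block back to the entry block) and that the edges marked `InTree` really form a spanning tree without self-loops.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Edge {
  int From;
  int To;
  bool InTree;   // true: not instrumented, count has to be inferred
  int64_t Count; // measured for non-tree edges, filled in for tree edges
};

// Visits the subtree below Node first, then solves the flow equation at
// Node, where the edge to the parent is the only remaining unknown.
void inferSubtree(int Node, int ParentEdge, std::vector<Edge> &Edges,
                  const std::vector<std::vector<int>> &Incident) {
  for (int E : Incident[Node]) {
    if (E == ParentEdge || !Edges[E].InTree)
      continue;
    int Child = Edges[E].From == Node ? Edges[E].To : Edges[E].From;
    inferSubtree(Child, E, Edges, Incident);
  }
  if (ParentEdge < 0)
    return; // the root has no parent edge to solve for
  int64_t In = 0, Out = 0;
  for (int E : Incident[Node]) {
    if (E == ParentEdge)
      continue;
    if (Edges[E].To == Node)
      In += Edges[E].Count;
    if (Edges[E].From == Node)
      Out += Edges[E].Count;
  }
  // Flow conservation: total incoming count equals total outgoing count.
  Edges[ParentEdge].Count =
      Edges[ParentEdge].To == Node ? Out - In : In - Out;
}

// Fills in Count for every tree edge, starting the traversal at Root.
void inferTreeEdgeCounts(int NumNodes, std::vector<Edge> &Edges,
                         int Root = 0) {
  assert(Root >= 0 && Root < NumNodes && "root must be a valid node");
  std::vector<std::vector<int>> Incident(NumNodes);
  for (int I = 0; I < static_cast<int>(Edges.size()); ++I) {
    Incident[Edges[I].From].push_back(I);
    if (Edges[I].To != Edges[I].From)
      Incident[Edges[I].To].push_back(I);
  }
  inferSubtree(Root, -1, Edges, Incident);
}
```

As a usage example, take a diamond-shaped CFG with edges 0→1, 0→2 and 1→3 in the spanning tree, a measured count of 30 on edge 2→3, and a measured count of 100 on the virtual edge 3→0. The traversal then infers 70 for 1→3, 70 for 0→1 and 30 for 0→2, so only two of the five edges need counters.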