|
Revision tags: llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3 |
|
| #
b4fcaa13 |
| 22-Oct-2024 |
Michael O'Farrell <micpof@gmail.com> |
[PGO][SampledInstr] Correct off by 1s and allow 100% sampling (#113350)
This corrects a couple off by ones related to the sampling of
**instrumented** counters, and enables setting 100% rates for b
[PGO][SampledInstr] Correct off by 1s and allow 100% sampling (#113350)
This corrects a couple off by ones related to the sampling of
**instrumented** counters, and enables setting 100% rates for burst
sampling (burst duration = period).
Off by ones:
Prior to this change it was impossible to set a period of 65535 because
this was converted to fast sampling which rollsover at USHRT_MAX + 1
(65536). Similarly the burst durations would collect burst duration + 1
counts as they used an ULE comparison.
100% sampling:
Although this is not useful for a productionized use case, it does allow
for more deterministic testing with the sampling checks in place. After
all the off by ones are fixed, allowing for 100% sampling is a matter of
letting burst duration = period.
show more ...
|
|
Revision tags: llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init |
|
| #
b1ca2a95 |
| 22-Jul-2024 |
xur-llvm <59886942+xur-llvm@users.noreply.github.com> |
[PGO] Sampled instrumentation in PGO to speed up instrumentation binary (#69535)
In comparison to non-instrumented binaries, PGO instrumentation binaries
can be significantly slower. For highly thr
[PGO] Sampled instrumentation in PGO to speed up instrumentation binary (#69535)
In comparison to non-instrumented binaries, PGO instrumentation binaries
can be significantly slower. For highly threaded programs, this slowdown
can
reach 10x due to data races or false sharing within counters.
This patch incorporates sampling into the PGO instrumentation process to
enhance the speed of instrumentation binaries. The fundamental concept
is similar to the one proposed in https://reviews.llvm.org/D63949.
Three sampling modes are introduced:
1. Simple Sampling: When '-sampled-instr-bust-duration' is set to 1.
2. Fast Burst Sampling: When not using simple sampling, and
'-sampled-instr-period' is set to 65535. This is the default mode of
sampling.
3. Full Burst Sampling: When neither simple nor fast burst sampling is
used.
Utilizing this sampled instrumentation significantly improves the
binary's
execution speed. Measurements show up to 5x speedup with default
settings. Fast burst sampling now results in only around 20% to 30%
slowdown (compared to 8 to 10x slowdown without sampling).
Out tests show that profile quality remains good with sampling,
with edge counts typically showing more than 90% overlap.
For applications whose behavior changes due to binary speed,
sampling instrumentation can enhance performance.
Observations have shown some apps experiencing up to
a ~2% improvement in PGO.
A potential drawback of this patch is the increased binary size
and compilation time. The Sampling method in this patch does
not improve single threaded program instrumentation binary
speed.
show more ...
|