Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1 |
|
#
faf675ce |
| 29-Jan-2024 |
Aiden Grossman <agrossman154@yahoo.com> |
[llvm-exegesis] Remove llvm prefix where unnecessary (#79802)
This patch removes the llvm:: prefix within llvm-exegesis where it is
not necessary. This is most occurrences of the prefix within exeg
[llvm-exegesis] Remove llvm prefix where unnecessary (#79802)
This patch removes the llvm:: prefix within llvm-exegesis where it is
not necessary. This is most occurrences of the prefix within exegesis as
exegesis is within the llvm namespace. This patch makes things more
consistent as the vast majority of the code did not use the llvm::
prefix for anything.
show more ...
|
Revision tags: llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0 |
|
#
c7b25faa |
| 09-Sep-2023 |
Kazu Hirata <kazu@google.com> |
[llvm-exegesis] Use range-based for loops (NFC)
|
Revision tags: llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1 |
|
#
389bf5d8 |
| 27-Mar-2023 |
Aiden Grossman <agrossman154@yahoo.com> |
[llvm-exegesis] Refactor InstructionBenchmark to Benchmark
When llvm-exegesis was first introduced, it only supported benchmarking individual instructions, hence the name for the data structure stor
[llvm-exegesis] Refactor InstructionBenchmark to Benchmark
When llvm-exegesis was first introduced, it only supported benchmarking individual instructions, hence the name for the data structure storing the data corresponding to a benchmark being called InstructionBenchmark made sense. However, now that benchmarking arbitrary snippets is supported, InstructionBenchmark doesn't correspond to a single instruction. This patch refactors InstructionBenchmark to be called Benchmark to clean up this little bit of technical debt.
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D146884
show more ...
|
Revision tags: llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init |
|
#
898b5c9f |
| 22-Jan-2023 |
Piotr Fusik <fox@scene.pl> |
[NFC] Fix "form/from" typos
Reviewed By: #libc, ldionne
Differential Revision: https://reviews.llvm.org/D142007
|
Revision tags: llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2 |
|
#
5bc0e7b7 |
| 30-Jul-2022 |
Kazu Hirata <kazu@google.com> |
Convert for_each to range-based for loops (NFC)
|
Revision tags: llvmorg-15.0.0-rc1, llvmorg-16-init, llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1, llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3 |
|
#
e030f808 |
| 07-Sep-2021 |
Roman Lebedev <lebedev.ri@gmail.com> |
[Exegesis] Native clusterization: sub-partition by sched class id
Currently native clusterization simply groups all benchmarks by the opcode of key instruction, but that is suboptimal in certain cas
[Exegesis] Native clusterization: sub-partition by sched class id
Currently native clusterization simply groups all benchmarks by the opcode of key instruction, but that is suboptimal in certain cases, e.g. where we can already tell that the particular instructions already resolve into different sched classes.
show more ...
|
Revision tags: llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init, llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3, llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3, llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2, llvmorg-11.0.1-rc1, llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4, llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2, llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3 |
|
#
af450eab |
| 29-Feb-2020 |
Reid Kleckner <rnk@google.com> |
Avoid including FileSystem.h from MemoryBuffer.h
Lots of headers pass around MemoryBuffer objects, but very few open them. Let those that do include FileSystem.h.
Saves ~250 includes of Chrono.h &
Avoid including FileSystem.h from MemoryBuffer.h
Lots of headers pass around MemoryBuffer objects, but very few open them. Let those that do include FileSystem.h.
Saves ~250 includes of Chrono.h & FileSystem.h:
$ diff -u thedeps-before.txt thedeps-after.txt | grep '^[-+] ' | sort | uniq -c | sort -nr 254 - ../llvm/include/llvm/Support/FileSystem.h 253 - ../llvm/include/llvm/Support/Chrono.h 237 - ../llvm/include/llvm/Support/NativeFormatting.h 237 - ../llvm/include/llvm/Support/FormatProviders.h 192 - ../llvm/include/llvm/ADT/StringSwitch.h 190 - ../llvm/include/llvm/Support/FormatVariadicDetails.h ...
This requires duplicating the file_t typedef, which is unfortunate. I sunk the choice of mapping mode down into the cpp file using variable template specializations instead of class members in headers.
show more ...
|
Revision tags: llvmorg-10.0.0-rc2 |
|
#
c55cf4af |
| 10-Feb-2020 |
Bill Wendling <isanbard@gmail.com> |
Revert "Remove redundant "std::move"s in return statements"
The build failed with
error: call to deleted constructor of 'llvm::Error'
errors.
This reverts commit 1c2241a7936bf85aa68aef94bd40c3b
Revert "Remove redundant "std::move"s in return statements"
The build failed with
error: call to deleted constructor of 'llvm::Error'
errors.
This reverts commit 1c2241a7936bf85aa68aef94bd40c3ba77d8ddf2.
show more ...
|
#
1c2241a7 |
| 10-Feb-2020 |
Bill Wendling <isanbard@gmail.com> |
Remove redundant "std::move"s in return statements
|
#
446268a2 |
| 06-Feb-2020 |
Miloš Stojanović <Milos.Stojanovic@rt-rk.com> |
[llvm-exegesis] Add a custom error for clustering
All errors of type `Failure` are `StringError`s. In order for exit code mapping to detect that specifically a clustering error has occurred it needs
[llvm-exegesis] Add a custom error for clustering
All errors of type `Failure` are `StringError`s. In order for exit code mapping to detect that specifically a clustering error has occurred it needs to have a different type.
This patch also prepares D74085 where termination `report_fatal_error()` will be replaced with emitting `StringError`s.
Differential Revision: https://reviews.llvm.org/D74124
show more ...
|
Revision tags: llvmorg-10.0.0-rc1, llvmorg-11-init |
|
#
536c9a60 |
| 22-Dec-2019 |
Mark de Wever <koraq@xs4all.nl> |
[Tools] Fixes -Wrange-loop-analysis warnings
This avoids new warnings due to D68912 adds -Wrange-loop-analysis to -Wall.
Differential Revision: https://reviews.llvm.org/D71808
|
Revision tags: llvmorg-9.0.1, llvmorg-9.0.1-rc3, llvmorg-9.0.1-rc2, llvmorg-9.0.1-rc1 |
|
#
50cdd56b |
| 09-Oct-2019 |
Clement Courbet <courbet@google.com> |
[llvm-exegesis][NFC] Remove extra `llvm::` qualifications.
Summary: Second patch: in the lib.
Reviewers: gchatelet
Subscribers: nemanjai, tschuett, MaskRay, mgrang, jsji, llvm-commits
Tags: #llvm
[llvm-exegesis][NFC] Remove extra `llvm::` qualifications.
Summary: Second patch: in the lib.
Reviewers: gchatelet
Subscribers: nemanjai, tschuett, MaskRay, mgrang, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68692
llvm-svn: 374158
show more ...
|
#
4919534a |
| 08-Oct-2019 |
Clement Courbet <courbet@google.com> |
[llvm-exegesis] Finish plumbing the `Config` field.
Summary: Right now there are no snippet generators that emit the `Config` Field, but I plan to add it to investigate LEA operands for PR32326.
Wh
[llvm-exegesis] Finish plumbing the `Config` field.
Summary: Right now there are no snippet generators that emit the `Config` Field, but I plan to add it to investigate LEA operands for PR32326.
What was broken was: - `Config` Was not propagated up until the BenchmarkResult::Key. - Clustering should really consider different configs as measuring different things, so we should stabilize on (Opcode, Config) instead of just Opcode.
Reviewers: gchatelet
Subscribers: tschuett, llvm-commits, lebedev.ri
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68629
llvm-svn: 374031
show more ...
|
Revision tags: llvmorg-9.0.0, llvmorg-9.0.0-rc6, llvmorg-9.0.0-rc5, llvmorg-9.0.0-rc4, llvmorg-9.0.0-rc3, llvmorg-9.0.0-rc2, llvmorg-9.0.0-rc1, llvmorg-10-init, llvmorg-8.0.1, llvmorg-8.0.1-rc4, llvmorg-8.0.1-rc3, llvmorg-8.0.1-rc2, llvmorg-8.0.1-rc1 |
|
#
b8fb15d4 |
| 29-Mar-2019 |
Roman Lebedev <lebedev.ri@gmail.com> |
[NFC][llvm-exegesis] Refactor Analysis::SchedClassCluster::measurementsMatch()
Summary: The diff looks scary but it really isn't: 1. I moved the check for the number of measurements into `SchedClass
[NFC][llvm-exegesis] Refactor Analysis::SchedClassCluster::measurementsMatch()
Summary: The diff looks scary but it really isn't: 1. I moved the check for the number of measurements into `SchedClassClusterCentroid::validate()` 2. While there, added a check that we can only have a single inverse throughput measurement. I missed that when adding it initially. 3. In `Analysis::SchedClassCluster::measurementsMatch()` is called with the current LLVM values from schedule class and the values from Centroid. 3.1. The values from centroid we can already get from `SchedClassClusterCentroid::getAsPoint()`. This isn't 100% a NFC, because previously for inverse throughput we used `min()`. I have asked whether i have done that correctly in https://reviews.llvm.org/D57647?id=184939#inline-510384 but did not hear back. I think `avg()` should be used too, thus it is a fix. 3.2. Finally, refactor the computation of the LLVM-specified values into `Analysis::SchedClassCluster::getSchedClassPoint()` I will need that function for [[ https://bugs.llvm.org/show_bug.cgi?id=41275 | PR41275 ]]
Reviewers: courbet, gchatelet
Reviewed By: courbet
Subscribers: tschuett, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59951
llvm-svn: 357245
show more ...
|
#
c2423fe6 |
| 28-Mar-2019 |
Roman Lebedev <lebedev.ri@gmail.com> |
[llvm-exegesis] Introduce a 'naive' clustering algorithm (PR40880)
Summary: This is an alternative to D59539.
Let's suppose we have measured 4 different opcodes, and got: `0.5`, `1.0`, `1.5`, `2.0`
[llvm-exegesis] Introduce a 'naive' clustering algorithm (PR40880)
Summary: This is an alternative to D59539.
Let's suppose we have measured 4 different opcodes, and got: `0.5`, `1.0`, `1.5`, `2.0`. Let's suppose we are using `-analysis-clustering-epsilon=0.5`. By default now we will start processing the `0.5` point, find that `1.0` is it's neighbor, add them to a new cluster. Then we will notice that `1.5` is a neighbor of `1.0` and add it to that same cluster. Then we will notice that `2.0` is a neighbor of `1.5` and add it to that same cluster. So all these points ended up in the same cluster. This may or may not be a correct implementation of dbscan clustering algorithm.
But this is rather horribly broken for the reasons of comparing the clusters with the LLVM sched data. Let's suppose all those opcodes are currently in the same sched cluster. If i specify `-analysis-inconsistency-epsilon=0.5`, then no matter the LLVM values this cluster will **never** match the LLVM values, and thus this cluster will **always** be displayed as inconsistent.
The solution is obviously to split off some of these opcodes into different sched cluster. But how do i do that? Out of 4 opcodes displayed in the inconsistency report, which ones are the "bad ones"? Which ones are the most different from the checked-in data? I'd need to go in to the `.yaml` and look it up manually.
The trivial solution is to, when creating clusters, don't use the full dbscan algorithm, but instead "pick some unclustered point, pick all unclustered points that are it's neighbor, put them all into a new cluster, repeat". And just so as it happens, we can arrive at that algorithm by not performing the "add neighbors of a neighbor to the cluster" step.
But that won't work well once we teach analyze mode to operate in on-1D mode (i.e. on more than a single measurement type at a time), because the clustering would depend on the order of the measurements.
Instead, let's just create a single cluster per opcode, and put all the points of that opcode into said cluster. And simultaneously check that every point in that cluster is a neighbor of every other point in the cluster, and if they are not, the cluster (==opcode) is unstable.
This is //yet another// step to bring me closer to being able to continue cleanup of bdver2 sched model..
Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40880 | PR40880 ]].
Reviewers: courbet, gchatelet
Reviewed By: courbet
Subscribers: tschuett, jdoerfert, RKSimon, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59820
llvm-svn: 357152
show more ...
|
Revision tags: llvmorg-8.0.0, llvmorg-8.0.0-rc5, llvmorg-8.0.0-rc4, llvmorg-8.0.0-rc3 |
|
#
542e5d7b |
| 25-Feb-2019 |
Roman Lebedev <lebedev.ri@gmail.com> |
[llvm-exegesis] Split Epsilon param into two (PR40787)
Summary: This eps param is used for two distinct things: * initial point clusterization * checking clusters against the llvm values
What if on
[llvm-exegesis] Split Epsilon param into two (PR40787)
Summary: This eps param is used for two distinct things: * initial point clusterization * checking clusters against the llvm values
What if one wants to only look at highly different clusters, without changing the clustering itself? In particular, this helps to weed out noisy measurements (since the clusterization epsilon is still small, so there is a better chance that noisy measurements from the same opcode will go into different clusters)
By splitting it into two params it is now possible.
This is nearly-free performance-wise: Old: ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 10099 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-old.html' ... Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (25 runs):
390.01 msec task-clock # 0.998 CPUs utilized ( +- 0.25% ) 12 context-switches # 31.735 M/sec ( +- 27.38% ) 0 cpu-migrations # 0.000 K/sec 4745 page-faults # 12183.732 M/sec ( +- 0.54% ) 1562711900 cycles # 4012303.327 GHz ( +- 0.24% ) (82.90%) 185567822 stalled-cycles-frontend # 11.87% frontend cycles idle ( +- 0.52% ) (83.30%) 392106234 stalled-cycles-backend # 25.09% backend cycles idle ( +- 1.31% ) (33.79%) 1839236666 instructions # 1.18 insn per cycle # 0.21 stalled cycles per insn ( +- 0.15% ) (50.37%) 407035764 branches # 1045074878.710 M/sec ( +- 0.12% ) (66.80%) 10896459 branch-misses # 2.68% of all branches ( +- 0.17% ) (83.20%)
0.390629 +- 0.000972 seconds time elapsed ( +- 0.25% ) ``` ``` $ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-old.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 50572 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-old.html' ... Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (9 runs):
6803.36 msec task-clock # 0.999 CPUs utilized ( +- 0.96% ) 262 context-switches # 38.546 M/sec ( +- 23.06% ) 0 cpu-migrations # 0.065 M/sec ( +- 76.03% ) 13287 page-faults # 1953.206 M/sec ( +- 0.32% ) 27252537904 cycles # 4006024.257 GHz ( +- 0.95% ) (83.31%) 1496314935 stalled-cycles-frontend # 5.49% frontend cycles idle ( +- 0.97% ) (83.32%) 16128404524 stalled-cycles-backend # 59.18% backend cycles idle ( +- 0.30% ) (33.37%) 17611143370 instructions # 0.65 insn per cycle # 0.92 stalled cycles per insn ( +- 0.05% ) (50.04%) 3894906599 branches # 572537147.437 M/sec ( +- 0.03% ) (66.69%) 116314514 branch-misses # 2.99% of all branches ( +- 0.20% ) (83.35%)
6.8118 +- 0.0689 seconds time elapsed ( +- 1.01%) ``` New: ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 10099 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new.html' ... Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (25 runs):
400.14 msec task-clock # 0.998 CPUs utilized ( +- 0.66% ) 12 context-switches # 29.429 M/sec ( +- 25.95% ) 0 cpu-migrations # 0.100 M/sec ( +-100.00% ) 4714 page-faults # 11796.496 M/sec ( +- 0.55% ) 1603131306 cycles # 4011840.105 GHz ( +- 0.66% ) (82.85%) 199538509 stalled-cycles-frontend # 12.45% frontend cycles idle ( +- 2.40% ) (83.10%) 402249109 stalled-cycles-backend # 25.09% backend cycles idle ( +- 1.19% ) (34.05%) 1847783963 instructions # 1.15 insn per cycle # 0.22 stalled cycles per insn ( +- 0.18% ) (50.64%) 407162722 branches # 1018925730.631 M/sec ( +- 0.12% ) (67.02%) 10932779 branch-misses # 2.69% of all branches ( +- 0.51% ) (83.28%)
0.40077 +- 0.00267 seconds time elapsed ( +- 0.67% )
lebedevri@pini-pini:/build/llvm-build-Clang-release$ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-new.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 50572 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new.html' ... Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (9 runs):
6947.79 msec task-clock # 1.000 CPUs utilized ( +- 0.90% ) 217 context-switches # 31.236 M/sec ( +- 36.16% ) 1 cpu-migrations # 0.096 M/sec ( +- 50.00% ) 13258 page-faults # 1908.389 M/sec ( +- 0.34% ) 27830796523 cycles # 4006032.286 GHz ( +- 0.89% ) (83.30%) 1504554006 stalled-cycles-frontend # 5.41% frontend cycles idle ( +- 2.10% ) (83.32%) 16716574843 stalled-cycles-backend # 60.07% backend cycles idle ( +- 0.65% ) (33.38%) 17755545931 instructions # 0.64 insn per cycle # 0.94 stalled cycles per insn ( +- 0.09% ) (50.04%) 3897255686 branches # 560980426.597 M/sec ( +- 0.06% ) (66.70%) 117045395 branch-misses # 3.00% of all branches ( +- 0.47% ) (83.34%)
6.9507 +- 0.0627 seconds time elapsed ( +- 0.90% ) ```
I.e. it's +2.6% slowdown for one whole sweep, or +2% for 5 whole sweeps. Within noise i'd say.
Should help with [[ https://bugs.llvm.org/show_bug.cgi?id=40787 | PR40787 ]].
Reviewers: courbet, gchatelet
Reviewed By: courbet
Subscribers: tschuett, RKSimon, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58476
llvm-svn: 354767
show more ...
|
#
14e15ec1 |
| 20-Feb-2019 |
Hans Wennborg <hans@hanshq.net> |
Fix the build with gcc/libstdc++ 4.8.2 after r354441
llvm-svn: 354469
|
#
69716394 |
| 20-Feb-2019 |
Roman Lebedev <lebedev.ri@gmail.com> |
[llvm-exegesis] Opcode stabilization / reclusterization (PR40715)
Summary: Given an instruction `Opcode`, we can make benchmarks (measurements) of the instruction characteristics/performance. Then,
[llvm-exegesis] Opcode stabilization / reclusterization (PR40715)
Summary: Given an instruction `Opcode`, we can make benchmarks (measurements) of the instruction characteristics/performance. Then, to facilitate further analysis we group the benchmarks with *similar* characteristics into clusters. Now, this is all not entirely deterministic. Some instructions have variable characteristics, depending on their arguments. And thus, if we do several benchmarks of the same instruction `Opcode`, we may end up with *different* performance characteristics measurements. And when we then do clustering, these several benchmarks of the same instruction `Opcode` may end up being clustered into *different* clusters. This is not great for further analysis.
We shall find every `Opcode` with benchmarks not in just one cluster, and move *all* the benchmarks of said `Opcode` into one new unstable cluster per `Opcode`.
I have solved this by making `ClusterId` a bit field, adding a `IsUnstable` bit, and introducing `-analysis-display-unstable-clusters` switch to toggle between displaying stable-only clusters and unstable-only clusters.
The reclusterization is deterministically stable, produces identical reports between runs. (Or at least that is what i'm seeing, maybe it isn't)
Timings/comparisons: old (current trunk/head) {F8303582} ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-old.html' ... no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (25 runs):
6624.73 msec task-clock # 0.999 CPUs utilized ( +- 0.53% ) 172 context-switches # 25.965 M/sec ( +- 29.89% ) 0 cpu-migrations # 0.042 M/sec ( +- 56.54% ) 31073 page-faults # 4690.754 M/sec ( +- 0.08% ) 26538711696 cycles # 4006230.292 GHz ( +- 0.53% ) (83.31%) 2017496807 stalled-cycles-frontend # 7.60% frontend cycles idle ( +- 0.93% ) (83.32%) 13403650062 stalled-cycles-backend # 50.51% backend cycles idle ( +- 0.33% ) (33.37%) 19770706799 instructions # 0.74 insn per cycle # 0.68 stalled cycles per insn ( +- 0.04% ) (50.04%) 4419821812 branches # 667207369.714 M/sec ( +- 0.03% ) (66.69%) 121741669 branch-misses # 2.75% of all branches ( +- 0.28% ) (83.34%)
6.6283 +- 0.0358 seconds time elapsed ( +- 0.54% ) ```
patch, with reclustering but without filtering (i.e. outputting all the stable *and* unstable clusters) {F8303586} ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-all.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-all.html' ... no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-all.html'
Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-all.html' (25 runs):
6475.29 msec task-clock # 0.999 CPUs utilized ( +- 0.31% ) 213 context-switches # 32.952 M/sec ( +- 23.81% ) 1 cpu-migrations # 0.130 M/sec ( +- 43.84% ) 31287 page-faults # 4832.057 M/sec ( +- 0.08% ) 25939086577 cycles # 4006160.279 GHz ( +- 0.31% ) (83.31%) 1958812858 stalled-cycles-frontend # 7.55% frontend cycles idle ( +- 0.68% ) (83.32%) 13218961512 stalled-cycles-backend # 50.96% backend cycles idle ( +- 0.29% ) (33.37%) 19752995402 instructions # 0.76 insn per cycle # 0.67 stalled cycles per insn ( +- 0.04% ) (50.04%) 4417079244 branches # 682195472.305 M/sec ( +- 0.03% ) (66.70%) 121510065 branch-misses # 2.75% of all branches ( +- 0.19% ) (83.34%)
6.4832 +- 0.0229 seconds time elapsed ( +- 0.35% ) ``` Funnily, *this* measurement shows that said reclustering actually improved performance.
patch, with reclustering, only the stable clusters {F8303594} ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-stable.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-stable.html' ... no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-stable.html'
Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-stable.html' (25 runs):
6387.71 msec task-clock # 0.999 CPUs utilized ( +- 0.13% ) 133 context-switches # 20.792 M/sec ( +- 23.39% ) 0 cpu-migrations # 0.063 M/sec ( +- 61.24% ) 31318 page-faults # 4903.256 M/sec ( +- 0.08% ) 25591984967 cycles # 4006786.266 GHz ( +- 0.13% ) (83.31%) 1881234904 stalled-cycles-frontend # 7.35% frontend cycles idle ( +- 0.25% ) (83.33%) 13209749965 stalled-cycles-backend # 51.62% backend cycles idle ( +- 0.16% ) (33.36%) 19767554347 instructions # 0.77 insn per cycle # 0.67 stalled cycles per insn ( +- 0.04% ) (50.03%) 4417480305 branches # 691618858.046 M/sec ( +- 0.03% ) (66.68%) 118676358 branch-misses # 2.69% of all branches ( +- 0.07% ) (83.33%)
6.3954 +- 0.0118 seconds time elapsed ( +- 0.18% ) ``` Performance improved even further?! Makes sense i guess, less clusters to print.
patch, with reclustering, only the unstable clusters {F8303601} ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-unstable.html -analysis-display-unstable-clusters no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-unstable.html' ... no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-unstable.html'
Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-unstable.html -analysis-display-unstable-clusters' (25 runs):
6124.96 msec task-clock # 1.000 CPUs utilized ( +- 0.20% ) 194 context-switches # 31.709 M/sec ( +- 20.46% ) 0 cpu-migrations # 0.039 M/sec ( +- 49.77% ) 31413 page-faults # 5129.261 M/sec ( +- 0.06% ) 24536794267 cycles # 4006425.858 GHz ( +- 0.19% ) (83.31%) 1676085087 stalled-cycles-frontend # 6.83% frontend cycles idle ( +- 0.46% ) (83.32%) 13035595603 stalled-cycles-backend # 53.13% backend cycles idle ( +- 0.16% ) (33.36%) 18260877653 instructions # 0.74 insn per cycle # 0.71 stalled cycles per insn ( +- 0.05% ) (50.03%) 4112411983 branches # 671484364.603 M/sec ( +- 0.03% ) (66.68%) 114066929 branch-misses # 2.77% of all branches ( +- 0.11% ) (83.32%)
6.1278 +- 0.0121 seconds time elapsed ( +- 0.20% ) ``` This tells us that the actual `-analysis-inconsistencies-output-file=` outputting only takes ~0.4 sec for 43970 benchmark points (3 whole sweeps) (Also, wow this is fast, it used to take several minutes originally)
Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40715 | PR40715 ]].
Reviewers: courbet, gchatelet
Reviewed By: courbet
Subscribers: tschuett, jdoerfert, llvm-commits, RKSimon
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58355
llvm-svn: 354441
show more ...
|
Revision tags: llvmorg-7.1.0, llvmorg-7.1.0-rc1, llvmorg-8.0.0-rc2, llvmorg-8.0.0-rc1 |
|
#
2946cd70 |
| 19-Jan-2019 |
Chandler Carruth <chandlerc@gmail.com> |
Update the file headers across all of the LLVM projects in the monorepo to reflect the new license.
We understand that people may be surprised that we're moving the header entirely to discuss the ne
Update the file headers across all of the LLVM projects in the monorepo to reflect the new license.
We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository.
llvm-svn: 351636
show more ...
|
#
176388c9 |
| 02-Jan-2019 |
Clement Courbet <courbet@google.com> |
Revert rL350035 "[llvm-exegesis] Clustering: don't enqueue a point multiple times"
Let's discuss this on the review thread before submitting.
llvm-svn: 350207
|
#
cd93d7ef |
| 23-Dec-2018 |
Fangrui Song <maskray@google.com> |
[llvm-exegesis] Clustering: don't enqueue a point multiple times
Summary: SetVector uses both DenseSet and vector, which is time/memory inefficient. The points are represented as natural numbers so
[llvm-exegesis] Clustering: don't enqueue a point multiple times
Summary: SetVector uses both DenseSet and vector, which is time/memory inefficient. The points are represented as natural numbers so we can replace the DenseSet part by indexing into a vector<char> instead.
Don't cargo cult the pseudocode on the wikipedia DBSCAN page. This is a standard BFS style algorithm (the similar loops have been used several times in other LLVM components): every point is processed at most once, thus the queue has at most NumPoints elements. We represent it with a vector and allocate it outside of the loop to avoid allocation in the loop body.
We check `Processed[P]` to avoid enqueueing a point more than once, which also nicely saves us a `ClusterIdForPoint_[Q].isUndef()` check.
Many people hate the oneshot abstraction but some favor it, therefore we make a compromise, use a lambda to abstract away the neighbor adding process.
Delete the comment `assert(Neighbors.capacity() == (Points_.size() - 1));` as it is wrong.
llvm-svn: 350035
show more ...
|
#
96408bb0 |
| 14-Dec-2018 |
Simon Pilgrim <llvm-dev@redking.me.uk> |
Revert rL349136: [llvm-exegesis] Optimize ToProcess in dbScan
Summary: Use `vector<char> Added + vector<size_t> ToProcess` to replace `SetVector ToProcess`
We also check `Added[P]` to enqueueing a
Revert rL349136: [llvm-exegesis] Optimize ToProcess in dbScan
Summary: Use `vector<char> Added + vector<size_t> ToProcess` to replace `SetVector ToProcess`
We also check `Added[P]` to enqueueing a point more than once, which also saves us a `ClusterIdForPoint_[Q].isUndef()` check.
Reviewers: courbet, RKSimon, gchatelet, john.brawn, lebedev.ri
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D54442 ........ Patch wasn't approved and breaks buildbots
llvm-svn: 349139
show more ...
|
#
92537ccc |
| 14-Dec-2018 |
Fangrui Song <maskray@google.com> |
[llvm-exegesis] Optimize ToProcess in dbScan
Summary: Use `vector<char> Added + vector<size_t> ToProcess` to replace `SetVector ToProcess`
We also check `Added[P]` to enqueueing a point more than o
[llvm-exegesis] Optimize ToProcess in dbScan
Summary: Use `vector<char> Added + vector<size_t> ToProcess` to replace `SetVector ToProcess`
We also check `Added[P]` to enqueueing a point more than once, which also saves us a `ClusterIdForPoint_[Q].isUndef()` check.
Reviewers: courbet, RKSimon, gchatelet, john.brawn, lebedev.ri
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D54442
llvm-svn: 349136
show more ...
|
Revision tags: llvmorg-7.0.1, llvmorg-7.0.1-rc3 |
|
#
71fdb576 |
| 19-Nov-2018 |
Roman Lebedev <lebedev.ri@gmail.com> |
[llvm-exegesis] (+final perf overview) InstructionBenchmarkClustering::rangeQuery(): reserve for the upper bound of Neighbors
Summary: As it was pointed out in D54388+D54390, the maximal size of `Ne
[llvm-exegesis] (+final perf overview) InstructionBenchmarkClustering::rangeQuery(): reserve for the upper bound of Neighbors
Summary: As it was pointed out in D54388+D54390, the maximal size of `Neighbors` is known, it will contain at most Points_.size() minus one (the center of the cluster)
While that is the upper bound, meaning in the most cases, the actual count will be much smaller, since D54390 made the allocation persistent, we no longer have to worry about overly-optimistically `reserve()`ing.
Old: (D54393) ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs):
6553.167456 task-clock (msec) # 1.000 CPUs utilized ( +- 0.21% ) ... 6.5547 +- 0.0134 seconds time elapsed ( +- 0.20% ) ``` New: ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs):
6315.057872 task-clock (msec) # 0.999 CPUs utilized ( +- 0.24% ) ... 6.3187 +- 0.0160 seconds time elapsed ( +- 0.25% ) ``` And that is another -~4%.
Since this is the last (as of this moment) patch in this patch series, it is a good time to summarize: Old: (svn trunk, as stated in D54381) ``` $ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null
real 0m24.884s user 0m24.099s sys 0m0.785s ``` So these patches, on a given benchmark, has decreased llvm-exegesis analysis time by 74.62%.
There surely is more room for further improvements. D54514 may improve thins by -11.5% more (relative to this patch). Parallelization may improve things further significantly, too.
Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn
Reviewed By: courbet, MaskRay
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D54415
llvm-svn: 347204
show more ...
|
#
8e315b66 |
| 19-Nov-2018 |
Roman Lebedev <lebedev.ri@gmail.com> |
[llvm-exegesis] Move InstructionBenchmarkClustering::isNeighbour() into header
Summary: Old: (D54390) ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000
[llvm-exegesis] Move InstructionBenchmarkClustering::isNeighbour() into header
Summary: Old: (D54390) ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs):
7432.421721 task-clock (msec) # 1.000 CPUs utilized ( +- 0.15% ) ... 7.4336 +- 0.0115 seconds time elapsed ( +- 0.15% ) ``` New: ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs):
6569.936144 task-clock (msec) # 1.000 CPUs utilized ( +- 0.22% ) ... 6.5711 +- 0.0143 seconds time elapsed ( +- 0.22% ) ``` And another -12%. You'd think it would be `inline`d anyway, but no! :)
Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn
Reviewed By: courbet, MaskRay
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D54393
llvm-svn: 347203
show more ...
|