Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7
# 2fe83274 07-Jan-2023 Kazu Hirata <kazu@google.com>

[lldb] Use std::optional instead of llvm::Optional (NFC)

This patch replaces (llvm::|)Optional< with std::optional<. I'll post
a separate patch to clean up the "using" declarations, #include
"llvm/

[lldb] Use std::optional instead of llvm::Optional (NFC)

This patch replaces (llvm::|)Optional< with std::optional<. I'll post
a separate patch to clean up the "using" declarations, #include
"llvm/ADT/Optional.h", etc.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716

show more ...


# f190ce62 07-Jan-2023 Kazu Hirata <kazu@google.com>

[lldb] Add #include <optional> (NFC)

This patch adds #include <optional> to those files containing
llvm::Optional<...> or Optional<...>.

I'll post a separate patch to actually replace llvm::Optiona

[lldb] Add #include <optional> (NFC)

This patch adds #include <optional> to those files containing
llvm::Optional<...> or Optional<...>.

I'll post a separate patch to actually replace llvm::Optional with
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716

show more ...


Revision tags: llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, working, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3
# e17cae07 08-Aug-2022 Walter Erquinigo <wallace@fb.com>

[trace][intel pt] Fix per-psb packet decoding

The per-PSB packet decoding logic was wrong because it was assuming that pt_insn_get_sync_offset was being udpated after every PSB. Silly me, that is no

[trace][intel pt] Fix per-psb packet decoding

The per-PSB packet decoding logic was wrong because it was assuming that pt_insn_get_sync_offset was being udpated after every PSB. Silly me, that is not true. It returns the offset of the PSB packet after invoking pt_insn_sync_forward regardless of how many PSBs are visited later. Instead, I'm now following the approach described in https://github.com/intel/libipt/blob/master/doc/howto_libipt.md#parallel-decode for parallel decoding, which is basically what we need.

A nasty error that happened because of this is that when we had two PSBs (A and B), the following was happening

1. PSB A was processed all the way up to the end of the trace, which includes PSB B.
2. PSB B was then processed until the end of the trace.

The instructions emitted by step 2. were also emitted as part of step 1. so our trace had duplicated chunks. This problem becomes worse when you many PSBs.

As part of making sure this diff is correct, I added some other features that are very useful.

- Added a "synchronization point" event to the TraceCursor, so we can inspect when PSBs are emitted.
- Removed the single-thread decoder. Now the per-cpu decoder and single-thread decoder use the same code paths.
- Use the query decoder to fetch PSBs and timestamps. It turns out that the pt_insn_sync_forward of the instruction decoder can move past several PSBs (this means that we could skip some TSCs). On the other hand, the pt_query_sync_forward method doesn't skip PSBs, so we can get more accurate sync events and timing information.
- Turned LibiptDecoder into PSBBlockDecoder, which decodes single PSB blocks. It is the fundamental processing unit for decoding.
- Added many comments, asserts and improved error handling for clarity.
- Improved DecodeSystemWideTraceForThread so that a TSC is emitted always before a cpu change event. This was a bug that was annoying me before.
- SplitTraceInContinuousExecutions and FindLowestTSCInTrace are now using the query decoder, which can identify precisely each PSB along with their TSCs.
- Added an "only-events" option to the trace dumper to inspect only events.

I did extensive testing and I think we should have an in-house testing CI. The LLVM buildbots are not capable of supporting testing post-mortem traces of hundreds of megabytes. I'll leave that for later, but at least for now the current tests were able to catch most of the issues I encountered when doing this task.

A sample output of a program that I was single stepping is the following. You can see that only one PSB is emitted even though stepping happened!

```
thread #1: tid = 3578223
0: (event) trace synchronization point [offset = 0x0xef0]
a.out`main + 20 at main.cpp:29:20
1: 0x0000000000402479 leaq -0x1210(%rbp), %rax
2: (event) software disabled tracing
3: 0x0000000000402480 movq %rax, %rdi
4: (event) software disabled tracing
5: (event) software disabled tracing
6: 0x0000000000402483 callq 0x403bd4 ; std::vector<int, std::allocator<int>>::vector at stl_vector.h:391:7
7: (event) software disabled tracing
a.out`std::vector<int, std::allocator<int>>::vector() at stl_vector.h:391:7
8: 0x0000000000403bd4 pushq %rbp
9: (event) software disabled tracing
10: 0x0000000000403bd5 movq %rsp, %rbp
11: (event) software disabled tracing
```

This is another trace of a long program with a few PSBs.
```
(lldb) thread trace dump instructions -E -f thread #1: tid = 3603082
0: (event) trace synchronization point [offset = 0x0x80]
47417: (event) software disabled tracing
129231: (event) trace synchronization point [offset = 0x0x800]
146747: (event) software disabled tracing
246076: (event) software disabled tracing
259068: (event) trace synchronization point [offset = 0x0xf78]
259276: (event) software disabled tracing
259278: (event) software disabled tracing
no more data
```

Differential Revision: https://reviews.llvm.org/D131630

show more ...


# c4fb631c 11-Aug-2022 Walter Erquinigo <wallace@fb.com>

[NFC][lldb][trace] Fix formatting of tracing files

Pavel Labath taught me that clang-format sorts headers automatically
using llvm's rules, and it's better not to have spaces between

So in this dif

[NFC][lldb][trace] Fix formatting of tracing files

Pavel Labath taught me that clang-format sorts headers automatically
using llvm's rules, and it's better not to have spaces between

So in this diff I'm removing those spaces and formatting them as well.

I used `clang-format -i` to format these files.

show more ...


Revision tags: llvmorg-15.0.0-rc2
# d179ea12 03-Aug-2022 Walter Erquinigo <wallace@fb.com>

[NFC][trace] format source files

Cleanup formatting diff


Revision tags: llvmorg-15.0.0-rc1, llvmorg-16-init
# 4f676c25 18-Jul-2022 Walter Erquinigo <wallace@fb.com>

[trace][intel pt] Introduce wall clock time for each trace item

- Decouple TSCs from trace items
- Turn TSCs into events just like CPUs. The new name is HW clock tick, wich could be reused by other

[trace][intel pt] Introduce wall clock time for each trace item

- Decouple TSCs from trace items
- Turn TSCs into events just like CPUs. The new name is HW clock tick, wich could be reused by other vendors.
- Add a GetWallTime that returns the wall time that the trace plug-in can infer for each trace item.
- For intel pt, we are doing the following interpolation: if an instruction takes less than 1 TSC, we use that duration, otherwise, we assume the instruction took 1 TSC. This helps us avoid having to handle context switches, changes to kernel, idle times, decoding errors, etc. We are just trying to show some approximation and not the real data. For the real data, TSCs are the way to go. Besides that, we are making sure that no two trace items will give the same interpolation value. Finally, we are using as time 0 the time at which tracing started.

Sample output:

```
(lldb) r
Process 750047 launched: '/home/wallace/a.out' (x86_64)
Process 750047 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 1.1
frame #0: 0x0000000000402479 a.out`main at main.cpp:29:20
26 };
27
28 int main() {
-> 29 std::vector<int> vvv;
30 for (int i = 0; i < 100; i++)
31 vvv.push_back(i);
32
(lldb) process trace start -s 64kb -t --per-cpu
(lldb) b 60
Breakpoint 2: where = a.out`main + 1689 at main.cpp:60:23, address = 0x0000000000402afe
(lldb) c
Process 750047 resuming
Process 750047 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 2.1
frame #0: 0x0000000000402afe a.out`main at main.cpp:60:23
57 map<int, int> m;
58 m[3] = 4;
59
-> 60 map<string, string> m2;
61 m2["5"] = "6";
62
63 std::vector<std::string> vs = {"2", "3"};
(lldb) thread trace dump instructions -t -f -e thread #1: tid = 750047
0: [379567.000 ns] (event) HW clock tick [48599428476224707]
1: [379569.000 ns] (event) CPU core changed [new CPU=2]
2: [390487.000 ns] (event) HW clock tick [48599428476246495]
3: [1602508.000 ns] (event) HW clock tick [48599428478664855]
4: [1662745.000 ns] (event) HW clock tick [48599428478785046]
libc.so.6`malloc
5: [1662746.995 ns] 0x00007ffff7176660 endbr64
6: [1662748.991 ns] 0x00007ffff7176664 movq 0x32387d(%rip), %rax ; + 408
7: [1662750.986 ns] 0x00007ffff717666b pushq %r12
8: [1662752.981 ns] 0x00007ffff717666d pushq %rbp
9: [1662754.977 ns] 0x00007ffff717666e pushq %rbx
10: [1662756.972 ns] 0x00007ffff717666f movq (%rax), %rax
11: [1662758.967 ns] 0x00007ffff7176672 testq %rax, %rax
12: [1662760.963 ns] 0x00007ffff7176675 jne 0x9c7e0 ; <+384>
13: [1662762.958 ns] 0x00007ffff717667b leaq 0x17(%rdi), %rax
14: [1662764.953 ns] 0x00007ffff717667f cmpq $0x1f, %rax
15: [1662766.949 ns] 0x00007ffff7176683 ja 0x9c730 ; <+208>
16: [1662768.944 ns] 0x00007ffff7176730 andq $-0x10, %rax
17: [1662770.939 ns] 0x00007ffff7176734 cmpq $-0x41, %rax
18: [1662772.935 ns] 0x00007ffff7176738 seta %dl
19: [1662774.930 ns] 0x00007ffff717673b jmp 0x9c690 ; <+48>
20: [1662776.925 ns] 0x00007ffff7176690 cmpq %rdi, %rax
21: [1662778.921 ns] 0x00007ffff7176693 jb 0x9c7b0 ; <+336>
22: [1662780.916 ns] 0x00007ffff7176699 testb %dl, %dl
23: [1662782.911 ns] 0x00007ffff717669b jne 0x9c7b0 ; <+336>
24: [1662784.906 ns] 0x00007ffff71766a1 movq 0x3236c0(%rip), %r12 ; + 24
(lldb) thread trace dump instructions -t -f -e -J -c 4
[
{
"id": 0,
"timestamp_ns": "379567.000000",
"event": "HW clock tick",
"hwClock": 48599428476224707
},
{
"id": 1,
"timestamp_ns": "379569.000000",
"event": "CPU core changed",
"cpuId": 2
},
{
"id": 2,
"timestamp_ns": "390487.000000",
"event": "HW clock tick",
"hwClock": 48599428476246495
},
{
"id": 3,
"timestamp_ns": "1602508.000000",
"event": "HW clock tick",
"hwClock": 48599428478664855
},
{
"id": 4,
"timestamp_ns": "1662745.000000",
"event": "HW clock tick",
"hwClock": 48599428478785046
},
{
"id": 5,
"timestamp_ns": "1662746.995324",
"loadAddress": "0x7ffff7176660",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "endbr64"
},
{
"id": 6,
"timestamp_ns": "1662748.990648",
"loadAddress": "0x7ffff7176664",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "movq"
},
{
"id": 7,
"timestamp_ns": "1662750.985972",
"loadAddress": "0x7ffff717666b",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "pushq"
},
{
"id": 8,
"timestamp_ns": "1662752.981296",
"loadAddress": "0x7ffff717666d",
"module": "libc.so.6",
"symbol": "malloc",
"mnemonic": "pushq"
}
]
```

Differential Revision: https://reviews.llvm.org/D130054

show more ...


# 0466d1df 07-Jul-2022 ymeng <ymeng@devvm4974.frc0.facebook.com>

[trace][intel pt] Support dumping the trace info in json

Thanks to ymeng@fb.com for coming up with this change.

`thread trace dump info` can dump some metrics that can be useful for
analyzing the p

[trace][intel pt] Support dumping the trace info in json

Thanks to ymeng@fb.com for coming up with this change.

`thread trace dump info` can dump some metrics that can be useful for
analyzing the performance and quality of a trace. This diff adds a --json
option for dumping this information in json format that can be easily
understood my machines.

Differential Revision: https://reviews.llvm.org/D129332

show more ...


# d30fd5c3 29-Jun-2022 Gaurav Gaur <gaur@fb.com>

[trace][intel pt] Add a cgroup filter

It turns out that cgroup filtering is relatively trivial and works
really nicely. Thid diffs adds automatic cgroup filtering when in
per-cpu mode, unless a new

[trace][intel pt] Add a cgroup filter

It turns out that cgroup filtering is relatively trivial and works
really nicely. Thid diffs adds automatic cgroup filtering when in
per-cpu mode, unless a new --disable-cgroup-filtering flag is passed in
the start command. At least on Meta machines, all processes are spawned
inside a cgroup by default, which comes super handy, because per cpu
tracing is now much more precise.

A manual test gave me this result

- Without filtering:
Total number of trace items: 36083
Total number of continuous executions found: 229
Number of continuous executions for this thread: 2
Total number of PSB blocks found: 98
Number of PSB blocks for this thread 2
Total number of unattributed PSB blocks found: 38

- With filtering:
Total number of trace items: 87756
Total number of continuous executions found: 123
Number of continuous executions for this thread: 2
Total number of PSB blocks found: 10
Number of PSB blocks for this thread 3
Total number of unattributed PSB blocks found: 2

Filtering gives us great results. The number of instructions collected
more than double (probalby because we have less noise in the trace), and
we have much less unattributed PSBs blocks and unrelated PSBs in
general. The ones that are unrelated probably belong to other processes
in the same cgroup.

Differential Revision: https://reviews.llvm.org/D129257

show more ...


# 9f9464e0 29-Jun-2022 Peicong Wu <wupeicong@fb.com>

[trace][intel pt] Measure the time it takes to decode a thread in per-cpu mode

This metric was missing. We were only measuring in per-thread mode, and
this completes the work.

For a sample trace I

[trace][intel pt] Measure the time it takes to decode a thread in per-cpu mode

This metric was missing. We were only measuring in per-thread mode, and
this completes the work.

For a sample trace I have, the `dump info` command shows

```
Timing for this thread:
Decoding instructions: 0.12s

```

I also improved a bit the TaskTime function so that callers don't need to
specify the template argument

Differential Revision: https://reviews.llvm.org/D129249

show more ...


# a7d6c3ef 24-Jun-2022 Walter Erquinigo <wallace@fb.com>

[trace] Make events first class items in the trace cursor and rework errors

We want to include events with metadata, like context switches, and this
requires the API to handle events with payloads (

[trace] Make events first class items in the trace cursor and rework errors

We want to include events with metadata, like context switches, and this
requires the API to handle events with payloads (e.g. information about
such context switches). Besides this, we want to support multiple
similar events between two consecutive instructions, like multiple
context switches. However, the current implementation is not good for this because
we are defining events as bitmask enums associated with specific
instructions. Thus, we need to decouple instructions from events and
make events actual items in the trace, just like instructions and
errors.

- Add accessors in the TraceCursor to know if an item is an event or not
- Modify from the TraceDumper all the way to DecodedThread to support
- Renamed the paused event to disabled.
- Improved the tsc handling logic. I was using an API for getting the tsc from libipt, but that was an overkill that should be used when not processing events manually, but as we are already processing events, we can more easily get the tscs.
event items. Fortunately this simplified many things
- As part of this refactor, I also fixed and long stating issue, which is that some non decoding errors were being inserted in the decoded thread. I changed this so that TraceIntelPT::Decode returns an error if the decoder couldn't be set up proplerly. Then, errors within a trace are actual anomalies found in between instrutions.

All test pass

Differential Revision: https://reviews.llvm.org/D128576

show more ...


Revision tags: llvmorg-14.0.6
# ea37cd52 15-Jun-2022 Walter Erquinigo <wallace@fb.com>

[trace][intelpt] Support system-wide tracing [22] - Some final touches

Having a member variable TraceIntelPT * makes it look as if it was
optional. I'm using instead a weak_ptr to indicate that it's

[trace][intelpt] Support system-wide tracing [22] - Some final touches

Having a member variable TraceIntelPT * makes it look as if it was
optional. I'm using instead a weak_ptr to indicate that it's not
optional and the object is under the ownership of TraceIntelPT.

Besides that, I've simplified the Perf aux and data buffers copying by
using vector.insert.

I'm also renaming Lookup2 to Lookup. The 2 in the name is confusing.

Differential Revision: https://reviews.llvm.org/D127881

show more ...


# 6a5355e8 14-Jun-2022 Walter Erquinigo <wallace@fb.com>

[trace][intelpt] Support system-wide tracing [20] - Rename some fields in the schema

As discusses offline with @jj10305, we are updating some naming used throughout the code, specially in the json s

[trace][intelpt] Support system-wide tracing [20] - Rename some fields in the schema

As discusses offline with @jj10305, we are updating some naming used throughout the code, specially in the json schema

- traceBuffer -> iptTrace
- core -> cpu

Differential Revision: https://reviews.llvm.org/D127817

show more ...