Skip to content

Commit

Permalink
Added more DTrace scripts. (#1309)
Browse files Browse the repository at this point in the history
Additional scripts, single_up_info.d and sled_upstairs_info.d to give
you a specific script for a specific PID, or for an entire system.

A new script, all_downstairs.d, to display IO stats for all downstairs
running on a system.

Updated the README

single_up_info.d requires a PID and adds a SESSION column to tell
apart different upstairs inside a single process.
sled_upstairs_info.d has both a PID and a SESSION column and will
print out stats for the whole system.


Co-authored-by: Alan Hanson <[email protected]>
  • Loading branch information
leftwo and Alan Hanson authored May 24, 2024
1 parent 8c6d485 commit da333e9
Show file tree
Hide file tree
Showing 5 changed files with 369 additions and 0 deletions.
3 changes: 3 additions & 0 deletions package-manifest.toml
Original file line number Diff line number Diff line change
Expand Up @@ -22,5 +22,8 @@ source.paths = [
{ from = "tools/dtrace/upstairs_raw.d", to = "/opt/oxide/dtrace/upstairs_raw.d" },
{ from = "tools/dtrace/get-lr-state.sh", to = "/opt/oxide/dtrace/get-lr-state.sh" },
{ from = "tools/dtrace/get-ds-state.sh", to = "/opt/oxide/dtrace/get-ds-state.sh" },
{ from = "tools/dtrace/single_up_info.d", to = "/opt/oxide/dtrace/single_up_info.d" },
{ from = "tools/dtrace/sled_upstairs_info.d", to = "/opt/oxide/dtrace/sled_upstairs_info.d" },
{ from = "tools/dtrace/all_downstairs.d", to = "/opt/oxide/dtrace/all_downstairs.d" },
]
output.type = "zone"
85 changes: 85 additions & 0 deletions tools/dtrace/README.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,46 @@
# Oxide DTrace Crucible scripts

## all_downstairs.d
A DTrace script to show IOs coming and going on all downstairs as well as the
work task geting new work, performing the work and completing the work. Stats
are printed at a 4 second interval.

The columns show counts in the last 4 seconds of:
F> Flush coming in from the upstairs
F< Flush completed message being sent back to the upstairs.
W> Write coming in from the upstairs
W< Write completed message being sent back to the upstairs.
R> Read coming in from the upstairs
R< Read completed message being sent back to the upstairs.
WS An IO has been submitted to the work task in the downstairs
WIP An IO is taken off the work queue by the downstairs work task.
WD An IO is completed by the downstairs work task.

If a downstairs has not done any IOs it will either print no line, or
print a line of zeros.

```
EVT22200005 # dtrace -s /alan/dtrace/all_downstairs.d
PID F> F< W> W< R> R< WS WIP WD
13790 10 9 1911 1835 0 0 1943 1867 1867
25574 10 10 2204 2082 0 0 2237 2115 2114
25442 10 10 2204 2089 0 0 2236 2122 2121
PID F> F< W> W< R> R< WS WIP WD
17147 2 2 0 0 389 389 391 391 391
25492 2 2 0 0 389 389 391 391 391
25627 2 2 0 0 389 389 391 391 391
25442 10 9 2283 2177 0 0 2315 2207 2208
25574 10 9 2283 2184 0 0 2314 2214 2215
13790 10 10 2054 2030 0 0 2085 2061 2061
PID F> F< W> W< R> R< WS WIP WD
17147 2 2 2 2 0 0 4 4 4
25492 2 2 2 2 0 0 4 4 4
25627 2 2 2 2 0 0 4 4 4
13790 10 10 1961 1985 0 0 1994 2018 2018
25442 10 10 2042 2185 0 0 2074 2218 2217
25574 10 10 2045 2185 0 0 2077 2218 2217
```

## downstairs_count.d
A DTrace script to show IOs coming and going on a downstairs as well as the
work task geting new work, performing the work and completing the work. This
Expand Down Expand Up @@ -339,6 +380,50 @@ Trace a downstairs IO and measure time for in in the following three parts:
* 2nd report is OS time (for flush, to flush all extents)
* 3rd report is OS done to downstairs sending the ACK back to upstairs

## single_up_info.d
Similar to upstairs_info.d, this script prints out various counters in
the upstairs. However, you specify a PID and it will display stats for
only that PID. See upstairs_info.d for a description of the columns.

```
EVT22200005 # dtrace -s single_up_info.d 15579
SESSION DS STATE 0 DS STATE 1 DS STATE 2 UPW DSW NEXT_JOB BAKPR WRITE_BO NEW0 NEW1 NEW2 IP0 IP1 IP2 D0 D1 D2 S0 S1 S2 ER0 ER1 ER2 EC0 EC1 EC2
c0b92059 live_repair active active 3 435 570215 2761 226492416 0 0 0 40 241 241 24 194 194 371 0 0 9384 0 0 0 0 0
a666a8bd live_repair active active 2 3 90656 0 0 0 0 0 2 1 1 1 2 2 0 0 0 7561 0 0 11640 0 0
a666a8bd live_repair active active 2 11 90664 0 0 0 0 0 2 1 1 9 10 10 0 0 0 7563 0 0 11640 0 0
c0b92059 live_repair active active 3 514 570762 3111 237219840 0 0 0 67 234 234 33 280 280 414 0 0 9385 0 0 0 0 0
c0b92059 live_repair active active 3 329 571129 2929 231735296 0 0 0 1 227 251 59 102 78 269 0 0 9386 0 0 0 0 0
a666a8bd live_repair active active 2 19 90672 0 0 0 0 0 2 1 1 17 18 18 0 0 0 7565 0 0 11640 0 0
c0b92059 live_repair active active 3 339 571544 512 127401984 0 0 0 1 139 137 54 200 202 284 0 0 9387 0 0 0 0 0
a666a8bd live_repair active active 2 23 90676 0 0 0 0 0 2 1 1 21 22 22 0 0 0 7566 0 0 11640 0 0
c0b92059 live_repair active active 3 389 572038 221 101711872 0 0 0 1 112 112 67 277 277 321 0 0 9388 0 0 0 0 0
a666a8bd live_repair active active 2 31 90684 0 0 0 0 0 2 1 1 29 30 30 0 0 0 7568 0 0 11640 0 0
```
## sled_upstairs_info.d
Similar to upstairs_info.d, this script prints out various counters in
the upstairs for all process that have an upstairs running on the system.
See upstairs_info.d for a description of the columns.
This script adds a PID and a SESSION to identify which upstairs we are
reporting stats for.

```
EVT22200005 # dtrace -s sled_upstairs_info.d
PID SESSION DS STATE 0 DS STATE 1 DS STATE 2 UPW DSW NEXT_JOB BAKPR WRITE_BO NEW0 NEW1 NEW2 IP0 IP1 IP2 D0 D1 D2 S0 S1 S2 ER0 ER1 ER2 EC0 EC1 EC2
15579 c0b92059 live_repair active active 3 367 656347 1616 185597952 0 0 0 75 195 195 69 172 172 223 0 0 9589 0 0 0 0 0
15579 a666a8bd live_repair active active 2 95 91960 0 0 0 0 0 2 1 1 93 94 94 0 0 0 7827 0 0 11667 0 0
24948 fac8cbba new new new 0 0 1000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
24948 fac8cbba new new new 0 0 1000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
24948 79d92ceb active active active 0 0 1000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
15579 c0b92059 live_repair active active 3 432 656863 2077 203423744 0 0 0 104 168 168 68 264 264 260 0 0 9590 0 0 0 0 0
15579 a666a8bd live_repair active active 2 99 91964 0 0 0 0 0 2 1 1 97 98 98 0 0 0 7828 0 0 11667 0 0
24948 127b8de5 new new new 0 0 1000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
24948 fac8cbba new new new 0 0 1000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
24948 79d92ceb active active active 0 0 1000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
15579 c0b92059 live_repair active active 4 529 657227 4805 282066944 0 0 0 95 296 296 80 233 233 354 0 0 9591 0 0 0 0 0
15579 a666a8bd live_repair active active 2 107 91972 0 0 0 0 0 2 1 1 105 106 106 0 0 0 7830 0 0 11667 0 0
```

## upstairs_action.d
This is a dtrace script for printing the counts of the upstairs main action
loop.
Expand Down
77 changes: 77 additions & 0 deletions tools/dtrace/all_downstairs.d
Original file line number Diff line number Diff line change
@@ -0,0 +1,77 @@
#pragma D option quiet
/*
* Print IO counters for all running downstairs.
*/
crucible_downstairs*:::submit-flush-start
{
@sf_start[pid] = count();
}

crucible_downstairs*:::submit-flush-done
{
@sf_done[pid] = count();
}

crucible_downstairs*:::submit-write-start
{
@sw_start[pid] = count();
}

crucible_downstairs*:::submit-write-done
{
@sw_done[pid] = count();
}

crucible_downstairs*:::submit-read-start
{
@sr_start[pid] = count();
}

crucible_downstairs*:::submit-read-done
{
@sr_done[pid] = count();
}

crucible_downstairs*:::submit-writeunwritten-start
{
@swu_start[pid] = count();
}

crucible_downstairs*:::submit-writeunwritten-done
{
@swu_done[pid] = count();
}
crucible_downstairs*:::work-start
{
@work_start[pid] = count();
}
crucible_downstairs*:::work-process
{
@work_process[pid] = count();
}
crucible_downstairs*:::work-done
{
@work_done[pid] = count();
}


tick-4s
{
printf("%5s %4s %4s %4s %4s %5s %5s %5s %5s %5s\n",
"PID", "F>", "F<", "W>", "W<", "R>", "R<", "WS", "WIP", "WD");
printa("%05d %@4u %@4u %@4u %@4u %@5u %@5u %@5u %@5u %@5u\n",
@sf_start, @sf_done, @sw_start, @sw_done, @sr_start, @sr_done,
@work_start, @work_process, @work_done
);
clear(@sf_start);
clear(@sf_done);
clear(@sw_start);
clear(@sw_done);
clear(@sr_start);
clear(@sr_done);
clear(@swu_start);
clear(@swu_done);
clear(@work_start);
clear(@work_process);
clear(@work_done);
}
106 changes: 106 additions & 0 deletions tools/dtrace/single_up_info.d
Original file line number Diff line number Diff line change
@@ -0,0 +1,106 @@
/*
* Display internal Upstairs status for the PID provided as $1
*/
#pragma D option quiet
#pragma D option strsize=1k
/*
* Print the header right away
*/
dtrace:::BEGIN
{
show = 21;
}

/*
* Every second, check and see if we have printed enough that it is
* time to print the header again
*/
tick-1s
/show > 20/
{
printf("%8s ", "SESSION");
printf("%17s %17s %17s", "DS STATE 0", "DS STATE 1", "DS STATE 2");
printf(" %5s %5s %9s %5s", "UPW", "DSW", "NEXT_JOB", "BAKPR");
printf(" %10s", "WRITE_BO");
printf(" %5s %5s %5s", "NEW0", "NEW1", "NEW2");
printf(" %5s %5s %5s", "IP0", "IP1", "IP2");
printf(" %5s %5s %5s", "D0", "D1", "D2");
printf(" %5s %5s %5s", "S0", "S1", "S2");
printf(" %5s %5s %5s", "ER0", "ER1", "ER2");
printf(" %5s %5s %5s", "EC0", "EC1", "EC2");
printf("\n");
show = 0;
}

crucible_upstairs*:::up-status
/pid==$1/
{
show = show + 1;
session_id = json(copyinstr(arg1), "ok.session_id");

/*
* I'm not very happy about this, but if we don't print it all on one
* line, then multiple sessions will clobber each others output.
*/
printf("%8s %17s %17s %17s %5s %5s %9s %5s %10s %5s %5s %5s %5s %5s %5s %5s %5s %5s %5s %5s %5s %5s %5s %5s %5s %5s %5s\n",

substr(session_id, 0, 8),

/*
* State for the three downstairs
*/
json(copyinstr(arg1), "ok.ds_state[0]"),
json(copyinstr(arg1), "ok.ds_state[1]"),
json(copyinstr(arg1), "ok.ds_state[2]"),

/*
* Work queue counts for Upstairs and Downstairs
*/
json(copyinstr(arg1), "ok.up_count"),
json(copyinstr(arg1), "ok.ds_count"),

/*
* Job ID delta and backpressure
*/
json(copyinstr(arg1), "ok.next_job_id"),
json(copyinstr(arg1), "ok.up_backpressure"),
json(copyinstr(arg1), "ok.write_bytes_out"),

/*
* New jobs on the work list for each downstairs
*/
json(copyinstr(arg1), "ok.ds_io_count.new[0]"),
json(copyinstr(arg1), "ok.ds_io_count.new[1]"),
json(copyinstr(arg1), "ok.ds_io_count.new[2]"),

/*
* In progress jobs on the work list for each downstairs
*/
json(copyinstr(arg1), "ok.ds_io_count.in_progress[0]"),
json(copyinstr(arg1), "ok.ds_io_count.in_progress[1]"),
json(copyinstr(arg1), "ok.ds_io_count.in_progress[2]"),

/*
* Completed (done) jobs on the work list for each downstairs
*/
json(copyinstr(arg1), "ok.ds_io_count.done[0]"),
json(copyinstr(arg1), "ok.ds_io_count.done[1]"),
json(copyinstr(arg1), "ok.ds_io_count.done[2]"),

/*
* Skipped jobs on the work list for each downstairs
*/
json(copyinstr(arg1), "ok.ds_io_count.skipped[0]"),
json(copyinstr(arg1), "ok.ds_io_count.skipped[1]"),
json(copyinstr(arg1), "ok.ds_io_count.skipped[2]"),

/* Extents Repaired */
json(copyinstr(arg1), "ok.ds_extents_repaired[0]"),
json(copyinstr(arg1), "ok.ds_extents_repaired[1]"),
json(copyinstr(arg1), "ok.ds_extents_repaired[2]"),
/* Extents Confirmed */
json(copyinstr(arg1), "ok.ds_extents_confirmed[0]"),
json(copyinstr(arg1), "ok.ds_extents_confirmed[1]"),
json(copyinstr(arg1), "ok.ds_extents_confirmed[2]"));

}
98 changes: 98 additions & 0 deletions tools/dtrace/sled_upstairs_info.d
Original file line number Diff line number Diff line change
@@ -0,0 +1,98 @@
/*
* Display internal Upstairs status.
* This is an ease of use script that can be run on a sled and will
* output stats for all propolis-server or pantry process (anything
* that has an upstairs). The PID and SESSION will be unique for
* an upstairs. Multiple disks attached to a single propolis server
* will share the PID, but have unique SESSIONs.
*/
#pragma D option quiet
#pragma D option strsize=1k
/*
* Print the header right away
*/
dtrace:::BEGIN
{
show = 21;
}

/*
* Every second, check and see if we have printed enough that it is
* time to print the header again
*/
tick-1s
/show > 20/
{
printf("%5s %8s ", "PID", "SESSION");
printf("%17s %17s %17s", "DS STATE 0", "DS STATE 1", "DS STATE 2");
printf(" %5s %5s %9s %5s", "UPW", "DSW", "NEXT_JOB", "BAKPR");
printf(" %10s", "WRITE_BO");
printf(" %5s %5s %5s", "NEW0", "NEW1", "NEW2");
printf(" %5s %5s %5s", "IP0", "IP1", "IP2");
printf(" %5s %5s %5s", "D0", "D1", "D2");
printf(" %5s %5s %5s", "S0", "S1", "S2");
printf(" %5s %5s %5s", "ER0", "ER1", "ER2");
printf(" %5s %5s %5s", "EC0", "EC1", "EC2");
printf("\n");
show = 0;
}

crucible_upstairs*:::up-status
{
show = show + 1;
session_id = json(copyinstr(arg1), "ok.session_id");

/*
* I'm not very happy about this very long muli-line printf, but if
* we don't print it all on one line, then multiple sessions will
* clobber each others output.
*/
printf("%5d %8s %17s %17s %17s %5s %5s %9s %5s %10s %5s %5s %5s %5s %5s %5s %5s %5s %5s %5s %5s %5s %5s %5s %5s %5s %5s %5s\n",

pid,
substr(session_id, 0, 8),

/* State for the three downstairs */
json(copyinstr(arg1), "ok.ds_state[0]"),
json(copyinstr(arg1), "ok.ds_state[1]"),
json(copyinstr(arg1), "ok.ds_state[2]"),

/* Work queue counts for Upstairs and Downstairs */
json(copyinstr(arg1), "ok.up_count"),
json(copyinstr(arg1), "ok.ds_count"),

/* Job ID and backpressure */
json(copyinstr(arg1), "ok.next_job_id"),
json(copyinstr(arg1), "ok.up_backpressure"),
json(copyinstr(arg1), "ok.write_bytes_out"),

/* New jobs on the work list for each downstairs */
json(copyinstr(arg1), "ok.ds_io_count.new[0]"),
json(copyinstr(arg1), "ok.ds_io_count.new[1]"),
json(copyinstr(arg1), "ok.ds_io_count.new[2]"),

/* In progress jobs on the work list for each downstairs */
json(copyinstr(arg1), "ok.ds_io_count.in_progress[0]"),
json(copyinstr(arg1), "ok.ds_io_count.in_progress[1]"),
json(copyinstr(arg1), "ok.ds_io_count.in_progress[2]"),

/* Completed (done) jobs on the work list for each downstairs */
json(copyinstr(arg1), "ok.ds_io_count.done[0]"),
json(copyinstr(arg1), "ok.ds_io_count.done[1]"),
json(copyinstr(arg1), "ok.ds_io_count.done[2]"),

/* Skipped jobs on the work list for each downstairs */
json(copyinstr(arg1), "ok.ds_io_count.skipped[0]"),
json(copyinstr(arg1), "ok.ds_io_count.skipped[1]"),
json(copyinstr(arg1), "ok.ds_io_count.skipped[2]"),

/* Extents Repaired */
json(copyinstr(arg1), "ok.ds_extents_repaired[0]"),
json(copyinstr(arg1), "ok.ds_extents_repaired[1]"),
json(copyinstr(arg1), "ok.ds_extents_repaired[2]"),

/* Extents Confirmed */
json(copyinstr(arg1), "ok.ds_extents_confirmed[0]"),
json(copyinstr(arg1), "ok.ds_extents_confirmed[1]"),
json(copyinstr(arg1), "ok.ds_extents_confirmed[2]"));
}

0 comments on commit da333e9

Please sign in to comment.