title | weight | description |
---|---|---|
Using profile block-io |
20 |
Analyze block I/O performance through a latency distribution.
|
The profile block-io gadget gathers information about the usage of the block device I/O (disk I/O), generating a histogram distribution of I/O latency (time), when the gadget is stopped.
Notice that the latency of the disk I/O is measured from when the call is issued to the device until its completion, it does not include time spent in the kernel queue. This means that the histogram reflects only the performance of the device and not the effective latency suffered by the applications.
The histogram shows the number of I/O operations (count
column) that lie in
the latency range interval-start
-> interval-end
(usecs
column), which,
as the columns name indicates, is given in microseconds.
For this guide, we will use the stress
tool that allows
us to load and stress the system in many different ways. In particular, we will use the --io
flag
that will generate a given number of workers to spin on the sync()
syscall. In this way, we will generate disk I/O
that we will analyse using the biolatency gadget.
For further details, please refer to the BCC documentation.
Firstly, let's use the profile block-io gadget to see the I/O latency in our testing node with its normal load work:
# Run the gadget on the worker-node node
$ kubectl gadget profile block-io --node worker-node
Tracing block device I/O... Hit Ctrl-C to end
# Wait for around 1 minute and hit Ctrl+C to stop the gadget and see the results
^C
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 0 | |
16 -> 31 : 0 | |
32 -> 63 : 17 |* |
64 -> 127 : 261 |******************* |
128 -> 255 : 546 |****************************************|
256 -> 511 : 426 |******************************* |
512 -> 1023 : 227 |**************** |
1024 -> 2047 : 18 |* |
2048 -> 4095 : 8 | |
4096 -> 8191 : 23 |* |
8192 -> 16383 : 15 |* |
16384 -> 32767 : 2 | |
32768 -> 65535 : 1 | |
This output shows that the bulk of the I/O was between 64 and 1023 us, and that there were 1544 I/O operations during the time the gadget was running. Notice that we waited for 1 minute but longer time would produce more stable results.
Now, let's increase the I/O operations using the stress tool:
# Start by creating our testing namespace
$ kubectl create ns test-biolatency
# Run stress with 1 worker that will generate I/O operations
$ kubectl run --restart=Never --image=polinux/stress stress-io -n test-biolatency -- stress --io 1
$ kubectl wait --timeout=-1s -n test-biolatency --for=condition=ready pod/stress-io
pod/stress-io condition met
$ kubectl get pod -n test-biolatency -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
stress-io 1/1 Running 0 2s 10.244.1.7 worker-node <none> <none>
Using the profile block-io gadget, we can generate another histogram to analyse the disk I/O with this load:
# Run the gadget again
$ kubectl gadget profile block-io --node worker-node
Tracing block device I/O... Hit Ctrl-C to end
# Wait again for 1 minute and hit Ctrl+C to stop the gadget and see the results
^C
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 0 | |
16 -> 31 : 411 | |
32 -> 63 : 310822 |****************************************|
64 -> 127 : 293404 |************************************* |
128 -> 255 : 194881 |************************* |
256 -> 511 : 96520 |************ |
512 -> 1023 : 33756 |**** |
1024 -> 2047 : 4414 | |
2048 -> 4095 : 1007 | |
4096 -> 8191 : 1025 | |
8192 -> 16383 : 176 | |
16384 -> 32767 : 13 | |
32768 -> 65535 : 7 | |
65536 -> 131071 : 0 | |
131072 -> 262143 : 0 | |
262144 -> 524287 : 1 | |
524288 -> 1048575 : 0 | |
1048576 -> 2097151 : 1 | |
# Remove load
$ kubectl delete pod/stress-io -n test-biolatency
The new histogram shows how the number of I/O operations increased significantly, passing from 1544 (normal load) to 936438 (stressing the I/O). On the other hand, even though this histogram shows that the bulk of the I/O was still lower than 1023us, we can observe that there were several I/O operations that suffered a high latency due to the load, one of them, even more than 1 sec.
Delete the demo test namespace:
$ kubectl delete ns test-biolatency
namespace "test-biolatency" deleted
- Generate some io load:
$ docker run -d --rm --name stresstest polinux/stress stress --io 10
- Start
ig
:
$ sudo ./ig profile block-io
- Observe the results:
$ sudo ./ig profile block-io
Tracing block device I/O... Hit Ctrl-C to end.^C
usecs : count distribution
1 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 0 | |
16 -> 31 : 0 | |
32 -> 63 : 113 | |
64 -> 127 : 7169 |****************************************|
128 -> 255 : 3724 |******************** |
256 -> 511 : 2198 |************ |
512 -> 1023 : 712 |*** |
1024 -> 2047 : 203 |* |
2048 -> 4095 : 23 | |
4096 -> 8191 : 7 | |
8192 -> 16383 : 3 | |
- Remove the docker container:
$ docker stop stresstest