Scrape timeout with many disks #197

pokab · 2024-01-20T17:23:22Z

I manage a Linux box with around 20-25 HDDs. Some of these disks reply to smartctl quickly; others are pretty slow, taking around 1-2 seconds each.
Setting the scrape interval to 60 seconds and the scrape timeout to 40 seconds does not prevent regular scrape timeouts.
A previous solution I made (not specific to Prometheus) spawned the smartctl subprocesses for all the HDDs in parallel, and it works perfectly. Would a solution like this be appropriate for this software? Maybe with an option to enable or disable it?

SuperQ · 2024-01-22T16:33:28Z

This should be possible here with a goroutine worker pool. We can parallelize the data collection.

It doesn't look like the current collector records any timing information, even in debug mode. Something we can improve as well.

Update the smartctl command reading and parsing of json logging to make for easier debugging of slow devices by adding a duration to the debug logging. For #197 Signed-off-by: SuperQ <[email protected]>


Update the smartctl command reading and parsing of json logging to make for easier debugging of slow devices by adding a duration to the debug logging. For prometheus-community#197 Signed-off-by: SuperQ <[email protected]> Signed-off-by: mort <[email protected]>

Update the smartctl command reading and parsing of json logging to make for easier debugging of slow devices by adding a duration to the debug logging. For prometheus-community#197 Signed-off-by: SuperQ <[email protected]> Signed-off-by: Denys <[email protected]>

pokab · 2024-03-19T14:36:13Z

I've been using my forked version (see #204) for three weeks without any problem. It successfully solved the scrape issue. Could you please give some feedback?


SuperQ mentioned this issue Jan 22, 2024

Update json reading logging #198

Merged

pokab added a commit to idatahu/smartctl_exporter that referenced this issue Feb 27, 2024

Use worker pool to make running of smartctl parallel (for issue prome…

19555c3

…theus-community#197).

pokab mentioned this issue Feb 27, 2024

Use worker pool for smartctl #204

Open

pokab added a commit to idatahu/smartctl_exporter that referenced this issue Feb 27, 2024

Use worker pool to make running of smartctl parallel (for issue prome…

488ba2b

…theus-community#197). Signed-off-by: Póka Balázs <[email protected]>

pokab added a commit to idatahu/smartctl_exporter that referenced this issue May 9, 2024

Use worker pool to make running of smartctl parallel (for issue prome…

f25870d

…theus-community#197). Signed-off-by: Póka Balázs <[email protected]>

pokab mentioned this issue May 10, 2024

Added determining device type and use it at scrape data #205

Merged
