metrics, pprof: support reloading services with SIGHUP #3016

End-rey · 2024-11-14T14:46:28Z

Closes #1868.

Is it right that we should reload services, even if the config is not updated?

There is also this linter error:
contextcheck Function `preRunAndLog->preRunAndLog$2->Shutdown` should pass the context parameter

Do I need to actually pass the context, or is there another way?

codecov · 2024-11-14T14:49:06Z

Codecov Report

Attention: Patch coverage is 0% with 63 lines in your changes missing coverage. Please review.

Project coverage is 22.85%. Comparing base (2bb903c) to head (b79f58f).
Report is 5 commits behind head on master.

Files with missing lines	Patch %	Lines
cmd/neofs-node/config.go	0.00%	32 Missing ⚠️
cmd/neofs-node/metrics.go	0.00%	10 Missing ⚠️
cmd/neofs-node/pprof.go	0.00%	10 Missing ⚠️
cmd/neofs-node/main.go	0.00%	6 Missing ⚠️
cmd/neofs-node/netmap.go	0.00%	2 Missing ⚠️
cmd/neofs-node/control.go	0.00%	1 Missing ⚠️
cmd/neofs-node/object.go	0.00%	1 Missing ⚠️
cmd/neofs-node/storage.go	0.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #3016      +/-   ##
==========================================
- Coverage   22.87%   22.85%   -0.03%     
==========================================
  Files         791      791              
  Lines       58688    58734      +46     
==========================================
- Hits        13425    13422       -3     
- Misses      44366    44414      +48     
- Partials      897      898       +1


roman-khimov · 2024-11-14T15:10:51Z

> Is it right that we should reload services, even if the config is not updated?

This is a service interruption, and we can easily avoid it.

> Do I need to actually pass the context, or is there another way?

In this case it can be suppressed: configWatcher misuses context.Context, so passing it over is not very helpful.

End-rey · 2024-11-14T21:43:21Z

Do I understand correctly that this variable and everything derived from it also need to be overwritten if `enabled` changes?

roman-khimov · 2024-11-15T07:45:19Z

Likely so; it should be possible to enable/disable the service with SIGHUP.

End-rey · 2024-11-19T20:22:42Z

I have made sure that the metrics are completely reloaded in all services where they are used. But I'm not at all sure that I did it right.

roman-khimov

Looking at the amount of changes, I suspect we're doing something wrong. We're using the global metric registry, and yet we have a service that collects all metrics into a single place. This makes zero sense to me. In general, we still want to use the default registry, since it adds the Go/process-specific metrics that we want to export. This means each package can define the metrics it wants locally and update them irrespective of settings (or just with some local flag). Services would then be just HTTP windows onto what the packages provide, and restarting them would be trivial.

So my suggestion is to refactor pkg/metrics out of the way completely (moving respective metrics to appropriate packages) and then solve the restart problem easily.

@carpawell, @cthulhu-rider?

cmd/neofs-node/config.go

roman-khimov · 2024-11-20T15:38:15Z

Or, if we're not refactoring it now, we can at least leave NodeMetrics alone: always initialize it and pass it over, never reinitialize it, and either expose the metrics via HTTP or not. I don't see any reasonable way to avoid the "always collect" mode; the current PR tries to, but it's not really needed.

carpawell · 2024-11-20T18:18:18Z

> each package can define metrics it wants locally

I feel like it is a simpler way in general, but the idea of local metrics in every package still scares me a little, because it will be hard to keep them under control (I mean you will never see all the metrics in a single place then). My gut feeling is closer to what @End-rey did in this PR. However, @roman-khimov, I agree that we are fighting against the library. I don't have a strong opinion here. If a decision is required, I'd say KISS should win and local metrics in every service are the better choice overall.


End-rey · 2024-11-22T10:57:17Z

Implemented the "always collect" mode, so the node now only reloads the metrics server on SIGHUP.

End-rey self-assigned this Nov 14, 2024

End-rey requested review from roman-khimov, carpawell and cthulhu-rider as code owners November 14, 2024 14:46

End-rey force-pushed the 1868-sighup-reload-pprof-metrics branch from 1520f03 to 055c9dd November 14, 2024 21:33

End-rey marked this pull request as draft November 15, 2024 15:08

End-rey force-pushed the 1868-sighup-reload-pprof-metrics branch from 055c9dd to 0d24df4 November 19, 2024 20:16

End-rey marked this pull request as ready for review November 19, 2024 20:23

roman-khimov reviewed Nov 20, 2024


cmd/neofs-node/config.go (outdated, resolved)

cmd/neofs-node/config.go (outdated, resolved)

cmd/neofs-node/config.go (resolved)

cmd/neofs-node/config.go (outdated, resolved)

roman-khimov mentioned this pull request Nov 22, 2024

Provide RegisterMetrics() functions in packages that expose metrics (nspcc-dev/neo-go#3698, open)

End-rey added 2 commits November 22, 2024 12:29

node: fix logs prometheus and pprof started

a373f6c

Logs:
```
prometheus service started successfully
pprof service started successfully
```
Appear after shutting down these services. Now they do not appear at all.

Signed-off-by: Andrey Butusov <[email protected]>

metrics, pprof: make their closers map

f9e365e

Add consts for the metric and profiler names. Make `c.veryLastClosers` a map.

Signed-off-by: Andrey Butusov <[email protected]>
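A sketch of why a map helps, assuming hypothetical `closers`, `metricsName` and `profilerName` names (the real constant values and closer signature in neofs-node may differ): keyed by service name, a reloaded service replaces its previous closer instead of appending a second one, and a disabled service can be removed on its own. Go map iteration order is random, so this sketch sorts names in `closeAll` to keep shutdown deterministic.

```go
package main

import "sort"

// Hypothetical service names; the commit adds constants of this kind.
const (
	metricsName  = "prometheus"
	profilerName = "pprof"
)

// closers maps a service name to the function that stops it.
type closers map[string]func()

// replace stops the previously registered instance of a service (if any)
// and records fn as its closer, so a reload never leaves a stale entry.
func (c closers) replace(name string, fn func()) {
	if old, ok := c[name]; ok {
		old()
	}
	c[name] = fn
}

// remove stops and forgets a service, e.g. when SIGHUP disables it.
func (c closers) remove(name string) {
	if old, ok := c[name]; ok {
		old()
		delete(c, name)
	}
}

// closeAll runs every remaining closer once, in name order, and empties the map.
func (c closers) closeAll() {
	names := make([]string, 0, len(c))
	for name := range c {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		c[name]()
		delete(c, name)
	}
}
```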

End-rey force-pushed the 1868-sighup-reload-pprof-metrics branch from 0d24df4 to 8fb2539 November 22, 2024 10:39

End-rey added 2 commits November 22, 2024 13:47

metrics: always init metrics collector

d03873a

To simply reload the metrics service and enable/disable it at runtime, always initialize the metrics collector and collect data, even in local mode, if it is not exposed via HTTP.

Signed-off-by: Andrey Butusov <[email protected]>

metrics, pprof: support reloading services with SIGHUP

b79f58f

Reload prometheus and pprof services, if the config is updated. Closes #1868.

Signed-off-by: Andrey Butusov <[email protected]>

End-rey force-pushed the 1868-sighup-reload-pprof-metrics branch from 8fb2539 to b79f58f November 22, 2024 10:48

End-rey requested a review from roman-khimov November 22, 2024 10:57

roman-khimov approved these changes Nov 22, 2024


roman-khimov merged commit 339b4cb into master Nov 23, 2024
20 of 22 checks passed

roman-khimov deleted the 1868-sighup-reload-pprof-metrics branch November 23, 2024 08:36

roman-khimov mentioned this pull request Dec 2, 2024

Add SIGHUP handler tests (nspcc-dev/neofs-testcases#901, closed)
