Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

metrics, pprof: support reloading services with SIGHUP #3016

Merged
merged 4 commits into from
Nov 23, 2024

Conversation

End-rey
Copy link
Contributor

@End-rey End-rey commented Nov 14, 2024

Closes #1868.

Is it right that we should reload services, even if the config is not updated?

There is also such a linter error:
contextcheck Function `preRunAndLog->preRunAndLog$2->Shutdown` should pass the context parameter

Do I need to pass the context honestly or is there some another way?

Copy link

codecov bot commented Nov 14, 2024

Codecov Report

Attention: Patch coverage is 0% with 63 lines in your changes missing coverage. Please review.

Project coverage is 22.85%. Comparing base (2bb903c) to head (b79f58f).
Report is 5 commits behind head on master.

Files with missing lines Patch % Lines
cmd/neofs-node/config.go 0.00% 32 Missing ⚠️
cmd/neofs-node/metrics.go 0.00% 10 Missing ⚠️
cmd/neofs-node/pprof.go 0.00% 10 Missing ⚠️
cmd/neofs-node/main.go 0.00% 6 Missing ⚠️
cmd/neofs-node/netmap.go 0.00% 2 Missing ⚠️
cmd/neofs-node/control.go 0.00% 1 Missing ⚠️
cmd/neofs-node/object.go 0.00% 1 Missing ⚠️
cmd/neofs-node/storage.go 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3016      +/-   ##
==========================================
- Coverage   22.87%   22.85%   -0.03%     
==========================================
  Files         791      791              
  Lines       58688    58734      +46     
==========================================
- Hits        13425    13422       -3     
- Misses      44366    44414      +48     
- Partials      897      898       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


🚨 Try these New Features:

@roman-khimov
Copy link
Member

Is it right that we should reload services, even if the config is not updated?

This is a service interruption and we can easily avoid it.

Do I need to pass the context honestly or is there some another way?

In this case it can be suppressed, configWatcher is misusing context.Context, so passing it over is not very helpful.

@End-rey End-rey force-pushed the 1868-sighup-reload-pprof-metrics branch from 1520f03 to 055c9dd Compare November 14, 2024 21:33
@End-rey
Copy link
Contributor Author

End-rey commented Nov 14, 2024

Do I understand correctly that it is also necessary to overwrite this variable and all its derivatives if enabled changes?

@roman-khimov
Copy link
Member

Likely so, it should be possible to enable/disable the service with SIGHUP.

@End-rey End-rey marked this pull request as draft November 15, 2024 15:08
@End-rey End-rey force-pushed the 1868-sighup-reload-pprof-metrics branch from 055c9dd to 0d24df4 Compare November 19, 2024 20:16
@End-rey
Copy link
Contributor Author

End-rey commented Nov 19, 2024

I have made sure that the metrics are completely reloaded in all services where they are used. But I'm not at all sure that I did it right.

@End-rey End-rey marked this pull request as ready for review November 19, 2024 20:23
Copy link
Member

@roman-khimov roman-khimov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the amount changes I suspect we're doing something wrong. We're using global metric registry and yet we have some service that is collecting all metrics into a single place. This makes zero sense to me. In general, we still want to use the default registry since it adds Go/process-specific things that we want to export. This means each package can define metrics it wants locally and add data to them irrespectively of settings (or just with some local flag). And then services would just be HTTP windows to things provided by packages, it'd be trivial to restart them.

So my suggestion is to refactor pkg/metrics out of the way completely (moving respective metrics to appropriate packages) and then solve the restart problem easily.

@carpawell, @cthulhu-rider?

cmd/neofs-node/config.go Outdated Show resolved Hide resolved
cmd/neofs-node/config.go Outdated Show resolved Hide resolved
cmd/neofs-node/config.go Show resolved Hide resolved
cmd/neofs-node/config.go Outdated Show resolved Hide resolved
@roman-khimov
Copy link
Member

Or maybe at least, if we're not refactoring it now, we can just leave NodeMetrics alone and always initialize/pass it over. Then never reinitialize it and either expose metrics via HTTP or not. But I don't see any reasonable way to avoid "always collect" mode, current PR tries to do that, but it's not really needed.

@carpawell
Copy link
Member

each package can define metrics it wants locally

I feel like it is a simpler way in general but still, the idea of local metrics for every package scares me a little just because it will be hard to control them (i mean you will never see all the metrics in a single place then). My internal feeling is close to what @End-rey did in this PR. However, @roman-khimov, I agree that we are fighting against the lib. Dont have a strict opinion here. Maybe if some decision is required, then I have to say that KISS should win here and local metrics in every service should be a better choice overall.

Logs:
```
prometheus service started successfully
pprof service started successfully
```
Appear after shutting down these services. Now they do not appear at all.

Signed-off-by: Andrey Butusov <[email protected]>
Add consts for the metric and profiler names. Make `c.veryLastClosers` a map.

Signed-off-by: Andrey Butusov <[email protected]>
@End-rey End-rey force-pushed the 1868-sighup-reload-pprof-metrics branch from 0d24df4 to 8fb2539 Compare November 22, 2024 10:39
To simply reload the metrics service and enable/disable it at runtime, always
initialize the metrics collector and collect data, even in local mode, if it is
not exposed via HTTP.

Signed-off-by: Andrey Butusov <[email protected]>
Reload prometheus and pprof services, if the config is updated.

Closes #1868.

Signed-off-by: Andrey Butusov <[email protected]>
@End-rey End-rey force-pushed the 1868-sighup-reload-pprof-metrics branch from 8fb2539 to b79f58f Compare November 22, 2024 10:48
@End-rey
Copy link
Contributor Author

End-rey commented Nov 22, 2024

Made "always collect" mode so node only reloads metrics server with SIGHUP.

@End-rey End-rey requested a review from roman-khimov November 22, 2024 10:57
@roman-khimov roman-khimov merged commit 339b4cb into master Nov 23, 2024
20 of 22 checks passed
@roman-khimov roman-khimov deleted the 1868-sighup-reload-pprof-metrics branch November 23, 2024 08:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reload pprof/metrics services on SIGHUP
3 participants