Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature(BaseMonitorSet): Implement monitoring template #9855

Merged
merged 1 commit into from
Mar 4, 2025

Conversation

k0machi
Copy link
Contributor

@k0machi k0machi commented Jan 20, 2025

This commit adds a new templating mechanism for monitoring dashboards,
allowing SCT to be in sync with changes in scylla-monitoring repo, by
first separating SCT metrics into a separate file, and then injecting
them into the overview template on the monitoring node, then
regenerating said template. This allows monitoring dashboard to always
be up to date with regards to changes on the monitoring side.

Fixes scylladb/qa-tasks#1444

Testing

PR pre-checks (self review)

  • I added the relevant backport labels
  • I didn't leave commented-out/debugging code

Reminders

  • Add New configuration option and document them (in sdcm/sct_config.py)
  • Add unit tests to cover my changes (under unit-test/ folder)
  • Update the Readme/doc folder relevant to this change (if needed)

@k0machi k0machi self-assigned this Jan 20, 2025
@k0machi k0machi added the backport/none Backport is not required label Jan 20, 2025
@k0machi k0machi force-pushed the monitoring-improved-dash-logic branch 4 times, most recently from 5f70e7b to 9b4018c Compare January 20, 2025 11:58
@k0machi k0machi marked this pull request as ready for review January 20, 2025 12:01
@k0machi
Copy link
Contributor Author

k0machi commented Jan 20, 2025

Demo:

image

One more thing that's missing are previously available template variables for event filtering - I'll add them soon.

@k0machi k0machi requested review from fruch and soyacz January 20, 2025 12:02
@fruch fruch added test-provision-aws Run provision test on AWS test-provision-docker labels Jan 21, 2025
@scylladbbot
Copy link

@k0machi new branch branch-2025.1 was added, please add backport label if needed

@k0machi k0machi force-pushed the monitoring-improved-dash-logic branch 6 times, most recently from 448e001 to a6f5dac Compare January 28, 2025 11:17
@k0machi
Copy link
Contributor Author

k0machi commented Jan 28, 2025

Added annotations, template variables for event types, added logic to change default values (currently just hardcoded to change the by var, open for suggestions on the format if we want something more generic), re-arranged widgets to be more in line with the previous dash.

@k0machi
Copy link
Contributor Author

k0machi commented Jan 28, 2025

This run has the updated dashboard, monitoring node is set to keep so will be alive for now.

@k0machi k0machi force-pushed the monitoring-improved-dash-logic branch from a6f5dac to 9eeebf4 Compare January 28, 2025 13:12
@soyacz
Copy link
Contributor

soyacz commented Jan 29, 2025

requests served per instance shows whole numbers (like 30000 instead of 30k) and miss unit (ops/s)

@k0machi k0machi force-pushed the monitoring-improved-dash-logic branch 2 times, most recently from b5847d4 to e0def3f Compare February 17, 2025 13:52
@k0machi
Copy link
Contributor Author

k0machi commented Feb 17, 2025

requests served per instance shows whole numbers (like 30000 instead of 30k) and miss unit (ops/s)

Fixed

@k0machi k0machi requested a review from soyacz February 17, 2025 13:53
@k0machi
Copy link
Contributor Author

k0machi commented Feb 17, 2025

I'll update the template based on recent changes by Valerii and then we can merge this.

soyacz
soyacz previously approved these changes Feb 18, 2025
Copy link
Contributor

@soyacz soyacz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
Still docker provision test fails with c-s.
@dimakr do you know what could be the reason?

@fruch
Copy link
Contributor

fruch commented Feb 18, 2025

LGTM Still docker provision test fails with c-s. @dimakr do you know what could be the reason?

it's cause of counters, we need to drop them from those tests.

since we move to take latest scylla release (with tablets enabled by default), it's kind of broken
and forgot/didn't got to handle/report it

EDIT: I did handled a part of it, skipping the pre-create of counter table when tablets is enabled, but that wasn't enough

@k0machi
Copy link
Contributor Author

k0machi commented Feb 25, 2025

Will rebase onto #10006 and update with changes from recent stress-command panels merge.

@k0machi k0machi force-pushed the monitoring-improved-dash-logic branch from e0def3f to c93bfd7 Compare March 3, 2025 10:13
@k0machi k0machi force-pushed the monitoring-improved-dash-logic branch 2 times, most recently from 942d773 to 1bc4693 Compare March 3, 2025 11:45
This commit adds a new templating mechanism for monitoring dashboards,
allowing SCT to be in sync with changes in scylla-monitoring repo, by
first separating SCT metrics into a separate file, and then injecting
them into the overview template on the monitoring node, then
regenerating said template. This allows monitoring dashboard to always
be up to date with regards to changes on the monitoring side.

Fixes scylladb/qa-tasks#1444
@k0machi k0machi force-pushed the monitoring-improved-dash-logic branch from 1bc4693 to 3811063 Compare March 3, 2025 12:03
@k0machi
Copy link
Contributor Author

k0machi commented Mar 3, 2025

Latest run that implements the new dash. Grafana node is still up.

Copy link
Contributor

@fruch fruch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@fruch fruch merged commit eab8915 into scylladb:master Mar 4, 2025
6 of 8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants