
Exportable service metrics #134

Closed
1 task done
j08lue opened this issue Aug 20, 2024 · 6 comments

@j08lue
Member

j08lue commented Aug 20, 2024

Scope out what it would take to gather service health / usage metrics and make them accessible to any harvester of such metrics, e.g. to feed into a Grafana dashboard or OpenSearch.

Similar to what veda-backend has been doing, e.g.

Perhaps related to

Acceptance criteria

  • Describe possible pathways to an interface that exposes service health and usage metrics / traces
@j08lue
Member Author

j08lue commented Aug 20, 2024

Why this is coming up now:

I learned that the Resource Health building block in EOEPCA may, in the future, be able to receive OpenTelemetry traces from other building blocks, which would ease debugging.

VEDA Backend has set up a centralized usage metrics overview (dashboard), which would also be highly relevant as a feature for eoapi-k8s.
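To make the "receive OpenTelemetry traces" idea concrete, here is a sketch of what one building block might push: a single span encoded as OTLP/JSON and POSTed to the `/v1/traces` path of an OTLP/HTTP receiver (4318 is the conventional OTLP/HTTP port). It assumes the Resource Health receiver accepts OTLP over HTTP with JSON encoding, which this thread does not confirm; the service, span, and scope names are hypothetical. In practice the OpenTelemetry SDK's OTLP exporter would build and send this payload.

```python
import json
import os
import urllib.request


def build_span_payload(service_name: str, span_name: str, start_ns: int, end_ns: int) -> dict:
    """Build a one-span OTLP/JSON trace export request body."""
    return {
        "resourceSpans": [
            {
                "resource": {
                    "attributes": [
                        {"key": "service.name", "value": {"stringValue": service_name}}
                    ]
                },
                "scopeSpans": [
                    {
                        "scope": {"name": "eoapi-sketch"},  # hypothetical scope name
                        "spans": [
                            {
                                # OTLP/JSON encodes trace and span IDs as hex strings.
                                "traceId": os.urandom(16).hex(),
                                "spanId": os.urandom(8).hex(),
                                "name": span_name,
                                "kind": 2,  # SPAN_KIND_SERVER
                                # 64-bit timestamps are sent as decimal strings.
                                "startTimeUnixNano": str(start_ns),
                                "endTimeUnixNano": str(end_ns),
                            }
                        ],
                    }
                ],
            }
        ]
    }


def build_export_request(endpoint: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST to an OTLP/HTTP receiver's /v1/traces path."""
    return urllib.request.Request(
        url=endpoint.rstrip("/") + "/v1/traces",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the prepared request to `urllib.request.urlopen` would actually send it.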

@ranchodeluxe
Contributor

ranchodeluxe commented Aug 27, 2024

I think we have about 75% of this already done in main b/c our metrics are already exportable via Prometheus on an API endpoint (that's how we get the custom per-service metrics we autoscale on), and they get fed to Grafana. If I understand correctly, the only addition here would be either:

  1. setting up an OTeL deployment that feeds more data into Prometheus
  2. setting up an OTeL exporter deployment that pushes data to an OTeL receiver service that EOEPCA+ sets up
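The autoscaling mentioned above can be illustrated with the calculation that consumes such a custom metric. Assuming the metrics drive a Kubernetes HorizontalPodAutoscaler (the comment does not name the autoscaler), the documented HPA rule is `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, with no change when the ratio is within a tolerance (0.1 by default) of 1.0. The sketch below reproduces that rule for a single metric; the default `min_replicas`/`max_replicas` values are hypothetical.

```python
import math


def desired_replicas(current_replicas: int, current_value: float, target_value: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Replica count the HPA rule would aim for on one metric.

    desired = ceil(current_replicas * current_value / target_value),
    unless the ratio is within `tolerance` of 1.0, then clamped to
    [min_replicas, max_replicas].
    """
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: skip scaling
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

For example, two replicas each seeing 150 requests/s against a 100 requests/s target would scale to three.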

That said, we use ETOL on the fire atlas stuff, and without writing custom metrics I haven't seen much useful data coming out of it that we can't already get from other metrics APIs. So, something to think about.

@j08lue
Member Author

j08lue commented Aug 28, 2024

Great to hear the infrastructure is basically already there and we just need to add some custom metrics.

Once our services are up, let us see which metrics those could be.

What is ETOL? 🙏

@ranchodeluxe
Contributor

> What is ETOL? 🙏

The Encyclopedia of Trotskyism Online 😉 I mean OTeL (OpenTelemetry).

@ranchodeluxe
Contributor

@j08lue: It also occurred to me that this means we'd have to build our own custom runtimes to add the OTeL instrumentation 😞 I really, truly feel this is yet another example of something that should be plumbed through and turned on in all the upstream libraries 😬
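One way to read "custom runtimes" is an image that wraps the upstream application in an instrumentation layer instead of patching the library. The sketch below is a plain ASGI middleware (standard library only) that counts responses by method and status and sums request latency. It assumes the services are ASGI apps, as FastAPI-based services are; `MetricsMiddleware` is a hypothetical name. For real tracing, the OpenTelemetry Python contrib project provides an ASGI middleware (`OpenTelemetryMiddleware`) that could be wrapped around the app the same way.

```python
import time
from collections import Counter


class MetricsMiddleware:
    """Minimal ASGI middleware: counts responses and sums request latency.

    It wraps an existing app without modifying it, the kind of shim a
    custom runtime image could add around an upstream service.
    """

    def __init__(self, app):
        self.app = app
        self.responses = Counter()  # (method, status) -> count
        self.total_seconds = 0.0    # summed wall-clock latency of HTTP requests

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            # Pass lifespan and websocket traffic through untouched.
            await self.app(scope, receive, send)
            return
        status = {"code": 500}  # recorded as 500 if the app never starts a response
        start = time.perf_counter()

        async def send_wrapper(message):
            # Capture the status code from the response-start event.
            if message["type"] == "http.response.start":
                status["code"] = message["status"]
            await send(message)

        try:
            await self.app(scope, receive, send_wrapper)
        finally:
            self.total_seconds += time.perf_counter() - start
            self.responses[(scope["method"], status["code"])] += 1
```

In a runtime image this would be applied as `app = MetricsMiddleware(app)` before handing `app` to the ASGI server.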

@j08lue
Member Author

j08lue commented Dec 10, 2024

> I think we have about 75% of this already done in main b/c our metrics are already exportable via Prometheus on an API endpoint

We are now planning to include OpenTelemetry in our EOEPCA runtimes and have them feed into Prometheus. From there, the metrics can be exposed via Grafana or picked up by external observability services.
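On the consuming side, an external observability service that picks metrics up from Prometheus (for instance via Prometheus' `/federate` endpoint, which returns the text exposition format) has to read that format. A simplified parser sketch follows, standard library only; it ignores timestamps and leaves label-value escapes as-is, so it is not a complete implementation of the format. `parse_samples` is a hypothetical helper name.

```python
import re

# One sample line: metric name, optional {labels}, value (timestamp ignored).
_SAMPLE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)")
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text: str) -> list[tuple[str, dict[str, str], float]]:
    """Parse sample lines of Prometheus text exposition output.

    Skips '# HELP' / '# TYPE' comments and blank lines; returns
    (metric name, labels, value) tuples in input order.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue  # not a sample line this sketch understands
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))  # handles NaN, +Inf too
    return samples
```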

We should document the approach EOEPCA takes for other users of eoapi-k8s to follow.

Within EOEPCA, we will discuss which additional metrics (besides liveness / uptime) we would like to gather.

@j08lue j08lue closed this as completed Dec 10, 2024
@j08lue j08lue self-assigned this Dec 10, 2024