OpenTelemetry deployment or API integration #659

Open
rootfs opened this issue Apr 27, 2023 · 21 comments
Assignees
Labels
kind/feature New feature or request

Comments

@rootfs
Contributor

rootfs commented Apr 27, 2023

For deployment integration, evaluate the architecture of metrics -> telemetry adapter
For API integration, evaluate telemetry client scalability in kepler

@rootfs rootfs converted this from a draft issue Apr 27, 2023
@SamYuan1990
Collaborator

For the API migration, we should double-check whether OpenTelemetry supports every metric type Kepler uses today.
I found that the Summary metric type is marked as legacy in OpenTelemetry, with no migration guide.

@rootfs
Contributor Author

rootfs commented May 30, 2023

Meeting 30: implement an OTel API client in Kepler and emit telemetry directly. Hopefully there is a way to convert the existing metrics to telemetry. @husky-parul will make the first attempt.

previous discussion is here
#97

@marceloamaral
Collaborator

I just verified that we can export OpenTelemetry metrics, and that the OpenTelemetry Collector can then expose those metrics to Prometheus.

Therefore, if the user has the OpenTelemetry Collector deployed in the cluster, Kepler does not need to export Prometheus metrics.

So we need to make this configurable and avoid duplication: if OpenTelemetry metrics are enabled, we should disable Prometheus metrics, and vice versa.
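
To make the "one or the other" rule concrete, here is a minimal Go sketch of how such a switch could be validated at startup. The `ExporterConfig` struct, its field names, and `SelectExporter` are hypothetical names chosen for illustration; they are not Kepler's actual configuration options.

```go
package main

import (
	"errors"
	"fmt"
)

// ExporterConfig is a hypothetical configuration struct; Kepler's real
// option names may differ.
type ExporterConfig struct {
	EnablePrometheus bool
	EnableOTel       bool
}

// SelectExporter returns the single exporter to start. It rejects
// configurations that would emit the same metrics twice, and
// configurations that would emit nothing at all.
func SelectExporter(c ExporterConfig) (string, error) {
	switch {
	case c.EnablePrometheus && c.EnableOTel:
		return "", errors.New("prometheus and otel exporters are mutually exclusive")
	case c.EnableOTel:
		return "otel", nil
	case c.EnablePrometheus:
		return "prometheus", nil
	default:
		return "", errors.New("no metrics exporter enabled")
	}
}

func main() {
	name, err := SelectExporter(ExporterConfig{EnableOTel: true})
	fmt.Println(name, err)
}
```

Failing fast on a conflicting configuration is only one possible design; silently preferring one exporter over the other would also avoid duplication, at the cost of hiding the misconfiguration.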

@sunya-ch sunya-ch added the kind/feature New feature or request label Jun 22, 2023
@rootfs rootfs added this to the kepler-release-0.6 milestone Jun 22, 2023
@husky-parul
Collaborator

Recap

As a step toward migrating from Prometheus metrics to OpenTelemetry metrics, which allows vendor- and tool-agnostic observability,
I did an initial POC: instrumenting an exporter with the OTel SDK, collecting metrics with the OTel Collector, and building a dashboard in Grafana (poc example)

Before starting the migration, I looked through the Kepler code to identify the metric types in use. So far, I see that Kepler uses only Counters and Gauges
https://github.com/sustainable-computing-io/kepler/blob/main/pkg/collector/prometheus_process_collector.go#L30

The OTel SDK supports the synchronous Counter and the asynchronous GaugeObserver. Its documentation highlights one point about GaugeObserver:

For GaugeObserver timeseries, backends usually display the last value and don't allow to sum different timeseries together.

It should not affect our implementation though.
@rootfs @marceloamaral @sunya-ch @SamYuan1990 @bertysentry

@bertysentry

This is awesome @husky-parul! WRT metric types, make sure to use Gauge only for metrics that are usually not summable (additive), like temperature, ratios, etc. For other metrics that move "up and down", like measured electrical power, you should use UpDownCounter. See OpenTelemetry Supplementary Guidelines about this.
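
One way to read the guideline above is as a two-question decision: is the value additive across series, and does it only ever increase? The Go sketch below encodes that reading. `ChooseInstrument` and the example metric descriptions in `main` are hypothetical and are not part of Kepler or the OTel SDK; the sketch only returns the name of an instrument kind and does not create real instruments.

```go
package main

import "fmt"

// InstrumentKind names the OpenTelemetry instrument families discussed
// in this thread.
type InstrumentKind string

const (
	Counter       InstrumentKind = "Counter"
	UpDownCounter InstrumentKind = "UpDownCounter"
	Gauge         InstrumentKind = "Gauge"
)

// ChooseInstrument picks an instrument kind from two properties:
//   - additive: summing the value across time series is meaningful
//   - monotonic: the value only ever increases
//
// Non-additive values (temperature, ratios) map to Gauge regardless of
// monotonicity.
func ChooseInstrument(additive, monotonic bool) InstrumentKind {
	if !additive {
		return Gauge
	}
	if monotonic {
		return Counter
	}
	return UpDownCounter
}

func main() {
	// Illustrative classifications only; each real metric needs its own review.
	fmt.Println("cumulative energy:", ChooseInstrument(true, true))
	fmt.Println("measured power:   ", ChooseInstrument(true, false))
	fmt.Println("temperature:      ", ChooseInstrument(false, false))
}
```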

@husky-parul
Collaborator

husky-parul commented Jul 17, 2023

I am proposing the following. @sustainable-computing-io/maintainer please TAL. Let me know if you have any questions.

design

Components

Instrumentation:
Kepler instrumented using the OTEL SDK to collect metrics.

OTEL Collector:
The OTEL collector receives the exported metrics data from the instrumented applications. The collector acts as an intermediary component that processes and routes the telemetry data to the appropriate destinations.
For Kepler, the collector will use the OpenTelemetry Protocol (OTLP) to receive data from the instrumented applications.

Exporters:
The OTEL collector will utilize OTEL exporters to send metrics data to backends.
We currently use Prometheus as the backend, but other options include InfluxDB and Elasticsearch.
We will be using the OTEL Prometheus Exporter with Grafana.
These exporters convert the collected metrics into a format that Grafana can understand and consume.

Data Storage:
The exported metrics data is stored in Prometheus.

Grafana Data Source:
Grafana will be configured to connect to the Prometheus data storage backend where the metrics data is stored.
Connection is established through the Prometheus data source within Grafana.

Visualization in Grafana:
Grafana can query the metrics data from the storage backend and create visualizations based on the collected metrics.

@bertysentry

When using the Prometheus exporter, I recommend enabling the normalization of metric names with this flag: --feature-gates=pkg.translator.prometheus.NormalizeName. Otel metric names will be normalized as described here
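
To give a feel for what this kind of normalization does, here is a simplified Go sketch: it replaces characters that are not valid in Prometheus metric names with underscores, appends a unit suffix, and appends `_total` for monotonic counters. This is an approximation for illustration, not the collector's actual translator. The function names, the tiny unit table, and the metric names in `main` are all assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// unitSuffix is a deliberately tiny, illustrative unit table; the real
// translator covers many more units and edge cases.
var unitSuffix = map[string]string{
	"s":  "seconds",
	"By": "bytes",
	"J":  "joules",
}

// sanitize replaces every character outside [a-zA-Z0-9_:] with '_'.
func sanitize(name string) string {
	return strings.Map(func(r rune) rune {
		isLetter := (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z')
		isDigit := r >= '0' && r <= '9'
		if isLetter || isDigit || r == '_' || r == ':' {
			return r
		}
		return '_'
	}, name)
}

// NormalizeName sketches the normalization: sanitize the name, add the
// unit suffix if it is known and not already present, then add "_total"
// for monotonic counters.
func NormalizeName(name, unit string, monotonicCounter bool) string {
	out := sanitize(name)
	if s, ok := unitSuffix[unit]; ok && !strings.HasSuffix(out, "_"+s) {
		out += "_" + s
	}
	if monotonicCounter && !strings.HasSuffix(out, "_total") {
		out += "_total"
	}
	return out
}

func main() {
	// Hypothetical OTel-style metric names, used only as inputs.
	fmt.Println(NormalizeName("kepler.container.energy", "J", true))
	fmt.Println(NormalizeName("system.cpu.time", "s", true))
	fmt.Println(NormalizeName("queue.size", "", false))
}
```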

@rootfs
Contributor Author

rootfs commented Jul 19, 2023

@husky-parul this looks great! Look forward to this happening!

@marceloamaral
Collaborator

Looks good to me! Thanks for working on this!

@stale

stale bot commented Sep 19, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label Sep 19, 2023
@brunobat

I don't think this is stale.
This issue should get more attention as OTel is quickly becoming the de-facto standard to export telemetry everywhere.

@stale stale bot removed the wontfix This will not be worked on label Sep 19, 2023
@husky-parul
Collaborator

It is not stale. I am working on this and a demo/PR is WIP.

@husky-parul husky-parul self-assigned this Sep 19, 2023
@bertysentry

Also good news: a future version of Prometheus will be able to ingest OTel metrics, and the mechanism for translating OTel metrics to Prometheus metrics is the one I mentioned earlier.

@rootfs
Contributor Author

rootfs commented Sep 21, 2023

thank you @bertysentry for the info! We are going to make this happen in the next release milestone. Stay tuned!

@frzifus

frzifus commented Sep 29, 2023

@husky-parul that #659 (comment) is great. Do you already have some updates?

@rootfs Just out of curiosity, is there a timeline for the next milestone?

@husky-parul
Collaborator

husky-parul commented Oct 2, 2023

Thanks for sharing the information.

Just out of curiosity, is there a timeline for the next milestone?

@frzifus OTel integration will be part of our next release, which is 0.7 in this case. Our releases take place every 6 months, so it will happen in Q1 of 2024.


stale bot commented Dec 1, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label Dec 1, 2023
@rootfs rootfs removed the wontfix This will not be worked on label Dec 1, 2023

stale bot commented Jan 30, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label Jan 30, 2024
@stale stale bot closed this as completed Feb 6, 2024
@github-project-automation github-project-automation bot moved this from 🕊Features to ✅ Done in kepler-release-planning Feb 6, 2024
@SamYuan1990
Collaborator

Are we done with this ticket? @rootfs

@SamYuan1990 SamYuan1990 reopened this Feb 11, 2024
@stale stale bot removed the wontfix This will not be worked on label Feb 11, 2024
@gyliu513

gyliu513 commented Mar 5, 2024

Is there any documentation for using OTel to collect metrics from Kepler? Thanks

@husky-parul
Collaborator

https://github.com/husky-parul/otel-observability

Please try this. We haven’t merged this doc into the Kepler website yet.
Thanks

Projects
Status: Done
Development

No branches or pull requests

9 participants