Add OpenTelemetry converter to export Prometheus metrics #97

Closed
husky-parul opened this issue Jul 19, 2022 · 29 comments
Labels
kind/feature (New feature or request), wontfix (This will not be worked on)

Comments

@husky-parul
Collaborator

Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Describe the solution you'd like
A clear and concise description of what you want to happen.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

@brunobat

brunobat commented Oct 4, 2022

I also believe that outputting data using OpenTelemetry with the right attribute semantics would make it possible to later correlate it with application-generated data.
Moreover, using OpenTelemetry across the board in the project would allow not only OTLP standard exporters but also Prometheus and many others.

@marceloamaral
Collaborator

Typically we use tracing to measure the latency across function calls.
So, I am not sure when/how/why we would use trace data...

Could you please explain in more detail why you need tracing?

@brunobat

brunobat commented Oct 4, 2022

When I mention OpenTelemetry (OTel) it's not just about tracing. Metrics themselves can be produced in the OTel format, which seems to me much more flexible than outputting metrics as time series in the Prometheus format. The main advantage is to easily integrate and correlate with app-generated metrics.
This can be used to identify, inside the apps, what the energy consumption hotspots might be.

@marceloamaral
Collaborator

It's actually on our roadmap to support exporting metrics in different formats, not just Prometheus...

We could discuss this in more detail; it would be nice if you could create a Google Doc detailing the ideas so that everyone can give feedback.

@marceloamaral added the kind/feature label on Oct 5, 2022
@bertysentry

I totally support the idea of replacing the Prometheus metrics with OpenTelemetry metrics. They can then be exported anywhere (including to Prometheus) through the OpenTelemetry Collector. It would make things more open and platform-agnostic.

@SamYuan1990
Collaborator

Wait a minute, why does Kepler need OpenTelemetry or, more generally, distributed tracing?
From my point of view:

  • Kepler's workflow is not a distributed transaction.
  • Kepler scans at the eBPF level, running on a k8s node on a schedule. Does that mean we need to create a new span for each round of scanning? For a specific metric?

It's good to support different output formats, but may I know what the difference is between OpenTelemetry and Prometheus? I hope this is the correct document.

If the document above is correct, can anyone help find a sample where Prometheus consumes the OpenMetrics format? Otherwise, it looks like a one-way path from Prometheus to OpenMetrics.
Is there any sample/application that supports both OpenMetrics and Prometheus?

Hence, to avoid misunderstanding, I suppose we'd better rename this issue to "add OpenMetrics support"?

@brunobat

OpenTelemetry (OTel) is not just about tracing. It includes metrics and logs... More to come in the future.
Providing OTel metrics output would potentially allow cross-correlating the metrics generated here with application metrics. Mind that OTel supports multiple programming languages and is quickly becoming the de facto standard for telemetry.
This would be super useful.

@rootfs
Contributor

rootfs commented Dec 20, 2022

@sallyom has some early PoC with this

@SamYuan1990
Collaborator

SamYuan1990 commented Dec 20, 2022

@sallyom has some early PoC with this

I would like to see the PoC and evaluate it further, to see whether it's ready for us at the implementation level or not.
A while ago, in open-telemetry/opentelemetry-go#2742, I asked about migrating from Jaeger to OTel, but at the time it did not seem ready.

Hence, if migrating to OTel means too much effort and too many dependencies, I would rather we wait until OTel is ready with a UI/dashboard, to make sure Kepler's users have the same UX with Prometheus and OTel.
Some key features/points for discussion:

  • an operator (like the Prometheus Operator)
  • operator integration (like the ServiceMonitor of the Prometheus Operator)
  • an SDK for metrics
  • a UI for dashboards (so far we also provide a Grafana dashboard for visualization)
  • Prometheus compatibility

@bertysentry

bertysentry commented Dec 20, 2022

@SamYuan1990 OTel won't natively have dashboards, a UI, etc. OTel defines data structures and protocols for metrics, logs and traces. They provide SDKs so that app developers can send metrics, logs and traces that can then be consumed in any OpenTelemetry-supported backend and UI: Prometheus + Grafana, or Datadog, or New Relic, or Splunk, or Dynatrace, etc.

OTel also provides a "collector", whose role is mostly to act as a proxy, relaying metrics, logs and traces from one place to another.

You can use OpenTelemetry in Kepler to export OTel metrics, that will be pushed to an OTel Collector running on the side (like a wagon), and that will export these metrics to Prometheus. This way, it's 100% compatible with the current architecture, and you don't need to rewrite your Grafana dashboards.

The benefit is that the user can easily configure the OpenTelemetry Collector to push metrics to other backends as well (Datadog, New Relic, etc.)

To answer your points:

Last but not least: it is important to follow semantic conventions. For example, you're currently exporting this metric: kepler_container_core_joules_total, which follows Prometheus conventions.

In OpenTelemetry, you will rather create a metric as:

  • type: Counter
  • name: kepler.container.core
  • unit: J (for joules)

When exported to Prometheus (using either of the Prometheus exporters from OpenTelemetry Collector Contrib), this metric will be converted to kepler_container_core_joules_total by the collector automatically (using the translator documented here).
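
As a concrete illustration, here is a minimal sketch of what that instrument could look like with the OTel Go SDK. This is not Kepler's actual code; the description string and attribute names are assumptions:

```go
package main

import (
	"context"

	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

func main() {
	ctx := context.Background()

	// A MeterProvider; readers/exporters (Prometheus, OTLP, ...) are wired in separately.
	provider := sdkmetric.NewMeterProvider()
	defer provider.Shutdown(ctx)

	meter := provider.Meter("kepler")

	// Counter "kepler.container.core" with unit "J"; the Prometheus translation
	// appends the unit and counter suffixes: kepler_container_core_joules_total.
	coreJoules, err := meter.Float64Counter(
		"kepler.container.core",
		metric.WithUnit("J"),
		metric.WithDescription("Energy consumed by CPU cores per container (illustrative description)"),
	)
	if err != nil {
		panic(err)
	}

	// Record a sample value with illustrative attributes.
	coreJoules.Add(ctx, 0.42,
		metric.WithAttributes(
			attribute.String("container_name", "example"),
			attribute.String("pod_name", "example-pod"),
		),
	)
}
```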

Hope this helps understand OpenTelemetry!

@SamYuan1990
Collaborator

@SamYuan1990 OTel won't natively have dashboards, a UI, etc. […] Hope this helps understand OpenTelemetry!

LGTM.

BTW, do you know if Prometheus has any plans to consume OTel metrics directly?

@SamYuan1990
Collaborator

SamYuan1990 commented Feb 5, 2023

I tried https://github.com/open-telemetry/opentelemetry-go/blob/main/example/prometheus/main.go and https://github.com/open-telemetry/opentelemetry-go/blob/main/example/view/main.go, and it seems that if we use OTel, it's nearly the same as Prometheus?
As the output is http://localhost:2222/metrics or http://localhost:2223/metrics in the samples, does that mean we can use Prometheus to consume OTel directly, if we don't have sum or count type conversion?

ref https://opentelemetry.io/docs/reference/specification/metrics/data-model/#point-kinds
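
For reference, here is a minimal sketch of what those linked examples wire together, assuming the go.opentelemetry.io/otel Prometheus exporter; the port and instrument name are illustrative. A stock Prometheus can scrape the resulting /metrics endpoint directly:

```go
package main

import (
	"context"
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus/promhttp"
	"go.opentelemetry.io/otel/exporters/prometheus"
	"go.opentelemetry.io/otel/metric"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

func main() {
	ctx := context.Background()

	// The Prometheus exporter is a pull-based Reader: it registers the SDK's
	// metrics with the default prometheus/client_golang registry.
	exporter, err := prometheus.New()
	if err != nil {
		log.Fatal(err)
	}
	provider := sdkmetric.NewMeterProvider(sdkmetric.WithReader(exporter))
	defer provider.Shutdown(ctx)

	meter := provider.Meter("kepler")
	counter, err := meter.Float64Counter("kepler.container.core", metric.WithUnit("J"))
	if err != nil {
		log.Fatal(err)
	}
	counter.Add(ctx, 1.0)

	// Prometheus scrapes this endpoint as usual.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":2222", nil))
}
```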

@stale

stale bot commented May 17, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

The stale bot added the wontfix label on May 17, 2023
@marceloamaral
Collaborator

@SamYuan1990 OpenTelemetry can collect metrics from Prometheus directly
https://uptrace.dev/opentelemetry/prometheus-metrics.html

So, we don't need to export OpenTelemetry metrics, right?

The stale bot removed the wontfix label on May 18, 2023
@SamYuan1990
Collaborator

@SamYuan1990 OpenTelemetry can collect metrics from Prometheus directly https://uptrace.dev/opentelemetry/prometheus-metrics.html

So, we don't need to export OpenTelemetry metrics, right?

Yeah... but as discussed offline with @rootfs in the past, if we are going to run Kepler on edge nodes (edge computing), we'd better support OpenTelemetry metrics, since for edge nodes it's better to use remote push.

@marceloamaral
Collaborator

Humm, I was not aware of this use case.
Ok, let's discuss how to move forward with this.

@bertysentry

@marceloamaral In general, we all agree it's better to use the open standard that most vendors have agreed on than just one specific technology. It will make integration with the rest of the world much smoother, and it should not add any friction when interacting with the Prometheus world.

I understand that switching from a Prometheus-based code to OpenTelemetry is quite a challenge, though!

Trivia: Did you know that OpenTelemetry takes its roots in OpenMetrics (and others), which derives directly from Prometheus? 😉

@stale

stale bot commented Aug 21, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

The stale bot added the wontfix label on Aug 21, 2023
@brunobat

I don't think this is stale.
Its scope needs to be clarified in light of #659.

@rootfs removed the wontfix label on Aug 21, 2023
@rootfs
Contributor

rootfs commented Aug 21, 2023

thanks! @brunobat

@frzifus

frzifus commented Sep 29, 2023

btw, do you know if prometheus has any idea to consume OTel metric directly?

Yes, in the latest release they added native OTLP ingestion: https://github.com/prometheus/prometheus/releases/tag/v2.47.0

@simonpasquier

👋 Prometheus team member here. For information, the Prometheus community agreed that the Prometheus client libraries will support exporting OTLP metrics directly.

From the Sep 30th, 2023 Prometheus developer summit notes:

CONSENSUS: We want to support exporting OTLP protocol metrics in our client libraries. We want to maintain Prometheus SDKs as an opinionated and efficient way to do custom metrics.

@marceloamaral
Collaborator

There was some discussion in the community meeting about the overhead of the Prometheus and OTLP clients.
The Prometheus client has better scalability.

https://github.com/danielm0hr/edge-metrics-measurements/blob/main/talks/DanielMohr_PromAgentVsOtelCol.pdf

@husky-parul
Collaborator Author

Some more experiments indicate that PromAgent + RW is less CPU-intensive than setting up OTel Collector + OTLP or OTel Collector + RW (results here), thanks to @danielm0hr.

@bertysentry has your team conducted similar benchmarks?

At the same time, I would like to add that OTel SDK instrumentation still supports Prometheus, and the scope of this integration is not limited to setting up OTel Collector + RW; it is about instrumenting Kepler using an open protocol (rather than Prometheus metrics) that supports metrics vendors other than Prometheus. Using Prometheus as a backend is not affected by this integration.

@frzifus

frzifus commented Oct 18, 2023

I did some benchmarks in the past that showed a CPU overhead on the OTel side when dealing with the OTel Prometheus receiver and exporter. But it did better on memory and network, though it also depends on the configuration of the collector.

I cannot confirm that CPU usage was higher in an OTel in/out scenario than in Prometheus scrape + RW.

Unfortunately, I do not have much time to make the setup and the results available in a way that is as understandable as https://github.com/danielm0hr/edge-metrics-measurements.

@simonpasquier

I think that we are talking about different use cases here:

  1. Export Kepler metrics in OTLP format.
  2. Instrument the Kepler exporter using the OTEL Go SDK.

IIUC the first use case can be accomplished today with the OTEL collector scraping metrics from the /metrics endpoint (and hopefully Prometheus should be able to support this natively in the future).

IMHO the second case would deserve careful evaluation because the Kepler exporter has some unique characteristics/challenges in terms of instrumentation (discussed in #439 and #365 (comment)).

@bertysentry

@simonpasquier The idea would be to use the OpenTelemetry SDK everywhere we can to produce OTLP metrics instead of Prometheus metrics.

Of course, one can use OpenTelemetry's receiver for Prometheus to export the metrics to another OpenTelemetry-supporting backend, but that's an added step along the way that we can remove.

The Prometheus server can now ingest OTLP metrics natively. This means that Kepler can use OpenTelemetry to send OTLP metrics and still use Prometheus as a backend, without any extra step, with no OpenTelemetry Collector required at all, and therefore no performance hit either.
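
To make that concrete, here is a minimal sketch of the push path, assuming Prometheus has its OTLP receiver enabled (the otlp-write-receiver feature flag introduced in v2.47.0) and using the OTel Go otlpmetrichttp exporter; the endpoint, port and interval are illustrative:

```go
package main

import (
	"context"
	"log"
	"time"

	"go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetrichttp"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

func main() {
	ctx := context.Background()

	// Push OTLP metrics straight to Prometheus' native OTLP endpoint
	// (no Collector in between). Host and port are illustrative.
	exporter, err := otlpmetrichttp.New(ctx,
		otlpmetrichttp.WithEndpoint("prometheus.example.svc:9090"),
		otlpmetrichttp.WithURLPath("/api/v1/otlp/v1/metrics"),
		otlpmetrichttp.WithInsecure(),
	)
	if err != nil {
		log.Fatal(err)
	}

	provider := sdkmetric.NewMeterProvider(
		sdkmetric.WithReader(
			sdkmetric.NewPeriodicReader(exporter, sdkmetric.WithInterval(30*time.Second)),
		),
	)
	defer provider.Shutdown(ctx)

	// Instruments created from this meter are exported periodically over
	// OTLP/HTTP and stored directly by Prometheus.
	_ = provider.Meter("kepler")
}
```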

@simonpasquier

simonpasquier commented Oct 19, 2023

The performance issue I'm referring to was with the Prometheus client_golang library, and one would need to verify that the OTel SDK provides good performance given the very special nature of the Kepler exporter.

@stale

stale bot commented Dec 18, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

The stale bot added the wontfix label on Dec 18, 2023
The stale bot closed this as completed on Dec 26, 2023