Add OpenTelemetry converter to export Prometheus metrics #97

Closed
husky-parul opened this issue Jul 19, 2022 · 29 comments
Labels
kind/feature (New feature or request), wontfix (This will not be worked on)

Comments

@husky-parul
Collaborator

Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Describe the solution you'd like
A clear and concise description of what you want to happen.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

@brunobat

brunobat commented Oct 4, 2022

I also believe that outputting data using OpenTelemetry with the right attribute semantics would make it possible to later correlate it with application-generated data.
Moreover, using OpenTelemetry across the board in the project would allow not only OTLP standard exporters but also Prometheus and many others.

@marceloamaral
Collaborator

Typically we use tracing to measure the latency across function calls.
So, I am not sure when/how/why we would use trace data...

Could you please explain in more detail why you need tracing?

@brunobat

brunobat commented Oct 4, 2022

When I mention OpenTelemetry (OTel) it's not just about tracing. Metrics themselves can be produced in the OTel format, which seems to me much more flexible than outputting metrics as time series in the Prometheus format. The main advantage is to easily integrate and correlate with app-generated metrics.
This can be used to identify, inside the apps, what the energy consumption hotspots might be.

@marceloamaral
Collaborator

It's actually on our roadmap to support exporting metrics in different formats, not just Prometheus...

We could discuss this in more detail; it would be nice if you could create a Google Doc detailing the ideas so that everyone can give feedback.

@marceloamaral added the kind/feature label on Oct 5, 2022
@bertysentry

I totally support the idea of replacing the Prometheus metrics with OpenTelemetry metrics. They can then be exported anywhere (including to Prometheus) through the OpenTelemetry Collector. It would make things more open and platform-agnostic.

@SamYuan1990
Collaborator

Wait a minute, why does Kepler need OpenTelemetry or, more generally, distributed tracing?
From my point of view:

  • Kepler's workflow is not a distributed transaction.
  • Kepler scans at the eBPF level, running on a k8s node on a schedule. Does that mean we need to create a new span for each round of scanning? For a specific metric?

It's good to support different output formats, but may I know what the difference is between OpenTelemetry and Prometheus? I hope this is the correct document.

If the document above is correct, can anyone help find a sample where Prometheus consumes the OpenMetrics format? Otherwise, it looks like a one-way path from Prometheus to OpenMetrics.
Is there any sample/application that supports both OpenMetrics and Prometheus?

Hence, to avoid misunderstanding, I suppose we'd better rename this issue to "add OpenMetrics support"?

@brunobat

OpenTelemetry (OTel) is not just about tracing. It includes metrics and logs... More to come in the future.
Providing OTel metrics output would potentially allow cross-correlating the metrics generated here with application metrics. Mind that OTel supports multiple programming languages and is quickly becoming the de facto standard for telemetry.
This would be super useful.

@rootfs
Contributor

rootfs commented Dec 20, 2022

@sallyom has some early PoC with this

@SamYuan1990
Collaborator

SamYuan1990 commented Dec 20, 2022

@sallyom has some early PoC with this

I would like to see the PoC and evaluate it further, to see whether it's ready for us at the implementation level or not.
A while ago, in open-telemetry/opentelemetry-go#2742, I asked about migrating from Jaeger to OTel, but at the time it did not seem ready.

Hence, if migrating to OTel means too much effort and too many dependencies, I would rather we wait until OTel is ready with a UI/dashboard, to make sure Kepler's users have the same UX with Prometheus and OTel.
Some key features/points for discussion:

  • an operator (like the Prometheus Operator)
  • operator integration (like the ServiceMonitor of the Prometheus Operator)
  • an SDK for metrics
  • a UI for dashboards (so far we also provide a Grafana dashboard for visualization)
  • Prometheus compatibility

@bertysentry

bertysentry commented Dec 20, 2022

@SamYuan1990 OTel won't natively have dashboards, a UI, etc. OTel defines data structures and protocols for metrics, logs and traces. They provide SDKs so that app developers can send metrics, logs and traces that can then be consumed in any OpenTelemetry-supported backend and UI: Prometheus + Grafana, or Datadog, or New Relic, or Splunk, or Dynatrace, etc.

OTel also provides a "collector", whose role is mostly to act as a proxy, relaying metrics, logs and traces from one place to another.

You can use OpenTelemetry in Kepler to export OTel metrics, that will be pushed to an OTel Collector running on the side (like a wagon), and that will export these metrics to Prometheus. This way, it's 100% compatible with the current architecture, and you don't need to rewrite your Grafana dashboards.

The benefit is that the user can easily configure the OpenTelemetry Collector to push metrics to other backends as well (Datadog, New Relic, etc.)

To answer your points:

Last but not least: it is important to follow semantic conventions. For example, you're currently exporting this metric: kepler_container_core_joules_total, which follows Prometheus conventions.

In OpenTelemetry, you will rather create a metric as:

  • type: Counter
  • name: kepler.container.core
  • unit: J (for joules)

When exported to Prometheus (using either of the Prometheus exporters from OpenTelemetry Collector Contrib), this metric will be converted to kepler_container_core_joules_total by the collector automatically (using the translator documented here).
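
As a concrete illustration, here is a minimal sketch of what that instrument could look like with the OTel Go SDK. This is not Kepler's actual code; the description string and attribute names are assumptions:

```go
package main

import (
	"context"

	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

func main() {
	ctx := context.Background()

	// A MeterProvider; readers/exporters (Prometheus, OTLP, ...) are wired in separately.
	provider := sdkmetric.NewMeterProvider()
	defer provider.Shutdown(ctx)

	meter := provider.Meter("kepler")

	// Counter "kepler.container.core" with unit "J"; the Prometheus translation
	// appends the unit and counter suffixes: kepler_container_core_joules_total.
	coreJoules, err := meter.Float64Counter(
		"kepler.container.core",
		metric.WithUnit("J"),
		metric.WithDescription("Energy consumed by CPU cores per container (illustrative description)"),
	)
	if err != nil {
		panic(err)
	}

	// Record a sample value with illustrative attributes.
	coreJoules.Add(ctx, 0.42,
		metric.WithAttributes(
			attribute.String("container_name", "example"),
			attribute.String("pod_name", "example-pod"),
		),
	)
}
```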

Hope this helps understand OpenTelemetry!

@SamYuan1990
Collaborator

@SamYuan1990 OTel won't natively have dashboards, a UI, etc. […] Hope this helps understand OpenTelemetry!

LGTM.

BTW, do you know if Prometheus has any plans to consume OTel metrics directly?

@SamYuan1990
Collaborator

SamYuan1990 commented Feb 5, 2023

I tried https://github.com/open-telemetry/opentelemetry-go/blob/main/example/prometheus/main.go and https://github.com/open-telemetry/opentelemetry-go/blob/main/example/view/main.go, and it seems that if we use OTel, it's nearly the same as Prometheus?
As the output is http://localhost:2222/metrics or http://localhost:2223/metrics in the samples, does that mean we can use Prometheus to consume OTel directly, if we don't have sum or count type conversion?

ref https://opentelemetry.io/docs/reference/specification/metrics/data-model/#point-kinds
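
For reference, here is a minimal sketch of what those linked examples wire together, assuming the go.opentelemetry.io/otel Prometheus exporter; the port and instrument name are illustrative. A stock Prometheus can scrape the resulting /metrics endpoint directly:

```go
package main

import (
	"context"
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus/promhttp"
	"go.opentelemetry.io/otel/exporters/prometheus"
	"go.opentelemetry.io/otel/metric"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

func main() {
	ctx := context.Background()

	// The Prometheus exporter is a pull-based Reader: it registers the SDK's
	// metrics with the default prometheus/client_golang registry.
	exporter, err := prometheus.New()
	if err != nil {
		log.Fatal(err)
	}
	provider := sdkmetric.NewMeterProvider(sdkmetric.WithReader(exporter))
	defer provider.Shutdown(ctx)

	meter := provider.Meter("kepler")
	counter, err := meter.Float64Counter("kepler.container.core", metric.WithUnit("J"))
	if err != nil {
		log.Fatal(err)
	}
	counter.Add(ctx, 1.0)

	// Prometheus scrapes this endpoint as usual.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":2222", nil))
}
```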

@stale

stale bot commented May 17, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

The stale bot added the wontfix label on May 17, 2023
@marceloamaral
Collaborator

@SamYuan1990 OpenTelemetry can collect metrics from Prometheus directly
https://uptrace.dev/opentelemetry/prometheus-metrics.html

So, we don't need to export OpenTelemetry metrics, right?

The stale bot removed the wontfix label on May 18, 2023
@SamYuan1990
Collaborator

@SamYuan1990 OpenTelemetry can collect metrics from Prometheus directly https://uptrace.dev/opentelemetry/prometheus-metrics.html

So, we don't need to export OpenTelemetry metrics, right?

Yeah... but as discussed offline with @rootfs in the past, if we are going to run Kepler on edge nodes (edge computing), we'd better support OpenTelemetry metrics, since for edge nodes it's better to use remote push.

@marceloamaral
Collaborator

Humm, I was not aware of this use case.
Ok, let's discuss how to move forward with this.

@bertysentry

@marceloamaral In general, we all agree it's better to use the open standard that most vendors have agreed on than just one specific technology. It will make integration with the rest of the world much smoother, and it should not add any friction when interacting with the Prometheus world.

I understand that switching from a Prometheus-based code to OpenTelemetry is quite a challenge, though!

Trivia: Did you know that OpenTelemetry takes its roots in OpenMetrics (and others), which derives directly from Prometheus? 😉

@stale

stale bot commented Aug 21, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

The stale bot added the wontfix label on Aug 21, 2023
@brunobat

I don't think this is stale.
Its scope needs to be clarified in light of #659.

@rootfs removed the wontfix label on Aug 21, 2023
@rootfs
Contributor

rootfs commented Aug 21, 2023

thanks! @brunobat

@frzifus

frzifus commented Sep 29, 2023

btw, do you know if prometheus has any idea to consume OTel metric directly?

Yes, in the latest release they added native OTLP ingestion: https://github.com/prometheus/prometheus/releases/tag/v2.47.0

@simonpasquier

👋 Prometheus team member here. For information, the Prometheus community agreed that the Prometheus client libraries will support exporting OTLP metrics directly.

From the Sep 30th, 2023 Prometheus developer summit notes:

CONSENSUS: We want to support exporting OTLP protocol metrics in our client libraries. We want to maintain Prometheus SDKs as an opinionated and efficient way to do custom metrics.

@marceloamaral
Collaborator

There was some discussion in the community meeting about the overhead of the Prometheus and OTLP clients.
The Prometheus client has better scalability.

https://github.com/danielm0hr/edge-metrics-measurements/blob/main/talks/DanielMohr_PromAgentVsOtelCol.pdf

@husky-parul
Collaborator Author

Some more experiments indicate that PromAgent + RW is less CPU-intensive than setting up OTel Collector + OTLP or OTel Collector + RW (results here), thanks to @danielm0hr.

@bertysentry has your team conducted similar benchmarks?

At the same time, I would like to add that OTel SDK instrumentation still supports Prometheus, and the scope of this integration is not limited to setting up OTel Collector + RW; it is about instrumenting Kepler using an open protocol (rather than Prometheus metrics) that supports metrics vendors other than Prometheus. Using Prometheus as a backend is not affected by this integration.

@frzifus

frzifus commented Oct 18, 2023

I did some benchmarks in the past that showed a CPU overhead on the OTel side when dealing with the OTel Prometheus receiver and exporter. But it did better on memory and network, though it also depends on the configuration of the collector.

I cannot confirm that CPU usage was higher in an OTel in/out scenario than in Prometheus scrape + RW.

Unfortunately, I do not have much time to make the setup and the results available in a way that is as understandable as https://github.com/danielm0hr/edge-metrics-measurements.

@simonpasquier

I think that we are talking about different use cases here:

  1. Export Kepler metrics in OTLP format.
  2. Instrument the Kepler exporter using the OTEL Go SDK.

IIUC the first use case can be accomplished today with the OTEL collector scraping metrics from the /metrics endpoint (and hopefully Prometheus should be able to support this natively in the future).

IMHO the second case would deserve careful evaluation because the Kepler exporter has some unique characteristics/challenges in terms of instrumentation (discussed in #439 and #365 (comment)).

@bertysentry

@simonpasquier The idea would be to use the OpenTelemetry SDK everywhere we can to produce OTLP metrics instead of Prometheus metrics.

Of course, one can use OpenTelemetry's receiver for Prometheus to export the metrics to another OpenTelemetry-supporting backend, but that's an added step along the way that we can remove.

The Prometheus server can now ingest OTLP metrics natively. This means that Kepler can use OpenTelemetry to send OTLP metrics and still use Prometheus as a backend, without any extra step, with no OpenTelemetry Collector required at all, and therefore no performance hit either.
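
To make that concrete, here is a minimal sketch of the push path, assuming Prometheus has its OTLP receiver enabled (the otlp-write-receiver feature flag introduced in v2.47.0) and using the OTel Go otlpmetrichttp exporter; the endpoint, port and interval are illustrative:

```go
package main

import (
	"context"
	"log"
	"time"

	"go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetrichttp"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

func main() {
	ctx := context.Background()

	// Push OTLP metrics straight to Prometheus' native OTLP endpoint
	// (no Collector in between). Host and port are illustrative.
	exporter, err := otlpmetrichttp.New(ctx,
		otlpmetrichttp.WithEndpoint("prometheus.example.svc:9090"),
		otlpmetrichttp.WithURLPath("/api/v1/otlp/v1/metrics"),
		otlpmetrichttp.WithInsecure(),
	)
	if err != nil {
		log.Fatal(err)
	}

	provider := sdkmetric.NewMeterProvider(
		sdkmetric.WithReader(
			sdkmetric.NewPeriodicReader(exporter, sdkmetric.WithInterval(30*time.Second)),
		),
	)
	defer provider.Shutdown(ctx)

	// Instruments created from this meter are exported periodically over
	// OTLP/HTTP and stored directly by Prometheus.
	_ = provider.Meter("kepler")
}
```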

@simonpasquier

simonpasquier commented Oct 19, 2023

The performance issue I'm referring to was with the Prometheus client_golang library, and one would need to verify that the OTel SDK provides good performance given the very special nature of the Kepler exporter.

@stale

stale bot commented Dec 18, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

The stale bot added the wontfix label on Dec 18, 2023
The stale bot closed this as completed on Dec 26, 2023