Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mismatch of units between calculated span duration and metric name #27488

Closed
bw972 opened this issue Oct 9, 2023 · 3 comments
Closed

Mismatch of units between calculated span duration and metric name #27488

bw972 opened this issue Oct 9, 2023 · 3 comments
Labels
bug Something isn't working connector/servicegraph processor/servicegraph Service graph processor

Comments

@bw972
Copy link

bw972 commented Oct 9, 2023

Component(s)

connector/servicegraph, processor/servicegraph

What happened?

Description

The service graph connector calculates and buckets the duration of spans in milliseconds but stores it in a metric with the suffix _seconds.

Steps to Reproduce

Enable the service graph connector.

Expected Result

The buckets and calculated span durations should match the unit in the metric name.

Potential Solution

The following span duration calculations are modified to return the duration in seconds:

float64(span.EndTimestamp()-span.StartTimestamp()) / float64(time.Millisecond.Nanoseconds())

e.g.

float64(span.EndTimestamp()-span.StartTimestamp()) / float64(time.Second.Nanoseconds())

Additionally, the following default buckets to be updated to seconds:

	defaultLatencyHistogramBucketsMs = []float64{
		2, 4, 6, 8, 10, 50, 100, 200, 400, 800, 1000, 1400, 2000, 5000, 10_000, 15_000,
	}

e.g.

	defaultLatencyHistogramBucketsSec = []float64{
		0.002, 0.004, 0.006, 0.008, 0.01, 0.05, 0.1, 0.2, 0.4, 0.8, 1.0, 1.4, 2.0, 5.0, 10.0, 15.0,
	}

Collector version

v0.85.0

Environment information

No response

OpenTelemetry Collector configuration

No response

Log output

No response

Additional context

No response

@bw972 bw972 added bug Something isn't working needs triage New item requiring triage labels Oct 9, 2023
@github-actions
Copy link
Contributor

github-actions bot commented Oct 9, 2023

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@crobert-1
Copy link
Member

I agree this is a valid issue. From documentation, timestamps are epoch time in nano seconds, so we need to divide by time.Second.Nanosecond as you've shared rather than time.Second.Millisecond to match the unit in the name. Examples of incorrect calculation can be found in this method.

The math has been this way since the original introduction of the processor, so users likely expect the values to be in milliseconds even though it doesn't line up with the unit name. Due to this incorrect expectation, we need to clearly communicated to users that this change occurred so they're not caught off guard.

I'd think fixing the unit is a better solution than changing metric names to match implementation. In my mind milliseconds makes a bit more sense as the latency metric, but it would be a breaking change to update them. I'll defer to code owners regarding the best path forward.

@crobert-1 crobert-1 removed the needs triage New item requiring triage label Oct 9, 2023
jpkrohling added a commit that referenced this issue Oct 24, 2023
…onds instead of milliseconds (#27665)

**Description:** <Describe what has changed.>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->

Measures latency in seconds instead of milliseconds, as the metric name
indicates. Previously, milliseconds was used. This unit is still
available via the feature gate
`processor.servicegraph.legacyLatencyUnitMs`.

This is a breaking change.

**Link to tracking Issue:** <Issue number if applicable>  #27488

**Testing:** <Describe what testing was performed and which tests were
added.> Tests are updated

**Documentation:** <Describe the documentation added.>

---------

Co-authored-by: Curtis Robert <[email protected]>
Co-authored-by: Juraci Paixão Kröhling <[email protected]>
@crobert-1
Copy link
Member

Fixed by #27665

sigilioso pushed a commit to carlossscastro/opentelemetry-collector-contrib that referenced this issue Oct 27, 2023
…onds instead of milliseconds (open-telemetry#27665)

**Description:** <Describe what has changed.>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->

Measures latency in seconds instead of milliseconds, as the metric name
indicates. Previously, milliseconds was used. This unit is still
available via the feature gate
`processor.servicegraph.legacyLatencyUnitMs`.

This is a breaking change.

**Link to tracking Issue:** <Issue number if applicable>  open-telemetry#27488

**Testing:** <Describe what testing was performed and which tests were
added.> Tests are updated

**Documentation:** <Describe the documentation added.>

---------

Co-authored-by: Curtis Robert <[email protected]>
Co-authored-by: Juraci Paixão Kröhling <[email protected]>
jmsnll pushed a commit to jmsnll/opentelemetry-collector-contrib that referenced this issue Nov 12, 2023
…onds instead of milliseconds (open-telemetry#27665)

**Description:** <Describe what has changed.>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->

Measures latency in seconds instead of milliseconds, as the metric name
indicates. Previously, milliseconds was used. This unit is still
available via the feature gate
`processor.servicegraph.legacyLatencyUnitMs`.

This is a breaking change.

**Link to tracking Issue:** <Issue number if applicable>  open-telemetry#27488

**Testing:** <Describe what testing was performed and which tests were
added.> Tests are updated

**Documentation:** <Describe the documentation added.>

---------

Co-authored-by: Curtis Robert <[email protected]>
Co-authored-by: Juraci Paixão Kröhling <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working connector/servicegraph processor/servicegraph Service graph processor
Projects
None yet
Development

No branches or pull requests

2 participants