spanner: out of range [0] with length 0 when OTMetrics enabled after client creation #9740

Open
tamayika opened this issue Apr 10, 2024 · 3 comments · May be fixed by #11496
Labels
api: spanner Issues related to the Spanner API. priority: p2 Moderately-important priority. Fix may not be included in next release.

Comments

@tamayika
Client

Spanner

Environment

N/A

Go Environment

N/A

Code

e.g.

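package main

import (
	"context"

	"cloud.google.com/go/spanner"
	"go.opentelemetry.io/otel/sdk/metric"
)

func main() {
	ctx := context.Background()
	client, err := spanner.NewClientWithConfig(ctx, "", spanner.ClientConfig{
		OpenTelemetryMeterProvider: metric.NewMeterProvider(),
	})
	if err != nil {
		panic(err)
	}
	defer client.Close()

	// Called after client creation: this is the ordering that triggers the panic.
	spanner.EnableOpenTelemetryMetrics()

	// Query returns a *spanner.RowIterator, not an error; iterate to run the statement.
	iter := client.ReadOnlyTransaction().Query(ctx, spanner.NewStatement("SELECT 1"))
	defer iter.Stop()
	if _, err := iter.Next(); err != nil {
		panic(err)
	}
}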

Expected behavior

No panic

Actual behavior

The client panics with "index out of range [0] with length 0" at https://github.com/googleapis/google-cloud-go/blob/main/spanner/ot_metrics.go#L245.
Screenshots

N/A

Additional context

I'm using the Spanner Emulator, so responses carry no server-timing metadata.
In addition, client.otConfig is always empty for a client created before EnableOpenTelemetryMetrics is called.
Our codebase is a modular monolith: one service creates its client without calling EnableOpenTelemetryMetrics, another service calls EnableOpenTelemetryMetrics before creating its own client, and the application panics.

The code should be

	if len(md.Get("server-timing")) == 0 {
		if otConfig.gfeHeaderMissingCount != nil {
			otConfig.gfeHeaderMissingCount.Add(ctx, 1, metric.WithAttributes(attr...))
		}
		return nil
	}
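
To illustrate why the unconditional early return matters, here is a minimal, standard-library-only sketch. It is not the library's code: `counter`, `otConfig`, `recordUnguarded`, and `recordGuarded` are hypothetical stand-ins, a plain `map[string][]string` plays the role of the response metadata, and the unguarded variant is only an assumption about how an empty header list could reach an index expression.

```go
package main

import "fmt"

// counter is a hypothetical stand-in for an OpenTelemetry counter instrument.
type counter struct{ n int }

func (c *counter) Add(delta int) { c.n += delta }

// otConfig mimics a per-client config whose instruments stay nil when
// metrics were not enabled at client creation time (assumption).
type otConfig struct {
	gfeHeaderMissingCount *counter
}

// recordUnguarded returns early only when the header is missing AND the
// counter exists, so a nil counter falls through to the index expression.
func recordUnguarded(md map[string][]string, cfg *otConfig) string {
	if len(md["server-timing"]) == 0 && cfg.gfeHeaderMissingCount != nil {
		cfg.gfeHeaderMissingCount.Add(1)
		return ""
	}
	return md["server-timing"][0] // panics when the slice is empty
}

// recordGuarded always returns early when the header is missing; bumping
// the counter is an optional side effect.
func recordGuarded(md map[string][]string, cfg *otConfig) string {
	if len(md["server-timing"]) == 0 {
		if cfg.gfeHeaderMissingCount != nil {
			cfg.gfeHeaderMissingCount.Add(1)
		}
		return ""
	}
	return md["server-timing"][0]
}

// panics reports whether f panicked.
func panics(f func()) (p bool) {
	defer func() { p = recover() != nil }()
	f()
	return false
}

func main() {
	md := map[string][]string{} // emulator-like response: no server-timing
	cfg := &otConfig{}          // client created before metrics were enabled

	fmt.Println("unguarded panics:", panics(func() { recordUnguarded(md, cfg) }))
	fmt.Println("guarded panics:", panics(func() { recordGuarded(md, cfg) }))
}
```

With an empty metadata map and a nil counter, the unguarded variant hits the index expression and panics, while the guarded variant returns without touching the slice.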
@tamayika tamayika added the triage me I really want to be triaged. label Apr 10, 2024
@product-auto-label product-auto-label bot added the api: spanner Issues related to the Spanner API. label Apr 10, 2024
@rahul2393
Contributor

rahul2393 commented Apr 10, 2024

Related ticket: #9519. spanner.EnableOpenTelemetryMetrics() enables OpenTelemetry for all clients in the application and should be called before any client is initialized. A panic is not expected here, though, and I am working on a fix in #9657. As a short-term workaround, please move spanner.EnableOpenTelemetryMetrics() before spanner.NewClientWithConfig.

@tamayika
Author

I think there should be several ways to enable OTel metrics:

  • spanner.EnableOpenTelemetryMetrics() activates metrics for all clients. ClientConfig.OpenTelemetryMeterProvider is optional (i.e., the global meter provider is used if it is nil).
  • When ClientConfig.OpenTelemetryMeterProvider is set but spanner.EnableOpenTelemetryMetrics() is not called, metrics are activated only for that client.
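
The two options above could be sketched roughly as follows. This is a standard-library-only illustration of the proposed semantics, not an existing API: `meterProvider`, `globalProvider`, and `resolveProvider` are hypothetical names, and a nil result stands for "metrics disabled for this client".

```go
package main

import "fmt"

// meterProvider is a hypothetical stand-in for an OpenTelemetry MeterProvider.
type meterProvider struct{ name string }

// globalProvider stands in for the process-wide meter provider.
var globalProvider = &meterProvider{name: "global"}

// resolveProvider sketches the proposal: a provider set on the client always
// activates metrics for that client; otherwise the global provider is used
// only when metrics were enabled globally. A nil result means metrics stay
// disabled for that client.
func resolveProvider(globallyEnabled bool, clientProvider *meterProvider) *meterProvider {
	if clientProvider != nil {
		return clientProvider
	}
	if globallyEnabled {
		return globalProvider
	}
	return nil
}

func main() {
	custom := &meterProvider{name: "custom"}

	fmt.Println(resolveProvider(true, nil).name)     // global
	fmt.Println(resolveProvider(false, custom).name) // custom
	fmt.Println(resolveProvider(false, nil) == nil)  // true
}
```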

@rahul2393 rahul2393 added priority: p2 Moderately-important priority. Fix may not be included in next release. and removed triage me I really want to be triaged. labels Apr 15, 2024
@codygibb
codygibb commented Jan 23, 2025

Managing the sequencing of a global spanner.EnableOpenTelemetryMetrics() call and any spanner.NewClientWithConfig call is somewhat impractical for applications built on shared libraries. The only way to get this 100% right is to move the spanner.EnableOpenTelemetryMetrics() call into func main(), to guarantee that any downstream spanner.NewClientWithConfig calls run against a consistent global state. This is essentially an abstraction leak: any application using a Spanner library needs to remember to manually enable otel metrics.

I would love to see this get fixed without putting the onus on applications to sequence their calls correctly. IMO the problem with the recordGFELatencyMetricsOT function is that it dynamically checks the value of the mutable openTelemetryMetricsEnabled global variable, instead of exclusively relying on immutable state. This is how we end up in weird states where otConfig isn't fully initialized, but somehow otel metrics are enabled.

One way to fix this is to only load the global openTelemetryMetricsEnabled variable exactly once on each client's creation (i.e. snapshot the value), and then from that point on, the client should rely on the snapshotted value for its entire lifetime: either otel metrics are enabled and otConfig is fully initialized, or otel metrics are disabled and otConfig isn't used. Basically, it's a bug to call IsOpenTelemetryMetricsEnabled() more than once per client, since the value can change between calls.
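
A minimal sketch of that snapshot idea, using only the standard library. All names here (`metricsEnabled`, `enableMetrics`, `instruments`, `client`, `newClient`, `recordHeaderMissing`) are hypothetical stand-ins rather than the library's actual types; the point is only that the global flag is read once, at client creation, and never again.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// metricsEnabled is a hypothetical stand-in for the mutable global flag.
var metricsEnabled atomic.Bool

func enableMetrics() { metricsEnabled.Store(true) }

// instruments stands in for a fully initialized otConfig.
type instruments struct{ headerMissing int }

// client snapshots the flag once; instruments is non-nil exactly when the
// snapshot is true, so the two can never disagree.
type client struct {
	metricsOn   bool
	instruments *instruments
}

func newClient() *client {
	c := &client{metricsOn: metricsEnabled.Load()} // read the global exactly once
	if c.metricsOn {
		c.instruments = &instruments{}
	}
	return c
}

// recordHeaderMissing consults only the per-client snapshot, never the global.
func (c *client) recordHeaderMissing() {
	if !c.metricsOn {
		return
	}
	c.instruments.headerMissing++
}

func main() {
	early := newClient() // created before metrics are enabled
	enableMetrics()
	late := newClient() // created after metrics are enabled

	early.recordHeaderMissing() // no-op, and no nil dereference
	late.recordHeaderMissing()

	fmt.Println(early.metricsOn, early.instruments == nil)
	fmt.Println(late.metricsOn, late.instruments.headerMissing)
}
```

Because each client decides once, a later call to the global switch cannot put an existing client into the half-initialized state described above; the trade-off is that clients created earlier never start reporting metrics.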
