Spans aren't being marked as errors in Cloud Trace #730

Open
andypwarren opened this issue Oct 4, 2023 · 13 comments
Assignees
Labels
bug Something isn't working priority: p3

Comments

@andypwarren

Hi,

I'm instrumenting a gRPC server with OpenTelemetry and Google Cloud Trace. I can see spans in my Trace dashboard, but they aren't coloured red when an RPC returns an internal error. I'm using otelgrpc.UnaryServerInterceptor() (code here), which calls span.SetStatus with the OTel error code and the gRPC message if any of these statuses is returned:

  • grpc_codes.Unknown
  • grpc_codes.DeadlineExceeded
  • grpc_codes.Unimplemented
  • grpc_codes.Internal
  • grpc_codes.Unavailable
  • grpc_codes.DataLoss
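
To make the list concrete, here is a minimal, standard-library-only sketch of the decision described above. It is not the otelgrpc source: `isServerError` is a hypothetical helper, and the constants simply restate the standard numeric values of the gRPC codes listed.

```go
package main

import "fmt"

// Numeric values of the gRPC status codes named in the list above.
const (
	Unknown          = 2
	DeadlineExceeded = 4
	Unimplemented    = 12
	Internal         = 13
	Unavailable      = 14
	DataLoss         = 15
)

// isServerError is a hypothetical helper (not part of otelgrpc). It reports
// whether a gRPC status code is one of those that, per the description above,
// cause the server interceptor to set the span status to Error.
func isServerError(code uint32) bool {
	switch code {
	case Unknown, DeadlineExceeded, Unimplemented, Internal, Unavailable, DataLoss:
		return true
	}
	return false
}

func main() {
	// 0 = OK, 5 = NotFound, 7 = PermissionDenied, 13 = Internal.
	for _, c := range []uint32{0, 5, 7, 13} {
		fmt.Printf("grpc code %d -> span status Error: %v\n", c, isServerError(c))
	}
}
```

Under this rule, client-caused codes such as NotFound or PermissionDenied leave the span status unset, so only the `rpc.grpc.status_code` attribute records them.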

I've also tried calling span.SetStatus outside the interceptors and Cloud Trace doesn't colour them red either so I don't think the problem is with the interceptor code.

I've created a simple demo app to reproduce this using the example grpc-go Greeter service with the addition of tracing using otel and cloudtrace.

When forcing a request to fail, this is what I see in Cloud Trace:

[Screenshot: Screen Shot 2023-10-04 at 16 21 05]

The interceptor has added the attribute rpc.grpc.status_code: 13, but the span status isn't showing up.

Ideally this would produce a red dot in the trace graph and the span would be coloured red.

Many thanks,

Andy

@dashpole dashpole added bug Something isn't working priority: p2 labels Oct 5, 2023
@dashpole dashpole self-assigned this Oct 5, 2023
@dashpole
Contributor

dashpole commented Oct 6, 2023

This seems suspicious...

case codes.Error:
    sp.Status = &statuspb.Status{Code: int32(codepb.Code_UNKNOWN), Message: s.Status().Description}
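
For readers who have not seen the exporter, the following standard-library-only sketch shows the shape of the mapping that the quoted line belongs to. Only the Error case comes from the snippet above; the handling of Ok and Unset is an assumption, and `otelCode`, `rpcStatus`, and `toRPCStatus` are hypothetical stand-ins for the real OTel and protobuf types.

```go
package main

import "fmt"

// otelCode is a stand-in for the OTel span status code (Unset, Error, Ok).
type otelCode int

const (
	otelUnset otelCode = iota
	otelError
	otelOk
)

// rpcStatus is a stand-in for the google.rpc.Status message.
type rpcStatus struct {
	Code    int32
	Message string
}

// google.rpc.Code values used below.
const (
	rpcOK      int32 = 0
	rpcUnknown int32 = 2
)

// toRPCStatus is a hypothetical helper. The Error branch mirrors the quoted
// exporter line; the Ok and Unset branches are assumptions for illustration.
func toRPCStatus(code otelCode, description string) *rpcStatus {
	switch code {
	case otelOk:
		return &rpcStatus{Code: rpcOK}
	case otelError:
		// OTel has a single Error status, so it maps to UNKNOWN.
		return &rpcStatus{Code: rpcUnknown, Message: description}
	default:
		// Unset: leave the status empty (assumed).
		return nil
	}
}

func main() {
	st := toRPCStatus(otelError, "internal failure")
	fmt.Printf("code=%d message=%q\n", st.Code, st.Message)
}
```

Whatever the gRPC code was, an Error span reaches Cloud Trace as UNKNOWN (2) plus the description, so the original code survives only in span attributes.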

@dashpole
Contributor

dashpole commented Oct 6, 2023

Seems potentially related to #143

@dashpole
Contributor

dashpole commented Oct 6, 2023

@aabmass do you remember why we set codes.Error to codepb.Code_UNKNOWN?

@aabmass
Contributor

aabmass commented Oct 6, 2023

Unknown represents an unknown error, along the lines of an HTTP 500 status code. Since OTel only has two possible statuses (OK and ERROR), gRPC's UNKNOWN (error) seems reasonable.

Do you know what status codes actually show red in Cloud Trace?

@dashpole
Contributor

dashpole commented Oct 6, 2023

It does seem like we are doing the right thing based on https://pkg.go.dev/google.golang.org/genproto/googleapis/rpc/code#Code

// Unknown error. For example, this error may be returned when
// a Status value received from another address space belongs to
// an error space that is not known in this address space. Also
// errors raised by APIs that do not return enough error information
// may be converted to this error.
//
// HTTP Mapping: 500 Internal Server Error
Code_UNKNOWN Code = 2

> Do you know what status codes actually show red in Cloud Trace?

I'll see if I can find the answer to that question.

@dashpole
Contributor

dashpole commented Oct 6, 2023

I tested all status codes, and none appear to make the span look like an error.

@dashpole
Contributor

dashpole commented Oct 6, 2023

I'll reach out to the trace UI team.

@BradleyChatha

BradleyChatha commented Oct 10, 2023

For further context, the way we're currently getting around this is by setting the attribute /http/status_code to 500, regardless of whether the context is for an HTTP server or not.

It seems to be the only way to make the trace UI render it as an error.

@andypwarren
Author

Hi @dashpole, is there any update on this?

@dashpole
Contributor

The Cloud Trace folks are aware of the issue, and suggested the same workaround pointed out above: #730 (comment). I'm not sure about timelines, but I'll post here when there are updates.

@shraddhaag

+1. We are facing this problem as well. Thanks for pointing to the workaround!

@aabmass
Contributor

aabmass commented Jul 29, 2024

Lowering to p3 since the workaround is sufficient.

@nikolaydubina

nikolaydubina commented Nov 6, 2024

The workaround is odd. Adding an HTTP status code just so Google Trace can recognise an error is not good. For example, for a custom span (e.g. span kind worker or consumer, in OpenTelemetry lingo), Google Trace is not helpful. At least being able to configure what the user treats as an error would help. For example, in Grafana this is possible.

  1. Using a custom solution just so Google Trace works is extra effort for developers and bad for cross-vendor compatibility.
  2. OpenTelemetry already defines very simple, minimalistic Ok and Error span statuses that work for any language, out of the box, for all systems that use OpenTelemetry. Why not use them?

I spent some time finding where the error happened by looking at the span attributes and finding gRPC status 7. No colours at all, all blue. Very hard to read. Google Trace needs to improve.

Finding error traces is a very important user feature. I hope you guys at Google realise that if this does not work, this is a 👎🏻 for the Google Trace Explorer product.


6 participants