Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat/add distributed tracing support #32

Merged
merged 38 commits into from
Jan 18, 2024

Conversation

oodigie
Copy link

@oodigie oodigie commented Oct 10, 2023

Changes:

  • chore: bumped up the packaged ccsmp lib to 7.27.0.1
  • feat(SOL-103546): Implement code changes to expose context propagation methods based on ARCH
  • test: added unit tests for the new DT functions in the message_impl
  • test[SOL-103572]: Golang: Add the integration tests to support the context propagation feature
  • test: fixed some failing integration tests
  • test: fixed the failing tests in the message impl class

Comment on lines 117 to 148
// GetCreationTraceContext will return the trace context metadata used for distributed message tracing message
// creation context information across service boundaries.
// It allows correlating the producer with the consumers of a message, regardless of intermediary
// instrumentation. It must not be altered by intermediaries.
// If the content is not accessible, an empty slice will be returned and the ok flag will be false.
GetCreationTraceContext() (traceID [16]byte, spanID [8]byte, sampled bool, traceState string, ok bool)

// SetTraceContext will set creation trace context metadata used for distributed message tracing.
// Creation context considered to be immutable, and should not be set multiple times.
// If the content could not be set into the message, the ok flag will be false.
SetCreationTraceContext(traceID [16]byte, spanID [8]byte, sampled bool, traceState string) (ok bool)

// GetTransportTraceContext will return the trace context metadata used for distributed message tracing
// It allows correlating the producer and the consumer with an intermediary.
// It also allows correlating multiple intermediaries among each other.
// When no transport context is present it may return a creation context when available as
// an initial transport context.
// If the content is not accessible, an empty slice will be returned and the ok flag will be false.
GetTransportTraceContext() (traceID [16]byte, spanID [8]byte, sampled bool, traceState string, ok bool)

// SetTraceContext will set transport trace context metadata used for distributed message tracing.
// If the content could not be set into the message, the ok flag will be false.
SetTransportTraceContext(traceID [16]byte, spanID [8]byte, sampled bool, traceState string) (ok bool)

// GetBaggage will return the baggage string associated with the message
// If the content is not accessible, an empty slice will
// be returned and the ok flag will be false.
GetBaggage() (baggage string, ok bool)

// SetBaggage will set the baggage string associated with the message
SetBaggage(baggage string) error

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we wanted to expose the interfaces to the world in the Message interface. I think we decided to keep the functions public but hidden to deter users from using these directly. @mgevantmakher can confirm.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, I will make the update to remove these

// It allows correlating the producer with the consumers of a message, regardless of intermediary
// instrumentation. It must not be altered by intermediaries.
// If the content is not accessible, an empty slice will be returned and the ok flag will be false.
GetCreationTraceContext() (traceID [16]byte, spanID [8]byte, sampled bool, traceState string, ok bool)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all otel methods need to be removed from a public interface

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these will be removed

@mcardy
Copy link
Collaborator

mcardy commented Oct 16, 2023

@oodigie looks like there is a failing test from the last push https://github.com/SolaceDev/pubsubplus-go-client/actions/runs/6490386492/job/17626071508?pr=32 which appears to be a newly added test.

Remote Message Tests
/home/runner/work/pubsubplus-go-client/pubsubplus-go-client/test/message_test.go:698
  Published and received message with Distributed Tracing support
  /home/runner/work/pubsubplus-go-client/pubsubplus-go-client/test/message_test.go:1987
    [It] should be able to publish/receive a message with different creation context and transport context

@oodigie
Copy link
Author

oodigie commented Oct 31, 2023

The failing test has been updated

Copy link

@cjwmorgan-sol cjwmorgan-sol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far the source looks good to me but I want to compare the tests in solace otel integration project. To ensure nothing is missing.

There is a conflict with dev so that should be resolved as well.

The last jenkins run seems to be having timing issues in the integration tests on linux arm64 along with a test failure on intel linux.
The intel test failure was for test:

Main Suite.[It] DirectReceiver with a messaging service that will be disconnected terminates the receiver using async receive when disconnecting with messaging service disconnect"

Based on the time stamps in the linux arm64 test the initial failure was caused by a known issue in the solace broker where tls truststore updates cause service instabliity in the broker. With the follow tests (based on timestamps) execute immediately afterwards and even smf client connections got interrupted. The broker instability might affect more then just semp services now in the broker. After enough time connections can be established and test start passing again.

Will continue after I have a better understand the otel field behvaiour from the otel intergration project in pr https://github.com/SolaceDev/pubsubplus-opentelemetry-go-integration/pull/3.

@oodigie
Copy link
Author

oodigie commented Jan 12, 2024

Okay, thanks @cjwmorgan-sol . I have merged changes from dev into this branch and resolved the conflicts. We can investigate the failing integration tests further but I do not think they are related to the changes in this PR.

They are mostly related to OAuth/SSL - http://jenkins2/job/solacedev.pubsubplus-go-client/job/feat%252Fadd-distributed-tracing-support/25/#showFailuresLink

@oodigie oodigie requested a review from cjwmorgan-sol January 18, 2024 18:51
Copy link

@cjwmorgan-sol cjwmorgan-sol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good to me.
The trace state reset logic is implemented in the otel integration component.
All test result are consistent with dev.

This is ready for merge.

@cjwmorgan-sol cjwmorgan-sol merged commit f246d5f into dev Jan 18, 2024
4 checks passed
@cjwmorgan-sol cjwmorgan-sol deleted the feat/add-distributed-tracing-support branch January 18, 2024 20:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants