Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade OTel from v0.87.0 to v0.96.0 #6725

Merged
merged 4 commits into from
Mar 22, 2024
Merged

Upgrade OTel from v0.87.0 to v0.96.0 #6725

merged 4 commits into from
Mar 22, 2024

Conversation

ptodev
Copy link
Contributor

@ptodev ptodev commented Mar 19, 2024

PR Description

OTel upgrade. I went through the changes in the OTel changelog. As far as I can tell, this PR updates all the relevant Agent bits. I hope I didn't miss anything.

Which issue(s) this PR fixes

Fixes #6389

Notes to the Reviewer

I tested this locally with the Static and River files according to our docs.

There are some additional things I could do:

  • Add tests for the new config arguments.
  • There are a few TODOs in the changelog which I need to take a look at.
  • When the fork PR is merged, I need to update the go.mod file again.
  • Update config converters?

PR Checklist

  • CHANGELOG.md updated
  • Documentation added
  • Tests updated
  • Config converters updated

@ptodev ptodev requested a review from clayton-cornell as a code owner March 19, 2024 13:49
@ptodev ptodev force-pushed the ptodev/update-otel branch from 8f97b37 to a1d9f72 Compare March 19, 2024 13:54
@wildum wildum self-assigned this Mar 20, 2024
Copy link
Contributor

@wildum wildum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking care of this! I only have a few comments, nothing major. With this review I only checked the added code. I did not go through all opentelemetry release changes to see if something was missing

| `metrics_flush_interval` | `duration` | How often to flush generated metrics. | `"15s"` | no |
| `namespace` | `string` | Metric namespace. | `""` | no |
| `resource_metrics_cache_size` | `number` | The size of the cache holding metrics for a service. | `1000` | no |
| `resource_metrics_key_attributes` | `list(string)` | Span resources with the same values for those resource attributes are aggregated together. | `[]` | no |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I did not find this sentence very clear "Span resources with the same values for those resource attributes are aggregated together."

internal/component/otelcol/connector/spanmetrics/types.go Outdated Show resolved Hide resolved
* [otelcol.connector.spanmetrics] Add a new `events` metric.
https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/27451
* [otelcol.connector.spanmetrics] A new `max_per_data_point` argument for exemplar generation.
* https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/22620
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this link is wrong, it should be open-telemetry/opentelemetry-collector-contrib#29242

@@ -93,6 +102,9 @@ func (args *Arguments) Validate() error {
if args.Client.Endpoint == "" && args.TracesEndpoint == "" && args.MetricsEndpoint == "" && args.LogsEndpoint == "" {
return errors.New("at least one endpoint must be specified")
}
if args.Encoding != EncodingProto && args.Encoding != EncodingJSON {
return errors.New("invalid encoding type")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should the error also say what the valid types are?

`max_conns_per_host` | `int` | Limits the total (dialing,active, and idle) number of connections per host. | `0` | no
`idle_conn_timeout` | `duration` | Time to wait before an idle connection closes itself. | `"90s"` | no
`disable_keep_alives` | `bool` | Disable HTTP keep-alive. | `false` | no
`http2_read_idle_timeout` | `duration` | Timeout after which a health check using ping frame will be carried out if no frame is received on the connection. | `0a` | no
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could also add that if the timeout is unset or set to 0, then no health check will be performed

@wildum
Copy link
Contributor

wildum commented Mar 20, 2024

Found a missing change: in extract-field-block.md we say that "The source of the labels or annotations. Allowed values are pod and namespace." But this change in v0.89.0 adds the value "node": open-telemetry/opentelemetry-collector-contrib#28570

@wildum
Copy link
Contributor

wildum commented Mar 20, 2024

Scrolled through the rest of the open telemetry contrib release, I did not see any other missing changes, well done 👍 For the future I would recommend having atomic commits for every change to make the review easier

@ptodev
Copy link
Contributor Author

ptodev commented Mar 21, 2024

@wildum Thank you for the review! I addressed some of your comments. I plan to address the other ones in a subsequent PR.

I also took out these lines from the CHANGELOG:

  * [otelcol.connector.servicegraph] Additional debug metrics
  <!-- TODO: Document those metrics -->
https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/29917
  * [otelcol.processor.filter] Additional debug metrics
  <!-- TODO: Document those metrics -->
https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/30736
  * Scope name for all generated Meter/Tracer funcs now includes full package name
  <!-- TODO: Does this affect the Agent's metrics and logs? -->
https://github.com/open-telemetry/opentelemetry-collector/issues/9494
  * [otelcol.connector.servicegraph] Measure latency in seconds instead of milliseconds
  <!-- TODO: This is a breaking change for Static mode? -->
  <!-- TODO: Is this a breaking change for Flow? -->
https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/27488

I will review them and reintroduce them in a subsequent PR.

@ptodev ptodev force-pushed the ptodev/update-otel branch from a1d9f72 to 40a289a Compare March 21, 2024 10:43
Copy link
Contributor

@wildum wildum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second review, only two little comments

@@ -81,6 +82,7 @@ func (args *TLSSetting) Convert() *otelconfigtls.TLSSetting {
MinVersion: args.MinVersion,
MaxVersion: args.MaxVersion,
ReloadInterval: args.ReloadInterval,
CipherSuites: args.CipherSuites,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we create a new slice instead of copying it like this?

@@ -1159,6 +1159,8 @@ remote_write:
scopes: ["api.metrics"]
timeout: 2s
`,
//TODO(ptodev): Look into why we need to add a "cipher_suites: []" explicitly.
// The expected config should unmarshal it to [], but instead it sets it to nil.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems correct to me that the config would unmarshal it to nil. Else maybe you can explicitly create a list during the convert part (converting from nil to {}string[]). I think that we should avoid adding "cipher_suites: []" explicitly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems correct to me that the config would unmarshal it to nil.

I used a debugger in one of the Collector's unit tests, to see if it's nil or []. It was [], so I assumed that this is the correct thing to do. I don't know if the unit tests deviates from their prod code path, but I guess it doesn't.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the way Agent's unit test creates the Collector config isn't the same way the Collector does it. Something is different, but I'm not sure what exactly.

@ptodev ptodev force-pushed the ptodev/update-otel branch from 8f0d269 to 636041f Compare March 22, 2024 09:18
Copy link
Contributor

@wildum wildum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's ship it!

@wildum
Copy link
Contributor

wildum commented Mar 22, 2024

@clayton-cornell could you have a look at the docs when you have time? We are merging the branch because this PR is already very big and we don't want it to be stuck longer. Some of my comments and your future comments will be addressed in a subsequent PR

@ptodev ptodev merged commit da363b7 into main Mar 22, 2024
10 checks passed
@ptodev ptodev deleted the ptodev/update-otel branch March 22, 2024 09:34
@github-actions github-actions bot added the frozen-due-to-age Locked due to a period of inactivity. Please open new issues or PRs if more discussion is needed. label Apr 22, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 22, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
frozen-due-to-age Locked due to a period of inactivity. Please open new issues or PRs if more discussion is needed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Upgrade OpenTelemetry to at least versions v1.23.1/v0.94.1
2 participants