Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[opampextension] fix blocking agent shutdown due to unclosed channel #36754

Merged
merged 13 commits into from
Dec 11, 2024

Conversation

bacherfl
Copy link
Contributor

@bacherfl bacherfl commented Dec 10, 2024

Description

The changes introduced in #35892 seemed to have introduced some flakyness in the opampsupervisor e2e tests, as the shutdown of the opamp agent waits for the component health loop to end. Due to an unclosed channel within the opamp agent however, the agent does not properly shut down, and the supervisor runs into a timeout before ultimately sending a SIGKILL to the agent process.
Closing the channel in the Shutdown method of the opamp extension fixes that and the agent gets shut down properly upon the reception of the SIGINT signal

Link to tracking Issue:

Fixes #36764

Testing

This fixes the failing test mentioned in the issue (#36764)

@bacherfl bacherfl changed the title [opampextension] do not wait for componentStatusLoop to end during shutdown chore(opampextension) do not wait for componentStatusLoop to end during shutdown Dec 10, 2024
Signed-off-by: Florian Bacher <[email protected]>
Signed-off-by: Florian Bacher <[email protected]>
@bacherfl bacherfl changed the title chore(opampextension) do not wait for componentStatusLoop to end during shutdown [chore][opampextension] do not wait for componentStatusLoop to end during shutdown Dec 10, 2024
Signed-off-by: Florian Bacher <[email protected]>
Signed-off-by: Florian Bacher <[email protected]>
Signed-off-by: Florian Bacher <[email protected]>
Signed-off-by: Florian Bacher <[email protected]>
Signed-off-by: Florian Bacher <[email protected]>
@bacherfl bacherfl changed the title [chore][opampextension] do not wait for componentStatusLoop to end during shutdown [opampextension] fix blocking agent shutdown due to unclosed channel Dec 11, 2024
@bacherfl bacherfl marked this pull request as ready for review December 11, 2024 07:22
@bacherfl bacherfl requested a review from a team as a code owner December 11, 2024 07:22
@songy23 songy23 added ready to merge Code review completed; ready to merge by maintainers ci-cd CI, CD, testing, build issues labels Dec 11, 2024
Copy link
Contributor

@codeboten codeboten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @bacherfl

@codeboten
Copy link
Contributor

Pinging code owners @portertech @evan-bradley @tigrannajaryan for a final review

@evan-bradley evan-bradley added the Run Windows Enable running windows test on a PR label Dec 11, 2024
@evan-bradley
Copy link
Contributor

One of the checks appears to be stuck, and I can't merge in main, so I put the Run Windows label on this PR, which isn't a bad idea anyway since the Supervisor tests deal with process management.

@codeboten
Copy link
Contributor

One of the checks appears to be stuck, and I can't merge in main, so I put the Run Windows label on this PR, which isn't a bad idea anyway since the Supervisor tests deal with process management.

It looks like the tests are running now, thanks for the reviews!

@codeboten
Copy link
Contributor

Failing test might be related to #36772

@evan-bradley evan-bradley removed the Run Windows Enable running windows test on a PR label Dec 11, 2024
@codeboten codeboten merged commit 338955c into open-telemetry:main Dec 11, 2024
206 of 210 checks passed
@github-actions github-actions bot added this to the next release milestone Dec 11, 2024
sbylica-splunk pushed a commit to sbylica-splunk/opentelemetry-collector-contrib that referenced this pull request Dec 17, 2024
…pen-telemetry#36754)

<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description

The changes introduced in
open-telemetry#35892
seemed to have introduced some flakyness in the opampsupervisor e2e
tests, as the shutdown of the opamp agent waits for the component health
loop to end. Due to an unclosed channel within the opamp agent however,
the agent does not properly shut down, and the supervisor runs into a
timeout before ultimately sending a SIGKILL to the agent process.
Closing the channel in the Shutdown method of the opamp extension fixes
that and the agent gets shut down properly upon the reception of the
SIGINT signal

#### Link to tracking Issue:

Fixes open-telemetry#36764

#### Testing

This fixes the failing test mentioned in the issue (open-telemetry#36764)

---------

Signed-off-by: Florian Bacher <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci-cd CI, CD, testing, build issues extension/opamp ready to merge Code review completed; ready to merge by maintainers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Failed test][cmd/opampsupervisor] TestSupervisorStopsAgentProcessWithEmptyConfigMap constantly failed
8 participants