Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't call OnConnectFailed when failed to connect during shutdown #113

Merged

Conversation

cpheps
Copy link
Contributor

@cpheps cpheps commented Jul 27, 2022

Fixes #111

Once Stop() is called on the websocket client we don't call OnConnectFailed handler if a dial fails.

Tested this fix on a system that was repeatedly showing the OnConnectFailed handler called during shutdown.

@cpheps cpheps requested a review from a team July 27, 2022 12:21
@codecov
Copy link

codecov bot commented Jul 27, 2022

Codecov Report

Merging #113 (aaac67e) into main (be3f3ff) will not change coverage.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main     #113   +/-   ##
=======================================
  Coverage   74.95%   74.95%           
=======================================
  Files          22       22           
  Lines        1633     1633           
=======================================
  Hits         1224     1224           
  Misses        307      307           
  Partials      102      102           
Impacted Files Coverage Δ
client/wsclient.go 84.56% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update be3f3ff...aaac67e. Read the comment docs.

Copy link
Member

@tigrannajaryan tigrannajaryan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cpheps can you add a test that creates the race condition and fails without this fix, but passes after the fix?

@cpheps
Copy link
Contributor Author

cpheps commented Jul 27, 2022

@tigrannajaryan I can try. I'm not sure how possible it is since you need to Stop the opamp client at certain point. I don't know if it's possible right now to force the client to block so we can hit that point.

@tigrannajaryan
Copy link
Member

tigrannajaryan commented Jul 27, 2022

@tigrannajaryan I can try. I'm not sure how possible it is since you need to Stop the opamp client at certain point. I don't know if it's possible right now to force the client to block so we can hit that point.

OK, please try. Ideally we want to clearly demonstrate that the PR really fixes the problem and an automated test helps a lot.

If you find that it is impossible to write a test we can merge as is.

@cpheps
Copy link
Contributor Author

cpheps commented Jul 27, 2022

@tigrannajaryan I've been messing around with this and I can't seem to get it consistent. I was able to write a test that triggered the race condition like 1 in 5 runs. The race conditions lies inside the runUntilStopped which runs concurrently on start. We have no injection point currently to have it hang.

The test I have was starting the client, waiting a few milliseconds, stopping it, the checking if OnConnectFailedFunc was called. This is probably a good test just in general but it doesn't isolate/prove the race condition was solved.

There's only one other test for the websocket implementation that shows OnConnectFailedFunc is called when the sever closes the connection. So maybe a test showing we don't call OnConnectFailedFunc on a client connection close is good, but like I said doesn't highly the race condition fix.

@tigrannajaryan
Copy link
Member

OK, let's merge this as is and maybe we can add more tests as part of #99

@tigrannajaryan tigrannajaryan merged commit e3007c5 into open-telemetry:main Jul 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Websocket Race Condition when closing
2 participants