fix: don't panic when logs waits for more than 5 seconds #947
Conversation
@martin-sucha thanks for this fix. I was taking a look at #366 yesterday, and they seem related. Could you check if they overlap? 🙏
@mdelapenya It seems to me that these two PRs are not overlapping. #947 handles retries, removal of the panic, and interrupting the HTTP request when the context is canceled (internally in the worker goroutine), while #366 handles API changes such as allowing only one producer. I think both should be merged. There is a small code conflict since both PRs touch the stop channel, but that should be easy to resolve: #366 creates a new stop channel each time the method is called, and #947 uses a context to handle cancellation inside the worker goroutine, so the merged result would create the stop channel on each call and use it to create the context used for cancellation.
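For illustration, here is a minimal sketch of that merged shape, with assumed names (producer, stop, start, work) rather than the actual testcontainers-go API:

```go
import "context"

// producer is an illustrative stand-in, not the real testcontainers-go type.
type producer struct {
	stop chan bool
}

// start sketches the merged behaviour: a fresh stop channel per call (#366)
// plus context-based cancellation inside the worker goroutine (#947).
func (p *producer) start(ctx context.Context, work func(context.Context)) {
	p.stop = make(chan bool) // new stop channel on every call

	ctx, cancel := context.WithCancel(ctx)
	go func() {
		<-p.stop // released by a send on, or a close of, the stop channel
		cancel() // interrupt the in-flight logs HTTP request
	}()

	go work(ctx) // worker keeps retrying until the context is canceled
}
```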
docker.go
Outdated
```go
ctx, cancel := context.WithCancel(ctx)

go func() {
	<-c.stopProducer
```
Probably too picky, as the stopProducer channel is private and not accessible from client code, but should we wait for a true value in stopProducer, or is simply reading from the channel enough?
Reading from the channel should be enough; with the changes in main, it is now even guaranteed that the write will happen once.
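As a tiny sketch of why a plain receive suffices (assumed names, not the library code): the receive is released by any send on the channel, regardless of the value, and would also be released by a close.

```go
import "context"

// waitAndCancel returns as soon as anything arrives on stop (true or false)
// or the channel is closed; the value itself is never inspected.
func waitAndCancel(stop <-chan bool, cancel context.CancelFunc) {
	<-stop
	cancel() // interrupt the in-flight logs request
}
```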
I added a comment when reading the stopProducer channel, but apart from that, this PR LGTM. Will wait for your feedback before merging it.
Thank you very much for your time here! 👏
@martin-sucha we merged #366, so I think you can continue with this one by resolving the conflicts 🙏
@martin-sucha if you don't mind, I'm going to fetch this PR and resolve the conflicts myself on top of your commits. Is that ok with you?
Force-pushed from 92a7343 to ff6912f
Sure, no problem. In the meantime I started signing all my commits, so I've rebased this PR anyway and signed the commit as well.
This removes the panic when the logs endpoint takes more than 5 seconds to respond. The panic happened at least with Podman when no new logs appear while using the follow and since parameters. We keep retrying until the context is canceled (the retry request would fail anyway with a canceled context) or the producer is stopped, whichever comes first. This makes the retry behavior consistent with the handling of closed connections. This should fix testcontainers#946
Force-pushed from ff6912f to 97ebbaa
@martin-sucha thanks for the fix and your patience while reviewing this PR!
fix: don't panic when logs waits for more than 5 seconds (testcontainers#947)

This removes the panic when the logs endpoint takes more than 5 seconds to respond. The panic happened at least with Podman when no new logs appear while using the follow and since parameters. We keep retrying until the context is canceled (the retry request would fail anyway with a canceled context) or the producer is stopped, whichever comes first. This makes the retry behavior consistent with the handling of closed connections. This should fix testcontainers#946
Hey @martin-sucha, I'm experiencing some flakiness in a log_consumer test after this PR was merged. This is how I'm able to reproduce it:
for x in {1..20}; do go test -timeout 600s -run ^Test_StartStop$ github.com/testcontainers/testcontainers-go -count=1; done

It doesn't matter whether you see log traces about Ryuk being disabled; I can reproduce it in both cases.

Expected result:
Actual result:
I'm tempted to revert the commit so that we can perform a new release, and take the chance to debug this afterwards, but I wanted to check with you first.
@martin-sucha I think I have a clue about what's going on: the test does its assertions, and because we changed how the log producers behave here, it's very likely that the messages are not there yet at assertion time. If you take a look at https://github.com/testcontainers/testcontainers-go/pull/947/files#diff-b68159b6168f099a639e8c6f96d75f23333ab0f0505ee22a98268748db17ae4cR635-R637, we are removing the channel that was used to receive the signal that the log producer was stopped. Therefore, we are only waiting for the Done method of the context: https://github.com/testcontainers/testcontainers-go/pull/947/files#diff-b68159b6168f099a639e8c6f96d75f23333ab0f0505ee22a98268748db17ae4cR663 Those are my initial suspects, but it's not easy to debug. Wdyt, anything off the top of your head?
Revert "fix: don't panic when logs waits for more than 5 seconds (testcontainers#947)"

This reverts commit 5185956.
* main:
  docs: enrich docs for modules (testcontainers#1167)
  chore: prepare for next minor development cycle (0.21.0)
  chore: use new version (v0.20.1) in modules and examples
  Revert "fix: don't panic when logs waits for more than 5 seconds (testcontainers#947)" (testcontainers#1164)
  fix: define a two-phase release process (testcontainers#1163)
  ci(lint): enable misspell and gci linters (testcontainers#1162)
@mdelapenya I've finally looked at the issue and it seems to me that the failing test was wrong: it did not wait for the log lines to appear. Please see #1278.
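For context, a hedged sketch of the kind of wait the test needed (the helper name and timings are assumptions, not necessarily what #1278 does): poll until the consumer has seen the expected number of lines instead of asserting right after stopping the producer.

```go
import (
	"testing"
	"time"
)

// waitForLines polls until count() reaches want or a deadline expires,
// so assertions only run once the log consumer has caught up.
func waitForLines(t *testing.T, count func() int, want int) {
	t.Helper()
	deadline := time.Now().Add(10 * time.Second)
	for count() < want {
		if time.Now().After(deadline) {
			t.Fatalf("expected %d log lines, got %d", want, count())
		}
		time.Sleep(100 * time.Millisecond)
	}
}
```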
This should fix #946
What does this PR do?
This removes the panic when the logs endpoint takes more than 5 seconds to respond. The panic happened at least with Podman when no new logs appear while using the follow and since parameters.
We keep retrying until the context is canceled (the retry request would fail anyway) or the producer is stopped, whichever comes first. This makes the retry behavior consistent with the handling of closed connections.
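A hedged sketch of that retry loop (retryLogs, stop and fetch are illustrative names, not the actual worker implementation):

```go
import (
	"context"
	"time"
)

// retryLogs keeps calling fetch until it succeeds, the caller's context is
// canceled, or the producer is stopped, whichever comes first.
func retryLogs(ctx context.Context, stop <-chan bool, fetch func(context.Context) error) {
	for {
		if err := fetch(ctx); err == nil {
			return
		}
		select {
		case <-ctx.Done():
			return // canceled: the next request would fail anyway
		case <-stop:
			return // producer stopped explicitly
		case <-time.After(time.Second): // brief backoff before retrying
		}
	}
}
```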
Why is it important?
Panicking brings down the whole test suite, so all my tests break if the container ends early or there is some temporary error while fetching the logs.
Related issues