Bug: error cause for degraded pipeline might not be correct #1659

Open
lovromazgon opened this issue Jun 14, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@lovromazgon
Member

Bug description

Under certain conditions, the error reported as the cause of a degraded pipeline is not the error that actually made it stop.

Imagine a running pipeline that is continuously processing records. Suddenly the source connector (plugin) hits a failure and returns an error. Returning an error closes the bidirectional stream between the connector and Conduit, so records can no longer be passed to Conduit and acknowledgments can no longer be passed back to the connector.

If there are still unprocessed records in the pipeline, we have a race condition: either the source node sees the closed stream first when it tries to read the next record, or the acker node gets an error first when it tries to send an acknowledgment to the source connector. If the acker node is the first to get that error, it stops running and returns the error, which is then stored as the error that caused the stop.

While that's technically correct, that error contains just io.EOF, which is not useful to the user: it only signals that the stream stopped, not why it stopped. The actual reason for the stop is only received when reading from the stream in the source node. That error is logged, but it is not visible anywhere else (e.g. in API responses or the UI).

Steps to reproduce

I have a failing test that consistently reproduces this error. I will link it here once I push the code.

Version

v0.10.1

@lovromazgon lovromazgon added bug Something isn't working triage Needs to be triaged labels Jun 14, 2024
@lovromazgon lovromazgon removed the triage Needs to be triaged label Jun 17, 2024
@lovromazgon lovromazgon added this to the Next milestone Jun 17, 2024