Bug: error cause for degraded pipeline might not be correct #1659

lovromazgon · 2024-06-14T14:35:41Z

Bug description

Under certain conditions, the error reported as the cause of a degraded pipeline is not the error that actually caused the pipeline to stop.

Imagine a running pipeline that is continuously processing records. Suddenly the source connector (plugin) hits a failure and returns an error. Returning an error closes the bidirectional stream between the connector and Conduit, so records can no longer be passed to Conduit and acknowledgments can no longer be passed back to the connector.

If there are still unprocessed records in the pipeline, we have a race condition: either the source node sees the closed stream first when it tries to read the next record, or the acker node gets an error first when it tries to send an acknowledgment to the source connector. If the acker node gets the error first, it stops running and returns that error, which is then stored as the error that caused the stop.

While that's technically correct, the stored error contains just io.EOF, which is not useful to the user: it only signals that the stream stopped, not why it stopped. The actual reason for the stop is only received when reading from the stream in the source node. That error is logged, but it is not visible anywhere else (e.g. in API responses or the UI).

Steps to reproduce

I have a failing test that consistently reproduces this error. I will link it here once I push the code.

Version

v0.10.1

lovromazgon added the bug and triage labels Jun 14, 2024

simonl2002 added this to Conduit Main Jun 14, 2024

github-project-automation bot moved this to Triage in Conduit Main Jun 14, 2024

lovromazgon removed the triage label Jun 17, 2024

lovromazgon added this to the Next milestone Jun 17, 2024

lovromazgon removed this from Conduit Main Jun 17, 2024

lovromazgon mentioned this issue Jun 20, 2024

Use connector protocol v2 #1622

Merged

4 tasks
