fix(sftp): make sure to delete last file when `watch` and `delete_on_finish` are enabled #3037

ooesili · 2024-11-26T19:45:26Z

Questions

I believe I have fixed the underlying issue, but I am not sure how to write an integration test to verify the fix. I have created a new integration test function with a TODO comment on where I got stuck. The questions I have around this are:

1. My plan was to start a pipeline with watch and delete_on_finish enabled, then use an SFTP client directly to inspect which files exist on the server to make sure they are all deleted after the pipeline runs. However, I'm not sure how to actually run the pipeline. Is this too specific a test to run using integration.StreamTests(), and if not, could you point me in the right direction?
2. The other pattern I've seen would be to call newSFTPReaderFromParsed() directly from the tests, then use Connect() and ReadBatch() to interact with the plugin. However, this plugin appears to be unusually structured in the way that it progresses through the input files. It finds the first file in Connect() and sets up the scanner for that file. When the file is exhausted, ReadBatch() returns service.ErrNotConnected, which causes the engine to re-run Connect(), which in turn advances to the next file. If the plugin only required Connect() to be called once, I would be happy to drive the plugin directly in the tests, but because of the reconnection logic required, I was hesitant to reimplement the reconnection loop in the tests. Is there a utility somewhere that I can use from a test that implements the reconnect logic?

rockwotj · 2024-11-26T19:51:09Z

I don't think there is a utility so either you need to do option 1 or implement the retry logic - which I don't think should be too bad?

Here's the code that drives this in benthos AFAIK: https://github.com/redpanda-data/benthos/blob/dad70374cd8fb323f0c7f47452498ea94c2ed7aa/internal/component/input/async_reader.go#L115

The pipeline option (number 1) might be the best route, but I'm not too familiar with that test helper myself.


ooesili · 2024-12-03T22:18:51Z

internal/impl/sftp/integration_test.go

+	builder := service.NewStreamBuilder()
+	require.NoError(t, builder.SetYAML(config))


@rockwotj

> I don't think there is a utility so either you need to do option 1 or implement the retry logic - which I don't think should be too bad?
>
> Here's the code that drives this in benthos AFAIK: https://github.com/redpanda-data/benthos/blob/dad70374cd8fb323f0c7f47452498ea94c2ed7aa/internal/component/input/async_reader.go#L115
>
> The pipeline option (number 1) might be the best route, but I'm not too familiar with that test helper myself.

Knowledge from the great and powerful @mihaitodor

ooesili added the bug and inputs labels Nov 26, 2024

ooesili requested review from mihaitodor and Jeffail November 26, 2024 19:45

ooesili self-assigned this Nov 26, 2024

ooesili force-pushed the sftp-delete-last-file branch from 83668cd to 2e47f2a Compare November 26, 2024 19:50

ooesili added 5 commits December 3, 2024 14:33

fix(sftp): fix polling logic in watcher

a36f25e

fix(sftp): fix deadlock so last file is deleted

68eba81

This commit reduces the scope of critical sections guarded by scannerMut to remove a deadlock that causes the last file to not be deleted when the SFTP input is used with watching enabled.
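As a rough illustration of the kind of change this commit message describes, the sketch below shows a mutex being released before a call that needs the same mutex. It is not the plugin's actual code: `reader`, `scanner`, `open`, `finish`, and the `deleted` slice are hypothetical names, and the real deadlock may have a different shape.

```go
package main

import (
	"fmt"
	"sync"
)

// scanner is a hypothetical stand-in for a file scanner with a close hook.
type scanner struct {
	onClose func()
}

// Close runs the hook, which in this sketch needs the reader's mutex.
func (s *scanner) Close() { s.onClose() }

// reader is a hypothetical input guarding its scanner with a mutex.
type reader struct {
	mut     sync.Mutex
	scanner *scanner
	deleted []string
}

// open installs a scanner whose close hook records the file as deleted.
func (r *reader) open(name string) {
	r.mut.Lock()
	defer r.mut.Unlock()
	r.scanner = &scanner{onClose: func() {
		r.mut.Lock() // would block forever if the caller still held mut
		defer r.mut.Unlock()
		r.deleted = append(r.deleted, name)
	}}
}

// finish detaches the scanner while holding the lock, then closes it after
// unlocking, so the close hook can take the lock itself.
func (r *reader) finish() {
	r.mut.Lock()
	s := r.scanner
	r.scanner = nil
	r.mut.Unlock()

	if s != nil {
		s.Close()
	}
}

func main() {
	r := &reader{}
	r.open("last.csv")
	r.finish()
	fmt.Println(r.deleted)
}
```

Holding `mut` across `s.Close()` in `finish` would deadlock here because `sync.Mutex` is not reentrant. Narrowing the critical section to the pointer swap avoids that.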

refactor(sftp): use for loop in watcher provider

259d12b

`(*watcherPathProvider).Next()` currently uses recursion to loop until a path is found. This commit refactors that function to use a for loop instead, which is more straightforward to read.
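A minimal sketch of a loop-based `Next`, assuming a provider that polls a listing until it finds a path it has not returned yet. The type `pathProvider`, its fields, the `maxPolls` bound, and `errNoPath` are all hypothetical and are not taken from the real `watcherPathProvider`. A real watcher would presumably wait between polls rather than stop after a fixed count.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoPath is a hypothetical sentinel returned when polling finds nothing new.
var errNoPath = errors.New("no new path found")

// pathProvider is a hypothetical stand-in for a watching path provider.
type pathProvider struct {
	list    func() []string     // hypothetical directory listing callback
	seen    map[string]struct{} // paths already queued or returned
	pending []string            // unseen paths waiting to be handed out
}

// Next returns the next unseen path, polling list at most maxPolls times.
// A for loop replaces "poll, then call Next again if nothing was found".
func (p *pathProvider) Next(maxPolls int) (string, error) {
	for polls := 0; ; polls++ {
		if len(p.pending) > 0 {
			path := p.pending[0]
			p.pending = p.pending[1:]
			return path, nil
		}
		if polls >= maxPolls {
			return "", errNoPath
		}
		for _, path := range p.list() {
			if _, ok := p.seen[path]; !ok {
				p.seen[path] = struct{}{}
				p.pending = append(p.pending, path)
			}
		}
	}
}

func main() {
	calls := 0
	p := &pathProvider{
		seen: map[string]struct{}{},
		list: func() []string {
			calls++
			if calls < 2 {
				return nil // the first poll sees an empty directory
			}
			return []string{"a.csv", "b.csv"}
		},
	}
	for {
		path, err := p.Next(3)
		if err != nil {
			fmt.Println("stopped:", err)
			return
		}
		fmt.Println("next path:", path)
	}
}
```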

fix(sftp): reduce mutex scope even further

18b29aa

test(sftp): add test for delete-on-finish bug

ab133f4

This integration test makes sure that when `delete_on_finish` is true and watching is enabled that we delete every file.

ooesili force-pushed the sftp-delete-last-file branch from 1bbf6fa to ab133f4 Compare December 3, 2024 21:34

ooesili marked this pull request as ready for review December 3, 2024 22:01
