[DAR-3846][External] Prevent upload of dataset items where the {item_path}/{item_name} already exists in the dataset
#937
+208
−69
Problem
Recently, I discovered that it's possible to add slots to existing items. The ability to upload multi-file items with `push` was built on the assumption that this was impossible, which has led to the following behaviour (because we name slots differently based on the merge mode):
1: Upload some files with one merge mode --> No error
2: Upload the same files with a different merge mode --> No error
3: Upload the same files again with the same merge mode as step 2 --> You get an error about skipping files
We've decided that this type of scenario should be blocked in darwin-py, as most users would expect deduplication validation to take place at the item-name level.
Solution
Add a function to the `UploadHandler` constructor that runs the following before beginning the upload:
Changelog
Prevent upload of dataset items where the `{item_path}/{item_name}` already exists in the dataset
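For reviewers who want a concrete picture of the pre-upload check described under Solution, here is a minimal sketch. It is not the code in this PR and not darwin-py's API: `LocalItem`, `find_conflicts` and `assert_no_conflicts` are hypothetical names, and the sketch assumes the full paths of items already in the dataset have been fetched into a plain collection of strings beforehand.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class LocalItem:
    """Hypothetical stand-in for a local file queued for upload."""

    path: str  # remote folder, e.g. "/" or "/scans/batch1"
    name: str  # item name, e.g. "image.png"

    @property
    def full_path(self) -> str:
        # Normalise the folder so "/", "" and "scans/" all join consistently.
        folder = self.path.strip("/")
        return f"/{folder}/{self.name}" if folder else f"/{self.name}"


def find_conflicts(
    local_items: Iterable[LocalItem], existing_full_paths: Iterable[str]
) -> List[str]:
    """Return the sorted full paths that already exist in the dataset."""
    existing = set(existing_full_paths)
    return sorted(
        {item.full_path for item in local_items if item.full_path in existing}
    )


def assert_no_conflicts(
    local_items: Iterable[LocalItem], existing_full_paths: Iterable[str]
) -> None:
    """Raise before any upload starts if an item path/name is already taken."""
    conflicts = find_conflicts(local_items, existing_full_paths)
    if conflicts:
        raise ValueError(
            "Cannot upload: these items already exist in the dataset: "
            + ", ".join(conflicts)
        )
```

Because the comparison is on `{item_path}/{item_name}` only, the check is independent of merge mode and slot naming, which is the property the Problem section asks for.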