This repository has been archived by the owner on Nov 6, 2023. It is now read-only.

Handle stable_id collisions #83

Open
norling opened this issue Mar 31, 2023 · 0 comments
Labels
bug Something isn't working

norling commented Mar 31, 2023

Background:

CEGA generates stable IDs for files from the checksum of the unencrypted file, so files with identical content get the same stable_id. This causes problems, because the collision prevents finalize from assigning the stable_id and continuing to mapper.
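The collision can be illustrated with a minimal sketch. The ID format, the `stable_id_from_content` helper, and the `FilesTable` class are hypothetical; the only things taken from the description above are that the ID is derived from the unencrypted checksum and that a second assignment of the same stable_id fails.

```python
import hashlib


def stable_id_from_content(data: bytes) -> str:
    """Hypothetical ID scheme: a prefix plus part of the SHA-256 of the content."""
    return "EGAF-" + hashlib.sha256(data).hexdigest()[:16]


class FilesTable:
    """Toy stand-in for sda.files, assuming stable_id must be unique."""

    def __init__(self):
        self.file_by_stable_id = {}

    def assign_stable_id(self, file_id: int, stable_id: str) -> None:
        # A second file with the same content hits this branch.
        if stable_id in self.file_by_stable_id:
            raise ValueError(f"stable_id {stable_id} is already assigned")
        self.file_by_stable_id[stable_id] = file_id


if __name__ == "__main__":
    table = FilesTable()
    content = b"identical bytes uploaded twice"
    table.assign_stable_id(1, stable_id_from_content(content))
    try:
        table.assign_stable_id(2, stable_id_from_content(content))
    except ValueError as err:
        print("second upload rejected:", err)
```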

For CEGA, sharing an ID works because files are only accessed by stable_id. For Bigpicture, however, there have been requests to download files by their upload filename, so each file needs a correct stable_id and submission_file_path.

Possible Solution:

One of the core issues is whether multiple uploads should share the same sda.files entry. Storage deduplication can be solved by pointing to the same archive path regardless of the number of uploads, but there is also a case to be made for having all files containing the same data use the same database entry. One option here is to move the storage information into a separate table, so that multiple "upload-files" could reference a single "storage file".
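As a rough sketch of the separate-table option, using SQLite for brevity. The table names `storage_files` and `upload_files`, their columns, and the helper functions are all assumptions for illustration and do not reflect the current sda schema.

```python
import sqlite3


def create_storage_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical split: one row per stored object, one row per upload.
    conn.executescript("""
        CREATE TABLE storage_files (
            id INTEGER PRIMARY KEY,
            decrypted_checksum TEXT NOT NULL UNIQUE,
            archive_path TEXT NOT NULL
        );
        CREATE TABLE upload_files (
            id INTEGER PRIMARY KEY,
            submission_file_path TEXT NOT NULL,
            storage_file_id INTEGER NOT NULL REFERENCES storage_files(id)
        );
    """)


def register_upload(conn, submission_path, checksum, archive_path) -> int:
    """Record an upload, reusing the storage row if the content is already known."""
    conn.execute(
        "INSERT OR IGNORE INTO storage_files (decrypted_checksum, archive_path) "
        "VALUES (?, ?)",
        (checksum, archive_path),
    )
    (storage_id,) = conn.execute(
        "SELECT id FROM storage_files WHERE decrypted_checksum = ?", (checksum,)
    ).fetchone()
    cur = conn.execute(
        "INSERT INTO upload_files (submission_file_path, storage_file_id) "
        "VALUES (?, ?)",
        (submission_path, storage_id),
    )
    return cur.lastrowid


def count_rows(conn, table: str) -> int:
    # Table name comes from this sketch only, never from user input.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
```

With this layout, two uploads of the same content produce two `upload_files` rows that keep their own paths while sharing one `storage_files` row.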

It will likely be necessary to remove the stable_id field from the sda.files table, and instead use the file_references table to store the ID. This is partly a matter of simplifying the schema so that all potential stable IDs are handled in the same manner. If multiple sda.files entries are used, this also allows multiple files to have the same ID when needed for FEGA.
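A minimal sketch of keeping the ID in a reference table instead of on the file row, again in SQLite. The columns of `file_references` shown here (`reference_id`, `reference_scheme`) and the helper names are assumed for illustration; the real table may look different. The point is that nothing forces `reference_id` to be unique across files.

```python
import sqlite3


def create_reference_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical layout: files carry no stable_id column of their own.
    conn.executescript("""
        CREATE TABLE files (
            id INTEGER PRIMARY KEY,
            submission_file_path TEXT NOT NULL
        );
        CREATE TABLE file_references (
            file_id INTEGER NOT NULL REFERENCES files(id),
            reference_id TEXT NOT NULL,
            reference_scheme TEXT NOT NULL,
            PRIMARY KEY (file_id, reference_scheme)
        );
    """)


def add_file_with_stable_id(conn, submission_path: str, stable_id: str) -> int:
    cur = conn.execute(
        "INSERT INTO files (submission_file_path) VALUES (?)", (submission_path,)
    )
    file_id = cur.lastrowid
    conn.execute(
        "INSERT INTO file_references (file_id, reference_id, reference_scheme) "
        "VALUES (?, ?, 'stable_id')",
        (file_id, stable_id),
    )
    return file_id


def paths_for_stable_id(conn, stable_id: str) -> list:
    """Return every submission path that carries the given stable ID."""
    rows = conn.execute(
        "SELECT f.submission_file_path FROM files f "
        "JOIN file_references r ON r.file_id = f.id "
        "WHERE r.reference_id = ? AND r.reference_scheme = 'stable_id' "
        "ORDER BY f.id",
        (stable_id,),
    )
    return [row[0] for row in rows]
```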

To solve the problem of submission_file_path if one sda.files-entry is used, one solution is to add a file_path field to the file_dataset table, so that a file can have a unique path for each dataset it's part of while still only referencing one sda.files entry.
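The per-dataset path idea could look roughly like the following SQLite sketch. Only the `file_path` field on `file_dataset` comes from the proposal above; the other columns, the key choice, and the helper functions are assumptions.

```python
import sqlite3


def create_dataset_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical layout: one files row per distinct content,
    # with the user-visible path stored per (file, dataset) pair.
    conn.executescript("""
        CREATE TABLE files (
            id INTEGER PRIMARY KEY,
            decrypted_checksum TEXT NOT NULL UNIQUE
        );
        CREATE TABLE file_dataset (
            file_id INTEGER NOT NULL REFERENCES files(id),
            dataset_id TEXT NOT NULL,
            file_path TEXT NOT NULL,
            PRIMARY KEY (file_id, dataset_id)
        );
    """)


def add_to_dataset(conn, checksum: str, dataset_id: str, file_path: str) -> int:
    """Attach content to a dataset under a dataset-specific path."""
    conn.execute(
        "INSERT OR IGNORE INTO files (decrypted_checksum) VALUES (?)", (checksum,)
    )
    (file_id,) = conn.execute(
        "SELECT id FROM files WHERE decrypted_checksum = ?", (checksum,)
    ).fetchone()
    conn.execute(
        "INSERT INTO file_dataset (file_id, dataset_id, file_path) VALUES (?, ?, ?)",
        (file_id, dataset_id, file_path),
    )
    return file_id


def path_in_dataset(conn, file_id: int, dataset_id: str):
    row = conn.execute(
        "SELECT file_path FROM file_dataset WHERE file_id = ? AND dataset_id = ?",
        (file_id, dataset_id),
    ).fetchone()
    return row[0] if row else None
```

In this sketch the key on `(file_id, dataset_id)` means the same content can appear only once per dataset; whether that restriction is acceptable is a separate design question.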
