Save checksums to manifest textfile/md5sum #90

Open
kieranjol opened this issue Nov 30, 2021 · 7 comments

Comments

@kieranjol

Hi - this might be similar to #63, but I was wondering if you could add an option to save checksums to a manifest textfile, perhaps using a style similar to md5sum:
checksum  relative/path/to/file

I work in the National Library of Ireland, and we like to get donors to generate checksums on their end before sending the files to us. QuickHash is great for this, as donors can simply drag and drop files into a GUI. Being able to save the list of files and their checksums would be ideal. So far I've only tested it with small donations of two files, where the checksums were copied and pasted.

So a sidecar file might contain checksums like this:

ae8c5d5f6288964d65c13459f5334258  storage/2020-09-21/rokyu2aberfc5jto/indexes/index-20200921183136-CXOKODVD.cdxj
e9fe18d9eeafddf6712dc95199ca0611  behaviors/dist/soundcloudArtistBehavior.js
da45bba92fc439ba6c547e07ec64453d  data/Webrecorder-Data/behaviors/dist/instagramUserBehavior.js

What do you think?
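For illustration only, here is a minimal Python sketch of how such a sidecar file could be generated outside QuickHash. It assumes MD5, UTF-8 paths, and a hypothetical output name `manifest-md5.txt` (the naming pattern BagIt uses for its MD5 payload manifest). The helpers `md5_of_file` and `write_md5_manifest` are hypothetical names, not part of QuickHash. The sketch walks a folder, hashes each file in chunks, and writes `<md5>  <relative/path>` lines with forward slashes:

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_md5_manifest(root: Path, manifest_name: str = "manifest-md5.txt") -> Path:
    """Hash every file under root and write md5sum-style '<md5>  <relative/path>' lines."""
    root = Path(root)
    manifest_path = root / manifest_name
    lines = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        if path == manifest_path:
            continue  # never list the manifest itself
        # as_posix() gives forward slashes, so the manifest reads the same on any OS.
        relative = path.relative_to(root).as_posix()
        lines.append(f"{md5_of_file(path)}  {relative}\n")
    # newline="\n" stops Windows from rewriting line endings to CRLF.
    with open(manifest_path, "w", encoding="utf-8", newline="\n") as out:
        out.writelines(lines)
    return manifest_path
```

Running `md5sum -c manifest-md5.txt` from inside the folder should then check the files on Linux. The sketch does not escape filenames containing newlines or backslashes, which md5sum treats specially.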

@kieranjol changed the title from "Save checksums to manifest textile/md5sum" to "Save checksums to manifest textfile/md5sum" Nov 30, 2021
@kieranjol
Author

I just noticed issue #70, which would go some way towards resolving this. I think md5sum-style text manifests are quite a common approach, though. They're also used in the Library of Congress BagIt standard, and they would allow interoperability with other checksumming tools.

@tedsmith
Owner

tedsmith commented Dec 6, 2021

Hiya. Not sure I follow. The ability to export to CSV/TSV already exists, and that holds the filename, path, etc. What does the manifest textfile suggestion add over and above that? Sorry for my confusion.

@kieranjol
Author

So sorry, I actually replied to this yesterday and must not have submitted it! The ability to create md5sum-style text manifests would allow greater interoperability with other checksum tools, like the *sum tools (md5sum, sha256sum, etc.), hashdeep, and a number of custom Python scripts used within the digital preservation community.

@tedsmith
Owner

OK. But I still don't quite follow I'm afraid. The user can output the results of QH to CSV or HTML where the filenames, paths, and hash(es) are saved. I'm not sure how your suggestion of a "manifest" differs? Isn't the output from QH considered as a "manifest"?

@tedsmith tedsmith reopened this Dec 21, 2021
@kieranjol
Author

Your example would be a manifest as well, but you couldn't use that manifest to validate the checksums with any tool other than QuickHash (without editing it first). Using the md5sum style of "checksum  relative/file/path" allows other tools to read these kinds of manifests.
This is what the Library of Congress BagIt software produces:

ae8c5d5f6288964d65c13459f5334258  storage/2020-09-21/rokyu2aberfc5jto/indexes/index-20200921183136-CXOKODVD.cdxj
e9fe18d9eeafddf6712dc95199ca0611  behaviors/dist/soundcloudArtistBehavior.js
da45bba92fc439ba6c547e07ec64453d  data/Webrecorder-Data/behaviors/dist/instagramUserBehavior.js

and there are several tools out there that can validate those checksums.
My hope is that if QuickHash can also create these types of manifests, it will be an even more useful tool for digital preservation in libraries and archives. I have already asked a donor to use QuickHash to create checksums locally before zipping the files and uploading them to the cloud for transfer. It would be ideal if a more interoperable checksum manifest could be used, and I think the main format is the md5sum (or shasum) style.
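To show why a plain "checksum  path" file needs no tool-specific parsing, here is a rough verifier sketch. It is a hypothetical helper (`verify_manifest`, `parse_manifest_line`), not the code of BagIt, md5sum, or any other tool, and it assumes MD5 digests. It accepts md5sum's two-space text-mode separator, its " *" binary-mode marker, and a single space:

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def parse_manifest_line(line: str) -> tuple[str, str]:
    """Split one manifest line into (lower-case hex digest, relative path)."""
    digest, sep, rest = line.rstrip("\r\n").partition(" ")
    # md5sum writes '  ' (text mode) or ' *' (binary mode); BagIt also allows one space.
    path = rest[1:] if rest[:1] in (" ", "*") else rest
    if not sep or not digest or not path:
        raise ValueError(f"not a 'checksum  path' line: {line!r}")
    return digest.lower(), path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large deliveries don't need to fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest: Path, root: Path) -> dict[str, str]:
    """Re-hash every listed file under root; map each path to OK, FAILED or MISSING."""
    results: dict[str, str] = {}
    for line in Path(manifest).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # ignore blank lines
        expected, relative = parse_manifest_line(line)
        target = Path(root) / relative
        if not target.is_file():
            results[relative] = "MISSING"
        elif md5_of_file(target) == expected:
            results[relative] = "OK"
        else:
            results[relative] = "FAILED"
    return results
```

For MD5 manifests, `md5sum -c` does the equivalent job. The point is that any tool, or a short script like this, can check the delivery without knowing anything about QuickHash.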

@kieranjol
Author

Ok, coming back to this again as another archive has the following use case:

  • Various stakeholders need to deposit files with checksums, which are then verified upon delivery.
  • I'm hoping that QuickHash will be the tool that's used, as it's open-source and cross-platform with a GUI.
  • The current CSV reports use absolute paths, so if relative paths were used, the checksum manifest that is created could be easily verified when the files are delivered to the archive.
  • If there was an option to just store checksum relative/file/path then multiple tools in the archive could be used to validate the checksum manifest, as that style of manifest is the most commonly used in archival settings.
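The absolute-to-relative conversion described in these points can be sketched as follows. This assumes the CSV report has one column of absolute paths and one of MD5 values. The headers `FilePath` and `MD5` are placeholders, not QuickHash's actual column names, and `csv_to_md5_manifest` is a hypothetical helper:

```python
import csv
import io
from pathlib import PurePosixPath, PureWindowsPath


def csv_to_md5_manifest(csv_text: str, delivery_root: str,
                        path_column: str = "FilePath",   # hypothetical header
                        hash_column: str = "MD5") -> str:  # hypothetical header
    """Turn CSV rows of (absolute path, MD5) into '<md5>  <relative/path>' lines."""
    # Guess Windows or POSIX path rules from how the delivery root is written.
    is_windows = "\\" in delivery_root or ":" in delivery_root[:3]
    path_cls = PureWindowsPath if is_windows else PurePosixPath
    root = path_cls(delivery_root)
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        absolute = path_cls(row[path_column])
        relative = absolute.relative_to(root)  # raises ValueError if outside root
        lines.append(f"{row[hash_column].strip().lower()}  {relative.as_posix()}\n")
    return "".join(lines)
```

Dropping the root and normalising to forward slashes is what makes the result checkable wherever the files end up, whether on the donor's Windows PC or the archive's Linux server.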

To see a practical example, imagine receiving the contents of the attached zip file as an online or hard-drive delivery. Having the manifest alongside the files like this allows many tools to validate the delivery. I'm hoping QuickHash can produce this, rather than the CSV/HTML reports with absolute paths. TeraCopy does something similar to the manifest I'm describing, but it's Windows-only and closed-source.
Archive.zip

@kieranjol
Author

Hi Ted, just checking in on this again to see if it's something you're interested in supporting?
