Currently, our replication audit code compares the checksum we have stored for an archive part with the checksum the S3 provider stores in its metadata for the zip part (see `PreservationCatalog::S3::Audit#compare_checksum_metadata`).
However, the checksum stored in AWS metadata is just the one we computed and provided at upload time, so we're only checking that the metadata hasn't drifted between the two sources. This check is cheap, since we're already reaching out to AWS to confirm the archived part is still available from the cloud as expected, but it's also not an especially meaningful check to have pass.
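For reference, a minimal sketch of that kind of metadata comparison using aws-sdk-s3. The bucket name, key, region, `stored_md5`, and the `checksum_md5` metadata key are all illustrative assumptions here, not necessarily what preservation_catalog actually uses:

```ruby
require 'aws-sdk-s3'

# hypothetical stand-ins for values that would come from our catalog records
bucket = 'example-archive-bucket'
zip_part_s3_key = 'bj102hs9687.v0001.zip'
stored_md5 = 'md5-from-our-zip_parts-record'

s3 = Aws::S3::Client.new(region: 'us-west-2')

# HEAD request: cheap, no object download
head = s3.head_object(bucket: bucket, key: zip_part_s3_key)
replicated_md5 = head.metadata['checksum_md5'] # assumed metadata key

# this only proves the two copies of the checksum *string* still agree;
# it says nothing about the bytes of the archived zip itself
unless replicated_md5 == stored_md5
  # flag the part for investigation
end
```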
More meaningful would be random spot checks of archive contents for fixity: every so often, pull down a randomly selected archived copy, make sure the checksums we recompute for the retrieved parts match the checksums we have stored, and make sure the internal checksums all match the content in the Moab when the zip parts are put back together and re-inflated. We don't want to do that for every zip during regular replication auditing, because that would be expensive, and overkill.
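A spot check along those lines might look something like the sketch below: a streaming download plus MD5 recomputation for one part, with the reassembly/inflation and Moab validation steps only gestured at in comments. All names are again illustrative assumptions:

```ruby
require 'aws-sdk-s3'
require 'digest'

bucket = 'example-archive-bucket'
zip_part_s3_key = 'bj102hs9687.v0001.zip' # illustrative
stored_md5 = 'md5-from-our-zip_parts-record'

s3 = Aws::S3::Client.new(region: 'us-west-2')
digest = Digest::MD5.new

# stream the object in chunks so a large zip part never has to fit in memory
s3.get_object(bucket: bucket, key: zip_part_s3_key) do |chunk|
  digest.update(chunk)
end

raise "fixity failure for #{zip_part_s3_key}" unless digest.hexdigest == stored_md5

# a full check would then fetch all parts for the version, reassemble them
# (e.g. `cat foo.z01 foo.z02 foo.zip > whole.zip`, or `zip -s 0`), inflate,
# and verify the unzipped content against the Moab's own manifest checksums
```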
But some occasional retrieval of content and recomputation of checksums would provide extra peace of mind that our replication strategy is working and that the cloud archives will be usable if needed.
ndushay changed the title from "replication audit: perform spot checks of cloud archives for fixity" to "replication audit: perform spot checks of cloud archives for entire moab fixity" on Dec 2, 2019.
Also worth considering: a Fargate task that computes the checksums within AWS infrastructure, so the verification doesn't incur egress charges.
Either way, good to keep on the radar, but likely out of scope for the 2022 maintenance work, which is more about making what's there more maintainable.