replication audit: perform spot checks of cloud archives for entire moab fixity #1076

Open
jmartin-sul opened this issue Aug 17, 2018 · 1 comment
Labels
enhancement replication replication related questions or issues

Comments

@jmartin-sul
Member

currently, our replication audit code compares the checksum we have stored for an archive part with the checksum the S3 provider stores in their metadata for the zip part (see PreservationCatalog::S3::Audit#compare_checksum_metadata).

however, the checksum stored in AWS metadata is just the one we computed and provided to them, so we're only checking that the metadata hasn't drifted between the two sources. this check is cheap to do, since we're already reaching out to AWS to see whether the archived part is still available from the cloud as expected. but a passing result doesn't tell us much about whether the archived bytes themselves are intact.

more meaningful would be random spot-checks of archive contents for fixity. that is, every so often, randomly pull down archived copies, make sure the checksums we recompute for the retrieved parts match the checksums we have stored, and make sure the internal checksums all match the content in the Moab once the zip parts are put back together and re-inflated. we don't want to do that for every zip during regular replication auditing, because that'd be expensive and overkill.

but some occasional retrieval of content and re-computation of checksums would provide extra peace of mind that our replication strategy is working and that the cloud archives will be usable if needed.
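The sampling and per-part re-checksumming half of such a spot check could look roughly like the Ruby sketch below. Everything in it is hypothetical rather than existing preservation_catalog code: `ZipPartRecord`, `sample_parts`, `spot_check`, the `rate` knob, and the `fetch` callable (a stand-in for whatever actually downloads a part from the cloud provider). It also assumes MD5 is the stored per-part checksum, and it does not cover reassembling and re-inflating the zip to compare against the Moab.

```ruby
require 'digest'

# Hypothetical record for one archived zip part: its storage key and the MD5
# we stored at replication time (MD5 is an assumption for this sketch).
ZipPartRecord = Struct.new(:key, :stored_md5)

# Choose a random subset of parts to spot-check. `rate` is an assumed tuning
# knob (fraction of parts per audit run); `rng` is injectable so a run can be
# reproduced.
def sample_parts(parts, rate:, rng: Random.new)
  count = (parts.size * rate).ceil
  parts.sample(count, random: rng)
end

# Download each sampled part, recompute its MD5, and compare it against the
# stored value. `fetch` is a hypothetical callable (key -> bytes) standing in
# for a real download from the cloud provider.
def spot_check(parts, rate:, fetch:, rng: Random.new)
  sample_parts(parts, rate: rate, rng: rng).map do |part|
    actual = Digest::MD5.hexdigest(fetch.call(part.key))
    { key: part.key, ok: actual == part.stored_md5, actual_md5: actual }
  end
end

# Usage with an in-memory stand-in for the cloud store; b.zip is "corrupt".
store = { 'a.zip' => 'alpha', 'b.zip' => 'beta' }
parts = [
  ZipPartRecord.new('a.zip', Digest::MD5.hexdigest('alpha')),
  ZipPartRecord.new('b.zip', Digest::MD5.hexdigest('tampered'))
]
spot_check(parts, rate: 1.0, fetch: ->(key) { store.fetch(key) }).each do |r|
  puts "#{r[:key]}: #{r[:ok] ? 'ok' : 'MISMATCH'}"
end
```

Keeping `rate` small is what bounds the retrieval (and egress) cost per audit run, while still touching every part eventually over many runs.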

@jmartin-sul jmartin-sul added replication replication related questions or issues enhancement labels Aug 17, 2018
@ndushay ndushay changed the title replication audit: perform spot checks of cloud archives for fixity replication audit: perform spot checks of cloud archives for entire moab fixity Dec 2, 2019
@jmartin-sul
Member Author

@edsu also observes that AWS offers a facility for getting the checksum of an object that's already stored, which might be another way to address this: https://docs.aws.amazon.com/AmazonS3/latest/userguide/checking-object-integrity.html
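One practical wrinkle with the S3 integrity-checking facility linked above: as we understand the docs, S3 reports additional checksums such as `ChecksumSHA256` base64-encoded rather than as hex, so they have to be re-encoded before comparing with our own records. The Ruby sketch below shows only that conversion. The helper names are hypothetical, it assumes the part was uploaded with a SHA-256 additional checksum in a single (non-multipart) upload, and it leaves out the AWS SDK call that actually fetches the value. For multipart uploads S3 reports a composite checksum of the part checksums instead, which this does not reproduce.

```ruby
require 'digest'

# Encode a locally computed SHA-256 the way S3 reports additional checksums:
# base64 of the raw 32-byte digest. pack('m0') is strict base64 (no newlines).
def s3_style_sha256(bytes)
  [Digest::SHA256.digest(bytes)].pack('m0')
end

# Re-encode a hex SHA-256 digest we may already have on record into the same
# base64 form, so it can be compared directly with the value S3 returns.
def hex_sha256_to_s3_style(hex)
  [[hex].pack('H*')].pack('m0')
end

puts s3_style_sha256('example zip part bytes')
```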

also, possibly something like a Fargate task that computes checksums within AWS infrastructure, and so doesn't incur egress charges.

either way, good to keep on the radar, but likely out of scope for the 2022 maintenance work, which is more about making what's there more maintainable.
