Why does the transfer-object robot verify that versionMetadata exists? #348

Open
jcoyne opened this issue Oct 18, 2021 · 7 comments
Labels
question (Further information is requested), SDR improvements for June-July 2022 work cycle, tech debt

Comments

@jcoyne
Contributor

jcoyne commented Oct 18, 2021

https://github.com/sul-dlss/preservation_robots/blob/main/lib/robots/sdr_repo/preservation_ingest/transfer_object.rb#L33

I don't think this is necessary. We already verify the bag in dor-services-app before kicking off the workflow:
https://github.com/sul-dlss/dor-services-app/blob/2ac9d78e999eb9a738a8c887d0b6e2f0ce726730/app/jobs/preserve_job.rb#L23-L27
https://github.com/sul-dlss/dor-services-app/blob/2ac9d78e999eb9a738a8c887d0b6e2f0ce726730/app/services/sdr_ingest_service.rb#L39

Let's try switching to use contentMetadata.xml instead.

@andrewjbtw

A bag can be valid without a versionMetadata.xml, so if you must have a versionMetadata.xml, you have to check for that file specifically. If you don't need a versionMetadata.xml (or any other specific file within the bag's /data directory), you don't need to check for it. I would guess that there was, or maybe still is, a dependency on having versionMetadata.xml in the bag, so they put in a check for that specific file.
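
To illustrate the distinction, here is a minimal Ruby sketch using only the standard library. The helpers `bag_structure_ok?` and `payload_file_present?` are hypothetical names, not code from preservation_robots or dor-services-app, and the `metadata/versionMetadata.xml` location is an assumption. The structural check is deliberately rough: real BagIt validation would also verify manifests and checksums.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: a rough structural check only (bagit.txt plus a data/
# directory). Full BagIt validation would also verify manifests and checksums.
def bag_structure_ok?(bag_dir)
  File.file?(File.join(bag_dir, 'bagit.txt')) &&
    File.directory?(File.join(bag_dir, 'data'))
end

# Hypothetical helper: explicit check for one specific payload file.
# relative_path is relative to the bag's data/ directory.
def payload_file_present?(bag_dir, relative_path)
  File.file?(File.join(bag_dir, 'data', relative_path))
end

# Builds a throwaway bag that has a payload but no versionMetadata.xml,
# then reports the result of both checks.
def demo_bag_without_version_metadata
  Dir.mktmpdir do |bag_dir|
    FileUtils.mkdir_p(File.join(bag_dir, 'data', 'metadata'))
    File.write(File.join(bag_dir, 'bagit.txt'),
               "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n")
    File.write(File.join(bag_dir, 'data', 'metadata', 'contentMetadata.xml'),
               '<contentMetadata/>')
    {
      structure_ok: bag_structure_ok?(bag_dir),
      # Assumed location of the file inside the payload directory.
      version_metadata: payload_file_present?(bag_dir, 'metadata/versionMetadata.xml')
    }
  end
end
```

Calling `demo_bag_without_version_metadata` shows the two checks disagreeing: the bag passes the structural check while the file-specific check fails, which is why a dependency on one particular file needs its own check.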

@jcoyne
Contributor Author

jcoyne commented Oct 19, 2021

@justinlittman
Contributor

Is there an action on this ticket?

@jcoyne
Contributor Author

jcoyne commented Jan 31, 2022

I think we want to figure out why versionMetadata.xml is the target and if we can switch it to be something else, because we are probably not going to generate versionMetadata.xml in the future.

jmartin-sul self-assigned this Feb 2, 2022
@jmartin-sul
Member

jmartin-sul commented Feb 8, 2022

here's what i found when looking into this yesterday:

  • i don't know that i can see why it's needed. the only references to it in pres robots are in lib/deposit_bag_validator.rb and lib/robots/sdr_repo/preservation_ingest/transfer_object.rb
    • seems like we just really want to be sure we have it? it's the only fedora datastream among the REQUIRED_BAG_FILES:
      REQUIRED_BAG_FILES = [
        DATA_DIR_BASENAME,
        'bagit.txt',
        BAG_INFO_TXT_BASENAME,
        "#{MANIFEST}-#{REQUIRED_MANIFEST_CHECKSUM_TYPE}.txt",
        "#{TAGMANIFEST}-#{REQUIRED_MANIFEST_CHECKSUM_TYPE}.txt",
        VERSION_ADDITIONS_BASENAME,
        VERSION_INVENTORY_BASENAME,
        VERSION_METADATA_PATH
      ].freeze
  • i don't see any reference to it in the moab-versioning gem either. seems like moab-versioning doesn't care, as long as there's a valid bag to work with.
  • hard to say what the intent was, whether it was just a proxy for "did the datastreams get exported?" (under the assumption that one's always there? sometimes seen missing? 🤷) or something else entirely
  • maybe we should test accessioning and versioning an object without versionMetadata.xml, on a branch w/ the check disabled, to see if it works
  • as @jcoyne points out, there's a check done on the DSA side already. the check from pres robots happens before the tarpipe to the deposit dir: it just sshes to dor-services to repeat the check that DSA already did. (so this is not analogous to pres cat verifying a new moab version after it's been created by copying files from the deposit bag to the new moab version, which involves bits being written anew from the deposit dir to the storage root)
  • people to ask if we really wanted to understand the intent: ben, lynn, richard anderson
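
A required-files check like the one the list above describes could be sketched as follows. This is not the actual code in lib/deposit_bag_validator.rb: the constant values are assumptions filled in for illustration (the thread only shows the constant names), and `missing_required_files` and `build_sample_bag` are hypothetical helpers.

```ruby
require 'fileutils'
require 'tmpdir'

# ASSUMED values for illustration; the thread shows only the constant names.
DATA_DIR_BASENAME = 'data'
BAG_INFO_TXT_BASENAME = 'bag-info.txt'
MANIFEST = 'manifest'
TAGMANIFEST = 'tagmanifest'
REQUIRED_MANIFEST_CHECKSUM_TYPE = 'sha256'
VERSION_ADDITIONS_BASENAME = 'versionAdditions.xml'
VERSION_INVENTORY_BASENAME = 'versionInventory.xml'
VERSION_METADATA_PATH = File.join(DATA_DIR_BASENAME, 'metadata', 'versionMetadata.xml')

REQUIRED_BAG_FILES = [
  DATA_DIR_BASENAME,
  'bagit.txt',
  BAG_INFO_TXT_BASENAME,
  "#{MANIFEST}-#{REQUIRED_MANIFEST_CHECKSUM_TYPE}.txt",
  "#{TAGMANIFEST}-#{REQUIRED_MANIFEST_CHECKSUM_TYPE}.txt",
  VERSION_ADDITIONS_BASENAME,
  VERSION_INVENTORY_BASENAME,
  VERSION_METADATA_PATH
].freeze

# Hypothetical helper: returns the required paths (relative to the bag root)
# that do not exist in bag_dir. An empty array means nothing is missing.
def missing_required_files(bag_dir, required = REQUIRED_BAG_FILES)
  required.reject { |rel| File.exist?(File.join(bag_dir, rel)) }
end

# Hypothetical helper: creates empty placeholders for every required entry
# except those listed in skip, so the check can be exercised on a temp dir.
def build_sample_bag(bag_dir, skip: [])
  (REQUIRED_BAG_FILES - skip).each do |rel|
    path = File.join(bag_dir, rel)
    if rel == DATA_DIR_BASENAME
      FileUtils.mkdir_p(path)
    else
      FileUtils.mkdir_p(File.dirname(path))
      File.write(path, '')
    end
  end
end
```

Passing `REQUIRED_BAG_FILES - [VERSION_METADATA_PATH]` as the second argument mimics the "check disabled" experiment: a bag built without versionMetadata.xml is then reported as having nothing missing, while the default list still flags it.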

@ndushay
Contributor

ndushay commented Jun 15, 2022

@jcoyne - are we still generating versionMetadata.xml?

What should we use instead? can we use the cocina json file? Or ???

ndushay added the tech debt, SDR improvements for June-July 2022 work cycle, and question labels Jun 15, 2022
@jcoyne
Contributor Author

jcoyne commented Jun 16, 2022

Yes, we are still generating versionMetadata.xml, but mostly because we have this check. I think switching the check to look for bagit.txt would be better (cocina.json would also work). But I'm not sure we really need this check at all because we already have a validator that ensures the bag has all the required files.
