Using preassembly for "self deposit" content (originating from H2)

Andrew Berger edited this page Sep 30, 2022 · 3 revisions

Background

In the vast majority of cases, content deposited into the SDR via the self-deposit application at sdr.stanford.edu should be managed through the self-deposit application. When possible, any updates to self-deposited items (including metadata updates) should be made through that application.

Nevertheless, there are a small number of situations where items originally submitted via self-deposit must be updated in Preassembly. In these situations, the items should not be updated in the self-deposit application again.

The two primary situations where it's necessary to use Preassembly for self-deposited items are:

  1. Large file deposits (i.e. where the volume of data is too large for the self-deposit application to handle)
  2. Items previously updated via Argo or Preassembly (e.g. items with customized metadata added via Argo, or items that have already gone through the large file deposit workflow)

Instructions

These instructions assume:

  • a druid has been assigned already
    • if this is a new deposit, the depositor should make their initial deposit using the "supply files via Globus" option in the H2 interface. This creates a druid, without files, that can then be updated via Preassembly.
    • if this is an update to an existing item, then by definition it already has a druid
  • the depositor has provided the new or updated files to be accessioned either through Globus or another online service
  • the depositor has confirmed that the files they have provided are complete and constitute a stable version

These instructions do not assume:

  • that when updating an existing item, the depositor will provide a copy of every file, including unchanged files: if they provide only the new and/or updated files, it will be necessary to retrieve the unchanged files from SDR so that they will be included in the new version.

Prepare the content in a staging folder

  1. Using the Preassembly server, create a folder named for the druid in the /dor/staging hierarchy: /dor/staging/[druid]/content
  2. Gather the file(s) from the relevant source location(s). This might require copying from more than one location:
  • For the large-file deposit workflow: get the depositor's files from the Globus server

    • Using rsync or scp, copy the files from their Globus folder location to the druid folder
      • On the Globus server, using the command line, the files will be found under the /tank/globus/restricted/sdr_ingest/ directory hierarchy (rather than through the Globus browser URLs)
    • This requires access to the Globus server (log in at globus@sul-globus-prod) so that the files can be copied directly from server to server
  • For non-Globus submissions: get the depositor's files from the location provided

    • We would like people to use Globus, but smaller files could be provided in a number of ways outside of Globus (email, attached to the JIRA ticket, etc.)
    • In some circumstances, it might be necessary to pull large files in from an FTP space, which requires special arrangements (e.g. for a researcher who cannot install Globus on a lab's server)
    • Copy these files to the /dor/staging/[druid]/content folder
  • For updates to existing objects: if needed, retrieve unchanged files from SDR preservation core

    • Often a depositor will provide only the new and/or modified files when requesting an update to an existing object
    • If there are additional files in the object that should remain unchanged in the new version, these files must be retrieved from SDR preservation core and staged alongside the other files in /dor/staging/[druid]/content
    • Preassembly will make a new version that contains exactly the files that are staged for accessioning: therefore, all files, including unchanged files, must be staged when making a new version
    • If the depositor requests that any files be removed from the updated version of an object, simply do not include these files in the staging folder. As long as these files are not staged for Preassembly, they will be excluded from the updated version. They will nevertheless remain in preservation in case someone needs to examine a previous version.
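Because the new version contains exactly the files that are staged, it can help to compare the staging folder against the list of files the new version is expected to hold before moving on. The following is a minimal Python sketch of such a check, not part of Preassembly itself. The helper name `compare_staged_files` and the idea of keeping an expected-file list are assumptions made for illustration.

```python
from pathlib import Path


def compare_staged_files(content_dir, expected_files):
    """Compare files staged under content_dir with an expected list.

    Returns (missing, unexpected) as sorted lists of relative paths.
    Hypothetical helper for illustration; not part of Preassembly.
    """
    content_dir = Path(content_dir)
    # Collect every staged file as a path relative to the content folder.
    staged = {
        p.relative_to(content_dir).as_posix()
        for p in content_dir.rglob("*")
        if p.is_file()
    }
    expected = set(expected_files)
    missing = sorted(expected - staged)      # still to be copied or retrieved
    unexpected = sorted(staged - expected)   # staged but not meant to be included
    return missing, unexpected
```

Files reported as missing would typically be unchanged files that still need to be retrieved from SDR preservation core. Files reported as unexpected may include ones the depositor asked to have removed.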

Prepare the manifest.csv

  1. Following normal Preassembly practice, create a manifest.csv file and place it in the top level of the staging folder: /dor/staging/[druid]. The manifest should contain the following lines:
druid,object
[druid],content
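The manifest can be written by hand in any text editor. As an alternative, here is a minimal Python sketch that writes the same two lines. The function name `write_manifest` is hypothetical, and the `object_folder` default of `content` simply mirrors the folder name used in the staging step above.

```python
import csv
from pathlib import Path


def write_manifest(bundle_dir, druid, object_folder="content"):
    """Write a two-line manifest.csv at the top level of bundle_dir.

    Hypothetical helper for illustration; not part of Preassembly.
    """
    manifest_path = Path(bundle_dir) / "manifest.csv"
    with open(manifest_path, "w", newline="") as f:
        writer = csv.writer(f, lineterminator="\n")
        writer.writerow(["druid", "object"])      # header row
        writer.writerow([druid, object_folder])   # one row for the staged object
    return manifest_path
```

For example, `write_manifest("/dor/staging/[druid]", "[druid]")`, with the real druid substituted in both places, would produce the file described above.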

Run Preassembly

  1. Fill out the Preassembly web form with the following settings:
  • Project name: [druid]_v[version number] (e.g. "bc123df4567_v1")
  • Job type: Discovery report (only run Preassembly after checking the report)
  • Content Structure: File
  • Bundle dir: /dor/staging/[druid]
  • Staging style symlink: leave unchecked
  • Content metadata creation: Default
  • I have a file manifest: leave unchecked
  • Preserve, Shelve, Publish settings: choose Preserve=Yes, Shelve=Yes, Publish=Yes -- this ensures that the files are sent to the PURL
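The Discovery report is the authoritative check before running Preassembly. A quick local check of the bundle directory can still catch simple mistakes before the form is submitted, such as a missing manifest or an empty object folder. The sketch below is an illustration only: the function name `preflight` and the specific checks are assumptions, not a description of what the Discovery report does.

```python
import csv
from pathlib import Path


def preflight(bundle_dir):
    """Return a list of problems found in bundle_dir (empty if none).

    Hypothetical helper for illustration; not part of Preassembly.
    """
    bundle_dir = Path(bundle_dir)
    manifest = bundle_dir / "manifest.csv"
    if not manifest.is_file():
        return [f"missing {manifest}"]
    problems = []
    with open(manifest, newline="") as f:
        # Each row names a druid and the folder that holds its files.
        for row in csv.DictReader(f):
            folder = bundle_dir / row["object"]
            if not folder.is_dir():
                problems.append(f"object folder not found: {folder}")
            elif not any(p.is_file() for p in folder.rglob("*")):
                problems.append(f"object folder has no files: {folder}")
    return problems
```

An empty list means the manifest exists and each object folder it names contains at least one file. It does not replace reviewing the Discovery report.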