Bulk Metadata Modifications

luisgreg99 edited this page Apr 4, 2019 · 18 revisions

Sometimes we need to modify a bunch of metadata in one fell swoop. A rake task was built to facilitate this process.

Prerequisites

  • An issue to track the change request
  • A spreadsheet with the proper format and data
  • Access to the server where the rake task is run

Scheduling

Bulk updates are scheduled to run weekly on Thursdays. Members of the development team are responsible for volunteering to run each week's updates.

Spreadsheet Details

Special Considerations

  • The value in the from column must be filled in every row

  • A + value in the from column indicates that the value in the to column will be added to the metadata field

  • A - value in the from column indicates that the value in the to column, or several values separated with |, will be removed from a multi-value metadata field

  • The - operator is only supported for multi-value unordered fields with expected type set to text, such as DOI, subject, and alt_title. For a list of all supported fields, check the Expected Type column in the ScholarsArchive@OSU Metadata Application Profile

  • A * value in the from column indicates that all current values in the metadata field will be replaced

  • The to column can have a pipe-delimited (|) value to add multiple values to the metadata field
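
The operator rules above can be summarized in a short Python sketch. It illustrates the semantics as described on this page and is not the rake task's actual implementation. The function name apply_row and the representation of a field as a list of strings are assumptions made for the example, as is the behavior when a literal from value is not present in the field.

```python
def apply_row(current, from_value, to_value):
    """Return the new list of values for one metadata field.

    current    -- existing values of the field (list of strings)
    from_value -- the "from" cell: "+", "-", "*", or an existing value
    to_value   -- the "to" cell, optionally pipe-delimited
    """
    if not from_value:
        raise ValueError("the from column must be filled in every row")

    new_values = to_value.split("|")

    if from_value == "+":
        # Add the new values to whatever is already there.
        return current + new_values
    if from_value == "-":
        # Remove each listed value; other values are untouched.
        return [v for v in current if v not in new_values]
    if from_value == "*":
        # Replace every current value.
        return new_values

    # Otherwise "from" names one existing value to swap out.
    # Assumption: if that value is not present, nothing changes.
    result = []
    for value in current:
        if value == from_value:
            result.extend(new_values)
        else:
            result.append(value)
    return result


print(apply_row(["a", "b"], "+", "c|d"))       # ['a', 'b', 'c', 'd']
print(apply_row(["a", "b", "c"], "-", "a|c"))  # ['b']
print(apply_row(["a", "b"], "*", "z"))         # ['z']
```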

Example Data

Example 1

Consider the following data as an example of what the rake task expects for performing bulk metadata updates:

id,from,to,property
fb494b17q,http://opaquenamespace.org/ns/osuAcademicUnits/qGjPkk5M,http://id.loc.gov/authorities/names/n80017721,degree_grantors
8623j0508,http://opaquenamespace.org/ns/osuAcademicUnits/qGjPkk5M,http://id.loc.gov/authorities/names/n80017721,degree_grantors
  • id : The ID of the work to be modified
  • from : The value as it exists in the system. This can instead be an operator: * to match any value, + to add a value, or - to remove one.
  • to : The desired change to the existing value
  • property : The property name, as tracked in the Metadata Application Profile, of the metadata value to change
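
Before transferring a spreadsheet, it can help to check it against the column layout described above. The following Python sketch is one way to do that locally. The function name check_spreadsheet is hypothetical, and treating id and property as required is an assumption; this page only states explicitly that the from column must be filled in every row. The second data row in the sample is deliberately broken to show a reported problem.

```python
import csv
import io

# Column names taken from the example header on this page.
EXPECTED_COLUMNS = ["id", "from", "to", "property"]
# Assumption: id and property must be non-empty, as from must be.
NON_EMPTY_COLUMNS = ["id", "from", "property"]


def check_spreadsheet(csv_text):
    """Return a list of human-readable problems found in the CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [c for c in EXPECTED_COLUMNS if c not in header]
    if missing:
        return ["missing column(s): " + ", ".join(missing)]

    problems = []
    # Data rows are numbered from 1; the header is not counted.
    for row_number, row in enumerate(reader, start=1):
        for column in NON_EMPTY_COLUMNS:
            if not (row[column] or "").strip():
                problems.append(f"row {row_number}: empty {column}")
    return problems


sample = """id,from,to,property
fb494b17q,http://opaquenamespace.org/ns/osuAcademicUnits/qGjPkk5M,http://id.loc.gov/authorities/names/n80017721,degree_grantors
8623j0508,,http://id.loc.gov/authorities/names/n80017721,degree_grantors
"""
print(check_spreadsheet(sample))  # ['row 2: empty from']
```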

Example 2

The following example uses the - operator to remove only the item with value http://id.loc.gov/authorities/names/no2011160692 from the subject property in work gm80hv32k.

Input subject_remove_uri.csv:

id,from,to,property,
gm80hv32k,-,http://id.loc.gov/authorities/names/no2011160692,subject,

Example 3

The following example uses the - and + operators to replace only the item with value http://id.loc.gov/authorities/names/no2011160692 with Technical note (Forest Products Laboratory (U.S.)) in the subject property of work gm80hv32k.

Input subject_replace_uri_with_text.csv:

id,from,to,property,
gm80hv32k,-,http://id.loc.gov/authorities/names/no2011160692,subject,
gm80hv32k,+,Technical note (Forest Products Laboratory (U.S.)),subject,
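
When the same change applies to many works, rows like the ones in these examples can be generated by a script instead of typed by hand. The Python sketch below uses the standard csv module, which quotes any value that contains a comma. The function name build_csv is hypothetical, and the output omits the trailing comma seen in the example files; whether the rake task depends on that empty fifth column is not stated here, so treat that as an assumption to verify.

```python
import csv
import io


def build_csv(work_ids, from_value, to_value, property_name):
    """Return CSV text that applies one change to every listed work."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "from", "to", "property"])
    for work_id in work_ids:
        writer.writerow([work_id, from_value, to_value, property_name])
    return buffer.getvalue()


# Reproduces the + row of Example 3 for a single work.
print(build_csv(
    ["gm80hv32k"],
    "+",
    "Technical note (Forest Products Laboratory (U.S.))",
    "subject",
), end="")
```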

Run the rake task on the server

  • Transfer the spreadsheet, in comma-separated values (CSV) format, to the server. A good destination directory is ~/tmp/.
    • $ scp /path/to/file USER@SERVERNAME:/path/to/remote/server/file
  • Execute the rake task
    • $ bundle exec rails scholars_archive:bulk_update_csv csv=/path/to/remote/server/file
  • Watch the logs generated by this task
    • $ ls -ltr log/bulk-update-csv*
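
Because ls -ltr sorts by modification time with the newest file last, the final entry listed is the log of the most recent run. The Python sketch below finds that file and prints its last lines. It assumes it is run from the application root, where the log/ directory lives; the function names and the 20-line default are choices made for this example, and nothing is assumed about the contents of the log.

```python
import glob
import os


def latest_log(pattern="log/bulk-update-csv*"):
    """Return the most recently modified file matching pattern, or None."""
    paths = glob.glob(pattern)
    return max(paths, key=os.path.getmtime) if paths else None


def tail(path, count=20):
    """Return the last `count` lines of a text file."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return handle.readlines()[-count:]


if __name__ == "__main__":
    newest = latest_log()
    if newest is None:
        print("no bulk-update-csv logs found")
    else:
        print(newest)
        print("".join(tail(newest)), end="")
```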