
Bulk Metadata Modifications

Ryan Wick edited this page May 3, 2021 · 18 revisions

Sometimes we need to modify a bunch of metadata in one fell swoop. A rake task was built to facilitate this process.

Prerequisites

  • An issue to track the change request
  • A spreadsheet with the proper format and data
  • Access to the server where the rake task will be run

Scheduling

Bulk updates are scheduled to run weekly on Thursdays. Members of the development team are responsible for volunteering to run each week's updates.

Spreadsheet Details

Special Considerations

  • The value in the from column must be filled in every row
  • A + value in the from column indicates that the value in the to column will be added to the metadata field
  • A - value in the from column indicates that the value in the to column, or several values separated with |, will be removed from a multi-value metadata field
  • The - operator is only supported for multi-value unordered fields with expected type set to text, such as subject and alt_title
  • A * value in the from column indicates that all current values in the metadata field will be replaced
  • The to column can have a pipe-delimited (|) value to add multiple values to the metadata field
  • The to column can be left empty to indicate that a value or values should be removed from the metadata
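
One way to read the rules above is as operations on a field's list of values. The Ruby sketch below models that reading. `apply_row` is a hypothetical helper written for illustration, not the rake task's actual code, and dropping duplicates when adding is an assumption.

```ruby
# Hypothetical helper: applies one spreadsheet row to an in-memory list of values.
# It only mirrors the rules listed above; the real rake task may behave differently.
def apply_row(current, from, to)
  # The to cell may be empty or hold several pipe-delimited values.
  new_values = to.to_s.split('|').map(&:strip).reject(&:empty?)

  case from
  when '+' then (current + new_values).uniq # add; de-duplication is an assumption
  when '-' then current - new_values        # remove the listed values
  when '*' then new_values                  # replace everything (empty to clears the field)
  else
    # Literal from value: swap that one value for the new value(s), if it is present.
    return current unless current.include?(from)

    (current - [from] + new_values).uniq
  end
end

p apply_row(%w[a b c], '-', 'a|c')
p apply_row(%w[a b], '*', '')
```

Treating an empty to cell as an empty list keeps the "remove" cases (a literal from with no to, or * with no to) on the same code path as the replacements.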

Example Data

Example 1

Consider the following data as an example of what the rake task expects for performing bulk metadata updates:

id,from,to,property
fb494b17q,http://opaquenamespace.org/ns/osuAcademicUnits/qGjPkk5M,http://id.loc.gov/authorities/names/n80017721,degree_grantors
8623j0508,http://opaquenamespace.org/ns/osuAcademicUnits/qGjPkk5M,http://id.loc.gov/authorities/names/n80017721,degree_grantors
  • id : The ID of the work to be modified
  • from : The value as it exists in the system. This value can be a * to indicate any value.
  • to : The desired change to the existing value
  • property : The property name, as tracked in the Metadata Application Profile, of the metadata value to change
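
Before transferring a spreadsheet, it can help to check it locally. The following is a minimal sketch that assumes the four column names shown above. `check_bulk_csv` is a hypothetical helper, and requiring id and property on every row is an assumption (the considerations above only state that from must be filled in). It also assumes the file has no stray whitespace around the header names.

```ruby
require 'csv'

REQUIRED_HEADERS = %w[id from to property].freeze

# Hypothetical pre-flight check. Returns a list of human-readable problems;
# an empty list means nothing was flagged.
def check_bulk_csv(text)
  table = CSV.parse(text, headers: true)

  # compact drops the nil header produced by a trailing comma on the header row.
  missing = REQUIRED_HEADERS - table.headers.compact
  return ["missing columns: #{missing.join(', ')}"] unless missing.empty?

  problems = []
  table.each_with_index do |row, index|
    line = index + 2 # +1 for the header row, +1 for 1-based numbering
    # to is deliberately not checked: an empty to cell is allowed.
    %w[id from property].each do |column|
      problems << "line #{line}: #{column} is empty" if row[column].to_s.strip.empty?
    end
  end
  problems
end

puts check_bulk_csv("id,from,to,property\nabc,,x,subject\n")
```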

Example 2

The following example uses the - operator to remove only the item with value http://id.loc.gov/authorities/names/no2011160692 from the subject property in work gm80hv32k.

Input subject_remove_uri.csv:

id,from,to,property,
gm80hv32k,-,http://id.loc.gov/authorities/names/no2011160692,subject,

Example 3

The following example uses the - and + operators to replace only the item with value http://id.loc.gov/authorities/names/no2011160692 with Technical note (Forest Products Laboratory (U.S.)) in the subject property of work gm80hv32k.

Input subject_replace_uri_with_text.csv:

id,from,to,property,
gm80hv32k,-,http://id.loc.gov/authorities/names/no2011160692,subject,
gm80hv32k,+,Technical note (Forest Products Laboratory (U.S.)),subject,
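
When many values need the same remove-then-add treatment, the row pairs can be generated rather than typed by hand. The sketch below uses Ruby's standard CSV library so that values containing commas or quotes are escaped correctly. `replacement_csv` is a hypothetical helper, and the trailing empty column is included only to match the layout of the example file above.

```ruby
require 'csv'

# Hypothetical helper: builds a CSV with one remove row and one add row,
# which together swap old_value for new_value in a multi-value field.
def replacement_csv(id, property, old_value, new_value)
  CSV.generate do |csv|
    # The trailing nil produces the trailing comma seen in the example files.
    csv << ['id', 'from', 'to', 'property', nil]
    csv << [id, '-', old_value, property, nil]
    csv << [id, '+', new_value, property, nil]
  end
end

puts replacement_csv(
  'gm80hv32k',
  'subject',
  'http://id.loc.gov/authorities/names/no2011160692',
  'Technical note (Forest Products Laboratory (U.S.))'
)
```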

Example 4

The following example uses * in the from column and an empty cell in the to column to remove all items from the keyword property in work b2773v82b.

Sample input SA_Bulk_Edits_25_July_2019_keyword.csv:

id,from,to,property,
b2773v82b,*,,keyword,

Run the rake task on the server

  • Transfer the spreadsheet, in comma-separated values format, to the server. A convenient destination directory is ~/tmp/.
    • $ scp /path/to/file USER@SERVERNAME:/path/to/remote/server/file
  • Execute the rake task
    • $ bundle exec rails scholars_archive:bulk_update_csv csv=/path/to/remote/server/file
  • Find the logs generated by this task (the listing is sorted by time, newest last)
    • $ ls -ltr log/bulk-update-csv*
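
To pick out the most recent log without reading the listing by eye, something like the following Ruby sketch could be used from the application directory. `latest_bulk_update_log` is a hypothetical helper; the only detail taken from the steps above is the log/bulk-update-csv* file pattern.

```ruby
# Hypothetical helper: returns the most recently modified file matching the
# pattern, or nil when no file matches.
def latest_bulk_update_log(pattern = 'log/bulk-update-csv*')
  Dir.glob(pattern).max_by { |path| File.mtime(path) }
end

log = latest_bulk_update_log
puts(log ? "Latest log: #{log}" : 'No bulk update logs found')
```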