
Perform Bulk Ingest

Ryan Wick edited this page Jun 5, 2021 · 11 revisions

Sometimes a large number of new items need to be ingested into SA@OSU. The rake task scholars_archive:bulk_ingest_csv can ingest these items from a CSV that follows the specific format described below.

Prerequisites

  • An Issue to track the changes, which links to the spreadsheet.
  • The spreadsheet, in the specific format described below.
  • A Box folder containing the files, provided with the Issue.
  • The files, moved from Box onto the server.
  • The CSV, moved onto the server so the rake task can be run against it to ingest the items.

Scheduling

Bulk ingests are worked into the regular bulk updates, which run in weekly rotations starting on Mondays. A developer will be assigned to the ticket and will have one week to perform the ingest. Bulk ingests take longer than other bulk updates because the developer has to babysit the logs while the task runs, as described below.

Spreadsheet Details

  • The spreadsheet will contain a list of filenames.
  • The files will be hosted in a Box folder provided in the ticket.
  • The filenames in the spreadsheet should match the files in the Box folder exactly.
    • If a filename in the spreadsheet doesn't match the file, change the filename in the spreadsheet to match the file.
  • The spreadsheet should contain numbered columns for ordered fields (e.g., Creator 1, Creator 2).
  • The spreadsheet may contain a column labeled visibility. The valid values for this column are:
    • open - Public (Default)
    • authenticated - OSU
    • restricted - Private
  • URIs are required for the following fields:
    • language
    • academic_affiliation
    • other_affiliation
    • rights_statement
    • license
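Before uploading anything, it can help to check the CSV against these rules locally. The script below is a hypothetical sketch, not part of SA@OSU or the rake task. It assumes one filename per row in a column headed filename, an optional column headed visibility, URI field headers that normalize to the names listed above (for example "Language 1" becomes language), that every URI starts with http:// or https://, and that the downloaded Box files sit flat in a single local folder. Adjust these assumptions to match the real spreadsheet.

```python
"""Hypothetical pre-flight check for a bulk ingest CSV (not part of SA@OSU).

Assumed layout: one filename per row in a "filename" column, an optional
"visibility" column, and URI fields whose headers normalize to the names in
URI_FIELDS (e.g. "Language 1" -> "language"). Adjust to the real spreadsheet.
"""
import csv
import difflib
import os
import re
import sys

VALID_VISIBILITY = {"open", "authenticated", "restricted"}
URI_FIELDS = {"language", "academic_affiliation", "other_affiliation",
              "rights_statement", "license"}


def base_field(header):
    """Normalize a header such as 'Academic Affiliation 2' to 'academic_affiliation'."""
    name = re.sub(r"\s*\d+$", "", header.strip()).lower()
    return re.sub(r"[\s-]+", "_", name)


def get_value(row, column):
    """Return the stripped value of a column, matching the header case-insensitively."""
    for header, value in row.items():
        if header is not None and header.strip().lower() == column:
            return (value or "").strip()
    return ""


def check_rows(rows, available_files):
    """Return a list of human-readable problems found in the CSV rows."""
    problems = []
    names = sorted(available_files)
    for line_no, row in enumerate(rows, start=2):  # spreadsheet row 1 is the header
        filename = get_value(row, "filename")
        if filename and filename not in available_files:
            # Suggest the closest real filename, since the spreadsheet should be fixed to match it.
            hint = difflib.get_close_matches(filename, names, n=1)
            suggestion = f" (closest file: {hint[0]!r})" if hint else ""
            problems.append(f"row {line_no}: file not found: {filename!r}{suggestion}")
        visibility = get_value(row, "visibility")
        if visibility and visibility not in VALID_VISIBILITY:
            problems.append(f"row {line_no}: invalid visibility {visibility!r}")
        for header, value in row.items():
            # DictReader stores surplus cells under a None key; skip those.
            if header is None or base_field(header) not in URI_FIELDS:
                continue
            value = (value or "").strip()
            if value and not value.startswith(("http://", "https://")):
                problems.append(f"row {line_no}: {header} is not a URI: {value!r}")
    return problems


def main(csv_path, files_dir):
    available = set(os.listdir(files_dir))
    # utf-8-sig also accepts a CSV that starts with a byte order mark.
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        problems = check_rows(csv.DictReader(f), available)
    for problem in problems:
        print(problem)
    return 1 if problems else 0


if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv[1], sys.argv[2]))
```

Run it as python3 check_bulk_ingest_csv.py /path/to/spreadsheet /path/to/files (the script name is arbitrary). It prints one line per problem and exits with status 1 if anything was found, so mismatched filenames can be corrected in the spreadsheet before the ingest rather than discovered in the logs afterwards.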

Example Data

https://docs.google.com/spreadsheets/d/1jIEXGgSJTiXVis9y3Uu93bJtYsVnC6_TVHJsF2e-gGc/edit#gid=261789307

Run the rake task on the server

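  • Download the spreadsheet locally as a CSV, then copy it to the server
    • scp /path/to/spreadsheet USER@SERVERNAME:/path/to/remote/server/spreadsheet
  • Download the files from Box locally, extract them if they come as a zip archive, then copy the folder to the server (-r copies a directory recursively)
    • scp -r /path/to/files USER@SERVERNAME:/path/to/remote/server/files
  • Execute the rake task from the application root on the server, pointing csv at the uploaded spreadsheet and path at the uploaded files
    • bundle exec rails scholars_archive:bulk_ingest_csv csv=/path/to/remote/server/spreadsheet path=/path/to/remote/server/files user={USERNAME}
  • Watch the logs: list the bulk ingest logs (the newest is listed last), then follow the most recent one
    • ls -ltr log/bulk-ingest-csv*
    • tail -f {NEWEST_LOG_FILE}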
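Picking the newest log and following it can also be combined into one step. This Python sketch is a hypothetical helper, not part of the repository, and assumes Python 3 is available on the server. It selects the most recently modified file matching the pattern (the same file ls -ltr lists last) and then prints lines as they are appended, similar to tail -f.

```python
"""Follow the newest bulk ingest log, like `ls -ltr` and then `tail -f`.

Hypothetical helper: run it from the application root on the server, where
the rake task writes its log/bulk-ingest-csv* files. Stop it with Ctrl-C.
"""
import glob
import os
import sys
import time


def newest_log(pattern):
    """Return the most recently modified file matching pattern, or None."""
    candidates = glob.glob(pattern)
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)


def follow(path, poll_seconds=1.0):
    """Print lines as they are appended to path, polling every poll_seconds."""
    with open(path, encoding="utf-8", errors="replace") as log:
        # Start at the current end of the file; earlier lines are not printed.
        log.seek(0, os.SEEK_END)
        while True:
            line = log.readline()
            if line:
                sys.stdout.write(line)
                sys.stdout.flush()
            else:
                time.sleep(poll_seconds)


if __name__ == "__main__" and len(sys.argv) == 2:
    path = newest_log(sys.argv[1])
    if path is None:
        sys.exit(f"no files match {sys.argv[1]!r}")
    print(f"following {path}")
    try:
        follow(path)
    except KeyboardInterrupt:
        pass
```

Usage: python3 follow_ingest_log.py 'log/bulk-ingest-csv*' (the script name is arbitrary). Quote the pattern so the shell passes it through unexpanded and the script does the matching itself.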