Prepping data
This is a workflow for preparing metadata for new items for ingest into Drupal.
input: none
script: get/getTaxonomyIdentifiers.py
output: levy-api/existing-taxonomies
First, you need to get the existing taxonomy terms from Drupal. This ensures you don't create duplicates of terms that already exist. To do this, run getTaxonomyIdentifiers.py against your production site. This will create a folder called existing-taxonomies in your levy-api directory, containing a CSV for each type of taxonomy in Drupal.
Currently, there are seven taxonomies in the Lester Levy Sheet Music Collection:
- Composition Metadata (composition_metadata.csv)
- Content List (c.csv)
- Creator Roles (creator_r.csv)
- Duplicate Reason Codes (duplicat.csv)
- Instrumentation Metadata (instrumentation_metadata.csv)
- Publishers (publishers.csv)
- Subjects (subjects.csv)
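After running the script, it can help to confirm that each taxonomy produced a CSV and to see how many terms each one holds. The following is a minimal sketch, not part of the repository: the function name summarize_csv_folder is hypothetical, and it assumes only that the existing-taxonomies folder contains CSV files with a header row.

```python
import csv
from pathlib import Path


def summarize_csv_folder(folder):
    """Return {csv filename: number of data rows} for every CSV in folder."""
    counts = {}
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle)
            next(reader, None)  # skip the header row (assumed to be present)
            counts[path.name] = sum(1 for _ in reader)
    return counts


if __name__ == "__main__":
    # Assumes you run this from inside your levy-api directory.
    for name, rows in summarize_csv_folder("existing-taxonomies").items():
        print(f"{name}: {rows} rows")
```

A taxonomy that is missing from the printed list, or that shows zero rows, is worth checking before moving on to the next step.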
input: none
script: get/getNode_levy_collection_names.py
output: allCollectionNames.csv
Next, you need to get the existing levy_collection_names (the entity used for creator/contributor names) from Drupal. This ensures you don't create duplicates of names that already exist. To do this, run getNode_levy_collection_names.py against your production site. This will create a CSV called allCollectionNames.csv in your main levy-api directory, containing all existing levy_collection_names in Drupal.
input: Metadata spreadsheet
script: explodeTaxonomiesAndNames.py
output: levy-api/aggregated-taxonomies & levy-api/aggregated-roles
Now, we need to determine which taxonomy terms and levy_collection_names appear in the metadata for our new items. This script "explodes" the taxonomy terms and levy_collection_names from our metadata spreadsheet into CSVs aggregated by the terms themselves (for instance, "love" or "Smith, Bob"). In Steps 4 and 5, we will compare these terms and names to those found in Step 1 and Step 2 to determine which are new and should be created in Drupal.
To do this, run explodeTaxonomiesAndNames.py with your metadata spreadsheet in the main levy-api directory.
To learn about how to format your metadata spreadsheet and how to name your columns, please see this example.
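To illustrate what "exploding" means here, the sketch below splits multi-valued cells and groups the row identifiers under each distinct term. It is an illustration only, not the logic of explodeTaxonomiesAndNames.py: the function name explode_terms, the "subjects" and "fileIdentifier" column names, and the "|" delimiter are all assumptions.

```python
from collections import defaultdict


def explode_terms(rows, column, id_column="fileIdentifier", delimiter="|"):
    """Map each distinct term in `column` to the identifiers of rows using it."""
    aggregated = defaultdict(list)
    for row in rows:
        for term in (row.get(column) or "").split(delimiter):
            term = term.strip()
            if term:  # ignore empty cells and stray delimiters
                aggregated[term].append(row[id_column])
    return dict(aggregated)


# Hypothetical metadata rows; real column names may differ.
rows = [
    {"fileIdentifier": "001", "subjects": "love|war"},
    {"fileIdentifier": "002", "subjects": "love"},
]
print(explode_terms(rows, "subjects"))
# {'love': ['001', '002'], 'war': ['001']}
```

Grouping by term rather than by item is what makes the later comparison against Drupal straightforward, since each term only has to be looked up once.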
input: spreadsheets in levy-api/existing-taxonomies & levy-api/aggregated-taxonomies
script: findExistingTaxTermsAndTermsToCreate.py
output: levy-api/items-matched, termsDone/taxonomyTermsDone.csv, termsToCreate/taxonomyTermsToCreate.csv
This script compares taxonomy terms found in Drupal (Step 1) and terms found in your metadata spreadsheet (Step 3) and produces several helpful spreadsheets.
The first is taxonomyTermsDone.csv, which is a CSV containing a list of taxonomy terms found in your metadata that already exist in Drupal. This is for your reference.
The second is taxonomyTermsToCreate.csv, which is a CSV containing a list of taxonomy terms found in your metadata that DO NOT exist in Drupal and need to be created in later steps.
Finally, a new folder named items-matched is created in your levy-api directory. This folder contains a CSV for each taxonomy with terms in your metadata spreadsheet. Each CSV contains all of the taxonomy terms, their associated fileIdentifiers, and each term's Drupal identifier (if found). We will rely on these spreadsheets in later steps.
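The comparison behind the "done" and "to create" lists can be sketched as a simple partition of the metadata terms against the terms already in Drupal. This is a hedged illustration rather than the script's actual code: split_terms is a hypothetical name, and matching case-insensitively after trimming whitespace is an assumption about how terms should be compared.

```python
def split_terms(metadata_terms, existing_terms):
    """Partition metadata terms into (already in Drupal, still to create)."""
    # Assumption: terms match regardless of case and surrounding whitespace.
    existing = {term.strip().lower() for term in existing_terms}
    done, to_create = [], []
    for term in sorted(set(metadata_terms)):
        if term.strip().lower() in existing:
            done.append(term)
        else:
            to_create.append(term)
    return done, to_create


done, to_create = split_terms(["love", "war", "Home"], ["Love", "home", "rivers"])
print(done)       # ['Home', 'love']
print(to_create)  # ['war']
```

If exact matching is preferred, dropping the lower() calls makes "Love" and "love" count as different terms.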
input: allCollectionNames.csv & levy-api/aggregated-roles
script: findExistingCollNamesAndNamesToCreate.py
output: matched_CollectionNames.csv, termsDone/levy_collection_namesDone.csv, termsToCreate/levy_collection_namesToCreate.csv
This script compares levy_collection_names found in Drupal (Step 2) and names found in your metadata spreadsheet (Step 3) and produces several helpful spreadsheets.
The first is levy_collection_namesDone.csv, which is a CSV containing a list of levy_collection_names found in your metadata that already exist in Drupal. This is for your reference.
The second is levy_collection_namesToCreate.csv, which is a CSV containing a list of levy_collection_names found in your metadata that DO NOT exist in Drupal and need to be created in later steps.
Finally, a CSV called matched_CollectionNames.csv is created. It contains all of the levy_collection_names, their associated fileIdentifiers, and each name's Drupal identifier (if found). We will rely on this spreadsheet in later steps.
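The shape of a matched spreadsheet like this one can be sketched as one row per name and fileIdentifier pair, with the Drupal identifier left blank when the name was not found. The code below is an assumption-laden illustration, not the script itself: match_names, the dictionary inputs, and the output keys "name", "fileIdentifier", and "drupal_id" are all hypothetical.

```python
def match_names(aggregated_names, existing_ids):
    """Build one row per (name, fileIdentifier), with the Drupal id if known."""
    matched = []
    for name, file_ids in aggregated_names.items():
        drupal_id = existing_ids.get(name, "")  # blank when the name is new
        for file_id in file_ids:
            matched.append(
                {"name": name, "fileIdentifier": file_id, "drupal_id": drupal_id}
            )
    return matched


# Hypothetical inputs: names from the metadata and identifiers from Drupal.
aggregated = {"Smith, Bob": ["001", "002"], "Doe, Jane": ["003"]}
existing = {"Smith, Bob": "1234"}
for row in match_names(aggregated, existing):
    print(row)
```

Rows with a blank identifier correspond to the names that still need to be created in Drupal.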