Merge pull request #37 from pranavanba/main
Update running instructions for staging_to_archive.R in README
pranavanba authored Oct 4, 2024
2 parents 998ad16 + 1d82c3a commit ca6a6de
Showing 1 changed file with 1 addition and 1 deletion.
README.md
@@ -86,6 +86,6 @@ git clone https://github.com/Sage-Bionetworks/recover-parquet-external.git
3. Run [install_requirements.R](install_requirements.R)

4. Run [internal_to_external_staging.R](scripts/main/internal_to_external_staging.R) to generate the external parquet datasets in the staging locations (S3 and Synapse).
- 5. Once the datasets in the staging location have been validated, run [staging_to_archive.R](scripts/main/staging_to_archive.R) to generate the validated external parquet datasets in the date-tagged prod Archive locations (S3 and Synapse).
+ 5. Once the datasets in the staging location have been validated, run [staging_to_archive.R](scripts/main/staging_to_archive.R) to generate the validated external parquet datasets in the date-tagged prod Archive locations (S3 and Synapse). Currently, while running this script, you must manually enter the name of the Synapse folder for the validated staging dataset version you want to move from staging to Archive (e.g. 2024-10-01, 2024-09-10, etc.); the script reads this value interactively (e.g. `validated_date <- readline(...)`; see the sketch below).
6. As needed, run [archive-to-current.R](scripts/main/archive-to-current.R) to update the Current Freeze version of the external parquet data in the appropriate locations (S3 and Synapse).
7. **(Optional)** Set up a scheduled job (AWS, cron, etc.) using the Docker image to run the pipeline at a set frequency or when certain conditions are met.
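
The added step 5 notes that `staging_to_archive.R` currently prompts for the dataset version interactively. Below is a minimal sketch of what that manual entry might look like; the `validated_date` name comes from the README's own example, while the prompt wording and the date check are illustrative assumptions, not the script's actual code:

```r
# Sketch only: the README shows `validated_date <- readline(...)`; the prompt
# text and the validation below are assumptions about the real script.
# Note: readline() only reads input in an interactive R session.
validated_date <- readline(
  prompt = "Staging dataset version to archive (e.g. 2024-10-01): "
)

# Archive folders are date-tagged, so fail fast on input that is not an ISO date.
if (is.na(as.Date(validated_date, format = "%Y-%m-%d"))) {
  stop("Expected a date-tagged folder name like 2024-10-01, got: ", validated_date)
}
```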
