Merge pull request #37 from pranavanba/main
Update running instructions for staging_to_archive.R in README
pranavanba authored Oct 4, 2024
2 parents 998ad16 + 1d82c3a commit ca6a6de
Showing 1 changed file with 1 addition and 1 deletion.
README.md
@@ -86,6 +86,6 @@ git clone https://github.com/Sage-Bionetworks/recover-parquet-external.git
3. Run [install_requirements.R](install_requirements.R)

4. Run [internal_to_external_staging.R](scripts/main/internal_to_external_staging.R) to generate the external parquet datasets in the staging locations (S3 and Synapse).
- 5. Once the datasets in the staging location have been validated, run [staging_to_archive.R](scripts/main/staging_to_archive.R) to generate the validated external parquet datasets in the date-tagged prod Archive locations (S3 and Synapse).
+ 5. Once the datasets in the staging location have been validated, run [staging_to_archive.R](scripts/main/staging_to_archive.R) to generate the validated external parquet datasets in the date-tagged prod Archive locations (S3 and Synapse). Currently, while running this script, you must manually enter the name of the Synapse folder for the validated staging dataset version you want to move from staging to Archive (e.g. 2024-10-01, 2024-09-10, etc.); the script reads this value interactively (e.g. `validated_date <- readline(...)`; see the sketch below).
6. As needed, run [archive-to-current.R](scripts/main/archive-to-current.R) to update the Current Freeze version of the external parquet data in the appropriate locations (S3 and Synapse).
7. **(Optional)** Set up a scheduled job (AWS, cron, etc.) using the Docker image to run the pipeline at a set frequency or when certain conditions are met.
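
The added step 5 notes that `staging_to_archive.R` currently prompts for the dataset version interactively. Below is a minimal sketch of what that manual entry might look like; the `validated_date` name comes from the README's own example, while the prompt wording and the date check are illustrative assumptions, not the script's actual code:

```r
# Sketch only: the README shows `validated_date <- readline(...)`; the prompt
# text and the validation below are assumptions about the real script.
# Note: readline() only reads input in an interactive R session.
validated_date <- readline(
  prompt = "Staging dataset version to archive (e.g. 2024-10-01): "
)

# Archive folders are date-tagged, so fail fast on input that is not an ISO date.
if (is.na(as.Date(validated_date, format = "%Y-%m-%d"))) {
  stop("Expected a date-tagged folder name like 2024-10-01, got: ", validated_date)
}
```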
