Merge pull request #32 from pranavanba/main
Rename sts_synindex_external.R script to internal_to_external_staging.R
pranavanba authored Jun 10, 2024
2 parents 00f4a87 + 6dff197 commit 9d5d1d1
Showing 4 changed files with 3 additions and 3 deletions.
2 changes: 1 addition & 1 deletion Dockerfile
@@ -29,4 +29,4 @@ RUN sed -i -e "s|\"<PERSONAL_ACCESS_TOKEN>\"|\"\${AWS_SYNAPSE_TOKEN}\"\n|g" \
CMD R -e "q()" \
&& sed -i -e "s|\${AWS_SYNAPSE_TOKEN}|$AWS_SYNAPSE_TOKEN|g"\
/root/.aws/config \
- && Rscript /root/recover-parquet-external/sts_synindex_external.R
+ && Rscript /root/recover-parquet-external/scripts/main/internal_to_external_staging.R
2 changes: 1 addition & 1 deletion README.md
@@ -85,7 +85,7 @@ git clone https://github.com/Sage-Bionetworks/recover-parquet-external.git
2. Modify the parameters in the [config](config/config.yml) as needed
3. Run [install_requirements.R](install_requirements.R)

- 4. Run [sts_synindex_external.R](scripts/main/sts_synindex_external.R) to generate the external parquet datasets in the staging locations (S3 and Synapse).
+ 4. Run [internal_to_external_staging.R](scripts/main/internal_to_external_staging.R) to generate the external parquet datasets in the staging locations (S3 and Synapse).
5. Once the datasets in the staging location have been validated, run [staging_to_archive.R](scripts/main/staging_to_archive.R) to generate the validated external parquet datasets in the date-tagged prod Archive locations (S3 and Synapse).
6. As needed, run [archive-to-current.R](scripts/main/archive-to-current.R) to update the Current Freeze version of the external parquet data in the appropriate locations (S3 and Synapse).
7. **(Optional)** Setup a scheduled job (AWS, cron, etc.) using the docker image to run the pipeline at a set frequency or when certain conditions are met
@@ -219,7 +219,7 @@ if (nrow(synapse_fileview)>0) {

# Index each file in Synapse
latest_commit <- gh::gh("/repos/:owner/:repo/commits/main", owner = "Sage-Bionetworks", repo = "recover-parquet-external")
- latest_commit_this_file <- paste0(latest_commit$html_url %>% stringr::str_replace("commit", "blob"), "/scripts/main/sts_synindex_external.R")
+ latest_commit_this_file <- paste0(latest_commit$html_url %>% stringr::str_replace("commit", "blob"), "/scripts/main/internal_to_external_staging.R")

act <- synapser::Activity(name = "Indexing",
description = "Indexing external parquet datasets",
File renamed without changes.
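
A rename like this one touches references in several files (here the Dockerfile, the README, and a provenance URL in a script), so it may be worth checking that none were missed. The following is a minimal sketch, not part of the commit: the helper name `check_stale_refs` is hypothetical, and it assumes a `grep` that supports `-r` and `--exclude-dir` (GNU and BSD grep both do).

```shell
# Hypothetical helper: report any remaining references to an old file name.
# $1 = directory to search, $2 = old name to look for (matched as a fixed string).
check_stale_refs() {
  dir="$1"
  old_name="$2"
  # -r: recurse, -n: show line numbers, -F: treat the name as a literal string.
  # The .git directory is skipped so that history objects do not produce matches.
  if grep -rnF --exclude-dir=.git -- "$old_name" "$dir"; then
    echo "stale references found"
    return 1
  fi
  echo "no stale references"
}

# Usage (from the repository root):
#   check_stale_refs . sts_synindex_external
```

The function prints each matching line and returns a non-zero status when the old name is still present, so it could also serve as a simple CI check.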
