Automate pipeline #85

Open
pmayd opened this issue Oct 11, 2023 · 3 comments

pmayd commented Oct 11, 2023

Idea:

  • automate the pipeline when files are uploaded
  • uploading a file to a bucket will trigger a function/container to process this file (see the sketch below)
  • (new) data is automatically ingested into the database
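
A minimal sketch of how the bucket trigger could be wired up with gcloud; the function name, entry point, bucket, runtime, and region below are placeholders, not settings taken from our project:

```bash
# Hypothetical wiring: deploy a Cloud Function that fires whenever an object
# is finalized (uploaded) in the bucket. All names here are placeholders.
gcloud functions deploy process-upload \
  --runtime=python311 \
  --entry-point=process_file \
  --trigger-bucket=a4d-raw-uploads \
  --region=europe-west3
```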
@tasosbada

I think that the automation process should be done in at least two steps.

  1. A container for cleaning the data, based on our data cleaning pipeline in R.
  2. A container that runs a bash script for the data upload and runs BigQuery queries. This script could be stored in GCS, so we could modify it.

For the first container, the files can be found at https://storage.cloud.google.com/a4d-315220-documents/docker-a4d-data-extraction/docker-a4d-data-extraction.zip; documentation is in the readme.md file. The problem is that although it worked for me locally, it could not be deployed on GCP Cloud Run: it crashed due to "devtools". I have not tried it with the latest R version and our current code. A possible solution could be to install the dependencies without "devtools", or to try it on a Kubernetes cluster.
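
One possible way around the "devtools" crash, sketched under the assumption that the package ships a standard DESCRIPTION file: install the dependencies with the lighter "remotes" package in the image build step. The /app path is a placeholder for wherever the source is copied in the Dockerfile.

```bash
# Illustrative replacement for the devtools-based install (e.g. inside the
# Dockerfile's RUN instruction); the /app path is a placeholder.
Rscript -e 'install.packages("remotes", repos = "https://cloud.r-project.org")'
Rscript -e 'remotes::install_deps("/app", dependencies = TRUE)'  # deps from DESCRIPTION
```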

The second container could even be a Cloud Function, but it needs access to our GCS bucket for the bash script. I intend to build a container to test this approach. By keeping the bash script in our GCS bucket, we have the flexibility to adapt it and use this container only as a runtime.
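
A rough sketch of what that runtime could look like; the bucket path, script name, and the example bq commands are made up for illustration, not taken from the actual repository:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical entrypoint for the second container: pull the current upload
# script from GCS so it can be changed without rebuilding the image.
gsutil cp gs://a4d-315220-scripts/upload.sh /tmp/upload.sh   # placeholder bucket/object
chmod +x /tmp/upload.sh
/tmp/upload.sh

# The kind of steps upload.sh itself might contain (illustrative only):
#   bq load --autodetect --source_format=CSV my_dataset.my_table gs://my-bucket/clean.csv
#   bq query --use_legacy_sql=false 'SELECT COUNT(*) FROM my_dataset.my_table'
```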

@tasosbada

The Docker image template and the repository for the second step (the container that runs a bash script for the data upload and runs BigQuery queries), together with the instructions, can be found in our bucket. I zipped it and stored it in our bucket in case you want to use it in the future. Information and the step-by-step process can be found in the readme.md file.
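
For reference, building and pushing that image could look roughly like this; the archive name and image tag are placeholders, and the project ID is only inferred from the bucket name above:

```bash
# Hedged sketch: fetch the archive from the bucket, build the image, push it.
# Archive name, project ID, and image tag are placeholders, not the real ones.
gsutil cp gs://a4d-315220-documents/upload-step.zip .
unzip upload-step.zip -d upload-step && cd upload-step
docker build -t gcr.io/a4d-315220/upload-step:latest .
docker push gcr.io/a4d-315220/upload-step:latest
```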

@tasosbada

The zip file contains the necessary files and documentation for building and deploying the data cleaning pipeline on GCP Cloud Run. The problem mentioned above is fixed and the pipeline now runs on Cloud Run. Since I do not have any real input data files it complains about that, but otherwise it generates the log file properly, which means there is no problem in execution and all the packages load correctly.
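
For anyone re-deploying it later, a hedged sketch of the kind of commands involved; the thread does not say whether this runs as a Cloud Run service or a job, and a job would fit a batch pipeline like this, so the job name, image path, and region below are placeholders:

```bash
# Hedged sketch: run the cleaning container as a Cloud Run job and inspect the
# logs it produces; job name, image path, and region are placeholders.
gcloud run jobs create clean-data \
  --image=gcr.io/a4d-315220/clean-data:latest \
  --region=europe-west3
gcloud run jobs execute clean-data --region=europe-west3
gcloud logging read 'resource.type="cloud_run_job"' --limit=20
```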

Please feel free to contact me if you have any questions or need any help.

Best regards.
