TEST THE PROJECT FROM THE CLOUD

Globant’s Data Engineering Coding Challenge

Table of Contents

About The Project
- Built With
Getting Started
Contact
Acknowledgments

TEST THE PROJECT FROM THE CLOUD

GETTING STARTED

This project consists in the creation of an API that receives a csv extension file. A pipeline was created in which the data is preprocessed, that is to say, it is cleaned before sending it to a MySQL database hosted in GCP. After the data is sent, the API is dockerized to deploy it in GCP. Below are the instructions and requirements to execute the project.

Prerequisites

Before starting it is necessary to create a virtual environment, it is recommended to do it with PIPENV, located in the root of the project. The following command will generate the environment automatically.

shell
```
pipenv shell
```

Download the packages from the requirements.txt file

shell
```
pip install -r requirements.txt
```

RUN API

It is time to run the API, first we will test it locally then on GCP

shell
```
uvicorn main:app --reload
```

Run Docker container

You can also run the container where the API is stored (should work the same as running the API directly).

shell
```
docker start <container_name>
```

About The Project

You will find several different sections in here. Mind that:

You can choose which sections to solve based on your experience and available time
if you don’t know how to solve a section, you can proceed with the following one
You can use whichever language, libraries, and frameworks that you want.
The usage of cloud services is allowed, you can choose whichever cloud provider that you want
Try to always apply best practices and develop a scalable solution.
We recommend you to solve everything
If you don’t have time to solve any sections, try to think the toolstack you would like to use and the resulting architecture, and why.
Every complement you might want to add is highly welcome!
In case you have a personal github repository to share with the interviewer, please do

(back to top)

Section 1: API

In the context of a DB migration with 3 different tables (departments, jobs, employees) , create a local REST API that must:

Receive historical data from CSV files
Upload these files to the new DB
Be able to insert batch transactions (1 up to 1000 rows) with one request You need to publish your code in GitHub. It will be taken into account if frequent updates are made to the repository that allow analyzing the development process. Ideally, create a markdown file for the Readme.md

Clarifications

You decide the origin where the CSV files are located.
You decide the destination database type, but it must be a SQL database.
The CSV file is comma separated.

(back to top)

Section 2: SQL

You need to explore the data that was inserted in the previous section. The stakeholders ask for some specific metrics they need. You should create an end-point for each requirement.

Requirements

Number of employees hired for each job and department in 2021 divided by quarter. The table must be ordered alphabetically by department and job.
List of ids, name and number of employees hired of each department that hired more employees than the mean of employees hired in 2021 for all the departments, ordered by the number of employees hired (descending).

Bonus Track! Cloud, Testing & Containers

Add the following to your solution to make it more robust:

Host your architecture in any public cloud (using the services you consider more adequate)
Add automated tests to the API
You can use whichever library that you want
Different tests types, if necessary, are welcome
Containerize your application
Create a Dockerfile to deploy the package

(back to top)

Contact

Yhary Arias - @yharyarias

Email: [email protected]

Bucaramanga, Colombia

(back to top)

References

Google Cloud Run Dockerfile FastAPI

(back to top)

Name		Name	Last commit message	Last commit date
Latest commit History 33 Commits
app		app
images		images
querys		querys
.dockerignore		.dockerignore
.gitignore		.gitignore
Dockerfile		Dockerfile
Pipfile		Pipfile
Pipfile.lock		Pipfile.lock
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Globant’s Data Engineering Coding Challenge

TEST THE PROJECT FROM THE CLOUD

GETTING STARTED

Prerequisites

RUN API

Run Docker container

About The Project

Section 1: API

Section 2: SQL

Requirements

Bonus Track! Cloud, Testing & Containers

Contact

References

About

Releases

Packages

Languages

yharyarias/coding_challenge_de

Folders and files

Latest commit

History

Repository files navigation

Globant’s Data Engineering Coding Challenge

TEST THE PROJECT FROM THE CLOUD

GETTING STARTED

Prerequisites

RUN API

Run Docker container

About The Project

Section 1: API

Section 2: SQL

Requirements

Bonus Track! Cloud, Testing & Containers

Contact

References

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages