Project - GCP Project
This project is an API that receives a CSV file. A pipeline preprocesses the data, that is, cleans it, before loading it into a MySQL database hosted on GCP. Once the data is loaded, the API is containerized with Docker and deployed to GCP. The instructions and requirements for running the project follow.
Before starting, create a virtual environment in the root of the project. Pipenv is recommended: the following command creates the environment automatically if it does not exist yet and then activates it.
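The cleaning rules used by the pipeline are not described here, so the following is only a minimal sketch of what such a preprocessing step could look like. The function name `clean_rows` and its rules (trim whitespace, reject rows with a wrong column count or an empty field) are assumptions for illustration, not the project's actual code.

```python
import csv
import io


def clean_rows(text, expected_columns):
    """Return (clean, rejected) rows parsed from comma-separated text.

    Assumed rules (hypothetical): trim whitespace around each field and
    reject any row with the wrong number of columns or an empty field.
    """
    clean, rejected = [], []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        fields = [field.strip() for field in row]
        if len(fields) == expected_columns and all(fields):
            clean.append(fields)
        else:
            rejected.append(row)  # keep the raw row for later inspection
    return clean, rejected
```

Returning the rejected rows separately, rather than dropping them silently, makes it possible to log or report what was not loaded.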
- shell
pipenv shell
Install the packages listed in the requirements.txt file:
- shell
pip install -r requirements.txt
Now run the API. Test it locally first, then on GCP.
- shell
uvicorn main:app --reload
You can also start the container that holds the API, which should behave the same as running the API directly. The container must already exist for this command to work.
- shell
docker start <container_name>
You will find several different sections in here. Mind that:
- You can choose which sections to solve based on your experience and available time
- If you don’t know how to solve a section, you can proceed with the following one
- You can use any language, libraries, and frameworks you want.
- Cloud services are allowed; you can choose whichever cloud provider you want
- Try to always apply best practices and develop a scalable solution.
- We recommend solving everything
- If you don’t have time to solve a section, think about the tool stack you would use, the resulting architecture, and why.
- Every complement you might want to add is highly welcome!
- If you have a personal GitHub repository to share with the interviewer, please do
In the context of a DB migration with 3 different tables (departments, jobs, employees), create a local REST API that must:
- Receive historical data from CSV files
- Upload these files to the new DB
- Be able to insert batch transactions (1 up to 1000 rows) with one request
You need to publish your code on GitHub. Frequent updates to the repository that make it possible to follow the development process will be taken into account. Ideally, create a markdown file for the Readme.md
Clarifications
- You decide where the source CSV files are located.
- You decide the destination database type, but it must be a SQL database.
- The CSV file is comma separated.
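The batch requirement above (1 to 1000 rows in one request) could be enforced roughly as follows. This is a sketch under stated assumptions: it uses the standard library's `sqlite3` as a stand-in for the SQL database, the column counts in `TABLE_COLUMNS` are a hypothetical schema, and `parse_csv` and `insert_batch` are illustrative names rather than the project's own functions.

```python
import csv
import io
import sqlite3

MAX_BATCH = 1000  # upper bound stated in the requirement

# Assumed column counts per table (hypothetical schema, not from the project).
TABLE_COLUMNS = {"departments": 2, "jobs": 2, "employees": 5}


def parse_csv(text):
    """Parse comma-separated text (no header row assumed), skipping blank lines."""
    return [row for row in csv.reader(io.StringIO(text)) if row]


def insert_batch(conn, table, rows):
    """Insert between 1 and MAX_BATCH rows in one transaction; return the count."""
    if table not in TABLE_COLUMNS:
        raise ValueError(f"unknown table: {table}")
    if not 1 <= len(rows) <= MAX_BATCH:
        raise ValueError(f"batch must contain 1 to {MAX_BATCH} rows, got {len(rows)}")
    width = TABLE_COLUMNS[table]
    if any(len(row) != width for row in rows):
        raise ValueError(f"every row must have {width} columns")
    placeholders = ", ".join("?" * width)
    # The table name is safe to interpolate because it was checked against
    # TABLE_COLUMNS; the values themselves go through bound parameters.
    with conn:  # commits on success, rolls back if any row fails
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    return len(rows)
```

Wrapping the insert in a single transaction means a batch is either stored completely or not at all, which keeps a failed request from leaving partial data behind.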
You need to explore the data that was inserted in the previous section. The stakeholders ask for some specific metrics they need. You should create an end-point for each requirement.
- Number of employees hired for each job and department in 2021 divided by quarter. The table must be ordered alphabetically by department and job.
- List of the id, name, and number of employees hired for each department that hired more employees than the mean number of employees hired in 2021 across all departments, ordered by the number of employees hired (descending).
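The two metrics above can be sketched as SQL queries, shown here through `sqlite3` as a stand-in database. Everything about the schema is an assumption: `departments(id, department)`, `jobs(id, job)`, and `employees(id, name, datetime, department_id, job_id)` with `datetime` stored as ISO 8601 text. The `strftime` calls are SQLite-specific; on MySQL the same filters would typically use `YEAR()` and `QUARTER()`. The second query also assumes that the mean is taken over departments with at least one hire in 2021.

```python
import sqlite3

# Hires per department and job in 2021, one column per quarter.
HIRES_BY_QUARTER = """
SELECT department, job,
       SUM(CASE WHEN quarter = 1 THEN 1 ELSE 0 END) AS q1,
       SUM(CASE WHEN quarter = 2 THEN 1 ELSE 0 END) AS q2,
       SUM(CASE WHEN quarter = 3 THEN 1 ELSE 0 END) AS q3,
       SUM(CASE WHEN quarter = 4 THEN 1 ELSE 0 END) AS q4
FROM (
    SELECT d.department AS department, j.job AS job,
           (CAST(strftime('%m', e.datetime) AS INTEGER) + 2) / 3 AS quarter
    FROM employees e
    JOIN departments d ON d.id = e.department_id
    JOIN jobs j ON j.id = e.job_id
    WHERE strftime('%Y', e.datetime) = '2021'
) AS hires
GROUP BY department, job
ORDER BY department, job
"""

# Departments whose 2021 hires exceed the mean of 2021 hires per department.
ABOVE_MEAN = """
WITH hires AS (
    SELECT d.id AS id, d.department AS department, COUNT(e.id) AS hired
    FROM departments d
    JOIN employees e ON e.department_id = d.id
    WHERE strftime('%Y', e.datetime) = '2021'
    GROUP BY d.id, d.department
)
SELECT id, department, hired
FROM hires
WHERE hired > (SELECT AVG(hired) FROM hires)
ORDER BY hired DESC
"""


def hires_by_quarter(conn):
    """Return rows of (department, job, q1, q2, q3, q4)."""
    return conn.execute(HIRES_BY_QUARTER).fetchall()


def departments_above_mean(conn):
    """Return rows of (id, department, hired), highest count first."""
    return conn.execute(ABOVE_MEAN).fetchall()
```

Each function maps naturally onto one endpoint, so the API layer only has to serialize the returned rows.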
Add the following to your solution to make it more robust:
- Host your architecture in any public cloud (using the services you consider most suitable)
- Add automated tests to the API
- You can use any library you want
- Different test types, if necessary, are welcome
- Containerize your application
- Create a Dockerfile to deploy the package
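For the automated tests item above, a minimal sketch using the standard library's `unittest` is shown below. The helper `validate_batch_size` is hypothetical and defined here only so the tests have something to exercise; a real suite would import the API's own code and would probably also call the endpoints through a test client.

```python
import unittest

MAX_BATCH = 1000  # upper bound from the batch-insert requirement


def validate_batch_size(rows):
    """Hypothetical helper: accept 1 to MAX_BATCH rows, otherwise raise ValueError."""
    if not 1 <= len(rows) <= MAX_BATCH:
        raise ValueError(f"batch must contain 1 to {MAX_BATCH} rows, got {len(rows)}")
    return len(rows)


class BatchSizeTests(unittest.TestCase):
    def test_accepts_both_limits(self):
        self.assertEqual(validate_batch_size([("1", "Sales")]), 1)
        self.assertEqual(validate_batch_size([("1", "Sales")] * MAX_BATCH), MAX_BATCH)

    def test_rejects_empty_batch(self):
        with self.assertRaises(ValueError):
            validate_batch_size([])

    def test_rejects_oversized_batch(self):
        with self.assertRaises(ValueError):
            validate_batch_size([("1", "Sales")] * (MAX_BATCH + 1))
```

Saved in a file such as `test_batch.py` (an assumed name), the tests can be run with `python -m unittest`.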
Yhary Arias - @yharyarias
Email: [email protected]
Bucaramanga, Colombia