All the source code demonstrated in this post is open source and available on GitHub:

git clone https://github.com/Stefen-Taime/projet_data.git
As a prerequisite for this post, you will need the following resources:
- (1) Linux machine;
- (1) Docker;
- (1) Docker Compose;
- (1) Virtualenv;
git clone https://github.com/Stefen-Taime/projet_data.git
cd projet_data/extractor
pip install -r requirements.txt
python main.py
or
docker build --tag=extractor .
docker-compose up run
# This folder contains the code used to create a downloads folder, iteratively download files from a list of URIs, unzip them, and delete the zip files (a sketch of this logic follows below). At this point the extractor directory should contain a new downloads folder with 2 CSV files.
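In essence, the extractor boils down to a few lines. Here is a minimal sketch of that logic, assuming a hypothetical URI list (the real one lives in main.py):

import os
import zipfile
import urllib.request

URIS = [
    "https://example.com/data/auto-mpg.zip",  # hypothetical URI; see main.py for the real list
]
DOWNLOAD_DIR = "downloads"

os.makedirs(DOWNLOAD_DIR, exist_ok=True)       # create the downloads folder
for uri in URIS:
    zip_path = os.path.join(DOWNLOAD_DIR, os.path.basename(uri))
    urllib.request.urlretrieve(uri, zip_path)  # download the archive
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(DOWNLOAD_DIR)            # unzip it
    os.remove(zip_path)                        # delete the zip file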
cd ..
cd docker
docker-compose -f docker-compose-nosql.yml up -d    # MongoDB
docker-compose -f docker-compose-sql.yml up -d      # Postgres, plus Adminer on port 8085 and Metabase on port 3000
docker-compose -f docker-compose-s3.yml up -d       # MinIO on port 9000
docker-compose -f docker-compose-spark.yml up -d    # Spark master and Jupyter Notebook on port 8888
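To confirm everything came up, you can probe the exposed ports with a quick check (a convenience sketch using only the standard library, not part of the repo):

import socket

SERVICES = {"Adminer": 8085, "Metabase": 3000, "MinIO": 9000, "Jupyter": 8888}

for name, port in SERVICES.items():
    with socket.socket() as s:
        s.settimeout(2)
        # connect_ex returns 0 when the port accepts connections
        status = "up" if s.connect_ex(("localhost", port)) == 0 else "down"
        print(f"{name} ({port}): {status}")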
cd ..
cd loader
pip install -r requirements.txt
python loader.py mongodb   # uploads the data into MongoDB (if you get an error, manually create an auto-mpg database with an auto collection and try again)
python loader.py minio     # uploads the data into MinIO (if you get an error, manually create a landing bucket and try again)

You should now have an auto-mpg database with an auto collection containing the data in MongoDB, and the same data in MinIO.
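Under the hood, loader.py roughly does the following for each target. This is only a sketch: the connection strings, credentials, and file name here are assumptions, so check loader.py and the compose files for the real values.

import csv
from pymongo import MongoClient
from minio import Minio

CSV_PATH = "../extractor/downloads/auto-mpg.csv"    # hypothetical file name

# MongoDB: insert each CSV row as a document into auto-mpg.auto
client = MongoClient("mongodb://localhost:27017")   # assumed connection string
with open(CSV_PATH, newline="") as f:
    client["auto-mpg"]["auto"].insert_many(list(csv.DictReader(f)))

# MinIO: upload the raw file into the landing bucket
minio = Minio("localhost:9000", access_key="minio",
              secret_key="minio123", secure=False)  # assumed credentials
if not minio.bucket_exists("landing"):
    minio.make_bucket("landing")                    # the manual fix mentioned above
minio.fput_object("landing", "auto-mpg.csv", CSV_PATH)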
Go to localhost:8888 (the password is "stefen") and, once in the Jupyter notebook, run all cells. Then open localhost:8085 for Adminer and localhost:3000 for Metabase.
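The notebook's cells will vary, but reading the landing bucket from Spark typically looks something like this (a sketch; the MinIO hostname, credentials, and object name are assumptions tied to the compose network):

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("auto-mpg")
         # point s3a at MinIO; "minio" is the assumed service name on the compose network
         .config("spark.hadoop.fs.s3a.endpoint", "http://minio:9000")
         .config("spark.hadoop.fs.s3a.access.key", "minio")
         .config("spark.hadoop.fs.s3a.secret.key", "minio123")
         .config("spark.hadoop.fs.s3a.path.style.access", "true")
         .getOrCreate())

# Read the raw CSV from the landing bucket (hypothetical object name)
df = spark.read.csv("s3a://landing/auto-mpg.csv", header=True, inferSchema=True)
df.show(5)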