Data warehouse and ELT project for the PeMS traffic dataset.
A reusable, modular ELT project: a data warehouse built on MySQL, transformations handled by dbt, and reporting done with Redash.
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.
You can find the live DBT dashboard with generated lineage-graphs at 10academywarehouse.netlify.app.
What you need to install and how to install it.
In a Linux environment:
sudo snap install docker
On Windows or Mac, get Docker from the Docker website.
Get Git from the Git website.
To load in the data, let's install DVC:
pip install "dvc[gdrive]"
Steps to get a development environment running.
Clone the repository:
git clone https://github.com/Azariagmt/Datawarehouse
cd Datawarehouse
Pull the necessary data:
dvc pull
Get the MySQL, Airflow, phpMyAdmin and dbt containers up and running:
docker-compose up
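For orientation, this is a minimal sketch of the kind of compose file `docker-compose up` consumes here. The images, service names, ports and credentials below are assumptions for illustration, not the repo's actual file — check `docker-compose.yml` for the real values.

```yaml
# Hypothetical sketch of the services started by docker-compose up.
version: "3.8"
services:
  mysql:
    image: mysql:8.0
    environment:
      MYSQL_ROOT_PASSWORD: example    # assumed credential
    ports:
      - "3306:3306"
  phpmyadmin:
    image: phpmyadmin/phpmyadmin
    environment:
      PMA_HOST: mysql
    ports:
      - "8080:80"                     # phpMyAdmin on localhost:8080
  airflow:
    image: apache/airflow:2.3.0       # assumed version
    depends_on:
      - mysql
    ports:
      - "5000:8080"                   # Airflow UI on localhost:5000
  dbt:
    build: ./dbt                      # assumed local dbt image
    depends_on:
      - mysql
```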
Get the reporting dashboard up and running:
cd redash
docker-compose up
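On a first run, Redash needs its internal metadata database created before the UI will come up. Assuming the standard Redash compose layout (a `server` service), the bootstrap looks like:

```shell
# One-time Redash bootstrap: create its metadata database, then start
# the stack in the background. The exposed port depends on the compose
# file, so check it before opening the UI.
docker-compose run --rm server create_db
docker-compose up -d
```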
- Navigate to localhost:5000 to access the Airflow web UI
- Create a connection to the running database, configured with the name mysql-connect
- Trigger the DAG to load the data into the running data warehouse
- Navigate to localhost:8080 to access phpMyAdmin and check that the data has been loaded into the specified table
- Trigger the dbt run command from within Airflow, or
- docker exec into the dbt container and run dbt there
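The UI steps above can also be scripted from the host. The container names, DAG id and credentials below are assumptions for illustration; substitute the ones your containers actually use:

```shell
# Create the mysql-connect connection from inside the Airflow container
# ("airflow" container name, "mysql" host and root/example credentials
# are all assumed).
docker exec -it airflow airflow connections add mysql-connect \
  --conn-type mysql \
  --conn-host mysql \
  --conn-login root \
  --conn-password example \
  --conn-port 3306

# Trigger the loading DAG ("load_dag" is a hypothetical DAG id).
docker exec -it airflow airflow dags trigger load_dag

# Run the dbt transformations directly in the dbt container
# ("dbt" is an assumed container name).
docker exec -it dbt dbt run
```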
- Python - Programming Language
- Airflow - Automation
- DBT - Data Transformation handler
- Redash - Reporting environment
- DVC - Data version control
- Superset - Migrated reporting environment
- @10xac Idea & resources
- @Azariagmt Initial work
- Hat tip to anyone whose code was used
- References
Installing DBT
DBT videos
DBT tutorial
Setting up redash
Getting started with Redash
- Redash query export
- Installing Superset
- caching_sha2_password error
- Allow local infile loading