Skip to content

Azariagmt/Datawarehouse

Repository files navigation

Project logo

DWH

Status GitHub Issues GitHub Pull Requests License


Data Warehouse and ETL project for traffic PEMS dataset.

📝 Table of Contents

🧐 About

Reusable modular ELT project consisting of building a data warehouse using MySQL, doing transformations using DBT and reporting using redash. ELT diagram

🏁 Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.

You can find the live DBT dashboard with generated lineage-graphs at 10academywarehouse.netlify.app.

Prerequisites

What things you need to install the software and how to install them.

Docker

In a linux environment

sudo snap install docker

Windows or Mac: Get Docker from here.

Git

Get Git from here.

DVC

To load in the data lets install DVC

pip install dvc[gdrive]

Installing

Steps that tell you how to get a development env running.

clone the repository

git clone https://github.com/Azariagmt/Datawarehouse
cd Datawarehouse

Pull necessary data

dvc pull

Get mysql, airflow, phpmyadmin and dbt containers up and running

docker-compose up

Get the reporting dashboard up and running

cd redash
docker-compose up

🎈 Usage

  • navigate to localhost 5000 to access airflow
  • Create connection to running database config with name mysql-connect
  • trigger dag to load data into running datawarehouse
  • Navigate to localhost 8080 to access phpmyadmin and see if data has been loaded into specified table
  • trigger dbt run command from within airflow
  • docker-exec into dbt container

⛏️ Built Using

  • Python - Programming Language
  • Airflow - Automation
  • DBT - Data Transformation handler
  • Redshift - Reporting environment
  • DVC - Data version control
  • Superset - Migrated reporting environment

✍️ Authors

🎉 Acknowledgements