# Chicago traffic crashes

Code to clean and perform exploratory data analysis on the traffic crash datasets provided by the City of Chicago.

## Tools

- Orchestration and raw data fetching/loading: Prefect
- Raw data transformations: dbt
- Database: PostgreSQL

## Setup

### Prerequisites

1. Install PostgreSQL.
2. Optionally, create a dedicated Postgres user and database for this project. (I created a database called `traffic_crashes`, then created a user `crash` with a separate password and granted that user privileges on the database and on all tables in it.)
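As a sketch, the optional setup in step 2 might look like the following in `psql` (the names match the ones above; the exact privileges you grant are up to you):

```sql
-- Run as a Postgres superuser, e.g. via: psql -U postgres
CREATE DATABASE traffic_crashes;
CREATE USER crash WITH PASSWORD 'choose-a-separate-password';
GRANT ALL PRIVILEGES ON DATABASE traffic_crashes TO crash;

-- Connect to the new database, then grant access to its schema and tables.
\c traffic_crashes
GRANT ALL ON SCHEMA public TO crash;
GRANT ALL ON ALL TABLES IN SCHEMA public TO crash;
```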

### Installation

1. Clone this repo.
2. Create a Python virtual environment and install requirements:

   ```shell
   $ cd chicago_traffic_crashes
   $ python -m venv venv
   $ source venv/bin/activate
   $ python -m pip install -r requirements.txt
   ```

3. Rename `.env-example` to `.env` and update the values.
4. Rename `dbt_models/profiles-example.yml` to `profiles.yml` and update the values.
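For reference, a dbt profile for a local Postgres target typically looks something like this (the profile name, schema, and credentials here are assumptions — the profile name must match the one in `dbt_project.yml`, and the connection details must match the database you set up):

```yaml
chicago_traffic_crashes:
  target: dev
  outputs:
    dev:
      type: postgres
      host: localhost
      port: 5432
      user: crash
      password: "{{ env_var('DBT_PASSWORD') }}"  # or hard-code for local use
      dbname: traffic_crashes
      schema: public
      threads: 4
```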

## Running the code

To run the code manually:

```shell
$ cd chicago_traffic_crashes/orchestration
$ python3 main.py
```

This will immediately kick off a run of the full pipeline:

1. Fetch the crashes, people, and vehicles CSVs from the Chicago Open Data Portal.
2. Load the data from the CSVs into local `[resource]_raw` PostgreSQL tables.
3. Run dbt models to transform the raw data and load it into `[resource]_stg` tables.
4. Run dbt models to create analytics views using the `[resource]_stg` tables.
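The stage ordering above can be sketched in plain Python. This is an illustration, not the repo's actual `main.py`: the function and step names are assumptions, and the real fetching, loading, and dbt invocations are stubbed behind a callback.

```python
# Hypothetical sketch of the pipeline's stage ordering. Real work (HTTP
# downloads, COPY into Postgres, dbt runs) happens inside `execute_step`.

RESOURCES = ["crashes", "people", "vehicles"]

def run_pipeline(execute_step):
    """Build the ordered list of pipeline steps and hand each to execute_step."""
    steps = []
    # 1. Fetch one CSV per resource from the Open Data Portal.
    steps += [f"fetch {r}.csv" for r in RESOURCES]
    # 2. Load each CSV into its raw Postgres table.
    steps += [f"load {r}_raw" for r in RESOURCES]
    # 3. Transform raw tables into staging tables.
    steps.append("dbt run --select staging")
    # 4. Build analytics views on top of staging.
    steps.append("dbt run --select analytics")
    for step in steps:
        execute_step(step)
    return steps
```

The point of the ordering is that every `_raw` table is populated before any dbt model runs, so the staging transformations always see complete raw data.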

To run the pipeline on a schedule, start the Prefect server with `prefect server start`. The default schedule is once a day at midnight Central time; it can be adjusted in `deployment.py`.
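As a small illustration of what "midnight Central time" resolves to, the next scheduled run can be computed with the standard library's `zoneinfo`. This is just a sketch of the schedule semantics, not code from the repo:

```python
from datetime import datetime, time, timedelta
from zoneinfo import ZoneInfo

CENTRAL = ZoneInfo("America/Chicago")

def next_midnight_central(now=None):
    """Return the next midnight in US Central time after `now`."""
    now = now or datetime.now(CENTRAL)
    tomorrow = (now + timedelta(days=1)).date()
    # Attaching the zone here means DST transitions are handled by zoneinfo.
    return datetime.combine(tomorrow, time(0, 0), tzinfo=CENTRAL)
```

Because the schedule is zone-aware rather than UTC-based, the run fires at local midnight year-round, including across daylight-saving transitions.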