# Chicago traffic crashes

Code to clean and perform exploratory data analysis on the traffic crash datasets provided by the City of Chicago.

## Tools

- Orchestration and raw data fetching/loading: Prefect
- Raw data transformations: dbt
- Database: PostgreSQL

## Setup

### Prerequisites

1. Install PostgreSQL.
2. Optionally, create a dedicated Postgres user and database for this project. (I created a database called `traffic_crashes`, then created a user `crash` with a separate password and granted that user privileges on the database and on all tables in it.)
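As a sketch, the optional setup in step 2 might look like the following in `psql` (the names match the ones above; the exact privileges you grant are up to you):

```sql
-- Run as a Postgres superuser, e.g. via: psql -U postgres
CREATE DATABASE traffic_crashes;
CREATE USER crash WITH PASSWORD 'choose-a-separate-password';
GRANT ALL PRIVILEGES ON DATABASE traffic_crashes TO crash;

-- Connect to the new database, then grant access to its schema and tables.
\c traffic_crashes
GRANT ALL ON SCHEMA public TO crash;
GRANT ALL ON ALL TABLES IN SCHEMA public TO crash;
```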

### Installation

1. Clone this repo.
2. Create a Python virtual environment and install requirements:

   ```shell
   $ cd chicago_traffic_crashes
   $ python -m venv venv
   $ source venv/bin/activate
   $ python -m pip install -r requirements.txt
   ```

3. Rename `.env-example` to `.env` and update the values.
4. Rename `dbt_models/profiles-example.yml` to `profiles.yml` and update the values.
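For reference, a dbt profile for a local Postgres target typically looks something like this (the profile name, schema, and credentials here are assumptions — the profile name must match the one in `dbt_project.yml`, and the connection details must match the database you set up):

```yaml
chicago_traffic_crashes:
  target: dev
  outputs:
    dev:
      type: postgres
      host: localhost
      port: 5432
      user: crash
      password: "{{ env_var('DBT_PASSWORD') }}"  # or hard-code for local use
      dbname: traffic_crashes
      schema: public
      threads: 4
```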

## Running the code

To run the code manually:

```shell
$ cd chicago_traffic_crashes/orchestration
$ python3 main.py
```

This will immediately kick off a run of the full pipeline:

1. Fetch the crashes, people, and vehicles CSVs from the Chicago Open Data Portal.
2. Load the data from the CSVs into local `[resource]_raw` PostgreSQL tables.
3. Run dbt models to transform the raw data and load it into `[resource]_stg` tables.
4. Run dbt models to create analytics views using the `[resource]_stg` tables.
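The stage ordering above can be sketched in plain Python. This is an illustration, not the repo's actual `main.py`: the function and step names are assumptions, and the real fetching, loading, and dbt invocations are stubbed behind a callback.

```python
# Hypothetical sketch of the pipeline's stage ordering. Real work (HTTP
# downloads, COPY into Postgres, dbt runs) happens inside `execute_step`.

RESOURCES = ["crashes", "people", "vehicles"]

def run_pipeline(execute_step):
    """Build the ordered list of pipeline steps and hand each to execute_step."""
    steps = []
    # 1. Fetch one CSV per resource from the Open Data Portal.
    steps += [f"fetch {r}.csv" for r in RESOURCES]
    # 2. Load each CSV into its raw Postgres table.
    steps += [f"load {r}_raw" for r in RESOURCES]
    # 3. Transform raw tables into staging tables.
    steps.append("dbt run --select staging")
    # 4. Build analytics views on top of staging.
    steps.append("dbt run --select analytics")
    for step in steps:
        execute_step(step)
    return steps
```

The point of the ordering is that every `_raw` table is populated before any dbt model runs, so the staging transformations always see complete raw data.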

To run the pipeline on a schedule, start the Prefect server with `prefect server start`. The default schedule is once a day at midnight Central time; it can be adjusted in `deployment.py`.
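As a small illustration of what "midnight Central time" resolves to, the next scheduled run can be computed with the standard library's `zoneinfo`. This is just a sketch of the schedule semantics, not code from the repo:

```python
from datetime import datetime, time, timedelta
from zoneinfo import ZoneInfo

CENTRAL = ZoneInfo("America/Chicago")

def next_midnight_central(now=None):
    """Return the next midnight in US Central time after `now`."""
    now = now or datetime.now(CENTRAL)
    tomorrow = (now + timedelta(days=1)).date()
    # Attaching the zone here means DST transitions are handled by zoneinfo.
    return datetime.combine(tomorrow, time(0, 0), tzinfo=CENTRAL)
```

Because the schedule is zone-aware rather than UTC-based, the run fires at local midnight year-round, including across daylight-saving transitions.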