datatoolsrc2023/chicago_traffic_crashes

Chicago traffic crashes

Code to clean and perform exploratory data analysis on the traffic crash datasets (crashes, people, and vehicles) provided by the City of Chicago.

Tools

  • Orchestration and raw data fetching / loading: Prefect
  • Raw data transformations: dbt
  • Database: PostgreSQL

Setup

Prerequisites

  1. Install PostgreSQL
  2. Optionally create a dedicated Postgres user and database for this project. (I created a database called traffic_crashes, then created a user called crash with a separate password and granted that user privileges on the database and all tables in it.)
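
As a rough illustration of the optional step above, the sketch below prints SQL that a Postgres superuser could run (for example in psql). The database and user names come from the note above; the placeholder password, the choice of ALL PRIVILEGES, and the public schema are assumptions, since the exact grants used are not specified here.

```python
def setup_statements(db: str = "traffic_crashes", user: str = "crash",
                     password: str = "change-me") -> list[str]:
    """Return SQL statements that create the project database and user.

    The privilege level (ALL PRIVILEGES) and the public schema are assumptions.
    """
    # Double any single quotes so the password is a valid SQL string literal.
    quoted_pw = password.replace("'", "''")
    return [
        f"CREATE DATABASE {db};",
        f"CREATE USER {user} WITH PASSWORD '{quoted_pw}';",
        f"GRANT ALL PRIVILEGES ON DATABASE {db} TO {user};",
        # Run this last statement while connected to the new database.
        f"GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO {user};",
    ]


if __name__ == "__main__":
    print("\n".join(setup_statements()))
```

The last GRANT only covers tables that already exist when it runs, so it may need to be repeated after the pipeline has created its tables.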

Installation

  1. Clone this repo.
  2. Create a Python virtual environment and install requirements:
$ cd chicago_traffic_crashes
$ python -m venv venv
$ source venv/bin/activate
$ python -m pip install -r requirements.txt
  3. Rename .env-example to .env and update values.
  4. Rename dbt_models/profiles-example.yml to profiles.yml and update values.
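
To sanity-check the values in .env before running anything, a small standard-library sketch like the one below can parse the file and assemble a PostgreSQL connection URL. The key names (DB_USER, DB_PASSWORD, DB_HOST, DB_PORT, DB_NAME) are hypothetical; use whatever keys .env-example actually defines.

```python
from pathlib import Path
from urllib.parse import quote


def read_env(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def postgres_url(env: dict[str, str]) -> str:
    """Build a connection URL from hypothetical DB_* keys."""
    # Percent-encode credentials so characters like @ or : stay unambiguous.
    user = quote(env["DB_USER"], safe="")
    password = quote(env["DB_PASSWORD"], safe="")
    host = env.get("DB_HOST", "localhost")
    port = env.get("DB_PORT", "5432")
    return f"postgresql://{user}:{password}@{host}:{port}/{env['DB_NAME']}"
```

Calling postgres_url(read_env()) from the repo root gives a URL that can be pasted into psql to confirm the credentials work.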

Running the code

To run the code manually:
$ cd chicago_traffic_crashes/orchestration
$ python3 main.py

This will immediately kick off a run of the full pipeline:

  1. Fetch the crashes, people, and vehicles CSVs from the Chicago Open Data Portal.
  2. Load the data from the CSVs into local [resource]_raw PostgreSQL tables.
  3. Run dbt models to transform the raw data and load it into [resource]_stg tables.
  4. Run dbt models to create analytics views on top of the [resource]_stg tables.
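
The raw, staging, and view layering in steps 2 to 4 can be sketched in a few lines. This is not the project's code: it uses an in-memory SQLite database as a stand-in for PostgreSQL, plain SQL as a stand-in for the dbt models, and a made-up cleaning rule (drop duplicates and rows with an empty key). The function names, view name, and sample columns are all hypothetical.

```python
import csv
import io
import sqlite3


def load_raw(conn, resource, csv_text):
    """Step 2 stand-in: load CSV rows into <resource>_raw, every column as text."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    cols = ", ".join(f'"{c}" TEXT' for c in header)
    conn.execute(f"DROP TABLE IF EXISTS {resource}_raw")
    conn.execute(f"CREATE TABLE {resource}_raw ({cols})")
    marks = ", ".join("?" for _ in header)
    conn.executemany(f"INSERT INTO {resource}_raw VALUES ({marks})", reader)


def build_stg(conn, resource, key):
    """Step 3 stand-in: keep distinct rows whose key column is not empty."""
    conn.execute(f"DROP TABLE IF EXISTS {resource}_stg")
    conn.execute(
        f"CREATE TABLE {resource}_stg AS "
        f"SELECT DISTINCT * FROM {resource}_raw WHERE \"{key}\" <> ''"
    )


def build_view(conn, resource):
    """Step 4 stand-in: a trivial analytics view over the staging table."""
    conn.execute(f"DROP VIEW IF EXISTS {resource}_row_count")
    conn.execute(
        f"CREATE VIEW {resource}_row_count AS "
        f"SELECT COUNT(*) AS n FROM {resource}_stg"
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    sample = "id,crash_date\n1,2023-01-01\n1,2023-01-01\n,2023-01-02\n"
    load_raw(conn, "crashes", sample)
    build_stg(conn, "crashes", "id")
    build_view(conn, "crashes")
    print(conn.execute("SELECT n FROM crashes_row_count").fetchone()[0])
```

Keeping the raw tables untouched and rebuilding staging from them means a transformation can be changed and rerun without fetching the CSVs again.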

To run the pipeline on a schedule, start the Prefect server with prefect server start. The default schedule is once a day at midnight Central Time, but it can be adjusted in deployment.py.
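
For reference, "midnight Central Time" shifts relative to UTC when daylight saving time starts or ends. The sketch below, which is independent of Prefect and of deployment.py, computes the next midnight in the America/Chicago time zone using only the standard library. The function name is hypothetical, and it assumes the system has time zone data available.

```python
from datetime import datetime, time, timedelta
from zoneinfo import ZoneInfo

CENTRAL = ZoneInfo("America/Chicago")


def next_midnight_central(now: datetime) -> datetime:
    """Return the next 00:00 in America/Chicago strictly after `now`.

    `now` must be timezone-aware.
    """
    local = now.astimezone(CENTRAL)
    next_day = local.date() + timedelta(days=1)
    # Attaching CENTRAL lets zoneinfo pick CST or CDT for that date.
    return datetime.combine(next_day, time(0, 0), tzinfo=CENTRAL)
```

In cron notation, a daily midnight run is written 0 0 * * * and paired with the America/Chicago time zone.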
