Merge branch 'dev-f23' into az_state_cleaner
Showing 22 changed files with 12,998 additions and 968 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -138,3 +138,4 @@ venv.bak/ | |
|
||
# data files | ||
*.avro | ||
data/*.txt |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,16 +1,5 @@ | ||
# 2023-fall-clinic-climate-cabinet | ||
|
||
## Project Background | ||
|
||
- Local politics are vital in enacting climate legislation, but information on local and state political campaigns is often under-explored. | ||
- Climate Cabinet's Hypothesis: Powerful fossil fuel companies' interests do not align with climate-friendly policies in local and state legislatures. Therefore, their contributions, and the politicians they influence, are holding us back from achieving green energy goals. | ||
- What is the impact of fossil fuel companies on local and state political campaigns? Which donors are from fossil fuel companies? Which donors are from clean energy companies? | ||
- Problem Statement: In state and local races in select states, what is the disparity between campaign contributions from fossil fuel and clean energy companies? | ||
- Final Goal: | ||
1. Develop a graph of campaign finance networks in select states with entities (individuals, parties, PACs, etc.) as nodes and directed edges weighted by monetary connection. | ||
2. Classify nodes as ‘Clean energy’, ‘Fossil fuel’, or ‘Other’ | ||
3. Analyze funding disparities in select races | ||
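The three goals above can be illustrated with a minimal sketch using only the Python standard library. It is not the project's implementation: the sample transactions, the entity names, the class labels assigned to them, and the helper names `build_edge_weights` and `totals_by_donor_class` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical transactions: (donor, recipient, amount in dollars).
transactions = [
    ("Donor A", "PAC X", 500.0),
    ("Donor A", "PAC X", 250.0),
    ("PAC X", "Candidate Y", 1000.0),
]

# Hypothetical node labels; the real classification method is not shown here.
node_class = {
    "Donor A": "Fossil fuel",
    "PAC X": "Other",
    "Candidate Y": "Other",
}


def build_edge_weights(rows):
    """Sum amounts per directed (donor, recipient) pair."""
    weights = defaultdict(float)
    for donor, recipient, amount in rows:
        weights[(donor, recipient)] += amount
    return dict(weights)


def totals_by_donor_class(edge_weights, classes):
    """Total outgoing money per donor class; unlabeled donors count as 'Other'."""
    totals = defaultdict(float)
    for (donor, _recipient), amount in edge_weights.items():
        totals[classes.get(donor, "Other")] += amount
    return dict(totals)


edges = build_edge_weights(transactions)
class_totals = totals_by_donor_class(edges, node_class)
```

Here each key of `edges` is a directed edge and its value is the edge weight, so comparing entries of `class_totals` gives a rough view of the funding disparity between donor classes.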
|
||
## Data Science Clinic Project Goals | ||
|
||
1. Collect state's political campaign finance report data which should include | ||
|
@@ -22,7 +11,6 @@ the contribution made by green energy company versus that by fossil | |
fuel company in terms of state's political campaign activity | ||
|
||
|
||
|
||
## Usage | ||
|
||
### Docker | ||
|
@@ -46,63 +34,17 @@ If you prefer to develop inside a container with VS Code then do the following s | |
3. Click the blue or green rectangle in the bottom left of VS Code (should say something like `><` or `>< WSL`). Options should appear in the top center of your screen. Select `Reopen in Container`. | ||
|
||
|
||
|
||
|
||
## Repository Structure | ||
|
||
### utils | ||
Project python code | ||
|
||
Files: | ||
- arizona.py: python code to implement Arizona's state cleaner abstract class | ||
- michigan.py: python code to implement Michigan's state cleaner abstract class | ||
- minnesota.py: python code to implement Minnesota's state cleaner abstract class | ||
- pennsylvania.py: python code to implement Pennsylvania's state cleaner abstract class | ||
- constants.py: Python script that stores the constants used for preprocessing, cleaning, and standardizing state campaign finance data | ||
- clean.py: python code for the state cleaner parent class implementation | ||
- pipeline.py: python code for running the state cleaner for 4 states. It generates the final database (DataFrame) through steps of preprocess, clean, standardize, and create table | ||
|
||
|
||
### notebooks | ||
Contains short, clean notebooks to demonstrate analysis, including information such as: | ||
1. Raw dataset format (file format, relational?) | ||
2. Raw dataset column information (type, content) | ||
3. Top 10 contributors and top 10 recipients in each state per year | ||
4. Bar charts to compare contributions by donor type (PAC, individual, etc) and to compare recipients by the office type they are running for | ||
5. Additional analysis: Yearly trend and possible explanation | ||
|
||
Files: | ||
- AZ_EDA | ||
- mi_campaign_eda | ||
- MN_EDA | ||
- PA_EDA | ||
|
||
### data | ||
|
||
Contains details of acquiring all raw data used in the repository. If data is small (<50MB) then it is okay to save it to the repo, making sure to clearly document how the data is obtained. | ||
|
||
If the data is larger than 50MB then you should not add it to the repo and instead document how to get the data in the README.md file in the data directory. | ||
|
||
This [README.md file](/data/README.md) should be kept up to date. | ||
|
||
### output | ||
Should contain work product generated by the analysis. Keep in mind that results should (generally) be excluded from the git repository. | ||
|
||
Creating a searchable, relational database of Arizona, Michigan, Minnesota, and Pennsylvania campaign finance data to chart money flows from 2018 to 2023 | ||
- individual table: includes individual recipient and donor information: id, first name, last name, full name, entity type (Individual, Lobbyist), state, party, company | ||
- organization table: includes organizational recipient and donor information: id, name, state, entity type (party, committee, corporation, etc.) | ||
- transaction table: includes contribution and expenditure transaction information: transaction id, donor id, recipient id, year, amount, recipient office sought, purpose, and transaction type | ||
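As a rough illustration of the three tables listed above, the record shapes could be sketched with standard-library dataclasses. The field names follow the lists above, but the class names, the Python types, and the sample values are assumptions; the repository itself builds DataFrames rather than these classes.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Individual:
    """One row of the individual table (field types are assumed)."""
    id: str
    first_name: str
    last_name: str
    full_name: str
    entity_type: str  # e.g. "Individual" or "Lobbyist"
    state: str
    party: Optional[str] = None
    company: Optional[str] = None


@dataclass
class Organization:
    """One row of the organization table (field types are assumed)."""
    id: str
    name: str
    state: str
    entity_type: str  # e.g. "party", "committee", "corporation"


@dataclass
class Transaction:
    """One row of the transaction table (field types are assumed)."""
    transaction_id: str
    donor_id: str
    recipient_id: str
    year: int
    amount: float
    office_sought: Optional[str] = None
    purpose: Optional[str] = None
    transaction_type: Optional[str] = None


# Hypothetical sample rows showing how donor_id / recipient_id link the tables.
donor = Individual("ind-1", "Jane", "Doe", "Jane Doe", "Individual", "AZ")
committee = Organization("org-1", "Example Committee", "AZ", "committee")
gift = Transaction("txn-1", donor.id, committee.id, 2022, 100.0,
                   transaction_type="contribution")
gift_row = asdict(gift)
```

The `donor_id` and `recipient_id` fields are what make the database relational: each transaction points at a row in either the individual or the organization table.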
|
||
|
||
### Project Pipeline | ||
|
||
1. Collect each state's campaign finance data either from web scraping (AZ, MI, PA) or direct download (MN) | ||
2. User can go to [the shared Google Drive]('https://drive.google.com/drive/u/2/folders/1HUbOU0KRZy85mep2SHMU48qUQ1ZOSNce') to download each state's data to their local repo following this format: repo_root / "data" / "file" | ||
3. Install all the necessary python packages listed in requirements.txt | ||
2. User can go to [this shared Google Drive]('https://drive.google.com/drive/u/2/folders/1HUbOU0KRZy85mep2SHMU48qUQ1ZOSNce') to download each state's data to their local repo following this format: repo_root / "data" / "raw" / <State Initial> / "file" | ||
3. Open in development container which installs all necessary packages. | ||
4. Use utils/pipeline.py to preprocess, clean, standardize, and create tables for each state and ultimately concatenate tables across the 4 states into a comprehensive database | ||
5. The final result should be an individual DataFrame, an organization DataFrame, and a transaction DataFrame. They each contain all data in AZ, MI, MN, PA datasets | ||
6. For future reference, the above pipeline also stores the information mapping given id to our database id (generated via uuid) in a csv file in the format of (state)IDMap.csv | ||
5. The final result should be an individual DataFrame, an organization DataFrame, and a list of transaction DataFrames. The tables combine all data in AZ, MI, MN, PA datasets | ||
6. For future reference, the above pipeline also stores the information mapping given id to our database id (generated via uuid) in a csv file in the format of (state)IDMap.csv in the output folder | ||
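The ID mapping in the last step could look roughly like the sketch below, which assigns a UUID to each given id and writes the mapping to a `(state)IDMap.csv` file. This is an assumption-laden illustration, not the code in utils/pipeline.py: the helper names `build_id_map` and `write_id_map`, the CSV column names, the sample ids, and the use of the state abbreviation as the file prefix are all hypothetical.

```python
import csv
import tempfile
import uuid
from pathlib import Path


def build_id_map(given_ids):
    """Map each distinct given id to a new random UUID string."""
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return {given: str(uuid.uuid4()) for given in dict.fromkeys(given_ids)}


def write_id_map(id_map, state, output_dir):
    """Write the mapping to <output_dir>/<state>IDMap.csv and return the path."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    path = output_dir / f"{state}IDMap.csv"
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["given_id", "database_id"])  # assumed column names
        writer.writerows(id_map.items())
    return path


# Usage with hypothetical ids; a temporary directory stands in for the output folder.
with tempfile.TemporaryDirectory() as tmp:
    id_map = build_id_map(["AZ-001", "AZ-002", "AZ-001"])
    csv_path = write_id_map(id_map, "AZ", tmp)
    written_name = csv_path.name
    with csv_path.open(newline="") as f:
        rows = list(csv.reader(f))
```

Keeping the mapping on disk means a database id can later be traced back to the id used in the state's original records.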
|
||
## Team Members | ||
|
||
## Team Member | ||
Student Name: April Wang | ||
Student Email: [email protected] | ||
|
||
|