diff --git a/.gitignore b/.gitignore index 8e42268..cab3579 100644 --- a/.gitignore +++ b/.gitignore @@ -138,3 +138,4 @@ venv.bak/ # data files *.avro +data/*.txt \ No newline at end of file diff --git a/README.md b/README.md index 1ae1135..7f02e46 100644 --- a/README.md +++ b/README.md @@ -1,16 +1,5 @@ # 2023-fall-clinic-climate-cabinet -## Project Background - -- Local politics are vital in enacting climate legislation, but information on local and state political campaign is often under-explored. -- Climate Cabinate's Hypothesis: Powerful fossil fuel companies' interests do not align with climate-friendly policies in local and state legislature. Therefore, their contribution and the politicians under their fingers are holding us back from achieving green energy goal. -- What are the impact of fossil fuel companies on local and state political campaign? Which donors are from fossil fuel companies? Which donors are from clean energy companies? -- Problem Statement: In state and local races in select states, what is the disparity between campaign contributions from fossil fuel and clean energy companies? -- Final Goal: - 1. Develop a graph of campaign finance networks in select states with entities (individuals, parties, PACs, etc.) as nodes and directed edges weighted by monetary connection. - 2. Classify nodes as ‘Clean energy’, ‘Fossil fuel’, or ‘Other’ - 3. Analyze funding disparities in select races - ## Data Science Clinic Project Goals 1. Collect state's political campaign finance report data which should include @@ -22,7 +11,6 @@ the conribution made by green energy company versus that by fossil fuel company in terms of state's political campaign activity - ## Usage ### Docker @@ -46,63 +34,17 @@ If you prefer to develop inside a container with VS Code then do the following s 3. Click the blue or green rectangle in the bottom left of VS code (should say something like `><` or `>< WSL`). Options should appear in the top center of your screen. 
Select `Reopen in Container`. - - -## Repository Structure - -### utils -Project python code - -Files: -- arizona.py: python code to implement Arizona's state cleaner abstract class -- michigan.py: python code to implement Michigan's state cleaner abstract class -- minnesota.py: python code to implement Minnesota's state cleaner abstract class -- pennsylvania.py: python code to implement Pennsylvania's state cleaner abstract class -- constants.py: the python script file to store any necessary constants used for state campaign finance data preprocess, clean, and stardandization -- clean.py: python code for the state cleaner parent class implementation -- pipeline.py: python code for running the state cleaner for 4 states. It generates the final database (DataFrame) through steps of preprocess, clean, standardize, and create table - - -### notebooks -Contains short, clean notebooks to demonstrate analysis, including information such as: -1. Raw dataset format (file format, relational?) -2. Raw dataset column information (type, content) -3. Top 10 contributors and top 10 recipients in each state per year -4. Bar charts to compare contributions by donor type (PAC, individual, etc) and to compare recipients by the office type they are running for -5. Additional analysis: Yearly trend and possible explanation - -Files: -- AZ_EDA -- mi_campaign_eda -- MN_EDA -- PA_EDA - -### data - -Contains details of acquiring all raw data used in repository. If data is small (<50MB) then it is okay to save it to the repo, making sure to clearly document how to the data is obtained. - -If the data is larger than 50MB than you should not add it to the repo and instead document how to get the data in the README.md file in the data directory. - -This [README.md file](/data/README.md) should be kept up to date. - -### output -Should contain work product generated by the analysis. Keep in mind that results should (generally) be excluded from the git repository. 
- -Creating a searchable, relational database of Arizona, Michigan, Minnesota, and Pennsylvania campaign finance data to chart money flows from 2018 to 2023 -- individual table: include nidividual recipient and donor information of id, first name, last name, full name, entity type (Individual, Lobbyist), state, party, company -- organization table: include organizational recipient and donor information of id, name, state, entity type (party, committee, corporation, etc.) -- transaction table: include contribution and expenditure transaction information of transaction id, donor id, recipient id, year, amount, recipient office sought, purpose, and transaction type - - ### Project Pipeline + 1. Collect state's finance campaign data either from web scraping (AZ, MI, PA) or direct download (MN) -2. User can go to [the shared Google Drive]('https://drive.google.com/drive/u/2/folders/1HUbOU0KRZy85mep2SHMU48qUQ1ZOSNce') to download each state's data to their local repo following this format: repo_root / "data" / "file" -3. Install all the necessary python packages listed in requirements.txt +2. Users can go to [this shared Google Drive](https://drive.google.com/drive/u/2/folders/1HUbOU0KRZy85mep2SHMU48qUQ1ZOSNce) to download each state's data to their local repo following this format: repo_root / "data" / "raw" / / "file" +3. Open the repository in the development container, which installs all necessary packages. 4. Use utils/pipeline.py to preprocess, clean, standardize, and create tables for each state and ultimately concatinate tables across 4 states into a comprehensive database -5. The final result should be an individual DataFrame, an organization DataFrame, and a transaction DataFrame. They each contain all data in AZ, MI, MN, PA datasets -6. For future reference, the above pipeline also stores the information mapping given id to our database id (generated via uuid) in a csv file in the format of (state)IDMap.csv +5.
The final result should be an individual DataFrame, an organization DataFrame, and a list of transaction DataFrames. The tables combine all data from the AZ, MI, MN, and PA datasets +6. For future reference, the pipeline also stores the mapping from each state's original IDs to our database IDs (generated via uuid) in a CSV file named (state)IDMap.csv in the output folder + +## Team Members -## Team Member Student Name: April Wang Student Email: yuzhouw@uchicago.edu diff --git a/data/README.md b/data/README.md index 84b2890..cd08796 100644 --- a/data/README.md +++ b/data/README.md @@ -2,7 +2,6 @@ This directory contains information for use in this project. -Please make sure to document each source file here. #### Arizona Campaign Finance Data ##### Summary @@ -72,71 +71,52 @@ contribution data and READMEs in a Google Drive for the duration of this project #### Minnesota Campaign Finance Data ##### Summary -- The Minnesota Campaign Finance data are publicly available on the -[Minnesota Campaign Finance and Public Disclosure Board](https://cfb.mn.gov/reports-and-data/self-help/data-downloads/campaign-finance/) in csv format and has no anti-webscraping defenses. +- The Minnesota Campaign Finance data are available in this shared +[Google Drive](https://drive.google.com/drive/u/2/folders/1uA70woWDhTf3_0F8AbadDa_XIKraCeoc) in zip format and have no anti-webscraping defenses. Please first unzip the archive and store the 12 CSV files (10 candidate-recipient contribution datasets, 1 noncandidate-recipient contribution dataset, and 1 expenditure dataset) in your local repo in this format: repo root / "data" / "file name" -- However, there is an glitch in the data available through the Data Downloads page above: this dataset does not include contributions reported by the committees of candidates for State Court of Appeals Judge. Consequently, I have utilized an alternative dataset provided by the Minnesota Campaign Finance website developer.
This dataset comprises 10 separate CSV files, each documenting contributions made to a specific recipient type from 1998 to 2023. I have consolidated these files into a single dataset to ensure comprehensive coverage. +- The above data are provided by the Minnesota Campaign Finance website developer. They include 10 separate CSV files, each documenting contributions made to candidates for a specific office from 1998 to 2023, as well as a non-candidate contribution dataset dating back to 1998 and an independent expenditure dataset dating back to 2015. -- The old dataset comprises itemized records of contributions and expenditures made since 2015, specifically including transactions exceeding $200, which aligns with the reporting threshold set at $200 in Minnesota campaign finance regulations. The new dataset itemizes all contributions to candidates from 1998 to 2023. +- The MN dataset comprises itemized records of contributions and expenditures exceeding $200, matching the $200 reporting threshold in Minnesota campaign finance regulations. -- For the purpose of our project I will focus on contribution, not expenditure. +- For the purposes of our project, I will focus on contributions and independent expenditures from 2018 to 2023. ##### Features - Races / Office Sought: - 1. Governor (GC) - 2. Attorney General(AG) - 3. Secretary of State(SS) - 4. State Auditor(SA) - 5. State Treasurer (ST, this office was abolished in 2003 and no longer exists) - 6. State Senator(Senate) - 7. State Representative(House) - 8. State Supreme Court Justice(SC) - 9. State Appeals Court Judge(AP) - 10.
State District Court Judge(DC) - -- This dataset covers 1998 to present + - AG: Attorney General + - AP: State Appeals Court Judge + - DC: State District Court Judge + - GC: Governor + - House: State Representative + - SA: State Auditor + - SC: State Supreme Court Justice + - Senate: State Senator + - SS: Secretary of State + - ST: State Treasurer (this office was abolished in 2003 and no longer exists) + +- Donor Types: + - I: Individual + - L: Lobbyist + - C: Candidate Committee + - F: Political Committee/Fund + - S: Supporting Association + - P: Party Unit + - B: Business + - H: Hennepin County Local Candidate Committee + - U: Association Not Registered with the Board + - O: Other + - PTU: Political Party Unit + - PCF: Political Committee and Fund - Trasactions required to report and itemize: Contributions received from any particular source in excess of $200 within a calendar year -- Limitation: - 1. This new dataset only covers contributions made to candidates, i.e., all recipients are candidates - 2. Only covers contributions over 200$ by MN campaign finance regulation - 3. This dataset only dates back to 1998. Pre-1998 is not digitized so access to that data is limited to paper reports. +- Limitation: Only covers contributions over $200, per the MN campaign finance reporting threshold - Additional information: 1. in-kind: Donations of things other than money are in-kind contributions to the receiving entity 2. For the purpose of our project, I created a separate column of total donation by summing both monetary donation and in-kind donation - 3. Type and Subtype Acronym: - - PCC: Political Contribution Committee - - PTU: Political Party Unit - - PCF - Political Committee Fund - - PF: Political Fund - - PC: Political Committee - - PCN: Positive Community Norms - - PFN: Professional Fundraising Network - - IEF: Independent Expenditure Fund - - IEC: Independent Expenditure Committee - - BC: Ballot Committee - 4.
Recipient Type and Subtype: - - Candidates: Recipient Type PCC - - Party Units: Recipient Type PTU - - State Party Units: Recipient Type PTU, Recipient Subtype SPU - - Party Unit Caucus Committees: Recipient Type PTU, Recipient subtype CAU - - Local Party Units: Recipient Type PTU - - Committees and Funds: Recipient Type PCF, Recipient Subtype PF, PC, PCN, PFN, IEF, IEC, BC - - Independent Expenditure Committees and Funds: Recipient Type PCF, Recipient Subtype IEF, IE - 5. Contributors whose total contributions exceed $200 are individually itemized in separate rows. Contributions from donors who each give $200 or less are reported as aggregate totals and are not included in this dataset by definition. - 6. Contributor/donor Types: - - C: Candidate Committee - - I: Individual - - L: Lobbyist - - F: Political Committee/Fund - - S: Self - - P: Party Unit - - H: Registered with Hennepin County - - O: Other - 7. The new dataset has 467 missing rows, of which belong to "Registration fee for Netroots event" and have no recipient, donor, or total donation amount. + 3. Contributors whose total contributions exceed $200 are individually itemized in separate rows. Contributions from donors who each give $200 or less are reported as aggregate totals and are not included in this dataset by definition. + 4. The dataset has 467 rows with missing values, which belong to "Registration fee for Netroots event" and have no recipient, donor, or total donation amount.
#### Pennsylvania Campaign Finance Data diff --git a/notebooks/mi_campaign_eda.ipynb b/notebooks/MI_EDA.ipynb similarity index 97% rename from notebooks/mi_campaign_eda.ipynb rename to notebooks/MI_EDA.ipynb index 1db9424..31ddf73 100644 --- a/notebooks/mi_campaign_eda.ipynb +++ b/notebooks/MI_EDA.ipynb @@ -4,7 +4,15 @@ "cell_type": "code", "execution_count": 1, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "/Users/necabotheking/Documents/Github/2023-fall-clinic-climate-cabinet/data/Contribution\n" + ] + } + ], "source": [ "import pandas as pd\n", "import os\n", @@ -61,13 +69,13 @@ "campaign_contribution_dataframe_lst = []\n", "\n", "for file in os.listdir(MI_EXP_FILEPATH):\n", - " filepath = MI_EXP_FILEPATH + \"/\" + file\n", + " filepath = str(MI_EXP_FILEPATH) + \"/\" + file\n", " campaign_expenditure_dataframe_lst.append(\n", " read_expenditure_data(filepath, MI_EXPENDITURE_COLUMNS)\n", " )\n", "\n", "for file in os.listdir(MI_CON_FILEPATH):\n", - " filepath = MI_CON_FILEPATH + \"/\" + file\n", + " filepath = str(MI_CON_FILEPATH) + \"/\" + file\n", " campaign_contribution_dataframe_lst.append(\n", " read_contribution_data(filepath, MI_CONTRIBUTION_COLUMNS)\n", " )" @@ -4289,7 +4297,7 @@ 1 ], "title": { - "text": "1999-2023 Count" + "text": "2018-2023 Count" } } } @@ -4309,7 +4317,7 @@ ")\n", "fig.update_layout(\n", " xaxis_title=\"Committee Types\",\n", - " yaxis_title=\"1999-2023 Count\",\n", + " yaxis_title=\"2018-2023 Count\",\n", " xaxis={\"categoryorder\": \"total ascending\"},\n", ")\n", "fig.show()" @@ -4324,7 +4332,7 @@ }, { "cell_type": "code", - "execution_count": 25, + "execution_count": 24, "metadata": {}, "outputs": [], "source": [ @@ -4337,7 +4345,7 @@ }, { "cell_type": "code", - "execution_count": 26, + "execution_count": 25, "metadata": {}, "outputs": [ { @@ -5260,7 +5268,7 @@ }, { "cell_type": "code", - "execution_count": 28, + "execution_count": 26, "metadata": {}, "outputs": [ 
{ @@ -5272,7 +5280,7 @@ "data": [ { "alignmentgroup": "True", - "hovertemplate": "Schedule Description=DIRECT
Year=%{x}
Count=%{y}", + "hovertemplate": "Schedule Description=DIRECT
Year=%{x}
Number of Expenditures=%{y}", "legendgroup": "DIRECT", "marker": { "color": "#636efa", @@ -5307,7 +5315,7 @@ }, { "alignmentgroup": "True", - "hovertemplate": "Schedule Description=SUPP
Year=%{x}
Count=%{y}", + "hovertemplate": "Schedule Description=SUPP
Year=%{x}
Number of Expenditures=%{y}", "legendgroup": "SUPP", "marker": { "color": "#EF553B", @@ -5342,7 +5350,7 @@ }, { "alignmentgroup": "True", - "hovertemplate": "Schedule Description=INDEPENDEN
Year=%{x}
Count=%{y}", + "hovertemplate": "Schedule Description=INDEPENDEN
Year=%{x}
Number of Expenditures=%{y}", "legendgroup": "INDEPENDEN", "marker": { "color": "#00cc96", @@ -5377,7 +5385,7 @@ }, { "alignmentgroup": "True", - "hovertemplate": "Schedule Description=OFFICE
Year=%{x}
Count=%{y}", + "hovertemplate": "Schedule Description=OFFICE
Year=%{x}
Number of Expenditures=%{y}", "legendgroup": "OFFICE", "marker": { "color": "#ab63fa", @@ -5412,7 +5420,7 @@ }, { "alignmentgroup": "True", - "hovertemplate": "Schedule Description=INKIND
Year=%{x}
Count=%{y}", + "hovertemplate": "Schedule Description=INKIND
Year=%{x}
Number of Expenditures=%{y}", "legendgroup": "INKIND", "marker": { "color": "#FFA15A", @@ -5447,7 +5455,7 @@ }, { "alignmentgroup": "True", - "hovertemplate": "Schedule Description=GOTV
Year=%{x}
Count=%{y}", + "hovertemplate": "Schedule Description=GOTV
Year=%{x}
Number of Expenditures=%{y}", "legendgroup": "GOTV", "marker": { "color": "#19d3f3", @@ -6306,7 +6314,7 @@ } }, "title": { - "text": "Stacked Bar Chart for Committee Types by Year" + "text": "Michigan Campaign Expenditure Schedule Types by Year" }, "xaxis": { "anchor": "y", @@ -6325,7 +6333,7 @@ 1 ], "title": { - "text": "Count" + "text": "Number of Expenditures" } } } @@ -6343,16 +6351,16 @@ "data": [ { "alignmentgroup": "True", - "hovertemplate": "Committee Tyoe=CAN
Year=%{x}
Count=%{y}", - "legendgroup": "CAN", + "hovertemplate": "Committee Type=Candidate Committee
Year=%{x}
Number of Expenditures=%{y}", + "legendgroup": "Candidate Committee", "marker": { "color": "#636efa", "pattern": { "shape": "" } }, - "name": "CAN", - "offsetgroup": "CAN", + "name": "Candidate Committee", + "offsetgroup": "Candidate Committee", "orientation": "v", "showlegend": true, "textposition": "auto", @@ -6378,16 +6386,16 @@ }, { "alignmentgroup": "True", - "hovertemplate": "Committee Tyoe=IND
Year=%{x}
Count=%{y}", - "legendgroup": "IND", + "hovertemplate": "Committee Type=Independent Political Action Committee
Year=%{x}
Number of Expenditures=%{y}", + "legendgroup": "Independent Political Action Committee", "marker": { "color": "#EF553B", "pattern": { "shape": "" } }, - "name": "IND", - "offsetgroup": "IND", + "name": "Independent Political Action Committee", + "offsetgroup": "Independent Political Action Committee", "orientation": "v", "showlegend": true, "textposition": "auto", @@ -6413,16 +6421,16 @@ }, { "alignmentgroup": "True", - "hovertemplate": "Committee Tyoe=GUB
Year=%{x}
Count=%{y}", - "legendgroup": "GUB", + "hovertemplate": "Committee Type=Gubernatorial Committee
Year=%{x}
Number of Expenditures=%{y}", + "legendgroup": "Gubernatorial Committee", "marker": { "color": "#00cc96", "pattern": { "shape": "" } }, - "name": "GUB", - "offsetgroup": "GUB", + "name": "Gubernatorial Committee", + "offsetgroup": "Gubernatorial Committee", "orientation": "v", "showlegend": true, "textposition": "auto", @@ -6448,16 +6456,16 @@ }, { "alignmentgroup": "True", - "hovertemplate": "Committee Tyoe=POL
Year=%{x}
Count=%{y}", - "legendgroup": "POL", + "hovertemplate": "Committee Type=Political Action Committee
Year=%{x}
Number of Expenditures=%{y}", + "legendgroup": "Political Action Committee", "marker": { "color": "#ab63fa", "pattern": { "shape": "" } }, - "name": "POL", - "offsetgroup": "POL", + "name": "Political Action Committee", + "offsetgroup": "Political Action Committee", "orientation": "v", "showlegend": true, "textposition": "auto", @@ -6483,16 +6491,16 @@ }, { "alignmentgroup": "True", - "hovertemplate": "Committee Tyoe=COU
Year=%{x}
Count=%{y}", - "legendgroup": "COU", + "hovertemplate": "Committee Type=County Party Committee
Year=%{x}
Number of Expenditures=%{y}", + "legendgroup": "County Party Committee", "marker": { "color": "#FFA15A", "pattern": { "shape": "" } }, - "name": "COU", - "offsetgroup": "COU", + "name": "County Party Committee", + "offsetgroup": "County Party Committee", "orientation": "v", "showlegend": true, "textposition": "auto", @@ -6518,16 +6526,16 @@ }, { "alignmentgroup": "True", - "hovertemplate": "Committee Tyoe=BAL
Year=%{x}
Count=%{y}", - "legendgroup": "BAL", + "hovertemplate": "Committee Type=Ballot Question Committee
Year=%{x}
Number of Expenditures=%{y}", + "legendgroup": "Ballot Question Committee", "marker": { "color": "#19d3f3", "pattern": { "shape": "" } }, - "name": "BAL", - "offsetgroup": "BAL", + "name": "Ballot Question Committee", + "offsetgroup": "Ballot Question Committee", "orientation": "v", "showlegend": true, "textposition": "auto", @@ -6553,16 +6561,16 @@ }, { "alignmentgroup": "True", - "hovertemplate": "Committee Tyoe=STA
Year=%{x}
Count=%{y}", - "legendgroup": "STA", + "hovertemplate": "Committee Type=State Party Committee
Year=%{x}
Number of Expenditures=%{y}", + "legendgroup": "State Party Committee", "marker": { "color": "#FF6692", "pattern": { "shape": "" } }, - "name": "STA", - "offsetgroup": "STA", + "name": "State Party Committee", + "offsetgroup": "State Party Committee", "orientation": "v", "showlegend": true, "textposition": "auto", @@ -6588,16 +6596,16 @@ }, { "alignmentgroup": "True", - "hovertemplate": "Committee Tyoe=DIS
Year=%{x}
Count=%{y}", - "legendgroup": "DIS", + "hovertemplate": "Committee Type=District Party Committee
Year=%{x}
Number of Expenditures=%{y}", + "legendgroup": "District Party Committee", "marker": { "color": "#B6E880", "pattern": { "shape": "" } }, - "name": "DIS", - "offsetgroup": "DIS", + "name": "District Party Committee", + "offsetgroup": "District Party Committee", "orientation": "v", "showlegend": true, "textposition": "auto", @@ -6626,7 +6634,7 @@ "barmode": "stack", "legend": { "title": { - "text": "Committee Tyoe" + "text": "Committee Type" }, "tracegroupgap": 0 }, @@ -7447,7 +7455,7 @@ } }, "title": { - "text": "Stacked Bar Chart for Expenditure Committee Types by Year" + "text": "Michigan Campaign Expenditure Committee Types by Year" }, "xaxis": { "anchor": "y", @@ -7466,7 +7474,7 @@ 1 ], "title": { - "text": "Count" + "text": "Number of Expenditures" } } } @@ -7484,16 +7492,16 @@ "data": [ { "alignmentgroup": "True", - "hovertemplate": "Committee Tyoe=IND
Year=%{x}
Count=%{y}", - "legendgroup": "IND", + "hovertemplate": "Committee Type=Independent Political Action Committee
Year=%{x}
Number of Contributions=%{y}", + "legendgroup": "Independent Political Action Committee", "marker": { "color": "#636efa", "pattern": { "shape": "" } }, - "name": "IND", - "offsetgroup": "IND", + "name": "Independent Political Action Committee", + "offsetgroup": "Independent Political Action Committee", "orientation": "v", "showlegend": true, "textposition": "auto", @@ -7519,16 +7527,16 @@ }, { "alignmentgroup": "True", - "hovertemplate": "Committee Tyoe=CAN
Year=%{x}
Count=%{y}", - "legendgroup": "CAN", + "hovertemplate": "Committee Type=Candidate Committee
Year=%{x}
Number of Contributions=%{y}", + "legendgroup": "Candidate Committee", "marker": { "color": "#EF553B", "pattern": { "shape": "" } }, - "name": "CAN", - "offsetgroup": "CAN", + "name": "Candidate Committee", + "offsetgroup": "Candidate Committee", "orientation": "v", "showlegend": true, "textposition": "auto", @@ -7554,16 +7562,16 @@ }, { "alignmentgroup": "True", - "hovertemplate": "Committee Tyoe=GUB
Year=%{x}
Count=%{y}", - "legendgroup": "GUB", + "hovertemplate": "Committee Type=Gubernatorial Committee
Year=%{x}
Number of Contributions=%{y}", + "legendgroup": "Gubernatorial Committee", "marker": { "color": "#00cc96", "pattern": { "shape": "" } }, - "name": "GUB", - "offsetgroup": "GUB", + "name": "Gubernatorial Committee", + "offsetgroup": "Gubernatorial Committee", "orientation": "v", "showlegend": true, "textposition": "auto", @@ -7589,16 +7597,16 @@ }, { "alignmentgroup": "True", - "hovertemplate": "Committee Tyoe=BAL
Year=%{x}
Count=%{y}", - "legendgroup": "BAL", + "hovertemplate": "Committee Type=Ballot Question Committee
Year=%{x}
Number of Contributions=%{y}", + "legendgroup": "Ballot Question Committee", "marker": { "color": "#ab63fa", "pattern": { "shape": "" } }, - "name": "BAL", - "offsetgroup": "BAL", + "name": "Ballot Question Committee", + "offsetgroup": "Ballot Question Committee", "orientation": "v", "showlegend": true, "textposition": "auto", @@ -7624,16 +7632,16 @@ }, { "alignmentgroup": "True", - "hovertemplate": "Committee Tyoe=COU
Year=%{x}
Count=%{y}", - "legendgroup": "COU", + "hovertemplate": "Committee Type=County Party Committee
Year=%{x}
Number of Contributions=%{y}", + "legendgroup": "County Party Committee", "marker": { "color": "#FFA15A", "pattern": { "shape": "" } }, - "name": "COU", - "offsetgroup": "COU", + "name": "County Party Committee", + "offsetgroup": "County Party Committee", "orientation": "v", "showlegend": true, "textposition": "auto", @@ -7659,16 +7667,16 @@ }, { "alignmentgroup": "True", - "hovertemplate": "Committee Tyoe=POL
Year=%{x}
Count=%{y}", - "legendgroup": "POL", + "hovertemplate": "Committee Type=Political Action Committee
Year=%{x}
Number of Contributions=%{y}", + "legendgroup": "Political Action Committee", "marker": { "color": "#19d3f3", "pattern": { "shape": "" } }, - "name": "POL", - "offsetgroup": "POL", + "name": "Political Action Committee", + "offsetgroup": "Political Action Committee", "orientation": "v", "showlegend": true, "textposition": "auto", @@ -7694,16 +7702,16 @@ }, { "alignmentgroup": "True", - "hovertemplate": "Committee Tyoe=DIS
Year=%{x}
Count=%{y}", - "legendgroup": "DIS", + "hovertemplate": "Committee Type=District Party Committee
Year=%{x}
Number of Contributions=%{y}", + "legendgroup": "District Party Committee", "marker": { "color": "#FF6692", "pattern": { "shape": "" } }, - "name": "DIS", - "offsetgroup": "DIS", + "name": "District Party Committee", + "offsetgroup": "District Party Committee", "orientation": "v", "showlegend": true, "textposition": "auto", @@ -7729,16 +7737,16 @@ }, { "alignmentgroup": "True", - "hovertemplate": "Committee Tyoe=STA
Year=%{x}
Count=%{y}", - "legendgroup": "STA", + "hovertemplate": "Committee Type=State Party Committee
Year=%{x}
Number of Contributions=%{y}", + "legendgroup": "State Party Committee", "marker": { "color": "#B6E880", "pattern": { "shape": "" } }, - "name": "STA", - "offsetgroup": "STA", + "name": "State Party Committee", + "offsetgroup": "State Party Committee", "orientation": "v", "showlegend": true, "textposition": "auto", @@ -7767,7 +7775,7 @@ "barmode": "stack", "legend": { "title": { - "text": "Committee Tyoe" + "text": "Committee Type" }, "tracegroupgap": 0 }, @@ -8588,7 +8596,7 @@ } }, "title": { - "text": "Stacked Bar Chart for Contributions Committee Types by Year" + "text": "Michigan Campaign Contributions Committee Types by Year" }, "xaxis": { "anchor": "y", @@ -8607,7 +8615,7 @@ 1 ], "title": { - "text": "Count" + "text": "Number of Contributions" } } } @@ -8625,7 +8633,7 @@ "data": [ { "alignmentgroup": "True", - "hovertemplate": "Contribution Type=DIRECT
Year=%{x}
Count=%{y}", + "hovertemplate": "Contribution Type=DIRECT
Year=%{x}
Number of Contribution Types=%{y}", "legendgroup": "DIRECT ", "marker": { "color": "#636efa", @@ -8660,7 +8668,7 @@ }, { "alignmentgroup": "True", - "hovertemplate": "Contribution Type=DIRECT/FUND RAISER
Year=%{x}
Count=%{y}", + "hovertemplate": "Contribution Type=DIRECT/FUND RAISER
Year=%{x}
Number of Contribution Types=%{y}", "legendgroup": "DIRECT/FUND RAISER ", "marker": { "color": "#EF553B", @@ -8695,7 +8703,7 @@ }, { "alignmentgroup": "True", - "hovertemplate": "Contribution Type=LOAN FROM A PERSON
Year=%{x}
Count=%{y}", + "hovertemplate": "Contribution Type=LOAN FROM A PERSON
Year=%{x}
Number of Contribution Types=%{y}", "legendgroup": "LOAN FROM A PERSON ", "marker": { "color": "#00cc96", @@ -9554,7 +9562,7 @@ } }, "title": { - "text": "Stacked Bar Chart for Contribution Types by Year" + "text": "Michigan Campaign Contribution Types by Year" }, "xaxis": { "anchor": "y", @@ -9573,7 +9581,7 @@ 1 ], "title": { - "text": "Count" + "text": "Number of Contribution Types" } } } @@ -9604,7 +9612,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.2" + "version": "3.11.5" } }, "nbformat": 4, diff --git a/notebooks/MN_EDA.ipynb b/notebooks/MN_EDA.ipynb new file mode 100644 index 0000000..e52e68c --- /dev/null +++ b/notebooks/MN_EDA.ipynb @@ -0,0 +1,10012 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "import plotly.express as px\n", + "import warnings\n", + "warnings.filterwarnings('ignore')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# MN Contribution EDA" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1.
Read in and Preprocess Datasets\n", + "\n", + "### 1.1 Read in datasets" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "# Read in candidate-recipient contribution data\n", + "df_ag = pd.read_csv('/project/data/CandReceipts/AG.csv')\n", + "df_ap = pd.read_csv('/project/data/CandReceipts/AP.csv')\n", + "df_dc = pd.read_csv('/project/data/CandReceipts/DC.csv')\n", + "df_gc = pd.read_csv('/project/data/CandReceipts/GC.csv')\n", + "df_house = pd.read_csv('/project/data/CandReceipts/House.csv')\n", + "df_sa = pd.read_csv('/project/data/CandReceipts/SA.csv')\n", + "df_sc = pd.read_csv('/project/data/CandReceipts/SC.csv')\n", + "df_senate = pd.read_csv('/project/data/CandReceipts/Senate.csv')\n", + "df_ss = pd.read_csv('/project/data/CandReceipts/SS.csv')\n", + "df_st = pd.read_csv('/project/data/CandReceipts/ST.csv')\n", + "\n", + "# Read in non-candidate-recipient contribution data\n", + "df_non_cand = pd.read_csv('/project/data/non_candidate_con.csv')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "10 datasets on candidate-recipient contributions and 1 dataset on non-candidate-recipient contributions. 
They are separate and not relational." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 1.2 Check for DataFrames' column consistency" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11]\n" + ] + } + ], + "source": [ + "# First check for DataFrames' column numbers\n", + "df_lst = [df_ag, df_ap, df_dc, df_gc, df_house, df_sa, df_sc, df_ss, df_st, df_senate, df_non_cand]\n", + "df_lens = []\n", + "for df in df_lst:\n", + " df_lens.append(df.shape[1])\n", + "print(df_lens)" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "All dfs have consistent columns\n" + ] + } + ], + "source": [ + "from utils.MN_util import datasets_col_consistent\n", + "\n", + "datasets_col_consistent(df_lst[:-1])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 1.3 Adjust for DataFrames' column consistency" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(Index(['OfficeSought', 'CandRegNumb', 'CandFirstName', 'CandLastName',\n", + " 'CommitteeName', 'DonationDate', 'DonorType', 'DonorName',\n", + " 'DonationAmount', 'InKindDonAmount', 'InKindDescriptionText'],\n", + " dtype='object'),\n", + " Index(['PCFRegNumb', 'Committee', 'ETType', 'ETSubType', 'DonationDate',\n", + " 'DonorType', 'DonorRegNumb', 'DonorName', 'DonationAmount',\n", + " 'InKindDonAmount', 'InKindDescriptionText'],\n", + " dtype='object'))" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_ag.columns, df_non_cand.columns" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Based on project needs and dataset consistency, use these columns:
RegNumb, RecipientType, OfficeSought, CandFirstName, CandLastName, Committee, DonationDate, DonorType, DonorName, DonationAmount, InKindDonAmount, InKindDescriptionText" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "from utils.MN_util import preprocess_candidate_df\n", + "\n", + "df_ag = preprocess_candidate_df(df_ag)\n", + "df_ap = preprocess_candidate_df(df_ap)\n", + "df_dc = preprocess_candidate_df(df_dc)\n", + "df_gc = preprocess_candidate_df(df_gc)\n", + "df_house = preprocess_candidate_df(df_house)\n", + "df_sa = preprocess_candidate_df(df_sa)\n", + "df_sc = preprocess_candidate_df(df_sc)\n", + "df_ss = preprocess_candidate_df(df_ss)\n", + "df_st = preprocess_candidate_df(df_st)\n", + "df_senate = preprocess_candidate_df(df_senate)" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "from utils.MN_util import preprocess_noncandidate_df\n", + "df_non_cand = preprocess_noncandidate_df(df_non_cand)" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Index(['OfficeSought', 'RegNumb', 'CandFirstName', 'CandLastName', 'Committee',\n", + " 'Date', 'DonorType', 'DonorName', 'Amount', 'InKindAmount',\n", + " 'InKindDescription', 'RecipientType'],\n", + " dtype='object')\n", + "Index(['RegNumb', 'Committee', 'RecipientType', 'Date', 'DonorType',\n", + " 'DonorName', 'Amount', 'InKindAmount', 'InKindDescription'],\n", + " dtype='object')\n" + ] + } + ], + "source": [ + "print(df_senate.columns)\n", + "print(df_non_cand.columns)" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "from utils.MN_util import preprocess_contribution_df\n", + "\n", + "new_df_lst = [df_ag, df_ap, df_dc, df_gc, df_house, df_sa, df_sc, df_ss, df_st, \n", + " df_senate, df_non_cand]\n", + "\n", + 
"contribution_df = preprocess_contribution_df(new_df_lst)" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array(['I', 'F', 'C', 'O', 'L', 'P', 'H', 'U', 'S', nan, 'B'],\n", + " dtype=object)" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "contribution_df['DonorType'].unique()" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "OfficeSought object\n", + "RegNumb int64\n", + "CandFirstName object\n", + "CandLastName object\n", + "Committee object\n", + "Date datetime64[ns]\n", + "DonorType object\n", + "DonorName object\n", + "Amount float64\n", + "InKindAmount float64\n", + "InKindDescription object\n", + "RecipientType object\n", + "Year int64\n", + "TotalAmount float64\n", + "dtype: object" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "contribution_df.dtypes" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Donor Types:\n", + "1. C: Candidate Committee (limited to state-level candidates who had a principal campaign committee registered with the Board from which the contribution was made)\n", + "2. I: Non-lobbyist individual \n", + "3. L: Lobbyist \n", + "4. F: Political Committee/Fund \n", + "5. S: Supporting association of a political fund registered with the Board that donates to its own political fund\n", + "6. P: Political party unit\n", + "7. H: Local candidate committee (limited to candidates within Hennepin County who satisfy the definition of local candidate, did not exist until 2022)\n", + "8. 
O: Other (catch-all category that in some cases includes businesses, supporting associations of political funds registered with the Board that donate to their own political fund, associations that are not registered with the Board, and any entity that does not fall within one of the other categories)\n", + "9. U: Association not registered with the Board (may include a committee registered with the FEC or a regulatory committee in another state, a 501(c)(4), 501(c)(6), or 527 nonprofit organization, the campaign committee of a candidate for local office (excluding certain Hennepin County candidates from 2022 onward), etc.)\n", + "10. B: Business (company & corporation)" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array(['Candidate', 'PCF', 'PTU'], dtype=object)" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "contribution_df['RecipientType'].unique()" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array(['AG', nan, 'GC', 'House', 'Senate', 'SA', 'SS', 'SC', 'DC', 'AP',\n", + " 'ST'], dtype=object)" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "contribution_df['OfficeSought'].unique()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Recipient Types:\n", + "- Candidate\n", + "- PCF: Political committee or fund\n", + "- PTU: Political party unit\n", + "\n", + "#### Office Types (within candidate recipient):\n", + "- AG = Attorney General\n", + "- AP = State Appeals Court Judge\n", + "- DC = State District Court Judge\n", + "- GC = Governor\n", + "- House = State Representative\n", + "- SA = State Auditor\n", + "- SC = State Supreme Court Justice\n", + "- SS = Secretary of State\n", + "- ST = State Treasurer (this office was abolished in 2003 
and no longer exists)\n", + "- Senate = State Senator" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 1.4 Check Missing Values" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "OfficeSought 483394\n", + "RegNumb 0\n", + "CandFirstName 483394\n", + "CandLastName 483394\n", + "Committee 0\n", + "Date 0\n", + "DonorType 113\n", + "DonorName 10\n", + "Amount 0\n", + "InKindAmount 0\n", + "InKindDescription 3508411\n", + "RecipientType 0\n", + "Year 0\n", + "TotalAmount 0\n", + "dtype: int64" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "contribution_df.isna().sum()" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Total number of contribution entries = 3548337\n", + "Entries with a zero total amount = 335584\n", + "Entries with a missing donor name = 10\n", + "Proportion of non-classifiable entries = 9.4578%\n" + ] + } + ], + "source": [ + "print(f'Total number of contribution entries = {len(contribution_df)}')\n", + "no_amount = len(contribution_df[contribution_df['TotalAmount'] == 0])\n", + "print(f'Entries with a zero total amount = {no_amount}')\n", + "no_don = contribution_df['DonorName'].isna().sum()\n", + "print(f'Entries with a missing donor name = {no_don}')\n", + "unclassifiable_prop = (no_amount + no_don) / len(contribution_df)\n", + "print(f'Proportion of non-classifiable entries = {unclassifiable_prop:.4%}')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 1.5 Drop Non-classifiable Contribution Data" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [], + "source": [ + "from utils.MN_util import drop_nonclassifiable\n", + 
"contribution_df = drop_nonclassifiable(contribution_df)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. Top 10\n", + "### 2.1 Top 10 Donors" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Year\n", + "2018 47781848.87\n", + "2019 25764905.60\n", + "2020 34614419.07\n", + "2021 26898778.21\n", + "2022 57692246.12\n", + "2023 264961.92\n", + "Name: TotalAmount, dtype: float64" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "filtered_df = contribution_df[contribution_df['Year'].between(2018, 2023)]\n", + "donation_by_year = filtered_df.groupby('Year')['TotalAmount'].sum()\n", + "donation_by_year" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [], + "source": [ + "# Sum each donor's total contributions per year\n", + "don_by_year_contributor = filtered_df.groupby(\n", + " ['Year', 'DonorName'])['TotalAmount'].sum().reset_index()\n", + "\n", + "# Keep the 10 largest donors within each year\n", + "top_10_contributors = don_by_year_contributor.groupby('Year').apply(\n", + " lambda group: group.nlargest(10, 'TotalAmount')).reset_index(drop=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " Year DonorName TotalAmount\n", + "0 2018 Minn DFL State Central Committee 519799.42\n", + "1 2018 Minn Chamber of Commerce Leadership Fd 462600.00\n", + "2 2018 Minn Nurses Assn Pol Comm (MNA-PC) 322750.00\n", + "3 2018 MTA PAC (fka SITCO PAC) 301400.00\n", + "4 2018 Education Minn PAC 291200.00\n", + "5 2018 Shakopee Mdewakanton Sioux 290550.00\n", + "6 2018 Faegre Baker Daniels State-Reg Pol Fund 276350.00\n", + "7 2018 IBEW Minn State Council PAC 188500.00\n", + "8 2018 Haselow, Robert 175025.00\n", + "9 2018 Minn Dental Political Action Committee 167350.00\n", + "10 2019 Shakopee Mdewakanton Sioux 683300.00\n", + "11 2019 Minn Chamber of Commerce Leadership Fd 466000.00\n", + "12 2019 Minn Business Partnership PAC 369750.00\n", + "13 2019 Faegre Baker Daniels State-Reg Pol Fund 297650.00\n", + "14 2019 Beer PAC-Minn Beer Wholesalers Assoc 279250.00\n", + "15 2019 Winthrop & Weinstine PA Political Fund 262750.00\n", + "16 2019 Minn CPAs Public Affairs Committee 262250.00\n", + "17 2019 IBEW Minn State Council PAC 257000.00\n", + "18 2019 Haselow, Robert 237800.00\n", + "19 2019 Rural Electric Political Action Comm 236200.00\n", + "20 2020 Education Minn PAC 367950.00\n", + "21 2020 MTA PAC (fka SITCO PAC) 335800.00\n", + "22 2020 Minn Nurses Assn Pol Comm (MNA-PC) 310360.00\n", + "23 2020 Laborers District Council of Minn & ND Pol Fund 223750.00\n", + "24 2020 Haselow, Justine 221800.00\n", + "25 2020 Mendoza, Salvador 210500.00\n", + "26 2020 Prairie Island Indian Community PAC 206300.00\n", + "27 2020 Haselow, Robert 192150.00\n", + "28 2020 Faegre Drinker State Political Fund 184150.00\n", + "29 2020 Minn Business Partnership PAC 175000.00\n", + "30 2021 Faegre Drinker State Political Fund 470400.00\n", + "31 2021 Minn Chamber of Commerce Leadership Fd 349500.00\n", + "32 2021 Einess, Ward 268200.00\n", + "33 2021 North Central States Carpenters PAC 252550.00\n", + "34 2021 Minn CPAs Public Affairs Committee 231000.00\n", + "35 
2021 Rural Electric Political Action Comm 221150.00\n", + "36 2021 Minn Business Partnership PAC 201475.00\n", + "37 2021 Christopherson, Kirsten 173200.00\n", + "38 2021 Beer PAC-Minn Beer Wholesalers Assoc 165750.00\n", + "39 2021 Winthrop & Weinstine PA Political Fund 159750.00\n", + "40 2022 Shakopee Mdewakanton Sioux 340900.00\n", + "41 2022 Minn Chamber of Commerce Leadership Fd 272750.00\n", + "42 2022 Laborers District Council of Minn & ND Pol Fund 235050.00\n", + "43 2022 Faegre Drinker State Political Fund 202500.00\n", + "44 2022 Education Minn PAC 196258.16\n", + "45 2022 MTA PAC (fka SITCO PAC) 192250.00\n", + "46 2022 Haselow, Robert 167300.00\n", + "47 2022 Mendoza, Salvador 167000.00\n", + "48 2022 Mendoza, Mia 154000.00\n", + "49 2022 North Central States Carpenters PAC 151500.00\n", + "50 2023 IBEW Local 292 11400.11\n", + "51 2023 SEIU local 26 5194.45\n", + "52 2023 Teamsters Local 346 3776.37\n", + "53 2023 Johnson, Erin 2899.00\n", + "54 2023 Teamsters Local 792 2897.77\n", + "55 2023 Zarth, John 2700.00\n", + "56 2023 Zarth, Kelly 2700.00\n", + "57 2023 Teamsters Local 970 2401.89\n", + "58 2023 Teamsters Local 974 2390.62\n", + "59 2023 Wilbert, Michelle 2241.00" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "top_10_contributors" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 2.2 Top 10 Recipients" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [], + "source": [ + "contribution_df1 = filtered_df.copy(deep=True)\n", + "# Fill missing name fields so groupby does not drop those rows (assign instead of chained inplace fillna)\n", + "name_cols = ['CandLastName', 'CandFirstName', 'Committee']\n", + "contribution_df1[name_cols] = contribution_df1[name_cols].fillna('NA')\n", + "by_year_recipients = contribution_df1.groupby(\n", + " ['Year', 'RegNumb', 'RecipientType', 
'CandLastName', 'CandFirstName', 'Committee'])['TotalAmount'].sum().reset_index()\n", + "\n", + 
"top_10_recipient = by_year_recipients.groupby('Year').apply(\n", + " lambda group: group.nlargest(10, 'TotalAmount')).reset_index(drop=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " Year RegNumb RecipientType CandLastName CandFirstName \\\n", + "0 2018 18135 Candidate Walz Tim \n", + "1 2018 17641 Candidate Johnson Jeff \n", + "2 2018 18336 Candidate Ellison Keith \n", + "3 2018 18133 Candidate Wardlow Doug \n", + "4 2018 18125 Candidate Murphy Erin \n", + "5 2018 17653 Candidate Simon Steve \n", + "6 2018 18292 Candidate Pawlenty Tim \n", + "7 2018 15677 Candidate Hortman Melissa \n", + "8 2018 17902 Candidate Edelson Heather \n", + "9 2018 12604 Candidate Davids Gregory \n", + "10 2019 18135 Candidate Walz Tim \n", + "11 2019 11829 Candidate Cohen Richard \n", + "12 2019 16964 Candidate Housley Karin \n", + "13 2019 15677 Candidate Hortman Melissa \n", + "14 2019 12604 Candidate Davids Gregory \n", + "15 2019 15317 Candidate Tomassoni David \n", + "16 2019 13262 Candidate Limmer Warren \n", + "17 2019 18336 Candidate Ellison Keith \n", + "18 2019 17040 Candidate Osmek David \n", + "19 2019 17653 Candidate Simon Steve \n", + "20 2020 18135 Candidate Walz Tim \n", + "21 2020 17105 Candidate Nelson Carla \n", + "22 2020 17481 Candidate Hawj Foung \n", + "23 2020 15501 Candidate Rosen Julie \n", + "24 2020 17443 Candidate Kent Susan \n", + "25 2020 15667 Candidate Dibble D Scott \n", + "26 2020 18021 Candidate Westlin Bonnie \n", + "27 2020 15705 Candidate Senjem David \n", + "28 2020 17653 Candidate Simon Steve \n", + "29 2020 16891 Candidate Miller Jeremy \n", + "30 2021 18135 Candidate Walz Tim \n", + "31 2021 17653 Candidate Simon Steve \n", + "32 2021 18690 Candidate Jensen Scott \n", + "33 2021 18336 Candidate Ellison Keith \n", + "34 2021 15317 Candidate Tomassoni David \n", + "35 2021 15677 Candidate Hortman Melissa \n", + "36 2021 16964 Candidate Housley Karin \n", + "37 2021 17140 Candidate Bakk Thomas (Tom) \n", + "38 2021 12604 Candidate Davids Gregory \n", + "39 2021 18133 Candidate Wardlow Doug \n", + "40 2022 18135 Candidate Walz Tim \n", + "41 2022 17653 Candidate Simon Steve \n", + "42 2022 18690 
Candidate Jensen Scott \n", + "43 2022 18336 Candidate Ellison Keith \n", + "44 2022 15677 Candidate Hortman Melissa \n", + "45 2022 18731 Candidate Schultz James \n", + "46 2022 18133 Candidate Wardlow Doug \n", + "47 2022 13262 Candidate Limmer Warren \n", + "48 2022 10601 Candidate Murphy Mary \n", + "49 2022 18218 Candidate Blaha Julie \n", + "50 2023 41291 PCF NA NA \n", + "51 2023 18690 Candidate Jensen Scott \n", + "52 2023 30689 PCF NA NA \n", + "53 2023 41345 PCF NA NA \n", + "54 2023 30726 PCF NA NA \n", + "55 2023 30013 PCF NA NA \n", + "56 2023 30345 PCF NA NA \n", + "57 2023 41281 PCF NA NA \n", + "58 2023 41262 PCF NA NA \n", + "59 2023 30119 PCF NA NA \n", + "\n", + " Committee TotalAmount \n", + "0 Tim Walz for Governor 8095546.97 \n", + "1 Johnson (Jeff) for Governor 4060066.88 \n", + "2 Keith Ellison for Attorney General 1708185.84 \n", + "3 Doug Wardlow for Attorney General 1621028.92 \n", + "4 Murphy (Erin) for Minnesota 1396061.64 \n", + "5 Simon (Steve) for Secretary of State 904363.13 \n", + "6 Tim Pawlenty for Governor 545366.12 \n", + "7 Melissa Hortman Campaign Committee 535150.00 \n", + "8 Heather Edelson for House 492906.69 \n", + "9 People for (Gregory) Davids Committee 460620.00 \n", + "10 Tim Walz for Governor 1956067.75 \n", + "11 Cohen (Richard) Volunteer Committee 707400.00 \n", + "12 Housley (Karin) For Senate 700500.00 \n", + "13 Melissa Hortman Campaign Committee 565400.00 \n", + "14 People for (Gregory) Davids Committee 536355.00 \n", + "15 Tomassoni (David) for State Senate 491970.00 \n", + "16 Limmer (Warren) for Senate Committee 398250.00 \n", + "17 Keith Ellison for Attorney General 390704.04 \n", + "18 David Osmek Volunteer Committee 379960.00 \n", + "19 Simon (Steve) for Secretary of State 372350.00 \n", + "20 Tim Walz for Governor 1426981.50 \n", + "21 Nelson (Carla) for Senate 514803.24 \n", + "22 Foung (Hawj) for Senate 67 451980.00 \n", + "23 Julie Rosen for State Senate 420900.00 \n", + "24 Susan Kent for Senate 
384646.92 \n", + "25 Volunteers for (Scott) Dibble 376200.00 \n", + "26 Bonnie Westlin for MN Senate 373766.56 \n", + "27 Senjem (David) for Senate 367400.00 \n", + "28 Simon (Steve) for Secretary of State 362450.00 \n", + "29 Friends for (Jeremy) Miller 357750.00 \n", + "30 Tim Walz for Governor 5709831.89 \n", + "31 Simon (Steve) for Secretary of State 1143340.00 \n", + "32 Dr. Scott Jensen for Governor 1003867.95 \n", + "33 Keith Ellison for Attorney General 481584.00 \n", + "34 Tomassoni (David) for State Senate 453100.00 \n", + "35 Melissa Hortman Campaign Committee 364540.00 \n", + "36 Housley (Karin) For Senate 357900.00 \n", + "37 Bakk (Thomas) for Senate 336560.00 \n", + "38 People for (Gregory) Davids Committee 333450.00 \n", + "39 Doug Wardlow for Attorney General 318997.98 \n", + "40 Tim Walz for Governor 15408317.75 \n", + "41 Simon (Steve) for Secretary of State 4425042.27 \n", + "42 Dr. Scott Jensen for Governor 3356977.05 \n", + "43 Keith Ellison for Attorney General 2897191.44 \n", + "44 Melissa Hortman Campaign Committee 894300.00 \n", + "45 Jim Schultz For Minnesota Attorney General 893429.04 \n", + "46 Doug Wardlow for Attorney General 469847.28 \n", + "47 Limmer (Warren) for Senate Committee 454275.00 \n", + "48 Mary Murphy Volunteer Committee 452742.21 \n", + "49 Minnesotans for Julie Blaha 432397.00 \n", + "50 All of Mpls 32650.00 \n", + "51 Dr. Scott Jensen for Governor 32505.00 \n", + "52 AGC Building Constructors 29900.00 \n", + "53 Greater Than 28371.96 \n", + "54 AGC Transportation Builders 28000.00 \n", + "55 Joint Council 32 DRIVE 18072.03 \n", + "56 Winthrop & Weinstine PA Political Fund 12000.00 \n", + "57 Faith in Minnesota Action 11832.00 \n", + "58 Nutrien Ag Solutions Employee Citizenship Fund... 
11805.00 \n", + "59 IBEW Local 292 Political Education Fund 11400.11 " + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "top_10_recipient" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. Compare donation by donor and recipient types\n", + "\n", + "### 3.1 Compare donation by donor types" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [], + "source": [ + "donor_type_mapping = {\n", + " 'B': 'Business',\n", + " 'C': 'Candidate committee',\n", + " 'F': 'Political committee or fund',\n", + " 'H': 'Local candidate committee registered with Hennepin County',\n", + " 'I': 'Non-lobbyist individual',\n", + " 'L': 'Lobbyist',\n", + " 'O': 'Other',\n", + " 'P': 'Political party unit',\n", + " 'S': 'Self',\n", + " 'U': 'Association not registered with the Board'\n", + "}" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + " \n", + " " + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "application/vnd.plotly.v1+json": { + "config": { + "plotlyServerURL": "https://plot.ly" + }, + "data": [ + { + "alignmentgroup": "True", + "hovertemplate": "Donor Type=Association not registered with the Board
Year=%{x}
Total Contributions=%{y}", + "legendgroup": "Association not registered with the Board", + "marker": { + "color": "#636efa", + "pattern": { + "shape": "" + } + }, + "name": "Association not registered with the Board", + "offsetgroup": "Association not registered with the Board", + "orientation": "v", + "showlegend": true, + "textposition": "auto", + "type": "bar", + "x": [ + 2022 + ], + "xaxis": "x", + "y": [ + 1067.47 + ], + "yaxis": "y" + }, + { + "alignmentgroup": "True", + "hovertemplate": "Donor Type=Business
Year=%{x}
Total Contributions=%{y}", + "legendgroup": "Business", + "marker": { + "color": "#EF553B", + "pattern": { + "shape": "" + } + }, + "name": "Business", + "offsetgroup": "Business", + "orientation": "v", + "showlegend": true, + "textposition": "auto", + "type": "bar", + "x": [ + 2019 + ], + "xaxis": "x", + "y": [ + 250 + ], + "yaxis": "y" + }, + { + "alignmentgroup": "True", + "hovertemplate": "Donor Type=Candidate committee
Year=%{x}
Total Contributions=%{y}", + "legendgroup": "Candidate committee", + "marker": { + "color": "#00cc96", + "pattern": { + "shape": "" + } + }, + "name": "Candidate committee", + "offsetgroup": "Candidate committee", + "orientation": "v", + "showlegend": true, + "textposition": "auto", + "type": "bar", + "x": [ + 2018, + 2019, + 2020, + 2021, + 2022, + 2023 + ], + "xaxis": "x", + "y": [ + 294413.35, + 151774.38, + 520202, + 84590.11, + 641245.11, + 1810 + ], + "yaxis": "y" + }, + { + "alignmentgroup": "True", + "hovertemplate": "Donor Type=Lobbyist
Year=%{x}
Total Contributions=%{y}", + "legendgroup": "Lobbyist", + "marker": { + "color": "#ab63fa", + "pattern": { + "shape": "" + } + }, + "name": "Lobbyist", + "offsetgroup": "Lobbyist", + "orientation": "v", + "showlegend": true, + "textposition": "auto", + "type": "bar", + "x": [ + 2018, + 2019, + 2020, + 2021, + 2022, + 2023 + ], + "xaxis": "x", + "y": [ + 1901804.56, + 2715933.95, + 1400427.52, + 2804633.25, + 1953755.2, + 2950 + ], + "yaxis": "y" + }, + { + "alignmentgroup": "True", + "hovertemplate": "Donor Type=Local candidate committee registered with Hennepin County
Year=%{x}
Total Contributions=%{y}", + "legendgroup": "Local candidate committee registered with Hennepin County", + "marker": { + "color": "#FFA15A", + "pattern": { + "shape": "" + } + }, + "name": "Local candidate committee registered with Hennepin County", + "offsetgroup": "Local candidate committee registered with Hennepin County", + "orientation": "v", + "showlegend": true, + "textposition": "auto", + "type": "bar", + "x": [ + 2018, + 2022, + 2023 + ], + "xaxis": "x", + "y": [ + 650, + 6750, + 1490 + ], + "yaxis": "y" + }, + { + "alignmentgroup": "True", + "hovertemplate": "Donor Type=Non-lobbyist individual
Year=%{x}
Total Contributions=%{y}", + "legendgroup": "Non-lobbyist individual", + "marker": { + "color": "#19d3f3", + "pattern": { + "shape": "" + } + }, + "name": "Non-lobbyist individual", + "offsetgroup": "Non-lobbyist individual", + "orientation": "v", + "showlegend": true, + "textposition": "auto", + "type": "bar", + "x": [ + 2018, + 2019, + 2020, + 2021, + 2022, + 2023 + ], + "xaxis": "x", + "y": [ + 34883943.13, + 13551729.61, + 23319309.17, + 17640251.09, + 46536784.82, + 162987.49 + ], + "yaxis": "y" + }, + { + "alignmentgroup": "True", + "hovertemplate": "Donor Type=Other
Year=%{x}
Total Contributions=%{y}", + "legendgroup": "Other", + "marker": { + "color": "#FF6692", + "pattern": { + "shape": "" + } + }, + "name": "Other", + "offsetgroup": "Other", + "orientation": "v", + "showlegend": true, + "textposition": "auto", + "type": "bar", + "x": [ + 2018, + 2019, + 2020, + 2021, + 2022, + 2023 + ], + "xaxis": "x", + "y": [ + 222136.57, + 165915.45, + 194767.4, + 170847.48, + 239708.38, + 93846.9 + ], + "yaxis": "y" + }, + { + "alignmentgroup": "True", + "hovertemplate": "Donor Type=Political committee or fund
Year=%{x}
Total Contributions=%{y}", + "legendgroup": "Political committee or fund", + "marker": { + "color": "#B6E880", + "pattern": { + "shape": "" + } + }, + "name": "Political committee or fund", + "offsetgroup": "Political committee or fund", + "orientation": "v", + "showlegend": true, + "textposition": "auto", + "type": "bar", + "x": [ + 2018, + 2019, + 2020, + 2021, + 2022 + ], + "xaxis": "x", + "y": [ + 7583168.95, + 8642981.92, + 6856305.01, + 5878779.67, + 6411623.78 + ], + "yaxis": "y" + }, + { + "alignmentgroup": "True", + "hovertemplate": "Donor Type=Political party unit
Year=%{x}
Total Contributions=%{y}", + "legendgroup": "Political party unit", + "marker": { + "color": "#FF97FF", + "pattern": { + "shape": "" + } + }, + "name": "Political party unit", + "offsetgroup": "Political party unit", + "orientation": "v", + "showlegend": true, + "textposition": "auto", + "type": "bar", + "x": [ + 2018, + 2019, + 2020, + 2021, + 2022, + 2023 + ], + "xaxis": "x", + "y": [ + 2561953.7, + 343542.51, + 1993899.57, + 222952.33, + 1624652.29, + 877.53 + ], + "yaxis": "y" + }, + { + "alignmentgroup": "True", + "hovertemplate": "Donor Type=Self
Year=%{x}
Total Contributions=%{y}", + "legendgroup": "Self", + "marker": { + "color": "#FECB52", + "pattern": { + "shape": "" + } + }, + "name": "Self", + "offsetgroup": "Self", + "orientation": "v", + "showlegend": true, + "textposition": "auto", + "type": "bar", + "x": [ + 2018, + 2019, + 2020, + 2021, + 2022, + 2023 + ], + "xaxis": "x", + "y": [ + 333696.61, + 192777.78, + 327697.4, + 96724.28, + 268027.56, + 1000 + ], + "yaxis": "y" + } + ], + "layout": { + "barmode": "relative", + "legend": { + "title": { + "text": "Donor Type" + }, + "tracegroupgap": 0 + }, + "template": { + "data": { + "bar": [ + { + "error_x": { + "color": "#2a3f5f" + }, + "error_y": { + "color": "#2a3f5f" + }, + "marker": { + "line": { + "color": "#E5ECF6", + "width": 0.5 + }, + "pattern": { + "fillmode": "overlay", + "size": 10, + "solidity": 0.2 + } + }, + "type": "bar" + } + ], + "barpolar": [ + { + "marker": { + "line": { + "color": "#E5ECF6", + "width": 0.5 + }, + "pattern": { + "fillmode": "overlay", + "size": 10, + "solidity": 0.2 + } + }, + "type": "barpolar" + } + ], + "carpet": [ + { + "aaxis": { + "endlinecolor": "#2a3f5f", + "gridcolor": "white", + "linecolor": "white", + "minorgridcolor": "white", + "startlinecolor": "#2a3f5f" + }, + "baxis": { + "endlinecolor": "#2a3f5f", + "gridcolor": "white", + "linecolor": "white", + "minorgridcolor": "white", + "startlinecolor": "#2a3f5f" + }, + "type": "carpet" + } + ], + "choropleth": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "type": "choropleth" + } + ], + "contour": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "colorscale": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + 
"#f0f921" + ] + ], + "type": "contour" + } + ], + "contourcarpet": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "type": "contourcarpet" + } + ], + "heatmap": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "colorscale": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "type": "heatmap" + } + ], + "heatmapgl": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "colorscale": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "type": "heatmapgl" + } + ], + "histogram": [ + { + "marker": { + "pattern": { + "fillmode": "overlay", + "size": 10, + "solidity": 0.2 + } + }, + "type": "histogram" + } + ], + "histogram2d": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "colorscale": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "type": "histogram2d" + } + ], + "histogram2dcontour": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "colorscale": [ + [ + 
0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "type": "histogram2dcontour" + } + ], + "mesh3d": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "type": "mesh3d" + } + ], + "parcoords": [ + { + "line": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "parcoords" + } + ], + "pie": [ + { + "automargin": true, + "type": "pie" + } + ], + "scatter": [ + { + "fillpattern": { + "fillmode": "overlay", + "size": 10, + "solidity": 0.2 + }, + "type": "scatter" + } + ], + "scatter3d": [ + { + "line": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scatter3d" + } + ], + "scattercarpet": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scattercarpet" + } + ], + "scattergeo": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scattergeo" + } + ], + "scattergl": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scattergl" + } + ], + "scattermapbox": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scattermapbox" + } + ], + "scatterpolar": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scatterpolar" + } + ], + "scatterpolargl": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scatterpolargl" + } + ], + "scatterternary": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scatterternary" + } + ], + "surface": [ + { + 
"colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "colorscale": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "type": "surface" + } + ], + "table": [ + { + "cells": { + "fill": { + "color": "#EBF0F8" + }, + "line": { + "color": "white" + } + }, + "header": { + "fill": { + "color": "#C8D4E3" + }, + "line": { + "color": "white" + } + }, + "type": "table" + } + ] + }, + "layout": { + "annotationdefaults": { + "arrowcolor": "#2a3f5f", + "arrowhead": 0, + "arrowwidth": 1 + }, + "autotypenumbers": "strict", + "coloraxis": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "colorscale": { + "diverging": [ + [ + 0, + "#8e0152" + ], + [ + 0.1, + "#c51b7d" + ], + [ + 0.2, + "#de77ae" + ], + [ + 0.3, + "#f1b6da" + ], + [ + 0.4, + "#fde0ef" + ], + [ + 0.5, + "#f7f7f7" + ], + [ + 0.6, + "#e6f5d0" + ], + [ + 0.7, + "#b8e186" + ], + [ + 0.8, + "#7fbc41" + ], + [ + 0.9, + "#4d9221" + ], + [ + 1, + "#276419" + ] + ], + "sequential": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "sequentialminus": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, 
+ "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ] + }, + "colorway": [ + "#636efa", + "#EF553B", + "#00cc96", + "#ab63fa", + "#FFA15A", + "#19d3f3", + "#FF6692", + "#B6E880", + "#FF97FF", + "#FECB52" + ], + "font": { + "color": "#2a3f5f" + }, + "geo": { + "bgcolor": "white", + "lakecolor": "white", + "landcolor": "#E5ECF6", + "showlakes": true, + "showland": true, + "subunitcolor": "white" + }, + "hoverlabel": { + "align": "left" + }, + "hovermode": "closest", + "mapbox": { + "style": "light" + }, + "paper_bgcolor": "white", + "plot_bgcolor": "#E5ECF6", + "polar": { + "angularaxis": { + "gridcolor": "white", + "linecolor": "white", + "ticks": "" + }, + "bgcolor": "#E5ECF6", + "radialaxis": { + "gridcolor": "white", + "linecolor": "white", + "ticks": "" + } + }, + "scene": { + "xaxis": { + "backgroundcolor": "#E5ECF6", + "gridcolor": "white", + "gridwidth": 2, + "linecolor": "white", + "showbackground": true, + "ticks": "", + "zerolinecolor": "white" + }, + "yaxis": { + "backgroundcolor": "#E5ECF6", + "gridcolor": "white", + "gridwidth": 2, + "linecolor": "white", + "showbackground": true, + "ticks": "", + "zerolinecolor": "white" + }, + "zaxis": { + "backgroundcolor": "#E5ECF6", + "gridcolor": "white", + "gridwidth": 2, + "linecolor": "white", + "showbackground": true, + "ticks": "", + "zerolinecolor": "white" + } + }, + "shapedefaults": { + "line": { + "color": "#2a3f5f" + } + }, + "ternary": { + "aaxis": { + "gridcolor": "white", + "linecolor": "white", + "ticks": "" + }, + "baxis": { + "gridcolor": "white", + "linecolor": "white", + "ticks": "" + }, + "bgcolor": "#E5ECF6", + "caxis": { + "gridcolor": "white", + "linecolor": "white", + "ticks": "" + } + }, + "title": { + "x": 0.05 + }, + "xaxis": { + "automargin": true, + "gridcolor": "white", + "linecolor": "white", + "ticks": "", + "title": { + "standoff": 15 + }, + "zerolinecolor": "white", + "zerolinewidth": 2 + }, + "yaxis": { 
+ "automargin": true, + "gridcolor": "white", + "linecolor": "white", + "ticks": "", + "title": { + "standoff": 15 + }, + "zerolinecolor": "white", + "zerolinewidth": 2 + } + } + }, + "title": { + "text": "Donations by Donor Type From 2018 To 2023" + }, + "xaxis": { + "anchor": "y", + "domain": [ + 0, + 1 + ], + "title": { + "text": "Year" + } + }, + "yaxis": { + "anchor": "x", + "domain": [ + 0, + 1 + ], + "title": { + "text": "Total Contributions" + } + } + } + }, + "text/html": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "grouped2 = filtered_df.groupby(['Year', 'DonorType'])['TotalAmount'].sum().reset_index()\n", + "\n", + "grouped2['FullDonorType'] = grouped2['DonorType'].map(donor_type_mapping)\n", + "\n", + "fig = px.bar(\n", + " grouped2,\n", + " x='Year',\n", + " y='TotalAmount',\n", + " color='FullDonorType',\n", + " title='Donations by Donor Type From 2018 To 2023',\n", + " labels={\"Year\": \"Year\", \"TotalAmount\": \"Total Contributions\", \"FullDonorType\": \"Donor Type\"},\n", + " category_orders={\"FullDonorType\": sorted(donor_type_mapping.values())}\n", + ")\n", + "\n", + "fig.show()" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [ + { + "data": { + "application/vnd.plotly.v1+json": { + "config": { + "plotlyServerURL": "https://plot.ly" + }, + "data": [ + { + "alignmentgroup": "True", + "hovertemplate": "Donor Type=%{x}
Year=2022
Total Contributions=%{y}", + "legendgroup": "Association not registered with the Board", + "marker": { + "color": "#636efa", + "pattern": { + "shape": "" + } + }, + "name": "Association not registered with the Board", + "offsetgroup": "Association not registered with the Board", + "orientation": "v", + "showlegend": true, + "textposition": "auto", + "type": "bar", + "x": [ + "Association not registered with the Board" + ], + "xaxis": "x", + "y": [ + 1067.47 + ], + "yaxis": "y" + }, + { + "alignmentgroup": "True", + "hovertemplate": "Donor Type=%{x}
Year=2022
Total Contributions=%{y}", + "legendgroup": "Candidate committee", + "marker": { + "color": "#00cc96", + "pattern": { + "shape": "" + } + }, + "name": "Candidate committee", + "offsetgroup": "Candidate committee", + "orientation": "v", + "showlegend": true, + "textposition": "auto", + "type": "bar", + "x": [ + "Candidate committee" + ], + "xaxis": "x", + "y": [ + 641245.11 + ], + "yaxis": "y" + }, + { + "alignmentgroup": "True", + "hovertemplate": "Donor Type=%{x}
Year=2022
Total Contributions=%{y}", + "legendgroup": "Lobbyist", + "marker": { + "color": "#ab63fa", + "pattern": { + "shape": "" + } + }, + "name": "Lobbyist", + "offsetgroup": "Lobbyist", + "orientation": "v", + "showlegend": true, + "textposition": "auto", + "type": "bar", + "x": [ + "Lobbyist" + ], + "xaxis": "x", + "y": [ + 1953755.2 + ], + "yaxis": "y" + }, + { + "alignmentgroup": "True", + "hovertemplate": "Donor Type=%{x}
Year=2022
Total Contributions=%{y}", + "legendgroup": "Local candidate committee registered with Hennepin County", + "marker": { + "color": "#FFA15A", + "pattern": { + "shape": "" + } + }, + "name": "Local candidate committee registered with Hennepin County", + "offsetgroup": "Local candidate committee registered with Hennepin County", + "orientation": "v", + "showlegend": true, + "textposition": "auto", + "type": "bar", + "x": [ + "Local candidate committee registered with Hennepin County" + ], + "xaxis": "x", + "y": [ + 6750 + ], + "yaxis": "y" + }, + { + "alignmentgroup": "True", + "hovertemplate": "Donor Type=%{x}
Year=2022
Total Contributions=%{y}", + "legendgroup": "Non-lobbyist individual", + "marker": { + "color": "#19d3f3", + "pattern": { + "shape": "" + } + }, + "name": "Non-lobbyist individual", + "offsetgroup": "Non-lobbyist individual", + "orientation": "v", + "showlegend": true, + "textposition": "auto", + "type": "bar", + "x": [ + "Non-lobbyist individual" + ], + "xaxis": "x", + "y": [ + 46536784.82 + ], + "yaxis": "y" + }, + { + "alignmentgroup": "True", + "hovertemplate": "Donor Type=%{x}
Year=2022
Total Contributions=%{y}", + "legendgroup": "Other", + "marker": { + "color": "#FF6692", + "pattern": { + "shape": "" + } + }, + "name": "Other", + "offsetgroup": "Other", + "orientation": "v", + "showlegend": true, + "textposition": "auto", + "type": "bar", + "x": [ + "Other" + ], + "xaxis": "x", + "y": [ + 239708.38 + ], + "yaxis": "y" + }, + { + "alignmentgroup": "True", + "hovertemplate": "Donor Type=%{x}
Year=2022
Total Contributions=%{y}", + "legendgroup": "Political committee or fund", + "marker": { + "color": "#B6E880", + "pattern": { + "shape": "" + } + }, + "name": "Political committee or fund", + "offsetgroup": "Political committee or fund", + "orientation": "v", + "showlegend": true, + "textposition": "auto", + "type": "bar", + "x": [ + "Political committee or fund" + ], + "xaxis": "x", + "y": [ + 6411623.78 + ], + "yaxis": "y" + }, + { + "alignmentgroup": "True", + "hovertemplate": "Donor Type=%{x}
Year=2022
Total Contributions=%{y}", + "legendgroup": "Political party unit", + "marker": { + "color": "#FF97FF", + "pattern": { + "shape": "" + } + }, + "name": "Political party unit", + "offsetgroup": "Political party unit", + "orientation": "v", + "showlegend": true, + "textposition": "auto", + "type": "bar", + "x": [ + "Political party unit" + ], + "xaxis": "x", + "y": [ + 1624652.29 + ], + "yaxis": "y" + }, + { + "alignmentgroup": "True", + "hovertemplate": "Donor Type=%{x}
Year=2022
Total Contributions=%{y}", + "legendgroup": "Self", + "marker": { + "color": "#FECB52", + "pattern": { + "shape": "" + } + }, + "name": "Self", + "offsetgroup": "Self", + "orientation": "v", + "showlegend": true, + "textposition": "auto", + "type": "bar", + "x": [ + "Self" + ], + "xaxis": "x", + "y": [ + 268027.56 + ], + "yaxis": "y" + } + ], + "layout": { + "annotations": [ + { + "font": {}, + "showarrow": false, + "text": "Year=2022", + "x": 0.5, + "xanchor": "center", + "xref": "paper", + "y": 1, + "yanchor": "bottom", + "yref": "paper" + } + ], + "barmode": "relative", + "legend": { + "title": { + "text": "Donor Type" + }, + "tracegroupgap": 0 + }, + "template": { + "data": { + "bar": [ + { + "error_x": { + "color": "#2a3f5f" + }, + "error_y": { + "color": "#2a3f5f" + }, + "marker": { + "line": { + "color": "#E5ECF6", + "width": 0.5 + }, + "pattern": { + "fillmode": "overlay", + "size": 10, + "solidity": 0.2 + } + }, + "type": "bar" + } + ], + "barpolar": [ + { + "marker": { + "line": { + "color": "#E5ECF6", + "width": 0.5 + }, + "pattern": { + "fillmode": "overlay", + "size": 10, + "solidity": 0.2 + } + }, + "type": "barpolar" + } + ], + "carpet": [ + { + "aaxis": { + "endlinecolor": "#2a3f5f", + "gridcolor": "white", + "linecolor": "white", + "minorgridcolor": "white", + "startlinecolor": "#2a3f5f" + }, + "baxis": { + "endlinecolor": "#2a3f5f", + "gridcolor": "white", + "linecolor": "white", + "minorgridcolor": "white", + "startlinecolor": "#2a3f5f" + }, + "type": "carpet" + } + ], + "choropleth": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "type": "choropleth" + } + ], + "contour": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "colorscale": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], 
+ [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "type": "contour" + } + ], + "contourcarpet": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "type": "contourcarpet" + } + ], + "heatmap": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "colorscale": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "type": "heatmap" + } + ], + "heatmapgl": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "colorscale": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "type": "heatmapgl" + } + ], + "histogram": [ + { + "marker": { + "pattern": { + "fillmode": "overlay", + "size": 10, + "solidity": 0.2 + } + }, + "type": "histogram" + } + ], + "histogram2d": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "colorscale": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "type": "histogram2d" + } + ], + 
"histogram2dcontour": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "colorscale": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "type": "histogram2dcontour" + } + ], + "mesh3d": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "type": "mesh3d" + } + ], + "parcoords": [ + { + "line": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "parcoords" + } + ], + "pie": [ + { + "automargin": true, + "type": "pie" + } + ], + "scatter": [ + { + "fillpattern": { + "fillmode": "overlay", + "size": 10, + "solidity": 0.2 + }, + "type": "scatter" + } + ], + "scatter3d": [ + { + "line": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scatter3d" + } + ], + "scattercarpet": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scattercarpet" + } + ], + "scattergeo": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scattergeo" + } + ], + "scattergl": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scattergl" + } + ], + "scattermapbox": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scattermapbox" + } + ], + "scatterpolar": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scatterpolar" + } + ], + "scatterpolargl": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scatterpolargl" + } + ], + "scatterternary": [ + { + "marker": { + 
"colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scatterternary" + } + ], + "surface": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "colorscale": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "type": "surface" + } + ], + "table": [ + { + "cells": { + "fill": { + "color": "#EBF0F8" + }, + "line": { + "color": "white" + } + }, + "header": { + "fill": { + "color": "#C8D4E3" + }, + "line": { + "color": "white" + } + }, + "type": "table" + } + ] + }, + "layout": { + "annotationdefaults": { + "arrowcolor": "#2a3f5f", + "arrowhead": 0, + "arrowwidth": 1 + }, + "autotypenumbers": "strict", + "coloraxis": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "colorscale": { + "diverging": [ + [ + 0, + "#8e0152" + ], + [ + 0.1, + "#c51b7d" + ], + [ + 0.2, + "#de77ae" + ], + [ + 0.3, + "#f1b6da" + ], + [ + 0.4, + "#fde0ef" + ], + [ + 0.5, + "#f7f7f7" + ], + [ + 0.6, + "#e6f5d0" + ], + [ + 0.7, + "#b8e186" + ], + [ + 0.8, + "#7fbc41" + ], + [ + 0.9, + "#4d9221" + ], + [ + 1, + "#276419" + ] + ], + "sequential": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "sequentialminus": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + 
], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ] + }, + "colorway": [ + "#636efa", + "#EF553B", + "#00cc96", + "#ab63fa", + "#FFA15A", + "#19d3f3", + "#FF6692", + "#B6E880", + "#FF97FF", + "#FECB52" + ], + "font": { + "color": "#2a3f5f" + }, + "geo": { + "bgcolor": "white", + "lakecolor": "white", + "landcolor": "#E5ECF6", + "showlakes": true, + "showland": true, + "subunitcolor": "white" + }, + "hoverlabel": { + "align": "left" + }, + "hovermode": "closest", + "mapbox": { + "style": "light" + }, + "paper_bgcolor": "white", + "plot_bgcolor": "#E5ECF6", + "polar": { + "angularaxis": { + "gridcolor": "white", + "linecolor": "white", + "ticks": "" + }, + "bgcolor": "#E5ECF6", + "radialaxis": { + "gridcolor": "white", + "linecolor": "white", + "ticks": "" + } + }, + "scene": { + "xaxis": { + "backgroundcolor": "#E5ECF6", + "gridcolor": "white", + "gridwidth": 2, + "linecolor": "white", + "showbackground": true, + "ticks": "", + "zerolinecolor": "white" + }, + "yaxis": { + "backgroundcolor": "#E5ECF6", + "gridcolor": "white", + "gridwidth": 2, + "linecolor": "white", + "showbackground": true, + "ticks": "", + "zerolinecolor": "white" + }, + "zaxis": { + "backgroundcolor": "#E5ECF6", + "gridcolor": "white", + "gridwidth": 2, + "linecolor": "white", + "showbackground": true, + "ticks": "", + "zerolinecolor": "white" + } + }, + "shapedefaults": { + "line": { + "color": "#2a3f5f" + } + }, + "ternary": { + "aaxis": { + "gridcolor": "white", + "linecolor": "white", + "ticks": "" + }, + "baxis": { + "gridcolor": "white", + "linecolor": "white", + "ticks": "" + }, + "bgcolor": "#E5ECF6", + "caxis": { + "gridcolor": "white", + "linecolor": "white", + "ticks": "" + } + }, + "title": { + "x": 0.05 + }, + "xaxis": { + "automargin": true, + "gridcolor": "white", + "linecolor": "white", + 
"ticks": "", + "title": { + "standoff": 15 + }, + "zerolinecolor": "white", + "zerolinewidth": 2 + }, + "yaxis": { + "automargin": true, + "gridcolor": "white", + "linecolor": "white", + "ticks": "", + "title": { + "standoff": 15 + }, + "zerolinecolor": "white", + "zerolinewidth": 2 + } + } + }, + "title": { + "text": "Donations by Donor Type in 2022" + }, + "xaxis": { + "anchor": "y", + "categoryarray": [ + "Association not registered with the Board", + "Business", + "Candidate committee", + "Lobbyist", + "Local candidate committee registered with Hennepin County", + "Non-lobbyist individual", + "Other", + "Political committee or fund", + "Political party unit", + "Self" + ], + "categoryorder": "array", + "domain": [ + 0, + 1 + ], + "title": { + "text": "Donor Type" + } + }, + "yaxis": { + "anchor": "x", + "domain": [ + 0, + 1 + ], + "title": { + "text": "Total Contributions" + } + } + } + }, + "text/html": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "filtered_df2 = contribution_df[contribution_df['Year'] == 2022]\n", + "grouped3 = filtered_df2.groupby(['Year', 'DonorType'])['TotalAmount'].sum().reset_index()\n", + "\n", + "grouped3['FullDonorType'] = grouped3['DonorType'].map(donor_type_mapping)\n", + "\n", + "fig = px.bar(\n", + " grouped3,\n", + " x='FullDonorType', # Use 'FullDonorType' as the x-axis\n", + " y='TotalAmount',\n", + " color='FullDonorType',\n", + " title='Donations by Donor Type in 2022',\n", + " labels={\"TotalAmount\": \"Total Contributions\", \"FullDonorType\": \"Donor Type\"},\n", + " category_orders={\"FullDonorType\": sorted(donor_type_mapping.values())},\n", + " facet_col='Year', # filtered_df2 holds only 2022, so this produces a single facet panel\n", + ")\n", + "\n", + "fig.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Observations and Interpretations\n", + "1. Individuals, excluding lobbyists, constitute the largest share of contributions in the MN dataset.\n", + "2. The second largest contributor category is General Purpose Political Committee or Fund, followed by lobbyists.\n", + "3. Contributions from other donor types are notably lower throughout the years.\n", + "4. For the period 2018 to 2022, we observe a cyclical pattern: a sharp increase in contributions in one year, followed by three years of lower totals. This cycle appears to align with Minnesota's four-year gubernatorial election cycle (2018, 2022).\n", + "5. From 1998 to 2023, several years show significantly lower contribution amounts: 1999, 2001, 2003, 2007, and 2011."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 3.2 Compare donation by recipient types" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [ + { + "data": { + "application/vnd.plotly.v1+json": { + "config": { + "plotlyServerURL": "https://plot.ly" + }, + "data": [ + { + "alignmentgroup": "True", + "hovertemplate": "Recipient Type=Candidate
Year=%{x}
Total Contributions=%{y}", + "legendgroup": "Candidate", + "marker": { + "color": "#636efa", + "pattern": { + "shape": "" + } + }, + "name": "Candidate", + "offsetgroup": "Candidate", + "orientation": "v", + "showlegend": true, + "textposition": "auto", + "type": "bar", + "x": [ + 2018, + 2019, + 2020, + 2021, + 2022, + 2023 + ], + "xaxis": "x", + "y": [ + 45075149.29, + 23545018.91, + 31859975.5, + 24773070.92, + 54222877.08, + 40405 + ], + "yaxis": "y" + }, + { + "alignmentgroup": "True", + "hovertemplate": "Recipient Type=PCF
Year=%{x}
Total Contributions=%{y}", + "legendgroup": "PCF", + "marker": { + "color": "#EF553B", + "pattern": { + "shape": "" + } + }, + "name": "PCF", + "offsetgroup": "PCF", + "orientation": "v", + "showlegend": true, + "textposition": "auto", + "type": "bar", + "x": [ + 2018, + 2019, + 2020, + 2021, + 2022, + 2023 + ], + "xaxis": "x", + "y": [ + 1656145.98, + 1459419.13, + 1701776.18, + 1508361.7, + 2292700.66, + 218273.57 + ], + "yaxis": "y" + }, + { + "alignmentgroup": "True", + "hovertemplate": "Recipient Type=PTU
Year=%{x}
Total Contributions=%{y}", + "legendgroup": "PTU", + "marker": { + "color": "#00cc96", + "pattern": { + "shape": "" + } + }, + "name": "PTU", + "offsetgroup": "PTU", + "orientation": "v", + "showlegend": true, + "textposition": "auto", + "type": "bar", + "x": [ + 2018, + 2019, + 2020, + 2021, + 2022, + 2023 + ], + "xaxis": "x", + "y": [ + 1050553.6, + 760467.56, + 1052667.39, + 617345.59, + 1176668.38, + 6283.35 + ], + "yaxis": "y" + } + ], + "layout": { + "barmode": "relative", + "legend": { + "title": { + "text": "Recipient Type" + }, + "tracegroupgap": 0 + }, + "template": { + "data": { + "bar": [ + { + "error_x": { + "color": "#2a3f5f" + }, + "error_y": { + "color": "#2a3f5f" + }, + "marker": { + "line": { + "color": "#E5ECF6", + "width": 0.5 + }, + "pattern": { + "fillmode": "overlay", + "size": 10, + "solidity": 0.2 + } + }, + "type": "bar" + } + ], + "barpolar": [ + { + "marker": { + "line": { + "color": "#E5ECF6", + "width": 0.5 + }, + "pattern": { + "fillmode": "overlay", + "size": 10, + "solidity": 0.2 + } + }, + "type": "barpolar" + } + ], + "carpet": [ + { + "aaxis": { + "endlinecolor": "#2a3f5f", + "gridcolor": "white", + "linecolor": "white", + "minorgridcolor": "white", + "startlinecolor": "#2a3f5f" + }, + "baxis": { + "endlinecolor": "#2a3f5f", + "gridcolor": "white", + "linecolor": "white", + "minorgridcolor": "white", + "startlinecolor": "#2a3f5f" + }, + "type": "carpet" + } + ], + "choropleth": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "type": "choropleth" + } + ], + "contour": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "colorscale": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 
1, + "#f0f921" + ] + ], + "type": "contour" + } + ], + "contourcarpet": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "type": "contourcarpet" + } + ], + "heatmap": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "colorscale": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "type": "heatmap" + } + ], + "heatmapgl": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "colorscale": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "type": "heatmapgl" + } + ], + "histogram": [ + { + "marker": { + "pattern": { + "fillmode": "overlay", + "size": 10, + "solidity": 0.2 + } + }, + "type": "histogram" + } + ], + "histogram2d": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "colorscale": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "type": "histogram2d" + } + ], + "histogram2dcontour": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "colorscale": [ + 
[ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "type": "histogram2dcontour" + } + ], + "mesh3d": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "type": "mesh3d" + } + ], + "parcoords": [ + { + "line": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "parcoords" + } + ], + "pie": [ + { + "automargin": true, + "type": "pie" + } + ], + "scatter": [ + { + "fillpattern": { + "fillmode": "overlay", + "size": 10, + "solidity": 0.2 + }, + "type": "scatter" + } + ], + "scatter3d": [ + { + "line": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scatter3d" + } + ], + "scattercarpet": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scattercarpet" + } + ], + "scattergeo": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scattergeo" + } + ], + "scattergl": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scattergl" + } + ], + "scattermapbox": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scattermapbox" + } + ], + "scatterpolar": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scatterpolar" + } + ], + "scatterpolargl": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scatterpolargl" + } + ], + "scatterternary": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scatterternary" + } + ], + "surface": [ + { + 
"colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "colorscale": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "type": "surface" + } + ], + "table": [ + { + "cells": { + "fill": { + "color": "#EBF0F8" + }, + "line": { + "color": "white" + } + }, + "header": { + "fill": { + "color": "#C8D4E3" + }, + "line": { + "color": "white" + } + }, + "type": "table" + } + ] + }, + "layout": { + "annotationdefaults": { + "arrowcolor": "#2a3f5f", + "arrowhead": 0, + "arrowwidth": 1 + }, + "autotypenumbers": "strict", + "coloraxis": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "colorscale": { + "diverging": [ + [ + 0, + "#8e0152" + ], + [ + 0.1, + "#c51b7d" + ], + [ + 0.2, + "#de77ae" + ], + [ + 0.3, + "#f1b6da" + ], + [ + 0.4, + "#fde0ef" + ], + [ + 0.5, + "#f7f7f7" + ], + [ + 0.6, + "#e6f5d0" + ], + [ + 0.7, + "#b8e186" + ], + [ + 0.8, + "#7fbc41" + ], + [ + 0.9, + "#4d9221" + ], + [ + 1, + "#276419" + ] + ], + "sequential": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "sequentialminus": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, 
+ "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ] + }, + "colorway": [ + "#636efa", + "#EF553B", + "#00cc96", + "#ab63fa", + "#FFA15A", + "#19d3f3", + "#FF6692", + "#B6E880", + "#FF97FF", + "#FECB52" + ], + "font": { + "color": "#2a3f5f" + }, + "geo": { + "bgcolor": "white", + "lakecolor": "white", + "landcolor": "#E5ECF6", + "showlakes": true, + "showland": true, + "subunitcolor": "white" + }, + "hoverlabel": { + "align": "left" + }, + "hovermode": "closest", + "mapbox": { + "style": "light" + }, + "paper_bgcolor": "white", + "plot_bgcolor": "#E5ECF6", + "polar": { + "angularaxis": { + "gridcolor": "white", + "linecolor": "white", + "ticks": "" + }, + "bgcolor": "#E5ECF6", + "radialaxis": { + "gridcolor": "white", + "linecolor": "white", + "ticks": "" + } + }, + "scene": { + "xaxis": { + "backgroundcolor": "#E5ECF6", + "gridcolor": "white", + "gridwidth": 2, + "linecolor": "white", + "showbackground": true, + "ticks": "", + "zerolinecolor": "white" + }, + "yaxis": { + "backgroundcolor": "#E5ECF6", + "gridcolor": "white", + "gridwidth": 2, + "linecolor": "white", + "showbackground": true, + "ticks": "", + "zerolinecolor": "white" + }, + "zaxis": { + "backgroundcolor": "#E5ECF6", + "gridcolor": "white", + "gridwidth": 2, + "linecolor": "white", + "showbackground": true, + "ticks": "", + "zerolinecolor": "white" + } + }, + "shapedefaults": { + "line": { + "color": "#2a3f5f" + } + }, + "ternary": { + "aaxis": { + "gridcolor": "white", + "linecolor": "white", + "ticks": "" + }, + "baxis": { + "gridcolor": "white", + "linecolor": "white", + "ticks": "" + }, + "bgcolor": "#E5ECF6", + "caxis": { + "gridcolor": "white", + "linecolor": "white", + "ticks": "" + } + }, + "title": { + "x": 0.05 + }, + "xaxis": { + "automargin": true, + "gridcolor": "white", + "linecolor": "white", + "ticks": "", + "title": { + "standoff": 15 + }, + "zerolinecolor": "white", + "zerolinewidth": 2 + }, + "yaxis": { 
+ "automargin": true, + "gridcolor": "white", + "linecolor": "white", + "ticks": "", + "title": { + "standoff": 15 + }, + "zerolinecolor": "white", + "zerolinewidth": 2 + } + } + }, + "title": { + "text": "Donations by Recipient Type from 2018 to 2023" + }, + "xaxis": { + "anchor": "y", + "domain": [ + 0, + 1 + ], + "title": { + "text": "Year" + } + }, + "yaxis": { + "anchor": "x", + "domain": [ + 0, + 1 + ], + "title": { + "text": "Total Contributions" + } + } + } + }, + "text/html": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "grouped4 = filtered_df.groupby(['Year', 'RecipientType'])['TotalAmount'].sum().reset_index()\n", + "\n", + "fig = px.bar(\n", + " grouped4,\n", + " x='Year',\n", + " y='TotalAmount',\n", + " color='RecipientType',\n", + " title='Donations by Recipient Type from 2018 to 2023',\n", + " labels={\"Year\": \"Year\", \"TotalAmount\": \"Total Contributions\", 'RecipientType': 'Recipient Type'},\n", + ")\n", + "\n", + "fig.show()" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": {}, + "outputs": [ + { + "data": { + "application/vnd.plotly.v1+json": { + "config": { + "plotlyServerURL": "https://plot.ly" + }, + "data": [ + { + "alignmentgroup": "True", + "hovertemplate": "Recipient Type=%{x}
Year=2022
Total Contributions=%{y}", + "legendgroup": "Candidate", + "marker": { + "color": "#636efa", + "pattern": { + "shape": "" + } + }, + "name": "Candidate", + "offsetgroup": "Candidate", + "orientation": "v", + "showlegend": true, + "textposition": "auto", + "type": "bar", + "x": [ + "Candidate" + ], + "xaxis": "x", + "y": [ + 54222877.08 + ], + "yaxis": "y" + }, + { + "alignmentgroup": "True", + "hovertemplate": "Recipient Type=%{x}
Year=2022
Total Contributions=%{y}", + "legendgroup": "PCF", + "marker": { + "color": "#EF553B", + "pattern": { + "shape": "" + } + }, + "name": "PCF", + "offsetgroup": "PCF", + "orientation": "v", + "showlegend": true, + "textposition": "auto", + "type": "bar", + "x": [ + "PCF" + ], + "xaxis": "x", + "y": [ + 2292700.66 + ], + "yaxis": "y" + }, + { + "alignmentgroup": "True", + "hovertemplate": "Recipient Type=%{x}
Year=2022
Total Contributions=%{y}", + "legendgroup": "PTU", + "marker": { + "color": "#00cc96", + "pattern": { + "shape": "" + } + }, + "name": "PTU", + "offsetgroup": "PTU", + "orientation": "v", + "showlegend": true, + "textposition": "auto", + "type": "bar", + "x": [ + "PTU" + ], + "xaxis": "x", + "y": [ + 1176668.38 + ], + "yaxis": "y" + } + ], + "layout": { + "annotations": [ + { + "font": {}, + "showarrow": false, + "text": "Year=2022", + "x": 0.5, + "xanchor": "center", + "xref": "paper", + "y": 1, + "yanchor": "bottom", + "yref": "paper" + } + ], + "barmode": "relative", + "legend": { + "title": { + "text": "Recipient Type" + }, + "tracegroupgap": 0 + }, + "template": { + "data": { + "bar": [ + { + "error_x": { + "color": "#2a3f5f" + }, + "error_y": { + "color": "#2a3f5f" + }, + "marker": { + "line": { + "color": "#E5ECF6", + "width": 0.5 + }, + "pattern": { + "fillmode": "overlay", + "size": 10, + "solidity": 0.2 + } + }, + "type": "bar" + } + ], + "barpolar": [ + { + "marker": { + "line": { + "color": "#E5ECF6", + "width": 0.5 + }, + "pattern": { + "fillmode": "overlay", + "size": 10, + "solidity": 0.2 + } + }, + "type": "barpolar" + } + ], + "carpet": [ + { + "aaxis": { + "endlinecolor": "#2a3f5f", + "gridcolor": "white", + "linecolor": "white", + "minorgridcolor": "white", + "startlinecolor": "#2a3f5f" + }, + "baxis": { + "endlinecolor": "#2a3f5f", + "gridcolor": "white", + "linecolor": "white", + "minorgridcolor": "white", + "startlinecolor": "#2a3f5f" + }, + "type": "carpet" + } + ], + "choropleth": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "type": "choropleth" + } + ], + "contour": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "colorscale": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + 
], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "type": "contour" + } + ], + "contourcarpet": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "type": "contourcarpet" + } + ], + "heatmap": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "colorscale": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "type": "heatmap" + } + ], + "heatmapgl": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "colorscale": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "type": "heatmapgl" + } + ], + "histogram": [ + { + "marker": { + "pattern": { + "fillmode": "overlay", + "size": 10, + "solidity": 0.2 + } + }, + "type": "histogram" + } + ], + "histogram2d": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "colorscale": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "type": "histogram2d" + } + ], + 
"histogram2dcontour": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "colorscale": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "type": "histogram2dcontour" + } + ], + "mesh3d": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "type": "mesh3d" + } + ], + "parcoords": [ + { + "line": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "parcoords" + } + ], + "pie": [ + { + "automargin": true, + "type": "pie" + } + ], + "scatter": [ + { + "fillpattern": { + "fillmode": "overlay", + "size": 10, + "solidity": 0.2 + }, + "type": "scatter" + } + ], + "scatter3d": [ + { + "line": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scatter3d" + } + ], + "scattercarpet": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scattercarpet" + } + ], + "scattergeo": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scattergeo" + } + ], + "scattergl": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scattergl" + } + ], + "scattermapbox": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scattermapbox" + } + ], + "scatterpolar": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scatterpolar" + } + ], + "scatterpolargl": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scatterpolargl" + } + ], + "scatterternary": [ + { + "marker": { + 
"colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scatterternary" + } + ], + "surface": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "colorscale": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "type": "surface" + } + ], + "table": [ + { + "cells": { + "fill": { + "color": "#EBF0F8" + }, + "line": { + "color": "white" + } + }, + "header": { + "fill": { + "color": "#C8D4E3" + }, + "line": { + "color": "white" + } + }, + "type": "table" + } + ] + }, + "layout": { + "annotationdefaults": { + "arrowcolor": "#2a3f5f", + "arrowhead": 0, + "arrowwidth": 1 + }, + "autotypenumbers": "strict", + "coloraxis": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "colorscale": { + "diverging": [ + [ + 0, + "#8e0152" + ], + [ + 0.1, + "#c51b7d" + ], + [ + 0.2, + "#de77ae" + ], + [ + 0.3, + "#f1b6da" + ], + [ + 0.4, + "#fde0ef" + ], + [ + 0.5, + "#f7f7f7" + ], + [ + 0.6, + "#e6f5d0" + ], + [ + 0.7, + "#b8e186" + ], + [ + 0.8, + "#7fbc41" + ], + [ + 0.9, + "#4d9221" + ], + [ + 1, + "#276419" + ] + ], + "sequential": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "sequentialminus": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + 
], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ] + }, + "colorway": [ + "#636efa", + "#EF553B", + "#00cc96", + "#ab63fa", + "#FFA15A", + "#19d3f3", + "#FF6692", + "#B6E880", + "#FF97FF", + "#FECB52" + ], + "font": { + "color": "#2a3f5f" + }, + "geo": { + "bgcolor": "white", + "lakecolor": "white", + "landcolor": "#E5ECF6", + "showlakes": true, + "showland": true, + "subunitcolor": "white" + }, + "hoverlabel": { + "align": "left" + }, + "hovermode": "closest", + "mapbox": { + "style": "light" + }, + "paper_bgcolor": "white", + "plot_bgcolor": "#E5ECF6", + "polar": { + "angularaxis": { + "gridcolor": "white", + "linecolor": "white", + "ticks": "" + }, + "bgcolor": "#E5ECF6", + "radialaxis": { + "gridcolor": "white", + "linecolor": "white", + "ticks": "" + } + }, + "scene": { + "xaxis": { + "backgroundcolor": "#E5ECF6", + "gridcolor": "white", + "gridwidth": 2, + "linecolor": "white", + "showbackground": true, + "ticks": "", + "zerolinecolor": "white" + }, + "yaxis": { + "backgroundcolor": "#E5ECF6", + "gridcolor": "white", + "gridwidth": 2, + "linecolor": "white", + "showbackground": true, + "ticks": "", + "zerolinecolor": "white" + }, + "zaxis": { + "backgroundcolor": "#E5ECF6", + "gridcolor": "white", + "gridwidth": 2, + "linecolor": "white", + "showbackground": true, + "ticks": "", + "zerolinecolor": "white" + } + }, + "shapedefaults": { + "line": { + "color": "#2a3f5f" + } + }, + "ternary": { + "aaxis": { + "gridcolor": "white", + "linecolor": "white", + "ticks": "" + }, + "baxis": { + "gridcolor": "white", + "linecolor": "white", + "ticks": "" + }, + "bgcolor": "#E5ECF6", + "caxis": { + "gridcolor": "white", + "linecolor": "white", + "ticks": "" + } + }, + "title": { + "x": 0.05 + }, + "xaxis": { + "automargin": true, + "gridcolor": "white", + "linecolor": "white", + 
"ticks": "", + "title": { + "standoff": 15 + }, + "zerolinecolor": "white", + "zerolinewidth": 2 + }, + "yaxis": { + "automargin": true, + "gridcolor": "white", + "linecolor": "white", + "ticks": "", + "title": { + "standoff": 15 + }, + "zerolinecolor": "white", + "zerolinewidth": 2 + } + } + }, + "title": { + "text": "Donations by Recipient Type in 2022" + }, + "xaxis": { + "anchor": "y", + "categoryarray": [ + "Candidate", + "PCF", + "PTU" + ], + "categoryorder": "array", + "domain": [ + 0, + 1 + ], + "title": { + "text": "Recipient Type" + } + }, + "yaxis": { + "anchor": "x", + "domain": [ + 0, + 1 + ], + "title": { + "text": "Total Contributions" + } + } + } + }, + "text/html": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "grouped4 = filtered_df2.groupby(['Year', 'RecipientType'])['TotalAmount'].sum().reset_index()\n", + "\n", + "fig = px.bar(\n", + " grouped4,\n", + " x='RecipientType', # Use 'RecipientType' as the x-axis\n", + " y='TotalAmount',\n", + " color='RecipientType',\n", + " title='Donations by Recipient Type in 2022',\n", + " labels={\"TotalAmount\": \"Total Contributions\", 'RecipientType': 'Recipient Type'},\n", + " facet_col='Year', # filtered_df2 holds only 2022, so this produces a single facet panel\n", + ")\n", + "\n", + "fig.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Observations and Interpretations\n", + "1. Candidates receive the overwhelming majority of contributions.\n", + "2. Over the period from 1998 to 2023, a distinct cyclical pattern emerges, with contributions alternating between higher and lower years. This may correspond to U.S. congressional and Minnesota House of Representatives elections, both of which take place every two years.\n", + "3. Starting in 2012, recipient types \"Political Committee or Fund\" and \"Political Party Unit\" began receiving a larger share of contributions compared to prior years."
+ ] + }, + { + "cell_type": "code", + "execution_count": 36, + "metadata": {}, + "outputs": [], + "source": [ + "race_type_mapping = {\n", + " 'AG': 'Attorney General',\n", + " 'AP': 'State Appeals Court Judge',\n", + " 'DC': 'State District Court Judge',\n", + " 'GC': 'Governor',\n", + " 'House': 'State Representative',\n", + " 'SA': 'State Auditor',\n", + " 'SC': 'State Supreme Court Justice',\n", + " 'SS': 'Secretary of State',\n", + " 'ST': 'State Treasurer',\n", + " 'Senate': 'Senate'\n", + "}" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array(['AG', nan, 'GC', 'House', 'Senate', 'SA', 'SS', 'SC', 'DC', 'AP',\n", + " 'ST'], dtype=object)" + ] + }, + "execution_count": 33, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "contribution_df['OfficeSought'].unique()" + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "metadata": {}, + "outputs": [ + { + "data": { + "application/vnd.plotly.v1+json": { + "config": { + "plotlyServerURL": "https://plot.ly" + }, + "data": [ + { + "alignmentgroup": "True", + "hovertemplate": "Full Race Type=Attorney General
Year=%{x}
Total Contributions=%{y}", + "legendgroup": "Attorney General", + "marker": { + "color": "#636efa", + "pattern": { + "shape": "" + } + }, + "name": "Attorney General", + "offsetgroup": "Attorney General", + "orientation": "v", + "showlegend": true, + "textposition": "auto", + "type": "bar", + "x": [ + 2018, + 2019, + 2020, + 2021, + 2022, + 2023 + ], + "xaxis": "x", + "y": [ + 4003902.58, + 392454.04, + 293103, + 895901.69, + 4278017.76, + 1500 + ], + "yaxis": "y" + }, + { + "alignmentgroup": "True", + "hovertemplate": "Full Race Type=Governor
Year=%{x}
Total Contributions=%{y}", + "legendgroup": "Governor", + "marker": { + "color": "#EF553B", + "pattern": { + "shape": "" + } + }, + "name": "Governor", + "offsetgroup": "Governor", + "orientation": "v", + "showlegend": true, + "textposition": "auto", + "type": "bar", + "x": [ + 2018, + 2019, + 2020, + 2021, + 2022, + 2023 + ], + "xaxis": "x", + "y": [ + 14891205.43, + 1993660.43, + 1426981.5, + 7058411.1, + 19091855.92, + 32505 + ], + "yaxis": "y" + }, + { + "alignmentgroup": "True", + "hovertemplate": "Full Race Type=Secretary of State
Year=%{x}
Total Contributions=%{y}", + "legendgroup": "Secretary of State", + "marker": { + "color": "#00cc96", + "pattern": { + "shape": "" + } + }, + "name": "Secretary of State", + "offsetgroup": "Secretary of State", + "orientation": "v", + "showlegend": true, + "textposition": "auto", + "type": "bar", + "x": [ + 2018, + 2019, + 2020, + 2021, + 2022 + ], + "xaxis": "x", + "y": [ + 1137408.7, + 372350, + 362450, + 1167468.76, + 4795282.29 + ], + "yaxis": "y" + }, + { + "alignmentgroup": "True", + "hovertemplate": "Full Race Type=Senate
Year=%{x}
Total Contributions=%{y}", + "legendgroup": "Senate", + "marker": { + "color": "#ab63fa", + "pattern": { + "shape": "" + } + }, + "name": "Senate", + "offsetgroup": "Senate", + "orientation": "v", + "showlegend": true, + "textposition": "auto", + "type": "bar", + "x": [ + 2018, + 2019, + 2020, + 2021, + 2022, + 2023 + ], + "xaxis": "x", + "y": [ + 3036186.27, + 11018745.95, + 14723556.76, + 7701612.07, + 11084849.84, + 5400 + ], + "yaxis": "y" + }, + { + "alignmentgroup": "True", + "hovertemplate": "Full Race Type=State Appeals Court Judge
Year=%{x}
Total Contributions=%{y}", + "legendgroup": "State Appeals Court Judge", + "marker": { + "color": "#FFA15A", + "pattern": { + "shape": "" + } + }, + "name": "State Appeals Court Judge", + "offsetgroup": "State Appeals Court Judge", + "orientation": "v", + "showlegend": true, + "textposition": "auto", + "type": "bar", + "x": [ + 2018 + ], + "xaxis": "x", + "y": [ + 139482.88 + ], + "yaxis": "y" + }, + { + "alignmentgroup": "True", + "hovertemplate": "Full Race Type=State Auditor
Year=%{x}
Total Contributions=%{y}", + "legendgroup": "State Auditor", + "marker": { + "color": "#19d3f3", + "pattern": { + "shape": "" + } + }, + "name": "State Auditor", + "offsetgroup": "State Auditor", + "orientation": "v", + "showlegend": true, + "textposition": "auto", + "type": "bar", + "x": [ + 2018, + 2019, + 2020, + 2021, + 2022 + ], + "xaxis": "x", + "y": [ + 409451.94, + 50750, + 20335.42, + 85461.67, + 567693.37 + ], + "yaxis": "y" + }, + { + "alignmentgroup": "True", + "hovertemplate": "Full Race Type=State District Court Judge
Year=%{x}
Total Contributions=%{y}", + "legendgroup": "State District Court Judge", + "marker": { + "color": "#FF6692", + "pattern": { + "shape": "" + } + }, + "name": "State District Court Judge", + "offsetgroup": "State District Court Judge", + "orientation": "v", + "showlegend": true, + "textposition": "auto", + "type": "bar", + "x": [ + 2018, + 2019, + 2020, + 2022 + ], + "xaxis": "x", + "y": [ + 611312.41, + 2750, + 98175.46, + 32351.04 + ], + "yaxis": "y" + }, + { + "alignmentgroup": "True", + "hovertemplate": "Full Race Type=State Representative
Year=%{x}
Total Contributions=%{y}", + "legendgroup": "State Representative", + "marker": { + "color": "#B6E880", + "pattern": { + "shape": "" + } + }, + "name": "State Representative", + "offsetgroup": "State Representative", + "orientation": "v", + "showlegend": true, + "textposition": "auto", + "type": "bar", + "x": [ + 2018, + 2019, + 2020, + 2021, + 2022, + 2023 + ], + "xaxis": "x", + "y": [ + 20544735.5, + 9615308.49, + 14631841.16, + 7864215.63, + 14370826.86, + 1000 + ], + "yaxis": "y" + }, + { + "alignmentgroup": "True", + "hovertemplate": "Full Race Type=State Supreme Court Justice
Year=%{x}
Total Contributions=%{y}", + "legendgroup": "State Supreme Court Justice", + "marker": { + "color": "#FF97FF", + "pattern": { + "shape": "" + } + }, + "name": "State Supreme Court Justice", + "offsetgroup": "State Supreme Court Justice", + "orientation": "v", + "showlegend": true, + "textposition": "auto", + "type": "bar", + "x": [ + 2018, + 2019, + 2020, + 2022 + ], + "xaxis": "x", + "y": [ + 301463.58, + 99000, + 303532.2, + 2000 + ], + "yaxis": "y" + } + ], + "layout": { + "barmode": "relative", + "legend": { + "title": { + "text": "Full Race Type" + }, + "tracegroupgap": 0 + }, + "title": { + "text": "Donations by Candidate Recipient Race From 2018 To 2023" + }, + "xaxis": { + "anchor": "y", + "domain": [ + 0, + 1 + ], + "title": { + "text": "Year" + } + }, + "yaxis": { + "anchor": "x", + "domain": [ + 0, + 1 + ], + "title": { + "text": "Total Contributions" + } + } + } + }, + "text/html": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "grouped5 = filtered_df.groupby(['Year', 'OfficeSought'])['TotalAmount'].sum().reset_index()\n", + "grouped5['FullRaceType'] = grouped5['OfficeSought'].map(race_type_mapping)\n", + "\n", + "fig = px.bar(\n", + " grouped5,\n", + " x='Year',\n", + " y='TotalAmount',\n", + " color='FullRaceType',\n", + " title='Donations by Candidate Recipient Race From 2018 To 2023',\n", + " labels={\"Year\": \"Year\", \"TotalAmount\": \"Total Contributions\", 'FullRaceType': 'Full Race Type'},\n", + " category_orders={\"FullRaceType\": sorted(race_type_mapping.values())}\n", + ")\n", + "\n", + "fig.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "State Senate and State Representative candidates receive the most contributions." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# MN Expenditure EDA" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 1. 
Read in and Preprocess Datasets" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "df_independent = pd.read_csv('/project/data/independent_exp.csv')" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/tmp/ipykernel_56567/3801224637.py:2: DeprecationWarning: Call to deprecated function preprocess_expenditure.\n", + " df_expenditure = preprocess_expenditure(df_independent)\n" + ] + } + ], + "source": [ + "from utils.MN_util import preprocess_expenditure\n", + "df_expenditure = preprocess_expenditure(df_independent)" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "SpenderRegNum int64\n", + "SpenderName object\n", + "SpenderType object\n", + "VendorName object\n", + "Amount float64\n", + "UnpaidAmount float64\n", + "Date object\n", + "Year int64\n", + "Purpose object\n", + "Type object\n", + "In-kind? object\n", + "InKindDescription object\n", + "AffectedCommitteeName object\n", + "AffectedCommitteeRegNum float64\n", + "dtype: object" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_expenditure.dtypes" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " SpenderRegNum SpenderName \\\n", + "0 20003 MN DFL State Central Committee \n", + "1 20003 MN DFL State Central Committee \n", + "2 20003 MN DFL State Central Committee \n", + "3 20003 MN DFL State Central Committee \n", + "4 20003 MN DFL State Central Committee \n", + "... ... ... \n", + "441858 80026 MN Assoc of Professional Employees Political Fund \n", + "441859 80026 MN Assoc of Professional Employees Political Fund \n", + "441860 80026 MN Assoc of Professional Employees Political Fund \n", + "441861 80026 MN Assoc of Professional Employees Political Fund \n", + "441862 80026 MN Assoc of Professional Employees Political Fund \n", + "\n", + " SpenderType VendorName Amount \\\n", + "0 PTU 76 Words 2500.00 \n", + "1 PTU 76 Words 2500.00 \n", + "2 PTU 76 Words 2500.00 \n", + "3 PTU 76 Words 2500.00 \n", + "4 PTU 76 Words 2500.00 \n", + "... ... ... ... \n", + "441858 PCF MN Association of Professional Employees 163.34 \n", + "441859 PCF No Coast Workshop 217.39 \n", + "441860 PCF No Coast Workshop 217.39 \n", + "441861 PCF No Coast Workshop 144.32 \n", + "441862 PCF No Coast Workshop 144.33 \n", + "\n", + " UnpaidAmount Date Year \\\n", + "0 0.0 10/28/2022 2022 \n", + "1 0.0 10/28/2022 2022 \n", + "2 0.0 10/28/2022 2022 \n", + "3 0.0 10/28/2022 2022 \n", + "4 0.0 10/28/2022 2022 \n", + "... ... ... ... \n", + "441858 0.0 10/28/2020 2020 \n", + "441859 0.0 10/13/2016 2016 \n", + "441860 0.0 10/13/2016 2016 \n", + "441861 0.0 11/03/2016 2016 \n", + "441862 0.0 11/03/2016 2016 \n", + "\n", + " Purpose \\\n", + "0 Production, Radio \n", + "1 Production, Radio \n", + "2 Production, Radio \n", + "3 Production, Radio \n", + "4 Production, Radio \n", + "... ... \n", + "441858 Employee Expense: Staff costs \n", + "441859 Printing and Photocopying: Independent Expendi... \n", + "441860 Printing and Photocopying: Independent Expendi... \n", + "441861 Postage/ Delivery: Independent Expenditure Mai... 
\n", + "441862 Postage/ Delivery: Independent Expenditure Mai... \n", + "\n", + " Type In-kind? InKindDescription \\\n", + "0 Independent Expenditure No NaN \n", + "1 Independent Expenditure No NaN \n", + "2 Independent Expenditure No NaN \n", + "3 Independent Expenditure No NaN \n", + "4 Independent Expenditure No NaN \n", + "... ... ... ... \n", + "441858 Independent Expenditure No NaN \n", + "441859 Independent Expenditure No NaN \n", + "441860 Independent Expenditure No NaN \n", + "441861 Independent Expenditure No NaN \n", + "441862 Independent Expenditure No NaN \n", + "\n", + " AffectedCommitteeName AffectedCommitteeRegNum \n", + "0 Walz, Tim Gov Committee 18135.0 \n", + "1 Walz, Tim Gov Committee 18135.0 \n", + "2 Walz, Tim Gov Committee 18135.0 \n", + "3 Walz, Tim Gov Committee 18135.0 \n", + "4 Walz, Tim Gov Committee 18135.0 \n", + "... ... ... \n", + "441858 Kent, Susan Senate Committee 17443.0 \n", + "441859 Bayley, Lisa Pritchard House Committee 17933.0 \n", + "441860 Ward, JoAnn House Committee 17438.0 \n", + "441861 Koenen, Lyle J Senate Committee 17407.0 \n", + "441862 Pryor, Laurie House Committee 18031.0 \n", + "\n", + "[441863 rows x 14 columns]" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_expenditure" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "['PTU' 'PCF']\n" + ] + } + ], + "source": [ + "print(df_expenditure['SpenderType'].unique())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 2. 
Check Missing Values and Drop Unclassifiable Data" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "SpenderRegNum 0\n", + "SpenderName 0\n", + "SpenderType 0\n", + "VendorName 605\n", + "Amount 0\n", + "UnpaidAmount 0\n", + "Date 0\n", + "Year 0\n", + "Purpose 22801\n", + "Type 0\n", + "In-kind? 0\n", + "InKindDescription 437803\n", + "AffectedCommitteeName 11151\n", + "AffectedCommitteeRegNum 11146\n", + "dtype: int64" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_expenditure.isna().sum()" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Total number of expenditure entries = 441863\n", + "Total number of nonclassifiable expenditure amount = 306\n", + "Total number of nonclassifiable spenders = 0\n", + "Proportion of nonclassifiable entries = 0.0693%\n" + ] + } + ], + "source": [ + "print('Total number of expenditure entries =', len(df_expenditure))\n", + "no_amount = len(df_expenditure[df_expenditure['Amount'] == 0])\n", + "print('Total number of nonclassifiable expenditure amount =', no_amount)\n", + "no_spender = df_expenditure['SpenderName'].isna().sum()\n", + "print('Total number of nonclassifiable spenders =', no_spender)\n", + "\n", + "unclassifiable_prop = round((no_amount+no_spender)/len(df_expenditure),6)\n", + "print('Proportion of nonclassifiable entries =', f\"{unclassifiable_prop*100}%\")" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/tmp/ipykernel_56567/3446921023.py:2: DeprecationWarning: Call to deprecated function drop_nonclassifiable_expenditure.\n", + " df_expenditure = drop_nonclassifiable_expenditure(df_expenditure)\n" + ] + } + ], + "source": [ + "from utils.MN_util import 
drop_nonclassifiable_expenditure\n", + "df_expenditure = drop_nonclassifiable_expenditure(df_expenditure)" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Year\n", + "2018 3.313543e+07\n", + "2019 8.444989e+05\n", + "2020 1.088950e+08\n", + "2021 3.822523e+05\n", + "2022 2.761982e+09\n", + "2023 2.746372e+06\n", + "Name: Amount, dtype: float64" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "filtered_df = df_expenditure[(df_expenditure['Year'] >= 2018) & (df_expenditure['Year'] <= 2023)]\n", + "expenditure_by_year = filtered_df.groupby('Year') \n", + "expenditure_by_year['Amount'].sum()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 3. Top Spenders and Vendors (2018-2023)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### 3.1 Top 10 Spenders" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "exp_by_year_spender = filtered_df.groupby(\n", + " ['Year', 'SpenderName'])['Amount'].sum().reset_index()\n", + "\n", + "top_10_spender = exp_by_year_spender.groupby('Year').apply(\n", + " lambda group: group.nlargest(10, 'Amount')).reset_index(drop=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Year SpenderName Amount\n", + "0 2018 Alliance for a Better Minnesota Action Fund 8.425909e+06\n", + "1 2018 MN DFL State Central Committee 4.856272e+06\n", + "2 2018 MN Victory PAC 3.331147e+06\n", + "3 2018 HRCC 2.016482e+06\n", + "4 2018 Freedom Club State PAC 1.852156e+06\n", + "5 2018 DFL House Caucus 1.513452e+06\n", + "6 2018 Pro Jobs Majority 1.395251e+06\n", + "7 2018 MN Jobs Coalition Legislative Fund 1.372963e+06\n", + "8 2018 Coalition of Minnesota Businesses IEPC 8.758362e+05\n", + "9 2018 Housing First Fund 8.530283e+05\n", + "10 2019 MN DFL State Central Committee 4.473529e+05\n", + "11 2019 Advance Minnesota Independent Expenditure Comm... 1.026880e+05\n", + "12 2019 Coalition of Minnesota Businesses IEPC 8.046022e+04\n", + "13 2019 Senate Victory Fund (SVF) 4.077900e+04\n", + "14 2019 Right Now Minnesota 3.890146e+04\n", + "15 2019 Alliance for a Better Minnesota Action Fund 3.358908e+04\n", + "16 2019 Republican Party of Minn 3.298655e+04\n", + "17 2019 Win at the Door PAC 2.585051e+04\n", + "18 2019 Planned Parenthood of Minn Pol Action Fund 1.400514e+04\n", + "19 2019 Housing First Fund 9.000000e+03\n", + "20 2020 MN DFL State Central Committee 2.875138e+07\n", + "21 2020 DFL House Caucus 1.914924e+07\n", + "22 2020 Alliance for a Better Minnesota Action Fund 1.165052e+07\n", + "23 2020 HRCC 8.943520e+06\n", + "24 2020 Senate Victory Fund (SVF) 8.658417e+06\n", + "25 2020 DFL Senate Caucus 6.625248e+06\n", + "26 2020 Advance Minnesota Independent Expenditure Comm... 3.480636e+06\n", + "27 2020 Freedom Club State PAC 1.869632e+06\n", + "28 2020 Planned Parenthood of Minn Pol Action Fund 1.788499e+06\n", + "29 2020 Everytown for Gun Safety Action Fund 1.729842e+06\n", + "30 2021 Plan for Progress 2.306310e+05\n", + "31 2021 MN Realtors Political Action Committee 1.329537e+05\n", + "32 2021 Planned Parenthood of Minn Pol Action Fund 1.024660e+04\n", + "33 2021 MN Homeowners Alliance Independent Expenditure... 
7.671000e+03\n", + "34 2021 Carpenters Local 322 7.500000e+02\n", + "35 2022 MN DFL State Central Committee 5.595062e+08\n", + "36 2022 Alliance for a Better Minnesota Action Fund 4.210824e+08\n", + "37 2022 DFL House Caucus 2.336432e+08\n", + "38 2022 MN for Freedom 1.594200e+08\n", + "39 2022 Safe Accessible Fair Elections Minnesota 1.346198e+08\n", + "40 2022 DAGA MN People's Lawyer Project 1.128677e+08\n", + "41 2022 HRCC 1.014986e+08\n", + "42 2022 Advance Minnesota Independent Expenditure Comm... 1.011120e+08\n", + "43 2022 MN Jobs Coalition Legislative Fund 9.271635e+07\n", + "44 2022 iVote Fund MN 8.180324e+07\n", + "45 2023 All of Mpls 1.410424e+06\n", + "46 2023 TakeAction Political Fund 7.931047e+05\n", + "47 2023 Minneapolis for the Many 2.365442e+05\n", + "48 2023 Laborers District Council of Minn & ND Pol Fund 1.541600e+05\n", + "49 2023 Minneapolis Regional Labor Federation 1.200286e+05\n", + "50 2023 Faith in Minnesota Fund 3.046350e+04\n", + "51 2023 Move Minnesota Action Fund 1.647280e+03" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "top_10_spender" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### 3.2 Top 10 Vendors" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "exp_by_year_vendor = filtered_df.groupby(\n", + " ['Year', 'VendorName'])['Amount'].sum().reset_index()\n", + "\n", + "top_10_vendor = exp_by_year_vendor.groupby('Year').apply(\n", + " lambda group: group.nlargest(10, 'Amount')).reset_index(drop=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Year VendorName Amount\n", + "0 2018 Great American Media 7.092650e+06\n", + "1 2018 Clarify Agency 3.904962e+06\n", + "2 2018 Nebo Media 3.017132e+06\n", + "3 2018 Berlin Rosen LTD 1.425687e+06\n", + "4 2018 Ax Media 1.370301e+06\n", + "5 2018 Sage Media Planning Placement 9.975067e+05\n", + "6 2018 Larry John Wright Advertising Inc 8.335000e+05\n", + "7 2018 Gumbinner Davies 7.910048e+05\n", + "8 2018 Targeted CreativeCommunication 6.808424e+05\n", + "9 2018 E Street Group LLC 6.327749e+05\n", + "10 2019 Canal Partners Media 3.626000e+05\n", + "11 2019 Berlin Rosen LTD 6.373017e+04\n", + "12 2019 Nebo Media 5.850000e+04\n", + "13 2019 Majority Strategies 5.400000e+04\n", + "14 2019 Weber Johnson Public Affairs 4.471800e+04\n", + "15 2019 Front Runner Digital 3.847900e+04\n", + "16 2019 Clarify Agency 3.158908e+04\n", + "17 2019 Singularis 2.251155e+04\n", + "18 2019 AppNexus Inc 1.951000e+04\n", + "19 2019 Connection Strategy LLC 1.796022e+04\n", + "20 2020 Clarify Agency 1.710700e+07\n", + "21 2020 Canal Partners Media 1.013758e+07\n", + "22 2020 Sage Media Planning Placement 9.441231e+06\n", + "23 2020 Gumbinner Davies 8.908617e+06\n", + "24 2020 FP1 Digital LLC 7.931499e+06\n", + "25 2020 Blueprint Interactive 5.614262e+06\n", + "26 2020 Berlin Rosen LTD 4.278030e+06\n", + "27 2020 Wildfire Mail 3.743325e+06\n", + "28 2020 Targeted CreativeCommunication 3.461442e+06\n", + "29 2020 Left Hook 3.343557e+06\n", + "30 2021 New Publica LLC 2.006310e+05\n", + "31 2021 Real Strategies 1.199037e+05\n", + "32 2021 Eye Contact Media 3.000000e+04\n", + "33 2021 (MN) Access Marketing 1.305000e+04\n", + "34 2021 Planned Parenthood MN ND SD Action Fund 9.520600e+03\n", + "35 2021 CliftonLarsonAllen 7.671000e+03\n", + "36 2021 Minnesota Campaign Finance Board 7.500000e+02\n", + "37 2021 Facebook 7.260000e+02\n", + "38 2022 Clarify Agency 2.831990e+08\n", + "39 2022 Great American Media 2.675292e+08\n", + "40 2022 Canal Partners Media 
1.777085e+08\n", + "41 2022 Deliver Strategies 1.184445e+08\n", + "42 2022 Berlin Rosen LTD 1.128677e+08\n", + "43 2022 Red Eagle Media 1.118814e+08\n", + "44 2022 Sage Media Planning Placement 9.786503e+07\n", + "45 2022 Nebo Media 9.038561e+07\n", + "46 2022 FP1 Digital LLC 8.575528e+07\n", + "47 2022 Targeted Creative Communication 8.388532e+07\n", + "48 2023 GRSG Company 1.007000e+06\n", + "49 2023 The People's Canvass 7.261599e+05\n", + "50 2023 Eye Contact Media 2.510000e+05\n", + "51 2023 Nordeast Digital 1.600000e+05\n", + "52 2023 Pivot Mailing 1.073689e+05\n", + "53 2023 Printastik 7.429610e+04\n", + "54 2023 Seven Corners Print and Promo 7.181544e+04\n", + "55 2023 Apparatus 6.426000e+04\n", + "56 2023 Do-GoodBiz 5.364402e+04\n", + "57 2023 Postmaster 4.603125e+04" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "top_10_vendor" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 4. Compare Expenditure by Spender Type and Vendor Type (2018-2023)" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [], + "source": [ + "spendor_type_mapping = {\n", + " 'PCC': 'Candidate', \n", + " 'PCF': 'Political Committee and Fund', \n", + " 'PTU': 'Political Party Unit'\n", + "}\n", + "\n", + "grouped = filtered_df.groupby(['Year', 'SpenderType'])['Amount'].sum().reset_index()\n", + "grouped['FullSpenderType'] = grouped['SpenderType'].map(spendor_type_mapping)" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + " \n", + " " + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "application/vnd.plotly.v1+json": { + "config": { + "plotlyServerURL": "https://plot.ly" + }, + "data": [ + { + "alignmentgroup": "True", + "hovertemplate": "Spender Type=Political Committee and Fund
<br>Year=%{x}<br>
Amount=%{y}", + "legendgroup": "Political Committee and Fund", + "marker": { + "color": "#EF553B", + "pattern": { + "shape": "" + } + }, + "name": "Political Committee and Fund", + "offsetgroup": "Political Committee and Fund", + "orientation": "v", + "showlegend": true, + "textposition": "auto", + "type": "bar", + "x": [ + 2018, + 2019, + 2020, + 2021, + 2022, + 2023 + ], + "xaxis": "x", + "y": [ + 24593840.52, + 319162.02, + 35961980.29, + 382252.34, + 1784959169.2, + 2746372.45 + ], + "yaxis": "y" + }, + { + "alignmentgroup": "True", + "hovertemplate": "Spender Type=Political Party Unit
<br>Year=%{x}<br>
Amount=%{y}", + "legendgroup": "Political Party Unit", + "marker": { + "color": "#00cc96", + "pattern": { + "shape": "" + } + }, + "name": "Political Party Unit", + "offsetgroup": "Political Party Unit", + "orientation": "v", + "showlegend": true, + "textposition": "auto", + "type": "bar", + "x": [ + 2018, + 2019, + 2020, + 2022 + ], + "xaxis": "x", + "y": [ + 8541591.08, + 525336.89, + 72933007.88, + 977022482.31 + ], + "yaxis": "y" + } + ], + "layout": { + "barmode": "relative", + "legend": { + "title": { + "text": "Spender Type" + }, + "tracegroupgap": 0 + }, + "template": { + "data": { + "bar": [ + { + "error_x": { + "color": "#2a3f5f" + }, + "error_y": { + "color": "#2a3f5f" + }, + "marker": { + "line": { + "color": "#E5ECF6", + "width": 0.5 + }, + "pattern": { + "fillmode": "overlay", + "size": 10, + "solidity": 0.2 + } + }, + "type": "bar" + } + ], + "barpolar": [ + { + "marker": { + "line": { + "color": "#E5ECF6", + "width": 0.5 + }, + "pattern": { + "fillmode": "overlay", + "size": 10, + "solidity": 0.2 + } + }, + "type": "barpolar" + } + ], + "carpet": [ + { + "aaxis": { + "endlinecolor": "#2a3f5f", + "gridcolor": "white", + "linecolor": "white", + "minorgridcolor": "white", + "startlinecolor": "#2a3f5f" + }, + "baxis": { + "endlinecolor": "#2a3f5f", + "gridcolor": "white", + "linecolor": "white", + "minorgridcolor": "white", + "startlinecolor": "#2a3f5f" + }, + "type": "carpet" + } + ], + "choropleth": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "type": "choropleth" + } + ], + "contour": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "colorscale": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 
1, + "#f0f921" + ] + ], + "type": "contour" + } + ], + "contourcarpet": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "type": "contourcarpet" + } + ], + "heatmap": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "colorscale": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "type": "heatmap" + } + ], + "heatmapgl": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "colorscale": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "type": "heatmapgl" + } + ], + "histogram": [ + { + "marker": { + "pattern": { + "fillmode": "overlay", + "size": 10, + "solidity": 0.2 + } + }, + "type": "histogram" + } + ], + "histogram2d": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "colorscale": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "type": "histogram2d" + } + ], + "histogram2dcontour": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "colorscale": [ + 
[ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "type": "histogram2dcontour" + } + ], + "mesh3d": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "type": "mesh3d" + } + ], + "parcoords": [ + { + "line": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "parcoords" + } + ], + "pie": [ + { + "automargin": true, + "type": "pie" + } + ], + "scatter": [ + { + "fillpattern": { + "fillmode": "overlay", + "size": 10, + "solidity": 0.2 + }, + "type": "scatter" + } + ], + "scatter3d": [ + { + "line": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scatter3d" + } + ], + "scattercarpet": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scattercarpet" + } + ], + "scattergeo": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scattergeo" + } + ], + "scattergl": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scattergl" + } + ], + "scattermapbox": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scattermapbox" + } + ], + "scatterpolar": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scatterpolar" + } + ], + "scatterpolargl": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scatterpolargl" + } + ], + "scatterternary": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scatterternary" + } + ], + "surface": [ + { + 
"colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "colorscale": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "type": "surface" + } + ], + "table": [ + { + "cells": { + "fill": { + "color": "#EBF0F8" + }, + "line": { + "color": "white" + } + }, + "header": { + "fill": { + "color": "#C8D4E3" + }, + "line": { + "color": "white" + } + }, + "type": "table" + } + ] + }, + "layout": { + "annotationdefaults": { + "arrowcolor": "#2a3f5f", + "arrowhead": 0, + "arrowwidth": 1 + }, + "autotypenumbers": "strict", + "coloraxis": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "colorscale": { + "diverging": [ + [ + 0, + "#8e0152" + ], + [ + 0.1, + "#c51b7d" + ], + [ + 0.2, + "#de77ae" + ], + [ + 0.3, + "#f1b6da" + ], + [ + 0.4, + "#fde0ef" + ], + [ + 0.5, + "#f7f7f7" + ], + [ + 0.6, + "#e6f5d0" + ], + [ + 0.7, + "#b8e186" + ], + [ + 0.8, + "#7fbc41" + ], + [ + 0.9, + "#4d9221" + ], + [ + 1, + "#276419" + ] + ], + "sequential": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "sequentialminus": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, 
+ "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ] + }, + "colorway": [ + "#636efa", + "#EF553B", + "#00cc96", + "#ab63fa", + "#FFA15A", + "#19d3f3", + "#FF6692", + "#B6E880", + "#FF97FF", + "#FECB52" + ], + "font": { + "color": "#2a3f5f" + }, + "geo": { + "bgcolor": "white", + "lakecolor": "white", + "landcolor": "#E5ECF6", + "showlakes": true, + "showland": true, + "subunitcolor": "white" + }, + "hoverlabel": { + "align": "left" + }, + "hovermode": "closest", + "mapbox": { + "style": "light" + }, + "paper_bgcolor": "white", + "plot_bgcolor": "#E5ECF6", + "polar": { + "angularaxis": { + "gridcolor": "white", + "linecolor": "white", + "ticks": "" + }, + "bgcolor": "#E5ECF6", + "radialaxis": { + "gridcolor": "white", + "linecolor": "white", + "ticks": "" + } + }, + "scene": { + "xaxis": { + "backgroundcolor": "#E5ECF6", + "gridcolor": "white", + "gridwidth": 2, + "linecolor": "white", + "showbackground": true, + "ticks": "", + "zerolinecolor": "white" + }, + "yaxis": { + "backgroundcolor": "#E5ECF6", + "gridcolor": "white", + "gridwidth": 2, + "linecolor": "white", + "showbackground": true, + "ticks": "", + "zerolinecolor": "white" + }, + "zaxis": { + "backgroundcolor": "#E5ECF6", + "gridcolor": "white", + "gridwidth": 2, + "linecolor": "white", + "showbackground": true, + "ticks": "", + "zerolinecolor": "white" + } + }, + "shapedefaults": { + "line": { + "color": "#2a3f5f" + } + }, + "ternary": { + "aaxis": { + "gridcolor": "white", + "linecolor": "white", + "ticks": "" + }, + "baxis": { + "gridcolor": "white", + "linecolor": "white", + "ticks": "" + }, + "bgcolor": "#E5ECF6", + "caxis": { + "gridcolor": "white", + "linecolor": "white", + "ticks": "" + } + }, + "title": { + "x": 0.05 + }, + "xaxis": { + "automargin": true, + "gridcolor": "white", + "linecolor": "white", + "ticks": "", + "title": { + "standoff": 15 + }, + "zerolinecolor": "white", + "zerolinewidth": 2 + }, + "yaxis": { 
+ "automargin": true, + "gridcolor": "white", + "linecolor": "white", + "ticks": "", + "title": { + "standoff": 15 + }, + "zerolinecolor": "white", + "zerolinewidth": 2 + } + } + }, + "title": { + "text": "Expenditure by Spender Type From 2018 To 2023" + }, + "xaxis": { + "anchor": "y", + "domain": [ + 0, + 1 + ], + "title": { + "text": "Year" + } + }, + "yaxis": { + "anchor": "x", + "domain": [ + 0, + 1 + ], + "title": { + "text": "Amount" + }, + "type": "log" + } + } + }, + "text/html": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "fig = px.bar(\n", + " grouped,\n", + " x='Year',\n", + " y='Amount',\n", + " color='FullSpenderType',\n", + " title='Expenditure by Spender Type From 2018 To 2023',\n", + " labels={\"Year\": \"Year\", \"Amount\": \"Total Expenditure (log scale)\", \"FullSpenderType\": \"Spender Type\"},\n", + " category_orders={\"FullSpenderType\": sorted(spendor_type_mapping.values())}\n", + ")\n", + "fig.update_yaxes(type='log')\n", + "fig.show()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "base", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.6" + }, + "orig_nbformat": 4 + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/notebooks/PA_EDA.ipynb b/notebooks/PA_EDA.ipynb index 18f0d8d..e293c98 100644 --- a/notebooks/PA_EDA.ipynb +++ b/notebooks/PA_EDA.ipynb @@ -1,9 +1,17 @@ { "cells": [ + { + "cell_type": "markdown", + "id": "49be4b35", + "metadata": {}, + "source": [ + "#### This notebook examines Pennsylvania's campaign finance data from 2018 to 2023, although data from earlier years can also be loaded into the analysis. The dataset is relational: the five files published each year (contributions, debt, expense, expenditures, and filer) are linked through a unique filer ID."
+ ] + }, { "cell_type": "code", "execution_count": 1, - "id": "a55e9cbe", + "id": "cc9cf8a7", "metadata": {}, "outputs": [], "source": [ @@ -11,31 +19,25 @@ "import numpy as np\n", "import plotly.express as px\n", "import warnings\n", - "warnings.filterwarnings('ignore')\n", + "\n", + "warnings.filterwarnings(\"ignore\")\n", "import sys\n", - "sys.path.append('/home/alankagiri/2023-fall-clinic-climate-cabinet')\n", + "\n", + "sys.path.append(\"/home/alankagiri/2023-fall-clinic-climate-cabinet\")\n", "from utils import PA_EDA_Functions as eda\n", "from utils import PA_Data_Web_Scraper as scraper\n", "from utils import constants as const" ] }, - { - "cell_type": "markdown", - "id": "49be4b35", - "metadata": {}, - "source": [ - "##### This Notebook examines Pennsylvania'a campaign data specifically from 2018-2023, although previous years can be loaded onto the analysis considerations. The dataset is relational, with the five documents per annum (contributions, debt, expense, expenditures, and filer) linked through a unique filer ID." 
- ] - }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 2, "id": "19c96a4a", "metadata": {}, "outputs": [], "source": [ "# download the data\n", - "scraper.download_PA_data(2018,2023)" + "scraper.download_PA_data(2018, 2023)" ] }, { @@ -46,26 +48,26 @@ "outputs": [], "source": [ "#initialize the datasets:\n", - "contrib_paths = [[\"../data/contrib_2018_03042019.txt\", 2018],\n", - " [\"../data/contrib.txt\", 2019],\n", - " [\"../data/contrib_2020.txt\",2020],\n", - " [\"../data/contrib_2021.txt\",2021],\n", - " [\"../data/contrib_2022.txt\",2022],\n", - " [\"../data/2023/contrib_2023.txt\",2023]]\n", + "contrib_paths = [[\"../data/contrib_2018_03042019_2018.txt\", 2018],\n", + " [\"../data/contrib_2019.txt\", 2019],\n", + " [\"../data/contrib_2020_2020.txt\",2020],\n", + " [\"../data/contrib_2021_2021.txt\",2021],\n", + " [\"../data/contrib_2022_2022.txt\",2022],\n", + " [\"../data/2023/contrib_2023_2023.txt\",2023]]\n", "\n", - "filer_paths = [[\"../data/filer_2018_03042019.txt\", 2018],\n", - " [\"../data/filer.txt\",2019],\n", - " [\"../data/filer_2020.txt\",2020],\n", - " [\"../data/filer_2021.txt\",2021],\n", - " [\"../data/filer_2022.txt\",2022],\n", - " [\"../data/2023/filer_2023.txt\",2023]]\n", + "filer_paths = [[\"../data/filer_2018_03042019_2018.txt\", 2018],\n", + " [\"../data/filer_2019.txt\",2019],\n", + " [\"../data/filer_2020_2020.txt\",2020],\n", + " [\"../data/filer_2021_2021.txt\",2021],\n", + " [\"../data/filer_2022_2022.txt\",2022],\n", + " [\"../data/2023/filer_2023_2023.txt\",2023]]\n", "\n", - "expense_paths = [[\"../data/expense_2018_03042019.txt\",2018],\n", - " [\"../data/expense.txt\", 2019],\n", - " [\"../data/expense_2020.txt\", 2020],\n", - " [\"../data/expense_2021.txt\",2021],\n", - " [\"../data/expense_2022.txt\",2022],\n", - " [\"../data/2023/expense_2023.txt\",2023]]" + "expense_paths = [[\"../data/expense_2018_03042019_2018.txt\",2018],\n", + " [\"../data/expense_2019.txt\", 2019],\n", + " 
[\"../data/expense_2020_2020.txt\", 2020],\n", + " [\"../data/expense_2021_2021.txt\",2021],\n", + " [\"../data/expense_2022_2022.txt\",2022],\n", + " [\"../data/2023/expense_2023_2023.txt\",2023]]\n" ] }, { @@ -118,18 +120,18 @@ "merged_datasets_per_year = []\n", "merged_expense_dataset = []\n", "for i in range(len(contrib_paths)):\n", - " contrib_df = eda.initialize_PA_dataset(contrib_paths[i][0],contrib_paths[i][1])\n", - " filer_df = eda.initialize_PA_dataset(filer_paths[i][0],filer_paths[i][1])\n", - " expense_df = eda.initialize_PA_dataset(expense_paths[i][0],expense_paths[i][1])\n", - " merged = eda.merge_same_year_datasets(contrib_df,filer_df)\n", + " contrib_df = eda.initialize_PA_dataset(contrib_paths[i][0], contrib_paths[i][1])\n", + " filer_df = eda.initialize_PA_dataset(filer_paths[i][0], filer_paths[i][1])\n", + " expense_df = eda.initialize_PA_dataset(expense_paths[i][0], expense_paths[i][1])\n", + " merged = eda.merge_same_year_datasets(contrib_df, filer_df)\n", " merged_datasets_per_year.append(merged)\n", " merged_expense_dataset.append(expense_df)" ] }, { "cell_type": "code", - "execution_count": 4, - "id": "63f2c05a", + "execution_count": 5, + "id": "3b4ebc54", "metadata": {}, "outputs": [ { @@ -153,19 +155,15 @@ " \n", " \n", " \n", - " FILER_ID\n", + " RECIPIENT_ID\n", " YEAR\n", - " CONTRIBUTOR\n", - " CONT_AMT_1\n", - " CONT_AMT_2\n", - " CONT_AMT_3\n", - " CONT_DESCRIP\n", + " DONOR\n", " TOTAL_CONT_AMT\n", - " CONTRIBUTOR_TYPE\n", - " FILER_TYPE\n", - " FILER_NAME\n", - " OFFICE\n", - " PARTY\n", + " DONOR_TYPE\n", + " RECIPIENT_TYPE\n", + " RECIPIENT\n", + " RECIPIENT_OFFICE\n", + " RECIPIENT_PARTY\n", " \n", " \n", " \n", @@ -173,11 +171,7 @@ " 0\n", " 2000081\n", " 2018\n", - " Joseph A Ribas\n", - " 25.00\n", - " 0\n", - " 0\n", - " NaN\n", + " JOSEPH A RIBAS\n", " 25.00\n", " INDIVIDUAL\n", " Committee\n", @@ -189,11 +183,7 @@ " 1\n", " 2000081\n", " 2018\n", - " Paul J Kashella\n", - " 40.00\n", - " 0\n", - " 0\n", - " NaN\n", + " 
PAUL J KASHELLA\n", " 40.00\n", " INDIVIDUAL\n", " Committee\n", @@ -205,11 +195,7 @@ " 2\n", " 2000081\n", " 2018\n", - " Vicky C Thiel\n", - " 25.00\n", - " 0\n", - " 0\n", - " NaN\n", + " VICKY C THIEL\n", " 25.00\n", " INDIVIDUAL\n", " Committee\n", @@ -221,11 +207,7 @@ " 3\n", " 2000081\n", " 2018\n", - " Joseph B Hildebrandt\n", - " 20.00\n", - " 0\n", - " 0\n", - " NaN\n", + " JOSEPH B HILDEBRANDT\n", " 20.00\n", " INDIVIDUAL\n", " Committee\n", @@ -237,11 +219,7 @@ " 4\n", " 2000081\n", " 2018\n", - " Jacqueline A Espinoza\n", - " 50.00\n", - " 0\n", - " 0\n", - " NaN\n", + " JACQUELINE A ESPINOZA\n", " 50.00\n", " INDIVIDUAL\n", " Committee\n", @@ -260,10 +238,6 @@ " ...\n", " ...\n", " ...\n", - " ...\n", - " ...\n", - " ...\n", - " ...\n", " \n", " \n", " 387459\n", @@ -271,10 +245,6 @@ " 2023\n", " ERIC J YARNELL\n", " 38.47\n", - " 0\n", - " 0\n", - " NaN\n", - " 38.47\n", " INDIVIDUAL\n", " Committee\n", " HIGHMARK PAC OF HIGHMARK INC.\n", @@ -287,10 +257,6 @@ " 2023\n", " PATRICIA LAUGHLIN\n", " 116.00\n", - " 0\n", - " 0\n", - " NaN\n", - " 116.00\n", " INDIVIDUAL\n", " Committee\n", " HIGHMARK PAC OF HIGHMARK INC.\n", @@ -303,10 +269,6 @@ " 2023\n", " MATTHEW J RHENISH\n", " 130.00\n", - " 0\n", - " 0\n", - " NaN\n", - " 130.00\n", " INDIVIDUAL\n", " Committee\n", " HIGHMARK PAC OF HIGHMARK INC.\n", @@ -319,10 +281,6 @@ " 2023\n", " JAMES J BENEDICT\n", " 192.30\n", - " 0\n", - " 0\n", - " NaN\n", - " 192.30\n", " INDIVIDUAL\n", " Committee\n", " HIGHMARK PAC OF HIGHMARK INC.\n", @@ -335,10 +293,6 @@ " 2023\n", " KRISTY A YOHEY\n", " 50.00\n", - " 0\n", - " 0\n", - " NaN\n", - " 50.00\n", " INDIVIDUAL\n", " Committee\n", " HIGHMARK PAC OF HIGHMARK INC.\n", @@ -347,53 +301,53 @@ " \n", " \n", "\n", - "

6507958 rows × 13 columns

\n", + "

6507958 rows × 9 columns

\n", "" ], "text/plain": [ - " FILER_ID YEAR CONTRIBUTOR CONT_AMT_1 CONT_AMT_2 \\\n", - "0 2000081 2018 Joseph A Ribas 25.00 0 \n", - "1 2000081 2018 Paul J Kashella 40.00 0 \n", - "2 2000081 2018 Vicky C Thiel 25.00 0 \n", - "3 2000081 2018 Joseph B Hildebrandt 20.00 0 \n", - "4 2000081 2018 Jacqueline A Espinoza 50.00 0 \n", - "... ... ... ... ... ... \n", - "387459 393671 2023 ERIC J YARNELL 38.47 0 \n", - "387460 393671 2023 PATRICIA LAUGHLIN 116.00 0 \n", - "387461 393671 2023 MATTHEW J RHENISH 130.00 0 \n", - "387462 393671 2023 JAMES J BENEDICT 192.30 0 \n", - "387463 393671 2023 KRISTY A YOHEY 50.00 0 \n", + " RECIPIENT_ID YEAR DONOR TOTAL_CONT_AMT DONOR_TYPE \\\n", + "0 2000081 2018 JOSEPH A RIBAS 25.00 INDIVIDUAL \n", + "1 2000081 2018 PAUL J KASHELLA 40.00 INDIVIDUAL \n", + "2 2000081 2018 VICKY C THIEL 25.00 INDIVIDUAL \n", + "3 2000081 2018 JOSEPH B HILDEBRANDT 20.00 INDIVIDUAL \n", + "4 2000081 2018 JACQUELINE A ESPINOZA 50.00 INDIVIDUAL \n", + "... ... ... ... ... ... \n", + "387459 393671 2023 ERIC J YARNELL 38.47 INDIVIDUAL \n", + "387460 393671 2023 PATRICIA LAUGHLIN 116.00 INDIVIDUAL \n", + "387461 393671 2023 MATTHEW J RHENISH 130.00 INDIVIDUAL \n", + "387462 393671 2023 JAMES J BENEDICT 192.30 INDIVIDUAL \n", + "387463 393671 2023 KRISTY A YOHEY 50.00 INDIVIDUAL \n", "\n", - " CONT_AMT_3 CONT_DESCRIP TOTAL_CONT_AMT CONTRIBUTOR_TYPE FILER_TYPE \\\n", - "0 0 NaN 25.00 INDIVIDUAL Committee \n", - "1 0 NaN 40.00 INDIVIDUAL Committee \n", - "2 0 NaN 25.00 INDIVIDUAL Committee \n", - "3 0 NaN 20.00 INDIVIDUAL Committee \n", - "4 0 NaN 50.00 INDIVIDUAL Committee \n", - "... ... ... ... ... ... \n", - "387459 0 NaN 38.47 INDIVIDUAL Committee \n", - "387460 0 NaN 116.00 INDIVIDUAL Committee \n", - "387461 0 NaN 130.00 INDIVIDUAL Committee \n", - "387462 0 NaN 192.30 INDIVIDUAL Committee \n", - "387463 0 NaN 50.00 INDIVIDUAL Committee \n", + " RECIPIENT_TYPE RECIPIENT \\\n", + "0 Committee FIRSTENERGY CORP. 
POLITICAL ACTION COMMITTEE \n", + "1 Committee FIRSTENERGY CORP. POLITICAL ACTION COMMITTEE \n", + "2 Committee FIRSTENERGY CORP. POLITICAL ACTION COMMITTEE \n", + "3 Committee FIRSTENERGY CORP. POLITICAL ACTION COMMITTEE \n", + "4 Committee FIRSTENERGY CORP. POLITICAL ACTION COMMITTEE \n", + "... ... ... \n", + "387459 Committee HIGHMARK PAC OF HIGHMARK INC. \n", + "387460 Committee HIGHMARK PAC OF HIGHMARK INC. \n", + "387461 Committee HIGHMARK PAC OF HIGHMARK INC. \n", + "387462 Committee HIGHMARK PAC OF HIGHMARK INC. \n", + "387463 Committee HIGHMARK PAC OF HIGHMARK INC. \n", "\n", - " FILER_NAME OFFICE PARTY \n", - "0 FIRSTENERGY CORP. POLITICAL ACTION COMMITTEE NaN NaN \n", - "1 FIRSTENERGY CORP. POLITICAL ACTION COMMITTEE NaN NaN \n", - "2 FIRSTENERGY CORP. POLITICAL ACTION COMMITTEE NaN NaN \n", - "3 FIRSTENERGY CORP. POLITICAL ACTION COMMITTEE NaN NaN \n", - "4 FIRSTENERGY CORP. POLITICAL ACTION COMMITTEE NaN NaN \n", - "... ... ... ... \n", - "387459 HIGHMARK PAC OF HIGHMARK INC. NaN NaN \n", - "387460 HIGHMARK PAC OF HIGHMARK INC. NaN NaN \n", - "387461 HIGHMARK PAC OF HIGHMARK INC. NaN NaN \n", - "387462 HIGHMARK PAC OF HIGHMARK INC. NaN NaN \n", - "387463 HIGHMARK PAC OF HIGHMARK INC. NaN NaN \n", + " RECIPIENT_OFFICE RECIPIENT_PARTY \n", + "0 NaN NaN \n", + "1 NaN NaN \n", + "2 NaN NaN \n", + "3 NaN NaN \n", + "4 NaN NaN \n", + "... ... ... 
\n", + "387459 NaN NaN \n", + "387460 NaN NaN \n", + "387461 NaN NaN \n", + "387462 NaN NaN \n", + "387463 NaN NaN \n", "\n", - "[6507958 rows x 13 columns]" + "[6507958 rows x 9 columns]" ] }, - "execution_count": 4, + "execution_count": 5, "metadata": {}, "output_type": "execute_result" } @@ -413,7 +367,7 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 6, "id": "1488530d", "metadata": {}, "outputs": [ @@ -447,7 +401,7 @@ " \n", " \n", " 0\n", - " FILER_ID\n", + " RECIPIENT_ID\n", " object\n", " 0\n", " 0.00\n", @@ -461,77 +415,49 @@ " \n", " \n", " 2\n", - " CONTRIBUTOR\n", + " DONOR\n", " object\n", " 0\n", " 0.00\n", " \n", " \n", " 3\n", - " CONT_AMT_1\n", - " float64\n", - " 0\n", - " 0.00\n", - " \n", - " \n", - " 4\n", - " CONT_AMT_2\n", - " int64\n", - " 0\n", - " 0.00\n", - " \n", - " \n", - " 5\n", - " CONT_AMT_3\n", - " int64\n", - " 0\n", - " 0.00\n", - " \n", - " \n", - " 6\n", - " CONT_DESCRIP\n", - " object\n", - " 6338782\n", - " 97.40\n", - " \n", - " \n", - " 7\n", " TOTAL_CONT_AMT\n", " float64\n", " 0\n", " 0.00\n", " \n", " \n", - " 8\n", - " CONTRIBUTOR_TYPE\n", + " 4\n", + " DONOR_TYPE\n", " object\n", " 0\n", " 0.00\n", " \n", " \n", - " 9\n", - " FILER_TYPE\n", + " 5\n", + " RECIPIENT_TYPE\n", " object\n", " 5573\n", " 0.09\n", " \n", " \n", - " 10\n", - " FILER_NAME\n", + " 6\n", + " RECIPIENT\n", " object\n", " 0\n", " 0.00\n", " \n", " \n", - " 11\n", - " OFFICE\n", + " 7\n", + " RECIPIENT_OFFICE\n", " object\n", " 6236824\n", " 95.83\n", " \n", " \n", - " 12\n", - " PARTY\n", + " 8\n", + " RECIPIENT_PARTY\n", " object\n", " 4289375\n", " 65.91\n", @@ -541,38 +467,50 @@ "" ], "text/plain": [ - " columnName colType numNulls null_percent\n", - "0 FILER_ID object 0 0.00\n", - "1 YEAR int64 0 0.00\n", - "2 CONTRIBUTOR object 0 0.00\n", - "3 CONT_AMT_1 float64 0 0.00\n", - "4 CONT_AMT_2 int64 0 0.00\n", - "5 CONT_AMT_3 int64 0 0.00\n", - "6 CONT_DESCRIP object 6338782 97.40\n", - "7 TOTAL_CONT_AMT float64 0 0.00\n", 
- "8 CONTRIBUTOR_TYPE object 0 0.00\n", - "9 FILER_TYPE object 5573 0.09\n", - "10 FILER_NAME object 0 0.00\n", - "11 OFFICE object 6236824 95.83\n", - "12 PARTY object 4289375 65.91" + " columnName colType numNulls null_percent\n", + "0 RECIPIENT_ID object 0 0.00\n", + "1 YEAR int64 0 0.00\n", + "2 DONOR object 0 0.00\n", + "3 TOTAL_CONT_AMT float64 0 0.00\n", + "4 DONOR_TYPE object 0 0.00\n", + "5 RECIPIENT_TYPE object 5573 0.09\n", + "6 RECIPIENT object 0 0.00\n", + "7 RECIPIENT_OFFICE object 6236824 95.83\n", + "8 RECIPIENT_PARTY object 4289375 65.91" ] }, - "execution_count": 5, + "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "cols, type, nulls, null_percent = [],[],[],[]\n", + "cols, type, nulls, null_percent = [], [], [], []\n", "for column in contrib_filer_info_2018_2023.columns:\n", - " cols.append(column)\n", - " type.append(contrib_filer_info_2018_2023.dtypes[column]) \n", - " nulls.append(contrib_filer_info_2018_2023[column].isna().sum(),)\n", - " null_percent.append(round(contrib_filer_info_2018_2023[column].isna().sum()/len(contrib_filer_info_2018_2023)*100,2))\n", + " cols.append(column)\n", + " type.append(contrib_filer_info_2018_2023.dtypes[column])\n", + " nulls.append(\n", + " contrib_filer_info_2018_2023[column].isna().sum(),\n", + " )\n", + " null_percent.append(\n", + " round(\n", + " contrib_filer_info_2018_2023[column].isna().sum()\n", + " / len(contrib_filer_info_2018_2023)\n", + " * 100,\n", + " 2,\n", + " )\n", + " )\n", "\n", - "summary_df = {'columnName':cols, 'colType':type,'numNulls':nulls,'null_percent':null_percent}\n", - "summary_df = pd.DataFrame(summary_df)#, columns==['columnName','colType','numNulls','nullPercent'])\n", - "summary_df\n" + "summary_df = {\n", + " \"columnName\": cols,\n", + " \"colType\": type,\n", + " \"numNulls\": nulls,\n", + " \"null_percent\": null_percent,\n", + "}\n", + "summary_df = pd.DataFrame(\n", + " summary_df\n", + ") # , 
columns==['columnName','colType','numNulls','nullPercent'])\n", + "summary_df" ] }, { @@ -580,7 +518,7 @@ "id": "e65803c2", "metadata": {}, "source": [ - " Having reduced the contributor and filer datasets to the relevant datasets, it is evident that with the exception of {CONT_DESCRIP, OFFICE, PARTY} columns, most of the values are reported and available. With regards to the type of data stored in the datasets, most are considered objects (which are mainly strings), in part due to the presence of dirty/inconsistent data inputs." + "*Having reduced the merged contributor and filer data to the relevant columns, it is evident that, with the exception of the {RECIPIENT_OFFICE, RECIPIENT_PARTY} columns, most of the values are reported and available. Most columns have dtype object (mainly strings), in part because of dirty or inconsistent data inputs.*" ] }, { @@ -593,7 +531,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 7, "id": "89830e0b", "metadata": {}, "outputs": [ @@ -621,50 +559,50 @@ " TOTAL_CONT_AMT\n", " \n", " \n", - " CONTRIBUTOR\n", + " DONOR\n", " \n", " \n", " \n", " \n", " \n", " CHARLOTTE SWENSON\n", - " 114202495.38\n", + " 114202845.38\n", " \n", " \n", - " Jeffrey Yass\n", + " JEFFREY YASS\n", " 57205000.00\n", " \n", " \n", - " Total Other Contributions\n", - " 37769756.19\n", + " TOTAL OTHER CONTRIBUTIONS\n", + " 39332908.48\n", " \n", " \n", " COMMONWEALTH CHILDREN'S CHOICE FUND\n", - " 29672522.70\n", + " 34954611.22\n", " \n", " \n", - " STUDENTS FIRST PAC\n", - " 28641424.71\n", + " STUDENTS FIRST PAC\n", + " 31699924.71\n", " \n", " \n", - " STUDENT'S FIRST PAC\n", - " 18500000.00\n", + " COMMONWEALTH LEADERS FUND\n", + " 27270100.22\n", " \n", " \n", - " COMMONWEALTH LEADERS FUND\n", - " 17637897.21\n", + " STUDENT'S FIRST PAC\n", + " 18500000.00\n", " \n", " \n", - " Contributions from FEC Report\n", - " 16911418.85\n", + " CONTRIBUTIONS FROM FEC REPORT\n", + " 17850908.02\n", "
\n", " \n", - " House Democratic Campaign Committee\n", - " 14139567.49\n", + " HOUSE DEMOCRATIC CAMPAIGN COMMITTEE\n", + " 15784775.41\n", " \n", " \n", - " CONTRIBUTIONS FROM NON-PA SOURCES\n", - " 13101403.54\n", + " HOUSE REPUBLICAN CAMPAIGN COMMITTEE\n", + " 13752286.61\n", " \n", " \n", "\n", @@ -672,26 +610,26 @@ ], "text/plain": [ " TOTAL_CONT_AMT\n", - "CONTRIBUTOR \n", - "CHARLOTTE SWENSON 114202495.38\n", - "Jeffrey Yass 57205000.00\n", - "Total Other Contributions 37769756.19\n", - "COMMONWEALTH CHILDREN'S CHOICE FUND 29672522.70\n", - "STUDENTS FIRST PAC 28641424.71\n", + "DONOR \n", + "CHARLOTTE SWENSON 114202845.38\n", + "JEFFREY YASS 57205000.00\n", + "TOTAL OTHER CONTRIBUTIONS 39332908.48\n", + "COMMONWEALTH CHILDREN'S CHOICE FUND 34954611.22\n", + "STUDENTS FIRST PAC 31699924.71\n", + "COMMONWEALTH LEADERS FUND 27270100.22\n", "STUDENT'S FIRST PAC 18500000.00\n", - "COMMONWEALTH LEADERS FUND 17637897.21\n", - "Contributions from FEC Report 16911418.85\n", - "House Democratic Campaign Committee 14139567.49\n", - "CONTRIBUTIONS FROM NON-PA SOURCES 13101403.54" + "CONTRIBUTIONS FROM FEC REPORT 17850908.02\n", + "HOUSE DEMOCRATIC CAMPAIGN COMMITTEE 15784775.41\n", + "HOUSE REPUBLICAN CAMPAIGN COMMITTEE 13752286.61" ] }, - "execution_count": 6, + "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "eda.top_n_contributors(contrib_filer_info_2018_2023,10)" + "eda.top_n_contributors(contrib_filer_info_2018_2023, 10)" ] }, { @@ -794,7 +732,7 @@ } ], "source": [ - "eda.top_n_recipients(contrib_filer_info_2018_2023,10)" + "eda.top_n_recipients(contrib_filer_info_2018_2023, 10)" ] }, { @@ -2065,7 +2003,7 @@ "id": "8c96cd77", "metadata": {}, "source": [ - " The dataset is organized from the perspective of the entity filing the finance reports, which in this case is either a political committee, a lobbyist, or a candidate. 
As such, it is somewhat difficult to ascertain the classification of the contributors (were they a PAC, an individual, a corporation...) as there is no linearity in their names. However, the overwhelming majority of contribution recipients were committees, indicating that most entities donated to PACs or SuperPACS." + "*The dataset is organized from the perspective of the entity filing the finance reports, which in this case is a political committee, a lobbyist, or a candidate. As a result, it is difficult to classify the contributors (whether PACs, individuals, or corporations), because their names follow no consistent convention. However, the overwhelming majority of contribution recipients were committees, indicating that most entities donated to PACs or Super PACs.*" ] }, { @@ -3175,19 +3113,21 @@ "id": "f3f038ea", "metadata": {}, "source": [ - "Thankfully the years are largely similar. However in 2022 additional columns were appended to the filer and contributor datasets, but these columns are irrelevant for the sake of our analysis" + "*Thankfully, the years are largely similar. In 2022, however, additional columns were appended to the filer and contributor datasets; these columns are irrelevant to our analysis.*" ] }, { "cell_type": "markdown", + "id": "578aa697", "metadata": {}, "source": [ - "##### This next portion repeats the EDA done on contribution and filer data but on the expenditure datasets spanning 2018-2023. The expense dataset stores information from Schedule III of the campaign finance report, which details information about the services rendered to the filer by the recipient, as well as the nature of the expenditure (contribution, service, phone-banking, etc)" + "#### This next portion repeats the EDA done on contribution and filer data but on the expenditure datasets spanning 2018-2023.
The expense dataset stores information from Schedule III of the campaign finance report, which details information about the services rendered to the filer by the recipient, as well as the nature of the expenditure (contribution, service, phone-banking, etc)" ] }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 8, + "id": "9b19d082", "metadata": {}, "outputs": [ { @@ -3211,11 +3151,11 @@ " \n", " \n", " \n", - " FILER_ID\n", + " DONOR_ID\n", " YEAR\n", - " EXPENSE_NAME\n", - " EXPENSE_AMT\n", - " EXPENSE_DESC\n", + " RECIPIENT\n", + " AMOUNT\n", + " PURPOSE\n", " \n", " \n", " \n", @@ -3259,31 +3199,102 @@ " 500.00\n", " POLITICAL CONTRIBUTION\n", " \n", + " \n", + " ...\n", + " ...\n", + " ...\n", + " ...\n", + " ...\n", + " ...\n", + " \n", + " \n", + " 64297\n", + " 387871\n", + " 2023\n", + " ZELDA YODER\n", + " 46.07\n", + " PLAQUE / SIGN\n", + " \n", + " \n", + " 64298\n", + " 389369\n", + " 2023\n", + " ZEM ZEM SHRINE CLUB\n", + " 250.00\n", + " DEPOSIT FOR CAMPAIGN FUNDRAISING EVENT\n", + " \n", + " \n", + " 64299\n", + " 394204\n", + " 2023\n", + " ZERO DAY BREWERY\n", + " 873.50\n", + " FOOD FOR FUNDRAISER\n", + " \n", + " \n", + " 64300\n", + " 392983\n", + " 2023\n", + " ZIO BRICK OVEN PIZZA\n", + " 84.07\n", + " MEETING EXPENSE\n", + " \n", + " \n", + " 64301\n", + " 394388\n", + " 2023\n", + " ZOOM\n", + " 17.11\n", + " MEETING FEE\n", + " \n", " \n", "\n", + "

640473 rows × 5 columns

\n", "" ], "text/plain": [ - " FILER_ID YEAR EXPENSE_NAME EXPENSE_AMT EXPENSE_DESC\n", - "0 2001144 2018 MICHAEL TURZAI 931.92 REIMBURSEMENT\n", - "1 2001144 2018 ARMSTRONG 25.04 INTERNET\n", - "2 2001144 2018 COMCAST 421.50 INTERNET\n", - "3 2001144 2018 NAYLAX 250.00 AD\n", - "4 2002299 2018 FRIENDS OF TOM TOSTI 500.00 POLITICAL CONTRIBUTION" + " DONOR_ID YEAR RECIPIENT AMOUNT \\\n", + "0 2001144 2018 MICHAEL TURZAI 931.92 \n", + "1 2001144 2018 ARMSTRONG 25.04 \n", + "2 2001144 2018 COMCAST 421.50 \n", + "3 2001144 2018 NAYLAX 250.00 \n", + "4 2002299 2018 FRIENDS OF TOM TOSTI 500.00 \n", + "... ... ... ... ... \n", + "64297 387871 2023 ZELDA YODER 46.07 \n", + "64298 389369 2023 ZEM ZEM SHRINE CLUB 250.00 \n", + "64299 394204 2023 ZERO DAY BREWERY 873.50 \n", + "64300 392983 2023 ZIO BRICK OVEN PIZZA 84.07 \n", + "64301 394388 2023 ZOOM 17.11 \n", + "\n", + " PURPOSE \n", + "0 REIMBURSEMENT \n", + "1 INTERNET \n", + "2 INTERNET \n", + "3 AD \n", + "4 POLITICAL CONTRIBUTION \n", + "... ... \n", + "64297 PLAQUE / SIGN \n", + "64298 DEPOSIT FOR CAMPAIGN FUNDRAISING EVENT \n", + "64299 FOOD FOR FUNDRAISER \n", + "64300 MEETING EXPENSE \n", + "64301 MEETING FEE \n", + "\n", + "[640473 rows x 5 columns]" ] }, - "execution_count": 6, + "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "expense_info_2018_2023 = eda.merge_all_datasets(merged_expense_dataset)\n", - "expense_info_2018_2023.head(5)" + "expense_info_2018_2023" ] }, { "cell_type": "markdown", + "id": "44e00f78", "metadata": {}, "source": [ "##### 1.2 For each column, what are the contents of it? How many blanks or nulls are there? What is the format? If there it is one of several types, what are those types?" 
@@ -3292,6 +3303,7 @@ { "cell_type": "code", "execution_count": 7, + "id": "5aa57902", "metadata": {}, "outputs": [ { @@ -3376,20 +3388,34 @@ } ], "source": [ - "cols, type, nulls, null_percent = [],[],[],[]\n", + "cols, type, nulls, null_percent = [], [], [], []\n", "for column in expense_info_2018_2023.columns:\n", - " cols.append(column)\n", - " type.append(expense_info_2018_2023.dtypes[column]) \n", - " nulls.append(expense_info_2018_2023[column].isna().sum(),)\n", - " null_percent.append(round((expense_info_2018_2023[column].isna().sum()/len(expense_info_2018_2023))*100,2))\n", + " cols.append(column)\n", + " type.append(expense_info_2018_2023.dtypes[column])\n", + " nulls.append(\n", + " expense_info_2018_2023[column].isna().sum(),\n", + " )\n", + " null_percent.append(\n", + " round(\n", + " (expense_info_2018_2023[column].isna().sum() / len(expense_info_2018_2023))\n", + " * 100,\n", + " 2,\n", + " )\n", + " )\n", "\n", - "summary_df = {'columnName':cols, 'colType':type,'numNulls':nulls,'null_percent':null_percent}\n", + "summary_df = {\n", + " \"columnName\": cols,\n", + " \"colType\": type,\n", + " \"numNulls\": nulls,\n", + " \"null_percent\": null_percent,\n", + "}\n", "summary_df = pd.DataFrame(summary_df)\n", "summary_df" ] }, { "cell_type": "markdown", + "id": "099cfa6d", "metadata": {}, "source": [ "##### 2.2 What are the top 10 expenditure reasons in your data? The top 10 recipients?" 
@@ -3398,6 +3424,7 @@ { "cell_type": "code", "execution_count": 8, + "id": "7e1a1238", "metadata": {}, "outputs": [ { @@ -3495,23 +3522,26 @@ ], "source": [ "pd.set_option(\"display.float_format\", \"{:.2f}\".format)\n", - "expenditure_reasons = (expense_info_2018_2023.groupby([\"EXPENSE_DESC\"])\n", - " .agg({\"EXPENSE_AMT\": sum})\n", - " .sort_values(by=\"EXPENSE_AMT\", ascending=False)\n", - " )\n", + "expenditure_reasons = (\n", + " expense_info_2018_2023.groupby([\"PURPOSE\"])\n", + " .agg({\"AMOUNT\": sum})\n", + " .sort_values(by=\"AMOUNT\", ascending=False)\n", + ")\n", "expenditure_reasons.head(10)" ] }, { "cell_type": "markdown", + "id": "b43420ed", "metadata": {}, "source": [ - "It's a bit difficult to ascertain the description column, mainly because there is no standardized reporting format. Filers are free to describe the expenditure as they see fit, which makes grouping them into categories uncertain. Some seem to link the Federal Election Committee's website url. The combined cost of expenditures lacking descriptions is the highest" + "*The description column is difficult to interpret, mainly because there is no standardized reporting format. Filers are free to describe the expenditure as they see fit, which makes grouping them into categories uncertain. Some descriptions appear to link to the Federal Election Commission's website URL.
The combined cost of expenditures lacking descriptions is the highest.*" ] }, { "cell_type": "code", "execution_count": 10, + "id": "dc365327", "metadata": {}, "outputs": [ { @@ -3609,15 +3639,17 @@ ], "source": [ "pd.set_option(\"display.float_format\", \"{:.2f}\".format)\n", - "expenditure_recipients = (expense_info_2018_2023.groupby([\"EXPENSE_NAME\"])\n", - " .agg({\"EXPENSE_AMT\": sum})\n", - " .sort_values(by=\"EXPENSE_AMT\", ascending=False)\n", - " )\n", + "expenditure_recipients = (\n", + " expense_info_2018_2023.groupby([\"RECIPIENT\"])\n", + " .agg({\"AMOUNT\": sum})\n", + " .sort_values(by=\"AMOUNT\", ascending=False)\n", + ")\n", "expenditure_recipients.head(10)" ] }, { "cell_type": "markdown", + "id": "519814ac", "metadata": {}, "source": [ "It's very interesting that highest recipient of expenditures is ACME Markets, a supermarket chain. More interesting is that a PAC seems to be the recipient, which reveals an interesting reality. How legally clear is it when a PAC receives money in the form of contributions, vs when it does and this amount is considered an expenditure by the filer? If an organization seeks the \"services\" of a PAC and lists them as an expenditure, it wouldn't seem obvious if that PAC would then list its payment as a contribution. In the case it doesn't, this raises an interesting potential outcome of PACs ostensibly receiving funds to \"help\" campaigns they are already ideologically aligned with without counting such \"collaborations\" as donations" @@ -3625,6 +3657,7 @@ }, { "cell_type": "markdown", + "id": "114ebe79", "metadata": {}, "source": [ "##### 3.2: If you have multiple years, are they all similar?
If not, is the difference explicable (maybe by election schedules)" @@ -3632,6 +3665,15 @@ }, { "cell_type": "markdown", + "id": "4fad2efd", + "metadata": {}, + "source": [ + "*The years are all similar*" + ] + }, + { + "cell_type": "markdown", + "id": "6f3c1845", "metadata": {}, "source": [ "The years are all similar" diff --git a/notebooks/README.md b/notebooks/README.md index 608c1f0..e65e1c8 100644 --- a/notebooks/README.md +++ b/notebooks/README.md @@ -1,20 +1,9 @@ ### Notebook directory -This should contain information about what is done in each notebook +* `AZ_EDA` : Notebook containing the EDA and visualizations for Arizona +* `MI_EDA.ipynb`: This notebook contains the exploratory data analysis of the Michigan campaign contribution datasets, with a dropdown that allows the user to select different years to view. -* `Test.ipynb` : This is a test notebook to demonstrate how to use this repository. - -* `az_webcrawler_3.ipynb` : This is a notebook of test code, including the final code used in `az_curl_crawler.py` - -* `arizona_scraper_proof_of_concept` is a notebook containing proof of concept for a curl-based webcrawler, which was later expanded on in `az_webcrawler_3.ipynb` and finally used to make `az_curl_crawler.py` - -* `mi_campaign_eda.ipynb`: This notebook contains the exploratory data analysis of the Michigan campaign contribution datasets, with a dropdown that allows the user to select different years to view. - -* `mi_campaign_expenditure.ipynb`: This notebook contains the exploratory data analysis of the Michigan campaign expenditure datasets, with a dropdown that allows the user to select different years to view. - -* `AZ_EDA` : A notebook containing the EDA and plots for Arizona. +* `MN_EDA.ipynb` : Notebook containing the EDA and visualizations for Minnesota contribution and expenditure data * `PA_EDA.ipynb` : This notebook contains the EDA for Pennsylvania datasets on contributions, filer information, and expenditure data from 2018-2023. 
- -* `az_cleaner_scratch.ipynb` and `az_cleaner_scratch_thanksgiving.ipynb` are notebooks which were used to test parallel versions of the code which concluded in `clean.py` and `cleaner_utils.py` \ No newline at end of file diff --git a/requirements.txt b/requirements.txt index 866f821..876d841 100644 --- a/requirements.txt +++ b/requirements.txt @@ -5,7 +5,9 @@ pre-commit~=2.20 ipykernel~=6.16 # project packages +pandas~=1.4 +plotly~=5.17.0 pandas~=2.0.3 plotly~=5.18.0 bs4~=0.0.1 -nbformat~=5.9.2 \ No newline at end of file +nbformat~=5.9.2 diff --git a/utils/MN_util.py b/utils/MN_util.py new file mode 100644 index 0000000..c00c4a0 --- /dev/null +++ b/utils/MN_util.py @@ -0,0 +1,233 @@ +import functools +import warnings + +import pandas as pd + + +def deprecated(func): + """This is a decorator which can be used to mark functions + as deprecated. It will result in a warning being emitted + when the function is used.""" + + @functools.wraps(func) + def new_func(*args, **kwargs): + warnings.simplefilter("always", DeprecationWarning) # turn off filter + warnings.warn( + "Call to deprecated function {}.".format(func.__name__), + category=DeprecationWarning, + stacklevel=2, + ) + warnings.simplefilter("default", DeprecationWarning) # reset filter + return func(*args, **kwargs) + + return new_func + + +@deprecated +def datasets_col_consistent(df_lst: list): + """ + Checks if a list of MN DataFrames have the same columns/features + + Args: + df_lst (list): a list of MN DataFrames whose columns will be checked + Returns: + Nothing, print out the checking result for column consistency + """ + + previous_columns = df_lst[0].columns + consistent_col_count = 1 + + for df in df_lst[1:]: + if not (df.columns == previous_columns).all(): + print("Columns not consistent across races") + else: + consistent_col_count += 1 + if consistent_col_count == len(df_lst): + print("All dfs have consistent columns") + + +@deprecated +def preprocess_candidate_df(df: pd.DataFrame) -> 
pd.DataFrame: + """ + Preprocesses all MN candidate-recipient contribution dfs. + + Args: + df (DataFrame): the MN DataFrames to preprocess + Returns: + DataFrame: Preprocessed MN contribution df with candidate recipients + """ + + df_copy = df.copy(deep=True) + column_mapping = { + "CandRegNumb": "RegNumb", + "CommitteeName": "Committee", + "DonationDate": "Date", + "DonationAmount": "Amount", + "InKindDonAmount": "InKindAmount", + "InKindDescriptionText": "InKindDescription", + } + df_copy = df_copy.rename(columns=column_mapping) + df_copy["RecipientType"] = "Candidate" + + return df_copy + + +@deprecated +def preprocess_noncandidate_df(df: pd.DataFrame) -> pd.DataFrame: + """ + Preprocesses the MN non-candidate-recipient contribution df. + + Args: + df (DataFrame): the MN DataFrames to preprocess + Returns: + DataFrame: Preprocessed contribution df with non-candidate recipients + """ + + df_copy = df.copy(deep=True) + columns_to_keep = [ + "PCFRegNumb", + "Committee", + "ETType", + "DonationDate", + "DonorType", + "DonorName", + "DonationAmount", + "InKindDonAmount", + "InKindDescriptionText", + ] + df_copy = df_copy[columns_to_keep] + column_mapping = { + "PCFRegNumb": "RegNumb", + "ETType": "RecipientType", + "DonationDate": "Date", + "DonationAmount": "Amount", + "InKindDonAmount": "InKindAmount", + "InKindDescriptionText": "InKindDescription", + } + df_copy = df_copy.rename(columns=column_mapping) + + return df_copy + + +@deprecated +def preprocess_contribution_df(df_lst: list) -> pd.DataFrame: + """ + Preprocesses separate dfs into a complete contribution df for MN + + Args: + df_lst (list): a list of MN DataFrames to merge and adjust columns + Returns: + DataFrame: the merged and preprocessed contribution df + """ + + contribution_df = pd.concat(df_lst, ignore_index=True) + contribution_df["Date"] = pd.to_datetime(contribution_df["Date"]) + contribution_df["Year"] = contribution_df["Date"].dt.year + contribution_df = contribution_df.sort_values(by="Year", 
ascending=False) + + contribution_df["DonorType"] = contribution_df["DonorType"].str.upper() + + contribution_df["Amount"] = pd.to_numeric( + contribution_df["Amount"], errors="coerce" + ) + contribution_df["Amount"] = contribution_df["Amount"].fillna(0) + + contribution_df["InKindAmount"] = pd.to_numeric( + contribution_df["InKindAmount"], errors="coerce" + ) + contribution_df["InKindAmount"] = contribution_df["InKindAmount"].fillna(0) + + contribution_df["TotalAmount"] = ( + contribution_df["Amount"] + contribution_df["InKindAmount"] + ) + + contribution_df["Year"].fillna(-1, inplace=True) + contribution_df["RegNumb"].fillna(-1, inplace=True) + contribution_df["Year"] = contribution_df["Year"].astype(int) + contribution_df["RegNumb"] = contribution_df["RegNumb"].astype(int) + + return contribution_df + + +@deprecated +def drop_nonclassifiable(df: pd.DataFrame) -> pd.DataFrame: + """ + Drop contributions with zero transaction amount or no donor registration + number, or no donor name + + Args: + df (DataFrame): MN contribution DataFrames to drop nonclassifiable data + Returns: + DataFrame: the contribution df without non-classifiable data + """ + + df = df[df["TotalAmount"] != 0] + df = df.dropna(subset=["RegNumb", "DonorName"], how="any") + df = df.reset_index(drop=True) + + return df + + +@deprecated +def preprocess_expenditure(df: pd.DataFrame) -> pd.DataFrame: + """ + Preprocesses MN independent expenditure dataset into a DataFrame. 
+ + Args: + df (DataFrame): the MN independent expenditure DataFrames to preprocess + Returns: + DataFrame: Preprocessed MN general expenditure DataFrames + """ + + df_copy = df.copy(deep=True) + columns_to_keep = [ + "Spender Reg Num", + "Spender", + "Spender type", + "Vendor name", + "Amount", + "Unpaid amount", + "Date", + "Year", + "Purpose", + "Type", + "In kind?", + "In kind descr", + "Affected Comte Name", + "Affected Cmte Reg Num", + ] + column_mapping = { + "Spender Reg Num": "SpenderRegNum", + "Spender": "SpenderName", + "Spender type": "SpenderType", + "Vendor name": "VendorName", + "Unpaid amount": "UnpaidAmount", + "In kind?": "In-kind?", + "In kind descr": "InKindDescription", + "Affected Comte Name": "AffectedCommitteeName", + "Affected Cmte Reg Num": "AffectedCommitteeRegNum", + } + + df_copy = df_copy[columns_to_keep] + df_copy.rename(columns=column_mapping, inplace=True) + + return df_copy + + +@deprecated +def drop_nonclassifiable_expenditure(df: pd.DataFrame) -> pd.DataFrame: + """ + Drop contributions with zero transaction amount or no spender registration + number and name + + Args: + df (DataFrame): MN expenditure DataFrames to drop nonclassifiable data + Returns: + DataFrame: the expenditure df without non-classifiable data + """ + + df = df[df["Amount"] != 0] + df = df.dropna(subset=["SpenderRegNum", "SpenderName"], how="any") + df = df.reset_index(drop=True) + + return df diff --git a/utils/PA_Data_Web_Scraper.py b/utils/PA_Data_Web_Scraper.py index 0db8adf..4313e54 100644 --- a/utils/PA_Data_Web_Scraper.py +++ b/utils/PA_Data_Web_Scraper.py @@ -3,24 +3,10 @@ import numpy as np import requests -from bs4 import BeautifulSoup as BS from utils import constants as const -def make_request(website_url: str) -> object: - """makes a HTTML request to the specified url, whose data is pulled out into - a Beautiful Soup - - Args: - website_url: the url link to the campaign finance reports on PA's - government website - - Returns: A parsed 
BeautifulSoup document - """ - return BS(requests.get(website_url).text, "html.parser") - - def download_PA_data(start_year: int, end_year: int): """downloads PA datasets from specified years to a local directory Args: @@ -34,8 +20,27 @@ def download_PA_data(start_year: int, end_year: int): years = np.arange(start_year, end_year + 1) for year in years: - link = const.PA_MAIN_URL + const.PA_ZIPPED_URL + str(year) + ".zip" + link = f"{const.PA_MAIN_URL}{const.PA_ZIPPED_URL}{year}.zip" + req = requests.get(link) zippedfiles = zipfile.ZipFile(BytesIO(req.content)) - zippedfiles.extractall("../data") + for zippedfile in zippedfiles.infolist(): + zippedfile.filename = zippedfile.filename.replace( + ".txt", "_" + str(year) + ".txt" + ) + zippedfiles.extract(zippedfile, "../data/Raw/PA") + + +def main(): + """Prompt for a start and end year, then download the PA data for that range.""" + text = input( + "Provide a range of desired years to extract data. Format is \ + year1, year2. Ex: 2018, 2023" + ) + years = text.split(",") + download_PA_data(int(years[0]), int(years[1])) + + +if __name__ == "__main__": + main() diff --git a/utils/PA_EDA_Functions.py b/utils/PA_EDA_Functions.py index 9820912..50c692b 100644 --- a/utils/PA_EDA_Functions.py +++ b/utils/PA_EDA_Functions.py @@ -23,6 +23,10 @@ def assign_col_names(filepath: str, year: int) -> list: file_type = dir[len(dir) - 1] if "contrib" in file_type: + # In 2022 PA changed its data storage format by adding an extra variable, + # making the number and names of columns different from preceding years. + # The following if statements account for this by referencing the + # appropriate column list in constants.py if year < 2022: return const.PA_CONT_COLS_NAMES_PRE2022 else: @@ -70,8 +74,9 @@ def pre_process_contributor_dataset(df: pd.DataFrame): a pandas dataframe whose columns are appropriately formatted.
""" df["TOTAL_CONT_AMT"] = df["CONT_AMT_1"] + df["CONT_AMT_2"] + df["CONT_AMT_3"] - df["CONTRIBUTOR"] = df["CONTRIBUTOR"].astype("str") - df["CONTRIBUTOR_TYPE"] = df["CONTRIBUTOR"].apply(classify_contributor) + df["DONOR"] = df["DONOR"].astype("str") + df["DONOR"] = df["DONOR"].str.upper() + df["DONOR_TYPE"] = df["DONOR"].apply(classify_contributor) df.drop( columns={ "ADDRESS_1", @@ -88,16 +93,20 @@ def pre_process_contributor_dataset(df: pd.DataFrame): "E_ZIPCODE", "SECTION", "CYCLE", + "CONT_DESCRIP", "CONT_DATE_1", + "CONT_AMT_1", "CONT_DATE_2", + "CONT_AMT_2", "CONT_DATE_3", + "CONT_AMT_3", }, inplace=True, ) if "TIMESTAMP" in df.columns: df.drop(columns={"TIMESTAMP", "REPORTER_ID"}, inplace=True) - df["CONTRIBUTOR"] = df["CONTRIBUTOR"].apply(lambda x: str(x).upper()) + df["DONOR"] = df["DONOR"].apply(lambda x: str(x).upper()) return df @@ -135,9 +144,9 @@ def pre_process_filer_dataset(df: pd.DataFrame): if "TIMESTAMP" in df.columns: df.drop(columns={"TIMESTAMP", "REPORTER_ID"}, inplace=True) - df.drop_duplicates(subset=["FILER_ID"], inplace=True) - df["FILER_TYPE"] = df.FILER_TYPE.map(const.PA_FILER_ABBREV_DICT) - df["FILER_NAME"] = df["FILER_NAME"].apply(lambda x: str(x).upper()) + df.drop_duplicates(subset=["RECIPIENT_ID"], inplace=True) + df["RECIPIENT_TYPE"] = df.RECIPIENT_TYPE.map(const.PA_FILER_ABBREV_DICT) + df["RECIPIENT"] = df["RECIPIENT"].apply(lambda x: str(x).upper()) return df @@ -165,8 +174,8 @@ def pre_process_expense_dataset(df: pd.DataFrame): ) if "EXPENSE_REPORTER_ID" in df.columns: df.drop(columns={"EXPENSE_TIMESTAMP", "EXPENSE_REPORTER_ID"}, inplace=True) - df["EXPENSE_DESC"] = df["EXPENSE_DESC"].apply(lambda x: str(x).upper()) - df["EXPENSE_NAME"] = df["EXPENSE_NAME"].apply(lambda x: str(x).upper()) + df["PURPOSE"] = df["PURPOSE"].apply(lambda x: str(x).upper()) + df["RECIPIENT"] = df["RECIPIENT"].apply(lambda x: str(x).upper()) return df @@ -194,7 +203,10 @@ def initialize_PA_dataset(data_filepath: str, year: int) -> pd.DataFrame: ) 
df["YEAR"] = year - df["FILER_ID"] = df["FILER_ID"].astype("str") + if "RECIPIENT_ID" in df.columns: + df["RECIPIENT_ID"] = df["RECIPIENT_ID"].astype("str") + else: + df["DONOR_ID"] = df["DONOR_ID"].astype("str") dir = data_filepath.split("/") file_type = dir[len(dir) - 1] @@ -227,7 +239,7 @@ def top_n_recipients(df: pd.DataFrame, num_recipients: int) -> object: Returns: A pandas table (object)""" recipients = ( - df.groupby(["FILER_NAME"]) + df.groupby(["RECIPIENT"]) .agg({"TOTAL_CONT_AMT": sum}) .sort_values(by="TOTAL_CONT_AMT", ascending=False) ) @@ -253,7 +265,7 @@ def top_n_contributors(df: pd.DataFrame, num_contributors: int) -> object: a pandas table (object)""" contributors = ( - df.groupby(["CONTRIBUTOR"]) + df.groupby(["DONOR"]) .agg({"TOTAL_CONT_AMT": sum}) .sort_values(by="TOTAL_CONT_AMT", ascending=False) ) @@ -276,7 +288,7 @@ def merge_same_year_datasets( Returns The merged pandas dataframe """ - merged_df = pd.merge(cont_file, filer_file, how="left", on="FILER_ID") + merged_df = pd.merge(cont_file, filer_file, how="left", on="RECIPIENT_ID") return merged_df @@ -302,7 +314,7 @@ def group_filerType_Party(dataset: pd.DataFrame) -> object: Returns: A table object""" - return dataset.groupby(["FILER_TYPE", "PARTY"]).agg({"TOTAL_CONT_AMT": sum}) + return dataset.groupby(["RECIPIENT_TYPE", "RECIPIENT_PARTY"]).agg({"TOTAL_CONT_AMT": sum}) def plot_recipients_by_office(merged_dataset: pd.DataFrame) -> object: @@ -347,7 +359,7 @@ def compare_cont_by_donorType(merged_dataset: pd.DataFrame) -> object: """ pd.set_option("display.float_format", "{:.2f}".format) cont_by_donor = ( - merged_dataset.groupby(["YEAR", "FILER_TYPE"]) + merged_dataset.groupby(["YEAR", "RECIPIENT_TYPE"]) .agg({"TOTAL_CONT_AMT": sum}) .reset_index() ) @@ -356,11 +368,11 @@ def compare_cont_by_donorType(merged_dataset: pd.DataFrame) -> object: data_frame=cont_by_donor, x="YEAR", y="TOTAL_CONT_AMT", - color="FILER_TYPE", + color="RECIPIENT_TYPE", title="PA Recipients of Annual 
Contributions (2018 -
2023)", labels={ "TOTAL_CONT_AMT": "Total Contribution Amount", - "FILER_TYPE": "Type of Filer", + "RECIPIENT_TYPE": "Type of Filer", }, ) fig.show() diff --git a/utils/PA_constants.py b/utils/PA_constants.py deleted file mode 100644 index 6c10644..0000000 --- a/utils/PA_constants.py +++ /dev/null @@ -1,136 +0,0 @@ -""" -This document lists the constants used in web scraping and Exploratory -Data Analysis - -""" -# Web Scraping Constants: - -main_url = "https://www.dos.pa.gov" -zipped_url = ( - "/VotingElections/CandidatesCommittees/CampaignFinance/Resources/Documents/" -) - -# EDA constants: - -cont_cols_names_pre2022: list = [ - "FilerID", - "EYear", - "Cycle", - "Section", - "Contributor", - "Address1", - "Address2", - "City", - "State", - "Zipcode", - "occupation", - "Ename", - "EAddress1", - "EAddress2", - "ECity", - "EState", - "EZipcode", - "ContDate1", - "ContAmt1", - "ContDate2", - "ContAmt2", - "ContDate3", - "ContAmt3", - "ContDesc", -] - -cont_cols_names_post22: list = [ - "FilerID", - "ReporterID", - "Timestamp", - "EYear", - "Cycle", - "Section", - "Contributor", - "Address1", - "Address2", - "City", - "State", - "Zipcode", - "occupation", - "Ename", - "EAddress1", - "EAddress2", - "ECity", - "EState", - "EZipcode", - "ContDate1", - "ContAmt1", - "ContDate2", - "ContAmt2", - "ContDate3", - "ContAmt3", - "ContDesc", -] - -filer_cols_names_pre2022: list = [ - "FilerID", - "EYear", - "Cycle", - "Amend", - "Terminate", - "FilerType", - "FilerName", - "Office", - "District", - "Party", - "Address1", - "Address2", - "City", - "State", - "Zipcode", - "County", - "PHONE", - "BEGINNING", - "MONETARY", - "INKIND", -] - -filer_cols_names_post2022: list = [ - "FilerID", - "ReporterID", - "Timestamp", - "EYear", - "Cycle", - "Amend", - "Terminate", - "FilerType", - "FilerName", - "Office", - "District", - "Party", - "Address1", - "Address2", - "City", - "State", - "Zipcode", - "County", - "PHONE", - "BEGINNING", - "MONETARY", - "INKIND", -] - -office_abb_dict: 
dict = { - "GOV": "Governor", - "LTG": "Liutenant Gov", - "ATT": "Attorney General", - "AUD": "Auditor General", - "TRE": "State Treasurer", - "SPM": "Justice of the Supreme Crt", - "SPR": "Judge of the Superior Crt", - "CCJ": "Judge of the CommonWealth Crt", - "CPJ": "Judge of the Crt of Common Pleas", - "MCJ": "Judge of the Municipal Crt", - "TCJ": "Judge of the Traffic Crt", - "STS": "Senator (General Assembly)", - "STH": "Rep (General Assembly)", - "OTH": "Other(local offices)", - "MISC": "Unknown", -} -filer_abb_dict: dict = {1.0: "Candidate", 2.0: "Committee", 3.0: "Lobbyist"} diff --git a/utils/README.md b/utils/README.md index 08644ed..ec020f4 100644 --- a/utils/README.md +++ b/utils/README.md @@ -1,43 +1,22 @@ -**Guide on running the Arizona state cleaner** - -**The Files** - -The state cleaner is located in utils/arizona.py. - -It is further supported by the following files: - -utils/clean.py contains the base abstract class which the Arizona cleaner is built upon. - -utils/cleaner_utils.py contains utility functions necessary to the cleaner's functioning. - -utils/constants.py contains constants necessary to the cleaner's functioning, including the filepaths for the files to be used in the demo. - -utils/az_curl_crawler.py is the web crawler/scraper which gathered the data which the state cleaner processes. It is not necessary for the purposes of this demo. - -**Information** - -The state cleaner takes three files as input. They are in the google drive as az_transactions_demo.csv, az_individuals_demo.csv, and az_orgs_demo.csv. The filepaths leading to them (assuming one is working in the google drive) are in constants.py and at the bottom of arizona.py. - -Each file contains a subset of arizona electoral finance data of each of the six categories we examine: those being candidates, individual contributors, political action committees (PACs), poltical parties, vendors, and organizations (which the Arizona database uses as a catch-all term). 
- -az_transactions_demo.csv contains transaction-level data: each row is a transaction, with a little over 50 columns of information, most of which gets filtered out. - -az_individuals_demo.csv contains information on the individuals (for us, that means individual contributors and candidates) involved in transactions in the dataset. - -az_orgs_demo.csv contains information on the organizations (those being the PACs, parties, 'organizations' and vendors) involved in transactions in the dataset. - -**Running the State Cleaner** - -The arizona.py file has an if __name__ == "__main__" clause at the bottom which will run the cleaner on the demo data. The data must be available via the specified filepaths in the google drive (laid out in utils/constants). Simply use: - -from utils.arizona import ArizonaCleaner -clean = ArizonaCleaner() - -clean_tables = clean.clean_state() - -This process should take under four minutes. - -Otherwise, one may call ArizonaCleaner.clean_state([]) with the relevant filepaths inside the list as the only argument. Note that the files must go in this order: individuals, organizations, transactions. - -Note that since this is a subset of the data, many of the recipients/contributors listed in the final transactions table are not present in the individuals or organizations tables. For example, the original id '-1' refers to anonymous individual contributors within the state of arizona who made donations under $100. We have chosen not to include them in this demo subset because they massively inflate the number of transactions but give us almost no information. However, in all cases, at least one of either the donors or recipients is listed in either the final individuals or organizations tables. - +## Minnesota Util: +#### MN_util.py + +Util functions for MN EDA +1. datasets_col_consistent (deprecated) +2. preprocess_candidate_df (deprecated) +3. preprocess_noncandidate_df (deprecated) +4. preprocess_contribution_df (deprecated) +5. 
drop_nonclassifiable (deprecated) +6. preprocess_expenditure (deprecated) +7. drop_nonclassifiable_expenditure (deprecated) + +#### minnesota.py +1. entity_name_dictionary +2. preprocess_candidate_contribution +3. preprocess_noncandidate_contribution +4. preprocess_expenditure +5. preprocess +6. clean +7. standardize +8. create_tables +9. clean_state diff --git a/utils/arizona_cleaner.py b/utils/arizona_cleaner.py deleted file mode 100644 index d8b9515..0000000 --- a/utils/arizona_cleaner.py +++ /dev/null @@ -1,209 +0,0 @@ -import pandas as pd -from clean import StateCleaner -from cleaner_utils import ( - az_employment_checker, - az_individuals_convert, - az_name_clean, - az_organizations_convert, - az_transactions_convert, - az_transactor_sorter, - convert_date, -) -from constants import ( - AZ_INDIVIDUALS_FILEPATH, - AZ_ORGANIZATIONS_FILEPATH, - AZ_TRANSACTIONS_FILEPATH, -) - - -class ArizonaCleaner(StateCleaner): - """This class is based on the StateCleaner abstract class, - and cleans Arizona data""" - - def preprocess(filepaths_list: list[str]) -> list[pd.DataFrame]: - """Turns filepaths into dataframes - - The input must be a list of valid filepaths which lead - to pandas dataframes. Typically, these should be just two - files: a transactions file and a details file, as - harvested by az_curl_crawler. 
If these conditions are not - met, the rest of the pipeline will not work - - args: list of two filepaths for dataframes, - transactions and details, in that order - - returns: a list of two dataframes, transactions and details, - in that order - - """ - - df_list = [] - - for filepath in filepaths_list: - df_list.append(pd.read_csv(filepath)) - - return df_list - - def clean_state(filepaths: list[str]) -> (pd.DataFrame, pd.DataFrame, pd.DataFrame): - """Calls the other methods in order - - This is the master function of the ArizonaCleaner - class, and calling it will activate the cleaning - pipeline which takes in filenames and outputs cleaned, - standardized, and schema-compliant tables - - args: list of two filepaths which lead to dataframes - - returns: three schema-compliant tables for - transactions, individuals, and organizations - - """ - - transactions, individuals, organizations = ArizonaCleaner.preprocess(filepaths) - - details = pd.concat([individuals, organizations]) - - cleaned_transactions, cleaned_details = ArizonaCleaner.clean( - [transactions, details] - ) - - standardized_transactions, standardized_details = ArizonaCleaner.standardize( - [cleaned_transactions, cleaned_details] - ) - - ( - az_individuals, - az_organizations, - az_transactions, - ) = ArizonaCleaner.create_tables( - [standardized_transactions, standardized_details] - ) - - return (az_individuals, az_organizations, az_transactions) - - def create_tables( - data: list[pd.DataFrame], - ) -> (pd.DataFrame, pd.DataFrame, pd.DataFrame): - """ - Creates the Individuals, Organizations, and Transactions tables from - the dataframe list outputted from standardize - - Inputs: - data: a list of 1 or 3 dataframes as outputted from standardize method. 
- - Returns: (individuals_table, organizations_table, transactions_table) - tuple containing the tables as defined in database schema - """ - - transactions, details = data - - individual_details = details[ - (details["entity_type"] == "Individual") - | (details["entity_type"] == "Candidate") - ] - organization_details = details[ - (details["entity_type"] != "Individual") - & (details["entity_type"] != "Candidate") - ] - - # gathers relevant columns, puts them in schema order, - # and enforces datatype - az_transactions = az_transactions_convert(transactions) - - # does the same for individuals and organizations, - # so long as there is some amount of data in each - - if len(individual_details) > 0: - az_individuals = az_individuals_convert(individual_details) - else: - az_individuals = None - - if len(organization_details) > 0: - az_organizations = az_organizations_convert(organization_details) - else: - az_organizations = None - - return (az_individuals, az_organizations, az_transactions) - - def standardize(details_df_list: list[pd.DataFrame]) -> list[pd.DataFrame]: - """standardize names of entities - - takes in details dataframe and replaces the names of - organization types to fit into the schema when appropriate - - args: details dataframe - - returns: details dataframe with relevant entity type - names replaced by those for the regular schema - """ - - transactions_df, details_df = details_df_list[0], details_df_list[1] - - az_entity_name_dictionary = { - "Organizations": "Company", - "PACs": "Committee", - "Parties": "Party", - "Vendors": "Vendor", - "Individual Contributors": "Individual", - "Candidates": "Candidate", - } - details_df.replace({"entity_type": az_entity_name_dictionary}, inplace=True) - - return transactions_df, details_df - - def clean(data: list[pd.DataFrame]) -> pd.DataFrame: - """clean the contents of the columns - - INCOMPLETE - - transactions and details dataframes undergo cleaning of - transaction dates, names are imputed to the 
right column, - and employer information is retrieved, - - args: transactions and details dataframes - - returns: cleaned transactions and details dataframes - - NOTE: TO DO: coerce correct dtypes and make text lowercase - - """ - - transactions, details = data - - merged_df = pd.merge(details, transactions, on="retrieved_id", how="inner") - - # Filter rows in the first dataframe based on the common 'ids' - details = details[details["retrieved_id"].isin(merged_df["retrieved_id"])] - - try: - transactions["TransactionDate"] = transactions["TransactionDate"].apply( - convert_date - ) - except TypeError: - transactions["TransactionDate"] = transactions["TransactionDate"] - - details = az_name_clean(details) - - details = details.apply(az_employment_checker, args=(transactions,), axis=1) - - transactions = transactions.apply(az_transactor_sorter, axis=1) - - merged_df = pd.merge( - transactions["base_transactor_id"], - details[["retrieved_id", "office_name"]], - how="left", - left_on="base_transactor_id", - right_on="retrieved_id", - ) - - office_sought = merged_df.where(pd.notnull(merged_df), None)["office_name"] - - transactions["office_sought"] = office_sought - - return [transactions, details] - - -if __name__ == "__main__": - ArizonaCleaner.clean_state( - [AZ_TRANSACTIONS_FILEPATH, AZ_INDIVIDUALS_FILEPATH, AZ_ORGANIZATIONS_FILEPATH] - ) diff --git a/utils/clean.py b/utils/clean.py index 6d1238a..9f22c63 100644 --- a/utils/clean.py +++ b/utils/clean.py @@ -32,10 +32,7 @@ def preprocess(self, filepaths_list: list[str]) -> list[pd.DataFrame]: required naming conventions, order, and extensions defined per state. - Returns: a list of dataframes. If state data is all in one format - (i.e. there are not separate individual and transaction tables), - a list containing a single dataframe. Otherwise a list of three - DataFrames that represent [transactions, individuals, organizations] + Returns: a list of dataframes based on the needs of each state. 
""" pass @@ -43,11 +40,11 @@ def preprocess(self, filepaths_list: list[str]) -> list[pd.DataFrame]: def clean(self, data: list[pd.DataFrame]) -> list[pd.DataFrame]: """Cleans the state dataframe as needed and returns the dataframe - Cleans the columns, converts dtypes to match database schema, and drops rows - not representing minimal viable transactions + Cleans the columns, converts dtypes to match database schema, and drops + rows not representing minimal viable transactions Inputs: - data: a list of 1 or 3 dataframes as outputted from preprocess method. + data: a list of 1 or 3 dataframes as output from preprocess method. Returns: a list of dataframes. If state data is all in one format (i.e. there are not separate individual and transaction tables), @@ -60,8 +57,8 @@ def clean(self, data: list[pd.DataFrame]) -> list[pd.DataFrame]: def standardize(self, data: list[pd.DataFrame]) -> list[pd.DataFrame]: """Standardizes the dataframe into the necessary format for the schema - Maps entity/office types and column names as defined in schema, adjust and add - UUIDs as necessary + Maps entity/office types and column names as defined in schema, adjusts + and adds UUIDs as necessary Inputs: data: a list of 1 or 3 dataframes as outputted from clean method.
@@ -73,6 +70,21 @@ def standardize(self, data: list[pd.DataFrame]) -> list[pd.DataFrame]: """ pass + def standardize_entity_names(self, entity: pd.DataFrame) -> pd.DataFrame: + """Creates a new 'standard_entity_type' column from 'raw_entity_type' + Args: + entity: an entity dataframe containing 'raw_entity_type' + + Returns: entity with 'standard_entity_type' created from the + entity_name_dictionary + """ + entity["standard_entity_type"] = entity["raw_entity_type"].map( + lambda raw_entity_type: self.entity_name_dictionary.get( + raw_entity_type, None + ) + ) + return entity + @abstractmethod def create_tables( self, data: list[pd.DataFrame] @@ -82,7 +94,7 @@ def create_tables( the dataframe list outputted from standardize Inputs: - data: a list of 1 or 3 dataframes as outputted from standardize method. + data: a list of 1 or 3 dataframes as output from standardize method. Returns: (individuals_table, organizations_table, transactions_table) tuple containing the tables as defined in database schema @@ -90,7 +102,7 @@ def create_tables( pass @abstractmethod - def clean_state(self) -> (pd.DataFrame, pd.DataFrame, pd.DataFrame): + def clean_state(self) -> (pd.DataFrame, pd.DataFrame, list[pd.DataFrame]): """ Runs the StateCleaner pipeline returning a tuple of cleaned dataframes @@ -104,7 +116,8 @@ def clean_state(self) -> (pd.DataFrame, pd.DataFrame, pd.DataFrame): defined per state.
Returns: cleans the state and returns the standardized Inidividuals, - Organizations, and Transactions tables in a tuple + Organizations, and a list of Transactions tables, ordered as + [ind->ind, ind->org, org->ind, org->org], in a tuple """ pass diff --git a/utils/constants.py b/utils/constants.py index 3542395..1bf573e 100644 --- a/utils/constants.py +++ b/utils/constants.py @@ -6,6 +6,10 @@ BASE_FILEPATH = Path("constants.py").resolve().parent # returns the base_path to the directory +MI_EXP_FILEPATH = BASE_FILEPATH / "data" / "raw" / "MI" / "Expenditure" + +MI_CON_FILEPATH = BASE_FILEPATH / "data" / "raw" / "MI" / "Contribution" + AZ_TRANSACTIONS_FILEPATH = ( BASE_FILEPATH / "data" / "raw" / "AZ" / "az_transactions_demo.csv" ) @@ -21,10 +25,6 @@ HEADERS = {"User-Agent": USER_AGENT} -MI_EXP_FILEPATH = str(BASE_FILEPATH / "data" / "Expenditure") - -MI_CON_FILEPATH = str(BASE_FILEPATH / "data" / "Contribution") - MI_SOS_URL = "https://miboecfr.nictusa.com/cfr/dumpall/cfrdetail/" MI_CONTRIBUTION_COLUMNS = [ @@ -55,6 +55,142 @@ "extra_desc", ] +# MN State Cleaner Constants: + +MN_CANDIDATE_CONTRIBUTION_COL = [ + "OfficeSought", + "CandRegNumb", + "CandFirstName", + "CandLastName", + "DonationDate", + "DonorType", + "DonorName", + "DonationAmount", + "InKindDonAmount", + "InKindDescriptionText", +] + +MN_CANDIDATE_CONTRIBUTION_MAP = { + "OfficeSought": "office_sought", + "CandRegNumb": "recipient_id", + "CandFirstName": "recipient_first_name", + "CandLastName": "recipient_last_name", + "DonationDate": "date", + "DonorType": "donor_type", + "DonorName": "donor_full_name", + "DonationAmount": "amount", + "InKindDonAmount": "inkind_amount", + "InKindDescriptionText": "purpose", +} + +MN_NONCANDIDATE_CONTRIBUTION_COL = [ + "PCFRegNumb", + "Committee", + "ETType", + "DonationDate", + "DonorType", + "DonorRegNumb", + "DonorName", + "DonationAmount", + "InKindDonAmount", + "InKindDescriptionText", +] + +MN_NONCANDIDATE_CONTRIBUTION_MAP = { + "PCFRegNumb":
"recipient_id", + "Committee": "recipient_full_name", + "ETType": "recipient_type", + "DonationDate": "date", + "DonorType": "donor_type", + "DonorRegNumb": "donor_id", + "DonorName": "donor_full_name", + "DonationAmount": "amount", + "InKindDonAmount": "inkind_amount", + "InKindDescriptionText": "purpose", +} + +MN_INDEPENDENT_EXPENDITURE_COL = [ + "Spender", + "Spender Reg Num", + "Spender type", + "Affected Comte Name", + "Affected Cmte Reg Num", + "For /Against", + "Date", + "Type", + "Amount", + "Purpose", + "Vendor State", +] + +MN_INDEPENDENT_EXPENDITURE_MAP = { + "Spender": "donor_full_name", + "Spender Reg Num": "donor_id", + "Spender type": "donor_type", + "Affected Comte Name": "recipient_full_name", + "Affected Cmte Reg Num": "recipient_id", + "Date": "date", + "Amount": "amount", + "Purpose": "purpose", + "Type": "transaction_type", + "Vendor State": "state", +} + +MN_RACE_MAP = { + "GC": "Governor", + "AG": "Attorney General", + "SS": "Secretary of State", + "SA": "State Auditor", + "ST": "State Treasurer", + "Senate": "State Senator", + "House": "State Representative", + "SC": "State Supreme Court Justice", + "AP": "State Appeals Court Judge", + "DC": "State District Court Judge", +} + + +MI_CONT_DROP_COLS = [ + "doc_seq_no", + "page_no", + "cont_detail_id", + "doc_type_desc", + "address", + "city", + "zip", + "occupation", + "received_date", + "aggregate", + "extra_desc", +] + +MI_EXP_DROP_COLS = [ + "doc_seq_no", + "expenditure_type", + "gub_account_type", + "gub_elec_type", + "page_no", + "detail_id", + "doc_type_desc", + "extra_desc", + "address", + "city", + "zip", + "exp_date", + "state_loc", + "supp_opp", + "can_or_ballot", + "county", + "debt_payment", + "vend_addr", + "vend_city", + "vend_state", + "vend_zip", + "gotv_ink_ind", + "fundraiser", +] + + PA_MAIN_URL = "https://www.dos.pa.gov" PA_ZIPPED_URL = ( "/VotingElections/CandidatesCommittees/CampaignFinance/Resources/Documents/" @@ -63,11 +199,11 @@ # PA EDA constants: 
PA_CONT_COLS_NAMES_PRE2022: list = [ - "FILER_ID", + "RECIPIENT_ID", "YEAR", "CYCLE", "SECTION", - "CONTRIBUTOR", + "DONOR", "ADDRESS_1", "ADDRESS_2", "CITY", @@ -90,13 +226,13 @@ ] PA_CONT_COLS_NAMES_POST2022: list = [ - "FILER_ID", + "RECIPIENT_ID", "REPORTER_ID", "TIMESTAMP", "YEAR", "CYCLE", "SECTION", - "CONTRIBUTOR", + "DONOR", "ADDRESS_1", "ADDRESS_2", "CITY", @@ -119,16 +255,16 @@ ] PA_FILER_COLS_NAMES_PRE2022: list = [ - "FILER_ID", + "RECIPIENT_ID", "YEAR", "CYCLE", "AMEND", "TERMINATE", - "FILER_TYPE", - "FILER_NAME", - "OFFICE", + "RECIPIENT_TYPE", + "RECIPIENT", + "RECIPIENT_OFFICE", "DISTRICT", - "PARTY", + "RECIPIENT_PARTY", "ADDRESS_1", "ADDRESS_2", "CITY", @@ -142,18 +278,18 @@ ] PA_FILER_COLS_NAMES_POST2022: list = [ - "FILER_ID", + "RECIPIENT_ID", "REPORTER_ID", "TIMESTAMP", "YEAR", "CYCLE", "AMEND", "TERMINATE", - "FILER_TYPE", - "FILER_NAME", - "OFFICE", + "RECIPIENT_TYPE", + "RECIPIENT", + "RECIPIENT_OFFICE", "DISTRICT", - "PARTY", + "RECIPIENT_PARTY", "ADDRESS_1", "ADDRESS_2", "CITY", @@ -167,35 +303,35 @@ ] PA_EXPENSE_COLS_NAMES_PRE2022: list = [ - "FILER_ID", + "DONOR_ID", "YEAR", "EXPENSE_CYCLE", - "EXPENSE_NAME", + "RECIPIENT", "EXPENSE_ADDRESS_1", "EXPENSE_ADDRESS_2", "EXPENSE_CITY", "EXPENSE_STATE", "EXPENSE_ZIPCODE", "EXPENSE_DATE", - "EXPENSE_AMT", - "EXPENSE_DESC", + "AMOUNT", + "PURPOSE", ] PA_EXPENSE_COLS_NAMES_POST2022: list = [ - "FILER_ID", + "DONOR_ID", "EXPENSE_REPORTER_ID", "EXPENSE_TIMESTAMP", "YEAR", "EXPENSE_CYCLE", - "EXPENSE_NAME", + "RECIPIENT", "EXPENSE_ADDRESS_1", "EXPENSE_ADDRESS_2", "EXPENSE_CITY", "EXPENSE_STATE", "EXPENSE_ZIPCODE", "EXPENSE_DATE", - "EXPENSE_AMT", - "EXPENSE_DESC", + "AMOUNT", + "PURPOSE", ] PA_OFFICE_ABBREV_DICT: dict = { @@ -221,6 +357,7 @@ "OTH": "Other(local offices)", } PA_FILER_ABBREV_DICT: dict = {1.0: "Candidate", 2.0: "Committee", 3.0: "Lobbyist"} + PA_ORGANIZATION_IDENTIFIERS: list = [ "FRIENDS", "CITIZENS", @@ -255,6 +392,7 @@ "INC", "INCORPORATED", "LLC", + "FUND", ] 
MI_EXPENDITURE_COLUMNS = [ @@ -297,6 +435,60 @@ "fundraiser", ] +MICHIGAN_CONTRIBUTION_COLS_REORDER = [ + "doc_seq_no", + "page_no", + "contribution_id", + "cont_detail_id", + "doc_stmnt_year", + "doc_type_desc", + "common_name", + "com_type", + "can_first_name", + "can_last_name", + "contribtype", + "f_name", + "l_name_or_org", + "address", + "city", + "state", + "zip", + "occupation", + "employer", + "amount", + "received_date", + "aggregate", + "extra_desc", + "amount", +] + +MICHIGAN_CONTRIBUTION_COLS_RENAME = [ + "doc_seq_no", + "page_no", + "contribution_id", + "cont_detail_id", + "doc_stmnt_year", + "doc_type_desc", + "com_legal_name", + "common_name", + "cfr_com_id", + "com_type", + "can_first_name", + "can_last_name", + "contribtype", + "f_name", + "l_name_or_org", + "address", + "city", + "state", + "zip", + "occupation", + "employer", + "received_date", + "amount", + "aggregate", +] + AZ_pages_dict = { "Candidate": 1, diff --git a/utils/mi_campaign_webscraper.py b/utils/mi_campaign_webscraper.py index 573d932..f651b85 100644 --- a/utils/mi_campaign_webscraper.py +++ b/utils/mi_campaign_webscraper.py @@ -132,7 +132,7 @@ def create_directory() -> None: for path in FILEPATHS: if os.path.exists(path): - # remove existing MI contribution data + # remove existing MI campaign data shutil.rmtree(path) print(f"Deleted existing directory: {path}") @@ -141,3 +141,14 @@ def create_directory() -> None: else: os.makedirs(path) print(f"Created directory: {path}") + + +def main() -> None: + """ + Runs the main function and scrapes and downloads the MI campaign data + """ + scrape_and_download_mi_data() + + +if __name__ == "__main__": + main() diff --git a/utils/michigan.py b/utils/michigan.py new file mode 100644 index 0000000..18f5e28 --- /dev/null +++ b/utils/michigan.py @@ -0,0 +1,925 @@ +import os +import uuid + +import numpy as np +import pandas as pd + +from utils.clean import StateCleaner +from utils.constants import ( + BASE_FILEPATH, + MI_CON_FILEPATH, + 
+ MI_CONT_DROP_COLS, + MI_CONTRIBUTION_COLUMNS, + MI_EXP_DROP_COLS, + MI_EXP_FILEPATH, + MI_EXPENDITURE_COLUMNS, + MICHIGAN_CONTRIBUTION_COLS_RENAME, + MICHIGAN_CONTRIBUTION_COLS_REORDER, +) +from utils.preprocess_mi_campaign_data import ( + read_contribution_data, + read_expenditure_data, +) + + +class MichiganCleaner(StateCleaner): + entity_name_dictionary = { + "cfr_com_id": "original_com_id", + "f_name": "first_name", + "l_name_or_org": "last_name", + "employer": "company", + "doc_stmnt_year": "year", + "exp_desc": "exp_desc", + "contribtype": "transaction_type", + "schedule_desc": "transaction_type", + } + # individuals_column_order = [] + # organizations_column_order = [] + # transactions_column_order = [] + # NOTE: could these be added as constants? + id_mapping_column_order = [ + "state", + "year", + "entity_type", + "provided_id", + "database_id", + ] + # map to entity types listed in the schema + + def create_filepaths_list(self) -> list[list[str]]: + """Creates a list of Michigan Contribution and Expenditure filepaths by + first iterating through the expenditure filepaths then the contribution + filepaths + + Inputs: None + + Returns: List of lists of strings + exp_filepath_lst: list of expenditure filepaths + con_filepath_lst: list of contribution filepaths + """ + exp_filepath_lst = [] + con_filepath_lst = [] + + for file in MI_EXP_FILEPATH.iterdir(): + exp_filepath_lst.append(str(file)) + for file in MI_CON_FILEPATH.iterdir(): + con_filepath_lst.append(str(file)) + + return [exp_filepath_lst, con_filepath_lst] + + # NOTE: Helper methods above are called throughout the class + + def preprocess(self, filepaths_list: list[str]) -> list[pd.DataFrame]: + """ + Preprocesses the state data and returns a list of dataframes + + Reads in the state's data, makes any necessary bug fixes, and + combines the data into a list of DataFrames, discarding data not in the schema + + Inputs: + filepaths_list: list of lists of absolute filepaths to relevant state data.
+ required naming conventions, order, and extensions + defined per state. + + Returns: a list of dataframes containing campaign contribution and + expenditure data + """ + expenditures_lst, contributions_lst = filepaths_list + + temp_exp_list = [] + temp_cont_list = [] + + for file in expenditures_lst: + temp_exp_list.append(read_expenditure_data(file, MI_EXPENDITURE_COLUMNS)) + for file in contributions_lst: + temp_cont_list.append(read_contribution_data(file, MI_CONTRIBUTION_COLUMNS)) + + contribution_dataframe = self.merge_dataframes(temp_cont_list) + expenditure_dataframe = self.merge_dataframes(temp_exp_list) + + return [contribution_dataframe, expenditure_dataframe] + + # NOTE: Helper methods for preprocess are below + + def merge_dataframes(self, temp_list: list[pd.DataFrame]) -> pd.DataFrame: + """Merges the list of dataframes into one Pandas DataFrame + + Inputs: + temp_list: list of contribution or expenditure dataframes + + Returns: + merged_dataframe: Pandas DataFrame of merged contribution + or expenditure data + """ + merged_dataframe = pd.concat(temp_list) + if "schedule_desc" not in merged_dataframe.columns: + # "schedule_desc" is only in the expenditure dataframe + merged_dataframe = self.fix_menominee_county_bug_contribution( + merged_dataframe + ) + else: + merged_dataframe = self.drop_menominee_county(merged_dataframe) + + return merged_dataframe + + def fix_menominee_county_bug_contribution( + self, merged_campaign_dataframe: pd.DataFrame + ) -> pd.DataFrame: + """Fixes the Menominee County rows within the Contribution data + + Inputs: + merged_campaign_dataframe: Pandas DataFrame of merged MI + contribution data + + + Returns: + merged_campaign_dataframe: Pandas DataFrame of merged MI + contribution data with the Menominee County Democratic + Party rows fixed + + """ + subset_condition = ( + merged_campaign_dataframe["com_type"] == "MENOMINEE COUNTY DEMOCRATIC PARTY" + ) + + rows_to_fix =
merged_campaign_dataframe[subset_condition] + + merged_campaign_dataframe = merged_campaign_dataframe[ + ~subset_condition + ] + + rows_to_fix = rows_to_fix[MICHIGAN_CONTRIBUTION_COLS_REORDER] + rows_to_fix.columns = MICHIGAN_CONTRIBUTION_COLS_RENAME + + rows_to_fix["aggregate"] = 0.0 + + merged_campaign_dataframe = pd.concat( + [merged_campaign_dataframe, rows_to_fix], ignore_index=True + ) + + return merged_campaign_dataframe + + def drop_menominee_county( + self, merged_campaign_dataframe: pd.DataFrame + ) -> pd.DataFrame: + """Drops the Menominee County rows within the Michigan Expenditure data + + Inputs: + merged_campaign_dataframe: Pandas DataFrame of merged MI expenditure data + + Returns: + merged_campaign_dataframe: Pandas DataFrame of merged MI + expenditure data with Menominee County Democratic Party rows dropped + """ + # There are only 20 Menominee County rows; they are read in incorrectly + # and are missing key data, so they are dropped + subset_condition = ( + merged_campaign_dataframe["com_type"] == "MENOMINEE COUNTY DEMOCRATIC PARTY" + ) + + merged_campaign_dataframe = merged_campaign_dataframe[ + ~subset_condition + ] + + return merged_campaign_dataframe + + def clean(self, data: list[pd.DataFrame]) -> list[pd.DataFrame]: + contribution_dataframe, expenditure_dataframe = data + + clean_cont = self.clean_contribution_dataframe(contribution_dataframe) + + clean_exp = self.clean_expenditure_dataframe(expenditure_dataframe) + + merged_dataframe = pd.concat([clean_cont, clean_exp], axis=0, ignore_index=True) + # concatenate the dataframes along rows, ignoring the prior index + + return [merged_dataframe] + + # NOTE: Helper methods for clean are below + + def clean_contribution_dataframe( + self, + merged_contribution_dataframe: pd.DataFrame, + ) -> pd.DataFrame: + """Cleans the contribution dataframe as needed and returns the dataframe + + Inputs: + merged_contribution_dataframe:
+ Merged Michigan campaign contribution dataframe + + Returns: + merged_contribution_dataframe: + Merged Michigan campaign contribution dataframe, cleaned + """ + merged_contribution_dataframe["cfr_com_id"] = ( + merged_contribution_dataframe["cfr_com_id"] + .apply(pd.to_numeric, errors="coerce") + .astype("Int64") + ) + merged_contribution_dataframe["amount"] = merged_contribution_dataframe[ + "amount" + ].apply(pd.to_numeric, errors="coerce") + # convert committee IDs to nullable integers and amounts to floats + merged_contribution_dataframe = merged_contribution_dataframe.drop( + columns=MI_CONT_DROP_COLS + ) + merged_contribution_dataframe["full_name"] = ( + merged_contribution_dataframe["f_name"].fillna("") + + " " + + merged_contribution_dataframe["l_name_or_org"] + ) + + merged_contribution_dataframe["candidate_full_name"] = np.where( + merged_contribution_dataframe["can_first_name"].notna() + & merged_contribution_dataframe["can_last_name"].notna(), + merged_contribution_dataframe["can_first_name"] + + " " + + merged_contribution_dataframe["can_last_name"], + np.nan, + ) + + return merged_contribution_dataframe + + def clean_expenditure_dataframe( + self, + merged_expenditure_dataframe: pd.DataFrame, + ) -> pd.DataFrame: + """Cleans the expenditure dataframe as needed and returns the dataframe + + Inputs: + merged_expenditure_dataframe: Merged Michigan campaign + expenditure dataframe + + Returns: + merged_expenditure_dataframe: Merged Michigan expenditure + dataframe, cleaned + """ + merged_expenditure_dataframe[ + ["amount", "supp_opp"] + ] = merged_expenditure_dataframe[["amount", "supp_opp"]].apply( + pd.to_numeric, errors="coerce" + ) + merged_expenditure_dataframe["cfr_com_id"] = ( + merged_expenditure_dataframe["cfr_com_id"] + .apply(pd.to_numeric, errors="coerce") + .astype("Int64") + ) + merged_expenditure_dataframe = merged_expenditure_dataframe.rename( + columns={"lname_or_org": "l_name_or_org"} + ) + # convert committee IDs to
integer, amount col to float + # rename last_name column for consistency in standardize + merged_expenditure_dataframe = merged_expenditure_dataframe.drop( + columns=MI_EXP_DROP_COLS + ) + merged_expenditure_dataframe["full_name"] = ( + merged_expenditure_dataframe["f_name"].fillna("") + + " " + + merged_expenditure_dataframe["l_name_or_org"] + ) + + return merged_expenditure_dataframe + + def standardize(self, data: list[pd.DataFrame]) -> list[pd.DataFrame]: + """Adds UUID columns and renames columns using entity_name_dictionary""" + data = self.add_uuid_columns(data) + standardized_merged_dataframe = data[0] + + standardized_merged_dataframe = standardized_merged_dataframe.rename( + columns=self.entity_name_dictionary + ) + + return [standardized_merged_dataframe] + + # NOTE: Helper methods for standardize are below + + def add_uuid_columns( + self, cleaned_dataframe_lst: list[pd.DataFrame] + ) -> list[pd.DataFrame]: + """Adds a UUID column for each entity name column of the cleaned dataframe + + Inputs: + cleaned_dataframe_lst: list containing one cleaned, merged Michigan + campaign contribution and expenditure dataframe + + Returns: + list containing the merged dataframe with the added UUID columns + """ + merged_dataframe = cleaned_dataframe_lst[0] + + merged_dataframe = self.generate_uuid( + merged_dataframe, + ["full_name", "candidate_full_name", "com_legal_name", "vend_name"], + ) + + return [merged_dataframe] + + def generate_uuid( + self, merged_campaign_dataframe: pd.DataFrame, column_names: list[str] + ) -> pd.DataFrame: + """Generates UUIDs for the pandas DataFrame based on the column names provided + + Inputs: + merged_campaign_dataframe: Merged Michigan campaign + expenditure or contribution dataframe + column_names: List of column names for which UUIDs will be generated + + Returns: + merged_campaign_dataframe: Merged Michigan campaign + expenditure or contribution dataframe modified in place + + """ + for col_name in column_names: + non_null_values = merged_campaign_dataframe[ + 
merged_campaign_dataframe[col_name].notnull() +
][col_name] + + ids = {value: str(uuid.uuid4()) for value in non_null_values} + + # Map the generated UUIDs to a new column in the DataFrame + merged_campaign_dataframe[ + "{}_uuid".format(col_name) + ] = merged_campaign_dataframe[col_name].map(ids) + + # create a transaction ID (a string, like the entity UUIDs) for each row + merged_campaign_dataframe["transaction_id"] = [ + str(uuid.uuid4()) for _ in range(len(merged_campaign_dataframe)) + ] + + return merged_campaign_dataframe + + def create_tables( + self, data: list[pd.DataFrame] + ) -> tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]: + """ + Creates the Individuals, Organizations, and Transactions tables from + the dataframe list outputted from standardize + + Inputs: + data: a list of 1 dataframe as outputted from the standardize method + + Returns: (individuals_table, organizations_table, transactions_table) + tuple containing the tables as defined in database schema + """ + individuals_table, individuals_id_mapping = self.create_individuals_table(data) + organizations_table, organizations_id_mapping = self.create_organizations_table( + data + ) + transactions_table, transactions_id_mapping = self.create_transactions_table( + data + ) + + self.output_id_mapping( + individuals_id_mapping, organizations_id_mapping, transactions_id_mapping + ) + + return (individuals_table, organizations_table, transactions_table) + + # NOTE: The helper functions for ID_mapping output are below + + def output_id_mapping( + self, + individuals_map: pd.DataFrame, + organizations_map: pd.DataFrame, + transactions_map: pd.DataFrame, + ) -> None: + """Creates MichiganIDMap.csv + + Inputs: + individuals_map: dataframe of individuals mapped to database + and provided uuid + organizations_map: dataframe of organizations mapped to database + and provided uuid + transactions_map: dataframe of transactions mapped to database + and provided uuid + + Returns: None, Creates 
data/output/MichiganIDMap.csv + """ + output_path = BASE_FILEPATH / "output" /
"MichiganIDMap.csv" + + michigan_id_map = pd.concat( + [individuals_map, organizations_map, transactions_map], ignore_index=True + ) + + if not os.path.exists(output_path.parent): + os.makedirs(output_path.parent) + + michigan_id_map.to_csv(output_path, index=False) + + def create_individuals_id_mapping( + self, individuals: pd.DataFrame, candidates: pd.DataFrame + ) -> pd.DataFrame: + """Creates the ID mapping dataframe for individuals + + Inputs: + individuals: DataFrame with individuals data + candidates: DataFrame with candidates data + + Returns: id_mapping: dataframe in the ID mapping format + """ + individuals = individuals[["year", "full_name_uuid"]].copy() + candidates = candidates[["year", "candidate_full_name_uuid"]].copy() + + individuals = individuals.rename(columns={"full_name_uuid": "database_id"}) + candidates = candidates.rename( + columns={"candidate_full_name_uuid": "database_id"} + ) + + id_mapping = pd.concat([individuals, candidates], ignore_index=True) + id_mapping["state"] = "MI" + id_mapping["entity_type"] = "Individual" + id_mapping["provided_id"] = np.nan + + id_mapping = id_mapping[self.id_mapping_column_order].copy() + + return id_mapping + + def create_organizations_id_mapping( + self, + corporations: pd.DataFrame, + committees: pd.DataFrame, + vendors: pd.DataFrame, + ) -> pd.DataFrame: + """Creates the ID mapping dataframe for organizations + + Inputs: + corporations: dataframe with corporations data + committees: dataframe with campaign committee data + vendors: dataframe with vendors data + + Returns: id_mapping: dataframe in the ID mapping format + """ + corporations = corporations[["year", "full_name_uuid"]].copy() + committees = committees[ + ["year", "com_legal_name_uuid", "original_com_id"] + ].copy() + vendors = vendors[["year", "vend_name_uuid"]].copy() + + corporations = corporations.rename(columns={"full_name_uuid": "database_id"}) + committees = committees.rename( + columns={ + "com_legal_name_uuid": "database_id", +
"original_com_id": "provided_id", + } + ) + vendors = vendors.rename(columns={"vend_name_uuid": "database_id"}) + + id_mapping = pd.concat([corporations, committees, vendors], ignore_index=True) + id_mapping["state"] = "MI" + id_mapping["entity_type"] = "Organization" + + id_mapping = id_mapping[self.id_mapping_column_order].copy() + + return id_mapping + + def create_transactions_id_mapping( + self, + org_com: pd.DataFrame, + ind_com: pd.DataFrame, + com_vend: pd.DataFrame, + ) -> pd.DataFrame: + """Creates the ID mapping dataframe for transactions + + Inputs: + org_com: dataframe with organization to organization (committee) + transactions + + ind_com: dataframe with individual to organization (committee) + transactions + + com_vend: dataframe with committee (organization) to vendor + (organization) transactions + + Returns: id_mapping: dataframe in the ID mapping format + """ + org_com = org_com[["year", "transaction_id"]].copy() + ind_com = ind_com[["year", "transaction_id"]].copy() + com_vend = com_vend[["year", "transaction_id"]].copy() + + org_com = org_com.rename(columns={"transaction_id": "database_id"}) + ind_com = ind_com.rename(columns={"transaction_id": "database_id"}) + com_vend = com_vend.rename(columns={"transaction_id": "database_id"}) + + id_mapping = pd.concat([org_com, ind_com, com_vend], ignore_index=True) + id_mapping["provided_id"] = np.nan + id_mapping["state"] = "MI" + id_mapping["entity_type"] = "Transaction" + + id_mapping = id_mapping[self.id_mapping_column_order].copy() + + return id_mapping + + # NOTE: universal helper functions for creating the tables are below + + def filter_dataframe( + self, merged_campaign_dataframe: pd.DataFrame, column_name: str + ) -> pd.DataFrame: + """Keeps only the rows of the dataframe where the given column is not null + + Inputs: + merged_campaign_dataframe: merged Michigan campaign contribution and + expenditure dataframe + column_name: name of a column to filter the dataset on + + Returns: + filtered_df: all columns of the 
rows whose value in column_name is + not null + + """ + filtered_df =
merged_campaign_dataframe[ + merged_campaign_dataframe[column_name].notnull() + ] + # keeps all columns of the rows whose value in column_name + # is not null + + return filtered_df + + # NOTE: Helper functions for creating the individuals table are below + + def standardize_and_concatenate_individuals( + self, individuals: pd.DataFrame, candidates: pd.DataFrame + ) -> pd.DataFrame: + """Standardizes and concatenates the individual donors and candidates + into one dataframe + + Inputs: + individuals: DataFrame with individuals data + candidates: DataFrame with candidates data + + Returns: individuals_df: standardized DataFrame with individual data + following the database schema + + """ + individuals["entity_type"] = "Individual" + candidates["entity_type"] = "Candidate" + + # rename the columns so they can be concatenated + individuals = individuals.rename(columns={"full_name_uuid": "id"}) + candidates = candidates.rename( + columns={ + "candidate_full_name_uuid": "id", + "can_first_name": "first_name", + "can_last_name": "last_name", + "candidate_full_name": "full_name", + } + ) + individuals_df = pd.concat( + [ + individuals, + candidates, + ], + ignore_index=True, + sort=False, + ) + individuals_df["party"] = np.nan + individuals_df["state"] = individuals_df["state"].fillna("MI") + + return individuals_df + + def create_filtered_individuals_tables( + self, standardized_dataframe_lst: list[pd.DataFrame] + ) -> list[pd.DataFrame]: + """Filters the list of dataframes to create the individuals dataframe + + Inputs: + standardized_dataframe_lst: list containing one dataframe of + standardized Michigan contribution and expenditure data + + Returns: + individuals_df: individuals dataframe as defined in the + database schema + id_mapping: id mapping for the individuals table + """ + merged_dataframe = standardized_dataframe_lst[0] + + individuals_df = self.filter_dataframe(merged_dataframe, "first_name") + candidates_df = self.filter_dataframe(
+ merged_dataframe, "candidate_full_name_uuid" + ) + id_mapping = self.create_individuals_id_mapping(individuals_df, candidates_df) + individuals_df = individuals_df[ + [ + "full_name_uuid", + "first_name", + "last_name", + "full_name", + "state", + "company", + ] + ].copy() + + candidates_df = candidates_df[ + [ + "candidate_full_name_uuid", + "can_first_name", + "can_last_name", + "candidate_full_name", + ] + ].copy() + + individuals_df = self.standardize_and_concatenate_individuals( + individuals_df, candidates_df + ) + individuals_df = individuals_df[ + [ + "id", + "first_name", + "last_name", + "full_name", + "entity_type", + "state", + "party", + "company", + ] + ] + + return [individuals_df, id_mapping] + + # NOTE: Helper functions for creating the organizations table are below + + def standardize_and_concatenate_organizations( + self, + corporations: pd.DataFrame, + committees: pd.DataFrame, + vendors: pd.DataFrame, + ) -> pd.DataFrame: + """Standardizes and concatenates the corporations, committees, and + vendors into one dataframe + + Inputs: + corporations: dataframe with corporations data + committees: dataframe with campaign committee data + vendors: dataframe with vendors data + + Returns: + organizations: standardized DataFrame with organizations data + following the database schema + """ + corporations["entity_type"] = "corporation" + committees["entity_type"] = "committee" + vendors["entity_type"] = "vendor" + + corporations = corporations.rename( + columns={"full_name_uuid": "id", "full_name": "name"} + ) + committees = committees.rename( + columns={"com_legal_name_uuid": "id", "com_legal_name": "name"} + ) + vendors = vendors.rename(columns={"vend_name_uuid": "id", "vend_name": "name"}) + + organizations = pd.concat( + [corporations, committees, vendors], + ignore_index=True, + sort=False, + ) + organizations["state"] = "MI" + + return organizations + + def create_filtered_organizations_tables( + self, standardized_dataframe_lst: list[pd.DataFrame] + )
-> list[pd.DataFrame]: + """Filters the list of dataframes to create the organizations dataframe + + Inputs: + standardized_dataframe_lst: list containing one dataframe of + standardized Michigan contribution and expenditure data + + Returns: + organizations_df: organizations dataframe as defined in the + database schema + id_mapping: id mapping for the organizations table + + """ + merged_dataframe = standardized_dataframe_lst[0] + + # contributing corporations have a first name that is null + corporations_df = merged_dataframe[merged_dataframe["first_name"].isna()] + committees_df = self.filter_dataframe(merged_dataframe, "com_legal_name_uuid") + vendors_df = self.filter_dataframe(merged_dataframe, "vend_name_uuid") + + id_mapping = self.create_organizations_id_mapping( + corporations_df, committees_df, vendors_df + ) + + corporations_df = corporations_df[["full_name_uuid", "full_name"]].copy() + + committees_df = committees_df[["com_legal_name_uuid", "com_legal_name"]].copy() + + vendors_df = vendors_df[["vend_name_uuid", "vend_name"]].copy() + + organizations_df = self.standardize_and_concatenate_organizations( + corporations_df, committees_df, vendors_df + ) + + organizations_df = organizations_df[["id", "name", "state", "entity_type"]] + + return [organizations_df, id_mapping] + + # NOTE: Helper functions for generating the transactions table are below + + def standardize_and_concatenate_transactions( + self, + org_com: pd.DataFrame, + ind_com: pd.DataFrame, + com_vend: pd.DataFrame, + ) -> pd.DataFrame: + """Standardizes and concatenates the organization-to-committee, + individual-to-committee, and committee-to-vendor transactions into one + dataframe + + Inputs: + org_com: dataframe with organization to committee transactions + ind_com: dataframe with individual to committee transactions + com_vend: dataframe with committee to vendor transactions + + Returns: + transactions: standardized DataFrame with transactions data + following the 
database schema + + """ + org_com = org_com.rename( + columns={ +
"full_name_uuid": "donor_id", + "com_legal_name_uuid": "recipient_id", + } + ) + ind_com = ind_com.rename( + columns={ + "full_name_uuid": "donor_id", + "com_legal_name_uuid": "recipient_id", + } + ) + com_vend = com_vend.rename( + columns={ + "com_legal_name_uuid": "donor_id", + "vend_name_uuid": "recipient_id", + } + ) + transactions = pd.concat( + [org_com, ind_com, com_vend], axis=0, ignore_index=True, sort=False + ) + + transactions["office_sought"] = np.nan + + return transactions + + def create_filtered_transactions_tables( + self, standardized_dataframe_lst: list[pd.DataFrame] + ) -> list[pd.DataFrame]: + """Creates the Transactions table and its ID mapping from the dataframe + list outputted from standardize + + Inputs: + standardized_dataframe_lst: list containing one dataframe of + standardized Michigan contribution and expenditure data + + Returns: + transactions_table: table as defined in database schema + id_mapping: id mapping for the transactions table + """ + merged_dataframe = standardized_dataframe_lst[0] + + # contributing corporations have a first name that is null + + # organization -> committee transaction + org_to_com = merged_dataframe[merged_dataframe["first_name"].isna()] + # individual -> committee transaction + ind_to_com = self.filter_dataframe(merged_dataframe, "first_name") + # committee -> vendor transaction + com_to_vend = self.filter_dataframe(merged_dataframe, "vend_name_uuid") + + id_mapping = self.create_transactions_id_mapping( + org_to_com, ind_to_com, com_to_vend + ) + org_to_com = org_to_com[ + [ + "transaction_id", + "full_name_uuid", + "year", + "amount", + "com_legal_name_uuid", + "purpose", + "transaction_type", + ] + ] + + ind_to_com = ind_to_com[ + [ + "transaction_id", + "full_name_uuid", + "year", + "amount", + "com_legal_name_uuid", + "purpose", + "transaction_type", + ] + ] + + com_to_vend = com_to_vend[ + [ + "transaction_id", + "com_legal_name_uuid", + "year", + "amount", + "vend_name_uuid", + "purpose",
+ "transaction_type", + ] + ] + + transactions_df = self.standardize_and_concatenate_transactions( + org_to_com, ind_to_com, com_to_vend + ) + + transactions_df = transactions_df[ + [ + "transaction_id", + "donor_id", + "year", + "amount", + "recipient_id", + "office_sought", + "purpose", + "transaction_type", + ] + ] + + return [transactions_df, id_mapping] + + # NOTE: the helper functions below are used directly in create_tables and + # call the functions above + + def create_individuals_table( + self, standardized_dataframe_lst: list[pd.DataFrame] + ) -> list[pd.DataFrame]: + """ + Creates the Individuals table from the dataframe list outputted + from standardize + + Inputs: + standardized_dataframe_lst: a list of 1 dataframe as outputted + from the standardize method + + Returns: + individuals_table: table as defined in database schema + id_mapping: id mapping for the individuals table + """ + individuals_table, id_mapping = self.create_filtered_individuals_tables( + standardized_dataframe_lst + ) + + return [individuals_table, id_mapping] + + def create_organizations_table( + self, standardized_dataframe_lst: list[pd.DataFrame] + ) -> list[pd.DataFrame]: + """ + Creates the Organizations table from the dataframe list outputted + from standardize + + Inputs: + standardized_dataframe_lst: a list of 1 dataframe as outputted + from the standardize method
+ + Returns: + organizations_table: table as defined in database schema + id_mapping: id mapping for the organizations table + """ + organizations_table, id_mapping = self.create_filtered_organizations_tables( + standardized_dataframe_lst + ) + + return [organizations_table, id_mapping] + + def create_transactions_table( + self, standardized_dataframe_lst: list[pd.DataFrame] + ) -> list[pd.DataFrame]: + """ + Creates the Transactions table from the dataframe list outputted + from standardize + + Inputs: + standardized_dataframe_lst: a list of 1 dataframe as outputted + from the standardize method + + Returns: + transactions_table: table as defined in database schema + id_mapping: id mapping for the transactions table + """ + transactions_table, id_mapping = self.create_filtered_transactions_tables( + standardized_dataframe_lst + ) + return [transactions_table, id_mapping] + + def clean_state(self) -> tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]: + """ + Runs the StateCleaner pipeline returning a tuple of cleaned dataframes + + Returns: (individuals_table, organizations_table, transactions_table) + as defined in the database schema, produced by running the preprocess, + clean, standardize, and create_tables methods + """ + filepaths_lst = self.create_filepaths_list() + preprocessed_dataframe_lst = self.preprocess(filepaths_lst) + cleaned_dataframe_lst = self.clean(preprocessed_dataframe_lst) + standardized_dataframe_lst = self.standardize(cleaned_dataframe_lst) + tables = self.create_tables(standardized_dataframe_lst) + + return tables diff --git a/utils/minnesota.py b/utils/minnesota.py new file mode 100644 index 0000000..94edd75 --- /dev/null +++ b/utils/minnesota.py @@ -0,0 +1,432 @@ +import uuid +from pathlib import Path + +import numpy as np +import pandas as pd +from utils.clean import StateCleaner +from utils.constants import ( + MN_CANDIDATE_CONTRIBUTION_COL, + MN_CANDIDATE_CONTRIBUTION_MAP, + MN_INDEPENDENT_EXPENDITURE_COL, +
MN_INDEPENDENT_EXPENDITURE_MAP, + MN_NONCANDIDATE_CONTRIBUTION_COL, + MN_NONCANDIDATE_CONTRIBUTION_MAP, + MN_RACE_MAP, +) + +here = Path(__file__).resolve() +repo_root = here.parent.parent + +FILEPATHS_LST = [ + repo_root / "data" / "AG.csv", + repo_root / "data" / "AP.csv", + repo_root / "data" / "DC.csv", + repo_root / "data" / "GC.csv", + repo_root / "data" / "House.csv", + repo_root / "data" / "SA.csv", + repo_root / "data" / "SC.csv", + repo_root / "data" / "Senate.csv", + repo_root / "data" / "SS.csv", + repo_root / "data" / "ST.csv", + repo_root / "data" / "non_candidate_con.csv", + repo_root / "data" / "independent_exp.csv", +] + + +class MinnesotaCleaner(StateCleaner): + def entity_name_dictionary(self) -> dict: + self.entity_name_dictionary = { + "I": "Individual", + "L": "Lobbyist", + "B": "Company", + "PCF": "Committee", + "C": "Committee", + "F": "Committee", + "H": "Committee", + "S": "Committee", + "U": "Committee", + "PTU": "Party", + "P": "Party", + "O": "Other", + } + + return self.entity_name_dictionary + + def preprocess_candidate_contribution(self, df: pd.DataFrame) -> pd.DataFrame: + """ + Helper function to preprocess MN candidate-recipient contribution df + + Args: + df (DataFrame): a single raw MN candidate contribution df + Returns: + DataFrame: Preprocessed contribution df of candidate recipients + """ + + df1 = df.copy(deep=True) + df1 = df1[MN_CANDIDATE_CONTRIBUTION_COL] + df1 = df1.rename(columns=MN_CANDIDATE_CONTRIBUTION_MAP) + df1["recipient_type"] = "I" + df1["recipient_full_name"] = None + df1["donor_type"] = df1["donor_type"].str.upper() + df1["donor_first_name"] = None + df1["donor_last_name"] = None + df1["donor_id"] = None + df1["state"] = "MN" + df1["amount"] = pd.to_numeric(df1["amount"], errors="coerce") + df1["inkind_amount"] = pd.to_numeric(df1["inkind_amount"], errors="coerce") + df1["transaction_type"] = np.where( + (df1["inkind_amount"].notna()) & (df1["inkind_amount"] != 0), + "in-kind", + None, + ) + + return df1 + + 
def preprocess_noncandidate_contribution(self, df: pd.DataFrame) -> pd.DataFrame: + """ + Helper function to preprocess MN noncandidate-recipient contribution df + + Args: + df (DataFrame): a single raw MN non-candidate contribution df + Returns: + DataFrame: Preprocessed contribution df of noncandidate recipients + """ + + df1 = df.copy(deep=True) + df1 = df1[MN_NONCANDIDATE_CONTRIBUTION_COL] + df1 = df1.rename(columns=MN_NONCANDIDATE_CONTRIBUTION_MAP) + df1["recipient_first_name"] = None + df1["recipient_last_name"] = None + df1["donor_type"] = df1["donor_type"].str.upper() + df1["donor_first_name"] = None + df1["donor_last_name"] = None + df1["state"] = "MN" + df1["office_sought"] = None + df1["amount"] = pd.to_numeric(df1["amount"], errors="coerce") + df1["donor_id"] = df1["donor_id"].fillna(0).astype("int64") + df1["inkind_amount"] = pd.to_numeric(df1["inkind_amount"], errors="coerce") + df1["transaction_type"] = np.where( + (df1["inkind_amount"].notna()) & (df1["inkind_amount"] != 0), + "in-kind", + None, + ) + + return df1 + + def preprocess_expenditure(self, df: pd.DataFrame) -> pd.DataFrame: + """ + Helper function to preprocess MN expenditure dataset + + Args: + df (DataFrame): MN expenditure DataFrames + Returns: + DataFrame: Preprocessed MN expenditure DataFrames + """ + + df1 = df.copy(deep=True) + df1 = df1[MN_INDEPENDENT_EXPENDITURE_COL] + df1 = df1.rename(columns=MN_INDEPENDENT_EXPENDITURE_MAP) + # Donors and recipients are both organization, only have full name + df1["recipient_first_name"] = None + df1["recipient_last_name"] = None + df1["donor_first_name"] = None + df1["donor_last_name"] = None + df1["recipient_id"] = df1["donor_id"].fillna(0).astype("int64") + # Negate the contribution amount if it's against the recipient + df1.loc[df1["For /Against"] == "Against", "amount"] = -df1["amount"] + df1["inkind_amount"] = None + df1 = df1.drop(columns=["For /Against"]) + df1["recipient_type"] = "PCF" # recipient: affected political committee + 
df1["office_sought"] = None + + return df1 + + def preprocess(self, filepaths_list: list[Path]) -> list[pd.DataFrame]: + """ + Preprocesses MN data and returns a list of a single dataframe combining + both contributions and expenditures + + Inputs: + filepaths_list (list): list of absolute filepaths of contribution + and expenditure csv files + + Order: 10 candidate-recipient contribution files (AG, AP, DC, GC, House, + SA, SC, Senate, SS, ST), 1 noncandidate-recipient contribution file, + and 1 expenditure file + + Naming Convention: root + data folder (+ candidate folder) + file name + + Returns: + A list containing 1 preprocessed DataFrame + """ + + processed_dfs = [] + # candidate-recipient contribution data (all but the last two files) + for filepath in filepaths_list[:-2]: + candidate_df = pd.read_csv(filepath) + processed_cand_con = self.preprocess_candidate_contribution(candidate_df) + processed_dfs.append(processed_cand_con) + # noncandidate-recipient contribution data + noncandidate_df = pd.read_csv(filepaths_list[-2]) + processed_noncandidate_con = self.preprocess_noncandidate_contribution( + noncandidate_df + ) + processed_dfs.append(processed_noncandidate_con) + # expenditure data + expenditure_df = pd.read_csv(filepaths_list[-1]) + processed_expenditure_df = self.preprocess_expenditure(expenditure_df) + processed_dfs.append(processed_expenditure_df) + combined_df = pd.concat( + processed_dfs, + ignore_index=True, + ) + + return [combined_df] + + def clean(self, data: list[pd.DataFrame]) -> list[pd.DataFrame]: + """ + Cleans MN data by converting dtypes to match the database schema and + dropping non-classifiable rows (rows that do not represent a minimal + viable transaction) + + Inputs: + data: a list of 1 DataFrame outputted from the preprocess method + + Returns: a list of 1 cleaned MN DataFrame + """ + + df = data[0] + df["date"] = pd.to_datetime(df["date"]) + df["year"] = df["date"].dt.year + df =
df.drop(columns=["date"]) + type_mapping = { + "state": "str", + "recipient_id": "str", + "donor_id": "str", + "recipient_first_name": "str", + "recipient_last_name": "str", + "recipient_full_name": "str", + "donor_first_name": "str", + "donor_last_name": "str", + "donor_full_name": "str", + "recipient_type": "str", + "donor_type": "str", + "purpose": "str", + "transaction_type": "str", + "year": "int64", + "amount": "float64", + "inkind_amount": "float64", + } + df = df.astype(type_mapping) + # drop non-classifiable rows: no transaction amount or no donor/recipient info + df = df[df["amount"] != 0] + df = df.dropna(subset=["recipient_id", "donor_id"], how="any") + df = df.drop(columns=["inkind_amount"]) + df = df.reset_index(drop=True) + + return [df] + + def standardize(self, data: list[pd.DataFrame]) -> list[pd.DataFrame]: + """ + Standardizes the dataframe into the necessary format for the schema + + Maps entity/office types and column names as defined in the schema, + adjusts and adds UUIDs as necessary, and stores the database id and + provided id in MNIDMap.csv + + Inputs: + data: A list of 1 preprocessed and cleaned DataFrame outputted from + the clean method + + Returns: A list of 1 standardized DataFrame matching database schema + """ + + df = data[0].copy() # Create a copy to avoid modifying the original DataFrame + df["company"] = None # MN dataset has no company information + df["party"] = None # MN dataset has no party information + df["transaction_id"] = None + df["office_sought"] = df["office_sought"].replace(MN_RACE_MAP) + + # Standardize entity names to match other states in the database schema + entity_map = self.entity_name_dictionary() + df["recipient_type"] = df["recipient_type"].map(entity_map) + df["donor_type"] = df["donor_type"].map(entity_map) + id_mapping = {}
+ for index, row in df.iterrows(): + recipient_uuid = str(uuid.uuid4()) + donor_uuid = str(uuid.uuid4()) + transaction_uuid = str(uuid.uuid4()) + + # MN has partial recipient id, generate uuid, map them to original id + if row["recipient_id"]: + if ( + row["recipient_type"] == "Individual" + or row["recipient_type"] == "Lobbyist" + ): + entity_type = "Individual" + else: + entity_type = "Organization" + id_mapping[row["recipient_id"]] = ( + row["state"], + row["year"], + entity_type, + row["recipient_id"], + recipient_uuid, + ) + + df.at[index, "recipient_id"] = recipient_uuid + + # MN has partial donor id, generate uuid, map them to original id + if row["donor_id"]: + if row["donor_type"] == "Individual" or row["donor_type"] == "Lobbyist": + entity_type = "Individual" + else: + entity_type = "Organization" + id_mapping[row["donor_id"]] = ( + row["state"], + row["year"], + entity_type, + row["donor_id"], + donor_uuid, + ) + + df.at[index, "donor_id"] = donor_uuid + + df.at[index, "transaction_id"] = transaction_uuid + + # Convert id_mapping to DataFrame and save to CSV + id_mapping_df = pd.DataFrame.from_dict( + id_mapping, + orient="index", + columns=["state", "year", "entity_type", "provided_id", "database_id"], + ) + id_mapping_df.to_csv("MNIDMap.csv", index=False) + + return [df] + + def create_tables( + self, data: list[pd.DataFrame] + ) -> tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]: + """ + Creates the Individuals, Organizations, and Transactions tables from + the dataframe list outputted from the standardize function + + Inputs: + data: a list of 1 dataframe outputted from the standardize method + + Returns: (individuals_table, organizations_table, transactions_table) + tuple containing the tables as defined in database schema + """ + + df = data[0] + # Create individual table from both recipient and donor entries + ind_recipient_df = pd.DataFrame( + data=df[ +
df["recipient_type"].isin(["Individual", "Lobbyist"]) + ].drop_duplicates(subset="recipient_id"), + columns=[ + "recipient_id", + "recipient_first_name", + "recipient_last_name", + "recipient_full_name", + "recipient_type", + "state", + "party", + "company", + ], + ) + ind_recipient_df = ind_recipient_df.rename( + columns={ + "recipient_id": "id", + "recipient_first_name": "first_name", + "recipient_last_name": "last_name", + "recipient_full_name": "full_name", + "recipient_type": "entity_type", + } + ) + ind_donor_df = pd.DataFrame( + data=df[df["donor_type"].isin(["Individual", "Lobbyist"])].drop_duplicates( + subset="donor_id" + ), + columns=[ + "donor_id", + "donor_first_name", + "donor_last_name", + "donor_full_name", + "donor_type", + "state", + "party", + "company", + ], + ) + ind_donor_df = ind_donor_df.rename( + columns={ + "donor_id": "id", + "donor_first_name": "first_name", + "donor_last_name": "last_name", + "donor_full_name": "full_name", + "donor_type": "entity_type", + } + ) + ind_df = pd.concat([ind_recipient_df, ind_donor_df], ignore_index=True) + + # Create organization table from both recipient and donor entries + org_recipient_df = pd.DataFrame( + data=df[ + ~df["recipient_type"].isin(["Individual", "Lobbyist"]) + ].drop_duplicates(subset="recipient_id"), + columns=["recipient_id", "recipient_full_name", "state", "recipient_type"], + ) + org_recipient_df = org_recipient_df.rename( + columns={ + "recipient_id": "id", + "recipient_full_name": "name", + "recipient_type": "entity_type", + } + ) + org_donor_df = pd.DataFrame( + data=df[~df["donor_type"].isin(["Individual", "Lobbyist"])].drop_duplicates( + subset="donor_id" + ), + columns=["donor_id", "donor_full_name", "state", "donor_type"], + ) + org_donor_df = org_donor_df.rename( + columns={ + "donor_id": "id", + "donor_full_name": "name", + "donor_type": "entity_type", + } + ) + org_df = pd.concat([org_recipient_df, org_donor_df], ignore_index=True) + + tran_df = pd.DataFrame( + 
data=df.drop_duplicates(subset="transaction_id"), + columns=[ + "transaction_id", + "donor_id", + "year", + "amount", + "recipient_id", + "office_sought", + "purpose", + "transaction_type", + ], + ) + + return ind_df, org_df, tran_df + + def clean_state(self) -> tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]: + preprocessed_df = self.preprocess(FILEPATHS_LST) + cleaned_df = self.clean(preprocessed_df) + standardized_df = self.standardize(cleaned_df) + ind_df, org_df, tran_df = self.create_tables(standardized_df) + + return ind_df, org_df, tran_df diff --git a/utils/pennsylvania.py b/utils/pennsylvania.py new file mode 100644 index 0000000..9d4280c --- /dev/null +++ b/utils/pennsylvania.py @@ -0,0 +1,568 @@ +import uuid +from pathlib import Path + +import numpy as np +import pandas as pd + +from utils import clean +from utils import constants as const + + +class PennsylvaniaCleaner(clean.StateCleaner): + + # Base_Filepath is the absolute path to the root of the repository, no + # matter where the repository is: __file__ is resolved to the path of the + # current python file, and for pathlib Path objects `.parent` gets the + # parent directory and `/` can be used like `/` in unix style paths. + Base_Filepath = Path(__file__).resolve().parent.parent + filepaths_list = Base_Filepath / "data" / "raw" / "PA" + + # helper functions: + def assign_col_names(self, filepath: str, year: int) -> list: + """Returns the column names that match the dataset's file type and year. + + Args: + filepath: the path where the dataset is stored. + + year: the year from which the data originates; datasets from before + 2022 use different column names.
+ + Returns: + a list of the appropriate column names for the dataset + """ + path_parts = filepath.split("/") + file_type = path_parts[-1] + + if "contrib" in file_type: + if year < 2022: + return const.PA_CONT_COLS_NAMES_PRE2022 + else: + return const.PA_CONT_COLS_NAMES_POST2022 + elif "filer" in file_type: + if year < 2022: + return const.PA_FILER_COLS_NAMES_PRE2022 + else: + return const.PA_FILER_COLS_NAMES_POST2022 + elif "expense" in file_type: + if year < 2022: + return const.PA_EXPENSE_COLS_NAMES_PRE2022 + else: + return const.PA_EXPENSE_COLS_NAMES_POST2022 + + def replace_id_with_uuid( + self, df: pd.DataFrame, col1: str, col2: str + ) -> tuple[dict, pd.DataFrame]: + """Creates a dictionary whose keys are generated UUIDs that map to values + corresponding to unique IDs from the donor and recipient IDs columns in df + + Args: + df: a pandas dataframe with at least two columns (col1, col2) + col1, col2: columns of df that contain IDs + Returns: + A tuple whose first value is a dictionary mapping each generated UUID + (key) to the original ID from col1 or col2 (value), and whose second + value is the modified df, where the IDs have been replaced with UUIDs + """ + # a set is used because an ID can appear in both the donor and recipient + # columns after concatenation; missing (NaN) IDs are dropped so they stay + # missing instead of becoming duplicate NaN dictionary keys + ids_1 = set(df[col1].dropna()) + ids_2 = set(df[col2].dropna()) + unique_ids = list(ids_1.union(ids_2)) + + with_uuid = [] + for raw_id in unique_ids: + with_uuid.append([raw_id, str(uuid.uuid4())]) + + mapped_dict = {lst[0]: (lst[1]) for lst in with_uuid} + df[col1] = df[col1].map(mapped_dict) + df[col2] = df[col2].map(mapped_dict) + mapped_dict = {value: key for key, value in mapped_dict.items()} + return mapped_dict, df + + def classify_contributor(self, donor: str) -> str: + """Takes a string input and compares it against a list of identifiers + most commonly associated with organizations/corporations/PACs, and + classifies the string input as belonging to
an individual or organization + + Args: + donor: the donor name to classify + Returns: + the string "Organization" or "Individual" depending on the + classification of the parameter + """ + split = donor.split() + loc = 0 + while loc < len(split): + if split[loc].upper() in const.PA_ORGANIZATION_IDENTIFIERS: + return "Organization" + loc += 1 + return "Individual" + + def pre_process_contributor_dataset(self, df: pd.DataFrame): + """pre-processes a contributor dataset by sifting through the columns and + keeping the relevant columns for EDA and AbstractStateCleaner purposes + + Args: + df: the contributor dataset + + Returns: + a pandas dataframe whose columns are appropriately formatted. + """ + df["AMOUNT"] = df[["CONT_AMT_1", "CONT_AMT_2", "CONT_AMT_3"]].sum(axis=1) + df["RECIPIENT_ID"] = df["RECIPIENT_ID"].astype("str") + df["DONOR"] = df["DONOR"].astype("str") + df["DONOR"] = df["DONOR"].str.title() + df["DONOR_TYPE"] = df["DONOR"].apply(self.classify_contributor) + df = df.drop( + columns={ + "ADDRESS_1", + "ADDRESS_2", + "CITY", + "STATE", + "ZIPCODE", + "OCCUPATION", + "E_NAME", + "E_ADDRESS_1", + "E_ADDRESS_2", + "E_CITY", + "E_STATE", + "E_ZIPCODE", + "SECTION", + "CYCLE", + "CONT_DATE_1", + "CONT_AMT_1", + "CONT_DATE_2", + "CONT_AMT_2", + "CONT_DATE_3", + "CONT_AMT_3", + } + ) + + if "TIMESTAMP" in df.columns: + df = df.drop(columns={"TIMESTAMP", "REPORTER_ID"}) + df["DONOR"] = df["DONOR"].apply(lambda x: str(x).title()) + + return df + + def pre_process_filer_dataset(self, df: pd.DataFrame): + """pre-processes a filer dataset by sifting through the columns and + keeping the relevant columns for EDA and AbstractStateCleaner purposes + + Args: + df: the filer dataset + + Returns: + a pandas dataframe whose columns are appropriately formatted.
+ """ + df["RECIPIENT_ID"] = df["RECIPIENT_ID"].astype("str") + df = df.drop( + columns={ + "YEAR", + "CYCLE", + "AMEND", + "TERMINATE", + "DISTRICT", + "ADDRESS_1", + "ADDRESS_2", + "CITY", + "STATE", + "ZIPCODE", + "COUNTY", + "PHONE", + "BEGINNING", + "MONETARY", + "INKIND", + } + ) + if "TIMESTAMP" in df.columns: + df = df.drop(columns={"TIMESTAMP", "REPORTER_ID"}) + + df = df.drop_duplicates(subset=["RECIPIENT_ID"]) + df["RECIPIENT_TYPE"] = df.RECIPIENT_TYPE.map(const.PA_FILER_ABBREV_DICT) + df["RECIPIENT"] = df["RECIPIENT"].apply(lambda x: str(x).title()) + return df + + def pre_process_expense_dataset(self, df: pd.DataFrame): + """pre-processes an expenditure dataset by sifting through the columns and + keeping the relevant columns for EDA and AbstractStateCleaner purposes + + Args: + df: the expenditure dataset + + Returns: + a pandas dataframe whose columns are appropriately formatted. + """ + df["DONOR_ID"] = df["DONOR_ID"].astype("str") + df = df.drop( + columns={ + "EXPENSE_CYCLE", + "EXPENSE_ADDRESS_1", + "EXPENSE_ADDRESS_2", + "EXPENSE_CITY", + "EXPENSE_STATE", + "EXPENSE_ZIPCODE", + "EXPENSE_DATE", + } + ) + if "EXPENSE_REPORTER_ID" in df.columns: + df = df.drop(columns={"EXPENSE_TIMESTAMP", "EXPENSE_REPORTER_ID"}) + df["PURPOSE"] = df["PURPOSE"].apply(lambda x: str(x).title()) + df["RECIPIENT"] = df["RECIPIENT"].apply(lambda x: str(x).title()) + return df + + def initialize_PA_dataset(self, data_filepath: str, year: int) -> pd.DataFrame: + """initializes the PA data appropriately based on whether the data contains + filer, contributor, or expense information + + Args: + data_filepath: the path in which the data is stored/located. + + year: the year from which the data originates + + Returns: + a pandas dataframe whose columns are appropriately formatted, and + any dirty rows with inconsistent columns names dropped. 
+ """ + + df = pd.read_csv( + data_filepath, + names=self.assign_col_names(data_filepath, year), + sep=",", + encoding="latin-1", + on_bad_lines="warn", + ) + + df["YEAR"] = year + path_parts = data_filepath.split("/") + file_type = path_parts[-1] + + if "contrib" in file_type: + return self.pre_process_contributor_dataset(df) + + elif "filer" in file_type: + return self.pre_process_filer_dataset(df) + + elif "expense" in file_type: + return self.pre_process_expense_dataset(df) + + else: + raise ValueError( + "This function is currently formatted for filer, " + "expense, and contributor datasets. Make sure your data " + "is from these sources." + ) + + def merge_contrib_filer_datasets( + self, cont_file: pd.DataFrame, filer_file: pd.DataFrame + ) -> pd.DataFrame: + """merges the contributor and filer datasets from the same year using the + unique filer ID (RECIPIENT_ID) + Args: + cont_file: The contributor dataset + + filer_file: the filer dataset from the same year as the contributor + file. + Returns: + The merged pandas dataframe + """ + merged_df = pd.merge(cont_file, filer_file, how="left", on="RECIPIENT_ID") + return merged_df + + def merge_expend_filer_datasets( + self, expend_file: pd.DataFrame, filer_file: pd.DataFrame + ) -> pd.DataFrame: + """merges the expenditure and filer datasets from the same year using the + unique filer ID (expense DONOR_ID matched to filer RECIPIENT_ID) + Args: + expend_file: The expenditure dataset + + filer_file: the filer dataset from the same year as the expenditure file + Returns: + The merged pandas dataframe + """ + merged_df = pd.merge( + expend_file, filer_file, left_on="DONOR_ID", right_on="RECIPIENT_ID" + ).drop("RECIPIENT_ID", axis=1) + return merged_df + + def format_contrib_data_for_concat(self, df: pd.DataFrame) -> pd.DataFrame: + """Reformats the merged contributor-filer dataset such that it has the + same columns as the merged expenditure-filer dataset so that concatenation + can occur + + Args: + df: the merged contributor-filer dataset + + Returns: + A new dataframe with the
appropriate column formatting for concatenation + """ + df["DONOR_ID"] = np.nan + df["DONOR_PARTY"] = np.nan + df["DONOR_OFFICE"] = np.nan + columns = df.columns.to_list() + columns.sort() + df = df.loc[:, columns] + return df + + def format_expend_data_for_concat(self, df: pd.DataFrame) -> pd.DataFrame: + """Reformats the merged expenditure-filer dataset such that it has the + same columns as the merged contributor-filer dataset so that concatenation + can occur + + Args: + df: the merged expenditure-filer dataset + + Returns: + A new dataframe with the appropriate column formatting for concatenation + """ + df["RECIPIENT_ID"] = np.nan + df = df.rename( + columns={ + "RECIPIENT_x": "RECIPIENT", + "RECIPIENT_y": "DONOR", + "RECIPIENT_TYPE": "DONOR_TYPE", + "RECIPIENT_PARTY": "DONOR_PARTY", + "RECIPIENT_OFFICE": "DONOR_OFFICE", + } + ) + df["RECIPIENT_TYPE"] = np.nan + df["RECIPIENT_OFFICE"] = np.nan + df["RECIPIENT_PARTY"] = np.nan + columns = df.columns.to_list() + columns.sort() + df = df.loc[:, columns] + return df + + def combine_contributor_expenditure_datasets( + self, + contrib_ds: list[pd.DataFrame], + filer_ds: list[pd.DataFrame], + expend_ds: list[pd.DataFrame], + ) -> pd.DataFrame: + """This function takes datasets with information from the contributor, + filer, and expenditure datasets in each given year, merges the + contributor and expenditure datasets with pertinent information from the + filer dataset, and concatenates the results into a single dataset. + + Args: + 3 datasets: contributor, filer, and expenditure datasets. Each of + the datasets is a list of dataframes, with each entry in the + list being the file from a given year, in the same year order + + Returns: + A concatenated dataframe with transaction information, contributor + information, and recipient information.
+ """ + merged_cont_datasets_per_yr = [] + merged_exp_dataset_per_yr = [] + + for cont_df, filer_df, expend_df in zip(contrib_ds, filer_ds, expend_ds): + cont_merged = self.merge_contrib_filer_datasets(cont_df, filer_df) + expend_merged = self.merge_expend_filer_datasets(expend_df, filer_df) + merged_cont_datasets_per_yr.append(cont_merged) + merged_exp_dataset_per_yr.append(expend_merged) + + merged_cont_datasets_per_yr = pd.concat(merged_cont_datasets_per_yr) + merged_exp_dataset_per_yr = pd.concat(merged_exp_dataset_per_yr) + contrib_filer_info = self.format_contrib_data_for_concat( + merged_cont_datasets_per_yr + ) + expend_filer_info = self.format_expend_data_for_concat( + merged_exp_dataset_per_yr + ) + return pd.concat([contrib_filer_info, expend_filer_info]) + + def output_ID_mapping(self, dictionary: dict, df: pd.DataFrame): + pass + + def make_individuals_table(self, df: pd.DataFrame) -> pd.DataFrame: + """This function isolates donors and recipients who are classified as + individuals and returns a dataframe with strictly individual information + pertinent to the StateCleaner schema.
+ + Args: + df: a pandas dataframe with donor and recipient information + Returns: + a pandas dataframe strictly with information regarding individuals + from the inputted dataframe + """ + donor_individuals = df.loc[ + ( + (df.DONOR_TYPE.isin(["Individual", "Individuals"])) + | (df.DONOR_TYPE == "Candidate") + | (df.DONOR_TYPE == "Lobbyist") + ) + ] + donor_individuals = donor_individuals[ + ["DONOR", "DONOR_ID", "DONOR_PARTY", "DONOR_TYPE"] + ] + donor_individuals = donor_individuals.rename( + columns={ + "DONOR": "full_name", + "DONOR_ID": "id", + "DONOR_PARTY": "party", + "DONOR_TYPE": "entity_type", + } + ) + + recipient_individuals = df.loc[ + ( + (df.RECIPIENT_TYPE == "Individuals") + | (df.RECIPIENT_TYPE == "Candidate") + | (df.RECIPIENT_TYPE == "Lobbyist") + ) + ] + recipient_individuals = recipient_individuals[ + ["RECIPIENT", "RECIPIENT_ID", "RECIPIENT_PARTY", "RECIPIENT_TYPE"] + ] + recipient_individuals = recipient_individuals.rename( + columns={ + "RECIPIENT": "full_name", + "RECIPIENT_ID": "id", + "RECIPIENT_PARTY": "party", + "RECIPIENT_TYPE": "entity_type", + } + ) + + all_individuals = pd.concat([donor_individuals, recipient_individuals]) + all_individuals = all_individuals.drop_duplicates() + all_individuals["first_name"] = np.nan + all_individuals["last_name"] = np.nan + all_individuals["state"] = "PA" + all_individuals["company"] = np.nan + return all_individuals + + def make_organizations_table(self, organizations_df: pd.DataFrame) -> pd.DataFrame: + """This function isolates donors and recipients who are classified as + committees or organizations and returns a dataframe with strictly + committee/organization information pertinent to the StateCleaner schema. + + Args: + organizations_df: a pandas dataframe with donor and recipient information + Returns: + a pandas dataframe strictly with information regarding committees or + organizations from the inputted dataframe.
+ """ + donor_organizations = organizations_df.loc[ + ( + (organizations_df.DONOR_TYPE == "Committee") + | (organizations_df.DONOR_TYPE == "Organization") + ) + ] + donor_organizations = donor_organizations[["DONOR_ID", "DONOR", "DONOR_TYPE"]] + donor_organizations = donor_organizations.rename( + columns={"DONOR_ID": "id", "DONOR": "name", "DONOR_TYPE": "entity_type"} + ) + recipient_organizations = organizations_df.loc[ + ( + (organizations_df.RECIPIENT_TYPE == "Committee") + | (organizations_df.RECIPIENT_TYPE == "Organization") + ) + ] + recipient_organizations = recipient_organizations[ + ["RECIPIENT_ID", "RECIPIENT", "RECIPIENT_TYPE"] + ] + recipient_organizations = recipient_organizations.rename( + columns={ + "RECIPIENT_ID": "id", + "RECIPIENT": "name", + "RECIPIENT_TYPE": "entity_type", + } + ) + all_organizations = pd.concat([donor_organizations, recipient_organizations]) + all_organizations = all_organizations.drop_duplicates() + all_organizations["state"] = "PA" + + return all_organizations + + def make_transactions_table(self, transactions_df: pd.DataFrame) -> pd.DataFrame: + pass + + def preprocess(self, filepaths_list: list[str]) -> pd.DataFrame: + + contributor_datasets, filer_datasets, expense_datasets = [], [], [] + # in case the provided filepaths are not in increasing order, this just + # makes sure the filepaths numerically line up + filepaths_list.sort() + for path in filepaths_list: + # because the data obtained from the PA website is formatted + # differently in different years, I had my web scraper append + # the year to the filename + path_split = path.split("_") + year = path_split[-1].replace(".txt", "") + year = int(year) + + df = self.initialize_PA_dataset(path, year) + if "contrib" in path: + contributor_datasets.append(df) + elif "filer" in path: + filer_datasets.append(df) + elif "expense" in path: + expense_datasets.append(df) + else: + pass # do nothing + + merged_dataset =
self.combine_contributor_expenditure_datasets( + contributor_datasets, filer_datasets, expense_datasets + ) + dictionary, merged_dataset = self.replace_id_with_uuid( + merged_dataset, "DONOR_ID", "RECIPIENT_ID" + ) + self.output_ID_mapping(dictionary, merged_dataset) # return dictionary + + # assign a distinct transaction_id to every row + merged_dataset["TRANSACTION_ID"] = [str(uuid.uuid4()) for _ in range(len(merged_dataset))] + return merged_dataset + + def create_tables( + self, standardized_df: pd.DataFrame + ) -> tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]: + + # separate the standardized_df information into the relevant columns for + # individuals, organizations, and transactions tables + + # Individuals Table: + individuals_df = standardized_df[ + [ + "DONOR", + "DONOR_ID", + "DONOR_PARTY", + "DONOR_TYPE", + "RECIPIENT", + "RECIPIENT_ID", + "RECIPIENT_PARTY", + "RECIPIENT_TYPE", + ] + ] + individuals_table = self.make_individuals_table(individuals_df) + + # Organizations Table + organizations_df = standardized_df[ + [ + "DONOR", + "DONOR_ID", + "DONOR_TYPE", + "RECIPIENT", + "RECIPIENT_ID", + "RECIPIENT_TYPE", + ] + ] + organizations_table = self.make_organizations_table(organizations_df) + # Transactions Table + transactions_df = standardized_df[ + [ + "AMOUNT", + "DONOR_ID", + "DONOR_OFFICE", + "PURPOSE", + "RECIPIENT_ID", + "RECIPIENT_OFFICE", + "YEAR", + "TRANSACTION_ID", + ] + ] + transactions_table = self.make_transactions_table(transactions_df) + + return individuals_table, organizations_table, transactions_table diff --git a/utils/pipeline.py b/utils/pipeline.py index ece3b28..2be5d9b 100644 --- a/utils/pipeline.py +++ b/utils/pipeline.py @@ -1,14 +1,16 @@ import pandas as pd -# import state cleaners here from utils.arizona import ArizonaCleaner +from utils.michigan import MichiganCleaner +from utils.minnesota import MinnesotaCleaner +from utils.pennsylvania import PennsylvaniaCleaner + -# uncomment your state once it is added state_cleaners = [ ArizonaCleaner(), - # MichiganCleaner(), - # MinnesotaCleaner(), -
# PennsylvaniaCleaner(), + MichiganCleaner(), + MinnesotaCleaner(), + PennsylvaniaCleaner(), ] if __name__ == "__main__": @@ -27,4 +29,4 @@ complete_individuals_table = pd.concat(single_state_individuals_tables) complete_organizations_table = pd.concat(single_state_organizations_tables) - complete_transactions_table = pd.concat(single_state_transactions_tables) + # complete_transactions_table = pd.concat(single_state_transactions_tables) diff --git a/utils/preprocess_mi_campaign_data.py b/utils/preprocess_mi_campaign_data.py index bf23acd..91afad4 100644 --- a/utils/preprocess_mi_campaign_data.py +++ b/utils/preprocess_mi_campaign_data.py @@ -15,7 +15,7 @@ def fix_mi_dataframes(filepath, columns): pass -def read_expenditure_data(filepath: str, columns: list) -> pd.DataFrame: +def read_expenditure_data(filepath: str, columns: list[str]) -> pd.DataFrame: """Reads in the MI expenditure data Inputs: @@ -37,7 +37,7 @@ def read_expenditure_data(filepath: str, columns: list) -> pd.DataFrame: return df -def read_contribution_data(filepath: str, columns: list) -> pd.DataFrame: +def read_contribution_data(filepath: str, columns: list[str]) -> pd.DataFrame: """Reads in the MI campaign data and skips the errors Inputs: filepath (str): filepath to the MI Campaign Data txt file @@ -46,7 +46,8 @@ def read_contribution_data(filepath: str, columns: list) -> pd.DataFrame: Returns: df (Pandas DataFrame): dataframe of the MI campaign data """ if filepath.endswith("00.txt"): - # MI files that contain 00 contain headers + # MI files that contain 00 or between 1998 and 2003 contain headers + # VALUES_TO_CHECK contains the years between 1998 and 2003 df = pd.read_csv( filepath, delimiter="\t", @@ -56,7 +57,6 @@ def read_contribution_data(filepath: str, columns: list) -> pd.DataFrame: low_memory=False, on_bad_lines="skip", ) - else: df = pd.read_csv( filepath, @@ -96,10 +96,10 @@ def plot_year_contribution_types(all_year_contribution_dataframe: pd.DataFrame) barmode="stack", labels={ 
"doc_stmnt_year": "Year", - "count": "Count", + "count": "Number of Contributions", "contribtype": "Contribution Type", }, - title="Stacked Bar Chart for Contribution Types by Year", + title="Michigan Campaign Contribution Types by Year", ) fig.show() @@ -113,24 +113,37 @@ def plot_committee_types_by_year(all_year_contribution_dataframe: pd.DataFrame) Return: None """ + committee_names = { + "DIS": "District Party Committee", + "STA": "State Party Committee", + "BAL": "Ballot Question Committee", + "COU": "County Party Committee", + "POL": "Political Action Committee", + "GUB": "Gubernatorial Committee", + "CAN": "Candidate Committee", + "IND": "Independent Political Action Committee", + } contribution_committee_type_by_year = ( all_year_contribution_dataframe.groupby("doc_stmnt_year")["com_type"] .value_counts() .reset_index() ) + contribution_committee_type_by_year[ + "com_type_full" + ] = contribution_committee_type_by_year["com_type"].map(committee_names) fig = px.bar( contribution_committee_type_by_year, x="doc_stmnt_year", y="count", - color="com_type", + color="com_type_full", barmode="stack", labels={ "doc_stmnt_year": "Year", - "count": "Count", - "com_type": "Committee Tyoe", + "count": "Number of Contributions", + "com_type_full": "Committee Type", }, - title="Stacked Bar Chart for Contributions Committee Types by Year", + title="Michigan Campaign Contribution Committee Types by Year", ) fig.show() @@ -146,25 +159,39 @@ def plot_expenditure_committee_types_by_year( Return: None """ + committee_names = { + "DIS": "District Party Committee", + "STA": "State Party Committee", + "BAL": "Ballot Question Committee", + "COU": "County Party Committee", + "POL": "Political Action Committee", + "GUB": "Gubernatorial Committee", + "CAN": "Candidate Committee", + "IND": "Independent Political Action Committee", + } expenditure_committee_type_by_year = ( all_year_expenditure_dataframe.groupby("doc_stmnt_year")["com_type"] .value_counts() .reset_index() ) +
expenditure_committee_type_by_year[ + "com_type_full" + ] = expenditure_committee_type_by_year["com_type"].map(committee_names) fig = px.bar( expenditure_committee_type_by_year, x="doc_stmnt_year", y="count", - color="com_type", + color="com_type_full", barmode="stack", labels={ "doc_stmnt_year": "Year", - "count": "Count", - "com_type": "Committee Tyoe", + "count": "Number of Expenditures", + "com_type_full": "Committee Type", }, - title="Stacked Bar Chart for Expenditure Committee Types by Year", + title="Michigan Campaign Expenditure Committee Types by Year", ) + fig.show() @@ -191,10 +218,10 @@ def plot_year_schedule_types(all_year_expenditure_dataframe: pd.DataFrame) -> No barmode="stack", labels={ "doc_stmnt_year": "Year", - "count": "Count", + "count": "Number of Expenditures", "schedule_desc": "Schedule Description", }, - title="Stacked Bar Chart for Expenditure Schedule Types by Year", + title="Michigan Campaign Expenditure Schedule Types by Year", ) fig.show()
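The ID handling in `replace_id_with_uuid` and `preprocess` (`utils/pennsylvania.py`) can be sketched as two standalone functions. This is a minimal illustration, not the project's implementation: the names `replace_ids_with_uuids` and `add_transaction_ids` are hypothetical, and the sketch assumes that missing IDs should stay missing rather than receive a UUID. One pitfall is worth keeping in mind: assigning a scalar such as `str(uuid.uuid4())` to a DataFrame column evaluates it once and broadcasts the same string to every row, so per-row IDs need one `uuid4()` call per row.

```python
import uuid

import pandas as pd


def replace_ids_with_uuids(
    df: pd.DataFrame, col1: str, col2: str
) -> tuple[dict, pd.DataFrame]:
    """Swap every raw ID in col1/col2 for a UUID string (hypothetical helper).

    An ID that appears in both columns gets the same UUID in both places.
    Missing IDs (NaN/None) are skipped, so .map() leaves them missing.
    Returns (uuid -> original ID lookup, copy of df with IDs replaced).
    """
    # One pass over both columns so shared IDs receive a single UUID
    raw_ids = pd.concat([df[col1], df[col2]]).dropna().unique()
    forward = {raw_id: str(uuid.uuid4()) for raw_id in raw_ids}
    out = df.copy()
    out[col1] = out[col1].map(forward)
    out[col2] = out[col2].map(forward)
    reverse = {new_id: raw_id for raw_id, new_id in forward.items()}
    return reverse, out


def add_transaction_ids(df: pd.DataFrame) -> pd.DataFrame:
    """Give each row its own transaction ID (hypothetical helper).

    Calls uuid.uuid4() once per row instead of broadcasting one value.
    """
    out = df.copy()
    out["TRANSACTION_ID"] = [str(uuid.uuid4()) for _ in range(len(out))]
    return out
```

Working on copies keeps the caller's DataFrame unchanged; the original method mutates `df` in place, which also works but is easier to misuse when the same frame is reused.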
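The Michigan plotting helpers in `utils/preprocess_mi_campaign_data.py` define the same `committee_names` dictionary inside two functions, and `Series.map` with a dict returns NaN for any code not in the dict, which would drop that bar's legend label. A minimal sketch of sharing the lookup and keeping unknown codes visible follows; `MI_COMMITTEE_NAMES` and `count_committee_types_by_year` are hypothetical names, and the input is assumed to have the `doc_stmnt_year` and `com_type` columns used in the diff.

```python
import pandas as pd

# Hypothetical module-level constant; the diff repeats this dict inside
# plot_committee_types_by_year and plot_expenditure_committee_types_by_year.
MI_COMMITTEE_NAMES = {
    "DIS": "District Party Committee",
    "STA": "State Party Committee",
    "BAL": "Ballot Question Committee",
    "COU": "County Party Committee",
    "POL": "Political Action Committee",
    "GUB": "Gubernatorial Committee",
    "CAN": "Candidate Committee",
    "IND": "Independent Political Action Committee",
}


def count_committee_types_by_year(df: pd.DataFrame) -> pd.DataFrame:
    """Count rows per (doc_stmnt_year, com_type) and add readable names."""
    counts = (
        df.groupby("doc_stmnt_year")["com_type"]
        .value_counts()
        # Name the counts explicitly so reset_index() yields a "count"
        # column without clashing with the "com_type" index level.
        .rename("count")
        .reset_index()
    )
    # .map() gives NaN for codes missing from the dict; fall back to the
    # raw code so those rows still get their own legend entry.
    counts["com_type_full"] = (
        counts["com_type"].map(MI_COMMITTEE_NAMES).fillna(counts["com_type"])
    )
    return counts
```

The returned frame has the same `doc_stmnt_year`, `count`, and `com_type_full` columns that the diff passes to `px.bar`, so both plotting functions could share it.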