In this project we analyzed data from the US Census, the CDC, and other sources, and built a machine learning model to predict health outcomes across US counties.
The project consists of three main parts:
- Data collection and preprocessing
- Exploratory Data Analysis (EDA)
- Machine learning model building
▪️ Case_Study.ipynb is the Jupyter notebook containing the project itself.
▪️ sunshine_data_scraper.py contains Python code for scraping average annual sunshine data for each US state.
▪️ Health_outomes_slides.pdf contains the presentation slides for this project.
To reproduce the same working environment, please follow these steps:
- Clone the repository to the desired location on your machine
git clone link-to-the-repo
- Create a conda environment with the correct Python version and activate it
conda create --name myenv python=3.9
conda activate myenv
- Install the ipykernel package and register your environment so that it shows up in the list of available kernels in Jupyter Notebook
conda install ipykernel
python -m ipykernel install --user --name myenv --display-name "myenv"
- Install required packages
conda install pandas==1.3.5
conda install openpyxl==3.0.10
conda install nbformat==5.7.0
conda install seaborn==0.12.2
conda install scipy==1.10.0
conda install scikit-learn==1.2.2
conda install yellowbrick==1.5
- Install Plotly from the plotly conda channel
conda install -c https://conda.anaconda.org/plotly plotly
- Launch Jupyter Notebook from the same directory (the cloned repository)
jupyter notebook
- Change the kernel to "myenv" using the menu bar (Kernel > Change kernel).
Now you can run the notebook cells.
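As an optional sanity check (not part of the original project), a first cell along the lines of the sketch below can confirm that the notebook is running on the "myenv" kernel with the versions pinned above. It assumes the default conda layout, where environments live under `.../envs/<name>/`; if you created the environment with `--prefix` somewhere else, the interpreter check will report a false warning. `EXPECTED` and `check_environment` are hypothetical names introduced only for this example.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the conda install commands above; None means "any version"
# (Plotly is installed without a pin).
EXPECTED = {
    "pandas": "1.3.5",
    "openpyxl": "3.0.10",
    "nbformat": "5.7.0",
    "seaborn": "0.12.2",
    "scipy": "1.10.0",
    "scikit-learn": "1.2.2",
    "yellowbrick": "1.5",
    "plotly": None,
}


def check_environment(expected, env_name="myenv"):
    """Return a list of human-readable problems; an empty list means every check passed."""
    problems = []
    # The environment was created with python=3.9.
    if sys.version_info[:2] != (3, 9):
        problems.append(
            f"Python {sys.version_info.major}.{sys.version_info.minor} (expected 3.9)"
        )
    # Assumption: default conda layout, i.e. the interpreter lives under .../envs/<env_name>/.
    # Backslashes are normalized so the same test works on Windows paths.
    if f"envs/{env_name}/" not in sys.executable.replace("\\", "/"):
        problems.append(
            f"kernel interpreter {sys.executable} is not inside the '{env_name}' environment"
        )
    # Compare each installed distribution's version against its pin.
    for package, pinned in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            problems.append(f"{package} is not installed")
            continue
        if pinned is not None and installed != pinned:
            problems.append(f"{package} {installed} (expected {pinned})")
    return problems


# Print one warning per problem; no output means the environment matches.
for problem in check_environment(EXPECTED):
    print("WARNING:", problem)
```

Exact string comparison is deliberate here: the README pins exact versions, so any difference (even a patch release) is worth a warning.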
Link to Google Colab notebook: Health outcomes analysis in USA notebook
Link to presentation slides: Canva Slides