This project analyzes risk factors associated with diabetes. Using a dataset of health metrics and demographic information, I identified key predictors of diabetes through data cleaning, exploratory data analysis (EDA), feature engineering, and machine learning models that predict the likelihood of diabetes from these factors.
- Data Cleaning: Addressed missing values, outliers, and inconsistencies in the dataset to ensure accurate analysis.
- Exploratory Data Analysis (EDA): Conducted a thorough EDA to understand the distribution of variables, relationships between features, and potential patterns in the data.
- Feature Engineering: Created new features to improve model performance, including BMI categories and age groups.
- Machine Learning: Implemented models such as Logistic Regression, Decision Trees, and Random Forest to predict diabetes risk. Evaluated model performance using metrics like accuracy, precision, recall, and F1-score.
- Results & Insights: Identified significant risk factors for diabetes, including age, BMI, blood pressure, and family history. The final model provided a reliable method for predicting diabetes risk, with practical implications for early intervention and prevention.
- Python
- Pandas
- NumPy
- Matplotlib
- Seaborn
- Scikit-learn
- Jupyter Notebook
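
As a rough illustration of the data cleaning and feature engineering steps listed above, the sketch below uses pandas on a small made-up table. The column names (`age`, `bmi`, `blood_pressure`), the median imputation, and the bin edges are assumptions chosen for the example, not the project's actual schema or thresholds.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
df = pd.DataFrame({
    "age": [25, 47, 62, 33, 71, 58],
    "bmi": [22.5, 31.2, np.nan, 27.8, 35.1, 17.9],
    "blood_pressure": [72, 88, 95, 0, 90, 66],  # 0 stands in for an implausible reading
})


def clean(frame: pd.DataFrame) -> pd.DataFrame:
    """Mark implausible readings as missing, then impute with the column median."""
    out = frame.copy()
    # Assumption: a blood pressure of 0 is a data-entry placeholder, not a measurement.
    out["blood_pressure"] = out["blood_pressure"].mask(out["blood_pressure"] == 0)
    for col in ["bmi", "blood_pressure"]:
        out[col] = out[col].fillna(out[col].median())
    return out


def add_features(frame: pd.DataFrame) -> pd.DataFrame:
    """Add categorical BMI and age features using left-closed bins."""
    out = frame.copy()
    # Assumed BMI bands: [0, 18.5), [18.5, 25), [25, 30), [30, inf)
    out["bmi_category"] = pd.cut(
        out["bmi"],
        bins=[0, 18.5, 25, 30, np.inf],
        labels=["underweight", "normal", "overweight", "obese"],
        right=False,
    )
    # Assumed age bands: [0, 30), [30, 45), [45, 60), [60, inf)
    out["age_group"] = pd.cut(
        out["age"],
        bins=[0, 30, 45, 60, np.inf],
        labels=["<30", "30-44", "45-59", "60+"],
        right=False,
    )
    return out


prepared = add_features(clean(df))
print(prepared)
```

Median imputation is only one option here; dropping incomplete rows or using a model-based imputer may suit a real dataset better, depending on how much data is missing.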
The analysis found that age, BMI, blood pressure, and family history are significant predictors of diabetes. The predictive model built in this project can support early detection of high-risk individuals, which is crucial for preventing and managing the disease effectively.
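
One way to compare the three model types named earlier and to see which features a model relies on is sketched below with scikit-learn. The data is synthetic and generated inside the example, the feature names are assumed, and the hyperparameters are arbitrary, so the printed numbers say nothing about the project's actual results.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real dataset; feature names are assumptions.
rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame({
    "age": rng.integers(20, 80, n),
    "bmi": rng.normal(28, 5, n),
    "blood_pressure": rng.normal(80, 10, n),
    "family_history": rng.integers(0, 2, n),
})
# Made-up relationship between the features and the label, for illustration only.
logit = (
    0.04 * (X["age"] - 50)
    + 0.15 * (X["bmi"] - 28)
    + 0.03 * (X["blood_pressure"] - 80)
    + 0.8 * X["family_history"]
    - 0.4
)
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    # Scaling helps logistic regression converge; tree models do not need it.
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    scores[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
        "f1": f1_score(y_test, pred, zero_division=0),
    }

print(pd.DataFrame(scores).T.round(3))

# Impurity-based importances from the random forest, highest first.
importances = pd.Series(
    models["random_forest"].feature_importances_, index=X.columns
).sort_values(ascending=False)
print(importances)
```

Impurity-based importances can favor continuous features over binary ones such as `family_history`, so `sklearn.inspection.permutation_importance` on the held-out split is worth checking as a second opinion.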
Beyond the insights into the factors that influence diabetes, this project demonstrates my ability to carry out an end-to-end data analysis and build predictive models. The approach is relevant to healthcare settings, where it could help identify high-risk individuals and guide preventive measures.