sudarshan-krishnan/Diabetes-Prediction

Diabetes Prediction Using Machine Learning

This project predicts diabetes with several machine learning classifiers. Using Python's data science libraries, it loads and explores the dataset, preprocesses it, and then trains, evaluates, and compares the models.

Key Features

  • Data Collection & Analysis: Load and explore the diabetes dataset.
  • Data Visualization: Use plots to understand data distribution and relationships.
  • Preprocessing: Handle missing values, encode categorical variables, and scale the data.
  • Model Training: Train multiple machine learning models to predict diabetes.
  • Model Evaluation: Assess model performance using accuracy, confusion matrix, and classification report.
  • Model Comparison: Compare the performance of different models using ROC curves and accuracy scores.
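
The evaluation and comparison features listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the data is synthetic (generated with `make_classification`), only two of the project's model families are shown, and the `models` and `results` names and all hyperparameters are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    roc_auc_score,
    roc_curve,
)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary-classification data standing in for the diabetes dataset.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical model selection; the project also trains KNN, SVM, and others.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # Probability of the positive class, needed for the ROC curve.
    y_score = model.predict_proba(X_test)[:, 1]
    fpr, tpr, _ = roc_curve(y_test, y_score)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "auc": roc_auc_score(y_test, y_score),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
        "roc": (fpr, tpr),
    }
    print(name)
    print(classification_report(y_test, y_pred))
```

The `(fpr, tpr)` pairs stored per model can be passed to a plotting library to overlay the ROC curves, while the `accuracy` and `auc` entries give a single-number comparison.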

Components

  • Python libraries: NumPy, pandas, seaborn, statsmodels, matplotlib, scikit-learn, xgboost, and missingno.
  • Jupyter Notebook for interactive data analysis and model building.

How It Works

  1. Import Dependencies: Import necessary libraries for data manipulation, visualization, and modeling.
  2. Data Collection & Analysis: Load the diabetes dataset and explore its structure, including checking for missing values and basic statistical measures.
  3. Data Visualization: Plot various features to understand their distributions and relationships.
  4. Data Preprocessing: Handle missing values by replacing each one with the median of that feature, computed separately for each class of the target variable (Outcome).
  5. Data Scaling: Scale the data using StandardScaler and RobustScaler.
  6. Model Training: Train various models including Logistic Regression, KNN, SVM, Decision Tree, Random Forest, Gradient Boosting, and XGBoost.
  7. Model Evaluation: Evaluate the models using accuracy, confusion matrix, and classification report.
  8. Model Comparison: Compare models' performance using ROC curves and accuracy scores.
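
Steps 4 and 5 above (median imputation based on Outcome, then scaling) might look something like the sketch below. It assumes missing values are already represented as `NaN`; the small data frame, its feature columns, and the `impute_by_outcome_median` helper are hypothetical and exist only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import RobustScaler, StandardScaler

# Tiny hypothetical frame; only "Outcome" is a column name taken from the README.
df = pd.DataFrame({
    "Glucose": [148.0, 85.0, np.nan, 89.0, 137.0, np.nan],
    "BMI": [33.6, 26.6, 23.3, np.nan, 43.1, 25.6],
    "Outcome": [1, 0, 1, 0, 1, 0],
})


def impute_by_outcome_median(frame, target="Outcome"):
    """Fill each NaN with the median of its column within the same target class."""
    filled = frame.copy()
    features = [c for c in frame.columns if c != target]
    # transform("median") returns a frame aligned row-by-row with the input,
    # holding the per-class median of every feature.
    group_medians = frame.groupby(target)[features].transform("median")
    filled[features] = frame[features].fillna(group_medians)
    return filled


clean = impute_by_outcome_median(df)

# Scale the features (not the target) with both scalers named in step 5.
X = clean.drop(columns="Outcome")
X_std = StandardScaler().fit_transform(X)  # zero mean, unit variance
X_rob = RobustScaler().fit_transform(X)    # centred on the median, scaled by IQR
```

Because this imputation uses the label, it can only be applied to rows whose Outcome is known. `RobustScaler` is less sensitive to outliers than `StandardScaler`, which may be why both are tried.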
