Disease Prediction Machine Learning Project

Overview

This repository contains a machine-learning project that classifies diseases from a pool of over 130 symptoms. The goal is a model that accurately predicts an individual's disease from a given combination of symptoms.

Dataset

The dataset used for this project consists of 4,920 samples and was obtained from Kaggle.

Approach

In this project, I trained and evaluated several machine learning models from Scikit-learn. The process involves the following steps:

Importing and preprocessing data: Cleaning the dataset, handling missing values, and scaling features.
Model training: Training each selected algorithm on the preprocessed dataset.
Model evaluation: Assessing the performance of each trained model using metrics such as accuracy, precision, recall, and F1-score.
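
The steps above can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: the small synthetic symptom table, the column names, the disease labels, and the choice to fill missing values with zeros are all assumptions made so that the example runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for the Kaggle data: binary symptom flags plus a label.
rng = np.random.default_rng(0)
symptoms = ["itching", "fever", "cough", "fatigue"]  # illustrative names only
X = pd.DataFrame(rng.integers(0, 2, size=(200, 4)), columns=symptoms).astype(float)

# Toy label that depends only on the first two symptoms, so it is learnable.
y = np.where(
    X["itching"] == 1,
    "disease_a",
    np.where(X["fever"] == 1, "disease_b", "disease_c"),
)

# Simulate a few missing entries in a column the label does not depend on.
X.loc[X.index % 25 == 0, "cough"] = np.nan

# 1. Preprocessing: fill missing values (assumed strategy), split, then scale.
X = X.fillna(0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
scaler = StandardScaler().fit(X_train)  # fit on training data only
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# 2. Training: a decision tree, the model named in the Results section.
model = DecisionTreeClassifier(random_state=0).fit(X_train_scaled, y_train)

# 3. Evaluation: accuracy plus macro-averaged precision, recall, and F1.
y_pred = model.predict(X_test_scaled)
accuracy = accuracy_score(y_test, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="macro", zero_division=0
)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

Scaling does not change the splits a decision tree finds, but it is kept here because the steps list it and because other Scikit-learn models are sensitive to feature scale.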

Results

Currently, the model trained with a Decision Tree classifier achieves an accuracy score of 99.4%.

Usage

To run this project, follow these steps:

Open this link to access the Google Colab file and make a copy.
Download the dataset files from this repository.
Run the Google Colab notebook to preprocess the data, train the model, make predictions, and save the model.
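
The final step, saving the model, might look something like the sketch below. It assumes joblib is used for persistence, which is a common choice with Scikit-learn but is not confirmed by this README; the toy data, the labels, and the file name `disease_model.joblib` are hypothetical.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: two binary symptom flags per sample.
X = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
y = np.array(["disease_a", "disease_b", "disease_a", "disease_c"])

model = DecisionTreeClassifier(random_state=0).fit(X, y)
original_pred = model.predict(X)

# Save the fitted model to disk, then load it back as a separate object.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "disease_model.joblib"  # hypothetical name
    joblib.dump(model, model_path)
    restored = joblib.load(model_path)

# The restored model should give the same predictions as the original.
restored_pred = restored.predict(X)
print(restored_pred)
```

In Colab, the saved file would be written to the session's file system and would need to be downloaded or copied elsewhere to outlive the session.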

Technologies Used

Programming languages: Python

Libraries: Scikit-learn, Pandas, NumPy, Matplotlib, Seaborn

Future Improvements

Some potential enhancements for this project include:

Exploring how the order of severity affects the model's accuracy.
Tuning hyperparameters to optimize the model's performance.
Deploying the trained model as a web application or API for real-time predictions.
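
As one possible starting point for the hyperparameter tuning mentioned above, the sketch below runs a small grid search over a decision tree. The parameter grid, the number of folds, and the synthetic data are assumptions chosen for illustration, not settings taken from this project.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: five binary symptom flags, four classes.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(120, 5))
y = X[:, 0] + 2 * X[:, 1]  # label depends on the first two flags only

# Candidate settings to compare (illustrative values).
param_grid = {
    "max_depth": [2, 4, None],
    "min_samples_leaf": [1, 5],
    "criterion": ["gini", "entropy"],
}

# Evaluate every combination with 3-fold cross-validation on accuracy.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best cross-validated accuracy:", round(search.best_score_, 3))
```

Cross-validated scores like this are also a useful check on a very high single-split accuracy, since they show how stable the result is across different partitions of the data.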

Acknowledgments

Special thanks to Pranay Patil for providing the dataset used in this project.
