Developed a machine learning model that predicted the Water Quality Index (WQI) of the Ganga River with 85 percent accuracy, using historical water data.
The primary objective of this project is to develop a predictive model for the Water Quality Index of the River Ganga.
- Data Collection and Preprocessing: Gather comprehensive datasets covering the water quality parameters measured along the Ganges. Preprocess the data to handle missing values and outliers and to ensure compatibility with machine learning algorithms.
- Feature Selection: Identify key features impacting water quality and select a subset for model training to enhance efficiency and interpretability.
- Algorithm Selection: Evaluate and compare the performance of diverse machine learning algorithms, such as Random Forest, Support Vector Machines (SVM), and others, for predicting the Water Quality Index.
- Model Training and Validation: Train the selected models on historical data, utilizing a portion for validation to assess performance and fine-tune hyperparameters.
- Prediction and Visualization: Determine which ML algorithm predicts the water quality level of the river most accurately, and provide real-time predictions of the Water Quality Index.
- Google Colaboratory
- Python (pandas)
- Random Forest Classifier has the highest scores in all metrics (Accuracy, Precision, Recall, and F1 Score) with perfect scores of 1.0000, indicating it perfectly predicts the water quality classifications in the testing set.
- Decision Tree Classifier also performs very well, with high scores on all metrics, though below the Random Forest's perfect scores.
- KNN Classifier and Logistic Regression have similar performance, with decent accuracy and other metric scores.
- SVM Classifier has the lowest scores among the five models, indicating it is less suitable for this classification task compared to the other models.
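The four metrics used in the comparison above can be computed directly from true and predicted class labels. The following is a minimal sketch in plain Python, not the project's actual evaluation code. It assumes macro averaging across classes (the averaging mode used in the project is not stated), and the function name `classification_scores` and the class labels `Good`, `Moderate`, and `Poor` are hypothetical placeholders.

```python
from collections import Counter


def classification_scores(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro averaging is an assumption; the project may use another mode.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter()  # true positives per class
    fp = Counter()  # false positives per class
    fn = Counter()  # false negatives per class
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        p = tp[label] / p_den if p_den else 0.0
        r = tp[label] / r_den if r_den else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical labels for illustration only; not taken from the project data.
y_true = ["Good", "Good", "Poor", "Poor", "Moderate", "Moderate"]
y_pred = ["Good", "Good", "Poor", "Moderate", "Moderate", "Moderate"]

scores = classification_scores(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

A model that classifies every test sample correctly, as reported for the Random Forest, yields 1.0 on all four metrics under this definition.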
Email at [email protected] for contributions.