In this project I use machine learning to classify a water sample as safe or not safe based on the concentrations of 20 constituents in the sample.
The dataset was taken from Kaggle (link below). After preprocessing and exploratory data analysis, several classification models were fitted to the dataset, and a good accuracy was achieved.
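The exact preprocessing lives in the notebook and is not described here. As one hedged example, a minimal cleaning pass with Pandas could coerce every column to numeric, drop rows that fail to parse, and split features from the label. The label column name `is_safe` and the helper `clean_water_data` are assumptions for illustration, not code taken from the notebook, and the demo values are toy numbers rather than real rows from the dataset.

```python
import pandas as pd

LABEL_COLUMN = "is_safe"  # assumed name of the target column


def clean_water_data(df: pd.DataFrame, label: str = LABEL_COLUMN):
    """Hypothetical cleaning step: coerce to numeric, drop bad rows, split X/y."""
    # Any cell that cannot be parsed as a number (e.g. a spreadsheet error
    # string) becomes NaN instead of raising an error.
    numeric = df.apply(pd.to_numeric, errors="coerce")
    # Drop every row that has at least one unparseable value.
    numeric = numeric.dropna().reset_index(drop=True)
    X = numeric.drop(columns=[label])
    y = numeric[label].astype(int)
    return X, y


if __name__ == "__main__":
    # Toy frame: the middle row has a non-numeric placeholder and is dropped.
    raw = pd.DataFrame({
        "feature_1": ["1.0", "2.0", "3.0"],
        "feature_2": ["4.0", "#NUM!", "6.0"],
        "is_safe": ["1", "1", "0"],
    })
    X, y = clean_water_data(raw)
    print(X.shape, y.tolist())  # (2, 2) [1, 0]
```

Dropping rows is the simplest choice; imputing missing values would keep more data at the cost of extra assumptions.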
The model was then partially deployed with Anvil, an online tool that connects a notebook directly to a web interface.
Going one step further, I exported the trained model to a file with joblib and then containerized it using Docker.
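Exporting and re-loading a model with joblib comes down to `joblib.dump` and `joblib.load`. The sketch below is a hedged illustration, not the project's code: the helper names `export_model` and `import_model` are hypothetical, and a plain dict stands in for the fitted classifier so the example runs without any ML library.

```python
import os
import tempfile

import joblib


def export_model(model, path: str) -> str:
    """Serialize a fitted model (any picklable object) to `path` with joblib."""
    joblib.dump(model, path)
    return path


def import_model(path: str):
    """Load a model previously written with joblib.dump."""
    return joblib.load(path)


if __name__ == "__main__":
    # Stand-in for a trained classifier, so the sketch has no ML dependency.
    fake_model = {"kind": "placeholder", "threshold": 0.5}
    with tempfile.TemporaryDirectory() as tmp:
        path = export_model(fake_model, os.path.join(tmp, "model.joblib"))
        restored = import_model(path)
        print(restored == fake_model)  # True
```

Inside the container, the server would typically call `joblib.load` once at startup on the `model.joblib` file in `app` rather than on every request. Note that joblib files are pickles, so the library versions used for loading should match those used for saving.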
Repository contents:
- app - contains server.py (creates the API endpoints with FastAPI) and model.joblib (the exported model)
- Dockerfile - builds the container image
- client.py - a small test script for sending data to the API
- requirements.txt - lists the Python dependencies installed when building the container
- water-quality-notebook.ipynb - Google Colab notebook with the preprocessing, data analysis and model fitting
- water-quality-pca.ipynb - bonus notebook that applies PCA to retain most of the variance with fewer features
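The PCA notebook's exact steps are not shown here. A minimal NumPy sketch of the idea, assuming standardized features and a cumulative explained-variance cutoff (the 95% default and the helper names are assumptions), computes each component's share of the variance from the singular values and picks the smallest number of components that reaches the cutoff.

```python
import numpy as np


def explained_variance_ratio(X: np.ndarray) -> np.ndarray:
    """Share of total variance captured by each principal component, largest first."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns
    Z = (X - X.mean(axis=0)) / std  # standardize: zero mean, unit variance
    # For centered data, squared singular values are proportional to the
    # variance along each principal direction (returned in descending order).
    s = np.linalg.svd(Z, compute_uv=False)
    var = s ** 2
    return var / var.sum()


def components_for_variance(X: np.ndarray, threshold: float = 0.95) -> int:
    """Smallest number of components whose cumulative variance reaches `threshold`."""
    cumulative = np.cumsum(explained_variance_ratio(X))
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(cumulative))  # guard against rounding just below 1.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a, b = rng.normal(size=200), rng.normal(size=200)
    # Four columns, but only two independent directions of variation.
    X = np.column_stack([a, b, a + b, 2 * a])
    print(explained_variance_ratio(X).round(3))
    print(components_for_variance(X, 0.99))  # 2
```

Standardizing first matters here because the concentrations are on very different scales; without it, the features with the largest raw values would dominate the leading components.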
Dataset: https://www.kaggle.com/datasets/mssmartypants/water-quality
Sample input (request body with the 20 feature values):
{"features": [0.940, 14.470, 0.030, 2.880, 0.003, 0.800, 0.430, 1.380, 0.110, 0.670, 0.670, 0.135, 9.750, 1.890, 0.006, 27.170, 5.420, 0.080, 0.190, 0.020]}
Output: 1 (safe)
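A client along the lines of client.py could send the sample above as JSON using only the standard library. This is a hedged sketch: the URL `http://localhost:8000/predict`, the helpers `build_request` and `predict`, and the assumption that the server replies with JSON are all hypothetical, since the actual endpoint path and response format are defined in server.py.

```python
import json
import urllib.request

# Hypothetical endpoint: host, port and the /predict path are assumptions.
API_URL = "http://localhost:8000/predict"
N_FEATURES = 20  # one concentration per constituent


def build_request(features, url: str = API_URL) -> urllib.request.Request:
    """Validate a feature vector and wrap it in a JSON POST request."""
    if len(features) != N_FEATURES:
        raise ValueError(f"expected {N_FEATURES} features, got {len(features)}")
    body = json.dumps({"features": [float(x) for x in features]}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(features, url: str = API_URL):
    """Send the request and decode the JSON reply (requires a running server)."""
    with urllib.request.urlopen(build_request(features, url)) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage, with the container running and the port published:
#   sample = [0.940, 14.470, 0.030, 2.880, 0.003, 0.800, 0.430, 1.380, 0.110, 0.670,
#             0.670, 0.135, 9.750, 1.890, 0.006, 27.170, 5.420, 0.080, 0.190, 0.020]
#   print(predict(sample))
```

Checking the vector length on the client side gives a clear error before any network call; the server should still validate the input itself.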
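- Python - programming language for ML
- NumPy, Pandas - used for loading and working with the dataset
- Anvil - used for building the interface
- joblib - used for exporting the trained model
- FastAPI - used for creating the API endpoints
- Docker - used for containerization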