Manufacturing process feature selection and categorization. The project focuses on MLOps best practices.
The project contains semiconductor sensor data and classifies the end product as Pass or Fail. There are ~580 sensor features. Can we build a classification model that uses the best features to make the Pass/Fail prediction?
data source: https://www.kaggle.com/datasets/paresh2047/uci-semcom?resource=download
- Data Set Characteristics: Multivariate
- Number of Instances: 1567
- Area: Computer
- Attribute Characteristics: Real
- Number of Attributes: 591
- Date Donated: 2008-11-19
- Associated Tasks: Classification, Causal-Discovery
- Missing Values? Yes
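Since the dataset has missing values, a first step is usually to inspect and drop the sensor columns that are mostly empty. The sketch below uses a synthetic stand-in DataFrame (column names, the 50% threshold, and the -1/1 label encoding are illustrative assumptions); in the real project you would load the Kaggle CSV instead.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the SECOM data; in the real project you would
# load the Kaggle CSV instead, e.g. df = pd.read_csv("uci-secom.csv").
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 5)),
                  columns=[f"sensor_{i}" for i in range(5)])
df.loc[df.index[:70], "sensor_4"] = np.nan       # one mostly-missing sensor
df["Pass/Fail"] = rng.choice([-1, 1], size=100, p=[0.9, 0.1])

# Drop sensor columns with more than 50% missing values.
missing_frac = df.drop(columns="Pass/Fail").isna().mean()
keep = missing_frac[missing_frac <= 0.5].index
cleaned = df[list(keep) + ["Pass/Fail"]]
print(cleaned.shape)
```

The remaining gaps in the kept columns can then be handled by an imputer inside the model pipeline.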
Task 1) Add experiment tracking and set up a (local) registry server with artifacts in S3 using MLflow
- Experiment tracking (DONE)
- Model registry (DONE)
- Connect to S3 (DONE)
Task 2) Convert notebook into a pipeline
- Define pipeline (DONE)
- Write individual scripts of the pipeline (DONE)
- Create a scikit-learn pipeline that combines preprocessing and the model (DONE)
- finish predict (DONE)
- Use and Test model (DONE)
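The "preprocess and model together" step above could look like the sketch below, which bundles imputation, scaling, and a classifier in one scikit-learn `Pipeline`. The specific components (mean imputer, standard scaler, logistic regression) and the synthetic data are assumptions for illustration, not necessarily what the project uses.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the ~590 sensor features; NaNs mimic missing readings.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 20))
X[rng.random(X.shape) < 0.05] = np.nan
y = rng.choice([0, 1], size=200)

# Preprocessing and model live in one object, so the exact same
# transformations are applied at training and at prediction time.
pipe = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipe.fit(X, y)
preds = pipe.predict(X)
print(preds[:5])
```

Pickling this single `pipe` object (e.g. logging it to MLflow) keeps the preprocessing and the model in sync for the prediction service.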
Task 3) Add orchestration with Prefect
- Added Prefect flow in train.py (DONE) (12/08/22)
- Fully deploy Prefect
Task 4) Add Monitoring
- Add Monitoring service (DONE) (19/08/22)
- Connect mlflow to monitoring image (DONE) (23/08/22)
Task 5) Deploy the model on AWS, connecting Kinesis and a Lambda function
- Predict using Kinesis streams
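A Kinesis record for a sensor reading can be sketched as below. The stream name and the JSON field names are assumptions, not the project's actual schema; the boto3 call is shown commented out since it needs AWS credentials.

```python
import json

# Build the payload for one sensor reading. Field names ("id", "sensors")
# are hypothetical, not the project's actual schema.
def build_record(sensor_values, record_id):
    return {
        "Data": json.dumps({"id": record_id, "sensors": sensor_values}).encode(),
        "PartitionKey": str(record_id),
    }

record = build_record([0.12, -1.4, 3.3], record_id=1)
print(record["PartitionKey"])

# With AWS credentials configured, the record could then be sent with boto3:
# import boto3
# kinesis = boto3.client("kinesis")
# kinesis.put_record(StreamName="semicon-sensors", **record)
```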
Task 6) Best practices --> create tests, linting and pre-commit hooks
- Add tests: Prefect tasks (DONE) (15/08/22), integration tests (DONE) (20/08/22)
- Add pre-commit hooks (DONE) (20/08/22)
- Add linting (DONE) (20/08/22)
- Add Makefile (DONE) (22/08/22)
- Add CI/CD (DONE) (22/08/22)
Task 7) Final touches
- Make README.md nicer (DONE) (03/09/22)
- Write project description (DONE) (03/09/22)
- Write instructions (DONE) (03/09/22)
Requirements:
- Docker installed
- AWS credentials added to the docker-compose file and a model available in S3.
For requirement 2) you can also use the bundled model.pkl: uncomment line 46 in prediction_service/app.py and comment out ("#") anything related to S3.
docker compose -f docker-compose.yml up --build
to test it run:
bash build_test_shut.sh
then you can run:
python .\send_data.py
finally:
python ./prefect_monitoring/prefect_monitoring.py
This will create an html file with the report
For the data drift you can check the Grafana container on port 3000 :)
Currently this project focuses on MLOps; it is weak on the actual ML pipeline. Some ideas for improvement:
- Apply L1 regularization in a feature selection step.
- Try other classification models and grid search.
- Use Docker Compose or any other method to automatically push to ECS.
- Use Kinesis and, through a Lambda function, send stream data to ECS.
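The first two ideas can be combined in one scikit-learn sketch: an L1-penalized logistic regression drives uninformative coefficients to exactly zero, `SelectFromModel` keeps only the surviving features, and `GridSearchCV` tunes the downstream classifier. The synthetic data and the parameter grid are illustrative assumptions.

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic stand-in data; the real features come from the SECOM sensors.
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 30))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # only two informative features

pipe = Pipeline([
    # L1 penalty zeroes out uninformative coefficients, so
    # SelectFromModel keeps only the surviving features.
    ("select", SelectFromModel(
        LogisticRegression(penalty="l1", solver="liblinear", C=0.1))),
    ("clf", LogisticRegression(max_iter=1000)),
])
grid = GridSearchCV(pipe, {"clf__C": [0.1, 1.0, 10.0]}, cv=3)
grid.fit(X, y)
print(grid.best_params_)
```

Putting the selector inside the pipeline keeps the feature selection inside the cross-validation loop, so the grid search does not leak information from held-out folds.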
use:
$pipenv install
in train.py and main_notebook.ipynb, call:
mlflow.create_experiment("semicon-sensor-clf","[your S3 bucket]")
(or use a local file)
run: predict.py
go to: http://localhost:8001/docs
press "Try it out"
use the example from test_one_input.txt (it should give output "0")
run: pytest
docker compose -f docker-compose.yml up --build
python .\send_data.py
If you are on windows and want to run a Makefile go to: https://chocolatey.org/install and follow the instructions
run this in gitbash:
make build
It will run the needed tests, then build the image and run the container
after this you can run:
python send_data.py
python ./prefect_monitoring/prefect_monitoring.py
This will create an html file with the report
pipenv:
$pipenv install
$pipenv install --dev [library]
$pipenv --venv
mlflow:
mlflow server --backend-store-uri=sqlite:///mlflow.db --default-artifact-root=s3://mlflow-semicon-clf/
use:
mlflow.set_tracking_uri("sqlite:///mlflow.db")
# mlflow.set_experiment("testing-mlflow")
mlflow.create_experiment("semicon-sensor-clf", "s3://mlflow-semicon-clf/")
mlflow.set_experiment("semicon-sensor-clf")
linting and black:
pylint --recursive=y train.py predict.py ./prefect_monitoring/prefect_monitoring.py ./prediction_service/app.py
black --skip-string-normalization --diff train.py predict.py ./prefect_monitoring/prefect_monitoring.py
black --skip-string-normalization train.py predict.py ./prefect_monitoring/prefect_monitoring.py ./prediction_service/app.py
git:
pre-commit
prefect:
prefect orion start
aws ecs:
docker compose --project-name semicontest -f docker-compose.yml up --build