This project shows how to realize MLOps in Git/GitHub. To achieve this, it heavily leverages tools such as DVC, DVC Studio, and DVCLive (all products built by iterative.ai), along with Google Drive, Jarvislabs.ai, and HuggingFace Hub.
- Click "Use this template" button to create your own repository
- Wait for few seconds, then
Initial Setup
PR will be automatically created - Merge the PR, and you are good to go
- Run `pip install -r requirements.txt` ([requirements.txt](requirements.txt))
- Run `dvc init` to enable DVC
- Add your data under the `data` directory
- Run `git rm -r --cached 'data' && git commit -m "stop tracking data"` (so that the data is tracked by DVC instead of Git)
- Run `dvc add [ADDED FILE OR DIRECTORY]` to track your data with DVC
- Run `dvc remote add -d gdrive_storage gdrive://[ID of specific folder in gdrive]` to add Google Drive as the remote data storage
- Run `dvc push`; a URL for authentication will be printed. Copy and paste it into your browser and authenticate
- Copy the content of `.dvc/tmp/gdrive-user-credentials.json` and put it in a GitHub Secret named `GDRIVE_CREDENTIAL`
- Run `git add . && git commit -m "initial commit" && git push origin main` to keep the initial setup
- Write your own pipeline under the `pipeline` directory. Code for a basic image classification task in TensorFlow is provided initially.
- Run the following `dvc stage add` command for the training stage
```bash
# if you want to use Iterative Studio / DVCLive for tracking training progress
$ dvc stage add -n train \
      -p train.train_size,train.batch_size,train.epoch,train.lr \
      -d pipeline/modeling.py -d pipeline/train.py -d data \
      --plots-no-cache dvclive/scalars/train/loss.tsv \
      --plots-no-cache dvclive/scalars/train/sparse_categorical_accuracy.tsv \
      --plots-no-cache dvclive/scalars/eval/loss.tsv \
      --plots-no-cache dvclive/scalars/eval/sparse_categorical_accuracy.tsv \
      -o outputs/model \
      python pipeline/train.py outputs/model

# if you want to use W&B for tracking training progress
$ dvc stage add -n train \
      -p train.train_size,train.batch_size,train.epoch,train.lr \
      -d pipeline/modeling.py -d pipeline/train_wandb.py -d data \
      -o outputs/model \
      python pipeline/train_wandb.py outputs/model
```
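The `dvclive/scalars/*.tsv` files registered with `--plots-no-cache` are produced by DVCLive's Keras callback inside the training script. The following is a minimal, self-contained sketch of that wiring, using a toy MNIST model rather than the code in `pipeline/`, and assuming an older dvclive 0.x release whose Keras callback is named `DvcLiveCallback` (newer releases expose `DVCLiveCallback` and use a different output layout):

```python
import tensorflow as tf
from dvclive.keras import DvcLiveCallback  # DVCLiveCallback in newer dvclive releases

# Toy stand-ins for the real model/data built in pipeline/modeling.py and pipeline/train.py.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["sparse_categorical_accuracy"],
)

(x_train, y_train), (x_val, y_val) = tf.keras.datasets.mnist.load_data()

# The callback records per-epoch loss and sparse_categorical_accuracy for the
# training and validation runs as TSV files under dvclive/, which the stage
# definition above registers as DVC plots.
model.fit(
    x_train, y_train,
    validation_data=(x_val, y_val),
    epochs=2,
    callbacks=[DvcLiveCallback()],
)
```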
- Run the following `dvc stage add` command for the evaluate stage
```bash
# if you want to use Iterative Studio / DVCLive for tracking training progress
$ dvc stage add -n evaluate \
      -p evaluate.test,evaluate.batch_size \
      -d pipeline/evaluate.py -d data/test -d outputs/model \
      -M outputs/metrics.json \
      python pipeline/evaluate.py outputs/model

# if you want to use W&B for tracking training progress
$ dvc stage add -n evaluate \
      -p evaluate.test,evaluate.batch_size \
      -d pipeline/evaluate.py -d data/test -d outputs/model \
      python pipeline/evaluate.py outputs/model
```
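The `-p` flags in the two stages above assume `params.yaml` exposes `train.*` and `evaluate.*` keys, and `-M outputs/metrics.json` assumes `pipeline/evaluate.py` finishes by writing a small JSON metrics file. A rough sketch under those assumptions (the values and the placeholder evaluation are illustrative, not the repo's actual code):

```python
import json
import pathlib

import yaml

# params.yaml is expected to look roughly like this (values are illustrative):
# train:
#   train_size: 0.8
#   batch_size: 32
#   epoch: 10
#   lr: 0.001
# evaluate:
#   test: data/test
#   batch_size: 32
with open("params.yaml") as f:
    params = yaml.safe_load(f)
eval_cfg = params["evaluate"]

# ... load outputs/model and evaluate it on eval_cfg["test"] here ...
loss, accuracy = 0.35, 0.91  # placeholder numbers for the sketch

# Because the stage declares -M outputs/metrics.json, DVC tracks this file as
# plain-text metrics, so `dvc metrics show` / `dvc metrics diff` can compare runs.
pathlib.Path("outputs").mkdir(exist_ok=True)
with open("outputs/metrics.json", "w") as f:
    json.dump({"loss": loss, "sparse_categorical_accuracy": accuracy}, f, indent=2)
```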
- Update `params.yaml` as you need
- Run `git add . && git commit -m "add initial pipeline setup" && git push origin main`
- Run `dvc repro` to run the pipeline initially
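The next two steps assume a compressed copy of the trained model exists at `outputs/model.tar.gz`. If your pipeline does not already produce it, one way to create the archive from the SavedModel directory is sketched below (an assumption about packaging, not necessarily how this repo does it; `tar -czf` from the shell works just as well):

```python
import tarfile

# Package the SavedModel directory written by the train stage (outputs/model)
# into the archive that dvc add / dvc push and the HF deployment step expect.
with tarfile.open("outputs/model.tar.gz", "w:gz") as tar:
    tar.add("outputs/model", arcname="model")
```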
- Run `dvc add outputs/model.tar.gz` to add the compressed version of the model
- Run `dvc push outputs/model.tar.gz`
- Run `echo "/pipeline/__pycache__" >> .gitignore` to ignore an unnecessary directory
- Run `git add . && git commit -m "add initial pipeline run" && git push origin main`
- Add the access token and user email of JarvisLabs.ai to GitHub Secret as `JARVISLABS_ACCESS_TOKEN` and `JARVISLABS_USER_EMAIL`
- Add a GitHub access token to GitHub Secret as `GH_ACCESS_TOKEN`
- Create a PR and write `#train --with dvc` as a comment (you have to be the owner of the repo)
- Add W&B's project name to GitHub Secret as `WANDB_PROJECT`
- Add W&B's API key to GitHub Secret as `WANDB_API_KEY`
- Use `#train --with wandb` instead of `#train --with dvc`
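For reference, the W&B route relies on `pipeline/train_wandb.py` logging through the `wandb` Keras callback; `WANDB_PROJECT` and `WANDB_API_KEY` are read from the environment, which is why they are stored as GitHub Secrets. A minimal sketch with a toy model (the actual script in `pipeline/` may differ):

```python
import tensorflow as tf
import wandb
from wandb.keras import WandbCallback

# The project name and API key are picked up from the WANDB_PROJECT /
# WANDB_API_KEY environment variables injected from the GitHub Secrets above.
wandb.init()

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["sparse_categorical_accuracy"],
)

(x_train, y_train), (x_val, y_val) = tf.keras.datasets.mnist.load_data()
model.fit(
    x_train, y_train,
    validation_data=(x_val, y_val),
    epochs=2,
    callbacks=[WandbCallback()],  # streams per-epoch metrics to the W&B project
)
```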
- Add the access token of HuggingFace to GitHub Secret as `HF_AT`
- Add the username of HuggingFace to GitHub Secret as `HF_USER_ID`
- Write `#deploy-hf` as a comment in the PR you want to deploy to HuggingFace Space
  - GitHub Action assumes your model is archived as `model.tar.gz` under the `outputs` directory
  - Also, GitHub Action assumes your HuggingFace Space app is written in Gradio under the `hf-space` directory. You need to change `app_template.py` as you need (you shouldn't remove any environment variables in the file).
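For orientation, a Gradio Space app in this setup is a small script that loads the unpacked model and wraps it in an interface. The sketch below is hypothetical: the environment variable name `MODEL_PATH` and the label mapping are placeholders, while the real `hf-space/app_template.py` defines its own environment variables, which must be kept in place:

```python
import os

import gradio as gr
import numpy as np
import tensorflow as tf

# Hypothetical variable name for this sketch; the real app_template.py defines
# its own env variables, which the GitHub Action fills in and which should not
# be removed.
MODEL_PATH = os.environ.get("MODEL_PATH", "model")

model = tf.keras.models.load_model(MODEL_PATH)

def predict(image):
    # The image arrives as a numpy array from the Gradio Image component.
    probs = model.predict(np.expand_dims(image, axis=0))[0]
    return {str(i): float(p) for i, p in enumerate(probs)}

gr.Interface(fn=predict, inputs=gr.Image(), outputs=gr.Label()).launch()
```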
- Write solid steps to reproduce this repo for other tasks
- Support W&B for tracking the training process instead of DVCLive
- Deploy experimental model to HF Space
- Deploy current model to GKE with auto TFServing deployment project
- Add more cloud providers offering GPU VMs
- Integrate more managed services for management
- W&B Artifact for dataset/model versioning and experiment tracking
- HuggingFace for dataset/model versioning
- Integrate more managed services for deployment
- Add more example codebase (pipeline)
- TensorFlow based Object Detection
- PyTorch based Image Classification
- HuggingFace Transformers
- DVC (Data Version Control): Manages data somewhere else (e.g. cloud storage) while keeping the version and remote information in a metadata file in the Git repository.
- DVCLive: Provides callbacks for ML frameworks (e.g. TensorFlow/Keras) to record metrics during training in TSV format.
- DVC Studio: Visualizes the metrics from files in the Git repository. What to visualize is recorded in `dvc.yaml`.
- Google Drive: Used as a remote data repository. However, you can use others such as AWS S3, Google Cloud Storage, or your own file server.
- Jarvislabs.ai: Used to provision cloud GPU VM instances to run each experiment.