We have a DVC Pipeline defined in the dvc.yaml file.
The pipeline is composed of stages using Python scripts, defined in src:
flowchart TD
node2[eval]
node3[get-data]
node4[split-data]
node5[train]
node3-->node4
node4-->node2
node4-->node5
node5-->node2
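The graph above also determines the order in which the stages have to run. As a minimal sketch (not DVC's actual scheduler), the same dependencies can be written out in Python and sorted topologically with the standard library's graphlib. The `STAGE_DEPS` mapping and the `execution_order` helper are illustrative names, and the mapping is a hand transcription of the flowchart edges:

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages it depends on,
# transcribed by hand from the flowchart edges.
STAGE_DEPS = {
    "get-data": set(),
    "split-data": {"get-data"},
    "train": {"split-data"},
    "eval": {"split-data", "train"},
}


def execution_order(deps):
    """Return one valid order in which the stages can run."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print(execution_order(STAGE_DEPS))
```

Because eval depends on both split-data and train, this graph has only one valid order: get-data, split-data, train, eval.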
We use DVC Params, defined in params.yaml, to configure the pipeline.
The pipeline enables local reproducibility and can be run with dvc repro:
git clone git@github.com:iterative/workshop-uncool-mlops.git
cd workshop-uncool-mlops
pip install -r requirements.txt
dvc repro
To evaluate model performance, the pipeline generates DVC Metrics and DVC Plots, which can be found in outs.
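DVC metrics files are plain JSON, YAML, or TOML files, so a stage only needs to write one into outs. The following is a rough sketch of what an evaluation script could do, not the code used in this repo: the `write_metrics` helper, the `metrics.json` file name, and the metric names and values are all hypothetical.

```python
import json
from pathlib import Path


def write_metrics(metrics, out_dir="outs", filename="metrics.json"):
    """Write a flat dict of metric values to a JSON file and return its path.

    The file name and metric names are placeholders, not the ones used in this repo.
    """
    out_path = Path(out_dir) / filename
    # Create the output directory if it does not exist yet.
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(metrics, indent=2))
    return out_path


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    path = write_metrics({"accuracy": 0.0, "f1": 0.0})
    print(path.read_text())
```

A small text file like this is cheap to version, so it can be committed next to the code that produced it.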
You can connect the repo to https://studio.iterative.ai/ to get a better visualization of the metrics, parameters, and plots associated with each commit:
https://studio.iterative.ai/user/daavoo/views/workshop-uncool-mlops-5fgmd70rkt
Because the metrics and plots files are small enough to be tracked by Git, we can share the results with others after running the pipeline:
git add dvc.lock outs
git commit -m "Run pipeline"
git push
However, the rest of the outputs are gitignored because they are too big to be tracked by Git.