CI/CD for Machine Learning
Ravi Bhadauria
Machine Learning Engineer
Sequence of stages defining ML workflow and dependencies
Defined in the dvc.yaml file, where each stage specifies:
dependencies (deps)
the command to run (cmd)
outputs (outs)
metrics and plots
Similar to the GitHub Actions workflow
dvc stage add
dvc stage add \
-n preprocess \
-d raw_data.csv -d preprocess.py \
-o processed_data.csv \
python preprocess.py
dvc.yaml contents:
stages:
  preprocess:
    cmd: python preprocess.py
    deps:
    - preprocess.py
    - raw_data.csv
    outs:
    - processed_data.csv
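The mapping from the dvc stage add flags (-n, -d, -o, and the trailing command) to the dvc.yaml entry can be sketched in Python. This is only an illustration, not DVC's implementation: the helper name stage_entry is hypothetical, and sorting the entries is an assumption based on the order shown in the listing above.

```python
def stage_entry(name, deps, outs, cmd):
    """Build a dvc.yaml-style entry for one stage (illustrative helper, not a DVC API)."""
    return {
        name: {
            "cmd": cmd,
            # Assumption: entries are written in sorted order, as in the listing above.
            "deps": sorted(deps),
            "outs": sorted(outs),
        }
    }


# Same arguments as the `dvc stage add -n preprocess ...` call above.
pipeline = {
    "stages": stage_entry(
        name="preprocess",
        deps=["raw_data.csv", "preprocess.py"],
        outs=["processed_data.csv"],
        cmd="python preprocess.py",
    )
}
print(pipeline)
```

Each -d flag becomes one item under deps, each -o flag one item under outs, and everything after the flags becomes cmd.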
dvc stage add \
-n train \
-d train.py -d processed_data.csv \
-o plots.png -o metrics.txt \
python train.py
dvc.yaml contents:
stages:
  preprocess:
    cmd: python preprocess.py
    deps:
    - preprocess.py
    - raw_data.csv
    outs:
    - processed_data.csv
  train:
    cmd: python train.py
    deps:
    - processed_data.csv
    - train.py
    outs:
    - metrics.txt
    - plots.png
dvc repro
-> dvc repro
Running stage 'preprocess':
> python preprocess.py
Running stage 'train':
> python train.py
Updating lock file 'dvc.lock'
A dvc.lock file is generated
Like a .dvc file, it captures MD5 hashes
Commit it to Git:
git add dvc.lock && git commit -m "first pipeline repro"
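To make the idea of capturing MD5 hashes concrete, here is a minimal sketch of content hashing with Python's standard hashlib. The helper name file_md5 and the throwaway demo file are assumptions for illustration; DVC's own hashing may differ in details, so the values produced here are not guaranteed to match those in dvc.lock.

```python
import hashlib
import os
import tempfile


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo on a temporary file: any change in content changes the hash.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "raw_data.csv")
    with open(path, "w") as fh:
        fh.write("a,b\n1,2\n")
    before = file_md5(path)

    with open(path, "a") as fh:
        fh.write("3,4\n")
    after = file_md5(path)

print(before != after)
```

Because the hash depends only on file content, comparing a stored hash with a freshly computed one is enough to tell whether a dependency has changed.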
-> dvc repro
Stage 'preprocess' didn't change, skipping
Running stage 'train' with command: ...
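The skip decision in the output above can be sketched as a comparison between recorded and current dependency hashes. This is a simplified illustration, not DVC's actual logic: the function name stages_to_run is hypothetical, the hash strings are placeholders, and the sketch ignores outputs and downstream cascading.

```python
def stages_to_run(stages, recorded, current):
    """Return the stages with at least one dependency whose hash changed."""
    to_run = []
    for name, spec in stages.items():
        changed = any(recorded.get(dep) != current.get(dep) for dep in spec["deps"])
        if changed:
            to_run.append(name)
    return to_run


stages = {
    "preprocess": {"deps": ["preprocess.py", "raw_data.csv"]},
    "train": {"deps": ["processed_data.csv", "train.py"]},
}

# Placeholder hashes standing in for the values stored in the lock file.
recorded = {
    "preprocess.py": "h1",
    "raw_data.csv": "h2",
    "processed_data.csv": "h3",
    "train.py": "h4",
}

# Assumed scenario: only train.py was edited since the last run.
current = dict(recorded)
current["train.py"] = "h4-edited"

print(stages_to_run(stages, recorded, current))
```

In this scenario only the train stage is selected, while preprocess is skipped because none of its dependencies changed.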
-> dvc dag
+------------+
| preprocess |
+------------+
       *
       *
       *
  +-------+
  | train |
  +-------+
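The graph above follows from the stage definitions: train depends on processed_data.csv, which preprocess produces. A minimal sketch of deriving that ordering with Python's standard graphlib (Python 3.9+) is shown here. The helper build_graph is hypothetical and is not how DVC itself is implemented.

```python
from graphlib import TopologicalSorter

# Stage definitions taken from the two dvc stage add commands.
stages = {
    "preprocess": {
        "deps": ["raw_data.csv", "preprocess.py"],
        "outs": ["processed_data.csv"],
    },
    "train": {
        "deps": ["train.py", "processed_data.csv"],
        "outs": ["plots.png", "metrics.txt"],
    },
}


def build_graph(stages):
    """Map each stage to the set of stages that produce its dependencies."""
    producer = {out: name for name, spec in stages.items() for out in spec["outs"]}
    return {
        name: {producer[dep] for dep in spec["deps"] if dep in producer}
        for name, spec in stages.items()
    }


graph = build_graph(stages)
order = list(TopologicalSorter(graph).static_order())
print(order)
```

A dependency that no stage produces, such as raw_data.csv or a script, adds no edge, so only the preprocess to train link remains.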
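Summary
Pipelines are described by dvc.yaml and dvc.lock
dvc stage add: add a stage to dvc.yaml
dvc repro: run the pipeline, skipping unchanged stages
dvc dag: show the stage dependency graph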