CI/CD for Machine Learning
Ravi Bhadauria
Machine Learning Engineer

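DVC pipelines
- A sequence of stages defining the ML workflow and the dependencies between them
- Defined in a dvc.yaml file, where each stage specifies:
  - input dependencies (deps)
  - the command to run (cmd)
  - outputs (outs)
  - special outputs such as metrics and plots
- Similar to a GitHub Actions workflow, which is also defined in YAML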
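Adding a stage with dvc stage add (-n names the stage, each -d declares a dependency, each -o declares an output, and the remaining arguments are the command to run):
dvc stage add \
  -n preprocess \
  -d raw_data.csv -d preprocess.py \
  -o processed_data.csv \
  python preprocess.py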
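The stage runs a preprocess.py that reads raw_data.csv and writes processed_data.csv, but the slides do not show that script. Below is a hypothetical minimal version using only the standard library. It assumes raw_data.csv has a header row, and its cleaning rule (drop rows with an empty field) is purely illustrative, not the course's actual preprocessing.

```python
import csv
from pathlib import Path

# Hypothetical preprocess.py: the cleaning rule below is an assumption.
RAW_PATH = Path("raw_data.csv")              # declared with -d in the stage
PROCESSED_PATH = Path("processed_data.csv")  # declared with -o in the stage


def preprocess(raw_path: Path, processed_path: Path) -> int:
    """Copy raw_path to processed_path, dropping rows with any empty field.

    Returns the number of data rows written (header excluded).
    """
    with raw_path.open(newline="") as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames or []
        # Short rows get None for missing fields, so treat None as empty too.
        rows = [row for row in reader
                if all((row[name] or "").strip() for name in fieldnames)]

    with processed_path.open("w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


if __name__ == "__main__":
    if RAW_PATH.exists():
        kept = preprocess(RAW_PATH, PROCESSED_PATH)
        print(f"wrote {kept} rows to {PROCESSED_PATH}")
    else:
        # Reported rather than raised so the sketch can be tried without data;
        # a real stage script should fail loudly here instead.
        print(f"{RAW_PATH} not found; nothing to do")
```

Because the script only touches the files declared with -d and -o, DVC can track everything the stage reads and writes.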
Resulting dvc.yaml contents:
stages:
  preprocess:
    cmd: python preprocess.py
    deps:
    - preprocess.py
    - raw_data.csv
    outs:
    - processed_data.csv
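The flags passed to dvc stage add map directly onto the generated entry: -n becomes the stage key, each -d a deps item, each -o an outs item, and the trailing command becomes cmd. To make that mapping concrete, here is a hypothetical parser that mimics only those three flags with argparse. It is an illustration, not DVC's implementation, and it ignores every other option dvc stage add accepts.

```python
import argparse


def stage_entry(argv: list[str]) -> dict:
    """Map a subset of `dvc stage add` flags to a dvc.yaml-style stage dict.

    Hypothetical helper: only -n, -d and -o are modelled.
    """
    parser = argparse.ArgumentParser(prog="dvc-stage-add-sketch")
    parser.add_argument("-n", "--name", required=True)
    parser.add_argument("-d", "--deps", action="append", default=[])
    parser.add_argument("-o", "--outs", action="append", default=[])
    # Everything after the options is the command the stage runs.
    parser.add_argument("command", nargs=argparse.REMAINDER)
    args = parser.parse_args(argv)
    return {
        args.name: {
            "cmd": " ".join(args.command),
            # Deps come out sorted, as in the dvc.yaml shown above
            # (raw_data.csv was passed first but is listed second).
            "deps": sorted(args.deps),
            "outs": args.outs,
        }
    }


if __name__ == "__main__":
    print(stage_entry([
        "-n", "preprocess",
        "-d", "raw_data.csv", "-d", "preprocess.py",
        "-o", "processed_data.csv",
        "python", "preprocess.py",
    ]))
```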
A second stage, train, consumes the preprocessed data and produces a plot and a metrics file:
dvc stage add \
  -n train \
  -d train.py -d processed_data.csv \
  -o plots.png -o metrics.txt \
  python train.py
Updated dvc.yaml contents:
stages:
  preprocess:
    cmd: python preprocess.py
    deps:
    - preprocess.py
    - raw_data.csv
    outs:
    - processed_data.csv
  train:
    cmd: python train.py
    deps:
    - processed_data.csv
    - train.py
    outs:
    - plots.png
    - metrics.txt
Running the pipeline with dvc repro:
-> dvc repro
Running stage 'preprocess':
> python preprocess.py
Running stage 'train':
> python train.py
Updating lock file 'dvc.lock'
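dvc.lock is generated: like a .dvc file, it captures the MD5 hashes of each stage's dependencies and outputs. Track it with Git:
git add dvc.lock && git commit -m "first pipeline repro"
A later dvc repro skips stages that have not changed and reruns the rest:
-> dvc repro
Stage 'preprocess' didn't change, skipping
Running stage 'train' with command: ...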
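The skip decision comes from comparing hashes: if the MD5s recorded in dvc.lock still match the files on disk, the stage can be skipped. The sketch below imitates that idea with hashlib for file dependencies only. The `locked` dict is a simplified, hypothetical stand-in for what dvc.lock stores, not its real format, and DVC's actual check covers more than this (for example, outputs and the stage command).

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in chunks so large files are not loaded whole."""
    # MD5 serves as a content fingerprint here, not for security.
    digest = hashlib.md5(usedforsecurity=False)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stage_changed(deps: list[str], locked: dict[str, str]) -> bool:
    """Decide whether a stage must rerun, looking at dependencies only.

    `locked` maps each dependency path to the MD5 recorded on the previous
    run (a hypothetical simplification of dvc.lock).
    """
    if set(deps) != set(locked):
        return True  # a dependency was added or removed
    return any(
        not Path(dep).exists() or md5_of(Path(dep)) != locked[dep]
        for dep in deps
    )
```

A repro-like loop would call stage_changed for each stage in dependency order, skip the stage when it returns False, and otherwise run the stage's cmd and record fresh hashes.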
Visualizing the pipeline with dvc dag:
-> dvc dag
+------------+
| preprocess |
+------------+
*
*
*
+-------+
| train |
+-------+
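dvc dag draws an edge from preprocess to train because processed_data.csv is an output of the first stage and a dependency of the second. The sketch below infers those edges from deps and outs and derives an execution order with Python's graphlib. The STAGES dict is a hand-copied simplification of the dvc.yaml above (including metrics.txt, declared with -o), and this is not how DVC builds its graph internally.

```python
from graphlib import TopologicalSorter

# The two stages from dvc.yaml, reduced to their deps and outs.
STAGES = {
    "preprocess": {"deps": ["preprocess.py", "raw_data.csv"],
                   "outs": ["processed_data.csv"]},
    "train": {"deps": ["processed_data.csv", "train.py"],
              "outs": ["plots.png", "metrics.txt"]},
}


def stage_graph(stages: dict) -> dict[str, set[str]]:
    """Map each stage to the set of stages whose outputs it consumes."""
    producer = {out: name for name, spec in stages.items() for out in spec["outs"]}
    return {
        name: {producer[dep] for dep in spec["deps"] if dep in producer}
        for name, spec in stages.items()
    }


def run_order(stages: dict) -> list[str]:
    """Return an execution order in which upstream stages come first."""
    # TopologicalSorter expects a mapping of node -> its predecessors.
    return list(TopologicalSorter(stage_graph(stages)).static_order())


if __name__ == "__main__":
    print(stage_graph(STAGES))
    print(run_order(STAGES))
```

With only two stages there is a single valid order, which matches the order dvc repro used above.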
Summary:
- dvc.yaml and dvc.lock
- dvc stage add
- dvc repro
- dvc dag