CI/CD for Machine Learning
Ravi Bhadauria
Machine Learning Engineer
outs
stages:
  preprocess:
    ...
  train:
    ...
    outs:
      - metrics.json
      - confusion_matrix.png
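The stage above only declares that train produces metrics.json. The code that writes the file is not shown. Below is a minimal sketch of that step, assuming binary 0/1 labels. The helpers `binary_metrics` and `save_metrics` are hypothetical, and the key names were chosen to match the columns printed by `dvc metrics show`.

```python
import json


def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for 0/1 labels (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": round(accuracy, 4),
        "f1_score": round(f1, 4),
        "precision": round(precision, 4),
        "recall": round(recall, 4),
    }


def save_metrics(metrics, path="metrics.json"):
    """Write metrics as a flat JSON object, one of the formats DVC can read as metrics."""
    with open(path, "w") as f:
        json.dump(metrics, f, indent=2)


if __name__ == "__main__":
    # Toy labels for illustration only; a real train stage would use model predictions.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
    save_metrics(binary_metrics(y_true, y_pred))
```

Because the file is plain JSON with numeric values, DVC can show and diff it without any extra configuration beyond listing it in dvc.yaml.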
metrics
stages:
  preprocess:
    ...
  train:
    ...
    outs:
      - confusion_matrix.png
    metrics:
      - metrics.json:
          cache: false
-> dvc metrics show
Path          accuracy  f1_score  precision  recall
metrics.json  0.947     0.8656    0.988      0.7702
Change a hyperparameter and rerun dvc repro
-> dvc metrics diff
Path          Metric     HEAD    workspace  Change
metrics.json  accuracy   0.947   0.9995     0.0525
metrics.json  f1_score   0.8656  0.9989     0.1333
metrics.json  precision  0.988   0.9993     0.0113
metrics.json  recall     0.7702  0.9986     0.2284
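In this table, the Change column is simply workspace minus HEAD for each metric. The sketch below reproduces that arithmetic using the numbers from the table. `metrics_diff` is a hypothetical helper for illustration and is not part of DVC's API.

```python
def metrics_diff(old, new):
    """Return {metric: (old, new, change)} for metrics present in both dicts,
    mirroring the HEAD / workspace / Change columns of `dvc metrics diff`."""
    rows = {}
    for name in sorted(old.keys() & new.keys()):
        # Round to hide floating-point noise such as 0.05249999999999999
        rows[name] = (old[name], new[name], round(new[name] - old[name], 4))
    return rows


head = {"accuracy": 0.947, "f1_score": 0.8656, "precision": 0.988, "recall": 0.7702}
workspace = {"accuracy": 0.9995, "f1_score": 0.9989, "precision": 0.9993, "recall": 0.9986}

for metric, (before, after, change) in metrics_diff(head, workspace).items():
    print(f"{metric:<10} {before:<7} {after:<9} {change}")
```

A positive Change means that the edited hyperparameter improved the metric relative to the last commit.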
setup-dvc GitHub Action
steps:
  ...
  - name: Setup DVC
    uses: iterative/setup-dvc@v1
  - name: Run DVC pipeline
    run: dvc repro
  - name: Write CML report
    env:
      REPO_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    run: |
      # Print metrics of current branch
      dvc metrics show --md >> report.md

      # Compare metrics with main branch
      git fetch --prune
      dvc metrics diff --md main >> report.md

      # Create CML report
      cml comment create report.md
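The `--md` flag makes DVC print its tables in Markdown, which `cml comment create` then posts as the comment body. The sketch below renders a metrics dict as a similar Markdown table and appends it to report.md. It is only meant to show roughly what the flag contributes. `metrics_to_markdown` is a hypothetical helper, and its exact layout is an assumption that may differ from DVC's output.

```python
def metrics_to_markdown(metrics, path="metrics.json"):
    """Render one metrics file as a Markdown table: header, divider, one data row."""
    names = list(metrics)
    header = "| Path | " + " | ".join(names) + " |"
    divider = "|" + "---|" * (len(names) + 1)
    row = f"| {path} | " + " | ".join(str(metrics[n]) for n in names) + " |"
    return "\n".join([header, divider, row])


if __name__ == "__main__":
    report = metrics_to_markdown(
        {"accuracy": 0.947, "f1_score": 0.8656, "precision": 0.988, "recall": 0.7702}
    )
    # Append, like the `>>` redirection in the workflow step
    with open("report.md", "a") as f:
        f.write(report + "\n")
```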

DVC plot templates:
scatter - scatter plot
linear - interactive linear plot
simple - non-interactive customizable linear plot
smooth - linear plot with smoothing
confusion - confusion matrix
confusion_normalized - confusion matrix with values normalized to <0, 1> range
bar_horizontal - horizontal bar plot
bar_horizontal_sorted - horizontal bar plot sorted by bar size

stages:
  train:
    ...
    plots:
      - predictions.csv:         # Name of file containing predictions
          template: confusion    # Style of plot
          x: predicted_label     # X-axis column name in csv file
          y: true_label          # Y-axis column name in csv file
          x_label: 'Predicted label'
          y_label: 'True label'
          title: Confusion matrix
          cache: false           # Save in Git
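The confusion template reads one row per test sample and counts each (true, predicted) pair into a cell. The sketch below writes predictions.csv with the column names used in the config above and then tallies the cells the plot would display. The helpers `write_predictions` and `confusion_counts` are hypothetical.

```python
import csv
from collections import Counter


def write_predictions(y_true, y_pred, path="predictions.csv"):
    """Write one row per sample with the columns the confusion template reads."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["predicted_label", "true_label"])
        writer.writeheader()
        for actual, predicted in zip(y_true, y_pred):
            writer.writerow({"predicted_label": predicted, "true_label": actual})


def confusion_counts(path="predictions.csv"):
    """Count (true_label, predicted_label) pairs, i.e. the confusion-matrix cells.
    Values come back as strings because they are read from CSV."""
    with open(path, newline="") as f:
        return Counter(
            (row["true_label"], row["predicted_label"]) for row in csv.DictReader(f)
        )
```

Since the file is saved with `cache: false`, it is committed to Git, so `dvc plots diff` can compare it against another revision.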
-> dvc plots show predictions.csv
file:///path/to/index.html

-> dvc plots diff --target predictions.csv main
file:///path/to/index.html

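# Changes in Python
import pandas as pd  # assumed available; used only to write the CSV
from sklearn.metrics import roc_curve

y_proba = model.predict_proba(X_test)  # class probabilities per test sample
fpr, tpr, _ = roc_curve(y_test, y_proba[:, 1])
pd.DataFrame({"fpr": fpr, "tpr": tpr}).to_csv("roc_curve.csv", index=False)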
# Changes in dvc.yaml
plots:
  - roc_curve.csv:
      template: simple
      x: fpr
      y: tpr
      x_label: 'False Positive Rate'
      y_label: 'True Positive Rate'
      title: ROC curve
      cache: false
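To see what the fpr and tpr columns contain without scikit-learn, the sketch below sweeps a threshold over every distinct score from highest to lowest. It records the false and true positive rates at each step and writes them in the two-column layout that the `simple` template reads. This is a plain-Python illustration and not scikit-learn's implementation, which by default also drops some intermediate points. The helpers `roc_points` and `write_roc_csv` are hypothetical.

```python
import csv


def roc_points(y_true, scores):
    """Return (fpr, tpr) lists for 0/1 labels, assuming at least one of each class."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    fpr, tpr = [0.0], [0.0]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        # Predict positive for every sample whose score is >= threshold
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        fpr.append(fp / neg)
        tpr.append(tp / pos)
    return fpr, tpr


def write_roc_csv(fpr, tpr, path="roc_curve.csv"):
    """Write the curve with the column names referenced in dvc.yaml."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["fpr", "tpr"])
        writer.writerows(zip(fpr, tpr))
```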
