CI/CD for Machine Learning
Ravi Bhadauria
Machine Learning Engineer
outs
stages:
  preprocess:
    ...
  train:
    ...
    outs:
    - metrics.json
    - confusion_matrix.png
metrics
stages:
  preprocess:
    ...
  train:
    ...
    outs:
    - confusion_matrix.png
    metrics:
    - metrics.json:
        cache: false
-> dvc metrics show
Path          accuracy  f1_score  precision  recall
metrics.json  0.947     0.8656    0.988      0.7702
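The slides do not show how metrics.json is written. A minimal sketch of one way the training script might produce it, assuming binary 0/1 labels; the helper names, the sample labels, and the temporary output location are illustrative, not taken from the course code:

```python
import json
import os
import tempfile


def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "f1_score": f1,
        "precision": precision,
        "recall": recall,
    }


def write_metrics(metrics, path):
    """Dump a flat dict of metric name -> value as JSON."""
    with open(path, "w") as f:
        json.dump(metrics, f, indent=2)


# Illustrative labels only; a real script would use model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)

# Written to a temporary directory here; in the pipeline the path
# would be the metrics.json listed in dvc.yaml.
out_path = os.path.join(tempfile.mkdtemp(), "metrics.json")
write_metrics(metrics, out_path)
```

A flat JSON object with these four keys matches the columns printed by dvc metrics show above.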
Change a hyperparameter and rerun dvc repro
-> dvc metrics diff
Path          Metric     HEAD    workspace  Change
metrics.json  accuracy   0.947   0.9995     0.0525
metrics.json  f1_score   0.8656  0.9989     0.1333
metrics.json  precision  0.988   0.9993     0.0113
metrics.json  recall     0.7702  0.9986     0.2284
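In the output above, the Change column equals the workspace value minus the HEAD value. A small sketch that reproduces that arithmetic from the numbers shown; the function name is hypothetical and this is not how DVC itself is implemented:

```python
def metrics_diff(head, workspace, ndigits=4):
    """Return (metric, head, workspace, change) rows for shared metrics."""
    rows = []
    for name in sorted(set(head) & set(workspace)):
        change = round(workspace[name] - head[name], ndigits)
        rows.append((name, head[name], workspace[name], change))
    return rows


# Values copied from the dvc metrics diff output above.
head = {"accuracy": 0.947, "f1_score": 0.8656,
        "precision": 0.988, "recall": 0.7702}
workspace = {"accuracy": 0.9995, "f1_score": 0.9989,
             "precision": 0.9993, "recall": 0.9986}

rows = metrics_diff(head, workspace)
changes = {name: change for name, _, _, change in rows}

for name, old, new, change in rows:
    print(f"{name:<10} {old:<7} {new:<7} {change}")
```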
setup-dvc GitHub Action
steps:
  ...
  - name: Setup DVC
    uses: iterative/setup-dvc@v1

  - name: Run DVC pipeline
    run: dvc repro

  - name: Write CML report
    env:
      REPO_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    run: |
      # Print metrics of current branch
      dvc metrics show --md >> report.md

      # Compare metrics with main branch
      git fetch --prune
      dvc metrics diff --md main >> report.md

      # Create CML report
      cml comment create report.md
scatter - scatter plot
linear - interactive linear plot
simple - non-interactive customizable linear plot
smooth - linear plot with smoothing
confusion - confusion matrix
confusion_normalized - confusion matrix with values normalized to <0, 1> range
bar_horizontal - horizontal bar plot
bar_horizontal_sorted - horizontal bar plot sorted by bar size

stages:
  train:
    ...
    plots:
    - predictions.csv:       # Name of file containing predictions
        template: confusion  # Style of plot
        x: predicted_label   # X-axis column name in csv file
        y: true_label        # Y-axis column name in csv file
        x_label: 'Predicted label'
        y_label: 'True label'
        title: Confusion matrix
        cache: false         # Save in Git
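The plots entry above expects predictions.csv to contain the columns predicted_label and true_label. A minimal sketch of how the training script might write such a file with the standard csv module; the helper name, the sample labels, and the temporary output location are assumptions for illustration:

```python
import csv
import os
import tempfile


def write_predictions(y_true, y_pred, path):
    """Write one row per sample using the column names from dvc.yaml."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["predicted_label", "true_label"])
        for true, pred in zip(y_true, y_pred):
            writer.writerow([pred, true])


# Illustrative labels only; a real script would use model.predict(X_test).
y_true = [0, 1, 1, 0]
y_pred = [0, 1, 0, 0]

# Temporary location here; in the pipeline this would be predictions.csv.
csv_path = os.path.join(tempfile.mkdtemp(), "predictions.csv")
write_predictions(y_true, y_pred, csv_path)

# Read the file back to check its layout.
with open(csv_path, newline="") as f:
    rows = list(csv.reader(f))
```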
-> dvc plots show predictions.csv
file:///path/to/index.html
-> dvc plots diff --target predictions.csv main
file:///path/to/index.html
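# Changes in Python
from sklearn.metrics import roc_curve

# Column 1 holds the predicted probability of the positive class
y_proba = model.predict_proba(X_test)
fpr, tpr, _ = roc_curve(y_test, y_proba[:, 1])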
# Changes in dvc.yaml
plots:
- roc_curve.csv:
template: simple
x: fpr
y: tpr
x_label: 'False Positive Rate'
y_label: 'True Positive Rate'
title: ROC curve
cache: false
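The Python change above stops at computing fpr and tpr, while dvc.yaml reads them from roc_curve.csv. A sketch of the missing step, writing the two columns to CSV. To stay self-contained it uses a simplified pure-Python stand-in for scikit-learn's roc_curve (no threshold output, no dropping of intermediate points); the function names and sample scores are hypothetical:

```python
import csv
import os
import tempfile


def roc_points(y_true, scores):
    """Simplified ROC sweep: one (fpr, tpr) point per distinct score."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("ROC needs both positive and negative samples")
    pairs = sorted(zip(scores, y_true), reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every sample whose score ties with this threshold.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / neg)
        tpr.append(tp / pos)
    return fpr, tpr


def write_roc_csv(fpr, tpr, path):
    """Write the fpr and tpr columns referenced by x and y in dvc.yaml."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["fpr", "tpr"])
        for x, y in zip(fpr, tpr):
            writer.writerow([float(x), float(y)])


# Illustrative data; with scikit-learn, fpr and tpr would come from
# roc_curve(y_test, y_proba[:, 1]) instead.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_points(y_true, scores)

# Temporary location here; in the pipeline this would be roc_curve.csv.
roc_path = os.path.join(tempfile.mkdtemp(), "roc_curve.csv")
write_roc_csv(fpr, tpr, roc_path)

with open(roc_path, newline="") as f:
    roc_rows = list(csv.reader(f))
```

write_roc_csv also accepts the NumPy arrays returned by scikit-learn, since each value is converted with float() before writing.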