Comparing metrics and plots in DVC

CI/CD for Machine Learning

Ravi Bhadauria

Machine Learning Engineer

Configuring DVC YAML file

  • Configure DVC YAML file to track metrics across experiments
  • Change from outs
stages:
  preprocess:
    ...
  train:
    ...
    outs:
    - metrics.json
    - confusion_matrix.png
  • To metrics
stages:
  preprocess:
    ...
  train:
    ...
    outs:
    - confusion_matrix.png
    metrics:
    - metrics.json:
        cache: false
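
For DVC to track metrics.json, the training script has to write it. A minimal sketch of that step, assuming the metrics are plain numbers in a flat JSON object; the helper name `write_metrics` and the `example_metrics` values are illustrative (the numbers are copied from the course output, not computed here).

```python
import json
from pathlib import Path


def write_metrics(metrics: dict, path: str = "metrics.json") -> None:
    """Write scalar metrics as a flat JSON object for DVC to track."""
    Path(path).write_text(json.dumps(metrics, indent=2) + "\n")


# Illustrative values only; a real training script would compute these.
example_metrics = {
    "accuracy": 0.947,
    "f1_score": 0.8656,
    "precision": 0.988,
    "recall": 0.7702,
}
```

Calling `write_metrics(example_metrics)` at the end of the train stage would produce the file listed under `metrics:` above.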

Querying and comparing DVC metrics

-> dvc metrics show

Path          accuracy  f1_score  precision  recall
metrics.json  0.947     0.8656    0.988      0.7702

Change a hyperparameter and rerun dvc repro

-> dvc metrics diff

Path          Metric     HEAD    workspace  Change
metrics.json  accuracy   0.947   0.9995     0.0525
metrics.json  f1_score   0.8656  0.9989     0.1333
metrics.json  precision  0.988   0.9993     0.0113
metrics.json  recall     0.7702  0.9986     0.2284
1 https://dvc.org/doc/command-reference/metrics
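
The Change column is the workspace value minus the HEAD value for each metric. A rough Python sketch of that arithmetic, using the numbers from the output above; `metrics_diff` is a hypothetical helper for illustration, not part of DVC.

```python
def metrics_diff(old: dict, new: dict) -> dict:
    """Return new - old for every metric present in both dicts, rounded to 4 places."""
    return {name: round(new[name] - old[name], 4) for name in old if name in new}


# Values taken from the output above: HEAD vs. workspace.
head = {"accuracy": 0.947, "f1_score": 0.8656, "precision": 0.988, "recall": 0.7702}
workspace = {"accuracy": 0.9995, "f1_score": 0.9989, "precision": 0.9993, "recall": 0.9986}

changes = metrics_diff(head, workspace)
```

Here `changes` should match the Change column, for example 0.0525 for accuracy.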

Setting up DVC GitHub Action

  • Add setup-dvc GitHub Action
  • Replace running Python scripts with DVC pipeline
steps:
  ...
  - name: Setup DVC
    uses: iterative/setup-dvc@v1

  - name: Run DVC pipeline
    run: dvc repro

Setting up DVC GitHub Action

- name: Write CML report
  env:
    REPO_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  run: |
    # Print metrics of current branch
    dvc metrics show --md >> report.md

    # Compare metrics with main branch
    git fetch --prune
    dvc metrics diff --md main >> report.md

    # Create CML report
    cml comment create report.md

Pipeline in action

Screenshot of a pull request comment showing metrics diff against main branch

Plot types in DVC

  • scatter - scatter plot
  • linear - interactive linear plot
  • simple - non-interactive customizable linear plot
  • smooth - linear plot with smoothing
  • confusion - confusion matrix
  • confusion_normalized - confusion matrix with values normalized to the [0, 1] range
  • bar_horizontal - horizontal bar plot
  • bar_horizontal_sorted - horizontal bar plot sorted by bar size
1 https://dvc.org/doc/user-guide/experiment-management/visualizing-plots#plot-templates-data-series-only

Configuring DVC YAML for plots

stages:
  train:
    ...
    plots:
    - predictions.csv: # Name of file containing predictions
        template: confusion # Style of plot
        x: predicted_label # X-axis column name in csv file
        y: true_label # Y-axis column name in csv file
        x_label: 'Predicted label'
        y_label: 'True label'
        title: Confusion matrix
        cache: false # Save in Git
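
The plot needs predictions.csv to hold one row per test sample, with the column names given in `x` and `y`. A minimal sketch of writing that file and tallying the (true, predicted) pairs that make up a confusion matrix; `write_predictions` and `count_pairs` are hypothetical helpers, and the tally only illustrates the idea rather than DVC's own implementation.

```python
import csv
from collections import Counter


def write_predictions(path, true_labels, predicted_labels):
    """Write one row per sample with the two columns referenced in dvc.yaml."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["true_label", "predicted_label"])
        writer.writerows(zip(true_labels, predicted_labels))


def count_pairs(path):
    """Count (true_label, predicted_label) pairs, i.e. the cells of a confusion matrix."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        return Counter((row["true_label"], row["predicted_label"]) for row in reader)
```

In a training script this could be called as `write_predictions("predictions.csv", y_test, y_pred)`, where `y_test` and `y_pred` are assumed to be the test labels and model predictions.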

Plotting Confusion Matrix

-> dvc plots show predictions.csv
file:///path/to/index.html

Confusion matrix display plot generated by DVC

Comparing Confusion Matrix

-> dvc plots diff --target predictions.csv main
file:///path/to/index.html

Confusion matrix diff plot generated by DVC

Comparing ROC Curves

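# Changes in Python
from sklearn.metrics import roc_curve

# Predicted probability of the positive class (column 1) for each test sample
y_proba = model.predict_proba(X_test)
fpr, tpr, _ = roc_curve(y_test,
                        y_proba[:, 1])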
# Changes in dvc.yaml
plots:
- roc_curve.csv:
    template: simple
    x: fpr
    y: tpr
    x_label: 'False Positive Rate'
    y_label: 'True Positive Rate'
    title: ROC curve
    cache: false
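
The dvc.yaml entry expects roc_curve.csv to have `fpr` and `tpr` columns. One possible way to save the arrays returned by `roc_curve`, sketched with the standard library only; `write_roc_csv` is a hypothetical helper name.

```python
import csv


def write_roc_csv(path, fpr, tpr):
    """Write paired false/true positive rates under the column names used in dvc.yaml."""
    if len(fpr) != len(tpr):
        raise ValueError("fpr and tpr must have the same length")
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["fpr", "tpr"])
        writer.writeheader()
        for x, y in zip(fpr, tpr):
            writer.writerow({"fpr": x, "tpr": y})
```

After the `roc_curve` call, `write_roc_csv("roc_curve.csv", fpr, tpr)` would produce the file the plot reads.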

ROC curve diff plot generated by DVC

Let's practice!
