Introduction to Data Versioning with DVC
Ravi Bhadauria
Machine Learning Engineer
outs

stages:
  train_and_evaluate:
    outs:
      - metrics.json
      - plots.png
metrics

stages:
  train_and_evaluate:
    outs:
      - plots.png
    metrics:
      - metrics.json:
          cache: false
$ dvc metrics show
Path          accuracy  f1_score  precision  recall
metrics.json  0.947     0.8656    0.988      0.7702
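
For context, here is a minimal sketch of how a training script might produce a metrics.json file that reads like the table above. The helper name write_metrics is hypothetical, and the values are copied from the output shown above purely for illustration; the course's actual training script is not shown here.

```python
import json
from pathlib import Path


def write_metrics(metrics: dict, path: str = "metrics.json") -> None:
    """Write a flat metric-name -> number mapping as JSON (hypothetical helper)."""
    Path(path).write_text(json.dumps(metrics, indent=2))


# Illustrative values only, copied from the `dvc metrics show` output above
metrics = {
    "accuracy": 0.947,
    "f1_score": 0.8656,
    "precision": 0.988,
    "recall": 0.7702,
}
write_metrics(metrics)
```

Each top-level key in the JSON file corresponds to one column in the table.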
Rerun the pipeline with dvc repro, then compare the workspace metrics against HEAD:
$ dvc metrics diff
Path          Metric     HEAD    workspace  Change
metrics.json  accuracy   0.947   0.9995     0.0525
metrics.json  f1_score   0.8656  0.9989     0.1333
metrics.json  precision  0.988   0.9993     0.0113
metrics.json  recall     0.7702  0.9986     0.2284
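
The Change column is the workspace value minus the HEAD value. The sketch below reproduces that arithmetic on the numbers from the table. The function metric_changes is a hypothetical illustration, not DVC's own implementation, and rounding to four decimals is an assumption chosen to match the table.

```python
def metric_changes(head: dict, workspace: dict, ndigits: int = 4) -> dict:
    """Return workspace - head for every metric present in both mappings."""
    return {
        name: round(workspace[name] - head[name], ndigits)
        for name in head
        if name in workspace
    }


# Values copied from the HEAD and workspace columns above
head = {"accuracy": 0.947, "f1_score": 0.8656, "precision": 0.988, "recall": 0.7702}
workspace = {"accuracy": 0.9995, "f1_score": 0.9989, "precision": 0.9993, "recall": 0.9986}

changes = metric_changes(head, workspace)
for name, delta in changes.items():
    print(f"{name}: {delta}")
```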
stages:
  train_and_evaluate:
    ...
    plots:
      - predictions.csv:            # Name of file containing predictions
          template: confusion       # Style of plot
          x: predicted_label        # X-axis column name in csv file
          y: true_label             # Y-axis column name in csv file
          x_label: 'Predicted label'
          y_label: 'True label'
          title: Confusion matrix
          cache: false              # Save in Git
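
The plot configuration above expects predictions.csv to contain the columns predicted_label and true_label. A minimal sketch of writing such a file follows. The helper write_predictions and the sample labels are hypothetical; a real pipeline would write the model's actual predictions.

```python
import csv


def write_predictions(true_labels, predicted_labels, path="predictions.csv"):
    """Write one row per sample using the column names referenced in dvc.yaml."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["predicted_label", "true_label"])
        writer.writerows(zip(predicted_labels, true_labels))


# Made-up labels for illustration only
true_labels = [0, 1, 1, 0, 1]
predicted_labels = [0, 1, 0, 0, 1]
write_predictions(true_labels, predicted_labels)
```

The header names must match the x and y values in the plots entry, otherwise the template has no columns to read.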
$ dvc plots show predictions.csv
file:///path/to/index.html

# Compare the plot in predictions.csv against another branch or commit (e.g. main)
$ dvc plots diff --target predictions.csv <branch name or commit SHA>
