Introduction to Data Versioning with DVC
Ravi Bhadauria
Machine Learning Engineer
outs
stages:
  train_and_evaluate:
    outs:
      - metrics.json
      - plots.png
metrics
stages:
  train_and_evaluate:
    outs:
      - plots.png
    metrics:
      - metrics.json:
          cache: false
$ dvc metrics show
Path          accuracy  f1_score  precision  recall
metrics.json  0.947     0.8656    0.988      0.7702
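The metrics file that dvc metrics show reads is plain JSON with one number per key. As a minimal sketch of how a training script might produce it (the helper names compute_metrics and write_metrics and the toy labels are hypothetical, not part of the course pipeline):

```python
import json
from pathlib import Path


def compute_metrics(y_true, y_pred):
    """Binary-classification metrics from two equal-length lists of 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs),
        "f1_score": f1,
        "precision": precision,
        "recall": recall,
    }


def write_metrics(metrics, path="metrics.json"):
    """Write a flat JSON object, the shape shown in the table above."""
    Path(path).write_text(json.dumps(metrics, indent=2))


# Toy labels, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = compute_metrics(y_true, y_pred)
write_metrics(metrics)
```

The keys of the JSON object become the column names in the dvc metrics show output.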
Rerun the pipeline with dvc repro, then compare the workspace metrics against HEAD:
$ dvc metrics diff
Path          Metric     HEAD    workspace  Change
metrics.json  accuracy   0.947   0.9995     0.0525
metrics.json  f1_score   0.8656  0.9989     0.1333
metrics.json  precision  0.988   0.9993     0.0113
metrics.json  recall     0.7702  0.9986     0.2284
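The Change column is the workspace value minus the HEAD value. A small sketch that reproduces it from the numbers in the table (the function name metric_changes is hypothetical, and this is not how DVC itself is implemented):

```python
# Values copied from the dvc metrics diff table above.
head = {"accuracy": 0.947, "f1_score": 0.8656, "precision": 0.988, "recall": 0.7702}
workspace = {"accuracy": 0.9995, "f1_score": 0.9989, "precision": 0.9993, "recall": 0.9986}


def metric_changes(old, new, ndigits=4):
    """Return new - old for every metric present in both dictionaries."""
    return {
        name: round(new[name] - old[name], ndigits)
        for name in old
        if name in new
    }


changes = metric_changes(head, workspace)
for name, delta in changes.items():
    print(f"{name:<10} {delta:+.4f}")
```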
stages:
  train_and_evaluate:
    ...
    plots:
      - predictions.csv:             # Name of file containing predictions
          template: confusion        # Style of plot
          x: predicted_label         # X-axis column name in csv file
          y: true_label              # Y-axis column name in csv file
          x_label: 'Predicted label'
          y_label: 'True label'
          title: Confusion matrix
          cache: false               # Save in Git
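The x and y entries must match column names in predictions.csv. A minimal sketch of writing such a file, assuming one row per evaluated sample (the function name write_predictions and the toy labels are hypothetical):

```python
import csv


def write_predictions(true_labels, predicted_labels, path="predictions.csv"):
    """Write one row per sample with the two columns named in the plots config."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["predicted_label", "true_label"])  # header row
        for true, predicted in zip(true_labels, predicted_labels):
            writer.writerow([predicted, true])


# Toy labels, for illustration only.
true_labels = ["cat", "dog", "dog", "cat"]
predicted_labels = ["cat", "dog", "cat", "cat"]

write_predictions(true_labels, predicted_labels)
```

Keeping the header names identical to the x and y values in the plots section avoids a mismatch between the pipeline output and the plot definition.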
$ dvc plots show predictions.csv
file:///path/to/index.html
# Compare the plot in predictions.csv against another revision, e.g. branch main
$ dvc plots diff --target predictions.csv <branch name or commit SHA>