Introduction to Data Versioning with DVC
Ravi Bhadauria
Machine Learning Engineer
$ dvc push <target>
$ dvc pull <target>
$ dvc push data.csv
$ dvc push
$ dvc fetch
Use the -r flag to push to a specific remote
$ dvc push -r aws_remote data.csv
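As a rough sketch of what the -r flag changes, the script below assumes git and dvc are installed and uses two local folders as stand-ins for remote storage. The remote name default_remote, the folder names, and the sample data.csv are illustrative assumptions; aws_remote here points at a local directory, not a real S3 bucket.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Skip quietly when the tools are missing (this sketch needs both).
if ! command -v git >/dev/null 2>&1 || ! command -v dvc >/dev/null 2>&1; then
  echo "git and dvc are required; skipping" >&2
  exit 0
fi

base="$(mktemp -d)"
mkdir "$base/default_storage" "$base/aws_storage" "$base/project"
cd "$base/project"

git init -q
dvc init -q

# Hypothetical remotes: local folders standing in for real storage.
dvc remote add -q -d default_remote "$base/default_storage"
dvc remote add -q aws_remote "$base/aws_storage"

# Small sample file, tracked with DVC.
printf 'id,value\n1,10\n' > data.csv
dvc add -q data.csv

# Push only data.csv, and only to aws_remote, bypassing the default remote.
dvc push -q -r aws_remote data.csv

echo "files in aws_storage:     $(find "$base/aws_storage" -type f | wc -l)"
echo "files in default_storage: $(find "$base/default_storage" -type f | wc -l)"
```

Without -r, the same dvc push would go to default_remote, because it was registered with -d.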
dvc pull
Function: Downloads remote data to DVC workspace
Use Case: Large datasets or model artifacts
dvc push
Function: Uploads data to remote storage
Use Case: Sharing or storing data artifacts
git pull
Function: Fetches and merges changes from the remote Git repo
Use Case: Keeping a local branch in sync with the remote
git push
Function: Uploads local changes to remote
Use Case: Share changes to Git remote
The .dvc file is tracked by Git, not DVC
Leverage this to check out a specific version of a data file
Check out the .dvc file
$ git checkout <commit_hash|tag|branch>
Retrieve the data that the .dvc file points to
$ dvc checkout <target>
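The two checkout steps can be tried end to end in a throwaway repository. This is a minimal sketch, assuming git and dvc are installed; the file name data.csv, the tag v1, and the Git identity are placeholders chosen for the example.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Skip quietly when the tools are missing (this sketch needs both).
if ! command -v git >/dev/null 2>&1 || ! command -v dvc >/dev/null 2>&1; then
  echo "git and dvc are required; skipping" >&2
  exit 0
fi

workdir="$(mktemp -d)"
cd "$workdir"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
dvc init -q

# Version 1 of the dataset, tagged in Git.
printf 'id,value\n1,10\n' > data.csv
dvc add -q data.csv
git add data.csv.dvc .gitignore
git commit -q -m "Dataset v1"
git tag v1

# Version 2 adds a row; only the small .dvc file changes in Git.
printf 'id,value\n1,10\n2,20\n' > data.csv
dvc add -q data.csv
git add data.csv.dvc
git commit -q -m "Dataset v2"

# Step 1: Git restores the old .dvc file.
git checkout -q v1
# Step 2: DVC makes data.csv match that .dvc file (from the local cache).
dvc checkout -q data.csv.dvc

cat data.csv
```

After the second step, data.csv should hold the v1 contents again, even though Git itself never stored the data file.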
$ dvc add <target>
Add the .dvc file to Git
$ git add <target>.dvc
$ git commit <target>.dvc \
-m "Dataset updates"
$ git push origin main
$ dvc push
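The workflow above can be rehearsed locally before pointing it at real remotes. The sketch below assumes git and dvc are installed; a bare repository stands in for the Git remote and a plain folder for DVC remote storage. The names storage, origin.git, clone, and data.csv are illustrative assumptions.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Skip quietly when the tools are missing (this sketch needs both).
if ! command -v git >/dev/null 2>&1 || ! command -v dvc >/dev/null 2>&1; then
  echo "git and dvc are required; skipping" >&2
  exit 0
fi

base="$(mktemp -d)"
mkdir "$base/storage"                     # stand-in for DVC remote storage
git init -q --bare "$base/origin.git"     # stand-in for the Git remote

# One-time project setup.
git init -q "$base/project"
cd "$base/project"
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"
dvc init -q
dvc remote add -q -d storage "$base/storage"
git remote add origin "$base/origin.git"
git add .dvc/config
git commit -q -m "Initialize DVC"
git branch -M main

# The steps from the slide, with data.csv as <target>.
printf 'id,value\n1,10\n' > data.csv
dvc add -q data.csv
# .gitignore is staged too, since dvc add updates it to ignore data.csv.
git add data.csv.dvc .gitignore
git commit -q -m "Dataset updates"
git push -q origin main
dvc push -q

# A collaborator gets the metadata from Git and the data from DVC.
git clone -q --branch main "$base/origin.git" "$base/clone"
cd "$base/clone"
dvc pull -q
cmp "$base/project/data.csv" data.csv && echo "data.csv restored in the clone"
```

The final clone and dvc pull are not part of the slide; they are added here only to check that both pushes landed where a collaborator would look for them.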