Introduction to Data Versioning with DVC
Ravi Bhadauria
Machine Learning Engineer
Installing DVC with pip
$ pip install dvc
$ dvc version
DVC version: 3.40.1 (pip)
Platform: Python 3.9.16 on macOS-14.2.1-arm64-arm-64bit
Config:
    Global: /Users/<username>/Library/Application Support/dvc
    System: /Library/Application Support/dvc
Repo: dvc, git
$ git init
Initialized empty Git repository in /path/to/repo/.git/
$ dvc init
Initialized DVC repository.
You can now commit the changes to git.
$ git status
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: .dvc/.gitignore
new file: .dvc/config
new file: .dvcignore
$ git commit -m "initialized dvc"
.dvcignore is similar to a .gitignore file
Useful when a project holds many data files that do not need to be tracked
# .dvcignore
# Ignore all files in the 'data' directory
data/*
# But don't ignore 'data/data.csv'
!data/data.csv
# Ignore all .tmp files
*.tmp
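As a minimal sketch, the rules above can be written to disk from the shell with a here-document. The scratch directory and the variable name demo_dir are assumptions made for this illustration; in a real project the file lives in the repository root next to .dvc.

```shell
# Sketch only: demo_dir is a hypothetical scratch location, not part of DVC.
demo_dir="$(mktemp -d)"

# Quoting 'EOF' keeps the shell from expanding anything inside the rules.
cat > "$demo_dir/.dvcignore" <<'EOF'
# .dvcignore
# Ignore all files in the 'data' directory
data/*
# But don't ignore 'data/data.csv'
!data/data.csv
# Ignore all .tmp files
*.tmp
EOF

# Show which line holds the data/* rule.
grep -n -F 'data/*' "$demo_dir/.dvcignore"
```

Order matters here: the negated rule !data/data.csv comes after data/*, so it can re-include the one file that the broader rule excluded.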
Check whether a file is ignored with the dvc check-ignore command
$ dvc check-ignore data/file.txt
data/file.txt
Add the -d flag to get details
$ dvc check-ignore -d data/file.txt
.dvcignore:3:data/* data/file.txt
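When several paths need checking, the command can be wrapped in a small loop. The sketch below is illustrative only: check_paths is a hypothetical helper, not a DVC feature, and it assumes that dvc check-ignore prints the path when a rule excludes it (as in the output above) and prints nothing otherwise.

```shell
# Sketch only: check_paths is a hypothetical helper, not part of DVC.
# Assumption: `dvc check-ignore PATH` prints PATH when it is ignored
# and prints nothing when no rule excludes it.
check_paths() {
    for path in "$@"; do
        match="$(dvc check-ignore "$path" 2>/dev/null)"
        if [ -n "$match" ]; then
            echo "ignored: $path"
        else
            echo "not ignored: $path"
        fi
    done
}

# Run only when dvc is installed and the current directory is a DVC repository.
if command -v dvc >/dev/null 2>&1 && [ -d .dvc ]; then
    check_paths data/file.txt data/data.csv notes.tmp
fi
```

Deciding on the printed output rather than the exit status keeps the helper tied to the behavior shown in the session above. To see which rule matched a given path, rerun that path with the -d flag.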
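Summary
Install DVC: pip install dvc
Check the installation: dvc version
Initialize DVC in a Git repository: dvc init
.dvcignore files specify files to exclude and follow the same syntax as .gitignore
Check whether a file is ignored: dvc check-ignore <filename>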