Introduction to Data Versioning with DVC
Ravi Bhadauria
Machine Learning Engineer
By default, the cache lives under .dvc in the workspace. To change its location:
$ dvc cache dir ~/mycache

$ dvc add data.csv
100% Adding...|====================|1/1 [00:00, 53.55file/s]
To track the changes with git, run:
git add data.csv.dvc
To enable auto staging, run:
dvc config core.autostage true
Every file tracked by DVC has a corresponding .dvc file
data.csv -> data.csv.dvc
To version the data file, stage and commit its .dvc file with Git:
$ git add data.csv.dvc
$ git commit -m "Track data.csv with DVC"
Contents of .dvc files
outs:
- md5: f38a850818377e97155d22755caa39d0
  size: 16
  hash: md5
  path: data.csv
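Because a .dvc file is plain text, the recorded fields can be read back with standard tools. The following is a minimal sketch, assuming a POSIX shell with awk and a .dvc file laid out like the one above (one `key: value` pair per line under a single outs entry). The helper name `dvc_field` is hypothetical and is not part of DVC.

```shell
#!/bin/sh
# Hypothetical helper (not a DVC command): print the value recorded
# for one key (md5, size, hash, or path) in a simple .dvc file.
#   $1 = key name, $2 = path to the .dvc file
dvc_field() {
    awk -v key="$1" '
        # Drop leading spaces and the "- " list marker before matching.
        { sub(/^[ -]+/, "") }
        # Print what follows "key: " on the first line that starts with it.
        index($0, key ": ") == 1 { print substr($0, length(key) + 3); exit }
    ' "$2"
}

# Usage: show the hash and size recorded for data.csv, if it is tracked.
if [ -f data.csv.dvc ]; then
    dvc_field md5 data.csv.dvc
    dvc_field size data.csv.dvc
fi
```

A line-based reader like this is only meant for quick inspection; a .dvc file with several outputs or nested keys would need a real YAML parser.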
$ find .dvc/cache -type f
.dvc/cache/f3/8a850818377e97155d22755caa39d0
$ md5 data.csv
MD5 (data.csv) = f38a850818377e97155d22755caa39d0
Use dvc add -v for verbose output
dvc remove: stop tracking a file by removing its .dvc file
$ dvc remove data.csv.dvc
dvc gc -w: clean the cache, keeping only what the workspace uses
$ dvc gc -w
WARNING: This will remove all cache except items used in the workspace of the current repo.
Are you sure you want to proceed? [y/n]: y
Removed 1 objects from repo cache.
Recap:
dvc cache dir ~/mycache
dvc add data.csv
.dvc files with metadata
dvc remove data.csv.dvc
dvc gc -w