Introduction to Data Versioning with DVC
Ravi Bhadauria
Machine Learning Engineer
stages:
  preprocess:
    cmd: python3 preprocess.py
    params:
      - preprocess
    deps:
      - preprocess.py
      - raw_data.csv
    outs:
      - processed_data.csv
stages:
  train_and_evaluate:
    cmd: python3 train_and_evaluate.py
    params:
      - train_and_evaluate
    deps:
      - processed_data.csv
      - train_and_evaluate.py
    outs:
      - plots.png
      - metrics.json
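The two stages are linked through their files: processed_data.csv is an output of preprocess and a dependency of train_and_evaluate, which is what lets the execution order be inferred. A minimal Python sketch of that idea follows. It is an illustration only, not DVC's actual implementation; the `stages` dict mirrors the deps and outs listed above, and `infer_edges` is a hypothetical helper.

```python
# Simplified illustration only; DVC's real graph building is more involved.
stages = {
    "preprocess": {
        "deps": ["preprocess.py", "raw_data.csv"],
        "outs": ["processed_data.csv"],
    },
    "train_and_evaluate": {
        "deps": ["processed_data.csv", "train_and_evaluate.py"],
        "outs": ["plots.png", "metrics.json"],
    },
}


def infer_edges(stages):
    """Return (upstream, downstream) pairs where an output feeds a dependency."""
    # Map every output file to the stage that produces it.
    producer = {out: name for name, spec in stages.items() for out in spec["outs"]}
    edges = set()
    for name, spec in stages.items():
        for dep in spec["deps"]:
            # A dependency produced by another stage creates an edge.
            if dep in producer and producer[dep] != name:
                edges.add((producer[dep], name))
    return sorted(edges)


print(infer_edges(stages))  # [('preprocess', 'train_and_evaluate')]
```

Files such as raw_data.csv that no stage produces are treated as plain inputs and create no edge.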
Run the pipeline with dvc repro
$ dvc repro
Running stage 'preprocess':
> python3 preprocess.py
Running stage 'train_and_evaluate':
> python3 train_and_evaluate.py
Updating lock file 'dvc.lock'
A dvc.lock file is generated; similar to a .dvc file, it contains MD5 hashes
$ git add dvc.lock && git commit -m "first pipeline run"
$ dvc repro
Stage 'preprocess' didn't change, skipping
Running stage 'train_and_evaluate' with command: ...
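The skip decision can be pictured as a hash comparison: if the current MD5 of every dependency matches what was recorded on the previous run, the stage does not need to run again. The Python sketch below illustrates only that comparison. It is not how DVC is implemented (DVC also tracks the command and parameters), the `recorded` dict is a stand-in for dvc.lock, and `md5_of` and `stage_changed` are hypothetical helpers.

```python
import hashlib
import os
import tempfile


def md5_of(path):
    """Return the MD5 hex digest of a file's contents."""
    with open(path, "rb") as fh:
        return hashlib.md5(fh.read()).hexdigest()


def stage_changed(deps, recorded):
    """True if any dependency's current hash differs from the recorded one."""
    return any(md5_of(path) != recorded.get(path) for path in deps)


# Demo with a throwaway file standing in for a dependency (made-up contents).
with tempfile.TemporaryDirectory() as tmp:
    dep = os.path.join(tmp, "raw_data.csv")
    with open(dep, "w") as fh:
        fh.write("a,b\n1,2\n")

    recorded = {dep: md5_of(dep)}  # what a first run would record
    unchanged = stage_changed([dep], recorded)  # nothing edited yet

    with open(dep, "a") as fh:
        fh.write("3,4\n")  # modify the dependency
    changed = stage_changed([dep], recorded)

print(unchanged, changed)  # False True
```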

--dry shows only the commands, without running the pipeline
$ dvc repro --dry
Running stage 'preprocess':
> python3 preprocess.py
Running stage 'train_and_evaluate':
> python3 train_and_evaluate.py
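dvc repro linear/dvc.yaml: run the pipeline defined in a specific dvc.yaml file
Multiple dvc.yaml files in one folder are not allowed
dvc repro <target_stage>: run a specific stage and its upstream dependencies
dvc repro -f: force a run even if nothing changed
dvc repro --no-commit: run without saving outputs to the cache
dvc commit: save the outputs to the cache afterwards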
# Run A2 and its upstream dependencies
$ dvc repro A2
# Run B2 and its upstream dependencies
$ dvc repro B2
$ dvc repro train
Stage 'A2' didn't change, skipping
Stage 'B2' didn't change, skipping
Running stage 'train' with command: ...
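Targeting a stage means running that stage plus everything upstream of it, dependencies first. The Python sketch below shows one way to compute that set with a depth-first walk. Only the stage names A2, B2 and train come from the commands above; the stages A1 and B1 and all edges in `upstream` are assumptions made for illustration, and `stages_to_run` is a hypothetical helper, not a DVC API.

```python
# Hypothetical graph: each stage maps to the stages it depends on.
# Only A2, B2 and train appear in the text; A1, B1 and the edges are assumed.
upstream = {
    "A1": [],
    "A2": ["A1"],
    "B1": [],
    "B2": ["B1"],
    "train": ["A2", "B2"],
}


def stages_to_run(target, upstream):
    """Return the target and all its upstream stages, dependencies first."""
    order = []
    seen = set()

    def visit(stage):
        if stage in seen:
            return
        seen.add(stage)
        for parent in upstream[stage]:
            visit(parent)
        order.append(stage)  # appended only after all of its parents

    visit(target)
    return order


print(stages_to_run("A2", upstream))     # ['A1', 'A2']
print(stages_to_run("train", upstream))  # ['A1', 'A2', 'B1', 'B2', 'train']
```

In this picture, running A2 and B2 first leaves their branches up to date, so a later run of train only needs to execute the train stage itself.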