Introduction to Data Versioning with DVC
Ravi Bhadauria
Machine Learning Engineer
A DVC pipeline is a sequence of stages that defines a machine learning workflow and its dependencies
Pipelines are defined in the dvc.yaml file
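Each stage specifies:
- its input dependencies (deps)
- its parameters (params)
- the command to run (cmd)
- its outputs (outs)
- metrics and plots
Stages are created with dvc stage add, which writes the stage definition to dvc.yaml:
dvc stage add \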
-n preprocess \
-p params.yaml:preprocess \
-d raw_data.csv \
-d preprocess.py \
-o processed_data.csv \
python3 preprocess.py
stages:
  preprocess:
    cmd: python3 preprocess.py
    params:
      # Keys from params.yaml
      - params.yaml:
          - preprocess
    deps:
      - preprocess.py
      - raw_data.csv
    outs:
      - processed_data.csv
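To make the mapping between the command-line flags and the dvc.yaml fields explicit, here is a small Python sketch that turns a single-line dvc stage add call into the same structure. It is an illustration only, not how DVC parses its CLI: the stage_from_cli and parse_param helpers are hypothetical, and they only understand -n, -p, -d and -o, each followed by a value.

```python
import shlex


def parse_param(value):
    """'file.yaml:key1,key2' -> {'file.yaml': ['key1', 'key2']}; a bare 'key' stays a string."""
    if ":" in value:
        path, keys = value.split(":", 1)
        return {path: keys.split(",")}
    return value


def stage_from_cli(command_line):
    """Translate a single-line `dvc stage add ...` call into (name, stage dict). Sketch only."""
    tokens = shlex.split(command_line)
    if tokens[:3] != ["dvc", "stage", "add"]:
        raise ValueError("expected a 'dvc stage add' command")
    tokens = tokens[3:]
    name = None
    stage = {"cmd": None, "params": [], "deps": [], "outs": []}
    i = 0
    # Consume "-flag value" pairs until the first token that is not a flag.
    while i < len(tokens) and tokens[i].startswith("-"):
        flag, value = tokens[i], tokens[i + 1]
        if flag == "-n":
            name = value
        elif flag == "-p":
            stage["params"].append(parse_param(value))
        elif flag == "-d":
            stage["deps"].append(value)
        elif flag == "-o":
            stage["outs"].append(value)
        else:
            raise ValueError(f"flag not handled by this sketch: {flag}")
        i += 2
    # Everything after the flags is the command DVC will run.
    stage["cmd"] = " ".join(tokens[i:])
    stage["deps"].sort()  # the dvc.yaml above lists deps alphabetically
    return name, stage


if __name__ == "__main__":
    name, stage = stage_from_cli(
        "dvc stage add -n preprocess -p params.yaml:preprocess "
        "-d raw_data.csv -d preprocess.py -o processed_data.csv "
        "python3 preprocess.py"
    )
    print(name)  # preprocess
    print(stage)
```

Each -p, -d and -o flag can be repeated, which is why the corresponding fields are lists, while -n sets the single stage name.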
dvc stage add \
-n train_and_evaluate \
-p train_and_evaluate \
-d train_and_evaluate.py \
-d processed_data.csv \
-o plots.png \
-o metrics.json \
python3 train_and_evaluate.py
stages:
  train_and_evaluate:
    cmd: python3 train_and_evaluate.py
    params:
      # Parameter file not specified:
      # defaults to params.yaml
      - train_and_evaluate
    deps:
      - processed_data.csv
      - train_and_evaluate.py
    outs:
      - plots.png
      - metrics.json
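DVC links the two stages because processed_data.csv is an output (outs) of preprocess and a dependency (deps) of train_and_evaluate. The Python sketch below mimics that rule on a hand-written copy of the deps and outs shown above, then derives a valid run order with the standard-library graphlib module (Python 3.9+). It is an illustration, not DVC's implementation; the stages dict and the build_edges and execution_order helpers are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical in-memory copy of the deps/outs declared in dvc.yaml above.
stages = {
    "preprocess": {
        "deps": ["preprocess.py", "raw_data.csv"],
        "outs": ["processed_data.csv"],
    },
    "train_and_evaluate": {
        "deps": ["processed_data.csv", "train_and_evaluate.py"],
        "outs": ["plots.png", "metrics.json"],
    },
}


def build_edges(stages):
    """Return (producer, consumer) pairs: the consumer depends on a file the producer outputs."""
    # Map every output file to the stage that produces it.
    producer_of = {out: name for name, spec in stages.items() for out in spec["outs"]}
    edges = []
    for name, spec in stages.items():
        for dep in spec["deps"]:
            # Files not produced by any stage (e.g. raw_data.csv) create no edge.
            if dep in producer_of:
                edges.append((producer_of[dep], name))
    return edges


def execution_order(stages):
    """Topologically sort stages so every producer comes before its consumers."""
    graph = {name: set() for name in stages}
    for producer, consumer in build_edges(stages):
        graph[consumer].add(producer)  # TopologicalSorter expects node -> predecessors
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(build_edges(stages))      # [('preprocess', 'train_and_evaluate')]
    print(execution_order(stages))  # ['preprocess', 'train_and_evaluate']
```

Because the edge comes from a shared file rather than from the order in which the stages were added, defining train_and_evaluate before preprocess would produce the same graph.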
Running dvc stage add again with an existing stage name fails:
ERROR: Stage 'train_and_evaluate' already exists in 'dvc.yaml'.
Use '--force' to overwrite.
Add --force to overwrite the existing stage definition:
dvc stage add --force \
-n train_and_evaluate \
-p train_and_evaluate \
-d train_and_evaluate.py \
-d processed_data.csv \
-o plots.png \
-o metrics.json \
python3 train_and_evaluate.py
# Print the DAG in the terminal
dvc dag
# Show the DAG up to a specific stage
dvc dag <target>
+------------+
| preprocess |
+------------+
*
*
*
+--------------------+
| train_and_evaluate |
+--------------------+
# Show stage outputs as nodes
dvc dag --outs
+-------------------------------+
| processed_dataset/weather.csv |
+-------------------------------+
*** ***
*** ***
** **
+--------------+ +-----------+
| metrics.json | | plots.png |
+--------------+ +-----------+
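With --outs, the nodes are output files rather than stages: roughly, an output consumed by a stage points to each output that stage produces. A minimal Python sketch of that idea, using a hypothetical hand-written copy of the stages defined earlier (so the file names follow those stages, not the project in the output above); out_edges is an illustrative helper, not part of DVC.

```python
# Hypothetical in-memory copy of the deps/outs declared in dvc.yaml.
stages = {
    "preprocess": {
        "deps": ["preprocess.py", "raw_data.csv"],
        "outs": ["processed_data.csv"],
    },
    "train_and_evaluate": {
        "deps": ["processed_data.csv", "train_and_evaluate.py"],
        "outs": ["plots.png", "metrics.json"],
    },
}


def out_edges(stages):
    """Return (input_file, output_file) pairs linking outputs across stages."""
    all_outs = {out for spec in stages.values() for out in spec["outs"]}
    edges = []
    for spec in stages.values():
        for dep in spec["deps"]:
            # Only files produced by some stage appear as nodes in this view.
            if dep in all_outs:
                for out in spec["outs"]:
                    edges.append((dep, out))
    return edges


if __name__ == "__main__":
    print(out_edges(stages))
    # [('processed_data.csv', 'plots.png'), ('processed_data.csv', 'metrics.json')]
```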
# Print the DAG in Graphviz DOT format
dvc dag --dot
strict digraph {
"preprocess";
"train_and_evaluate";
"preprocess" -> "train_and_evaluate";
}
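The DOT text above is a strict digraph with one quoted line per node and one "a" -> "b" line per edge. The Python sketch below assembles text with the same structure from a node list and an edge list; to_dot is a hypothetical helper, and the exact whitespace DVC emits may differ.

```python
def to_dot(nodes, edges):
    """Build Graphviz DOT text for a directed graph with no duplicate edges."""
    lines = ["strict digraph {"]
    lines += [f'"{node}";' for node in nodes]  # one declaration per node
    lines += [f'"{src}" -> "{dst}";' for src, dst in edges]  # one line per edge
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    nodes = ["preprocess", "train_and_evaluate"]
    edges = [("preprocess", "train_and_evaluate")]
    print(to_dot(nodes, edges))
```

Assuming Graphviz is installed, DOT output can be rendered to an image, for example with dvc dag --dot | dot -Tpng -o pipeline.png.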

Pengantar Versioning Data dengan DVC