CI/CD for Machine Learning
Ravi Bhadauria
Machine Learning Engineer
outs

stages:
  preprocess:
    ...
  train:
    ...
    outs:
      - metrics.json
      - confusion_matrix.png
metrics

stages:
  preprocess:
    ...
  train:
    ...
    outs:
      - confusion_matrix.png
    metrics:
      - metrics.json:
          cache: false
-> dvc metrics show
Path          accuracy  f1_score  precision  recall
metrics.json  0.947     0.8656    0.988      0.7702
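For the train stage to produce the metrics.json that dvc metrics show reads, the training script has to write it. The following is a minimal sketch under stated assumptions: the task is binary classification with 0/1 labels, and the helper names classification_metrics and write_metrics are hypothetical, not part of DVC. It uses only the standard library so the arithmetic behind each metric is visible.

```python
import json


def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "f1_score": f1,
        "precision": precision,
        "recall": recall,
    }


def write_metrics(metrics, path="metrics.json"):
    """Write a flat metric-name -> value mapping as JSON."""
    with open(path, "w") as f:
        json.dump(metrics, f, indent=2)


if __name__ == "__main__":
    # Made-up labels, for illustration only
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
    write_metrics(classification_metrics(y_true, y_pred))
```

The keys match the column names shown in the dvc metrics show output, which is why a flat JSON object is used here.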
Change one hyperparameter and rerun dvc repro
-> dvc metrics diff
Path          Metric     HEAD    workspace  Change
metrics.json  accuracy   0.947   0.9995     0.0525
metrics.json  f1_score   0.8656  0.9989     0.1333
metrics.json  precision  0.988   0.9993     0.0113
metrics.json  recall     0.7702  0.9986     0.2284
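The Change column is the workspace value minus the HEAD value for each metric. The sketch below reproduces that arithmetic on the numbers from the table above; the function name metrics_diff is hypothetical, and rounding to four decimal places is an assumption chosen to match the displayed output, not a statement about how DVC formats it internally.

```python
def metrics_diff(head, workspace, ndigits=4):
    """Return (metric, head, workspace, change) rows for metrics present in both."""
    rows = []
    for name in sorted(head.keys() & workspace.keys()):
        change = round(workspace[name] - head[name], ndigits)
        rows.append((name, head[name], workspace[name], change))
    return rows


if __name__ == "__main__":
    # Values copied from the dvc metrics diff table
    head = {"accuracy": 0.947, "f1_score": 0.8656,
            "precision": 0.988, "recall": 0.7702}
    workspace = {"accuracy": 0.9995, "f1_score": 0.9989,
                 "precision": 0.9993, "recall": 0.9986}
    for row in metrics_diff(head, workspace):
        print(*row)
```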
setup-dvc

steps:
  ...
  - name: Setup DVC
    uses: iterative/setup-dvc@v1
  - name: Run DVC pipeline
    run: dvc repro
  - name: Write CML report
    env:
      REPO_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    run: |
      # Print metrics of the current branch
      dvc metrics show --md >> report.md
      # Compare metrics with the main branch
      git fetch --prune
      dvc metrics diff --md main >> report.md
      # Create the CML report
      cml comment create report.md

scatter - scatter plot
linear - interactive linear plot
simple - custom non-interactive linear plot
smooth - linear plot with smoothing
confusion - confusion matrix
confusion_normalized - confusion matrix with values normalized to the <0, 1> range
bar_horizontal - horizontal bar plot
bar_horizontal_sorted - horizontal bar plot sorted by bar size

stages:
  train:
    ...
    plots:
      - predictions.csv:  # Name of the file containing predictions
          template: confusion  # Plot style
          x: predicted_label  # X-axis column name in the csv file
          y: true_label  # Y-axis column name in the csv file
          x_label: 'Predicted label'
          y_label: 'True label'
          title: Confusion matrix
          cache: false  # Save in Git
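The confusion template expects predictions.csv to hold one row per sample, with the columns named in the x and y fields. A minimal sketch of how a training script might write that file is given below; the function name write_predictions and the sample labels are hypothetical, and only the column names predicted_label and true_label come from the configuration above.

```python
import csv


def write_predictions(y_true, y_pred, path="predictions.csv"):
    """Write one row per sample with the column names used in dvc.yaml."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["predicted_label", "true_label"])
        for true_label, predicted_label in zip(y_true, y_pred):
            writer.writerow([predicted_label, true_label])


if __name__ == "__main__":
    # Made-up labels, for illustration only
    write_predictions(y_true=[1, 0, 1, 0], y_pred=[1, 0, 0, 0])
```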
-> dvc plots show predictions.csv
file:///path/to/index.html

-> dvc plots diff --target predictions.csv main
file:///path/to/index.html

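# Changes in Python
from sklearn.metrics import roc_curve

y_proba = model.predict_proba(X_test)
# Use the positive-class probabilities (column 1) to compute the ROC curve
fpr, tpr, _ = roc_curve(y_test, y_proba[:, 1])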
# Changes in dvc.yaml
plots:
  - roc_curve.csv:
      template: simple
      x: fpr
      y: tpr
      x_label: 'False Positive Rate'
      y_label: 'True Positive Rate'
      title: ROC curve
      cache: false
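The Python snippet computes fpr and tpr but does not show how they reach roc_curve.csv. One possible way to bridge that gap is sketched below, assuming fpr and tpr are equal-length sequences such as the arrays returned by roc_curve. The function name write_roc_curve and the sample values are hypothetical; only the column names fpr and tpr are taken from the plots entry.

```python
import csv


def write_roc_curve(fpr, tpr, path="roc_curve.csv"):
    """Write ROC points as CSV with the fpr/tpr columns referenced in dvc.yaml."""
    if len(fpr) != len(tpr):
        raise ValueError("fpr and tpr must have the same length")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["fpr", "tpr"])
        for x, y in zip(fpr, tpr):
            # float() also converts NumPy scalars to plain Python floats
            writer.writerow([float(x), float(y)])


if __name__ == "__main__":
    # Made-up ROC points, for illustration only
    write_roc_curve(fpr=[0.0, 0.1, 0.4, 1.0], tpr=[0.0, 0.6, 0.9, 1.0])
```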
