Model training with GitHub Actions

CI/CD for Machine Learning

Ravi Bhadauria

Machine Learning Engineer

Dataset: Weather Prediction in Australia

  • Binary classification
    • Predict whether it will rain tomorrow
  • 5 categorical features
    • Location
    • WindGustDir
    • WindDir9am
    • WindDir3pm
    • RainToday
  • 17 numeric features
    • MinTemp
    • MaxTemp
    • Rainfall
    • Evaporation
    • ...
    • WindGustSpeed
    • Cloud3pm
    • Temp9am
    • RISK_MM
1 https://www.kaggle.com/datasets/rever3nd/weather-data

Modeling workflow

  • Data preprocessing
    • Convert categorical features to numeric
    • Replace missing values
    • Scale features
  • Random Forest Classifier
    • max_depth = 2, n_estimators = 50
  • Standard metrics on the test data
    • Performance plots
      • Confusion matrix plot

Data preparation: target encoding

from typing import List

import pandas as pd


def target_encode_categorical_features(
    df: pd.DataFrame, categorical_columns: List[str], target_column: str
) -> pd.DataFrame:
    encoded_data = df.copy()

    # Iterate through categorical columns
    for col in categorical_columns:
        # Calculate mean target value for each category
        encoding_map = df.groupby(col)[target_column].mean().to_dict()

        # Apply target encoding
        encoded_data[col] = encoded_data[col].map(encoding_map)

    return encoded_data
1 https://maxhalford.github.io/blog/target-encoding/
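
To make the effect of the function above concrete, here is a small usage sketch on a made-up four-row table. The rows and the `RainTomorrow` column name are illustrative assumptions, not values from the dataset, and the target is assumed to be already coded as 0/1 so that a mean can be taken. The function is repeated in compact form only so that the example runs on its own.

```python
from typing import List

import pandas as pd


def target_encode_categorical_features(
    df: pd.DataFrame, categorical_columns: List[str], target_column: str
) -> pd.DataFrame:
    # Same logic as the slide: replace each category by its mean target value
    encoded_data = df.copy()
    for col in categorical_columns:
        encoding_map = df.groupby(col)[target_column].mean().to_dict()
        encoded_data[col] = encoded_data[col].map(encoding_map)
    return encoded_data


# Toy data (hypothetical): it rained the next day on 1 of the 2 Sydney rows
toy = pd.DataFrame(
    {
        "Location": ["Sydney", "Sydney", "Perth", "Perth"],
        "RainTomorrow": [1, 0, 0, 0],
    }
)

encoded = target_encode_categorical_features(toy, ["Location"], "RainTomorrow")
print(encoded["Location"].tolist())  # [0.5, 0.5, 0.0, 0.0]
```

Each category is replaced by the share of positive targets observed for it, and the input frame is left unchanged because the function works on a copy.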

Imputation and scaling

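import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler


def impute_and_scale_data(df_features: pd.DataFrame) -> pd.DataFrame: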
    # Impute data with mean strategy
    imputer = SimpleImputer(strategy="mean")
    X_preprocessed = imputer.fit_transform(df_features.values)

    # Standardize each feature to zero mean and unit variance
    scaler = StandardScaler()
    X_preprocessed = scaler.fit_transform(X_preprocessed)

    # Keep the original index so rows stay aligned with the target column
    return pd.DataFrame(
        X_preprocessed, columns=df_features.columns, index=df_features.index
    )
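
The two steps can be traced on a tiny table. The sketch below uses made-up values (an assumption for illustration only) with one missing entry, and applies the same `SimpleImputer` and `StandardScaler` calls as the function above.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Toy numeric features (hypothetical values) with one missing entry
toy = pd.DataFrame({"MinTemp": [10.0, np.nan, 14.0], "Rainfall": [0.0, 2.0, 4.0]})

# Step 1: mean imputation fills the gap with the column mean, (10 + 14) / 2 = 12
imputed = SimpleImputer(strategy="mean").fit_transform(toy.values)
print(imputed[1, 0])  # 12.0

# Step 2: standardization subtracts the column mean and divides by the
# population standard deviation of each column
scaled = StandardScaler().fit_transform(imputed)
result = pd.DataFrame(scaled, columns=toy.columns)
print(result)
```

After both steps no missing values remain, and every column has mean 0 and unit (population) standard deviation.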

Training

  • Train/test split
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
  data.drop(columns=TARGET_COLUMN), data[TARGET_COLUMN], random_state=1993)
  • Random Forest Classifier
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(
  max_depth=2, n_estimators=50, random_state=1993)
clf.fit(X_train, y_train)
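
For readers who want to run the split and fit end to end without the weather data, the following is a minimal sketch on synthetic data. The `make_classification` table, the feature names, and the value of `TARGET_COLUMN` are stand-ins assumed for this example; only the split and classifier settings come from the slide.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

TARGET_COLUMN = "RainTomorrow"  # assumed target name for this sketch

# Synthetic stand-in for the preprocessed weather table
X, y = make_classification(n_samples=200, n_features=5, random_state=1993)
data = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(5)])
data[TARGET_COLUMN] = y

# With no test_size given, scikit-learn holds out 25% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    data.drop(columns=TARGET_COLUMN), data[TARGET_COLUMN], random_state=1993
)

# Shallow forest with the hyperparameters from the slide
clf = RandomForestClassifier(max_depth=2, n_estimators=50, random_state=1993)
clf.fit(X_train, y_train)

print(X_train.shape, X_test.shape)  # (150, 5) (50, 5)
```

Fixing `random_state` in both calls keeps the split and the trained forest reproducible across CI runs.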

Metrics

from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Calculate predictions
y_pred = model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
# Calculate precision
precision = precision_score(y_test, y_pred)
# Calculate recall
recall = recall_score(y_test, y_pred)
# Calculate f1 score
f1 = f1_score(y_test, y_pred)
1 https://scikit-learn.org/stable/modules/model_evaluation.html#classification-metrics
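
As a worked check of what these four metrics measure, the sketch below recomputes them by hand from confusion-matrix counts. The six labels are hypothetical values chosen for illustration.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Hypothetical labels: 1 = rain tomorrow, 0 = no rain
y_test = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]

# For binary labels, ravel() yields the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)  # 4 / 6
precision = tp / (tp + fp)                  # 2 / 3
recall = tp / (tp + fn)                     # 2 / 3
f1 = 2 * precision * recall / (precision + recall)

# Compare the hand-computed values with the scikit-learn helpers
print(accuracy, accuracy_score(y_test, y_pred))
print(precision, precision_score(y_test, y_pred))
print(recall, recall_score(y_test, y_pred))
print(f1, f1_score(y_test, y_pred))
```

Each pair of printed values should agree, which shows that the helpers are thin wrappers around the four confusion-matrix counts.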

Plot

import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

ConfusionMatrixDisplay.from_estimator(model, X_test, y_test, cmap=plt.cm.Blues)

Confusion matrix plot on the test set


GitHub Actions workflow

Flow diagram of the model training CI

  • Continuous Machine Learning (CML)
    • CI/CD tool for Machine Learning
    • GitHub Actions integration
      • Provision training machines
      • Run training and evaluation
      • Compare experiments
      • Monitor datasets
      • Visual reports
1 https://cml.dev/
2 https://martinfowler.com/bliki/FeatureBranch.html

CML commands

# Enable setup-cml action to be used later
- uses: iterative/setup-cml@v1
- name: Train model
  run: |
    # Your ML workflow goes here
    pip install -r requirements.txt
    python3 train.py
1 https://www.markdownguide.org/basic-syntax/#images

CML commands

- name: Write CML report
  run: |
    # Add results and plots to markdown
    cat results.txt >> report.md
    echo "![training graph](./graph.png)" >> report.md

    # Create comment from markdown report
    cml comment create report.md
  env:
    REPO_TOKEN: ${{ secrets.GITHUB_TOKEN }}
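
The report step reads `results.txt`, so the training script has to produce that file. Below is a minimal sketch of how a script such as `train.py` might do so. The metric values are placeholders and the one-line-per-metric layout is an assumption; CML does not prescribe a format.

```python
from pathlib import Path

# Placeholder metric values; in practice these come from the evaluation step
metrics = {"accuracy": 0.85, "precision": 0.72, "recall": 0.51, "f1": 0.60}

# One "name: value" line per metric (an assumed layout for this sketch)
lines = [f"{name}: {value:.2f}" for name, value in metrics.items()]

# Write the file that the workflow later appends to report.md
results_path = Path("results.txt")
results_path.write_text("\n".join(lines) + "\n")

print(results_path.read_text())
```

The `cat results.txt >> report.md` command in the workflow then appends these lines to the markdown report before the comment is posted.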

Output

Screenshot of a pull request page with a comment created by CML


Let's practice!
