Data preparation for NannyML

Monitoring Machine Learning in Python

Hakim Elakhrass

Co-founder and CEO of NannyML

Loading the data

import pandas as pd

# Load the green taxi dataset and preview the first five rows
dataset_name = "green_taxi_dataset.csv"
data = pd.read_csv(dataset_name)
data.head()

The image is a screenshot of the first five rows of the dataset.

Processing the data

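# Make sure the pickup column is datetime so it can be binned by date
data['lpep_pickup_datetime'] = pd.to_datetime(data['lpep_pickup_datetime'])

# Create data partition: left-closed bins, so each boundary date
# belongs to the partition that starts on it
data['partition'] = pd.cut(
    data['lpep_pickup_datetime'],
    bins=[pd.to_datetime('2016-12-01'),
          pd.to_datetime('2016-12-08'),
          pd.to_datetime('2016-12-16'),
          pd.to_datetime('2017-01-01')],
    right=False,
    labels=['train', 'test', 'prod']
)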
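
To see how the date-based binning behaves at the boundaries, here is a minimal sketch on a handful of made-up timestamps. The toy DataFrame and its values are assumptions for illustration only and are not taken from the green taxi dataset.

```python
import pandas as pd

# Made-up pickup times placed on and around the bin edges
toy = pd.DataFrame({
    "lpep_pickup_datetime": pd.to_datetime([
        "2016-12-01 00:00:00",  # first instant of the train window
        "2016-12-07 23:59:59",  # last second of the train window
        "2016-12-08 00:00:00",  # first instant of the test window
        "2016-12-15 12:30:00",  # inside the test window
        "2016-12-16 00:00:00",  # first instant of the prod window
        "2016-12-31 23:59:59",  # last second of the prod window
        "2017-01-01 00:00:00",  # outside every bin
    ])
})

bins = [
    pd.Timestamp("2016-12-01"),
    pd.Timestamp("2016-12-08"),
    pd.Timestamp("2016-12-16"),
    pd.Timestamp("2017-01-01"),
]

# right=False makes each bin include its left edge and exclude its right edge
toy["partition"] = pd.cut(
    toy["lpep_pickup_datetime"],
    bins=bins,
    right=False,
    labels=["train", "test", "prod"],
)

print(toy)
```

Rows that fall outside all bins, such as the last one here, get a missing partition value, so they are silently excluded from the train, test, and prod sets.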

Splitting the data

# Target column name
target = 'tip_amount'
# Features column name
features = ["PULocationID", "DOLocationID", "trip_distance", "VendorID", "pickup_time"]
# Train set
X_train = data.loc[data['partition'] == 'train', features]
y_train = data.loc[data['partition'] == 'train', target]

# Test set (later reference set)
X_test = data.loc[data['partition'] == 'test', features]
y_test = data.loc[data['partition'] == 'test', target]

# Production set (later analysis set)
X_prod = data.loc[data['partition'] == 'prod', features]
y_prod = data.loc[data['partition'] == 'prod', target]

Building the model

  • Train an LGBMRegressor using the lightgbm library
  • Evaluate the model on the test set
  • Deploy the model
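# Imports needed by this slide; MAE is assumed to be an alias for
# scikit-learn's mean_absolute_error
from lightgbm import LGBMRegressor
from sklearn.metrics import mean_absolute_error as MAE

# Training the model
model = LGBMRegressor(random_state=42)
model.fit(X_train, y_train)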

# Making predictions
y_pred_train = model.predict(X_train)
y_pred_test = model.predict(X_test)

# Evaluating the model on train and test set
mae_train = MAE(y_train, y_pred_train)
mae_test = MAE(y_test, y_pred_test)

# Deploying the model to production
y_pred_prod = model.predict(X_prod)
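
The evaluation step compares predictions with the true tip amounts using MAE, which is assumed here to mean the mean absolute error. A minimal sketch of that metric in plain Python, with made-up tip values, shows what the number represents; the helper below is illustrative and is not the function used on the slide.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one observation is required")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up tip amounts in dollars, for illustration only
tips_true = [2.0, 0.0, 3.5, 1.0]
tips_pred = [1.5, 0.5, 3.0, 2.0]

# Absolute errors are 0.5, 0.5, 0.5 and 1.0, so the mean is 2.5 / 4
mae = mean_absolute_error(tips_true, tips_pred)
print(mae)
```

Because MAE is expressed in the same unit as the target, a value of 0.625 here would read as predictions being off by about 63 cents on average.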

Creating the reference and analysis sets

Reference period

  • Uses the test set

  • Requires ground truth

  • Establishes the baseline performance

Analysis period

  • Latest production data

  • Ground truth is optional

  • NannyML analyzes data drift and performance

# Creating reference set
reference = X_test.copy() # Test set features
reference['y_pred'] = y_pred_test # Predictions
reference['tip_amount'] = y_test # Labels
reference = reference.join(
    data['lpep_pickup_datetime']) # Timestamp
# Creating analysis set
analysis = X_prod.copy() # Production features
analysis['y_pred'] = y_pred_prod # Predictions
analysis = analysis.join(
    data['lpep_pickup_datetime']) # Timestamp
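
The join in the snippet above relies on pandas index alignment: the filtered rows keep their original index labels, so joining on the index pulls in the matching timestamps from the full table. The sketch below illustrates this on a made-up four-row table; the values and the stand-in predictions are assumptions, not real model output.

```python
import pandas as pd

# Made-up miniature version of the full dataset
toy_data = pd.DataFrame({
    "trip_distance": [1.2, 3.4, 0.8, 5.0],
    "tip_amount": [1.0, 2.5, 0.0, 4.0],
    "lpep_pickup_datetime": pd.to_datetime([
        "2016-12-02 08:00", "2016-12-09 09:00",
        "2016-12-10 10:00", "2016-12-20 11:00",
    ]),
    "partition": ["train", "test", "test", "prod"],
})

features = ["trip_distance"]
is_test = toy_data["partition"] == "test"
X_test = toy_data.loc[is_test, features]
y_test = toy_data.loc[is_test, "tip_amount"]
y_pred_test = [2.0, 0.5]  # stand-in predictions, one per test row

# Same steps as on the slide, applied to the toy table
reference = X_test.copy()
reference["y_pred"] = y_pred_test
reference["tip_amount"] = y_test  # aligned on the index labels 1 and 2
reference = reference.join(toy_data["lpep_pickup_datetime"])

print(reference)
```

Only rows 1 and 2 appear in the result, each with its own timestamp, because a left join on the index keeps just the labels present in the test subset.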

Example reference set

  • Timestamp - the time the observation occurred (optional)
  • Features - the features fed to the model
  • Model outputs
    • Predictions - the prediction scores output by the model
    • Predicted class labels - the thresholded probability scores
  • Target - contains the ground truth

The image shows the first five rows of the reference set.
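
Before handing the two sets to a monitoring tool, it can help to confirm that each one has the columns listed above. The following is a minimal sketch in plain Python; `missing_columns` is a hypothetical helper written for this example and is not part of NannyML.

```python
def missing_columns(columns, features, y_pred, target=None, timestamp=None):
    """Return the required column names that are absent from `columns`."""
    required = list(features) + [y_pred]
    if target is not None:
        required.append(target)
    if timestamp is not None:
        required.append(timestamp)
    present = set(columns)
    return [name for name in required if name not in present]


features = ["PULocationID", "DOLocationID", "trip_distance",
            "VendorID", "pickup_time"]

# Column layouts matching the reference and analysis sets built earlier
reference_columns = features + ["y_pred", "tip_amount", "lpep_pickup_datetime"]
analysis_columns = features + ["y_pred", "lpep_pickup_datetime"]

# The reference set needs the target; the analysis set may not have it yet
missing_in_reference = missing_columns(
    reference_columns, features, "y_pred",
    target="tip_amount", timestamp="lpep_pickup_datetime",
)
missing_in_analysis = missing_columns(
    analysis_columns, features, "y_pred",
    target="tip_amount", timestamp="lpep_pickup_datetime",
)

print(missing_in_reference)
print(missing_in_analysis)
```

With a real DataFrame, `list(reference.columns)` can be passed as the first argument. A missing target in the analysis set is expected, since ground truth is optional there.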

Let's practice!
