Loss functions Part I

Designing Machine Learning Workflows in Python

Dr. Chris Anagnostopoulos

Honorary Associate Professor

The KDD '99 cup dataset

kdd.iloc[0]
duration                         51
protocol_type                   tcp
service                        smtp
flag                             SF
src_bytes                      1169
dst_bytes                       332
land                              0
...
dst_host_rerror_rate              0
dst_host_srv_rerror_rate          0
label                          good

False positive vs false negative

Binarize the label:

kdd['label'] = kdd['label'] == 'bad'

Train a Gaussian Naive Bayes classifier:

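from sklearn.naive_bayes import GaussianNB
import pandas as pd

# Fit on the training split, then predict on the held-out test split
clf = GaussianNB().fit(X_train, y_train)
predictions = clf.predict(X_test)
# Put the true labels and the predictions side by side
results = pd.DataFrame({
    'actual': y_test,
    'predicted': predictions
})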

There are four possible combinations of label and prediction: both True, both False, label True with prediction False, and label False with prediction True. The last combination, a false positive, is highlighted here.

Now the combination of a True label and a False prediction, a false negative, is highlighted.

The two cases in which the prediction matches the label are now highlighted.
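
The four combinations can be named and counted with a few lines of plain Python. This is only a sketch: the two lists are hypothetical toy data standing in for the `actual` and `predicted` columns of `results`, and the `outcome` helper is an illustrative name, not part of the course code.

```python
from collections import Counter

def outcome(actual, predicted):
    """Name the outcome for one (label, prediction) pair of booleans."""
    if actual and predicted:
        return "true positive"
    if not actual and not predicted:
        return "true negative"
    if predicted:
        return "false positive"   # label False, prediction True
    return "false negative"       # label True, prediction False

# Hypothetical toy data, not taken from the KDD dataset
actual    = [True, False, False, True, False, True]
predicted = [True, False, True, False, False, True]

# Count how often each of the four outcomes occurs
counts = Counter(outcome(a, p) for a, p in zip(actual, predicted))
for name, n in counts.items():
    print(f"{name}: {n}")
```

Only the two mismatching outcomes are errors, and they are the ones a loss function has to weigh against each other.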

Confusion matrix

from sklearn.metrics import confusion_matrix

conf_mat = confusion_matrix(
    ground_truth, predictions)
array([[9477,   19],
       [ 397, 2458]])
tn, fp, fn, tp = conf_mat.ravel()
(fp, fn)
(19, 397)

A confusion matrix counts the cases in each of the four combinations mentioned earlier for this dataset.
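
To make the layout explicit, the matrix shown above can be unpacked by hand. The sketch below copies the numbers from the slide into a nested list and assumes scikit-learn's convention for binary labels: rows are the actual classes, columns are the predicted classes, and False comes before True.

```python
# Numbers copied from the confusion matrix above
conf_mat = [[9477,   19],
            [ 397, 2458]]

# Row 0: actual False (good), row 1: actual True (bad)
# Column 0: predicted False, column 1: predicted True
(tn, fp), (fn, tp) = conf_mat

actual_negatives = tn + fp      # all connections labelled good
actual_positives = fn + tp      # all connections labelled bad
predicted_positives = fp + tp   # everything the classifier flagged

print("actual negatives:", actual_negatives)
print("actual positives:", actual_positives)
print("predicted positives:", predicted_positives)
```

Flattening the matrix row by row gives the order `tn, fp, fn, tp`, which is why `conf_mat.ravel()` can be unpacked in that order.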

Scalar performance metrics

accuracy = 1-(fp + fn)/len(ground_truth)

recall = tp/(tp+fn)
fpr = fp/(tn+fp)
precision = tp/(tp+fp)
f1 = 2*(precision*recall)/(precision+recall)
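from sklearn.metrics import (
    accuracy_score, recall_score, precision_score, f1_score)

# The same metrics, computed directly by scikit-learn
accuracy_score(ground_truth, predictions)
recall_score(ground_truth, predictions)
precision_score(ground_truth, predictions)
f1_score(ground_truth, predictions)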
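
As a worked example, the formulas above can be applied to the four counts from the confusion matrix shown earlier. This sketch hard-codes those counts rather than recomputing them from data, and uses their sum in place of `len(ground_truth)`.

```python
# Counts taken from the confusion matrix shown earlier
tn, fp, fn, tp = 9477, 19, 397, 2458
total = tn + fp + fn + tp            # stands in for len(ground_truth)

accuracy = 1 - (fp + fn) / total     # share of correct predictions
recall = tp / (tp + fn)              # share of actual positives that were caught
fpr = fp / (tn + fp)                 # share of actual negatives that were flagged
precision = tp / (tp + fp)           # share of flagged cases that were truly positive
f1 = 2 * (precision * recall) / (precision + recall)

for name, value in [("accuracy", accuracy), ("recall", recall),
                    ("fpr", fpr), ("precision", precision), ("f1", f1)]:
    print(f"{name}: {value:.4f}")
```

On these counts precision is much higher than recall: the classifier rarely raises a false alarm, but it misses a noticeable share of the bad connections.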

False positive vs false negative

Classifier A:

tn, fp, fn, tp = confusion_matrix(
    ground_truth, predictions_A).ravel()
(fp,fn)
(3, 3)
cost = 10 * fp + fn
33

Classifier B:

tn, fp, fn, tp = confusion_matrix(
    ground_truth, predictions_B).ravel()
(fp,fn)
(0, 26)

cost = 10 * fp + fn
26
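
The comparison depends on how the two error types are weighted. The sketch below wraps the cost formula from the slides in a helper; the name `weighted_cost` and its parameters are illustrative assumptions, and the error counts are the ones shown for classifiers A and B.

```python
def weighted_cost(fp, fn, fp_weight=10, fn_weight=1):
    """Total cost when each false positive and false negative has its own price."""
    return fp_weight * fp + fn_weight * fn

# (fp, fn) pairs as reported for the two classifiers
errors = {"A": (3, 3), "B": (0, 26)}

# A false positive costs ten times as much as a false negative
cost_weighted = {name: weighted_cost(fp, fn) for name, (fp, fn) in errors.items()}

# Both error types cost the same, which amounts to counting mistakes
cost_equal = {name: weighted_cost(fp, fn, fp_weight=1)
              for name, (fp, fn) in errors.items()}

print("weighted 10:1 ->", cost_weighted)
print("equal weights ->", cost_equal)
```

With the 10:1 weighting, classifier B is cheaper even though it makes far more mistakes in total; with equal weights, classifier A comes out ahead.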

Which classifier is better?
