Logistic Regression

Machine Learning with PySpark

Andrew Collier

Data Scientist, Fathom Data

Logistic Curve

Figures:

  • A logistic curve.
  • Logistic curve with shading above the threshold.
  • Logistic curve with shading below the threshold.
  • Logistic curve shifted to the right.
  • Logistic curve shifted to the left.
  • Logistic curve with a gradual transition.
  • Logistic curve with a rapid transition.
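
The figures vary the position and steepness of the curve and shade the regions on either side of a threshold. A minimal sketch of those ideas, assuming the standard logistic function 1 / (1 + exp(-k (x - x0))); the names `x0` (midpoint), `k` (steepness) and `classify` are illustrative and do not come from the slides.

```python
import math


def logistic(x, x0=0.0, k=1.0):
    """Standard logistic function with midpoint x0 and steepness k (assumed names)."""
    return 1.0 / (1.0 + math.exp(-k * (x - x0)))


def classify(x, threshold=0.5, x0=0.0, k=1.0):
    """Return 1 when the curve lies above the threshold, otherwise 0."""
    return 1 if logistic(x, x0, k) > threshold else 0


# At the midpoint the curve sits halfway between 0 and 1.
print(logistic(0.0))
# Moving the midpoint to the right lowers the value at a fixed x.
print(logistic(1.0, x0=0.0), logistic(1.0, x0=2.0))
# A larger k gives a more rapid transition around the midpoint.
print(logistic(1.0, k=0.5), logistic(1.0, k=5.0))
# Points above the threshold map to 1, points below it to 0.
print(classify(1.0), classify(-1.0))
```

In this sketch, changing `x0` shifts the curve left or right, and changing `k` switches between a gradual and a rapid transition.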

Cars revisited

Prepare for modeling:

  • assemble the predictors into a single column (features) and
  • split the data into training and testing sets.
+---+----+------+------+----+-----------+----------------------------------+-----+
|cyl|size|mass  |length|rpm |consumption|features                          |label|
+---+----+------+------+----+-----------+----------------------------------+-----+
|6  |3.0 |1451.0|4.775 |5200|9.05       |[6.0,3.0,1451.0,4.775,5200.0,9.05]|1.0  |
|4  |2.2 |1129.0|4.623 |5200|6.53       |[4.0,2.2,1129.0,4.623,5200.0,6.53]|0.0  |
|4  |2.2 |1399.0|4.547 |5600|7.84       |[4.0,2.2,1399.0,4.547,5600.0,7.84]|1.0  |
|4  |1.8 |1147.0|4.343 |6500|7.84       |[4.0,1.8,1147.0,4.343,6500.0,7.84]|0.0  |
|4  |1.6 |1111.0|4.216 |5750|9.05       |[4.0,1.6,1111.0,4.216,5750.0,9.05]|0.0  |
+---+----+------+------+----+-----------+----------------------------------+-----+

Build a Logistic Regression model

from pyspark.ml.classification import LogisticRegression

Create a Logistic Regression classifier.

logistic = LogisticRegression()

Train it on the training data.

logistic = logistic.fit(cars_train)

Predictions

prediction = logistic.transform(cars_test)
+-----+----------+---------------------------------------+
|label|prediction|probability                            |
+-----+----------+---------------------------------------+
|0.0  |0.0       |[0.8683802216422138,0.1316197783577862]|
|0.0  |1.0       |[0.1343792056399585,0.8656207943600416]|
|0.0  |0.0       |[0.9773546766387631,0.0226453233612368]|
|1.0  |1.0       |[0.0170508265586195,0.9829491734413806]|
|1.0  |0.0       |[0.6122241729292978,0.3877758270707023]|
+-----+----------+---------------------------------------+
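
Each probability vector holds the predicted probability of label 0 followed by that of label 1, and the prediction column follows from comparing the second entry with a threshold. A small sketch in plain Python, using the five rows shown in the table and assuming the threshold is 0.5 (the `LogisticRegression` default); the variable names are illustrative.

```python
# (label, [P(label = 0), P(label = 1)]) pairs copied from the table.
rows = [
    (0.0, [0.8683802216422138, 0.1316197783577862]),
    (0.0, [0.1343792056399585, 0.8656207943600416]),
    (0.0, [0.9773546766387631, 0.0226453233612368]),
    (1.0, [0.0170508265586195, 0.9829491734413806]),
    (1.0, [0.6122241729292978, 0.3877758270707023]),
]

THRESHOLD = 0.5  # assumption: the default threshold was left unchanged

# Predict 1.0 when the probability of label 1 exceeds the threshold.
predictions = [1.0 if prob[1] > THRESHOLD else 0.0 for _, prob in rows]
print(predictions)  # [0.0, 1.0, 0.0, 1.0, 0.0]

# Count how many of these five predictions agree with the label.
correct = sum(1 for (label, _), pred in zip(rows, predictions) if label == pred)
print(correct)  # 3
```

The result reproduces the prediction column: the second and fifth rows are misclassified.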

Precision and Recall

How well does the model perform on the testing data?

Consult the confusion matrix.

+-----+----------+-----+
|label|prediction|count|
+-----+----------+-----+
|  1.0|       1.0|    8| - TP (true positive)
|  0.0|       1.0|    4| - FP (false positive)
|  1.0|       0.0|    2| - FN (false negative)
|  0.0|       0.0|   10| - TN (true negative)
+-----+----------+-----+
# Precision (positive)
TP / (TP + FP)
0.6666666666666666
# Recall (positive)
TP / (TP + FN)
0.8
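
The two ratios can be checked by hand from the four counts in the confusion matrix. A minimal sketch in plain Python; the accuracy line is an extra illustration that does not appear on the slide.

```python
# Counts read from the confusion matrix.
tp, fp, fn, tn = 8, 4, 2, 10

# Precision: share of positive predictions that are correct.
precision = tp / (tp + fp)
# Recall: share of actual positives that the model finds.
recall = tp / (tp + fn)
# Accuracy: share of all predictions that are correct.
accuracy = (tp + tn) / (tp + fp + fn + tn)

print(precision)  # 0.6666666666666666
print(recall)     # 0.8
print(accuracy)   # 0.75
```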

Weighted metrics

from pyspark.ml.evaluation import MulticlassClassificationEvaluator

evaluator = MulticlassClassificationEvaluator()

evaluator.evaluate(prediction, {evaluator.metricName: 'weightedPrecision'})
0.7638888888888888

Other metrics:

  • weightedRecall
  • accuracy
  • f1
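
The weighted precision of 0.7638888888888888 can be reproduced by hand. The sketch below assumes that a weighted metric is the per-class metric averaged with each class weighted by its share of the true labels, and that the evaluated predictions are the ones summarised in the confusion matrix. The helper names are illustrative.

```python
# Confusion-matrix counts from the slide, keyed by (label, prediction).
counts = {(1.0, 1.0): 8, (0.0, 1.0): 4, (1.0, 0.0): 2, (0.0, 0.0): 10}
labels = sorted({label for label, _ in counts})
total = sum(counts.values())


def precision(c):
    """Correct predictions of class c divided by all predictions of class c."""
    predicted = sum(n for (_, pred), n in counts.items() if pred == c)
    return counts[(c, c)] / predicted


def recall(c):
    """Correct predictions of class c divided by all true members of class c."""
    actual = sum(n for (label, _), n in counts.items() if label == c)
    return counts[(c, c)] / actual


def weighted(metric):
    """Average a per-class metric, weighting each class by its share of true labels."""
    result = 0.0
    for c in labels:
        support = sum(n for (label, _), n in counts.items() if label == c)
        result += metric(c) * support / total
    return result


weighted_precision = weighted(precision)
weighted_recall = weighted(recall)
print(weighted_precision, weighted_recall)
```

Under this definition the weighted recall works out to 18 / 24 = 0.75, the same value as the accuracy.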

ROC and AUC

ROC curve

ROC = "Receiver Operating Characteristic"

  • true positive rate (TP) versus false positive rate (FP)
  • threshold = 0 (top right)
  • threshold = 1 (bottom left)

AUC = "Area under the curve"

  • ideally AUC = 1
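
The construction of the curve can be sketched in plain Python: lower the threshold step by step, record the false positive and true positive rates at each step, then integrate with the trapezoidal rule. The inputs are only the five label and probability pairs shown on the predictions slide, not the full testing set, so the resulting area is purely illustrative. The function names are hypothetical.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points found by sweeping the threshold from high to low."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    # Threshold above every score: nothing is predicted positive (bottom left).
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    # The lowest threshold predicts everything positive (top right).
    return points


def auc(points):
    """Area under the piecewise-linear curve, using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Labels and P(label = 1) for the five rows shown on the predictions slide.
labels = [0, 0, 0, 1, 1]
scores = [0.1316197783577862, 0.8656207943600416, 0.0226453233612368,
          0.9829491734413806, 0.3877758270707023]

points = roc_points(labels, scores)
area = auc(points)
print(points[0], points[-1])
print(area)
```

A model that ranks every positive above every negative would give an area of 1. On these five rows one negative outranks one positive, so the area is 5 / 6.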

Let's do Logistic Regression!
