Averaging

Ensemble Methods in Python

Román de las Heras

Data Scientist, Appodeal

Counting Jelly Beans

[Image: jelly bean guessing contest]

How to provide a good estimate?

  • Guessing (random number)
  • Volume approximation
  • Many more approaches

Actual Value ~ mean(estimates)
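
The idea that the mean of many estimates approximates the actual value can be sketched in a few lines. The jelly bean count and the guesses below are hypothetical numbers invented for illustration, chosen so that the individual errors cancel out; real guesses will not average out this neatly.

```python
from statistics import mean

# Hypothetical values, invented for illustration only
actual_count = 1000
guesses = [700, 850, 1200, 1100, 950, 1300, 900]

# Ensemble estimate: the plain mean of all individual guesses
ensemble_estimate = mean(guesses)

# Absolute error of each individual guess and of the averaged guess
individual_errors = [abs(g - actual_count) for g in guesses]
ensemble_error = abs(ensemble_estimate - actual_count)

print("Ensemble estimate:", ensemble_estimate)
print("Mean individual error:", round(mean(individual_errors), 1))
print("Ensemble error:", ensemble_error)
```

Overestimates and underestimates offset each other, so the averaged guess can land closer to the actual value than a typical individual guess does.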

Averaging (Soft Voting)

Properties

  • Classification & Regression problems
  • Soft Voting: Mean
    • Regression: mean of predicted values
    • Classification: mean of predicted probabilities
  • Need at least 2 estimators
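
The two averaging rules listed above can be computed by hand with NumPy. This is a minimal sketch: the probabilities and predicted values are hypothetical outputs of three estimators for a single sample, not results from real models.

```python
import numpy as np

# Classification: hypothetical [P(class 0), P(class 1)] for one sample,
# one row per classifier
probas = np.array([
    [0.9, 0.1],
    [0.4, 0.6],
    [0.3, 0.7],
])

# Soft voting: average the probabilities, then pick the most likely class
avg_proba = probas.mean(axis=0)
predicted_class = int(np.argmax(avg_proba))

# Regression: hypothetical predicted values from three regressors
reg_predictions = np.array([10.0, 12.0, 14.0])
avg_prediction = reg_predictions.mean()

print("Averaged probabilities:", avg_proba)
print("Predicted class:", predicted_class)
print("Averaged regression prediction:", avg_prediction)
```

Because probabilities are averaged rather than class labels, one very confident classifier can outweigh two mildly confident ones, as it does for class 0 here.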

Averaging ensemble with scikit-learn

Averaging Classifier

from sklearn.ensemble import VotingClassifier

clf_voting = VotingClassifier(
    estimators=[
        ('label1', clf_1),
        ('label2', clf_2),
        ...
        ('labelN', clf_N)],
    voting='soft',
    weights=[w_1, w_2, ..., w_N]
)

Averaging Regressor

from sklearn.ensemble import VotingRegressor

reg_voting = VotingRegressor(
    estimators=[
        ('label1', reg_1),
        ('label2', reg_2),
        ...
        ('labelN', reg_N)],
    weights=[w_1, w_2, ..., w_N]
)
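
To see what the weights do in practice, the sketch below fits a VotingRegressor and compares its output with a manually computed weighted mean of the fitted sub-models' predictions. The synthetic dataset, the choice of the two regressors, and the weights are assumptions made only for this illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Synthetic data, used only for illustration
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Illustrative weights: the decision tree counts twice as much
weights = [1, 2]
reg_voting = VotingRegressor(
    estimators=[
        ('lr', LinearRegression()),
        ('dt', DecisionTreeRegressor(random_state=0))],
    weights=weights
)
reg_voting.fit(X, y)

# Weighted mean of the predictions of the fitted sub-models
individual = np.array([est.predict(X) for est in reg_voting.estimators_])
manual_average = np.average(individual, axis=0, weights=weights)

# The ensemble prediction should match the manual weighted mean
matches = np.allclose(reg_voting.predict(X), manual_average)
print("Ensemble equals weighted mean:", matches)
```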

scikit-learn example

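# Import the ensemble and the individual model classes
from sklearn.ensemble import VotingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

# Instantiate the individual models
clf_knn = KNeighborsClassifier(n_neighbors=5)
clf_dt = DecisionTreeClassifier()
clf_lr = LogisticRegression()

# Create an averaging classifier; the decision tree's
# probabilities count twice as much as the others
clf_voting = VotingClassifier(
    estimators=[
        ('knn', clf_knn),
        ('dt', clf_dt),
        ('lr', clf_lr)],
    voting='soft',
    weights=[1, 2, 1]
)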
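
An averaging classifier like the one above is trained and evaluated the same way as any other scikit-learn estimator. The sketch below is one possible end-to-end run; the synthetic dataset, the train/test split, the `random_state` values, and `max_iter=1000` are assumptions for illustration and are not part of the original example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data, used only for illustration
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

weights = [1, 2, 1]
clf_voting = VotingClassifier(
    estimators=[
        ('knn', KNeighborsClassifier(n_neighbors=5)),
        ('dt', DecisionTreeClassifier(random_state=42)),
        ('lr', LogisticRegression(max_iter=1000))],
    voting='soft',
    weights=weights
)

# Fitting the ensemble fits a clone of every individual model
clf_voting.fit(X_train, y_train)

# Evaluate the ensemble on the held-out data
y_pred = clf_voting.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", round(accuracy, 3))

# Soft voting: the ensemble probabilities are the weighted mean
# of the probabilities predicted by the fitted sub-models
probas = np.array([est.predict_proba(X_test) for est in clf_voting.estimators_])
manual_proba = np.average(probas, axis=0, weights=weights)
print("Matches manual average:",
      np.allclose(clf_voting.predict_proba(X_test), manual_proba))
```

The fitted sub-models are available in `clf_voting.estimators_`, which makes it straightforward to compare each model's accuracy with that of the ensemble.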

Game of Thrones deaths

Target:

  • Predict whether a character is alive or not

Features:

  • Age
  • Gender
  • Books of appearance
  • Popularity
  • Whether relatives are alive or not

Time to practice!
