BaggingClassifier: nuts and bolts

Ensemble Methods in Python

Román de las Heras

Data Scientist, Appodeal

Heterogeneous vs Homogeneous Functions

Heterogeneous Ensemble Function

het_est = HeterogeneousEnsemble(
    estimators=[('est1', est1), ('est2', est2), ...],
    # additional parameters
)

Homogeneous Ensemble Function

hom_est = HomogeneousEnsemble(
    est_base,
    n_estimators=chosen_number,
    # additional parameters
)
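
As a concrete illustration of the two templates above, the sketch below uses scikit-learn's VotingClassifier as the heterogeneous ensemble and BaggingClassifier as the homogeneous one. The synthetic dataset, the three chosen models, and the random_state values are assumptions made for this example; they do not come from the slides.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, assumed purely for illustration
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Heterogeneous: a list of (name, estimator) pairs with different algorithms
het_est = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("knn", KNeighborsClassifier()),
        ("dt", DecisionTreeClassifier(max_depth=3, random_state=0)),
    ]
)

# Homogeneous: one base estimator, cloned n_estimators times
hom_est = BaggingClassifier(
    DecisionTreeClassifier(max_depth=3),
    n_estimators=5,
    random_state=0,
)

het_est.fit(X, y)
hom_est.fit(X, y)

# The fitted heterogeneous ensemble holds three different model types,
# the homogeneous one holds five copies of the same type
print([type(e).__name__ for e in het_est.estimators_])
print({type(e).__name__ for e in hom_est.estimators_})
```

The base estimator is passed positionally here, as on the slides, which avoids depending on the keyword name of that parameter.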

BaggingClassifier

Bagging Classifier example:

from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Instantiate the base estimator ("weak" model)
clf_dt = DecisionTreeClassifier(max_depth=3)
# Build the Bagging classifier with 5 estimators
clf_bag = BaggingClassifier(
    clf_dt,
    n_estimators=5
)
# Fit the Bagging model to the training set
clf_bag.fit(X_train, y_train)
# Make predictions on the test set
y_pred = clf_bag.predict(X_test)

BaggingRegressor

Bagging Regressor example:

from sklearn.ensemble import BaggingRegressor
from sklearn.linear_model import LinearRegression

# Instantiate the base estimator ("weak" model)
reg_lr = LinearRegression()
# Build the Bagging regressor with 10 estimators (the default, made explicit)
reg_bag = BaggingRegressor(
    reg_lr,
    n_estimators=10
)
# Fit the Bagging model to the training set
reg_bag.fit(X_train, y_train)
# Make predictions on the test set
y_pred = reg_bag.predict(X_test)
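
To show what the regressor does under the hood, the following sketch checks that the ensemble prediction is the plain average of the individual estimators' predictions. The synthetic data, the train/test split, and the random_state values are assumptions for this example, and the check relies on the default max_features=1.0, so every estimator sees all columns.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic regression data, assumed purely for illustration
X, y = make_regression(n_samples=400, n_features=6, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

reg_bag = BaggingRegressor(LinearRegression(), n_estimators=10, random_state=0)
reg_bag.fit(X_train, y_train)

# One row of predictions per fitted estimator
individual = np.array([est.predict(X_test) for est in reg_bag.estimators_])

# Averaging across estimators should reproduce the ensemble prediction
manual_pred = individual.mean(axis=0)
agrees = np.allclose(manual_pred, reg_bag.predict(X_test))
print(agrees)
```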

Out-of-bag score

  • For each training instance, collect predictions only from the estimators whose bootstrap sample did not include it
  • Combine those individual predictions into a single prediction per instance
  • Evaluate the metric on the combined predictions:
    • Classification: accuracy
    • Regression: R^2
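from sklearn.metrics import accuracy_score

# Request the out-of-bag score when building the ensemble
clf_bag = BaggingClassifier(
    clf_dt,
    oob_score=True
)
clf_bag.fit(X_train, y_train)
# Out-of-bag accuracy, estimated from the training set alone
print(clf_bag.oob_score_)
0.9328125
# Accuracy on the held-out test set, for comparison
pred = clf_bag.predict(X_test)
print(accuracy_score(y_test, pred))
0.9625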
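
The three out-of-bag steps listed above can also be sketched by hand, using the fitted ensemble's estimators_ and estimators_samples_ attributes. This is a minimal sketch and not necessarily how the library computes the score internally: the synthetic data, n_estimators=50, and the random_state values are assumptions for the example, and it relies on the default max_features=1.0.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, assumed purely for illustration
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

clf_bag = BaggingClassifier(
    DecisionTreeClassifier(max_depth=3),
    n_estimators=50,
    oob_score=True,
    random_state=0,
)
clf_bag.fit(X, y)

n_samples = X.shape[0]
proba_sum = np.zeros((n_samples, len(clf_bag.classes_)))

# Step 1: each estimator predicts only the instances left out of its sample
for est, sample_idx in zip(clf_bag.estimators_, clf_bag.estimators_samples_):
    oob_mask = np.ones(n_samples, dtype=bool)
    oob_mask[sample_idx] = False
    proba_sum[oob_mask] += est.predict_proba(X[oob_mask])

# Step 2: combine by picking the class with the largest summed probability
oob_pred = clf_bag.classes_[np.argmax(proba_sum, axis=1)]

# Step 3: evaluate accuracy on the combined out-of-bag predictions
manual_oob = accuracy_score(y, oob_pred)
print(manual_oob, clf_bag.oob_score_)
```

The two printed values should agree closely, which is a quick way to confirm that oob_score_ is an estimate built from the training set alone.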

Now it's your turn!
