Ensemble methods

Practicing Machine Learning Interview Questions in Python

Lisa Stuart

Data Scientist

Ensemble learning techniques

  • Bootstrap Aggregation
  • Boosting
  • Model stacking
Practicing Machine Learning Interview Questions in Python

Error measurement

Practicing Machine Learning Interview Questions in Python

Short trees

Practicing Machine Learning Interview Questions in Python

Tall trees

Practicing Machine Learning Interview Questions in Python

Fat trees

Practicing Machine Learning Interview Questions in Python

Linear model

Practicing Machine Learning Interview Questions in Python

Bias

Linear relationship assumption (incorrect)

  • High bias
  • Underfitting
  • Poor model generalization
  • Increasing complexity decreases bias
Practicing Machine Learning Interview Questions in Python

Complex model

Practicing Machine Learning Interview Questions in Python

Variance

High complexity models:

  • High variance
  • Overfitting
  • Poor model generalization
Practicing Machine Learning Interview Questions in Python

Bias-Variance Trade-Off

1 Source: Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani and Jerome Friedman
Practicing Machine Learning Interview Questions in Python

Bagging (Bootstrap aggregation)

  • Bootstrapped samples
    • Subset selected with replacement
    • Same row of data may be chosen
  • Model built for each sample
  • Average the output
  • Reduces variance

1 https://medium.com/@rrfd/boosting-bagging-and-stacking-ensemble-methods-with-sklearn-and-mlens-a455c0c982de
Practicing Machine Learning Interview Questions in Python

Boosting

  • Multiple models built sequentially
  • Incorrect predictions are weighted
  • Reduces bias

1 https://blog.bigml.com/2017/03/14/introduction-to-boosted-trees/
Practicing Machine Learning Interview Questions in Python

Model stacking

  • Model 1 predictions
  • Model 2 predictions...
  • Model N predictions
  • Stack for highest accuracy model
    • Uses base model (Model N) predictions as input to 2nd level model

1 http://supunsetunga.blogspot.com/
Practicing Machine Learning Interview Questions in Python

Vecstack package

# import modules
from sklearn.ensemble import BaggingClassifier
from sklearn.ensemble import AdaBoostClassifier
from xgboost import XGBClassifier
from vecstack import stacking

# Create list: stacked_models
stacked_models = [BaggingClassifier(n_estimators=25, random_state=123), AdaBoostClassifier(n_estimators=25, random_state=123)]

# Stack the models: stack_train, stack_test
stack_train, stack_test = stacking(stacked_models, X_train, y_train, X_test, regression=False, mode='oof_pred_bag', 
                                   needs_proba=False, metric=accuracy_score, n_folds=4, stratified=True, shuffle=True, random_state=0, verbose=2)

# Initialize and fit 2nd level model
final_model = XGBClassifier(random_state=123, n_jobs=-1, learning_rate=0.1, n_estimators=10, max_depth=3)
final_model_fit = final_model.fit(stack_train, y_train)

# Predict: stacked_pred
stacked_pred = final_model.predict(stack_test)

# Final prediction score
print('Final prediction score: [%.8f]' % accuracy_score(y_test, stacked_pred))
1 https://towardsdatascience.com/automate-stacking-in-python-fc3e7834772e
Practicing Machine Learning Interview Questions in Python

Ensemble functions

Algorithm Function
Bootstrap aggregation sklearn.ensemble.BaggingClassifier()
Boosting sklearn.ensemble.AdaBoostClassifier()
XGBoost xgboost.XGBClassifier()
Practicing Machine Learning Interview Questions in Python

Bagging vs boosting

Technique Bias Variance
Bootstrap aggregation (Bagging) Increase Decrease
Boosting Decrease Increase
Practicing Machine Learning Interview Questions in Python

Major ensemble techniques MCQ

Which of the following statements is true about the three major techniques used for ensemble methods in Machine Learning? Select the statement that is true:

  • Boosting methods decrease model variance.
  • Boosting methods increase the predictive abilities of a classifier.
  • Bootstrap aggregation, or bagging, decreases model bias.
  • Model stacking takes the predictions from individual models and combines them to create a higher accuracy model.
Practicing Machine Learning Interview Questions in Python

Major ensemble techniques MCQ: answer

Which of the following statements is true about the three major techniques used for ensemble methods in Machine Learning? The correct answer is:

  • Model stacking takes the predictions from individual models and combines them to create a higher accuracy model. (The final model obtained from the predictions of several individual models almost always outperforms the individuals.)
Practicing Machine Learning Interview Questions in Python

Major ensemble techniques MCQ: incorrect answers

Which of the following statements is true about the three major techniques used for ensemble methods in Machine Learning?

  • Boosting methods decrease model variance. (Boosting methods decrease model bias which, at the same time, helps increase variance to find that sweet spot for best model generalization.)
  • Boosting methods increase the predictive abilities of a classifier. (Boosting decreases model bias, which may or may not increase the predictive abilities of a classifier.)
  • Bootstrap aggregation, or bagging, decreases model bias. (Bagging decreases model variance.)
Practicing Machine Learning Interview Questions in Python

Let's practice!

Practicing Machine Learning Interview Questions in Python

Preparing Video For Download...