Regression: feature selection

Practicing Machine Learning Interview Questions in Python

Lisa Stuart

Data Scientist

Selecting the correct features:

  • Reduces overfitting
  • Improves accuracy
  • Increases interpretability
  • Reduces training time

Figure: Feature selection steps (source: https://www.analyticsindiamag.com/what-are-feature-selection-techniques-in-machine-learning/)

Feature selection methods

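  • Filter: rank features by a statistical measure of their relationship to the response (e.g., correlation), without training an ML model
  • Wrapper: use an ML model to evaluate the performance of candidate feature subsets
  • Embedded: select features as part of model training (e.g., a regularization penalty that shrinks coefficients)
  • Feature importance: rank features by the importances reported by tree-based ML models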

Compare and contrast methods

Method               Uses an ML model   Selects best subset   Can overfit
Filter               No                 No                    No
Wrapper              Yes                Yes                   Sometimes
Embedded             Yes                Yes                   Yes
Feature importance   Yes                Yes                   Yes

Statistical tests by feature and response type

Feature \ Response   Continuous response     Categorical response
Continuous           Pearson's correlation   LDA (linear discriminant analysis)
Categorical          ANOVA                   Chi-square
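A minimal sketch of three of the four cells in the table, using SciPy on synthetic data. The data, variable names, and effect sizes are hypothetical and chosen only so each test has something to detect; the LDA cell is not shown. Smaller p-values suggest a stronger relationship between the feature and the response, which is what a filter method would rank on.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)
n = 200

# Hypothetical synthetic data: one continuous and one categorical feature,
# plus a continuous response and a categorical response that depend on them
x_cont = rng.normal(size=n)
x_cat = rng.choice(["a", "b", "c"], size=n)
y_cont = 2.0 * x_cont + (x_cat == "a") * 1.5 + rng.normal(scale=0.5, size=n)
y_cat = np.where(
    x_cat == "a",
    rng.choice(["yes", "no"], size=n, p=[0.8, 0.2]),
    rng.choice(["yes", "no"], size=n, p=[0.3, 0.7]),
)

# Continuous feature vs continuous response: Pearson's correlation
r, p_pearson = stats.pearsonr(x_cont, y_cont)

# Categorical feature vs continuous response: one-way ANOVA across the feature's levels
groups = [y_cont[x_cat == level] for level in np.unique(x_cat)]
f_stat, p_anova = stats.f_oneway(*groups)

# Categorical feature vs categorical response: chi-square test of independence
table = pd.crosstab(x_cat, y_cat)
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)

print(f"Pearson r={r:.2f} (p={p_pearson:.3g})")
print(f"ANOVA F={f_stat:.2f} (p={p_anova:.3g})")
print(f"Chi-square={chi2:.2f}, dof={dof} (p={p_chi2:.3g})")
```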

Filter functions

Function                   Returns
df.corr()                  Pearson's correlation matrix
sns.heatmap(corr_object)   heatmap plot
abs()                      absolute value
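The filter functions above can be combined into a simple correlation filter. This is a sketch under assumptions: the DataFrame, its column names, and the 0.2 cutoff are hypothetical, and the seaborn heatmap is left as a comment so the example runs without plotting.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 500

# Hypothetical dataset: "strong" and "moderate" drive the target, "noise" does not
df = pd.DataFrame({
    "strong": rng.normal(size=n),
    "moderate": rng.normal(size=n),
    "noise": rng.normal(size=n),
})
df["target"] = 3.0 * df["strong"] + 1.5 * df["moderate"] + rng.normal(size=n)

# Pearson's correlation matrix for all numeric columns
corr = df.corr()
# To visualize: import seaborn as sns; sns.heatmap(corr, annot=True)

# Absolute correlation of each feature with the target, strongest first
target_corr = corr["target"].drop("target").abs().sort_values(ascending=False)
print(target_corr)

# Keep features whose absolute correlation exceeds an illustrative threshold
selected = target_corr[target_corr > 0.2].index.tolist()
print(selected)
```

Taking the absolute value matters because a strong negative correlation is just as informative as a strong positive one.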

Wrapper methods

  1. Forward selection (e.g., LARS, least angle regression)
    • Starts with no features and adds one at a time
  2. Backward elimination
    • Starts with all features and eliminates one at a time
  3. Combination of forward selection and backward elimination (bidirectional elimination)
  4. Recursive feature elimination
    • RFECV: recursive feature elimination with cross-validation
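Forward selection and backward elimination can be sketched with scikit-learn's `SequentialFeatureSelector` (available in scikit-learn 0.24 and later). Note that this is a greedy stepwise selector swapped in for illustration, not LARS. The synthetic data and the choice of 3 features to keep are assumptions.

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_samples = 200

# Hypothetical data: 8 features, only the first 3 influence y
X = rng.normal(size=(n_samples, 8))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.5 * X[:, 2] + rng.normal(scale=0.5, size=n_samples)

est = LinearRegression()

# Forward selection: start with no features and greedily add the one
# that most improves the cross-validated score
forward = SequentialFeatureSelector(est, n_features_to_select=3, direction="forward", cv=5)
forward.fit(X, y)

# Backward elimination: start with all features and greedily drop the one
# whose removal hurts the cross-validated score the least
backward = SequentialFeatureSelector(est, n_features_to_select=3, direction="backward", cv=5)
backward.fit(X, y)

print("forward selection keeps:   ", np.flatnonzero(forward.get_support()))
print("backward elimination keeps:", np.flatnonzero(backward.get_support()))
```

Each step refits the model once per candidate feature and fold, which is why wrapper methods are slower than filter methods.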

Embedded methods

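  1. Lasso Regression (L1 penalty; can shrink coefficients to exactly zero, removing features)
  2. Ridge Regression (L2 penalty; shrinks coefficients toward zero but does not set them exactly to zero)
  3. ElasticNet (a weighted mix of the L1 and L2 penalties)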

Figure: Lasso, Ridge, and ElasticNet
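A sketch of how the three penalties behave as embedded selectors, on hypothetical synthetic data where only the first 3 of 6 features matter. The `alpha` and `l1_ratio` values are illustrative, not tuned. Because the penalties depend on coefficient scale, features should normally be standardized first; here they are already drawn from a standard normal.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge

rng = np.random.default_rng(0)
n_samples = 500

# Hypothetical data: 6 standard-normal features; only the first 3 affect y
X = rng.normal(size=(n_samples, 6))
true_coef = np.array([3.0, -2.0, 1.5, 0.0, 0.0, 0.0])
y = X @ true_coef + rng.normal(scale=0.5, size=n_samples)

# Illustrative (untuned) regularization strengths
models = {
    "lasso": Lasso(alpha=0.2),                          # L1: can zero out coefficients
    "ridge": Ridge(alpha=1.0),                          # L2: shrinks, keeps every feature
    "elasticnet": ElasticNet(alpha=0.4, l1_ratio=0.5),  # mix of L1 and L2
}
coefs = {name: model.fit(X, y).coef_ for name, model in models.items()}

for name, coef in coefs.items():
    kept = np.flatnonzero(coef != 0)
    print(f"{name:>10}: coef={np.round(coef, 2)}, kept features={kept}")
```

In practice the penalty strength would be chosen by cross-validation (e.g., `LassoCV` or `ElasticNetCV`) rather than fixed by hand.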

Tree-based feature importance methods

  • Random Forest --> sklearn.ensemble.RandomForestRegressor
  • Extra Trees --> sklearn.ensemble.ExtraTreesRegressor
  • After model fit --> tree_mod.feature_importances_
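A minimal sketch of reading `feature_importances_` after fitting the two tree ensembles listed above. The synthetic data (features 0 and 1 informative, 2 to 4 pure noise) and the `n_estimators` value are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor

rng = np.random.default_rng(0)
n_samples = 500

# Hypothetical data: features 0 and 1 drive y; features 2-4 are noise
X = rng.normal(size=(n_samples, 5))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=n_samples)

importances = {}
for name, tree_mod in [
    ("random_forest", RandomForestRegressor(n_estimators=200, random_state=0)),
    ("extra_trees", ExtraTreesRegressor(n_estimators=200, random_state=0)),
]:
    tree_mod.fit(X, y)
    # Impurity-based importances, normalized so they sum to 1
    importances[name] = tree_mod.feature_importances_
    ranked = np.argsort(importances[name])[::-1]  # most important first
    print(f"{name}: ranking={ranked}, importances={np.round(importances[name], 3)}")
```

Impurity-based importances can favor features with many distinct values; `sklearn.inspection.permutation_importance` is an alternative when that is a concern.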
Function/attribute                      Returns
sklearn.svm.SVR                         support vector regression estimator
sklearn.feature_selection.RFECV         recursive feature elimination with cross-validation
rfe_mod.support_                        boolean mask of selected features
rfe_mod.ranking_                        feature ranking (selected features = 1)
sklearn.linear_model.LinearRegression   linear regression estimator
sklearn.linear_model.LarsCV             least angle regression with cross-validation
LarsCV.score()                          R-squared score
LarsCV.alpha_                           estimated regularization parameter
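A sketch tying together the functions and attributes in the table: `RFECV` wrapped around an `SVR`, and `LarsCV`. The synthetic data is hypothetical. RFECV needs an estimator that exposes `coef_` or `feature_importances_`, so the SVR uses `kernel="linear"` here; that kernel choice is an assumption of this example.

```python
import numpy as np
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LarsCV
from sklearn.svm import SVR

rng = np.random.default_rng(0)
n_samples = 200

# Hypothetical data: 6 features, only the first 3 are informative
X = rng.normal(size=(n_samples, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.5 * X[:, 2] + rng.normal(scale=0.5, size=n_samples)

# Recursive feature elimination: repeatedly drop the feature with the smallest
# |coef_| and use 5-fold cross-validation to pick how many features to keep
rfe_mod = RFECV(estimator=SVR(kernel="linear"), step=1, cv=5)
rfe_mod.fit(X, y)
print("selected mask:", rfe_mod.support_)
print("ranking (1 = selected):", rfe_mod.ranking_)

# Least angle regression, with the regularization level chosen by 5-fold CV
lars_mod = LarsCV(cv=5)
lars_mod.fit(X, y)
print("R^2:", round(lars_mod.score(X, y), 3))
print("alpha_:", lars_mod.alpha_)
print("nonzero coefficients:", np.flatnonzero(lars_mod.coef_))
```

Here `score` is computed on the training data for brevity; a held-out test set gives a fairer R-squared estimate.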

Let's practice!
