Practicing Machine Learning Interview Questions in Python
Lisa Stuart
Data Scientist

| Method | Uses an ML model | Selects best subset | Can overfit |
|---|---|---|---|
| Filter | No | No | No |
| Wrapper | Yes | Yes | Sometimes |
| Embedded | Yes | Yes | Yes |
| Feature importance | Yes | Yes | Yes |

| Feature/Response | Continuous | Categorical |
|---|---|---|
| Continuous | Pearson's Correlation | LDA |
| Categorical | ANOVA | Chi-Square |

| Function | returns |
|---|---|
| df.corr() | Pearson's correlation matrix |
| sns.heatmap(corr_object) | heatmap plot |
| abs() | absolute value |
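
The functions above can be combined into a simple correlation filter. The following is a minimal sketch, not the course's own code: the synthetic DataFrame, the column names, and the 0.9 threshold are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic data (an assumption for illustration): x2 is almost a copy of x1.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
df = pd.DataFrame({
    "x1": x1,
    "x2": x1 + rng.normal(scale=0.05, size=200),
    "x3": rng.normal(size=200),
})
df["target"] = 3 * df["x1"] - 2 * df["x3"] + rng.normal(scale=0.1, size=200)

# Pearson's correlation matrix of the features; abs() because a strong
# negative correlation is just as redundant as a strong positive one.
corr = df.drop(columns="target").corr().abs()

# Keep only the upper triangle so each feature pair is examined once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

threshold = 0.9  # hypothetical cut-off, tune per problem
to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
df_filtered = df.drop(columns=to_drop)

print(to_drop)
```

To inspect the matrix visually, `corr` can be passed to `sns.heatmap(corr)` after `import seaborn as sns`.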

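The feature/response table of statistical tests can also be exercised directly. This sketch assumes SciPy is available and uses made-up data; the variable names (`group`, `noise_cat`, `y_cont`, `y_cat`) are hypothetical. It covers the two categorical-feature cells, ANOVA and chi-square.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency, f_oneway

rng = np.random.default_rng(42)
n = 300

# Hypothetical categorical features: `group` is informative, `noise_cat` is not.
group = rng.choice(["a", "b", "c"], size=n)
noise_cat = rng.choice(["u", "v"], size=n)

# Continuous response whose mean depends on `group`.
offsets = {"a": 0.0, "b": 2.0, "c": 4.0}
y_cont = np.array([offsets[g] for g in group]) + rng.normal(size=n)

# Categorical response derived from `group`.
y_cat = np.where(group == "a", "low", "high")

# Categorical feature, continuous response: one-way ANOVA across the groups.
samples = [y_cont[group == g] for g in ["a", "b", "c"]]
f_stat, p_anova = f_oneway(*samples)

# Categorical feature, categorical response: chi-square test of independence
# on the contingency table of counts.
chi2_rel, p_rel, dof, expected = chi2_contingency(pd.crosstab(group, y_cat))
chi2_unrel, p_unrel, _, _ = chi2_contingency(pd.crosstab(noise_cat, y_cat))

print(f"ANOVA p-value: {p_anova:.3g}")
print(f"Chi-square p-value (group): {p_rel:.3g}")
print(f"Chi-square p-value (noise_cat): {p_unrel:.3g}")
```

A small p-value suggests the feature and response are related, so the feature is a candidate to keep; a filter like this never fits a predictive model.
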
| Function | returns |
|---|---|
| sklearn.ensemble.RandomForestRegressor | random forest regression estimator |
| sklearn.ensemble.ExtraTreesRegressor | extra-trees regression estimator |
| tree_mod.feature_importances_ | array of feature importances |
| sklearn.svm.SVR | support vector regression estimator |
| sklearn.feature_selection.RFECV | recursive feature elimination with cross-validation |
| rfe_mod.support_ | boolean array of selected features |
| rfe_mod.ranking_ | feature ranking, selected=1 |
| sklearn.linear_model.LinearRegression | linear model estimator |
| sklearn.linear_model.LarsCV | least angle regression with cross-validation |
| LarsCV.score | r-squared score |
| LarsCV.alpha_ | estimated regularization parameter |
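
As an illustration of the wrapper approach, the sketch below runs RFECV around an SVR estimator. The dataset comes from `make_regression` and is purely an assumption, as are the sample and feature counts; `rfe_mod` follows the naming used in the table.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFECV
from sklearn.svm import SVR

# Synthetic regression problem (an assumption): 8 features, 3 informative.
X, y = make_regression(
    n_samples=120, n_features=8, n_informative=3, noise=5.0, random_state=0
)

# A linear kernel is used so that SVR exposes coef_, which RFECV needs
# in order to rank features.
svr_mod = SVR(kernel="linear")

# Drop one feature per iteration and choose the subset size by 5-fold CV.
rfe_mod = RFECV(estimator=svr_mod, step=1, cv=5)
rfe_mod.fit(X, y)

print(rfe_mod.support_)   # boolean mask of selected features
print(rfe_mod.ranking_)   # selected features have rank 1

# Reduce X to the selected columns.
X_selected = rfe_mod.transform(X)
```

Any estimator that exposes `coef_` or `feature_importances_` could stand in for the SVR here, for example `LinearRegression`.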
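
The embedded and feature-importance approaches can be sketched in the same way. The data is again a synthetic assumption, and the names `lars_mod`, `rf_mod`, and `xt_mod` are hypothetical.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.linear_model import LarsCV

# Synthetic regression problem (an assumption): 8 features, 3 informative.
X, y = make_regression(
    n_samples=200, n_features=8, n_informative=3, noise=5.0, random_state=0
)

# Embedded method: least angle regression, with the stopping point along
# the regularization path chosen by 5-fold cross-validation.
lars_mod = LarsCV(cv=5).fit(X, y)
r2 = lars_mod.score(X, y)   # r-squared on the training data
alpha = lars_mod.alpha_     # estimated regularization parameter
print(f"R^2 = {r2:.3f}, alpha = {alpha:.4f}")
print(lars_mod.coef_)       # features not yet entered on the path stay at 0

# Tree-based feature importance from two ensemble estimators.
rf_mod = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
xt_mod = ExtraTreesRegressor(n_estimators=100, random_state=0).fit(X, y)
print(rf_mod.feature_importances_)
print(xt_mod.feature_importances_)
```

Both importance arrays are normalized to sum to 1, so they can be compared across the two ensembles or thresholded to select features.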