Dimensionality Reduction in Python
Jeroen Boeye
Head of Machine Learning, Faktion
# Lasso regression: L1 regularization shrinks weak feature coefficients
# to exactly zero, giving built-in feature selection.
from sklearn.linear_model import Lasso

la = Lasso(alpha=0.05)
la.fit(X_train, y_train)

# Actual coefficients = [5 2 0]
print(la.coef_)
# [ 4.91  1.76  0.  ]   <- third feature dropped (coefficient forced to 0)

# R^2 on the test set stays high despite the removed feature.
print(la.score(X_test, y_test))
# 0.974
# LassoCV tunes the alpha regularization strength with cross-validation
# instead of hand-picking it.
from sklearn.linear_model import LassoCV

lcv = LassoCV()
lcv.fit(X_train, y_train)
print(lcv.alpha_)
# 0.09

# Boolean mask of the features whose coefficient survived regularization.
mask = lcv.coef_ != 0
print(mask)
# [ True  True False]

# Keep only the selected feature columns.
reduced_X = X.loc[:, mask]
A random forest is a combination of decision trees.
We can use a combination of models for feature selection too.
# First "voter" in the ensemble feature selector: LassoCV.
from sklearn.linear_model import LassoCV

lcv = LassoCV()
lcv.fit(X_train, y_train)
lcv.score(X_test, y_test)
# 0.99

# Mask of features LassoCV kept (non-zero coefficients).
lcv_mask = lcv.coef_ != 0
sum(lcv_mask)
# 66  <- number of features selected
# Second voter: recursive feature elimination wrapped around a random
# forest, dropping 5 features per iteration until 66 remain.
from sklearn.feature_selection import RFE
from sklearn.ensemble import RandomForestRegressor

rfe_rf = RFE(estimator=RandomForestRegressor(),
             n_features_to_select=66, step=5, verbose=1)
rfe_rf.fit(X_train, y_train)

# Boolean mask of the features RFE kept.
rf_mask = rfe_rf.support_
# Third voter: the same RFE setup with a gradient-boosting estimator.
from sklearn.feature_selection import RFE
from sklearn.ensemble import GradientBoostingRegressor

rfe_gb = RFE(estimator=GradientBoostingRegressor(),
             n_features_to_select=66, step=5, verbose=1)
rfe_gb.fit(X_train, y_train)

# Boolean mask of the features RFE kept.
gb_mask = rfe_gb.support_
import numpy as np

# Sum the three boolean masks element-wise: each feature gets 0-3 votes.
votes = np.sum([lcv_mask, rf_mask, gb_mask], axis=0)
print(votes)
# array([3, 2, 2, ..., 3, 0, 1])

# Keep features selected by a majority — at least 2 of the 3 models.
mask = votes >= 2
reduced_X = X.loc[:, mask]
Dimensionality Reduction in Python