Dimensionality Reduction in Python
Jeroen Boeye
Head of Machine Learning, Faktion
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
rf = RandomForestClassifier()
rf.fit(X_train, y_train)
print(accuracy_score(y_test, rf.predict(X_test)))
0.99
rf = RandomForestClassifier()
rf.fit(X_train, y_train)
print(rf.feature_importances_)
array([0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.04, 0. , 0.01, 0.01,
0. , 0. , 0. , 0. , 0.01, 0.01, 0. , 0. , 0. , 0. , 0.05,
...
0. , 0.14, 0. , 0. , 0. , 0.06, 0. , 0. , 0. , 0. , 0. ,
0. , 0.07, 0. , 0. , 0.01, 0. ])
print(sum(rf.feature_importances_))
1.0
# Boolean mask: True for features whose importance exceeds 0.1
mask = rf.feature_importances_ > 0.1
print(mask)
array([False, False, ..., True, False])
X_reduced = X.loc[:, mask]
print(X_reduced.columns)
Index(['chestheight', 'neckcircumference', 'neckcircumferencebase',
'shouldercircumference'], dtype='object')
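The importance-threshold selection above can be reproduced end to end on synthetic data. This is a minimal sketch, not the dataset from the slides: `make_classification`, the `feature_i` column names, the 70/30 split, and the fixed `random_state` values are all assumptions made so the example runs on its own.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 20 features, only 4 of them informative
X_arr, y = make_classification(n_samples=500, n_features=20, n_informative=4,
                               n_redundant=0, random_state=0)
X = pd.DataFrame(X_arr, columns=[f"feature_{i}" for i in range(20)])
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Fit on all features and read the normalized importances
rf = RandomForestClassifier(random_state=0)
rf.fit(X_train, y_train)
acc_full = accuracy_score(y_test, rf.predict(X_test))

# Keep only the features whose importance exceeds the threshold
mask = rf.feature_importances_ > 0.1
X_reduced = X.loc[:, mask]
print(X_reduced.columns)

# Refit on the reduced feature set to compare test accuracy
rf_small = RandomForestClassifier(random_state=0)
rf_small.fit(X_train.loc[:, mask], y_train)
acc_small = accuracy_score(y_test, rf_small.predict(X_test.loc[:, mask]))
print(acc_full, acc_small)
```

The 0.1 threshold is the same arbitrary cutoff used on the slide. A suitable value depends on how many features there are, since the importances always sum to 1.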
from sklearn.feature_selection import RFE
# Repeatedly drop the least important feature until 6 remain
rfe = RFE(estimator=RandomForestClassifier(),
          n_features_to_select=6, verbose=1)
rfe.fit(X_train, y_train)
Fitting estimator with 94 features.
Fitting estimator with 93 features.
...
Fitting estimator with 8 features.
Fitting estimator with 7 features.
print(accuracy_score(y_test, rfe.predict(X_test)))
0.99
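The verbose log above runs from 94 features down to 7 even though 6 are requested. The sketch below models that elimination schedule in plain Python. `rfe_fit_schedule` is a hypothetical helper written for illustration, not part of scikit-learn, and it assumes each log message is printed before features are dropped and that the last round never removes more features than needed to reach the target.

```python
def rfe_fit_schedule(n_features, n_features_to_select, step=1):
    """Return the feature counts at which each elimination round fits the estimator.

    Simplified model (assumption) of recursive feature elimination with an
    integer step; it only tracks how many features remain, not which ones.
    """
    counts = []
    remaining = n_features
    while remaining > n_features_to_select:
        counts.append(remaining)  # the estimator is fitted on this many features
        # Drop `step` features, but never go below the requested number
        remaining -= min(step, remaining - n_features_to_select)
    return counts


schedule = rfe_fit_schedule(94, 6)           # one feature dropped per round
print(schedule[:2], schedule[-2:])           # [94, 93] [8, 7]
print(len(schedule))                         # 88

faster = rfe_fit_schedule(94, 6, step=10)    # ten features dropped per round
print(faster)  # [94, 84, 74, 64, 54, 44, 34, 24, 14]
```

Under this model, dropping one feature per round costs 88 fits, while a step of 10 needs only 9, which is the motivation for raising `step` when the estimator is slow to train.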
from sklearn.feature_selection import RFE
# step=10 drops the 10 least important features per round, so far fewer fits are needed
rfe = RFE(estimator=RandomForestClassifier(),
          n_features_to_select=6, step=10, verbose=1)
rfe.fit(X_train, y_train)
Fitting estimator with 94 features.
Fitting estimator with 84 features.
...
Fitting estimator with 24 features.
Fitting estimator with 14 features.
print(X.columns[rfe.support_])
Index(['biacromialbreadth', 'handbreadth', 'handcircumference',
'neckcircumference', 'neckcircumferencebase', 'shouldercircumference'], dtype='object')
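Besides `support_`, a fitted `RFE` object exposes `ranking_`, where selected features have rank 1 and higher ranks mark features that were eliminated earlier. The following is a rough sketch on synthetic data. The dataset, the `feature_i` column names, and the choices of `n_features_to_select=4`, `step=5`, and `n_estimators=50` are assumptions made to keep the example small and self-contained.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Synthetic stand-in data (assumption): 20 features, 4 of them informative
X_arr, y = make_classification(n_samples=300, n_features=20, n_informative=4,
                               n_redundant=0, random_state=0)
X = pd.DataFrame(X_arr, columns=[f"feature_{i}" for i in range(20)])

# Drop 5 features per round until 4 remain
rfe = RFE(estimator=RandomForestClassifier(n_estimators=50, random_state=0),
          n_features_to_select=4, step=5)
rfe.fit(X, y)

# Names of the surviving features
selected = X.columns[rfe.support_]
print(list(selected))

# Rank 1 = selected; larger ranks were eliminated in earlier rounds
ranking = pd.Series(rfe.ranking_, index=X.columns).sort_values()
print(ranking.head(8))
```

With a step larger than 1, all features removed in the same round share a rank, so `ranking_` is coarser than it would be with the default `step=1`.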