Feature importances and gradient boosting

Machine Learning for Finance in Python

Nathan George

Data Science Professor

day of week split


200-day SMA split


Extracting feature importances

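from sklearn.ensemble import RandomForestRegressor

# fit a random forest on the training features and targets
random_forest = RandomForestRegressor()
random_forest.fit(train_features, train_targets)

# impurity-based importances: one value per feature, summing to 1
feature_importances = random_forest.feature_importances_

print(feature_importances)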
[0.07586547 0.10697602 0.12215955 0.23969227 0.29010304 0.0314028
 0.11977058 0.00276721 0.00246329 0.0026431  0.00615667]
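The snippet above depends on `train_features` and `train_targets` from earlier in the course. As a self-contained sketch, the version below swaps in hypothetical synthetic data (500 rows, 5 features, where only features 0 and 1 drive the target). It shows that the importances come back as one value per column and are normalized to sum to 1, so the feature with the strongest signal should stand out.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical synthetic data standing in for train_features / train_targets:
# only columns 0 and 1 influence the target; columns 2-4 are pure noise.
rng = np.random.default_rng(42)
train_features = rng.normal(size=(500, 5))
train_targets = (3 * train_features[:, 0]
                 + train_features[:, 1] ** 2
                 + rng.normal(scale=0.1, size=500))

random_forest = RandomForestRegressor(n_estimators=100, random_state=42)
random_forest.fit(train_features, train_targets)

# One impurity-based importance per feature column
feature_importances = random_forest.feature_importances_
print(feature_importances.round(3))

# scikit-learn normalizes these importances so they sum to 1
print(feature_importances.sum())
```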

Sorting and plotting

import numpy as np
import matplotlib.pyplot as plt

# feature importances from random forest model
importances = random_forest.feature_importances_

# index of greatest to least feature importances
sorted_index = np.argsort(importances)[::-1]

x = range(len(importances))

# create tick labels (feature_names holds the column names of train_features)
labels = np.array(feature_names)[sorted_index]
plt.bar(x, importances[sorted_index], tick_label=labels)

# rotate tick labels to vertical
plt.xticks(rotation=90)
plt.show()
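The sorting step is easy to get wrong, so here is a small sketch that skips the plot. The importances and feature names are hypothetical values chosen for illustration. `np.argsort` returns indices in ascending order, and `[::-1]` reverses them so the most important feature comes first. The same index array then reorders both the values and their labels.

```python
import numpy as np

# Hypothetical importances and feature names, for illustration only
importances = np.array([0.10, 0.45, 0.05, 0.40])
feature_names = ['5d_close_pct', '14d_rsi', 'weekday_0', '200d_sma']

# argsort sorts ascending; [::-1] flips to greatest-to-least
sorted_index = np.argsort(importances)[::-1]

# Reorder labels and values with the same index array
labels = np.array(feature_names)[sorted_index]
for name, value in zip(labels, importances[sorted_index]):
    print(f'{name}: {value:.2f}')
```

Because the labels and values are both indexed by `sorted_index`, each bar keeps its correct name after sorting.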

feature importance plot


Linear models vs gradient boosting
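Here is a rough illustration of why boosted trees can outperform a linear model. It uses hypothetical synthetic data with a threshold effect and a sine-shaped effect, neither of which a straight line can represent. Both models are fit on the same split and compared by out-of-sample R². Results on real financial data will differ, and this is not a claim about any particular market dataset.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Hypothetical data: a step in feature 0 plus a sine wave in feature 1
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(1000, 2))
y = (np.where(X[:, 0] > 0, 1.0, -1.0)
     + 0.5 * np.sin(3 * X[:, 1])
     + rng.normal(scale=0.1, size=1000))

# Simple chronological-style split: first 800 rows train, last 200 test
X_train, X_test = X[:800], X[800:]
y_train, y_test = y[:800], y[800:]

linear = LinearRegression().fit(X_train, y_train)
boosted = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)

linear_r2 = r2_score(y_test, linear.predict(X_test))
boosted_r2 = r2_score(y_test, boosted.predict(X_test))
print(f'linear R^2: {linear_r2:.3f}, boosted R^2: {boosted_r2:.3f}')
```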


boosting diagram
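The idea in the boosting diagram can be sketched in a few lines. This is a minimal from-scratch version for squared-error loss, not scikit-learn's actual implementation. It starts from a constant prediction (the target mean). Each new shallow tree is fit to the current residuals, which are the negative gradient of ½(y − f)², and its output is added after being scaled by a learning rate. The helper names `fit_boosted_trees` and `predict_boosted_trees` and the sine-wave data are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_boosted_trees(X, y, n_estimators=100, learning_rate=0.1, max_depth=3):
    """Minimal gradient boosting for squared-error loss (illustrative sketch)."""
    base = y.mean()                       # start from a constant prediction
    prediction = np.full(len(y), base)
    trees = []
    for _ in range(n_estimators):
        residuals = y - prediction        # negative gradient of 0.5 * (y - f)^2
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residuals)            # each new tree models the current errors
        prediction += learning_rate * tree.predict(X)  # shrunken update
        trees.append(tree)
    return base, learning_rate, trees


def predict_boosted_trees(model, X):
    """Sum the constant start and every tree's scaled contribution."""
    base, learning_rate, trees = model
    prediction = np.full(len(X), base)
    for tree in trees:
        prediction += learning_rate * tree.predict(X)
    return prediction


# Hypothetical 1-D data: a noisy sine wave
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=500)

model = fit_boosted_trees(X, y)
baseline_mse = np.mean((y - y.mean()) ** 2)
boosted_mse = np.mean((y - predict_boosted_trees(model, X)) ** 2)
print(f'constant-only MSE: {baseline_mse:.4f}, boosted MSE: {boosted_mse:.4f}')
```

A smaller learning rate makes each tree's correction more conservative, so more trees are usually needed to reach the same training error.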


Boosted models

Available boosted models:

  • Gradient boosting
  • AdaBoost
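Both boosted models listed above are available in `sklearn.ensemble` and share the usual `fit`/`predict`/`score` interface. The sketch below fits each one with default settings on hypothetical synthetic data and prints its in-sample R². It only shows that the two are interchangeable in code and is not meant as a performance comparison.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor, GradientBoostingRegressor

# Hypothetical data: target depends linearly on the first two features
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 3))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=300)

models = {
    'Gradient boosting': GradientBoostingRegressor(random_state=42),
    'AdaBoost': AdaBoostRegressor(random_state=42),
}
for name, model in models.items():
    model.fit(X, y)
    # .score returns R^2 for regressors
    print(f'{name}: train R^2 = {model.score(X, y):.3f}')
```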

Fitting a gradient boosting model

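from sklearn.ensemble import GradientBoostingRegressor

gbr = GradientBoostingRegressor(max_features=4,      # features considered at each split
                                learning_rate=0.01,  # shrinks each tree's contribution
                                n_estimators=200,    # number of boosting stages (trees)
                                subsample=0.6,       # fraction of rows used to fit each tree
                                random_state=42)     # seed for reproducibility

gbr.fit(train_features, train_targets)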
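Once fitted, the boosted model can be evaluated and inspected in the same way as the random forest. The self-contained sketch below keeps the hyperparameters from the slide but replaces the course's `train_features`/`train_targets` with hypothetical synthetic data (6 features, where only columns 0 and 2 carry signal). It scores the model on held-out rows with `.score` (R²) and then sorts `feature_importances_` from greatest to least.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical synthetic features/targets with a simple train/test split
rng = np.random.default_rng(3)
features = rng.normal(size=(600, 6))
targets = (features[:, 0]
           + 0.5 * features[:, 2] ** 2
           + rng.normal(scale=0.2, size=600))
train_features, test_features = features[:500], features[500:]
train_targets, test_targets = targets[:500], targets[500:]

gbr = GradientBoostingRegressor(max_features=4, learning_rate=0.01,
                                n_estimators=200, subsample=0.6,
                                random_state=42)
gbr.fit(train_features, train_targets)

# R^2 on the training and held-out rows
print('train R^2:', gbr.score(train_features, train_targets))
print('test R^2:', gbr.score(test_features, test_targets))

# Boosted trees expose impurity-based importances, just like the random forest
importances = gbr.feature_importances_
sorted_index = np.argsort(importances)[::-1]
print('most important feature indices:', sorted_index[:2])
```

With `learning_rate=0.01`, 200 trees may leave some signal unfit. Comparing train and test R² is a quick way to see whether raising `n_estimators` is worthwhile.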

Get boosted!
