Overfitting and ensembling

Machine Learning for Finance in Python

Nathan George

Data Science Professor

overfitting

Simplify your model

limited net

Neural network options

Options to combat overfitting:

  • Decrease number of nodes
  • Use L1/L2 regularization
  • Dropout
  • Autoencoder architecture
  • Early stopping
  • Adding noise to data
  • Max norm constraints
  • Ensembling
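
Of the options above, early stopping is the simplest to illustrate without a deep learning library. The following is a minimal sketch in plain Python, not the course's code: the function name `early_stopping_epoch`, the `patience` parameter, and the validation losses are all hypothetical. It stops once the validation loss has failed to improve for `patience` consecutive epochs and reports the best epoch seen so far.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the index of the best epoch, stopping the scan once the
    validation loss has not improved for `patience` consecutive epochs."""
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # new best validation loss: remember it and reset the counter
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # stop training here
    return best_epoch


# hypothetical validation losses, one per epoch (illustrative only)
val_losses = [1.00, 0.80, 0.70, 0.72, 0.75, 0.78, 0.60]
best = early_stopping_epoch(val_losses, patience=3)
print("best epoch:", best)
```

With a patience of 3, training stops before the final epoch is reached, so a later improvement can be missed. A larger patience trades extra training time for a lower chance of stopping too soon.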

Dropout

dropout

Dropout in keras

from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential()
model.add(Dense(500, input_dim=scaled_train_features.shape[1], activation='relu'))
# randomly zero 50% of the first hidden layer's outputs during training
model.add(Dropout(0.5))
model.add(Dense(100, activation='relu'))
model.add(Dense(1, activation='linear'))
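
To make the effect of a dropout layer concrete, here is a rough sketch of "inverted" dropout in plain Python. It is an illustration under stated assumptions, not Keras's implementation: the function `dropout`, its arguments, and the all-ones activations are hypothetical. During training each value is zeroed with probability `rate` and the survivors are scaled by 1 / (1 - rate); at prediction time the values pass through unchanged.

```python
import random


def dropout(activations, rate, training, rng):
    """Inverted dropout on a list of activations (illustrative sketch)."""
    if not training or rate == 0.0:
        # at prediction time dropout does nothing
        return list(activations)
    keep_scale = 1.0 / (1.0 - rate)
    # keep each value with probability (1 - rate), scaled up so the
    # expected sum of the layer's outputs stays the same
    return [a * keep_scale if rng.random() >= rate else 0.0
            for a in activations]


rng = random.Random(42)     # seeded for repeatability
activations = [1.0] * 1000  # hypothetical layer outputs

train_out = dropout(activations, rate=0.5, training=True, rng=rng)
test_out = dropout(activations, rate=0.5, training=False, rng=rng)

dropped_fraction = train_out.count(0.0) / len(train_out)
print("fraction dropped in training:", dropped_fraction)
```

Because a different random subset of nodes is silenced on every training step, the network cannot rely too heavily on any single node.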

Test set comparison

R$^2$ values on AMD without dropout:

  • train: 0.91
  • test: -0.72

With dropout:

  • train: 0.46
  • test: -0.22
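
A negative R$^2$, as in the test scores above, means the predictions have a larger squared error than always predicting the mean of the test targets. The sketch below shows this with a hand-written `r_squared` function and made-up numbers. Both are assumptions for illustration, not the course's code or data.

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# made-up targets and predictions (illustrative only)
y_true = [1.0, 2.0, 3.0, 4.0]

perfect = r_squared(y_true, y_true)                 # exact predictions
baseline = r_squared(y_true, [2.5, 2.5, 2.5, 2.5])  # always predict the mean
worse = r_squared(y_true, [4.0, 3.0, 2.0, 1.0])     # worse than the mean

print(perfect, baseline, worse)
```

Read this way, dropout lowered the training score but moved the test score closer to the mean-prediction baseline of zero.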

Ensembling

random forest

Implementing ensembling

import numpy as np

# make predictions from 2 neural net models
test_pred1 = model_1.predict(scaled_test_features)
test_pred2 = model_2.predict(scaled_test_features)

# horizontally stack predictions and take the average across rows
test_preds = np.mean(np.hstack((test_pred1, test_pred2)), axis=1)

Comparing the ensemble

R$^2$ scores on the test set:

  • Model 1: -0.179
  • Model 2: -0.148
  • Ensemble (averaged predictions): -0.146
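
Averaging helps most when the models' errors partly cancel. The toy sketch below uses deliberately constructed numbers, not the course's models or results, in which the two sets of predictions tend to err in opposite directions. The helper `mse` and all values are hypothetical.

```python
def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# constructed toy data: the two models' errors partly offset each other
y_true = [1.0, 2.0, 3.0, 4.0]
pred_1 = [1.5, 1.5, 3.5, 3.5]
pred_2 = [0.7, 2.7, 2.3, 4.3]

# ensemble by averaging the two predictions for each sample
ensemble = [(a + b) / 2 for a, b in zip(pred_1, pred_2)]

mse_1 = mse(y_true, pred_1)
mse_2 = mse(y_true, pred_2)
mse_ensemble = mse(y_true, ensemble)
print(mse_1, mse_2, mse_ensemble)
```

The averaged predictions never have a higher mean squared error than the average of the individual models' errors, but they are not guaranteed to beat the best single model. When the models make similar mistakes, the gain is small, as in the scores above.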

Dropout and ensemble!
