Scaling data and KNN Regression

Machine Learning for Finance in Python

Nathan George

Data Science Professor

feature importances

Feature selection: remove weekdays

print(feature_names)
['10d_close_pct',
 '14-day SMA',
 '14-day RSI',
 '200-day SMA',
 '200-day RSI',
 'Adj_Volume_1d_change',
 'Adj_Volume_1d_change_SMA',
 'weekday_1',
 'weekday_2',
 'weekday_3',
 'weekday_4']
print(feature_names[:-4])
['10d_close_pct',
 '14-day SMA',
 '14-day RSI',
 '200-day SMA',
 '200-day RSI',
 'Adj_Volume_1d_change',
 'Adj_Volume_1d_change_SMA']

Remove weekdays

train_features = train_features.iloc[:, :-4]
test_features = test_features.iloc[:, :-4]

2D feature plot

KNN prediction for an unknown point

KNN prediction with 2 nearest points
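
As a rough illustration of the idea on this slide, KNN regression can be sketched as: find the k training points closest to the unknown point, then average their targets. The points, targets, and the helper name `knn_predict` below are hypothetical and chosen only for illustration; Euclidean distance is assumed.

```python
import math

def knn_predict(train_points, train_targets, query, k=2):
    """Predict a target for `query` as the mean target of its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = [math.dist(p, query) for p in train_points]
    # Indices of the k smallest distances
    nearest = sorted(range(len(train_points)), key=lambda i: distances[i])[:k]
    # KNN regression: average the targets of the nearest neighbors
    return sum(train_targets[i] for i in nearest) / k

# Hypothetical 2D features and targets, for illustration only
train_points = [(1.0, 1.0), (2.0, 2.0), (5.0, 5.0), (6.0, 5.0)]
train_targets = [0.10, 0.20, 0.50, 0.70]

prediction = knn_predict(train_points, train_targets, query=(1.5, 1.4), k=2)
print(prediction)
```

Here the two nearest neighbors are the first two training points, so the prediction is roughly the average of 0.10 and 0.20.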

Minkowski distance
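
The Minkowski distance of order p generalizes the Manhattan (p=1) and Euclidean (p=2) distances. A minimal sketch follows; the function name `minkowski_distance` is an illustrative choice, not a library API.

```python
def minkowski_distance(a, b, p=2):
    """Minkowski distance of order p between two equal-length points."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of features")
    # Sum of absolute coordinate differences raised to p, then the p-th root
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

a, b = (0.0, 0.0), (3.0, 4.0)
manhattan = minkowski_distance(a, b, p=1)  # |3| + |4|
euclidean = minkowski_distance(a, b, p=2)  # sqrt(3^2 + 4^2)
print(manhattan, euclidean)
```

scikit-learn's `KNeighborsRegressor` uses the Minkowski metric with `p=2` by default, which is the Euclidean case.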

large and small feature
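
A small sketch of why feature scale matters for distance-based models: when one feature has a much larger range than another, it dominates the distance. The three points below are hypothetical, with feature 0 on a volume-like scale and feature 1 on a percent-change scale.

```python
import math

# Hypothetical points: (large-scale feature, small-scale feature)
a = (1_000_000.0, 0.01)
b = (1_000_500.0, 0.09)  # differs from a mostly in the large feature
c = (1_000_000.0, 0.90)  # differs from a only in the small feature

d_ab = math.dist(a, b)  # driven almost entirely by the 500-unit gap
d_ac = math.dist(a, c)  # a 90x change in the small feature barely registers
print(d_ab, d_ac)
```

Without scaling, `c` looks far closer to `a` than `b` does, even though its small-scale feature is very different.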

Scaling options

Scaling options:

  • min-max
  • standardization
  • median-MAD
  • map to arbitrary function (e.g. sigmoid, tanh)
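
The options listed above can be sketched in plain Python as follows. The function names and the sample values are illustrative assumptions; standardization here uses the population standard deviation, and the tanh mapping is applied to standardized values as one possible choice.

```python
import math
import statistics

def min_max(values):
    """Rescale to the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def median_mad(values):
    """Subtract the median and divide by the median absolute deviation."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return [(v - med) / mad for v in values]

def tanh_map(values):
    """Squash standardized values into (-1, 1) with tanh."""
    return [math.tanh(z) for z in standardize(values)]

# Hypothetical feature values with one outlier
values = [1.0, 2.0, 3.0, 4.0, 10.0]
print(min_max(values))
print(standardize(values))
print(median_mad(values))
print(tanh_map(values))
```

Median-MAD scaling is less sensitive to the outlier than min-max or standardization, because the median and MAD are not pulled by extreme values.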

2D feature plots before and after scaling

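sklearn's StandardScaler

from sklearn.preprocessing import StandardScaler

# standardize features to zero mean and unit variance
sc = StandardScaler()
# fit on the training data only, then apply the same scaling to the test data
scaled_train_features = sc.fit_transform(train_features)
scaled_test_features = sc.transform(test_features)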

before and after standardization

Making subplots

import matplotlib.pyplot as plt

# create figure and list containing axes
f, ax = plt.subplots(nrows=2, ncols=1)

# plot histograms of before and after scaling
train_features.iloc[:, 2].hist(ax=ax[0])
ax[1].hist(scaled_train_features[:, 2])
plt.show()

Scale data and use KNN!
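
One way the two steps might fit together is sketched below, assuming scikit-learn's `StandardScaler` and `KNeighborsRegressor`. The data is synthetic and stands in for the course's features and targets, which are not reproduced here; `n_neighbors=5` is an arbitrary starting value, not a recommendation.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data with features on very different scales
rng = np.random.default_rng(42)
features = rng.normal(size=(200, 3)) * np.array([1.0, 100.0, 10_000.0])
targets = features[:, 0] + rng.normal(scale=0.1, size=200)

# Ordered split: first 150 rows for training, last 50 for testing
train_features, test_features = features[:150], features[150:]
train_targets, test_targets = targets[:150], targets[150:]

# Fit the scaler on training data only, then reuse it on the test data
scaler = StandardScaler()
scaled_train_features = scaler.fit_transform(train_features)
scaled_test_features = scaler.transform(test_features)

# Fit KNN on the scaled features and report R^2 on both sets
knn = KNeighborsRegressor(n_neighbors=5)
knn.fit(scaled_train_features, train_targets)
train_r2 = knn.score(scaled_train_features, train_targets)
test_r2 = knn.score(scaled_test_features, test_targets)
print(train_r2, test_r2)
```

In practice, `n_neighbors` would typically be tuned by comparing train and test scores over a range of values.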
