Designing Machine Learning Workflows in Python
Dr. Chris Anagnostopoulos
Honorary Associate Professor
elec dataset: class=1 means the price went up relative to the last 24 hours, and 0 means it went down.
   day    period  nswprice  ...  vicdemand  transfer  class
0    2  0.000000  0.056443  ...   0.422915  0.414912      1
1    2  0.553191  0.042482  ...   0.422915  0.414912      0
2    2  0.574468  0.044374  ...   0.422915  0.414912      1
[3 rows x 8 columns]
Sliding window
# .loc slices by label and includes both endpoints
sliding_window = elec.loc[(t_now - window_size + 1):t_now]
Expanding window
expanding_window = elec.loc[0:t_now]
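The two window definitions above can be tried on a toy frame. This is a minimal sketch, assuming a default integer index in time order; `toy`, `sliding_window()` and `expanding_window()` are hypothetical names used only for illustration.

```python
import pandas as pd

# Hypothetical toy frame standing in for a time-ordered dataset such as elec.
toy = pd.DataFrame({"value": range(10)})

def sliding_window(df, t_now, window_size):
    """Return the last `window_size` rows up to and including label t_now."""
    return df.loc[(t_now - window_size + 1):t_now]

def expanding_window(df, t_now):
    """Return every row from the start up to and including label t_now."""
    return df.loc[:t_now]

sliding = sliding_window(toy, t_now=6, window_size=3)
expanding = expanding_window(toy, t_now=6)

print(list(sliding.index))    # [4, 5, 6]
print(list(expanding.index))  # [0, 1, 2, 3, 4, 5, 6]
```

Because `.loc` slicing is inclusive at both ends, the `+ 1` in the lower bound is what keeps the sliding window at exactly `window_size` rows.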
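from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

# t_now = 40000, window_size = 20000
# X, y: features and labels of the full training data (not defined on the slide)
# sliding_X, sliding_y: features and labels of the sliding window
clf_full = RandomForestClassifier().fit(X, y)
clf_sliding = RandomForestClassifier().fit(sliding_X, sliding_y)

# Use future data as test
test = elec.loc[t_now:]
test_X = test.drop(columns='class'); test_y = test['class']

roc_auc_score(test_y, clf_full.predict(test_X))
roc_auc_score(test_y, clf_sliding.predict(test_X))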
0.775
0.780
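To see why a sliding window can beat training on all past data, here is a minimal sketch on synthetic data, not the elec dataset. It assumes a hypothetical stream whose labelling rule flips at row 2000, uses GaussianNB instead of a random forest to keep the run fast, and scores with `predict_proba` rather than hard labels since ROC AUC is a ranking metric. All names (`data`, `fit_and_score`, `drift_at`) are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n, drift_at = 3000, 2000
x = rng.normal(size=n)
# Before drift_at, class 1 means x > 0; afterwards the rule flips.
label = np.where(np.arange(n) < drift_at, x > 0, x < 0).astype(int)
data = pd.DataFrame({"x": x, "class": label})

t_now, window_size = 2499, 500
full = data.loc[:t_now]                               # all data so far
sliding = data.loc[(t_now - window_size + 1):t_now]   # recent data only
test = data.loc[t_now + 1:]                           # strictly future rows

def fit_and_score(train, test):
    """Fit on `train`, return ROC AUC of class-1 probabilities on `test`."""
    clf = GaussianNB().fit(train.drop(columns="class"), train["class"])
    scores = clf.predict_proba(test.drop(columns="class"))[:, 1]
    return roc_auc_score(test["class"], scores)

auc_full = fit_and_score(full, test)
auc_sliding = fit_and_score(sliding, test)
print(f"full: {auc_full:.3f}, sliding: {auc_sliding:.3f}")
```

In this constructed case the full training set is dominated by rows from before the flip, so the model trained on it ranks the future rows the wrong way round, while the sliding window sees only the current rule. Real drift is rarely this abrupt, so the gap is usually much smaller.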
from sklearn.naive_bayes import GaussianNB

for w_size in range(10, 100, 10):
    # Train on the most recent w_size rows only
    sliding = arrh.loc[(t_now - w_size + 1):t_now]
    X = sliding.drop(columns='class')
    y = sliding['class']
    clf = GaussianNB()
    clf.fit(X, y)
    preds = clf.predict(test_X)
    print(w_size, roc_auc_score(test_y, preds))
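The loop above treats the window size as a parameter to tune. A minimal sketch of collecting the scores and picking the best size follows, again on a hypothetical synthetic stream with a rule flip at row 2000 rather than on the course data. The names `stream`, `holdout` and `best_size`, and the candidate sizes, are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import roc_auc_score

# Hypothetical synthetic stream: the labelling rule flips at row 2000.
rng = np.random.default_rng(1)
n = 3000
x = rng.normal(size=n)
label = np.where(np.arange(n) < 2000, x > 0, x < 0).astype(int)
stream = pd.DataFrame({"x": x, "class": label})

t_now = 2499
holdout = stream.loc[t_now + 1:]   # rows after t_now, never trained on
holdout_X = holdout.drop(columns="class")
holdout_y = holdout["class"]

scores = {}
for w_size in [250, 500, 2000]:
    # Train on the most recent w_size rows ending at t_now.
    window = stream.loc[(t_now - w_size + 1):t_now]
    clf = GaussianNB().fit(window.drop(columns="class"), window["class"])
    preds = clf.predict(holdout_X)
    scores[w_size] = roc_auc_score(holdout_y, preds)

best_size = max(scores, key=scores.get)
print(scores, best_size)
```

If the held-out block used to pick the window size is also the block used to report final performance, the reported score will be optimistic; a separate later block for the final evaluation avoids that.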
arrhythmia dataset:
   age  sex  height  ...  chV6_TwaveAmp  chV6_QRSA  chV6_QRSTA  class
0   75    0     190  ...            2.9       23.3        49.4      0
1   56    1     165  ...            2.1       20.4        38.8      0
2   54    0     172  ...            3.4       12.3        49.0      0
3   55    0     175  ...            2.6       34.6        61.6      1
4   75    0     190  ...            3.9       25.4        62.8      0
[5 rows x 280 columns]