Feature Engineering for Machine Learning in Python
Robert O'Callaghan
Director of Data Science, Ordergroove
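import pandas as pd
from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training data only
scaler = StandardScaler()
scaler.fit(train[['col']])
train['scaled_col'] = scaler.transform(train[['col']])
# FIT SOME MODEL
# ....
test = pd.read_csv('test_csv')
# Reuse the scaler fitted on train; do not refit it on test
test['scaled_col'] = scaler.transform(test[['col']])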
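The scaling pattern above can be tried end to end with a minimal sketch. The two small DataFrames are invented stand-ins for the real train and test sets (an assumption for illustration only); the `StandardScaler` calls are the same ones used on the slide.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data standing in for the real train and test sets
train = pd.DataFrame({'col': [10.0, 12.0, 14.0, 16.0, 18.0]})
test = pd.DataFrame({'col': [11.0, 20.0]})

# The scaler learns its mean and standard deviation from train only
scaler = StandardScaler()
scaler.fit(train[['col']])

# Both sets are transformed with the statistics learned from train
train['scaled_col'] = scaler.transform(train[['col']]).ravel()
test['scaled_col'] = scaler.transform(test[['col']]).ravel()

print(scaler.mean_[0])       # mean learned from train
print(test['scaled_col'])    # test values expressed in train units
```

Because the test set is scaled with the train statistics, a test value can fall outside the range seen in training. That is expected and reflects what the model would encounter on genuinely new data.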
# Thresholds are computed from the training data only
train_mean = train['col'].mean()
train_std = train['col'].std()
cut_off = train_std * 3
train_lower = train_mean - cut_off
train_upper = train_mean + cut_off
# Subset train data
train = train[(train['col'] < train_upper) &
              (train['col'] > train_lower)]
test = pd.read_csv('test_csv')
# Subset test data
test = test[(test['col'] < train_upper) &
            (test['col'] > train_lower)]
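As a self-contained illustration of the outlier step, here is a sketch on made-up numbers (the DataFrames and the names `train_clean` and `test_clean` are hypothetical). The three-standard-deviation bounds are derived from train and then reused unchanged on test.

```python
import pandas as pd

# Hypothetical toy data; the test set contains one obvious outlier (50.0)
train = pd.DataFrame({'col': [9.0, 10.0, 11.0, 10.0, 9.0, 11.0, 10.0, 10.0]})
test = pd.DataFrame({'col': [10.5, 50.0, 9.5]})

# Bounds come from the training data only
train_mean = train['col'].mean()
train_std = train['col'].std()
cut_off = train_std * 3
train_lower = train_mean - cut_off
train_upper = train_mean + cut_off

# Apply the same bounds to both sets
train_clean = train[(train['col'] < train_upper) & (train['col'] > train_lower)]
test_clean = test[(test['col'] < train_upper) & (test['col'] > train_lower)]

print(train_lower, train_upper)
print(test_clean)
```

Selecting the column as a Series (`train['col']`) makes the comparison produce a row-wise boolean mask, so rows outside the bounds are actually dropped rather than replaced with missing values.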
Data leakage: using data that you won't have access to when assessing the performance of your model
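To make the definition concrete, the sketch below contrasts a leaky statistic with a leak-free one. The data and the variable names `leaky_mean` and `clean_mean` are invented for illustration.

```python
import pandas as pd

# Hypothetical toy data: the test values differ noticeably from train
train = pd.DataFrame({'col': [1.0, 2.0, 3.0, 4.0]})
test = pd.DataFrame({'col': [10.0, 12.0]})

# Leaky: the statistic is computed on train and test together,
# so information from the test set influences preprocessing
leaky_mean = pd.concat([train, test])['col'].mean()

# Leak-free: the statistic is computed on train alone
clean_mean = train['col'].mean()

print(leaky_mean, clean_mean)
```

The two means differ because the leaky version has already "seen" the test values. Any preprocessing built on it would make the evaluation look more favorable than performance on truly unseen data.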