Scaling data for machine learning

Analyzing IoT Data in Python

Matthias Voppichler

IT Developer

Evaluate the model

from sklearn.linear_model import LogisticRegression

logreg = LogisticRegression()
logreg.fit(X_train, y_train)

print(logreg.score(X_test, y_test))
0.78145113

Scaling

scikit-learn's StandardScaler
  • remove mean
  • scale data to unit variance

Picture explaining scaling - features are now centered around 0
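
As a minimal sketch of what these two steps do, the following compares StandardScaler with the same calculation done by hand. The small array is made up for illustration and is not the course data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up readings for illustration only: columns are humidity and temperature
X = np.array([[80.0, 11.0],
              [70.0, 14.0],
              [60.0, 17.0]])

sc = StandardScaler()
X_scaled = sc.fit_transform(X)

# Step 1 (remove mean) and step 2 (scale to unit variance), done by hand
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_scaled.mean(axis=0))            # approximately 0 for each column
print(X_scaled.std(axis=0))             # approximately 1 for each column
print(np.allclose(X_scaled, X_manual))  # the two approaches should agree
```

Both StandardScaler and np.std use the population standard deviation by default, which is why the manual version lines up with the scaler here.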

Unscaled data

print(data.head())
                     humidity  temperature  pressure
timestamp                                           
2018-10-01 00:00:00      81.0         11.8    1013.4
2018-10-01 00:15:00      79.7         11.9    1013.1
2018-10-01 00:30:00      81.0         12.1    1013.0
2018-10-01 00:45:00      79.7         11.7    1012.7
2018-10-01 01:00:00      84.3         11.2    1012.6

StandardScaler

from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
sc.fit(data)
print(sc.mean_)
print(sc.var_)
[  71.8826716    14.17002019 1018.17042396]
[372.78261022  20.37926608  53.67519188]
data_scaled = sc.transform(data)

StandardScaler

df_scaled = pd.DataFrame(data_scaled, 
                         columns=data.columns, 
                         index=data.index)
print(df_scaled.head())
                     humidity  temperature  pressure
timestamp                                           
2018-10-01 00:00:00  0.472215    -0.524998 -0.651134
2018-10-01 00:15:00  0.404884    -0.502847 -0.692082
2018-10-01 00:30:00  0.472215    -0.458543 -0.705731
2018-10-01 00:45:00  0.404884    -0.547150 -0.746679
2018-10-01 01:00:00  0.643132    -0.657908 -0.760329
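
The scaled values can be checked by hand. The sketch below copies the mean and variance printed by the scaler and the first unscaled row from the earlier slide, then applies z = (x - mean) / sqrt(variance). The variable names are chosen here for illustration.

```python
import numpy as np

# Values copied from the printed output on the earlier slides
mean = np.array([71.8826716, 14.17002019, 1018.17042396])  # sc.mean_
var = np.array([372.78261022, 20.37926608, 53.67519188])   # sc.var_
first_row = np.array([81.0, 11.8, 1013.4])                 # 2018-10-01 00:00:00

# Subtract the mean, then divide by the standard deviation
z = (first_row - mean) / np.sqrt(var)
print(np.round(z, 6))
```

The result should agree with the first row of the scaled table to the printed precision.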

Evaluate the model

logreg = LogisticRegression()
# Only the features are scaled; the class labels stay unchanged
logreg.fit(X_train_scaled, y_train)

print(logreg.score(X_test_scaled, y_test))
0.88145113
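
The slides do not show how X_train_scaled and X_test_scaled were produced. One common approach, sketched here under assumptions, is to fit the scaler on the training split only and reuse its statistics for the test split. The data and the label rule below are synthetic stand-ins, not the course dataset, so the score will differ from the one shown above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical): humidity, temperature, pressure
rng = np.random.default_rng(0)
X = rng.normal(loc=[70.0, 14.0, 1018.0], scale=[19.0, 4.5, 7.3], size=(200, 3))
y = (X[:, 1] > 14.0).astype(int)  # hypothetical binary label

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

sc = StandardScaler()
X_train_scaled = sc.fit_transform(X_train)  # learn mean and variance on training data only
X_test_scaled = sc.transform(X_test)        # reuse the training statistics

logreg = LogisticRegression()
logreg.fit(X_train_scaled, y_train)
print(logreg.score(X_test_scaled, y_test))
```

Fitting the scaler on the training split alone keeps information from the test set out of the preprocessing step.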

Let's practice!
