Scaling data

Preprocessing for Machine Learning in Python

James Chapman

Curriculum Manager, DataCamp

What is feature scaling?

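  • Features on different scales
  • Model with linear characteristics (e.g., linear regression) or one that relies on distances (e.g., k-nearest neighbors)
  • Centers features around $0$ and transforms them to a variance of $1$
  • Changes only location and spread, not shape: it does not make a skewed feature normally distributed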
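Standardization is the z-score transform z = (x - mean) / std. The minimal NumPy sketch below checks the bullets above on a made-up, right-skewed feature (the values and the `skewness` helper are hypothetical, for illustration only): the result has mean 0 and standard deviation 1, but its skewness is unchanged, because the transform is just a shift and a rescale.

```python
import numpy as np

# Hypothetical right-skewed feature, for illustration only
x = np.array([1.0, 2.0, 3.0, 10.0])

# Standardize: subtract the mean, divide by the population standard deviation
z = (x - x.mean()) / x.std()


def skewness(a):
    """Moment-based skewness: mean of the cubed standardized values."""
    s = (a - a.mean()) / a.std()
    return np.mean(s ** 3)


print(np.isclose(z.mean(), 0.0), np.isclose(z.std(), 1.0))  # True True
print(np.isclose(skewness(x), skewness(z)))                 # True: shape unchanged
```

A skewed feature stays just as skewed after scaling; reshaping a distribution requires a different transformation (for example, a log transform).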

How to scale data

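import pandas as pd

# Example data with features on very different scales
df = pd.DataFrame({
    "col1": [1.00, 1.20, 0.75, 1.60],
    "col2": [48.0, 45.5, 46.2, 50.0],
    "col3": [100.0, 101.3, 103.5, 104.0],
})
print(df)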
   col1  col2   col3
0  1.00  48.0  100.0
1  1.20  45.5  101.3
2  0.75  46.2  103.5
3  1.60  50.0  104.0
print(df.var())
col1    0.128958
col2    4.055833
col3    3.526667
dtype: float64
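Why the different variances matter: the squared Euclidean distance between two rows is a sum of squared per-column differences, so columns with large spread dominate any distance-based model. The sketch below rebuilds the example `df` with the values printed above; `distance_share` is a hypothetical helper that reports each column's fraction of the distance between rows 0 and 1, before and after standardizing by hand.

```python
import pandas as pd

# Rebuild the example DataFrame printed above
df = pd.DataFrame({
    "col1": [1.00, 1.20, 0.75, 1.60],
    "col2": [48.0, 45.5, 46.2, 50.0],
    "col3": [100.0, 101.3, 103.5, 104.0],
})


def distance_share(frame, i, j):
    """Fraction of the squared Euclidean distance between rows i and j
    contributed by each column (hypothetical helper)."""
    diff_sq = (frame.iloc[i] - frame.iloc[j]) ** 2
    return diff_sq / diff_sq.sum()


raw_share = distance_share(df, 0, 1)

# Standardize by hand with the population std (ddof=0)
df_std = (df - df.mean()) / df.std(ddof=0)
scaled_share = distance_share(df_std, 0, 1)

print(raw_share.round(3))     # col1 contributes only about 0.5% of the raw distance
print(scaled_share.round(3))  # after scaling, col1's share is much larger
```

On the raw data, the difference in col1 barely registers next to col2 and col3; after standardization, each column's contribution reflects how unusual the difference is relative to that column's own spread.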

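import pandas as pd
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()

# fit_transform learns each column's mean and standard deviation, then standardizes
df_scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)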
print(df_scaled)
       col1      col2      col3
0 -0.442127  0.329683 -1.352726
1  0.200967 -1.103723 -0.553388
2 -1.245995 -0.702369  0.799338
3  1.487156  1.476409  1.106776
print(df_scaled.var())
col1    1.333333
col2    1.333333
col3    1.333333
dtype: float64
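Why does the output show 1.333333 rather than a variance of 1? `StandardScaler` divides by the population standard deviation (ddof=0), while pandas' `.var()` defaults to the sample variance (ddof=1), which multiplies by n/(n-1) = 4/3 for these four rows. The sketch below rebuilds the example data and shows both views.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Rebuild the example DataFrame from the earlier slide
df = pd.DataFrame({
    "col1": [1.00, 1.20, 0.75, 1.60],
    "col2": [48.0, 45.5, 46.2, 50.0],
    "col3": [100.0, 101.3, 103.5, 104.0],
})

scaler = StandardScaler()
df_scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)

# pandas default: ddof=1 (divide by n - 1), so each column reports 4/3
print(df_scaled.var())
# ddof=0 (divide by n), matching StandardScaler: each column is approximately 1.0
print(df_scaled.var(ddof=0))
# scaler.var_ holds the population (ddof=0) variance of each original column
print(np.allclose(scaler.var_, df.var(ddof=0)))  # True
```

The gap between the two shrinks as the number of rows grows, since n/(n-1) approaches 1.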

Let's practice!
