Scaling data

Preprocessing for Machine Learning in Python

James Chapman

Curriculum Manager, DataCamp

What is feature scaling?

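Useful when features are on very different scales
Matters most for models that work in a linear space or rely on distances (e.g. linear regression, k-nearest neighbors)
Standardization centers each feature around $0$ and rescales it to a variance of $1$
Makes features comparable in magnitude; it does not change the shape of a distribution, so a skewed feature stays skewed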
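
Standardization can be sketched by hand: subtract each column's mean and divide by its standard deviation, $z = (x - \mu) / \sigma$. The snippet below is a minimal illustration with NumPy; the array `x` holds made-up values chosen for this example, not data from the course.

```python
import numpy as np

# Hypothetical feature matrix: two columns on very different scales
x = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 500.0]])

# Standardize column-wise: z = (x - mean) / std
# np.std uses the population standard deviation (ddof=0) by default
z = (x - x.mean(axis=0)) / x.std(axis=0)

print(z.mean(axis=0))  # approximately 0 for each column
print(z.var(axis=0))   # approximately 1 for each column
```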

How to scale data

print(df)

   col1  col2   col3
0  1.00  48.0  100.0
1  1.20  45.5  101.3
2  0.75  46.2  103.5
3  1.60  50.0  104.0

print(df.var())

col1    0.128958
col2    4.055833
col3    3.526667
dtype: float64
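
To reproduce the output above, `df` can be rebuilt from the printed values. This is a sketch that assumes the four rows shown are the whole DataFrame.

```python
import pandas as pd

# Values copied from the printed DataFrame above
df = pd.DataFrame({
    "col1": [1.00, 1.20, 0.75, 1.60],
    "col2": [48.0, 45.5, 46.2, 50.0],
    "col3": [100.0, 101.3, 103.5, 104.0],
})

# DataFrame.var() defaults to the sample variance (ddof=1)
print(df.var())
```

The variances differ by more than an order of magnitude between `col1` and the other two columns, which is the kind of gap scaling is meant to remove.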

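import pandas as pd
from sklearn.preprocessing import StandardScaler

# Standardize each column to mean 0 and unit variance
scaler = StandardScaler()

# fit_transform returns a NumPy array, so rebuild the DataFrame with the original column names
df_scaled = pd.DataFrame(scaler.fit_transform(df),
                         columns=df.columns)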

print(df_scaled)

       col1      col2      col3
0 -0.442127  0.329683 -1.352726
1  0.200967 -1.103723 -0.553388
2 -1.245995 -0.702369  0.799338
3  1.487156  1.476409  1.106776

print(df_scaled.var())

col1    1.333333
col2    1.333333
col3    1.333333
dtype: float64
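
The variances print as 1.333333 rather than 1 because `DataFrame.var()` defaults to the sample variance (`ddof=1`), while `StandardScaler` divides by the population standard deviation (`ddof=0`). With four rows the ratio is 4/3. A short sketch of the check, rebuilding `df` from the values printed earlier so that it runs on its own:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Values copied from the printed DataFrame earlier in the slides
df = pd.DataFrame({
    "col1": [1.00, 1.20, 0.75, 1.60],
    "col2": [48.0, 45.5, 46.2, 50.0],
    "col3": [100.0, 101.3, 103.5, 104.0],
})

df_scaled = pd.DataFrame(StandardScaler().fit_transform(df),
                         columns=df.columns)

# Population variance (ddof=0) matches what StandardScaler targets
print(df_scaled.var(ddof=0))  # approximately 1 for every column

# Sample variance (pandas default, ddof=1) is larger by n / (n - 1)
print(df_scaled.var())        # approximately 4 / 3 for every column
```

On larger datasets the two conventions give nearly identical values, so the difference is only noticeable with a handful of rows like this.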

Let's practice!
