Log normalization

Preprocessing for Machine Learning in Python

James Chapman

Curriculum Manager, DataCamp

What is log normalization?

 

  • Useful for features with high variance
  • Applies logarithm transformation
  • Natural log using the constant $e$ ($\approx2.718$)
Preprocessing for Machine Learning in Python

What is log normalization?

 

  • Useful for features with high variance
  • Applies logarithm transformation
  • Natural log using the constant $e$ ($\approx2.718$)
  • $e^{3.4} = 30$

 

  • Captures relative changes, the magnitude of change, and keeps everything positive

 

Number Log
30 3.4
300 5.7
3000 8
Preprocessing for Machine Learning in Python

Log normalization in Python

print(df)
   col1   col2
0  1.00    3.0
1  1.20   45.5
2  0.75   28.0
3  1.60  100.0
print(df.var())
col1       0.128958
col2    1691.729167
dtype: float64
import numpy as np
df["log_2"] = np.log(df["col2"])
print(df)
   col1   col2     log_2
0  1.00    3.0  1.098612
1  1.20   45.5  3.817712
2  0.75   28.0  3.332205
3  1.60  100.0  4.605170
print(df[["col1", "log_2"]].var())
col1     0.128958
log_2    2.262886
dtype: float64
Preprocessing for Machine Learning in Python

Let's practice!

Preprocessing for Machine Learning in Python

Preparing Video For Download...