Standardization

Preprocessing for Machine Learning in Python

James Chapman

Curriculum Manager, DataCamp

What is standardization?

 

Standardization: transform continuous data to appear normally distributed

  • scikit-learn models assume normally distributed data
  • Using non-normal training data can introduce bias
  • Log normalization and feature scaling in this course
  • Applied to continuous numerical data
Preprocessing for Machine Learning in Python

When to standardize: linear distances

  • Model in linear space

 

Examples:

  • k-Nearest Neighbors (kNN)
  • Linear regression
  • K-Means Clustering

A knn example.

Preprocessing for Machine Learning in Python

When to standardize: high variance

  • Model in linear space

 

Examples:

  • k-Nearest Neighbors (kNN)
  • Linear regression
  • K-Means Clustering

 

  • Dataset features have high variance

A knn example.

Preprocessing for Machine Learning in Python

When to standardize: different scales

 

  • Features are on different scales

 

Example:

  • Predicting house prices using no. bedrooms and last sale price

 

  • Linearity assumptions
Preprocessing for Machine Learning in Python

Let's practice!

Preprocessing for Machine Learning in Python

Preparing Video For Download...