Sequence of structuring pre-processing steps

Customer Segmentation in Python

Karolis Urbonas

Head of Data Science, Amazon

Why the sequence matters?

  • Log transformation only works with positive data
  • Normalization forces data to have negative values and log will not work
Customer Segmentation in Python

Sequence

  1. Unskew the data - log transformation
  2. Standardize to the same average values
  3. Scale to the same standard deviation
  4. Store as a separate array to be used for clustering
Customer Segmentation in Python

Coding the sequence

# Unskew the data
import numpy as np
datamart_log = np.log(datamart_rfm)

# Normalize the variables from sklearn.preprocessing import StandardScaler scaler = StandardScaler() scaler.fit(datamart_log)
# Store for clustering datamart_normalized = scaler.transform(datamart_log)
Customer Segmentation in Python

Practice on RFM data!

Customer Segmentation in Python

Preparing Video For Download...