Managing skewed variables

Customer Segmentation in Python

Karolis Urbonas

Head of Data Science, Amazon

Identifying skewness

  • Visual analysis of the distribution
  • If one tail is clearly longer than the other, the variable is skewed
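
The visual check can be backed up with a number. The sketch below is a minimal illustration, not part of the original slides: `toy_datamart` is a hypothetical stand-in for the course's `datamart`, and its values are made up. pandas' `DataFrame.skew()` returns the sample skewness per column; values well above 0 suggest a longer right tail, values well below 0 a longer left tail.

```python
import pandas as pd

# Hypothetical stand-in for the course's datamart (values are illustrative only)
toy_datamart = pd.DataFrame({
    'Recency': [1, 2, 3, 5, 8, 13, 40, 120, 365],
    'Frequency': [1, 1, 2, 2, 3, 4, 6, 25, 150],
})

# Sample skewness per column; a positive value means a longer right tail
skewness = toy_datamart.skew()
print(skewness)
```

A number like this complements the plot rather than replacing it, since a single statistic can hide multimodal or otherwise unusual shapes.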

Exploring distribution of recency

import seaborn as sns
import matplotlib.pyplot as plt
sns.distplot(datamart['Recency'])
plt.show()

Exploring distribution of frequency

sns.distplot(datamart['Frequency'])
plt.show()

Data transformations to manage skewness

  • Logarithmic transformation (positive values only)

import numpy as np
frequency_log = np.log(datamart['Frequency'])

sns.distplot(frequency_log)
plt.show()
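
To see the effect of the log transformation without a plot, the skewness can be compared before and after. This is a rough sketch under stated assumptions: the `frequency` series below is hypothetical, made-up data standing in for `datamart['Frequency']`, and all values are strictly positive so the logarithm is defined.

```python
import numpy as np
import pandas as pd

# Hypothetical, strictly positive frequency values (illustrative only)
frequency = pd.Series([1, 1, 2, 2, 3, 4, 6, 25, 150])

# Natural log compresses large values much more than small ones
frequency_log = np.log(frequency)

# Compare sample skewness before and after the transformation
skew_before = frequency.skew()
skew_after = frequency_log.skew()
print(skew_before, skew_after)
```

On data like this the transformed series is still somewhat right-skewed, but less so than the raw values.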

Dealing with negative values

  • Adding a constant before log transformation
  • Cube root transformation
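
Both options can be sketched as follows. This is an illustrative example, not code from the course: the `values` series is hypothetical, and the choice of shifting the minimum to exactly 1 is an assumption (any constant that makes every value positive would work).

```python
import numpy as np
import pandas as pd

# Hypothetical variable containing negative values (illustrative only)
values = pd.Series([-5.0, -1.0, 0.0, 2.0, 4.0, 10.0, 60.0, 300.0])

# Option 1: add a constant so the minimum becomes 1, then take the log.
# The shift size is an assumption made for this sketch.
shift = 1 - values.min()
values_log = np.log(values + shift)

# Option 2: cube root, which is defined for negative, zero, and positive values
values_cbrt = np.cbrt(values)

print(values_log)
print(values_cbrt)
```

`np.cbrt` is used here rather than `values ** (1/3)` because raising negative floats to a fractional power yields NaN in NumPy.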

Let's practice how to identify and manage skewed variables!
