Data Preparation

Marketing Analytics: Predicting Customer Churn in Python

Mark Peterson

Director of Data Science, Infoblox

Model assumptions

  • Some assumptions that models make:
    • That the features are normally distributed
    • That the features are on the same scale

   


Data types

  • Most machine learning algorithms require numeric data types
    • Categorical variables need to be encoded as numbers first

telco.dtypes
Account_Length      int64
Vmail_Message       int64
Day_Mins          float64
Eve_Mins          float64
Night_Mins        float64
Intl_Mins         float64
CustServ_Calls      int64
Churn              object
Intl_Plan          object
Vmail_Plan         object
Day_Calls           int64
Day_Charge        float64
Eve_Calls           int64
Eve_Charge        float64
Night_Calls         int64
Night_Charge      float64
Intl_Calls          int64
Intl_Charge       float64
State              object
Area_Code           int64
Phone              object
dtype: object
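
The object columns in the listing above are the ones that still need encoding. A minimal sketch of finding them programmatically, assuming a small hypothetical stand-in for telco (the rows in telco_demo are made up for illustration):

```python
import pandas as pd

# Hypothetical three-row stand-in for the telco DataFrame (values are made up).
telco_demo = pd.DataFrame({
    "Account_Length": [128, 107, 137],
    "Day_Mins": [265.1, 161.6, 243.4],
    "Intl_Plan": ["no", "no", "yes"],
    "State": ["KS", "OH", "NJ"],
})

# Columns that are not numeric must be encoded before modeling.
non_numeric = telco_demo.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)
```

On the real data, the same call on telco should list the object columns shown in telco.dtypes.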

Encoding binary features

telco['Intl_Plan'].head()
0     no
1     no
2     no
3    yes
4    yes
Name: Intl_Plan, dtype: object

Option 1: .replace()

 

# .replace() returns a new Series, so assign the result back
telco['Intl_Plan'] = telco['Intl_Plan'].replace({'no': 0, 'yes': 1})

telco['Intl_Plan'].head()
0    0
1    0
2    0
3    1
4    1
Name: Intl_Plan, dtype: int64

Option 2: LabelEncoder()

from sklearn.preprocessing import LabelEncoder

# fit_transform() returns a new array, so assign the result back
telco['Intl_Plan'] = LabelEncoder().fit_transform(telco['Intl_Plan'])

telco['Intl_Plan'].head()
0    0
1    0
2    0
3    1
4    1
Name: Intl_Plan, dtype: int64
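
Neither option modifies the column in place; each returns a new object that has to be stored. A self-contained sketch using the five values shown earlier, assuming .map() with an explicit dictionary as a stand-in for .replace() (a swapped-in but equivalent mapping for this column):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# The first five Intl_Plan values from the slide.
intl_plan = pd.Series(["no", "no", "no", "yes", "yes"], name="Intl_Plan")

# Explicit mapping: you choose which label becomes 0 and which becomes 1.
mapped = intl_plan.map({"no": 0, "yes": 1}).astype(int)

# LabelEncoder assigns integers to the sorted labels ('no' < 'yes').
encoded = LabelEncoder().fit_transform(intl_plan)

print(mapped.tolist())
print(list(encoded))
```

The explicit mapping documents the intended coding; LabelEncoder relies on the sort order of the labels.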

Encoding state

telco['State'].head(4)
0    KS
1    OH
2    NJ
3    OH
Name: State, dtype: object
  • Could assign a number to each state
0    0
1    1
2    2
3    1
Name: State, dtype: int64
  • Bad idea: the numbers imply an ordering among states that does not exist
  • Would make your model less effective

One hot encoding

[Figures: ohe.png, ohe_part2.png, ohe_part3.png]
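
One hot encoding gives each state its own 0/1 column instead of a single numbered column. A minimal sketch on the four State values shown earlier, assuming pandas' get_dummies() as the tool (the slides do not name a specific function here):

```python
import pandas as pd

# The first four State values from the slide.
states = pd.Series(["KS", "OH", "NJ", "OH"], name="State")

# One column per distinct state; dtype=int gives 0/1 instead of booleans.
state_dummies = pd.get_dummies(states, dtype=int)
print(state_dummies)
```

Each row has exactly one 1, in the column for that row's state, so no ordering between states is implied.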


Feature scaling

  • Features should be on the same scale
  • Rarely true of real-world data

telco['Intl_Calls'].describe()
count    3333.000000
mean        4.479448
std         2.461214
min         0.000000
25%         3.000000
50%         4.000000
75%         6.000000
max        20.000000
Name: Intl_Calls, dtype: float64
telco['Night_Mins'].describe()
count    3333.000000
mean      200.872037
std        50.573847
min        23.200000
25%       167.000000
50%       201.200000
75%       235.300000
max       395.000000
Name: Night_Mins, dtype: float64

Standardization

  • Centers each feature at zero by subtracting its mean
  • Divides by the standard deviation, so each value becomes the number of standard deviations it lies from the mean
from sklearn.preprocessing import StandardScaler

df = StandardScaler().fit_transform(df)
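
Note that fit_transform() returns a NumPy array, so column names are lost unless the result is wrapped again. A sketch that keeps the labels and checks the result against the formula (x - mean) / std, assuming a hypothetical five-row sample (the values in features are illustrative, not rows from telco):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical sample with two features on very different scales.
features = pd.DataFrame({
    "Intl_Calls": [3, 4, 6, 20, 0],
    "Night_Mins": [167.0, 201.2, 235.3, 395.0, 23.2],
})

# fit_transform() returns a NumPy array; rebuild a DataFrame to keep labels.
scaled = StandardScaler().fit_transform(features)
scaled_df = pd.DataFrame(scaled, columns=features.columns, index=features.index)

# Same computation by hand; StandardScaler uses the population std (ddof=0).
manual = (features - features.mean()) / features.std(ddof=0)

print(scaled_df.round(3))
```

After scaling, both columns have mean 0 and standard deviation 1, so neither dominates purely because of its units.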

Let's practice!
