Preparation for modeling

Machine Learning for Marketing in Python

Karolis Urbonas

Head of Analytics & Science, Amazon

Data sample

telco_raw.head()

Telelcom data header

Machine Learning for Marketing in Python

Data types

telco_raw.dtypes
customerID           object
gender               object
SeniorCitizen        object
Partner              object
Dependents           object
tenure                int64
PhoneService         object
MultipleLines        object
InternetService      object

OnlineSecurity       object
OnlineBackup         object
DeviceProtection     object
TechSupport          object
StreamingTV          object
StreamingMovies      object
Contract             object
PaperlessBilling     object
PaymentMethod        object
MonthlyCharges      float64
TotalCharges        float64
Churn                object
Machine Learning for Marketing in Python

Separate categorical and numerical columns

Separate the identifier and target variable names as lists

custid = ['customerID']
target = ['Churn']

Separate categorical and numeric column names as lists

categorical = telco_raw.nunique()[telcom.nunique()<10].keys().tolist()

categorical.remove(target[0])
numerical = [col for col in telco_raw.columns if col not in custid+target+categorical]
Machine Learning for Marketing in Python

One-hot encoding

This is a typical categorical data type column

Color
Red
White
Blue
Red
Machine Learning for Marketing in Python

One-hot encoding result

And this is how it looks when we transform it with one-hot encoding.

Color Red White Blue
Red ----------> 1 0 0
White ----------> 0 1 0
Blue ----------> 0 0 1
Red ----------> 1 0 0
Machine Learning for Marketing in Python

One-hot encoding categorical variables

One-hot encoding categorical variables

telco_raw = pd.get_dummies(data=telco_raw, columns=categorical, drop_first=True)
Machine Learning for Marketing in Python

Scaling numerical features

# Import StandardScaler library
from sklearn.preprocessing import StandardScaler

# Initialize StandardScaler instance scaler = StandardScaler()
# Fit the scaler to numerical columns scaled_numerical = scaler.fit_transform(telco_raw[numerical])
# Build a DataFrame scaled_numerical = pd.DataFrame(scaled_numerical, columns=numerical)
Machine Learning for Marketing in Python

Bringing it all together

# Drop non-scaled numerical columns 
telco_raw = telco_raw.drop(columns=numerical, axis=1)

# Merge the non-numerical with the scaled numerical data telco = telco_raw.merge(right=scaled_numerical, how='left', left_index=True, right_index=True )
Machine Learning for Marketing in Python

Let's practice pre-processing data!

Machine Learning for Marketing in Python

Preparing Video For Download...