Machine Learning for Marketing in Python
Karolis Urbonas
Head of Analytics & Science, Amazon
telco_raw.head()
telco_raw.dtypes
customerID object
gender object
SeniorCitizen object
Partner object
Dependents object
tenure int64
PhoneService object
MultipleLines object
InternetService object
OnlineSecurity object
OnlineBackup object
DeviceProtection object
TechSupport object
StreamingTV object
StreamingMovies object
Contract object
PaperlessBilling object
PaymentMethod object
MonthlyCharges float64
TotalCharges float64
Churn object
Separate the identifier and target variable names as lists
custid = ['customerID']
target = ['Churn']
Separate categorical and numeric column names as lists
categorical = telco_raw.nunique()[telcom.nunique()<10].keys().tolist()
categorical.remove(target[0])
numerical = [col for col in telco_raw.columns if col not in custid+target+categorical]
This is a typical categorical data type column
Color |
---|
Red |
White |
Blue |
Red |
And this is how it looks when we transform it with one-hot encoding.
Color | Red | White | Blue | |
---|---|---|---|---|
Red | ----------> | 1 | 0 | 0 |
White | ----------> | 0 | 1 | 0 |
Blue | ----------> | 0 | 0 | 1 |
Red | ----------> | 1 | 0 | 0 |
One-hot encoding categorical variables
telco_raw = pd.get_dummies(data=telco_raw, columns=categorical, drop_first=True)
# Import StandardScaler library from sklearn.preprocessing import StandardScaler
# Initialize StandardScaler instance scaler = StandardScaler()
# Fit the scaler to numerical columns scaled_numerical = scaler.fit_transform(telco_raw[numerical])
# Build a DataFrame scaled_numerical = pd.DataFrame(scaled_numerical, columns=numerical)
# Drop non-scaled numerical columns telco_raw = telco_raw.drop(columns=numerical, axis=1)
# Merge the non-numerical with the scaled numerical data telco = telco_raw.merge(right=scaled_numerical, how='left', left_index=True, right_index=True )
Machine Learning for Marketing in Python