Machine Learning for Marketing in Python
Karolis Urbonas
Head of Analytics & Science, Amazon
telco_raw.head()

telco_raw.dtypes
customerID object
gender object
SeniorCitizen object
Partner object
Dependents object
tenure int64
PhoneService object
MultipleLines object
InternetService object
OnlineSecurity object
OnlineBackup object
DeviceProtection object
TechSupport object
StreamingTV object
StreamingMovies object
Contract object
PaperlessBilling object
PaymentMethod object
MonthlyCharges float64
TotalCharges float64
Churn object
Store the identifier and target column names in lists
custid = ['customerID']
target = ['Churn']
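Separate the categorical and numerical column names into lists
# Columns with fewer than 10 unique values are treated as categorical
categorical = telco_raw.nunique()[telco_raw.nunique() < 10].keys().tolist()
# The target is also low-cardinality, so remove it from the categorical list
categorical.remove(target[0])
# Everything that is not an identifier, target, or categorical column is numerical
numerical = [col for col in telco_raw.columns if col not in custid + target + categorical]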
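To see how the unique-count rule sorts columns, here is a minimal sketch on a small made-up DataFrame. The frame `toy` and its values are assumptions for illustration only; the threshold is lowered from 10 to 3 because the toy frame has just five rows.

```python
import pandas as pd

# Hypothetical toy data standing in for telco_raw
toy = pd.DataFrame({
    "customerID": ["a1", "b2", "c3", "d4", "e5"],
    "gender": ["F", "M", "F", "M", "F"],
    "tenure": [1, 12, 24, 36, 48],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30, 70.70],
    "Churn": ["No", "No", "Yes", "No", "Yes"],
})

custid = ["customerID"]
target = ["Churn"]

# Columns with fewer than 3 distinct values are treated as categorical
unique_counts = toy.nunique()
categorical = unique_counts[unique_counts < 3].keys().tolist()
categorical.remove(target[0])

# Whatever is left, apart from the identifier and target, is numerical
numerical = [col for col in toy.columns
             if col not in custid + target + categorical]

print(categorical)
print(numerical)
```

Note that the rule is a heuristic: a numeric column with very few distinct values would also land in the categorical list, so it is worth checking the result by eye.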
This is a typical categorical column
| Color |
|---|
| Red |
| White |
| Blue |
| Red |
This is what it looks like after one-hot encoding.
| Color | | Red | White | Blue |
|---|---|---|---|---|
| Red | ----------> | 1 | 0 | 0 |
| White | ----------> | 0 | 1 | 0 |
| Blue | ----------> | 0 | 0 | 1 |
| Red | ----------> | 1 | 0 | 0 |
One-hot encoding for categorical variables
telco_raw = pd.get_dummies(data=telco_raw, columns=categorical, drop_first=True)
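As a small sketch of what `pd.get_dummies` does with and without `drop_first=True`, the color example can be reproduced on a made-up frame. The name `colors` and the `dtype=int` argument (used only to print 0/1 instead of booleans) are choices made here, not part of the slide.

```python
import pandas as pd

# Hypothetical frame mirroring the color example
colors = pd.DataFrame({"Color": ["Red", "White", "Blue", "Red"]})

# One indicator column per category
full = pd.get_dummies(data=colors, columns=["Color"], dtype=int)

# drop_first=True removes the first category (alphabetically, "Blue")
reduced = pd.get_dummies(data=colors, columns=["Color"],
                         drop_first=True, dtype=int)

print(full)
print(reduced)
```

With `drop_first=True` a row of all zeros stands for the dropped category, so no information is lost, and the redundant column that would otherwise be perfectly predictable from the others is avoided.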
# Import StandardScaler library
from sklearn.preprocessing import StandardScaler
# Initialize StandardScaler instance
scaler = StandardScaler()
# Fit the scaler to numerical columns
scaled_numerical = scaler.fit_transform(telco_raw[numerical])
# Build a DataFrame
scaled_numerical = pd.DataFrame(scaled_numerical, columns=numerical)
# Drop non-scaled numerical columns
telco_raw = telco_raw.drop(columns=numerical)
# Merge the non-numerical with the scaled numerical data
telco = telco_raw.merge(right=scaled_numerical, how='left', left_index=True, right_index=True)
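The scale, drop, and merge steps can be checked end to end on a toy frame. This is a sketch under assumptions: the frame `toy` and its values are made up, and passing `index=toy.index` when building the scaled DataFrame is an addition made here so that the index-based merge still lines up when the original index is not 0..n-1.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data with a non-default index
toy = pd.DataFrame(
    {
        "gender_M": [0, 1, 0, 1],
        "tenure": [1, 12, 24, 47],
        "MonthlyCharges": [20.0, 50.0, 80.0, 110.0],
    },
    index=[10, 11, 12, 13],
)
numerical = ["tenure", "MonthlyCharges"]

# Standardize the numerical columns: subtract the mean, divide by the std
scaler = StandardScaler()
scaled = scaler.fit_transform(toy[numerical])

# Reuse the original index so the merge matches rows correctly
scaled_numerical = pd.DataFrame(scaled, columns=numerical, index=toy.index)

# Drop the unscaled columns and merge the scaled ones back on the index
rest = toy.drop(columns=numerical)
merged = rest.merge(right=scaled_numerical, how="left",
                    left_index=True, right_index=True)

# Each scaled column should have mean 0 and (population) std 1
print(merged[numerical].mean().round(6))
print(merged[numerical].std(ddof=0).round(6))
```

`StandardScaler` divides by the population standard deviation, which is why the check uses `ddof=0` rather than the pandas default of `ddof=1`.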