Machine Learning for Marketing in Python
Karolis Urbonas
Head of Analytics & Science, Amazon
telco_raw.head()

telco_raw.dtypes
customerID object
gender object
SeniorCitizen object
Partner object
Dependents object
tenure int64
PhoneService object
MultipleLines object
InternetService object
OnlineSecurity object
OnlineBackup object
DeviceProtection object
TechSupport object
StreamingTV object
StreamingMovies object
Contract object
PaperlessBilling object
PaymentMethod object
MonthlyCharges float64
TotalCharges float64
Churn object
Separate the identifier and target variable names as lists
custid = ['customerID']
target = ['Churn']
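Separate the categorical and numeric column names as lists
# Columns with fewer than 10 unique values are treated as categorical
categorical = telco_raw.nunique()[telco_raw.nunique()<10].keys().tolist()
# The target also has few unique values, so remove it from the list
categorical.remove(target[0])
# Everything that is not an ID, target or categorical column is numeric
numerical = [col for col in telco_raw.columns if col not in custid+target+categorical]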
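To see what the unique-value threshold does, here is a minimal sketch on a small made-up DataFrame. The name `toy` and all of its values are hypothetical and only stand in for `telco_raw`; the threshold of 10 is the one used on the slide.

```python
import pandas as pd

# Hypothetical toy data standing in for telco_raw (values invented for illustration)
toy = pd.DataFrame({
    "customerID": [f"C{i:02d}" for i in range(12)],
    "gender": ["Female", "Male"] * 6,
    "Contract": ["Month-to-month", "One year", "Two year"] * 4,
    "tenure": list(range(1, 13)),
    "MonthlyCharges": [20.0 + 5.5 * i for i in range(12)],
    "Churn": ["Yes", "No", "No"] * 4,
})

custid = ["customerID"]
target = ["Churn"]

# Columns with fewer than 10 distinct values are treated as categorical
n_unique = toy.nunique()
categorical = n_unique[n_unique < 10].keys().tolist()
# Churn has only two values, so it is picked up too and must be removed
categorical.remove(target[0])

# Whatever is left after removing ID, target and categorical columns is numeric
numerical = [col for col in toy.columns if col not in custid + target + categorical]

print(categorical)
print(numerical)
```

The threshold is a heuristic: a numeric column with fewer than 10 distinct values would also be classified as categorical, so it is worth inspecting both lists before moving on.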
This is a typical categorical field
| Color |
|---|
| Red |
| White |
| Blue |
| Red |
This is how it looks after one-hot encoding.
| Color | | Red | White | Blue |
|---|---|---|---|---|
| Red | ----------> | 1 | 0 | 0 |
| White | ----------> | 0 | 1 | 0 |
| Blue | ----------> | 0 | 0 | 1 |
| Red | ----------> | 1 | 0 | 0 |
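The table above can be reproduced with `pd.get_dummies`. This is a small sketch on a four-row DataFrame built just for illustration; the name `colors` is hypothetical, and `dtype=int` is passed only so the indicators print as 0/1 rather than booleans.

```python
import pandas as pd

# The four-row color field from the table, with English labels
colors = pd.DataFrame({"Color": ["Red", "White", "Blue", "Red"]})

# One indicator column per distinct value; pandas names them "<column>_<value>"
encoded = pd.get_dummies(colors, columns=["Color"], dtype=int)

print(encoded)
```

pandas orders the new columns alphabetically by value, so they appear as Blue, Red, White rather than in the order shown in the table.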
One-hot encoding categorical variables
telco_raw = pd.get_dummies(data=telco_raw, columns=categorical, drop_first=True)
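The call above passes `drop_first=True`, which the simple color table does not show. As a rough illustration on the same hypothetical four-row color field, this option drops the indicator for the first level, leaving k-1 columns for k categories:

```python
import pandas as pd

# Hypothetical example data: the color field with English labels
colors = pd.DataFrame({"Color": ["Red", "White", "Blue", "Red"]})

# drop_first=True removes the indicator of the first (alphabetical) level, here "Blue"
encoded = pd.get_dummies(colors, columns=["Color"], drop_first=True, dtype=int)

print(encoded)
```

A row with zeros in every remaining column then stands for the dropped level. Removing one column avoids perfectly collinear indicators, which matters for linear models such as logistic regression.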
# Import StandardScaler library
from sklearn.preprocessing import StandardScaler
# Initialize StandardScaler instance
scaler = StandardScaler()
# Fit the scaler to numerical columns
scaled_numerical = scaler.fit_transform(telco_raw[numerical])
# Build a DataFrame
scaled_numerical = pd.DataFrame(scaled_numerical, columns=numerical)
# Drop non-scaled numerical columns
telco_raw = telco_raw.drop(columns=numerical, axis=1)
# Merge the non-numerical with the scaled numerical data
telco = telco_raw.merge(right=scaled_numerical, how='left', left_index=True, right_index=True)
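The scale, drop and merge steps can be checked end to end on a small example. The sketch below uses a hypothetical four-row DataFrame named `toy` with invented values; it assumes scikit-learn is installed and that the DataFrame has the default 0..n-1 index, as a freshly loaded file would.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical data: one already-encoded column plus two numeric columns
toy = pd.DataFrame({
    "Contract_One year": [0, 1, 0, 1],
    "tenure": [1, 12, 24, 60],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30],
})
numerical = ["tenure", "MonthlyCharges"]

# Scale the numeric columns to mean 0 and standard deviation 1
scaler = StandardScaler()
scaled_numerical = pd.DataFrame(scaler.fit_transform(toy[numerical]), columns=numerical)

# Drop the unscaled columns, then join the scaled ones back on the row index
toy_rest = toy.drop(columns=numerical)
toy_scaled = toy_rest.merge(right=scaled_numerical, how="left",
                            left_index=True, right_index=True)

# StandardScaler uses the population standard deviation, hence ddof=0
print(toy_scaled[numerical].mean().round(2))
print(toy_scaled[numerical].std(ddof=0).round(2))
```

Because the merge matches rows by index, it relies on both frames sharing the same index. If rows had been filtered out earlier, building the scaled frame with `index=toy.index` would keep the two aligned.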