Intermediate Predictive Analytics in Python
Nele Verbiest, Ph.D.
Senior Data Scientist @PythonPredictions
Logistic regression: $\operatorname{logit}(a_1x_1 + a_2x_2 + \dots + a_nx_n + b)$
| donor_id | gender | country | segment |
|---|---|---|---|
| 5 | F | India | Gold |
| 3 | M | USA | Silver |
| 2 | M | India | Bronze |
| 8 | F | UK | Silver |
| 1 | F | USA | Bronze |
| donor_id | gender | country | segment | gender_F | gender_M |
|---|---|---|---|---|---|
| 5 | F | India | Gold | 1 | 0 |
| 3 | M | USA | Silver | 0 | 1 |
| 2 | M | India | Bronze | 0 | 1 |
| 8 | F | UK | Silver | 1 | 0 |
| 1 | F | USA | Bronze | 1 | 0 |
| donor_id | gender | gender_F | gender_M |
|---|---|---|---|
| 5 | F | 1 | 0 |
| 3 | M | 0 | 1 |
| 2 | M | 0 | 1 |
| 8 | F | 1 | 0 |
| 1 | F | 1 | 0 |
| donor_id | gender | gender_F |
|---|---|---|
| 5 | F | 1 |
| 3 | M | 0 |
| 2 | M | 0 |
| 8 | F | 1 |
| 1 | F | 1 |
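
The gender tables above could be reproduced with a minimal sketch like the following, assuming pandas is available. The `donors` frame is a hypothetical reconstruction of the five rows shown, not the course's basetable. `dtype=int` is passed so the dummies appear as 1/0 as in the tables, since recent pandas versions return booleans by default. Because `gender_M` always equals `1 - gender_F`, only `gender_F` is kept.

```python
import pandas as pd

# Hypothetical reconstruction of the five donors shown in the tables
donors = pd.DataFrame({
    "donor_id": [5, 3, 2, 8, 1],
    "gender": ["F", "M", "M", "F", "F"],
})

# One 0/1 column per gender value: gender_F and gender_M
gender_dummies = pd.get_dummies(donors["gender"], prefix="gender", dtype=int)

# The two columns always sum to 1, so gender_M carries no extra information
redundant = bool((gender_dummies["gender_F"] + gender_dummies["gender_M"] == 1).all())

# Keep only gender_F next to the original columns
donors_encoded = pd.concat([donors, gender_dummies[["gender_F"]]], axis=1)
print(donors_encoded)
```

Note that `drop_first=True` would drop `gender_F` here (the first value in sorted order) and keep `gender_M`; selecting the column explicitly matches the table above.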
| donor_id | country | country_USA | country_India | country_UK |
|---|---|---|---|---|
| 5 | India | 0 | 1 | 0 |
| 3 | USA | 1 | 0 | 0 |
| 2 | India | 0 | 1 | 0 |
| 8 | UK | 0 | 0 | 1 |
| 1 | USA | 1 | 0 | 0 |
| donor_id | country | country_USA | country_India |
|---|---|---|---|
| 5 | India | 0 | 1 |
| 3 | USA | 1 | 0 |
| 2 | India | 0 | 1 |
| 8 | UK | 0 | 0 |
| 1 | USA | 1 | 0 |
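
The same reasoning holds for a variable with three values: two dummies are enough, because the third can be derived from them. A small sketch under the same assumptions as the tables above (the `donors` frame is a hypothetical reconstruction, and `dtype=int` is used to get 1/0 values):

```python
import pandas as pd

# Hypothetical reconstruction of the five donors shown in the tables
donors = pd.DataFrame({
    "donor_id": [5, 3, 2, 8, 1],
    "country": ["India", "USA", "India", "UK", "USA"],
})

# Three dummies: country_India, country_UK, country_USA
country_dummies = pd.get_dummies(donors["country"], prefix="country", dtype=int)

# Drop country_UK, as in the last table: UK becomes the reference value
reduced = country_dummies.drop(columns="country_UK")

# The dropped dummy is recoverable: a donor is from the UK exactly when
# both remaining dummies are 0
recovered_uk = (1 - reduced.sum(axis=1)).tolist()
print(reduced)
print(recovered_uk)
```

A donor with 0 in both `country_USA` and `country_India` (donor 8 in the table) is therefore identified as a UK donor without a separate column.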
| | donor_id | segment |
|---|---|---|
| 0 | 32770 | Gold |
| 1 | 32776 | Silver |
| 2 | 32777 | Bronze |
| 3 | 65552 | Bronze |
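    import pandas as pd

    # basetable is assumed to be an existing DataFrame with a "segment" column

    # Create the dummy variables; drop_first=True omits the first category
    dummies_segment = pd.get_dummies(basetable["segment"], drop_first=True)

    # Add the dummy variables to the basetable
    basetable = pd.concat([basetable, dummies_segment], axis=1)

    # Delete the original variable from the basetable
    del basetable["segment"]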
| | donor_id | Gold | Silver |
|---|---|---|---|
| 0 | 32770 | 1 | 0 |
| 1 | 32776 | 0 | 1 |
| 2 | 32777 | 0 | 0 |
| 3 | 65552 | 0 | 0 |
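
For reference, the segment example can be run end to end with the sketch below. The four rows are copied from the output shown above; everything else is an assumption for illustration. It passes the whole DataFrame to `pd.get_dummies` with `columns=`, which replaces the original column in one call and prefixes the new columns with the variable name. `dtype=int` is passed because recent pandas versions return booleans by default.

```python
import pandas as pd

# The four rows shown above, rebuilt as a small basetable
basetable = pd.DataFrame({
    "donor_id": [32770, 32776, 32777, 65552],
    "segment": ["Gold", "Silver", "Bronze", "Bronze"],
})

# Encode "segment" in place; drop_first=True drops the first category in
# sorted order ("Bronze"), which becomes the reference value
encoded = pd.get_dummies(basetable, columns=["segment"], drop_first=True, dtype=int)
print(encoded)
```

Rows with 0 in both `segment_Gold` and `segment_Silver` are the Bronze donors.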