Creating dummies

Intermediate Predictive Analytics in Python

Nele Verbiest, Ph. D.

Senior Data Scientist @PythonPredictions

Motivation for creating dummy variables (1)

Logistic regression: $\operatorname{logit}(a_1x_1 + a_2x_2 + \dots + a_nx_n + b)$

donor_id  gender  country  segment
5         F       India    Gold
3         M       USA      Silver
2         M       India    Bronze
8         F       UK       Silver
1         F       USA      Bronze

Motivation for creating dummy variables (2)

Logistic regression: $\operatorname{logit}(a_1x_1 + a_2x_2 + \dots + a_nx_n + b)$

donor_id  gender  country  segment  gender_F  gender_M
5         F       India    Gold     1         0
3         M       USA      Silver   0         1
2         M       India    Bronze   0         1
8         F       UK       Silver   1         0
1         F       USA      Bronze   1         0
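
As a minimal sketch of the table above, the two gender indicator columns can be built with pandas. The rows are copied from the slide. The `prefix="gender"` and `dtype=int` arguments are assumptions added for illustration: the first reproduces the `gender_F` / `gender_M` column names, the second keeps the columns as 1/0 rather than booleans on recent pandas versions.

```python
import pandas as pd

# Example basetable, copied from the slide
basetable = pd.DataFrame({
    "donor_id": [5, 3, 2, 8, 1],
    "gender": ["F", "M", "M", "F", "F"],
    "country": ["India", "USA", "India", "UK", "USA"],
    "segment": ["Gold", "Silver", "Bronze", "Silver", "Bronze"],
})

# One indicator column per distinct gender value
gender_dummies = pd.get_dummies(basetable["gender"], prefix="gender", dtype=int)

# Append the indicator columns next to the original columns
basetable_with_dummies = pd.concat([basetable, gender_dummies], axis=1)
print(basetable_with_dummies)
```

A logistic regression needs numeric inputs, so the text column `gender` cannot be used directly, while `gender_F` and `gender_M` can.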

Preventing Multicollinearity (1)

donor_id  gender  gender_F  gender_M
5         F       1         0
3         M       0         1
2         M       0         1
8         F       1         0
1         F       1         0

Preventing Multicollinearity (2)

donor_id  gender  gender_F
5         F       1
3         M       0
2         M       0
8         F       1
1         F       1

Preventing Multicollinearity (3)

donor_id  country  country_USA  country_India  country_UK
5         India    0            1              0
3         USA      1            0              0
2         India    0            1              0
8         UK       0            0              1
1         USA      1            0              0

Preventing Multicollinearity (4)

donor_id  country  country_USA  country_India
5         India    0            1
3         USA      1            0
2         India    0            1
8         UK       0            0
1         USA      1            0

Adding dummy variables in Python

    donor_id segment
0     32770  Gold
1     32776  Silver
2     32777  Bronze
3     65552  Bronze
# Create the dummy variables; drop_first=True drops the first category (Bronze)
dummies_segment = pd.get_dummies(basetable["segment"], drop_first=True)

# Add the dummy variables to the basetable
basetable = pd.concat([basetable, dummies_segment], axis=1)

# Delete the original variable from the basetable
del basetable["segment"]
    donor_id Gold Silver
0     32770  1    0
1     32776  0    1
2     32777  0    0
3     65552  0    0
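
For reference, a self-contained version of the steps above might look like the following sketch. The `basetable` is rebuilt from the four rows shown on the slide. `dtype=int` is an assumption added here so that the result prints as 1/0 on recent pandas versions, where `get_dummies` returns booleans by default.

```python
import pandas as pd

# Rebuild the example basetable from the slide
basetable = pd.DataFrame({
    "donor_id": [32770, 32776, 32777, 65552],
    "segment": ["Gold", "Silver", "Bronze", "Bronze"],
})

# Create the dummy variables; the first category in sorted order (Bronze)
# is dropped and becomes the reference category
dummies_segment = pd.get_dummies(basetable["segment"], drop_first=True, dtype=int)

# Add the dummy variables to the basetable
basetable = pd.concat([basetable, dummies_segment], axis=1)

# Delete the original variable from the basetable
del basetable["segment"]

print(basetable)
```

Donors in the Bronze segment end up with 0 in both `Gold` and `Silver`, matching rows 2 and 3 of the output above.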

Let's practice!
