Missing values

Intermediate Predictive Analytics in Python

Nele Verbiest

Senior Data Scientist @PythonPredictions

Replacing missing values by an aggregate (1)

donor_id age
5 -
3 25
2 36
8 40
1 26

Replacing missing values by an aggregate (2)

donor_id age
5 32
3 25
2 36
8 40
1 26

Mean age: 32 (31.75, rounded)

Replacing missing values by an aggregate (3)

donor_id max_donation
5 -
3 1 000 000
2 100
8 40
1 120

Mean max_donation: 250 065

Median max_donation: 110

Replacing missing values by an aggregate (4)

donor_id max_donation
5 110
3 1 000 000
2 100
8 40
1 120

Mean max_donation: 250 065

Median max_donation: 110
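
The choice between mean and median on this slide can be checked with a few lines of pandas. This is a minimal sketch using the toy values from the table; the column name max_donation_filled is an assumption added for illustration and is not part of the course material.

```python
import numpy as np
import pandas as pd

# Toy data copied from the slide; donor 5 has no recorded max_donation.
basetable = pd.DataFrame({
    "donor_id": [5, 3, 2, 8, 1],
    "max_donation": [np.nan, 1_000_000, 100, 40, 120],
})

# Both aggregates skip missing values by default.
mean_value = basetable["max_donation"].mean()
median_value = basetable["max_donation"].median()

# The single 1 000 000 donation pulls the mean far above a typical donation,
# so the median is used as the replacement here.
# "max_donation_filled" is a hypothetical column name for this sketch.
basetable["max_donation_filled"] = basetable["max_donation"].fillna(median_value)

print("mean:", mean_value)
print("median:", median_value)
print(basetable)
```

Writing the result to a separate column keeps the original values available for comparison; overwriting max_donation in place works just as well.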

Replacing missing values by a fixed value (1)

donor_id sum_donations
5 130
3 10
2 -
8 40
1 120

Replacing missing values by a fixed value (2)

donor_id sum_donations
5 130
3 10
2 0
8 40
1 120

Replacing missing values in Python

# Replace missing values by 0
replacement = 0
basetable["donations_last_year"] = (
    basetable["donations_last_year"].fillna(replacement)
)

# Replace missing values by the mean
replacement = basetable["age"].mean()
basetable["age"] = basetable["age"].fillna(replacement)
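
To try both replacements end to end, the sketch below builds a small basetable from the toy values on the earlier slides. Putting age and sum_donations in the same table is an assumption made for illustration; the slides show them separately.

```python
import numpy as np
import pandas as pd

# Hypothetical basetable combining the toy columns from the slides.
basetable = pd.DataFrame({
    "donor_id": [5, 3, 2, 8, 1],
    "age": [np.nan, 25, 36, 40, 26],
    "sum_donations": [130, 10, np.nan, 40, 120],
})

# Fixed value: a donor with no recorded donations has donated 0 in total.
basetable["sum_donations"] = basetable["sum_donations"].fillna(0)

# Aggregate: replace a missing age by the mean of the known ages.
mean_age = basetable["age"].mean()
basetable["age"] = basetable["age"].fillna(mean_age)

print(basetable)
```

The fixed value suits columns where a missing entry has a natural meaning (no donations means a sum of 0), while the aggregate suits columns such as age where no such default exists.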

Missing value dummies

   donor_id  email
0     32770  [email protected]
1     32776  nan
2     32777  [email protected]
3     65552  nan

# Flag missing emails: 1 if the email is missing, 0 otherwise
basetable["no_email"] = basetable["email"].isna().astype(int)

   donor_id  email              no_email
0     32770  [email protected]  0
1     32776  nan                1
2     32777  [email protected]  0
3     65552  nan                1
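
The dummy can be reproduced on a small table as follows. This is a sketch under stated assumptions: the email addresses are placeholders invented for the example, and Series.isna() is used as one common way to detect missing entries in pandas.

```python
import numpy as np
import pandas as pd

# Donor ids from the slide; the addresses are hypothetical placeholders.
basetable = pd.DataFrame({
    "donor_id": [32770, 32776, 32777, 65552],
    "email": ["donor_a@example.com", np.nan, "donor_b@example.com", np.nan],
})

# isna() marks missing entries as True; astype(int) turns that into 1/0.
basetable["no_email"] = basetable["email"].isna().astype(int)

# Equivalent check based on the fact that NaN is not equal to itself.
no_email_check = [0 if email == email else 1 for email in basetable["email"]]

print(basetable)
```

Assigning the result of isna() keeps the dummy aligned with the index of basetable, which matters when the table has been filtered or re-indexed. Create the dummy before filling in the missing values, otherwise the information about which rows were missing is lost.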

Let's practice!
