Intermediate Predictive Analytics in Python
Nele Verbiest
Senior Data Scientist @PythonPredictions
| donor_id | age |
|---|---|
| 5 | - |
| 3 | 25 |
| 2 | 36 |
| 8 | 40 |
| 1 | 26 |
| donor_id | age |
|---|---|
| 5 | 32 |
| 3 | 25 |
| 2 | 36 |
| 8 | 40 |
| 1 | 26 |
Mean age: 32 (31.75, rounded)
| donor_id | max_donation |
|---|---|
| 5 | - |
| 3 | 1 000 000 |
| 2 | 100 |
| 8 | 40 |
| 1 | 120 |
Mean max_donation: 250 065
Median max_donation: 110
| donor_id | max_donation |
|---|---|
| 5 | 110 |
| 3 | 1 000 000 |
| 2 | 100 |
| 8 | 40 |
| 1 | 120 |
Mean max_donation: 250 065
Median max_donation: 110
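The median replacement shown in the tables above can be sketched in pandas as follows. This is a minimal illustration, assuming a toy `basetable` built from the five example donors; the DataFrame construction is not part of the original slides.

```python
import numpy as np
import pandas as pd

# Toy basetable with the five example donors (assumed for illustration)
basetable = pd.DataFrame({
    "donor_id": [5, 3, 2, 8, 1],
    "max_donation": [np.nan, 1_000_000, 100, 40, 120],
})

# Both statistics skip missing values by default
mean_value = basetable["max_donation"].mean()      # pulled up by the 1 000 000 outlier
median_value = basetable["max_donation"].median()  # 110.0, unaffected by the outlier

# Replace missing values by the median
basetable["max_donation"] = basetable["max_donation"].fillna(median_value)
print(basetable)
```

With one extreme donation in the column, the median stays close to what a typical donor gives, which is why it is the safer replacement here.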
| donor_id | sum_donations |
|---|---|
| 5 | 130 |
| 3 | 10 |
| 2 | - |
| 8 | 40 |
| 1 | 120 |
| donor_id | sum_donations |
|---|---|
| 5 | 130 |
| 3 | 10 |
| 2 | 0 |
| 8 | 40 |
| 1 | 120 |
    # Replace missing values by 0
    replacement = 0
    basetable["donations_last_year"] = basetable["donations_last_year"].fillna(replacement)

    # Replace missing values by mean
    replacement = basetable["age"].mean()
    basetable["age"] = basetable["age"].fillna(replacement)
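To try both replacements end to end, the sketch below applies them to a small toy basetable and checks that no missing values remain. The values are borrowed from the example tables, but pairing them in one DataFrame under these column names is an assumption made for illustration.

```python
import numpy as np
import pandas as pd

# Toy basetable; combining these columns in one frame is hypothetical
basetable = pd.DataFrame({
    "donor_id": [5, 3, 2, 8, 1],
    "age": [np.nan, 25, 36, 40, 26],
    "donations_last_year": [130, 10, np.nan, 40, 120],
})

# Count missing values per column before imputing
print(basetable.isna().sum())

# A missing donation total is read as "donated nothing", so fill with 0
basetable["donations_last_year"] = basetable["donations_last_year"].fillna(0)

# Age has no natural default, so fall back on the mean of the known ages
basetable["age"] = basetable["age"].fillna(basetable["age"].mean())

# Confirm that nothing is left to impute
print(basetable.isna().sum())
```

Counting missing values before and after is a cheap check that every column received the replacement that was intended for it.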
| | donor_id | email |
|---|---|---|
| 0 | 32770 | [email protected] |
| 1 | 32776 | nan |
| 2 | 32777 | [email protected] |
| 3 | 65552 | nan |
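    # NaN is never equal to itself, so email == email is False only for missing values
    basetable["no_email"] = pd.Series(
        [0 if email == email else 1 for email in basetable["email"]],
        index=basetable.index)  # keep the flags aligned with the basetable rows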
| | donor_id | email | no_email |
|---|---|---|---|
| 0 | 32770 | [email protected] | 0 |
| 1 | 32776 | nan | 1 |
| 2 | 32777 | [email protected] | 0 |
| 3 | 65552 | nan | 1 |
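The same missing-value flag can also be built with pandas' own missing-value test. The sketch below is one possible alternative, assuming missing emails are stored as NaN; the toy basetable and the placeholder addresses are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy basetable; the email addresses are placeholders
basetable = pd.DataFrame({
    "donor_id": [32770, 32776, 32777, 65552],
    "email": ["a@example.com", np.nan, "b@example.com", np.nan],
})

# isna() marks missing values with True; astype(int) turns that into 1/0
basetable["no_email"] = basetable["email"].isna().astype(int)
print(basetable)
```

Because the result is derived directly from the `email` column, it keeps that column's index, so the flags stay aligned with the right donors even when the basetable has been filtered or reordered.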