Adding aggregated variables

Intermediate Predictive Analytics in Python

Nele Verbiest

Senior Data Scientist @PythonPredictions

Motivation for aggregated variables (1)

Motivation for aggregated variables (2)

Adding total value last year (1)

id  date        amount
1   2015-10-16  75
1   2014-02-11  111
2   2012-03-28  93

import datetime

# Start and end date of the aggregation period
start_date = datetime.date(2016, 1, 1)
end_date = datetime.date(2017, 1, 1)

# Select gifts made in 2016 (end date excluded, so gifts on 2017-01-01 are not counted)
gifts_2016 = gifts[(gifts["date"] >= start_date) & (gifts["date"] < end_date)]
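A minimal, self-contained sketch of the period selection, assuming a toy `gifts` table (the rows are made up for illustration, not course data) whose `date` column holds `datetime.date` objects. Using `<` for the end date keeps the window half-open, so a gift made on 2017-01-01 does not count as a 2016 gift.

```python
import datetime

import pandas as pd

# Hypothetical toy gifts table: one row per gift (donor id, gift date, amount)
gifts = pd.DataFrame({
    "id": [1, 1, 2, 3],
    "date": [datetime.date(2016, 3, 1), datetime.date(2017, 1, 1),
             datetime.date(2016, 12, 31), datetime.date(2015, 6, 15)],
    "amount": [50, 20, 30, 10],
})

start_date = datetime.date(2016, 1, 1)
end_date = datetime.date(2017, 1, 1)

# Half-open window [start_date, end_date): 2017-01-01 itself is excluded
in_period = (gifts["date"] >= start_date) & (gifts["date"] < end_date)
gifts_2016 = gifts[in_period]
```

Only the gifts dated 2016-03-01 (donor 1) and 2016-12-31 (donor 2) remain. If `date` is stored as a `datetime64` column instead, comparing it against `pd.Timestamp` bounds is likely the safer choice.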

Adding total value last year (2)

# Sum of gifts per donor in 2016
gifts_2016_bydonor = gifts_2016.groupby(["id"])["amount"].sum().reset_index()
gifts_2016_bydonor.columns = ["donor_ID","sum_2016"]

# Add sum of gifts to the basetable
basetable = pd.merge(basetable, gifts_2016_bydonor, how="left", on="donor_ID")
print(basetable.head())

donor_ID  sum_2016
1         837
2         29
3         682
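A small runnable sketch of the sum-and-merge step above, assuming hypothetical toy versions of `gifts_2016` and `basetable` (the values are invented for illustration). It highlights one consequence of the left merge: a donor in the basetable with no 2016 gifts ends up with `NaN` rather than 0 in `sum_2016`.

```python
import pandas as pd

# Hypothetical 2016 gifts (already restricted to the aggregation period)
gifts_2016 = pd.DataFrame({"id": [1, 1, 2], "amount": [50, 25, 30]})

# Hypothetical basetable with one row per donor; donor 3 gave nothing in 2016
basetable = pd.DataFrame({"donor_ID": [1, 2, 3]})

# Sum of gifts per donor, renamed so the key matches the basetable
gifts_2016_bydonor = gifts_2016.groupby(["id"])["amount"].sum().reset_index()
gifts_2016_bydonor.columns = ["donor_ID", "sum_2016"]

# Left merge keeps every donor in the basetable; donors without gifts get NaN
basetable = pd.merge(basetable, gifts_2016_bydonor, how="left", on="donor_ID")
```

Donor 1 gets 75, donor 2 gets 30, and donor 3 gets `NaN`. If a missing value should mean "no gifts in 2016", one option is `basetable["sum_2016"] = basetable["sum_2016"].fillna(0)`; whether that is appropriate depends on the model.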

Adding number of donations to the basetable

# Number of gifts per donor in 2016
gifts_2016_bydonor = gifts_2016.groupby(["id"]).size().reset_index()
gifts_2016_bydonor.columns = ["donor_ID","count_2016"]

# Add number of gifts to the basetable
basetable = pd.merge(basetable, gifts_2016_bydonor, how="left", on="donor_ID")
print(basetable.head())

donor_ID  count_2016
1         4
2         9
3         2
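As an alternative to the two separate groupby steps (this is not the approach shown on the slides), both aggregates can be computed in one pass with pandas named aggregation. The toy `gifts_2016` and `basetable` tables below are hypothetical; `"size"` counts rows per donor, matching `.size()` above.

```python
import pandas as pd

# Hypothetical 2016 gifts and basetable, as toy data for illustration
gifts_2016 = pd.DataFrame({"id": [1, 1, 1, 2], "amount": [10, 20, 30, 5]})
basetable = pd.DataFrame({"donor_ID": [1, 2, 3]})

# One groupby pass: total amount and number of gifts per donor
gifts_2016_bydonor = (
    gifts_2016.groupby("id")
    .agg(sum_2016=("amount", "sum"), count_2016=("amount", "size"))
    .reset_index()
    .rename(columns={"id": "donor_ID"})
)

# Left merge keeps donors without 2016 gifts; their aggregates become NaN
basetable = pd.merge(basetable, gifts_2016_bydonor, how="left", on="donor_ID")
```

Donor 1 gets a sum of 60 over 3 gifts, donor 2 a sum of 5 over 1 gift, and donor 3 gets `NaN` in both new columns.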

Let's practice!