Adding aggregated variables

Intermediate Predictive Analytics in Python

Nele Verbiest

Senior Data Scientist @PythonPredictions

Motivation for aggregated variables (1)

Motivation for aggregated variables (2)

Adding total value last year (1)

id    date          amount
1     2015-10-16    75
1     2014-02-11    111
2     2012-03-28    93

import datetime

# Start and end date of the aggregation period
start_date = datetime.date(2016, 1, 1)
end_date = datetime.date(2017, 1, 1)

# Select gifts made in 2016 (end_date itself is excluded)
gifts_2016 = gifts[(gifts["date"] >= start_date) & (gifts["date"] < end_date)]
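
The date selection above can be checked on a small example. The sketch below is illustrative only: the gifts table is hypothetical (not the course data), and it assumes the date column is parsed with pd.to_datetime and compared against pd.Timestamp bounds so that both sides of the comparison have the same type. A gift dated exactly 2017-01-01 is left out because the end of the period is excluded.

```python
import pandas as pd

# Hypothetical gifts table; the values are illustrative, not course data.
gifts = pd.DataFrame({
    "id": [1, 1, 2, 3],
    "date": ["2016-01-01", "2017-01-01", "2016-06-15", "2015-12-31"],
    "amount": [75, 111, 93, 40],
})
# Assumption: dates arrive as strings and are parsed to datetime64
gifts["date"] = pd.to_datetime(gifts["date"])

# Start and end date of the aggregation period
start_date = pd.Timestamp(2016, 1, 1)
end_date = pd.Timestamp(2017, 1, 1)

# Half-open interval: start_date is included, end_date is excluded
in_2016 = (gifts["date"] >= start_date) & (gifts["date"] < end_date)
gifts_2016 = gifts[in_2016]
print(gifts_2016)
```

Only the gifts dated 2016-01-01 and 2016-06-15 remain in gifts_2016.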

Adding total value last year (2)

import pandas as pd

# Sum of gifts per donor in 2016
gifts_2016_bydonor = gifts_2016.groupby(["id"])["amount"].sum().reset_index()
gifts_2016_bydonor.columns = ["donor_ID", "sum_2016"]

# Add sum of gifts to the basetable
basetable = pd.merge(basetable, gifts_2016_bydonor, how="left", on="donor_ID")
print(basetable.head())

donor_ID    sum_2016
1           837
2           29
3           682
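
One consequence of the left merge is worth seeing on a toy example: donors in the basetable who made no gifts in the period get a missing value rather than a zero. The sketch below uses a hypothetical three-donor basetable, and filling the gaps with 0 is an assumption about how such donors should be treated, not something the slides prescribe.

```python
import pandas as pd

# Hypothetical basetable and aggregate; donor 3 made no gifts in 2016.
basetable = pd.DataFrame({"donor_ID": [1, 2, 3]})
gifts_2016_bydonor = pd.DataFrame({"donor_ID": [1, 2], "sum_2016": [837, 29]})

# The left merge keeps every donor in the basetable
basetable = pd.merge(basetable, gifts_2016_bydonor, how="left", on="donor_ID")

# Donors without gifts get NaN in the new column
n_missing = int(basetable["sum_2016"].isna().sum())

# Assumption: "no gifts" should count as a total of 0
basetable["sum_2016"] = basetable["sum_2016"].fillna(0)
print(basetable)
```

Whether 0 is the right fill value depends on the model: for a sum or a count it is a natural choice, while for an average or a recency variable it may not be.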

Adding number of donations to the basetable

# Number of gifts per donor in 2016
gifts_2016_bydonor = gifts_2016.groupby(["id"]).size().reset_index()
gifts_2016_bydonor.columns = ["donor_ID", "count_2016"]

# Add number of gifts to the basetable
basetable = pd.merge(basetable, gifts_2016_bydonor, how="left", on="donor_ID")
print(basetable.head())

donor_ID    count_2016
1           4
2           9
3           2
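
The sum and the count can also be computed in a single groupby pass. The following is a minimal sketch using pandas named aggregation on a hypothetical gifts_2016 table; it is an alternative to the two separate steps shown on the slides, not the course's own method. Note that "count" counts non-missing amounts, whereas size() counts rows, so the two differ only if amount contains missing values.

```python
import pandas as pd

# Hypothetical 2016 gifts; the values are illustrative, not course data.
gifts_2016 = pd.DataFrame({
    "id": [1, 1, 2, 2, 2],
    "amount": [50, 25, 10, 10, 9],
})

# One pass: total value and number of gifts per donor
agg_2016 = (
    gifts_2016.groupby("id")["amount"]
    .agg(sum_2016="sum", count_2016="count")
    .reset_index()
    .rename(columns={"id": "donor_ID"})
)
print(agg_2016)
```

The resulting table has one row per donor and can be merged onto the basetable on donor_ID in the same way as before.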

Let's practice!
