Understanding and visualizing trends in customer data

Customer Analytics and A/B Testing in Python

Ryan Grossman

Data Scientist, EDO

Further techniques for uncovering trends


Subscribers Per Day

# Find the days-to-subscribe of our loaded usa subs data set
usa_subscriptions['sub_day'] = (usa_subscriptions.sub_date - 
    usa_subscriptions.lapse_date).dt.days


# Keep only users who subscribed within 7 days of their lapse_date
usa_subscriptions = usa_subscriptions[usa_subscriptions.sub_day <= 7]


# Find the total subscribers per day
# (a plain 'sum' keeps a flat 'subs' column; ['sum'] would create MultiIndex columns)
usa_subscriptions = usa_subscriptions.groupby(
    by=['sub_date'], as_index=False
).agg({'subs': 'sum'})

Weekly seasonality and our pricing change

import matplotlib.pyplot as plt

# Plot USA subscribers per day
usa_subscriptions.plot(x='sub_date', y='subs')
plt.show()

Weekly Seasonality: a pattern that repeats with the day of the week
- Users are potentially more likely to subscribe on the weekend
- Seasonality can hide larger trends, such as the impact of our price change
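
One way to check for a day-of-week pattern before smoothing is to average subscribers by weekday. The sketch below uses a small synthetic data set (the values and the name `demo` are hypothetical, not the course data) with a built-in weekend bump.

```python
import pandas as pd

# Hypothetical daily data: 4 full weeks starting on a Monday,
# with 20 extra subscribers on Saturdays and Sundays
dates = pd.date_range('2018-01-01', periods=28, freq='D')
subs = [100 + (20 if d.dayofweek >= 5 else 0) for d in dates]
demo = pd.DataFrame({'sub_date': dates, 'subs': subs})

# Average subscribers for each day of the week
by_weekday = demo.groupby(demo['sub_date'].dt.day_name())['subs'].mean()
print(by_weekday)
```

If the weekend averages stand clearly above the weekday averages, a 7-day smoothing window is a reasonable choice.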


Correcting for seasonality with trailing averages

Trailing Average: smoothing technique that averages over a lagging window
- Reveal hidden trends by smoothing out seasonality
- Average across the period of seasonality
- 7-day window to smooth weekly seasonality
- Averages out day-level effects to show the average level for the week
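
To see why the window should match the period of the seasonality, the sketch below applies a 7-day trailing average to a synthetic series (hypothetical values) that contains nothing but a weekly pattern. Every full window then covers each weekday exactly once, so the smoothed series is flat.

```python
import pandas as pd

# Hypothetical series: baseline of 100 with +20 on weekends, 4 full weeks
dates = pd.date_range('2018-01-01', periods=28, freq='D')
subs = pd.Series(
    [100 + (20 if d.dayofweek >= 5 else 0) for d in dates], index=dates
)

# 7-day trailing average over the weekly pattern
trailing = subs.rolling(window=7).mean()

# The first 6 values are NaN because their windows are not yet full
smoothed = trailing.dropna()

# Each full window sums to 5 * 100 + 2 * 120 = 740, so the mean is 740 / 7
print(smoothed.min(), smoothed.max())
```

Any movement that remains after this kind of smoothing comes from something other than the day of the week.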

Calculating Trailing Averages

Calculate the rolling average over the USA subscribers data with .rolling()
- Call this on the Series of interest
- window: Number of data points to average
- center: If True, label each average at the center of its window; if False, at the most recent point

# Call rolling on the "subs" Series
rolling_subs = usa_subscriptions.subs.rolling(

    # How many data points to average over
    window=7,

    # Average backwards: label each window at its most recent point
    center=False
)
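
The effect of center is easiest to see on a tiny series. This is a minimal sketch with made-up values, comparing where the 3-point average lands under each setting.

```python
import pandas as pd

# Hypothetical values, chosen so the averages are easy to read
s = pd.Series([1, 2, 3, 4, 5, 6, 7])

# Trailing: each average is labeled at the last point of its window
trailing = s.rolling(window=3, center=False).mean()

# Centered: each average is labeled at the middle point of its window
centered = s.rolling(window=3, center=True).mean()

print(pd.DataFrame({'value': s, 'trailing': trailing, 'centered': centered}))
```

The trailing version leaves the gaps at the start of the series, while the centered version leaves one at each end. A trailing window uses only past data, which suits monitoring a metric as new days arrive.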

Smoothing our USA subscription data

# Find the rolling average
usa_subscriptions['rolling_subs'] = rolling_subs.mean()

usa_subscriptions.tail()

sub_date    subs        rolling_subs
2018-03-14  89          94.714286
2018-03-15  96          95.428571
2018-03-16  102         96.142857

.rolling(), like .groupby(), specifies a grouping of data points
We still need to calculate a summary over each group (e.g. .mean())
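
As a small illustration of that point, the same rolling object can feed several summaries. The values below are hypothetical.

```python
import pandas as pd

# Hypothetical values
s = pd.Series([3, 1, 4, 1, 5, 9, 2])

# The rolling object only defines the windows; nothing is computed yet
windows = s.rolling(window=3)

# Each summary method is applied to every 3-point window
summary = pd.DataFrame({
    'mean': windows.mean(),
    'max': windows.max(),
    'sum': windows.sum(),
})
print(summary)
```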


Noisy data - Highest SKU purchases by date

Noisy Data: data with high variation over time

import pandas as pd

# Load a dataset of our highest SKU purchases
high_sku_purchases = pd.read_csv(
    'high_sku_purchases.csv',
    # Parse the 'date' column as datetimes
    parse_dates=['date']
)

# Plot the count of purchases by day of purchase
high_sku_purchases.plot(x='date', y='purchases')
plt.show()


Smoothing with an exponential moving average

Exponential Moving Average: Weighted moving (rolling) average
- Weights more recent items more heavily
- Weights decay exponentially with the age of each data point
- Tracks the central trend without masking recent movements

Smoothed purchases by date

.ewm(): exponential weighting function
span: Controls how quickly the weights decay (alpha = 2 / (span + 1)); roughly the number of periods the weights are spread over

# Set up exponential weighting over our high SKU purchase count
exp_mean = high_sku_purchases.purchases.ewm(span=30)

# Find the weighted mean over this period
high_sku_purchases['exp_mean'] = exp_mean.mean()
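
To make the weighting concrete, the sketch below uses a short hypothetical series and a small span. It checks that span is just another way of giving the smoothing factor alpha = 2 / (span + 1), then reproduces the last smoothed value by hand under pandas' default adjust=True setting.

```python
import numpy as np
import pandas as pd

# Hypothetical values, not the course data
s = pd.Series([10.0, 12.0, 9.0, 15.0, 11.0])

span = 3
alpha = 2 / (span + 1)

# The same smoothing, specified two ways
by_span = s.ewm(span=span).mean()
by_alpha = s.ewm(alpha=alpha).mean()

# Manual weights for the last point: (1 - alpha) ** k for a value k steps back,
# so the most recent value gets weight 1 and older values get less
weights = (1 - alpha) ** np.arange(len(s))[::-1]
manual_last = (weights * s).sum() / weights.sum()

print(by_span.iloc[-1], manual_last)
```

A larger span gives a smaller alpha, so the weights decay more slowly and the curve is smoother.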

High SKU Purchase Data


Summary - Data Smoothing Techniques

Trailing Average:
- Smooths seasonality by averaging over the periodicity
Exponential Moving Average:
- Reveals trends by pulling towards the central tendency
- Weights more recent values more heavily than older ones
You can use .rolling() and .ewm() for many more methods of smoothing
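
As one example of another summary, a rolling median is less sensitive to a single spike than a rolling mean. This is a minimal sketch on hypothetical values, not a technique taken from the course data.

```python
import pandas as pd

# Hypothetical series with one outlier
s = pd.Series([10, 10, 10, 100, 10, 10, 10])

# The mean is pulled up in every window that contains the spike
rolling_mean = s.rolling(window=3).mean()

# The median ignores the spike as long as it is a minority of the window
rolling_median = s.rolling(window=3).median()

print(pd.DataFrame({'value': s, 'mean': rolling_mean, 'median': rolling_median}))
```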

Let's practice!
