Working with time series data in pandas

Customer Analytics and A/B Testing in Python

Ryan Grossman

Data Scientist, EDO

Exploratory Data Analysis

  • Exploratory Data Analysis (EDA)
  • Working with time series data
  • Uncovering trends in KPIs over time

Review: Manipulating dates & times


Example: Week Two Conversion Rate

  • Week 2 Conversion Rate: Users who subscribe in the second week after the free trial
  • Users must have:
    • Completed the free trial
    • Not subscribed in the first week
    • Had a full second week to subscribe or not

Using the Timedelta class

  • Lapse Date: Date the trial ends for a given user
import pandas as pd
from datetime import timedelta 

# Define the most recent date in our data
current_date = pd.to_datetime('2018-03-17')

# The last date a user could lapse and still be included
max_lapse_date = current_date - timedelta(days=14)

# Filter down to only eligible users
conv_sub_data = sub_data_demo[sub_data_demo.lapse_date < max_lapse_date]
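
The eligibility filter can be tried on a small table. This is a minimal sketch: the four rows below are made-up stand-ins for sub_data_demo, and only the uid and lapse_date columns are assumed.

```python
import pandas as pd
from datetime import timedelta

# Hypothetical toy data standing in for sub_data_demo
sub_data_demo = pd.DataFrame({
    "uid": [1, 2, 3, 4],
    "lapse_date": pd.to_datetime(
        ["2018-02-20", "2018-03-02", "2018-03-03", "2018-03-10"]
    ),
})

# Most recent date in the data, as on the slide
current_date = pd.to_datetime("2018-03-17")

# Users need 14 full days after lapsing, so the cutoff is 2018-03-03
max_lapse_date = current_date - timedelta(days=14)

# Strict comparison: a user who lapsed exactly on the cutoff is excluded
conv_sub_data = sub_data_demo[sub_data_demo.lapse_date < max_lapse_date]
eligible_uids = conv_sub_data.uid.tolist()
print(eligible_uids)
```

Only the users who lapsed before the cutoff remain. Whether the boundary day should count is a design choice; the slide uses a strict `<`.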

Date differences

  • Step 1: Filter to the relevant set of users
  • Step 2: Calculate the time between a user's lapse and subscription dates
# How many days passed before the user subscribed
sub_time = conv_sub_data.subscription_date - conv_sub_data.lapse_date

# Save this value in our dataframe
conv_sub_data['sub_time'] = sub_time
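
As a rough illustration of what the subtraction produces, the sketch below uses three made-up users (the dates are hypothetical). Subtracting two datetime columns gives a timedelta column, and a missing subscription date carries through as NaT.

```python
import pandas as pd

# Hypothetical users; the second one never subscribed
conv_sub_data = pd.DataFrame({
    "lapse_date": pd.to_datetime(["2018-01-01", "2018-01-05", "2018-01-10"]),
    "subscription_date": pd.to_datetime(["2018-01-04", None, "2018-01-20"]),
})

# Element-wise difference of two datetime columns yields timedeltas
sub_time = conv_sub_data.subscription_date - conv_sub_data.lapse_date
conv_sub_data["sub_time"] = sub_time
print(conv_sub_data)
```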

Date components

  • Step 1: Filter to the relevant set of users
  • Step 2: Calculate the time between a user's lapse and subscription dates
  • Step 3: Convert the sub_time from a timedelta to an int
# Extract the days field from the sub_time
conv_sub_data['sub_time'] = conv_sub_data.sub_time.dt.days
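
A small sketch of the `.dt.days` accessor on made-up values. Two things are worth noticing: only the whole-day component is returned, and missing timedeltas come back as missing numbers rather than integers.

```python
import pandas as pd

# Hypothetical timedeltas: 3 days, missing, and 10 days 12 hours
sub_time = pd.Series(
    [pd.Timedelta(days=3), pd.NaT, pd.Timedelta(days=10, hours=12)],
    dtype="timedelta64[ns]",
)

# .dt.days keeps the day component and drops the partial day
days = sub_time.dt.days
print(days)
```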

Conversion rate calculation

import numpy as np

# Filter to users who did not subscribe in the first week:
# they either never subscribed or subscribed after day 7
conv_base = conv_sub_data[(conv_sub_data.sub_time.isnull()) |
    (conv_sub_data.sub_time > 7)]
total_users = len(conv_base)

# Flag users in the base who subscribed by day 14
total_subs = np.where(conv_base.sub_time.notnull() &
    (conv_base.sub_time <= 14), 1, 0)
total_subs = sum(total_subs)
conversion_rate = total_subs / total_users
0.0095877277085330784
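
To check the logic by hand, here is a sketch on eight made-up users. It assumes the base is meant to be users who either never subscribed or subscribed after day 7, which matches the eligibility rules stated earlier; the helper name week_two_conversion_rate is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sub_time values in days; NaN means the user never subscribed
conv_sub_data = pd.DataFrame(
    {"sub_time": [3, 9, 12, 20, np.nan, np.nan, np.nan, np.nan]}
)

def week_two_conversion_rate(df):
    # Base: never subscribed, or subscribed after the first week
    not_week_one = df.sub_time.isnull() | (df.sub_time > 7)
    base = df[not_week_one]
    # Conversions: users in the base who subscribed by day 14
    subs = np.where(base.sub_time.notnull() & (base.sub_time <= 14), 1, 0)
    return subs.sum() / len(base)

rate = week_two_conversion_rate(conv_sub_data)
print(rate)
```

Here the day-3 subscriber is dropped from the base, leaving 7 users, of whom 2 (days 9 and 12) convert in week two, so the rate is 2/7.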

Parsing dates - on import

pandas.read_csv(...,
    parse_dates=False, 
    infer_datetime_format=False, 
    keep_date_col=False,
    date_parser=None, 
    dayfirst=False,...)
customer_demographics = pd.read_csv('customer_demographics.csv',
    parse_dates=['reg_date'],  # parse_dates=True would only try to parse the index
    infer_datetime_format=True)
    uid           reg_date           device gender    country    age
0    54030035.0    2017-06-29        and        M        USA        19
1    72574201.0    2018-03-05        iOS        F        TUR        22
2    64187558.0    2016-02-07        iOS        M        USA        16
3    92513925.0    2017-05-25        and        M        BRA        41
4    99231338.0    2017-03-26        iOS        M        FRA        59
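
One way to confirm that a column was actually parsed is to check its dtype after loading. In pandas, `parse_dates=True` only attempts to parse the index, so this sketch names the column explicitly. It reads from an in-memory string rather than customer_demographics.csv, using two rows adapted from the table above.

```python
import io
import pandas as pd

# In-memory stand-in for the CSV file (two rows adapted from the slide)
csv_text = """uid,reg_date,device,gender,country,age
54030035,2017-06-29,and,M,USA,19
72574201,2018-03-05,iOS,F,TUR,22
"""

# Name the date column explicitly so it is converted to datetime on import
customer_demographics = pd.read_csv(
    io.StringIO(csv_text), parse_dates=["reg_date"]
)
print(customer_demographics.dtypes)
```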

Parsing dates - manually

pandas.to_datetime(arg, errors='raise', ..., format=None, ...)

strftime

1993-01-27 -- "%Y-%m-%d"

05/13/2017 05:45:37 -- "%m/%d/%Y %H:%M:%S"

September 01, 2017 -- "%B %d, %Y"
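
The three format strings above can be passed straight to `pd.to_datetime`. This is a minimal sketch: the variable names are illustrative, and `%B` assumes English month names in the active locale. The last call shows `errors='coerce'`, which returns NaT instead of raising when a value does not match.

```python
import pandas as pd

# Each string is parsed with the matching strftime-style format
d1 = pd.to_datetime("1993-01-27", format="%Y-%m-%d")
d2 = pd.to_datetime("05/13/2017 05:45:37", format="%m/%d/%Y %H:%M:%S")
d3 = pd.to_datetime("September 01, 2017", format="%B %d, %Y")

# With errors='coerce', an unparseable value becomes NaT
bad = pd.to_datetime("not a date", format="%Y-%m-%d", errors="coerce")
print(d1, d2, d3, bad)
```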


Let's practice!

