Machine Learning for Time Series Data in Python
Chris Holdgraf
Fellow, Berkeley Institute for Data Science
# Visualize the raw data
print(prices.head(3))
symbol            AIG        ABT
date
2010-01-04  29.889999  54.459951
2010-01-05  29.330000  54.019953
2010-01-06  29.139999  54.319953
# Calculate a rolling window, then extract two features
feats = prices.rolling(20).aggregate([np.std, np.max]).dropna()
print(feats.head(3))
                 AIG                  ABT
                 std       amax       std       amax
date
2010-02-01  2.051966  29.889999  0.868830  56.239949
2010-02-02  2.101032  29.629999  0.869197  56.239949
2010-02-03  2.157249  29.629999  0.852509  56.239949
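The rolling-feature step above can be sketched on synthetic data. The `toy_prices` frame below is an assumption standing in for the course's `prices` DataFrame, and the string names "std" and "max" are used in place of the NumPy functions. It shows why `dropna()` is needed: a 20-observation window has no value until 20 rows are available.

```python
import numpy as np
import pandas as pd

# Synthetic prices standing in for the course's `prices` DataFrame (an assumption)
dates = pd.date_range("2010-01-04", periods=30, freq="B")
toy_prices = pd.DataFrame(
    {"AIG": np.linspace(30.0, 25.0, 30), "ABT": np.linspace(54.0, 56.0, 30)},
    index=dates,
)

# Each feature is computed over the trailing 20 rows of each column
toy_feats = toy_prices.rolling(20).aggregate(["std", "max"])

# The first 19 rows have incomplete windows, so every feature there is NaN
n_incomplete = int(toy_feats.isna().all(axis=1).sum())
print(n_incomplete)  # 19

# dropna() keeps only rows backed by a full window
toy_feats = toy_feats.dropna()
print(toy_feats.shape)  # (11, 4)
```

The result has one column per (symbol, feature) pair, which is the same two-level column layout as the printed output above.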
# If we just take the mean, it returns a single value
a = np.array([[0, 1, 2], [0, 1, 2], [0, 1, 2]])
print(np.mean(a))
1.0
# We can use the partial function to initialize np.mean
# with an axis parameter
from functools import partial
mean_over_first_axis = partial(np.mean, axis=0)
print(mean_over_first_axis(a))
[0. 1. 2.]
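As a small illustration of how `partial` behaves, the sketch below builds two pre-configured versions of `np.mean` and inspects what they store. The names `mean_over_rows` and `mean_over_columns` are illustrative choices, not part of the original slides.

```python
from functools import partial

import numpy as np

a = np.array([[0, 1, 2], [0, 1, 2], [0, 1, 2]])

# axis=0 collapses the rows, leaving one mean per column
mean_over_rows = partial(np.mean, axis=0)
# axis=1 collapses the columns, leaving one mean per row
mean_over_columns = partial(np.mean, axis=1)

print(mean_over_rows(a))     # [0. 1. 2.]
print(mean_over_columns(a))  # [1. 1. 1.]

# A keyword passed at call time overrides the stored one
print(mean_over_rows(a, axis=1))  # [1. 1. 1.]

# A partial object remembers the wrapped function and its stored keywords
print(mean_over_rows.func is np.mean)  # True
print(mean_over_rows.keywords)         # {'axis': 0}
```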
print(np.percentile(np.linspace(0, 200), q=20))
40.0
data = np.linspace(0, 100)
# Create a list of functions using a list comprehension
percentile_funcs = [partial(np.percentile, q=ii) for ii in [20, 40, 60]]
# Calculate the output of each function in the same way
percentiles = [i_func(data) for i_func in percentile_funcs]
print(percentiles)
[20.0, 40.00000000000001, 60.0]
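For a plain array, the same values can also be obtained from a single call, since `np.percentile` accepts a sequence for `q`. The sketch below compares the two routes. The list of partials is still the useful form when an API expects one callable per feature, as a rolling `aggregate` does.

```python
from functools import partial

import numpy as np

data = np.linspace(0, 100)

# One pre-configured function per percentile
percentile_funcs = [partial(np.percentile, q=q) for q in [20, 40, 60]]
from_partials = [func(data) for func in percentile_funcs]

# One vectorized call covering all three percentiles
vectorized = np.percentile(data, q=[20, 40, 60])

# Both routes agree up to floating-point rounding
print(np.allclose(from_partials, vectorized))  # True
```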
# Calculate multiple percentiles of a rolling window
# (rolling needs a pandas object, and aggregate takes the functions, not their outputs)
prices.rolling(20).aggregate(percentile_funcs)
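A minimal sketch of rolling percentiles on a pandas object follows. The toy `series` and the helper `make_percentile` are hypothetical, introduced only for illustration. Giving each partial a `__name__` is an optional touch so the output columns get distinct labels. It is not something the slides do.

```python
from functools import partial

import numpy as np
import pandas as pd

# Toy series standing in for real price data (an assumption)
series = pd.Series(np.arange(30, dtype=float))


def make_percentile(q):
    """Hypothetical helper: np.percentile with q fixed and a readable name."""
    func = partial(np.percentile, q=q)
    func.__name__ = f"p{q}"  # used by pandas as the output column label
    return func


percentile_funcs = [make_percentile(q) for q in [20, 40, 60]]

# Each function is applied to every 20-observation window
rolled = series.rolling(20).aggregate(percentile_funcs).dropna()
print(rolled.head(3))
```

Custom callables are applied window by window in Python, so this is noticeably slower than built-in rolling methods on long series.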
# Ensure our index is datetime
prices.index = pd.to_datetime(prices.index)
# Extract datetime features
day_of_week_num = prices.index.weekday
print(day_of_week_num[:10])
Index([0 1 2 3 4 0 1 2 3 4], dtype='object')
# weekday_name was removed in pandas 1.0; day_name() is its replacement
day_of_week = prices.index.day_name()
print(day_of_week[:10])
Index(['Monday' 'Tuesday' 'Wednesday' 'Thursday' 'Friday' 'Monday' 'Tuesday'
'Wednesday' 'Thursday' 'Friday'], dtype='object')
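The datetime attributes above can be stored as feature columns next to the data. The sketch below uses a synthetic business-day index and a placeholder `price` column, both assumptions, and adds `month` as one more example of the same kind of attribute.

```python
import pandas as pd

# Business-day index starting on Monday 2010-01-04 (synthetic, for illustration)
idx = pd.date_range("2010-01-04", periods=10, freq="B")
toy = pd.DataFrame({"price": range(10)}, index=idx)

# Numeric weekday: Monday=0 ... Sunday=6
toy["day_of_week_num"] = toy.index.weekday
# Weekday as a string, e.g. 'Monday'
toy["day_of_week"] = toy.index.day_name()
# Calendar month as an integer, 1 through 12
toy["month"] = toy.index.month

print(toy.head())
```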