Manipulating Time Series Data in Python
Stefan Jansen
Founder & Lead Data Scientist at Applied Artificial Intelligence
import pandas as pd
ozone = pd.read_csv('ozone.csv', parse_dates=['date'], index_col='date')
ozone.info()
DatetimeIndex: 6291 entries, 2000-01-01 to 2017-03-31
Data columns (total 1 columns):
Ozone 6167 non-null float64
dtypes: float64(1)
ozone = ozone.resample('D').asfreq()
ozone.info()
DatetimeIndex: 6300 entries, 2000-01-01 to 2017-03-31
Freq: D
Data columns (total 1 columns):
Ozone 6167 non-null float64
dtypes: float64(1)
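In the output above the entry count rises from 6291 to 6300 while the non-null count stays at 6167: setting a daily frequency adds a row for every calendar day that was absent and fills it with NaN. A minimal sketch of that effect, using made-up toy values rather than the ozone file:

```python
import pandas as pd

# Hypothetical toy data: five readings, with 2000-01-03 and 2000-01-06 missing
dates = pd.to_datetime(
    ["2000-01-01", "2000-01-02", "2000-01-04", "2000-01-05", "2000-01-07"]
)
toy = pd.DataFrame({"Ozone": [0.004, 0.009, 0.005, 0.016, 0.008]}, index=dates)

# Convert to calendar-day frequency; new rows carry NaN
daily = toy.resample("D").asfreq()

n_rows = len(daily)                            # rows after conversion
n_missing = int(daily["Ozone"].isna().sum())   # rows that were inserted
print(daily)
print(n_rows, n_missing)
```

The number of observed values is unchanged; only the index becomes regular.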
ozone.resample('M').mean().head()
Ozone
date
2000-01-31 0.010443
2000-02-29 0.011817
2000-03-31 0.016810
2000-04-30 0.019413
2000-05-31 0.026535
.resample().mean(): Monthly average, assigned to end of calendar month
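To make the end-of-month labeling concrete, here is a small sketch on invented values. It passes the offset object pd.offsets.MonthEnd() instead of the 'M' string, because recent pandas releases deprecate that string in favor of 'ME'; the offset object is assumed to behave the same as the alias used on the slide.

```python
import pandas as pd

# Hypothetical toy series: 1.0 on every day of January 2000, 3.0 in February
idx = pd.date_range("2000-01-01", "2000-02-29", freq="D")
toy = pd.Series([1.0] * 31 + [3.0] * 29, index=idx, name="Ozone")

# Month-end resampling: one value per month, labeled with the month's last day
monthly_mean = toy.resample(pd.offsets.MonthEnd()).mean()
print(monthly_mean)
```

Each month collapses to a single row whose date is the last calendar day of that month.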
ozone.resample('M').median().head()
Ozone
date
2000-01-31 0.009486
2000-02-29 0.010726
2000-03-31 0.017004
2000-04-30 0.019866
2000-05-31 0.026018
ozone.resample('M').agg(['mean', 'std']).head()
Ozone
mean std
date
2000-01-31 0.010443 0.004755
2000-02-29 0.011817 0.004072
2000-03-31 0.016810 0.004977
2000-04-30 0.019413 0.006574
2000-05-31 0.026535 0.008409
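The two-row header in this output (Ozone over mean and std) is a two-level column index. A short sketch on invented values shows how such a result can be read back; the toy numbers are assumptions chosen so the statistics are easy to check by hand.

```python
import pandas as pd

# Hypothetical toy data: January alternates 1.0 and 3.0 (plus one 2.0),
# February is constant at 5.0
idx = pd.date_range("2000-01-01", "2000-02-29", freq="D")
values = [1.0, 3.0] * 15 + [2.0] + [5.0] * 29
toy = pd.DataFrame({"Ozone": values}, index=idx)

# Several aggregations at once; MonthEnd() stands in for the 'M' alias
stats = toy.resample(pd.offsets.MonthEnd()).agg(["mean", "std"])

# Columns are (column name, statistic) pairs
print(stats.columns.tolist())

# Selecting the top level returns a plain frame with one column per statistic
ozone_stats = stats["Ozone"]
print(ozone_stats)
```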
.resample().agg(): List of aggregation functions like groupby
ozone = ozone.loc['2016':]
ax = ozone.plot()
monthly = ozone.resample('M').mean()
monthly.add_suffix('_monthly').plot(ax=ax)
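The add_suffix call gives the monthly line its own legend entry when both series share one axis. The sketch below skips the plot and instead lines the two frames up side by side, on invented values, to show what the overlay contains: one monthly point per month against one daily point per day.

```python
import pandas as pd

# Hypothetical toy data for two months of 2016
idx = pd.date_range("2016-01-01", "2016-02-29", freq="D")
toy = pd.DataFrame({"Ozone": [1.0] * 31 + [3.0] * 29}, index=idx)

# Monthly means, renamed so they cannot be confused with the daily column
monthly = toy.resample(pd.offsets.MonthEnd()).mean().add_suffix("_monthly")

# Side-by-side view: the monthly column is populated only on month-end dates
combined = pd.concat([toy, monthly], axis=1)
print(combined.dropna())
```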
data = pd.read_csv('ozone_pm25.csv', parse_dates=['date'], index_col='date')
data = data.resample('D').asfreq()
data.info()
DatetimeIndex: 6300 entries, 2000-01-01 to 2017-03-31
Freq: D
Data columns (total 2 columns):
Ozone 6167 non-null float64
PM25 6167 non-null float64
dtypes: float64(2)
data = data.resample('BM').mean()
data.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 207 entries, 2000-01-31 to 2017-03-31
Freq: BM
Data columns (total 2 columns):
Ozone 207 non-null float64
PM25 207 non-null float64
dtypes: float64(2)
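Business-month-end resampling labels each month with its last weekday rather than its last calendar day. A sketch on invented values, using pd.offsets.BMonthEnd() as the assumed equivalent of the 'BM' alias, makes the difference visible for April 2000, whose last calendar day falls on a weekend:

```python
import pandas as pd

# Hypothetical toy data: constant values from 2000-01-01 through 2000-04-28
idx = pd.date_range("2000-01-01", "2000-04-28", freq="D")
toy = pd.DataFrame({"Ozone": 1.0, "PM25": 10.0}, index=idx)

# Business month end: labels fall on the last weekday of each month
bm = toy.resample(pd.offsets.BMonthEnd()).mean()

# Show each label together with its weekday name
labels = [d.strftime("%Y-%m-%d (%a)") for d in bm.index]
print(labels)
```

April is labeled 2000-04-28, a Friday, instead of 2000-04-30.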
df.resample('M').first().head(4)
Ozone PM25
date
2000-01-31 0.005545 20.800000
2000-02-29 0.016139 6.500000
2000-03-31 0.017004 8.493333
2000-04-30 0.031354 6.889474
df.resample('MS').first().head()
Ozone PM25
date
2000-01-01 0.004032 37.320000
2000-02-01 0.010583 24.800000
2000-03-01 0.007418 11.106667
2000-04-01 0.017631 11.700000
2000-05-01 0.022628 9.700000
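The month-end and month-start frequencies differ in where each bin's label is placed. In the sketch below, on invented values, both calls pick the same first observation per month and only the dates attached to the rows change; pd.offsets.MonthEnd() is used as the assumed equivalent of the 'M' alias.

```python
import pandas as pd

# Hypothetical toy data: a running counter 0, 1, 2, ... over Q1 2000
idx = pd.date_range("2000-01-01", "2000-03-31", freq="D")
toy = pd.DataFrame({"Ozone": range(len(idx))}, index=idx)

# First observation of each month, labeled at month end
first_end = toy.resample(pd.offsets.MonthEnd()).first()

# First observation of each month, labeled at month start
first_start = toy.resample("MS").first()

print(first_end)
print(first_start)
```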