Machine Learning for Time Series Data in Python
Chris Holdgraf
Fellow, Berkeley Institute for Data Science
from glob import glob
files = glob('data/heartbeat-sounds/proc/files/*.wav')
print(files)
['data/heartbeat-sounds/proc/files/murmur__201101051104.wav',
...
'data/heartbeat-sounds/proc/files/murmur__201101051114.wav']
import librosa as lr
# `load` accepts a path to an audio file
audio, sfreq = lr.load('data/heartbeat-sounds/proc/files/murmur__201101051104.wav')
print(sfreq)
2205
In this case, the sampling frequency is 2205, meaning there are 2205 samples per second.
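As a quick illustration of what the sampling frequency implies, the sketch below converts a sample count into a duration. The sample count `n_samples` is a hypothetical value chosen for the example, not taken from the heartbeat files.

```python
sfreq = 2205        # sampling frequency from the example above (samples per second)
n_samples = 8820    # hypothetical number of samples in a recording

# Duration in seconds is the number of samples divided by samples per second
duration_seconds = n_samples / sfreq

# Time between two consecutive samples, in seconds
sample_period = 1 / sfreq

print(duration_seconds)
```

With real audio, `len(audio)` would take the place of `n_samples`.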
Create an array of indices, one for each sample, and divide it by the sampling frequency:
import numpy as np
indices = np.arange(0, len(audio))
time = indices / sfreq
Find the timestamp of the last (N-1th) data point, then use linspace() to interpolate from zero to that time, with one point per sample:
final_time = (len(audio) - 1) / sfreq
time = np.linspace(0, final_time, len(audio))
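The two approaches should produce the same time axis as long as linspace() is asked for one point per sample. The sketch below checks this on a synthetic signal; the sine wave is a hypothetical stand-in for a loaded recording, and the variable names are chosen for illustration.

```python
import numpy as np

sfreq = 2205  # sampling frequency from the example above (samples per second)

# Hypothetical stand-in for loaded audio: 3 seconds of a 5 Hz sine wave
n_samples = 3 * sfreq
audio = np.sin(2 * np.pi * 5 * np.arange(n_samples) / sfreq)

# Method 1: sample indices divided by the sampling frequency
time_from_indices = np.arange(len(audio)) / sfreq

# Method 2: evenly spaced points from zero to the time of the last sample
final_time = (len(audio) - 1) / sfreq
time_from_linspace = np.linspace(0, final_time, len(audio))

# Both arrays have one entry per sample and agree up to floating-point error
same = np.allclose(time_from_indices, time_from_linspace)
print(same)
```

Either array can then serve as the x-axis when plotting `audio` against time.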
import pandas as pd
data = pd.read_csv('path/to/data.csv')
data.columns
Index(['date', 'symbol', 'close', 'volume'], dtype='object')
data.head()
         date symbol       close       volume
0  2010-01-04   AAPL  214.009998  123432400.0
1  2010-01-04    ABT   54.459951   10829000.0
2  2010-01-04    AIG   29.889999    7750900.0
3  2010-01-04   AMAT   14.300000   18615100.0
4  2010-01-04   ARNC   16.650013   11512100.0
Inspect the column's type with the dtypes attribute:
df['date'].dtypes
0 object
1 object
2 object
dtype: object
Convert the column with the to_datetime() function:
df['date'] = pd.to_datetime(df['date'])
df['date']
0 2017-01-01
1 2017-01-02
2 2017-01-03
Name: date, dtype: datetime64[ns]
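To see the conversion end to end without a CSV file, here is a minimal sketch on a small in-memory frame. The frame contents and the `close` values are hypothetical; only the `date` column name and the three dates follow the output above.

```python
import pandas as pd

# Hypothetical three-row frame standing in for the loaded data
df = pd.DataFrame({
    'date': ['2017-01-01', '2017-01-02', '2017-01-03'],
    'close': [1.0, 2.0, 3.0],
})

# Before conversion the dates are plain strings, not datetimes
is_datetime_before = pd.api.types.is_datetime64_any_dtype(df['date'])

# Parse the strings into datetime values
df['date'] = pd.to_datetime(df['date'])
is_datetime_after = pd.api.types.is_datetime64_any_dtype(df['date'])

# Datetime columns expose date parts through the .dt accessor
weekdays = df['date'].dt.day_name().tolist()
print(is_datetime_before, is_datetime_after, weekdays)
```

Once the column holds datetimes, date-based operations such as the `.dt` accessor become available, which plain string columns do not support.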