Machine Learning for Time Series Data in Python
Chris Holdgraf
Fellow, Berkeley Institute for Data Science
from glob import glob
files = glob('data/heartbeat-sounds/proc/files/*.wav')
print(files)
['data/heartbeat-sounds/proc/files/murmur__201101051104.wav',
...
'data/heartbeat-sounds/proc/files/murmur__201101051114.wav']
import librosa as lr
# `load` accepts a path to an audio file
audio, sfreq = lr.load('data/heartbeat-sounds/proc/files/murmur__201101051104.wav')
print(sfreq)
2205
In this case, the sampling frequency is 2205 Hz, meaning there are 2205 samples per second.
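To make the meaning of the sampling frequency concrete, here is a minimal sketch that derives the spacing between samples and the total duration of a recording. The sample count of 8820 is a made-up value chosen for illustration; it is not taken from the heartbeat data.

```python
# Hypothetical values for illustration only
sfreq = 2205        # samples per second, as printed above
n_samples = 8820    # assumed number of samples in the audio array

# Each sample covers 1 / sfreq seconds
sample_spacing = 1 / sfreq

# Total duration in seconds is the sample count divided by the rate
duration = n_samples / sfreq
print(duration)  # 4.0
```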
Create an array of indices, one for each sample, and divide by the sampling frequency
import numpy as np
indices = np.arange(0, len(audio))
time = indices / sfreq
Find the timestamp of the last data point (index N-1), then use linspace() to create N evenly spaced points from zero to that time
final_time = (len(audio) - 1) / sfreq
# The third argument is the number of points: one per sample
time = np.linspace(0, final_time, len(audio))
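The two approaches should produce the same time array. The sketch below checks this on a placeholder signal; the three seconds of silence stand in for real audio and are an assumption made only so the example runs without a file. Note that linspace() needs one point per sample, so the number of points passed is len(audio).

```python
import numpy as np

sfreq = 2205                   # sampling frequency from the example above
audio = np.zeros(3 * sfreq)    # placeholder signal: 3 seconds of silence (assumed)

# Approach 1: sample indices divided by the sampling frequency
time_from_indices = np.arange(len(audio)) / sfreq

# Approach 2: evenly spaced points from zero to the last timestamp
final_time = (len(audio) - 1) / sfreq
time_from_linspace = np.linspace(0, final_time, len(audio))

# Both arrays have one timestamp per sample and agree numerically
print(np.allclose(time_from_indices, time_from_linspace))
```

The index-based version is usually the simpler choice, since it cannot end up with a length different from the audio array.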
import pandas as pd
data = pd.read_csv('path/to/data.csv')
data.columns
Index(['date', 'symbol', 'close', 'volume'], dtype='object')
data.head()
         date symbol       close       volume
0  2010-01-04   AAPL  214.009998  123432400.0
1  2010-01-04    ABT   54.459951   10829000.0
2  2010-01-04    AIG   29.889999    7750900.0
3  2010-01-04   AMAT   14.300000   18615100.0
4  2010-01-04   ARNC   16.650013   11512100.0
dtypes attribute
df['date'].dtypes
0 object
1 object
2 object
dtype: object
to_datetime() function
df['date'] = pd.to_datetime(df['date'])
df['date']
0 2017-01-01
1 2017-01-02
2 2017-01-03
Name: date, dtype: datetime64[ns]
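As a self-contained illustration of the conversion, the sketch below builds a small DataFrame with string dates, converts the column, and confirms the change of type. The rows and the `close` values are invented for the example and are not the stock data shown earlier.

```python
import pandas as pd

# Made-up rows for illustration only
df = pd.DataFrame({
    'date': ['2017-01-01', '2017-01-02', '2017-01-03'],
    'close': [1.0, 2.0, 3.0],
})

# Before conversion the column holds plain strings
was_datetime = pd.api.types.is_datetime64_any_dtype(df['date'])

# Parse the strings into datetime values
df['date'] = pd.to_datetime(df['date'])
is_datetime = pd.api.types.is_datetime64_any_dtype(df['date'])

# Datetime columns expose date parts through the .dt accessor
days = df['date'].dt.day.tolist()
print(was_datetime, is_datetime, days)
```

Once the column is a datetime type, date-aware operations such as extracting the day or sorting chronologically work on it directly.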