Handling sequences with PyTorch

Intermediate Deep Learning with PyTorch

Michal Oleszak

Machine Learning Engineer

Sequential data

  • Ordered in time or space
  • The order of the data points encodes dependencies between them
  • Examples of sequential data:
    • Time series
    • Text
    • Audio waves

A time series displayed on a computer screen.

An open book.

A set of loudspeakers and a computer screen with audio processing software open.

Electricity consumption prediction

  • Task: predict future electricity consumption based on past patterns

  • Electricity consumption dataset¹:

                 timestamp  consumption
0      2011-01-01 00:15:00    -0.704319
1      2011-01-01 00:30:00    -0.704319
...                    ...          ...
140254 2014-12-31 23:45:00    -0.095751
140255 2015-01-01 00:00:00    -0.095751

1 Trindade, Artur. (2015). ElectricityLoadDiagrams20112014. UCI Machine Learning Repository. https://doi.org/10.24432/C58C86.
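The first and last rows in the table are consistent with one reading every 15 minutes (96 readings per day) over four years. Here is a quick pandas check, assuming the timestamps form a regular 15-minute grid with no missing readings:

```python
import pandas as pd

# Regular 15-minute grid between the first and last timestamps shown above
# (assumption: no missing readings in between)
timestamps = pd.date_range("2011-01-01 00:15:00", "2015-01-01 00:00:00", freq="15min")

readings_per_day = 24 * 4  # four 15-minute readings per hour
# Span covered by the grid, counting the final 15-minute interval
n_days = (timestamps[-1] - timestamps[0] + pd.Timedelta(minutes=15)).days

print(len(timestamps))           # 140256, i.e. indices 0 to 140255
print(readings_per_day, n_days)  # 96 1461
```

The 1,461 days cover 2011 to 2014, including the leap day in 2012.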

Train-test split

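  • No random splitting for time series!
  • Random splits cause look-ahead bias: the model trains on data from the future of the points it is tested on
  • Solution: split by time (train on earlier data, test on later data)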

Visually separated train set for years 2011-2013 shown in blue and test set for year 2014 shown in orange.
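A time-based split comes down to a single cutoff date. Here is a minimal pandas sketch that uses a hypothetical stand-in DataFrame with the same two columns and random values. The cutoff of 2014-01-01 follows the figure, and it is an assumption about the exact boundary:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the electricity DataFrame: same column names,
# random consumption values, 15-minute timestamps over 2011-2014
timestamps = pd.date_range("2011-01-01 00:15:00", "2015-01-01 00:00:00", freq="15min")
rng = np.random.default_rng(0)
df = pd.DataFrame({"timestamp": timestamps,
                   "consumption": rng.standard_normal(len(timestamps))})

# Split by time, not at random: rows before the cutoff train, the rest test
cutoff = pd.Timestamp("2014-01-01")
train_data = df[df["timestamp"] < cutoff]
test_data = df[df["timestamp"] >= cutoff]

# The last training timestamp precedes the first test timestamp
print(train_data["timestamp"].max(), test_data["timestamp"].min())
```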

Creating sequences

  • Sequence length = number of data points in one training example
    • 24 × 4 = 96 (one reading every 15 minutes) -> the last 24 hours
  • Target: the single next data point

Visually separated input sequences of equal length shown in blue and the target value shown in green.
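The following toy example makes the windowing concrete. It uses a series of 10 points and a sequence length of 3 instead of 96. The values are chosen only for illustration:

```python
# Toy series of 10 points; seq_length=3 stands in for the 96 used above
series = list(range(10))
seq_length = 3

# Each example: seq_length consecutive points as input, the next point as target
examples = [(series[i:i + seq_length], series[i + seq_length])
            for i in range(len(series) - seq_length)]

for inputs, target in examples[:2]:
    print(inputs, "->", target)
# [0, 1, 2] -> 3
# [1, 2, 3] -> 4
print(len(examples))  # 7, i.e. len(series) - seq_length
```

The last point has no successor, so it only appears as a target, never as the start of a full input window.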

Creating sequences in Python

import numpy as np

def create_sequences(df, seq_length):
    xs, ys = [], []
    for i in range(len(df) - seq_length):
        # Column 1 holds consumption (column 0 is the timestamp)
        x = df.iloc[i:(i+seq_length), 1]
        y = df.iloc[i+seq_length, 1]
        xs.append(x)
        ys.append(y)
    return np.array(xs), np.array(ys)
  • Take data and sequence length as inputs
  • Initialize inputs and targets lists
  • Iterate over every start position that leaves room for a target
  • Define inputs and target
  • Append to pre-initialized lists
  • Return inputs and targets as NumPy arrays
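The same inputs and targets can also be built without a Python loop. This is a minimal sketch, assuming NumPy 1.20 or later (for sliding_window_view) and a 1-D array of consumption values such as df.iloc[:, 1].to_numpy(). The function name create_sequences_vectorized is hypothetical:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def create_sequences_vectorized(values, seq_length):
    # All windows of length seq_length: shape (len(values) - seq_length + 1, seq_length)
    windows = sliding_window_view(values, seq_length)
    # Drop the last window, which has no following point to serve as its target.
    # The windows are a read-only view; .copy() gives a writable array, which
    # torch.from_numpy accepts without a warning
    xs = windows[:-1].copy()
    ys = values[seq_length:]
    return xs, ys

values = np.arange(10, dtype=np.float32)
xs, ys = create_sequences_vectorized(values, 3)
print(xs.shape, ys.shape)  # (7, 3) (7,)
```

The result matches the loop version, but it avoids two df.iloc calls per example.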

TensorDataset

Create training examples

X_train, y_train = create_sequences(train_data, seq_length)
print(X_train.shape, y_train.shape)
(34944, 96) (34944,)

Convert them to a PyTorch Dataset

import torch
from torch.utils.data import TensorDataset

# .float() casts the float64 NumPy data to float32, PyTorch's default dtype
dataset_train = TensorDataset(
    torch.from_numpy(X_train).float(),
    torch.from_numpy(y_train).float(),
)
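For training, the dataset is typically wrapped in a DataLoader that yields mini-batches. This is a minimal sketch in which random arrays stand in for X_train and y_train as a hypothetical substitute for the real data. The batch size of 32 is an arbitrary choice:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical random arrays standing in for X_train and y_train
X_train = np.random.rand(100, 96)
y_train = np.random.rand(100)

dataset_train = TensorDataset(
    torch.from_numpy(X_train).float(),
    torch.from_numpy(y_train).float(),
)

# Shuffling training examples does not leak future data: each example keeps
# its own 96-step history, and the time-based split happened beforehand
dataloader_train = DataLoader(dataset_train, batch_size=32, shuffle=True)

x, y = next(iter(dataloader_train))
print(x.shape, y.shape)  # torch.Size([32, 96]) torch.Size([32])
```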

Applicability to other sequential data

The same techniques apply to other sequential data, for example:

  • Large Language Models
  • Speech recognition
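For text, a language model is commonly trained to predict the next token from the preceding ones. That setup maps onto the same windowing. This is a minimal sketch that treats single characters as tokens, which is a simplifying assumption because real LLMs use subword tokenizers:

```python
import numpy as np

# Toy text encoded as integer character IDs (simplified stand-in for tokenization)
text = "hello world"
vocab = sorted(set(text))
char_to_id = {ch: i for i, ch in enumerate(vocab)}
ids = np.array([char_to_id[ch] for ch in text])

seq_length = 4
# Same windowing as create_sequences: previous seq_length IDs -> next ID
xs = np.array([ids[i:i + seq_length] for i in range(len(ids) - seq_length)])
ys = ids[seq_length:]

print(xs.shape, ys.shape)  # (7, 4) (7,)
```

The first example maps the characters "hell" to the target "o".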

Let's practice!
