Training and evaluating RNNs

Intermediate Deep Learning with PyTorch

Michal Oleszak

Machine Learning Engineer

Mean Squared Error Loss

  • Error:

    $$\text{prediction} - \text{target}$$

  • Squared Error:

    $$(\text{prediction} - \text{target})^2$$

  • Mean Squared Error:

    $$\frac{1}{N}\sum_{i=1}^{N}(\text{prediction}_i - \text{target}_i)^2$$

Squaring the error:

  • Ensures positive and negative errors don't cancel out
  • Penalizes large errors more
  • In PyTorch (a numeric check follows below):
      criterion = nn.MSELoss()
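
A quick numeric check, with made-up values, confirming that nn.MSELoss computes exactly the formula above:

import torch
import torch.nn as nn

criterion = nn.MSELoss()

prediction = torch.tensor([2.0, 4.0])
target = torch.tensor([1.0, 1.0])

# avg[(prediction - target)^2] = (1^2 + 3^2) / 2 = 5.0
manual = ((prediction - target) ** 2).mean()
print(manual, criterion(prediction, target))  # tensor(5.) tensor(5.)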
    

Expanding tensors

  • Recurrent layers expect input of shape (batch_size, seq_length, num_features)
  • Our data comes in shape (batch_size, seq_length)
  • We must add one dimension at the end: num_features is 1, since each time step holds a single value
for seqs, labels in dataloader_train:
    print(seqs.shape)

torch.Size([32, 96])

seqs = seqs.view(32, 96, 1)
print(seqs.shape)

torch.Size([32, 96, 1])
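
Hard-coding the sizes works here, but the same expansion can be written shape-agnostically. A minimal sketch, with torch.rand standing in for a real batch:

import torch

seqs = torch.rand(32, 96)   # stand-in for a training batch
seqs = seqs.unsqueeze(-1)   # add a trailing feature dimension of size 1
print(seqs.shape)           # torch.Size([32, 96, 1])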

Squeezing tensors

  • In the evaluation loop, inputs are reshaped as in training, but the model outputs need the opposite adjustment: an extra dimension must be removed
  • Labels are of shape (batch_size,)

    for seqs, labels in test_loader:
      print(labels.shape)
    
    torch.Size([32])
    
  • Model outputs are (batch_size, 1)

    out = net(seqs)
    print(out.shape)

    torch.Size([32, 1])
    
  • Shapes of model outputs and labels must match for the loss function (a broadcasting sketch follows below)
  • We can drop the last dimension from model outputs

    out = net(seqs).squeeze()
    print(out.shape)

    torch.Size([32])
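
To see why the shapes must match, here is a minimal sketch with random tensors standing in for real outputs and labels. With mismatched shapes, MSELoss broadcasts (32, 1) against (32,) to (32, 32) and silently computes the wrong loss (PyTorch emits a UserWarning):

import torch
import torch.nn as nn

criterion = nn.MSELoss()
outputs = torch.rand(32, 1)  # model outputs: (batch_size, 1)
labels = torch.rand(32)      # labels: (batch_size,)

# criterion(outputs, labels) would broadcast to (32, 32) -> wrong loss
loss = criterion(outputs.squeeze(), labels)  # shapes match: (32,) vs (32,)
print(loss)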
    

Training loop

import torch.nn as nn
import torch.optim as optim

net = Net()
criterion = nn.MSELoss()
optimizer = optim.Adam(net.parameters(), lr=0.001)

for epoch in range(num_epochs):
    for seqs, labels in dataloader_train:
        # Add the feature dimension: (32, 96) -> (32, 96, 1)
        seqs = seqs.view(32, 96, 1)
        outputs = net(seqs)
        loss = criterion(outputs, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
  • Instantiate model, define loss & optimizer (a sketch of Net follows below)
  • Iterate over epochs and data batches
  • Reshape input sequence
  • The rest: as usual
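
The Net model itself is not defined on this slide. Here is a minimal sketch consistent with the shapes above; the hidden size, layer count, and reading of the last time step are illustrative assumptions, not the course's exact architecture:

import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # Recurrent layer over (batch_size, seq_length, num_features=1)
        self.lstm = nn.LSTM(
            input_size=1,
            hidden_size=32,   # assumed size
            num_layers=2,     # assumed depth
            batch_first=True,
        )
        self.fc = nn.Linear(32, 1)

    def forward(self, x):
        out, _ = self.lstm(x)          # out: (batch, seq, hidden)
        return self.fc(out[:, -1, :])  # predict from the last time step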

Evaluation loop

import torch
import torchmetrics

mse = torchmetrics.MeanSquaredError()

net.eval()
with torch.no_grad():
    for seqs, labels in test_loader:
        # Reshape inputs and squeeze outputs, as before
        seqs = seqs.view(32, 96, 1)
        outputs = net(seqs).squeeze()
        mse(outputs, labels)

print(f"Test MSE: {mse.compute()}")
Test MSE: 0.13292162120342255
  • Set up MSE metric
  • Iterate through test data with no gradients
  • Reshape model inputs
  • Squeeze model outputs
  • Update the metric with each batch
  • Compute the final metric value (the accumulation pattern is sketched below)
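
Calling the metric once per batch and compute() once at the end works because torchmetrics accumulates state across calls. A minimal sketch with hand-picked values:

import torch
import torchmetrics

mse = torchmetrics.MeanSquaredError()

# Each call updates running sums; nothing is averaged yet
mse(torch.tensor([2.0]), torch.tensor([1.0]))  # squared error: 1
mse(torch.tensor([4.0]), torch.tensor([1.0]))  # squared error: 9

print(mse.compute())  # tensor(5.) == (1 + 9) / 2
mse.reset()           # clear state before reusing the metric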

LSTM vs. GRU

  • LSTM:

    Test MSE: 0.13292162120342255

  • GRU:

    Test MSE: 0.12187089771032333
  • GRU often preferred: same or better results with less processing power (the layer swap is sketched below)
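
Swapping the recurrent layer is a one-line change in the model. A minimal sketch, reusing the assumed architecture from the training-loop slide (the NetGRU name is illustrative):

import torch.nn as nn

class NetGRU(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.GRU as a drop-in replacement for nn.LSTM
        self.gru = nn.GRU(
            input_size=1,
            hidden_size=32,   # assumed size
            num_layers=2,     # assumed depth
            batch_first=True,
        )
        self.fc = nn.Linear(32, 1)

    def forward(self, x):
        out, _ = self.gru(x)
        return self.fc(out[:, -1, :])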

Let's practice!
