Using derivatives to update model parameters

Introduction to Deep Learning with PyTorch

Jasmin Ludolf

Senior Data Science Content Developer, DataCamp

An analogy for derivatives

The derivative represents the slope of the curve.

  • Steep slopes (red arrows):
    • Large steps, derivative is high
  • Gentler slopes (green arrows):
    • Small steps, derivative is low
  • Valley floor (blue arrow):
    • Flat, derivative is zero

(Image: a valley, with red arrows on the steep slopes, green arrows on the gentler slopes, and a blue arrow on the valley floor)
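As a minimal sketch of the same idea in code (my own example, not from the slides; the function f(x) = x² and the sample points are assumptions), PyTorch's autograd reports a large derivative on the steep part of the curve and zero at the valley floor:

import torch

# f(x) = x**2 has its "valley floor" at x = 0
for x0 in [3.0, 0.5, 0.0]:
    x = torch.tensor(x0, requires_grad=True)
    y = x ** 2        # forward pass
    y.backward()      # compute dy/dx at this point
    print(x0, x.grad.item())  # prints 6.0 (steep), 1.0 (gentle), 0.0 (flat)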


Convex and non-convex functions

This is a convex function

(Image: an example of a convex function)

This is a non-convex function

(Image: an example of a non-convex function with the global minimum highlighted)
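For concreteness (these example functions are my own, not from the slides): a convex function has a single minimum, so following the slope always leads to it, while a non-convex function can also have local minima that are not the global one.

$$
f(x) = x^2 \qquad \text{convex: a single global minimum at } x = 0
$$

$$
g(x) = x^4 + x^3 - x^2 \qquad \text{non-convex: two local minima, only one of which is global}
$$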


Connecting derivatives and model training

  • Compute the loss in the forward pass during training

(Diagram: calculating the loss in the forward pass)
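As a reminder of what that loss computes (formula added here for reference; cross-entropy is the criterion used later in this video), the loss compares the predicted class probabilities $\hat{y}$ with the target $y$:

$$
L(\hat{y}, y) = -\sum_{c} y_c \log \hat{y}_c
$$

In PyTorch, CrossEntropyLoss takes the raw logits and applies the softmax internally.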


Connecting derivatives and model training

  • Gradients help minimize the loss by tuning the layers' weights and biases
  • Repeat until the layers are tuned

(Diagram: computing gradients)
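Each update applies the standard gradient descent rule (notation added here: $w$ is a weight or bias, $\alpha$ the learning rate, $L$ the loss):

$$
w \leftarrow w - \alpha \, \frac{\partial L}{\partial w}
$$

This is exactly the manual update performed in code later in this video.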


Backpropagation concepts


  • Consider a network made of three layers:

    • Begin with the loss gradients for the last layer, $L2$
    • Use the $L2$ gradients to compute the $L1$ gradients
    • Repeat until gradients are computed for every layer ($L1$, then $L0$)

(Diagram: backpropagation through layers $L0$, $L1$, $L2$)
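In chain-rule terms (notation added here, with $a_i$ the output of layer $Li$ and $W_i$ its weights), each layer reuses the gradients already computed for the layer after it:

$$
\frac{\partial \, \text{loss}}{\partial W_1} = \frac{\partial \, \text{loss}}{\partial a_2} \cdot \frac{\partial a_2}{\partial a_1} \cdot \frac{\partial a_1}{\partial W_1}
$$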


Backpropagation in PyTorch

import torch.nn as nn
from torch.nn import CrossEntropyLoss

# Run a forward pass (sample and target are assumed to be defined already:
# a batch of inputs and the corresponding class labels)
model = nn.Sequential(nn.Linear(16, 8),
                      nn.Linear(8, 4),
                      nn.Linear(4, 2))
prediction = model(sample)

# Calculate the loss and compute the gradients
criterion = CrossEntropyLoss()
loss = criterion(prediction, target)
loss.backward()
# Access each layer's gradients
model[0].weight.grad
model[0].bias.grad
model[1].weight.grad
model[1].bias.grad
model[2].weight.grad
model[2].bias.grad
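For a self-contained run, sample and target can be stand-ins (the random tensors below are purely illustrative and are not defined on the slide):

import torch

sample = torch.randn(10, 16)         # batch of 10 inputs with 16 features
target = torch.randint(0, 2, (10,))  # 10 integer class labels in {0, 1}

prediction = model(sample)           # forward pass
loss = criterion(prediction, target)
loss.backward()                      # fills each parameter's .grad attribute
print(model[0].weight.grad.shape)    # torch.Size([8, 16])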

Updating model parameters manually

# Learning rate is typically small
lr = 0.001

# Update the weights
weight = model[0].weight
weight_grad = model[0].weight.grad


weight = weight - lr * weight_grad

# Update the biases
bias = model[0].bias
bias_grad = model[0].bias.grad

bias = bias - lr * bias_grad


  • Access each layer's gradients
  • Multiply them by the learning rate
  • Subtract this product from the weights (a loop over all parameters is sketched below)
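A minimal sketch of the same update applied to every parameter at once (the loop is my own, not from the slide; torch.no_grad() keeps these manual updates out of the autograd graph):

import torch

with torch.no_grad():
    for param in model.parameters():
        param -= lr * param.grad  # in-place version of param = param - lr * grad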

Gradient descent

  • For non-convex functions, we will use gradient descent

  • PyTorch simplifies this with optimizers

    • Stochastic gradient descent (SGD)
import torch.optim as optim

# Create the optimizer
optimizer = optim.SGD(model.parameters(), lr=0.001)

# Perform parameter updates
optimizer.step()
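A sketch of where optimizer.step() sits in a typical training iteration (the zero_grad() call and the surrounding order are standard PyTorch practice rather than something shown on this slide):

optimizer.zero_grad()                 # clear gradients from the previous iteration
prediction = model(sample)            # forward pass
loss = criterion(prediction, target)  # compute the loss
loss.backward()                       # compute gradients via backpropagation
optimizer.step()                      # update weights and biases using the gradients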

Let's practice!
