Introduction to Deep Learning with PyTorch
Jasmin Ludolf
Senior Data Science Content Developer, DataCamp
Stochastic Gradient Descent (SGD) optimizer
```python
import torch.optim as optim

# SGD with momentum; assumes `model` is an existing torch.nn.Module
sgd = optim.SGD(model.parameters(), lr=0.01, momentum=0.95)
```
This is a convex function: it has a single global minimum.
This is a non-convex function: it also has local minima where an optimizer can get stuck.
- lr = 0.01, momentum = 0: after 100 steps, the minimum found is at x = -1.23, y = -0.14
- lr = 0.01, momentum = 0.9: after 100 steps, the minimum found is at x = 0.92, y = -2.04

With momentum, the optimizer moves past the local minimum and reaches a lower value of the function (y = -2.04 versus y = -0.14).
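This comparison is easy to reproduce. Below is a minimal sketch, using a hypothetical non-convex function (not the one from the plots) and an arbitrary starting point, that runs SGD for 100 steps with and without momentum:

```python
import torch

def run_sgd(momentum, lr=0.01, steps=100):
    # Single parameter x; y is the value of a hypothetical non-convex function
    x = torch.tensor(2.0, requires_grad=True)
    opt = torch.optim.SGD([x], lr=lr, momentum=momentum)
    for _ in range(steps):
        opt.zero_grad()
        y = torch.sin(3 * x) + 0.5 * x**2  # non-convex: local minima plus a global minimum
        y.backward()
        opt.step()
    return x.item(), y.item()

print(run_sgd(momentum=0.0))  # may settle in a local minimum
print(run_sgd(momentum=0.9))  # momentum can carry it into a deeper minimum
```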
| Learning Rate | Momentum |
|---|---|
| Controls the step size | Controls the inertia |
| Too high → poor performance | Helps escape local minima |
| Too low → slow training | Too small → optimizer gets stuck in a local minimum |
| Typical range: 0.0001 ($10^{-4}$) to 0.01 ($10^{-2}$) | Typical range: 0.85 to 0.99 |
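Whatever values you choose, the optimizer's place in a training step is the same. A minimal sketch, assuming `model`, `criterion`, `features`, and `targets` are defined elsewhere:

```python
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.95)

optimizer.zero_grad()                        # clear gradients from the previous step
loss = criterion(model(features), targets)   # forward pass and loss
loss.backward()                              # backward pass: compute gradients
optimizer.step()                             # update parameters using lr and momentum
```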