Gradient descent

Introduction to Deep Learning in Python

Dan Becker

Data Scientist and contributor to Keras and TensorFlow libraries

Gradient descent

ch2_2.003.png

ch2_2.004.png

ch2_2.005.png

ch2_2.006.png

ch2_2.008.png

ch2_2.009.png

ch2_2.010.png

ch2_2.011.png

ch2_2.012.png

ch2_2.013.png

ch2_2.014.png

Gradient descent

If the slope is positive:
- Going opposite the slope means moving to lower numbers
- So we subtract the slope from the current value
- But too big a step might overshoot the minimum
Solution: learning rate
- Update each weight by subtracting learning rate * slope from it
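
The update rule above can be sketched on a one-weight toy problem. This is a minimal illustration, not code from the course: the loss `(w - 3) ** 2`, the starting value of 10, and the names `slope` and `gradient_descent` are assumptions chosen for the example.

```python
def slope(w):
    # Derivative of the assumed loss (w - 3) ** 2 with respect to w
    return 2 * (w - 3)


def gradient_descent(w, learning_rate=0.1, n_steps=50):
    # Repeatedly step opposite the slope, scaled by the learning rate
    for _ in range(n_steps):
        w = w - learning_rate * slope(w)
    return w


w_final = gradient_descent(w=10.0)
print(w_final)  # close to 3, the minimum of the assumed loss
```

At the start the slope is positive, so each update moves the weight to a lower number. With this particular loss, a learning rate above 1.0 makes every step overshoot by more than it corrects, so the weight moves away from 3 instead of toward it.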

Slope calculation example

ch2_2.022.png

To calculate the slope for a weight, we need to multiply:
- The slope of the loss function w.r.t. the value at the node we feed into
- The value of the node that feeds into our weight
- The slope of the activation function w.r.t. the value we feed into
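
As a rough sketch, the three factors can be written as a single product. The function names `relu_slope` and `weight_slope`, the choice of ReLU as the activation, and the numbers used are all hypothetical and only serve to illustrate the rule.

```python
def relu_slope(x):
    # Slope of the ReLU activation: 1 for positive inputs, 0 otherwise
    return 1.0 if x > 0 else 0.0


def weight_slope(loss_slope, input_node_value, activation_slope=1.0):
    # Product of the three factors listed above
    return loss_slope * input_node_value * activation_slope


# Hypothetical numbers: loss slope of -8, input node value of 3,
# and a ReLU node whose weighted input is 6 (so its slope is 1)
print(weight_slope(-8, 3, relu_slope(6)))  # -24.0

# If the ReLU node's input were negative, its slope would be 0
# and the weight would receive no update from this data point
print(weight_slope(-8, 3, relu_slope(-6)))
```

Defaulting `activation_slope` to 1.0 covers the case where a node has no activation function, so the third factor drops out of the product.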

Slope calculation example

ch2_2.028.png

Slope calculation example

ch2_2.029.png

Slope of the mean-squared loss function w.r.t. the prediction:
- 2 * (Predicted Value - Actual Value) = 2 * Error
- 2 * -4 = -8
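
One way to sanity-check this slope is to compare it with a numerical estimate. The sketch below assumes a single data point (so the mean-squared loss is just the squared error) and assumes a prediction of 6 and an actual value of 10, picked only because they give an error of -4.

```python
def squared_error(prediction, actual):
    # Loss for a single data point
    return (prediction - actual) ** 2


def analytic_slope(prediction, actual):
    # 2 * (Predicted Value - Actual Value)
    return 2 * (prediction - actual)


prediction, actual = 6.0, 10.0  # assumed values giving an error of -4
eps = 1e-6

# Central finite difference: nudge the prediction up and down slightly
numeric_slope = (squared_error(prediction + eps, actual)
                 - squared_error(prediction - eps, actual)) / (2 * eps)

print(analytic_slope(prediction, actual))  # -8.0
print(round(numeric_slope, 4))             # -8.0
```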

Slope calculation example

ch2_2.033.png

Slope calculation example

ch2_2.035.png

Slope calculation example

ch2_2.037.png

Slope calculation example

ch2_2.038.png

To calculate the slope for a weight, we need to multiply:
- The slope of the loss function w.r.t. the value at the node we feed into
- The value of the node that feeds into our weight
- ~~The slope of the activation function w.r.t. the value we feed into~~ (not needed here: this example has no activation function)

Slope calculation example

ch2_2.044.png

Slope for the weight: 2 * -4 * 3 = -24
If the learning rate is 0.01, the new weight would be
2 - 0.01 * (-24) = 2.24
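
The same arithmetic can be reproduced in a few lines of Python. The input value of 3 and the weight of 2 are read off the calculation above; the target of 10 is an assumption, chosen so that the error comes out to -4.

```python
input_value = 3       # value of the node feeding into the weight
weight = 2            # current weight
target = 10           # assumed actual value, chosen so the error is -4
learning_rate = 0.01

prediction = input_value * weight  # 6
error = prediction - target        # -4

# Slope of the loss w.r.t. the prediction, times the input node value
slope = 2 * error * input_value
new_weight = weight - learning_rate * slope

print(slope)                 # -24
print(round(new_weight, 2))  # 2.24
```

Because the slope is negative, subtracting it increases the weight, which moves the prediction up toward the target.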

Network with two inputs affecting prediction

ch2_2.045.png

Code to calculate slopes and update weights

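import numpy as np

# One data point with two inputs, and the value we want to predict
weights = np.array([1, 2])
input_data = np.array([3, 4])
target = 6
learning_rate = 0.01

# Prediction is the weighted sum of the inputs; error is prediction minus target
preds = (weights * input_data).sum()
error = preds - target

print(error)

5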

Code to calculate slopes and update weights

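# Slope of the squared error w.r.t. each weight: 2 * input * error
gradient = 2 * input_data * error

gradient

array([30, 40])

# Take one step against the gradient, then recompute the error
weights_updated = weights - learning_rate * gradient
preds_updated = (weights_updated * input_data).sum()
error_updated = preds_updated - target

print(error_updated)

2.5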
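
A single update halves the error here (from 5 to 2.5). As a sketch of what happens when the same step is repeated, the loop below applies the update several times to the same data point. The loop, the `errors` list, and the choice of 20 updates are additions for illustration and are not part of the original slides.

```python
import numpy as np

weights = np.array([1.0, 2.0])
input_data = np.array([3, 4])
target = 6
learning_rate = 0.01

errors = []
for _ in range(20):  # number of updates is an arbitrary choice
    preds = (weights * input_data).sum()
    error = preds - target
    errors.append(error)

    # Same slope calculation and update as above
    gradient = 2 * input_data * error
    weights = weights - learning_rate * gradient

# The error shrinks toward zero as the updates accumulate
print(abs(errors[-1]) < abs(errors[0]))  # True
```

With these inputs and this learning rate, each update multiplies the error by 1 - 0.01 * 2 * (3² + 4²) = 0.5, which is why the error shrinks steadily rather than overshooting.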

Let's practice!
