ReLU activation functions

Introduction to Deep Learning with PyTorch

Jasmin Ludolf

Senior Data Science Content Developer, DataCamp

Sigmoid and softmax functions


SIGMOID for BINARY classification

A neural network with a sigmoid function
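The binary case can be sketched in PyTorch as a linear layer that produces one logit, followed by `nn.Sigmoid` to turn that logit into a probability. The input values, the feature count of 6, and the variable names are illustrative assumptions, not taken from the slide.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # reproducible random weights for the illustration

# Hypothetical input: one sample with 6 features
input_tensor = torch.tensor([[6.0, -2.0, 1.5, 0.3, -1.0, 4.0]])

# Linear layer maps 6 features to a single logit; sigmoid squashes it into (0, 1)
model = nn.Sequential(
    nn.Linear(6, 1),
    nn.Sigmoid(),
)

output = model(input_tensor)
print(output)  # shape (1, 1): a single probability for the positive class
```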


SOFTMAX for MULTI-CLASS classification

A neural network with a softmax function
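For the multi-class case, a minimal sketch is a linear layer with one output per class followed by `nn.Softmax(dim=-1)`, which normalizes across the class dimension so each row sums to 1. The three-class setup and the input values are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # reproducible random weights for the illustration

# Hypothetical input: one sample with 3 features
input_tensor = torch.tensor([[4.3, 6.1, 2.3]])

# Linear layer produces 3 class logits; softmax turns them into a distribution
model = nn.Sequential(
    nn.Linear(3, 3),
    nn.Softmax(dim=-1),
)

probs = model(input_tensor)
print(probs)              # three probabilities, one per class
print(probs.sum(dim=-1))  # sums to 1 (up to floating-point rounding)
```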

Limitations of the sigmoid and softmax functions

Sigmoid function:

Outputs bounded between 0 and 1
Usable anywhere in a network

Gradients:

Very small for large positive and large negative values of x
The function saturates there, which leads to the vanishing gradients problem
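The standard definition of the sigmoid and its derivative (a textbook result, not reproduced from the slide) shows why the gradients shrink: the derivative peaks at 0.25 when x = 0 and approaches 0 as the output approaches 0 or 1.

```latex
\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad
\sigma'(x) = \sigma(x)\,\bigl(1 - \sigma(x)\bigr)

\sigma'(0) = 0.5 \cdot 0.5 = 0.25, \qquad
\sigma'(10) = \frac{e^{-10}}{\left(1 + e^{-10}\right)^{2}} \approx 4.5 \times 10^{-5}
```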


The softmax function also suffers from saturation

The sigmoid function
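Saturation can be observed directly with autograd. This sketch evaluates the sigmoid at a very negative, a zero, and a very positive input (values chosen for illustration) and prints the gradient at each point.

```python
import torch

# Illustrative inputs: far left, centre, far right of the sigmoid curve
x = torch.tensor([-10.0, 0.0, 10.0], requires_grad=True)

y = torch.sigmoid(x)
# Summing gives a scalar for backward(); each x still receives its own d(sigmoid)/dx
y.sum().backward()

print(y)       # close to 0, exactly 0.5, close to 1
print(x.grad)  # tiny gradient at both extremes, 0.25 at the centre
```

A network whose pre-activations drift into the flat regions of the curve therefore receives almost no gradient signal through those units.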

ReLU

Rectified Linear Unit (ReLU):

f(x) = max(x, 0)
For positive inputs: output equals input
For negative inputs: output is 0
Helps overcome vanishing gradients: the gradient is 1 for every positive input, so it does not shrink for large values


In PyTorch:

import torch.nn as nn
relu = nn.ReLU()

ReLU function
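A short sketch of `nn.ReLU` on a few illustrative values, with autograd used to confirm the gradients: 0 for negative inputs and 1 for positive inputs. The input x = 0 is deliberately avoided, since the derivative is not defined there.

```python
import torch
import torch.nn as nn

relu = nn.ReLU()

# Illustrative inputs: two negative, two positive
x = torch.tensor([-2.0, -0.5, 1.5, 3.0], requires_grad=True)

y = relu(x)
y.sum().backward()

print(y)       # negatives become 0, positives pass through unchanged
print(x.grad)  # 0 for negative inputs, 1 for positive inputs
```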

Leaky ReLU

Leaky ReLU:

Positive inputs behave like ReLU
Negative inputs are scaled by a small coefficient (default 0.01)
Gradients for negative inputs are non-zero


In PyTorch:

import torch.nn as nn
leaky_relu = nn.LeakyReLU(negative_slope=0.05)

Leaky ReLU
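To see the effect of `negative_slope=0.05`, this sketch applies the layer to one negative and one positive input (illustrative values) and checks the gradients. Unlike ReLU, the negative input still receives a non-zero gradient equal to the slope.

```python
import torch
import torch.nn as nn

leaky_relu = nn.LeakyReLU(negative_slope=0.05)

# Illustrative inputs: one negative, one positive
x = torch.tensor([-2.0, 3.0], requires_grad=True)

y = leaky_relu(x)
y.sum().backward()

print(y)       # -2.0 * 0.05 = -0.1 for the negative input; 3.0 is unchanged
print(x.grad)  # 0.05 for the negative input, 1.0 for the positive input
```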

