Convolutional Neural Networks

Intermediate Deep Learning with PyTorch

Michal Oleszak

Machine Learning Engineer

Why not use linear layers?

A 256 by 256 grayscale image comprises around 65k pixel values.

A linear layer of 1000 neurons processes the 65k pixel values.

This results in around 65 million connections (weights) between the image and the layer.

With an RGB color image (three channels), there are around 200 million connections.

  • Linear layers:
    • Slow training
    • Overfitting
    • Don't recognize spatial patterns
  • A better alternative: convolutional layers!

An image with a cat in the corner, and the neurons reading only that part of the image.
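To make the cost concrete, here is a minimal sketch (assuming a 256 by 256 RGB image and a 1000-neuron layer, as above) that counts the learnable parameters of a fully connected layer versus a single small convolutional layer:

import torch.nn as nn

# Fully connected layer on a flattened 256x256 RGB image (3 * 256 * 256 = 196,608 inputs)
linear = nn.Linear(3 * 256 * 256, 1000)

# Convolutional layer with 32 filters of size 3x3 over the same 3-channel image
conv = nn.Conv2d(3, 32, kernel_size=3)

def num_params(layer):
    # Sum the sizes of all weight and bias tensors
    return sum(p.numel() for p in layer.parameters())

print(num_params(linear))  # 196609000, roughly the 200 million connections above
print(num_params(conv))    # 896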

Convolutional layer

Filter of size 3 by 3 slides over the input of size 5 by 5 to produce a feature map of size 3 by 3.

  • Slide filters of learnable parameters over the input
  • At each position, perform convolution
  • Resulting feature map:
    • Preserves spatial patterns from the input
    • Uses fewer parameters than a linear layer
  • One filter = one feature map
  • Apply activations to feature maps
  • All feature maps combined form the output
  • nn.Conv2d(3, 32, kernel_size=3)
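A minimal sketch of that layer in action, assuming a random batch of one 3-channel, 256 by 256 image:

import torch
import torch.nn as nn

conv = nn.Conv2d(3, 32, kernel_size=3)   # 32 filters, each 3x3, over a 3-channel input
elu = nn.ELU()

x = torch.rand(1, 3, 256, 256)           # one random RGB image
feature_maps = elu(conv(x))              # one feature map per filter, then activation
print(feature_maps.shape)                # torch.Size([1, 32, 254, 254])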

Convolution

Two 3 by 3 matrices are element-wise multiplied with each other and all numbers in the resulting matrix are summed.

  1. Multiply the input patch and the filter element-wise
    • Top-left field: 2 × 1 = 2
  2. Sum the results
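A minimal sketch of that single step, using made-up 3 by 3 values for the input patch and the filter (the top-left field matches the 2 × 1 example above):

import torch

patch = torch.tensor([[2., 1., 0.],
                      [0., 3., 1.],
                      [1., 0., 2.]])    # made-up 3x3 input patch
kernel = torch.tensor([[1., 0., 1.],
                       [0., 1., 0.],
                       [1., 0., 1.]])   # made-up 3x3 filter

# Element-wise multiplication followed by a sum gives one entry of the feature map
print((patch * kernel).sum())           # tensor(8.)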

Zero-padding

A 4 by 4 image surrounded by a frame of pixels with value zero.

  • Add a frame of zeros around the convolutional layer's input
nn.Conv2d(
  3, 32, kernel_size=3, padding=1
)
  • Keeps the spatial dimensions of the output the same as the input's
  • Ensures border pixels are treated equally to others
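A quick check of that claim, as a sketch with a random 3 by 32 by 32 input:

import torch
import torch.nn as nn

conv = nn.Conv2d(3, 32, kernel_size=3, padding=1)
x = torch.rand(1, 3, 32, 32)
print(conv(x).shape)   # torch.Size([1, 32, 32, 32]): height and width are unchanged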

Max Pooling

A 4 by 4 matrix with each 2 by 2 quarter marked in a different color becomes a 2 by 2 matrix after max pooling.

  • Slide non-overlapping window over input
  • At each position, retain only the maximum value
  • Used after convolutional layers to reduce spatial dimensions
  • nn.MaxPool2d(kernel_size=2)
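A small sketch of max pooling on a made-up 4 by 4 input, mirroring the picture above:

import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2)

x = torch.tensor([[[[ 1.,  2.,  5.,  6.],
                    [ 3.,  4.,  7.,  8.],
                    [ 9., 10., 13., 14.],
                    [11., 12., 15., 16.]]]])   # shape (1, 1, 4, 4), made-up values

print(pool(x))        # each non-overlapping 2x2 quarter keeps only its maximum
print(pool(x).shape)  # torch.Size([1, 1, 2, 2])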

Convolutional Neural Network

class Net(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.feature_extractor = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.ELU(),
            nn.MaxPool2d(kernel_size=2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ELU(),
            nn.MaxPool2d(kernel_size=2),
            nn.Flatten(),
        )
        self.classifier = nn.Linear(64*16*16, num_classes)

    def forward(self, x):
        x = self.feature_extractor(x)
        x = self.classifier(x)
        return x
  • feature_extractor: (convolution, activation, pooling), repeated twice and flattened
  • classifier: single linear layer
  • forward(): pass input image through feature extractor and classifier
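A hedged usage sketch, assuming the Net class above (with the usual torch and torch.nn imports), a batch of 64 by 64 RGB images, and an arbitrary 7 classes:

import torch

net = Net(num_classes=7)             # 7 is just an example value
images = torch.rand(16, 3, 64, 64)   # batch of 16 random 64x64 RGB images
logits = net(images)
print(logits.shape)                  # torch.Size([16, 7])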

Feature extractor output size

self.feature_extractor = nn.Sequential(
  nn.Conv2d(3, 32, kernel_size=3, padding=1),   # 3 x 64 x 64  -> 32 x 64 x 64 (padding keeps size)
  nn.ELU(),
  nn.MaxPool2d(kernel_size=2),                  # 32 x 64 x 64 -> 32 x 32 x 32
  nn.Conv2d(32, 64, kernel_size=3, padding=1),  # 32 x 32 x 32 -> 64 x 32 x 32
  nn.ELU(),
  nn.MaxPool2d(kernel_size=2),                  # 64 x 32 x 32 -> 64 x 16 x 16
  nn.Flatten(),                                 # 64 x 16 x 16 -> 64*16*16 = 16384
)
self.classifier = nn.Linear(64*16*16, num_classes)

Schematic diagram showing how an input image of shape 3 by 64 by 64 passes through a conv layer, a pooling layer, another conv layer, and another pooling layer.
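The shape trace in the comments above can be confirmed by pushing a dummy tensor through the feature extractor; a sketch assuming the Net class defined earlier and a 3 by 64 by 64 input:

import torch

net = Net(num_classes=7)                    # 7 is just an example value
dummy = torch.rand(1, 3, 64, 64)            # one random image of the assumed input shape
print(net.feature_extractor(dummy).shape)   # torch.Size([1, 16384]), i.e. 64*16*16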


Let's practice!
