Convolutional Neural Networks

Intermediate Deep Learning with PyTorch

Michal Oleszak

Machine Learning Engineer

Why not use linear layers?

A black square representing a 256x256 grayscale image.

A 256 by 256 grayscale image comprises around 65k pixel values.

A linear layer of 1000 neurons processes the 65k pixel values.

There are 65m connections between the image and the layer.

With an RGB color image, there are 200m connections.

  • Linear layers:
    • Slow training
    • Overfitting
    • Don't recognize spatial patterns
  • A better alternative: convolutional layers!

Image with cat in corner and the neurons reading that part of the image
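The connection counts above can be verified directly in PyTorch. This is a quick sketch comparing a fully connected layer over a flattened 256x256 RGB image with a small convolutional layer; the layer sizes follow the slides, and parameters are counted with `numel()`:

```python
import torch.nn as nn

# Linear layer: every one of the 3*256*256 input values connects to each of 1000 neurons
linear = nn.Linear(3 * 256 * 256, 1000)
linear_params = sum(p.numel() for p in linear.parameters())

# Convolutional layer: 32 filters of size 3x3, shared across the whole image
conv = nn.Conv2d(3, 32, kernel_size=3)
conv_params = sum(p.numel() for p in conv.parameters())

print(f"{linear_params:,}")  # 196,609,000 (weights + biases), roughly the 200m above
print(f"{conv_params:,}")    # 896
```

The gap is five orders of magnitude, which is why convolutional layers train faster and overfit less on images.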

Convolutional layer

Filter of size 3 by 3 slides over the input of size 5 by 5 to produce a feature map of size 3 by 3.

  • Slide filters of learnable parameters over the input
  • At each position, perform convolution
  • Resulting feature map:
    • Preserves spatial patterns from the input
    • Uses fewer parameters than a linear layer
  • One filter = one feature map
  • Apply activations to feature maps
  • All feature maps combined form the output
  • nn.Conv2d(3, 32, kernel_size=3)
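The `nn.Conv2d(3, 32, kernel_size=3)` call above maps 3 input channels to 32 feature maps. A minimal shape check (the 64x64 input size is just an example):

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 32, kernel_size=3)  # 3 input channels, 32 filters
x = torch.randn(1, 3, 64, 64)           # one RGB image, 64x64 pixels
out = conv(x)
print(out.shape)  # torch.Size([1, 32, 62, 62]): one feature map per filter
```

Note the spatial size shrinks from 64 to 62 because a 3x3 filter cannot be centered on border pixels; padding, covered next, fixes this.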

Convolution

Two 3 by 3 matrices are element-wise multiplied with each other and all numbers in the resulting matrix are summed.

  1. Multiply the input patch and the filter element-wise
    • Top-left field: 2 × 1 = 2
  2. Sum all the results
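The same multiply-and-sum can be checked by hand. In the sketch below, only the top-left entries (2 in the patch, 1 in the filter) come from the slide; the remaining values are made up for illustration:

```python
import torch

# Hypothetical 3x3 input patch and filter; only the top-left entries
# (2 and 1) come from the slide, the rest are illustrative
patch  = torch.tensor([[2., 0., 1.],
                       [1., 3., 0.],
                       [0., 1., 2.]])
kernel = torch.tensor([[1., 0., 1.],
                       [0., 1., 0.],
                       [1., 0., 1.]])

value = (patch * kernel).sum()  # element-wise product, then sum
print(value.item())  # 2 + 1 + 3 + 2 = 8.0, one entry of the feature map
```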

Zero-padding

A 4 by 4 image surrounded by a frame of pixels with value zero.

  • Add a frame of zeros around the convolutional layer's input
nn.Conv2d(
  3, 32, kernel_size=3, padding=1
)
  • Maintains spatial dimensions of the input and output tensors
  • Ensures border pixels are treated equally to others
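A quick check that `padding=1` with a 3x3 kernel keeps the spatial size unchanged (the 64x64 input is an arbitrary example):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 64, 64)

no_pad = nn.Conv2d(3, 32, kernel_size=3)             # shrinks the image
padded = nn.Conv2d(3, 32, kernel_size=3, padding=1)  # frame of zeros added

print(no_pad(x).shape)  # torch.Size([1, 32, 62, 62])
print(padded(x).shape)  # torch.Size([1, 32, 64, 64]): spatial size preserved
```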

Max Pooling

A 4 by 4 matrix with each 2 by 2 quarter marked with a different color becomes a 2 by 2 matrix after max pooling.

  • Slide non-overlapping window over input
  • At each position, retain only the maximum value
  • Used after convolutional layers to reduce spatial dimensions
  • nn.MaxPool2d(kernel_size=2)
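The 4x4 to 2x2 example from the slide can be reproduced with hypothetical values:

```python
import torch
import torch.nn as nn

# One 4x4 single-channel input; the values are illustrative
x = torch.tensor([[[[ 1.,  2.,  3.,  4.],
                    [ 5.,  6.,  7.,  8.],
                    [ 9., 10., 11., 12.],
                    [13., 14., 15., 16.]]]])

pool = nn.MaxPool2d(kernel_size=2)  # non-overlapping 2x2 windows
print(pool(x))  # tensor([[[[ 6.,  8.], [14., 16.]]]]): the max of each quarter
```

Each output value is the maximum of one 2x2 quarter, so the spatial dimensions are halved while the strongest activations survive.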

Convolutional Neural Network

class Net(nn.Module):
    def __init__(self, num_classes):
        super().__init__()

        self.feature_extractor = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.ELU(),
            nn.MaxPool2d(kernel_size=2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ELU(),
            nn.MaxPool2d(kernel_size=2),
            nn.Flatten(),
        )
        self.classifier = nn.Linear(64*16*16, num_classes)

    def forward(self, x):
        x = self.feature_extractor(x)
        x = self.classifier(x)
        return x
  • feature_extractor: (convolution, activation, pooling), repeated twice and flattened
  • classifier: single linear layer
  • forward(): pass input image through feature extractor and classifier
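A shape check for the whole network. The class is restated here so the snippet runs on its own, and the 64x64 input size matches the `64*16*16` classifier input (two poolings halve 64 to 16); the batch size of 16 and `num_classes=10` are arbitrary choices:

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.feature_extractor = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.ELU(),
            nn.MaxPool2d(kernel_size=2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ELU(),
            nn.MaxPool2d(kernel_size=2),
            nn.Flatten(),
        )
        self.classifier = nn.Linear(64 * 16 * 16, num_classes)

    def forward(self, x):
        x = self.feature_extractor(x)
        x = self.classifier(x)
        return x

net = Net(num_classes=10)           # 10 classes is an arbitrary example
batch = torch.randn(16, 3, 64, 64)  # 16 RGB images of 64x64 pixels
print(net(batch).shape)             # torch.Size([16, 10]): one score per class
```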

Feature extractor output size

self.feature_extractor = nn.Sequential(
  nn.Conv2d(3, 32, kernel_size=3, padding=1),
  nn.ELU(),
  nn.MaxPool2d(kernel_size=2),
  nn.Conv2d(32, 64, kernel_size=3, padding=1),
  nn.ELU(),
  nn.MaxPool2d(kernel_size=2),
  nn.Flatten(),
)
self.classifier = nn.Linear(64*16*16, num_classes)

Schematic diagram showing how input images of shape 3 by 64 by 64 passes through a conv layer, a pooling layer, another conv layer, and another pooling layer.
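Tracing the shapes step by step: the `padding=1` convolutions keep the 64x64 spatial size, each pooling halves it (64 to 32 to 16), so flattening 64 channels of 16x16 gives 64*16*16 = 16384 features, exactly the classifier's input size. A sketch with the diagram's 3x64x64 input:

```python
import torch
import torch.nn as nn

extractor = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),   # (3, 64, 64)  -> (32, 64, 64)
    nn.ELU(),
    nn.MaxPool2d(kernel_size=2),                  # (32, 64, 64) -> (32, 32, 32)
    nn.Conv2d(32, 64, kernel_size=3, padding=1),  # (32, 32, 32) -> (64, 32, 32)
    nn.ELU(),
    nn.MaxPool2d(kernel_size=2),                  # (64, 32, 32) -> (64, 16, 16)
    nn.Flatten(),                                 # -> 64*16*16 = 16384 features
)

x = torch.randn(1, 3, 64, 64)
print(extractor(x).shape)  # torch.Size([1, 16384])
```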


Let's practice!
