Convolutional Neural Networks

Intermediate Deep Learning with PyTorch

Michal Oleszak

Machine Learning Engineer

Why not use linear layers?

A black square representing a 256x256 grayscale image.

A 256 by 256 grayscale image comprises 65,536 (around 65k) pixel values.

A linear layer of 1000 neurons processes the 65k pixel values.

Each of the 1000 neurons connects to every pixel value, so there are 65,536 × 1000 ≈ 65m connections between the image and the layer.

With an RGB color image (three values per pixel), there are three times as many: around 200m connections.
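The connection counts above follow from simple multiplication. The plain-Python sketch below (the helper name `linear_connections` is hypothetical) counts only the weights and ignores biases, which matches how the slides count connections:

```python
# Count the weights (connections) of a fully connected layer that sees
# every pixel value of an image. Biases are deliberately excluded.
def linear_connections(height, width, channels, neurons):
    """Each neuron is connected to every pixel value of the input."""
    pixel_values = height * width * channels
    return pixel_values * neurons

gray = linear_connections(256, 256, channels=1, neurons=1000)
rgb = linear_connections(256, 256, channels=3, neurons=1000)

print(f"{256 * 256:,} pixel values per grayscale image")  # 65,536
print(f"grayscale: {gray:,} connections")  # 65,536,000
print(f"RGB:       {rgb:,} connections")   # 196,608,000
```

For reference, PyTorch's `nn.Linear(65536, 1000)` would store exactly these 65,536,000 weights plus 1000 biases, since its weight matrix has shape `(out_features, in_features)`.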

Linear layers:
- Slow training
- Overfitting
- Don't recognize spatial patterns
A better alternative: convolutional layers!

Image with a cat in one corner and the neurons that read that part of the image

Convolutional layer

Filter of size 3 by 3 slides over the input of size 5 by 5 to produce a feature map of size 3 by 3.

Slide one or more filters of learnable parameters over the input
At each position, perform convolution
Resulting feature map:
- Preserves spatial patterns from input
- Uses fewer parameters than a linear layer
One filter = one feature map
Apply activations to feature maps
All feature maps combined form the output
nn.Conv2d(3, 32, kernel_size=3)  # 3 input channels (RGB), 32 filters of size 3x3
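To make the parameter savings concrete, the sketch below (assuming PyTorch is installed; the input tensors are random placeholders) counts the parameters of the layer above and checks output shapes, including the 5 by 5 to 3 by 3 case from the figure. Without padding and with the default stride of 1, each spatial dimension shrinks by `kernel_size - 1`.

```python
import torch
import torch.nn as nn

# 3 input channels (RGB), 32 filters, each 3x3 per input channel
conv = nn.Conv2d(3, 32, kernel_size=3)

# Each filter holds 3 * 3 * 3 = 27 weights plus 1 bias: 32 * 28 = 896
n_params = sum(p.numel() for p in conv.parameters())
print(n_params)  # 896

# One random 3x256x256 image (batch dimension first): 32 feature maps out
out = conv(torch.rand(1, 3, 256, 256))
print(out.shape)  # torch.Size([1, 32, 254, 254])

# The figure's case: a 3x3 filter over a 5x5 single-channel input
small = nn.Conv2d(1, 1, kernel_size=3)
small_out = small(torch.rand(1, 1, 5, 5))
print(small_out.shape)  # torch.Size([1, 1, 3, 3])
```

Compared with the roughly 200m connections of a 1000-neuron linear layer on an RGB image, this layer needs only 896 parameters, because each filter's 27 weights are reused at every position.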

Convolution

Two 3 by 3 matrices are element-wise multiplied with each other and all numbers in the resulting matrix are summed.

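Multiply the input patch and the filter element-wise
- Top-left field: 2 × 1 = 2
Sum the results to get a single output value (together, these two steps compute the dot product of patch and filter)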
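The multiply-and-sum step can be checked numerically. In the sketch below the patch and filter values are hypothetical (only the top-left entries, 2 and 1, echo the slide). Note that PyTorch's convolution does not flip the filter (signal processing would call this cross-correlation), which is exactly the operation described above, so `F.conv2d` returns the same number as the manual computation.

```python
import torch
import torch.nn.functional as F

# Hypothetical values; only the top-left entries (2 and 1) echo the slide
patch = torch.tensor([[2., 0., 1.],
                      [1., 3., 0.],
                      [0., 1., 2.]])
filt = torch.tensor([[1., 0., 1.],
                     [0., 1., 0.],
                     [1., 0., 1.]])

products = patch * filt   # element-wise multiplication
manual = products.sum()   # sum all nine products
dot = torch.dot(patch.flatten(), filt.flatten())  # same value as a dot product

# F.conv2d expects input (batch, channels, H, W) and weight (out_ch, in_ch, kH, kW)
via_torch = F.conv2d(patch.reshape(1, 1, 3, 3), filt.reshape(1, 1, 3, 3))

print(products[0, 0].item())  # 2.0 (top-left field: 2 x 1)
print(manual.item(), dot.item(), via_torch.item())  # 8.0 8.0 8.0
```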

Zero-padding

A 4 by 4 image surrounded by a frame of pixels with value zero.

Add a frame of zeros around the convolutional layer's input

nn.Conv2d(
  3, 32, kernel_size=3, padding=1
)

Keeps the output's spatial dimensions equal to the input's (here: 3x3 kernel, padding=1, default stride of 1)
Lets the filter center on border pixels, so they are not underused compared to interior pixels
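A minimal sketch of zero-padding, assuming PyTorch and a hypothetical 4 by 4 single-channel image of ones: `F.pad` adds the frame of zeros explicitly, and the two `nn.Conv2d` layers show that `padding=1` preserves the 4 by 4 size while an unpadded 3x3 kernel shrinks it to 2 by 2. The layer weights are randomly initialized, so only the shapes are meaningful here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical 4x4 image of ones, with batch and channel dimensions
img = torch.ones(1, 1, 4, 4)

# Explicit zero-padding: one pixel on the left, right, top and bottom -> 6x6
padded = F.pad(img, (1, 1, 1, 1))
print(padded.shape)  # torch.Size([1, 1, 6, 6])
print(padded[0, 0, 0].tolist())  # top row is all zeros: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

no_pad = nn.Conv2d(1, 1, kernel_size=3)
with_pad = nn.Conv2d(1, 1, kernel_size=3, padding=1)  # pads with zeros by default
out_no_pad = no_pad(img)
out_with_pad = with_pad(img)
print(out_no_pad.shape)    # torch.Size([1, 1, 2, 2])
print(out_with_pad.shape)  # torch.Size([1, 1, 4, 4])
```

With stride 1, the output size is `input + 2 * padding - kernel_size + 1`, so `padding=1` exactly offsets the shrinkage of a 3x3 kernel.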

Max Pooling

A 4 by 4 matrix with each 2 by 2 quarter marked in a different color becomes a 2 by 2 matrix after max pooling.

Slide a non-overlapping window over the input
At each position, retain only the maximum value
Typically used after convolutional layers to reduce spatial dimensions
nn.MaxPool2d(kernel_size=2)
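The sketch below applies the layer above to a hypothetical 4 by 4 input (not the values from the figure). Because `stride` defaults to `kernel_size` in `nn.MaxPool2d`, the 2 by 2 windows do not overlap, and each quarter of the input collapses to its maximum.

```python
import torch
import torch.nn as nn

# Hypothetical 4x4 input, reshaped to (batch, channels, H, W)
x = torch.tensor([[1., 3., 2., 0.],
                  [4., 2., 1., 5.],
                  [0., 1., 7., 2.],
                  [6., 2., 3., 4.]]).reshape(1, 1, 4, 4)

pool = nn.MaxPool2d(kernel_size=2)  # stride defaults to 2: non-overlapping windows
out = pool(x)

# Quarters: top-left max 4, top-right max 5, bottom-left max 6, bottom-right max 7
print(out.squeeze().tolist())  # [[4.0, 5.0], [6.0, 7.0]]
```

Max pooling has no learnable parameters; it only downsamples, halving height and width here.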

Convolutional Neural Network

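import torch.nn as nn

class Net(nn.Module):
    def __init__(self, num_classes):
        super().__init__()

        # Two (convolution -> ELU -> max pooling) blocks, then flatten to a vector.
        # padding=1 keeps height/width; each MaxPool2d(kernel_size=2) halves them.
        self.feature_extractor = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.ELU(),
            nn.MaxPool2d(kernel_size=2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ELU(),
            nn.MaxPool2d(kernel_size=2),
            nn.Flatten(),
        )

        # 64 feature maps of 16x16 each: this assumes 3x64x64 input images
        # (64 -> 32 -> 16 after the two pooling layers).
        self.classifier = nn.Linear(64*16*16, num_classes)

    def forward(self, x):
        x = self.feature_extractor(x)
        x = self.classifier(x)
        return x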

feature_extractor: (convolution, activation, pooling), repeated twice and flattened
classifier: single linear layer
forward(): pass input image through feature extractor and classifier
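The hard-coded `64*16*16` in the classifier only matches one input size. The sketch below rebuilds the same layer stack so it runs on its own, then traces tensor shapes layer by layer. It assumes a batch of 8 random 3x64x64 images and a hypothetical 10 classes; neither number comes from the slides.

```python
import torch
import torch.nn as nn

# Same layers as Net.feature_extractor, rebuilt so this snippet is self-contained
feature_extractor = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),
    nn.ELU(),
    nn.MaxPool2d(kernel_size=2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),
    nn.ELU(),
    nn.MaxPool2d(kernel_size=2),
    nn.Flatten(),
)
classifier = nn.Linear(64 * 16 * 16, 10)  # hypothetical: 10 classes

# Assumption: a batch of 8 random RGB images of size 64x64
x = torch.rand(8, 3, 64, 64)
for layer in feature_extractor:
    x = layer(x)
    print(f"{layer.__class__.__name__:>10}: {tuple(x.shape)}")

logits = classifier(x)
print(tuple(logits.shape))  # (8, 10)
```

The shapes go from (8, 3, 64, 64) to (8, 32, 32, 32) after the first pooling layer, (8, 64, 16, 16) after the second, and (8, 16384) after flattening. A different input size (for example 128x128) would change the flattened size and make the linear layer fail with a shape mismatch.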