Convolutional layers for images

Deep Learning for Images with PyTorch

Michal Oleszak

Machine Learning Engineer

Apply convolutional layers to image data
Access and add convolutional layers
Create convolutional blocks

Adding layers is a common way to adapt a model to a specific task

[Figure: a bounding box drawn around a cat]

Conv2d: input channels

[Figure: RGB channels]

Grayscale image: in_channels=1
RGB image (red, green, blue): in_channels=3
RGBA image (RGB plus an alpha channel for transparency): in_channels=4

from PIL import Image
from torchvision.transforms import functional

# Open the image and count its channels
image = Image.open("dog.png")
num_channels = functional.get_image_num_channels(image)
print("Number of channels:", num_channels)

Number of channels: 3
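
To make the link between image type and in_channels concrete, here is a minimal sketch that uses random tensors in place of real images. The 8x8 size, the dictionary names, and out_channels=16 are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

# Dummy batches of one 8x8 image each, shaped (batch, channels, height, width).
images = {
    "grayscale": torch.rand(1, 1, 8, 8),
    "rgb": torch.rand(1, 3, 8, 8),
    "rgba": torch.rand(1, 4, 8, 8),
}

output_shapes = {}
for name, batch in images.items():
    # in_channels must equal the size of the channel dimension (dim 1).
    conv = nn.Conv2d(in_channels=batch.shape[1], out_channels=16,
                     kernel_size=3, padding=1)
    output_shapes[name] = tuple(conv(batch).shape)

print(output_shapes)
```

Whatever the number of input channels, each layer here produces 16 output channels. Passing a tensor whose channel dimension differs from in_channels raises a RuntimeError.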

Conv2d: kernel

[Figure: an input tensor, a kernel, and the resulting output tensor (feature map)]

The kernel (shown in green) moves across the image from left to right, top to bottom.¹

¹ Thevenot, Axel. 2020. "A visual and mathematical explanation of the 2D convolution layer."

Kernel sizes

[Figure: matrix calculation of a convolution]

The most common kernel sizes: 3x3 (Conv2d) and 2x2 (MaxPool2d)
Convolution takes the element-wise product of the kernel (green) and the image region (pink) it covers
The sum of these products is one value of the feature map (blue)
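
The arithmetic can be sketched in plain Python, without PyTorch. The helper name convolve2d and the sample image and kernel values are hypothetical. Like Conv2d, the sketch does not flip the kernel, and it uses no padding and a stride of 1.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the kernel element-wise with the region it covers, then sum.
            total = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(k_h)
                for dj in range(k_w)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


# Illustrative 4x4 single-channel image and 3x3 kernel.
image = [
    [1, 2, 3, 0],
    [0, 1, 2, 3],
    [3, 0, 1, 2],
    [2, 3, 0, 1],
]
kernel = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]

feature_map = convolve2d(image, kernel)
print(feature_map)  # [[-2, -2], [2, -2]]
```

A 3x3 kernel fits into a 4x4 image at only 2x2 positions, so the feature map is smaller than the input unless padding is added.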

Kernel is a filter

Capture image patterns

[Figures: an area filter and a line filter]
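
As a rough illustration of a filter capturing a pattern, the sketch below applies a hand-written vertical-line kernel to a tiny image with a dark left side and a bright right side. The kernel values and the image are assumptions; in a trained network the kernel weights are learned rather than set by hand.

```python
import torch
import torch.nn.functional as F

# 5x5 grayscale image: dark (0) on the left, bright (1) on the right.
image = torch.tensor([
    [0., 0., 1., 1., 1.],
    [0., 0., 1., 1., 1.],
    [0., 0., 1., 1., 1.],
    [0., 0., 1., 1., 1.],
    [0., 0., 1., 1., 1.],
])

# Hand-written kernel that responds to dark-to-bright vertical edges.
vertical_line_kernel = torch.tensor([
    [-1., 0., 1.],
    [-1., 0., 1.],
    [-1., 0., 1.],
])

# F.conv2d expects input shaped (batch, channels, H, W)
# and weights shaped (out_channels, in_channels, kH, kW).
feature_map = F.conv2d(image[None, None], vertical_line_kernel[None, None])
print(feature_map.squeeze())
```

The feature map is large where the kernel straddles the edge and zero over the uniform bright area, which is how a single filter highlights one kind of pattern.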

Conv2d: output channels

[Figure: input channel, kernel filters, and output channels]

The number of output channels determines how many filters are applied
Each output channel corresponds to a distinct filter
More output channels let the layer learn a wider variety of features, at the cost of more parameters
Output channel counts are commonly chosen as powers of 2 (16, 32, 64, 128)
- This simplifies combining and dividing channels in subsequent layers
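
The relationship between output channels and filters can be checked by inspecting a layer's parameters. This is a small sketch; the 32x32 input size is an assumption.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)

# One 3x3 filter per (output channel, input channel) pair: shape (16, 3, 3, 3).
weight_shape = tuple(conv.weight.shape)
# One bias value per output channel.
bias_shape = tuple(conv.bias.shape)

batch = torch.rand(1, 3, 32, 32)
output_shape = tuple(conv(batch).shape)

# 16 * 3 * 3 * 3 weights + 16 biases = 448 learnable parameters.
num_params = sum(p.numel() for p in conv.parameters())

print(weight_shape, bias_shape, output_shape, num_params)
```

Doubling out_channels doubles the number of filters and therefore the parameter count of this layer.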

Adding convolutional layers

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)

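# Create a second layer whose in_channels match conv1's out_channels
conv2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, padding=1)

model = Net()

# Register the new layer on the model under the name "conv2"
model.add_module('conv2', conv2)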

Accessing convolutional layers

print(model)

Net(
  (conv1): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv2): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)

model.conv2

Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
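
Beyond printing, a registered layer can be inspected programmatically. The sketch below rebuilds the same two-layer model and reads a few attributes; the variable names are illustrative.

```python
import torch.nn as nn


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)


model = Net()
model.add_module("conv2", nn.Conv2d(16, 32, kernel_size=3, padding=1))

# Names of the layers registered directly on the model, in registration order.
layer_names = [name for name, _ in model.named_children()]

# A layer added with add_module is available as an attribute.
conv2_out = model.conv2.out_channels
conv2_weight_shape = tuple(model.conv2.weight.shape)

print(layer_names, conv2_out, conv2_weight_shape)
```

Note that add_module only registers the layer. It takes part in the computation only if the model's forward method calls it.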

Creating convolutional blocks

Stacking convolutional layers in a block with nn.Sequential()

class BinaryImageClassification(nn.Module):
    def __init__(self):
        super(BinaryImageClassification, self).__init__()

        self.conv_block = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )

    def forward(self, x):
        # Pass the input through the convolutional block
        x = self.conv_block(x)
        return x
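
To see what such a block does to the shape of its input, the sketch below runs the same stack of layers on a random batch. The batch size of 4 and the 64x64 resolution are assumptions.

```python
import torch
import torch.nn as nn

# Same layers as the block above, built standalone for a quick shape check.
conv_block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
)

batch = torch.rand(4, 3, 64, 64)  # (batch, channels, height, width)
features = conv_block(batch)
print(features.shape)  # torch.Size([4, 32, 32, 32])
```

With kernel_size=3 and padding=1 the convolutions keep height and width unchanged, while the 2x2 max pooling halves both. The channel count grows from 3 to 32.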

Let's practice!
