Object detection using R-CNN

Deep Learning for Images with PyTorch

Michal Oleszak

Machine Learning Engineer

Region-based CNN family: R-CNN

R-CNN family: R-CNN, Fast R-CNN, Faster R-CNN

R-CNN

  • Module 1: generation of region proposals
  • Module 2: feature extraction (convolutional layers)
  • Module 3: class and bounding box prediction
1 Citation: Jason Brownlee. 2019. Deep Learning for Computer Vision.

R-CNN: backbone

  • Convolutional layers: pre-trained models
    • Backbone: the core CNN architecture responsible for feature extraction

  backbone

  • Convolutional & pooling layers
  • Extract features for region proposals and object detection

R-CNN: backbone with PyTorch

import torch.nn as nn
from torchvision.models import vgg16, VGG16_Weights

# Load VGG16 with its default pre-trained weights
vgg = vgg16(weights=VGG16_Weights.DEFAULT)
# Keep only the convolutional feature extractor as the backbone
backbone = nn.Sequential(*list(vgg.features.children()))

  • .features: only the convolutional and pooling layers of the VGG model
  • .children(): all layers from that block
  • nn.Sequential(*list()): all sub-layers are collected in a list and placed into one sequential block
    • *: unpacks the elements of the list into separate arguments
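
To see what the unpacking pattern above does without downloading VGG16 weights, here is a minimal sketch on a small stand-in network. TinyNet and its layer sizes are hypothetical and are not part of the course material; only the `.features`, `.children()`, and `nn.Sequential(*list(...))` pattern mirrors the slide.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for a pre-trained model (not VGG16)."""

    def __init__(self):
        super().__init__()
        # Convolutional block, analogous to vgg.features
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Fully connected block, analogous to vgg.classifier
        self.classifier = nn.Sequential(
            nn.Linear(8 * 16 * 16, 10),
        )


tiny = TinyNet()

# .children() yields the sub-layers; * unpacks the list into nn.Sequential
layers = list(tiny.features.children())
backbone = nn.Sequential(*layers)

# The first classifier layer tells us the flattened backbone output size
input_dimension = list(tiny.classifier.children())[0].in_features

x = torch.randn(1, 3, 32, 32)                # dummy batch with one 32x32 RGB image
features = backbone(x)                       # feature maps of shape (1, 8, 16, 16)
flat_features = features.flatten(start_dim=1)  # shape (1, 8 * 16 * 16)
```

The flattened feature size matches input_dimension only for the input resolution the original network was designed for, which is why the linear heads in the following slides are sized from the first classifier layer.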

R-CNN: classifier layer

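  • Extract the backbone's output size
# in_features of the first linear layer in VGG's original classifier
# equals the size of the flattened backbone output
input_dimension = nn.Sequential(*list(
    vgg.classifier.children())
)[0].in_features
  • Create a new classifier
# num_classes: the number of object classes to predict
classifier = nn.Sequential(
    nn.Linear(input_dimension, 512),
    nn.ReLU(),
    nn.Linear(512, num_classes),
)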

R-CNN: box regressor layer

  • Sits on top of the backbone
  • 4 outputs for the 4 box coordinates
box_regressor = nn.Sequential(
    nn.Linear(input_dimension, 32),
    nn.ReLU(),
    nn.Linear(32, 4),
)
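
As a quick shape check, the two heads can be applied to a dummy batch of flattened features. This is a sketch under stated assumptions: input_dimension is hard-coded to 25088 (512 * 7 * 7, the in_features of VGG16's first classifier layer), num_classes = 2 is chosen only for illustration, and the features are random numbers rather than real backbone output.

```python
import torch
import torch.nn as nn

input_dimension = 25088  # 512 * 7 * 7, assumed flattened VGG16 backbone output
num_classes = 2          # assumed number of classes, for illustration only

# Classification head: one score per class
classifier = nn.Sequential(
    nn.Linear(input_dimension, 512),
    nn.ReLU(),
    nn.Linear(512, num_classes),
)

# Regression head: 4 outputs for the 4 box coordinates
box_regressor = nn.Sequential(
    nn.Linear(input_dimension, 32),
    nn.ReLU(),
    nn.Linear(32, 4),
)

# Dummy batch of 3 flattened feature vectors standing in for backbone output
features = torch.randn(3, input_dimension)

class_scores = classifier(features)   # shape (3, num_classes)
boxes = box_regressor(features)       # shape (3, 4)
```

Both heads read the same feature vector, so the model predicts a class and a box for each input in a single pass.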

Putting it all together: object detection model

class ObjectDetectorCNN(nn.Module):
    def __init__(self):
        super(ObjectDetectorCNN, self).__init__()
        vgg = vgg16(weights=VGG16_Weights.DEFAULT)
        self.backbone = nn.Sequential(*list(vgg.features.children()))
        input_features = nn.Sequential(*list(vgg.classifier.children()))[0].in_features
        self.classifier = nn.Sequential(
            nn.Linear(input_features, 512), nn.ReLU(), nn.Linear(512, 2),
        )
        self.box_regressor = nn.Sequential(
            nn.Linear(input_features, 32), nn.ReLU(), nn.Linear(32, 4),
        )

Putting it all together: object detection model

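class ObjectDetectorCNN(nn.Module):
    (...)

    def forward(self, x):
        features = self.backbone(x)
        # Flatten (N, C, H, W) feature maps to (N, C*H*W) for the linear heads;
        # this assumes 224x224 inputs so the size matches input_features
        features = features.flatten(start_dim=1)
        bboxes = self.box_regressor(features)
        classes = self.classifier(features)
        return bboxes, classes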

Running object recognition

  1. Load and transform the image
  2. unsqueeze() the image to add the batch dimension
  3. Pass the image tensor to the model
  4. Run Non-Max Suppression (nms()) over the model's output
  5. draw_bounding_boxes() on top of the image
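
To make step 4 concrete, here is a plain-Python sketch of the idea behind Non-Max Suppression: keep the highest-scoring box and drop any lower-scoring box that overlaps an already kept box too much. The helper names iou and simple_nms, the (x1, y1, x2, y2) box format, and the sample boxes are assumptions for illustration; torchvision's own nms() works on tensors and is not reimplemented here.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    # Width and height are clamped at zero when the boxes do not overlap
    intersection = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def simple_nms(boxes, scores, iou_threshold):
    """Return indices of kept boxes, ordered from highest to lowest score."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any kept box above the threshold
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two heavily overlapping boxes and one separate box (hypothetical values)
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]

kept = simple_nms(boxes, scores, iou_threshold=0.5)
```

Here the second box overlaps the first with an IoU of about 0.82, so it is suppressed and only the first and third boxes are kept for drawing.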

Let's practice!
