Bounding boxes

Deep Learning for Images with PyTorch

Michal Oleszak

Machine Learning Engineer

What is object recognition?

Object recognition identifies objects in images:

Location of each object in an image (bounding box)
Class label of each object

Applications: surveillance, medical diagnosis, traffic management, sports analytics

In this video: annotation with bounding boxes
In later videos: evaluation and models

driving car detection

Bounding box representation

A rectangular box describing the object's spatial location
Training data annotations & model outputs
Ground truth bounding box: precise object location
Bounding box coordinates:
- Top left and bottom right
- Bounding box = (x1, y1, x2, y2)
- x1 = x_min, x2 = x_max, ...
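As a minimal sketch of this representation, the snippet below stores one box as a tensor and derives its width, height, and area from the two corners. The coordinate values are made up for illustration and do not come from any image in this course.

```python
import torch

# Hypothetical box in (x_min, y_min, x_max, y_max) pixel coordinates
bbox = torch.tensor([30, 40, 130, 200])

x_min, y_min, x_max, y_max = bbox.tolist()

# The top-left and bottom-right corners fully determine the box size
width = x_max - x_min
height = y_max - y_min
area = width * height

print(width, height, area)
```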

bounding box coordinates

Pixels and coordinates

box coordinates

Coordinates: x - the column number, y - the row number
Origin: (0, 0) - the top left corner
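One practical consequence, sketched below under the assumption of a channels-first image tensor: because y is the row and x is the column, a pixel at (x, y) is indexed as [channel, y, x]. The tiny image and the box values are hypothetical.

```python
import torch

# Hypothetical image tensor: 3 channels, 4 rows (height), 6 columns (width)
image_tensor = torch.zeros(3, 4, 6, dtype=torch.uint8)

# Mark the pixel at x=5 (column), y=1 (row); note the [channel, row, column] order
x, y = 5, 1
image_tensor[:, y, x] = 255

# Crop the region covered by a box given as (x_min, y_min, x_max, y_max)
x_min, y_min, x_max, y_max = 2, 0, 6, 2
crop = image_tensor[:, y_min:y_max, x_min:x_max]

print(crop.shape)  # rows come from the y range, columns from the x range
```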

Converting pixels to tensors

Transforming with ToTensor()

Tensor type:
- torch.float
Scaled tensor range:
- [0.0, 1.0]

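import torchvision.transforms as transforms

# image is assumed to be a PIL image loaded earlier
transform = transforms.Compose([
    transforms.Resize(224),   # resize the shorter side to 224 pixels
    transforms.ToTensor(),    # float tensor scaled to [0.0, 1.0]
])
image_tensor = transform(image)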

Transforming with PILToTensor()

Tensor type:
- torch.uint8 (8-bit integer)
Unscaled tensor range:
- [0, 255]

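import torchvision.transforms as transforms

# image is assumed to be a PIL image loaded earlier
transform = transforms.Compose([
    transforms.Resize(224),      # resize the shorter side to 224 pixels
    transforms.PILToTensor(),    # uint8 tensor with values in [0, 255]
])
image_tensor = transform(image)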

Drawing the bounding box

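import matplotlib.pyplot as plt
import torch
import torchvision.transforms as transforms
from torchvision.utils import draw_bounding_boxes

# Collect the coordinates into a tensor and add a dimension: shape (1, 4)
bbox = torch.tensor([x_min, y_min, x_max, y_max])
bbox = bbox.unsqueeze(0)

# image_tensor is an image tensor of shape (C, H, W), e.g. from PILToTensor()
bbox_image = draw_bounding_boxes(
    image_tensor, bbox, width=3, colors="red"
)

# Convert the tensor back to a PIL image and plot it
transform = transforms.Compose([
    transforms.ToPILImage()
])
pil_image = transform(bbox_image)

plt.imshow(pil_image)
plt.show()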

Import draw_bounding_boxes
Collect coordinates into a tensor
Unsqueeze to two dimensions
Transform to image and plot

cat with box

Let's practice!
