Bounding boxes

Deep Learning for Images with PyTorch

Michal Oleszak

Machine Learning Engineer

What is object recognition?

Object recognition identifies objects in images:

  • Location of each object in an image (bounding box)

  • Class label of each object

Applications: surveillance, medical diagnosis, traffic management, sports analytics

  • In this video: annotation with bounding boxes
  • In later videos: evaluation and models

 

driving car detection

Bounding box representation

  • A rectangular box describing the object's spatial location
  • Training data annotations & model outputs
  • Ground truth bounding box: precise object location
  • Bounding box coordinates:
    • Top left and bottom right
    • Bounding box = (x1, y1, x2, y2)
    • x1 = x_min, x2 = x_max, ...
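
As a small illustration of the (x1, y1, x2, y2) format, the sketch below unpacks one box and derives its width, height and area. The coordinate values are made up for the example and do not come from the slides.

```python
# Hypothetical example box in (x1, y1, x2, y2) format; the numbers are made up.
bbox = (30, 20, 110, 80)

# Top-left corner is (x_min, y_min), bottom-right corner is (x_max, y_max).
x_min, y_min, x_max, y_max = bbox

width = x_max - x_min    # horizontal extent in pixels
height = y_max - y_min   # vertical extent in pixels
area = width * height

print(width, height, area)
```

Storing only two corners is enough because the box is axis-aligned: the other two corners follow from them.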

bounding box coordinates

Pixels and coordinates

box coordinates

  • Coordinates: x - the column number, y - the row number
  • Origin: (0, 0) - the top left corner
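
A minimal sketch of this convention, using a tiny made-up image stored as nested lists: because y is the row and x is the column, a pixel is read as `img[y][x]`. The helper names `pixel_at` and `crop` are hypothetical, and treating `x_max` and `y_max` as exclusive slice ends is an assumption of this sketch.

```python
# A tiny made-up grayscale "image": 4 rows (height) by 6 columns (width).
image = [
    [0, 1, 2, 3, 4, 5],
    [10, 11, 12, 13, 14, 15],
    [20, 21, 22, 23, 24, 25],
    [30, 31, 32, 33, 34, 35],
]


def pixel_at(img, x, y):
    """Return the pixel at column x, row y (origin at the top left)."""
    return img[y][x]


def crop(img, x_min, y_min, x_max, y_max):
    """Keep rows y_min..y_max-1 and columns x_min..x_max-1 (exclusive ends)."""
    return [row[x_min:x_max] for row in img[y_min:y_max]]


top_left = pixel_at(image, 0, 0)   # the origin
inner = pixel_at(image, 4, 1)      # column 4, row 1
patch = crop(image, 1, 1, 4, 3)    # region inside a (1, 1, 4, 3) box

print(top_left, inner, patch)
```

Note that the row index comes first when indexing, even though coordinates are written as (x, y).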

Converting pixels to tensors

Transforming with ToTensor()

  • Tensor type:
    • torch.float
  • Scaled tensor range:
    • [0.0, 1.0]
import torchvision.transforms as transforms
transform = transforms.Compose([
            transforms.Resize(224),
            transforms.ToTensor()
            ])
image_tensor = transform(image)

Transforming with PILToTensor()

  • Tensor type:
    • torch.uint8 (8-bit integer)
  • Unscaled tensor range:
    • [0, 255]
import torchvision.transforms as transforms
transform = transforms.Compose([
            transforms.Resize(224),
            transforms.PILToTensor()
            ])
image_tensor = transform(image)

Drawing the bounding box

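import torch
import torchvision.transforms as transforms
from torchvision.utils import draw_bounding_boxes

# Collect the coordinates into a tensor, then add a leading dimension
bbox = torch.tensor([x_min, y_min, x_max, y_max])
bbox = bbox.unsqueeze(0)

# Draw the box on the image tensor
bbox_image = draw_bounding_boxes(
    image_tensor, bbox, width=3, colors="red"
)

# Convert back to a PIL image and plot it
transform = transforms.Compose([
    transforms.ToPILImage()
])
pil_image = transform(bbox_image)

import matplotlib.pyplot as plt
plt.imshow(pil_image)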
  • Import draw_bounding_boxes
  • Collect coordinates into a tensor
  • Unsqueeze to two dimensions
  • Transform to image and plot

cat with box

Let's practice!