Evaluating object recognition models

Deep Learning for Images with PyTorch

Michal Oleszak

Machine Learning Engineer

Classification and localization

Output 1: Classification (e.g., cat)

Output 2: Bounding box regression [x1, y1, x2, y2]

Intersection over union (IoU)

Object of interest: object in image we want to detect (e.g., dog)
Ground truth box: the accurate bounding box around the object of interest
Intersection over Union: a metric to measure the overlap between two boxes

IoU = Area of Intersection / Area of Union (also known as the Jaccard index)
- IoU = 0: no overlap; IoU = 1: perfect overlap
- IoU > 0.5 is commonly treated as a good prediction
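To make the formula concrete, here is a minimal sketch of IoU computed by hand for two boxes in (x1, y1, x2, y2) format. The function name `iou` is only illustrative and is not part of any library.

```python
def iou(box_a, box_b):
    """Return the IoU of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Width and height are clamped at 0 so disjoint boxes give no overlap
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


# Intersection is 50 * 50 = 2500, union is 10000 + 10000 - 2500 = 17500
print(round(iou([50, 50, 150, 150], [100, 100, 200, 200]), 4))  # 0.1429
```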

IoU in PyTorch

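import torch
from torchvision.ops import box_iou

# Two boxes in (x1, y1, x2, y2) format
bbox1 = [50, 50, 150, 150]
bbox2 = [100, 100, 200, 200]

# box_iou expects 2-D tensors of shape [N, 4], so add a leading dimension
bbox1 = torch.tensor(bbox1).unsqueeze(0)
bbox2 = torch.tensor(bbox2).unsqueeze(0)

# Pairwise IoU matrix, here of shape [1, 1]
iou = box_iou(bbox1, bbox2)
print(iou)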

tensor([[0.1429]])

Two boxes, each given as (x1, y1, x2, y2)

Convert each box to a 2-D tensor of shape [1, 4] with unsqueeze(0)
Calculate the IoU with box_iou, which returns a matrix of pairwise IoUs

Predicting bounding boxes

model.eval()  # switch the model to inference mode
with torch.no_grad():  # gradients are not needed for prediction
    output = model(input_image)

print(output)

[{'boxes': tensor([[ 42.8553, 271.9481, 180.6003, 346.7082],
                  [191.6016,  80.4759, 247.8009, 387.5475], ....),
'scores': tensor([1.0000, 1.0000, 0.9998, ... ]),
'labels': tensor([18,  1, 20, 18, 18, 18 ...])
}]

boxes = output[0]["boxes"]

scores = output[0]["scores"]
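Once boxes and scores are extracted, a common first step is to drop low-confidence detections. The sketch below builds a hypothetical `output` with the same structure as the one printed above; the values and the 0.5 `score_threshold` are assumptions chosen for illustration.

```python
import torch

# Hypothetical detection output with the same keys as the model output above
output = [{
    "boxes": torch.tensor([[10., 10., 50., 50.],
                           [12., 12., 52., 52.],
                           [100., 100., 150., 150.]]),
    "scores": torch.tensor([0.98, 0.40, 0.75]),
    "labels": torch.tensor([18, 18, 1]),
}]

boxes = output[0]["boxes"]
scores = output[0]["scores"]
labels = output[0]["labels"]

# Assumed cut-off; tune it for the task at hand
score_threshold = 0.5
keep = scores > score_threshold  # boolean mask of shape [N]

confident_boxes = boxes[keep]
confident_labels = labels[keep]
print(confident_boxes.shape)  # torch.Size([2, 4])
```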

Non-max suppression (NMS)

Non-max suppression: a common technique to select the most relevant bounding boxes

Non-max: among overlapping boxes, keep only the one with the highest confidence score
Suppression: discard lower-scoring boxes whose IoU with a kept box exceeds a threshold
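The procedure can be sketched in plain Python as a greedy loop. This is a simplified illustration rather than the torchvision implementation; the names `iou` and `greedy_nms` and the example boxes are hypothetical.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold):
    """Return indices of kept boxes, highest score first."""
    # Visit boxes from the most to the least confident
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [[10, 10, 50, 50], [12, 12, 52, 52], [100, 100, 150, 150]]
scores = [0.9, 0.8, 0.7]

# Box 1 overlaps box 0 with IoU of about 0.82, so it is suppressed
print(greedy_nms(boxes, scores, iou_threshold=0.5))  # [0, 2]
```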

Non-max suppression in PyTorch

from torchvision.ops import nms


box_indices = nms(
    boxes=boxes,
    scores=scores,
    iou_threshold=0.5,
)
print(box_indices)

tensor([ 0,   1,   2,   8])

filtered_boxes = boxes[box_indices]

boxes: tensor of bounding box coordinates with shape [N, 4]
scores: tensor of confidence scores, one per box, with shape [N]
iou_threshold: value between 0.0 and 1.0; a box is discarded if its IoU with a higher-scoring kept box exceeds it
Output: indices of the boxes that were kept, sorted by decreasing score

Let's practice!
