Handling images with PyTorch

Intermediate Deep Learning with PyTorch

Michal Oleszak

Machine Learning Engineer

Clouds dataset

Samples from the clouds dataset: five images showing different types of clouds.

¹ https://www.kaggle.com/competitions/cloud-type-classification2/data

What is an image?

A cloud image with a part of it zoomed-in so that pixels are visible.

An image consists of pixels ("picture elements")
Each pixel contains color information
Grayscale images: one integer per pixel, from 0 (black) to 255 (white)
- 30:

Gray colored box

Color images: three integers, one for each color channel (Red, Green, Blue)
- RGB = (52, 171, 235):

Blue colored box
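
As a minimal sketch of the two pixel representations above, the snippet below stores the example values as plain Python integers and rescales them to the 0 to 1 range. The helper name `to_unit_range` is hypothetical and used only for illustration; the division by 255 mirrors the scaling that `transforms.ToTensor()` applies to 8-bit images.

```python
# Grayscale pixel: a single intensity between 0 (black) and 255 (white).
gray_pixel = 30

# Color pixel: one intensity per channel, in (Red, Green, Blue) order.
rgb_pixel = (52, 171, 235)


def to_unit_range(value):
    """Rescale an 8-bit intensity (0-255) to a float in [0.0, 1.0].

    Hypothetical helper for illustration only.
    """
    return value / 255


gray_scaled = to_unit_range(gray_pixel)
rgb_scaled = tuple(to_unit_range(channel) for channel in rgb_pixel)

print(round(gray_scaled, 3))
print([round(channel, 3) for channel in rgb_scaled])
```

A full image is a grid of such pixels, which is why a color image becomes a tensor with three channels once it is loaded.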

Loading images to PyTorch

Desired directory structure:

clouds_train

  - cumulus

    - 75cbf18.jpg
    - ...

  - cumulonimbus
  - ...

clouds_test

  - cumulus
  - cumulonimbus
  - ...

Main folders: clouds_train and clouds_test
Inside each main folder: one folder per class (cloud type)
Inside each class folder: image files
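
The layout above can be sketched with the standard library alone. This example builds the folders in a temporary directory and derives integer labels from the alphabetically sorted folder names, which is how `ImageFolder` assigns class indices. The `.jpg` file is an empty placeholder rather than a real image, and the variable names are assumptions made for this sketch.

```python
import tempfile
from pathlib import Path

# Build the layout in a throwaway directory; the .jpg file is an empty
# placeholder, not a real image.
root = Path(tempfile.mkdtemp())
for split in ("clouds_train", "clouds_test"):
    for class_name in ("cumulus", "cumulonimbus"):
        (root / split / class_name).mkdir(parents=True)
(root / "clouds_train" / "cumulus" / "75cbf18.jpg").touch()

# Folder names become class names; sorting them alphabetically and
# enumerating gives each class an integer label.
class_names = sorted(
    p.name for p in (root / "clouds_train").iterdir() if p.is_dir()
)
class_to_idx = {name: idx for idx, name in enumerate(class_names)}
print(class_to_idx)
```

Because the labels come from the folder names, the train and test folders should contain the same set of class folders so that each class maps to the same index in both.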

Loading images to PyTorch

from torchvision.datasets import ImageFolder
from torchvision import transforms


train_transforms = transforms.Compose([
    transforms.ToTensor(),
    transforms.Resize((128, 128)),
])


dataset_train = ImageFolder(
    "data/clouds_train",
    transform=train_transforms,
)

Define transformations:
- Convert image to tensor
- Resize to 128x128
Create dataset passing:
- Path to data
- Predefined transformations

Displaying images

from torch.utils.data import DataLoader

dataloader_train = DataLoader(
    dataset_train,
    shuffle=True,
    batch_size=1,
)

image, label = next(iter(dataloader_train))
print(image.shape)

torch.Size([1, 3, 128, 128])

image = image.squeeze().permute(1, 2, 0)
print(image.shape)

torch.Size([128, 128, 3])

import matplotlib.pyplot as plt
plt.imshow(image)
plt.show()

Cloud image output of plt.show
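
The shape changes above can be reproduced without the dataset. The sketch below uses a random tensor as a stand-in for one batch (an assumption made so that the example is self-contained) and uses `squeeze(0)` rather than a bare `squeeze()` so that only the batch dimension is removed.

```python
import torch

# Stand-in for one batch from the DataLoader: (batch, channels, height, width).
batch = torch.rand(1, 3, 128, 128)

# squeeze(0) drops only the batch dimension; a bare squeeze() would also
# drop any other dimension of size 1 (for example a single grayscale channel).
image_chw = batch.squeeze(0)

# Matplotlib expects (height, width, channels), so move channels last.
image_hwc = image_chw.permute(1, 2, 0)

print(image_chw.shape)
print(image_hwc.shape)
```

PyTorch stores images channels-first, while `plt.imshow` expects channels-last, which is why the `permute(1, 2, 0)` step is needed before plotting.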

Data augmentation

train_transforms = transforms.Compose([

    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(45),

    transforms.ToTensor(),
    transforms.Resize((128, 128)),
])


dataset_train = ImageFolder(
    "data/clouds_train",
    transform=train_transforms,
)

Data augmentation: generating more training data by applying random transformations to the original images

Increase the size and diversity of the training set
Improve model robustness
Reduce overfitting

Three images of clouds showing the rotation transformation

Let's practice!
