A deeper dive into loading data

Introduction to Deep Learning with PyTorch

Jasmin Ludolf

Senior Data Science Content Developer, DataCamp

Our animals dataset

import pandas as pd
animals = pd.read_csv('animal_dataset.csv')

animal_name	hair	feathers	eggs	milk	predator	legs	tail	type
sparrow	0	1	1	0	0	2	1	0
eagle	0	1	1	0	1	2	1	0
cat	1	0	0	1	1	4	1	1
dog	1	0	0	1	0	4	1	1
lizard	0	0	1	0	1	4	1	2

Type categories: bird (0), mammal (1), reptile (2)

Our animals dataset: defining features

# Define input features: every column except the name (first) and the type (last)
features = animals.iloc[:, 1:-1]

# Convert the DataFrame to a NumPy array
X = features.to_numpy()
print(X)

[[0 1 1 0 0 2 1]
 [0 1 1 0 1 2 1]
 [1 0 0 1 1 4 1]
 [1 0 0 1 0 4 1]
 [0 0 1 0 1 4 1]]

Back to our animals dataset: defining target values

# Define target values (ground truth)
target = animals.iloc[:, -1]
y = target.to_numpy()
print(y)

[0 0 1 1 2]
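Before building tensors, it can help to confirm that features and targets line up row by row. The sketch below is self-contained: instead of reading animal_dataset.csv, it rebuilds the five rows shown above as an inline DataFrame (a stand-in used only for illustration).

```python
import pandas as pd

# Stand-in for animal_dataset.csv: the five rows shown on the slide
animals = pd.DataFrame(
    {
        "animal_name": ["sparrow", "eagle", "cat", "dog", "lizard"],
        "hair": [0, 0, 1, 1, 0],
        "feathers": [1, 1, 0, 0, 0],
        "eggs": [1, 1, 0, 0, 1],
        "milk": [0, 0, 1, 1, 0],
        "predator": [0, 1, 1, 0, 1],
        "legs": [2, 2, 4, 4, 4],
        "tail": [1, 1, 1, 1, 1],
        "type": [0, 0, 1, 1, 2],
    }
)

# Same slicing as on the slides: drop the name column, split off the target
X = animals.iloc[:, 1:-1].to_numpy()
y = animals.iloc[:, -1].to_numpy()

print(X.shape)  # (5, 7): 5 samples, 7 features
print(y.shape)  # (5,): one label per sample
```

The first dimension of X and y must match, because each row of X is paired with the label at the same index in y.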

TensorDataset

import torch
from torch.utils.data import TensorDataset

# Instantiate dataset class
dataset = TensorDataset(torch.tensor(X), torch.tensor(y))

# Access an individual sample
input_sample, label_sample = dataset[0]
print('input sample:', input_sample)
print('label sample:', label_sample)

input sample: tensor([0, 1, 1, 0, 0, 2, 1])
label sample: tensor(0)
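The tensors above keep the integer dtype of the NumPy arrays, which is fine for inspecting samples. For training, a common convention (an assumption here, not something the slides show) is float32 features and int64 class labels: layers such as nn.Linear use float32 weights by default, and nn.CrossEntropyLoss expects integer class indices as targets. A minimal sketch, with the arrays retyped inline so it runs on its own:

```python
import numpy as np
import torch
from torch.utils.data import TensorDataset

# The feature matrix and targets printed on the earlier slides
X = np.array(
    [
        [0, 1, 1, 0, 0, 2, 1],
        [0, 1, 1, 0, 1, 2, 1],
        [1, 0, 0, 1, 1, 4, 1],
        [1, 0, 0, 1, 0, 4, 1],
        [0, 0, 1, 0, 1, 4, 1],
    ]
)
y = np.array([0, 0, 1, 1, 2])

# Assumed convention: float32 inputs, int64 (long) class labels
features = torch.tensor(X, dtype=torch.float32)
labels = torch.tensor(y, dtype=torch.long)
dataset = TensorDataset(features, labels)

print(len(dataset))  # 5

input_sample, label_sample = dataset[0]
print(input_sample.dtype)  # torch.float32
print(label_sample.dtype)  # torch.int64
```

len(dataset) returns the number of samples, and indexing returns one (input, label) pair.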

DataLoader

from torch.utils.data import DataLoader

batch_size = 2  # number of samples per batch
shuffle = True  # reshuffle the data at every epoch

# Create a DataLoader
dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=shuffle)

Epoch: one full pass through the training dataloader
Generalization: the model performs well on unseen data; shuffling the training data each epoch can help with this
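To make "epoch" concrete, the sketch below runs two passes over a dataloader and counts batches and samples per pass. The float dtype, num_epochs, and the seeded torch.Generator (used only to make the shuffle order reproducible) are illustrative choices, not part of the slides.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# The five samples from the animals dataset
X = torch.tensor(
    [
        [0, 1, 1, 0, 0, 2, 1],
        [0, 1, 1, 0, 1, 2, 1],
        [1, 0, 0, 1, 1, 4, 1],
        [1, 0, 0, 1, 0, 4, 1],
        [0, 0, 1, 0, 1, 4, 1],
    ],
    dtype=torch.float32,
)
y = torch.tensor([0, 0, 1, 1, 2])
dataset = TensorDataset(X, y)

# Optional: a seeded generator makes the shuffle order reproducible
generator = torch.Generator().manual_seed(0)
dataloader = DataLoader(dataset, batch_size=2, shuffle=True, generator=generator)

num_epochs = 2  # illustrative value
for epoch in range(num_epochs):
    samples_seen = 0
    for batch_inputs, batch_labels in dataloader:
        samples_seen += batch_inputs.shape[0]
    # Each epoch visits every sample exactly once
    print(f"epoch {epoch}: {len(dataloader)} batches, {samples_seen} samples")
```

With 5 samples and a batch size of 2, len(dataloader) is 3: two full batches plus one batch holding the remaining sample.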

DataLoader

# Iterate over the dataloader
for batch_inputs, batch_labels in dataloader:
    print('batch_inputs:', batch_inputs)
    print('batch_labels:', batch_labels)

batch_inputs: tensor([[1, 0, 0, 1, 1, 4, 1],
        [1, 0, 0, 1, 0, 4, 1]])
batch_labels: tensor([1, 1])

batch_inputs: tensor([[0, 1, 1, 0, 1, 2, 1],
        [0, 0, 1, 0, 1, 4, 1]])
batch_labels: tensor([0, 2])

batch_inputs: tensor([[0, 1, 1, 0, 0, 2, 1]])
batch_labels: tensor([0])
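The last batch above holds a single sample because 5 is not divisible by the batch size of 2. If equally sized batches are needed, DataLoader accepts drop_last=True, which discards the incomplete final batch. Whether to drop it is a design choice that the slides do not cover; this sketch only compares the two behaviours.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# The five samples from the animals dataset
X = torch.tensor(
    [
        [0, 1, 1, 0, 0, 2, 1],
        [0, 1, 1, 0, 1, 2, 1],
        [1, 0, 0, 1, 1, 4, 1],
        [1, 0, 0, 1, 0, 4, 1],
        [0, 0, 1, 0, 1, 4, 1],
    ]
)
y = torch.tensor([0, 0, 1, 1, 2])
dataset = TensorDataset(X, y)

# Default behaviour: the incomplete final batch is kept
keep_last = DataLoader(dataset, batch_size=2, shuffle=True)
sizes_keep = [batch_labels.shape[0] for _, batch_labels in keep_last]

# drop_last=True: the incomplete final batch is discarded
drop_last = DataLoader(dataset, batch_size=2, shuffle=True, drop_last=True)
sizes_drop = [batch_labels.shape[0] for _, batch_labels in drop_last]

print(sizes_keep)  # [2, 2, 1]
print(sizes_drop)  # [2, 2]
```

With drop_last=True and shuffling enabled, a different sample is left out in each epoch, so every sample is still seen over the course of training.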

Let's practice!
