Loading data in PyTorch

Introduction to Deep Learning with PyTorch

Maham Faisal Khan

Senior Data Scientist

Introducing the PyTorch Dataset class

  • We need a way to load our dataset
  • PyTorch Dataset class provides an interface to store and manipulate data
  • PyTorch Dataset allows us to decouple data preprocessing from training, leading to better readability and modularity

The TensorDataset class

import numpy as np
import torch
from torch.utils.data import TensorDataset

data = np.array([[1, 2, 3], [4, 5, 6]])
tensor = torch.tensor(data)

dataset = TensorDataset(tensor)
dataset[0]
(tensor([1, 2, 3]),)

TensorDataset:

  • takes any number of PyTorch tensors as inputs
  • outputs a tuple that contains the indexed element of each input tensor
  • useful when the data is loaded as a NumPy array, which must first be converted to a tensor

df = pd.read_csv(...)
df_numpy = df.to_numpy()
dataset = TensorDataset(torch.tensor(df_numpy))
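
Because TensorDataset accepts any number of tensors, features and labels can be kept in separate tensors and indexed together. The following is a minimal sketch with made-up toy values; the names `features`, `labels`, `sample_features` and `sample_label` are illustrative and not part of the slides.

```python
import numpy as np
import torch
from torch.utils.data import TensorDataset

# Hypothetical toy data: 4 samples with 3 features each, plus one label per sample
features = torch.tensor(
    np.array([[0.1, 0.2, 0.3],
              [0.4, 0.5, 0.6],
              [0.7, 0.8, 0.9],
              [1.0, 0.0, 0.5]]),
    dtype=torch.float32,
)
labels = torch.tensor([0.0, 1.0, 1.0, 0.0])

# All tensors must have the same size in the first dimension (number of samples)
dataset = TensorDataset(features, labels)

# Indexing returns a tuple with one element per input tensor
sample_features, sample_label = dataset[1]
print(len(dataset))           # 4
print(sample_features.shape)  # torch.Size([3])
print(sample_label)           # tensor(1.)
```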

Creating a custom dataset

  • Nine features and one target (potability)

[Figure: a sample of the water potability dataset]

  • The potability (suitability for drinking) is either zero (non-potable) or one (potable)
  • The dataset has been normalized: each feature is bounded between zero and one
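
The slides do not say how the dataset was normalized. One common way to bound each feature between zero and one is min-max scaling, sketched below on made-up values; the column names and numbers are placeholders, not taken from the real dataset.

```python
import pandas as pd

# Hypothetical raw measurements; column names and values are placeholders
raw = pd.DataFrame({
    "ph": [6.5, 7.0, 8.5],
    "hardness": [150.0, 200.0, 250.0],
})

# Min-max scaling (an assumption about the preprocessing, not confirmed by the slides):
# subtract each column's minimum and divide by its range
normalized = (raw - raw.min()) / (raw.max() - raw.min())
print(normalized)
```

After scaling, every column has a minimum of 0 and a maximum of 1, which matches the property stated above.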

Create a custom dataset

import pandas as pd
from torch.utils.data import Dataset

class WaterDataset(Dataset):
    def __init__(self, csv_path):
        super().__init__()
        # Load the CSV file, using its first column as the index
        df = pd.read_csv(csv_path, index_col=0)
        self.data = df.to_numpy()

    def __len__(self):
        # Return the number of samples in our dataset
        return self.data.shape[0]

    def __getitem__(self, index):
        # Return features (array of 9 values) and label (single float) at the given index
        # Assumption: the target (potability) is stored in the last column
        features = self.data[index, :-1]
        label = self.data[index, -1]
        return features, label
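
To check that a custom dataset of this kind behaves as expected, it can be pointed at a small CSV file. The sketch below writes a made-up three-row file to a temporary directory and reads it back; the file name, column names and values are hypothetical, and the class assumes the target sits in the last column.

```python
import os
import tempfile

import pandas as pd
from torch.utils.data import Dataset


class WaterDataset(Dataset):
    def __init__(self, csv_path):
        super().__init__()
        df = pd.read_csv(csv_path, index_col=0)
        self.data = df.to_numpy()

    def __len__(self):
        return self.data.shape[0]

    def __getitem__(self, index):
        # Assumption: the target is stored in the last column
        features = self.data[index, :-1]
        label = self.data[index, -1]
        return features, label


# Hypothetical stand-in for the real file: 3 rows, 2 features, 1 target
toy = pd.DataFrame({
    "f1": [0.1, 0.5, 0.9],
    "f2": [0.2, 0.4, 0.6],
    "Potability": [0.0, 1.0, 0.0],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, "toy_water.csv")
    # to_csv writes the index as the first column, which index_col=0 reads back
    toy.to_csv(toy_path)
    dataset = WaterDataset(toy_path)

features, label = dataset[1]
print(len(dataset))  # 3
print(features)      # [0.5 0.4]
print(label)         # 1.0
```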

PyTorch dataloader

from torch.utils.data import DataLoader

dataset = WaterDataset(csv_path)
dataloader = DataLoader(dataset, batch_size=2, shuffle=True)
  • PyTorch DataLoader:
    • takes a PyTorch Dataset as input
    • returns an iterable
    • allows batching (returns multiple samples at once)
    • allows shuffling (returns the data in random order)
  • The batches produced by the DataLoader are the inputs to the neural network
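
The effect of batching can be seen by inspecting the shapes the DataLoader yields. The sketch below uses a TensorDataset of random values in place of the water data; the five-sample size and the variable names are illustrative assumptions.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy data: 5 samples with 9 features each, plus one label per sample
features = torch.rand(5, 9)
labels = torch.tensor([0.0, 1.0, 1.0, 0.0, 1.0])
dataset = TensorDataset(features, labels)

# shuffle=True draws the samples in a new random order on every pass
dataloader = DataLoader(dataset, batch_size=2, shuffle=True)

batch_shapes = []
for batch_features, batch_labels in dataloader:
    batch_shapes.append((tuple(batch_features.shape), tuple(batch_labels.shape)))

print(len(dataloader))  # 3
print(batch_shapes)     # [((2, 9), (2,)), ((2, 9), (2,)), ((1, 9), (1,))]
```

With five samples and a batch size of two, the last batch holds the single leftover sample. Passing `drop_last=True` to DataLoader would discard that incomplete batch instead.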

PyTorch dataloader

# Create an iterator (kept under its own name so the dataloader can be reused)
data_iter = iter(dataloader)
# Get the next batch
features, labels = next(data_iter)
# Run a forward pass
predictions = model(features)

# Or loop through the dataloader directly
for data in dataloader:
    # Extract features and labels
    features, labels = data
    # Run a forward pass
    predictions = model(features)
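
The slides do not define `model`. The sketch below fills that gap with a hypothetical one-layer network and an in-memory stand-in for WaterDataset, so that the loop can be run end to end. It also shows one detail that may come up in practice: a dataset backed by a float64 NumPy array yields float64 batches, while `nn.Linear` uses float32 weights by default, so the features are cast before the forward pass.

```python
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Dataset


class ArrayDataset(Dataset):
    """Hypothetical stand-in for WaterDataset, backed by an in-memory array."""

    def __init__(self, data):
        super().__init__()
        self.data = data

    def __len__(self):
        return self.data.shape[0]

    def __getitem__(self, index):
        # Assumption: the target is stored in the last column
        return self.data[index, :-1], self.data[index, -1]


# Made-up data: 6 samples, 9 features and 1 target column, all float64
rng = np.random.default_rng(0)
data = rng.random((6, 10))

dataloader = DataLoader(ArrayDataset(data), batch_size=2, shuffle=True)

# Hypothetical model: 9 inputs mapped to one probability-like output
model = nn.Sequential(nn.Linear(9, 1), nn.Sigmoid())

for features, labels in dataloader:
    # Batches built from float64 NumPy arrays are float64 tensors;
    # cast to float32 to match the default dtype of the model's weights
    predictions = model(features.float())

print(features.dtype)     # torch.float64
print(predictions.shape)  # torch.Size([2, 1])
```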

Let's practice!
