Gradient accumulation

Efficient AI Model Training with PyTorch

Dennis Lee

Data Engineer

Efficient training

Flowchart illustrating the course topics: data preparation, distributed training, efficient training, and optimizers.

Improving training efficiency

Icons representing memory efficiency, communication efficiency, and computational efficiency.

Gradient accumulation improves memory efficiency

The problem with large batch sizes

  • Large batch sizes: more robust gradient estimates, so the model learns faster
  • GPU memory limits the maximum batch size that fits on a device

Diagram showing that large batch sizes can result in out-of-memory errors, while smaller batch sizes let training complete successfully.

How does gradient accumulation work?

Diagram showing a large batch split into smaller batches.

  • Gradient accumulation: sum gradients over several smaller batches
  • Effectively trains the model on one large batch
  • Model parameters are updated only after the gradients are summed

Diagram depicting gradient accumulation as the sum of gradients from multiple batches.
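
To make this concrete, here is a minimal, self-contained sketch (using made-up tensors, not code from the course) showing that summing gradients from two micro-batches, each scaled by 1/2, matches the gradient of one full batch under a mean loss:

import torch

torch.manual_seed(0)
w = torch.randn(4, requires_grad=True)  # toy parameter vector
x = torch.randn(8, 4)                   # one "large" batch of 8 examples
y = torch.randn(8)

# Gradient from the full batch
loss = ((x @ w - y) ** 2).mean()
loss.backward()
full_grad = w.grad.clone()

# Accumulated gradient over two micro-batches of 4
w.grad = None
for xb, yb in ((x[:4], y[:4]), (x[4:], y[4:])):
    micro_loss = ((xb @ w - yb) ** 2).mean() / 2  # scale by 1 / num_steps
    micro_loss.backward()                          # gradients sum into w.grad

print(torch.allclose(full_grad, w.grad))  # True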


PyTorch, Accelerator, and Trainer

Chart comparing ease of use vs. ability to customize for PyTorch, Accelerator, and Trainer.


Gradient accumulation with PyTorch

# model, optimizer, lr_scheduler, dataloader, device, and
# gradient_accumulation_steps are defined earlier in the course
for index, batch in enumerate(dataloader):
    inputs, targets = (batch["input_ids"],
                       batch["labels"])
    inputs, targets = (inputs.to(device),
                       targets.to(device))
    outputs = model(inputs, labels=targets)
    loss = outputs.loss
    # Scale the loss so the summed gradients average over the window
    loss = loss / gradient_accumulation_steps
    loss.backward()  # gradients accumulate in each parameter's .grad
    # Update parameters only once per accumulation window
    if ((index + 1)
        % gradient_accumulation_steps == 0):
        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()

Diagram depicting gradient accumulation as the sum of gradients from multiple batches.
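
One caveat with this loop, beyond what the slide shows: if the number of batches is not a multiple of gradient_accumulation_steps, the gradients from the trailing batches are never applied. A small variant of the update condition (a sketch, assuming a sized dataloader) also steps on the final batch:

num_batches = len(dataloader)

for index, batch in enumerate(dataloader):
    # ... forward pass, loss scaling, and loss.backward() as above ...
    is_last_batch = (index + 1) == num_batches
    if (index + 1) % gradient_accumulation_steps == 0 or is_last_batch:
        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()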


From PyTorch to Accelerator

Chart comparing ease of use vs. ability to customize for PyTorch, Accelerator, and Trainer.


Gradient accumulation with Accelerator

from accelerate import Accelerator

accelerator = \
    Accelerator(gradient_accumulation_steps=2)

for index, batch in enumerate(dataloader):
    # accumulate() counts batches and only syncs gradients and lets the
    # optimizer step once per accumulation window
    with accelerator.accumulate(model):
        inputs, targets = (batch["input_ids"],
                           batch["labels"])
        outputs = model(inputs,
                        labels=targets)
        loss = outputs.loss
        # backward() also scales the loss by the accumulation steps
        accelerator.backward(loss)
        optimizer.step()    # a no-op on non-sync iterations
        lr_scheduler.step()
        optimizer.zero_grad()

Diagram depicting gradient accumulation as the sum of gradients from multiple batches.
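
This loop leans on setup the slide does not show: the training objects are assumed to have been wrapped with Accelerate's standard prepare() call, sketched below. Once prepared, the optimizer's step() and zero_grad() become no-ops on iterations where gradients are still accumulating, which is why the loop needs no manual loss scaling or index bookkeeping.

model, optimizer, dataloader, lr_scheduler = accelerator.prepare(
    model, optimizer, dataloader, lr_scheduler
)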


From Accelerator to Trainer

Chart comparing ease of use vs. ability to customize for PyTorch, Accelerator, and Trainer.

Gradient accumulation with Trainer

training_args = TrainingArguments(output_dir="./results",
                                  evaluation_strategy="epoch",
                                  gradient_accumulation_steps=2)

trainer = Trainer(model=model,
                  args=training_args,
                  train_dataset=dataset["train"],
                  eval_dataset=dataset["validation"],
                  compute_metrics=compute_metrics)
trainer.train()
{'epoch': 1.0, 'eval_loss': 0.73, 'eval_accuracy': 0.03, 'eval_f1': 0.05}
{'epoch': 2.0, 'eval_loss': 0.68, 'eval_accuracy': 0.19, 'eval_f1': 0.25}
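
As a sanity check on what gradient_accumulation_steps=2 buys here: the effective batch size is the per-device batch size times the accumulation steps. The per-device value below is the TrainingArguments default, since the slide leaves it unset:

per_device_train_batch_size = 8   # TrainingArguments default; not set above
gradient_accumulation_steps = 2
effective_batch_size = (per_device_train_batch_size
                        * gradient_accumulation_steps)
print(effective_batch_size)       # 16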

Let's practice!
