Prepare models with AutoModel and Accelerator

Efficient AI Model Training with PyTorch

Dennis Lee

Data Engineer

Meet your instructor!

  • Data engineer
  • Data scientist
  • Ph.D. in Electrical Engineering

Instructor photograph

Icons of a server, a lightbulb, and a graduation hat

Excited to share best practices!

Our roadmap to efficient AI training

  • Distributed AI model training

A laptop, a person waiting, and a memory chip

  • ↓ Training times for large language models

A calendar

Our roadmap to efficient AI training

Flowchart illustrating the course topics: data preparation, distributed training, efficient training, and optimizers.

  • Data preparation: placing data on multiple devices
  • Distributed training: scaling training to multiple devices
  • Efficient training: optimizing available devices
  • Optimizers: accelerating training

CPUs vs GPUs

CPUs

  • Most laptops have CPUs
  • Designed for general-purpose computing
  • Better control flow

A laptop used for daily tasks

GPUs

  • GPUs can train large models
  • Specialize in highly parallel computing
  • Excel at matrix operations

A laptop used for parallel computing
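The distinction above can be checked directly in PyTorch; a minimal sketch (the matrix sizes are arbitrary) that selects a GPU when one is available and falls back to the CPU:

```python
import torch

# Pick the fastest available device: a CUDA GPU if present, else the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Matrix multiplication is the kind of highly parallel operation GPUs excel at
x = torch.randn(512, 512, device=device)
y = x @ x  # runs in parallel on the GPU when one is available

print(device)
```

On a machine without a GPU this prints `cpu`, mirroring the fallback behavior discussed later with Accelerator.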

Distributed training

Diagram of distributed training showing model replication and data sharding.

  • Data sharding: each device processes a subset of data in parallel
  • Model replication: each device performs forward/backward passes
  • Gradient aggregation: a designated device aggregates gradients
  • Parameter synchronization: the designated device shares updated parameters
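The four steps above can be simulated in a single process; a toy sketch (assuming a made-up linear model and random data, with two replicas standing in for two devices):

```python
import torch

torch.manual_seed(0)
data = torch.randn(8, 4)    # one global batch
target = torch.randn(8, 1)

# Data sharding: split the batch across two "devices"
x_shards, y_shards = data.chunk(2), target.chunk(2)

# Model replication: one identical copy of the parameters per device
weight = torch.randn(4, 1)
replicas = [weight.clone().requires_grad_(True) for _ in range(2)]

# Forward/backward pass on each device's shard
grads = []
for w, x, y in zip(replicas, x_shards, y_shards):
    loss = ((x @ w - y) ** 2).mean()
    loss.backward()
    grads.append(w.grad)

# Gradient aggregation: average the per-device gradients
avg_grad = torch.stack(grads).mean(dim=0)

# Parameter synchronization: every replica applies the same update
lr = 0.1
with torch.no_grad():
    for w in replicas:
        w -= lr * avg_grad

print(torch.allclose(replicas[0], replicas[1]))  # True: replicas stay in sync
```

Real distributed training runs these steps across processes (e.g. with `torch.distributed`), but the data flow is the same.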

Effortless efficiency: leveraging pre-trained models

  • Leverage pre-trained Transformer models
  • Initialize model parameters by calling AutoModelForSequenceClassification
  • Display the configuration

from transformers import AutoModelForSequenceClassification

# model_name is a pre-trained checkpoint identifier, e.g. a DistilBERT model
model = AutoModelForSequenceClassification.from_pretrained(model_name)

print(model.config)

DistilBertConfig {
  "architectures": ["DistilBertForMaskedLM"],
  "dim": 768,
  "dropout": 0.1,
  "hidden_dim": 3072,
  ...

Device placement with Accelerator

  • A Hugging Face class 🤗
  • Accelerator detects which devices are available on our computer
  • accelerator.prepare() automates device placement and data parallelism
  • Places the model (of type torch.nn.Module) on the first available GPU
  • Defaults to the CPU if no GPU is found

from accelerate import Accelerator

accelerator = Accelerator()
model = accelerator.prepare(model)

print(accelerator.device)
cpu

Let's practice!
