Applying dynamic quantization

Scalable AI Models with PyTorch Lightning

Sergiy Tkachuk

Director, GenAI Productivity

Why use quantization?

  • Memory reduction
  • CPU acceleration
  • Mobile inference

Icons representing a reduction in computing memory, a lightning bolt, and a mobile device
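To put a rough number on the memory bullet: a float32 weight occupies 4 bytes and an int8 weight occupies 1 byte, so the weights alone shrink by about 4x. The sketch below is back-of-the-envelope arithmetic only. The parameter count is a made-up example, and the small overhead of the scale and zero-point values stored with quantized weights is ignored.

```python
# Bytes needed to store one weight at each precision
BYTES_PER_WEIGHT = {"float32": 4, "int8": 1}

def weight_memory_mb(num_weights, dtype):
    """Approximate storage for the weights alone, in megabytes (10**6 bytes)."""
    return num_weights * BYTES_PER_WEIGHT[dtype] / 1e6

num_weights = 10_000_000  # hypothetical model size, for illustration only
fp32_mb = weight_memory_mb(num_weights, "float32")
int8_mb = weight_memory_mb(num_weights, "int8")
print(f"float32: {fp32_mb:.0f} MB, int8: {int8_mb:.0f} MB")
```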

What is dynamic quantization?

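  • Reduces the precision of weights (converted ahead of time) and activations (quantized on the fly at inference), typically from 32-bit floats to 8-bit integers
  • Improves inference speed
  • Ideal for deployment on resource-constrained devices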
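
What "reducing precision" means can be illustrated in plain Python. This is a simplified symmetric int8 scheme chosen for illustration, not PyTorch's actual implementation, which differs in detail. The helper names `quantize_int8` and `dequantize` and the weight values are hypothetical.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantized]

weights = [0.81, -0.27, 0.05, -1.27, 0.40]  # made-up example values
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Rounding moves each value by at most half a quantization step
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(f"integers: {q}")
print(f"scale: {scale:.4f}, max round-trip error: {max_error:.6f}")
```

Each weight is stored as a small integer plus one shared scale, and the round-trip error is bounded by half the scale. That bounded error is the source of the accuracy trade-off evaluated later.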

import torch
from torch.quantization import quantize_dynamic

# Quantize only the Linear layers, storing their weights as 8-bit integers
model_quantized = quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)
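
The call above assumes an existing `model`. As a self-contained sketch, the same call can be tried on a small stand-in network, comparing the serialized size before and after. The names `toy_model` and `state_dict_size_bytes` are hypothetical, the layer sizes are arbitrary, and it assumes a PyTorch build with a quantized CPU backend available.

```python
import io

import torch
from torch import nn
from torch.quantization import quantize_dynamic

def state_dict_size_bytes(model):
    """Serialize the model's state_dict in memory and return its size in bytes."""
    buffer = io.BytesIO()
    torch.save(model.state_dict(), buffer)
    return buffer.getbuffer().nbytes

# Hypothetical stand-in for a real model: two Linear layers with a ReLU between
toy_model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
toy_model.eval()

# Same call as on the slide, applied to the toy model
toy_quantized = quantize_dynamic(toy_model, {nn.Linear}, dtype=torch.qint8)

fp32_bytes = state_dict_size_bytes(toy_model)
int8_bytes = state_dict_size_bytes(toy_quantized)
print(f"Original:  {fp32_bytes / 1e6:.2f} MB")
print(f"Quantized: {int8_bytes / 1e6:.2f} MB")
```

The reduction is largest when most parameters live in the quantized layer types; layers left out of the set passed to `quantize_dynamic` stay in float32.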

Evaluating quantization impact

A diagram of a scale balancing two models, the original and the quantized one, representing the tradeoff between accuracy and efficiency

Performance comparison

  • ⚡ Compare inference speed and memory footprint
  • 📊 Determine acceptable accuracy trade-offs
  • ⛗ Decide on quantization suitability based on deployment needs
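
One way to approach the accuracy trade-off is to run both models on the same data and compare a metric. The sketch below uses a toy model and random inputs, with the original model's own predictions as labels, so it measures agreement between the two models rather than true accuracy. The `accuracy` helper and all data here are hypothetical; a real check would use a held-out labeled test set.

```python
import torch
from torch import nn
from torch.quantization import quantize_dynamic
from torch.utils.data import DataLoader, TensorDataset

def accuracy(model, data_loader):
    """Fraction of samples whose arg-max prediction matches the label."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for inputs, labels in data_loader:
            predictions = model(inputs).argmax(dim=1)
            correct += (predictions == labels).sum().item()
            total += labels.numel()
    return correct / total

torch.manual_seed(0)
toy_model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 3))
toy_model.eval()
toy_quantized = quantize_dynamic(toy_model, {nn.Linear}, dtype=torch.qint8)

# Random inputs; labels are the original model's own predictions (illustration only)
inputs = torch.randn(64, 16)
with torch.no_grad():
    labels = toy_model(inputs).argmax(dim=1)
toy_loader = DataLoader(TensorDataset(inputs, labels), batch_size=16)

original_acc = accuracy(toy_model, toy_loader)
quantized_acc = accuracy(toy_quantized, toy_loader)
accuracy_drop = original_acc - quantized_acc
print(f"Original: {original_acc:.3f}, quantized: {quantized_acc:.3f}, drop: {accuracy_drop:.3f}")
```

Whether a given drop is acceptable depends on the deployment target; the threshold is a project decision, not something the code can determine.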

Comparing performance

import time
import torch

def measure_time(model, data_loader):
    model.eval()  # Set model to evaluation mode
    start_time = time.time()
    with torch.no_grad():  # Gradients are not needed for inference
        for inputs in data_loader:
            _ = model(inputs)
    end_time = time.time()
    return end_time - start_time

original_time = measure_time(model, test_loader)
quant_time = measure_time(model_quantized, test_loader)
print(f"Original Model Time: {original_time:.2f}s")
print(f"Quantized Model Time: {quant_time:.2f}s")
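
A single timed pass can be noisy, since the first run often includes one-off setup costs. A possible refinement, sketched here with only the standard library, is to run a few warm-up passes and report the median of several timed runs. The `benchmark` helper and `dummy_workload` are hypothetical stand-ins, not part of the course code.

```python
import statistics
import time

def benchmark(fn, warmup_runs=2, timed_runs=5):
    """Return the median wall-clock time of fn() in seconds."""
    for _ in range(warmup_runs):
        fn()  # Untimed runs to absorb one-off setup costs
    durations = []
    for _ in range(timed_runs):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)

def dummy_workload():
    """Placeholder for a model forward pass over a data loader."""
    return sum(i * i for i in range(100_000))

median_seconds = benchmark(dummy_workload)
print(f"Median time: {median_seconds:.4f}s")
```

To time a model this way, pass a callable that performs one full inference pass, for example `benchmark(lambda: measure_time(model, test_loader))`.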

Let's practice!
