Model fine-tuning using human feedback

Introduction to LLMs in Python

Iván Palomares Carrascosa, PhD

Senior Data Science & AI Manager

Why human feedback in LLMs

"What makes an LLM good?", "What is the LLM user looking for?"

  • Objective, subjective, and context-dependent criteria
    • Truthfulness, originality, fine-grained detail vs. concise responses, etc.
  • Objective metrics cannot fully capture the subjective quality of LLM outputs
  • Use human feedback as a guiding signal (playing the role of a loss function) to optimize LLM outputs, as in the example below
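
One common way to capture these subjective judgments (an illustrative format, not one prescribed in this course) is to record human feedback as preference pairs: for a given prompt, an annotator marks which of two candidate responses they prefer.

# Hypothetical example of a single human-preference record
preference_example = {
    "prompt": "Summarize the article in one sentence.",
    "chosen": "The study finds that sleep quality strongly predicts memory retention.",
    "rejected": "The article talks about some stuff on sleep and memory.",
}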

Human feedback to fine-tune an LLM


Reinforcement Learning from Human Feedback (RLHF)

Reinforcement Learning (RL): an agent learns to make decisions from feedback (rewards), adapting its behavior to maximize cumulative reward over time (see the return formula below)

Reinforcement Learning from Human Feedback proceeds in three steps:

  1. Start from an initial (pre-trained) LLM
  2. Train a Reward Model (RM)
  3. Optimize (fine-tune) the LLM with an RL algorithm (e.g. PPO), using the trained RM to provide rewards
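
For reference, the "cumulative reward" an RL agent maximizes is usually formalized as the expected, optionally discounted, return (standard RL notation, not introduced on the slides):

$$ G_t = \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1}, \qquad 0 \le \gamma \le 1 $$

where the discount factor gamma controls how strongly future rewards count relative to immediate ones.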

Building a reward model

Reward model in RLHF

  • Start from a pre-trained LLM that generates text
    • Collect samples of LLM inputs and outputs
  • Generate a dataset of human preferences to train the reward model
    • Training instances are sample-reward pairs
  • Train a Reward Model (RM) capable of predicting rewards for LLM input-output samples (see the sketch after this list)
    • The trained RM is then used by an RL algorithm to fine-tune the original LLM
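
A minimal training sketch (assumed details, not code from the course): a small encoder with a single scalar output head is fit to (sample, reward) pairs; in practice, pairwise losses over preferred vs. rejected responses are also widely used.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical backbone for illustration; any encoder with a one-unit regression head works
rm_name = "distilbert-base-uncased"
rm_tokenizer = AutoTokenizer.from_pretrained(rm_name)
reward_model = AutoModelForSequenceClassification.from_pretrained(rm_name, num_labels=1)

# Toy training instances: LLM input-output samples paired with human-assigned rewards
samples = ["My plan today is to finish the report and go for a run.",
           "My plan today is to to to to"]
rewards = torch.tensor([[1.0], [-1.0]])

optimizer = torch.optim.AdamW(reward_model.parameters(), lr=1e-5)
batch = rm_tokenizer(samples, padding=True, truncation=True, return_tensors="pt")

# One training step: predict a scalar reward per sample and regress it onto the human rewards
predicted = reward_model(**batch).logits              # shape: (batch_size, 1)
loss = torch.nn.functional.mse_loss(predicted, rewards)
loss.backward()
optimizer.step()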

TRL: Transformer Reinforcement Learning

TRL: a library to train transformer-based LLMs using a variety of RL approaches

Proximal Policy Optimization (PPO): optimizes the LLM on <prompt, response, reward> triplets

  • AutoModelForCausalLMWithValueHead: wraps a causal LM with a value head required in RL scenarios
  • model_ref: reference model, e.g. the loaded pre-trained model before optimization (see the note after this list)
  • respond_to_batch: serves a similar purpose to model.generate(), adapted to RL
  • PPOTrainer: set up a trainer instance that runs the PPO optimization steps
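
The reference model is what keeps PPO fine-tuning from drifting too far from the original LLM. A common formulation of the reward optimized in PPO-based RLHF (the coefficient and exact KL estimator are implementation details assumed here, not stated on the slides) combines the RM score with a KL penalty against the reference model:

$$ r_{\text{total}}(x, y) = r_{\text{RM}}(x, y) - \beta\, \mathrm{KL}\!\left(\pi_{\theta}(y \mid x) \,\|\, \pi_{\text{ref}}(y \mid x)\right) $$

where $\pi_{\theta}$ is the model being fine-tuned, $\pi_{\text{ref}}$ corresponds to model_ref, and $r_{\text{RM}}$ is the trained reward model's score.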

PPO set-up example:

import torch
from transformers import AutoTokenizer
from trl import (PPOTrainer, PPOConfig, create_reference_model,
                 AutoModelForCausalLMWithValueHead)
from trl.core import respond_to_batch

# Load a GPT-2 policy with a value head (needed for PPO) plus a frozen reference copy
model = AutoModelForCausalLMWithValueHead.from_pretrained('gpt2')
model_ref = create_reference_model(model)
tokenizer = AutoTokenizer.from_pretrained('gpt2')
if tokenizer.pad_token is None:
    tokenizer.add_special_tokens({'pad_token': '[PAD]'})

# Encode a prompt and generate a response with the RL-oriented helper
prompt = "My plan today is to "
input_ids = tokenizer.encode(prompt, return_tensors="pt")
response = respond_to_batch(model, input_ids)

# Configure the trainer and run one PPO step with a placeholder reward of 1.0
ppo_config = PPOConfig(batch_size=1, mini_batch_size=1)
ppo_trainer = PPOTrainer(ppo_config, model, model_ref, tokenizer)
reward = [torch.tensor(1.0)]
train_stats = ppo_trainer.step([input_ids[0]], [response[0]], reward)
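
In a full RLHF loop, the hard-coded reward of 1.0 would come from the reward model's prediction for the generated response. A minimal sketch, reusing the hypothetical reward_model and rm_tokenizer from the earlier reward-model example:

# Score the generated response with the trained reward model instead of a fixed value
response_txt = tokenizer.decode(response[0], skip_special_tokens=True)
rm_batch = rm_tokenizer(prompt + response_txt, return_tensors="pt")
with torch.no_grad():
    reward = [reward_model(**rm_batch).logits.squeeze()]  # 0-dim reward tensor
train_stats = ppo_trainer.step([input_ids[0]], [response[0]], reward)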

Let's practice!

