Reward models explored

Reinforcement Learning from Human Feedback (RLHF)

Mina Parham

AI Engineer


Process so far

A diagram representing the part of the RLHF process covered so far, with an arrow pointing to the next step: reward models.



What is a reward model?

  • The reward model informs the agent by scoring its outputs
  • The agent learns to maximize these reward scores (see the sketch below)

A diagram showing an AI model and an agent informed by a reward scheme with an arrow pointing to an output.
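
To make this loop concrete, here is a minimal, illustrative sketch (not part of the course material): toy_reward is a hand-written stand-in for a learned reward model, and the "agent" simply picks the candidate response that the reward signal scores highest.

# Illustrative sketch only: a hand-written stand-in for a learned reward model
def toy_reward(response: str) -> float:
    score = 0.0
    if "please" in response.lower():
        score += 1.0               # reward politeness
    score -= 0.01 * len(response)  # penalize rambling
    return score

candidates = [
    "Do it yourself.",
    "Sure, here is a short answer. Please ask if you need more detail.",
]

# The "agent" prefers the response with the highest reward
best = max(candidates, key=toy_reward)
print(best)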


Using the reward trainer

from trl import RewardTrainer, RewardConfig

from transformers import AutoModelForSequenceClassification, AutoTokenizer
from datasets import load_dataset
# Load pre-trained model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("gpt2", num_labels=1)
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# GPT-2 defines no padding token, but reward training pads each chosen/rejected
# pair to equal length, so reuse the end-of-sequence token as the pad token
tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = tokenizer.pad_token_id

# Load dataset in the required format
dataset = load_dataset("path/to/dataset")
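
The dataset path above is a placeholder; whatever dataset is loaded must contain preference pairs. As a minimal sketch (assuming a TRL version whose RewardTrainer accepts raw text columns named "chosen" and "rejected"; older versions expect pre-tokenized input_ids_chosen / input_ids_rejected fields instead), an equivalent in-memory dataset could look like this:

from datasets import Dataset, DatasetDict

# Illustrative preference pairs: "chosen" is the preferred response,
# "rejected" the dispreferred one for the same prompt
pairs = {
    "chosen":   ["The capital of France is Paris."],
    "rejected": ["The capital of France is Berlin."],
}
example_dataset = DatasetDict({
    "train": Dataset.from_dict(pairs),
    "validation": Dataset.from_dict(pairs),
})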

Training the reward model

# Define training arguments
training_args = RewardConfig(
    output_dir="path/to/output/dir",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    learning_rate=1e-3,
)

Training the reward model

# Initialize the RewardTrainer
trainer = RewardTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    tokenizer=tokenizer,
)
# Train the reward model
trainer.train()
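
Once training finishes, the reward model can score new text; higher scores should correspond to preferred responses. A minimal sketch (the reward_score helper and the example strings are illustrative, not from the course):

import torch

def reward_score(text: str) -> float:
    # With num_labels=1, the single classification logit is the scalar reward
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    inputs = {k: v.to(trainer.model.device) for k, v in inputs.items()}
    trainer.model.eval()
    with torch.no_grad():
        logits = trainer.model(**inputs).logits
    return logits[0, 0].item()

print(reward_score("Q: What is 2 + 2? A: 4."))
print(reward_score("Q: What is 2 + 2? A: 5."))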

Let's practice!
