Introduction to RLHF

Reinforcement Learning from Human Feedback (RLHF)

Mina Parham

AI Engineer

Welcome to the course!

 

  • Instructor: Mina Parham

 

  • AI Engineer
  • Large Language Models (LLMs)
  • Reinforcement Learning from Human Feedback (RLHF)

 

  • Topic: Reinforcement Learning from Human Feedback (RLHF)

A diagram representing an AI model with an additional step where a human is involved, leading to better results.


Reinforcement learning review

A diagram showing an icon of an agent, an action and a reward policy in a cycle, representing the process of reinforcement learning.
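
To make the agent-action-reward cycle concrete, here is a minimal, self-contained sketch of an agent learning from rewards alone. The two-armed bandit environment and all of its names (REWARD_PROBS, action_a, action_b) are made up purely for illustration and are not part of the course material.

import random

# Hidden reward probabilities of a made-up two-armed bandit environment
REWARD_PROBS = {"action_a": 0.3, "action_b": 0.7}
values = {"action_a": 0.0, "action_b": 0.0}  # the agent's reward estimates
counts = {"action_a": 0, "action_b": 0}

for step in range(1000):
    # The agent chooses an action (epsilon-greedy policy)
    if random.random() < 0.1:
        action = random.choice(list(values))
    else:
        action = max(values, key=values.get)

    # The environment returns a reward for that action
    reward = 1.0 if random.random() < REWARD_PROBS[action] else 0.0

    # The agent updates its estimate from the reward (incremental mean)
    counts[action] += 1
    values[action] += (reward - values[action]) / counts[action]

print(values)

After enough steps, the agent's value estimates approach the hidden reward probabilities, so it ends up choosing action_b most of the time.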

From RL to RLHF

  • Training the reward model (see the sketch below)
  • Alignment with human preferences

A diagram showing an icon of an LLM, a text output and a human evaluator, representing part of the cycle of reinforcement learning from human feedback.
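
As a rough illustration of how a reward model can be trained from human preferences, the sketch below fits a scalar scorer on pairs of responses where a human preferred one over the other, using a pairwise (Bradley-Terry style) loss. The tiny network, the random embeddings standing in for responses, and the hyperparameters are assumptions made only to keep the example self-contained.

import torch
import torch.nn as nn

# Illustrative only: a tiny reward model scoring fixed-size text embeddings.
# In practice the reward model is usually a transformer with a scalar head.
class RewardModel(nn.Module):
    def __init__(self, embed_dim=16):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(embed_dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, x):
        return self.score(x).squeeze(-1)  # one scalar reward per response

reward_model = RewardModel()
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

# Random embeddings standing in for (chosen, rejected) response pairs
chosen = torch.randn(64, 16)
rejected = torch.randn(64, 16)

for epoch in range(10):
    r_chosen = reward_model(chosen)
    r_rejected = reward_model(rejected)
    # Pairwise loss: the human-preferred response should score higher
    loss = -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

The loss pushes the score of each preferred response above the score of the rejected one; that learned score is what later aligns the LLM with human preferences.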

LLM fine-tuning in RLHF

  • Training the initial LLM (sketched below)

An icon of a large language model fine-tuned using an input dataset.
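
A minimal sketch of this initial supervised fine-tuning step is shown below, assuming gpt2 as the base model and a small slice of the imdb dataset as the training text; both are stand-ins chosen only so the example runs end to end.

from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import load_dataset

# Stand-in base model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Stand-in text dataset for the fine-tuning input
dataset = load_dataset("imdb", split="train[:1%]")
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=dataset.column_names,
)

# The collator builds the labels for causal language modeling
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sft-demo", per_device_train_batch_size=4,
                           num_train_epochs=1),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()  # supervised fine-tuning of the initial LLM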


The full RLHF process

A diagram building up the full RLHF process step by step:

  • A prompt asking "Who wrote Romeo and Juliet" goes into an LLM, which answers: "a 16th Century author".
  • A policy model also receives the prompt and is trained using a reward model.
  • The trained policy model gives the answer "William Shakespeare".
  • A check compares the two answers.
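
To tie the diagram together, the sketch below updates a policy model so that responses the reward model scores highly become more likely. It is a deliberately simplified REINFORCE-style loop rather than full PPO (no clipping, no KL penalty against the original LLM), and the prompt, models, and label handling are assumptions for illustration only.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Illustrative stand-ins: gpt2 as the policy model, an IMDb sentiment
# classifier as the reward model ("write positive-sounding reviews")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
policy = AutoModelForCausalLM.from_pretrained("gpt2")
reward_fn = pipeline("sentiment-analysis", model="lvwerra/distilbert-imdb")
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-5)

prompt = "This movie was"
query = tokenizer(prompt, return_tensors="pt").input_ids

for step in range(3):  # a few illustrative updates
    # 1. The policy model generates a response to the prompt
    response = policy.generate(query, max_new_tokens=20, do_sample=True,
                               pad_token_id=tokenizer.eos_token_id)
    text = tokenizer.decode(response[0], skip_special_tokens=True)

    # 2. The reward model scores the response
    #    (label names are assumed; adjust for the actual reward model)
    result = reward_fn(text)[0]
    reward = result["score"] if result["label"].upper().startswith("POS") else -result["score"]

    # 3. Policy update: make the generated tokens more likely in
    #    proportion to the reward
    logits = policy(response).logits[:, :-1, :]
    log_probs = torch.log_softmax(logits, dim=-1)
    token_log_probs = log_probs.gather(-1, response[:, 1:].unsqueeze(-1)).squeeze(-1)
    gen_log_probs = token_log_probs[:, query.shape[1] - 1:]  # only the new tokens
    loss = -(reward * gen_log_probs.mean())

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Libraries such as trl implement the full PPO version of this loop, including the KL penalty that keeps the policy model close to the original LLM.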


Interacting with RLHF-tuned LLMs

  • Pre-trained RLHF models on Hugging Face 🤗
from transformers import pipeline

text_generator = pipeline('text-generation', model='lvwerra/gpt2-imdb-pos-v2')
# Provide a review prompt
review_prompt = "This is definitely a"

# Generate the continuation
output = text_generator(review_prompt, max_length=50)

# Print the generated text
print(output[0]['generated_text'])
This is definitely a crucial improvement.

Interacting with RLHF-tuned LLMs

from transformers import pipeline, AutoModelForSequenceClassification, AutoTokenizer


# Instantiate the pre-trained model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("lvwerra/distilbert-imdb")
tokenizer = AutoTokenizer.from_pretrained("lvwerra/distilbert-imdb")

# Use pipeline to create the sentiment analyzer
sentiment_analyzer = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)

# Pass the text to the sentiment analyzer and print the result
sentiment = sentiment_analyzer("This is definitely a crucial improvement.")
print(f"Sentiment Analysis Result: {sentiment}")
positive

Let's practice!
