Introduction to RLHF

Reinforcement Learning from Human Feedback (RLHF)

Mina Parham

AI Engineer

Welcome to the course!

 

  • Instructor: Mina Parham

 

  • AI Engineer
  • Large Language Models (LLMs)
  • Reinforcement Learning from Human Feedback (RLHF)

 

  • Topic: Reinforcement Learning from Human Feedback (RLHF)

A diagram representing an AI model with an additional step where a human is involved, leading to better results.


Reinforcement learning review

A diagram showing an icon of an agent, an action and a reward policy in a cycle, representing the process of reinforcement learning.
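
To make the agent-action-reward cycle concrete, here is a minimal, self-contained sketch of an agent learning from rewards alone. The two-armed bandit environment and all of its names (REWARD_PROBS, action_a, action_b) are made up purely for illustration and are not part of the course material.

import random

# Hidden reward probabilities of a made-up two-armed bandit environment
REWARD_PROBS = {"action_a": 0.3, "action_b": 0.7}
values = {"action_a": 0.0, "action_b": 0.0}  # the agent's reward estimates
counts = {"action_a": 0, "action_b": 0}

for step in range(1000):
    # The agent chooses an action (epsilon-greedy policy)
    if random.random() < 0.1:
        action = random.choice(list(values))
    else:
        action = max(values, key=values.get)

    # The environment returns a reward for that action
    reward = 1.0 if random.random() < REWARD_PROBS[action] else 0.0

    # The agent updates its estimate from the reward (incremental mean)
    counts[action] += 1
    values[action] += (reward - values[action]) / counts[action]

print(values)

After enough steps, the agent's value estimates approach the hidden reward probabilities, so it ends up choosing action_b most of the time.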

From RL to RLHF

  • Training the reward model (see the sketch below)
  • Alignment with human preferences

A diagram showing an icon of an LLM, a text output and a human evaluator, representing part of the cycle of reinforcement learning from human feedback.
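
As a rough illustration of how a reward model can be trained from human preferences, the sketch below fits a scalar scorer on pairs of responses where a human preferred one over the other, using a pairwise (Bradley-Terry style) loss. The tiny network, the random embeddings standing in for responses, and the hyperparameters are assumptions made only to keep the example self-contained.

import torch
import torch.nn as nn

# Illustrative only: a tiny reward model scoring fixed-size text embeddings.
# In practice the reward model is usually a transformer with a scalar head.
class RewardModel(nn.Module):
    def __init__(self, embed_dim=16):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(embed_dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, x):
        return self.score(x).squeeze(-1)  # one scalar reward per response

reward_model = RewardModel()
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

# Random embeddings standing in for (chosen, rejected) response pairs
chosen = torch.randn(64, 16)
rejected = torch.randn(64, 16)

for epoch in range(10):
    r_chosen = reward_model(chosen)
    r_rejected = reward_model(rejected)
    # Pairwise loss: the human-preferred response should score higher
    loss = -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

The loss pushes the score of each preferred response above the score of the rejected one; that learned score is what later aligns the LLM with human preferences.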

LLM fine-tuning in RLHF

  • Training the initial LLM (sketched below)

An icon of a large language model fine-tuned using an input dataset.
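
A minimal sketch of this initial supervised fine-tuning step is shown below, assuming gpt2 as the base model and a small slice of the imdb dataset as the training text; both are stand-ins chosen only so the example runs end to end.

from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import load_dataset

# Stand-in base model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Stand-in text dataset for the fine-tuning input
dataset = load_dataset("imdb", split="train[:1%]")
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=dataset.column_names,
)

# The collator builds the labels for causal language modeling
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sft-demo", per_device_train_batch_size=4,
                           num_train_epochs=1),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()  # supervised fine-tuning of the initial LLM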


The full RLHF process

A diagram building up the full RLHF process step by step:

  • A prompt asking "Who wrote Romeo and Juliet" goes into an LLM, which answers: "a 16th Century author".
  • A policy model also receives the prompt and is trained using a reward model.
  • The trained policy model gives the answer "William Shakespeare".
  • A check compares the two answers.
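
To tie the diagram together, the sketch below updates a policy model so that responses the reward model scores highly become more likely. It is a deliberately simplified REINFORCE-style loop rather than full PPO (no clipping, no KL penalty against the original LLM), and the prompt, models, and label handling are assumptions for illustration only.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Illustrative stand-ins: gpt2 as the policy model, an IMDb sentiment
# classifier as the reward model ("write positive-sounding reviews")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
policy = AutoModelForCausalLM.from_pretrained("gpt2")
reward_fn = pipeline("sentiment-analysis", model="lvwerra/distilbert-imdb")
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-5)

prompt = "This movie was"
query = tokenizer(prompt, return_tensors="pt").input_ids

for step in range(3):  # a few illustrative updates
    # 1. The policy model generates a response to the prompt
    response = policy.generate(query, max_new_tokens=20, do_sample=True,
                               pad_token_id=tokenizer.eos_token_id)
    text = tokenizer.decode(response[0], skip_special_tokens=True)

    # 2. The reward model scores the response
    #    (label names are assumed; adjust for the actual reward model)
    result = reward_fn(text)[0]
    reward = result["score"] if result["label"].upper().startswith("POS") else -result["score"]

    # 3. Policy update: make the generated tokens more likely in
    #    proportion to the reward
    logits = policy(response).logits[:, :-1, :]
    log_probs = torch.log_softmax(logits, dim=-1)
    token_log_probs = log_probs.gather(-1, response[:, 1:].unsqueeze(-1)).squeeze(-1)
    gen_log_probs = token_log_probs[:, query.shape[1] - 1:]  # only the new tokens
    loss = -(reward * gen_log_probs.mean())

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Libraries such as trl implement the full PPO version of this loop, including the KL penalty that keeps the policy model close to the original LLM.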


Interacting with RLHF-tuned LLMs

  • Pre-trained RLHF models on Hugging Face 🤗
from transformers import pipeline

text_generator = pipeline('text-generation', model='lvwerra/gpt2-imdb-pos-v2')
# Provide a review prompt
review_prompt = "This is definitely a"

# Generate the continuation
output = text_generator(review_prompt, max_length=50)

# Print the generated text
print(output[0]['generated_text'])
This is definitely a crucial improvement.

Interacting with RLHF-tuned LLMs

from transformers import pipeline, AutoModelForSequenceClassification, AutoTokenizer


# Instantiate the pre-trained model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("lvwerra/distilbert-imdb")
tokenizer = AutoTokenizer.from_pretrained("lvwerra/distilbert-imdb")

# Use pipeline to create the sentiment analyzer
sentiment_analyzer = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)

# Pass the text to the sentiment analyzer and print the result
sentiment = sentiment_analyzer("This is definitely a crucial improvement.")
print(f"Sentiment Analysis Result: {sentiment}")
positive

Let's practice!
