Advanced fine-tuning

Large Language Models (LLMs) Concepts

Vidhi Chugh

AI strategist and ethicist

Where are we?

progress image showing we are at Advanced fine-tuning stage


Reinforcement Learning from Human Feedback

  • Pre-training
  • Fine-tuning
  • Reinforcement Learning from Human Feedback (RLHF)

Illustration of four people providing positive feedback using emojis and stars.


Pre-training

  • Large amounts of text data
    • Websites, books, and articles
  • Transformer architecture
  • Learns general language patterns, grammar, and facts
  • Training objectives
    • Next-word prediction
    • Masked language modeling
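
To make the two objectives concrete, here is a minimal sketch of how training examples could be built from one sentence. The function names, the `[MASK]` token, and the 15% mask rate are assumptions for illustration; real pipelines work on subword tokens and much larger corpora.

```python
import random

def next_word_examples(tokens):
    """Pair each prefix of the sentence with the word that follows it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

def masked_example(tokens, mask_rate=0.15, seed=0, mask_token="[MASK]"):
    """Hide a random subset of tokens; the hidden words become the targets."""
    rng = random.Random(seed)
    n_mask = max(1, round(len(tokens) * mask_rate))  # always mask at least one token
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = [mask_token if i in positions else t for i, t in enumerate(tokens)]
    targets = {i: tokens[i] for i in positions}
    return masked, targets

tokens = "the cat sat on the mat".split()

# Next-word prediction: (["the"], "cat"), (["the", "cat"], "sat"), ...
pairs = next_word_examples(tokens)

# Masked language modeling: the model sees `masked` and must recover `targets`
masked, targets = masked_example(tokens)
```

In both cases the labels come from the text itself, which is why pre-training can use large amounts of unlabeled data.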

Pre-training process to build LLMs

1 Freepik

Fine-tuning

 

  • N-shot learning
  • Small labeled dataset for a related task

Fine-tuning process


But, why RLHF?

  • General-purpose training data often lacks quality
    • Noise
    • Errors
    • Inconsistencies
  • Result: reduced accuracy

Example of reduced accuracy:

  • Trained on data from online discussion forums
  • Contains unvalidated opinions and facts
  • Needs validation by external experts

Archery target with arrows that missed the bullseye


Starts with the need to fine-tune

  • Pre-training
    • Learns underlying language patterns
    • Doesn't capture context-specific complexities
  • Fine-tuning
    • Quality labeled data improves performance
  • Enter RLHF!
    • Human feedback

Simplifying RLHF

 

  • Model output is reviewed by a human
  • The model is updated based on the feedback

  • Step 1:
    • The model receives a prompt
    • It generates multiple responses

An LLM consuming an input prompt and generating a response


Enter the human expert

  • Step 2:
    • A human expert checks these responses
    • The expert ranks the responses based on quality
      • Accuracy
      • Relevance
      • Coherence
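
One common way to make a ranking usable for training is to break it into pairwise preferences. The sketch below assumes the expert's ranking arrives as a best-first list; the function name and the sample responses are hypothetical.

```python
from itertools import combinations

def ranking_to_pairs(ranked_responses):
    """Turn a best-first ranking into (preferred, rejected) pairs.

    combinations() keeps the input order, so the first item of every
    pair is always the one the expert ranked higher.
    """
    return list(combinations(ranked_responses, 2))

# Hypothetical expert ranking of three responses to one prompt, best first
ranked = ["response_a", "response_b", "response_c"]
pairs = ranking_to_pairs(ranked)
```

A ranking of n responses yields n(n-1)/2 pairs, so a single expert review produces several training signals.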

Adding human verification to the LLM's responses


Time for feedback

  • Step 3:
    • The model learns from the expert's ranking
    • It aligns its future responses with the expert's preferences

  • And it goes on!
    • The model continues to generate responses
    • It receives the expert's rankings
    • It adjusts its learning
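
The slides do not say how the model learns from rankings. One widely used approach, assumed here, is to first train a reward model that scores preferred responses above rejected ones, then use that score to guide the LLM. The sketch below is a toy version: the linear reward, the three feature scores per response, and all numbers are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def reward(weights, features):
    """Toy linear reward: a weighted sum of a response's feature scores."""
    return sum(w * f for w, f in zip(weights, features))

def train_reward_model(pairs, n_features, lr=0.5, epochs=200):
    """Fit weights so preferred responses score higher than rejected ones.

    pairs: list of (preferred_features, rejected_features).
    Minimizes the pairwise loss -log(sigmoid(r_preferred - r_rejected)).
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        for preferred, rejected in pairs:
            margin = reward(weights, preferred) - reward(weights, rejected)
            # Gradient step: the update shrinks as the margin grows
            scale = 1.0 - sigmoid(margin)
            for i in range(n_features):
                weights[i] += lr * scale * (preferred[i] - rejected[i])
    return weights

# Hypothetical feature scores per response: [accuracy, relevance, coherence]
pairs = [
    ([0.9, 0.8, 0.9], [0.4, 0.7, 0.8]),
    ([0.8, 0.9, 0.7], [0.3, 0.2, 0.6]),
    ([0.7, 0.6, 0.9], [0.6, 0.1, 0.3]),
]
weights = train_reward_model(pairs, n_features=3)
```

In practice the reward model is itself a neural network, and its scores drive a reinforcement learning update of the LLM. Repeating that cycle is the "and it goes on" loop described above.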

 

 

Human feedback is fed back to the LLM


Recap

  • Pre-training to learn general language knowledge
  • Fine-tuning for specific tasks
  • RLHF to enhance fine-tuning with human feedback
  • The combination is highly effective!

Completing the LLM

The complete LLM training process


Let's practice!
