Advanced fine-tuning

Large Language Models (LLMs) Concepts

Vidhi Chugh

AI strategist and ethicist

Where are we?

Progress image showing we are at the Advanced fine-tuning stage


Reinforcement Learning from Human Feedback

 

  • Pre-training

 

  • Fine-tuning

 

  • Reinforcement Learning from Human Feedback (RLHF)

 

Illustration of four people providing positive feedback using emojis and stars.


Pre-training

  • Large amounts of text data
    • Websites, books, and articles
  • Transformer architecture
  • Learns general language patterns, grammar, and facts

 

  • Next-word prediction
  • Masked language modeling
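The two pre-training objectives above differ in what the model is asked to predict. The sketch below is only an illustration of how training examples could be built from one sentence; the helper names `next_word_examples` and `masked_example`, the whitespace tokenization, and the `[MASK]` placeholder are assumptions made for this example, not part of the course material.

```python
def next_word_examples(tokens):
    """Pair each prefix of the sequence with the token that follows it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def masked_example(tokens, position, mask_token="[MASK]"):
    """Hide one token; the model must recover it using context on both sides."""
    masked = list(tokens)
    target = masked[position]
    masked[position] = mask_token
    return masked, target


# Toy whitespace tokenization (an assumption; real LLMs use subword tokenizers).
tokens = "the cat sat on the mat".split()

# Next-word prediction: context on the left only.
pairs = next_word_examples(tokens)

# Masked language modeling: context on both sides of the hidden token.
masked, target = masked_example(tokens, position=2)
```

Next-word prediction only ever sees the words to the left of the target, while masked language modeling sees the words on both sides of the hidden position.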

Pre-training process to build LLMs

1 Freepik

Fine-tuning

 

  • N-shot learning

 

  • Small labeled dataset for a related task

Fine-tuning process


But why RLHF?

  • General-purpose training data can lack quality
    • Noise
    • Errors
    • Inconsistencies
  • Result: reduced accuracy

Example of reduced accuracy:

  • Trained on data from online discussion forums
  • Unvalidated opinions and facts
  • Needs external expert validation

 

Archery target with arrows that missed the bullseye


Starts with the need to fine-tune

  • Pre-training
    • Learns underlying language patterns
    • Doesn't capture context-specific complexities

 

  • Fine-tuning
    • Quality labeled data improves performance

 

  • Enter RLHF!
    • Human feedback

Simplifying RLHF

 

  • Model output is reviewed by a human
  • Model is updated based on the feedback

 

  • Step 1:
    • Receives a prompt
    • Generates multiple responses

 

 

an LLM consuming an input prompt and generating a response


Enter the human expert

 

  • Step 2:
    • Human expert checks these responses
    • Ranks the responses based on quality
      • Accuracy
      • Relevance
      • Coherence
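One way to make an expert's ranking usable for training is to break it into pairwise preferences. The sketch below assumes the ranking is a list ordered best first; the function name `ranking_to_pairs` and the response labels are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def ranking_to_pairs(ranked_responses):
    """Turn a best-first ranking into (preferred, rejected) pairs.

    combinations() keeps the input order, so the first element of every
    pair is always ranked higher than the second.
    """
    return list(combinations(ranked_responses, 2))


# Hypothetical expert ranking of three model responses, best first.
ranked = ["Response B", "Response A", "Response C"]
preference_pairs = ranking_to_pairs(ranked)
```

A ranking of three responses yields three pairs, so a single expert judgment provides several comparisons for the model to learn from.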

Adding human verification to the LLM's response


Time for feedback

  • Step 3:
    • Model learns from the expert's rankings
    • Aligns its future responses with the expert's preferences

 

  • And it goes on!
    • Continues to generate responses
    • Receives expert's rankings
    • Adjusts the learning
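The slides do not say how the model learns from the rankings. A common approach, assumed here, is to train a reward model with a pairwise loss that pushes the score of the preferred response above the score of the rejected one. The toy sketch below replaces the neural reward model with a single learnable score per response; the names `pairwise_loss` and `train_scores`, the labels, and the hyperparameters are all hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def pairwise_loss(scores, preferred, rejected):
    """Loss is small when the preferred response scores well above the rejected one."""
    return -math.log(sigmoid(scores[preferred] - scores[rejected]))


def train_scores(pairs, epochs=200, lr=0.1):
    """Gradient descent on the pairwise loss, one scalar score per response."""
    scores = {name: 0.0 for pair in pairs for name in pair}
    for _ in range(epochs):
        for preferred, rejected in pairs:
            # Magnitude of the loss gradient with respect to either score.
            step = lr * (1.0 - sigmoid(scores[preferred] - scores[rejected]))
            scores[preferred] += step
            scores[rejected] -= step
    return scores


# Hypothetical expert preferences: B over A, B over C, A over C.
pairs = [("B", "A"), ("B", "C"), ("A", "C")]
scores = train_scores(pairs)
```

After training, the learned scores follow the expert's ordering. In full RLHF the LLM itself would then be updated to produce responses that score highly under the reward model; that step is not shown here.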

 

 

Human response is fed back to the LLM


Recap

  • Pre-training to learn general language knowledge

 

  • Fine-tuning for specific tasks

 

  • RLHF techniques to enhance fine-tuning through human feedback

 

  • Combination is highly effective!

Completing the LLM

The complete LLM training process


Let's practice!
