RAG versus fine-tuning

LLMOps Concepts

Max Knobbout, PhD

Applied Scientist, Uber

LLM lifecycle: RAG versus fine-tuning

Overview of the LLM application lifecycle phases

Retrieval Augmented Generation (RAG)

Playful cartoon using a banana to power a tablet

  • Combine LLMs' reasoning capabilities with external knowledge.
  • Three steps in a chain:

    1. Retrieve related documents
    2. Augment prompt with retrieved documents
    3. Generate output
  • Often implemented using vector databases.

RAG-chain with vector database

  1. Retrieve:
    • Convert input to embedding
    • Search vector database
    • Retrieve most similar documents

Retrieve chain
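
The retrieve step above can be sketched in a few lines. This is a minimal illustration, not a production setup: the `embed` function is a toy bag-of-words stand-in for a real embedding model, the in-memory `DOCS` list stands in for a vector database, and all names are hypothetical.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (assumed stand-in for an embedding model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Embed the query, score every document, and return the k most similar."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

# Hypothetical document store standing in for a vector database.
DOCS = [
    "bananas are rich in potassium",
    "tablets need a charged battery",
    "fine-tuning adjusts model weights",
]

top = retrieve("how do I charge a tablet battery", DOCS, k=1)
print(top)
```

A real vector database would precompute and index the document embeddings instead of re-embedding every document per query.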

RAG-chain with vector database

  1. Retrieve:
    • Generate embedding from input
    • Search vector database
    • Retrieve most similar documents
  2. Augment:
    • Combine input with retrieved documents to create the final prompt

Augment chain
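
As a rough sketch of the augment step, the function below merges the user input with retrieved documents into a single prompt. The prompt wording and the `augment` name are assumptions for illustration; real templates vary by application.

```python
def augment(question: str, documents: list[str]) -> str:
    """Build the final prompt from the user's question and the retrieved documents."""
    # Number each document so the model (and a human reviewer) can tell them apart.
    context = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(documents, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = augment("What powers a tablet?", ["Tablets need a charged battery."])
print(prompt)
```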

RAG-chain with vector database

  1. Retrieve:
    • Generate embedding from input
    • Search vector database
    • Retrieve most similar documents
  2. Augment:
    • Combine input with top-k documents and create augmented prompt
  3. Generate:
    • Use the augmented prompt to generate an output

There are many implementation choices, including the embedding model. Experiment and test!

Generate chain
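
The full retrieve, augment, generate chain can be sketched as one function that takes its components as arguments, which makes it easy to swap implementations while experimenting. Everything here is hypothetical: `keyword_retrieve` is a crude stand-in for a vector search, and `fake_llm` is a stub in place of a real model call.

```python
from typing import Callable

Retriever = Callable[[str, int], list[str]]
Generator = Callable[[str], str]

def rag_chain(question: str, retrieve: Retriever, generate: Generator, k: int = 2) -> str:
    documents = retrieve(question, k)                         # 1. Retrieve
    context = "\n".join(documents)
    prompt = f"Context:\n{context}\n\nQuestion: {question}"   # 2. Augment
    return generate(prompt)                                   # 3. Generate

# Hypothetical document store.
DOCS = ["Tablets need a charged battery.", "Bananas are rich in potassium."]

def keyword_retrieve(question: str, k: int) -> list[str]:
    """Stand-in retriever: rank documents by the number of words shared with the question."""
    words = set(question.lower().replace("?", "").split())

    def overlap(doc: str) -> int:
        return len(words & set(doc.lower().replace(".", "").split()))

    return sorted(DOCS, key=overlap, reverse=True)[:k]

def fake_llm(prompt: str) -> str:
    """Stub generator: echoes the first context line instead of calling a model."""
    return "Stub answer from: " + prompt.splitlines()[1]

answer = rag_chain("What does a tablet need?", keyword_retrieve, fake_llm, k=1)
print(answer)
```

Passing the retriever and generator in as parameters lets you compare embedding models, values of `k`, or prompt formats without touching the chain itself.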

Fine-tuning

Playful image of a cartoon getting a fine-tuned suit

  • Adjust the LLM's weights
  • Adapt the LLM to specific tasks and domains:
    • Different languages
    • Specialized fields

Fine-tuning

Supervised fine-tuning (transfer learning)

Type of data needed 📂:

  • Demonstration data (inputs with desired outputs)

Approach 🔍:

  • Re-train (parts of) the model

Reinforcement Learning from Human Feedback (RLHF)

Type of data needed 📂:

  • Rankings or quality scores (e.g., obtained from likes & dislikes)

Approach 🔍:

  • Train an extra reward model
  • Optimize the original LLM to maximize this reward
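
To make the two data types concrete, the sketch below shows one possible record shape for each, plus a pairwise ranking loss that is commonly used to train reward models. The slides do not specify a loss or data schema, so the field names and the choice of this loss are assumptions for illustration.

```python
import math

# Demonstration data for supervised fine-tuning: an input with its desired output.
sft_example = {
    "prompt": "Translate to French: Hello",
    "completion": "Bonjour",
}

# Ranking data for RLHF: the same prompt with a preferred and a less preferred response.
preference_example = {
    "prompt": "Translate to French: Hello",
    "chosen": "Bonjour",
    "rejected": "Hola",
}

def pairwise_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise logistic loss: small when the reward model scores the chosen response higher."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A reward model that agrees with the human ranking gets a lower loss than one that disagrees.
good = pairwise_loss(2.0, 0.0)
bad = pairwise_loss(0.0, 2.0)
print(good < bad)
```

Once trained, the reward model's score serves as the signal the original LLM is optimized to maximize.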

RAG or fine-tuning

RAG
  • Use when including factual knowledge
  • ✅ Keeps capabilities of LLM, easy to implement, knowledge stays up-to-date as documents are updated
  • ❌ Adds extra components, requires careful engineering

Fine-tuning
  • Use when specializing in a new domain
  • ✅ Full control and no extra components
  • ❌ Needs labeled data & specialized knowledge, bias amplification, catastrophic forgetting

The development cycle

Development cycle where we added the activity of chain and agent development

The development cycle

Development cycle where we added the activity of RAG

The development cycle

Development cycle where we added the activity of fine-tuning

Let's practice!
