RAG versus fine-tuning

LLMOps Concepts

Max Knobbout, PhD

Applied Scientist, Uber

LLM lifecycle: RAG versus fine-tuning

Overview of the LLM application lifecycle phases

Playful cartoon using a banana to power a tablet

Retrieval augmented generation (RAG): combine LLMs' reasoning capabilities with external knowledge.
Three steps in a chain:
1. Retrieve related documents
2. Augment the prompt with the retrieved documents
3. Generate output
Often implemented using vector databases.

Retrieve:
- Convert input to embedding
- Search vector database
- Retrieve most similar documents

Retrieve chain
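A minimal sketch of the retrieve step in plain Python. It assumes a toy bag-of-words `embed` function as a stand-in for a real embedding model, and a hypothetical `InMemoryVectorStore` class as a stand-in for a vector database; neither is a real library API.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words count vector (assumption: a real system calls an embedding model).
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryVectorStore:
    # Hypothetical stand-in for a vector database: stores (embedding, document) pairs.
    def __init__(self) -> None:
        self.items: list[tuple[Counter, str]] = []

    def add(self, document: str) -> None:
        self.items.append((embed(document), document))

    def search(self, query: str, k: int = 2) -> list[str]:
        # Convert the input to an embedding, score every stored document,
        # and return the k most similar ones (brute force over all items).
        query_vec = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine_similarity(query_vec, item[0]), reverse=True)
        return [doc for _, doc in ranked[:k]]


store = InMemoryVectorStore()
for doc in ["Our refund window is 30 days.",
            "Shipping takes 3 to 5 business days.",
            "Support is available 24/7 via chat."]:
    store.add(doc)

print(store.search("How many days is the refund window?", k=1))
# → ['Our refund window is 30 days.']
```

The brute-force scan keeps the idea visible; many vector databases instead use approximate nearest-neighbor indexes so search stays fast on large collections.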

Augment:
- Combine the input with the top-k retrieved documents to create the augmented prompt

Augment chain
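The augment step is essentially string assembly. This sketch assumes the retrieved documents arrive already sorted by similarity; the function name `build_augmented_prompt` and the instruction wording are hypothetical, not a fixed template.

```python
def build_augmented_prompt(query: str, documents: list[str], k: int = 3) -> str:
    # Assumes `documents` is sorted most-relevant-first (e.g. the retrieve step's output);
    # keep only the top-k and number them so the model can refer to them.
    context = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(documents[:k], start=1))
    # Hypothetical instruction wording; adapt it to your model and use case.
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer:"
    )


prompt = build_augmented_prompt(
    "How long is the refund window?",
    ["Our refund window is 30 days.",
     "Refunds go to the original payment method.",
     "Shipping takes 3 to 5 business days."],
    k=2,
)
print(prompt)
```

Choosing `k` is a trade-off: more documents give the model more context but also use more of its context window and can add noise.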

Generate:
- Use the augmented prompt (input plus top-k documents) to create an output

Many implementation choices and embedding models. Experiment and test!

Generate chain
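Putting the three steps together, a minimal sketch of the full chain might look like this. To stay self-contained it swaps the vector search for simple word-overlap ranking, and `fake_llm` is a hypothetical placeholder for a real model call; any function that takes a prompt string and returns text can be passed as `llm`.

```python
import re
from typing import Callable


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    # Stand-in for vector search: rank documents by word overlap with the query.
    q = tokens(query)
    return sorted(corpus, key=lambda doc: len(q & tokens(doc)), reverse=True)[:k]


def augment(query: str, documents: list[str]) -> str:
    context = "\n".join(f"- {doc}" for doc in documents)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


def rag_chain(query: str, corpus: list[str], llm: Callable[[str], str], k: int = 2) -> str:
    documents = retrieve(query, corpus, k)  # 1. Retrieve
    prompt = augment(query, documents)      # 2. Augment
    return llm(prompt)                      # 3. Generate


def fake_llm(prompt: str) -> str:
    # Hypothetical placeholder for an LLM: echoes the first context line of the prompt.
    first_context_line = prompt.splitlines()[1]
    return f"Based on the context: {first_context_line.removeprefix('- ')}"


corpus = ["Our refund window is 30 days.",
          "Shipping takes 3 to 5 business days.",
          "Support is available via chat."]
answer = rag_chain("How long is the refund window?", corpus, fake_llm, k=1)
print(answer)
# → Based on the context: Our refund window is 30 days.
```

Passing `llm` as a parameter keeps the chain independent of any particular model provider, which makes it easier to experiment with different retrievers, prompts, and models.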

Fine-tuning: adjusts the LLM's weights
Expands the model to specific tasks and domains:
- Different languages
- Specialized fields
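As a toy analogy only (a two-parameter linear model, not an LLM), the sketch below shows what "adjusting the weights" means: training starts from existing "pretrained" weights and gradient descent moves them to fit new-domain data. The domains, data, and learning rate are invented for illustration. The same toy also hints at catastrophic forgetting, since the loss on the original domain rises after fine-tuning.

```python
def mse(w: float, b: float, data: list[tuple[float, float]]) -> float:
    # Mean squared error of the model y = w * x + b on (x, y) pairs.
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def fine_tune(w: float, b: float, data: list[tuple[float, float]],
              lr: float = 0.05, epochs: int = 200) -> tuple[float, float]:
    # Full-batch gradient descent on MSE, starting from the existing weights.
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0.0, 0.5, 1.0, 1.5, 2.0]
original_domain = [(x, 2 * x) for x in xs]   # assumed "pretraining" task: y = 2x
new_domain = [(x, 2 * x + 1) for x in xs]    # hypothetical new domain: y = 2x + 1

w0, b0 = 2.0, 0.0                            # "pretrained" weights fit the original domain
w1, b1 = fine_tune(w0, b0, new_domain)

print(f"weights: ({w0:.2f}, {b0:.2f}) -> ({w1:.2f}, {b1:.2f})")
print(f"new-domain loss: {mse(w0, b0, new_domain):.3f} -> {mse(w1, b1, new_domain):.3f}")
print(f"original-domain loss: {mse(w0, b0, original_domain):.3f} -> {mse(w1, b1, original_domain):.3f}")
```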

Supervised fine-tuning (transfer learning)

Type of data needed 📂:

Approach 🔍:

Reinforcement Learning from Human Feedback (RLHF)

Type of data needed 📂:

Approach 🔍:
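A minimal sketch of how the two kinds of training data might be represented, assuming the common setup in which supervised fine-tuning uses labeled prompt/response pairs and RLHF trains a reward model on human preference pairs. The class names are hypothetical, and the pairwise loss, -log(sigmoid(r_chosen - r_rejected)), is one widely used reward-model objective rather than the only option.

```python
import math
from dataclasses import dataclass


@dataclass
class SFTExample:
    # Supervised fine-tuning: a labeled input paired with the desired output.
    prompt: str
    response: str


@dataclass
class PreferenceExample:
    # RLHF reward modeling: a human preferred `chosen` over `rejected` for the same prompt.
    prompt: str
    chosen: str
    rejected: str


def pairwise_reward_loss(reward_chosen: float, reward_rejected: float) -> float:
    # -log(sigmoid(margin)) == log(1 + exp(-margin)); small when the reward model
    # scores the human-preferred response higher than the rejected one.
    margin = reward_chosen - reward_rejected
    return math.log1p(math.exp(-margin))


sft = SFTExample(prompt="Translate to French: Good morning", response="Bonjour")
pref = PreferenceExample(
    prompt="Explain RAG in one sentence.",
    chosen="RAG retrieves relevant documents and adds them to the prompt before generating.",
    rejected="RAG is a kind of database.",
)
print(pairwise_reward_loss(2.0, 0.5), pairwise_reward_loss(0.5, 2.0))
```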


Use when specializing in a new domain
✅ Full control and no extra components
❌ Needs labeled data and specialized knowledge; risks bias amplification and catastrophic forgetting

Development cycle with chain and agent development added

Development cycle with RAG added

Development cycle with fine-tuning added
