Testing

LLMOps Concepts

Max Knobbout, PhD

Applied Scientist, Uber

LLM lifecycle: Testing

Overview of the LLM application lifecycle phases

Traditional supervised machine learning:

Picture of train and test set for traditional ML

LLM applications:

Picture of train and test set for LLM applications

If there is a correct answer...

Flowchart pointing to "Use ML metrics"
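
When each test input has a single correct answer (for example, a classification label), the LLM output can be scored like any supervised model. The sketch below computes accuracy. The function name, the sentiment labels, and the case/whitespace normalization are illustrative assumptions, not part of the course material.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the correct label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty test set")
    # Assumption: ignore case and surrounding whitespace when comparing.
    matches = sum(
        p.strip().lower() == y.strip().lower()
        for p, y in zip(predictions, labels)
    )
    return matches / len(labels)


# Hypothetical LLM outputs for a sentiment task with known correct answers.
llm_outputs = ["positive", "Negative", "positive", "neutral"]
correct = ["positive", "negative", "negative", "neutral"]
score = accuracy(llm_outputs, correct)
print(f"accuracy = {score:.2f}")
```

Other standard ML metrics (precision, recall, F1) follow the same pattern once the output is mapped to a fixed set of labels.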

If there is a reference answer...

Flowchart pointing to "Use text comparison metrics"
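
When there is a reference answer but no single correct string, the generated text can be compared with the reference by word overlap. The sketch below computes a unigram F1 score, a simplified stand-in for overlap metrics such as ROUGE or BLEU. The function name, the whitespace tokenization, and the example sentences are assumptions made for illustration.

```python
from collections import Counter


def unigram_f1(candidate, reference):
    """F1 of word overlap between a generated text and a reference text."""
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not cand_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection: each word counts at most as often as it
    # appears in both texts.
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical generated text and reference answer.
generated = "the cat sat on the mat"
reference = "the cat is on the mat"
f1 = unigram_f1(generated, reference)
print(f"unigram F1 = {f1:.3f}")
```

Word overlap rewards similar wording rather than similar meaning, so a correct paraphrase can still score low.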

If we have access to human feedback...

... let humans rate the text. Examples:
- Rate quality
- Rate relevance
- Rate coherence
... use a model-based approach. Examples:
- Predict rating based on past feedback
- Ask LLM judge if feedback was incorporated

Flowchart pointing to "Use feedback score metrics"
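
Human ratings can be turned into a feedback score by averaging them per criterion. The sketch below assumes a 1-5 rating scale and uses the three criteria listed above. The data and the function name are hypothetical.

```python
from statistics import mean


def feedback_scores(ratings):
    """Average rating per criterion across all rated responses."""
    if not ratings:
        raise ValueError("no ratings to aggregate")
    criteria = ratings[0].keys()
    return {c: mean(r[c] for r in ratings) for c in criteria}


# Hypothetical human ratings (assumed 1-5 scale) for three LLM responses.
ratings = [
    {"quality": 4, "relevance": 5, "coherence": 4},
    {"quality": 3, "relevance": 4, "coherence": 5},
    {"quality": 5, "relevance": 3, "coherence": 4},
]
scores = feedback_scores(ratings)
for criterion, value in scores.items():
    print(f"{criterion}: {value:.2f}")
```

A model-based approach would replace the human raters with a model that predicts these ratings, trained or prompted on past feedback.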

If there's no human feedback...

Flowchart pointing to "Use unsupervised metrics"
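
Unsupervised metrics score the text without any labels, references, or ratings. The slide does not name a specific metric. One common example is perplexity, which measures how surprised a language model is by a text (lower means more fluent according to that model). The sketch below assumes the per-token log-probabilities have already been obtained from some model. The values are made up for illustration.

```python
import math


def perplexity(token_logprobs):
    """Perplexity from natural-log token probabilities: exp(-mean log p)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


# Hypothetical per-token probabilities assigned by a language model.
logprobs = [math.log(0.5), math.log(0.25), math.log(0.5), math.log(0.25)]
ppl = perplexity(logprobs)
print(f"perplexity = {ppl:.3f}")
```

Perplexity depends on the model that produced the probabilities, so scores are only comparable when the same model is used throughout.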

Development cycle where we added the activity of fine-tuning

Development cycle where we added the activity of testing

Development cycle where we added the activity of deploying
