Cost management

LLMOps Concepts

Max Knobbout, PhD

Applied Scientist, Uber

LLM lifecycle: Cost management

Overview of the LLM application lifecycle phases

Cost management

Focus is on model costs
Costs can escalate depending on hosting and/or usage
- For self-hosted models, costs arise from hosting
- For externally hosted models, costs come from usage

Breaking down LLM costs

Self-hosted (open source)

Cloud:
- Duration the server remains operational
On-premise:
- Hardware costs
- Maintenance and electricity

Externally hosted (proprietary)

Proprietary:
- The number of calls
- The number of tokens per call
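
The breakdown above can be turned into simple arithmetic. The following is a minimal sketch, not a vendor's billing formula: the function names, the split into input and output tokens, and all prices are illustrative assumptions.

```python
def usage_cost(calls, input_tokens_per_call, output_tokens_per_call,
               input_price_per_1k, output_price_per_1k):
    """Externally hosted: cost grows with number of calls and tokens per call.

    Assumption: the provider bills per 1,000 tokens, with separate
    (hypothetical) prices for input and output tokens.
    """
    cost_per_call = (input_tokens_per_call / 1000 * input_price_per_1k
                     + output_tokens_per_call / 1000 * output_price_per_1k)
    return calls * cost_per_call


def cloud_hosting_cost(hours_running, hourly_rate):
    """Self-hosted in the cloud: cost grows with how long the server runs."""
    return hours_running * hourly_rate


# Illustrative numbers only; real prices vary by provider and model.
monthly_api_cost = usage_cost(
    calls=1000,
    input_tokens_per_call=500,
    output_tokens_per_call=200,
    input_price_per_1k=0.001,
    output_price_per_1k=0.002,
)
monthly_server_cost = cloud_hosting_cost(hours_running=720, hourly_rate=2.5)
```

Comparing the two figures for the same expected workload gives a first indication of whether self-hosting or an external API is cheaper.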

Strategy 1: Choose the right model

Use the most cost-effective model that still accomplishes the task
Use multiple smaller task-specific models
For self-hosting, consider model-size reduction techniques

Strategy 2: Optimize prompts

Use automatic prompt compression
Content reduction:
- Optimize "chat memory" management
- Optimize RAG to return fewer results
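
One way to manage "chat memory" is to keep only the most recent messages that fit a token budget. This is a minimal sketch under stated assumptions: `count_tokens` is a hypothetical stand-in that counts whitespace-separated words, whereas a real application would use the tokenizer of its model.

```python
def count_tokens(text):
    # Assumption: words are a rough proxy for tokens.
    return len(text.split())


def trim_history(messages, max_tokens):
    """Keep the newest messages whose combined size fits within max_tokens."""
    kept = []
    total = 0
    # Walk from newest to oldest so recent context is preferred.
    for message in reversed(messages):
        size = count_tokens(message)
        if total + size > max_tokens:
            break
        kept.append(message)
        total += size
    kept.reverse()  # restore chronological order
    return kept


history = ["a b c", "d e", "f g h i"]
trimmed = trim_history(history, max_tokens=6)
```

The same budget idea applies to RAG: returning fewer retrieved passages shrinks the prompt, at the risk of dropping relevant context.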

Strategy 3: Optimize the number of calls

Use batching
Use response caching (if applicable)
Optimize (and limit) agent calls
Set quota and rate limits
Identify tasks that don't require an LLM
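
Response caching and quotas can be combined in a thin wrapper around the model call. The sketch below is illustrative: `CachedClient`, `call_model`, and `max_calls` are hypothetical names, and the cache only matches prompts that are identical after whitespace normalization.

```python
class CachedClient:
    """Wraps a model call with a response cache and a simple call quota."""

    def __init__(self, call_model, max_calls):
        self._call_model = call_model  # any function: prompt -> response
        self._max_calls = max_calls    # quota on real (uncached) calls
        self._cache = {}
        self.calls_made = 0

    def ask(self, prompt):
        key = " ".join(prompt.split())  # normalize whitespace
        if key in self._cache:
            return self._cache[key]     # cache hit: no model call, no cost
        if self.calls_made >= self._max_calls:
            raise RuntimeError("call quota exceeded")
        self.calls_made += 1
        response = self._call_model(prompt)
        self._cache[key] = response
        return response


# A fake model stands in for a real LLM call in this example.
client = CachedClient(call_model=lambda p: p.upper(), max_calls=2)
first = client.ask("hello world")
second = client.ask("hello   world")  # served from the cache
```

Caching only applies when identical prompts should yield identical answers, which is why the slide marks it "if applicable".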

Cost metrics and prognosis

Important to track:
- For self-hosted, cost per machine per time unit
- For externally hosted, cost per session
Understand how the user base will grow and how costs will scale with that growth
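
A cost prognosis for an externally hosted model can start from cost per session and an assumed growth rate. This is a rough sketch: the function name, the compound monthly growth model, and all numbers are assumptions made for illustration.

```python
def project_monthly_costs(users, monthly_growth_rate, sessions_per_user,
                          cost_per_session, months):
    """Project monthly cost assuming the user base grows at a fixed rate."""
    projections = []
    for _ in range(months):
        projections.append(users * sessions_per_user * cost_per_session)
        users *= 1 + monthly_growth_rate  # compound growth, month over month
    return projections


# Illustrative inputs: 1,000 users, 50% monthly growth,
# 10 sessions per user per month, 0.02 currency units per session.
forecast = project_monthly_costs(
    users=1000,
    monthly_growth_rate=0.5,
    sessions_per_user=10,
    cost_per_session=0.02,
    months=3,
)
```

Tracking the actual cost per session against such a forecast shows early whether costs are scaling faster than the user base.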

Let's practice!
