Cost management
LLMOps Concepts
Max Knobbout, PhD
Applied Scientist, Uber
LLM lifecycle: Cost management
Cost management
- Focus is on model costs
- Costs can escalate based on hosting and/or usage
- For self-hosted models, costs arise from hosting
- For externally hosted models, costs come from usage
Breaking down LLM costs
Self-hosted (open source)
- Cloud:
  - Duration the server remains operational
- On-premise:
  - Hardware costs
  - Maintenance and electricity
Externally hosted (proprietary)
- Proprietary:
  - The number of calls
  - The number of tokens per call
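The breakdown above can be turned into rough arithmetic. The sketch below is illustrative only: the function names, the hourly rate, and the per-1,000-token price are hypothetical placeholders, not real vendor prices, and real providers often price input and output tokens differently.

```python
def cloud_hosting_cost(hours_running: float, hourly_rate: float) -> float:
    """Self-hosted in the cloud: cost follows how long the server is up,
    whether or not it is serving requests."""
    return hours_running * hourly_rate


def external_usage_cost(calls: int, tokens_per_call: int,
                        price_per_1k_tokens: float) -> float:
    """Externally hosted: cost follows the number of calls and the
    number of tokens per call (single flat token price assumed)."""
    total_tokens = calls * tokens_per_call
    return total_tokens / 1000 * price_per_1k_tokens


# Hypothetical figures, for illustration only.
monthly_hosting = cloud_hosting_cost(hours_running=720, hourly_rate=2.0)
monthly_usage = external_usage_cost(calls=10_000, tokens_per_call=500,
                                    price_per_1k_tokens=0.002)
print(f"hosting: {monthly_hosting:.2f}, usage: {monthly_usage:.2f}")
```

Note that the hosting figure does not change with traffic, while the usage figure scales with both calls and tokens per call. That is why the strategies that follow target model choice, prompt size, and call count.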
Strategy 1: Choose the right model
- Use the most cost-effective model that still accomplishes the task
- Use multiple smaller task-specific models
- For self-hosting, consider model-size reduction techniques
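One way to read "most cost-effective model that still accomplishes the task" is: among the models that pass your own quality bar, pick the cheapest. The sketch below assumes you already have an evaluation score per candidate; the model names, prices, and scores are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    name: str                  # hypothetical model name
    cost_per_1k_tokens: float  # hypothetical price
    task_score: float          # score on your own evaluation set, 0..1


def cheapest_adequate(candidates: list[Candidate],
                      min_score: float) -> Optional[Candidate]:
    """Return the cheapest candidate whose score meets the bar, or None."""
    adequate = [c for c in candidates if c.task_score >= min_score]
    if not adequate:
        return None
    return min(adequate, key=lambda c: c.cost_per_1k_tokens)


candidates = [
    Candidate("large-general", cost_per_1k_tokens=0.03, task_score=0.95),
    Candidate("small-task-specific", cost_per_1k_tokens=0.002, task_score=0.91),
    Candidate("tiny", cost_per_1k_tokens=0.0005, task_score=0.70),
]
choice = cheapest_adequate(candidates, min_score=0.90)
print(choice.name if choice else "no model meets the bar")
```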
Strategy 2: Optimize prompts
- Use automatic prompt compression
- Content reduction:
  - Optimize "chat memory" management
  - Optimize RAG to return fewer results
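As a minimal sketch of "chat memory" management, the function below keeps only the most recent messages that fit within a token budget. It is one simple policy among several (summarizing older turns is another). The whitespace-based `count_tokens` is an assumption standing in for a real tokenizer, which would give different counts.

```python
def count_tokens(text: str) -> int:
    # Assumption: word count as a rough stand-in for a real tokenizer.
    return len(text.split())


def trim_history(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the newest messages whose combined size fits in max_tokens."""
    kept: list[str] = []
    used = 0
    # Walk from newest to oldest and stop at the first message that
    # would push the total over the budget.
    for message in reversed(messages):
        size = count_tokens(message)
        if used + size > max_tokens:
            break
        kept.append(message)
        used += size
    kept.reverse()  # restore chronological order
    return kept


history = [
    "hello there",
    "how are you today",
    "fine thanks",
    "tell me a joke",
]
print(trim_history(history, max_tokens=6))
```

The same idea applies to RAG: capping the number of retrieved passages placed in the prompt reduces tokens per call in the same way.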
Strategy 3: Optimize the number of calls
- Use batching
- Use response caching (if applicable)
- Optimize (and limit) agent calls
- Set quota and rate limits
- Identify tasks that don't require an LLM at all
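Two of the ideas above, response caching and a call quota, can be combined in a thin wrapper around whatever function actually calls the model. This is a sketch under assumptions: `BudgetedClient` and `fake_model` are hypothetical names, the cache only matches prompts that are identical after trimming and lowercasing, and a production setup would also need cache expiry and per-user limits.

```python
from typing import Callable


class BudgetedClient:
    """Wraps a model-calling function with a cache and a hard call quota."""

    def __init__(self, call_model: Callable[[str], str], max_calls: int):
        self._call_model = call_model
        self._max_calls = max_calls
        self._cache: dict[str, str] = {}
        self.calls_made = 0

    def ask(self, prompt: str) -> str:
        key = prompt.strip().lower()  # assumption: exact-match caching only
        if key in self._cache:
            return self._cache[key]   # cache hit: no model call, no cost
        if self.calls_made >= self._max_calls:
            raise RuntimeError("call quota exhausted")
        self.calls_made += 1
        answer = self._call_model(prompt)
        self._cache[key] = answer
        return answer


def fake_model(prompt: str) -> str:
    # Stand-in for a real LLM call, so the sketch runs offline.
    return f"echo: {prompt}"


demo = BudgetedClient(fake_model, max_calls=2)
demo.ask("What is LLMOps?")
demo.ask("what is llmops?  ")  # served from the cache
print(demo.calls_made)
```

Caching only makes sense when the same prompt should produce the same answer, which is why the slide marks it "if applicable".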
Cost metrics and prognosis
- Important to track:
  - For self-hosted, cost per machine per time unit
  - For externally hosted, cost per session
- Understand how the user base will grow and how costs will scale with that growth
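A simple prognosis for an externally hosted model can combine cost per session with an assumed growth rate. The sketch below assumes constant compound monthly growth and a fixed number of sessions per user. Both are simplifying assumptions, and all figures are hypothetical.

```python
def project_monthly_costs(users: int, monthly_growth: float,
                          sessions_per_user: float, cost_per_session: float,
                          months: int) -> list[float]:
    """Project cost per month assuming compound user growth."""
    costs: list[float] = []
    current_users = float(users)
    for _ in range(months):
        costs.append(current_users * sessions_per_user * cost_per_session)
        current_users *= 1 + monthly_growth  # users grow before next month
    return costs


# Hypothetical: 1,000 users, 50% monthly growth, 10 sessions per user,
# 0.01 per session, projected over 3 months.
for month, cost in enumerate(project_monthly_costs(1000, 0.5, 10, 0.01, 3), 1):
    print(f"month {month}: {cost:.2f}")
```

For a self-hosted model the cost curve is a step function instead: cost stays flat per machine per time unit until growth forces you to add another machine.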
Let's practice!