Cost management

LLMOps Concepts

Max Knobbout, PhD

Applied Scientist, Uber

LLM lifecycle: Cost management

Overview of the LLM application lifecycle phases

Cost management

  • Focus is on model costs
  • Costs can escalate depending on hosting and/or usage
    • For self-hosted models, costs arise from hosting
    • For externally hosted models, costs come from usage

Breaking down LLM costs

Self-hosted (open source)

  • Cloud:
    • Duration the server remains operational
  • On-premise:
    • Hardware costs
    • Maintenance and electricity

Externally hosted (proprietary)

  • Usage:
    • The number of calls
    • The number of tokens per call
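
The breakdown above can be turned into a rough estimate. The following is a minimal sketch, assuming a flat hourly rate for a cloud server and a single flat price per 1,000 tokens for an external API; the function names and all prices are hypothetical and not taken from any provider.

```python
def self_hosted_cloud_cost(hours_running: float, hourly_rate: float) -> float:
    """Cloud self-hosting: cost follows how long the server stays up."""
    return hours_running * hourly_rate


def external_api_cost(calls: int, tokens_per_call: int,
                      price_per_1k_tokens: float) -> float:
    """External hosting: cost follows the number of calls and tokens per call."""
    cost_per_call = (tokens_per_call / 1000) * price_per_1k_tokens
    return calls * cost_per_call


# Hypothetical numbers, for illustration only.
server_month = self_hosted_cloud_cost(hours_running=24 * 30, hourly_rate=2.0)
api_month = external_api_cost(calls=10_000, tokens_per_call=1_500,
                              price_per_1k_tokens=0.002)
print(f"self-hosted: {server_month:.2f}, external: {api_month:.2f}")
```

Note the difference in shape: the self-hosted figure does not change with traffic, while the external figure grows with every call and every token.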

Strategy 1: Choose the right model

  • Use the most cost-effective model that still accomplishes the task
  • Use multiple smaller task-specific models
  • For self-hosting, consider model-size reduction techniques
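
One way to read "most cost-effective model that still accomplishes the task" is: among the candidates that pass your own quality bar, pick the cheapest. The sketch below assumes you already have a score per model from a task-specific evaluation; the `ModelOption` type, model names, prices, and scores are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelOption:
    name: str
    cost_per_1k_tokens: float  # hypothetical price
    eval_score: float          # score on your own evaluation set, 0 to 1


def cheapest_adequate_model(options: list[ModelOption],
                            min_score: float) -> Optional[ModelOption]:
    """Return the cheapest model whose score meets the quality bar, if any."""
    adequate = [m for m in options if m.eval_score >= min_score]
    if not adequate:
        return None
    return min(adequate, key=lambda m: m.cost_per_1k_tokens)


candidates = [
    ModelOption("small", 0.0005, 0.71),
    ModelOption("medium", 0.002, 0.86),
    ModelOption("large", 0.03, 0.93),
]
choice = cheapest_adequate_model(candidates, min_score=0.8)
print(choice.name if choice else "no model meets the bar")
```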

Strategy 2: Optimize prompts

  • Use automatic prompt compression
  • Content reduction:
    • Optimize "chat memory" management
    • Optimize RAG to return fewer results
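
As an illustration of "chat memory" management, the sketch below keeps only the most recent messages that fit within a token budget. It assumes a crude whitespace word count as a stand-in for a real tokenizer; `count_tokens` and `trim_history` are hypothetical helper names.

```python
def count_tokens(text: str) -> int:
    """Crude proxy for token count (assumption); swap in a real tokenizer."""
    return len(text.split())


def trim_history(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the newest messages whose combined size fits the budget."""
    kept: list[str] = []
    used = 0
    # Walk from newest to oldest and stop once the budget would be exceeded.
    for message in reversed(messages):
        size = count_tokens(message)
        if used + size > max_tokens:
            break
        kept.append(message)
        used += size
    kept.reverse()  # restore chronological order
    return kept


history = ["a b c", "d e", "f g h i"]
print(trim_history(history, max_tokens=6))
```

The same idea applies to RAG: passing fewer retrieved passages into the prompt lowers the tokens sent per call.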

Strategy 3: Optimize the number of calls

  • Use batching
  • Use response caching (if applicable)
  • Optimize (and limit) agent calls
  • Set quota and rate limits
  • Consider which tasks don't require an LLM at all
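
Response caching and a call quota can be combined in a thin wrapper around whatever function sends the prompt to the model. This is a minimal sketch under stated assumptions: `QuotaCachedClient` is a hypothetical class, `llm_call` is any callable you supply, and exact-match caching only makes sense when identical prompts should return identical answers.

```python
from typing import Callable


class QuotaCachedClient:
    """Wraps an LLM call with an exact-match cache and a hard call quota."""

    def __init__(self, llm_call: Callable[[str], str], max_calls: int):
        self._llm_call = llm_call
        self._max_calls = max_calls
        self._cache: dict[str, str] = {}
        self.calls_made = 0

    def ask(self, prompt: str) -> str:
        key = prompt.strip()
        # Cache hit: no model call, no cost.
        if key in self._cache:
            return self._cache[key]
        # Cache miss: enforce the quota before spending money.
        if self.calls_made >= self._max_calls:
            raise RuntimeError("LLM call quota exhausted")
        self.calls_made += 1
        answer = self._llm_call(key)
        self._cache[key] = answer
        return answer


# Stand-in for a real model call, for illustration only.
client = QuotaCachedClient(llm_call=lambda p: p.upper(), max_calls=2)
client.ask("hello")
client.ask("hello")  # served from the cache
print(client.calls_made)
```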

Cost metrics and prognosis

  • Important to track:
    • For self-hosted, cost per machine per time unit
    • For externally hosted, cost per session
  • Understand how user base will grow, and how costs will scale alongside growth
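
A simple prognosis for an externally hosted model multiplies expected users, sessions per user, and cost per session, then applies a growth rate month over month. The sketch below assumes constant compound growth and a fixed cost per session; the function name and every number are hypothetical.

```python
def project_monthly_costs(users_now: int, monthly_growth_rate: float,
                          sessions_per_user: float, cost_per_session: float,
                          months: int) -> list[float]:
    """Project monthly cost assuming compound user growth (an assumption)."""
    costs = []
    users = float(users_now)
    for _ in range(months):
        costs.append(users * sessions_per_user * cost_per_session)
        users *= 1 + monthly_growth_rate
    return costs


# Hypothetical inputs, for illustration only.
for month, cost in enumerate(project_monthly_costs(1_000, 0.5, 10, 0.01, 3), 1):
    print(f"month {month}: {cost:.2f}")
```

For a self-hosted model the tracked quantity would instead be cost per machine per time unit, with growth showing up as additional machines.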

Let's practice!
