Monitoring and observability

LLMOps Concepts

Max Knobbout, PhD

Applied Scientist, Uber

LLM lifecycle: Monitoring and observability

Overview of the LLM application lifecycle phases


Monitoring and observability

 


 

  • Monitoring continuously watches a system.
  • Observability reveals internal states to external observers.
  • Data sources for observability:
    1. Logs
    2. Metrics
    3. Traces
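
The three data sources above can be sketched with the Python standard library alone. This is a minimal illustration, not a production setup: `handle_request`, the `metrics` counter, and the `trace_spans` list are hypothetical names, and the model call is replaced by a placeholder.

```python
import logging
import time
import uuid
from collections import Counter

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("llm_app")

metrics = Counter()   # metrics: aggregated numbers over many requests
trace_spans = []      # traces: timed steps that share one trace ID


def handle_request(prompt: str) -> str:
    """Handle one request and emit a log line, a metric, and a trace span."""
    trace_id = uuid.uuid4().hex
    start = time.perf_counter()

    # Log: a timestamped record of a single event.
    logger.info("request received trace_id=%s chars=%d", trace_id, len(prompt))

    response = prompt.upper()  # placeholder for the real model call

    # Metric: a counter that can be aggregated and plotted over time.
    metrics["requests_total"] += 1

    # Trace: how long this step took, tied to the request by its trace ID.
    trace_spans.append({
        "trace_id": trace_id,
        "step": "generate",
        "seconds": time.perf_counter() - start,
    })
    return response
```

In a real application these records would typically be shipped to a dedicated logging, metrics, or tracing backend rather than kept in memory.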

Input monitoring

  • Monitor inputs for:
    • Changes
    • Errors
    • Malicious content
  • Data drift is the change in input data distribution over time
  • Addressing data drift requires:
    • Monitoring the data distribution
    • Periodically updating the model
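
One way to monitor the input distribution is to compare a recent sample against a reference sample. The sketch below uses the Population Stability Index (PSI) on prompt lengths; the choice of PSI, the feature (prompt length), and the bin edges are all assumptions made for illustration, and both samples are assumed to be non-empty.

```python
import math
from typing import List, Sequence


def psi(reference: Sequence[float], current: Sequence[float],
        edges: Sequence[float]) -> float:
    """Population Stability Index between two samples over shared bins."""

    def shares(values: Sequence[float]) -> List[float]:
        counts = [0] * (len(edges) + 1)
        for v in values:
            # Bin index = number of edges the value has reached or passed.
            counts[sum(v >= e for e in edges)] += 1
        total = len(values)
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / total, 1e-6) for c in counts]

    ref, cur = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


# Hypothetical prompt lengths (in characters) from two time windows.
reference_lengths = [10, 20, 30, 40, 50, 60]
recent_lengths = [50, 60, 70, 80, 90, 100]
edges = [25, 45]

drift = psi(reference_lengths, recent_lengths, edges)
```

A PSI of zero means the two samples fall into the bins in identical proportions; larger values indicate a larger shift. The threshold at which a shift should trigger a model update depends on the application.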


Functional monitoring

  • Examples:

    • Response time
    • Request volume
    • Downtime
    • Error rates
  • For LLM applications:

    • Chain and agent execution
    • System resources (GPU)
    • Costs
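
The functional metrics above can be collected by wrapping each model call. This is a rough sketch under stated assumptions: `FunctionalStats` and `monitored_call` are hypothetical names, the price per 1,000 tokens is made up, and tokens are approximated by whitespace-separated words.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List

PRICE_PER_1K_TOKENS = 0.002  # hypothetical price in USD


@dataclass
class FunctionalStats:
    latencies: List[float] = field(default_factory=list)  # response times
    requests: int = 0                                     # request volume
    errors: int = 0
    cost_usd: float = 0.0

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def monitored_call(model: Callable[[str], str], prompt: str,
                   stats: FunctionalStats) -> str:
    """Call the model and record latency, errors, and estimated cost."""
    stats.requests += 1
    start = time.perf_counter()
    try:
        response = model(prompt)
    except Exception:
        stats.errors += 1
        raise
    finally:
        # Latency is recorded for failed calls as well.
        stats.latencies.append(time.perf_counter() - start)

    # Crude token estimate: whitespace-separated words (an assumption).
    tokens = len(prompt.split()) + len(response.split())
    stats.cost_usd += tokens / 1000 * PRICE_PER_1K_TOKENS
    return response
```

Chain and agent execution could be tracked in a similar way by wrapping each step, and GPU usage would normally come from the serving infrastructure rather than from application code.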


Output monitoring

  • Use metrics defined during testing, such as:
    • Bias
    • Toxicity
    • Helpfulness
  • Model drift:
    • The relationship between inputs and outputs changes over time
  • Censoring means actively intervening, for example by blocking or replacing an output that fails a check
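
Output monitoring with an active intervention could look like the sketch below. The scoring function is a deliberately simple stand-in: `BLOCKED_TERMS`, `toxicity_score`, `censor`, the fallback message, and the threshold are all hypothetical, and a real system would more likely use a trained classifier for metrics such as toxicity.

```python
from typing import Tuple

BLOCKED_TERMS = {"idiot", "stupid"}  # hypothetical word list
FALLBACK = "Sorry, I can't share that response."


def toxicity_score(text: str) -> float:
    """Toy metric: share of words that appear in BLOCKED_TERMS."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    if not words:
        return 0.0
    return sum(w in BLOCKED_TERMS for w in words) / len(words)


def censor(text: str, threshold: float = 0.1) -> Tuple[str, bool]:
    """Return the text to show and whether an intervention happened."""
    if toxicity_score(text) > threshold:
        # Intervene: replace the output instead of only recording the score.
        return FALLBACK, True
    return text, False
```

Recording the score for every response supports monitoring over time, while the returned flag shows how often the system had to intervene.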


Alert handling

 

 


 

 

  • Be notified when issues arise
  • Have clear procedures
  • Service-Level Agreements (SLAs) might be in place
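
Being notified when issues arise usually starts with comparing current metrics against agreed limits. The sketch below assumes hypothetical metric names and thresholds; in practice the limits would come from the SLA, and the alerts would be routed to whoever owns the response procedure.

```python
from typing import Dict, List

# Hypothetical limits, for example taken from an SLA.
THRESHOLDS = {"p95_latency_seconds": 2.0, "error_rate": 0.01}


def check_alerts(current: Dict[str, float],
                 thresholds: Dict[str, float] = THRESHOLDS) -> List[str]:
    """Return one alert message per metric that exceeds its limit."""
    alerts = []
    for name, limit in thresholds.items():
        value = current.get(name)
        if value is not None and value > limit:
            alerts.append(f"ALERT {name}={value} exceeds limit {limit}")
    return alerts
```

For example, `check_alerts({"p95_latency_seconds": 3.5, "error_rate": 0.001})` reports the latency breach only, since the error rate is within its limit.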

Let's practice!
