Failing Gracefully: Mitigating Risks

Building Scalable Agentic Systems

Korey Stegared-Pace

Senior AI Cloud Advocate, Microsoft

Tool call failures

Solutions

 

  • Defining clear parameters and usage

 

  • Tool output validation (unit testing!)

 

  • Verification checks on tool selection

 

  • MCP (Model Context Protocol)!

Problems

tool_call_failures.png
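
The first two solutions above (clear parameters, output validation) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `get_weather`, the `TOOLS` registry, and `validate_output` are hypothetical names invented for this example and do not come from any particular agent framework.

```python
from typing import Any

# Hypothetical tool used only for illustration.
def get_weather(city: str, days: int) -> dict:
    return {"city": city, "forecast": ["sunny"] * days}

# Hypothetical registry: tool name -> (callable, expected parameter types).
TOOLS = {
    "get_weather": (get_weather, {"city": str, "days": int}),
}

def validate_output(result: Any) -> bool:
    """Check that the tool returned the shape the agent expects."""
    return (
        isinstance(result, dict)
        and "city" in result
        and isinstance(result.get("forecast"), list)
    )

def call_tool(name: str, args: dict) -> dict:
    """Verify tool selection and parameters, run the tool, validate its output."""
    if name not in TOOLS:
        raise ValueError(f"Unknown tool: {name}")
    func, schema = TOOLS[name]
    if set(args) != set(schema):
        raise ValueError(f"Expected parameters {sorted(schema)}, got {sorted(args)}")
    for key, expected_type in schema.items():
        if not isinstance(args[key], expected_type):
            raise TypeError(f"Parameter {key!r} must be {expected_type.__name__}")
    result = func(**args)
    if not validate_output(result):
        raise ValueError(f"Tool {name!r} returned an unexpected output")
    return result
```

Because `validate_output` is a plain function, it can be covered by ordinary unit tests, which is the point of the "unit testing!" bullet.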

Tool call failures - retry mechanism

 

  • External services may break or be temporarily slow

 

  • Retry

    • Exponential backoff: wait progressively longer between retries
    • Inform users of issues using callbacks
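
A minimal sketch of the retry idea above, assuming the tool is a zero-argument callable. The names `call_with_retry`, `on_retry`, and `base_delay` are hypothetical and chosen for this example; the `sleep` parameter exists only so the wait can be replaced in tests.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

def call_with_retry(
    tool: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    on_retry: Optional[Callable[[int, float, Exception], None]] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `tool`, doubling the wait after each failure (exponential backoff)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return tool()
        except Exception as exc:
            if attempt == max_attempts:
                raise  # out of retries: surface the original error
            delay = base_delay * 2 ** (attempt - 1)  # 0.5, 1.0, 2.0, ...
            if on_retry is not None:
                on_retry(attempt, delay, exc)  # callback to inform the user
            sleep(delay)
    raise RuntimeError("unreachable")
```

A callback such as `lambda attempt, delay, exc: print(f"Attempt {attempt} failed, retrying in {delay}s")` is one simple way to keep users informed while the agent waits.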

Tool call failures - caching

 

  • Tool outputs are cached as a fallback

 

  • Works for: Static data with infrequent updates

 

  • Won't work for: tools that serve real-time information

caching.png
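
One way to picture caching as a fallback is the sketch below. `CachedTool` is a hypothetical wrapper written for this example; it assumes the tool's arguments are hashable and that slightly stale data is acceptable, which is why it suits static data and not real-time information.

```python
from typing import Any, Callable, Dict, Tuple

class CachedTool:
    """Wrap a tool so its last successful output is served if a later call fails."""

    def __init__(self, tool: Callable[..., Any]) -> None:
        self._tool = tool
        self._cache: Dict[Tuple[Any, ...], Any] = {}

    def __call__(self, *args: Any) -> Any:
        try:
            result = self._tool(*args)
        except Exception:
            if args in self._cache:
                return self._cache[args]  # fallback: last known good output
            raise  # nothing cached for these arguments, so surface the failure
        self._cache[args] = result  # refresh the cache on every success
        return result
```

The live tool is always tried first, so the cache is only consulted when the call fails.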

Tool call failures - queue management

 

  • Tool calls may be reliant on one another

 

  • Example: flights must be booked before the taxi

 

  • Unsuccessful tool calls can be moved down the queue

 

queue_management.png
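
The queue idea can be sketched as follows, using the flight and taxi example from the slide. The `run_queue` function and its task tuples are hypothetical, and treating an unmet dependency as a failed call is a simplification made for this example.

```python
from collections import deque
from typing import Callable, Dict, List, Set, Tuple

Task = Tuple[str, Set[str], Callable[[], None]]  # (name, depends_on, tool call)

def run_queue(tasks: List[Task], max_attempts: int = 3) -> List[str]:
    """Run tool calls in order, moving unsuccessful ones to the back of the queue."""
    queue = deque(tasks)
    done: List[str] = []
    attempts: Dict[str, int] = {}
    while queue:
        name, depends_on, call = queue.popleft()
        attempts[name] = attempts.get(name, 0) + 1
        try:
            if not depends_on <= set(done):
                raise RuntimeError(f"{name} is waiting on {sorted(depends_on - set(done))}")
            call()
            done.append(name)
        except Exception:
            if attempts[name] >= max_attempts:
                raise  # give up instead of looping forever
            queue.append((name, depends_on, call))  # move down the queue
    return done
```

For example, queuing the taxi before the flight still yields the order flight, then taxi, because the taxi call is pushed back until the flight has been booked.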

Authentication

authentication_it.png

 

  • Tools may require access and permissions to private data

 

  • Example: IT support agent
    • Valid access: system logs
    • Blocked access: passwords, location, etc.
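
The IT support example above could be enforced with a simple allowlist keyed by agent identifier. This is a sketch only: the `PERMISSIONS` table, the `it-support-agent` identifier, and `read_resource` are hypothetical names made up for this example, and a real system would typically rely on its platform's identity and access controls.

```python
from typing import Any, Dict, Set

# Hypothetical allowlist: agent identifier -> resources it may read.
PERMISSIONS: Dict[str, Set[str]] = {
    "it-support-agent": {"system_logs"},
}

class AccessDenied(PermissionError):
    """Raised when an agent requests a resource outside its allowlist."""

def read_resource(agent_id: str, resource: str, store: Dict[str, Any]) -> Any:
    """Return the resource only if this agent is explicitly allowed to read it."""
    allowed = PERMISSIONS.get(agent_id, set())  # unknown agents get no access
    if resource not in allowed:
        raise AccessDenied(f"{agent_id} may not access {resource}")
    return store[resource]
```

Access is denied by default: anything not listed for an agent, such as passwords or location, is blocked without needing its own rule.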

Authentication - unique agent identifiers

agent_ids0.jpg

Authentication - isolated environments

agent_ids.jpg

Authentication - guardrails and action restraints

restrictions.jpg

Let's practice!
