Failing Gracefully: Mitigating Risks
Building Scalable Agentic Systems
Korey Stegared-Pace
Senior AI Cloud Advocate, Microsoft
Tool call failures
Solutions
Defining clear parameters and usage
Tool output validation (unit testing!)
Verification checks on tool selection
MCP!
Problems
Tool call failures - retry mechanism
External services may
break
or be temporarily
slow
Retry
Exponential backoff
→
slowly decreasing
retries
Inform users of issues using
callbacks
Tool call failures - caching
Tool outputs are cached as a
fallback
Works for
: Static data with infrequent updates
Won't work for
: tools that serve real-time information
Tool call failures - queue management
Tool calls may be
reliant
on each another
Example
: flights must be booked before the taxi
Unsuccessful tool calls can be moved down the queue
Authentication
Tools may require
access
and
permissions
to
private
data
Example
: IT support agent
Valid access: system logs
Blocked access: passwords, location, etc.
Authentication - unique agent identifiers
Authentication - isolated environments
Authentication - guardrails and action restraints
Let's practice!
Building Scalable Agentic Systems
Preparing Video For Download...