CCA-F Study Day 19/20: Escalation Patterns & Error Propagation

Domain 5: Context Management & Reliability (~15% of exam)


📌 Today's Focus

Yesterday you mastered progressive summarization and context management — how to deal with finite context windows and the five risks of compaction. Today we tackle the reliability side of Domain 5: how agents should escalate to humans, how errors cascade through multi-agent systems, and the circuit breaker pattern that stops everything from falling over.

This is exam-critical territory. The exam loves testing escalation anti-patterns — specifically, it will present sentiment-based escalation and self-reported confidence as plausible answers. You need to instantly recognize these as wrong.


📚 Core Concepts

1. Escalation Triggers — What SHOULD Cause Handoff to Humans

Escalation means the agent stops trying and hands off to a human. The exam tests whether you know the correct triggers vs. the tempting-but-wrong ones.

✅ Correct escalation triggers:

  • Policy gap — The agent encounters a question not covered by its knowledge base or decision tree
  • Financial threshold — Action exceeds authorized limits (e.g., refund > $500)
  • Repeated tool failure — A tool fails 3+ consecutive times (circuit breaker territory)
  • Explicit user request — User asks to speak with a human
  • Compliance requirement — Action requires human sign-off by policy (e.g., PII deletion)
  • Task complexity exceeding capability — Multi-system coordination beyond agent's tool set

❌ WRONG escalation triggers (exam traps!):

  • Sentiment analysis — "Customer sounds angry" is NOT a valid escalation reason
  • Self-reported confidence — Claude saying "I'm not sure" is NOT programmatic
  • Arbitrary turn count — "Conversation went over 10 turns" is not a valid trigger

Why sentiment fails: An angry customer asking a simple question (e.g., "WHERE IS MY ORDER?!") doesn't need escalation — the agent can answer it. A calm customer asking to change their SSN DOES need escalation. Escalation should be about task complexity and policy, not emotional tone.
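The contrast above can be sketched in a few lines. This is a minimal illustration, not a production router: the `Request` type, the `intent` label (assumed to come from some upstream classifier), and the `INTENTS_REQUIRING_HUMAN` policy table are all hypothetical names introduced here. The anger score is carried along only to show that the decision never reads it.

```python
from dataclasses import dataclass

# Hypothetical policy table: intents that always need a human, by policy.
INTENTS_REQUIRING_HUMAN = {"change_ssn", "delete_pii"}


@dataclass
class Request:
    text: str
    intent: str          # assumed to be produced by an upstream classifier
    anger_score: float   # present only to show that it is never consulted


def needs_human(request: Request) -> bool:
    """Decide escalation from the task itself, not from emotional tone."""
    return request.intent in INTENTS_REQUIRING_HUMAN


angry_but_simple = Request("WHERE IS MY ORDER?!", intent="order_status", anger_score=0.95)
calm_but_sensitive = Request("Please update my SSN.", intent="change_ssn", anger_score=0.05)

print(needs_human(angry_but_simple))    # False: the agent can answer this itself
print(needs_human(calm_but_sensitive))  # True: policy requires a human
```

The angry request stays with the agent and the calm one is handed off, which is the opposite of what a sentiment threshold would do.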

2. Programmatic Escalation Implementation

The exam expects you to know what correct escalation looks like architecturally:


# ✅ Correct: Structured, programmatic escalation criteria
escalation_triggers = {
    "policy_gap": lambda ctx: ctx.question_not_in_knowledge_base,
    "financial_threshold": lambda ctx: ctx.refund_amount > 500,
    "repeated_failure": lambda ctx: ctx.tool_retry_count >= 3,
    "explicit_request": lambda ctx: ctx.user_requested_human,
    "compliance_required": lambda ctx: ctx.action_requires_approval,
    "multi_system_failure": lambda ctx: ctx.failed_systems_count >= 2,
}

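def check_escalation(context):
    """Return a handoff for the first trigger that fires, else None."""
    # escalate_to_human is assumed to be your own handoff helper, defined elsewhere;
    # context.summary and context.session_id are assumed attributes of the context object.
    for trigger_name, check in escalation_triggers.items():
        if check(context):
            return escalate_to_human(
                reason=trigger_name,
                context=context.summary,
                conversation_id=context.session_id
            )
    return None  # No escalation needed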

Key insight: Every escalation trigger is a programmatic check — a lambda, a threshold comparison, a boolean flag. None of them ask Claude "are you confident?" or parse emotional tone.

3. Error Propagation in Multi-Agent Systems

In multi-agent architectures (hub-and-spoke, fan-out/fan-in), errors don't just affect the agent that encounters them — they cascade:

  • Subagent tool error → stale context in coordinator → wrong assumptions → more errors
  • Network timeout → partial data returned → downstream agent makes decisions on incomplete info
  • Rate limit hit → retries consume tokens → budget exhausted before task completes

Cascading failure anatomy:

  1. Subagent A calls a database tool → timeout
  2. Subagent A retries 3x → all timeout
  3. Coordinator receives error from A → tries to proceed with partial data
  4. Subagent B receives partial context from coordinator → makes wrong inference
  5. Final synthesis has compounding errors from both A and B

The correct response? Fail fast and escalate, don't paper over errors.
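One way to picture "fail fast" at the coordinator is the sketch below. It assumes a hypothetical `SubagentResult` shape and a `coordinate` function that are not part of any specific framework; the point is only that a failed subagent is surfaced as a structured error and its partial data is never forwarded downstream.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class SubagentResult:
    """Hypothetical structured result a subagent hands back to the coordinator."""
    agent: str
    ok: bool
    data: Optional[Any] = None
    error_category: Optional[str] = None
    is_retryable: bool = False


def coordinate(results: List[SubagentResult]) -> dict:
    """Fail fast: if any subagent failed, stop instead of passing partial data on."""
    failures = [r for r in results if not r.ok]
    if failures:
        return {
            "status": "escalate",
            "failed_agents": [r.agent for r in failures],
            "reasons": [r.error_category for r in failures],
        }
    return {"status": "proceed", "inputs": {r.agent: r.data for r in results}}


results = [
    SubagentResult(agent="A", ok=False, error_category="timeout"),
    SubagentResult(agent="B", ok=True, data={"orders": 3}),
]
decision = coordinate(results)
print(decision["status"])         # escalate
print(decision["failed_agents"])  # ['A']
```

A real coordinator might instead choose an explicitly degraded path. What matters is that the choice is made knowingly from a structured error, not by accident from incomplete context.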

4. The Circuit Breaker Pattern

The circuit breaker pattern is borrowed from electrical engineering and is widely used in microservices architecture. It prevents cascading failures by rejecting calls outright once a failure threshold is reached, instead of retrying:


import time


class CircuitBreaker:
    """Prevents cascading failures in agentic systems."""
    
    def __init__(self, max_failures=3, reset_timeout=60):
        self.failures = 0
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.state = "closed"       # closed = normal operation
        self.last_failure_time = None
    
    def call(self, func, *args):
        # OPEN state: immediately fail without trying
        if self.state == "open":
            if self._should_attempt_reset():
                self.state = "half-open"
            else:
                return {
                    "is_error": True,
                    "errorCategory": "circuit_open",
                    "isRetryable": False,
                    "context": "Service unavailable — circuit breaker is open",
                    "suggestion": "Escalate to human or try alternative path"
                }
        
        # CLOSED or HALF-OPEN: attempt the call
        try:
            result = func(*args)
            # Success: reset the breaker
            self.failures = 0
            self.state = "closed"
            return result
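        except Exception:
            # Failure: count it and trip the breaker at the threshold.
            # A failed half-open test call lands here too and re-opens the circuit.
            self.failures += 1
            self.last_failure_time = time.time()
            if self.failures >= self.max_failures:
                self.state = "open"
            raise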
    
    def _should_attempt_reset(self):
        """After timeout, try one request (half-open state)."""
        elapsed = time.time() - self.last_failure_time
        return elapsed >= self.reset_timeout

Three states of a circuit breaker:

| State | Behavior | Transitions To |
| --- | --- | --- |
| Closed (normal) | All calls pass through | Open (after N failures) |
| Open (blocking) | All calls fail immediately, with no retries | Half-Open (after timeout) |
| Half-Open (testing) | One test call allowed through | Closed (if success) or Open (if fail) |
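To watch the three states change without waiting on a real timeout, here is a compact, self-contained sketch. `TinyBreaker` is a hypothetical stand-in written for this walkthrough (it is not the class above), and the injectable `clock` argument is an assumption added purely so the timeout can be simulated.

```python
import time


class TinyBreaker:
    """Compact breaker with an injectable clock so transitions can be observed."""

    def __init__(self, max_failures=3, reset_timeout=60, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.state = "closed"
        self.opened_at = None

    def call(self, func):
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout:
                return "blocked"         # Open: fail immediately, func is not called
            self.state = "half-open"     # Timeout elapsed: allow one test call
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.state == "half-open" or self.failures >= self.max_failures:
                self.state = "open"
                self.opened_at = self.clock()
            return "failed"
        self.failures = 0
        self.state = "closed"
        return result


def always_fails():
    raise TimeoutError("database timeout")


now = [0.0]                               # fake clock, advanced by hand
breaker = TinyBreaker(max_failures=3, reset_timeout=60, clock=lambda: now[0])

states = []
for _ in range(3):
    breaker.call(always_fails)
    states.append(breaker.state)          # closed, closed, open
blocked = breaker.call(always_fails)      # still inside the timeout window
now[0] = 61.0                             # pretend 61 seconds have passed
recovered = breaker.call(lambda: "ok")    # half-open test call succeeds
print(states, blocked, recovered, breaker.state)
```

The third failure opens the circuit, the next call is rejected without touching the tool, and after the simulated timeout a single successful test call closes it again.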

5. Retry Budgets

Beyond circuit breakers, production agentic systems need retry budgets at multiple levels:

  • Per-tool: Max 3 retries per individual tool call
  • Per-turn: Max total retries across all tools in one turn
  • Per-session: Global retry budget for the entire agent run

Use exponential backoff for transient failures (rate limits, timeouts), but fail immediately on non-retryable errors (auth failure, not found, validation error).
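A rough sketch of how a per-tool retry cap, a shared budget, and exponential backoff might fit together. `NonRetryableError`, `RetryBudget`, and `call_with_backoff` are hypothetical names introduced for illustration, and the `sleep` parameter is injectable only so the example runs without actually waiting.

```python
import time


class NonRetryableError(Exception):
    """Hypothetical marker for auth failures, not-found, validation errors."""


class RetryBudget:
    """Hypothetical shared counter, e.g. one instance per turn or per session."""

    def __init__(self, total):
        self.remaining = total

    def spend(self):
        if self.remaining <= 0:
            return False
        self.remaining -= 1
        return True


def call_with_backoff(func, budget, max_retries=3, base_delay=0.5, sleep=time.sleep):
    """Retry transient failures with exponential backoff, bounded two ways."""
    attempt = 0
    while True:
        try:
            return func()
        except NonRetryableError:
            raise                               # fail immediately, no retry
        except Exception:
            if attempt >= max_retries or not budget.spend():
                raise                           # per-tool cap or shared budget exhausted
            sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
            attempt += 1


calls = {"n": 0}
delays = []                                     # records delays instead of sleeping


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("transient")
    return "done"


budget = RetryBudget(total=5)
result = call_with_backoff(flaky, budget, sleep=delays.append)
print(result, delays, budget.remaining)         # done [0.5, 1.0] 3
```

Passing the same `RetryBudget` instance to every tool call in a session is what turns the per-tool cap into a global limit: once the shared counter hits zero, the next transient failure is raised instead of retried.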


🚫 Anti-Patterns & Exam Traps

| ❌ Anti-Pattern (Wrong Answer) | ✅ Correct Approach | Why It's Wrong |
| --- | --- | --- |
| Sentiment-based escalation | Escalate on task complexity, policy gaps, thresholds | Angry customers with simple questions don't need humans; calm customers with complex needs DO |
| Self-reported confidence scores | Programmatic checks with structured criteria | Claude's self-assessed confidence is unreliable and non-deterministic: it varies from run to run |
| No error propagation handling | Circuit breakers, retry budgets, graceful degradation | Without containment, one tool failure brings down the entire multi-agent system |
| Unlimited retries ("just keep trying") | Bounded retries with exponential backoff + circuit breaker | Infinite retries waste tokens/budget and delay human intervention |
| Silently swallowing errors from subagents | Propagate structured errors to coordinator for decision | Coordinator needs to know about failures to adjust strategy or escalate |

🎯 Exam Tip: When you see an answer choice that says "escalate when the customer seems frustrated" or "escalate when Claude reports low confidence" — that's the wrong answer. ALWAYS choose the option with programmatic checks, thresholds, and structured criteria.


💻 Code Examples

Complete Escalation System for Customer Support Agent


import time
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class AgentContext:
    """Tracks state for escalation decisions."""
    session_id: str
    tool_retry_count: int = 0
    failed_systems: list = field(default_factory=list)
    refund_amount: float = 0.0
    user_requested_human: bool = False
    action_requires_approval: bool = False
    question_not_in_knowledge_base: bool = False
    
    @property
    def failed_systems_count(self):
        return len(self.failed_systems)


class EscalationManager:
    """Deterministic escalation — no sentiment, no self-confidence."""
    
    def __init__(self):
        self.triggers = {
            "policy_gap": lambda ctx: ctx.question_not_in_knowledge_base,
            "financial_threshold": lambda ctx: ctx.refund_amount > 500,
            "repeated_failure": lambda ctx: ctx.tool_retry_count >= 3,
            "explicit_request": lambda ctx: ctx.user_requested_human,
            "compliance_required": lambda ctx: ctx.action_requires_approval,
            "multi_system_failure": lambda ctx: ctx.failed_systems_count >= 2,
        }
    
    def should_escalate(self, context: AgentContext) -> Optional[dict]:
        for trigger_name, check in self.triggers.items():
            if check(context):
                return {
                    "escalate": True,
                    "reason": trigger_name,
                    "session_id": context.session_id,
                    "handoff_context": self._build_handoff(context)
                }
        return None
    
    def _build_handoff(self, context: AgentContext) -> dict:
        """Package context for human agent."""
        return {
            "conversation_summary": "...",  # Progressive summary
            "failed_tools": context.failed_systems,
            "customer_action_requested": f"Refund ${context.refund_amount}" 
                if context.refund_amount > 0 else "Unknown",
            "attempts_made": context.tool_retry_count,
        }


# Integration with the agentic loop.
# Assumes CircuitBreaker (section 4) is in scope and that run_tool(tool_name, tool_input)
# is your own tool dispatcher; neither is defined in this snippet.
circuit_breaker = CircuitBreaker(max_failures=3)
escalation_mgr = EscalationManager()
context = AgentContext(session_id="sess_abc123")

def execute_tool_with_reliability(tool_name, tool_input):
    """Execute tool with circuit breaker + escalation awareness."""
    try:
        result = circuit_breaker.call(run_tool, tool_name, tool_input)
    except Exception:
        # The tool itself failed: count the attempt, then escalate or re-raise.
        context.tool_retry_count += 1
        escalation = escalation_mgr.should_escalate(context)
        if escalation:
            return escalation
        raise

    # An open breaker does not raise; it returns a structured error dict.
    if isinstance(result, dict) and result.get("errorCategory") == "circuit_open":
        if tool_name not in context.failed_systems:
            context.failed_systems.append(tool_name)
        escalation = escalation_mgr.should_escalate(context)
        if escalation:
            return escalation  # Hand off to human
        return {"is_error": True, "errorCategory": "service_unavailable",
                "isRetryable": False}

    context.tool_retry_count = 0  # Reset on success
    return result

📖 Reading

  • Primary: Trustworthy Agents in Practice — Anthropic's April 2026 research post on building agents that are safe, reliable, and trustworthy. Covers deterministic safety patterns, permission frameworks, and reliability guarantees.
  • Secondary: How We Built Our Multi-Agent Research System — Anthropic engineering deep dive on agent coordination, error handling, and reliability in production multi-agent systems.
  • Bonus: Building Effective Agents — Revisit the "error handling" and "when to build agents" sections with today's escalation lens.

🛠️ Hands-On Exercise (20 minutes)

Build a complete reliability layer for a customer support agent:

  1. Implement the CircuitBreaker class above (copy it, make sure it runs)
  2. Write an EscalationManager with at least 5 triggers — all programmatic, none sentiment-based
  3. Create a mock scenario: A tool (lookup_order) fails 3 times. Your code should: 
    • Trip the circuit breaker
    • Trigger the "repeated_failure" escalation
    • Produce a handoff payload with conversation context
  4. Add a second scenario: Refund request for $750. Your code should trigger "financial_threshold" immediately (no tool failure needed)

Success criteria: Running your code produces structured escalation responses for both scenarios — no sentiment analysis, no "Claude, are you confident?"


📝 Quick Quiz

Q1. A customer support agent receives the message: "I AM SO FRUSTRATED! I've been waiting 20 minutes for a response!" The correct escalation decision is:

A) Escalate immediately: customer sentiment indicates high frustration
B) Ask Claude to self-assess confidence level before deciding
C) Check programmatic criteria: is the question within policy? Has a tool failed? Does the action exceed thresholds?
D) Escalate after 3 angry messages from the same customer

Q2. In a hub-and-spoke multi-agent system, Subagent B's database tool fails 4 times consecutively. The circuit breaker opens. What should happen next?

A) Retry with exponential backoff until the circuit resets
B) The coordinator should immediately halt all subagents and escalate
C) Return a structured error to the coordinator with isRetryable: false, allowing it to decide whether to proceed with partial data or escalate
D) Silently skip the database step and continue with available data

Q3. Which combination represents valid escalation triggers for a production agent?

A) Sentiment score > 0.8 angry, self-reported confidence < 0.5, conversation > 15 turns
B) Policy gap detected, refund > $500 threshold, tool retry count ≥ 3, user explicitly requests human
C) Claude responds "I'm not sure", customer uses caps lock, agent has been running > 5 minutes
D) Any tool returns an error, conversation involves money, customer mentions "manager"


Answers:

Q1: C. Always use programmatic criteria. Sentiment (A, D) and self-reported confidence (B) are anti-patterns.

Q2: C. The circuit breaker returns a structured error to the coordinator, which makes the decision. Don't silently skip (D), don't retry an open circuit (A), and halting everything (B) is too aggressive; the coordinator should decide.

Q3: B. All triggers are programmatic, threshold-based, and deterministic. A and C use sentiment/self-confidence. D is too broad (not every error warrants escalation).


👀 Tomorrow's Preview

Day 20 is your capstone review — Human-in-the-Loop patterns, information provenance tracking, and a comprehensive review of all 10 anti-patterns across all 5 domains. We'll also do cross-domain scenario practice with full exam-style questions. You're almost at the finish line! 🏁