🎯 CCA-F Study Day 21/20: Cross-Domain Scenario Blitz

Review Cycle Day 1 — The 6 Exam Scenarios & Top 10 Anti-Patterns

📌 Today's Focus

Congratulations — you've completed the 20-day core curriculum! Now we shift into review mode. The CCA-F exam gives you 4 of 6 scenarios randomly, each testing concepts across multiple domains. Today we attack all 6 scenarios head-on, focusing on which domains each scenario tests and the cross-domain traps the exam sets.

Why this matters: The exam doesn't test domains in isolation. A "Customer Support" question might test your knowledge of hooks (Domain 1), structured errors (Domain 2), AND escalation patterns (Domain 5) in a single scenario. You need to recognize which pattern applies regardless of the scenario wrapper.

🧠 Core Concepts: The 6 Exam Scenarios

Scenario 1: Customer Support Resolution Agent

Domains tested: D1 (Agentic Architecture), D2 (Tool Design), D5 (Reliability)

This scenario tests your ability to design an agent that handles customer queries with appropriate escalation, compliance enforcement, and error handling.

Key concepts to nail:

Escalation triggers: Task complexity, policy gaps, multi-system failures; never sentiment
Hooks for compliance: Use PreToolUse hooks to block refunds over $X, log PII access, enforce business rules deterministically
Structured error responses: When a tool fails (e.g., CRM timeout), return isRetryable: true + errorCategory: "timeout" so the agent can decide whether to retry or escalate
Tool count: 4-5 tools max (lookup_order, issue_refund, escalate_to_human, check_inventory, update_ticket)
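The escalation triggers above can be expressed as structured, programmatic criteria. The following is a minimal sketch: the `TicketState` fields, the `max_steps` threshold, and the `should_escalate` helper are illustrative assumptions rather than part of any SDK. Sentiment is deliberately not an input.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TicketState:
    """Hypothetical snapshot of a support conversation (all fields are assumptions)."""
    steps_taken: int               # tool calls made so far on this ticket
    matched_policy: bool           # True if a written policy covers the request
    failed_systems: int            # distinct backend systems currently failing
    customer_asked_for_human: bool = False


def should_escalate(state: TicketState, max_steps: int = 8) -> Optional[str]:
    """Return an escalation reason, or None to keep handling the ticket."""
    if state.customer_asked_for_human:
        return "customer_request"
    if not state.matched_policy:
        return "policy_gap"
    if state.failed_systems >= 2:
        return "system_failure"
    if state.steps_taken > max_steps:
        return "complexity"
    return None
```

The returned strings mirror the `reason` values an `escalate_to_human` tool might accept, so the result can be passed straight into the escalation call.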

Sample exam question pattern: "The customer support agent needs to enforce a policy that refunds over $500 require manager approval. What is the BEST approach?"

❌ Add instructions to the system prompt saying "never approve refunds over $500 without asking"
❌ Set a confidence threshold and escalate when the agent is less than 80% confident
❌ Add a validation step in the prompt that asks Claude to check the amount first
✅ Implement a PreToolUse hook on the refund tool that blocks execution when amount > $500 and routes to manager queue

Scenario 2: Code Generation with Claude Code

Domains tested: D3 (Claude Code), D4 (Prompt Engineering)

Tests your knowledge of CLAUDE.md configuration, plan mode, slash commands, and TDD workflows.

Key concepts to nail:

CLAUDE.md hierarchy: Enterprise > User > Project root > Project local > Parent dirs > Child dirs
Plan mode phases: Analyze → Plan → Execute → Verify
When to use plan mode vs direct: Plan mode for complex multi-file changes; direct for simple edits
TDD pattern: Write test → Run (fail) → Implement → Run (pass) → Refactor
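As a small illustration of the TDD pattern above, the sketch below writes the tests first and then the minimal implementation. The `format_order_id` function and its `ORD-XXXXX` output format are hypothetical examples chosen for this sketch.

```python
import unittest


# Step 1: write the tests first. Run before format_order_id exists, they fail (red).
class TestFormatOrderId(unittest.TestCase):
    def test_pads_to_five_digits(self):
        self.assertEqual(format_order_id(42), "ORD-00042")

    def test_rejects_negative_numbers(self):
        with self.assertRaises(ValueError):
            format_order_id(-1)


# Step 2: implement just enough to make the tests pass (green), then refactor.
def format_order_id(number: int) -> str:
    if number < 0:
        raise ValueError("order number must be non-negative")
    return f"ORD-{number:05d}"
```

In a Claude Code session the same order applies: ask for the failing test, confirm it fails, and only then ask for the implementation.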

Scenario 3: Multi-Agent Research System

Domains tested: D1 (Orchestration), D5 (Context Management)

Tests hub-and-spoke coordination, context isolation, error propagation, and information provenance.

Key concepts to nail:

Hub-and-spoke pattern: Coordinator delegates to specialist researchers, each in isolated context
Context isolation: Subagents only see what coordinator passes them — prevents context contamination
Information provenance: Track source → extraction method → confidence for every fact
Error propagation: If one researcher fails, coordinator handles gracefully (circuit breaker pattern)
Separate sessions for verification: Generator and verifier must NOT share session context
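The coordination concepts above can be sketched without any API calls. In this sketch, `Finding`, `coordinate`, and the two stub researchers are hypothetical names. It shows context isolation, provenance fields, and a coordinator that continues with partial results. It does not implement a full circuit breaker.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Finding:
    claim: str
    source: str             # provenance: where the fact came from
    extraction_method: str  # provenance: how it was obtained
    confidence: float


def coordinate(question: str,
               researchers: Dict[str, Callable[[str], List[Finding]]]) -> dict:
    """Delegate to each researcher in isolation and tolerate individual failures."""
    findings: List[Finding] = []
    errors: List[dict] = []
    for name, research in researchers.items():
        # Context isolation: each researcher receives only its own brief,
        # never another researcher's output or the coordinator's history.
        brief = f"[{name}] {question}"
        try:
            findings.extend(research(brief))
        except Exception as exc:
            # Error propagation: record a structured error and keep going.
            errors.append({"researcher": name,
                           "errorCategory": type(exc).__name__,
                           "context": str(exc)})
    return {"findings": findings, "errors": errors, "partial": bool(errors)}


# Stub researchers standing in for real subagents.
def web_researcher(brief: str) -> List[Finding]:
    return [Finding("Widget sales rose in Q3", "example.com/q3-report", "web_search", 0.8)]


def db_researcher(brief: str) -> List[Finding]:
    raise TimeoutError("source returned 503")


report = coordinate("How did widget sales change in Q3?",
                    {"web": web_researcher, "db": db_researcher})
```

Here `report["partial"]` is true, and the coordinator can still synthesize an answer from the one finding that came back.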

Scenario 4: Developer Productivity with Claude

Domains tested: D2 (Tools/MCP), D3 (Claude Code)

Tests built-in tool selection, MCP integration, and codebase exploration strategies.

Key concepts to nail:

Built-in tool selection: Glob (find files) → Grep (search content) → Read (examine file) → Edit/Write (modify)
MCP servers: Add external tools via claude mcp add; tool naming: mcp__<server>__<action>
ToolSearch for scale: When you have many MCP servers, use dynamic tool discovery instead of preloading all
Parallel execution: Read-only tools (Read, Glob, Grep) can run concurrently; Write/Bash run sequentially
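The parallel-execution rule above can be sketched as a small scheduler. This is an illustration of the idea, not how Claude Code is implemented. The `run_tool_calls` helper and the `handlers` mapping are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

READ_ONLY = {"Read", "Glob", "Grep"}  # the read-only tools named in the list above


def run_tool_calls(calls: List[Tuple[str, dict]],
                   handlers: Dict[str, Callable[[dict], object]]) -> List[object]:
    """Run adjacent read-only calls concurrently; run everything else in order."""
    results: List[object] = [None] * len(calls)
    i = 0
    with ThreadPoolExecutor(max_workers=4) as pool:
        while i < len(calls):
            name, args = calls[i]
            if name in READ_ONLY:
                # Fan out the whole run of consecutive read-only calls.
                futures = {}
                j = i
                while j < len(calls) and calls[j][0] in READ_ONLY:
                    futures[j] = pool.submit(handlers[calls[j][0]], calls[j][1])
                    j += 1
                for idx, future in futures.items():
                    results[idx] = future.result()
                i = j
            else:
                # Tools with side effects run strictly one at a time.
                results[i] = handlers[name](args)
                i += 1
    return results
```

Results come back in the original call order, so each one can be matched to the request that produced it.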

Scenario 5: Claude Code for CI/CD

Domains tested: D3 (Claude Code), D1 (Orchestration)

Tests non-interactive mode, batch processing, multi-pass review, and session isolation.

Key concepts to nail:

-p flag: Non-interactive mode for pipelines — no approval prompts
--output-format json: Structured output for pipeline parsing
Session isolation: Generator in session A, reviewer in session B (avoids reasoning context bias)
Multi-pass code review: Pass 1 (correctness), Pass 2 (security), Pass 3 (style) — each in separate session
Batch API: For processing many files/requests efficiently, use the Message Batches API (for example, client.messages.batches.create in the Python SDK)
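A pipeline step that combines the -p flag, --output-format json, and session isolation might look like the sketch below. The `generate_then_review` helper and the injectable `runner` are hypothetical. Reading the final text from a `result` key is an assumption about the JSON payload and should be checked against the CLI version in use.

```python
import json
import subprocess
from typing import Callable, List


def build_claude_command(prompt: str) -> List[str]:
    """Non-interactive invocation with machine-readable output."""
    return ["claude", "-p", prompt, "--output-format", "json"]


def run_cli(argv: List[str]) -> str:
    """Default runner: execute the CLI and return its stdout."""
    return subprocess.run(argv, capture_output=True, text=True, check=True).stdout


def generate_then_review(task: str,
                         runner: Callable[[List[str]], str] = run_cli) -> dict:
    """Generator and reviewer run as two separate invocations with no shared session."""
    generated = json.loads(runner(build_claude_command(f"Implement: {task}")))
    code = generated.get("result", "")  # assumption: final text lives under "result"
    review = json.loads(runner(build_claude_command(
        f"Review this code for security issues only:\n{code}")))
    return {"code": code, "review": review.get("result", "")}
```

Passing a fake `runner` makes the step testable without the CLI installed. Only the generated code crosses the boundary to the reviewer, not the generator's reasoning.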

Scenario 6: Structured Data Extraction

Domains tested: D4 (Prompt Engineering), D5 (Reliability)

Tests JSON schemas, tool_use for structured output, validation-retry loops, and few-shot prompting.

Key concepts to nail:

Forced tool_use: tool_choice: {"type": "tool", "name": "extract_data"} for guaranteed structured output
Validation-retry loop: Extract → Validate → If invalid, feed errors back → Retry (max 3)
Few-shot examples: Use XML-tagged examples to show desired extraction format
Per-field confidence: Track confidence per extracted field, flag low-confidence for human review
Multi-pass extraction: Pass 1 extracts, Pass 2 validates against source, Pass 3 assigns confidence
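The validation-retry loop above can be sketched independently of the model call. In this sketch `REQUIRED_FIELDS`, `validate`, and `extract_with_retry` are hypothetical names. The `extract` callable stands in for a forced tool_use request that receives the previous attempt's validation errors.

```python
from typing import Callable, List

# Hypothetical schema: field name -> expected Python type.
REQUIRED_FIELDS = {"invoice_number": str, "total_amount": float}


def validate(record: dict) -> List[str]:
    """Return a list of human-readable validation errors (empty means valid)."""
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            errors.append(f"{field} must be {expected.__name__}, "
                          f"got {type(record[field]).__name__}")
    return errors


def extract_with_retry(document: str,
                       extract: Callable[[str, List[str]], dict],
                       max_attempts: int = 3) -> dict:
    """Extract -> validate -> feed errors back -> retry, up to max_attempts."""
    errors: List[str] = []
    for attempt in range(1, max_attempts + 1):
        record = extract(document, errors)  # previous errors go back into the prompt
        errors = validate(record)
        if not errors:
            return {"ok": True, "record": record, "attempts": attempt}
    return {"ok": False, "errors": errors, "attempts": max_attempts}
```

When the attempts run out, the caller gets the last set of errors back and can route the document to human review instead of silently accepting a bad record.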

🚫 Anti-Patterns & Exam Traps: The Complete Top 10

The exam LOVES presenting anti-patterns as plausible-sounding answer choices. Memorize these cold:

#	❌ Anti-Pattern (WRONG)	✅ Correct Approach	Why It's Wrong
1	Parsing natural language for loop termination	Check stop_reason programmatically	NL is unreliable; "I'm done" could be quoting a user
2	Arbitrary iteration caps as primary stop	Let agentic loop terminate via stop_reason	Caps are safety nets, not control flow
3	Prompt-based enforcement for critical rules	Programmatic hooks (deterministic)	Prompts are advisory; hooks are guaranteed
4	Self-reported confidence for escalation	Structured criteria + programmatic checks	Models are poorly calibrated on their own confidence
5	Sentiment-based escalation	Task complexity / policy gap triggers	Angry customers may have simple requests; calm ones may need escalation
6	Generic error messages ("failed")	Structured errors with category + retryability	Agent can't reason about recovery without context
7	Silently suppressing errors (empty = success)	Explicit failure distinction	"No results found" ≠ "access denied" — different recovery paths
8	Too many tools (18+) per agent	4-5 tools per agent, distribute across subagents	Selection accuracy degrades with tool count
9	Same-session self-review	Separate sessions for generator/verifier	Reasoning context bias — reviewer sees generator's thought process
10	Aggregate accuracy metrics only	Per-document-type tracking	90% overall might hide 40% accuracy on one doc type
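Anti-pattern 10 is easy to see with numbers. The sketch below uses made-up evaluation data and a hypothetical `accuracy_by_type` helper to show how a 90% aggregate can hide a document type sitting at 40%.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_type(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Compute accuracy per document type plus the overall aggregate."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for doc_type, is_correct in results:
        totals[doc_type] += 1
        correct[doc_type] += int(is_correct)
    report = {t: correct[t] / totals[t] for t in totals}
    report["overall"] = sum(correct.values()) / sum(totals.values())
    return report


# Made-up evaluation data: 100 documents across three types.
results = (
    [("invoice", True)] * 58 + [("invoice", False)] * 2
    + [("receipt", True)] * 28 + [("receipt", False)] * 2
    + [("contract", True)] * 4 + [("contract", False)] * 6
)
report = accuracy_by_type(results)
# report["overall"] is 0.9 while report["contract"] is 0.4
```

Tracking the per-type numbers is what surfaces the contract problem; the aggregate alone would pass most quality gates.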

💻 Code Examples: Cross-Domain Integration

Complete Customer Support Agent (D1 + D2 + D5)

import anthropic
import json

client = anthropic.Anthropic()

# Domain 2: Well-designed tools (4 tools, clear descriptions)
tools = [
    {
        "name": "lookup_order",
        "description": "Look up a customer order by order ID. Returns order status, items, "
                       "shipping info. Use when customer asks about an existing order.",
        "input_schema": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string", "description": "Order ID (format: ORD-XXXXX)"}
            },
            "required": ["order_id"]
        }
    },
    {
        "name": "issue_refund",
        "description": "Process a refund for a given order. Returns refund confirmation. "
                       "Use only after confirming with customer.",
        "input_schema": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string"},
                "amount": {"type": "number", "description": "Refund amount in USD"},
                "reason": {"type": "string", "enum": ["defective", "not_received", "wrong_item", "changed_mind"]}
            },
            "required": ["order_id", "amount", "reason"]
        }
    },
    {
        "name": "escalate_to_human",
        "description": "Escalate the conversation to a human agent. Use when: policy gap detected, "
                       "multi-system failure, or task exceeds complexity threshold.",
        "input_schema": {
            "type": "object",
            "properties": {
                "reason": {"type": "string", "enum": ["policy_gap", "system_failure", "complexity", "customer_request"]},
                "context": {"type": "string", "description": "Summary for the human agent"}
            },
            "required": ["reason", "context"]
        }
    },
    {
        "name": "check_inventory",
        "description": "Check product availability. Returns stock count and restock date if unavailable.",
        "input_schema": {
            "type": "object",
            "properties": {
                "product_id": {"type": "string"}
            },
            "required": ["product_id"]
        }
    }
]

# Domain 1: Hooks for deterministic enforcement
def pre_tool_use_hook(tool_name, tool_input):
    """Deterministic policy enforcement — NOT prompt-based!"""
    if tool_name == "issue_refund" and tool_input.get("amount", 0) > 500:
        return {"blocked": True, "reason": "Refunds > $500 require manager approval",
                "action": "escalate_to_human"}
    return {"blocked": False}

# Domain 2: Structured error handling
def execute_tool(tool_name, tool_input):
    # Check hook first (Domain 1: deterministic enforcement)
    hook_result = pre_tool_use_hook(tool_name, tool_input)
    if hook_result["blocked"]:
        return {
            "is_error": True,
            "errorCategory": "policy_violation",
            "isRetryable": False,
            "context": hook_result["reason"],
            "suggestion": f"Auto-escalating: {hook_result['action']}"
        }
    
    try:
        # call_actual_tool is the backend dispatcher for real tool calls (not shown here)
        result = call_actual_tool(tool_name, tool_input)
        return result
    except TimeoutError:
        return {
            "is_error": True,
            "errorCategory": "timeout",
            "isRetryable": True,
            "context": f"{tool_name} timed out after 30s",
            "suggestion": "Retry once, then escalate if still failing"
        }
    except PermissionError:
        return {
            "is_error": True,
            "errorCategory": "authentication",
            "isRetryable": False,
            "context": "Insufficient permissions for this operation",
            "suggestion": "Escalate to human agent with elevated access"
        }

# Domain 1: The agentic loop with proper termination
SYSTEM_PROMPT = (
    "You are a customer support agent. Escalate based on task complexity, "
    "policy gaps, or system failures, never based on customer sentiment."
)

def run_support_agent(user_message):
    messages = [{"role": "user", "content": user_message}]
    consecutive_failures = 0  # Domain 5: Circuit breaker

    def ask():
        # Send the same system prompt and tools on every turn, not just the first
        return client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            system=SYSTEM_PROMPT,
            tools=tools,
            messages=messages
        )

    response = ask()

    # ✅ Terminate on stop_reason, NOT natural language parsing
    while response.stop_reason == "tool_use":
        tool_results = []
        # One turn may contain several tool_use blocks; each needs its own tool_result
        for block in response.content:
            if block.type != "tool_use":
                continue
            result = execute_tool(block.name, block.input)

            # Domain 5: Circuit breaker pattern
            if isinstance(result, dict) and result.get("is_error"):
                consecutive_failures += 1
                if consecutive_failures >= 3:
                    result = execute_tool("escalate_to_human", {
                        "reason": "system_failure",
                        "context": f"Circuit breaker triggered: {consecutive_failures} consecutive failures"
                    })
            else:
                consecutive_failures = 0

            tool_results.append(
                {"type": "tool_result", "tool_use_id": block.id, "content": json.dumps(result)}
            )

        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": tool_results})
        response = ask()

    # The final turn may hold several blocks; return only the text ones
    return "".join(b.text for b in response.content if b.type == "text")

Multi-Pass Extraction with Session Isolation (D4 + D5)

# Each pass runs in a SEPARATE session (separate API call = separate context).
# extraction_tool, validation_tool, and confidence_tool are tool definitions
# assumed to be declared elsewhere with the names used in tool_choice below.
def multi_pass_extraction(document_text):
    # Pass 1: Extract (fresh session)
    extraction = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,  # max_tokens is required by the Messages API
        tools=[extraction_tool],
        tool_choice={"type": "tool", "name": "extract_fields"},
        messages=[{"role": "user", "content": f"Extract all fields:\n{document_text}"}]
    )
    # With a forced tool_choice, the tool_use block's input is already a dict
    extracted = extraction.content[0].input

    # Pass 2: Validate (SEPARATE session - no access to Pass 1's reasoning)
    validation = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        tools=[validation_tool],
        tool_choice={"type": "tool", "name": "validate_extraction"},
        messages=[{"role": "user", "content":
            f"Source document:\n{document_text}\n\n"
            f"Extracted data:\n{json.dumps(extracted)}\n\n"
            "Validate each field against the source. Flag any mismatches."}]
    )
    validation_report = validation.content[0].input

    # Pass 3: Confidence scoring (SEPARATE session), given the source and Pass 2's report
    confidence = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        tools=[confidence_tool],
        tool_choice={"type": "tool", "name": "score_confidence"},
        messages=[{"role": "user", "content":
            f"Source document:\n{document_text}\n\n"
            f"Extracted data:\n{json.dumps(extracted)}\n\n"
            f"Validation report:\n{json.dumps(validation_report)}\n\n"
            "For each field, assign confidence 0.0-1.0."}]
    )

    # Flag low-confidence fields for human review.
    # Assumes score_confidence returns {"fields": {name: {"confidence": float}}}.
    scores = confidence.content[0].input
    for field, score in scores["fields"].items():
        if score["confidence"] < 0.7:
            score["needs_human_review"] = True

    return scores

🎬 Video to Watch

How We Build Effective Agents — Barry Zhang (Anthropic) at AI Engineer Summit

Barry Zhang from Anthropic's Applied AI team walks through the three core principles: (1) don't build agents for everything, (2) keep designs simple and composable, (3) think like the agent. This talk directly maps to the CCA-F exam's emphasis on choosing the right architecture pattern and avoiding over-engineering. Most relevant sections: The discussion of when NOT to use agents, and the breakdown of composable patterns (first 15 minutes).

This follow-up talk introduces the "Skills" concept: portable procedural knowledge that agents load dynamically. It is directly relevant to the Claude Code scenarios (Domain 3) on the exam, particularly slash commands and the distinction between CLAUDE.md (always loaded) and skills (loaded on demand).

📖 Reading

Primary: Building Effective Agents — Anthropic's canonical blog post. Re-read with fresh eyes now that you've studied all 5 domains.
Secondary: Demystifying Evals for AI Agents — How to evaluate agentic systems (directly relevant to exam scenarios on measuring agent performance).

🛠️ Hands-On Exercise (30 min)

Cross-Domain Scenario Walkthrough:

Pick ANY two of the 6 scenarios above
For each, write a complete architectural diagram showing:
- How many agents are involved and their roles
- Which tools each agent has (keep to 4-5!)
- Where hooks enforce business rules
- How errors propagate and are handled
- Where session isolation boundaries exist
For each diagram, identify which of the 10 anti-patterns a wrong design would hit
Write one exam-style question for each scenario where the wrong answers are anti-patterns

📝 Quick Quiz

Q1: In a CI/CD pipeline using Claude Code, you need to generate code and then review it for security vulnerabilities. What is the BEST architecture?

A) Use Claude Code with the -p flag and ask it to generate and then review its own code
B) Run generation in one session, then pass the output to a separate review session with --output-format json
C) Add "always review your code for security issues" to CLAUDE.md
D) Use a single session with a high effort level to ensure thorough self-review

Q2: A multi-agent research system has a coordinator that delegates to 3 researcher subagents. One researcher's API source returns a 503 error on the 3rd consecutive call. What should happen?

A) The researcher should retry indefinitely until the API responds
B) The researcher should report low confidence and let the coordinator decide
C) The circuit breaker should open, the researcher should return a structured error to the coordinator, and the coordinator should synthesize results from the other 2 researchers
D) The coordinator should escalate the entire research task to a human

Q3: You're building a structured data extraction pipeline for invoices. The extracted "total_amount" field has a confidence score of 0.45. What is the CORRECT next step?

A) Accept the extraction since the model produced a value
B) Re-run the entire extraction with a higher effort level
C) Flag this specific field for human review while accepting high-confidence fields
D) Reject the entire document and ask for a clearer scan

Answers

Q1: B — Session isolation is critical. Same-session self-review (A, D) is anti-pattern #9. CLAUDE.md instructions (C) are advisory, not deterministic. Separate sessions with structured output is the correct CI/CD pattern.

Q2: C — The circuit breaker pattern (3 failures → open circuit) is the correct reliability pattern. The researcher returns a structured error; the coordinator is resilient and works with partial results. Retrying indefinitely (A) wastes resources. Reporting "low confidence" (B) is the self-reported confidence anti-pattern. Escalating everything (D) is too aggressive — the coordinator can still produce useful output.

Q3: C — Per-field confidence tracking with human review for low-confidence fields is the correct pattern. Accepting blindly (A) ignores reliability requirements. Re-running entirely (B) is wasteful. Rejecting the whole document (D) throws away good extractions from other fields.

🔮 Tomorrow's Preview

Day 22 will be a Timed Practice Exam Simulation — 15 scenario-based questions across all domains, with a focus on the trickiest cross-domain traps. We'll work under exam conditions and then do a detailed answer review.