CCA-F Study Day 22/20 BONUS: Cross-Domain Scenario Blitz — The 6 Exam Scenarios
🎯 CCA-F Study Day 21/20: Cross-Domain Scenario Blitz
Review Cycle Day 1 — The 6 Exam Scenarios & Top 10 Anti-Patterns
📌 Today's Focus
Congratulations — you've completed the 20-day core curriculum! Now we shift into review mode. The CCA-F exam gives you 4 of 6 scenarios randomly, each testing concepts across multiple domains. Today we attack all 6 scenarios head-on, focusing on which domains each scenario tests and the cross-domain traps the exam sets.
Why this matters: The exam doesn't test domains in isolation. A "Customer Support" question might test your knowledge of hooks (Domain 1), structured errors (Domain 2), AND escalation patterns (Domain 5) in a single scenario. You need to recognize which pattern applies regardless of the scenario wrapper.
🧠 Core Concepts: The 6 Exam Scenarios
Scenario 1: Customer Support Resolution Agent
Domains tested: D1 (Agentic Architecture), D2 (Tool Design), D5 (Reliability)
This scenario tests your ability to design an agent that handles customer queries with appropriate escalation, compliance enforcement, and error handling.
Key concepts to nail:
- Escalation triggers: Task complexity, policy gaps, multi-system failures; never sentiment
- Hooks for compliance: Use `PreToolUse` hooks to block refunds over $X, log PII access, enforce business rules deterministically
- Structured error responses: When a tool fails (e.g., CRM timeout), return `isRetryable: true` + `errorCategory: "timeout"` so the agent can decide whether to retry or escalate
- Tool count: 4-5 tools max (lookup_order, issue_refund, escalate_to_human, check_inventory, update_ticket)
Sample exam question pattern: "The customer support agent needs to enforce a policy that refunds over $500 require manager approval. What is the BEST approach?"
- ❌ Add instructions to the system prompt saying "never approve refunds over $500 without asking"
- ❌ Set a confidence threshold and escalate when the agent is less than 80% confident
- ❌ Add a validation step in the prompt that asks Claude to check the amount first
- ✅ Implement a `PreToolUse` hook on the refund tool that blocks execution when amount > $500 and routes to manager queue
Scenario 2: Code Generation with Claude Code
Domains tested: D3 (Claude Code), D4 (Prompt Engineering)
Tests your knowledge of CLAUDE.md configuration, plan mode, slash commands, and TDD workflows.
Key concepts to nail:
- CLAUDE.md hierarchy: Enterprise > User > Project root > Project local > Parent dirs > Child dirs
- Plan mode phases: Analyze → Plan → Execute → Verify
- When to use plan mode vs direct: Plan mode for complex multi-file changes; direct for simple edits
- TDD pattern: Write test → Run (fail) → Implement → Run (pass) → Refactor
Scenario 3: Multi-Agent Research System
Domains tested: D1 (Orchestration), D5 (Context Management)
Tests hub-and-spoke coordination, context isolation, error propagation, and information provenance.
Key concepts to nail:
- Hub-and-spoke pattern: Coordinator delegates to specialist researchers, each in isolated context
- Context isolation: Subagents only see what coordinator passes them — prevents context contamination
- Information provenance: Track source → extraction method → confidence for every fact
- Error propagation: If one researcher fails, coordinator handles gracefully (circuit breaker pattern)
- Separate sessions for verification: Generator and verifier must NOT share session context
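The coordination ideas above can be sketched in plain Python. This is a minimal illustration, not a reference design: `Finding`, `ResearcherResult`, `run_researcher`, and `coordinate` are hypothetical names, each researcher is modeled as an ordinary callable standing in for a subagent, and the circuit is assumed to open after 3 consecutive `ConnectionError`s.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Finding:
    """One fact plus its provenance: source, extraction method, confidence."""
    claim: str
    source: str
    method: str
    confidence: float


@dataclass
class ResearcherResult:
    name: str
    findings: list[Finding] = field(default_factory=list)
    error: Optional[dict] = None  # structured error, never a silent empty result


def run_researcher(name: str, fetch: Callable[[str], list[Finding]],
                   brief: str, max_failures: int = 3) -> ResearcherResult:
    """Call one researcher; open the circuit after max_failures consecutive errors."""
    failures = 0
    last_error = ""
    while failures < max_failures:
        try:
            # The researcher receives only the brief, not the coordinator's history.
            return ResearcherResult(name, findings=fetch(brief))
        except ConnectionError as exc:
            failures += 1
            last_error = str(exc)
    return ResearcherResult(name, error={
        "errorCategory": "upstream_unavailable",
        "isRetryable": False,
        "context": f"{name}: {failures} consecutive failures ({last_error})",
    })


def coordinate(brief: str, researchers: dict[str, Callable]) -> dict:
    """Hub: delegate to every spoke, then synthesize from whatever succeeded."""
    results = [run_researcher(n, f, brief) for n, f in researchers.items()]
    return {
        "findings": [f for r in results for f in r.findings],
        "failed": {r.name: r.error for r in results if r.error},
    }
```

A failing researcher shows up under `failed` with a structured error, while the coordinator still returns the findings from the researchers that succeeded.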
Scenario 4: Developer Productivity with Claude
Domains tested: D2 (Tools/MCP), D3 (Claude Code)
Tests built-in tool selection, MCP integration, and codebase exploration strategies.
Key concepts to nail:
- Built-in tool selection: Glob (find files) → Grep (search content) → Read (examine file) → Edit/Write (modify)
- MCP servers: Add external tools via `claude mcp add`; tool naming: `mcp__<server>__<action>`
- ToolSearch for scale: When you have many MCP servers, use dynamic tool discovery instead of preloading all
- Parallel execution: Read-only tools (Read, Glob, Grep) can run concurrently; Write/Bash run sequentially
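To make the naming convention and the concurrency rule concrete, here is a small sketch. It only illustrates the rule as stated above; it is not how Claude Code schedules tools internally. `parse_mcp_tool_name` and `plan_batches` are hypothetical helpers, and treating every tool outside Read/Glob/Grep as sequential is an assumption.

```python
READ_ONLY_TOOLS = {"Read", "Glob", "Grep"}


def parse_mcp_tool_name(name: str):
    """Split 'mcp__<server>__<action>' into (server, action); None if not an MCP tool."""
    parts = name.split("__", 2)
    if len(parts) == 3 and parts[0] == "mcp" and parts[1] and parts[2]:
        return parts[1], parts[2]
    return None


def plan_batches(tool_calls: list[str]) -> list[list[str]]:
    """Group consecutive read-only calls into one concurrent batch.

    Any other call (Write, Bash, MCP tools, ...) gets a batch of its own,
    which means it runs sequentially.
    """
    batches: list[list[str]] = []
    current: list[str] = []
    for name in tool_calls:
        if name in READ_ONLY_TOOLS:
            current.append(name)
            continue
        if current:
            batches.append(current)
            current = []
        batches.append([name])
    if current:
        batches.append(current)
    return batches
```

For example, `plan_batches(["Glob", "Grep", "Read", "Edit", "Read", "Bash"])` yields one three-call concurrent batch followed by three single-call batches.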
Scenario 5: Claude Code for CI/CD
Domains tested: D3 (Claude Code), D1 (Orchestration)
Tests non-interactive mode, batch processing, multi-pass review, and session isolation.
Key concepts to nail:
- `-p` flag: Non-interactive mode for pipelines, no approval prompts
- `--output-format json`: Structured output for pipeline parsing
- Session isolation: Generator in session A, reviewer in session B (avoids reasoning context bias)
- Multi-pass code review: Pass 1 (correctness), Pass 2 (security), Pass 3 (style) — each in separate session
- Batch API: For processing many files/requests efficiently, submit them through the Message Batches API (`client.messages.batches.create` in the Python SDK)
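A multi-pass review with one process per pass might look like the sketch below. It assumes the `claude` CLI is on the PATH, that piped stdin is passed along as context, and that the JSON envelope carries the final text in a `result` field; treat those as assumptions to verify against your installed version. `PASSES`, `build_command`, `parse_result`, and `run_review` are hypothetical names.

```python
import json
import subprocess

# One instruction per pass; each pass runs in its own process (its own session).
PASSES = {
    "correctness": "Review this diff for logic errors only.",
    "security": "Review this diff for security vulnerabilities only.",
    "style": "Review this diff for style issues only.",
}


def build_command(prompt: str) -> list[str]:
    """argv for one non-interactive Claude Code run with JSON output."""
    return ["claude", "-p", prompt, "--output-format", "json"]


def parse_result(stdout: str) -> str:
    """Pull the final text out of the JSON envelope (assumes a 'result' field)."""
    return json.loads(stdout)["result"]


def run_review(diff: str) -> dict[str, str]:
    """Run every pass separately so no pass sees another pass's context."""
    reviews = {}
    for name, instruction in PASSES.items():
        proc = subprocess.run(
            build_command(instruction),
            input=diff,            # the diff is piped in on stdin
            capture_output=True,
            text=True,
            check=True,            # fail the pipeline step on a non-zero exit
        )
        reviews[name] = parse_result(proc.stdout)
    return reviews
```

Keeping `build_command` and `parse_result` separate from `run_review` lets a pipeline unit-test the command shape and the parsing without invoking the CLI.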
Scenario 6: Structured Data Extraction
Domains tested: D4 (Prompt Engineering), D5 (Reliability)
Tests JSON schemas, tool_use for structured output, validation-retry loops, and few-shot prompting.
Key concepts to nail:
- Forced tool_use: `tool_choice: {"type": "tool", "name": "extract_data"}` for guaranteed structured output
- Validation-retry loop: Extract → Validate → If invalid, feed errors back → Retry (max 3)
- Few-shot examples: Use XML-tagged examples to show desired extraction format
- Per-field confidence: Track confidence per extracted field, flag low-confidence for human review
- Multi-pass extraction: Pass 1 extracts, Pass 2 validates against source, Pass 3 assigns confidence
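The validation-retry loop from the list above can be sketched without any API calls by passing the extractor in as a callable. This is an illustration under assumptions: `REQUIRED`, `validate`, and `extract_with_retry` are hypothetical names, the invoice fields are made up, and "max 3" is read here as three attempts in total.

```python
from typing import Callable

# Hypothetical schema: field name -> accepted Python type(s).
REQUIRED = {"invoice_number": str, "total_amount": (int, float)}


def validate(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record is valid."""
    errors = []
    for name, expected in REQUIRED.items():
        if name not in record:
            errors.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            errors.append(f"wrong type for {name}")
    return errors


def extract_with_retry(document: str,
                       extract: Callable[[str, list[str]], dict],
                       max_attempts: int = 3) -> dict:
    """Extract, validate, and feed validation errors back into the next attempt."""
    errors: list[str] = []
    for attempt in range(1, max_attempts + 1):
        # `errors` holds the previous attempt's problems (empty on the first try).
        record = extract(document, errors)
        errors = validate(record)
        if not errors:
            return {"ok": True, "record": record, "attempts": attempt}
    # Out of attempts: surface an explicit failure instead of a partial record.
    return {"ok": False, "errors": errors, "attempts": max_attempts}
```

In a real pipeline, `extract` would wrap a forced `tool_use` call and append the error list to the prompt; the loop itself stays the same.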
🚫 Anti-Patterns & Exam Traps: The Complete Top 10
The exam LOVES presenting anti-patterns as plausible-sounding answer choices. Memorize these cold:
| # | ❌ Anti-Pattern (WRONG) | ✅ Correct Approach | Why It's Wrong |
|---|---|---|---|
| 1 | Parsing natural language for loop termination | Check `stop_reason` programmatically | NL is unreliable; "I'm done" could be quoting a user |
| 2 | Arbitrary iteration caps as primary stop | Let agentic loop terminate via stop_reason | Caps are safety nets, not control flow |
| 3 | Prompt-based enforcement for critical rules | Programmatic hooks (deterministic) | Prompts are advisory; hooks are guaranteed |
| 4 | Self-reported confidence for escalation | Structured criteria + programmatic checks | Models are poorly calibrated on their own confidence |
| 5 | Sentiment-based escalation | Task complexity / policy gap triggers | Angry customers may have simple requests; calm ones may need escalation |
| 6 | Generic error messages ("failed") | Structured errors with category + retryability | Agent can't reason about recovery without context |
| 7 | Silently suppressing errors (empty = success) | Explicit failure distinction | "No results found" ≠ "access denied" — different recovery paths |
| 8 | Too many tools (18+) per agent | 4-5 tools per agent, distribute across subagents | Selection accuracy degrades with tool count |
| 9 | Same-session self-review | Separate sessions for generator/verifier | Reasoning context bias — reviewer sees generator's thought process |
| 10 | Aggregate accuracy metrics only | Per-document-type tracking | 90% overall might hide 40% accuracy on one doc type |
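Anti-pattern 10 is easy to see with numbers. The sketch below uses made-up evaluation data and a hypothetical helper, `accuracy_by_type`, to compute the aggregate and the per-document-type accuracy side by side.

```python
from collections import defaultdict


def accuracy_by_type(results):
    """results: iterable of (doc_type, is_correct) pairs.

    Returns (overall_accuracy, {doc_type: accuracy}).
    """
    totals = defaultdict(lambda: [0, 0])  # doc_type -> [correct, total]
    for doc_type, is_correct in results:
        totals[doc_type][0] += int(is_correct)
        totals[doc_type][1] += 1
    if not totals:
        return 0.0, {}
    per_type = {t: c / n for t, (c, n) in totals.items()}
    correct = sum(c for c, _ in totals.values())
    count = sum(n for _, n in totals.values())
    return correct / count, per_type


# Made-up data: 100 invoices (all correct), 20 contracts (8 correct).
results = (
    [("invoice", True)] * 100
    + [("contract", True)] * 8
    + [("contract", False)] * 12
)
overall, per_type = accuracy_by_type(results)
```

Here the overall accuracy is 108/120 = 0.9, while contracts sit at 8/20 = 0.4, which is exactly the gap the aggregate number hides.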
💻 Code Examples: Cross-Domain Integration
Complete Customer Support Agent (D1 + D2 + D5)
```python
import json

import anthropic

client = anthropic.Anthropic()

MODEL = "claude-sonnet-4-20250514"
SYSTEM_PROMPT = (
    "You are a customer support agent. Escalate based on task complexity, "
    "policy gaps, or system failures. Never escalate based on customer sentiment."
)

# Domain 2: Well-designed tools (4 tools, clear descriptions)
tools = [
    {
        "name": "lookup_order",
        "description": "Look up a customer order by order ID. Returns order status, items, "
                       "shipping info. Use when customer asks about an existing order.",
        "input_schema": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string", "description": "Order ID (format: ORD-XXXXX)"}
            },
            "required": ["order_id"]
        }
    },
    {
        "name": "issue_refund",
        "description": "Process a refund for a given order. Returns refund confirmation. "
                       "Use only after confirming with customer.",
        "input_schema": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string"},
                "amount": {"type": "number", "description": "Refund amount in USD"},
                "reason": {"type": "string", "enum": ["defective", "not_received", "wrong_item", "changed_mind"]}
            },
            "required": ["order_id", "amount", "reason"]
        }
    },
    {
        "name": "escalate_to_human",
        "description": "Escalate the conversation to a human agent. Use when: policy gap detected, "
                       "multi-system failure, or task exceeds complexity threshold.",
        "input_schema": {
            "type": "object",
            "properties": {
                "reason": {"type": "string", "enum": ["policy_gap", "system_failure", "complexity", "customer_request"]},
                "context": {"type": "string", "description": "Summary for the human agent"}
            },
            "required": ["reason", "context"]
        }
    },
    {
        "name": "check_inventory",
        "description": "Check product availability. Returns stock count and restock date if unavailable.",
        "input_schema": {
            "type": "object",
            "properties": {
                "product_id": {"type": "string"}
            },
            "required": ["product_id"]
        }
    }
]


def call_actual_tool(tool_name, tool_input):
    """Placeholder: dispatch to the real backend (CRM, payments, inventory)."""
    raise NotImplementedError(f"No backend wired up for {tool_name}")


# Domain 1: Hooks for deterministic enforcement
def pre_tool_use_hook(tool_name, tool_input):
    """Deterministic policy enforcement, NOT prompt-based!"""
    if tool_name == "issue_refund" and tool_input.get("amount", 0) > 500:
        return {"blocked": True, "reason": "Refunds > $500 require manager approval",
                "action": "escalate_to_human"}
    return {"blocked": False}


# Domain 2: Structured error handling
def execute_tool(tool_name, tool_input):
    # Check hook first (Domain 1: deterministic enforcement)
    hook_result = pre_tool_use_hook(tool_name, tool_input)
    if hook_result["blocked"]:
        return {
            "is_error": True,
            "errorCategory": "policy_violation",
            "isRetryable": False,
            "context": hook_result["reason"],
            "suggestion": f"Auto-escalating: {hook_result['action']}"
        }
    try:
        return call_actual_tool(tool_name, tool_input)
    except TimeoutError:
        return {
            "is_error": True,
            "errorCategory": "timeout",
            "isRetryable": True,
            "context": f"{tool_name} timed out after 30s",
            "suggestion": "Retry once, then escalate if still failing"
        }
    except PermissionError:
        return {
            "is_error": True,
            "errorCategory": "authentication",
            "isRetryable": False,
            "context": "Insufficient permissions for this operation",
            "suggestion": "Escalate to human agent with elevated access"
        }


def create_response(messages):
    """One model turn; every turn gets the same system prompt and tools."""
    return client.messages.create(
        model=MODEL,
        max_tokens=1024,
        system=SYSTEM_PROMPT,
        tools=tools,
        messages=messages
    )


# Domain 1: The agentic loop with proper termination
def run_support_agent(user_message):
    messages = [{"role": "user", "content": user_message}]
    consecutive_failures = 0  # Domain 5: Circuit breaker
    response = create_response(messages)

    # ✅ Terminate on stop_reason, NOT natural language parsing
    while response.stop_reason == "tool_use":
        tool_results = []
        # A single turn can request several tools; each tool_use block
        # needs its own tool_result in the next user message.
        for block in response.content:
            if block.type != "tool_use":
                continue
            result = execute_tool(block.name, block.input)

            # Domain 5: Circuit breaker pattern
            if isinstance(result, dict) and result.get("is_error"):
                consecutive_failures += 1
                if consecutive_failures >= 3:
                    result = execute_tool("escalate_to_human", {
                        "reason": "system_failure",
                        "context": f"Circuit breaker triggered: {consecutive_failures} consecutive failures"
                    })
            else:
                consecutive_failures = 0

            tool_results.append(
                {"type": "tool_result", "tool_use_id": block.id, "content": json.dumps(result)}
            )

        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": tool_results})
        response = create_response(messages)

    # Join all text blocks rather than assuming the first block is text
    return "".join(b.text for b in response.content if b.type == "text")
```
Multi-Pass Extraction with Session Isolation (D4 + D5)
```python
import json

import anthropic

client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-20250514"

# extraction_tool, validation_tool, and confidence_tool are tool definitions
# (name, description, input_schema) assumed to be defined elsewhere; their
# names must match the tool_choice values used below.


def tool_input(response):
    """Return the input dict of the first tool_use block (already parsed, no json.loads)."""
    return next(b for b in response.content if b.type == "tool_use").input


# Each pass runs in a SEPARATE session (separate API call = separate context)
def multi_pass_extraction(document_text):
    # Pass 1: Extract (fresh session)
    extraction = client.messages.create(
        model=MODEL,
        max_tokens=1024,
        tools=[extraction_tool],
        tool_choice={"type": "tool", "name": "extract_fields"},
        messages=[{"role": "user", "content": f"Extract all fields:\n{document_text}"}]
    )
    extracted = tool_input(extraction)

    # Pass 2: Validate (SEPARATE session - no access to Pass 1's reasoning)
    validation = client.messages.create(
        model=MODEL,
        max_tokens=1024,
        tools=[validation_tool],
        tool_choice={"type": "tool", "name": "validate_extraction"},
        messages=[{"role": "user", "content":
                   f"Source document:\n{document_text}\n\n"
                   f"Extracted data:\n{json.dumps(extracted)}\n\n"
                   "Validate each field against the source. Flag any mismatches."}]
    )

    # Pass 3: Confidence scoring (SEPARATE session)
    confidence = client.messages.create(
        model=MODEL,
        max_tokens=1024,
        tools=[confidence_tool],
        tool_choice={"type": "tool", "name": "score_confidence"},
        messages=[{"role": "user", "content":
                   f"Source document:\n{document_text}\n\n"
                   f"For each field, assign confidence 0.0-1.0:\n{json.dumps(extracted)}"}]
    )

    # Flag low-confidence fields for human review.
    # Assumes score_confidence returns {"fields": {name: {"confidence": float}}}.
    scores = tool_input(confidence)
    for field, score in scores["fields"].items():
        if score["confidence"] < 0.7:
            score["needs_human_review"] = True

    # Keep the Pass 2 report alongside the scores so mismatches are not lost
    scores["validation"] = tool_input(validation)
    return scores
```
🎬 Video to Watch
How We Build Effective Agents — Barry Zhang (Anthropic) at AI Engineer Summit
Barry Zhang from Anthropic's Applied AI team walks through the three core principles: (1) don't build agents for everything, (2) keep designs simple and composable, (3) think like the agent. This talk directly maps to the CCA-F exam's emphasis on choosing the right architecture pattern and avoiding over-engineering. Most relevant sections: The discussion of when NOT to use agents, and the breakdown of composable patterns (first 15 minutes).
Also recommended: Don't Build Agents, Build Skills Instead — Barry Zhang & Mahesh Murag (Anthropic) at AI Engineer Code Summit
This follow-up talk introduces the "Skills" concept: portable procedural knowledge that agents load dynamically. Directly relevant to the Claude Code scenarios (Domain 3) on the exam, particularly slash commands and the distinction between CLAUDE.md (always loaded) vs. skills (loaded on demand).
📖 Reading
- Primary: Building Effective Agents — Anthropic's canonical blog post. Re-read with fresh eyes now that you've studied all 5 domains.
- Secondary: Demystifying Evals for AI Agents — How to evaluate agentic systems (directly relevant to exam scenarios on measuring agent performance).
🛠️ Hands-On Exercise (30 min)
Cross-Domain Scenario Walkthrough:
- Pick ANY two of the 6 scenarios above
- For each, write a complete architectural diagram showing:
  - How many agents are involved and their roles
  - Which tools each agent has (keep to 4-5!)
  - Where hooks enforce business rules
  - How errors propagate and are handled
  - Where session isolation boundaries exist
- For each diagram, identify which of the 10 anti-patterns a wrong design would hit
- Write one exam-style question for each scenario where the wrong answers are anti-patterns
📝 Quick Quiz
Q1: In a CI/CD pipeline using Claude Code, you need to generate code and then review it for security vulnerabilities. What is the BEST architecture?
A) Use Claude Code with the -p flag and ask it to generate and then review its own code
B) Run generation in one session, then pass the output to a separate review session with --output-format json
C) Add "always review your code for security issues" to CLAUDE.md
D) Use a single session with a high effort level to ensure thorough self-review
Q2: A multi-agent research system has a coordinator that delegates to 3 researcher subagents. One researcher's API source returns a 503 error on the 3rd consecutive call. What should happen?
A) The researcher should retry indefinitely until the API responds
B) The researcher should report low confidence and let the coordinator decide
C) The circuit breaker should open, the researcher should return a structured error to the coordinator, and the coordinator should synthesize results from the other 2 researchers
D) The coordinator should escalate the entire research task to a human
Q3: You're building a structured data extraction pipeline for invoices. The extracted "total_amount" field has a confidence score of 0.45. What is the CORRECT next step?
A) Accept the extraction since the model produced a value
B) Re-run the entire extraction with a higher effort level
C) Flag this specific field for human review while accepting high-confidence fields
D) Reject the entire document and ask for a clearer scan
Answers
Q1: B — Session isolation is critical. Same-session self-review (A, D) is anti-pattern #9. CLAUDE.md instructions (C) are advisory, not deterministic. Separate sessions with structured output is the correct CI/CD pattern.
Q2: C — The circuit breaker pattern (3 failures → open circuit) is the correct reliability pattern. The researcher returns a structured error; the coordinator is resilient and works with partial results. Retrying indefinitely (A) wastes resources. Reporting "low confidence" (B) is the self-reported confidence anti-pattern. Escalating everything (D) is too aggressive — the coordinator can still produce useful output.
Q3: C — Per-field confidence tracking with human review for low-confidence fields is the correct pattern. Accepting blindly (A) ignores reliability requirements. Re-running entirely (B) is wasteful. Rejecting the whole document (D) throws away good extractions from other fields.
🔮 Tomorrow's Preview
Day 22 will be a Timed Practice Exam Simulation — 15 scenario-based questions across all domains, with a focus on the trickiest cross-domain traps. We'll work under exam conditions and then do a detailed answer review.