CCA-F Study Day 2/20: Effort Levels, Permissions & Session Management
Domain 1: Agentic Architecture & Orchestration (~25-27% of exam)
📌 Today's Focus
Yesterday you mastered the agentic loop and stop_reason mechanics. Today we go deeper into three critical configuration dimensions that control how agents behave: effort levels (how much reasoning depth the agent applies), permission modes (what the agent is allowed to do), and session management (how conversations persist and resume). These are heavily tested on the exam because they represent the architectural decisions that differentiate toy prototypes from production systems.
📚 Core Concepts
1. Effort Levels
Effort levels control how deeply Claude thinks before responding. They map directly to the amount of extended thinking (chain-of-thought) the model performs internally. This is a cost/quality/latency tradeoff that architects must tune per task.
| Level | Behavior | Use Case | Latency | Cost |
|---|---|---|---|---|
| "low" | Minimal reasoning, fast responses | File lookups, listing directories, simple retrievals | Fastest | Lowest |
| "medium" | Balanced analysis | Routine edits, standard tasks, straightforward Q&A | Moderate | Moderate |
| "high" | Thorough analysis | Refactoring, debugging, code review | Slower | Higher |
| "xhigh" | Extended reasoning chains | Complex coding, multi-step agentic tasks | Slow | High |
| "max" | Maximum depth, exhaustive reasoning | Deep analysis, multi-step proofs, complex architecture decisions | Slowest | Highest |
Key architectural insight: Effort levels are NOT about accuracy on simple tasks — they're about reasoning depth on complex ones. A "low" effort call reading a file will return the same content as a "max" call. The difference emerges when the task requires multi-step reasoning, planning, or synthesis.
Default behavior: Claude Opus 4.6 and Sonnet 4.6 default to "medium" effort. You can override per-query in the Agent SDK:
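from claude_agent_sdk import ClaudeAgentOptions, query

# Run inside an async function (for example, one started with asyncio.run)
async for message in query(
    prompt="Analyze the auth module for security vulnerabilities",
    options=ClaudeAgentOptions(
        allowed_tools=["Read", "Glob", "Grep"],
        effort="xhigh",  # Deep analysis warranted for security review
    ),
):
    print(message)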
In Claude Code CLI, use /effort high or /effort auto (resets to model default).
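One way to apply the effort table above is a small routing helper that picks a level from a coarse task category. This is a minimal sketch: the category names and the `choose_effort` function are hypothetical, and only the level strings and the "medium" fallback come from this section.

```python
# Hypothetical task categories mapped to the effort levels from the table.
EFFORT_BY_TASK = {
    "lookup": "low",            # file lookups, directory listings
    "routine_edit": "medium",   # routine edits, straightforward Q&A
    "debugging": "high",        # refactoring, debugging, code review
    "agentic": "xhigh",         # complex coding, multi-step agentic tasks
    "architecture": "max",      # deep analysis, architecture decisions
}


def choose_effort(task_kind: str, default: str = "medium") -> str:
    """Return the effort level for a task category, or the default if unknown."""
    return EFFORT_BY_TASK.get(task_kind, default)


print(choose_effort("lookup"))
print(choose_effort("debugging"))
print(choose_effort("something-else"))
```

The returned string would then be passed as the `effort` option, so the tuning decision lives in one place instead of being scattered across call sites.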
2. Permission Modes
Permission modes gate tool execution — they determine what happens when the agent tries to use a tool. This is a critical security architecture concern.
| Mode | Behavior | Production Use Case |
|---|---|---|
| "default" | Tools not covered by allow rules trigger an approval callback | Interactive development — human in the loop |
| "acceptEdits" | Auto-approves file edits; Bash commands still follow default rules | Trusted code modification workflows |
| "plan" | Read-only mode — Claude explores and produces a plan only | Pre-execution analysis, architecture planning |
| "dontAsk" | Never prompts; pre-approved tools run, everything else denied | CI/CD pipelines, background automation |
| "auto" | Model classifier approves/denies each tool call | Semi-trusted environments |
| "bypassPermissions" | Runs all tools without asking | Isolated sandboxes ONLY — never production |
Exam-critical detail: "dontAsk" is the correct answer for CI/CD scenarios. It's deterministic (no model judgment involved in permission decisions), doesn't block on human approval, and pre-approved tool lists are explicit.
# CI/CD pipeline permission setup: deterministic, non-interactive
options = ClaudeAgentOptions(
    permission_mode="dontAsk",
    allowed_tools=["Read", "Glob", "Grep", "Bash"],
    # Any tool NOT in allowed_tools is silently denied
)
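To make the mode table concrete, the sketch below models it as a pure function. It is not the SDK's implementation: the `decide` function, the outcome strings, the tool groupings, and the order in which the rules are checked are all assumptions made for illustration.

```python
# Assumed tool groupings for illustration only.
READ_ONLY_TOOLS = {"Read", "Glob", "Grep"}
EDIT_TOOLS = {"Edit", "Write"}


def decide(mode: str, tool: str, allowed_tools: set) -> str:
    """Return 'allow', 'deny', 'ask' (human callback), or 'classify' (model)."""
    if mode == "bypassPermissions":
        return "allow"  # everything runs; isolated sandboxes only
    if mode == "plan":
        return "allow" if tool in READ_ONLY_TOOLS else "deny"
    if tool in allowed_tools:
        return "allow"  # explicitly pre-approved
    if mode == "dontAsk":
        return "deny"  # never prompts, never guesses
    if mode == "acceptEdits" and tool in EDIT_TOOLS:
        return "allow"  # file edits are auto-approved
    if mode == "auto":
        return "classify"  # a model classifier decides
    if mode in ("default", "acceptEdits"):
        return "ask"  # falls through to the approval callback
    raise ValueError(f"unknown permission mode: {mode}")


ci_tools = {"Read", "Glob", "Grep", "Bash"}
print(decide("dontAsk", "Bash", ci_tools))
print(decide("dontAsk", "Write", ci_tools))
print(decide("auto", "Write", ci_tools))
```

Only the "auto" branch hands the decision to a model, and only the "default" and "acceptEdits" fall-through waits on a human. Every "dontAsk" outcome is fixed by the allow list, which is why it suits CI/CD.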
3. Session Management
A session is the conversation history the SDK accumulates while your agent works. It contains: your prompt, every tool call, every tool result, and every response. Sessions persist to disk automatically (~/.claude/projects/<encoded-cwd>/<session-id>.jsonl).
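Because the transcript is stored as JSONL (one JSON object per line), it can be inspected with the standard library alone. The sketch below assumes nothing about the record schema and treats each entry as an opaque dictionary. `load_session` is a hypothetical helper, and the `type` field in the demo data is made up.

```python
import json
import tempfile
from pathlib import Path


def load_session(path: Path) -> list:
    """Parse a JSONL transcript: one JSON object per non-empty line."""
    entries = []
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                entries.append(json.loads(line))
    return entries


# Demo with a throwaway file; the "type" field is an invented schema.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo-session.jsonl"
    demo.write_text('{"type": "user"}\n\n{"type": "assistant"}\n', encoding="utf-8")
    print(len(load_session(demo)))
```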
Three Session Operations:
| Operation | What It Does | When to Use |
|---|---|---|
| Continue | Finds the most recent session in the current directory | Multi-turn chat in one process, or after process restart |
| Resume | Takes a specific session ID to return to that exact session | Multiple concurrent sessions, recovering from budget/turn limits |
| Fork | Creates a NEW session with a COPY of the original's history | Exploring alternatives without losing the original |
Critical Distinctions:
- Sessions persist conversation, NOT filesystem. To snapshot file changes, use file checkpointing separately.
- Continue vs Resume: Continue is automatic (finds most recent). Resume is explicit (you pass an ID). Use resume when you have multiple users/sessions.
- Fork doesn't branch files — only the conversation history. If a forked agent edits files, those changes are real and visible to all sessions in that directory.
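The three operations and the distinctions above can be modeled with a toy in-memory store. This is a sketch of the semantics only: `ToySessionStore` and its method names are hypothetical, and histories are plain lists of prompt strings rather than real transcripts.

```python
import copy
import uuid


class ToySessionStore:
    """In-memory model of continue, resume, and fork (not the SDK's storage)."""

    def __init__(self):
        self.histories = {}  # session ID -> list of prompts
        self.latest = None   # most recently used session ID

    def start(self, prompt):
        sid = uuid.uuid4().hex
        self.histories[sid] = [prompt]
        self.latest = sid
        return sid

    def resume(self, sid, prompt):
        """Explicit: the caller names the session (KeyError if unknown)."""
        self.histories[sid].append(prompt)
        self.latest = sid
        return sid

    def continue_latest(self, prompt):
        """Automatic: pick whichever session was used most recently."""
        if self.latest is None:
            raise LookupError("no session to continue")
        return self.resume(self.latest, prompt)

    def fork(self, sid, prompt):
        """Copy the history into a NEW session; the original is not modified."""
        new_sid = uuid.uuid4().hex
        self.histories[new_sid] = copy.deepcopy(self.histories[sid]) + [prompt]
        self.latest = new_sid
        return new_sid


store = ToySessionStore()
original = store.start("Analyze the auth module")
forked = store.fork(original, "Instead of JWT, implement OAuth2")
store.resume(original, "Now refactor it to use JWT")
print(store.histories[original])
print(store.histories[forked])
```

The model has no notion of files, which mirrors the point above: forking duplicates conversation history and nothing else.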
Session Code Patterns:
# PATTERN 1: Automatic session continuity (Python)
from claude_agent_sdk import ClaudeSDKClient

# print_response is a user-supplied display helper, not part of the SDK
async with ClaudeSDKClient(options=options) as client:
    await client.query("Analyze the auth module")
    async for message in client.receive_response():
        print_response(message)
    # Second query automatically continues same session
    await client.query("Now refactor it to use JWT")
    async for message in client.receive_response():
        print_response(message)
# PATTERN 2: Capture and resume by ID
from claude_agent_sdk import ClaudeAgentOptions, ResultMessage, query

session_id = None
async for message in query(
    prompt="Analyze the auth module",
    options=ClaudeAgentOptions(allowed_tools=["Read", "Glob", "Grep"]),
):
    if isinstance(message, ResultMessage):
        session_id = message.session_id
# Later: resume with full context
async for message in query(
    prompt="Now implement the refactoring you suggested",
    options=ClaudeAgentOptions(
        resume=session_id,
        allowed_tools=["Read", "Edit", "Write", "Glob", "Grep"],
    ),
):
    pass
# PATTERN 3: Fork to explore alternatives
async for message in query(
    prompt="Instead of JWT, implement OAuth2",
    options=ClaudeAgentOptions(
        resume=session_id,
        fork_session=True,  # Original untouched, new branch created
    ),
):
    if isinstance(message, ResultMessage):
        forked_id = message.session_id  # Different from session_id
4. ResultMessage Subtypes (Budget Exhaustion)
When budget controls are hit, the agent returns a ResultMessage with specific subtypes:
- error_max_turns: The agent hit the max_turns limit before completing the task
- error_max_budget_usd: The agent hit the max_budget_usd cost ceiling
- success: Normal completion (stop_reason was end_turn)
Architectural pattern: When you get error_max_turns or error_max_budget_usd, you can resume the session with a higher limit:
# Resume after hitting budget limit
async for message in query(
    prompt="Continue where you left off",
    options=ClaudeAgentOptions(
        resume=session_id,
        max_budget_usd=10.0,  # Higher limit this time
    ),
):
    pass
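The decision of whether and how to resume can be pulled into a small pure function that is easy to unit test. In this sketch `FakeResult` is a stand-in for the SDK's ResultMessage and carries only the two fields used here, `plan_retry` is a hypothetical helper, and doubling the exhausted limit is an arbitrary policy chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeResult:
    """Stand-in for ResultMessage; only the fields this sketch needs."""
    subtype: str
    session_id: str


def plan_retry(result: FakeResult, max_turns: int, max_budget_usd: float) -> Optional[dict]:
    """Return options for a resumed run, or None when the run succeeded."""
    if result.subtype == "success":
        return None
    options = {
        "resume": result.session_id,  # keep the accumulated context
        "max_turns": max_turns,
        "max_budget_usd": max_budget_usd,
    }
    if result.subtype == "error_max_turns":
        options["max_turns"] = max_turns * 2  # assumed policy: double it
    elif result.subtype == "error_max_budget_usd":
        options["max_budget_usd"] = max_budget_usd * 2
    else:
        raise ValueError(f"unhandled subtype: {result.subtype}")
    return options


print(plan_retry(FakeResult("error_max_turns", "abc123"), 15, 5.0))
print(plan_retry(FakeResult("success", "abc123"), 15, 5.0))
```

The returned dictionary uses the same names as the options in the snippet above, so it could be unpacked into the options object for the follow-up query.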
⚠️ Anti-Patterns & Exam Traps
| ❌ Wrong Answer | ✅ Right Answer | Why |
|---|---|---|
| Use "bypassPermissions" in production for speed | Use "dontAsk" with explicit allowed_tools | bypassPermissions is for isolated sandboxes only — never production |
| Set effort to "max" for all tasks to maximize quality | Match effort level to task complexity | "max" everywhere wastes cost and latency on simple tasks and does not improve the result of trivial operations |
| Create a new session for each follow-up question | Use continue/resume to maintain context | New sessions lose all prior context — files read, analysis performed, decisions made |
| Use "auto" permission mode in CI/CD | Use "dontAsk" with pre-approved tools | Auto uses model judgment (non-deterministic). CI/CD needs deterministic behavior. |
| Rely on session persistence for file state | Use file checkpointing separately | Sessions persist conversation only. File changes need separate snapshot mechanism. |
| Pass session IDs between hosts without moving session files | Either mirror JSONL files to shared storage or pass results as application state | Session files are local to the machine. IDs are meaningless without the underlying JSONL. |
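The last row of the table can be sketched as a file copy. The directory layout below follows the path given in the Session Management section, with the `.claude` directory passed in as a root so the function can be tested against temporary directories. `session_path` and `mirror_session` are hypothetical helpers, and whether a given SDK version picks up a mirrored file on another host (where the encoded working directory must also match) is an assumption to verify.

```python
import shutil
import tempfile
from pathlib import Path


def session_path(root: Path, encoded_cwd: str, session_id: str) -> Path:
    """Build <root>/projects/<encoded-cwd>/<session-id>.jsonl."""
    return root / "projects" / encoded_cwd / f"{session_id}.jsonl"


def mirror_session(src_root: Path, dst_root: Path, encoded_cwd: str, session_id: str) -> Path:
    """Copy one session transcript to another root, creating directories."""
    src = session_path(src_root, encoded_cwd, session_id)
    if not src.is_file():
        raise FileNotFoundError(src)
    dst = session_path(dst_root, encoded_cwd, session_id)
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)  # preserves timestamps along with content
    return dst


# Demo with temporary directories standing in for two hosts.
with tempfile.TemporaryDirectory() as host_a, tempfile.TemporaryDirectory() as shared:
    original = session_path(Path(host_a), "demo-project", "abc123")
    original.parent.mkdir(parents=True)
    original.write_text('{"demo": true}\n', encoding="utf-8")
    copied = mirror_session(Path(host_a), Path(shared), "demo-project", "abc123")
    print(copied.name)
```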
🎬 Video to Watch
How We Build Effective Agents — Barry Zhang, Anthropic (AI Engineer Summit)
Barry Zhang from Anthropic's Applied AI team walks through their philosophy on building production agents. Covers the core loop, when to add complexity vs. keep things simple, and real enterprise implementations. Pay attention to the section on "thinking like the agent" and the tradeoffs between agent autonomy vs. control — this maps directly to today's permission modes discussion. ~30 min watch.
📖 Reading
- Primary: Session Management — Agent SDK Docs (the definitive reference on continue, resume, fork)
- Secondary: Building Agents with the Claude Agent SDK — Anthropic Engineering Blog
- Reference: Agent SDK Overview (permission modes, effort levels in capabilities section)
🛠️ Hands-On Exercise (20 minutes)
- Effort Level Comparison: Pick a moderately complex coding task (e.g., "analyze this file for security issues"). Run it at effort "low", "medium", and "high". Document:
  - Response latency for each
  - Depth of analysis (how many issues found)
  - Cost difference (check the ResultMessage for usage)
- Session Resume: Using the Python Agent SDK:
  - Query 1: "List all Python files in this directory". Capture the session_id from ResultMessage.
  - Query 2: Resume with that ID and ask "Which of those files has the most lines?" Verify it remembers the file list without re-reading.
  - Query 3: Fork from the original and ask something different. Verify the fork has independent history.
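For the effort comparison exercise, a small timing harness keeps the measurements consistent across levels. This is a sketch under stated assumptions: `compare_efforts` is a hypothetical helper, and `fake_run` is a placeholder that you would replace with a function that performs the real SDK call and returns the response text.

```python
import time
from typing import Callable


def compare_efforts(run: Callable[[str], str], levels=("low", "medium", "high")) -> list:
    """Time one call per effort level and record the size of each response."""
    rows = []
    for level in levels:
        started = time.perf_counter()
        output = run(level)
        elapsed = time.perf_counter() - started
        rows.append({"effort": level, "seconds": elapsed, "output_chars": len(output)})
    return rows


def fake_run(effort: str) -> str:
    """Placeholder for a real agent call; returns longer text at higher effort."""
    return "issue; " * {"low": 1, "medium": 2, "high": 3}[effort]


for row in compare_efforts(fake_run):
    print(f"{row['effort']:<6} {row['seconds']:.4f}s {row['output_chars']} chars")
```

Response length is only a rough proxy for depth of analysis; counting the distinct issues found still has to be done by reading the output.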
📝 Quick Quiz
Q1: A CI/CD pipeline needs to run Claude Code non-interactively to review pull requests. Which permission mode ensures deterministic behavior without blocking on human approval?
A) "auto" — the model classifies each tool call
B) "bypassPermissions" — runs everything without asking
C) "dontAsk" — pre-approved tools run, all others denied
D) "default" — triggers approval callback for unknown tools
Q2: An agent hits error_max_turns after 15 turns on a complex refactoring task. The work is partially complete. What is the correct architectural approach?
A) Create a new session with max_turns set to 100
B) Resume the session by ID with a higher max_turns limit
C) Fork the session and increase the effort level
D) Parse the partial output and start over with a better prompt
Q3: A developer wants to explore two different implementation approaches (JWT vs OAuth2) from the same analysis. They've already completed the initial code review in session abc123. What's the correct approach?
A) Resume abc123 twice with different prompts
B) Fork abc123 for one approach, resume the original for the other
C) Create two new sessions and re-run the analysis in each
D) Use continue: true for both approaches sequentially
Answers:
Q1: C — "dontAsk" is deterministic and non-blocking. "auto" uses model judgment (non-deterministic). "bypassPermissions" is unsafe for production. "default" would block waiting for human approval.
Q2: B — Resume preserves all context from the 15 completed turns. A new session (A) loses all progress. Fork (C) is for branching, not continuing. Starting over (D) wastes the partial work already done.
Q3: B — Fork creates an independent branch with the full analysis context intact, while the original remains untouched for the other approach. Resuming twice (A) would make the second approach see the first approach's code. New sessions (C) waste the analysis work. Sequential continue (D) means the second approach sees the first's changes.
👀 Tomorrow's Preview
Day 3 covers Multi-Agent Orchestration Patterns — the five canonical patterns (Generator-Verifier, Hub-and-Spoke, Fan-Out/Fan-In, Pipeline, Competing Hypotheses) and the critical distinction between Agent Teams and Subagents. This is where the exam gets architecturally complex.