CCA-F Study Day 2/20: Effort Levels, Permissions & Session Management

Domain 1: Agentic Architecture & Orchestration (~25-27% of exam)

📌 Today's Focus

Yesterday you mastered the agentic loop and stop_reason mechanics. Today we go deeper into three critical configuration dimensions that control how agents behave: effort levels (how much reasoning depth the agent applies), permission modes (what the agent is allowed to do), and session management (how conversations persist and resume). These are heavily tested on the exam because they represent the architectural decisions that differentiate toy prototypes from production systems.

📚 Core Concepts

1. Effort Levels

Effort levels control how deeply Claude thinks before responding. They map directly to the amount of extended thinking (chain-of-thought) the model performs internally. This is a cost/quality/latency tradeoff that architects must tune per task.

Level	Behavior	Use Case	Latency	Cost
"low"	Minimal reasoning, fast responses	File lookups, listing directories, simple retrievals	Fastest	Lowest
"medium"	Balanced analysis	Routine edits, standard tasks, straightforward Q&A	Moderate	Moderate
"high"	Thorough analysis	Refactoring, debugging, code review	Slower	Higher
"xhigh"	Extended reasoning chains	Complex coding, multi-step agentic tasks	Slow	High
"max"	Maximum depth, exhaustive reasoning	Deep analysis, multi-step proofs, complex architecture decisions	Slowest	Highest

Key architectural insight: Effort levels are NOT about accuracy on simple tasks — they're about reasoning depth on complex ones. A "low" effort call reading a file will return the same content as a "max" call. The difference emerges when the task requires multi-step reasoning, planning, or synthesis.
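One way to put the "tune per task" advice into practice is a small lookup that picks an effort level from the kind of task. The sketch below is only an illustration: the task category names and the `choose_effort` helper are hypothetical, and only the effort strings come from the table above.

```python
# Hypothetical task categories mapped to the effort strings from the table.
EFFORT_BY_TASK = {
    "lookup": "low",             # file lookups, directory listings
    "routine_edit": "medium",    # standard edits, straightforward Q&A
    "code_review": "high",       # refactoring, debugging, review
    "agentic_coding": "xhigh",   # complex multi-step agentic work
    "architecture": "max",       # deep analysis, architecture decisions
}


def choose_effort(task_kind: str, default: str = "medium") -> str:
    """Return the effort level for a task kind, falling back to a default."""
    return EFFORT_BY_TASK.get(task_kind, default)


print(choose_effort("lookup"))        # low
print(choose_effort("code_review"))   # high
print(choose_effort("unknown_task"))  # medium (fallback)
```

The returned string could then be passed as the effort value for a query, so that cheap retrievals never pay for deep reasoning they do not need.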

Default behavior: Claude Opus 4.6 and Sonnet 4.6 default to "medium" effort. You can override per-query in the Agent SDK:

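import asyncio
from claude_agent_sdk import ClaudeAgentOptions, query

async def main():
    async for message in query(
        prompt="Analyze the auth module for security vulnerabilities",
        options=ClaudeAgentOptions(
            # Read-only tools: the review inspects code without modifying it
            allowed_tools=["Read", "Glob", "Grep"],
            effort="xhigh",  # Deep analysis warranted for security review
        ),
    ):
        print(message)

asyncio.run(main())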

In Claude Code CLI, use /effort high or /effort auto (resets to model default).

2. Permission Modes

Permission modes gate tool execution — they determine what happens when the agent tries to use a tool. This is a critical security architecture concern.

Mode	Behavior	Production Use Case
"default"	Tools not covered by allow rules trigger an approval callback	Interactive development — human in the loop
"acceptEdits"	Auto-approves file edits; Bash commands still follow default rules	Trusted code modification workflows
"plan"	Read-only mode — Claude explores and produces a plan only	Pre-execution analysis, architecture planning
"dontAsk"	Never prompts; pre-approved tools run, everything else denied	CI/CD pipelines, background automation
"auto"	Model classifier approves/denies each tool call	Semi-trusted environments
"bypassPermissions"	Runs all tools without asking	Isolated sandboxes ONLY — never production

Exam-critical detail: "dontAsk" is the correct answer for CI/CD scenarios. It's deterministic (no model judgment involved in permission decisions), doesn't block on human approval, and pre-approved tool lists are explicit.

# CI/CD pipeline permission setup: deterministic, non-interactive
from claude_agent_sdk import ClaudeAgentOptions

options = ClaudeAgentOptions(
    permission_mode="dontAsk",
    # Explicit allowlist: only these tools can run in the pipeline
    allowed_tools=["Read", "Glob", "Grep", "Bash"],
    # Any tool NOT in allowed_tools is silently denied
)
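To make the mode table easier to memorize, the decision each mode makes can be modeled as a small pure function. This is a simplified mental model based only on the table above, not the SDK's implementation: the `gate` function, its return labels, and the `READ_ONLY_TOOLS` and `EDIT_TOOLS` sets are all assumptions made for illustration.

```python
# Toy model of the permission table; NOT the SDK's real logic.
MODES = {"default", "acceptEdits", "plan", "dontAsk", "auto", "bypassPermissions"}
READ_ONLY_TOOLS = {"Read", "Glob", "Grep"}  # assumed read-only set
EDIT_TOOLS = {"Edit", "Write"}              # assumed file-edit set


def gate(mode: str, tool: str, allowed_tools: set) -> str:
    """Return 'run', 'deny', 'ask', or 'classify' for a single tool call."""
    if mode not in MODES:
        raise ValueError(f"unknown permission mode: {mode}")
    if mode == "bypassPermissions":
        return "run"                      # everything runs, sandbox only
    if mode == "plan":
        return "run" if tool in READ_ONLY_TOOLS else "deny"
    if mode == "auto":
        return "classify"                 # a model classifier decides
    if tool in allowed_tools:
        return "run"                      # pre-approved by an allow rule
    if mode == "dontAsk":
        return "deny"                     # never prompts, so deny
    if mode == "acceptEdits" and tool in EDIT_TOOLS:
        return "run"                      # file edits are auto-approved
    return "ask"                          # default: approval callback


print(gate("dontAsk", "Bash", {"Read"}))   # deny
print(gate("default", "Bash", {"Read"}))   # ask
print(gate("acceptEdits", "Edit", set()))  # run
```

Note how "dontAsk" is the only non-bypass mode in this model that can never return "ask" or "classify", which is the property the CI/CD exam questions are testing.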

3. Session Management

A session is the conversation history the SDK accumulates while your agent works. It contains: your prompt, every tool call, every tool result, and every response. Sessions persist to disk automatically (~/.claude/projects/<encoded-cwd>/<session-id>.jsonl).

Three Session Operations:

Operation	What It Does	When to Use
Continue	Finds the most recent session in the current directory	Multi-turn chat in one process, or after process restart
Resume	Takes a specific session ID to return to that exact session	Multiple concurrent sessions, recovering from budget/turn limits
Fork	Creates a NEW session with a COPY of the original's history	Exploring alternatives without losing the original
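The difference between Continue ("find the most recent session") and Resume ("use this exact ID") can be illustrated against a directory of session files. The sketch below assumes that "most recent" means the newest modification time among the `.jsonl` files in a given directory; the source does not say how the SDK actually selects the session, and `most_recent_session` is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional


def most_recent_session(project_dir: Path) -> Optional[Path]:
    """Return the newest *.jsonl file in project_dir, or None if there is none.

    Assumption: recency is judged by file modification time.
    """
    candidates = list(project_dir.glob("*.jsonl"))
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

With several concurrent sessions in one directory, "most recent" may not be the session you meant, which is why Resume with an explicit ID is the safer choice in multi-user setups.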

Critical Distinctions:

Sessions persist conversation, NOT filesystem. To snapshot file changes, use file checkpointing separately.
Continue vs Resume: Continue is automatic (finds most recent). Resume is explicit (you pass an ID). Use resume when you have multiple users/sessions.
Fork doesn't branch files — only the conversation history. If a forked agent edits files, those changes are real and visible to all sessions in that directory.
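The fork semantics in the distinctions above can be shown with a toy in-memory model. This is not the SDK: `ToySession` is a hypothetical class that only mimics the idea that a fork receives a new ID and a copy of the history.

```python
import copy
import uuid
from dataclasses import dataclass, field


@dataclass
class ToySession:
    """In-memory stand-in for a session: an ID plus a conversation history."""
    history: list = field(default_factory=list)
    session_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def fork(self) -> "ToySession":
        # New session ID, independent copy of the conversation so far.
        return ToySession(history=copy.deepcopy(self.history))


original = ToySession(history=["Analyze the auth module", "analysis result"])
branch = original.fork()
branch.history.append("Instead of JWT, implement OAuth2")

print(len(original.history))                     # 2
print(len(branch.history))                       # 3
print(original.session_id != branch.session_id)  # True
```

The model deliberately has no notion of files: anything either session writes to disk sits outside the copied history and is shared by both.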

Session Code Patterns:

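import asyncio
from claude_agent_sdk import ClaudeAgentOptions, ClaudeSDKClient, ResultMessage, query

# PATTERN 1: Automatic session continuity (Python)
async def continue_in_one_client(options: ClaudeAgentOptions) -> None:
    async with ClaudeSDKClient(options=options) as client:
        await client.query("Analyze the auth module")
        async for message in client.receive_response():
            print(message)

        # Second query automatically continues the same session
        await client.query("Now refactor it to use JWT")
        async for message in client.receive_response():
            print(message)

# PATTERN 2: Capture and resume by ID
async def analyze_then_resume() -> str | None:
    session_id = None
    async for message in query(
        prompt="Analyze the auth module",
        options=ClaudeAgentOptions(allowed_tools=["Read", "Glob", "Grep"]),
    ):
        # The final ResultMessage carries the ID needed to resume later
        if isinstance(message, ResultMessage):
            session_id = message.session_id

    # Later: resume with full context (edit tools added for the refactor)
    async for message in query(
        prompt="Now implement the refactoring you suggested",
        options=ClaudeAgentOptions(
            resume=session_id,
            allowed_tools=["Read", "Edit", "Write", "Glob", "Grep"],
        ),
    ):
        print(message)
    return session_id

# PATTERN 3: Fork to explore alternatives
async def fork_from(session_id: str) -> str | None:
    forked_id = None
    async for message in query(
        prompt="Instead of JWT, implement OAuth2",
        options=ClaudeAgentOptions(
            resume=session_id,
            fork_session=True,  # Original untouched, new branch created
        ),
    ):
        if isinstance(message, ResultMessage):
            forked_id = message.session_id  # Different from session_id
    return forked_id

# Run patterns 2 and 3 in sequence; pattern 1 needs your own options object
async def main() -> None:
    session_id = await analyze_then_resume()
    if session_id:
        await fork_from(session_id)

asyncio.run(main())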

4. ResultMessage Subtypes (Budget Exhaustion)

Every run ends with a ResultMessage whose subtype reports how the run ended, including whether a budget control was hit:

error_max_turns — The agent hit the max_turns limit before completing the task
error_max_budget_usd — The agent hit the max_budget_usd cost ceiling
success — Normal completion (stop_reason was end_turn)

Architectural pattern: When you get error_max_turns or error_max_budget_usd, you can resume the session with a higher limit:

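# Resume after hitting budget limit
from claude_agent_sdk import ClaudeAgentOptions, ResultMessage, query

async def resume_with_higher_budget(session_id: str) -> None:
    async for message in query(
        prompt="Continue where you left off",
        options=ClaudeAgentOptions(
            resume=session_id,
            max_budget_usd=10.0,  # Higher limit this time
        ),
    ):
        # Check how this run ended; it can hit the new ceiling as well
        if isinstance(message, ResultMessage):
            print(message.subtype)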
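The "resume with a higher limit" pattern needs a rule for how far to raise the limit. The helper below is one possible policy, sketched under assumptions: `next_limits` is a hypothetical function, the doubling factor is arbitrary, and any subtype other than the two budget errors is treated as "do not auto-resume".

```python
from typing import Optional


def next_limits(
    subtype: str,
    max_turns: int,
    max_budget_usd: float,
    factor: float = 2.0,
) -> Optional[dict]:
    """Return raised limits for a resume, or None if no resume is warranted."""
    if subtype == "error_max_turns":
        # Only the turn limit was the problem, so only raise that.
        return {"max_turns": int(max_turns * factor), "max_budget_usd": max_budget_usd}
    if subtype == "error_max_budget_usd":
        # Only the cost ceiling was the problem, so only raise that.
        return {"max_turns": max_turns, "max_budget_usd": max_budget_usd * factor}
    return None  # success or another outcome: nothing to escalate


print(next_limits("error_max_turns", 15, 5.0))
# {'max_turns': 30, 'max_budget_usd': 5.0}
print(next_limits("success", 15, 5.0))
# None
```

In a real pipeline you would likely also cap the number of escalations so a stuck agent cannot raise its own budget indefinitely.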

⚠️ Anti-Patterns & Exam Traps

❌ Wrong Answer	✅ Right Answer	Why
Use "bypassPermissions" in production for speed	Use "dontAsk" with explicit allowed_tools	bypassPermissions is for isolated sandboxes only — never production
Set effort to "max" for all tasks to maximize quality	Match effort level to task complexity	Wastes cost/latency on simple tasks; no quality improvement for trivial operations
Create a new session for each follow-up question	Use continue/resume to maintain context	New sessions lose all prior context — files read, analysis performed, decisions made
Use "auto" permission mode in CI/CD	Use "dontAsk" with pre-approved tools	Auto uses model judgment (non-deterministic). CI/CD needs deterministic behavior.
Rely on session persistence for file state	Use file checkpointing separately	Sessions persist conversation only. File changes need separate snapshot mechanism.
Pass session IDs between hosts without moving session files	Either mirror JSONL files to shared storage or pass results as application state	Session files are local to the machine. IDs are meaningless without the underlying JSONL.
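For the last row of the table, "mirror JSONL files to shared storage" can be as simple as copying the session file before handing the ID to another host. The sketch below only performs the copy: `mirror_session` and both paths are hypothetical, and where the receiving host must place the file for a resume to find it is not covered here.

```python
import shutil
from pathlib import Path


def mirror_session(session_file: Path, shared_dir: Path) -> Path:
    """Copy one session JSONL file into shared storage and return the new path."""
    shared_dir.mkdir(parents=True, exist_ok=True)
    target = shared_dir / session_file.name
    shutil.copy2(session_file, target)  # copy2 also preserves timestamps
    return target
```

If mirroring files is impractical, the alternative from the table still applies: pass the results you need as ordinary application state and start a fresh session on the other host.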

🎬 Video to Watch

How We Build Effective Agents — Barry Zhang, Anthropic (AI Engineer Summit)

Barry Zhang from Anthropic's Applied AI team walks through their philosophy on building production agents. Covers the core loop, when to add complexity vs. keep things simple, and real enterprise implementations. Pay attention to the section on "thinking like the agent" and the tradeoffs between agent autonomy vs. control — this maps directly to today's permission modes discussion. ~30 min watch.

📖 Reading

Primary: Session Management — Agent SDK Docs (the definitive reference on continue, resume, fork)
Secondary: Building Agents with the Claude Agent SDK — Anthropic Engineering Blog
Reference: Agent SDK Overview (permission modes, effort levels in capabilities section)

🛠️ Hands-On Exercise (20 minutes)

1. Effort Level Comparison: Pick a moderately complex coding task (e.g., "analyze this file for security issues"). Run it at effort "low", "medium", and "high". Document:
- Response latency for each
- Depth of analysis (how many issues found)
- Cost difference (check the ResultMessage for usage)
2. Session Resume: Using the Python Agent SDK:
- Query 1: "List all Python files in this directory", capturing the session_id from the ResultMessage
- Query 2: Resume with that ID and ask "Which of those files has the most lines?", then verify it remembers the file list without re-reading
- Query 3: Fork from the original and ask something different, then verify the fork has independent history

📝 Quick Quiz

Q1: A CI/CD pipeline needs to run Claude Code non-interactively to review pull requests. Which permission mode ensures deterministic behavior without blocking on human approval?

A) "auto": the model classifies each tool call
B) "bypassPermissions": runs everything without asking
C) "dontAsk": pre-approved tools run, all others denied
D) "default": triggers approval callback for unknown tools

Q2: An agent hits error_max_turns after 15 turns on a complex refactoring task. The work is partially complete. What is the correct architectural approach?

A) Create a new session with max_turns set to 100
B) Resume the session by ID with a higher max_turns limit
C) Fork the session and increase the effort level
D) Parse the partial output and start over with a better prompt

Q3: A developer wants to explore two different implementation approaches (JWT vs OAuth2) from the same analysis. They've already completed the initial code review in session abc123. What's the correct approach?

A) Resume abc123 twice with different prompts
B) Fork abc123 for one approach, resume the original for the other
C) Create two new sessions and re-run the analysis in each
D) Use continue: true for both approaches sequentially

Answers:

Q1: C — "dontAsk" is deterministic and non-blocking. "auto" uses model judgment (non-deterministic). "bypassPermissions" is unsafe for production. "default" would block waiting for human approval.

Q2: B — Resume preserves all context from the 15 completed turns. A new session (A) loses all progress. Fork (C) is for branching, not continuing. Starting over (D) wastes the partial work already done.

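Q3: B. Fork creates an independent conversation branch with the full analysis context intact, while the original session remains untouched for the other approach. Resuming twice (A) would put the first approach's conversation into the second approach's context. New sessions (C) waste the analysis work. Sequential continue (D) has the same problem as A: the second approach sees everything the first one did. Remember that fork isolates conversation history only; file edits from either branch land in the same working directory.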

👀 Tomorrow's Preview

Day 3 covers Multi-Agent Orchestration Patterns — the five canonical patterns (Generator-Verifier, Hub-and-Spoke, Fan-Out/Fan-In, Pipeline, Competing Hypotheses) and the critical distinction between Agent Teams and Subagents. This is where the exam gets architecturally complex.