CCA-F Study Day 18/20: Context Management & Progressive Summarization
Domain 5: Context Management & Reliability (~15% of exam)
📌 Today's Focus
Welcome to Domain 5 — the final domain! While it's the lightest weighted at ~15%, it's deceptively tricky because it tests your understanding of systemic limitations rather than feature knowledge. Today's topic — context management — is about understanding the invisible constraints that determine whether your agents work reliably in production or silently degrade.
The exam loves to present scenarios where an agent behaves inconsistently or loses information. In those scenarios the correct answer is almost always about context management, not model capability.
📚 Core Concepts
1. The Context Window — Claude's Working Memory
The context window is everything the model can "see" when generating its next response. This includes:
- System prompt (persona, rules, constraints)
- Conversation history (all prior turns)
- Tool definitions (JSON schemas for every available tool)
- Tool results (outputs from every tool call)
- The response being generated (including thinking tokens)
What is NOT included: Training data, previous sessions (unless explicitly passed). Each session starts fresh.
Key sizes to know:
- Claude Opus 4.7, Opus 4.6, Sonnet 4.6: 1M tokens
- Claude Sonnet 4.5, Sonnet 4: 200K tokens
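To make the "everything shares one window" idea concrete, here is a minimal accounting sketch. The `ContextBudget` class and all token counts are hypothetical illustrations, not measurements or part of any SDK; in practice you would plug in counts from your own token counter.

```python
from dataclasses import dataclass


@dataclass
class ContextBudget:
    """Rough token accounting for one request (all numbers are estimates)."""
    window: int                   # total context window, e.g. 200_000
    system_prompt: int = 0
    history: int = 0
    tool_definitions: int = 0
    tool_results: int = 0
    reserved_for_output: int = 0  # response plus thinking tokens

    def used(self) -> int:
        # Every component draws from the same finite window.
        return (self.system_prompt + self.history + self.tool_definitions
                + self.tool_results + self.reserved_for_output)

    def remaining(self) -> int:
        return self.window - self.used()

    def utilization(self) -> float:
        return self.used() / self.window


# Hypothetical numbers for a tool-heavy agent on a 200K window.
budget = ContextBudget(
    window=200_000,
    system_prompt=2_000,
    history=60_000,
    tool_definitions=8_000,
    tool_results=90_000,
    reserved_for_output=16_000,
)
print(budget.used(), budget.remaining(), round(budget.utilization(), 2))
```

Note how tool results, not conversation turns, dominate in this made-up example; that is a common reason agents fill their window faster than expected.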
2. Context Rot — The Silent Killer
Context rot is the gradual degradation of response quality as the context window fills. More context is NOT automatically better — it's a finite resource where information competes for attention.
Symptoms of context rot:
- Earlier instructions get "forgotten" or ignored
- More hallucinations and mistakes
- Inconsistent behavior with patterns established early in the conversation
- Reduced adherence to system prompt rules
- Claude starts re-asking questions already answered
Why it happens: Transformer attention distributes across all tokens. As token count grows, the attention each individual instruction receives decreases. Important details get diluted in a sea of content.
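The dilution intuition can be illustrated with a toy softmax calculation. This is a deliberately simplified model (one attention head, one fixed relevance score per token) and not a description of how Claude actually allocates attention; the function name and scores are made up for illustration.

```python
import math


def weight_on_instruction(instruction_score: float, n_other_tokens: int,
                          other_score: float = 0.0) -> float:
    """Softmax weight of one token scoring `instruction_score` when it
    competes with `n_other_tokens` tokens that all score `other_score`."""
    top = math.exp(instruction_score)
    return top / (top + n_other_tokens * math.exp(other_score))


# The instruction's score never changes; only the amount of competing
# content grows, and its share of the total weight shrinks accordingly.
for n in (100, 10_000, 1_000_000):
    print(n, f"{weight_on_instruction(3.0, n):.6f}")
```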
3. Progressive Summarization (Compaction)
When context approaches its limit, the system automatically compacts earlier conversation into summaries. In the Agent SDK, each summary is wrapped in <summary></summary> tags and replaces the original messages.
How it works:
- Agent detects context approaching limit
- A PreCompact hook fires (your chance to archive!)
- Earlier messages are summarized by Claude
- Original messages are replaced with the summary
- Conversation continues with freed-up space
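The steps above can be sketched as a small, framework-free function. This is an illustration of the idea only, not the Agent SDK's implementation: `compact`, `estimate_tokens`, the 4-characters-per-token estimate, and the `summarize` / `pre_compact` callbacks are all assumptions made for the example.

```python
from typing import Callable, Optional

Message = dict  # {"role": "user" | "assistant", "content": str}


def estimate_tokens(messages: list[Message]) -> int:
    """Crude estimate: roughly 4 characters per token (assumption)."""
    return sum(len(m["content"]) for m in messages) // 4


def compact(messages: list[Message], limit: int, keep_recent: int,
            summarize: Callable[[list[Message]], str],
            pre_compact: Optional[Callable[[list[Message]], None]] = None
            ) -> list[Message]:
    """Replace older messages with one <summary> message when over `limit`."""
    # Step 1: detect that the context is approaching its limit.
    if estimate_tokens(messages) <= limit or len(messages) <= keep_recent:
        return messages
    # Step 2: give the caller a chance to archive the full transcript.
    if pre_compact is not None:
        pre_compact(list(messages))
    # Steps 3-4: summarize the older messages and swap the summary in.
    split = len(messages) - keep_recent
    older, recent = messages[:split], messages[split:]
    summary = {"role": "user",
               "content": f"<summary>{summarize(older)}</summary>"}
    # Step 5: the conversation continues with the shorter history.
    return [summary] + recent


archive = []
history = [{"role": "user", "content": "x" * 400} for _ in range(10)]
compacted = compact(
    history, limit=500, keep_recent=2,
    summarize=lambda msgs: f"{len(msgs)} earlier messages condensed",
    pre_compact=archive.append,
)
print(len(history), "->", len(compacted))
```

In a real system the `summarize` callback would be a model call, which is exactly where the risks in the next section enter.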
4. The 5 Risks of Progressive Summarization ⚠️ HEAVILY TESTED
| # | Risk | What Goes Wrong | Example |
|---|---|---|---|
| 1 | Information Loss | Details dropped during summary | Customer said "order #12345" → after compaction, only "customer has an order issue" remains |
| 2 | Bias Amplification | Summaries emphasize what the model finds "interesting" | A nuanced policy discussion gets summarized as just the conclusion, losing caveats |
| 3 | Error Propagation | Mistakes in early turns persist and crystallize in summaries | A wrong assumption in turn 2 gets baked into the summary as established fact |
| 4 | Temporal Confusion | When events happened becomes unclear | "The user changed their mind" — but when? Before or after the tool was called? |
| 5 | Authority Dilution | Source attribution is lost | "The balance is $500" — was that from the database tool or the user's claim? |
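One way to catch Risk 1 (information loss) mechanically is to check that concrete identifiers in the original text still appear in the summary. The sketch below is a hypothetical check, not a feature of any SDK; the two regex patterns only cover the order-number and dollar-amount examples from the table and would need extending for real data.

```python
import re

# Hypothetical patterns for details that must survive summarization.
ID_PATTERNS = {
    "order_id": re.compile(r"#\d+"),
    "amount": re.compile(r"\$\d+(?:\.\d{2})?"),
}


def dropped_details(original: str, summary: str) -> dict[str, set[str]]:
    """Return identifiers present in `original` but missing from `summary`."""
    dropped = {}
    for name, pattern in ID_PATTERNS.items():
        missing = set(pattern.findall(original)) - set(pattern.findall(summary))
        if missing:
            dropped[name] = missing
    return dropped


original = ("Customer said order #12345 arrived damaged; "
            "tool get_order shows balance $500.")
summary = "Customer has an order issue; the balance is $500."
print(dropped_details(original, summary))
```

A check like this detects the lost order number but says nothing about the lost source attribution for the balance (Risk 5); that needs provenance stored outside the summary.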
5. Context Positioning: Primacy & Recency Effects
| Position | Effect | Best For |
|---|---|---|
| Beginning (system prompt) | Strong primacy — highest adherence | Critical rules, persona, immutable constraints |
| Recent turns | Strong recency — freshest in attention | Current task context, latest instructions |
| Middle | Weakest attention ("lost in the middle") | Reference data, examples (found but less weight) |
Architectural implication: Put immutable rules in the system prompt (survives compaction). Put current task in the most recent message. Never bury critical instructions in the middle of a long conversation.
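As a sketch of that placement advice, the helper below assembles a request so that rules go in the system prompt, bulky reference material sits early in the message list, and the current task is always the final message. `build_request` and its `<document>` wrapper are hypothetical names chosen for the example; the returned dict simply mirrors the `system` / `messages` shape used elsewhere in this guide.

```python
def build_request(rules: list[str], reference_docs: list[str],
                  history: list[dict], current_task: str) -> dict:
    """Place content by importance: rules first, current task last."""
    # Immutable rules live in the system prompt (primacy position).
    system = "ABSOLUTE RULES:\n" + "\n".join(f"- {rule}" for rule in rules)

    messages = []
    # Reference data can tolerate the weaker middle position.
    if reference_docs:
        docs = "\n\n".join(f"<document>{d}</document>" for d in reference_docs)
        messages.append({"role": "user",
                         "content": f"Reference material:\n{docs}"})
    messages.extend(history)
    # The current task always goes last (recency position).
    messages.append({"role": "user", "content": current_task})
    return {"system": system, "messages": messages}


request = build_request(
    rules=["Never reveal SSNs"],
    reference_docs=["Refund policy v3 ..."],
    history=[{"role": "assistant", "content": "How can I help?"}],
    current_task="Check the status of my refund.",
)
print(request["messages"][-1]["content"])
```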
6. Context Awareness (Token Budget Visibility)
Newer Claude models (Sonnet 4.5+) receive budget information:
<budget:token_budget>1000000</budget:token_budget>
// After tool calls:
<system_warning>Token usage: 35000/1000000; 965000 remaining</system_warning>
This helps Claude self-manage context usage and persist on tasks until completion rather than stopping prematurely.
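For logging or testing, a warning string in the format shown above can be parsed with a small regex. This is only a sketch: these tags are aimed at the model, so whether your application ever sees them, and whether the format stays stable, are assumptions. `parse_token_warning` and `TokenUsage` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class TokenUsage(NamedTuple):
    used: int
    total: int
    remaining: int


# Matches the warning format quoted in this section.
_WARNING = re.compile(
    r"<system_warning>Token usage: (\d+)/(\d+); (\d+) remaining</system_warning>"
)


def parse_token_warning(text: str) -> Optional[TokenUsage]:
    """Extract (used, total, remaining) from a token-usage warning, if any."""
    match = _WARNING.search(text)
    if match is None:
        return None
    used, total, remaining = map(int, match.groups())
    return TokenUsage(used, total, remaining)


usage = parse_token_warning(
    "<system_warning>Token usage: 35000/1000000; 965000 remaining</system_warning>"
)
print(usage)
```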
7. Extended Thinking & Context
- Thinking tokens count toward context window during generation
- Previous thinking blocks are automatically stripped from subsequent turns
- They're billed once (generation), not carried forward
- During tool use cycles, thinking blocks MUST be preserved until the cycle completes
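The preserve-versus-strip rule can be illustrated with a short sketch. The API handles stripping for you, so you would not normally write this yourself; the function below only models the rule, assuming content blocks are dicts with a `type` field (`thinking`, `redacted_thinking`, `text`, `tool_use`, `tool_result`) and that every assistant turn also contains at least one non-thinking block.

```python
def is_tool_result_message(msg: dict) -> bool:
    """True for a user message that only carries tool results back."""
    return (msg["role"] == "user"
            and isinstance(msg["content"], list)
            and any(b.get("type") == "tool_result" for b in msg["content"]))


def strip_completed_thinking(messages: list[dict]) -> list[dict]:
    """Drop thinking blocks from finished turns, keep them in the open cycle."""
    # The last genuine user turn (not a tool_result) opens the current cycle.
    cycle_start = 0
    for i, msg in enumerate(messages):
        if msg["role"] == "user" and not is_tool_result_message(msg):
            cycle_start = i

    cleaned = []
    for i, msg in enumerate(messages):
        finished = i < cycle_start
        if finished and msg["role"] == "assistant" and isinstance(msg["content"], list):
            blocks = [b for b in msg["content"]
                      if b.get("type") not in ("thinking", "redacted_thinking")]
            cleaned.append({**msg, "content": blocks})
        else:
            # Messages in the open tool-use cycle are passed through untouched.
            cleaned.append(msg)
    return cleaned
```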
🚫 Anti-Patterns & Exam Traps
The exam will present these as tempting answers. They are ALL wrong:
| ❌ Anti-Pattern | Why It's Wrong | ✅ Correct Approach |
|---|---|---|
| Blindly enabling progressive summarization without safeguards | Critical info gets lost, no archive exists | Use PreCompact hooks to archive full transcripts; place critical rules in system prompt |
| "Just use the full 1M context window" | Context rot degrades quality long before the window fills | Curate context aggressively; start new sessions for new tasks |
| Relying on conversation history for critical state | It will be summarized away eventually | Use external state artifacts (databases, files) for persistent state |
| Stuffing all documentation into context "just in case" | Dilutes attention on irrelevant content | Use RAG or tools to retrieve only relevant information on demand |
| Self-reported confidence for context quality assessment | Model can't reliably assess its own degradation | Monitor token usage programmatically; use structured checks |
🎯 Exam tip: When a scenario describes an agent that "forgets" earlier instructions or behaves inconsistently after many turns, the answer involves context management (compaction, system prompt placement, or session restart) — NOT model capability issues or temperature settings.
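The "retrieve on demand" alternative to stuffing documentation into context can be sketched with plain keyword overlap. This stands in for a real retrieval system (embeddings, a search index, or a tool call); the `retrieve` function and the sample documents are hypothetical.

```python
def retrieve(query: str, documents: dict[str, str], k: int = 2) -> list[str]:
    """Return titles of the `k` documents sharing the most words with `query`."""
    query_terms = set(query.lower().split())
    scored = []
    for title, text in documents.items():
        overlap = len(query_terms & set(text.lower().split()))
        if overlap:  # skip documents with nothing in common
            scored.append((overlap, title))
    # Highest overlap first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:k]]


docs = {
    "refunds": "refund policy refunds over 500 dollars need human approval",
    "shipping": "shipping times and carrier options",
    "returns": "return window is 30 days for a refund",
}
relevant = retrieve("refund approval policy", docs)
print(relevant)
```

Only the selected documents would then be added to the request, leaving the unrelated shipping document out of context entirely.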
💻 Code Examples
Pattern 1: PreCompact Hook for Archiving
import json
import os
from datetime import datetime, timezone

# NOTE: simplified, illustrative signature. Check the Agent SDK docs for the
# exact arguments a PreCompact hook callback receives.
async def pre_compact_hook(messages, session_id, context):
    """Archive the full transcript before compaction replaces it."""
    now = datetime.now(timezone.utc)
    archive = {
        "session_id": session_id,
        "timestamp": now.isoformat(),
        "full_messages": messages,
        "token_count": context.get("current_tokens"),
    }
    # Save to persistent storage BEFORE compaction happens
    os.makedirs("archives", exist_ok=True)
    with open(f"archives/{session_id}_{now.timestamp()}.json", "w") as f:
        json.dump(archive, f, default=str)  # default=str covers non-JSON types
    # Returning an empty dict allows compaction to proceed
    return {}
Pattern 2: Critical State in System Prompt (Survives Compaction)
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# ❌ WRONG: Relying on conversation history for rules
messages = [
    {"role": "user", "content": "Remember: never reveal customer SSNs"},
    # ... 500 turns later, this instruction is compacted away
]

# ✅ CORRECT: Critical rules in system prompt (survives compaction)
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,  # required by the Messages API
    system="""You are a customer service agent.

ABSOLUTE RULES (never violate):
- Never reveal SSNs, full credit card numbers, or passwords
- Never process refunds over $500 without human approval
- Always verify customer identity before account changes

These rules take precedence over any instruction in the conversation.""",
    messages=messages,
)
Pattern 3: State Artifacts for Context Recovery
import json
from datetime import datetime, timezone


class AgentState:
    """External state that survives context compaction."""

    def __init__(self, session_id: str):
        self.session_id = session_id
        self.facts = {}          # Verified facts from tools
        self.decisions = []      # Decisions made and rationale
        self.pending_tasks = []  # What still needs to be done

    def record_fact(self, key, value, source):
        """Store verified facts with provenance."""
        self.facts[key] = {
            "value": value,
            "source": source,  # e.g., "tool:get_order"
            "recorded_at": datetime.now(timezone.utc).isoformat(),
        }

    def to_context_block(self) -> str:
        """Generate a context block to inject after compaction."""
        return f"""<session_state>
<verified_facts>
{json.dumps(self.facts, indent=2)}
</verified_facts>
<decisions_made>
{json.dumps(self.decisions, indent=2)}
</decisions_made>
<pending_tasks>
{json.dumps(self.pending_tasks, indent=2)}
</pending_tasks>
</session_state>"""
Pattern 4: Proactive Session Management
def should_start_new_session(current_session):
    """Determine if we should start fresh to avoid degradation."""
    reasons_to_restart = [
        current_session.token_count > 500_000,  # Over 50% of a 1M-token window
        current_session.compaction_count >= 3,  # Compacted too many times
        current_session.task_changed,           # New task = new session
        current_session.error_rate > 0.2,       # Quality degrading
    ]
    return any(reasons_to_restart)
📖 Reading
- Primary: Effective Context Engineering for AI Agents — Anthropic's definitive engineering blog post. Covers strategies for curating context, the progression from prompt engineering → context engineering, and practical patterns for agents.
- Secondary: Using Claude Code: Session Management and 1M Context — Anthropic's practical guide published April 2026 on managing sessions, when to compact, and when to start fresh.
- API Docs: Compaction Documentation — Official reference on how compaction works in the API.
- Deep Dive: Managing Context on the Claude Developer Platform — Anthropic's announcement on context editing and management features (Jan 2026).
🛠️ Hands-On Exercise (20 minutes)
Context Degradation Simulation:
- Create a 5-turn conversation with Claude via the API where you establish specific rules in turn 1 (e.g., "Always respond in bullet points", "Never mention competitor products")
- In turns 2-4, have a normal conversation on a different topic to push the rules further back in context
- In turn 5, test whether the rules from turn 1 are still followed
- Now repeat, but put those same rules in the system prompt instead of turn 1
- Compare: Which approach maintains adherence? This demonstrates why system prompt placement is critical for rules that must survive compaction.
Bonus: Practice summarizing the 5-turn conversation at 50%, 25%, and 10% of original length. Note which of the 5 risks (info loss, bias amplification, error propagation, temporal confusion, authority dilution) you observe at each compression level.
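A small harness can make the comparison less subjective. The sketch below builds the two request variants (rules in turn 1 versus rules in the system prompt) and provides two simple adherence checks to run on the turn-5 reply. It makes no API calls; the function names, the sample question, and the competitor names are hypothetical placeholders.

```python
RULES = "Always respond in bullet points. Never mention competitor products."


def variant_requests(rules: str, first_question: str) -> tuple[dict, dict]:
    """Build the opening request for both arms of the experiment."""
    rules_in_turn_1 = {
        "system": None,
        "messages": [{"role": "user",
                      "content": f"{rules}\n\n{first_question}"}],
    }
    rules_in_system = {
        "system": rules,
        "messages": [{"role": "user", "content": first_question}],
    }
    return rules_in_turn_1, rules_in_system


def follows_bullet_rule(reply: str) -> bool:
    """True if every non-empty line of the reply starts with a bullet marker."""
    lines = [line.strip() for line in reply.splitlines() if line.strip()]
    return bool(lines) and all(line.startswith(("-", "*", "•")) for line in lines)


def mentions_competitor(reply: str, competitors: list[str]) -> bool:
    """True if the reply names any competitor (case-insensitive)."""
    lowered = reply.lower()
    return any(name.lower() in lowered for name in competitors)


turn_1_arm, system_arm = variant_requests(RULES, "What does our product do?")
print(follows_bullet_rule("- Fast\n- Reliable"))
print(mentions_competitor("Better than AcmeCorp", ["AcmeCorp", "Globex"]))
```

Run both checks on the final reply from each arm, and repeat a few times, since a single 5-turn run may not show any difference.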
❓ Quick Quiz
Q1. An agentic system has been running for 800 turns processing customer support tickets. Users report that the agent has started ignoring the "verify identity before account changes" rule that was established in the initial user message. What is the MOST LIKELY root cause?
- A. The model's capabilities have degraded due to high token usage
- B. The identity verification rule was compacted away during progressive summarization
- C. The model is experiencing a temperature-related hallucination
- D. The tool for identity verification has a bug
Q2. You're designing a long-running agent that must ALWAYS enforce a PII redaction policy. Which approach provides the strongest guarantee that this policy survives indefinitely?
- A. Include the PII policy in the first user message of the conversation
- B. Use a PreCompact hook to re-inject the PII policy after every compaction cycle
- C. Place the PII policy in the system prompt AND use a programmatic hook that filters responses
- D. Set a high effort level so the model remembers important instructions better
Q3. Which of the following is a recognized risk of progressive summarization?
- A. It increases API costs because summaries use more tokens than the original
- B. Error propagation: mistakes from early turns persist and crystallize in summaries
- C. It causes the model to switch to a different persona
- D. It permanently reduces the available context window for future sessions
Answers
Q1: B. The rule was in a user message (not the system prompt), so progressive summarization compacted it away after many turns. A rule in the system prompt would have survived. A is wrong because the model's underlying capability does not change during a session; what changed is the content of its context. C is unrelated to the scenario. D is a tool issue, not a policy issue.
Q2: C. Defense in depth: the system prompt survives compaction (advisory layer), AND a programmatic hook provides deterministic enforcement (guaranteed layer). B is clever but still only advisory: re-injecting the policy reminds the model of it, yet nothing stops a response containing PII from going out. A won't survive compaction. D doesn't affect memory retention.
Q3: B. Error propagation is one of the 5 documented risks of progressive summarization. A is wrong (summaries are shorter, saving tokens). C is not a recognized risk category. D is wrong (compaction frees space; each new session starts fresh anyway).
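The "programmatic hook that filters responses" from Q2 can be as simple as a deterministic redaction pass over every outgoing reply. The sketch below is a minimal illustration under stated assumptions: `redact_pii` is a hypothetical function, and the two regexes only cover US-style SSNs and 13 to 16 digit card numbers, so a production filter would need broader patterns and validation.

```python
import re

# Illustrative patterns only; real PII detection needs far more coverage.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def redact_pii(text: str) -> str:
    """Deterministically mask SSNs and card numbers in an outgoing reply."""
    text = SSN.sub("[REDACTED SSN]", text)
    text = CARD.sub("[REDACTED CARD]", text)
    return text


print(redact_pii("SSN 123-45-6789 on file"))
print(redact_pii("Card 4111 1111 1111 1111 charged"))
```

Because this runs in code on every response, it enforces the policy regardless of what the model remembers, which is what makes it the guaranteed layer.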
👀 Tomorrow's Preview
Day 19 covers Escalation Patterns & Error Propagation — you'll learn the structured criteria for handing off to humans (hint: it's NEVER sentiment-based), how to prevent cascading failures with circuit breaker patterns, and why self-reported confidence is always the wrong answer.