CCA-F Study Day 18/20: Context Management & Progressive Summarization

Domain 5: Context Management & Reliability (~15% of exam)

📌 Today's Focus

Welcome to Domain 5 — the final domain! While it's the lightest weighted at ~15%, it's deceptively tricky because it tests your understanding of systemic limitations rather than feature knowledge. Today's topic — context management — is about understanding the invisible constraints that determine whether your agents work reliably in production or silently degrade.

The exam loves to present scenarios where an agent is behaving inconsistently or losing information. In those scenarios, the correct answer is almost always about context management, not model capability.

📚 Core Concepts

1. The Context Window — Claude's Working Memory

The context window is everything the model can "see" when generating its next response. This includes:

System prompt (persona, rules, constraints)
Conversation history (all prior turns)
Tool definitions (JSON schemas for every available tool)
Tool results (outputs from every tool call)
The response being generated (including thinking tokens)

What is NOT included: Training data, previous sessions (unless explicitly passed). Each session starts fresh.

Key sizes to know:

Claude Opus 4.7, Opus 4.6, Sonnet 4.6: 1M tokens
Claude Sonnet 4.5, Sonnet 4: 200K tokens
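
To make the components above concrete, here is a minimal sketch of a context budget tally. The names `estimate_tokens` and `context_budget`, the component labels, and the 4-characters-per-token estimate are assumptions for illustration only; real counts should come from the API's usage data rather than a character heuristic.

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate: ~4 characters per token (an assumption, not a tokenizer)."""
    return max(1, len(text) // 4) if text else 0


def context_budget(components: dict[str, str], window: int = 200_000) -> dict[str, int]:
    """Tally estimated tokens per component and report what remains of the window."""
    usage = {name: estimate_tokens(text) for name, text in components.items()}
    usage["total"] = sum(usage.values())
    usage["remaining"] = window - usage["total"]
    return usage


# Hypothetical session contents, sized only to show the tally working.
budget = context_budget({
    "system_prompt": "You are a support agent. " * 40,
    "tool_definitions": '{"name": "get_order"}' * 100,
    "conversation_history": "user: where is my order? " * 500,
    "tool_results": '{"order_id": 1, "status": "shipped"}' * 200,
})
for name, tokens in budget.items():
    print(f"{name:>22}: {tokens}")
```

Even in this toy tally, conversation history and tool results dominate the total, which is typical of long-running agents.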

2. Context Rot — The Silent Killer

Context rot is the gradual degradation of response quality as the context window fills. More context is NOT automatically better — it's a finite resource where information competes for attention.

Symptoms of context rot:

Earlier instructions get "forgotten" or ignored
More hallucinations and mistakes
Inconsistent behavior with patterns established early in the conversation
Reduced adherence to system prompt rules
Claude starts re-asking questions already answered

Why it happens: Transformer attention distributes across all tokens. As token count grows, the attention each individual instruction receives decreases. Important details get diluted in a sea of content.
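
A toy calculation can build intuition for this dilution. The sketch below assumes a single softmax over one "instruction" token with a fixed score advantage and n other tokens with a score of 0. Real models use many heads, layers, and learned scores, so treat this as an intuition pump, not a model of Claude's attention.

```python
import math


def instruction_weight(n_other_tokens: int, logit_advantage: float = 2.0) -> float:
    """Softmax weight on one instruction token competing with n tokens at logit 0."""
    boosted = math.exp(logit_advantage)
    return boosted / (boosted + n_other_tokens)


for n in (100, 10_000, 1_000_000):
    print(f"{n:>9} other tokens -> weight {instruction_weight(n):.6f}")
```

The instruction's share shrinks roughly in proportion to 1/n, even though its own score never changed.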

3. Progressive Summarization (Compaction)

When context approaches its limit, the system automatically compacts the earlier conversation into a summary. In the Agent SDK, that summary is wrapped in <summary></summary> tags and replaces the original messages.

How it works:

Agent detects context approaching limit
A PreCompact hook fires (your chance to archive!)
Earlier messages are summarized by Claude
Original messages are replaced with the summary
Conversation continues with freed-up space
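
The steps above can be sketched as a small function. This is not the Agent SDK's implementation: `compact_if_needed`, its parameters, the 80% threshold, and the choice to store the summary as a user message are all hypothetical, and the token counter and summarizer are passed in as plain callables so the sketch runs without an API call.

```python
from typing import Callable

Message = dict[str, str]


def compact_if_needed(
    messages: list[Message],
    count_tokens: Callable[[list[Message]], int],
    summarize: Callable[[list[Message]], str],
    pre_compact: Callable[[list[Message]], None],
    limit: int,
    threshold: float = 0.8,
    keep_recent: int = 4,
) -> list[Message]:
    """Replace older messages with one summary message when usage nears the limit."""
    if count_tokens(messages) < limit * threshold or len(messages) <= keep_recent:
        return messages                       # step 1: not near the limit yet
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    pre_compact(messages)                     # step 2: archive the full transcript
    summary = summarize(older)                # step 3: summarize earlier messages
    summary_msg = {"role": "user", "content": f"<summary>{summary}</summary>"}
    return [summary_msg] + recent             # steps 4-5: replace and continue


# Stand-ins for a real tokenizer, summarizer, and archive.
archived: list[Message] = []
history = [{"role": "user", "content": f"turn {i} " * 50} for i in range(10)]
compacted = compact_if_needed(
    history,
    count_tokens=lambda msgs: sum(len(m["content"]) // 4 for m in msgs),
    summarize=lambda msgs: f"{len(msgs)} earlier turns summarized",
    pre_compact=archived.extend,
    limit=1_000,
)
print(len(history), "->", len(compacted), "messages;", len(archived), "archived")
```

Note that the archive step runs before the summary replaces anything, which is the ordering the PreCompact hook exists to guarantee.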

4. The 5 Risks of Progressive Summarization ⚠️ HEAVILY TESTED

#	Risk	What Goes Wrong	Example
1	Information Loss	Details dropped during summary	Customer said "order #12345" → after compaction, only "customer has an order issue" remains
2	Bias Amplification	Summaries emphasize what the model finds "interesting"	A nuanced policy discussion gets summarized as just the conclusion, losing caveats
3	Error Propagation	Mistakes in early turns persist and crystallize in summaries	A wrong assumption in turn 2 gets baked into the summary as established fact
4	Temporal Confusion	When events happened becomes unclear	"The user changed their mind" — but when? Before or after the tool was called?
5	Authority Dilution	Source attribution is lost	"The balance is $500" — was that from the database tool or the user's claim?
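
Risk 1 (information loss) is the easiest to check mechanically. As a minimal sketch, assuming you know which kinds of details must survive, you can compare identifiers in the original text against the summary. The patterns and the `missing_details` helper are hypothetical examples, not part of any SDK.

```python
import re

# Hypothetical patterns for details that must survive a summary.
CRITICAL_PATTERNS = {
    "order_id": re.compile(r"#\d+"),
    "amount": re.compile(r"\$\d+(?:\.\d{2})?"),
}


def missing_details(original: str, summary: str) -> dict[str, set[str]]:
    """Return critical details present in the original but absent from the summary."""
    missing = {}
    for label, pattern in CRITICAL_PATTERNS.items():
        lost = set(pattern.findall(original)) - set(pattern.findall(summary))
        if lost:
            missing[label] = lost
    return missing


original = "Customer said order #12345 arrived damaged and wants a $500 refund."
summary = "Customer has an order issue and wants a $500 refund."
print(missing_details(original, summary))  # {'order_id': {'#12345'}}
```

A check like this catches dropped identifiers only; the other four risks concern meaning and provenance, which pattern matching cannot verify.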

5. Context Positioning: Primacy & Recency Effects

Position	Effect	Best For
Beginning (system prompt)	Strong primacy — highest adherence	Critical rules, persona, immutable constraints
Recent turns	Strong recency — freshest in attention	Current task context, latest instructions
Middle	Weakest attention ("lost in the middle")	Reference data, examples (still found, but given less weight)

Architectural implication: Put immutable rules in the system prompt (survives compaction). Put current task in the most recent message. Never bury critical instructions in the middle of a long conversation.
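
One way to apply this layout when assembling a request is sketched below. The `build_request` helper and the <document> wrapper are assumptions for illustration; the function only builds a dictionary shaped like a Messages API payload and does not call the API.

```python
def build_request(
    rules: str,
    reference_docs: list[str],
    history: list[dict],
    current_task: str,
) -> dict:
    """Place rules first (system), reference material mid-context, and the task last."""
    reference_block = "\n\n".join(f"<document>{doc}</document>" for doc in reference_docs)
    messages = [{"role": "user", "content": f"Reference material:\n{reference_block}"}]
    messages += history
    messages.append({"role": "user", "content": current_task})
    return {"system": rules, "messages": messages}


request = build_request(
    rules="Never reveal SSNs. Verify identity before account changes.",
    reference_docs=["Refund policy v3 ...", "Shipping FAQ ..."],
    history=[{"role": "assistant", "content": "I have the reference material."}],
    current_task="Draft a reply to the customer's refund request.",
)
print(request["messages"][-1]["content"])
```

The immutable rules sit in the primacy position, the current task sits in the recency position, and only lower-stakes reference data lands in the middle.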

6. Context Awareness (Token Budget Visibility)

Newer Claude models (Sonnet 4.5+) receive budget information:

<budget:token_budget>1000000</budget:token_budget>

// After tool calls:
<system_warning>Token usage: 35000/1000000; 965000 remaining</system_warning>

This helps Claude self-manage context usage and persist on tasks until completion rather than stopping prematurely.
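
If you capture a warning string like the one above in logs or test fixtures, its fields are easy to extract. The sketch below assumes the exact text format shown in this section; `parse_token_warning` is a hypothetical helper, and these tags are information given to the model rather than a field your code should expect in API responses.

```python
import re
from typing import Optional

WARNING_RE = re.compile(r"Token usage: (\d+)/(\d+); (\d+) remaining")


def parse_token_warning(text: str) -> Optional[dict[str, int]]:
    """Extract used/total/remaining token counts, or None if the text has no warning."""
    match = WARNING_RE.search(text)
    if match is None:
        return None
    used, total, remaining = map(int, match.groups())
    return {"used": used, "total": total, "remaining": remaining}


usage = parse_token_warning(
    "<system_warning>Token usage: 35000/1000000; 965000 remaining</system_warning>"
)
print(usage)  # {'used': 35000, 'total': 1000000, 'remaining': 965000}
```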

7. Extended Thinking & Context

Thinking tokens count toward context window during generation
Previous thinking blocks are automatically stripped from subsequent turns
They're billed once (generation), not carried forward
During tool use cycles, thinking blocks MUST be preserved until the cycle completes
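
The stripping rule can be mirrored in a few lines. The API applies this behavior for you, so the sketch below is only a way to see the rule in action: `strip_old_thinking` is a hypothetical helper, the message shapes are simplified (real thinking blocks carry additional fields), and "current cycle" is assumed to begin at the last user turn that is not a tool result.

```python
def strip_old_thinking(messages: list[dict]) -> list[dict]:
    """Drop thinking blocks from turns before the current tool-use cycle."""

    def is_tool_result_turn(msg: dict) -> bool:
        content = msg["content"]
        return (
            msg["role"] == "user"
            and isinstance(content, list)
            and bool(content)
            and all(block.get("type") == "tool_result" for block in content)
        )

    # The current cycle starts at the last user turn that is a real prompt,
    # not a tool result being fed back to the model.
    cycle_start = 0
    for i, msg in enumerate(messages):
        if msg["role"] == "user" and not is_tool_result_turn(msg):
            cycle_start = i

    cleaned = []
    for i, msg in enumerate(messages):
        content = msg["content"]
        if i < cycle_start and msg["role"] == "assistant" and isinstance(content, list):
            content = [b for b in content if b.get("type") != "thinking"]
        cleaned.append({**msg, "content": content})
    return cleaned


history = [
    {"role": "user", "content": "What is 2+2?"},
    {"role": "assistant", "content": [
        {"type": "thinking", "thinking": "simple sum"},
        {"type": "text", "text": "4"},
    ]},
    {"role": "user", "content": "Look up order 7."},
    {"role": "assistant", "content": [
        {"type": "thinking", "thinking": "need the tool"},
        {"type": "tool_use", "id": "t1", "name": "get_order", "input": {"id": 7}},
    ]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "t1", "content": "shipped"},
    ]},
]
cleaned = strip_old_thinking(history)
print([b["type"] for b in cleaned[1]["content"]])  # ['text']
print([b["type"] for b in cleaned[3]["content"]])  # ['thinking', 'tool_use']
```

The finished first turn loses its thinking block, while the in-progress tool cycle keeps its thinking block intact.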

🚫 Anti-Patterns & Exam Traps

The exam will present these as tempting answers. They are ALL wrong:

❌ Anti-Pattern	Why It's Wrong	✅ Correct Approach
Blindly enabling progressive summarization without safeguards	Critical info gets lost, no archive exists	Use PreCompact hooks to archive full transcripts; place critical rules in system prompt
"Just use the full 1M context window"	Context rot degrades quality long before the window fills	Curate context aggressively; start new sessions for new tasks
Relying on conversation history for critical state	It will be summarized away eventually	Use external state artifacts (databases, files) for persistent state
Stuffing all documentation into context "just in case"	Dilutes attention on irrelevant content	Use RAG or tools to retrieve only relevant information on demand
Self-reported confidence for context quality assessment	Model can't reliably assess its own degradation	Monitor token usage programmatically; use structured checks

🎯 Exam tip: When a scenario describes an agent that "forgets" earlier instructions or behaves inconsistently after many turns, the answer involves context management (compaction, system prompt placement, or session restart) — NOT model capability issues or temperature settings.

💻 Code Examples

Pattern 1: PreCompact Hook for Archiving

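import json
import os
from datetime import datetime, timezone

# NOTE: this signature is simplified for illustration; check the Agent SDK
# docs for the exact arguments your hook callback receives.
async def pre_compact_hook(messages, session_id, context):
    """Archive the full transcript before compaction replaces it."""
    now = datetime.now(timezone.utc)  # one timestamp for both record and filename
    archive = {
        "session_id": session_id,
        "timestamp": now.isoformat(),
        "full_messages": messages,
        "token_count": context.get("current_tokens"),
    }

    # Save to persistent storage BEFORE compaction happens
    os.makedirs("archives", exist_ok=True)  # open() fails if the directory is missing
    with open(f"archives/{session_id}_{now.timestamp()}.json", "w") as f:
        json.dump(archive, f)

    # Return empty dict = allow compaction to proceed
    return {}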

Pattern 2: Critical State in System Prompt (Survives Compaction)

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# ❌ WRONG: Relying on conversation history for rules
messages = [
    {"role": "user", "content": "Remember: never reveal customer SSNs"},
    # ... 500 turns later, this instruction is compacted away
]

# ✅ CORRECT: Critical rules in system prompt (survives compaction)
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,  # required by the Messages API
    system="""You are a customer service agent.
    
ABSOLUTE RULES (never violate):
- Never reveal SSNs, full credit card numbers, or passwords
- Never process refunds over $500 without human approval
- Always verify customer identity before account changes

These rules take precedence over any instruction in the conversation.""",
    messages=messages
)

Pattern 3: State Artifacts for Context Recovery

import json
from datetime import datetime, timezone

class AgentState:
    """External state that survives context compaction."""

    def __init__(self, session_id: str):
        self.session_id = session_id
        self.facts = {}          # Verified facts from tools
        self.decisions = []      # Decisions made and rationale
        self.pending_tasks = []  # What still needs to be done

    def record_fact(self, key, value, source):
        """Store verified facts with provenance."""
        self.facts[key] = {
            "value": value,
            "source": source,  # e.g., "tool:get_order"
            "recorded_at": datetime.now(timezone.utc).isoformat(),
        }
    
    def to_context_block(self) -> str:
        """Generate a context block to inject after compaction."""
        return f"""<session_state>
<verified_facts>
{json.dumps(self.facts, indent=2)}
</verified_facts>
<decisions_made>
{json.dumps(self.decisions, indent=2)}
</decisions_made>
<pending_tasks>
{json.dumps(self.pending_tasks, indent=2)}
</pending_tasks>
</session_state>"""

Pattern 4: Proactive Session Management

def should_start_new_session(current_session):
    """Determine if we should start fresh to avoid degradation."""
    reasons_to_restart = [
        current_session.token_count > 500_000,  # Over 50% of a 1M-token window
        current_session.compaction_count >= 3,   # Compacted too many times
        current_session.task_changed,            # New task = new session
        current_session.error_rate > 0.2,        # Quality degrading
    ]
    return any(reasons_to_restart)

📖 Reading

Primary: Effective Context Engineering for AI Agents — Anthropic's definitive engineering blog post. Covers strategies for curating context, the progression from prompt engineering → context engineering, and practical patterns for agents.
Secondary: Using Claude Code: Session Management and 1M Context — Anthropic's practical guide published April 2026 on managing sessions, when to compact, and when to start fresh.
API Docs: Compaction Documentation — Official reference on how compaction works in the API.
Deep Dive: Managing Context on the Claude Developer Platform — Anthropic's announcement on context editing and management features (Jan 2026).

🛠️ Hands-On Exercise (20 minutes)

Context Degradation Simulation:

Create a 5-turn conversation with Claude via the API where you establish specific rules in turn 1 (e.g., "Always respond in bullet points", "Never mention competitor products")
In turns 2-4, have a normal conversation on a different topic to push the rules further back in context
In turn 5, test whether the rules from turn 1 are still followed
Now repeat, but put those same rules in the system prompt instead of turn 1
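Compare: Which approach maintains adherence? Five turns will not trigger compaction, so any gap you see here comes from positioning alone. In a long session the gap widens, because the system prompt survives compaction while early user turns can be summarized away.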
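
A small harness keeps the two runs comparable. In the sketch below, `run_script`, the sample turns, and `fake_send` are all hypothetical; `fake_send` is a stand-in you would replace with a function that calls the API and returns the assistant's text.

```python
from typing import Callable, Optional

Send = Callable[[Optional[str], list[dict]], str]


def run_script(send: Send, user_turns: list[str], system: Optional[str] = None) -> list[dict]:
    """Play user turns in order, appending each reply, and return the transcript."""
    messages: list[dict] = []
    for turn in user_turns:
        messages.append({"role": "user", "content": turn})
        reply = send(system, messages)
        messages.append({"role": "assistant", "content": reply})
    return messages


RULES = "Always respond in bullet points. Never mention competitor products."
TOPIC_TURNS = ["Tell me about hiking boots.", "How do I waterproof them?", "What about laces?"]
PROBE = "Which brands should I consider?"


def fake_send(system: Optional[str], messages: list[dict]) -> str:
    """Stand-in for a real API call; returns a numbered placeholder reply."""
    return f"reply {len(messages) // 2 + 1}"


# Run 1: rules live in the first user turn. Run 2: rules live in the system prompt.
rules_in_turn_1 = run_script(fake_send, [RULES] + TOPIC_TURNS + [PROBE])
rules_in_system = run_script(fake_send, ["Hello."] + TOPIC_TURNS + [PROBE], system=RULES)
print(rules_in_turn_1[-1]["content"], "|", rules_in_system[-1]["content"])
```

With a real `send` function, compare the final reply of each run against the two rules.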

Bonus: Practice summarizing the 5-turn conversation at 50%, 25%, and 10% of original length. Note which of the 5 risks (info loss, bias amplification, error propagation, temporal confusion, authority dilution) you observe at each compression level.

❓ Quick Quiz

Q1. An agentic system has been running for 800 turns processing customer support tickets. Users report that the agent has started ignoring the "verify identity before account changes" rule that was established in the initial user message. What is the MOST LIKELY root cause?

A. The model's capabilities have degraded due to high token usage
B. The identity verification rule was compacted away during progressive summarization
C. The model is experiencing a temperature-related hallucination
D. The tool for identity verification has a bug

Q2. You're designing a long-running agent that must ALWAYS enforce a PII redaction policy. Which approach provides the strongest guarantee that this policy survives indefinitely?

A. Include the PII policy in the first user message of the conversation
B. Use a PreCompact hook to re-inject the PII policy after every compaction cycle
C. Place the PII policy in the system prompt AND use a programmatic hook that filters responses
D. Set a high effort level so the model remembers important instructions better

Q3. Which of the following is a recognized risk of progressive summarization?

A. It increases API costs because summaries use more tokens than the original
B. Error propagation — mistakes from early turns persist and crystallize in summaries
C. It causes the model to switch to a different persona
D. It permanently reduces the available context window for future sessions

Answers

Q1: B. The rule was in a user message (not the system prompt), so progressive summarization compacted it away after many turns. A rule in the system prompt would have survived. A is wrong because token usage does not change the model's underlying capabilities; what changed is the content left in context. C is unrelated to the scenario. D is a tool issue, not a policy issue.

Q2: C. Defense in depth: the system prompt survives compaction (advisory layer), AND a programmatic hook that filters responses provides deterministic enforcement (guaranteed layer). B is clever but only advisory: it re-injects the policy at compaction events, and nothing deterministically stops the model from outputting PII in between. A won't survive compaction. D doesn't affect memory retention.

Q3: B. Error propagation is one of the 5 documented risks of progressive summarization. A is wrong (summaries are shorter, saving tokens). C is not a recognized risk category. D is wrong (compaction frees space; each new session starts fresh anyway).

👀 Tomorrow's Preview

Day 19 covers Escalation Patterns & Error Propagation — you'll learn the structured criteria for handing off to humans (hint: it's NEVER sentiment-based), how to prevent cascading failures with circuit breaker patterns, and why self-reported confidence is always the wrong answer.