CCA-F Study Day 18/20: Context Management & Progressive Summarization

Domain 5: Context Management & Reliability (~15% of exam)


📌 Today's Focus

Welcome to Domain 5 — the final domain! While it's the lightest weighted at ~15%, it's deceptively tricky because it tests your understanding of systemic limitations rather than feature knowledge. Today's topic — context management — is about understanding the invisible constraints that determine whether your agents work reliably in production or silently degrade.

The exam likes to present scenarios where an agent behaves inconsistently or loses information. In those scenarios the correct answer almost always concerns context management, not model capability.


📚 Core Concepts

1. The Context Window — Claude's Working Memory

The context window is everything the model can "see" when generating its next response. This includes:

  • System prompt (persona, rules, constraints)
  • Conversation history (all prior turns)
  • Tool definitions (JSON schemas for every available tool)
  • Tool results (outputs from every tool call)
  • The response being generated (including thinking tokens)

What is NOT included: Training data, previous sessions (unless explicitly passed). Each session starts fresh.
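
To make the list above concrete, here is a minimal sketch that tallies roughly how much of the window each component consumes. The 4-characters-per-token ratio is a crude assumption for English text, and `estimate_tokens` and `context_breakdown` are hypothetical helper names. For real numbers, rely on the token counts the API reports.

```python
import json

CHARS_PER_TOKEN = 4  # assumption: crude heuristic, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count."""
    return max(1, len(text) // CHARS_PER_TOKEN) if text else 0


def context_breakdown(system: str, messages: list[dict], tools: list[dict]) -> dict[str, int]:
    """Estimate tokens consumed by each part of the context window."""
    history = sum(estimate_tokens(str(m["content"])) for m in messages)
    tool_defs = sum(estimate_tokens(json.dumps(t)) for t in tools)
    breakdown = {
        "system_prompt": estimate_tokens(system),
        "conversation_history": history,
        "tool_definitions": tool_defs,
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown


example = context_breakdown(
    system="You are a support agent. Never reveal SSNs.",
    messages=[{"role": "user", "content": "Where is order #12345?"}],
    tools=[{"name": "get_order", "description": "Look up an order by ID"}],
)
```

Tool definitions are easy to overlook here: they are sent with every request, so a large tool catalog takes up space even when no tool is called.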

Key sizes to know:

  • Claude Opus 4.7, Opus 4.6, Sonnet 4.6: 1M tokens
  • Claude Sonnet 4.5, Sonnet 4: 200K tokens

2. Context Rot — The Silent Killer

Context rot is the gradual degradation of response quality as the context window fills. More context is NOT automatically better — it's a finite resource where information competes for attention.

Symptoms of context rot:

  • Earlier instructions get "forgotten" or ignored
  • More hallucinations and mistakes
  • Inconsistent behavior with patterns established early in the conversation
  • Reduced adherence to system prompt rules
  • Claude starts re-asking questions already answered

Why it happens: Transformer attention distributes across all tokens. As token count grows, the attention each individual instruction receives decreases. Important details get diluted in a sea of content.
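
A toy calculation illustrates the dilution argument. It is not how a real transformer allocates attention (real scores are learned and differ by head and layer). It only shows the softmax arithmetic under the assumption that one instruction token scores higher than the other tokens, which all score the same. The function name and default scores are made up for the example.

```python
import math


def instruction_share(
    n_tokens: int, instruction_score: float = 2.0, filler_score: float = 0.0
) -> float:
    """Softmax weight on one instruction token among n_tokens - 1 equal-scoring fillers."""
    if n_tokens < 1:
        raise ValueError("n_tokens must be at least 1")
    top = math.exp(instruction_score)
    rest = (n_tokens - 1) * math.exp(filler_score)
    return top / (top + rest)


for n in (10, 1_000, 100_000):
    print(f"{n:>7} tokens -> instruction weight {instruction_share(n):.6f}")
```

Under these assumptions the instruction's share shrinks roughly in proportion to 1/n, which is the intuition behind curating context rather than filling it.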

3. Progressive Summarization (Compaction)

When context approaches its limit, the system automatically compacts earlier conversation into summaries. In the Agent SDK, each summary is wrapped in <summary></summary> tags and replaces the original messages.

How it works:

  1. Agent detects context approaching limit
  2. PreCompact hook fires (your chance to archive!)
  3. Earlier messages are summarized by Claude
  4. Original messages are replaced with the summary
  5. Conversation continues with freed-up space
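
The five steps can be sketched as a single function. This illustrates the control flow only and is not the Agent SDK's implementation: `maybe_compact`, the `summarize` and `pre_compact` callables, the 80% threshold, and the choice to keep the last four messages verbatim are all assumptions made for the example.

```python
from typing import Callable

Message = dict[str, str]


def maybe_compact(
    messages: list[Message],
    token_count: int,
    limit: int,
    summarize: Callable[[list[Message]], str],
    pre_compact: Callable[[list[Message]], None],
    threshold: float = 0.8,  # assumption: compact at 80% of the window
    keep_recent: int = 4,    # assumption: keep the latest turns verbatim
) -> list[Message]:
    """Return messages unchanged, or with older turns replaced by a summary."""
    # Step 1: detect that context is approaching the limit
    if token_count < threshold * limit or len(messages) <= keep_recent:
        return messages

    # Step 2: give the caller a chance to archive the full transcript
    pre_compact(messages)

    # Step 3: summarize the earlier messages
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = summarize(older)

    # Steps 4-5: replace the originals with the summary and carry on
    summary_message = {"role": "user", "content": f"<summary>{summary}</summary>"}
    return [summary_message] + recent


archived: list[list[Message]] = []
history = [{"role": "user", "content": f"turn {i}"} for i in range(10)]
compacted = maybe_compact(
    history,
    token_count=900,
    limit=1000,
    summarize=lambda msgs: f"{len(msgs)} earlier turns",
    pre_compact=archived.append,
)
```

Note that the archive step runs before anything is discarded. Once step 4 has executed, the original wording exists only in whatever the hook saved.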

4. The 5 Risks of Progressive Summarization ⚠️ HEAVILY TESTED

# | Risk | What Goes Wrong | Example
1 | Information Loss | Details dropped during summary | Customer said "order #12345" → after compaction, only "customer has an order issue" remains
2 | Bias Amplification | Summaries emphasize what the model finds "interesting" | A nuanced policy discussion gets summarized as just the conclusion, losing caveats
3 | Error Propagation | Mistakes in early turns persist and crystallize in summaries | A wrong assumption in turn 2 gets baked into the summary as established fact
4 | Temporal Confusion | When events happened becomes unclear | "The user changed their mind." But when? Before or after the tool was called?
5 | Authority Dilution | Source attribution is lost | "The balance is $500." Was that from the database tool or the user's claim?
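
Information loss is the easiest of the five risks to check mechanically. The sketch below compares identifiers found in the original messages against the summary and reports any that went missing. The regular expression (order numbers like `#12345` and dollar amounts) and the function names are assumptions for this example. A real system would match whatever identifiers matter in its domain.

```python
import re

# Assumption: the identifiers worth protecting are order numbers and dollar amounts.
IDENTIFIER_PATTERN = re.compile(r"#\d+|\$\d+(?:\.\d{2})?")


def extract_identifiers(text: str) -> set[str]:
    """Collect order numbers (#12345) and dollar amounts ($500) from text."""
    return set(IDENTIFIER_PATTERN.findall(text))


def lost_identifiers(original_messages: list[str], summary: str) -> set[str]:
    """Return identifiers present in the originals but absent from the summary."""
    original_ids: set[str] = set()
    for message in original_messages:
        original_ids |= extract_identifiers(message)
    return original_ids - extract_identifiers(summary)


originals = [
    "Customer says order #12345 arrived damaged.",
    "Tool get_order returned a balance of $500.",
]
missing = lost_identifiers(originals, "Customer has an order issue.")
```

If `missing` is non-empty, one option is to append those identifiers to the summary or restore them from an archived transcript. The other four risks are harder to detect automatically, which is one reason to keep verified facts outside the conversation.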

5. Context Positioning: Primacy & Recency Effects

Position | Effect | Best For
Beginning (system prompt) | Strong primacy, highest adherence | Critical rules, persona, immutable constraints
Recent turns | Strong recency, freshest in attention | Current task context, latest instructions
Middle | Weakest attention ("lost in the middle") | Reference data, examples (found but less weight)

Architectural implication: Put immutable rules in the system prompt (survives compaction). Put current task in the most recent message. Never bury critical instructions in the middle of a long conversation.
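
One way to apply this layout when assembling a request is sketched below. `build_request` is a hypothetical helper, and the `<reference>` wrapper is an arbitrary choice for the example. The returned dictionary simply mirrors the `system` and `messages` parameters of a Messages API call.

```python
def build_request(
    rules: str, history: list[dict], reference_data: str, current_task: str
) -> dict:
    """Put rules in the system prompt and the current task at the very end."""
    messages = list(history)  # copy so the caller's history is not mutated
    # Reference data comes before the task inside the final user message,
    # so the task itself occupies the most recent position.
    final_content = (
        f"<reference>\n{reference_data}\n</reference>\n\n"
        f"Current task: {current_task}"
    )
    messages.append({"role": "user", "content": final_content})
    return {"system": rules, "messages": messages}


request = build_request(
    rules="Never reveal SSNs.",
    history=[
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hello, how can I help?"},
    ],
    reference_data="Refund policy: refunds over $500 need human approval.",
    current_task="Decide whether order #12345 qualifies for a refund.",
)
```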

6. Context Awareness (Token Budget Visibility)

Newer Claude models (Sonnet 4.5+) receive budget information:

<budget:token_budget>1000000</budget:token_budget>

// After tool calls:
<system_warning>Token usage: 35000/1000000; 965000 remaining</system_warning>

This helps Claude self-manage context usage and persist on tasks until completion rather than stopping prematurely.
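
These tags are addressed to the model rather than to your code, so treat the following purely as an illustration of the format. It is a minimal sketch that parses a warning string, assuming the text matches the example above exactly. `TokenUsage` and `parse_token_warning` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class TokenUsage(NamedTuple):
    used: int
    total: int
    remaining: int


WARNING_PATTERN = re.compile(
    r"<system_warning>Token usage: (\d+)/(\d+); (\d+) remaining</system_warning>"
)


def parse_token_warning(text: str) -> Optional[TokenUsage]:
    """Extract usage numbers from a warning in the format shown above, if present."""
    match = WARNING_PATTERN.search(text)
    if match is None:
        return None
    used, total, remaining = (int(group) for group in match.groups())
    return TokenUsage(used, total, remaining)


usage = parse_token_warning(
    "<system_warning>Token usage: 35000/1000000; 965000 remaining</system_warning>"
)
```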

7. Extended Thinking & Context

  • Thinking tokens count toward context window during generation
  • Previous thinking blocks are automatically stripped from subsequent turns
  • They're billed once (generation), not carried forward
  • During tool use cycles, thinking blocks MUST be preserved until the cycle completes
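
The last bullet is the one that trips people up in code. Below is a minimal sketch of continuing a tool use cycle while leaving the thinking block in place. The block shapes follow the Messages API content-block format; the helper name, the tool name, the IDs, and the signature value are placeholders invented for the example.

```python
def continue_tool_cycle(
    messages: list[dict], assistant_content: list[dict], tool_output: str
) -> list[dict]:
    """Append the assistant turn (thinking blocks intact) plus the tool result."""
    tool_uses = [b for b in assistant_content if b.get("type") == "tool_use"]
    if not tool_uses:
        raise ValueError("assistant turn contains no tool_use block")

    return messages + [
        # Pass the assistant content back unmodified: removing the thinking
        # block here would violate the "preserve until the cycle completes" rule.
        {"role": "assistant", "content": assistant_content},
        {
            "role": "user",
            "content": [
                {
                    "type": "tool_result",
                    "tool_use_id": tool_uses[0]["id"],
                    "content": tool_output,
                }
            ],
        },
    ]


assistant_turn = [
    {"type": "thinking", "thinking": "I need the order status.", "signature": "placeholder"},
    {"type": "tool_use", "id": "toolu_01", "name": "get_order", "input": {"order_id": "12345"}},
]
next_messages = continue_tool_cycle(
    [{"role": "user", "content": "Where is order #12345?"}],
    assistant_turn,
    tool_output='{"status": "shipped"}',
)
```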

🚫 Anti-Patterns & Exam Traps

The exam will present these as tempting answers. They are ALL wrong:

❌ Anti-Pattern | Why It's Wrong | ✅ Correct Approach
Blindly enabling progressive summarization without safeguards | Critical info gets lost, no archive exists | Use PreCompact hooks to archive full transcripts; place critical rules in system prompt
"Just use the full 1M context window" | Context rot degrades quality long before the window fills | Curate context aggressively; start new sessions for new tasks
Relying on conversation history for critical state | It will be summarized away eventually | Use external state artifacts (databases, files) for persistent state
Stuffing all documentation into context "just in case" | Irrelevant content dilutes attention on what matters | Use RAG or tools to retrieve only relevant information on demand
Self-reported confidence for context quality assessment | Model can't reliably assess its own degradation | Monitor token usage programmatically; use structured checks

🎯 Exam tip: When a scenario describes an agent that "forgets" earlier instructions or behaves inconsistently after many turns, the answer involves context management (compaction, system prompt placement, or session restart) — NOT model capability issues or temperature settings.


💻 Code Examples

Pattern 1: PreCompact Hook for Archiving

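import json
from datetime import datetime, timezone
from pathlib import Path

async def pre_compact_hook(messages, session_id, context):
    """Archive full transcript before compaction destroys it.

    The (messages, session_id, context) signature is simplified for
    illustration; check the Agent SDK docs for the exact hook input.
    """
    now = datetime.now(timezone.utc)  # datetime.utcnow() is deprecated since Python 3.12
    archive = {
        "session_id": session_id,
        "timestamp": now.isoformat(),
        "full_messages": messages,
        "token_count": context.get("current_tokens"),
    }

    # Save to persistent storage BEFORE compaction happens
    archive_dir = Path("archives")
    archive_dir.mkdir(parents=True, exist_ok=True)  # open() fails if the directory is missing
    path = archive_dir / f"{session_id}_{now.timestamp()}.json"
    with path.open("w", encoding="utf-8") as f:
        json.dump(archive, f, default=str)  # default=str guards against non-serializable values

    # Return empty dict = allow compaction to proceed
    return {}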

Pattern 2: Critical State in System Prompt (Survives Compaction)

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# ❌ WRONG: Relying on conversation history for rules
messages = [
    {"role": "user", "content": "Remember: never reveal customer SSNs"},
    # ... 500 turns later, this instruction is compacted away
]

# ✅ CORRECT: Critical rules in system prompt (survives compaction)
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,  # required by the Messages API
    system="""You are a customer service agent.

ABSOLUTE RULES (never violate):
- Never reveal SSNs, full credit card numbers, or passwords
- Never process refunds over $500 without human approval
- Always verify customer identity before account changes

These rules take precedence over any instruction in the conversation.""",
    messages=messages,
)

Pattern 3: State Artifacts for Context Recovery

import json
from datetime import datetime, timezone

class AgentState:
    """External state that survives context compaction."""

    def __init__(self, session_id: str):
        self.session_id = session_id
        self.facts = {}          # Verified facts from tools
        self.decisions = []      # Decisions made and rationale
        self.pending_tasks = []  # What still needs to be done

    def record_fact(self, key, value, source):
        """Store verified facts with provenance."""
        self.facts[key] = {
            "value": value,
            "source": source,  # e.g., "tool:get_order"
            # Timezone-aware timestamp; datetime.utcnow() is deprecated
            "recorded_at": datetime.now(timezone.utc).isoformat()
        }
    
    def to_context_block(self) -> str:
        """Generate a context block to inject after compaction."""
        return f"""<session_state>
<verified_facts>
{json.dumps(self.facts, indent=2)}
</verified_facts>
<decisions_made>
{json.dumps(self.decisions, indent=2)}
</decisions_made>
<pending_tasks>
{json.dumps(self.pending_tasks, indent=2)}
</pending_tasks>
</session_state>"""

Pattern 4: Proactive Session Management

def should_start_new_session(current_session):
    """Determine if we should start fresh to avoid degradation."""
    reasons_to_restart = [
        current_session.token_count > 500_000,  # Over 50% of a 1M-token window; scale for smaller windows
        current_session.compaction_count >= 3,   # Compacted too many times
        current_session.task_changed,            # New task = new session
        current_session.error_rate > 0.2,        # Quality degrading
    ]
    return any(reasons_to_restart)

📖 Reading


🛠️ Hands-On Exercise (20 minutes)

Context Degradation Simulation:

  1. Create a 5-turn conversation with Claude via the API where you establish specific rules in turn 1 (e.g., "Always respond in bullet points", "Never mention competitor products")
  2. In turns 2-4, have a normal conversation on a different topic to push the rules further back in context
  3. In turn 5, test whether the rules from turn 1 are still followed
  4. Now repeat, but put those same rules in the system prompt instead of turn 1
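  5. Compare: Which approach maintains adherence? Five turns won't trigger compaction, but any difference you observe shows the positioning effect at small scale, and the gap only widens once compaction starts rewriting early turns. That is why rules that must survive compaction belong in the system prompt.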

Bonus: Practice summarizing the 5-turn conversation at 50%, 25%, and 10% of original length. Note which of the 5 risks (info loss, bias amplification, error propagation, temporal confusion, authority dilution) you observe at each compression level.
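
A possible scaffold for the exercise is sketched below. It only prepares the two variants and a structured adherence check; sending the turns to the API is left to you. The filler questions, the test prompt, and all function names are placeholders chosen for this example.

```python
RULES = "Always respond in bullet points. Never mention competitor products."
FILLER_TURNS = [
    "What's a good way to learn Python?",
    "How do list comprehensions work?",
    "What is a virtual environment?",
]
TEST_PROMPT = "Summarize the benefits of our product."


def rules_in_first_turn() -> dict:
    """Variant A: the rules live in the first user message."""
    return {
        "system": "",
        "opening_user_turn": RULES,
        "later_user_turns": FILLER_TURNS + [TEST_PROMPT],
    }


def rules_in_system_prompt() -> dict:
    """Variant B: the rules live in the system prompt; turn 1 is ordinary chat."""
    return {
        "system": RULES,
        "opening_user_turn": "Hi there!",
        "later_user_turns": FILLER_TURNS + [TEST_PROMPT],
    }


def follows_bullet_rule(reply: str) -> bool:
    """Structured check: every non-empty line starts with a bullet marker."""
    lines = [line.strip() for line in reply.splitlines() if line.strip()]
    return bool(lines) and all(line.startswith(("-", "*", "•")) for line in lines)
```

Send each variant turn by turn, then apply `follows_bullet_rule` to the reply to turn 5. Checking adherence in code rather than asking the model whether it followed the rules keeps the comparison objective.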


❓ Quick Quiz

Q1. An agentic system has been running for 800 turns processing customer support tickets. Users report that the agent has started ignoring the "verify identity before account changes" rule that was established in the initial user message. What is the MOST LIKELY root cause?

  A. The model's capabilities have degraded due to high token usage
  B. The identity verification rule was compacted away during progressive summarization
  C. The model is experiencing a temperature-related hallucination
  D. The tool for identity verification has a bug

Q2. You're designing a long-running agent that must ALWAYS enforce a PII redaction policy. Which approach provides the strongest guarantee that this policy survives indefinitely?

  A. Include the PII policy in the first user message of the conversation
  B. Use a PreCompact hook to re-inject the PII policy after every compaction cycle
  C. Place the PII policy in the system prompt AND use a programmatic hook that filters responses
  D. Set a high effort level so the model remembers important instructions better

Q3. Which of the following is a recognized risk of progressive summarization?

  A. It increases API costs because summaries use more tokens than the original
  B. Error propagation: mistakes from early turns persist and crystallize in summaries
  C. It causes the model to switch to a different persona
  D. It permanently reduces the available context window for future sessions

Answers

Q1: B. The rule lived in a user message, not the system prompt, so progressive summarization compacted it away after many turns. A rule in the system prompt would have survived. A is wrong because the model's underlying capability does not change with token usage; what degrades is how reliably it attends to a crowded context. C is unrelated to the scenario. D describes a tool defect, whereas the scenario describes a rule that is no longer being followed.

Q2: C. Defense in depth: the system prompt survives compaction (advisory layer), and a programmatic hook that filters responses provides deterministic enforcement (guaranteed layer). B is clever but still advisory: re-injecting the policy only reminds the model, and nothing stops PII from appearing in an output. A won't survive compaction. D doesn't affect what the model retains.

Q3: B. Error propagation is one of the five risks of progressive summarization covered above. A is wrong (summaries are shorter than the original, which saves tokens). C is not a recognized risk category. D is wrong (compaction frees space, and each new session starts fresh anyway).
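
The "guaranteed layer" from the Q2 answer can be as simple as a filter applied to every response before it leaves your system. The sketch below is a minimal illustration, assuming the only PII of concern is US Social Security numbers written as 123-45-6789. `redact_pii` is a hypothetical name, and a real policy would cover far more formats.

```python
import re

# Assumption: SSNs appear in the 123-45-6789 format; real policies cover more.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact_pii(response_text: str) -> str:
    """Deterministically mask SSN-shaped strings in a model response."""
    return SSN_PATTERN.sub("[REDACTED]", response_text)


safe = redact_pii("The SSN on file is 123-45-6789.")
```

Because this runs in code on every response, it behaves the same on turn 800 as on turn 1, regardless of what compaction has done to the conversation.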


👀 Tomorrow's Preview

Day 19 covers Escalation Patterns & Error Propagation — you'll learn the structured criteria for handing off to humans (hint: it's NEVER sentiment-based), how to prevent cascading failures with circuit breaker patterns, and why self-reported confidence is always the wrong answer.