CCA-F Study Day 16/20: JSON Schema Design & Validation-Retry Loops
Domain 4: Prompt Engineering & Structured Output (~20% of exam)
Today's Focus
Yesterday you mastered tool_use for structured output: forcing Claude to produce schema-compliant JSON by defining "fake" tools with tool_choice. Today we go deeper into the schema design itself and the validation-retry loop, the pattern that makes extraction reliable in production. This is where exam questions get tricky, because they test whether you know the difference between structural validation (the schema handles it) and semantic validation (you must code it).
The exam heavily tests this area through the Structured Data Extraction scenario, one of the 6 possible scenarios. If it appears on your exam, expect 5-8 questions about schema design, retry logic, and confidence scoring.
Core Concepts
1. JSON Schema Design for tool_use
The schema you pass as input_schema in your tool definition is the contract for Claude's output. Get it right and you eliminate an entire class of parsing failures.
Key Schema Features to Use:
| Feature | Purpose | Example |
|---|---|---|
| enum | Constrain to finite set of values | "status": {"type": "string", "enum": ["approved", "rejected", "pending"]} |
| required | Mark mandatory fields | "required": ["name", "email", "amount"] |
| description | Guide Claude on what to put in each field | "description": "ISO 8601 date (YYYY-MM-DD)" |
| additionalProperties: false | Prevent extra fields (strict mode) | Required for Structured Outputs/strict tool use |
| format | Hint at expected format | "format": "email", "format": "date-time" |
| minimum/maximum | Numeric bounds | "priority": {"type": "integer", "minimum": 1, "maximum": 5} |
| items | Define array element schema | "items": {"type": "string", "enum": ["urgent", "review"]} |
Schema Complexity Guidelines:
- Keep schemas focused: one extraction task per tool, not a mega-schema
- Avoid deeply nested structures: more than 3 levels deep increases grammar complexity and error rates
- Max 20 strict tools per request (API constraint)
- Max 24 optional parameters across all strict schemas
- Required properties appear first in output, then optional ones
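The nesting and optional-parameter guidelines above can be checked mechanically before a schema ever reaches the API. The following is a minimal sketch, not an official tool: the helper names `nesting_depth` and `optional_parameter_count` are hypothetical, and the way optional parameters are counted here (any property missing from `required`, at any level) is an assumption about how the limit is applied.

```python
def nesting_depth(schema: dict) -> int:
    """Count how many levels of nested objects a JSON Schema describes."""
    depth = 1 if schema.get("type") == "object" else 0
    children = list(schema.get("properties", {}).values())
    if "items" in schema:  # array elements may themselves be objects
        children.append(schema["items"])
    return depth + max((nesting_depth(c) for c in children), default=0)


def optional_parameter_count(schema: dict) -> int:
    """Count properties not listed in `required`, at every nesting level."""
    props = schema.get("properties", {})
    required = set(schema.get("required", []))
    count = sum(1 for name in props if name not in required)
    children = list(props.values())
    if "items" in schema:
        children.append(schema["items"])
    return count + sum(optional_parameter_count(c) for c in children)


# Trimmed, illustrative version of an invoice schema.
invoice_schema = {
    "type": "object",
    "properties": {
        "vendor_name": {"type": "string"},
        "payment_terms": {"type": "string"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {"description": {"type": "string"}},
                "required": ["description"],
            },
        },
    },
    "required": ["vendor_name", "line_items"],
}

print(nesting_depth(invoice_schema))             # 2
print(optional_parameter_count(invoice_schema))  # 1
```

A check like this could run in CI so that a schema drifting past 3 levels is caught at review time rather than in production.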
Complete Schema Example:
{
  "name": "extract_invoice",
  "description": "Extract structured data from an invoice document. Use for any invoice, receipt, or billing document.",
  "input_schema": {
    "type": "object",
    "properties": {
      "vendor_name": {
        "type": "string",
        "description": "Company or person who issued the invoice"
      },
      "invoice_number": {
        "type": "string",
        "description": "Unique invoice identifier (e.g., INV-2024-001)"
      },
      "date_issued": {
        "type": "string",
        "description": "Date invoice was issued in ISO 8601 format (YYYY-MM-DD)"
      },
      "total_amount": {
        "type": "number",
        "description": "Total amount due in the document's currency"
      },
      "currency": {
        "type": "string",
        "enum": ["USD", "EUR", "GBP", "CAD", "AUD"],
        "description": "Three-letter currency code"
      },
      "line_items": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "description": {"type": "string"},
            "quantity": {"type": "integer", "minimum": 1},
            "unit_price": {"type": "number"},
            "total": {"type": "number"}
          },
          "required": ["description", "quantity", "unit_price", "total"]
        },
        "description": "Individual line items on the invoice"
      },
      "payment_terms": {
        "type": "string",
        "enum": ["net_15", "net_30", "net_60", "due_on_receipt", "other"],
        "description": "Payment terms specified on the invoice"
      }
    },
    "required": ["vendor_name", "date_issued", "total_amount", "currency", "line_items"],
    "additionalProperties": false
  }
}
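To make the roles of `required`, `enum`, and `additionalProperties` concrete, here is a minimal hand-rolled check of those three keywords at the top level of a schema. It is a sketch for illustration only, not a full JSON Schema validator; the function name `structural_errors` and the sample payload are hypothetical.

```python
def structural_errors(payload: dict, schema: dict) -> list:
    """Check required, additionalProperties, and enum at the top level only."""
    errors = []
    props = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in payload:
            errors.append(f"missing required field: {name}")
    if schema.get("additionalProperties") is False:
        for name in payload:
            if name not in props:
                errors.append(f"unexpected field: {name}")
    for name, value in payload.items():
        allowed = props.get(name, {}).get("enum")
        if allowed is not None and value not in allowed:
            errors.append(f"{name}={value!r} is not one of {allowed}")
    return errors


# Trimmed version of the invoice schema, for illustration.
schema = {
    "type": "object",
    "properties": {
        "vendor_name": {"type": "string"},
        "total_amount": {"type": "number"},
        "currency": {"type": "string", "enum": ["USD", "EUR", "GBP", "CAD", "AUD"]},
    },
    "required": ["vendor_name", "total_amount", "currency"],
    "additionalProperties": False,
}

bad_payload = {"vendor_name": "Acme Corp", "currency": "JPY", "notes": "rush order"}
for error in structural_errors(bad_payload, schema):
    print(error)
```

The sample payload trips all three keywords: `total_amount` is missing, `notes` is not a declared property, and `JPY` is outside the currency enum.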
2. Strict Tool Use (Constrained Decoding)
Setting "strict": true on a tool definition activates grammar-constrained sampling: Claude's token generation is mathematically constrained to only produce schema-valid JSON. This is different from "hoping Claude follows the schema."
tools = [{
    "name": "extract_invoice",
    "strict": True,  # This is the key flag
    "description": "Extract invoice data",
    "input_schema": {
        "type": "object",
        "properties": {
            "vendor": {"type": "string"},
            "amount": {"type": "number"},
            "status": {"type": "string", "enum": ["paid", "unpaid", "overdue"]}
        },
        "required": ["vendor", "amount", "status"],
        "additionalProperties": False  # Required for strict mode
    }
}]
What strict mode guarantees:
- Output always matches the schema (100% structural validity)
- No extra fields, no missing required fields, correct types
- Enum values exactly match your defined set
What strict mode does NOT guarantee:
- Semantic correctness (wrong vendor name, hallucinated amounts)
- Business rule compliance (amount > 0, date in the past, etc.)
- Data accuracy (the model might extract the wrong number from the document)
Exam trap: A question might ask "Which approach guarantees the extracted data is accurate?" Strict mode only guarantees format, not correctness. You still need validation-retry loops for semantic validation.
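The gap between the two lists above is easy to demonstrate with a payload that satisfies every structural rule of the `extract_invoice` tool yet is clearly wrong. This is a minimal sketch; `matches_schema` and `business_rule_errors` are hypothetical helpers, and the two business rules are illustrative assumptions rather than rules from any real system.

```python
ALLOWED_STATUS = {"paid", "unpaid", "overdue"}


def matches_schema(payload: dict) -> bool:
    """The structural facts the schema describes: fields, types, enum."""
    return (
        set(payload) == {"vendor", "amount", "status"}
        and isinstance(payload["vendor"], str)
        and isinstance(payload["amount"], (int, float))
        and payload["status"] in ALLOWED_STATUS
    )


def business_rule_errors(payload: dict) -> list:
    """Semantic checks that no schema keyword expresses here."""
    errors = []
    if not payload["vendor"].strip():
        errors.append("vendor must not be empty")
    if payload["amount"] <= 0:
        errors.append("amount must be positive")
    return errors


payload = {"vendor": " ", "amount": -50.0, "status": "paid"}
print(matches_schema(payload))        # True
print(business_rule_errors(payload))  # ['vendor must not be empty', 'amount must be positive']
```

The payload passes the structural check and fails both business rules, which is exactly the case that strict mode alone cannot catch.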
3. The Validation-Retry Loop Pattern
This is THE critical production pattern for reliable extraction. The loop handles what schemas cannot: business rules, cross-field consistency, and data accuracy.
import json
from datetime import datetime
from typing import Optional

import anthropic

client = anthropic.Anthropic()


def extract_with_validation(
    document: str,
    tools: list,
    tool_name: str,
    validation_fn,
    max_retries: int = 3
) -> Optional[dict]:
    """Extract structured data with validation-retry loop."""
    messages = [{"role": "user", "content": f"Extract data from this document:\n\n{document}"}]

    for attempt in range(max_retries):
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=2048,
            tools=tools,
            tool_choice={"type": "tool", "name": tool_name},
            messages=messages
        )

        # Get the structured output
        tool_use = next(b for b in response.content if b.type == "tool_use")
        result = tool_use.input

        # Validate against business rules
        errors = validation_fn(result)
        if not errors:
            return result  # Valid extraction

        # Feed specific errors back for retry
        messages.append({"role": "assistant", "content": response.content})
        messages.append({
            "role": "user",
            "content": [{
                "type": "tool_result",
                "tool_use_id": tool_use.id,
                "content": json.dumps({
                    "validation_errors": errors,
                    "instruction": "Please re-extract, fixing these specific issues."
                }),
                "is_error": True
            }]
        })

    return None  # Failed after max retries: escalate to human


# Example validation function
def validate_invoice(data: dict) -> list:
    """Business rule validation for invoices."""
    errors = []

    # Cross-field consistency
    if data.get("line_items"):
        calculated_total = sum(item["total"] for item in data["line_items"])
        if abs(calculated_total - data["total_amount"]) > 0.01:
            errors.append(f"Line items sum to {calculated_total}, but total_amount is {data['total_amount']}")

    # Date validation
    try:
        date = datetime.fromisoformat(data["date_issued"])
        if date.year < 2000 or date > datetime.now():
            errors.append(f"date_issued '{data['date_issued']}' is unrealistic")
    except ValueError:
        errors.append(f"date_issued '{data['date_issued']}' is not valid ISO 8601")

    # Amount sanity
    if data["total_amount"] <= 0:
        errors.append("total_amount must be positive")

    return errors
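The control flow of the loop can be exercised without any API calls by swapping the model for a stub. The sketch below assumes a hypothetical `fake_extract` stand-in that gets the total wrong on its first attempt and corrects it once it receives error feedback; it only illustrates how specific errors travel from validator to extractor and how the retry cap ends the loop.

```python
from typing import Callable, Optional


def retry_until_valid(
    extract: Callable[[list], dict],
    validate: Callable[[dict], list],
    max_retries: int = 3,
) -> Optional[dict]:
    """Call extract, validate the result, and feed errors into the next attempt."""
    feedback: list = []
    for _ in range(max_retries):
        result = extract(feedback)
        errors = validate(result)
        if not errors:
            return result
        feedback = errors  # specific errors, not a generic "try again"
    return None  # cap reached: hand off to human review


def fake_extract(feedback: list) -> dict:
    """Hypothetical stand-in for the model: wrong first, fixed after feedback."""
    items = [{"total": 200.0}, {"total": 250.0}]
    total = 450.0 if feedback else 500.0
    return {"line_items": items, "total_amount": total}


def validate_totals(data: dict) -> list:
    """Cross-field check: line items must sum to the stated total."""
    calculated = sum(item["total"] for item in data["line_items"])
    if abs(calculated - data["total_amount"]) > 0.01:
        return [f"Line items sum to {calculated}, but total_amount is {data['total_amount']}"]
    return []


result = retry_until_valid(fake_extract, validate_totals)
print(result["total_amount"])  # 450.0
```

An extractor that never fixes its output makes `retry_until_valid` return `None` after `max_retries` attempts, which is the signal to escalate.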
4. Field-Level Confidence Scores
For uncertain extractions (e.g., handwritten documents, ambiguous text), include confidence as an explicit schema field with defined criteria for each level, rather than asking the model for a free-form self-assessment. The model populates the field based on how clearly the value appears in the document:
"properties": {
  "vendor_name": {"type": "string"},
  "vendor_name_confidence": {
    "type": "string",
    "enum": ["high", "medium", "low"],
    "description": "high = clearly visible in document, medium = partially obscured or ambiguous, low = inferred or guessed"
  },
  "total_amount": {"type": "number"},
  "total_amount_confidence": {
    "type": "string",
    "enum": ["high", "medium", "low"],
    "description": "high = clearly printed number, medium = handwritten or partially visible, low = calculated/inferred"
  }
}
Critical anti-pattern: Don't use confidence scores for DECISIONS. Use them for ROUTING to human review. The exam tests this distinction heavily.
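One way to read the routing rule is that a confidence value never accepts or rejects a field; it only decides who looks at it next. The sketch below assumes the `<field>_confidence` naming used in the schema fragment above; the function name and the two queue names are hypothetical.

```python
SUFFIX = "_confidence"


def split_by_confidence(extraction: dict, review_levels=("low",)) -> dict:
    """Send flagged fields to human review; all others go on to programmatic checks."""
    review, validate = [], []
    for key in extraction:
        if key.endswith(SUFFIX):
            continue  # skip the confidence fields themselves
        level = extraction.get(key + SUFFIX)
        (review if level in review_levels else validate).append(key)
    return {"human_review": review, "programmatic_validation": validate}


extraction = {
    "vendor_name": "Acme Corp",
    "vendor_name_confidence": "high",
    "total_amount": 1250.0,
    "total_amount_confidence": "low",
}

print(split_by_confidence(extraction))
# {'human_review': ['total_amount'], 'programmatic_validation': ['vendor_name']}
```

Note that a "high" value is not treated as accepted here: it still passes through programmatic validation, so the confidence field never makes the final call.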
Anti-Patterns & Exam Traps
| Anti-Pattern (Wrong Answer) | Correct Approach | Why It's Wrong |
|---|---|---|
| Self-reported confidence scores for escalation decisions | Use structured validation criteria and programmatic checks | Models are poorly calibrated on their own confidence and can be "confidently wrong" |
| No retry logic (accept first extraction) | Implement validation-retry with max_retries and specific error feedback | First extraction may miss cross-field consistency issues |
| Generic retry: "Try again" | Feed back specific validation errors: "Line items don't sum to total" | Without specific errors, the model doesn't know what to fix |
| Aggregate accuracy as sole metric | Track accuracy per document type to catch masked failures | 90% average accuracy could hide 50% failure rate on invoices |
| Deeply nested schemas (5+ levels) | Keep to 3 levels max; split complex extractions into multiple tools | Deep nesting increases grammar complexity and errors |
| Using strict mode = data is correct | Strict mode = format is correct. Still need semantic validation | Schema validates structure, not meaning |
| Unlimited retries | Cap at 3 retries, then escalate to human review | Infinite loops waste tokens and likely won't improve |
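The "aggregate accuracy" row is easiest to see with numbers. The sketch below uses made-up evaluation results (80 receipts, all correct; 20 invoices, half correct) purely to show how a healthy overall figure can hide a failing document type; `accuracy_by_type` is a hypothetical helper.

```python
from collections import defaultdict


def accuracy_by_type(results) -> dict:
    """results: iterable of (document_type, was_correct) pairs."""
    totals = defaultdict(lambda: [0, 0])  # type -> [correct, seen]
    for doc_type, ok in results:
        totals[doc_type][0] += int(ok)
        totals[doc_type][1] += 1
    return {doc_type: correct / seen for doc_type, (correct, seen) in totals.items()}


# Fabricated-for-illustration evaluation set, not real measurements.
results = (
    [("receipt", True)] * 80
    + [("invoice", True)] * 10
    + [("invoice", False)] * 10
)

overall = sum(ok for _, ok in results) / len(results)
print(overall)                    # 0.9
print(accuracy_by_type(results))  # {'receipt': 1.0, 'invoice': 0.5}
```

The overall 90% looks fine, while the per-type breakdown shows invoices failing half the time.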
Code Examples
Native Structured Outputs with Pydantic (The Modern Approach)
from typing import List, Optional

from anthropic import Anthropic
from pydantic import BaseModel, Field


class LineItem(BaseModel):
    description: str = Field(description="Item or service description")
    quantity: int = Field(ge=1, description="Quantity ordered")
    unit_price: float = Field(gt=0, description="Price per unit")
    total: float = Field(gt=0, description="Line total (quantity × unit_price)")


class Invoice(BaseModel):
    vendor_name: str = Field(description="Company that issued the invoice")
    # default=None makes the field truly optional; without a default, Pydantic treats it as required
    invoice_number: Optional[str] = Field(default=None, description="Invoice ID if visible")
    date_issued: str = Field(description="ISO 8601 date (YYYY-MM-DD)")
    total_amount: float = Field(gt=0, description="Total amount due")
    currency: str = Field(description="Three-letter currency code")
    line_items: List[LineItem] = Field(description="Individual billed items")
    payment_terms: Optional[str] = Field(default=None, description="Payment terms if specified")


client = Anthropic()
document_text = "..."  # placeholder: replace with the raw invoice text

response = client.messages.parse(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    messages=[{"role": "user", "content": f"Extract invoice data:\n\n{document_text}"}],
    output_format=Invoice,
)

invoice = response.parsed_output  # Validated Pydantic model!
print(f"Vendor: {invoice.vendor_name}, Total: {invoice.total_amount} {invoice.currency}")
Structured Output Approaches Compared:
| Approach | How It Works | When to Use |
|---|---|---|
| tool_choice + strict: true | Force a specific tool call with constrained decoding | When you also need real tool execution in the same request |
| messages.parse() with Pydantic | Native structured output, no tool wrapper needed | Pure extraction tasks, cleaner code |
| output_config.format = "json_schema" | Constrain the response body (not a tool call) | When you want JSON response without tool_use machinery |
Video to Watch
How We Build Effective Agents, by Barry Zhang, Anthropic (AI Engineer Summit)
Barry Zhang from Anthropic's Applied AI team covers the patterns they use in production, including structured outputs and validation loops in agentic systems. Watch the section on "structured tool outputs" and "reliability patterns" (roughly the middle third of the talk). It maps directly to today's validation-retry loop and the principle of keeping schemas focused rather than building mega-schemas.
Reading
- Primary: Strict Tool Use (Anthropic Docs). Complete reference on strict: true, constraints, and grammar-constrained sampling
- Secondary: Structured Outputs on Claude Platform (Blog). Announcement post with architectural explanation of constrained decoding
- Reference: Advanced Tool Use (Anthropic Engineering). Schema design patterns and when schemas aren't enough
Hands-On Exercise (20 min)
Build a Validation-Retry Loop for Date Extraction
- Define a tool schema for extracting event dates from natural language text (fields: event_name, start_date, end_date, timezone, is_recurring)
- Write a validate_dates() function that checks:
  - Dates are valid ISO 8601
  - end_date is after start_date
  - Dates are in a reasonable range (not year 1900 or 2099)
  - If is_recurring is true, a recurrence_pattern field must exist
- Implement the retry loop (max 3 attempts) that feeds specific errors back
- Test with tricky inputs: "next Tuesday at 3pm", "the meeting runs from Jan 31 to Feb 2", "every other Wednesday starting March 1"
Bonus: Add field-level confidence scoring and route any "low" confidence fields to a mock human review queue.
Quick Quiz
Q1. A developer sets "strict": true on a tool definition for invoice extraction. They're still getting incorrect amounts extracted from documents. What should they add?
A) More detailed enum values in the schema
B) A validation-retry loop that checks business rules programmatically
C) A higher max_tokens value to give Claude more space
D) A self-reported confidence field that blocks low-confidence results
Q2. When implementing a validation-retry loop for structured extraction, what is the MOST important element to include in the retry message?
A) The original document again so Claude can re-read it
B) A general instruction like "Please try harder this time"
C) The specific validation errors with details about what failed
D) A different system prompt with more examples
Q3. An architect is designing a schema for extracting medical records. The schema has 6 levels of nesting to capture patient → visits → diagnoses → medications → dosages → schedules. What is the BEST recommendation?
A) Use strict: true to guarantee the nested structure is correct
B) Add more descriptions to each nested level
C) Flatten the schema and split into multiple focused extraction tools
D) Increase max_tokens to accommodate the complex output
Answers:
Q1: B. Strict mode guarantees format (valid JSON matching schema), not correctness (right amounts). You need programmatic validation for business rules. D is wrong because self-reported confidence for decisions is an anti-pattern.
Q2: C. Specific errors give Claude actionable feedback. "Line items sum to $450 but total_amount says $500" is far more helpful than "try again." The model already has the document in context from the first attempt.
Q3: C. Deep nesting (5+ levels) dramatically increases grammar complexity and error rates. The correct approach is to split complex extractions into multiple focused tools (e.g., one for patient info, one for medications, one for visit details). This also maps to the "4-5 tools per agent" best practice.
Tomorrow's Preview
Day 17: Multi-Pass Review & Structured Data Extraction Scenario. We'll put everything together with the multi-pass pattern (extract → validate → confidence-score) using separate sessions, and walk through the complete Structured Data Extraction exam scenario end-to-end.