
CCA-F Study Day 16/20: JSON Schema Design & Validation-Retry Loops

Domain 4: Prompt Engineering & Structured Output (~20% of exam)


📌 Today's Focus

Yesterday you mastered tool_use for structured output: forcing Claude to produce schema-compliant JSON by defining "fake" tools and selecting them with tool_choice. Today we go deeper into schema design itself and into the validation-retry loop, the pattern that makes extraction reliable in production. This is where exam questions get tricky, because they test whether you know the difference between structural validation (the schema handles it) and semantic validation (you must code it).

The exam heavily tests this area through the Structured Data Extraction scenario, one of the 6 possible scenarios. If it appears on your exam, expect 5-8 questions about schema design, retry logic, and confidence scoring.


📚 Core Concepts

1. JSON Schema Design for tool_use

The schema you pass as input_schema in your tool definition is the contract for Claude's output. Get it right and you eliminate an entire class of parsing failures.

Key Schema Features to Use:

Feature | Purpose | Example
enum | Constrain to a finite set of values | "status": {"type": "string", "enum": ["approved", "rejected", "pending"]}
required | Mark mandatory fields | "required": ["name", "email", "amount"]
description | Guide Claude on what to put in each field | "description": "ISO 8601 date (YYYY-MM-DD)"
additionalProperties: false | Reject fields not declared in properties; required for Structured Outputs / strict tool use | "additionalProperties": false
format | Hint at the expected string format | "format": "email", "format": "date-time"
minimum/maximum | Numeric bounds (not enforced by strict-mode constrained decoding, so re-check them in code) | "priority": {"type": "integer", "minimum": 1, "maximum": 5}
items | Define the schema for array elements | "items": {"type": "string", "enum": ["urgent", "review"]}

Schema Complexity Guidelines:

  • Keep schemas focused: one extraction task per tool, not a mega-schema
  • Avoid deeply nested structures: more than 3 levels deep increases grammar complexity and error rates
  • Max 20 strict tools per request (API constraint)
  • Max 24 optional parameters across all strict schemas
  • Required properties appear first in output, then optional ones
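To make the guidelines above checkable before you ship a schema, here is a minimal stdlib sketch. Assumptions: "depth" counts nested object/array schemas with the top-level object as level 1; the limits are the numbers quoted in the list (20 strict tools, 24 optional parameters, and 3 levels as a guideline rather than an API limit); and schema_depth, count_optional and complexity_problems are hypothetical helpers, not part of any SDK.

```python
from typing import Any


def schema_depth(schema: dict[str, Any]) -> int:
    """Count nested object/array levels; the top-level object is level 1."""
    if schema.get("type") == "object":
        children = list(schema.get("properties", {}).values())
    elif schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        children = [schema["items"]]
    else:
        return 0  # scalar leaf (string, number, ...) adds no level
    return 1 + max((schema_depth(child) for child in children), default=0)


def count_optional(schema: dict[str, Any]) -> int:
    """Count properties missing from 'required', at every object level."""
    if schema.get("type") == "object":
        props = schema.get("properties", {})
        required = set(schema.get("required", []))
        own = sum(1 for name in props if name not in required)
        return own + sum(count_optional(child) for child in props.values())
    if schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        return count_optional(schema["items"])
    return 0


def complexity_problems(
    strict_schemas: dict[str, dict[str, Any]],
    max_depth: int = 3,      # nesting guideline from the list above
    max_tools: int = 20,     # strict tools per request, as quoted above
    max_optional: int = 24,  # optional parameters across all strict schemas
) -> list[str]:
    """Return human-readable problems; an empty list means within budget."""
    problems = []
    if len(strict_schemas) > max_tools:
        problems.append(f"{len(strict_schemas)} strict tools (max {max_tools})")
    for name, schema in strict_schemas.items():
        depth = schema_depth(schema)
        if depth > max_depth:
            problems.append(f"{name}: {depth} levels deep (max {max_depth})")
    optional = sum(count_optional(s) for s in strict_schemas.values())
    if optional > max_optional:
        problems.append(f"{optional} optional parameters (max {max_optional})")
    return problems


# Usage: a trimmed-down invoice schema (object -> line_items array -> item object)
invoice_schema = {
    "type": "object",
    "properties": {
        "vendor_name": {"type": "string"},
        "payment_terms": {"type": "string"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {"description": {"type": "string"}},
                "required": ["description"],
            },
        },
    },
    "required": ["vendor_name", "line_items"],
}
print(schema_depth(invoice_schema))    # 3
print(count_optional(invoice_schema))  # 1 (payment_terms)
print(complexity_problems({"extract_invoice": invoice_schema}))  # []
```

Counting optional parameters across every strict schema at once mirrors the wording of the limit, which is a budget shared by all strict tools in a request rather than a per-tool cap.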

Complete Schema Example:

{
  "name": "extract_invoice",
  "description": "Extract structured data from an invoice document. Use for any invoice, receipt, or billing document.",
  "input_schema": {
    "type": "object",
    "properties": {
      "vendor_name": {
        "type": "string",
        "description": "Company or person who issued the invoice"
      },
      "invoice_number": {
        "type": "string",
        "description": "Unique invoice identifier (e.g., INV-2024-001)"
      },
      "date_issued": {
        "type": "string",
        "description": "Date invoice was issued in ISO 8601 format (YYYY-MM-DD)"
      },
      "total_amount": {
        "type": "number",
        "description": "Total amount due in the document's currency"
      },
      "currency": {
        "type": "string",
        "enum": ["USD", "EUR", "GBP", "CAD", "AUD"],
        "description": "Three-letter currency code"
      },
      "line_items": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "description": {"type": "string"},
            "quantity": {"type": "integer", "minimum": 1},
            "unit_price": {"type": "number"},
            "total": {"type": "number"}
          },
          "required": ["description", "quantity", "unit_price", "total"]
        },
        "description": "Individual line items on the invoice"
      },
      "payment_terms": {
        "type": "string",
        "enum": ["net_15", "net_30", "net_60", "due_on_receipt", "other"],
        "description": "Payment terms specified on the invoice"
      }
    },
    "required": ["vendor_name", "date_issued", "total_amount", "currency", "line_items"],
    "additionalProperties": false
  }
}

2. Strict Tool Use (Constrained Decoding)

Setting "strict": true on a tool definition activates grammar-constrained sampling: Claude's token generation is restricted by a grammar compiled from your schema, so it can only produce schema-valid JSON. This is different from "hoping Claude follows the schema."

tools = [{
    "name": "extract_invoice",
    "strict": True,  # ← This is the key flag
    "description": "Extract invoice data",
    "input_schema": {
        "type": "object",
        "properties": {
            "vendor": {"type": "string"},
            "amount": {"type": "number"},
            "status": {"type": "string", "enum": ["paid", "unpaid", "overdue"]}
        },
        "required": ["vendor", "amount", "status"],
        "additionalProperties": False  # Required for strict mode
    }
}]
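One slip the comment above invites is setting additionalProperties: False on the top-level object while forgetting nested objects. The sketch below walks a schema and lists every object that is still open. It assumes the requirement applies to nested object schemas too (confirm against the current Structured Outputs docs), and missing_additional_false is a hypothetical helper, not an SDK function.

```python
from typing import Any


def missing_additional_false(schema: dict[str, Any], path: str = "$") -> list[str]:
    """List paths of object schemas that do not set additionalProperties to false."""
    missing = []
    if schema.get("type") == "object":
        if schema.get("additionalProperties") is not False:
            missing.append(path)
        for name, child in schema.get("properties", {}).items():
            missing.extend(missing_additional_false(child, f"{path}.{name}"))
    elif schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        # "[]" marks the element schema of an array
        missing.extend(missing_additional_false(schema["items"], f"{path}[]"))
    return missing


# Usage: the top level is closed, but the nested line-item object is not.
schema = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {"total": {"type": "number"}},
                "required": ["total"],
            },
        },
    },
    "required": ["vendor", "line_items"],
    "additionalProperties": False,
}
print(missing_additional_false(schema))  # ['$.line_items[]']
```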

What strict mode guarantees:

  • Output matches the schema (structural validity), unless generation stops early, e.g. a refusal or hitting max_tokens
  • No extra fields, no missing required fields, correct types
  • Enum values exactly match your defined set

What strict mode does NOT guarantee:

  • Semantic correctness (wrong vendor name, hallucinated amounts)
  • Business rule compliance (amount > 0, date in the past, etc.)
  • Data accuracy (the model might extract the wrong number from the document)

โš ๏ธ Exam trap: A question might ask "Which approach guarantees the extracted data is accurate?" โ€” Strict mode only guarantees format, not correctness. You still need validation-retry loops for semantic validation.

3. The Validation-Retry Loop Pattern

This is THE critical production pattern for reliable extraction. The loop handles what schemas cannot: business rules, cross-field consistency, and data accuracy.

import anthropic
import json
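from typing import Callable, Optional

client = anthropic.Anthropic()

def extract_with_validation(
    document: str,
    tools: list,
    tool_name: str,
    validation_fn: Callable[[dict], list],
    max_retries: int = 3
) -> Optional[dict]:
    """Extract structured data with a validation-retry loop.

    max_retries caps the total number of model calls (the first attempt counts).
    Returns None when every attempt fails validation.
    """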
    
    messages = [{"role": "user", "content": f"Extract data from this document:\n\n{document}"}]
    
    for attempt in range(max_retries):
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=2048,
            tools=tools,
            tool_choice={"type": "tool", "name": tool_name},
            messages=messages
        )
        
        # Get the structured output
        tool_use = next(b for b in response.content if b.type == "tool_use")
        result = tool_use.input
        
        # Validate against business rules
        errors = validation_fn(result)
        
        if not errors:
            return result  # ✅ Valid extraction
        
        # Feed specific errors back for retry
        messages.append({"role": "assistant", "content": response.content})
        messages.append({
            "role": "user",
            "content": [{
                "type": "tool_result",
                "tool_use_id": tool_use.id,
                "content": json.dumps({
                    "validation_errors": errors,
                    "instruction": "Please re-extract, fixing these specific issues."
                }),
                "is_error": True
            }]
        })
    
    return None  # Failed after max retries: escalate to human


# Example validation function
def validate_invoice(data: dict) -> list:
    """Business rule validation for invoices."""
    errors = []
    
    # Cross-field consistency (assumes total_amount has no separate tax or shipping lines)
    if data.get("line_items"):
        calculated_total = sum(item["total"] for item in data["line_items"])
        if abs(calculated_total - data["total_amount"]) > 0.01:
            errors.append(f"Line items sum to {calculated_total:.2f}, but total_amount is {data['total_amount']:.2f}")
    
    # Date validation: the schema asks for YYYY-MM-DD, so parse a date rather than a
    # datetime (this also avoids comparing offset-aware and naive datetimes)
    from datetime import date
    try:
        issued = date.fromisoformat(data["date_issued"])
        if issued.year < 2000 or issued > date.today():
            errors.append(f"date_issued '{data['date_issued']}' is unrealistic")
    except ValueError:
        errors.append(f"date_issued '{data['date_issued']}' is not valid ISO 8601 (YYYY-MM-DD)")
    
    # Amount sanity
    if data["total_amount"] <= 0:
        errors.append("total_amount must be positive")
    
    return errors
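Stripped of the API calls, the loop's control flow is small enough to test offline. In this sketch, extract and validate are hypothetical callables standing in for the model call and validate_invoice; max_attempts counts total attempts, matching how range(max_retries) behaves above, and a None result is the signal to escalate.

```python
from typing import Callable, Optional

Extractor = Callable[[list[str]], dict]  # receives the previous attempt's errors
Validator = Callable[[dict], list[str]]  # returns a list of specific problems


def retry_loop(
    extract: Extractor, validate: Validator, max_attempts: int = 3
) -> tuple[Optional[dict], list[list[str]]]:
    """Return (result, error_history); result is None if every attempt failed."""
    feedback: list[str] = []
    history: list[list[str]] = []
    for _ in range(max_attempts):
        result = extract(feedback)
        errors = validate(result)
        if not errors:
            return result, history
        history.append(errors)
        feedback = errors  # feed back the specific failures, not "try again"
    return None, history  # caller escalates to human review


# Usage with a fake extractor that only fixes the total once told what is wrong
def fake_extract(feedback: list[str]) -> dict:
    fixed = any("sum to" in error for error in feedback)
    return {
        "line_items": [{"total": 200.0}, {"total": 250.0}],
        "total_amount": 450.0 if fixed else 500.0,
    }


def check_totals(data: dict) -> list[str]:
    line_sum = sum(item["total"] for item in data["line_items"])
    if abs(line_sum - data["total_amount"]) > 0.01:
        return [f"Line items sum to {line_sum:.2f}, but total_amount is {data['total_amount']:.2f}"]
    return []


result, history = retry_loop(fake_extract, check_totals)
print(result["total_amount"])  # 450.0
print(history)  # [['Line items sum to 450.00, but total_amount is 500.00']]
```

Keeping the error history is a small design choice that pays off when a document does escalate: the reviewer sees exactly which checks failed on each attempt.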

4. Field-Level Confidence Scores

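For uncertain extractions (e.g., handwritten documents, ambiguous text), add a confidence field to the schema for each uncertain value, with levels defined by observable criteria in its description (clearly visible, partially obscured, inferred) rather than an open-ended self-assessment. The model fills it in based on how clearly the value appears in the source: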

"properties": {
    "vendor_name": {"type": "string"},
    "vendor_name_confidence": {
        "type": "string",
        "enum": ["high", "medium", "low"],
        "description": "high = clearly visible in document, medium = partially obscured or ambiguous, low = inferred or guessed"
    },
    "total_amount": {"type": "number"},
    "total_amount_confidence": {
        "type": "string",
        "enum": ["high", "medium", "low"],
        "description": "high = clearly printed number, medium = handwritten or partially visible, low = calculated/inferred"
    }
}

โš ๏ธ Critical anti-pattern: Don't use confidence scores for DECISIONS. Use them for ROUTING to human review. The exam tests this distinction heavily.


🚨 Anti-Patterns & Exam Traps

โŒ Anti-Pattern (Wrong Answer) โœ… Correct Approach Why It's Wrong
Self-reported confidence scores for escalation decisions Use structured validation criteria and programmatic checks Models are poorly calibrated on their own confidence โ€” they can be "confidently wrong"
No retry logic โ€” accept first extraction Implement validation-retry with max_retries and specific error feedback First extraction may miss cross-field consistency issues
Generic retry: "Try again" Feed back specific validation errors: "Line items don't sum to total" Without specific errors, the model doesn't know what to fix
Aggregate accuracy as sole metric Track accuracy per document type to catch masked failures 90% average accuracy could hide 50% failure rate on invoices
Deeply nested schemas (5+ levels) Keep to 3 levels max; split complex extractions into multiple tools Deep nesting increases grammar complexity and errors
Using strict mode = data is correct Strict mode = format is correct. Still need semantic validation Schema validates structure, not meaning
Unlimited retries Cap at 3 retries, then escalate to human review Infinite loops waste tokens and likely won't improve
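The "masked failures" row is easy to see with numbers. A sketch with illustrative counts chosen to mirror the table (80 receipts all correct, 20 invoices half correct); accuracy_by_type and below_threshold are hypothetical helpers, and the 0.85 alert threshold is an assumption.

```python
from collections import defaultdict
from typing import Iterable


def accuracy_by_type(results: Iterable[tuple[str, bool]]) -> dict[str, float]:
    """results: (document_type, extraction_was_correct) pairs."""
    seen: dict[str, int] = defaultdict(int)
    correct: dict[str, int] = defaultdict(int)
    for doc_type, ok in results:
        seen[doc_type] += 1
        correct[doc_type] += int(ok)
    return {doc_type: correct[doc_type] / seen[doc_type] for doc_type in seen}


def below_threshold(per_type: dict[str, float], threshold: float = 0.85) -> list[str]:
    """Document types whose accuracy falls under the alert threshold."""
    return [doc_type for doc_type, acc in per_type.items() if acc < threshold]


# Illustrative numbers: 80 receipts all correct, 20 invoices half correct
results = [("receipt", True)] * 80 + [("invoice", True)] * 10 + [("invoice", False)] * 10
overall = sum(ok for _, ok in results) / len(results)
per_type = accuracy_by_type(results)
print(overall)                    # 0.9
print(per_type)                   # {'receipt': 1.0, 'invoice': 0.5}
print(below_threshold(per_type))  # ['invoice']
```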

๐Ÿ’ป Code Examples

Native Structured Outputs with Pydantic (The Modern Approach)

from pydantic import BaseModel, Field
from typing import List, Optional
from anthropic import Anthropic

class LineItem(BaseModel):
    description: str = Field(description="Item or service description")
    quantity: int = Field(ge=1, description="Quantity ordered")
    unit_price: float = Field(gt=0, description="Price per unit")
    total: float = Field(gt=0, description="Line total (quantity × unit_price)")

class Invoice(BaseModel):
    vendor_name: str = Field(description="Company that issued the invoice")
    invoice_number: Optional[str] = Field(description="Invoice ID if visible")
    date_issued: str = Field(description="ISO 8601 date (YYYY-MM-DD)")
    total_amount: float = Field(gt=0, description="Total amount due")
    currency: str = Field(description="Three-letter currency code")
    line_items: List[LineItem] = Field(description="Individual billed items")
    payment_terms: Optional[str] = Field(description="Payment terms if specified")

client = Anthropic()
document_text = "..."  # placeholder: the raw invoice text to extract from

response = client.messages.parse(
    model="claude-sonnet-4-5",  # structured outputs need a supported model; check the docs for the current list
    max_tokens=2048,
    messages=[{"role": "user", "content": f"Extract invoice data:\n\n{document_text}"}],
    output_format=Invoice,
)

invoice = response.parsed_output  # Validated Pydantic model!
print(f"Vendor: {invoice.vendor_name}, Total: {invoice.total_amount} {invoice.currency}")
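Field(ge=1) and Field(gt=0) check each number on its own, and "quantity × unit_price" in the description is only guidance to the model, so a cross-field check still belongs in your validation step. A stdlib sketch, assuming amounts are compared to the cent; line_item_errors and to_money are hypothetical helpers. Decimal avoids binary-float surprises such as 3 * 0.1 != 0.3.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def to_money(value) -> Decimal:
    """Round to cents; str() avoids binary-float noise (Decimal(0.1) is not exactly 0.1)."""
    return Decimal(str(value)).quantize(CENT, rounding=ROUND_HALF_UP)


def line_item_errors(line_items: list[dict]) -> list[str]:
    """Cross-field rule: each line total must equal quantity x unit_price, to the cent."""
    errors = []
    for i, item in enumerate(line_items):
        unit = Decimal(str(item["unit_price"]))  # keep sub-cent unit prices unrounded
        expected = (item["quantity"] * unit).quantize(CENT, rounding=ROUND_HALF_UP)
        actual = to_money(item["total"])
        if expected != actual:
            errors.append(f"line_items[{i}]: {item['quantity']} x {unit} = {expected}, but total is {actual}")
    return errors


items = [
    {"description": "Widget", "quantity": 3, "unit_price": 19.99, "total": 59.97},
    {"description": "Setup fee", "quantity": 2, "unit_price": 10.0, "total": 25.0},
]
print(line_item_errors(items))
# ['line_items[1]: 2 x 10.0 = 20.00, but total is 25.00']
```

With the Pydantic model above, you could pass [item.model_dump() for item in invoice.line_items] (Pydantic v2) and feed any returned strings into the validation-retry loop.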

Three Ways to Get Structured Output, Compared:

Approach | How It Works | When to Use
tool_choice + strict: true | Force a specific tool call with constrained decoding | When you also need real tool execution in the same request
messages.parse() with Pydantic | Native structured output from a Pydantic model, no tool wrapper needed | Pure extraction tasks, cleaner code
output_config.format = "json_schema" | Constrain the response body (not a tool call) | When you want a JSON response without the tool_use machinery

🎬 Video to Watch

How We Build Effective Agents: Barry Zhang, Anthropic (AI Engineer Summit)

Barry Zhang from Anthropic's Applied AI team covers the patterns they use in production, including structured outputs and validation loops in agentic systems. Watch the section on "structured tool outputs" and "reliability patterns" (roughly the middle third of the talk); it maps directly to today's validation-retry loop and the principle of keeping schemas focused rather than building mega-schemas.


๐Ÿ“– Reading


๐Ÿ› ๏ธ Hands-On Exercise (20 min)

Build a Validation-Retry Loop for Date Extraction

  1. Define a tool schema for extracting event dates from natural language text (fields: event_name, start_date, end_date, timezone, is_recurring, plus an optional recurrence_pattern that step 2 checks)
  2. Write a validate_dates() function that checks: 
    • Dates are valid ISO 8601
    • end_date is after start_date
    • Dates are in a reasonable range (not year 1900 or 2099)
    • If is_recurring is true, a recurrence_pattern field must exist
  3. Implement the retry loop (max 3 attempts) that feeds specific errors back
  4. Test with tricky inputs: "next Tuesday at 3pm", "the meeting runs from Jan 31 to Feb 2", "every other Wednesday starting March 1"

Bonus: Add field-level confidence scoring and route any "low" confidence fields to a mock human review queue.
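If you want to check your step 2 against something, here is one possible validate_dates(). It is a sketch under stated assumptions: a 2000-2098 year window (ruling out sentinels like 1900 and 2099), end_date allowed to equal start_date for single-moment events, and recurrence_pattern as the field name. Its error strings are specific enough to feed straight into the step 3 retry loop.

```python
from datetime import datetime
from typing import Any

MIN_YEAR, MAX_YEAR = 2000, 2098  # assumed "reasonable" window; adjust for your data


def validate_dates(event: dict[str, Any]) -> list[str]:
    """Step 2 checks: ISO 8601, year range, end vs start, recurrence consistency."""
    errors = []
    parsed = {}
    for field in ("start_date", "end_date"):
        raw = event.get(field)
        try:
            parsed[field] = datetime.fromisoformat(raw)
        except (TypeError, ValueError):  # TypeError: missing/None; ValueError: bad string
            errors.append(f"{field} {raw!r} is not valid ISO 8601")
            continue
        if not MIN_YEAR <= parsed[field].year <= MAX_YEAR:
            errors.append(f"{field} year {parsed[field].year} is outside {MIN_YEAR}-{MAX_YEAR}")
    if len(parsed) == 2:
        start, end = parsed["start_date"], parsed["end_date"]
        try:
            if end < start:  # equality allowed for single-moment events (a design choice)
                errors.append("end_date is before start_date")
        except TypeError:  # one value has a UTC offset and the other does not
            errors.append("start_date and end_date mix offset-aware and naive times")
    if event.get("is_recurring") and not event.get("recurrence_pattern"):
        errors.append("is_recurring is true but recurrence_pattern is missing")
    return errors


offsite = {
    "event_name": "Offsite", "start_date": "2025-01-31", "end_date": "2025-02-02",
    "timezone": "America/New_York", "is_recurring": False,
}
print(validate_dates(offsite))  # []
# An unresolved relative date plus a missing recurrence pattern: two errors
print(validate_dates({**offsite, "start_date": "next Tuesday", "is_recurring": True}))
```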


๐Ÿ“ Quick Quiz

Q1. A developer sets "strict": true on a tool definition for invoice extraction. They're still getting incorrect amounts extracted from documents. What should they add?

A) More detailed enum values in the schema
B) A validation-retry loop that checks business rules programmatically
C) A higher max_tokens value to give Claude more space
D) A self-reported confidence field that blocks low-confidence results

Q2. When implementing a validation-retry loop for structured extraction, what is the MOST important element to include in the retry message?

A) The original document again so Claude can re-read it
B) A general instruction like "Please try harder this time"
C) The specific validation errors with details about what failed
D) A different system prompt with more examples

Q3. An architect is designing a schema for extracting medical records. The schema has 6 levels of nesting to capture patient → visits → diagnoses → medications → dosages → schedules. What is the BEST recommendation?

A) Use strict: true to guarantee the nested structure is correct
B) Add more descriptions to each nested level
C) Flatten the schema and split into multiple focused extraction tools
D) Increase max_tokens to accommodate the complex output


Answers:

Q1: B. Strict mode guarantees format (valid JSON matching the schema), not correctness (the right amounts). You need programmatic validation for business rules. D is wrong because using self-reported confidence for decisions is an anti-pattern.

Q2: C. Specific errors give Claude actionable feedback. "Line items sum to $450 but total_amount says $500" is far more helpful than "try again." The model already has the document in context from the first attempt.

Q3: C. Deep nesting (5+ levels) dramatically increases grammar complexity and error rates. The correct approach is to split complex extractions into multiple focused tools (e.g., one for patient info, one for medications, one for visit details). This also maps to the "4-5 tools per agent" best practice.


๐Ÿ‘€ Tomorrow's Preview

Day 17: Multi-Pass Review & Structured Data Extraction Scenario โ€” We'll put everything together with the multi-pass pattern (extract โ†’ validate โ†’ confidence-score) using separate sessions, and walk through the complete Structured Data Extraction exam scenario end-to-end.