
CCA-F Study Day 15/20: tool_use for Structured Output

Domain 4: Prompt Engineering & Structured Output (~20% of exam)


📌 Today's Focus

Yesterday you nailed explicit criteria and few-shot prompting — the "input side" of getting Claude to produce what you want. Today we flip to the output side: how to guarantee Claude returns structured, schema-valid JSON every single time. This is one of the most tested patterns on the exam because it sits at the intersection of tool design (Domain 2) and prompt engineering (Domain 4).

The exam loves asking: "What is the most reliable way to get structured output from Claude?" There are now TWO correct answers depending on context, and you need to know when to use each.


🧠 Core Concepts

1. The Forced tool_use Pattern (The OG Technique)

The insight: You can define a "fake" tool whose sole purpose is to force Claude to return data matching a specific JSON schema. You're not actually calling an external API — you're hijacking the tool_use mechanism as a structured output guarantee.

How it works:

  1. Define a tool with an input_schema that describes your desired output structure
  2. Set tool_choice: {"type": "tool", "name": "your_tool"} to force Claude to call that specific tool
  3. The model's "tool input" IS your structured output, shaped by the schema (adding "strict": true, covered in section 4, turns this into a hard guarantee)
  4. Extract the data from response.content[0].input (the tool_use block)

import anthropic
import json

client = anthropic.Anthropic()

# Define a "tool" that's really just an output schema
tools = [{
    "name": "classify_ticket",
    "description": "Classify a customer support ticket into structured categories",
    "input_schema": {
        "type": "object",
        "properties": {
            "category": {
                "type": "string",
                "enum": ["billing", "technical", "account", "shipping"],
                "description": "Primary issue category"
            },
            "priority": {
                "type": "integer",
                "minimum": 1,
                "maximum": 5,
                "description": "Priority level (1=lowest, 5=critical)"
            },
            "summary": {
                "type": "string",
                "description": "One-line summary of the issue (max 100 chars)"
            },
            "requires_escalation": {
                "type": "boolean",
                "description": "Whether this needs human review"
            }
        },
        "required": ["category", "priority", "summary", "requires_escalation"]
    }
}]

ticket_text = "I've been charged twice for my subscription and nobody responds to my emails!"

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=tools,
    tool_choice={"type": "tool", "name": "classify_ticket"},  # ← FORCE this tool
    messages=[{"role": "user", "content": f"Classify this support ticket:\n{ticket_text}"}]
)

# Extract structured data from the tool_use block
tool_use_block = next(b for b in response.content if b.type == "tool_use")
structured_data = tool_use_block.input

print(structured_data)
# {"category": "billing", "priority": 4, "summary": "Double-charged for subscription, no response from support", "requires_escalation": true}
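
The extraction step above assumes the response contains a tool_use block for the expected tool. A minimal sketch of a more defensive lookup follows. It uses plain dicts as stand-ins for the SDK's content blocks so it runs offline; the get_tool_input helper and the sample payload are illustrative assumptions, not part of the SDK.

```python
from typing import Any


def get_tool_input(content: list[dict[str, Any]], tool_name: str) -> dict[str, Any]:
    """Return the input of the first tool_use block for tool_name.

    `content` is a list of plain dicts shaped like the API's content blocks;
    with the Python SDK you would read b.type / b.name / b.input instead.
    """
    for block in content:
        if block.get("type") == "tool_use" and block.get("name") == tool_name:
            return block["input"]
    raise LookupError(f"no tool_use block for {tool_name!r} in response")


# Hand-written stand-in for a forced tool_use response (not real API output).
sample_content = [
    {
        "type": "tool_use",
        "id": "toolu_example",
        "name": "classify_ticket",
        "input": {
            "category": "billing",
            "priority": 4,
            "summary": "Charged twice for subscription",
            "requires_escalation": True,
        },
    }
]

ticket = get_tool_input(sample_content, "classify_ticket")
print(ticket["category"])
```

Raising a named error instead of letting next() fail with StopIteration makes a missing block easier to diagnose in pipeline logs.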

2. The tool_choice Parameter (Critical for Exam)

The tool_choice parameter controls HOW Claude uses tools:

Value | Behavior | Use Case
{"type": "auto"} | Claude decides whether to use a tool (default) | Normal agent operation
{"type": "any"} | Claude MUST use one of the available tools | When you need some tool call but don't care which
{"type": "tool", "name": "X"} | Claude MUST use tool "X" specifically | Structured output extraction
{"type": "none"} | Claude cannot use any tools | When you want text only despite tools being defined

🚨 Exam trap: The exam may present "type": "required" or "type": "force" as options — these don't exist! The correct value for forcing a specific tool is {"type": "tool", "name": "tool_name"}.
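
One way to keep the four valid values straight is to build tool_choice through a small helper that rejects anything else. This is a sketch only; make_tool_choice and VALID_TOOL_CHOICE_TYPES are hypothetical names introduced here, not part of the Anthropic SDK.

```python
from typing import Optional

# The four tool_choice types listed in the table above.
VALID_TOOL_CHOICE_TYPES = {"auto", "any", "tool", "none"}


def make_tool_choice(kind: str, name: Optional[str] = None) -> dict:
    """Build a tool_choice dict, rejecting types such as 'required' or 'force'."""
    if kind not in VALID_TOOL_CHOICE_TYPES:
        raise ValueError(
            f"{kind!r} is not a tool_choice type; use one of {sorted(VALID_TOOL_CHOICE_TYPES)}"
        )
    if kind == "tool":
        if not name:
            raise ValueError("tool_choice type 'tool' requires a tool name")
        return {"type": "tool", "name": name}
    if name is not None:
        raise ValueError(f"'name' is only valid with type 'tool', not {kind!r}")
    return {"type": kind}


forced = make_tool_choice("tool", "classify_ticket")
print(forced)
```

The returned dict can be passed directly as the tool_choice argument shown in the earlier example.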

3. Native Structured Outputs (Constrained Decoding) — The Modern Way

Anthropic introduced Structured Outputs as a native feature that uses constrained decoding: token generation is restricted to a grammar compiled from your JSON schema, so a completed response always parses and validates against it. This eliminates the need for the forced tool_use workaround in many cases.

from pydantic import BaseModel
from anthropic import Anthropic

class TicketClassification(BaseModel):
    category: str
    priority: int
    summary: str
    requires_escalation: bool

client = Anthropic()

# Using messages.parse() with a Pydantic model
response = client.messages.parse(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": f"Classify: {ticket_text}"}],
    output_format=TicketClassification,
)

result = response.parsed_output  # ← Validated Pydantic model!
print(result.category)  # "billing"
print(result.priority)  # 4

Alternative: Raw JSON Schema approach

# Without Pydantic — using raw JSON schema in output_config
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": f"Classify: {ticket_text}"}],
    output_config={
        "format": {
            "type": "json_schema",
            "schema": {
                "type": "object",
                "properties": {
                    "category": {"type": "string", "enum": ["billing", "technical", "account"]},
                    "priority": {"type": "integer"},
                    "summary": {"type": "string"}
                },
                "required": ["category", "priority", "summary"],
                "additionalProperties": False
            }
        }
    }
)
# response.content[0].text is guaranteed valid JSON matching the schema
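
With the raw-schema approach the result arrives as a JSON string in a text block, so one parsing step remains. The sketch below shows that step on a hand-written sample string. The parse_json_text helper is a hypothetical name, and the stop_reason check rests on the assumption that a response cut off by max_tokens can end mid-object.

```python
import json


def parse_json_text(text: str, stop_reason: str = "end_turn") -> dict:
    """Parse the text block of a structured-output response.

    A response cut off by max_tokens can end mid-object, so treat that
    stop_reason as an error before attempting to parse.
    """
    if stop_reason == "max_tokens":
        raise ValueError("response was truncated; raise max_tokens and retry")
    return json.loads(text)


# Hand-written stand-in for response.content[0].text (not real API output).
sample_text = '{"category": "billing", "priority": 4, "summary": "Charged twice"}'
parsed = parse_json_text(sample_text)
print(parsed["category"], parsed["priority"])
```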

4. Strict Tool Use — Combining Both Worlds

For agentic systems where Claude uses REAL tools (not fake ones for output), you can add "strict": true to guarantee the tool inputs match the schema exactly:

tools = [{
    "name": "search_flights",
    "strict": True,  # ← Enables constrained decoding for this tool's inputs
    "description": "Search for available flights",
    "input_schema": {
        "type": "object",
        "properties": {
            "origin": {"type": "string", "description": "Airport code (e.g., 'SFO')"},
            "destination": {"type": "string", "description": "Airport code"},
            "date": {"type": "string", "format": "date", "description": "YYYY-MM-DD"}
        },
        "required": ["origin", "destination", "date"],
        "additionalProperties": False  # Required for strict mode
    }
}]

Important Constraints for Strict Mode:

  • Max 20 strict tools per request
  • Max 24 optional parameters across all strict schemas
  • Max 16 parameters with union types
  • Required properties appear first in output, then optional
  • First request has compilation latency (schema is compiled into a grammar); cached for 24 hours after
  • "additionalProperties": false is REQUIRED for strict schemas
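
Some of these constraints can be checked locally before a request is sent. The following is a rough sketch under stated assumptions: lint_strict_tools is a hypothetical helper, it inspects only top-level properties (nested objects are not walked), and it skips the union-type limit entirely.

```python
def lint_strict_tools(tools: list[dict]) -> list[str]:
    """Return human-readable problems for the strict tools in a request."""
    problems = []
    strict_tools = [t for t in tools if t.get("strict")]

    # Limit from the list above: at most 20 strict tools per request.
    if len(strict_tools) > 20:
        problems.append(f"{len(strict_tools)} strict tools (max 20)")

    optional_total = 0
    for tool in strict_tools:
        schema = tool.get("input_schema", {})
        # Strict schemas must set additionalProperties to false.
        if schema.get("additionalProperties") is not False:
            problems.append(f"{tool['name']}: additionalProperties must be false")
        required = set(schema.get("required", []))
        # Top-level properties not listed in "required" count as optional.
        optional_total += sum(1 for p in schema.get("properties", {}) if p not in required)

    # Limit from the list above: at most 24 optional parameters overall.
    if optional_total > 24:
        problems.append(f"{optional_total} optional parameters (max 24)")
    return problems


good_tool = {
    "name": "search_flights",
    "strict": True,
    "input_schema": {
        "type": "object",
        "properties": {"origin": {"type": "string"}},
        "required": ["origin"],
        "additionalProperties": False,
    },
}
print(lint_strict_tools([good_tool]))
```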

5. When to Use Which Approach

Approach | Best For | Guarantees
Forced tool_use | Extraction from text, classification, when you need the "tool_use" stop_reason for loop logic | Schema-valid JSON in tool input block
Structured Outputs (output_config) | Direct JSON responses, simple extraction, when you DON'T need tool calling | Schema-valid JSON in text content
Strict tool use | Real agentic tools where input validation matters | Tool inputs match schema exactly

⚠️ Anti-Patterns & Exam Traps

❌ Anti-Pattern | ✅ Correct Approach | Why It's Wrong
Asking Claude to "return JSON" in the prompt and hoping for the best | Use tool_choice or output_config for guaranteed structure | Prompt-only approach has no schema guarantee: Claude might hallucinate fields, use wrong types, or add commentary outside the JSON
Post-processing/regex to extract JSON from free-text responses | Use constrained decoding (Structured Outputs) or forced tool_use | Fragile, breaks on edge cases, adds unnecessary complexity
Using tool_choice: {"type": "required"} | Use tool_choice: {"type": "tool", "name": "X"} for forced tool, or {"type": "any"} for any tool | "required" is NOT a valid tool_choice type in Claude's API
Self-reported confidence scores as the model's own judgment | Use structured validation criteria and programmatic checks | Models can't reliably self-assess confidence; use external validation
Deeply nested schemas (5+ levels) | Keep schemas flat or max 2-3 levels deep | Deep nesting increases grammar complexity and compilation time; may hit limits
Skipping additionalProperties: false in strict mode | Always include it for strict tool schemas | Strict mode requires this field; it won't compile without it
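
The nesting anti-pattern is easy to check mechanically. Here is a minimal sketch of a depth counter, assuming that "levels" means nested objects (arrays are passed through without adding a level); schema_depth is a hypothetical helper and this counting convention is a choice made here, not an official definition.

```python
def schema_depth(schema: dict) -> int:
    """Count how many object levels a JSON Schema nests at its deepest point."""
    children = list(schema.get("properties", {}).values())
    items = schema.get("items")
    if isinstance(items, dict):
        children.append(items)  # look inside array element schemas
    deepest_child = max((schema_depth(c) for c in children), default=0)
    return deepest_child + (1 if schema.get("type") == "object" else 0)


flat = {"type": "object", "properties": {"category": {"type": "string"}}}
nested = {
    "type": "object",
    "properties": {
        "line_items": {
            "type": "array",
            "items": {"type": "object", "properties": {"total": {"type": "number"}}},
        }
    },
}
print(schema_depth(flat), schema_depth(nested))
```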

🎯 The #1 exam question pattern: "A developer needs Claude to reliably return data in a specific JSON format. What approach should they use?" The answer is NEVER "ask politely in the prompt" — it's always tool_use forcing OR Structured Outputs.


💻 Code Example: Complete Extraction Pipeline

Here's an end-to-end pattern combining forced tool_use with programmatic validation and retry:

import anthropic
import json
from typing import Optional

client = anthropic.Anthropic()

def extract_invoice_data(document_text: str) -> dict:
    """Extract structured invoice data using forced tool_use."""
    
    tools = [{
        "name": "extract_invoice",
        "description": "Extract all relevant fields from an invoice document",
        "input_schema": {
            "type": "object",
            "properties": {
                "vendor_name": {"type": "string", "description": "Company or person who issued the invoice"},
                "invoice_number": {"type": "string", "description": "Unique invoice identifier"},
                "invoice_date": {"type": "string", "description": "Invoice date in ISO 8601 format (YYYY-MM-DD)"},
                "due_date": {"type": "string", "description": "Payment due date in ISO 8601 format"},
                "total_amount": {"type": "number", "description": "Total amount due"},
                "currency": {"type": "string", "enum": ["USD", "EUR", "GBP", "CAD", "AUD"]},
                "line_items": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "description": {"type": "string"},
                            "quantity": {"type": "number"},
                            "unit_price": {"type": "number"},
                            "total": {"type": "number"}
                        },
                        "required": ["description", "quantity", "unit_price", "total"]
                    },
                    "description": "Individual line items on the invoice"
                },
                "payment_terms": {
                    "type": "string",
                    "enum": ["net_15", "net_30", "net_60", "due_on_receipt", "other"]
                }
            },
            "required": ["vendor_name", "invoice_date", "total_amount", "currency", "line_items"]
        }
    }]

    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        tools=tools,
        tool_choice={"type": "tool", "name": "extract_invoice"},
        messages=[{
            "role": "user",
            "content": f"Extract all invoice data from this document:\n\n{document_text}"
        }]
    )

    tool_use = next(b for b in response.content if b.type == "tool_use")
    return tool_use.input


def validate_invoice(data: dict) -> list:
    """Business rule validation (programmatic, not model-based)."""
    errors = []
    
    # Check line items sum to total (read total_amount once so a missing key cannot raise KeyError)
    total_amount = data.get("total_amount", 0)
    line_total = sum(item["total"] for item in data.get("line_items", []))
    if abs(line_total - total_amount) > 0.01:
        errors.append(f"Line items sum ({line_total}) != total ({total_amount})")
    
    # Validate date format
    import re
    date_pattern = r'^\d{4}-\d{2}-\d{2}$'
    if not re.match(date_pattern, data.get("invoice_date", "")):
        errors.append(f"Invalid date format: {data.get('invoice_date')}")
    
    return errors


# Full extraction with validation-retry
def extract_with_retry(document_text: str, max_retries: int = 3) -> dict:
    tools = [{
        "name": "extract_invoice",
        "description": "Extract all relevant fields from an invoice document",
        "input_schema": { ... }  # same schema as above
    }]
    
    messages = [{"role": "user", "content": f"Extract invoice data:\n\n{document_text}"}]
    
    for attempt in range(max_retries):
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=2048,
            tools=tools,
            tool_choice={"type": "tool", "name": "extract_invoice"},
            messages=messages
        )
        
        tool_use = next(b for b in response.content if b.type == "tool_use")
        result = tool_use.input
        
        errors = validate_invoice(result)
        if not errors:
            return result  # ✅ Valid!
        
        # Feed errors back for retry
        messages.append({"role": "assistant", "content": response.content})
        messages.append({
            "role": "user",
            "content": [{
                "type": "tool_result",
                "tool_use_id": tool_use.id,
                "content": f"Validation failed: {'; '.join(errors)}. Please re-extract with corrections.",
                "is_error": True
            }]
        })
    
    raise ValueError(f"Failed after {max_retries} attempts. Last errors: {errors}")
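
The control flow of the retry loop can be exercised without any API calls by injecting the extraction and validation steps as callables. The sketch below does that; retry_until_valid, fake_extract, and check_total are hypothetical names, and fake_extract is a stub standing in for the model, not a real client.

```python
from typing import Callable


def retry_until_valid(
    extract: Callable[[list[str]], dict],
    validate: Callable[[dict], list[str]],
    max_retries: int = 3,
) -> dict:
    """Call extract, validate the result, and retry with the errors as feedback."""
    feedback: list[str] = []
    for _ in range(max_retries):
        result = extract(feedback)
        feedback = validate(result)
        if not feedback:
            return result
    raise ValueError(f"failed after {max_retries} attempts: {feedback}")


# Stub standing in for the model: wrong total first, corrected once it sees feedback.
def fake_extract(feedback: list[str]) -> dict:
    total = 30.0 if feedback else 25.0
    return {"total_amount": total, "line_items": [{"total": 10.0}, {"total": 20.0}]}


def check_total(data: dict) -> list[str]:
    line_sum = sum(item["total"] for item in data["line_items"])
    if abs(line_sum - data["total_amount"]) > 0.01:
        return [f"line items sum ({line_sum}) != total ({data['total_amount']})"]
    return []


invoice = retry_until_valid(fake_extract, check_total)
print(invoice["total_amount"])
```

Separating the loop from the API call also makes the retry policy unit-testable, which is harder when the loop and the client call live in one function.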


🛠️ Hands-On Exercise (20-30 min)

Build a Resume Parser with Forced tool_use:

  1. Define a tool schema for extracting resume data: name, email, phone, education (array of {institution, degree, year}), work_experience (array of {company, title, start_date, end_date, responsibilities[]}), and skills[]
  2. Use tool_choice: {"type": "tool", "name": "parse_resume"} to force extraction
  3. Write a validate_resume() function that checks: (a) email format is valid, (b) dates are in chronological order, (c) at least one education entry exists
  4. Implement a validation-retry loop (max 3 attempts)
  5. Bonus: Try the same extraction using output_config with Structured Outputs. Compare the two approaches — when would you prefer one over the other?
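
As a possible starting point for step 1, here is a sketch of the parse_resume tool definition. Field names follow the exercise text; the property types and the choice of which fields are required are assumptions to adjust as you work through the remaining steps.

```python
# Starter tool definition for the exercise; types and "required" lists are assumptions.
parse_resume_tool = {
    "name": "parse_resume",
    "description": "Extract structured fields from a resume",
    "input_schema": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "email": {"type": "string"},
            "phone": {"type": "string"},
            "education": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "institution": {"type": "string"},
                        "degree": {"type": "string"},
                        "year": {"type": "integer"},
                    },
                    "required": ["institution", "degree"],
                },
            },
            "work_experience": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "company": {"type": "string"},
                        "title": {"type": "string"},
                        "start_date": {"type": "string", "description": "YYYY-MM-DD"},
                        "end_date": {"type": "string", "description": "YYYY-MM-DD"},
                        "responsibilities": {"type": "array", "items": {"type": "string"}},
                    },
                    "required": ["company", "title", "start_date"],
                },
            },
            "skills": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["name", "education", "work_experience", "skills"],
    },
}

tool_choice = {"type": "tool", "name": parse_resume_tool["name"]}
print(tool_choice)
```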

📝 Quick Quiz

Q1: A developer needs Claude to return customer data in a specific JSON format with guaranteed schema compliance. They don't need Claude to call any actual external APIs. What is the BEST approach?

A) Add "Please return valid JSON" to the system prompt
B) Use tool_choice: {"type": "any"} with a single extraction tool defined
C) Use tool_choice: {"type": "tool", "name": "extract_data"} with a schema-defining tool
D) Use output_config with "type": "json_schema" and the desired schema

Q2: When using strict tool use ("strict": true), which of the following is REQUIRED in the tool's input_schema?

A) "maxProperties": 20
B) "additionalProperties": false
C) "$schema": "https://json-schema.org/draft-07"
D) "strictMode": true

Q3: A data extraction pipeline uses forced tool_use but occasionally returns invalid dates. What should the architect add?

A) More few-shot examples of correct date formats in the prompt
B) A self-reported confidence score from Claude on each field
C) A programmatic validation-retry loop that feeds errors back to Claude
D) A regex post-processor that reformats dates after extraction


Answers:

Q1: D — Since no actual tool calling is needed, native Structured Outputs (output_config) is the most direct and cleanest approach. C is also valid but uses a "fake tool" workaround. In a real exam scenario, if D is available and the scenario says "no external APIs needed," prefer D. If the scenario involves an extraction pipeline that ALSO uses other tools, C integrates better.

Q2: B — "additionalProperties": false is mandatory for strict schemas. Without it, the schema won't compile for constrained decoding.

Q3: C — Programmatic validation with retry is the correct pattern. A (more examples) might help but doesn't guarantee correctness. B (self-reported confidence) is explicitly an anti-pattern. D (regex post-processing) is fragile and misses the opportunity to have Claude self-correct.


👀 Tomorrow's Preview

Day 16 dives deeper into JSON Schema Design & Validation-Retry Loops — we'll cover schema complexity limits, field-level confidence strategies (the correct way, not self-reported!), and how to design schemas that work within strict mode constraints.