
How To Ensure LLM Output Adheres to a JSON Schema

Understand how to make sure LLM outputs are not only valid JSON but also valid against a specific JSON schema, and learn how to implement this in practice.

Large Language Models (LLMs) excel at generating text, but reliably extracting structured data from them presents a significant challenge. This is often referred to as constrained generation — guiding the model to produce output adhering to specific formats. Early attempts involved requesting JSON directly within the prompt and subsequently parsing the LLM's response. This method, however, proves unreliable, as the model's interpretation can lead to malformed or incomplete JSON, leaving applications vulnerable to parsing errors.

The field has since evolved, introducing more robust strategies for guaranteed structured data. A fundamental distinction exists between valid JSON and strict JSON Schema adherence. Valid JSON simply confirms correct syntax, while schema adherence ensures the data conforms to a predefined structure, validating data types, required fields, and other constraints.

Different LLM providers offer distinct approaches. Some offer a basic "JSON mode" which encourages valid JSON syntax but doesn't enforce a particular schema. More advanced techniques include dedicated parameters for specifying JSON schemas, allowing developers to define the expected structure precisely.

When those mechanisms aren't available, tool/function calling provides a useful fallback for structured data exchange, since tools and functions let you define JSON schemas for their input parameters. You then encourage the model to call your specific function and parse the arguments it provides.

| Approach | Strengths | Weaknesses | Ideal Use Cases |
| --- | --- | --- | --- |
| Prompting + Parsing | Simple, widely applicable | Unreliable, prone to errors, requires extensive parsing | Quick prototyping, simple data extraction |
| JSON Mode | Encourages valid JSON | Doesn't guarantee schema adherence | When strict schema isn't critical, as a first step |
| Structured Outputs | Strong schema enforcement, type safety | Requires specific model/API support | Applications requiring predictable and reliable structured data |
| Function/Tool Calling | Extends LLM capabilities, schema-based argument passing | Adds complexity, requires function definition and prompt engineering | Integrating LLMs with external tools, complex data transformations |

Valid JSON vs Strict JSON Schema Adherence

Valid JSON simply confirms that the data conforms to the basic JSON syntax rules (e.g., correct use of brackets, quotes, and data types).

Strict JSON Schema adherence, however, goes further by ensuring the data matches a predefined schema, validating not just syntax but also the presence of required fields, data types of those fields, and any other schema constraints. This guarantees predictable data structure and content, crucial for reliable application integration.
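
To make the distinction concrete, here is a minimal sketch in Python using the jsonschema library (the schema and payload are illustrative; any JSON Schema validator works the same way):

import json
from jsonschema import validate, ValidationError

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

payload = '{"name": "Ada"}'
data = json.loads(payload)  # parses fine: the payload is valid JSON

try:
    validate(instance=data, schema=schema)
except ValidationError as e:
    # Fails schema adherence: 'age' is a required property
    print(f"Valid JSON, but not schema-adherent: {e.message}")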

Vendor-Specific Strategies

OpenAI

The OpenAI API's Structured Outputs feature lets developers obtain predictable, schema-adherent JSON responses from the model, eliminating the need for extensive response validation and complex prompting. This is a significant improvement over the older JSON Mode, which only guaranteed valid JSON but not schema adherence. Structured Outputs is available in recent models such as gpt-4o-mini-2024-07-18, gpt-4o-2024-08-06, and later.

How it Works:

Structured Outputs operates through two primary methods within the OpenAI API:

  1. response_format Parameter: For structuring the model's direct responses to user prompts, the response_format parameter is used within the Chat Completions API. This allows you to define a JSON Schema that the model's output will conform to. OpenAI's Python and Node.js SDKs provide convenient helpers for defining these schemas using Pydantic and Zod, respectively, facilitating type safety within your application code. For instance, you can define a Pydantic model in Python and pass it directly to the response_format argument. The API will then parse the model's raw output into this defined structure.
  2. Function Calling: This method is employed when integrating the model with external tools or functionalities within your application. Function calling allows the model to interact with these tools, and Structured Outputs ensures that the data exchanged between the model and your application adheres to predefined schemas. This is especially useful for building AI assistants that can access databases, manipulate UI elements, or perform other actions based on user requests.
import OpenAI from 'openai';
import { zodResponseFormat } from 'openai/helpers/zod';
import { z } from 'zod';

const Step = z.object({
  explanation: z.string(),
  output: z.string(),
});

const MathResponse = z.object({
  steps: z.array(Step),
  final_answer: z.string(),
});

const client = new OpenAI();

const completion = await client.beta.chat.completions.parse({
  model: 'gpt-4o-2024-08-06',
  messages: [
    {
      role: 'system',
      content: 'You are a helpful math tutor. Only use the schema for math responses.',
    },
    { role: 'user', content: 'solve 8x + 3 = 21' },
  ],
  response_format: zodResponseFormat(MathResponse, 'mathResponse'),
});

const message = completion.choices[0]?.message;
if (message?.parsed) {
  console.log(message.parsed.steps);
  console.log(message.parsed.final_answer);
} else {
  console.log(message.refusal);
}

Or in Python:

from pydantic import BaseModel
from openai import OpenAI

class Step(BaseModel):
    explanation: str
    output: str

class MathResponse(BaseModel):
    steps: list[Step]
    final_answer: str

client = OpenAI()

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "You are a helpful math tutor."},
        {"role": "user", "content": "solve 8x + 31 = 2"},
    ],
    response_format=MathResponse,
)

message = completion.choices[0].message
if message.parsed:
    print(message.parsed.steps)
    print(message.parsed.final_answer)
else:
    print(message.refusal)

Technical Deep Dive:

  • Schema Definition: You define a JSON Schema to dictate the structure of the model's output. This schema includes the expected data types, required fields, and other constraints. Crucially, all fields within the schema must be marked as required. While optional fields aren't directly supported, you can emulate them using a union type with null (see the illustrative schema after this list).
  • Supported Schemas: Structured Outputs supports a subset of the JSON Schema specification, including string, number, boolean, integer, object, array, enum, and anyOf types. Note that the root-level object must be of type 'object' and cannot be 'anyOf'. There are limitations on nesting depth (up to 5 levels) and the total number of object properties (up to 100). The keyword additionalProperties: false is mandatory for objects to prevent the model from hallucinating extra fields. Several type-specific keywords like minLength, maxLength, pattern for strings, and similar constraints for other types are not yet supported. Definitions ($defs) and recursive schemas (using # for root recursion or explicit $ref paths) are supported.
  • Key Ordering: The model's output will respect the order of keys as defined in the schema.
  • Refusals: When the model refuses a request due to safety concerns or other reasons, the response will include a refusal field instead of the expected structured data. Your application should handle this scenario gracefully, perhaps by displaying the refusal message to the user.
  • Handling Edge Cases: Developers must implement error handling to address situations like partial JSON outputs due to context window limitations, content filtering, or network issues. Checking the finish_reason in the API response is crucial for identifying these scenarios.
  • JSON Mode (Legacy): While still available, JSON mode has been superseded by Structured Outputs. It ensures valid JSON but not schema adherence. It's activated by setting response_format to { "type": "json_object" }. However, it requires explicitly instructing the model to generate JSON within the prompt, and your application must still handle potential edge cases and validate the JSON structure against your schema.
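
For example, an illustrative schema (made up for demonstration) that satisfies those constraints looks like this: every field appears in required, the optional nickname is emulated with a null union, and additionalProperties is disabled:

{
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "nickname": { "type": ["string", "null"] }
  },
  "required": ["name", "nickname"],
  "additionalProperties": false
}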

Best Practices:

  • Clear Prompting for User Input: When handling user-generated input, provide instructions within the prompt on how to manage cases where the input is incompatible with the desired schema.
  • Handling Mistakes: Structured Outputs doesn't eliminate the possibility of errors in the content itself. Refine prompts, provide examples, or break down complex tasks into simpler subtasks to improve accuracy.
  • Schema Consistency: Use the Pydantic/Zod SDK support to maintain consistency between your schema and application code. Alternatively, implement CI checks to prevent divergence.

By leveraging Structured Outputs, developers can streamline LLM integration, enhance application reliability, and create more robust and user-friendly experiences.

Google Vertex AI with Gemini

Vertex AI's Gemini models offer structured output capabilities, enabling developers to receive JSON-formatted responses suitable for direct processing in applications. While simply requesting JSON in the prompt can be effective, providing a structured JSON schema ensures predictable and consistent output.

With Prompting

A simple approach involves instructing Gemini to return JSON directly within the prompt. This works well for less complex scenarios where strict schema adherence isn't critical.
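
As a rough Python sketch of this approach (call_llm is a hypothetical helper standing in for whichever Gemini client call you use), note that parsing remains the failure point:

import json

prompt = (
    "List two popular cookie recipes. Respond with ONLY a JSON array of "
    'objects, each with a "recipeName" string field, and no other text.'
)

raw = call_llm(prompt)  # hypothetical helper wrapping your LLM client
try:
    recipes = json.loads(raw)
except json.JSONDecodeError:
    recipes = None  # the model ignored the format instructions; retry or fall back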

Using responseSchema for Strict Adherence

For robust integrations, Vertex AI allows defining a responseSchema within the generationConfig. This ensures Gemini validates its response against the specified schema, guaranteeing the presence of required fields and correct data types.

// JavaScript example with responseSchema
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.API_KEY);

const schema = {
  description: "List of recipes",
  type: "ARRAY", // Using string literals here for brevity
  items: {
    type: "OBJECT",
    properties: {
      recipeName: {
        type: "STRING",
        description: "Name of the recipe",
        nullable: false, // Note: nullable fields are supported in Vertex AI
      },
    },
    required: ["recipeName"],
  },
};

const model = genAI.getGenerativeModel({
  model: "gemini-1.5-pro",
  generationConfig: {
    responseMimeType: "application/json",
    responseSchema: schema,
  },
});

const result = await model.generateContent("List a few popular cookie recipes.");
console.log(result.response.text());

Some key differences from OpenAI:

  • Schema Definition: While OpenAI leverages Pydantic and Zod in its SDKs for schema definition, Vertex AI uses a more direct JSON-like schema representation within the responseSchema. Also, unlike OpenAI's strict requirement for all fields to be required, Vertex AI supports nullable: true for optional fields directly within the schema.
  • Integration: OpenAI's response_format parameter streamlines schema validation within the API call itself. With Vertex AI, you provide the schema via responseSchema, but client-side validation using libraries like jsonschema is still recommended for strict enforcement (a sketch follows this list). This split approach offers flexibility but requires extra validation steps in your application code.
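
Here is that client-side validation step as a minimal Python sketch (the schema hand-translates the JavaScript example above into standard JSON Schema types, and response_text is assumed to hold the model's raw reply):

import json
from jsonschema import validate, ValidationError

recipe_schema = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {"recipeName": {"type": "string"}},
        "required": ["recipeName"],
    },
}

try:
    recipes = json.loads(response_text)  # assumed: raw text from the Gemini response
    validate(instance=recipes, schema=recipe_schema)
except (json.JSONDecodeError, ValidationError):
    ...  # retry, log, or fall back as appropriate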

By using responseSchema and client-side validation, developers can integrate Gemini seamlessly into their applications, relying on predictable, structured data for enhanced reliability and easier processing.

Tool & Function Calling

When strict schema adherence through response_format or similar mechanisms isn't available or suitable, tool/function calling offers a powerful alternative for obtaining structured data from LLMs. This approach leverages the LLM's ability to interact with external tools or functions, effectively extending its capabilities beyond text generation. The key advantage lies in the ability to define JSON schemas for the input parameters of these tools/functions. By encouraging the model to call a specific function with its output formatted as arguments for that function, developers gain more control over the structure and content of the data received.

Here's how it works:

  • Define a Function with a Schema: Create a function within your application, specifying a JSON schema for its input parameters. This schema dictates the expected structure, data types, and any other constraints for the data the LLM should provide.
  • Guide the LLM: Craft your prompt to guide the LLM towards invoking the defined function. Provide clear instructions and examples to demonstrate the desired behavior, prompting the model to generate the function call with appropriately formatted arguments.
  • Parse and Process: When the LLM calls the function, your application receives the arguments already structured according to the predefined schema (see the parsing sketch after the example below). This greatly reduces the need for complex parsing or validation of the raw LLM output.
const tools = [
  {
    type: "function",
    function: {
      name: "get_delivery_date",
      description:
        "Get the delivery date for a customer's order. Call this whenever you need to know the delivery date, for example when a customer asks 'Where is my package'",
      parameters: {
        type: "object",
        properties: {
          order_id: {
            type: "string",
            description: "The customer's order ID.",
          },
        },
        required: ["order_id"],
        additionalProperties: false,
      },
    },
  },
];

const messages = [
  {
    role: "system",
    content: "You are a helpful customer support assistant. Use the supplied tools to assist the user.",
  },
  { role: "user", content: "Hi, can you tell me the delivery date for my order?" },
];

const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: messages,
  tools: tools,
});
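
On the response side, the arguments arrive as a JSON string inside the tool call. Here is the extraction step as a Python sketch, for consistency with the other snippets (it assumes completion came from an equivalent chat completions call with these tools):

import json

tool_calls = completion.choices[0].message.tool_calls
if tool_calls and tool_calls[0].function.name == "get_delivery_date":
    # Arguments arrive as a JSON string shaped by the parameters schema
    args = json.loads(tool_calls[0].function.arguments)
    order_id = args["order_id"]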

Key Considerations

Handling Errors & Edge Cases

Context window limitations, content filtering, or network interruptions can lead to incomplete JSON responses. Always check the finish_reason (OpenAI) or similar indicators in the API response to detect truncated outputs. Implement retry mechanisms or fallback strategies to handle these cases gracefully. You can also wrap your LLM calls in a shared client layer that encapsulates this behaviour.
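
A minimal sketch of that check with the OpenAI Python SDK (the retry and fallback policies are left to you):

completion = client.chat.completions.create(...)  # your usual call
finish_reason = completion.choices[0].finish_reason
if finish_reason == "length":
    ...  # output hit the token limit, so the JSON is likely truncated: retry with more room
elif finish_reason == "content_filter":
    ...  # content was filtered: surface an appropriate message or fall back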

LLMs might refuse requests due to safety concerns or an inability to understand the prompt. Handle refusals gracefully by displaying an appropriate message to the user or triggering alternative workflows. Examine the refusal message provided by the API to understand its cause.

# Example error handling with OpenAI (using Pydantic for schema validation)
from openai import OpenAI
from pydantic import BaseModel, ValidationError

client = OpenAI()

# ... (Pydantic model definition) ...

try:
    completion = client.chat.completions.create(...)
    message = completion.choices[0].message
    if message.refusal:
        # `refusal` is a string explaining why the model declined
        raise ValueError(f"Model refused the request: {message.refusal}")
    # Validate the raw JSON content against your Pydantic model
    parsed_data = YourPydanticModel.model_validate_json(message.content)
except ValidationError as e:
    print(f"Schema validation error: {e}")
    # Handle the error (e.g., retry, refine prompt, fallback)
except ValueError as e:  # For refusals
    print(f"Model refusal: {e}")
    # Display a message to the user or trigger a different action
except Exception as e:
    print(f"An unexpected error occurred: {e}")
    # Handle other potential errors (e.g., network issues)

Security

When using user-provided data within schemas or function arguments, be mindful of potential security risks. Maliciously crafted input could lead to unexpected behavior or vulnerabilities. Sanitize user inputs thoroughly before incorporating them into schemas or function calls to mitigate these risks.

Testing & Validation

Thorough testing is essential to ensure the reliability of LLM integrations with schema validation. Develop comprehensive test cases covering various scenarios, including valid and invalid inputs, edge cases, and model refusals.

Use a dedicated JSON Schema validation library (such as ajv for JavaScript or jsonschema for Python) within your test suite to verify the correctness of the LLM's output. Continuously monitor and validate the integration in production to detect and address any regressions.
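
For example, a schema-adherence test might look like this pytest sketch (schemas/math_response.json and call_llm are hypothetical stand-ins for your own schema file and client wrapper):

import json
import pytest
from jsonschema import validate

with open("schemas/math_response.json") as f:  # hypothetical schema file
    SCHEMA = json.load(f)

@pytest.mark.parametrize("prompt", ["solve 8x + 3 = 21", "solve x / 2 = 9"])
def test_llm_output_matches_schema(prompt):
    raw = call_llm(prompt)  # hypothetical helper wrapping your LLM client
    validate(instance=json.loads(raw), schema=SCHEMA)  # raises on any violation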

If you use an AI observability platform like Modelmetry, you can send your LLM generation traces to it and run evaluations such as JSON validation on them to see how they evolve over time.

Prompt Engineering Techniques

Crafting effective prompts is crucial for guiding LLMs towards generating desired structured outputs. Here are some techniques, with a combined example after the list:

  • Explicit Instructions: Clearly state the expected JSON format and schema. Provide examples of correctly formatted output.
  • Few-Shot Learning: Include a few examples of input and correctly formatted JSON output in the prompt.
  • Schema in Prompt: When using an LLM without response_format-like attribute, embed the schema within the prompt itself as a guide.
  • Function Call Guidance: For function calling, demonstrate the desired function call with example arguments in the prompt.
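
Putting several of these techniques together, an illustrative prompt (the schema and review text are made up for demonstration) might embed the schema and a one-shot example:

prompt = """Classify the sentiment of the review below.
Respond with ONLY JSON matching this schema:
{"type": "object", "properties": {"sentiment": {"type": "string", "enum": ["positive", "negative", "neutral"]}}, "required": ["sentiment"]}

Example:
Review: "Great product, fast shipping!"
Output: {"sentiment": "positive"}

Review: "The box arrived damaged and late."
Output:"""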
About the Author

Lazhar Ichir is the CEO and founder of Modelmetry, an LLM guardrails and observability platform that helps developers build secure and reliable modern LLM-powered applications.