How To Ensure LLM Output Adheres to a JSON Schema
Understand how to make sure LLM outputs are valid JSON, and valid against a specific JSON schema. Learn how to implement this in practice.
Large Language Models (LLMs) excel at generating text, but reliably extracting structured data from them presents a significant challenge. This is often referred to as constrained generation — guiding the model to produce output adhering to specific formats. Early attempts involved requesting JSON directly within the prompt and subsequently parsing the LLM's response. This method, however, proves unreliable, as the model's interpretation can lead to malformed or incomplete JSON, leaving applications vulnerable to parsing errors.
The field has since evolved, introducing more robust strategies for guaranteed structured data. A fundamental distinction exists between valid JSON and strict JSON Schema adherence. Valid JSON simply confirms correct syntax, while schema adherence ensures the data conforms to a predefined structure, validating data types, required fields, and other constraints.
Different LLM providers offer distinct approaches. Some offer a basic "JSON mode" which encourages valid JSON syntax but doesn't enforce a particular schema. More advanced techniques include dedicated parameters for specifying JSON schemas, allowing developers to define the expected structure precisely.
As a useful fallback when native schema enforcement isn't available for your model, tool/function calling provides another avenue for structured data exchange: tools and functions let you define JSON schemas for their input parameters. You then encourage the model to call your specific function and parse the arguments it provides.
| Approach | Strengths | Weaknesses | Ideal Use Cases |
|---|---|---|---|
| Prompting + Parsing | Simple, widely applicable | Unreliable, prone to errors, requires extensive parsing | Quick prototyping, simple data extraction |
| JSON Mode | Encourages valid JSON | Doesn't guarantee schema adherence | When strict schema isn't critical, as a first step |
| Structured Outputs | Strong schema enforcement, type safety | Requires specific model/API support | Applications requiring predictable and reliable structured data |
| Function/Tool Calling | Extends LLM capabilities, schema-based argument passing | Adds complexity, requires function definition and prompt engineering | Integrating LLMs with external tools, complex data transformations |
Of course, you could simply defer all this work to a reliable and trusted platform, like Structured Parser. In that case, you provide the platform with a JSON schema and then, using its SDK, submit unstructured documents (e.g., PDF, DOCX, XLSX, text, HTML, Markdown) and receive structured JSON back.
Valid JSON vs Strict JSON Schema Adherence
Valid JSON simply confirms that the data conforms to the basic JSON syntax rules (e.g., correct use of brackets, quotes, and data types).
Strict JSON Schema adherence, however, goes further by ensuring the data matches a predefined schema, validating not just syntax but also the presence of required fields, data types of those fields, and any other schema constraints. This guarantees predictable data structure and content, crucial for reliable application integration.
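To make the distinction concrete, here is a minimal sketch using the `ajv` validator (covered later in the testing section); the person schema and payload are illustrative:

```typescript
// Valid JSON vs. schema adherence: the payload below parses fine as JSON,
// yet fails validation because "age" has the wrong type and "email" is missing.
import Ajv from "ajv";

const schema = {
  type: "object",
  properties: {
    name: { type: "string" },
    age: { type: "integer" },
    email: { type: "string" },
  },
  required: ["name", "age", "email"],
  additionalProperties: false,
};

const ajv = new Ajv();
const validate = ajv.compile(schema);

const data = JSON.parse('{"name": "Ada", "age": "36"}'); // valid JSON...

if (!validate(data)) {
  console.error(validate.errors); // ...but not schema-adherent
}
```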
Vendor-Specific Strategies
OpenAI
The OpenAI API's Structured Outputs feature empowers developers to obtain predictable, schema-adherent JSON responses from large language models (LLMs), eliminating the need for extensive response validation and complex prompting. This is a significant improvement over the older JSON Mode, which only guaranteed valid JSON but not schema adherence. Structured Outputs is available in recent models like `gpt-4o-mini-2024-07-18`, `gpt-4o-2024-08-06`, and later.
How it Works:
Structured Outputs operates through two primary methods within the OpenAI API:
- `response_format` Parameter: For structuring the model's direct responses to user prompts, the `response_format` parameter is used within the Chat Completions API. This allows you to define a JSON Schema that the model's output will conform to. OpenAI's Python and Node.js SDKs provide convenient helpers for defining these schemas using Pydantic and Zod, respectively, facilitating type safety within your application code. For instance, you can define a Pydantic model in Python and pass it directly to the `response_format` argument. The API will then parse the model's raw output into this defined structure.
- Function Calling: This method is employed when integrating the model with external tools or functionalities within your application. Function calling allows the model to interact with these tools, and Structured Outputs ensures that the data exchanged between the model and your application adheres to predefined schemas. This is especially useful for building AI assistants that can access databases, manipulate UI elements, or perform other actions based on user requests.
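For instance, a hedged sketch in TypeScript using the OpenAI Node.js SDK's Zod helper; the `CalendarEvent` schema and prompt are illustrative:

```typescript
// A sketch of Structured Outputs via the Node.js SDK's Zod helper.
import OpenAI from "openai";
import { z } from "zod";
import { zodResponseFormat } from "openai/helpers/zod";

const CalendarEvent = z.object({
  name: z.string(),
  date: z.string(),
  participants: z.array(z.string()),
});

const client = new OpenAI();

const completion = await client.beta.chat.completions.parse({
  model: "gpt-4o-2024-08-06",
  messages: [
    { role: "system", content: "Extract the event information." },
    { role: "user", content: "Alice and Bob are going to a science fair on Friday." },
  ],
  response_format: zodResponseFormat(CalendarEvent, "calendar_event"),
});

// The SDK returns the output already parsed against the Zod schema.
console.log(completion.choices[0].message.parsed);
```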
Or, in Python, a comparable sketch assuming the official `openai` SDK's Pydantic support (the `CalendarEvent` model below is again illustrative):
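```python
# A sketch of Structured Outputs with the Python SDK's Pydantic support.
from openai import OpenAI
from pydantic import BaseModel


class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]


client = OpenAI()

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Extract the event information."},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday."},
    ],
    response_format=CalendarEvent,
)

# The SDK parses the raw output into the Pydantic model.
print(completion.choices[0].message.parsed)
```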
Technical Deep Dive:
- Schema Definition: You define a JSON Schema to dictate the structure of the model's output. This schema includes the expected data types, required fields, and other constraints. Crucially, all fields within the schema must be marked as required. While optional fields aren't directly supported, you can emulate them using a union type with `null` (see the example schema after this list).
- Supported Schemas: Structured Outputs supports a subset of the JSON Schema specification, including string, number, boolean, integer, object, array, enum, and anyOf types. Note that the root-level object must be of type object and cannot be anyOf. There are limitations on nesting depth (up to 5 levels) and the total number of object properties (up to 100). The keyword `additionalProperties: false` is mandatory for objects to prevent the model from hallucinating extra fields. Several type-specific keywords, like `minLength`, `maxLength`, and `pattern` for strings, and similar constraints for other types, are not yet supported. Definitions (`$defs`) and recursive schemas (using `#` for root recursion or explicit `$ref` paths) are supported.
- Key Ordering: The model's output will respect the order of keys as defined in the schema.
- Refusals: When the model refuses a request due to safety concerns or other reasons, the response will include a refusal field instead of the expected structured data. Your application should handle this scenario gracefully, perhaps by displaying the refusal message to the user.
- Handling Edge Cases: Developers must implement error handling to address situations like partial JSON outputs due to context window limitations, content filtering, or network issues. Checking the `finish_reason` in the API response is crucial for identifying these scenarios.
- JSON Mode (Deprecated): While still available, JSON Mode is superseded by Structured Outputs. It ensures valid JSON but not schema adherence. It's activated by setting `response_format` to `{ "type": "json_object" }`. However, it requires explicit instruction to the model to generate JSON within the prompt, and your application must still handle potential edge cases and validate the JSON structure against your schema.
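As referenced above, here is a sketch of a `response_format` payload respecting these constraints: every property listed in `required`, `additionalProperties: false` set, and an optional field emulated as a union with `null`. The `user_profile` schema is illustrative:

```json
{
  "type": "json_schema",
  "json_schema": {
    "name": "user_profile",
    "strict": true,
    "schema": {
      "type": "object",
      "properties": {
        "username": { "type": "string" },
        "age": { "type": ["integer", "null"] },
        "role": { "type": "string", "enum": ["admin", "member", "guest"] }
      },
      "required": ["username", "age", "role"],
      "additionalProperties": false
    }
  }
}
```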
Best Practices:
- Clear Prompting for User Input: When handling user-generated input, provide instructions within the prompt on how to manage cases where the input is incompatible with the desired schema.
- Handling Mistakes: Structured Outputs doesn't eliminate the possibility of errors in the content itself. Refine prompts, provide examples, or break down complex tasks into simpler subtasks to improve accuracy.
- Schema Consistency: Use the Pydantic/Zod SDK support to maintain consistency between your schema and application code. Alternatively, implement CI checks to prevent divergence.
By leveraging Structured Outputs, developers can streamline LLM integration, enhance application reliability, and create more robust and user-friendly experiences.
Google Vertex AI with Gemini
Vertex AI's Gemini models offer structured output capabilities, enabling developers to receive JSON-formatted responses suitable for direct processing in applications. While simply requesting JSON in the prompt can be effective, providing a structured JSON schema ensures predictable and consistent output.
With Prompting
A simple approach involves instructing Gemini to return JSON directly within the prompt. This works well for less complex scenarios where strict schema adherence isn't critical.
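For example, a hedged sketch assuming the `@google-cloud/vertexai` Node.js SDK (project, model name, and prompt are illustrative):

```typescript
// Prompt-only JSON: nothing enforces the schema, so malformed output is possible.
import { VertexAI } from "@google-cloud/vertexai";

const vertexAI = new VertexAI({ project: "your-project", location: "us-central1" });
const model = vertexAI.getGenerativeModel({ model: "gemini-1.5-pro" });

const result = await model.generateContent(
  'List three popular cookie recipes as a JSON array of objects with "name" and "mainIngredient" fields. Return only JSON.'
);

// You still have to extract, parse, and ideally validate the text yourself.
const text = result.response.candidates?.[0]?.content?.parts?.[0]?.text ?? "";
const recipes = JSON.parse(text);
```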
Using `responseSchema` for Strict Adherence
For robust integrations, Vertex AI allows defining a `responseSchema` within the `generationConfig`. This ensures Gemini validates its response against the specified schema, guaranteeing the presence of required fields and correct data types.
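A hedged sketch, again assuming the `@google-cloud/vertexai` SDK; the recipe schema is illustrative, and the uppercase type names follow the Vertex REST API's convention (the SDK also exposes enums for these):

```typescript
// A sketch of schema-constrained generation via responseSchema.
import { VertexAI } from "@google-cloud/vertexai";

const vertexAI = new VertexAI({ project: "your-project", location: "us-central1" });

const model = vertexAI.getGenerativeModel({
  model: "gemini-1.5-pro",
  generationConfig: {
    responseMimeType: "application/json",
    responseSchema: {
      type: "ARRAY",
      items: {
        type: "OBJECT",
        properties: {
          name: { type: "STRING" },
          mainIngredient: { type: "STRING", nullable: true }, // optional field
        },
        required: ["name"],
      },
    },
  },
});

const result = await model.generateContent("List three popular cookie recipes.");
const text = result.response.candidates?.[0]?.content?.parts?.[0]?.text ?? "[]";
console.log(JSON.parse(text));
```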
Some key differences from OpenAI:
- Schema Definition: While OpenAI leverages Pydantic and Zod in its SDKs for schema definition, Vertex AI uses a more direct JSON-like schema representation within the `responseSchema`. Also, unlike OpenAI's strict requirement for all fields to be required, Vertex AI supports `nullable: true` for optional fields directly within the schema.
- Integration: OpenAI's `response_format` parameter streamlines schema validation within the API call itself. With Vertex AI, you provide the schema via `responseSchema`, but client-side validation using libraries like `jsonschema` is recommended for strict enforcement. This split approach offers flexibility but necessitates extra validation steps in your application code.
By using `responseSchema` and client-side validation, developers can integrate Gemini seamlessly into their applications, relying on predictable, structured data for enhanced reliability and easier processing.
Tool & Function Calling
When strict schema adherence through `response_format` or similar mechanisms isn't available or suitable, tool/function calling offers a powerful alternative for obtaining structured data from LLMs. This approach leverages the LLM's ability to interact with external tools or functions, effectively extending its capabilities beyond text generation. The key advantage lies in the ability to define JSON schemas for the input parameters of these tools/functions. By encouraging the model to call a specific function with its output formatted as arguments for that function, developers gain more control over the structure and content of the data received.
Here's how it works:
- Define a Function with a Schema: Create a function within your application, specifying a JSON schema for its input parameters. This schema dictates the expected structure, data types, and any other constraints for the data the LLM should provide.
- Guide the LLM: Craft your prompt to guide the LLM towards invoking the defined function. Provide clear instructions and examples to demonstrate the desired behavior, prompting the model to generate the function call with appropriately formatted arguments.
- Parse and Process: When the LLM calls the function, your application receives the arguments, already structured according to the predefined schema. This eliminates the need for complex parsing or validation of the raw LLM output.
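A hedged sketch in TypeScript using OpenAI's Chat Completions tools API; the `record_order` function and its schema are illustrative:

```typescript
// A sketch of structured extraction via tool/function calling.
import OpenAI from "openai";

const client = new OpenAI();

const completion = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [
    { role: "user", content: "I'd like two large pepperoni pizzas delivered." },
  ],
  tools: [
    {
      type: "function",
      function: {
        name: "record_order",
        description: "Record a structured food order.",
        parameters: {
          type: "object",
          properties: {
            item: { type: "string" },
            size: { type: "string", enum: ["small", "medium", "large"] },
            quantity: { type: "integer" },
            delivery: { type: "boolean" },
          },
          required: ["item", "quantity"],
        },
      },
    },
  ],
  // Nudge the model to call our function rather than answer in prose.
  tool_choice: { type: "function", function: { name: "record_order" } },
});

// The arguments arrive as a JSON string shaped by the parameters schema.
const call = completion.choices[0].message.tool_calls?.[0];
if (call) {
  console.log(JSON.parse(call.function.arguments));
}
```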
Key Considerations
Handling Errors & Edge Cases
Context window limitations, content filtering, or network interruptions can lead to incomplete JSON responses. Always check the `finish_reason` (OpenAI) or similar indicators in the API response to detect truncated outputs. Implement retry mechanisms or fallback strategies to handle these cases gracefully. You can also abstract your API calls behind a wrapper that encapsulates this handling.
LLMs might refuse requests due to safety concerns or inability to understand the prompt. Handle refusals gracefully by displaying an appropriate message to the user or triggering alternative workflows. Examine the refusal reason provided by the API to gain insights into the cause of the refusal.
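A hedged sketch in Python covering both truncation and refusals, assuming the official `openai` SDK; the invoice schema is illustrative:

```python
# Defensive handling around a structured response: check finish_reason
# for truncation and message.refusal for declined requests.
import json

from openai import OpenAI

client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[{"role": "user", "content": "Extract the invoice details: ..."}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "invoice",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "vendor": {"type": "string"},
                    "total": {"type": "number"},
                },
                "required": ["vendor", "total"],
                "additionalProperties": False,
            },
        },
    },
)

choice = completion.choices[0]

if choice.finish_reason == "length":
    # Truncated output (max tokens / context window): retry or fall back.
    raise RuntimeError("Truncated response; retry with a higher token limit.")
if choice.message.refusal:
    # The model declined; surface the refusal message to the user.
    print(f"Model refused: {choice.message.refusal}")
else:
    print(json.loads(choice.message.content))
```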
Security
When using user-provided data within schemas or function arguments, be mindful of potential security risks. Maliciously crafted input could lead to unexpected behavior or vulnerabilities. Sanitize user inputs thoroughly before incorporating them into schemas or function calls to mitigate these risks.
Testing & Validation
Thorough testing is essential to ensure the reliability of LLM integrations with schema validation. Develop comprehensive test cases covering various scenarios, including valid and invalid inputs, edge cases, and model refusals.
Use a dedicated JSON Schema validation library, such as `ajv` for JavaScript or `jsonschema` for Python, within your test suite to verify the correctness of the LLM's output. Continuously monitor and validate the integration in production to detect and address any regressions.
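For instance, a sketch of such a test assuming Node's built-in test runner and `ajv` (schema and fixture are illustrative):

```typescript
// A schema-validation test for captured LLM outputs.
import test from "node:test";
import assert from "node:assert/strict";
import Ajv from "ajv";

const orderSchema = {
  type: "object",
  properties: {
    item: { type: "string" },
    quantity: { type: "integer", minimum: 1 },
  },
  required: ["item", "quantity"],
  additionalProperties: false,
};

test("LLM output conforms to the order schema", () => {
  const ajv = new Ajv();
  const validate = ajv.compile(orderSchema);

  // In a real suite this would be a recorded model response.
  const llmOutput = JSON.parse('{"item": "pizza", "quantity": 2}');

  assert.ok(validate(llmOutput), JSON.stringify(validate.errors));
});
```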
If you use an AI observability platform, like Modelmetry, you can easily send it LLM generation traces and run evaluations, such as JSON validation, on them to see how they evolve over time.
Prompt Engineering Techniques
Crafting effective prompts is crucial for guiding LLMs towards generating desired structured outputs. Here are some techniques:
- Explicit Instructions: Clearly state the expected JSON format and schema. Provide examples of correctly formatted output.
- Few-Shot Learning: Include a few examples of input and correctly formatted JSON output in the prompt.
- Schema in Prompt: When using an LLM without a `response_format`-like parameter, embed the schema within the prompt itself as a guide (see the sketch after this list).
- Function Call Guidance: For function calling, demonstrate the desired function call with example arguments in the prompt.
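For the schema-in-prompt technique, a minimal sketch (schema and wording are illustrative):

```typescript
// Embedding the target schema in the prompt when no response_format-like
// parameter exists; the reply must still be parsed and validated client-side.
const schema = {
  type: "object",
  properties: {
    sentiment: { type: "string", enum: ["positive", "neutral", "negative"] },
    confidence: { type: "number" },
  },
  required: ["sentiment", "confidence"],
};

const prompt = [
  "Classify the sentiment of the review below.",
  "Respond with ONLY a JSON object matching this JSON Schema:",
  JSON.stringify(schema, null, 2),
  "",
  'Review: "The battery life is fantastic, but the screen scratches easily."',
].join("\n");

// Send `prompt` to any LLM, then JSON.parse and validate the reply.
```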

Lazhar Ichir
Lazhar Ichir is the CEO and founder of Modelmetry, an LLM guardrails and observability platform that helps developers build secure and reliable modern LLM-powered applications.