How to Get Reliable Structured JSON from LLMs: A Practical Guide

Large Language Models (LLMs) are incredibly powerful, but getting them to return data in a specific, structured format like JSON can be a challenge. While you can simply ask the model for JSON, the output can often be inconsistent or malformed, especially when dealing with longer, more complex text inputs. This can break your application’s data processing pipeline.

Fortunately, modern LLM APIs, like Google’s Gemini, offer a robust solution: response schemas. By defining a clear schema, you can guide the model to generate valid, structured JSON every single time.

This guide will walk you through the evolution of a Python script designed to segment text and format it as a JSON-based Table of Contents, highlighting the shift from a simple prompt to a reliable, schema-driven approach.

The Goal: Structuring Text into a JSON Table of Contents

Imagine you have a long, plain text document, perhaps a historical manuscript or a report. Your goal is to use an LLM to read the text, break it down into meaningful thematic sections, assign a title to each section, and then output this structure in a clean JSON format.

For instance, given this input text:

1. བཅོམ་ལྡན་འདས་མ་ཤེས་རབ་ཀྱི་ཕ་རོལ་ཏུ་ཕྱིན་པའི་སྙིང་པོ། །@
2. ༄༅། །རྒྱ་གར་སྐད་དུ། བྷ་ག་བ་ཏི་པྲ་ཛྙ་པ་ར་མི་ཏཱྀ་ཧྲད་ཡ། @
3. བོད་སྐད་དུ། བཅོམ་ལྡན་འདས་མ་ཤེས་རབ་ཀྱི་ཕ་རོལ་ཏུ་ཕྱིན་པའི་སྙིང་པོ། @
4. བམ་པོ་གཅིག་གོ །@
5. བཅོམ་ལྡན་འདས་མ་ཤེས་རབ་ཀྱི་ཕ་རོལ་དུ་ཕྱིན་པ་ལ་ཕྱག་འཚལ་ལོ། །@
6. འདི་སྐད་བདག་གིས་ཐོས་པ་དུས་གཅིག་ན། @
7. བཅོམ་ལྡན་འདས་རྒྱལ་པོའི་ཁབ་བྱ་རྒོད་ཕུང་པོའི་རི་ལ་དགེ་སློང་གི་དགེ་འདུན་ཆེན་པོ་དང་། བྱང་ཆུབ་སེམས་དཔའི་དགེ་འདུན་ཆེན་པོ་དང་ཐབས་ཅིག་ཏུ་བཞུགས་ཏེ། @
...


The desired JSON output should look like this:

[
    {
        "title": "<མཚན་དང་ཕྱག་འཚལ་བ།>",
        "segments": [
            "1. བཅོམ་ལྡན་འདས་མ་ཤེས་རབ་ཀྱི་ཕ་རོལ་ཏུ་ཕྱིན་པའི་སྙིང་པོ། །@",
            "2. ༄༅། །རྒྱ་གར་སྐད་དུ། བྷ་ག་བ་ཏི་པྲ་ཛྙ་པ་ར་མི་ཏཱྀ་ཧྲད་ཡ། @",
            "3. བོད་སྐད་དུ། བཅོམ་ལྡན་འདས་མ་ཤེས་རབ་ཀྱི་ཕ་རོལ་ཏུ་ཕྱིན་པའི་སྙིང་པོ། @",
            "4. བམ་པོ་གཅིག་གོ །@",
            "5. བཅོམ་ལྡན་འདས་མ་ཤེས་རབ་ཀྱི་ཕ་རོལ་དུ་ཕྱིན་པ་ལ་ཕྱག་འཚལ་ལོ། །@"
        ]
    },
    {
        "title": "<གླེང་གཞི་དང་སྤྱན་རས་གཟིགས་ཀྱི་བཀའ་ལན།>",
        "segments": [
            "6. འདི་སྐད་བདག་གིས་ཐོས་པ་དུས་གཅིག་ན། @",
            "7. བཅོམ་ལྡན་འདས་རྒྱལ་པོའི་ཁབ་བྱ་རྒོད་ཕུང་པོའི་རི་ལ་དགེ་སློང་གི་དགེ་འདུན་ཆེན་པོ་དང་། བྱང་ཆུབ་སེམས་དཔའི་དགེ་འདུན་ཆེན་པོ་དང་ཐབས་ཅིག་ཏུ་བཞུགས་ཏེ། @",
            ...
        ]
    }
]


Approach 1: The Hopeful Request (and Its Pitfalls)

The initial approach was straightforward: tell the model to return JSON in the prompt and set the API’s response MIME type to application/json.

Here’s the initial Python code using the google-genai SDK:

import os
import json
from google import genai
from google.genai import types

def get_text_sections(text_content: str, language: str):
    # _get_prompt() is a helper function that constructs the detailed prompt
    prompt = _get_prompt(text_content, language)
    client = genai.Client(
        api_key=os.environ.get("GOOGLE_GEMINI_KEY"),
    )

    model = "gemini-flash-latest"
    contents = [
        types.Content(
            role="user",
            parts=[
                types.Part.from_text(text=prompt),
            ],
        ),
    ]
    # Set the desired response type to JSON
    generate_content_config = types.GenerateContentConfig(
        response_mime_type="application/json",
    )

    response = client.models.generate_content(
        model=model,
        contents=contents,
        config=generate_content_config,
    )
    return json.loads(response.text)


This method worked for short texts. However, as the input text grew longer, the model started returning improper or incomplete JSON, causing json.loads() to fail. This is a common issue—LLMs can struggle to maintain perfect syntax over extended outputs.
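To see the failure concretely, it helps to wrap the parse in a defensive helper that surfaces where the truncation broke the output. This is a minimal sketch; the `parse_model_json` helper is a hypothetical addition, not part of the original script:

```python
import json

def parse_model_json(raw_text: str):
    """Parse the model's raw text as JSON, showing where truncation broke it.

    Hypothetical helper: re-raises decode failures with the surrounding
    text so the caller can log, retry, or fall back to chunking.
    """
    try:
        return json.loads(raw_text)
    except json.JSONDecodeError as err:
        # Truncated output usually fails at the very end of the string.
        snippet = raw_text[max(0, err.pos - 40):err.pos + 40]
        raise ValueError(
            f"Malformed JSON near position {err.pos}: ...{snippet}"
        ) from err
```

Wrapping `json.loads()` like this at least turns a silent pipeline break into an actionable error, but it does not fix the root cause.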

Approach 2: The Definitive Solution with a Response Schema

To enforce a strict output structure, we can provide the model with a response schema. This schema acts as a template, ensuring the LLM’s output always conforms to the defined structure.

We used Pydantic to define our desired JSON structure in Python. Pydantic models are a clean and intuitive way to declare data schemas.

First, we define the models for a single section and the overall Table of Contents:

from pydantic import BaseModel, Field
from typing import List

class TocSection(BaseModel):
    """Defines the structure for a single section in the table of contents."""
    title: str = Field(description="A concise, descriptive title for the thematic section.")
    segments: List[str] = Field(description="An array of original, unmodified text segments belonging to this section.")

class TableOfContents(BaseModel):
    """Defines the root structure for the entire table of contents."""
    toc: List[TocSection] = Field(description="The complete Table of Contents as an array of sections.")

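It can be useful to inspect the JSON Schema that Pydantic derives from these models, since that is essentially the contract the API is asked to enforce. A quick sketch (here the section-title field is named `title`, matching the example output shown earlier):

```python
import json
from typing import List
from pydantic import BaseModel, Field

class TocSection(BaseModel):
    """Defines the structure for a single section in the table of contents."""
    title: str = Field(description="A concise, descriptive title for the thematic section.")
    segments: List[str] = Field(description="An array of original, unmodified text segments belonging to this section.")

class TableOfContents(BaseModel):
    """Defines the root structure for the entire table of contents."""
    toc: List[TocSection] = Field(description="The complete Table of Contents as an array of sections.")

# model_json_schema() is Pydantic v2's way of emitting a JSON Schema.
schema = TableOfContents.model_json_schema()
print(json.dumps(schema, indent=2))
```

Note that the `Field(description=...)` strings end up in the schema, so they double as instructions to the model about what each field should contain.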

Next, we update our function to pass this schema to the Gemini API:

def get_ai_toc(text_content: str, language: str):
    prompt = _get_prompt(text_content, language)
    client = genai.Client(
        api_key=os.environ.get("GOOGLE_GEMINI_KEY"),
    )

    model = "gemini-flash-latest"
    contents = [
        types.Content(
            role="user",
            parts=[
                types.Part.from_text(text=prompt),
            ],
        ),
    ]
    
    # Configure the response to use our Pydantic schema
    generate_content_config = types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=TableOfContents, # Here is the magic!
    )

    response = client.models.generate_content(
        model=model,
        contents=contents,
        config=generate_content_config,
    )
    
    # The library automatically parses the JSON into our Pydantic model
    parsed_data: TableOfContents = response.parsed
    text_sections = parsed_data.model_dump()
    return text_sections


By setting response_schema=TableOfContents, we instruct the model to strictly adhere to our defined structure. The client library handles the validation, and the response.parsed attribute conveniently gives us a populated Pydantic object, eliminating the need for manual parsing with json.loads().

A Critical Caveat: The Output Token Limit

Even with a response schema, there is a crucial limitation to be aware of: the model’s output token limit. If the generated JSON is too large and exceeds this limit, the output will be truncated, resulting in an incomplete and invalid JSON object.

This means that for very long texts, you may need to implement a chunking strategy:

  1. Break the input text into smaller, manageable parts.

  2. Process each part individually to generate a structured output.

  3. Combine the structured results from all parts.
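The three steps above can be sketched as follows, assuming segments are newline-separated and reusing a schema-driven function with the same shape as `get_ai_toc` above (the chunk size is an arbitrary assumption):

```python
from typing import Callable, List

def chunk_segments(segments: List[str], max_segments: int = 200) -> List[List[str]]:
    """Step 1: split the segment list into fixed-size chunks."""
    return [segments[i:i + max_segments] for i in range(0, len(segments), max_segments)]

def build_full_toc(text_content: str, language: str,
                   get_toc: Callable[[str, str], dict]) -> List[dict]:
    """Steps 2 and 3: process each chunk, then merge the partial results.

    `get_toc` is expected to behave like get_ai_toc above, returning
    {"toc": [...]} for the chunk of text it is given.
    """
    segments = text_content.splitlines()
    full_toc: List[dict] = []
    for chunk in chunk_segments(segments):
        result = get_toc("\n".join(chunk), language)  # one API call per chunk
        full_toc.extend(result["toc"])                # concatenate partial TOCs
    return full_toc
```

One caveat with naive chunking: a thematic section that straddles a chunk boundary will be split in two, so in practice you may want to overlap chunks or merge adjacent sections with similar titles.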

Key Takeaways for Reliable JSON Output

  1. Don’t Just Ask, Enforce: Simply asking for JSON in a prompt is unreliable. For production systems, always use a formal schema.

  2. Define a Clear Schema: Use tools like Pydantic to create clear, descriptive schemas for your desired JSON output. This serves as an unambiguous contract for the LLM.

  3. Use response_schema: When using APIs that support it (like Google’s Gemini), pass your schema directly in the generation configuration for guaranteed structural integrity.

  4. Mind the Token Limit: Be aware of the model’s output token limit. If your expected output is large, plan for chunking your input to avoid truncated results.

For a deeper look at the implementation, check out the full Python package developed for this purpose on GitHub: OpenPecha/wb_toc_creator.

@Tenzin_Gayche When extracting a TOC with the structure described above, I am encountering large texts of around 100K tokens. In that case, Gemini fails to return proper output, since the output would need more than 100K tokens while Gemini’s output token limit is 63K. Is there any workaround you can share? It would be very helpful.

It looks like this solution is inefficient and tricky for the LLM. I think it would be more efficient to ask Gemini to add tags like header and sub-heading to a given long chunk of text, overlapping the chunks and keeping the tags on the overlapped text (just get the starting and ending text of each section, then add the tag in Python by searching for it; if that fails, retry with the error log). The tags need to be parsable (e.g. <header>, <body>) so that you can build the table of contents from them.
The solution is very close to how Google Docs creates its TOC.
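To illustrate the idea, here is a rough sketch of parsing such tagged model output back into a TOC in Python. The tag vocabulary (`<header>` around section titles, plain lines as the body) is an assumption for the example:

```python
import re
from typing import List

def toc_from_tagged_text(tagged: str) -> List[dict]:
    """Rebuild a table of contents from model output annotated with <header> tags.

    Sketch only: the tag names and the flat header/body structure are
    illustrative assumptions, not a fixed format.
    """
    # re.split with a capture group yields:
    # [text_before_first_header, header1, body1, header2, body2, ...]
    pieces = re.split(r"<header>(.*?)</header>", tagged, flags=re.DOTALL)
    sections = []
    for i in range(1, len(pieces), 2):
        title = pieces[i].strip()
        body = pieces[i + 1].strip() if i + 1 < len(pieces) else ""
        sections.append({"title": title, "segments": body.splitlines()})
    return sections
```

Because the model only inserts short tags instead of regenerating the whole document as JSON, the output stays far smaller than the input, which sidesteps the output token limit.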

Interesting. I took a similar approach: I changed my prompt and output schema to return the start and end segment numbers of each section, then mapped them back to the original text programmatically using those segment numbers. Although I didn’t go through each section to check whether it matched the theme, I was able to verify that the segment numbers in the AI output matched the input segment numbers.
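A minimal sketch of that mapping step, assuming the model returns one `{"title", "start", "end"}` object per section with 1-based, inclusive segment numbers (the exact response shape is an assumption):

```python
from typing import List

def map_sections_to_segments(sections: List[dict], segments: List[str]) -> List[dict]:
    """Expand (start, end) segment-number ranges into the original segment text.

    `sections` is the model output, e.g. {"title": ..., "start": 1, "end": 5};
    `segments` is the original input, in order. Segment numbers are assumed
    1-based and inclusive, so the ranges are shifted for Python slicing.
    """
    toc = []
    for sec in sections:
        toc.append({
            "title": sec["title"],
            "segments": segments[sec["start"] - 1 : sec["end"]],
        })
    return toc
```

Since the model emits only titles and two integers per section, the output size no longer scales with the input text, and the segments themselves are guaranteed to be unmodified because they come straight from the original list.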

Also, whatever solution we come up with will have some big flaws anyway, as the problem itself would be complicated even for a human.

I agree. As of now I have strictly specified that we can’t go beyond 12 sections, as I have sometimes seen the AI generate sections containing only one line, especially with smaller texts. But with that 12-section instruction, it fails on long texts that explain more than 12 ideas.