Large Language Models (LLMs) are incredibly powerful, but getting them to return data in a specific, structured format like JSON can be a challenge. While you can simply ask the model for JSON, the output can often be inconsistent or malformed, especially when dealing with longer, more complex text inputs. This can break your application’s data processing pipeline.
Fortunately, modern LLM APIs, like Google’s Gemini, offer a robust solution: response schemas. By defining a clear schema, you can guide the model to generate valid, structured JSON every single time.
This guide will walk you through the evolution of a Python script designed to segment text and format it as a JSON-based Table of Contents, highlighting the shift from a simple prompt to a reliable, schema-driven approach.
The Goal: Structuring Text into a JSON Table of Contents
Imagine you have a long, plain text document, perhaps a historical manuscript or a report. Your goal is to use an LLM to read the text, break it down into meaningful thematic sections, assign a title to each section, and then output this structure in a clean JSON format.
For instance, given this input text:
1. བཅོམ་ལྡན་འདས་མ་ཤེས་རབ་ཀྱི་ཕ་རོལ་ཏུ་ཕྱིན་པའི་སྙིང་པོ། །@
2. ༄༅། །རྒྱ་གར་སྐད་དུ། བྷ་ག་བ་ཏི་པྲ་ཛྙ་པ་ར་མི་ཏཱྀ་ཧྲད་ཡ། @
3. བོད་སྐད་དུ། བཅོམ་ལྡན་འདས་མ་ཤེས་རབ་ཀྱི་ཕ་རོལ་ཏུ་ཕྱིན་པའི་སྙིང་པོ། @
4. བམ་པོ་གཅིག་གོ །@
5. བཅོམ་ལྡན་འདས་མ་ཤེས་རབ་ཀྱི་ཕ་རོལ་དུ་ཕྱིན་པ་ལ་ཕྱག་འཚལ་ལོ། །@
6. འདི་སྐད་བདག་གིས་ཐོས་པ་དུས་གཅིག་ན། @
7. བཅོམ་ལྡན་འདས་རྒྱལ་པོའི་ཁབ་བྱ་རྒོད་ཕུང་པོའི་རི་ལ་དགེ་སློང་གི་དགེ་འདུན་ཆེན་པོ་དང་། བྱང་ཆུབ་སེམས་དཔའི་དགེ་འདུན་ཆེན་པོ་དང་ཐབས་ཅིག་ཏུ་བཞུགས་ཏེ། @
...
The desired JSON output should look like this:
```json
[
  {
    "title": "<མཚན་དང་ཕྱག་འཚལ་བ།>",
    "segments": [
      "1. བཅོམ་ལྡན་འདས་མ་ཤེས་རབ་ཀྱི་ཕ་རོལ་ཏུ་ཕྱིན་པའི་སྙིང་པོ། །@",
      "2. ༄༅། །རྒྱ་གར་སྐད་དུ། བྷ་ག་བ་ཏི་པྲ་ཛྙ་པ་ར་མི་ཏཱྀ་ཧྲད་ཡ། @",
      "3. བོད་སྐད་དུ། བཅོམ་ལྡན་འདས་མ་ཤེས་རབ་ཀྱི་ཕ་རོལ་ཏུ་ཕྱིན་པའི་སྙིང་པོ། @",
      "4. བམ་པོ་གཅིག་གོ །@",
      "5. བཅོམ་ལྡན་འདས་མ་ཤེས་རབ་ཀྱི་ཕ་རོལ་དུ་ཕྱིན་པ་ལ་ཕྱག་འཚལ་ལོ། །@"
    ]
  },
  {
    "title": "<གླེང་གཞི་དང་སྤྱན་རས་གཟིགས་ཀྱི་བཀའ་ལན།>",
    "segments": [
      "6. འདི་སྐད་བདག་གིས་ཐོས་པ་དུས་གཅིག་ན། @",
      "7. བཅོམ་ལྡན་འདས་རྒྱལ་པོའི་ཁབ་བྱ་རྒོད་ཕུང་པོའི་རི་ལ་དགེ་སློང་གི་དགེ་འདུན་ཆེན་པོ་དང་། བྱང་ཆུབ་སེམས་དཔའི་དགེ་འདུན་ཆེན་པོ་དང་ཐབས་ཅིག་ཏུ་བཞུགས་ཏེ། @",
      ...
    ]
  }
]
```
Approach 1: The Hopeful Request (and Its Pitfalls)
The initial approach was straightforward: tell the model to return JSON in the prompt and set the API’s response MIME type to application/json.
Here’s the initial Python code using the google-genai library (the newer SDK that provides the `genai.Client` interface used below):

```python
import os
import json

from google import genai
from google.genai import types


def get_text_sections(text_content: str, language: str):
    # _get_prompt() is a helper function that constructs the detailed prompt
    prompt = _get_prompt(text_content, language)

    client = genai.Client(
        api_key=os.environ.get("GOOGLE_GEMINI_KEY"),
    )
    model = "gemini-flash-latest"
    contents = [
        types.Content(
            role="user",
            parts=[
                types.Part.from_text(text=prompt),
            ],
        ),
    ]

    # Set the desired response type to JSON
    generate_content_config = types.GenerateContentConfig(
        response_mime_type="application/json",
    )

    response = client.models.generate_content(
        model=model,
        contents=contents,
        config=generate_content_config,
    )
    return json.loads(response.text)
```
This method worked for short texts. However, as the input text grew longer, the model started returning malformed or incomplete JSON, causing json.loads() to fail. This is a common issue: LLMs struggle to maintain perfect syntax over long outputs.
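One way to make that failure mode visible (and survivable) is to wrap the parse in a try/except. This is a minimal sketch, not part of the original script; `parse_model_json` is a hypothetical helper:

```python
import json


def parse_model_json(raw_text: str):
    """Try to parse model output as JSON, returning None on malformed output.

    A truncated or malformed response raises json.JSONDecodeError, which we
    catch so the caller can retry or fall back instead of crashing.
    """
    try:
        return json.loads(raw_text)
    except json.JSONDecodeError:
        return None


# A complete response parses fine; a truncated one yields None.
good = parse_model_json('[{"title": "A", "segments": ["1."]}]')
bad = parse_model_json('[{"title": "A", "segments": ["1."')  # truncated output
```

Catching the error only contains the damage, though; it does nothing to make the model emit valid JSON in the first place, which is what the next approach addresses.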
Approach 2: The Definitive Solution with a Response Schema
To enforce a strict output structure, we can provide the model with a response schema. This schema acts as a template, ensuring the LLM’s output always conforms to the defined structure.
We used Pydantic to define our desired JSON structure in Python. Pydantic models are a clean and intuitive way to declare data schemas.
First, we define the models for a single section and the overall Table of Contents:
```python
from typing import List

from pydantic import BaseModel, Field


class TocSection(BaseModel):
    """Defines the structure for a single section in the table of contents."""

    section_title: str = Field(description="A concise, descriptive title for the thematic section.")
    segments: List[str] = Field(description="An array of original, unmodified text segments belonging to this section.")


class TableOfContents(BaseModel):
    """Defines the root structure for the entire table of contents."""

    toc: List[TocSection] = Field(description="The complete Table of Contents as an array of sections.")
```
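As a quick sanity check (not part of the original script), you can inspect the JSON Schema that Pydantic derives from these models, which is essentially the contract the API is steered by, and validate a sample payload against it. The models are repeated here so the snippet is self-contained; the sample data is invented for illustration:

```python
from typing import List

from pydantic import BaseModel, Field


class TocSection(BaseModel):
    """Defines the structure for a single section in the table of contents."""

    section_title: str = Field(description="A concise, descriptive title for the thematic section.")
    segments: List[str] = Field(description="An array of original, unmodified text segments belonging to this section.")


class TableOfContents(BaseModel):
    """Defines the root structure for the entire table of contents."""

    toc: List[TocSection] = Field(description="The complete Table of Contents as an array of sections.")


# Inspect the generated JSON Schema: field names, types, and descriptions
schema = TableOfContents.model_json_schema()

# Round-trip a sample payload through the models
sample = TableOfContents.model_validate(
    {"toc": [{"section_title": "Opening", "segments": ["1.", "2."]}]}
)
```

The `description` strings are not decoration: they are carried into the schema and give the model extra guidance about what belongs in each field.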
Next, we update our function to pass this schema to the Gemini API:
```python
def get_ai_toc(text_content: str, language: str):
    prompt = _get_prompt(text_content, language)

    client = genai.Client(
        api_key=os.environ.get("GOOGLE_GEMINI_KEY"),
    )
    model = "gemini-flash-latest"
    contents = [
        types.Content(
            role="user",
            parts=[
                types.Part.from_text(text=prompt),
            ],
        ),
    ]

    # Configure the response to use our Pydantic schema
    generate_content_config = types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=TableOfContents,  # Here is the magic!
    )

    response = client.models.generate_content(
        model=model,
        contents=contents,
        config=generate_content_config,
    )

    # The library automatically parses the JSON into our Pydantic model
    parsed_data: TableOfContents = response.parsed
    text_sections = parsed_data.model_dump()
    return text_sections
```
By setting response_schema=TableOfContents, we instruct the model to strictly adhere to our defined structure. The client library handles the validation, and the response.parsed attribute conveniently gives us a populated Pydantic object, eliminating the need for manual parsing with json.loads().
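One defensive detail worth keeping: if the SDK cannot validate the model output (for example, after truncation), `response.parsed` can come back as None rather than a populated model. A small guard keeps that failure loud; `require_parsed` is a hypothetical helper of ours, and the snippet stubs the response object so it runs without an API call:

```python
from types import SimpleNamespace


def require_parsed(response):
    """Fail loudly if structured parsing did not succeed.

    Assumes `response` exposes a `.parsed` attribute, as the google-genai
    response object does; returns the parsed value or raises ValueError.
    """
    if response.parsed is None:
        raise ValueError("Structured output missing: response did not parse against the schema")
    return response.parsed


# Works with any object exposing a `.parsed` attribute:
ok = require_parsed(SimpleNamespace(parsed={"toc": []}))
```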
A Critical Caveat: The Output Token Limit
Even with a response schema, there is a crucial limitation to be aware of: the model’s output token limit. If the generated JSON is too large and exceeds this limit, the output will be truncated, resulting in an incomplete and invalid JSON object.
This means that for very long texts, you may need to implement a chunking strategy:
1. Break the input text into smaller, manageable parts.
2. Process each part individually to generate a structured output.
3. Combine the structured results from all parts.
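The steps above can be sketched as a simple driver. This is an illustrative skeleton, not the package’s implementation: `process_chunk` stands in for a per-chunk LLM call such as get_ai_toc, and the chunk size is an arbitrary assumption:

```python
def chunk_lines(text: str, max_lines: int = 50):
    """Split the input into chunks of at most `max_lines` lines."""
    lines = text.splitlines()
    for start in range(0, len(lines), max_lines):
        yield "\n".join(lines[start:start + max_lines])


def build_full_toc(text: str, process_chunk, max_lines: int = 50):
    """Run `process_chunk` on each chunk and concatenate the resulting sections.

    `process_chunk` must return a dict shaped like {"toc": [...]}, matching
    the TableOfContents schema's model_dump() output.
    """
    full_toc = []
    for chunk in chunk_lines(text, max_lines):
        result = process_chunk(chunk)
        full_toc.extend(result["toc"])
    return {"toc": full_toc}
```

A real pipeline needs extra care at chunk boundaries, since a thematic section may straddle two chunks; overlapping chunks or merging adjacent sections with similar titles are common remedies.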
Key Takeaways for Reliable JSON Output
- Don’t Just Ask, Enforce: Simply asking for JSON in a prompt is unreliable. For production systems, always use a formal schema.
- Define a Clear Schema: Use tools like Pydantic to create clear, descriptive schemas for your desired JSON output. This serves as an unambiguous contract for the LLM.
- Use response_schema: When using APIs that support it (like Google’s Gemini), pass your schema directly in the generation configuration for guaranteed structural integrity.
- Mind the Token Limit: Be aware of the model’s output token limit. If your expected output is large, plan for chunking your input to avoid truncated results.
For a deeper look at the implementation, check out the full Python package developed for this purpose on GitHub: OpenPecha/wb_toc_creator.