Model Returns Invalid JSON Because Schema Was Described, Not Enforced

You asked for JSON matching a schema. Most calls return valid JSON, some return prose with JSON inside, some omit fields. This is the difference between describing a schema and enforcing it, and how to fix it at the API layer.

Your prompt says “respond in JSON matching this schema: {name: string, age: number, tags: string[]}.” 95% of the time, the model returns valid JSON. 3% of the time, it returns “Sure, here’s the JSON you requested: ```json\n{...}\n```”. 1% of the time, it omits the tags field. 1% of the time, it returns age: "thirty" because the schema didn’t enforce types. In production this means 5% of calls either fail to parse or parse into the wrong shape, your downstream code crashes, and you scramble to add try-catch everywhere. The model didn’t disobey. It followed an English description of the schema, not an enforced contract.

Modern model APIs offer real schema enforcement (OpenAI structured outputs, Anthropic tool use, Gemini response schema). If you’re still putting the schema in a prompt and praying, you’re leaving free reliability on the table.

Common causes

1. Schema described in English instead of declared

Return JSON: {name, age, tags}.

The model treats this as soft guidance: sometimes it adds wrapper text, sometimes it omits fields. Without API-level enforcement, the output is unreliable.

How to spot it: Look for a schema written as natural-language prose in the prompt, with no response_format=, tools=, or response_schema parameter on the API call.

2. Markdown code-block wrapping

Model returns:

Here is the JSON:
```json
{"name": "Alice"}
```

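Your parser reads the whole string and json.loads() (or JSON.parse()) fails. Nothing stops the preamble unless you constrain the output format at the API level. OpenAI’s JSON mode (response_format={"type":"json_object"}) does constrain output to parseable JSON, but it requires the word “JSON” to appear in the messages and still says nothing about which fields you get.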

How to spot it: Output contains backticks or “Here is” / “Sure” / “Of course” prefix.
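A rough detector for this failure, as a sketch: it flags output that contains a Markdown fence or doesn’t both start and end with JSON object/array delimiters. `looks_wrapped` is a hypothetical helper, and it is a heuristic (a JSON string value that happens to contain backticks would be a false positive).

```python
def looks_wrapped(text: str) -> bool:
    """Heuristic: True if the response likely has prose or Markdown around the JSON."""
    stripped = text.strip()
    if "```" in stripped:
        return True  # Markdown code fence somewhere in the output
    # Bare JSON output should begin and end with object or array delimiters
    starts_ok = stripped.startswith(("{", "["))
    ends_ok = stripped.endswith(("}", "]"))
    return not (starts_ok and ends_ok)
```

Counting how often this fires per model and prompt gives you a baseline before and after switching to structured outputs.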

3. Schema describes types but doesn’t enforce them

Prompt says {age: number} but the model returns "age": "30". A prose description leaves room for ambiguity, and to the model “30” is a reasonable way to write a number.

How to spot it: Validate output with a strict JSON Schema validator. Type mismatches mean the schema wasn’t enforced.
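A caveat if Pydantic (v2 assumed) is your validator: by default it runs in lax mode and coerces "30" to 30, which hides exactly this failure. A minimal sketch, using the name/age/tags fields from the opening example, of catching it with strict validation:

```python
from pydantic import BaseModel, ValidationError

class User(BaseModel):
    name: str
    age: int
    tags: list[str]

raw = {"name": "Alice", "age": "30", "tags": ["fast"]}

# Default (lax) mode quietly coerces the numeric string "30" to the int 30.
lax_user = User.model_validate(raw)

# Strict mode refuses the coercion and reports the mismatch on "age".
strict_errors = []
try:
    User.model_validate(raw, strict=True)
except ValidationError as e:
    strict_errors = e.errors()
```

To make strictness the default for a model, set `model_config = ConfigDict(strict=True)` on the class instead of passing `strict=True` per call.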

4. Optional fields modeled as required

Schema says {name, email, phone}. User input only had name. Model returns {"name": "Alice", "email": null, "phone": null} or omits the fields. Downstream code expecting strings gets nulls.

How to spot it: Crashes on null field access, or KeyErrors when fields are omitted.

5. Nested objects flattened or expanded

Schema: {user: {name, age}}. Model sometimes returns {name, age} directly, or expands to {user_name, user_age}. Nesting got lost in translation.

How to spot it: Top-level keys don’t match what was specified.
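A stdlib-only sketch of that check: compare the top-level keys of the parsed response with the keys the schema declares and report both directions. `key_report` is a hypothetical helper name.

```python
def key_report(data: dict, expected: set[str]) -> dict[str, list[str]]:
    """Compare top-level keys of a parsed response against the expected set."""
    actual = set(data)
    return {
        "missing": sorted(expected - actual),
        "unexpected": sorted(actual - expected),
    }

# Expected {"user": {...}} but the model flattened the nesting.
report = key_report({"user_name": "Alice", "user_age": 30}, {"user"})
```

A non-empty "unexpected" list alongside a "missing" nested key is the signature of flattening.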

6. Arrays collapse to a comma-separated string

Schema: tags: string[]. Model returns "tags": "blue, red, fast". String, not array. Common when input contains comma-separated values.

How to spot it: Type-checking each field against schema reveals string instead of array.
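A minimal stdlib sketch of per-field type checking at the top level; the `SPEC` mapping and `type_mismatches` helper are illustrative stand-ins for a full JSON Schema validator. Note the `bool` guard: in Python, `True` is an instance of `int`.

```python
SPEC = {"name": str, "age": int, "tags": list}

def type_mismatches(data: dict, spec: dict) -> dict[str, str]:
    """Map each present field whose value has the wrong type to its actual type name."""
    bad = {}
    for field, expected in spec.items():
        if field not in data:
            continue  # missing keys are a separate check
        value = data[field]
        # bool is a subclass of int in Python, so reject it explicitly for int fields
        wrong_bool = expected is int and isinstance(value, bool)
        if wrong_bool or not isinstance(value, expected):
            bad[field] = type(value).__name__
    return bad

result = type_mismatches({"name": "Alice", "age": 30, "tags": "blue, red, fast"}, SPEC)
```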

7. Enum fields not honored

Schema: sentiment: "positive" | "neutral" | "negative". Model returns "sentiment": "very positive" or "sentiment": "neg". Enum was a hint, not a constraint.

How to spot it: The returned value isn’t in the allowed set.
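On the declaration side, a closed set belongs in the schema as a JSON Schema `enum` so structured-output APIs can enforce it; on the validation side, check membership. A sketch, where `SENTIMENT_SCHEMA` and `check_sentiment` are hypothetical names:

```python
SENTIMENT_SCHEMA = {
    "type": "string",
    "enum": ["positive", "neutral", "negative"],
}

def check_sentiment(value: str) -> str:
    """Return the value if it is in the declared enum, else raise ValueError."""
    allowed = SENTIMENT_SCHEMA["enum"]
    if value not in allowed:
        raise ValueError(f"sentiment {value!r} not in {allowed}")
    return value
```

Keeping the allowed values in the schema dict means the same list feeds both the API request and the check.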

Shortest path to fix

Step 1: Use real structured output, not prompt-described schema

OpenAI (Python):

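from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()  # reads OPENAI_API_KEY from the environment

class User(BaseModel):
    name: str
    age: int
    tags: list[str]

resp = client.beta.chat.completions.parse(
    model="gpt-5.5",
    messages=[...],
    response_format=User,
)
message = resp.choices[0].message
user = message.parsed  # Already a User instance (None if the model refused)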

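The API enforces the schema at the token-sampling layer: tokens that would violate the schema are masked out during generation, so invalid output is ruled out by construction. The main exception is a refusal, in which case message.parsed is None and message.refusal holds the model’s explanation, so check for it before using the result.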

Step 2: Anthropic — use tool definition as schema

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

tools = [{
    "name": "extract_user",
    "input_schema": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "age": {"type": "integer"},
            "tags": {"type": "array", "items": {"type": "string"}}
        },
        "required": ["name", "age", "tags"]
    }
}]

msg = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,  # required by the Messages API; size it to your output
    tools=tools,
    tool_choice={"type": "tool", "name": "extract_user"},
    messages=[...],
)
data = msg.content[0].input  # dict shaped by input_schema; still validate it

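Forcing the tool with tool_choice makes the model answer with a tool_use block whose input is JSON shaped by input_schema. Treat that as strong guidance rather than a hard guarantee, and still validate the dict before trusting it.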
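Indexing `content[0]` works when the tool is forced, but selecting the block by type is sturdier if you later relax `tool_choice` and the model writes a text block first. A minimal sketch; the `SimpleNamespace` objects stand in for the SDK’s response blocks, which expose `.type`, `.name`, and `.input`, and `tool_input` is a hypothetical helper.

```python
from types import SimpleNamespace

def tool_input(content, tool_name: str) -> dict:
    """Return the input dict of the first tool_use block for tool_name."""
    for block in content:
        if block.type == "tool_use" and block.name == tool_name:
            return block.input
    raise ValueError(f"no tool_use block for {tool_name!r}")

# Stand-in for msg.content when the model writes some text before calling the tool.
content = [
    SimpleNamespace(type="text", text="Extracting the user now."),
    SimpleNamespace(type="tool_use", name="extract_user",
                    input={"name": "Alice", "age": 30, "tags": []}),
]
data = tool_input(content, "extract_user")
```

With the real SDK you would call `tool_input(msg.content, "extract_user")`.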

Step 3: Gemini — pass schema via response_schema

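import json, os
import google.generativeai as genai  # legacy SDK; newer code uses google-genai
import typing_extensions as typing

class User(typing.TypedDict):
    name: str
    age: int
    tags: list[str]

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("YOUR_GEMINI_MODEL")  # placeholder model name
resp = model.generate_content(
    prompt,  # your instruction string
    generation_config=genai.GenerationConfig(
        response_mime_type="application/json", response_schema=User),
)
user = json.loads(resp.text)  # JSON text constrained to the schema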

Step 4: When stuck on a model without structured output, validate and retry

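import json
from pydantic import ValidationError

# User is the Pydantic model from Step 1; call_llm is your own model-call wrapper.
def get_json(prompt, max_retries=3):
    for _ in range(max_retries):
        out = call_llm(prompt)
        try:
            return User.model_validate(json.loads(out))  # validated User instance
        except (json.JSONDecodeError, ValidationError) as e:
            # Feed the concrete error back so the next attempt can correct it
            prompt += f"\n\nPrevious response failed validation: {e}. Return valid JSON only."
    raise RuntimeError(f"No valid JSON after {max_retries} attempts")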

Passing the validation error back gives the model a concrete target to fix; in practice it often self-corrects on the second attempt.

Step 5: Use json_object mode as a fallback, not a guarantee

response_format={"type": "json_object"}

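This guarantees syntactically valid JSON, which stops prose wrapping, but it doesn’t enforce your schema. OpenAI also rejects the request unless the word “JSON” appears somewhere in the messages. Still validate the parsed object.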

Step 6: Pre-extract JSON if model insists on wrapping

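import json

def extract_json(text):
    """Return the first substring of text that parses as a JSON object or array."""
    decoder = json.JSONDecoder()
    for i, ch in enumerate(text):
        if ch in "{[":
            try:
                return text[i:decoder.raw_decode(text, i)[1]]
            except json.JSONDecodeError:
                continue  # no valid JSON starts here; try the next bracket
    raise ValueError("No JSON found")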

Cheap defense for models that won’t stop adding “Here is your JSON:”.

Step 7: Log schema violations and tune

metrics.increment("schema_violation", tags={"field": field_name, "type": "missing"})

If a particular field is missed 5% of the time, that field’s description in the schema needs work — clarify or add example values.
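To get a field name and violation type for that counter, one option with Pydantic v2 is to walk `ValidationError.errors()`, whose entries carry a `loc` path and a `type` such as `missing`. In this sketch a `collections.Counter` stands in for your metrics client (the `metrics` object above is not a specific library), and `record_violations` is a hypothetical helper.

```python
from collections import Counter
from pydantic import BaseModel, ValidationError

class User(BaseModel):
    name: str
    age: int
    tags: list[str]

violations = Counter()

def record_violations(data: dict) -> None:
    """Validate data and count each schema violation by (field path, error type)."""
    try:
        User.model_validate(data)
    except ValidationError as e:
        for err in e.errors():
            field = ".".join(str(part) for part in err["loc"])
            violations[(field, err["type"])] += 1

record_violations({"name": "Alice", "age": 30})            # tags missing
record_violations({"name": "Bob", "age": 41, "tags": "x"})  # tags not a list
```

Sorting the counter by frequency points straight at the fields whose descriptions need work.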

When this is not on you

Some smaller models can’t reliably follow JSON schemas no matter how you prompt them. If you must use a small model, have it generate JSON-like output, then validate and repair it downstream, and accept some loss rate.
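A sketch of such a downstream repair pass for two of the failure modes above: split comma-separated strings into lists and convert digit-only strings to integers, then validate again. The `repair` helper and its rules are assumptions; add only rules that match errors you actually see in your logs, and count whatever still fails validation as loss.

```python
def repair(data: dict) -> dict:
    """Best-effort fixes for common small-model slips; returns a new dict."""
    fixed = dict(data)
    tags = fixed.get("tags")
    if isinstance(tags, str):
        # "blue, red, fast" -> ["blue", "red", "fast"]
        fixed["tags"] = [t.strip() for t in tags.split(",") if t.strip()]
    age = fixed.get("age")
    if isinstance(age, str) and age.strip().isdecimal():
        # "30" -> 30; words like "thirty" are left alone for validation to reject
        fixed["age"] = int(age.strip())
    return fixed

out = repair({"name": "Alice", "age": "30", "tags": "blue, red, fast"})
```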

Easy to misdiagnose as

“Bad prompting.” More verbose schema descriptions in the prompt help marginally. The real fix is API-level enforcement. Stop tuning prompts when the answer is “switch to structured outputs.”

Prevention

  • Default to structured-output APIs (OpenAI parse, Anthropic tools, Gemini response_schema).
  • Define schemas as code (Pydantic, Zod) — one source of truth for client and validator.
  • Always run schema validation on parsed JSON, even with structured outputs, as a defense-in-depth check.
  • Log validation failures and retry with error feedback.
  • For models without structured-output support, add an extract_json step as a defensive layer.
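A sketch of the single-source-of-truth idea with Pydantic v2: generate the JSON Schema from the same model you validate with and drop it into the Anthropic tool definition from Step 2. `model_json_schema()` also emits `title` keys, which JSON Schema treats as annotations. `parse_tool_input` is a hypothetical helper.

```python
from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int
    tags: list[str]

# One definition feeds both the API request and the response validator.
tools = [{
    "name": "extract_user",
    "input_schema": User.model_json_schema(),
}]

def parse_tool_input(raw: dict) -> User:
    """Validate the tool input dict returned by the API against the same model."""
    return User.model_validate(raw)
```

Changing a field on `User` now updates the request schema and the validator together.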

FAQ

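  • Does structured output cost more? A little. The schema or tool definition counts toward input tokens (Anthropic also adds a tool-use system prompt), and OpenAI notes that the first request with a new schema has extra latency while the schema is processed; later requests with the same schema don’t. Compared with failed parses and retries, it’s cheap reliability.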
  • What about nested schemas — 5 levels deep? Structured outputs handle nesting up to provider’s depth limit (usually 5-10 levels). Deeper than that — flatten.
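If you’re unsure how deep a schema goes, a rough check helps before you hit a provider limit. This sketch counts nested object levels through `properties` and array `items`; it ignores `$ref`/`$defs`, and providers may count depth differently, so treat the number as an estimate. `object_depth` is a hypothetical helper.

```python
def object_depth(schema: dict) -> int:
    """Count nested object levels in a JSON Schema dict (1 for a flat object)."""
    if schema.get("type") == "array":
        return object_depth(schema.get("items", {}))
    if schema.get("type") != "object":
        return 0
    children = schema.get("properties", {}).values()
    return 1 + max((object_depth(child) for child in children), default=0)

nested = {
    "type": "object",
    "properties": {
        "user": {
            "type": "object",
            "properties": {"name": {"type": "string"}},
        },
        "tags": {"type": "array", "items": {"type": "string"}},
    },
}
```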

Tags: #Prompt engineering #Troubleshooting #llm-output #json #structured-output #schema-validation