Agent Output Not Machine-Parseable Downstream

Your agent wraps JSON in a markdown fence or adds prose, so the downstream parser crashes. Fix it for good with native structured outputs (June 2026).

Published: May 25, 2026 Updated: Jun 17, 2026 Author: AI Productivity Guide Team

Your LangGraph pipeline has an analysis agent that is supposed to output a JSON object like {"issues": [...], "severity": "high"}. Downstream, a routing agent calls json.loads(output) and crashes with json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0). The analysis agent actually returned:

Here's my analysis:

```json
{"issues": ["missing null check"], "severity": "high"}
```

Let me know if you need more detail.

The JSON is there. It is just buried in a markdown code fence and surrounded by prose. This is the single most common output-format failure in multi-agent pipelines, and it compounds: each downstream parse failure either crashes the run or silently feeds garbage data further down the chain.
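
The failure is easy to reproduce locally. The sketch below rebuilds the wrapped output from the example above and feeds it to `json.loads`; the helper name `try_parse` is illustrative and not part of any pipeline.

```python
import json

FENCE = "`" * 3  # built programmatically so this example can live inside a fence

# The raw agent output from the example: valid JSON wrapped in prose and a fence.
raw_output = (
    "Here's my analysis:\n\n"
    f"{FENCE}json\n"
    '{"issues": ["missing null check"], "severity": "high"}\n'
    f"{FENCE}\n\n"
    "Let me know if you need more detail."
)

def try_parse(text: str):
    """Return (parsed, error_message); exactly one of the two is None."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as exc:
        return None, str(exc)

parsed, error = try_parse(raw_output)
print(error)  # Expecting value: line 1 column 1 (char 0)
```

The parser gives up at the very first character because the string starts with prose, which is why the error always points at column 1.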

TL;DR — fastest reliable fix

Stop asking the model to “please return JSON” in a prompt and start using the provider’s native structured outputs, which constrain generation to your schema at the token level so non-conforming text literally cannot be produced. As of June 2026 both major providers ship this:

OpenAI: client.chat.completions.parse(...) (now GA, no .beta prefix) with a Pydantic model passed to response_format.
Anthropic: native structured outputs reached GA after a public beta that opened on November 14, 2025. Use client.messages.parse(...) with a Pydantic model, or the raw output_config={"format": {"type": "json_schema", ...}} parameter. The old “force a tool call” workaround still works but is no longer the recommended path.

If you cannot change the call site today, drop in the extract_json fallback in Step 2 to recover fenced/prose-wrapped JSON, then schedule the real fix.

Which bucket are you in?

| Symptom you observe | Most likely cause | Jump to |
| --- | --- | --- |
| `Expecting value: line 1 column 1` on every Nth output | Prose/fences around the JSON | Causes 1, 2 |
| Worked for months, started failing on a date | Auto model-version bump | Cause 3 |
| Only large result sets fail; output ends mid-token | Token-limit truncation | Cause 4 |
| Failures only on later conversation turns | Context drift to chatty tone | Cause 5 |
| `json.loads` succeeds but `KeyError`/`None` downstream | Schema drift, parser/agent disagree | Cause 6 |

Common causes

1. System prompt requests JSON but doesn’t forbid prose

The prompt says “respond with JSON” but does not say “respond with ONLY JSON, no other text.” LLMs default to conversational framing — they add preambles (“Here’s the result:”), postambles (“Let me know if…”), and markdown fences even when asked for raw JSON.

How to spot it: Print the raw string content of the last 10 agent outputs before any parsing. Count how many contain characters before the first { or after the last }. If more than 2 of 10 have leading or trailing text, the prompt is not strict enough — and a prompt alone will never get you to zero.
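
The count described above can be automated. This is a minimal sketch, assuming you have the raw output strings in a list; `wrapping_report` is a hypothetical helper name.

```python
def wrapping_report(outputs: list[str]) -> dict:
    """Count outputs with non-whitespace text before the first '{' or after the last '}'."""
    wrapped = 0
    for text in outputs:
        start, end = text.find("{"), text.rfind("}")
        if start == -1 or end == -1:
            wrapped += 1  # no JSON object at all also counts as a failure
            continue
        leading = text[:start].strip()
        trailing = text[end + 1:].strip()
        if leading or trailing:
            wrapped += 1
    return {"total": len(outputs), "wrapped": wrapped}

sample = [
    '{"a": 1}',
    'Here: {"a": 1}',
    '  {"a": 1}\n',
    '{"a": 1} Thanks!',
]
report = wrapping_report(sample)
print(report)
```

Surrounding whitespace is ignored on purpose, since `json.loads` tolerates it; only real characters outside the braces are counted.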

2. No schema enforcement — relying on the prompt alone

The pipeline relies entirely on prompt instructions to produce structured output. There is no schema validation, no Pydantic model, and no structured-output API call. The model’s compliance is probabilistic, not enforced. This is the root cause behind most of the others.

How to spot it: Check whether the agent call passes response_format (OpenAI) or output_config/output_format (Anthropic). If the model is invoked with a plain string prompt and the response is read as a string, there is no enforcement.

3. A model version change breaks a previously reliable format

Your pipeline worked for months on a pinned snapshot. After an automatic model update — for example moving off a deprecated GPT-4-era snapshot onto GPT-5.5, or a Claude point release — the same prompt now sometimes produces code-fenced JSON. Different checkpoints have different formatting tendencies, and “worked before” is not a guarantee for a new checkpoint.

How to spot it: Check when format failures started. If they correlate with a model version change or a provider infrastructure update, format regression in the new model is the cause. Always pin a dated snapshot in production so updates are deliberate, not silent.

4. Long output triggers partial JSON via truncation

The agent is asked to return a large JSON array. The output hits max_tokens mid-array. The result is valid JSON up to a point, then cut off: ["item1", "item2", "ite — which json.loads() rejects.

How to spot it: Check whether parse failures correlate with large result sets. If the token count of failed outputs sits right at your max_tokens ceiling, truncation is the cause. Check the stop flag on the response: stop_reason == "max_tokens" (Anthropic) or finish_reason == "length" (OpenAI) confirms it.
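
One way to apply this check in a log-analysis script is to label each failed output by its stop flag. The sketch below assumes you logged the raw string and the provider's stop flag together; `classify_failure` is a hypothetical name.

```python
import json
from typing import Optional

# Anthropic reports "max_tokens" in stop_reason; OpenAI reports "length" in finish_reason.
TRUNCATION_REASONS = {"max_tokens", "length"}

def classify_failure(raw: str, stop_reason: Optional[str]) -> str:
    """Label an output as 'ok', 'truncated', or 'malformed'."""
    try:
        json.loads(raw)
        return "ok"
    except json.JSONDecodeError:
        pass
    if stop_reason in TRUNCATION_REASONS:
        return "truncated"
    return "malformed"

print(classify_failure('["item1", "item2", "ite', "max_tokens"))
print(classify_failure('Here you go: {"a": 1}', "end_turn"))
```

Separating the two labels keeps truncation from being misdiagnosed as a prompt problem.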

5. Multi-turn conversation accumulates non-JSON turns

In a multi-turn session, the agent emits valid JSON on turn 3 but starts adding commentary on turn 8 as the conversation grows. The model is fitting to the conversational tone of earlier turns in the context window.

How to spot it: Log the turn number on which parse failures occur. If failures cluster on later turns, context drift is causing format regression. Re-stating the format contract in each user message (not only once in the system prompt) usually reduces the drift.
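
A small tally makes the clustering visible. This sketch assumes a log of (turn number, raw output) pairs; the function name and log shape are illustrative.

```python
import json
from collections import Counter

def failures_by_turn(log: list[tuple[int, str]]) -> Counter:
    """Count parse failures per conversation turn."""
    counts: Counter = Counter()
    for turn, raw in log:
        try:
            json.loads(raw)
        except json.JSONDecodeError:
            counts[turn] += 1
    return counts

log = [
    (3, '{"ok": true}'),
    (8, 'Sure! {"ok": true}'),
    (8, 'Here you go: {}'),
    (9, '{"ok": true} Hope that helps'),
]
counts = failures_by_turn(log)
print(sorted(counts.items()))
```

If the counts are concentrated on high turn numbers, as in this toy log, drift is the likely cause rather than the prompt itself.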

6. Downstream parser assumes one format, the agent changed format

The schema evolved: the agent now returns {"result": {"issues": [...]}} (nested) but the parser still reads data["issues"] (flat). There is no JSON error, just a KeyError from data["issues"], or a silent None from data.get("issues") where a list was expected.

How to spot it: Compare the schema in the parsing code against the schema the agent returns today. Drift between the two is a format mismatch even though the JSON is well-formed.
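
The comparison can be done mechanically on top-level keys. This is a rough sketch that only inspects one level of nesting; `key_drift` is a hypothetical helper, and the expected key set is whatever your parser actually reads.

```python
def key_drift(expected_keys: set[str], payload: dict) -> dict:
    """Report keys the parser expects but the payload lacks, and vice versa."""
    actual = set(payload)
    return {
        "missing": sorted(expected_keys - actual),
        "unexpected": sorted(actual - expected_keys),
    }

# The parser still expects the flat shape; the agent now returns the nested one.
drift = key_drift({"issues", "severity"}, {"result": {"issues": []}})
print(drift)
```

A non-empty "missing" list on well-formed JSON is the signature of schema drift rather than a formatting failure.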

Shortest path to fix

Step 1: Use native structured outputs instead of prompt-based formatting

OpenAI — parse() is GA as of June 2026, so use it directly (no client.beta...):

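from openai import OpenAI
from pydantic import BaseModel

class OutputFormatError(Exception):
    """Raised when the model refuses or returns unusable output."""

class AnalysisResult(BaseModel):
    issues: list[str]
    severity: str
    confidence: float

client = OpenAI()  # reads OPENAI_API_KEY from the environment

completion = client.chat.completions.parse(
    model="gpt-5.5",
    messages=messages,  # your existing chat history
    response_format=AnalysisResult,
)

msg = completion.choices[0].message
if msg.refusal:                 # safety refusals come back here, not as JSON
    raise OutputFormatError(msg.refusal)
result = msg.parsed             # typed AnalysisResult object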

The newer Responses API uses client.responses.parse(..., text_format=AnalysisResult) and exposes the object as response.output_parsed — pick whichever API your codebase already uses.

Note OpenAI’s strict-mode rules: every field must be listed in required and the schema must set additionalProperties: false. To model a genuinely optional field, make its type nullable (for example confidence: float | None) rather than omitting it.
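
If you hand-write JSON schemas instead of passing a Pydantic model, the two rules above can be linted before the request is sent. The checker below is a sketch under simplifying assumptions: it only walks `properties` and array `items`, ignores `anyOf` and `$defs`, and `strict_mode_problems` is a hypothetical name, not an SDK function.

```python
def strict_mode_problems(schema: dict, path: str = "$") -> list[str]:
    """Recursively list object schemas that break the two strict-mode rules."""
    problems: list[str] = []
    if schema.get("type") == "object":
        props = schema.get("properties", {})
        required = set(schema.get("required", []))
        missing = sorted(set(props) - required)
        if missing:
            problems.append(f"{path}: not in required: {missing}")
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be false")
        for name, sub in props.items():
            problems.extend(strict_mode_problems(sub, f"{path}.{name}"))
    elif schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        problems.extend(strict_mode_problems(schema["items"], f"{path}[]"))
    return problems

strict_schema = {
    "type": "object",
    "properties": {
        "issues": {"type": "array", "items": {"type": "string"}},
        "severity": {"type": "string"},
        # Optional in practice, but still listed in required and made nullable.
        "confidence": {"type": ["number", "null"]},
    },
    "required": ["issues", "severity", "confidence"],
    "additionalProperties": False,
}
loose_schema = {"type": "object", "properties": {"a": {"type": "string"}}}

print(strict_mode_problems(strict_schema))
print(strict_mode_problems(loose_schema))
```

The nullable `confidence` field shows the pattern from the note: the key is always present, and "no value" is expressed as null.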

Anthropic — native structured outputs (GA in 2026; the public beta header structured-outputs-2025-11-13 is no longer required):

from anthropic import Anthropic
from pydantic import BaseModel

class AnalysisResult(BaseModel):
    issues: list[str]
    severity: str

client = Anthropic()
response = client.messages.parse(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=messages,
    output_format=AnalysisResult,
)
result = response.parsed_output   # typed AnalysisResult object

If you are not on a recent SDK, pass the raw parameter instead and read response.content[0].text, which is a schema-conforming JSON string unless the response was refused or cut off at max_tokens:

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=messages,
    output_config={
        "format": {
            "type": "json_schema",
            "schema": {
                "type": "object",
                "properties": {
                    "issues": {"type": "array", "items": {"type": "string"}},
                    "severity": {"type": "string", "enum": ["low", "medium", "high"]},
                },
                "required": ["issues", "severity"],
                "additionalProperties": False,
            },
        }
    },
)

Anthropic compiles your schema into a grammar and constrains generation token by token, so the model cannot emit text that violates the schema. Two practical notes: the first request for a new schema pays a one-time grammar-compilation latency, then the compiled grammar is cached for 24 hours from last use; and additionalProperties: false is mandatory on every object.

Tool-calling (forcing a single tool via tool_choice={"type": "tool", "name": "submit_analysis"} and reading response.content[0].input) still works and is fine if you are already wired up that way, but native structured outputs are the cleaner default now.

Step 2: Add a JSON extraction wrapper as a fallback

Use this only where native structured outputs are unavailable (a third-party gateway, a local model). It is a band-aid for fence/prose wrapping, not a cure for truncation or schema drift.

import re, json

def extract_json(text: str) -> dict:
    # 1. Try a direct parse first
    try:
        return json.loads(text.strip())
    except json.JSONDecodeError:
        pass

    # 2. Strip a markdown code fence (json / JSON / no language tag)
    fenced = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", text, re.DOTALL | re.IGNORECASE)
    if fenced:
        try:
            return json.loads(fenced.group(1))
        except json.JSONDecodeError:
            pass

    # 3. Fall back to the outermost brace pair
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end != -1 and end > start:
        try:
            return json.loads(text[start:end + 1])
        except json.JSONDecodeError:
            pass

    raise ValueError(f"Could not extract JSON from agent output: {text[:200]!r}")
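
The outermost-brace fallback fails when the surrounding prose itself contains braces. One possible alternative, sketched here and not part of the wrapper above, is to try the standard library's `json.JSONDecoder.raw_decode` at each opening brace; `first_json_object` is a hypothetical name.

```python
import json

def first_json_object(text: str) -> dict:
    """Return the first substring that decodes as a JSON object."""
    decoder = json.JSONDecoder()
    pos = text.find("{")
    while pos != -1:
        try:
            # raw_decode parses one JSON value starting at pos and ignores what follows.
            obj, _end = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            pass
        else:
            if isinstance(obj, dict):
                return obj
        pos = text.find("{", pos + 1)
    raise ValueError("no JSON object found")

text = 'Use {braces} carefully. {"issues": [], "severity": "low"} Done {x}.'
print(first_json_object(text))
```

Here the span from the first `{` to the last `}` is not valid JSON, but scanning brace by brace still finds the real object. The trade-off is that it returns the first decodable object, which may not be the one you want if the model emits several.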

Step 3: Harden the system prompt with explicit negative constraints

For any path where native structured outputs are not available, tighten the prompt. Negative constraints and a first/last-character rule outperform a vague “return JSON”:

Respond with ONLY a valid JSON object. No markdown. No code fences. No preamble.
No postamble. No explanation. The first character of your response must be the
opening brace, and the last character must be the closing brace. If you cannot
produce valid JSON, respond with:
{"error": "unable to analyze", "reason": "<one sentence>"}

Step 4: Validate the schema after parsing

Even guaranteed-valid JSON can be the wrong shape after a refactor. Validate before any downstream consumption:

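from typing import Literal

from pydantic import BaseModel, ValidationError

class OutputFormatError(Exception):
    """Raised when agent output cannot be parsed or fails schema validation."""

class AnalysisResult(BaseModel):
    issues: list[str]
    severity: Literal["low", "medium", "high"]  # rejects out-of-range values

def parse_and_validate(raw: str) -> AnalysisResult:
    data = extract_json(raw)  # fallback extractor from Step 2
    try:
        return AnalysisResult.model_validate(data)  # Pydantic v2; use parse_obj on v1
    except ValidationError as e:
        raise OutputFormatError(f"Agent output failed schema validation: {e}") from e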

Schema validation catches field-level issues (missing keys, wrong types, out-of-range enums) that a bare json.loads() happily lets through.

Step 5: Size max_tokens to prevent truncation

# Right-size max_tokens to the expected output. A JSON object with ~20 issues
# averages roughly 500 tokens, so 1024 leaves about 2x headroom without
# defaulting to the model maximum; a ceiling close to 500 would risk truncation.
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=messages,
)

# If the result set can be large, page it instead of growing the array:
# Prompt: "Return at most 10 issues per call. Include a 'has_more' boolean."

If you still see truncation, check stop_reason/finish_reason (see Cause 4) before assuming a format bug.
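
The paging prompt above implies a small client-side loop. This sketch assumes a hypothetical `fetch_page(offset)` wrapper around your agent call that returns a dict with `issues` and `has_more`; the fake implementation exists only to make the example runnable.

```python
from typing import Callable

def collect_issues(fetch_page: Callable[[int], dict], max_pages: int = 20) -> list[str]:
    """Call fetch_page(offset) until has_more is false or max_pages is reached."""
    issues: list[str] = []
    for _ in range(max_pages):  # hard cap so a misbehaving agent cannot loop forever
        page = fetch_page(len(issues))
        issues.extend(page["issues"])
        if not page.get("has_more"):
            break
    return issues

# Stand-in for the real agent call: serves 25 issues, 10 per page.
ALL_ISSUES = [f"issue-{i}" for i in range(25)]

def fake_fetch_page(offset: int) -> dict:
    chunk = ALL_ISSUES[offset:offset + 10]
    return {"issues": chunk, "has_more": offset + 10 < len(ALL_ISSUES)}

collected = collect_issues(fake_fetch_page)
print(len(collected))
```

Each page stays well under the token ceiling, so no single response can be truncated mid-array.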

Step 6: Write a format regression test

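def test_agent_output_format():
    # load_fixture and run_agent are your project's own helpers:
    # one reads saved inputs, the other calls the pinned model snapshot.
    for inp in load_fixture("agent_format_test_inputs.json"):
        raw = run_agent(inp)
        result = parse_and_validate(raw)  # raises if parsing or validation fails
        assert isinstance(result.issues, list)
        assert result.severity in ("low", "medium", "high")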

Run this in CI against the production model snapshot. When a model version updates and format regresses, CI catches it before users do.

How to confirm it’s fixed

Replay your last 100 failing inputs through the new code path; the parse-success rate should be at or near 100%.
Confirm the call actually uses structured outputs — log the request and verify response_format/output_config is present. (A common false fix is “tightened the prompt” while the unconstrained call is unchanged.)
Run the Step 6 regression test in CI against the pinned snapshot.
Watch the production parse-failure metric for 24 hours; it should sit well under 1%.

Prevention

Use the provider’s native structured outputs (OpenAI response_format with parse(), Anthropic output_config/messages.parse()) instead of relying on prompt instructions.
Where structured outputs are unavailable, harden prompts with explicit negative constraints (no prose, no fences, first/last character is a brace).
Validate output against a Pydantic schema immediately after parsing, before any downstream consumption.
Size max_tokens to the expected output, not the model maximum — truncation-induced parse failures are easy to prevent.
Pin a dated model snapshot in production so version bumps are deliberate; write format regression tests that run in CI against that snapshot.
Version your output schema explicitly; when it changes, update the agent’s schema and the parser together in the same commit.
Log the raw pre-parse string for every output that fails validation — you need the exact characters to diagnose format issues.
Monitor the parse-failure rate in production and alert when it exceeds 1% of outputs.
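
The 1% alert in the last item can be prototyped with a rolling window before wiring it into a real metrics system. This is a minimal in-process sketch; the class name, window size, and alert hook are assumptions.

```python
from collections import deque

class ParseFailureMonitor:
    """Track parse outcomes over a rolling window and flag a high failure rate."""

    def __init__(self, window: int = 1000, threshold: float = 0.01):
        self.results: deque = deque(maxlen=window)  # oldest outcomes fall off
        self.threshold = threshold

    def record(self, parsed_ok: bool) -> None:
        self.results.append(parsed_ok)

    @property
    def failure_rate(self) -> float:
        if not self.results:
            return 0.0
        return self.results.count(False) / len(self.results)

    def should_alert(self) -> bool:
        return self.failure_rate > self.threshold

monitor = ParseFailureMonitor(window=100)
for _ in range(99):
    monitor.record(True)
monitor.record(False)
at_threshold = monitor.should_alert()   # exactly 1% does not exceed the threshold
monitor.record(False)                   # pushes out one success: 2 failures in 100
above_threshold = monitor.should_alert()
print(at_threshold, above_threshold)
```

In production you would more likely emit a counter to your metrics backend and alert there; the window logic is the same.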

FAQ

Q: JSON mode, native structured outputs, or tool-calling — which should I use? A: Native structured outputs (OpenAI response_format with a JSON schema/Pydantic model; Anthropic output_config/messages.parse()) are the strongest option because generation is constrained to your schema at the token level, so non-conforming output cannot be produced. Plain “JSON mode” (response_format={"type": "json_object"}) only guarantees syntactically valid JSON, not your specific shape. Forced tool-calling still works well and is fine if you are already using it, but it is no longer the recommended default for structured data.

Q: Can I fix this without changing the agent call? A: The extract_json wrapper (Step 2) recovers most fence- and prose-wrapping cases in production. But it is a band-aid: it cannot fix truncated JSON or schema drift. Fix the root cause by switching the call to native structured outputs.

Q: Why did this start failing when I hadn’t changed any code? A: Almost certainly an automatic model-version update (Cause 3). If you were on a floating model alias, the provider rolled you onto a new checkpoint with different formatting habits. Pin a dated snapshot and add a CI format test so the next bump is caught before production.

Q: Does my first structured-output request being slow mean something is wrong? A: No. Anthropic compiles each new schema into a grammar on first use, which adds one-time latency; the compiled grammar is then cached for 24 hours from last use, so subsequent calls are fast. Reuse the same schema rather than regenerating it per request, and avoid changing field structure unnecessarily (changing only description text does not invalidate the cache).

Q: How do I handle streaming responses that need to be parsed? A: Buffer the full stream, then parse — partial streams produce fragmented JSON. Structured outputs work with streaming, but you still accumulate all events before deserializing. If you need real-time progress, emit explicit progress events ({"type": "progress", "pct": 50}) rather than partial result objects.
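
The buffering advice in the streaming answer can be sketched without any SDK. The chunks here are plain strings standing in for the text deltas your streaming client yields; `parse_streamed_json` is a hypothetical helper.

```python
import json
from typing import Iterable

def parse_streamed_json(chunks: Iterable[str]) -> dict:
    """Join all text chunks first, then parse once."""
    buffer = "".join(chunks)
    return json.loads(buffer)

chunks = ['{"issues": ["a', '"], "sever', 'ity": "low"}']

# Parsing a single fragment fails, which is why the buffer is needed.
try:
    json.loads(chunks[0])
    fragment_parsed = True
except json.JSONDecodeError:
    fragment_parsed = False

result = parse_streamed_json(chunks)
print(fragment_parsed, result)
```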
