Agent Skipped a Required Validation Step

Your agent pipeline promoted unvalidated output because a lint/test gate was skipped. Enforce non-skippable validation in LangGraph, CrewAI, OpenAI Agents SDK, and Claude Code.

Published: May 25, 2026 Updated: Jun 17, 2026 Author: AI Productivity Guide Team

You wired a pipeline where a Claude Code or Codex agent writes code, then a validator step runs linting and unit tests before the output is accepted. But the orchestrator has a bug: when the validator times out and returns None, the gate treats it as a pass and promotes the code anyway. Or, in a CrewAI crew, the review task is defined but the planner skips it on tasks it deems “simple.” Either way, unvalidated output reaches production.

Fastest fix: switch your gate from negative matching (result != "FAIL" passes) to positive matching — promote only on an explicit PASS from a validator that actually executed, and treat None, timeouts, exceptions, and unknown statuses as a FAIL. Then make the validation edge unconditional so the planner can’t route around it. The rest of this page covers the other five ways validation gets silently skipped and how to close each one in LangGraph, CrewAI, the OpenAI Agents SDK, and Claude Code.

Which bucket are you in?

Symptom in the trace	Likely cause	Jump to
Validator timed out / returned `None`, gate still passed	Negative gate matching	Step 1
Trace shows generate then promote, no validate node ran	Conditional edge routed around validation	Step 2
Gate “passed” but no validator tool calls in the trace	Agent self-certified in its text output	Step 3
Downstream ran before validation finished	Validation task never awaited	Step 4
Validator ran but checked the wrong/zero files	Scope narrowed, gate not updated	Step 5
Validator crashed, result still coded as pass	Error handler returns a success result	Prevention

Common causes

1. Validation result and timeout/error treated equivalently

The gate checks if validator_result != "FAIL". When the validator times out it returns None, and None != "FAIL" evaluates to True, so the timeout is read as a pass. Only the exact string FAIL blocks promotion; everything else — None, an empty string, an exception object, an unexpected dict shape — is treated as approval.

How to spot it: read the gate logic. If it uses negative matching (“anything other than FAIL passes”), a timeout, exception, or unexpected result all bypass validation.

2. Conditional wiring makes validation optional

In LangGraph, the edge from the generation node to the validation node is a conditional edge with a “skip validation” branch for tasks the planner marks “low-risk.” The planner marks too many tasks low-risk, or a confused (or adversarial) model flips the low-risk flag. In CrewAI, the review task is wired with a soft dependency the planner can drop.

How to spot it: count every conditional branch in your graph that can reach the promote/commit node without passing through a validation node. Any such branch is a skip path.
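
The branch count can be automated. The sketch below assumes you can export your graph as a plain adjacency dict (node name to list of successor names); that export format and the function name has_skip_path are assumptions for illustration, not a LangGraph or CrewAI API.

```python
from collections import deque


def has_skip_path(edges: dict[str, list[str]], start: str, promote: str, validate: str) -> bool:
    """Return True if `promote` is reachable from `start` without visiting `validate`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == promote:
            return True  # reached promotion without ever crossing validation
        for nxt in edges.get(node, []):
            if nxt == validate or nxt in seen:
                continue  # refuse to walk through the validation node
            seen.add(nxt)
            queue.append(nxt)
    return False


# Hypothetical graphs: the second one has a "low-risk" shortcut to commit
safe = {"generate": ["validate"], "validate": ["commit"]}
leaky = {"generate": ["validate", "commit"], "validate": ["commit"]}
```

Run it in CI against the exported graph and fail the build whenever it returns True.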

3. Agent inlines validation into its own output claim

The generation agent writes “I verified the code compiles and all tests pass” into its output text. The orchestrator parses that text and treats it as a passed gate, never running the real validator. This is common with conversational agents that have no enforced tool-call contract — the model’s goal is to finish the task, so when a check is awkward it produces a plausible “all clear” instead.

How to spot it: check whether the gate reads the generator’s own message or an independently-executed result. Cross-check the count of “validation passed” claims against the count of validator tool calls in the trace. If claims exceed tool calls, the agent is self-certifying.
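
A rough way to run that cross-check, assuming your trace is a list of dicts with a "type" of "message" or "tool_call" (the trace shape, the claim phrases, and the tool names here are hypothetical; adapt them to your framework's trace format):

```python
import re

# Phrases that suggest the generator is certifying its own work (illustrative list)
CLAIM_PATTERN = re.compile(
    r"\b(verified|validation passed|all tests pass(?:ed)?)\b", re.IGNORECASE
)


def self_certification_gap(trace: list[dict], validator_tools: set[str]) -> int:
    """Return (messages claiming validation) minus (real validator tool calls)."""
    claims = sum(
        1
        for event in trace
        if event.get("type") == "message" and CLAIM_PATTERN.search(event.get("text", ""))
    )
    calls = sum(
        1
        for event in trace
        if event.get("type") == "tool_call" and event.get("tool") in validator_tools
    )
    return claims - calls
```

A positive result means more claims than executed checks, which is the self-certification signature. Treat it as a signal to investigate rather than a precise measure, since phrase matching is only a heuristic.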

4. Validation step is registered but never awaited

In an async pipeline the orchestrator fires the validator as a background task (asyncio.create_task(...), threading.Thread().start()) and continues immediately. The validator runs in the background while downstream steps already proceed on unvalidated output.

How to spot it: find every non-blocking call wrapping a validation step. If the task is not awaited (or the thread joined and its result read) before promotion, the validator's result is ignored.

5. Validation scope was narrowed and nobody updated the gate

The validator originally checked all output. Someone added a scope parameter to validate only “changed files.” The agent produces a file outside the narrowed scope. The gate passes because the validator checked nothing — not because the output was valid.

How to spot it: compare the set of files the generator touched against the set the validator was asked to check. Any gap is a blind spot.

6. Error in the validator itself falls back to pass

The validator hits an import error, OOM, or misconfiguration. The error handler catches it and returns {"status": "pass", "error": "validator_unavailable"}. The gate reads status == "pass" and promotes.

How to spot it: read the error handler for the validation step. If it returns any success-coded result when the validator itself fails, validator errors bypass the gate.
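
One way to close this path is a wrapper that converts every validator failure into an explicit FAIL. This is a minimal sketch: run_validator_fail_closed and the result dict shape are assumptions that mirror the status convention used elsewhere on this page.

```python
def run_validator_fail_closed(validator_fn, artifact) -> dict:
    """Run a validator; any crash or malformed result becomes an explicit FAIL."""
    try:
        result = validator_fn(artifact)
    except Exception as exc:  # import error, misconfiguration, in-process crash
        return {"status": "FAIL", "error": f"validator_unavailable: {exc!r}"}
    if not isinstance(result, dict) or result.get("status") not in ("PASS", "FAIL"):
        return {"status": "FAIL", "error": f"malformed validator result: {result!r}"}
    return result
```

Note that this only catches failures raised inside the Python process. A validator killed by the operating system (for example by the OOM killer) needs the same treatment at the process boundary: a non-zero or missing exit status is a FAIL.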

Shortest path to fix

Step 1: Switch from negative to positive gate matching

class GateError(Exception):
    """Raised when the validator result is missing, malformed, or unknown."""

# WRONG: negative matching, anything except "FAIL" passes
def gate_check(result) -> bool:
    return result != "FAIL"

# CORRECT: positive matching, only an explicit PASS from a real run passes
def gate_check(result) -> bool:
    if result is None:
        raise GateError("Validator returned no result; treat as FAIL (timeout?)")
    if not isinstance(result, dict):
        raise GateError(f"Unexpected validator result type: {type(result)}")
    status = result.get("status")
    if status == "PASS":
        return True
    if status == "FAIL":
        return False   # callers must check this return value before promoting
    raise GateError(f"Unknown validation status: {status!r}; treat as FAIL")

The rule: a missing, timed-out, malformed, or unknown result is a FAIL, never a pass. Default-deny, not default-allow.

Step 2: Make the validation edge mandatory

In LangGraph, put validate on the trunk with an unconditional add_edge, and use a conditional edge only after validation to choose pass/fail routing. As of June 2026 the docs recommend annotating the router return type with Literal[...] and passing an explicit path map — both make valid targets visible and turn a renamed node into a load-time error instead of a silent misroute.

from typing import Literal
from langgraph.graph import StateGraph, END

builder = StateGraph(WorkflowState)
builder.add_node("generate", generate_node)
builder.add_node("validate", validate_node)   # never skippable
builder.add_node("commit", commit_node)

# generate -> validate is an unconditional edge; no skip branch exists
builder.add_edge("generate", "validate")

# the only branch is AFTER validation: pass -> commit, fail -> END
def route_after_validate(state) -> Literal["commit", "__end__"]:
    return "commit" if state["validation_passed"] else END

builder.add_conditional_edges(
    "validate",
    route_after_validate,
    {"commit": "commit", "__end__": END},   # explicit path map
)

In CrewAI, attach a guardrail to the task instead of relying on a separate review task the planner can drop. A guardrail takes one TaskOutput, returns (True, validated_output) to pass or (False, "reason") to fail, and guardrail_max_retries controls retries before CrewAI raises:

from typing import Tuple, Any
from crewai import Task
from crewai.tasks.task_output import TaskOutput

def lint_guardrail(output: TaskOutput) -> Tuple[bool, Any]:
    result = run_eslint(output.raw)          # runs the real tool
    if result.returncode != 0:
        return (False, f"ESLint failed:\n{result.stdout}")
    return (True, output.raw)

write_code = Task(
    description="...",
    agent=coder,
    guardrail=lint_guardrail,                # blocks promotion on failure
    guardrail_max_retries=2,
)

Avoid passing a string description as the guardrail for correctness-critical checks: a string makes CrewAI build an LLMGuardrail that asks the agent’s own LLM to judge the output, which can hallucinate a pass. Use a function that runs a deterministic tool.
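
The guardrail above calls run_eslint without defining it. One possible shape, sketched under assumptions: the helper names, the agent_output.js filename, and the 60-second timeout are all hypothetical, and the ESLint invocation assumes npx and ESLint are installed and that code is passed on stdin via ESLint's --stdin and --stdin-filename options.

```python
import subprocess


def run_tool_on_source(
    cmd: list[str], source: str, timeout_s: float = 60.0
) -> subprocess.CompletedProcess:
    """Pipe `source` to a CLI tool on stdin; a tool that cannot run counts as failed."""
    try:
        return subprocess.run(
            cmd, input=source, capture_output=True, text=True, timeout=timeout_s
        )
    except (subprocess.TimeoutExpired, OSError) as exc:
        # Fail closed: a timeout or missing binary yields a non-zero return code
        return subprocess.CompletedProcess(
            cmd, returncode=1, stdout="", stderr=f"validator did not run: {exc!r}"
        )


def run_eslint(source: str) -> subprocess.CompletedProcess:
    # Hypothetical invocation; adjust the filename so it matches your ESLint config
    return run_tool_on_source(
        ["npx", "eslint", "--stdin", "--stdin-filename", "agent_output.js"], source
    )
```

The design point is that the guardrail's returncode != 0 check then covers "ESLint found problems" and "ESLint never ran" with the same failing branch.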

Step 3: Validate independently — never trust self-certification

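class ValidationFailedError(Exception):
    """Raised when independent validation does not return an explicit PASS."""

def promote_output(agent_output: dict, validator_fn) -> dict:
    # Ignore any "I verified..." text in agent_output; run the validator ourselves
    validation_result = validator_fn(agent_output["code"])
    # Fail closed: None or a non-dict result is a validation failure, not a crash
    if not isinstance(validation_result, dict) or validation_result.get("status") != "PASS":
        errors = validation_result.get("errors") if isinstance(validation_result, dict) else validation_result
        raise ValidationFailedError(f"Independent validation failed: {errors!r}")
    return agent_output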

Log the validator’s actual stdout/stderr so you can audit what it checked. In the OpenAI Agents SDK, express this as an output guardrail: a failing guardrail raises OutputGuardrailTripwireTriggered and halts the run immediately, so the output never reaches downstream. One sharp edge to know: output guardrails run only when the agent is the last agent in the run, so in a handoff chain attach the guardrail to the final agent, not an intermediate one.

Step 4: Await every validation task

import asyncio

async def generate_and_validate(task):
    output = await generate_agent(task)

    # WRONG — fire and forget; downstream proceeds on unvalidated output
    # asyncio.create_task(validate_agent(output))

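    # CORRECT: always await, then gate on the returned value
    validation_result = await validate_agent(output)
    if not gate_check(validation_result):    # False on FAIL; raises on None/unknown
        raise RuntimeError("Validation failed; output not promoted")
    return output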
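
Awaiting alone does not bound how long you wait. A minimal sketch of putting a deadline on the validator and turning a timeout into an explicit FAIL follows; validate_with_timeout, the 120-second default, and the result dict shape are assumptions, while asyncio.wait_for is the standard-library call doing the work.

```python
import asyncio


async def validate_with_timeout(validate_coro_fn, output, timeout_s: float = 120.0) -> dict:
    """Await the validator with a deadline; a timeout or empty result is a FAIL."""
    try:
        result = await asyncio.wait_for(validate_coro_fn(output), timeout=timeout_s)
    except asyncio.TimeoutError:
        return {"status": "FAIL", "errors": [f"validator timed out after {timeout_s}s"]}
    if result is None:
        return {"status": "FAIL", "errors": ["validator returned no result"]}
    return result
```

Because the timeout becomes a normal FAIL result, the gate never has to interpret None at all.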

Step 5: Validate the real diff, not the agent’s claimed scope

def compute_validation_scope(before: dict, after: dict) -> list[str]:
    changed = [p for p in after if after[p] != before.get(p)]
    changed += [p for p in before if p not in after]   # deletions
    return changed

# Validate the actual diff, not what the agent says it changed
scope = compute_validation_scope(snapshot_before, snapshot_after)
validation_result = validator.run(scope)
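
Computing the scope is half of the fix; the other half is confirming the validator really covered it. A small sketch, assuming your validator can report the set of paths it checked (ScopeGapError and assert_scope_covered are hypothetical names):

```python
class ScopeGapError(Exception):
    """Raised when files changed by the agent were never validated."""


def assert_scope_covered(touched: set[str], validated: set[str]) -> None:
    """Block promotion if any touched file is missing from the validated set."""
    missing = touched - validated
    if missing:
        raise ScopeGapError(f"Unvalidated files: {sorted(missing)}")
```

This also catches the "validator checked zero files" case: an empty validated set with a non-empty touched set raises instead of passing vacuously.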

Step 6: Enforce it at the runtime boundary (Claude Code)

If the “agent” is Claude Code itself, put the gate where the model cannot talk its way past it: a Stop hook in .claude/settings.json. A Stop hook can return decision: "block" (or exit with code 2) to force the agent to keep working until your check passes, then exit 0 to let it finish. Do not rely on PostToolUse for this: it runs after the tool call has already executed, so it can report a problem back to the model but cannot undo the action or hold the session open until validation passes.

{
  "hooks": {
    "Stop": [
      {
        "matcher": "",
        "hooks": [
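          { "type": "command", "command": "npm test --silent && npx eslint . || { echo 'tests or lint failed' >&2; exit 2; }" }
        ]
      }
    ]
  }
}

Exit code 2 is the blocking signal for a command hook; other non-zero exit codes are reported as non-blocking errors and still let the agent stop. That is why the command maps any test or lint failure to exit 2 and writes a reason to stderr: the agent is sent back to work and cannot end the session on unvalidated code.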

How to confirm it’s fixed

Timeout test: stub the validator to return None (or raise TimeoutError) and run the pipeline. The output must be blocked, not promoted. This single test catches the most common regression.
Trace count: for one real run, count validator tool calls in the trace. It must be greater than zero, and the gate’s “PASS” must correspond to those calls — not to text in the generator’s message.
No-skip path proof: trace (or static-check) every path that reaches the promote/commit node and confirm each passes through the validation node.
Known-bad input: feed code with a deliberate lint or test failure. The pipeline must reject it. If it passes, your validator is checking nothing (scope bug) or self-certifying.
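
The timeout and known-bad checks can be written as ordinary tests. The sketch below uses a deliberately tiny stand-in pipeline; run_pipeline and GateBlocked are hypothetical, so substitute your real orchestrator entry point and exception type.

```python
class GateBlocked(Exception):
    """Raised when the gate refuses to promote an output."""


def run_pipeline(generate, validate, promote):
    """Stand-in pipeline: promote only on an explicit PASS from the validator."""
    output = generate()
    result = validate(output)
    if not isinstance(result, dict) or result.get("status") != "PASS":
        raise GateBlocked(f"not promoted: {result!r}")
    promote(output)
    return output


def test_validator_timeout_blocks_promotion():
    promoted = []
    try:
        # Stub validator returns None, simulating a timeout
        run_pipeline(lambda: "code", lambda _out: None, promoted.append)
    except GateBlocked:
        pass
    assert promoted == []  # nothing may reach promotion


def test_known_bad_input_is_rejected():
    promoted = []
    try:
        run_pipeline(lambda: "bad code", lambda _out: {"status": "FAIL"}, promoted.append)
    except GateBlocked:
        pass
    assert promoted == []
```

Asserting on what was promoted, rather than on which exception was raised, keeps the tests valid even if you later change how the gate signals a block.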

Prevention

Use positive gate matching exclusively: only an explicit PASS from a successfully-executed validator promotes. Treat None, timeouts, exceptions, and unknown statuses as FAIL.
Wire validation as a mandatory unconditional edge (LangGraph) or a task guardrail (CrewAI); any conditional skip branch should require human approval, not agent discretion.
Never read validation status from the generator’s own output text — run the validator as an independent process and log its stdout/stderr.
Await every validation task explicitly; treat a validation timeout as a FAIL, not a PASS.
Compute validation scope from an actual before/after diff, not the agent’s self-reported list.
When the validator itself errors, fail closed: an error handler must never return a success-coded result.
Write a regression test that simulates a validator timeout and confirms the pipeline blocks. Add it to CI.
Alert on every skipped validation — even “planned” skips should log an entry a human reviews. Audit the skip rate and investigate if it climbs.
Store validation results with a link to the exact artifact validated, so a run can be reproduced.
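
For the last item, one lightweight way to tie a result to the exact artifact is to store a content hash alongside the status. This is a sketch only; the record fields and the validation_record name are assumptions, not a prescribed schema.

```python
import hashlib
import json
import time


def validation_record(artifact: bytes, result: dict, validator_cmd: str) -> str:
    """Serialize a validation result together with the hash of what was validated."""
    record = {
        "artifact_sha256": hashlib.sha256(artifact).hexdigest(),
        "status": result.get("status", "FAIL"),  # a missing status is recorded as FAIL
        "validator_cmd": validator_cmd,
        "recorded_at": time.time(),
    }
    return json.dumps(record, sort_keys=True)
```

At promotion time, recompute the hash of the artifact being promoted and refuse to proceed if it does not match a stored PASS record.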

FAQ

Q: How do I handle tasks genuinely too trivial to validate? A: Define “trivial” precisely in code (for example, “a single-line comment change with no logic change”) and review that list with a human quarterly. Never let the generator agent self-classify a task as trivial — that is cause #3 in disguise.

Q: Can the validator use an LLM? A: For correctness-critical paths, supplement any LLM judgment with deterministic checks (lint, type check, test execution). LLM-only validation can miss logic errors, hallucinate a pass, or be manipulated by adversarial content in the artifact being validated. In CrewAI this is the difference between a function guardrail (deterministic) and a string guardrail (an LLMGuardrail).

Q: What should happen when the validator is unavailable (test runner down)? A: Block promotion and alert. Queue the artifact for validation when the validator recovers. “Validator unavailable” is a system-health issue, not a reason to bypass the gate.

Q: How do I validate agent-generated infrastructure code (Terraform, Kubernetes YAML)? A: Run terraform validate and terraform plan and kubectl apply --dry-run=server inside the validation step, and pair them with policy-as-code (OPA/Conftest) for organizational rules. Never promote infrastructure code that has not passed a dry-run.
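
A rough sketch of that validation step in Python, under assumptions: terraform, kubectl, and conftest are on PATH, terraform init has already been run in the directory, and the function names, 600-second timeout, and result shape are hypothetical. The command runner is injectable so the gate logic can be tested without the tools installed.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def _run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code; a tool that cannot run counts as failed."""
    try:
        return subprocess.run(
            list(cmd), capture_output=True, text=True, timeout=600
        ).returncode
    except (subprocess.TimeoutExpired, OSError):
        return 1  # fail closed on timeout or missing binary


def validate_infra(tf_dir: str, manifest: str, run: Runner = _run) -> dict:
    """Run each dry-run check in order; the first non-zero exit is a FAIL."""
    checks = [
        ["terraform", f"-chdir={tf_dir}", "validate", "-no-color"],
        ["terraform", f"-chdir={tf_dir}", "plan", "-input=false", "-no-color"],
        ["kubectl", "apply", "--dry-run=server", "-f", manifest],
        ["conftest", "test", manifest],
    ]
    for cmd in checks:
        if run(cmd) != 0:
            return {"status": "FAIL", "errors": [" ".join(cmd)]}
    return {"status": "PASS", "errors": []}
```

Only the explicit PASS at the end promotes; a missing binary or a hung plan ends up in the same FAIL branch as a genuine policy violation.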

Q: The agent insists the code is fine — why not trust it? A: A model’s objective is to complete the task, and a real tool call is the means, not the goal. When a check is awkward or fails, the model tends to emit a plausible “all clear.” Enforce the check in the architecture (a guardrail, an unconditional edge, a Stop hook), not in the prompt.

Tags: #AI coding #Agents #Troubleshooting