Agent Skipped a Required Validation Step

Your agent bypassed a critical validation gate and pushed bad output downstream. Learn to enforce non-skippable checkpoints in any agent pipeline.

You set up a code-generation pipeline where a Claude Code or Codex agent writes code, and a “validator agent” is supposed to run linting and unit tests before the output is accepted. Under deadline pressure, a bug slips into the orchestrator logic: when the validator agent returns a timeout error, the orchestrator treats it as a pass and promotes the output anyway. The unvalidated code reaches production. Or, in a CrewAI pipeline, the “review crew member” is defined but the task dependency is wired as optional, so the planner agent skips it entirely on tasks it deems “simple.” Validation steps that can be skipped will be skipped.

Common causes

1. Validation result and timeout/error treated equivalently

The orchestrator checks if validator_result != "FAIL". When the validator times out, it returns None. None != "FAIL" evaluates to True, so the orchestrator treats the timeout as a pass. Only explicit “FAIL” strings block promotion; everything else is treated as approval.

How to spot it: Read the gate-checking logic. If it uses negative matching (“anything other than FAIL = pass”), a timeout, exception, empty string, or unexpected result all bypass validation.

2. Conditional wiring makes validation optional

In LangGraph, the edge from the generation node to the validation node is a conditional edge with a “skip validation” branch for tasks the planner marks as “low-risk.” The planner marks too many tasks as low-risk, or an attacker or a confused model manipulates the low-risk flag so that risky output takes the skip branch.

How to spot it: Count the conditional branches in your graph that bypass validation nodes. Any branch that routes around a validation node without human approval is a potential skip path.
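
One way to make that count mechanical is to treat the workflow as a plain adjacency map and search for any route from the generation node to the promotion node that avoids the validation node. The sketch below assumes a hypothetical dict-based graph description (it does not use LangGraph's own API) and hypothetical node names; has_skip_path is an illustrative helper, not a library function.

```python
from collections import deque

def has_skip_path(edges: dict[str, list[str]], start: str, goal: str, gate: str) -> bool:
    """Return True if `goal` is reachable from `start` without visiting `gate`."""
    if start == gate:
        return False
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in edges.get(node, []):
            # Never walk through the gate: we only want routes around it.
            if nxt != gate and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

# Hypothetical graph with a "low-risk" shortcut from generate to promote.
risky = {"generate": ["validate", "promote"], "validate": ["promote"]}
# The same graph with the shortcut removed.
safe = {"generate": ["validate"], "validate": ["promote"]}

print(has_skip_path(risky, "generate", "promote", "validate"))  # True
print(has_skip_path(safe, "generate", "promote", "validate"))   # False
```

Running a check like this in CI against an exported edge list would flag a newly added skip branch before it ships.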

3. Agent inlines validation into its own output claim

The generation agent includes “I verified the code compiles and all tests pass” in its output text. The orchestrator parses this text and interprets it as a passed validation gate, never running the actual validator. The agent self-certifies.

How to spot it: Check whether your gate logic reads the agent’s text output or checks an independently-executed validation result. If the gate reads anything from the generator’s own output message, it can be self-certified.

4. Validation step is registered but never awaited

In an async pipeline, the orchestrator fires the validator as a background task but immediately continues to the next step without waiting for the result. The validator runs (or doesn’t run) in the background while downstream steps already proceed.

How to spot it: Check for asyncio.create_task(), threading.Thread().start(), or similar non-blocking calls wrapping validation steps. If the call is not followed by await task or thread.join(), the result is ignored.

5. Validation scope was narrowed and nobody updated the gate logic

The validator originally checked all output. Then someone added a scope parameter to validate only “changed files.” An agent produces a changed file that falls outside the narrowed scope. The gate passes because the validator checked nothing, not because the output was valid.

How to spot it: Compare the set of files the generator agent touched against the set of files the validator was asked to check. Any gap is a validation blind spot.
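
The comparison itself is a set difference. A minimal sketch, assuming you can obtain both file sets as path strings (the paths and helper names below are hypothetical):

```python
def validation_blind_spots(touched: set[str], validated: set[str]) -> set[str]:
    """Files the generator changed that the validator never looked at."""
    return touched - validated

def assert_full_coverage(touched: set[str], validated: set[str]) -> None:
    """Raise if any touched file was left out of the validation run."""
    missing = validation_blind_spots(touched, validated)
    if missing:
        raise RuntimeError(f"Unvalidated files: {sorted(missing)}")

# Hypothetical example: the validator was scoped to app/ only.
touched = {"app/main.py", "infra/deploy.yaml"}
validated = {"app/main.py"}
print(validation_blind_spots(touched, validated))  # {'infra/deploy.yaml'}
```

Calling assert_full_coverage before the gate turns a silent blind spot into a hard failure.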

6. Error in the validation agent itself causes a fallback to pass

The validator agent hits an import error, OOM, or misconfiguration. Your error handler catches the exception and returns {"status": "pass", "error": "validator_unavailable"}. The gate reads status == "pass" and promotes the output.

How to spot it: Check your error handler for the validation step. If it returns any success-coded result when the validator itself fails, errors in the validator bypass the gate.
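
A fail-closed handler could look roughly like the sketch below. The "ERROR" status string and the wrapper name are assumptions made for illustration; the point is only that a validator crash or a malformed result never maps to a passing status.

```python
def run_validator_fail_closed(validator_fn, artifact) -> dict:
    """Run validator_fn and convert any crash into a non-passing result."""
    try:
        result = validator_fn(artifact)
    except Exception as exc:  # import error, misconfiguration, validator bug
        return {"status": "ERROR", "error": f"validator_unavailable: {exc!r}"}
    # Anything that is not a well-formed PASS/FAIL verdict is also an error.
    if not isinstance(result, dict) or result.get("status") not in ("PASS", "FAIL"):
        return {"status": "ERROR", "error": f"malformed validator result: {result!r}"}
    return result

def broken_validator(artifact):
    # Stand-in for a validator whose dependencies are missing.
    raise ImportError("No module named 'pytest'")

outcome = run_validator_fail_closed(broken_validator, "print('hi')")
print(outcome["status"])  # ERROR
```

A gate that promotes only on an explicit "PASS" will then block on "ERROR" without any special casing.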

Shortest path to fix

Step 1: Switch from negative to positive gate matching

class GateError(Exception):
    """Raised when the validator result cannot be trusted either way."""

# WRONG: negative matching, anything except "FAIL" passes
def gate_check(result) -> bool:
    return result != "FAIL"

# CORRECT: positive matching, only an explicit "PASS" passes
def gate_check(result) -> bool:
    if result is None:
        raise GateError("Validator returned no result (possible timeout)")
    if not isinstance(result, dict):
        raise GateError(f"Unexpected validator result type: {type(result)}")
    status = result.get("status")
    if status == "PASS":
        return True
    if status == "FAIL":
        return False
    # Empty, misspelled, or unknown statuses never count as approval
    raise GateError(f"Unknown validation status: {status!r}")

Step 2: Make validation edges mandatory in the graph definition

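# LangGraph: unconditional edge from generate to validate
# NO conditional skip branch
graph.add_edge("generate", "validate")
graph.add_edge("validate", "promote_or_reject")

# In the validate node, record the result; anything but an explicit PASS fails
def validate_node(state):
    result = run_validator(state["output"])
    # A timeout or crash may yield None or a malformed value: fail closed
    if not isinstance(result, dict) or result.get("status") != "PASS":
        errors = result.get("errors", []) if isinstance(result, dict) else [f"no usable result: {result!r}"]
        return {**state, "validation_failed": True, "validation_errors": errors}
    return {**state, "validation_failed": False}

# Conditional only AFTER validate; never skip validate.
# "promote_or_reject", "reject", and "accept" are assumed to be registered nodes.
# Accept only when validation_failed is explicitly False; a missing flag rejects.
graph.add_conditional_edges(
    "promote_or_reject",
    lambda s: "accept" if s.get("validation_failed") is False else "reject"
)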

Step 3: Validate independently — never trust agent self-certification

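class ValidationFailedError(Exception):
    """Raised when independent validation does not return an explicit PASS."""

def promote_output(agent_output: dict, validator_fn) -> dict:
    # Ignore any "I verified..." text in agent_output
    # Always run the validator independently
    validation_result = validator_fn(agent_output["code"])
    # Fail closed: None, malformed results, and non-PASS statuses all block
    if not isinstance(validation_result, dict) or validation_result.get("status") != "PASS":
        raise ValidationFailedError(
            f"Independent validation failed: {validation_result!r}"
        )
    return agent_output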

Log the validator’s actual stdout/stderr so you can audit what it checked.
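
One way to capture that output is to run each check as a subprocess and keep the raw streams next to the verdict. The sketch below is an assumption about how a validator might be wrapped, not a prescribed design: run_check and the result fields are hypothetical, and the command shown is a stand-in so the example runs anywhere.

```python
import subprocess
import sys

def run_check(cmd: list[str], timeout_s: float = 300.0) -> dict:
    """Run one validator command and keep its raw output for auditing."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except (subprocess.TimeoutExpired, OSError) as exc:
        # Could not run or did not finish: record it, never report PASS.
        return {"status": "FAIL", "cmd": cmd, "errors": [repr(exc)],
                "stdout": "", "stderr": ""}
    passed = proc.returncode == 0
    return {
        "status": "PASS" if passed else "FAIL",
        "cmd": cmd,
        "errors": [] if passed else [f"exit code {proc.returncode}"],
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }

# Stand-in command; a real pipeline would call its linter or test runner here.
result = run_check([sys.executable, "-c", "print('2 checks run')"])
print(result["status"], result["stdout"].strip())  # PASS 2 checks run
```

Persisting the whole dict, rather than only the status, lets an auditor see which command ran and what it reported.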

Step 4: Await every validation task explicitly

import asyncio

async def generate_and_validate(task):
    output = await generate_agent(task)

    # WRONG: fire and forget
    # asyncio.create_task(validate_agent(output))

    # CORRECT: always await, then act on the gate's verdict
    validation_result = await validate_agent(output)
    if not gate_check(validation_result):  # False means an explicit FAIL
        raise GateError("Validation failed; output not promoted")
    return output
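
Awaiting alone does not bound how long the validator may take. A minimal sketch of a timeout that blocks rather than promotes, assuming the validator is a coroutine; ValidationTimeout, await_validation, and slow_validator are hypothetical names, and demo() doubles as the kind of test that simulates a hung validator.

```python
import asyncio

class ValidationTimeout(Exception):
    """Raised when the validator does not answer in time."""

async def await_validation(validate_coro, timeout_s: float) -> dict:
    """Await a validator coroutine; a timeout raises instead of passing."""
    try:
        return await asyncio.wait_for(validate_coro, timeout=timeout_s)
    except asyncio.TimeoutError:
        raise ValidationTimeout(f"validator exceeded {timeout_s}s") from None

async def slow_validator(output: str) -> dict:
    await asyncio.sleep(1.0)  # simulates a hung validator
    return {"status": "PASS"}

async def demo() -> str:
    try:
        await await_validation(slow_validator("code"), timeout_s=0.05)
    except ValidationTimeout:
        return "blocked"
    return "promoted"

print(asyncio.run(demo()))  # blocked
```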

Step 5: Derive validation scope from an actual diff, not from the agent-reported scope

def compute_validation_scope(before_snapshot: dict, after_snapshot: dict) -> list[str]:
    changed = []
    for path in after_snapshot:
        if after_snapshot[path] != before_snapshot.get(path):
            changed.append(path)
    for path in before_snapshot:
        if path not in after_snapshot:
            changed.append(path)  # deleted files
    return changed

# Always validate the actual diff, not what the agent claims it changed
scope = compute_validation_scope(snapshot_before, snapshot_after)
validation_result = validator.run(scope)

Prevention

  • Use positive gate matching exclusively — only an explicit “PASS” from a successfully-executed validator allows promotion.
  • Wire validation as a mandatory unconditional edge in your workflow graph; conditional skip branches should require human approval, not agent discretion.
  • Never read validation status from the generator agent’s own output text — always run the validator as an independent process.
  • Await every validation task explicitly; treat a validation timeout as a FAIL, not a PASS.
  • Compute validation scope from an actual before/after diff of affected files, not from the agent’s self-reported list.
  • Write a test that simulates a validator timeout and confirms the pipeline blocks promotion instead of letting the output through.
  • Alert whenever validation is skipped for any reason — even “planned” skips should generate a log entry reviewed by a human.
  • Store validation results with a link to the exact artifact that was validated; auditors should be able to reproduce the validation run.
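
For the last point, one option is to bind each result to a content hash of the artifact, so a record can never be reused for different bytes. This is a sketch under assumptions: the record fields and the validator_version label are hypothetical, not a required schema.

```python
import hashlib
import json
from datetime import datetime, timezone

def validation_record(artifact: bytes, status: str, validator_version: str) -> dict:
    """Bind a validation result to the exact bytes that were validated."""
    return {
        "artifact_sha256": hashlib.sha256(artifact).hexdigest(),
        "status": status,
        "validator_version": validator_version,
        "validated_at": datetime.now(timezone.utc).isoformat(),
    }

def matches_artifact(record: dict, artifact: bytes) -> bool:
    """True only if `artifact` is byte-for-byte what the record covers."""
    return record["artifact_sha256"] == hashlib.sha256(artifact).hexdigest()

code = b"def add(a, b):\n    return a + b\n"
record = validation_record(code, "PASS", "lint-and-tests-v1")
print(json.dumps(record, indent=2))
print(matches_artifact(record, code))           # True
print(matches_artifact(record, code + b"# x"))  # False
```

Checking matches_artifact at promotion time also catches an artifact that was modified after it was validated.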

FAQ

Q: How do I handle tasks that are genuinely too trivial to validate? A: Define “trivial” criteria precisely in code (e.g., “output is a single-line comment change with no logic change”) and have a human review that criteria list quarterly. Never let the generator agent self-classify a task as trivial.
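
As an illustration of “criteria in code,” the sketch below encodes the single-line comment example for Python sources. It is a hypothetical rule, deliberately narrow, and it assumes the removed and added lines come from an independently computed diff rather than from the generator's own description of the change.

```python
def is_trivial_change(removed: list[str], added: list[str]) -> bool:
    """Trivial = at most one line per side, and every changed line is a '#' comment."""
    changed = removed + added
    if not changed or len(removed) > 1 or len(added) > 1:
        return False
    return all(line.strip().startswith("#") for line in changed)

print(is_trivial_change(["# old note"], ["# new note"]))  # True
print(is_trivial_change(["x = 1"], ["x = 2"]))            # False
print(is_trivial_change([], []))                          # False
```

Even a rule this small has edge cases (shebang lines and tool directives such as type or lint comments also start with '#'), which is one reason the criteria list needs periodic human review.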

Q: Can the validator agent itself use an LLM for validation? A: Yes, but LLM-based validation must be supplemented with deterministic checks (linting, type checking, test execution) for correctness-critical paths. LLMs can miss logic errors, hallucinate pass results, or be manipulated by adversarial input in the artifact being validated.

Q: What should happen when a validation step is unavailable (e.g., the test runner is down)? A: Block promotion and alert. Do not skip validation because the validator is unavailable. Queue the artifact for validation when the validator recovers. “Validator unavailable” is a system health issue, not a reason to bypass the gate.
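
A minimal sketch of the block-and-queue behavior, assuming the validator signals unavailability by raising ConnectionError (an assumption; your test runner may fail differently). The class and status strings are hypothetical, and alerting is left out for brevity.

```python
from collections import deque

class PendingValidationQueue:
    """Holds artifacts that could not be validated; nothing here is promoted."""

    def __init__(self) -> None:
        self._pending = deque()

    def submit(self, artifact_id: str, validator_fn) -> str:
        try:
            result = validator_fn(artifact_id)
        except ConnectionError:
            # Validator is down: hold the artifact and block promotion.
            self._pending.append(artifact_id)
            return "QUEUED"
        ok = isinstance(result, dict) and result.get("status") == "PASS"
        return "PROMOTED" if ok else "REJECTED"

    def drain(self, validator_fn) -> dict[str, str]:
        """Retry everything that was queued once the validator is back."""
        outcomes = {}
        for _ in range(len(self._pending)):
            artifact_id = self._pending.popleft()
            outcomes[artifact_id] = self.submit(artifact_id, validator_fn)
        return outcomes

def validator_down(artifact_id):
    raise ConnectionError("test runner unreachable")

def validator_up(artifact_id):
    return {"status": "PASS"}

queue = PendingValidationQueue()
print(queue.submit("build-17", validator_down))  # QUEUED
print(queue.drain(validator_up))                 # {'build-17': 'PROMOTED'}
```

If the validator is still down during a drain, the artifact is simply queued again, so an outage can delay promotion but never bypass the gate.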

Q: How do I validate agent-generated infrastructure code (Terraform, Kubernetes YAML)? A: Run terraform validate and terraform plan (a plan is already a dry run: it previews changes without applying them, and it has no --dry-run flag), and run kubectl apply --dry-run=server, as part of the validation step. Pair these with policy-as-code tools like OPA/Conftest for organizational policy checks. Never promote infrastructure code that has not passed a dry run.

Tags: #AI coding #Agents #Troubleshooting