Cycle in Agent Call Graph Goes Undetected

Agents hand off to each other in a loop that never terminates because nothing checks for cycles. Here is how to catch the cycle, set the right framework limit, and bound the loop for good.

Published: May 25, 2026 Updated: Jun 17, 2026 Author: AI Productivity Guide Team

Your LangGraph or AutoGen orchestrator has a “planner” agent that delegates subtasks to a “researcher” and a “coder.” When the coder produces output the planner deems too abstract, the planner routes back to the researcher for more detail. The researcher asks the coder for a concrete example. The example is too abstract again. The loop runs 400 times before burning the token budget. Or an OpenAI Agents SDK pipeline (the successor to the now-archived Swarm) has agents that hand off with no real limit: Agent A to Agent B to Agent C back to Agent A. Each hop adds a little reasoning. After 200 hops the run has spent $40 and produced nothing useful. Often no exception is raised — the cycle just runs until something else breaks.

Fastest fix: every framework ships a hard turn/step ceiling — turn it down, do not turn it off. Set LangGraph recursion_limit (default 25), OpenAI Agents SDK max_turns (default 10), AutoGen MaxMessageTermination(...), or CrewAI per-agent max_iter (default 25, drop to 5-8). That stops the bleeding in minutes. Then thread a call_path through every invocation so the cycle fails loudly with the exact agent loop instead of dying on a generic limit. The rest of this page does both.

Built-in guard for each framework (as of June 2026)

Before you write any custom code, set the limit the framework already gives you. These are the current defaults and the exact symbol you change.

| Framework | Knob | Default | What it raises | Where to set it |
| --- | --- | --- | --- | --- |
| LangGraph | `recursion_limit` (steps, not nodes) | `25` | `GraphRecursionError` | `graph.invoke(state, {"recursion_limit": 50})` |
| OpenAI Agents SDK | `max_turns` | `10` | `agents.exceptions.MaxTurnsExceeded` | `Runner.run(agent, input, max_turns=12)` |
| AutoGen (AgentChat) | `MaxMessageTermination(n)` / team `max_turns` | none by default | team stops, returns result | `RoundRobinGroupChat(..., termination_condition=MaxMessageTermination(20))` |
| CrewAI | per-agent `max_iter`; crew `max_rpm` | `max_iter=25` | agent stops iterating | `Agent(..., max_iter=8)` |

Two gotchas that bite people in 2026:

- LangGraph counts steps, not visits. recursion_limit is the total number of super-steps, and nodes that run in parallel share one super-step, so it is a long sequential chain or a bounded retry loop, not fan-out width, that can reach 25 legitimately. Raise it for genuinely deep graphs; lower it when you suspect a cycle. The message reads exactly: Recursion limit of 25 reached without hitting a stop condition.
- Subagent limits do not inherit. In LangGraph subgraphs and the deepagents / SubAgentMiddleware pattern, a raised recursion_limit on the parent is not propagated to subagents; they silently run at the default 25 (langchain-ai/deepagents #1698). Set the limit on each subgraph too. The OpenAI Agents SDK is the opposite: max_turns is tracked across handoffs, so one ceiling covers the whole chain.

Common causes

1. Conditional routing with no base case

A conditional edge in LangGraph routes to agent B when quality is below a threshold. Agent B’s output always scores just below the threshold because the threshold was set too strictly. The route fires every time, creating a cycle with no base case that ever evaluates to “move on.”

How to spot it: For every conditional routing function that can route backward to a prior node, check whether there is a code path that does NOT route backward. If all branches route backward or to a holding state that always routes backward, there is no base case.
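
As an illustration of what a base case looks like, here is a minimal sketch of a backward-routing function. The state keys (`quality`, `attempts`), the two constants, and the node names are assumptions made for this example, not part of any framework API.

```python
from typing import TypedDict


class ReviewState(TypedDict):
    quality: float   # reviewer score, assumed to be on a 0.0-1.0 scale
    attempts: int    # how many times this run has already routed backward


QUALITY_THRESHOLD = 0.8   # hypothetical value
MAX_ATTEMPTS = 3          # hypothetical value


def route_after_review(state: ReviewState) -> str:
    """Return the next node name; two of the three branches never route backward."""
    if state["quality"] >= QUALITY_THRESHOLD:
        return "finish"        # normal exit
    if state["attempts"] >= MAX_ATTEMPTS:
        return "escalate"      # base case: stop looping even though quality is low
    return "researcher"        # backward route, bounded by the attempts counter
```

The sketch only works if the node reached through the backward route increments `attempts`; without that, the base case never fires and the loop is unbounded again.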

2. Visited-node set not maintained across the call chain

Each agent invocation is stateless. Agent A calls Agent B, which calls Agent C, which calls Agent A. None of them check “have I been called in this chain before?” because the “visited” set is in-process memory that doesn’t persist between agent invocations.

How to spot it: Search for any “visited” set, “call stack,” or “depth counter” passed through the agent call chain. If these are absent or if they are stored only in the calling agent’s local variables (not passed forward), cycles are undetectable across invocation boundaries.

3. Agent routing decision is made by LLM with no depth constraint

The routing logic is: “Ask the LLM which agent should handle this next.” The LLM can produce any agent name, including the one that was just executing. With no depth limit or cycle-detection constraint injected into the routing prompt, the LLM can freely generate cycles.

How to spot it: Check whether the routing prompt includes the call history or depth. If the LLM receives only the current task and available agents (not the path taken to get here), it has no information to detect or avoid cycles.

4. Dynamic agent registration allows cycles at registration time

Agents register their “can delegate to” list at startup. Agent A says “can delegate to B, C.” Agent B says “can delegate to A, C.” This creates a valid cycle in the capability graph. The orchestrator doesn’t validate this graph for cycles at registration time — it only discovers cycles at runtime after they occur.

How to spot it: Build the delegation graph from agent registrations and run a cycle-detection algorithm (DFS with a recursion stack) on it at startup. If the graph has a cycle, the orchestrator should reject the registration.

5. Max-depth check was added but checked at the wrong layer

A depth < 10 guard was added to the routing function. But the routing function is called by a wrapper that catches the MaxDepthError and silently re-invokes with depth=0 to “retry the routing cleanly.” The depth counter resets, so the guard never stops the cycle.

How to spot it: Trace every path where MaxDepthError (or equivalent) is caught. If any catch handler resets the depth counter rather than propagating the error, depth limiting is ineffective.
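
A sketch of a retry wrapper that avoids this trap, assuming a routing callable with the signature `route(task, depth)`. `TransientRoutingError` and `route_with_retry` are hypothetical names introduced for the example. The point is that retries reuse the same depth and the limit error is re-raised, never swallowed.

```python
class MaxDepthError(Exception):
    """Raised when an agent chain exceeds its depth budget."""


class TransientRoutingError(Exception):
    """Hypothetical retryable failure, for example a routing-model timeout."""


def route_with_retry(route, task: str, depth: int, retries: int = 2) -> str:
    """Retry transient failures at the SAME depth; let MaxDepthError propagate."""
    last_error = None
    for _ in range(retries + 1):
        try:
            return route(task, depth)   # depth is passed through unchanged
        except MaxDepthError:
            raise                       # hitting the limit is not retryable
        except TransientRoutingError as exc:
            last_error = exc            # retry, but never reset depth
    raise RuntimeError("routing failed after retries") from last_error
```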

6. Agent spawns sub-agents that re-enter the same pipeline

Agent A is part of Pipeline P. It spawns a sub-agent that runs Pipeline P to handle a subtask. Pipeline P eventually spawns Agent A again. The recursion is across pipeline boundaries, which makes it invisible to any cycle detection within a single pipeline.

How to spot it: Check whether any agent in a pipeline can trigger the same pipeline (or another pipeline that triggers this one) as a sub-workflow. Cross-pipeline cycles are harder to detect but follow the same pattern.
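
One way to make cross-pipeline re-entry visible is to carry a stack of pipeline identifiers alongside the agent call path. The sketch below assumes each pipeline has a stable string id and that the context is handed to every sub-workflow; `PipelineContext` and `PipelineCycleError` are illustrative names, not an existing API.

```python
from dataclasses import dataclass


class PipelineCycleError(Exception):
    """Raised when a pipeline is re-entered from inside its own run."""


@dataclass(frozen=True)
class PipelineContext:
    run_id: str
    pipeline_stack: tuple[str, ...] = ()   # pipelines currently executing, outermost first

    def enter_pipeline(self, pipeline_id: str) -> "PipelineContext":
        """Return a child context, or raise if pipeline_id is already on the stack."""
        if pipeline_id in self.pipeline_stack:
            loop = " → ".join(self.pipeline_stack + (pipeline_id,))
            raise PipelineCycleError(f"Pipeline re-entered: {loop}")
        return PipelineContext(self.run_id, self.pipeline_stack + (pipeline_id,))
```

Because the context is immutable and a new one is returned per entry, sibling sub-workflows do not see each other's stack, only their ancestors'.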

Shortest path to fix

Step 1: Add cycle detection to the graph at definition time

class CycleDetectedError(Exception):
    """Raised when the agent delegation graph contains a cycle."""


def validate_no_cycles(edges: dict[str, list[str]]) -> None:
    """Raise CycleDetectedError if the agent delegation graph contains a cycle."""
    visited: set[str] = set()
    path: list[str] = []  # current DFS path, kept in order so the loop can be reported

    def dfs(node: str) -> None:
        visited.add(node)
        path.append(node)
        for neighbor in edges.get(node, []):
            if neighbor in path:
                # Back-edge: report only the loop itself, in traversal order.
                cycle_path = path[path.index(neighbor):] + [neighbor]
                raise CycleDetectedError(
                    f"Cycle detected in agent graph: {' → '.join(cycle_path)}"
                )
            if neighbor not in visited:
                dfs(neighbor)
        path.pop()  # done with this node; it is no longer on the current path

    for node in edges:
        if node not in visited:
            dfs(node)

# Run at agent registration time:
AGENT_EDGES = {
    "planner": ["researcher", "coder"],
    "researcher": ["coder"],  # OK: no back-edge to planner
    "coder": [],              # leaf node
}
validate_no_cycles(AGENT_EDGES)
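
If agents register one at a time, the same check can run on each registration instead of once over the finished graph. The following is a sketch under that assumption; `AgentRegistry` and its `register` method are hypothetical and stand in for whatever registration hook your orchestrator exposes.

```python
class CycleDetectedError(Exception):
    """Raised when a registration would close a delegation loop."""


class AgentRegistry:
    """Hypothetical registry that refuses registrations which would create a cycle."""

    def __init__(self) -> None:
        self.edges: dict[str, list[str]] = {}

    def _reaches(self, start: str, target: str) -> bool:
        """Iterative DFS: can target be reached from start via delegation edges?"""
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.edges.get(node, []))
        return False

    def register(self, agent: str, can_delegate_to: list[str]) -> None:
        for delegate in can_delegate_to:
            # A new edge agent -> delegate closes a loop only if
            # delegate can already reach agent.
            if self._reaches(delegate, agent):
                raise CycleDetectedError(
                    f"Registering {agent} → {delegate} would create a cycle"
                )
        self.edges[agent] = list(can_delegate_to)
```

With this in place, registering A with `["B", "C"]` succeeds and a later attempt to register B with `["A", "C"]` is rejected before any run starts.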

Step 2: Thread a call-path token through every agent invocation

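from dataclasses import dataclass


class CycleDetectedError(Exception):
    """An agent appears twice in one call path."""


class MaxDepthError(Exception):
    """The call path has reached max_depth."""


@dataclass
class CallContext:
    run_id: str
    call_path: list[str]  # ordered list of agent names invoked so far
    max_depth: int = 15   # same hard cap as Step 4

    def enter_agent(self, agent_name: str) -> "CallContext":
        """Return a child context for agent_name, or raise if it closes a loop."""
        if agent_name in self.call_path:
            cycle = " → ".join(self.call_path + [agent_name])
            raise CycleDetectedError(f"Cycle detected: {cycle}")
        if len(self.call_path) >= self.max_depth:
            raise MaxDepthError(
                f"Max depth {self.max_depth} reached: {' → '.join(self.call_path)}"
            )
        # Build a new context rather than mutating, so sibling calls stay independent.
        return CallContext(
            run_id=self.run_id,
            call_path=self.call_path + [agent_name],
            max_depth=self.max_depth,
        )

# Pass context through every agent invocation.
# AGENT_REGISTRY maps agent name → agent object with a run(task, ctx=...) method;
# it is assumed to be defined elsewhere in your orchestrator.
def invoke_agent(agent_name: str, task: str, ctx: CallContext) -> str:
    child_ctx = ctx.enter_agent(agent_name)
    agent = AGENT_REGISTRY[agent_name]
    return agent.run(task, ctx=child_ctx)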
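
The check in Step 2 rejects any revisit. If your design deliberately lets an agent run more than once in a chain (a planner that is consulted again after research, say), one option is to cap visits per agent instead of forbidding them. This is a sketch of that variant, not a drop-in replacement; `BoundedCallContext` and `max_visits_per_agent` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass, field


class CycleDetectedError(Exception):
    """An agent was entered more often than the per-agent cap allows."""


@dataclass
class BoundedCallContext:
    run_id: str
    call_path: list[str] = field(default_factory=list)
    max_visits_per_agent: int = 2   # hypothetical knob; 1 reproduces the strict check

    def enter_agent(self, agent_name: str) -> "BoundedCallContext":
        visits = Counter(self.call_path)[agent_name]   # prior entries of this agent
        if visits >= self.max_visits_per_agent:
            loop = " → ".join(self.call_path + [agent_name])
            raise CycleDetectedError(
                f"{agent_name} entered {visits + 1} times: {loop}"
            )
        return BoundedCallContext(
            self.run_id, self.call_path + [agent_name], self.max_visits_per_agent
        )
```

A per-agent cap keeps the error message as specific as the strict version while leaving room for one intentional revisit; the overall depth limit still applies on top of it.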

Step 3: Inject call history into LLM routing prompts

def build_routing_prompt(task: str, call_path: list[str]) -> str:
    history = " → ".join(call_path) if call_path else "none"
    return f"""
You must choose the next agent to handle this task.

Task: {task}

Agents already invoked in this chain (DO NOT route back to any of these):
{history}

Available agents (choose one that has NOT already been invoked):
- researcher: gathers information
- coder: implements solutions
- reviewer: checks quality

Respond with ONLY the agent name. No other text.
"""

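With call history in the prompt, the routing LLM has the information to avoid cycles. The prompt is advisory only: the model can still name an agent that was already invoked, so keep the enforced checks from Steps 2 and 4 in place.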
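
Two small additions make the prompt harder to defeat: list only agents that have not run yet, and validate the model's reply before acting on it. The sketch below assumes a static name-to-description mapping; `AGENT_DESCRIPTIONS`, `eligible_agents`, `parse_routing_reply`, and `RoutingError` are illustrative names.

```python
AGENT_DESCRIPTIONS = {   # hypothetical mapping of agent name to description
    "researcher": "gathers information",
    "coder": "implements solutions",
    "reviewer": "checks quality",
}


class RoutingError(Exception):
    """The router named an agent that is unknown or already in the call path."""


def eligible_agents(call_path: list[str]) -> dict[str, str]:
    """Agents that have not been invoked yet in this chain."""
    return {n: d for n, d in AGENT_DESCRIPTIONS.items() if n not in call_path}


def parse_routing_reply(reply: str, call_path: list[str]) -> str:
    """Validate the model's answer instead of trusting it."""
    choice = reply.strip().lower()
    allowed = eligible_agents(call_path)
    if choice not in allowed:
        raise RoutingError(f"Router chose {choice!r}; allowed: {sorted(allowed)}")
    return choice
```

`eligible_agents(call_path)` can feed the "Available agents" section of the prompt, so the model is never offered an agent it must not pick.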

Step 4: Add a hard depth limit at the orchestration layer

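MAX_AGENT_DEPTH = 15

def run_agent_chain(task: str, ctx: CallContext, depth: int = 0) -> str:
    """Route and run one hop of the chain.

    Assumed wiring: an agent that delegates further calls this again with depth + 1.
    """
    if depth >= MAX_AGENT_DEPTH:
        raise MaxDepthError(
            f"Agent chain reached maximum depth {MAX_AGENT_DEPTH}. "
            "Possible cycle: review the routing logic."
        )
    agent_name = route_task(task)  # route_task is your routing function, defined elsewhere
    # invoke_agent from Step 2 takes the CallContext, not a depth argument.
    return invoke_agent(agent_name, task, ctx)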

The depth limit is a safety net independent of cycle detection. It catches cycles that escape the visited-set check, and it is the same idea as the framework knobs in the table above. Set both: the framework ceiling stops a runaway run, and the call_path check tells you which agents formed the loop.

Step 5: Test for cycles in CI using graph validation

# Run the graph tests as part of the test suite (shell):
python -m pytest tests/test_agent_graph.py -v

# tests/test_agent_graph.py
import pytest

# Import validate_no_cycles, CycleDetectedError, build_agent_delegation_graph and
# PRODUCTION_AGENT_EDGES from wherever your project defines them.

def test_injected_cycle_is_detected():
    # Copy first so the injected back-edge never leaks into other tests.
    graph = dict(build_agent_delegation_graph())
    graph["coder"] = ["planner"]  # known cycle: planner → coder → planner
    with pytest.raises(CycleDetectedError):
        validate_no_cycles(graph)

def test_production_graph_is_acyclic():
    # The actual production graph must pass
    validate_no_cycles(PRODUCTION_AGENT_EDGES)  # should not raise

How to confirm it is fixed

Run one cycle-inducing case and one normal case, and check three things:

1. The known cycle fails fast and names the loop. Feed an input that previously looped (or temporarily add a back-edge). You should get a CycleDetectedError whose message lists the agents, for example Cycle detected: planner → researcher → coder → planner. A generic GraphRecursionError or MaxTurnsExceeded that does not name the path means the framework limit caught it but your call_path check did not fire. Fix the placement so the cycle is named.
2. A normal run finishes well under the limit. Log the final len(call_path) (or LangGraph step count). A healthy pipeline should finish at a depth of 5 or below. If a working run is already near the limit, you do not have headroom and a small input change will trip it.
3. The framework ceiling is set, not disabled. Confirm recursion_limit / max_turns / max_iter is an explicit value in code, never None. In LangGraph, also confirm each subgraph sets its own recursion_limit, since the parent’s value is not inherited.

For a longer-term signal, log the depth distribution of every run. A tail that creeps to depth 10+ over a week is a cycle near-miss waiting to become a cycle.
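
A sketch of what that depth log could feed into, assuming you already collect the final `len(call_path)` of each run as a list of integers. `depth_report` is a hypothetical helper, and the default `alert_depth` of 10 mirrors the threshold mentioned above.

```python
from collections import Counter


def depth_report(depths: list[int], alert_depth: int = 10) -> dict:
    """Summarize final call-path depths for a batch of runs."""
    if not depths:
        return {"runs": 0, "max": 0, "p95": 0, "tail_share": 0.0, "histogram": {}}
    ordered = sorted(depths)
    n = len(ordered)
    p95_rank = (95 * n + 99) // 100      # nearest-rank percentile, integer math
    tail = sum(1 for d in ordered if d >= alert_depth)
    return {
        "runs": n,
        "max": ordered[-1],
        "p95": ordered[p95_rank - 1],
        "tail_share": tail / n,          # fraction of runs at or past alert_depth
        "histogram": dict(sorted(Counter(ordered).items())),
    }
```

Comparing `tail_share` week over week is usually more telling than the maximum alone, since a single deep run can be an outlier while a growing share points at a routing change.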

Prevention

- Set the framework’s built-in ceiling first and never disable it: LangGraph recursion_limit, OpenAI Agents SDK max_turns, AutoGen MaxMessageTermination, CrewAI max_iter. In LangGraph, set it on each subgraph too, because it does not inherit.
- Run cycle detection on the agent delegation graph at startup and reject any registration that creates a cycle.
- Thread a call_path list through every agent invocation boundary; check for the current agent’s name in the path before executing.
- Include the call history in every LLM routing prompt so the model has information to avoid routing back to already-visited agents.
- Add a hard max-depth limit as a secondary safety net independent of cycle detection.
- Write a CI test that validates the production agent graph is acyclic after every graph definition change.
- For legitimately iterative patterns (e.g., refine-until-quality-passes), use an explicit iteration counter with a hard cap instead of routing edges, so the loop is visible and bounded in the graph definition.
- Monitor the depth distribution of agent call chains in production; a tail that grows to depth 10+ is a cycle near-miss.
- Distinguish between allowed cycles (explicit bounded retry loops with a counter) and unintended cycles (unbounded delegation loops) in your graph definition.

FAQ

Q: Does LangGraph prevent cycles? A: No. LangGraph supports cycles on purpose, because they are how you build retry and iterative-refinement loops, and it does not validate against cycles at definition time. Its built-in guard is recursion_limit, which defaults to 25 and raises GraphRecursionError (“Recursion limit of 25 reached without hitting a stop condition”) once that many super-steps run without a stop. Set it explicitly per graph: graph.invoke(state, {"recursion_limit": 50}). Note that it counts total super-steps, not unique node visits, and parallel nodes share one super-step, so a long sequential graph or a bounded retry loop can hit it legitimately.

Q: My subagents keep hitting the limit even though I raised recursion_limit. Why? A: As of June 2026 a raised recursion_limit on a LangGraph parent graph is not propagated into subgraphs or SubAgentMiddleware subagents; they silently run at the default 25 ([deepagents #1698](https://github.com/langchain-ai/deepagents/issues/1698), [langgraphjs #1524](https://github.com/langchain-ai/langgraphjs/issues/1524)). Pass recursion_limit to each subgraph’s own invoke/stream config. The OpenAI Agents SDK does not have this problem: max_turns is tracked across handoffs, so a single ceiling covers the whole chain.

Q: What is the right max depth or turn limit for a multi-agent chain? A: Tune to your framework’s units. For a call-path depth, 10 is generous and a chain deeper than that usually signals a routing bug, not real complexity — set the hard cap to 15 and alert above 8. For OpenAI Agents SDK max_turns (default 10, counts every model turn including tool calls), 12-20 covers most tool-using agents. For CrewAI, max_iter defaults to 25, which is a major cost driver; drop it to 5-8 per agent.

Q: Can a DAG workflow ever produce a runtime cycle? A: A static DAG cannot have a cycle by definition. But dynamic routing — where the next node is chosen at runtime from the current agent’s output — can produce a cycle even in a “DAG” framework. Dynamic routing needs runtime cycle detection (the call_path approach), not just static graph analysis.

Q: AutoGen agents keep talking forever. How do I stop them? A: AutoGen AgentChat teams have no turn limit by default, so you must attach a termination_condition. Combine MaxMessageTermination(n) (a hard ceiling) with TextMentionTermination("TERMINATE") (a clean exit) using the | operator, and use HandoffTermination if agents hand off to a human or back to the orchestrator. Set both: the text condition exits cleanly when work is done, and the message count is the safety net for the case where it never says “done.”

Q: How do I implement a legitimate “refine until good enough” loop without cycle risk? A: Use an explicit iteration counter, not a routing cycle: while quality < threshold and iteration < 5: output = refine(output); quality = score(output); iteration += 1 (where score stands for whatever evaluator produces your quality value). It terminates at 5 regardless of quality. If quality is still below threshold at iteration 5, fail and escalate rather than loop. Even better, add a no-progress check and bail early if the quality score did not improve between rounds.
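
The refine loop from the last answer, written out as a sketch with both the hard cap and the no-progress check. `refine` and `score` are caller-supplied callables, and `RefinementFailed`, the threshold, and the cap are assumptions chosen for illustration.

```python
from typing import Callable


class RefinementFailed(Exception):
    """The loop stopped without reaching the quality threshold."""


def refine_until_good(
    draft: str,
    refine: Callable[[str], str],
    score: Callable[[str], float],
    threshold: float = 0.8,
    max_iterations: int = 5,
) -> str:
    """Refine at most max_iterations times; stop early when the score stalls."""
    best, best_quality = draft, score(draft)
    for _ in range(max_iterations):
        if best_quality >= threshold:
            return best
        candidate = refine(best)
        quality = score(candidate)
        if quality <= best_quality:
            # No-progress check: another round is unlikely to help.
            raise RefinementFailed(f"No progress, stuck at {best_quality:.2f}")
        best, best_quality = candidate, quality
    if best_quality >= threshold:
        return best
    # Hard cap reached: fail and escalate instead of looping again.
    raise RefinementFailed(
        f"Still at {best_quality:.2f} after {max_iterations} iterations"
    )
```

Because the loop is an ordinary `for` with a fixed range, the bound is visible in the code and does not depend on any routing decision.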

Tags: #AI coding #Agents #Troubleshooting

Related Articles

Agent Budget Exhausted Halfway Through the Task

Restored Agent Checkpoint Is Corrupted

Cost Tracking Misses Sub-Agent Usage

Agent Handoff Loses Context Between Steps

Agent Orchestrator Deadlocks Waiting on Each Other

Fix: Agent Output Leaks Secrets Into Logs and Git