Agent Handoff Loses Context Between Steps

A downstream agent re-asks answered questions or contradicts earlier decisions. Find the lossy handoff boundary and wire durable, structured state in under an hour.

Published: May 25, 2026 Updated: Jun 17, 2026 Author: AI Productivity Guide Team

Agent A researches a codebase and writes a detailed analysis, then hands off to Agent B to implement the changes. Agent B starts fresh, re-asks clarifying questions Agent A already answered, and produces output that contradicts decisions made three steps back. Or in an AutoGen/AG2 group chat, the coding assistant ignores the planner’s chosen architecture because the message thread was summarized and key constraints got stripped. The handoff boundary is a lossy compression point: if you do not explicitly serialize state, downstream agents are flying blind.

Fastest fix (start here): stop passing handoff data as a prose summary. Define one typed object (a Pydantic model or LangGraph TypedDict field), put the full decisions, file paths, and constraints in it, and inject it into the next agent’s prompt through an explicit {handoff_context} slot. Write any large artifact (file dumps, test logs) to a shared store and pass only the key. That single change usually resolves most handoff losses; the rest of this page is for diagnosing which of the six buckets you are actually in.

Which bucket are you in?

Symptom you observe	Most likely cause	Jump to
Agent B gets a one-line “task done” instead of details	Trimmed summary, not structured state	Cause 1
Long file/log content ends mid-line in the next message	No shared store; message truncation	Cause 2
Next agent’s prompt has no place to receive context	Prompt template missing a context slot	Cause 3
Works locally, breaks in serverless/multi-worker	Fresh instance per invocation; no history	Cause 4
Intermittent loss, only under load or concurrency	Race / out-of-order context arrival	Cause 5
A field is present but wrong type or empty	Serialization mismatch or wrong key name	Cause 6

Common causes

1. Context passed as a trimmed summary instead of structured state

The most common culprit. The orchestrator condenses Agent A’s output to fit the next model’s context window, and the summarization loses specifics: file paths, chosen libraries, rejected alternatives, error messages. Agent B receives “analyze authentication issues” instead of “line 47 of src/auth/jwt.ts uses HS256 with a hardcoded salt; switch to RS256 with env-loaded keys.”

In AutoGen/AG2 this often happens silently. In recent releases the default summary method (last_msg) carries over only the final message of the chat, and the LLM-based alternative (reflection_with_llm) uses a default prompt that begins “Summarize the takeaway from the conversation.” Neither is designed to preserve exact strings such as file paths or error messages.

How to spot it: compare the raw output of Agent A against what Agent B actually received in its first message. If the handoff message is a prose paragraph where Agent A’s output was structured JSON or a code block, compression happened.
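One way to make this comparison mechanical is to extract the exact strings from Agent A’s output and check which of them survive into Agent B’s first message. The sketch below is stdlib-only; the function name lost_specifics and the regex for what counts as a “specific” (path-like tokens and identifiers such as HS256) are assumptions for illustration, not part of any framework.

```python
import re

# Hypothetical heuristic: treat path-like tokens and identifiers such as
# HS256/RS256 as the "specifics" a summary must not drop.
SPECIFIC = re.compile(r"[\w./-]+\.\w{1,4}\b|\b[A-Z]{2,}\d+\b")


def lost_specifics(agent_a_output: str, agent_b_input: str) -> list[str]:
    """Return specifics present in A's output but absent from B's input."""
    found = dict.fromkeys(SPECIFIC.findall(agent_a_output))  # de-dupe, keep order
    return [s for s in found if s not in agent_b_input]


raw = "line 47 of src/auth/jwt.ts uses HS256 with a hardcoded salt; switch to RS256"
summary = "analyze authentication issues"
print(lost_specifics(raw, summary))  # ['src/auth/jwt.ts', 'HS256', 'RS256']
```

A non-empty result means the handoff compressed away details that the downstream agent needs.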

2. Stateless tool design with no shared memory store

In frameworks like CrewAI or AutoGen, agents default to passing data through chat messages. Long tool outputs (file reads, test logs, API responses) exceed what fits cleanly in a message, get truncated, and critical lines fall off the end. There is no external store being written to.

How to spot it: search your framework’s message list for truncation markers like ... [truncated], [output clipped], or sudden silent cutoffs. Count the characters in each handoff message and compare against the model’s context limit.
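The check described above can be sketched as a small scan over the message list. The marker strings come from the text; the 8,000-character budget and the function name flag_suspect_messages are assumptions, and the budget counts characters rather than tokens because no tokenizer is assumed here.

```python
TRUNCATION_MARKERS = ("... [truncated]", "[output clipped]")


def flag_suspect_messages(messages: list[str], max_chars: int = 8000) -> list[tuple[int, str]]:
    """Return (index, reason) for messages that look truncated or oversized."""
    flagged = []
    for i, text in enumerate(messages):
        if any(marker in text for marker in TRUNCATION_MARKERS):
            flagged.append((i, "truncation marker"))
        elif len(text) >= max_chars:
            flagged.append((i, "at or over size budget"))
    return flagged


history = ["plan approved", "test log: 412 passed ... [truncated]", "x" * 9000]
print(flag_suspect_messages(history))
# [(1, 'truncation marker'), (2, 'at or over size budget')]
```

Silent cutoffs leave no marker, so the size check is the only signal for them; a message sitting exactly at a round-number length is worth inspecting by hand.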

3. Prompt template has no “prior decisions” slot

The next agent’s system prompt has no placeholder for accumulated context. The orchestrator calls it with system_prompt.format(task=task) and never injects prior_decisions, constraints, or artifacts. The agent starts from scratch by design.

How to spot it: open the prompt template for every agent in the pipeline. If none of them reference a context, prior_decisions, or handoff placeholder, context injection is missing entirely.
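If your templates use str.format-style placeholders, the same audit can be done in code rather than by eye. This is a minimal sketch under that assumption; the slot names in CONTEXT_SLOTS and the example templates are hypothetical.

```python
from string import Formatter

CONTEXT_SLOTS = {"handoff_context", "context", "prior_decisions"}


def template_fields(template: str) -> set[str]:
    """Names of all {placeholders} in a str.format-style template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def missing_context_slot(templates: dict[str, str]) -> list[str]:
    """Agent names whose template has no slot for prior context."""
    return [agent for agent, t in templates.items()
            if not (template_fields(t) & CONTEXT_SLOTS)]


templates = {
    "researcher": "Prior context:\n{handoff_context}\n\nYour task:\n{task}",
    "writer": "You are a writer. Your task:\n{task}",
}
print(missing_context_slot(templates))  # ['writer']
```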

4. Framework resets conversation history on each invocation

Some orchestration setups, especially stateless AWS Lambda or Cloud Run execution, create a fresh agent instance per invocation. Each agent call has zero conversation history. Any context must be explicitly passed in the input payload; nothing is implicit. This is exactly the “works on my laptop, breaks in prod” class of bug: a single local process shares memory, but multiple workers do not.

How to spot it: print the message-history length before each agent call (for example print(len(memory.chat_memory.messages)) with LangChain’s legacy memory classes, or inspect the run input in the OpenAI Agents SDK trace). If it always prints 0 or 1, history is not persisting.
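In a stateless deployment the safest assumption is that the handler knows nothing except what is in its input. The sketch below shows a hypothetical entry point that requires the handoff in the payload and fails loudly when it is absent; the event shape and the name handle_invocation are assumptions, not any platform’s API.

```python
import json


def handle_invocation(event: dict) -> dict:
    """Hypothetical serverless entry point: all context arrives in the payload."""
    raw = event.get("handoff_context")
    if not raw:
        # Fail loudly; a fresh instance has no history to fall back on.
        raise ValueError("handoff_context missing from invocation payload")
    ctx = json.loads(raw)
    # A real handler would run the agent with ctx; this sketch echoes what it saw.
    return {"task_id": ctx["task_id"], "decisions_seen": len(ctx["decisions"])}


event = {"handoff_context": json.dumps(
    {"task_id": "t-1", "decisions": [{"decision": "use RS256"}]})}
print(handle_invocation(event))  # {'task_id': 't-1', 'decisions_seen': 1}
```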

5. Race condition in async pipelines, out-of-order context arrival

In Temporal workflows or Inngest async steps, Agent B may start executing before Agent A’s final artifact write has completed. It reads a partial or empty context store and proceeds with stale or empty context.

How to spot it: check workflow step dependencies. If Agent B’s step lists Agent A’s step as optional, or does not await it explicitly, the dependency is not enforced.
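The failure mode is easy to reproduce in plain asyncio, which can help when explaining it to a team. In this sketch the dict named store is a stand-in for a shared context store, and both agents are stubs; nothing here is Temporal or Inngest API.

```python
import asyncio

store: dict[str, str] = {}  # stand-in for a shared context store


async def agent_a() -> None:
    await asyncio.sleep(0.05)          # simulate a slow artifact write
    store["analysis"] = "use RS256"


async def agent_b() -> str:
    return store.get("analysis", "<empty>")


async def racy() -> str:
    task_a = asyncio.create_task(agent_a())  # started, but not awaited yet
    seen = await agent_b()                   # B reads before A has written
    await task_a
    return seen


async def ordered() -> str:
    await agent_a()                          # dependency enforced
    return await agent_b()


print(asyncio.run(racy()))     # <empty>
store.clear()
print(asyncio.run(ordered()))  # use RS256
```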

6. Serialization mismatch or wrong key name between agents

Agent A writes context as a Python dataclass or Pydantic model. The orchestrator serializes it to JSON, and values that JSON cannot represent natively change shape on the way: datetimes become strings, enums collapse to their raw values, and nested objects come back as plain dicts. Or the writer sets state["research_notes"] and the reader calls state.get("notes"); TypedDict keys are not checked at runtime, so the lookup silently returns None instead of raising.

How to spot it: diff the object Agent A writes against the object Agent B reads. Any field that changed type, disappeared, or came back None is a serialization or key-name casualty.
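A small diff helper makes these casualties visible. The sketch below round-trips a payload through JSON and simulates a reader using the wrong key; diff_handoff and the sample field names are hypothetical.

```python
import json
from datetime import datetime


def diff_handoff(written: dict, read: dict) -> dict[str, str]:
    """Report fields that disappeared, became None, or changed type."""
    problems = {}
    for key, sent in written.items():
        if key not in read:
            problems[key] = "missing"
        elif read[key] is None and sent is not None:
            problems[key] = "came back None"
        elif type(read[key]) is not type(sent):
            problems[key] = f"type {type(sent).__name__} -> {type(read[key]).__name__}"
    return problems


written = {"research_notes": "HS256 in jwt.ts", "created": datetime(2026, 6, 1), "attempts": 2}
read = json.loads(json.dumps(written, default=str))
read["notes"] = read.pop("research_notes")   # simulate a reader using the wrong key
print(diff_handoff(written, read))
# {'research_notes': 'missing', 'created': 'type datetime -> str'}
```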

Shortest path to fix

Step 1: Add a structured context object to every handoff

Replace freeform message passing with a typed handoff envelope:

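from dataclasses import dataclass, asdict
import json

@dataclass
class HandoffContext:
    task_id: str
    goal: str
    decisions: list[dict]      # [{"decision": "...", "rationale": "..."}]
    artifacts: dict[str, str]  # {"name": "store-key-or-path"}
    constraints: list[str]
    prior_errors: list[str]

# Example values are illustrative
ctx = HandoffContext(
    task_id="task-001",
    goal="Harden JWT signing",
    decisions=[{"decision": "switch to RS256", "rationale": "HS256 secret is hardcoded"}],
    artifacts={},
    constraints=["load keys from env"],
    prior_errors=[],
)

# default=str stringifies values json cannot encode (e.g. datetimes) instead of raising
payload = json.dumps(asdict(ctx), default=str)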

Pass this as the first user message, or inject it into the system prompt via a {handoff_context} slot. Prefer Pydantic over a bare dataclass when you want the boundary to reject missing fields with a ValidationError instead of passing None downstream.
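If adding Pydantic is not an option, the receiving side can still reject incomplete handoffs with the standard library. The following is a minimal sketch of that boundary check, assuming a trimmed-down field set; the from_json method is a hypothetical helper, and Pydantic’s validation is more thorough than this.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class HandoffContext:
    task_id: str
    goal: str
    decisions: list
    constraints: list

    @classmethod
    def from_json(cls, payload: str) -> "HandoffContext":
        """Parse a handoff and refuse it if any field is absent or empty."""
        data = json.loads(payload)
        missing = [f.name for f in fields(cls) if data.get(f.name) in (None, "", [])]
        if missing:
            raise ValueError(f"handoff missing required fields: {missing}")
        return cls(**{f.name: data[f.name] for f in fields(cls)})


try:
    HandoffContext.from_json('{"task_id": "t-1", "goal": "fix auth"}')
except ValueError as err:
    print(err)  # handoff missing required fields: ['decisions', 'constraints']
```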

Step 2: Write large artifacts to a shared store, pass references

Never inline file contents into a message. Write them to a shared store and pass the key:

import uuid, redis

r = redis.Redis()

def store_artifact(content: str) -> str:
    key = f"artifact:{uuid.uuid4()}"
    r.set(key, content, ex=3600)  # 1-hour TTL
    return key

# Agent A writes:
handoff.artifacts["analysis"] = store_artifact(analysis_text)

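# Agent B reads (get() returns None once the TTL has expired):
raw = r.get(handoff.artifacts["analysis"])
if raw is None:
    raise LookupError("analysis artifact expired or missing")
analysis = raw.decode()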

Redis, S3, or a temp file on a shared volume all work. The principle: messages carry references, not payloads.
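The reference pattern can be tried without any infrastructure. In this sketch a module-level dict stands in for Redis or S3, and the 8,000-character inline limit is an assumed budget; pack, unpack, and INLINE_LIMIT_CHARS are hypothetical names.

```python
import uuid

_store: dict[str, str] = {}   # hypothetical stand-in for Redis or S3
INLINE_LIMIT_CHARS = 8000     # assumed budget; size it for your smallest model


def pack(name: str, content: str) -> dict:
    """Inline small content; store large content and return a reference."""
    if len(content) <= INLINE_LIMIT_CHARS:
        return {"name": name, "inline": content}
    key = f"artifact:{uuid.uuid4()}"
    _store[key] = content
    return {"name": name, "ref": key}


def unpack(entry: dict) -> str:
    """Resolve an entry to its full content, failing loudly on a lost key."""
    if "inline" in entry:
        return entry["inline"]
    try:
        return _store[entry["ref"]]
    except KeyError:
        raise LookupError(f"artifact {entry['ref']} expired or was never written") from None


big = pack("test_log", "x" * 9000)
print(sorted(big))               # ['name', 'ref']
print(len(unpack(big)))          # 9000
```

Raising on a lost key matters as much as the store itself: a reader that quietly continues with an empty artifact reproduces the original bug.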

Step 3: Audit every prompt template for a context-injection slot

# List agent prompt files that have NO context placeholder
grep -rL "context\|prior_decisions\|handoff" ./prompts/ ./agents/

For each file found, add a slot:

You are continuing work started by a prior agent. You are step 2 of 4.
Prior context (do not re-collect this):
{handoff_context}

Your task:
{task}

Numbering the step (“step 2 of 4”) tends to reduce the “let me start over” behavior, because the model is told it is mid-pipeline rather than a standalone assistant.
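Filling the slot can be wrapped in one function so that no agent after the first is ever rendered without context. This is a sketch under the assumption that prompts are str.format templates; render_prompt and its parameters are hypothetical.

```python
import json

TEMPLATE = (
    "You are continuing work started by a prior agent. You are step {step} of {total}.\n"
    "Prior context (do not re-collect this):\n"
    "{handoff_context}\n\n"
    "Your task:\n"
    "{task}"
)


def render_prompt(task: str, handoff: dict, step: int, total: int) -> str:
    """Fill the template; refuse to render an empty handoff after step 1."""
    if step > 1 and not handoff:
        raise ValueError(f"step {step} of {total} has no handoff context")
    return TEMPLATE.format(
        step=step, total=total, task=task,
        handoff_context=json.dumps(handoff, indent=2, default=str),
    )


print(render_prompt("Implement the fix", {"decisions": ["switch to RS256"]}, step=2, total=4))
```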

Step 4: Use the framework’s real handoff primitive, not ad-hoc messages

As of June 2026, the major frameworks each have a first-class way to carry context across a handoff. Use it instead of hand-rolling string passing.

LangGraph (use Command, and persist with a checkpointer):

from langgraph.types import Command

def agent_a(state):
    # Command carries the state update AND the routing target together
    return Command(
        goto="agent_b",
        update={"handoff_context": ctx},  # structured, not a summary
    )

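# Persist across runs. Note: MemorySaver was renamed InMemorySaver.
import sqlite3
from langgraph.checkpoint.sqlite import SqliteSaver   # survives restarts

# from_conn_string() is a context manager in recent releases, so build the saver from a connection
conn = sqlite3.connect("state.db", check_same_thread=False)
graph = builder.compile(checkpointer=SqliteSaver(conn))
# Invoke with config={"configurable": {"thread_id": "..."}} so later runs resume the same thread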

InMemorySaver is for local dev only (lost on restart). Use SqliteSaver/AsyncSqliteSaver for single-node persistence and the Postgres saver for multi-worker production.

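OpenAI Agents SDK (Swarm is deprecated; the SDK is the successor, v0.17.x as of mid-2026). Context is a typed object you pass to Runner.run(), and it is available to every agent, tool, and handoff in the run. It is local to your code and is not sent to the model, so anything the model itself must read still has to go into instructions or input: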

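from agents import Agent, Runner

async def main():
    # planner is an Agent; my_typed_context is your own dataclass or Pydantic instance
    result = await Runner.run(planner, input=task, context=my_typed_context)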

CrewAI: force structured output upstream and declare the dependency with context:

from crewai import Task

research = Task(description="...", expected_output="...", output_pydantic=ResearchNotes, agent=researcher)
write    = Task(description="...", expected_output="...", context=[research], agent=writer)  # gets validated JSON

Step 5: Enforce handoff ordering in your orchestration layer

In LangGraph, use an explicit edge so B cannot start until A completes:

graph.add_edge("agent_a", "agent_b")
# NOT a conditional edge with a default fallthrough that can skip A

In Temporal, await each activity and pass its return value into the next one (never rely on shared worker memory across activities):

from datetime import timedelta
from temporalio import workflow

# Inside your @workflow.run method:
analysis = await workflow.execute_activity(
    agent_a_activity, task, schedule_to_close_timeout=timedelta(minutes=5)
)
result = await workflow.execute_activity(
    agent_b_activity, analysis, schedule_to_close_timeout=timedelta(minutes=5)
)

Step 6: Log the handoff payload for every run

import json
import logging
from dataclasses import asdict
logger = logging.getLogger("handoff")

def handoff(ctx, next_agent: str):
    logger.info("HANDOFF to %s: %s", next_agent, json.dumps(asdict(ctx), default=str))

This creates a searchable audit trail. When context loss happens, you can diff what was sent against what was received. If you run a tracer (LangSmith, Langfuse), log a one-line digest per node, for example node=writer fields=4 in_tokens=1820, so you can scan a trace and see the field count drop at the bad boundary.
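The one-line digest can be produced by a helper like the one below. It is a sketch: handoff_digest is a hypothetical name, and it reports a character count instead of in_tokens because no tokenizer is assumed.

```python
def handoff_digest(node: str, payload: dict) -> str:
    """One-line summary: how many fields are non-empty and roughly how large."""
    filled = sum(1 for v in payload.values() if v not in (None, "", [], {}))
    approx_chars = sum(len(str(v)) for v in payload.values())
    return f"node={node} fields={filled}/{len(payload)} chars={approx_chars}"


sent = {"goal": "fix auth", "decisions": ["RS256"], "constraints": ["no new deps"], "prior_errors": []}
received = {"goal": "fix auth", "decisions": [], "constraints": [], "prior_errors": []}
print(handoff_digest("researcher", sent))
print(handoff_digest("writer", received))
```

Comparing the fields= value between consecutive nodes shows where the count drops, which is the boundary to inspect first.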

How to confirm it’s fixed

Add an assertion at the receiving agent’s entry point that the required fields are present and non-empty, for example assert state.get("research_notes"), "handoff missing research_notes". A run that previously lost context now fails loudly instead of silently.
Write one integration test that runs A then B and asserts B’s first prompt contains a known string Agent A produced (a specific file path or error code), not just that each agent works in isolation.
Re-run the original failing scenario. Agent B should act on the prior decisions without re-asking, and should not contradict an earlier choice.
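The first two checks can be combined in one integration test. The sketch below uses stub functions in place of real agents so it runs anywhere; agent_a, build_prompt_for_b, and the sample strings are hypothetical, and in a real suite the stubs would be replaced by calls into your pipeline.

```python
def agent_a(task: str) -> dict:
    """Stub for the upstream agent: returns structured findings."""
    return {"research_notes": "src/auth/jwt.ts line 47 uses HS256", "task": task}


def build_prompt_for_b(state: dict) -> str:
    """Stub for the orchestrator step that builds Agent B's first prompt."""
    assert state.get("research_notes"), "handoff missing research_notes"
    return f"Prior context:\n{state['research_notes']}\n\nYour task:\n{state['task']}"


def test_handoff_preserves_specifics() -> None:
    state = agent_a("harden JWT signing")
    prompt = build_prompt_for_b(state)
    assert "src/auth/jwt.ts" in prompt   # a known string Agent A produced


test_handoff_preserves_specifics()
```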

Prevention

Define a typed HandoffContext (Pydantic preferred) before writing any agent; treat it like an API contract between agents and let validation reject missing fields at the boundary.
Store large artifacts externally (Redis, S3, disk); pass only keys or URIs in agent messages, with a fixed token budget (for example 4,000 tokens) above which you compress explicitly rather than truncate silently.
Add a {handoff_context} slot to every agent’s system prompt, even the first agent, so the slot is always present when you add upstream agents later. Number each step (step 2 of 4) so models know they are mid-pipeline.
Set explicit step dependencies in your orchestration layer; never rely on timing or ordering by convention.
Keep a decisions log that every agent appends to rather than replaces, so downstream agents see all prior reasoning.
Run a per-node structured log or trace in development; keep a compact field-count digest in production for diffing.
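For the append-only decisions log, LangGraph lets a state field declare a reducer through an Annotated type, so that node updates are combined with the existing value rather than overwriting it. The sketch below defines such a schema and then applies the reducer by hand to show the effect; PipelineState and the sample entries are hypothetical, and no framework import is needed to run it.

```python
import operator
from typing import Annotated, TypedDict


class PipelineState(TypedDict):
    task: str
    # Reducer annotation: a node's update is added to the existing list.
    decisions: Annotated[list[dict], operator.add]


# What the reducer does with an existing value and a node's update:
existing = [{"by": "planner", "decision": "use RS256"}]
update = [{"by": "coder", "decision": "load keys from env"}]
merged = operator.add(existing, update)
print([d["by"] for d in merged])  # ['planner', 'coder']
```

Without a reducer, a node that returns decisions replaces the whole list, which is how earlier reasoning quietly disappears.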

FAQ

Q: Does LangGraph handle context passing automatically? A: Within a single run, yes: LangGraph threads the full State object between nodes, so any field you define in the schema survives. It does not automatically persist across runs. For that you compile with a checkpointer and invoke with a stable thread_id so later runs resume the same thread. Note the rename: the in-memory saver is now InMemorySaver (formerly MemorySaver); use SqliteSaver or the Postgres saver for anything that must survive a restart.

Q: Our agents use different models. Does that affect context loss? A: Yes. Different models have different context windows, and an over-eager orchestrator will silently trim the handoff to fit the smaller one. As of June 2026 most frontier models are roomy (Claude Opus 4.7, Claude Sonnet 4.6, and Gemini 3.1 Pro are 1M-token; GPT-5.5 is large, but in-app limits vary by plan), so the usual fix is to stop inlining giant artifacts rather than to upgrade a model. Always size the handoff against the smallest model in the chain.

Q: How big is too big for an inline handoff message? A: Keep inline messages under roughly 2,000 tokens. Anything larger should be stored externally and referenced by key. This keeps handoff messages fast to inspect, avoids truncation, and makes logging practical.

Q: Isn’t OpenAI Swarm the standard way to do handoffs? A: Not anymore. Swarm was an educational/experimental project and is deprecated; OpenAI’s Agents SDK is the production successor (the repo redirects you there). New work should use the Agents SDK, where context is a typed object passed to Runner.run() and forwarded to every agent and handoff.

Q: Can I use a vector store for handoff context? A: You can, but it introduces retrieval non-determinism: Agent B may not recall the exact constraints Agent A set. For hard constraints (architecture decisions, rejected options, error signatures), use structured JSON in a reliable key-value store. Use vector search only for large reference corpora where fuzzy retrieval is acceptable.

Q: Why does it work locally but lose context in production? A: Local runs are usually a single process sharing memory; production spreads agents across workers, containers, or serverless invocations that share nothing. Any cross-step data must be passed explicitly (as a return value, a checkpointer entry, or a shared-store key), never assumed to be in memory.

Tags: #AI coding #Agents #Troubleshooting