Agent Trace Missing a Tool Call: Find the Gap and Fix It

Your LangSmith or Langfuse trace shows a result but no tool call span. Here's how to find which of seven causes you hit and make traces complete again.

Published: May 25, 2026 Updated: Jun 17, 2026 Author: AI Productivity Guide Team

Your agent’s final output says “I deleted the old migration files and ran the schema update,” but the trace shows zero delete_file calls and no database tool invocation. The action happened — the files are gone, the schema changed — but the trace has no record of how. You cannot audit what was deleted, replay the run, or prove the agent followed the right procedure.

Fastest fix: in most cases the side-effecting work either ran outside the framework’s traced path (a plain Python function called directly, or a background thread or fire-and-forget task the tracer lost track of), or its span was recorded but never flushed before a serverless process exited. Confirm which one with the decision table below, then jump to the matching step.

One currency note first: if you are on LangSmith and tracing stopped capturing tool spans recently, check your env var. As of June 2026 the canonical flag is LANGSMITH_TRACING=true; the old LANGCHAIN_TRACING_V2=true still works but is deprecated, and projects that half-migrated sometimes set neither.
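A quick way to see which state a process is in is to read both variables at startup. The sketch below is a minimal, hypothetical helper: the function name and the labels it returns are made up for illustration and are not part of the LangSmith SDK. It only inspects the two variable names mentioned above.

```python
import os

def tracing_flag_status(env=None) -> str:
    """Classify the two LangSmith tracing env vars named in this article.

    Hypothetical helper: the returned labels are illustrative only.
    """
    env = os.environ if env is None else env

    def is_on(name: str) -> bool:
        return env.get(name, "").strip().lower() == "true"

    if is_on("LANGSMITH_TRACING"):
        return "canonical"
    if is_on("LANGCHAIN_TRACING_V2"):
        return "deprecated"
    return "unset"  # the half-migrated case: neither flag is on

print(tracing_flag_status())
```

Logging this value once at boot makes a half-migrated deployment visible before you start hunting for missing spans.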

Which bucket are you in?

Symptom in the trace	Most likely cause	Go to
LLM span shows the `tool_use`/tool-call args, but no separate child span for the tool	Tool ran as a direct Python call, not through the executor	Step 1
Outer call returns instantly; the real work has no span	Work moved into an async task / thread the tracer didn’t follow	Step 2
Half-entry: `tool_start` present, `tool_end` missing	Exception swallowed before the end callback fired	Step 3
App-side tool count is higher than the platform’s count, always by a few	Spans not flushed before process exit (serverless/Lambda)	Step 4
Agent ran code that touched files/DB, nothing in the registry log	Unregistered function called via `exec`/code execution	Step 5
Counts dropped after an SDK bump, or only some tool types appear	SDK version mismatch / sampling drops	Steps 6-7

Common causes

1. Tool ran outside the traced code path

The most common cause. A helper that “just runs a quick shell command” is called directly from Python instead of going through the agent’s tool executor, so it bypasses the tracer. The effect happens; the trace records nothing.

In LangChain/LangGraph this shows up as a specific tell: the parent LLM span still contains the tool_use block (the model’s request to call the tool), but there is no separate execute_tool child span for the actual execution. The model asked, something ran, and nothing recorded the run.

How to spot it: list every function that performs side effects (file writes, subprocess calls, HTTP requests, DB mutations). For each, check whether it is invoked through the framework’s tool executor (a @tool-decorated callable, a Tool/StructuredTool, or a @traceable function) or called directly as a plain Python function. Any direct call is invisible.
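The audit described above can be partly automated. The following is a rough sketch using Python's ast module. It assumes a small, hand-picked set of side-effecting calls and decorator names (both sets are placeholders to adapt), and it only catches calls written in their dotted form, so aliased imports or indirect calls will slip past it.

```python
import ast

# Assumed lists: adjust to the side-effecting calls and decorators in your code base.
SIDE_EFFECT_CALLS = {"os.remove", "os.unlink", "shutil.rmtree", "subprocess.run"}
TRACING_DECORATORS = {"tool", "traceable", "observe"}

def _dotted_name(node: ast.AST) -> str:
    """Return 'os.remove' for an attribute chain, or '' if it is not a plain name."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return ""

def _decorator_name(node: ast.AST) -> str:
    """Handle both @tool and @traceable(run_type="tool")."""
    if isinstance(node, ast.Call):
        node = node.func
    return _dotted_name(node).split(".")[-1]

def untraced_side_effects(source: str) -> list:
    """List (function, call) pairs where a side effect runs without a tracing decorator."""
    findings = []
    for fn in ast.walk(ast.parse(source)):
        if not isinstance(fn, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        if any(_decorator_name(d) in TRACING_DECORATORS for d in fn.decorator_list):
            continue
        for node in ast.walk(fn):
            if isinstance(node, ast.Call) and _dotted_name(node.func) in SIDE_EFFECT_CALLS:
                findings.append((fn.name, _dotted_name(node.func)))
    return findings

SAMPLE = '''
import os
from langchain_core.tools import tool

def cleanup(path):
    os.remove(path)

@tool
def delete_file(path: str) -> str:
    """Delete a file."""
    os.remove(path)
    return path
'''

print(untraced_side_effects(SAMPLE))
```

On the sample source, only the undecorated cleanup function is reported. Treat the output as a list of candidates to review by hand, not as proof that everything else is traced.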

2. Tool runs in an async task or thread the tracer didn’t follow

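A tool fires asyncio.create_task(), threading.Thread().start(), or a concurrent.futures executor’s submit(), and the real work runs in a separate execution context. Tracers carry their current span through contextvars. A new thread or executor worker starts without that context unless you copy it in. asyncio.create_task() does copy the caller’s context, but a fire-and-forget task can still outlive its parent span or be cancelled when the loop shuts down. Either way, the outer call shows up as an instant return, and the work shows as an orphan span (no parent) or nothing at all.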

How to spot it: search for asyncio.create_task(, Thread(, or .submit( inside any side-effecting function. In the Langfuse/LangSmith UI, look for “orphan” spans with no parent trace — those are context-loss cases.
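The thread case is easy to reproduce without any tracing SDK. In this minimal sketch, current_span is a stand-in for the ContextVar a tracer would use internally (the name and the "span-abc" value are invented). It compares a bare thread with one started through a copied context.

```python
import contextvars
import threading

# Stand-in for a tracer's "current span" slot.
current_span = contextvars.ContextVar("current_span", default=None)

def read_span(results: dict, key: str) -> None:
    results[key] = current_span.get()

def run_demo() -> dict:
    results = {}
    current_span.set("span-abc")

    # Bare thread: starts from a fresh context on default CPython builds.
    bare = threading.Thread(target=read_span, args=(results, "bare"))

    # Thread started through a copied context: sees the caller's span.
    ctx = contextvars.copy_context()
    copied = threading.Thread(target=ctx.run, args=(read_span, results, "copied"))

    for t in (bare, copied):
        t.start()
        t.join()
    return results

print(run_demo())
```

The "copied" entry carries the caller's span value. The "bare" entry is what a tracer sees when nothing propagated the context, which is how an orphan span is produced.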

3. Exception swallowed before the end callback fires

The tool raises mid-execution and a broad except Exception: pass eats it before the tracer’s on_tool_end callback runs. You get a tool_start with no matching tool_end — a half-entry, or nothing.

How to spot it: grep for except Exception: pass and except Exception: continue in tool wrappers. If the on_tool_end callback sits inside a try that can be skipped, expect incomplete entries.

4. Spans never flushed before the process exited

Langfuse and LangSmith batch spans and send them in the background to keep your app fast. If the process exits before the queue drains, the last spans are lost. This bites hardest in short-lived contexts: AWS Lambda, Cloud Run, Vercel/Cloudflare Workers, batch scripts, and Jupyter cells.

How to spot it: the app-side count of tool calls is consistently a few higher than what the platform shows, and the missing ones are always near the end of the run.
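The "always the last few" pattern follows directly from batching. The toy model below is not any SDK's real implementation (real exporters also flush on a timer and from a background thread), and the class name and batch size are invented, but it shows why the tail of a run is what goes missing.

```python
class BatchingSender:
    """Toy model of a batching span exporter (not a real SDK class)."""

    def __init__(self, batch_size: int = 3):
        self.batch_size = batch_size
        self.buffer = []      # spans waiting to be sent
        self.delivered = []   # spans the "platform" has received

    def record(self, span_name: str) -> None:
        self.buffer.append(span_name)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        self.delivered.extend(self.buffer)
        self.buffer.clear()

sender = BatchingSender(batch_size=3)
for name in ["write_file", "read_file", "call_api", "delete_file", "run_migration"]:
    sender.record(name)

# If the process exited here, the spans still in the buffer would be lost.
lost = list(sender.buffer)
print("delivered:", sender.delivered)
print("lost without flush:", lost)

sender.flush()  # what a finally block or exit hook is for
print("delivered after flush:", sender.delivered)
```

Five spans were recorded but only the first full batch reached the platform before the explicit flush, matching the symptom of an app-side count that runs a few ahead.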

5. Agent called an unregistered function via code execution

In LangGraph, CrewAI, or any agent with execute_code/run_python, the model can write Python that calls a function directly, never going through the tool registry. The framework traces tool invocations, not arbitrary code.

How to spot it: check whether the agent has any code-execution capability. If it does, any sensitive function it can import is callable without a registry entry and without a span.

6. Tracing SDK is version-pinned and missing newer span types

Your stack pins LangSmith (or an OpenTelemetry GenAI instrumentation) to an old release that predates a tool/span type the agent now emits. The old SDK silently drops events it doesn’t recognize. The June 2026 OpenTelemetry GenAI semantic conventions emit tool execution as a span named execute_tool {gen_ai.tool.name} with gen_ai.operation.name = execute_tool; an instrumentation older than that convention may not produce the span at all.

How to spot it: compare the SDK version in requirements.txt against the release notes for the version that added the span types you rely on.
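A startup check can do the same comparison automatically. This sketch uses only the standard library. MINIMUM is a placeholder value, not a real recommendation, and the simple parser ignores pre-release suffixes, so take the actual threshold from the release notes of the SDK you depend on.

```python
import re
from importlib import metadata

def installed_version(package: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

def release_tuple(version: str) -> tuple:
    """Turn '0.4.12' or '0.4.12rc1' into (0, 4, 12); suffixes are ignored."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    return tuple(int(part) for part in match.group().split("."))

def meets_minimum(version: str, minimum: str) -> bool:
    a, b = release_tuple(version), release_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '0.4' and '0.4.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b

MINIMUM = "0.4.0"  # placeholder: use the version named in the release notes
found = installed_version("langsmith")
if found is None:
    print("langsmith is not installed in this environment")
else:
    print(found, "ok" if meets_minimum(found, MINIMUM) else f"older than {MINIMUM}")
```

Running this in CI next to the trace-completeness test turns a silent SDK downgrade into a visible failure.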

7. Sampling dropped the span

A production tracer samples below 100% to cut cost, and a low-probability but critical call (a destructive delete) gets sampled out. It executes; it leaves no record.

How to spot it: check the sampler config. Any sample rate below 1.0 means destructive or security-sensitive calls have incomplete coverage.

Shortest path to fix

Step 1: Route every side-effecting call through the tracer

Wrap raw functions so the framework sees them. In LangChain/LangGraph, use the @tool decorator or StructuredTool; for arbitrary code, use @traceable.

import os
from langchain_core.tools import tool, StructuredTool

# Before: plain function called directly, no span
def delete_file_raw(path: str) -> str:
    os.remove(path)
    return f"deleted {path}"

# After: invoked through the tool interface, so the run is traced as a tool span
@tool
def delete_file(path: str) -> str:
    """Delete a file at the given path."""
    os.remove(path)
    return f"deleted {path}"

# Or register an existing plain function explicitly
# (pass the undecorated function, not the @tool object)
delete_tool = StructuredTool.from_function(
    func=delete_file_raw, name="delete_file",
    description="Delete a file at the given path",
)

For side effects outside any framework, decorate the source function so it always emits a span no matter who calls it:

import os
from langsmith import traceable

@traceable(run_type="tool", name="delete_file")
def delete_file(path: str) -> str:
    os.remove(path)
    return f"deleted {path}"

Apply this to every function that writes files, runs subprocesses, calls APIs, or mutates databases.

Step 2: Propagate trace context into async tasks

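A new thread or executor worker does not inherit the current span. Copy the context, or use a decorator that does it for you (@observe in Langfuse and @traceable in LangSmith handle this when you stay inside the decorated coroutine). asyncio.create_task() and asyncio.to_thread() copy the caller’s context automatically; loop.run_in_executor() does not, so the example below copies it by hand.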

import asyncio, contextvars

async def spawn_tool_task(tool_fn, **kwargs):
    ctx = contextvars.copy_context()  # carry the active span
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, lambda: ctx.run(tool_fn, **kwargs))

Do not fire-and-forget a coroutine that performs side effects. Trace it and await it:

from opentelemetry import trace as otel_trace
tracer = otel_trace.get_tracer(__name__)

async def traced_async_tool(tool_name: str, coro):
    with tracer.start_as_current_span(f"execute_tool {tool_name}") as span:
        span.set_attribute("gen_ai.operation.name", "execute_tool")
        span.set_attribute("gen_ai.tool.name", tool_name)
        try:
            result = await coro
            # Truncated result preview; this attribute name is a local choice
            span.set_attribute("gen_ai.tool.result", str(result)[:500])
            return result
        except Exception as e:
            span.record_exception(e)
            raise

# Call it from inside an async function, and await it:
result = await traced_async_tool("run_migration", run_migration_coro())

Step 3: Never swallow exceptions in tracer callbacks

import logging
logger = logging.getLogger(__name__)

# WRONG: silently drops the trace event
def on_tool_end(self, output, **kwargs):
    try:
        self.log_tool_output(output)
    except Exception:
        pass

# CORRECT: at minimum log with the traceback, so you know tracing failed
def on_tool_end(self, output, **kwargs):
    try:
        self.log_tool_output(output)
    except Exception:
        logger.exception("Tracer failed to record tool_end")
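The other half of this failure is a tool that raises before any end event is written. One way to rule out half-entries is to emit the end event from a finally block. The sketch below uses a plain list as a stand-in for the tracer, and the function and event field names are invented for illustration; swap in your tracer's own start and end calls.

```python
import logging
import time

logger = logging.getLogger("tool_trace")

def run_tool_with_events(name: str, fn, events: list, **kwargs):
    """Call fn(**kwargs) and always append a matching end event.

    `events` is a plain list standing in for a tracer.
    """
    started = time.monotonic()
    events.append({"event": "tool_start", "tool": name, "inputs": kwargs})
    error = None
    try:
        return fn(**kwargs)
    except Exception as exc:
        error = repr(exc)
        logger.exception("tool %s failed", name)
        raise  # let the caller see the failure instead of hiding it
    finally:
        # Runs on success and on failure, so there is never a half-entry.
        events.append({
            "event": "tool_end",
            "tool": name,
            "error": error,
            "duration_s": time.monotonic() - started,
        })

def flaky_delete(path: str) -> str:
    raise FileNotFoundError(path)

events = []
try:
    run_tool_with_events("delete_file", flaky_delete, events, path="/tmp/missing.sql")
except FileNotFoundError:
    pass  # handled by the caller, after the trace recorded it

print([e["event"] for e in events])
```

The failing call still produces a start and an end event, and the end event carries the error, so the trace shows a failed tool call instead of a missing one.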

Step 4: Flush before the process exits

Force the queue to drain at the end of every short-lived run. Both SDKs expose flush(); call shutdown() if you are tearing the client down.

from langfuse import get_client
langfuse = get_client()

# AWS Lambda / Cloud Run — flush in a finally block
def lambda_handler(event, context):
    try:
        return run_agent(event)
    finally:
        langfuse.flush()  # must complete before the function returns

On Vercel and Cloudflare Workers, the background send is killed when the response returns. Keep the runtime alive until the flush resolves with waitUntil:

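// Cloudflare Workers: ctx is the ExecutionContext passed to the handler
ctx.waitUntil(langfuse.flushAsync());
// Vercel: import { waitUntil } from "@vercel/functions" and call
// waitUntil(langfuse.flushAsync());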

For long-lived servers you do not need a manual flush, but still register an exit hook:

import atexit, signal, sys
atexit.register(langfuse.flush)
signal.signal(signal.SIGTERM, lambda *_: (langfuse.flush(), sys.exit(0)))

Step 5: Force all tools through the registry

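# Map each allowed tool name to its traced callable (defined elsewhere in your app).
# These are plain callables; for LangChain @tool objects, call .invoke(inputs) instead.
TOOL_REGISTRY = {
    "delete_file": delete_file,
    "write_file": write_file,
    "run_bash": run_bash,
    "call_api": call_api,
}

def execute_tool(tool_name: str, inputs: dict):
    if tool_name not in TOOL_REGISTRY:
        raise PermissionError(f"Unregistered tool call blocked: {tool_name!r}")
    return TOOL_REGISTRY[tool_name](**inputs)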

If the agent has code execution, add an audit log on every call into a sensitive module so a registry bypass still leaves a record.
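One possible way to get that audit record in CPython is sys.addaudithook, which fires on runtime events such as os.remove no matter which code path triggered them. This is a sketch under assumptions: the set of audited events is a small illustrative selection, and the in-memory list stands in for a real log sink.

```python
import os
import sys
import tempfile

AUDITED_EVENTS = {"os.remove", "os.rmdir", "subprocess.Popen"}
audit_log = []  # stand-in for a real log sink

def record_sensitive_event(event: str, args: tuple) -> None:
    """Audit hook: note sensitive runtime events regardless of who triggered them."""
    if event in AUDITED_EVENTS:
        audit_log.append((event, str(args[0]) if args else ""))

# Audit hooks cannot be removed once added, so install this once at startup.
sys.addaudithook(record_sensitive_event)

# Simulate agent-written code that bypasses the tool registry.
fd, path = tempfile.mkstemp(suffix=".sql")
os.close(fd)
exec("import os; os.remove(path)", {"path": path})

print(audit_log)
```

The delete ran through exec with no registry entry, yet the hook recorded the event and the path. This is a safety net for reconstruction, not a replacement for the tool span: it records what happened but not which agent step asked for it.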

Step 6: Upgrade and pin the tracing SDK

pip show langsmith | grep Version   # current
pip install --upgrade langsmith     # latest
# requirements.txt: pin a range so a silent regression can't slip in
# langsmith>=0.4,<0.5

After upgrading, replay a representative run and confirm tool-span counts match expectations.

Step 7: Sample destructive calls at 100%

Reduce volume by filtering low-value tools, not by random sampling. Keep every destructive, irreversible, or security-sensitive call.

def should_trace(tool_name: str) -> bool:
    LOW_VALUE = {"health_check", "ping", "get_timestamp"}
    return tool_name not in LOW_VALUE

With OpenTelemetry, use a custom sampler that always keeps destructive spans:

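from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased, ALWAYS_ON

DESTRUCTIVE_KEYWORDS = ("delete", "drop", "truncate", "destroy")

class DestructiveAlwaysSampler(ParentBased):
    def should_sample(self, parent_context, trace_id, name, *args, **kwargs):
        # Span names look like "execute_tool delete_file", so match on the name
        if any(kw in name for kw in DESTRUCTIVE_KEYWORDS):
            return ALWAYS_ON.should_sample(parent_context, trace_id, name, *args, **kwargs)
        return super().should_sample(parent_context, trace_id, name, *args, **kwargs)

# Sample 10% of ordinary traces (example rate), but keep every destructive span
provider = TracerProvider(sampler=DestructiveAlwaysSampler(root=TraceIdRatioBased(0.1)))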

How to confirm it’s fixed

Run a scripted task with a known sequence of tool calls (e.g. write_file then delete_file then call_api).
Open the trace and count the execute_tool spans. The count must equal the calls you made, with no orphan (parentless) spans.
Each tool span must carry both a start and an end event, plus its inputs and output.
Lock it in with an automated test that asserts trace completeness:

# run_agent_and_fetch_trace is a placeholder for your own helper that runs the
# agent and loads the finished trace from your tracing backend.
def test_trace_has_all_tool_calls():
    trace = run_agent_and_fetch_trace(task="delete then migrate")
    tool_spans = [s for s in trace.spans if s.run_type == "tool"]
    assert {s.name for s in tool_spans} >= {"delete_file", "run_migration"}
    assert all(s.end_time for s in tool_spans)  # no half-entries
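The three checks in the list above (matching counts, no orphans, no half-entries) can also be written as one backend-independent function. This sketch assumes each span has already been exported as a dict. The field names id, parent_id, run_type, name, and end_time are assumptions to adapt to whatever your platform returns.

```python
def check_trace_completeness(spans: list, expected_tools: list) -> list:
    """Return a list of problems; an empty list means the trace is complete."""
    problems = []
    span_ids = {s["id"] for s in spans}
    tool_spans = [s for s in spans if s.get("run_type") == "tool"]

    # Compare as sorted lists so that repeated calls to one tool are counted.
    recorded = sorted(s["name"] for s in tool_spans)
    if recorded != sorted(expected_tools):
        problems.append(f"expected tools {sorted(expected_tools)}, recorded {recorded}")

    for s in tool_spans:
        if s.get("parent_id") not in span_ids:
            problems.append(f"orphan span: {s['name']}")
        if s.get("end_time") is None:
            problems.append(f"half-entry (no end): {s['name']}")
    return problems

spans = [
    {"id": "1", "parent_id": None, "run_type": "chain", "name": "agent", "end_time": 9.0},
    {"id": "2", "parent_id": "1", "run_type": "tool", "name": "write_file", "end_time": 2.0},
    {"id": "3", "parent_id": "1", "run_type": "tool", "name": "delete_file", "end_time": None},
    {"id": "4", "parent_id": "missing", "run_type": "tool", "name": "call_api", "end_time": 5.0},
]

for problem in check_trace_completeness(spans, ["write_file", "delete_file", "call_api"]):
    print(problem)
```

On the sample data the function reports the delete_file half-entry and the call_api orphan, while the tool counts themselves match.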

Prevention

Decorate every side-effecting function (file writes, subprocess, API, DB) so it emits a span no matter how it is called.
Propagate trace context into async tasks and threads; never fire-and-forget a side-effecting coroutine.
Never swallow exceptions in tracer callbacks — log and alert; a silent tracer failure is as bad as no tracer.
Force side-effecting tools through the registered executor, not direct Python calls from agent code.
Flush before exit in every short-lived/serverless context; use waitUntil on Vercel/Cloudflare.
Keep destructive, irreversible, and security-sensitive calls at 100% sampling; reduce volume by filtering low-value read-only tools.
Put the trace ID in every log line so you can correlate logs with spans when the trace itself is incomplete.
Pin SDK versions and re-run the trace-completeness test after every upgrade.
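For the trace-ID-in-every-log-line item, a logging filter is one low-effort option. In this sketch, trace_id_var is a hypothetical ContextVar that your own code would set when a trace starts. The logger name, format string, and sample ID are illustrative.

```python
import contextvars
import io
import logging

# Assumed: your code sets this when a trace starts.
trace_id_var = contextvars.ContextVar("trace_id", default="-")

class TraceIdFilter(logging.Filter):
    """Attach the current trace ID to every record that passes through the handler."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = trace_id_var.get()
        return True

stream = io.StringIO()  # stand-in for stderr or a log shipper
handler = logging.StreamHandler(stream)
handler.addFilter(TraceIdFilter())
handler.setFormatter(logging.Formatter("%(levelname)s trace=%(trace_id)s %(message)s"))

logger = logging.getLogger("agent")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

trace_id_var.set("4bf92f3577b34da6a3ce929d0e0e4736")
logger.info("delete_file path=%s", "migrations/0001_old.sql")

print(stream.getvalue(), end="")
```

With the ID on every line, a log search for that trace ID recovers the sequence of side effects even when the span itself never reached the platform.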

FAQ

Q: A tool call’s inputs show up inside the LLM span but there’s no separate tool span. Why? A: The model emitted the tool_use request (so its args appear in the parent span), but the execution ran outside the traced path — usually a direct Python call instead of a registered tool. Route it through the executor or a @traceable/@tool wrapper (Step 1). This is the single most common pattern in LangGraph reports.

Q: My traces work locally but lose the last few tool calls in Lambda / Cloud Run. A: Classic flush problem. The process exits before the background sender drains the queue. Call flush() in a finally block (Step 4); on Vercel/Cloudflare use ctx.waitUntil(...).

Q: Can I reconstruct what a missing tool call did? A: Partly, from side effects. Check git history (git log --all --diff-filter=D -- path), database audit logs, and OS file-access logs (fs_usage on macOS, auditd on Linux). These show what happened, not the agent’s intent or exact inputs — which is why you instrument at the source.

Q: Does LangSmith capture everything automatically? A: It captures calls that go through LangChain’s tool and chain abstractions. Direct Python calls, subprocesses, and exec()-ed code are invisible unless you add a @traceable wrapper or use wrap_openai/wrap_anthropic on raw SDK calls.

Q: How do I trace tool calls across microservices? A: Use distributed tracing with W3C TraceContext. Pass the traceparent header on every inter-service HTTP call. OpenTelemetry with a Jaeger or Tempo backend is the standard; all services emit spans linked by one trace ID, and tool executions follow the execute_tool {tool_name} GenAI span convention.

Q: What’s the overhead of 100% sampling on destructive tools? A: Negligible. A file delete or DB drop is already expensive; the span overhead is typically under 1ms. The cost of an untraced destructive op far outweighs it.
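For the microservices question above, it helps to see what the traceparent header actually contains. In practice OpenTelemetry's propagators build and parse it for you. The sketch below handles the version-00 format by hand purely for illustration, and the helper names are invented.

```python
import re
import secrets

# version - trace-id (32 hex) - parent-id (16 hex) - flags (2 hex)
TRACEPARENT_RE = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")

def make_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """Build a version-00 W3C traceparent header value."""
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"

def parse_traceparent(header: str) -> dict:
    match = TRACEPARENT_RE.match(header.strip())
    if match is None:
        raise ValueError(f"malformed traceparent: {header!r}")
    version, trace_id, parent_id, flags = match.groups()
    if set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        raise ValueError("all-zero trace or parent ID is invalid")
    return {
        "version": version,
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }

def child_headers(incoming: str) -> dict:
    """Headers for an outgoing call: same trace ID, a fresh span ID as the parent."""
    ctx = parse_traceparent(incoming)
    new_span_id = secrets.token_hex(8)  # 16 hex characters
    return {"traceparent": make_traceparent(ctx["trace_id"], new_span_id, ctx["sampled"])}

incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
print(parse_traceparent(incoming))
print(child_headers(incoming))
```

The trace ID stays constant across every hop while the parent ID changes per call, which is what lets the backend stitch tool spans from different services into one trace.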

Tags: #AI coding #Agents #Troubleshooting