Edge Function Timeout: Vercel / Cloudflare / Netlify Fix

FUNCTION_INVOCATION_TIMEOUT or 504 on edge? Move heavy work off edge, stream responses, and put a timeout on every upstream fetch. Limits verified June 2026.

Published: May 17, 2026 Updated: Jun 21, 2026 Author: AI Productivity Guide Team

You deploy an LLM-calling, image-processing, or long API-aggregation endpoint to Vercel Edge / Cloudflare Workers / Netlify Edge. Locally it takes 5 seconds, which is fine. The first few production requests work. Then traffic ramps up and you start seeing FUNCTION_INVOCATION_TIMEOUT (HTTP 504) on Vercel, Error 1102 (Worker exceeded CPU time limit) on Cloudflare, or a silently truncated response on Netlify. Move the same code off edge and it works.

Fastest fix: if the work is an LLM call, PDF parse, or anything over ~10 seconds of real compute, take it off the edge runtime. On Vercel, delete export const runtime = 'edge' (or set it to 'nodejs') and redeploy. The Node.js runtime with Fluid Compute now defaults to 300 seconds (5 minutes) on every plan, including Hobby, as of June 2026, so the timeout usually disappears immediately. Keep edge only for sub-second routing, auth, and redirects. The rest of this page covers the edge-specific traps when you genuinely need to stay on edge.

Mental model: edge is not “faster Lambda.” It is a low-latency distribution runtime with very tight CPU and time budgets. Each platform enforces a different ceiling, and the trap is that they measure different things.

Platform limits at a glance (verified June 2026)

Platform / runtime	What is metered	Free / default	Max
Vercel Edge runtime	Time to first byte, then stream	Must start the response within `25s`	Stream up to `300s` after that
Vercel Node.js (Fluid Compute)	Wall-clock duration	`300s` default, all plans incl. Hobby	Pro/Ent up to `800s` (`1800s` extended, beta)
Cloudflare Workers (Free)	CPU time only	`10ms` CPU per request	`10ms` (hard)
Cloudflare Workers (Paid)	CPU time only	`30s` CPU default	up to `5 min` (`300000ms`); no hard wall-clock cap on HTTP
Netlify Edge Functions	CPU time only	`50ms` CPU	must return headers within `40s`, then stream indefinitely

The two things people get wrong most: (1) Vercel Edge does not have a flat 25-second wall limit anymore. It needs the first byte out within 25 seconds, then you can stream for up to 300 seconds. (2) Cloudflare and Netlify meter CPU time, not wall-clock, so await fetch(...) to a slow upstream does not burn your CPU budget. A Cloudflare CPU timeout means your own JS (JSON parse, crypto, a hot loop), not your network wait.

Common causes

Ordered by hit rate, highest first.

1. Synchronous heavy work on edge (long LLM, image processing)

LLM completions routinely take 20-60 seconds for a full non-streamed response. PDF parsing and image generation run tens of seconds. If you buffer the whole result before returning, Vercel Edge will cut you off because the first byte never went out inside 25 seconds.

How to spot it: the file has export const runtime = 'edge' and the handler awaits a non-streamed LLM/image call, then returns once.

2. Slow upstream with no timeout

await fetch(upstream) with no signal means that when the upstream stalls, you stall. Some third-party APIs usually respond in 1s but occasionally take 60s. On Vercel Edge this blows the 25-second first-byte window; on Node.js it eats your maxDuration.

How to spot it: logs show upstream latency several times baseline; p99 is far above p50.

3. Cloudflare CPU limit, not wall time

Cloudflare Workers meter CPU time (real compute), not wall time. The Free plan gives 10ms CPU per request, so a heavy JSON.parse, a big Array.sort, or crypto.subtle work over a large payload blows it even though the request looks fast. The Paid plan defaults to 30s CPU and can be raised to 5 min.

How to spot it: Cloudflare dashboard → Workers & Pages → your Worker → Metrics shows CPU time spiking near the limit; the error is Error 1102: Worker exceeded CPU time limit. Network waits do not count, so adding await fetch retries will not fix a 1102.
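To get a feel for the difference between CPU time and wall-clock time before deploying, a rough local check like the one below can help. It is a sketch that assumes Node.js: process.cpuUsage() is a Node API and does not exist in Workers, and the helper names measure, waitOnly, and busyLoop are hypothetical. Local numbers only approximate what Cloudflare actually meters.

```javascript
// Measure wall-clock and CPU time for an async function (Node.js only).
async function measure(fn) {
  const wallStart = performance.now();
  const cpuStart = process.cpuUsage();
  await fn();
  const cpu = process.cpuUsage(cpuStart); // microseconds used since cpuStart
  return {
    wallMs: performance.now() - wallStart,
    cpuMs: (cpu.user + cpu.system) / 1000,
  };
}

// Stand-in for a slow upstream: waits without computing.
const waitOnly = () => new Promise((resolve) => setTimeout(resolve, 200));

// Stand-in for heavy in-handler JavaScript: spins the CPU for about 50ms.
const busyLoop = () => {
  const end = performance.now() + 50;
  let iterations = 0;
  while (performance.now() < end) iterations++;
  return iterations;
};
```

Calling measure(waitOnly) should report a wall time near 200ms with only a small CPU figure, while measure(busyLoop) should report CPU time close to its wall time. A handler that looks like the second case is the kind that trips a 1102.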

4. Serial upstream calls that should be parallel

const a = await fetch(api1); // 5s
const b = await fetch(api2); // 5s
const c = await fetch(api3); // 5s
// 15s total, all the wait stacks up

These should run with Promise.all.

How to spot it: independent await fetch calls back to back, where none depends on the previous result.

5. Buffering the response instead of streaming

You assemble the full body before returning. This is slower than chunked streaming, delays first byte past the 25-second edge window, and can hit Vercel’s 4.5 MB response body cap (FUNCTION_PAYLOAD_TOO_LARGE, HTTP 413).

How to spot it: the client waits, then everything arrives at once with no progressive display; large responses fail with a 413.

6. Cold start eats several seconds

First-invocation cold start can spend 2-5s on init (bundle eval, KV/secret hydration), leaving less of the 25-second first-byte window for your logic.

How to spot it: the first request after idle is slow; immediate follow-ups are fast.

Shortest path to fix

Step 1: Confirm it really is an edge limit

// Vercel: this line at the top of the route file = edge runtime
export const runtime = 'edge';

Remove it (or set it to 'nodejs') and redeploy. If the timeout disappears, it was an edge limit. On Cloudflare, check the error code: 1102 is a CPU-time limit (your own JS), whereas a plain timeout points at wall-clock time or a slow upstream. Then decide whether the endpoint belongs on edge at all.

Step 2: Do not put it on edge unless you have a reason to

What it does	Where it belongs
`< 5s`, pure routing / auth / redirect	edge (low-latency at the network edge is the point)
LLM calls, PDF processing, image work	Node.js serverless (`300s` default on Vercel)
`> 5 min`	background job (Inngest / cron / queue)
Persistent connections / WebSockets	Durable Object / dedicated server

Move the LLM endpoint to the Node.js runtime:

// Vercel Pages Router (pages/api/*)
export const config = { runtime: 'nodejs', maxDuration: 60 };

// Vercel App Router (app/api/*/route.js)
export const runtime = 'nodejs';
// With Fluid Compute (default since 2025), the runtime default is 300s on every
// plan. Hobby caps at 300s; Pro/Enterprise can set up to 800s.
export const maxDuration = 60;

As of June 2026 the old “Hobby 10s” cap is gone. With Fluid Compute, the default is 300s on Hobby too, so most LLM endpoints work without setting maxDuration at all. Set it lower only to fail fast.

Step 3: If you must stay on edge, stream

Stream the response so the first byte arrives inside the 25-second window. After that, Vercel Edge lets you keep streaming for up to 300s:

export const runtime = 'edge';

export async function POST(req) {
  const upstream = await fetch('https://api.anthropic.com/v1/messages', {
    method: 'POST',
    headers: {
      'content-type': 'application/json',
      'x-api-key': process.env.ANTHROPIC_API_KEY,
      'anthropic-version': '2023-06-01',
    },
    body: JSON.stringify({
      model: 'claude-sonnet-4-6',
      max_tokens: 1024,
      stream: true,
      messages: [{ role: 'user', content: 'hello' }],
    }),
  });

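  // If the upstream rejected the request, pass its status through instead of
  // labeling an error body as a 200 event stream.
  if (!upstream.ok) {
    return new Response(upstream.body, {
      status: upstream.status,
      headers: { 'content-type': upstream.headers.get('content-type') ?? 'application/json' },
    });
  }

  // Pipe the upstream SSE stream straight through; first byte goes out fast.
  return new Response(upstream.body, {
    headers: { 'content-type': 'text/event-stream' },
  });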
}

Streaming sidesteps the first-byte trap: the response starts inside the 25-second window, and Vercel Edge then allows up to 300s of streaming before it cuts the connection. The same pattern works on Cloudflare Workers and Netlify Edge (Netlify keeps a streaming function alive as long as it returned headers within 40 seconds).
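When the slow part is your own work rather than an upstream stream you can pipe through, one option is to start the response immediately and run the work inside the stream. The sketch below uses only Web-standard APIs (ReadableStream, TextEncoder, Response); streamSlowWork and its slowWork argument are hypothetical names, and the SSE framing is an assumption about what the client expects.

```javascript
// Return a streaming Response right away; run the slow work inside the stream.
function streamSlowWork(slowWork) {
  const encoder = new TextEncoder();
  const stream = new ReadableStream({
    async start(controller) {
      // An SSE comment line: puts the first bytes on the wire before the slow work runs.
      controller.enqueue(encoder.encode(': started\n\n'));
      try {
        const result = await slowWork();
        controller.enqueue(encoder.encode(`data: ${JSON.stringify(result)}\n\n`));
      } catch (err) {
        // Headers are already sent, so the failure has to be reported in-band.
        controller.enqueue(
          encoder.encode(`event: error\ndata: ${JSON.stringify(String(err))}\n\n`)
        );
      } finally {
        controller.close();
      }
    },
  });
  return new Response(stream, {
    headers: { 'content-type': 'text/event-stream', 'cache-control': 'no-cache' },
  });
}
```

A route handler could then end with something like return streamSlowWork(() => buildReport(userId)), where buildReport is a placeholder for your own function. This only protects the first-byte window; the total stream is still subject to the platform's streaming ceiling.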

Step 4: AbortSignal.timeout on every upstream fetch

const res = await fetch(upstream, {
  signal: AbortSignal.timeout(20_000), // abort after 20s
});

Do not let a slow upstream drag the whole function down. Wrap it once and reuse:

async function fetchWithTimeout(url, options = {}, ms = 20_000) {
  return fetch(url, { ...options, signal: AbortSignal.timeout(ms) });
}

AbortSignal.timeout() is available in the Vercel Edge runtime, Cloudflare Workers, and Netlify Edge (all Web-standard runtimes), so this is portable. When the timeout fires, the fetch promise rejects with an error whose name is TimeoutError; catch it and return a clean 503 instead of hanging.
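Catching the timeout and turning it into a 503 might look like the sketch below. The function name fetchOr503 and the injectable fetchImpl parameter are hypothetical (the parameter exists only to make the sketch easy to exercise without a network), and also accepting AbortError is a defensive assumption in case a runtime reports the abort under that name.

```javascript
// Fetch with a timeout; map a timeout to a 503 so the caller never hangs.
// `fetchImpl` defaults to the global fetch and is injectable for testing.
async function fetchOr503(url, ms = 20_000, fetchImpl = fetch) {
  try {
    return await fetchImpl(url, { signal: AbortSignal.timeout(ms) });
  } catch (err) {
    const name = err && err.name;
    if (name === 'TimeoutError' || name === 'AbortError') {
      return new Response(JSON.stringify({ error: 'upstream timed out' }), {
        status: 503,
        headers: { 'content-type': 'application/json' },
      });
    }
    throw err; // not a timeout: let the caller decide what to do
  }
}
```

Other failures (DNS errors, refused connections) are rethrown on purpose so they are not mislabeled as timeouts.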

Step 5: Parallelize independent calls

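// slow: ~10s (two 5s calls, one after the other)
const a = await fetch(api1);
const b = await fetch(api2);

// fast: ~5s (both calls in flight at once; use this instead of the two lines above)
const [a, b] = await Promise.all([fetch(api1), fetch(api2)]);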

Three independent 5-second upstreams become ~5 seconds in parallel instead of 15 seconds serial. Use Promise.allSettled if one upstream failing should not kill the others.
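For the Promise.allSettled case, a small helper can separate what succeeded from what failed. This is a sketch: gather and the shape of its tasks argument (an object mapping a label to a function that returns a promise) are hypothetical, not part of any platform API.

```javascript
// Run independent calls in parallel and keep whatever succeeded.
// `tasks` maps a label to a function that returns a promise.
async function gather(tasks) {
  const names = Object.keys(tasks);
  // The async wrapper turns a synchronous throw into a rejection as well.
  const settled = await Promise.allSettled(names.map(async (name) => tasks[name]()));
  const ok = {};
  const failed = {};
  settled.forEach((outcome, i) => {
    if (outcome.status === 'fulfilled') ok[names[i]] = outcome.value;
    else failed[names[i]] = String(outcome.reason);
  });
  return { ok, failed };
}
```

A handler could call gather({ profile: () => fetch(api1), orders: () => fetch(api2) }) and still render a partial page from ok when one entry lands in failed.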

Step 6: For work over ~5 minutes, switch to background jobs

If the task can genuinely run longer than the runtime ceiling, stop trying to do it inside the request. Enqueue and return a job id, then poll or push the result.

// Request side: enqueue, return jobId immediately
// (enqueue stands in for whatever call your queue or job runner provides)
export async function POST(req) {
  const { userId } = await req.json();
  const jobId = await enqueue({ task: 'generate-report', userId });
  return Response.json({ jobId });
}

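// Client polls until done, giving up after maxAttempts (about 5 minutes at 2s each)
async function poll(jobId, maxAttempts = 150) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetch(`/api/jobs/${jobId}`);
    if (!res.ok) throw new Error(`status check failed: ${res.status}`);
    const { status, result } = await res.json();
    if (status === 'done') return result;
    if (status === 'failed') throw new Error('job failed');
    await new Promise(r => setTimeout(r, 2000));
  }
  throw new Error('job did not finish in time');
}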
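The polling client reads status and result from /api/jobs/:id, so the server needs somewhere to keep that state. The sketch below shows one possible shape for it. Everything here is hypothetical, and the in-memory Map is for illustration only: on serverless or edge each instance has its own memory, so a real setup would use a shared store or the job runner's own status API.

```javascript
// In-memory job registry: illustration only, not durable, not shared across instances.
const jobs = new Map();
let nextId = 1;

// Record the job and return its id immediately; a worker would pick it up later.
function enqueue(payload) {
  const jobId = `job_${nextId++}`;
  jobs.set(jobId, { status: 'queued', payload, result: null });
  return jobId;
}

// Called by the worker when the job finishes.
function completeJob(jobId, result) {
  const job = jobs.get(jobId);
  if (job) Object.assign(job, { status: 'done', result });
}

// Called by the worker when the job fails.
function failJob(jobId) {
  const job = jobs.get(jobId);
  if (job) job.status = 'failed';
}

// What GET /api/jobs/:id could return: the fields the poller reads.
function jobStatus(jobId) {
  const job = jobs.get(jobId);
  if (!job) return { httpStatus: 404, body: { status: 'unknown' } };
  return { httpStatus: 200, body: { status: job.status, result: job.result } };
}
```

Returning a 404 for an unknown id matters: it lets the client stop polling instead of waiting on a job that was never recorded.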

Background runners:

Inngest: durable functions with retries and step state built in
Trigger.dev: long-running jobs, similar model
Vercel Cron + a queue (or Vercel Workflows for unlimited-duration, resumable steps)
Cloudflare Queues or Workflows (queue consumers get up to 15 min wall-clock)

How to confirm it’s fixed

Reproduce the original load that triggered the timeout (not a single warm request). Replay your real concurrency, or hammer it with npx autocannon -c 20 -d 30 https://your-app/api/endpoint.
Watch the platform logs live during the run: on Vercel, vercel logs <deployment-url> --follow; on Cloudflare, npx wrangler tail. You want zero FUNCTION_INVOCATION_TIMEOUT / 1102 events.
Check the tail, not the average. Confirm p99 latency sits comfortably under the runtime ceiling, since timeouts hit at the tail first.
If you switched to streaming, confirm the client sees first bytes within a second or two (open DevTools → Network → the request should show data arriving progressively, not all at once).
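To check the tail rather than the average, you can compute p99 from the latencies recorded during the load run. The sketch below uses the nearest-rank definition of a percentile; the function names and the 20% default headroom are assumptions for illustration, not platform guidance.

```javascript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(samples, p) {
  if (samples.length === 0) throw new RangeError('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// True when p99 leaves the given headroom under the ceiling (both in milliseconds).
function hasHeadroom(latenciesMs, ceilingMs, headroom = 0.2) {
  return percentile(latenciesMs, 99) <= ceilingMs * (1 - headroom);
}
```

For a Vercel Edge endpoint you would pass time-to-first-byte samples with a ceiling of 25000; for the Node.js runtime, total durations with your maxDuration in milliseconds.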

Prevention

Memorize your platform’s real limits (the table above) and which metric each one counts: Vercel Edge = time to first byte; Cloudflare/Netlify = CPU time; Vercel Node.js = wall-clock.
Anything over ~10s of real work defaults to Node.js, not edge. Start on Node.js, move to edge only with a measured reason.
Every external fetch gets an AbortSignal.timeout (5-20s) so one slow upstream cannot sink the request.
Catch back-to-back independent awaits in review and migrate them to Promise.all.
LLM and large-payload endpoints stream (SSE); never buffer the full response on edge.
Monitor p95 / p99 latency, not the average, and alert before you reach the ceiling.
Document which endpoints are edge / Node.js / background so new code follows the pattern.

FAQ

Why does it work locally and on the first few requests but fail under load? Locally there is no platform timeout and no cold start or contention. In production, the first warm requests beat the ceiling; once concurrency and tail latency rise, the slowest requests cross the limit and return a 504. Always test at real concurrency, and watch p99 rather than the average.

My Cloudflare Worker times out but I never do heavy compute, just await fetch. Why? Cloudflare meters CPU time, not wall-clock, and network waits do not count against it. If you get Error 1102: Worker exceeded CPU time limit, the cost is in your own JavaScript: a large JSON.parse, sorting/looping over a big array, or crypto work. A plain timeout (not 1102) on an HTTP Worker usually means a hung upstream. Add AbortSignal.timeout and reduce in-Worker compute; on the Free plan the CPU cap is a hard 10ms.

Did Vercel raise the function timeout? What is the real limit now? Yes. With Fluid Compute (default since 2025), the Node.js runtime defaults to 300s (5 minutes) on every plan, Hobby included, as of June 2026. Pro and Enterprise can configure up to 800s (with an 1800s extended beta). The Edge runtime is separate: it must send the first byte within 25s and can then stream up to 300s.

Should I keep my LLM endpoint on the edge runtime at all? Usually no, unless you are streaming and want the lowest latency. The simplest reliable setup is the Node.js runtime (300s default) with a streamed response. Use edge for sub-second routing, auth, geolocation, and redirects, where its global low latency is the actual benefit.

How do I run something longer than 5 minutes? Do not do it inside the request. Enqueue a job and return immediately (Step 6), then poll or push the result. Use Inngest, Trigger.dev, Cloudflare Queues/Workflows, or Vercel Workflows, which support resumable, effectively unlimited-duration steps.

Tags: #Backend #Debug #Troubleshooting