Vercel 500 Errors — Common Causes

Site builds but 500s in production: usually a crashing serverless function, a missing env var, or an Edge runtime incompatibility.

Deploy succeeded and the build log is all green, but loading a page or hitting an API returns 500: INTERNAL_SERVER_ERROR, often with FUNCTION_INVOCATION_FAILED or EDGE_FUNCTION_INVOCATION_FAILED underneath. “Builds clean, runs broken” almost never means a syntax bug. It means the runtime context is missing something: an env var that wasn’t enabled for the Production environment, an Edge function calling a Node-only API, a cold start slow enough to eat into the duration limit, or an upstream API call with no timeout dragging the whole function past that limit. This guide lists the causes by hit rate, then walks through a diagnosis using the vercel logs, vercel env, and vercel inspect CLI commands.

Common causes

Ordered by hit rate, highest first.

1. Env var missing on Production (wrong scope)

Works fine in Preview, 500s on Production. Open the Vercel dashboard → Settings → Environment Variables and check each variable’s environment checkboxes: Production, Preview, and Development are set independently. Classic miss: a newly added OPENAI_API_KEY is checked for Preview but not Production.

TypeError: Cannot read properties of undefined (reading 'startsWith')
  at new OpenAI (/var/task/node_modules/openai/index.js:42)

How to spot it: vercel env ls production lists what production actually receives. Diff against every process.env.XXX reference in your code.
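A cheap guard is to read required variables through a small fail-fast helper, so a missing key produces an error that names the variable instead of a TypeError deep inside an SDK constructor. This is a minimal sketch; requireEnv is a hypothetical helper, not a Vercel API, and it assumes process.env is available, as it is in Vercel’s Node runtime.

```typescript
// Hypothetical helper (not a Vercel API): fail fast with a readable message
// instead of an opaque TypeError inside an SDK constructor.
export function requireEnv(name: string): string {
  const value = process.env[name];
  if (value === undefined || value === '') {
    throw new Error(
      `Missing env var ${name}: enable it for this environment ` +
        `(Production / Preview / Development) and redeploy`,
    );
  }
  return value;
}

// Usage at module scope in a route file, so the failure is the first log line:
// const apiKey = requireEnv('OPENAI_API_KEY');
```

Calling it at module scope means the function fails on its first invocation with the variable name in the log, which is exactly the line you would otherwise reconstruct from the stack trace.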

2. Edge runtime uses a Node-only API

The function exports runtime = 'edge', but the code imports fs, calls require('crypto').createHash, or relies on Node’s global Buffer. If your local setup runs the code under Node, it works; deployed to the Edge runtime, it fails.

Error: The package "fs" wasn't found on the file system but is built into node.
  at __require (file:///var/task/.../chunk.mjs:1:1)

How to spot it: search the function logs for "wasn't found on the file system" or "is not supported in Edge Runtime".
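If you want to stay on Edge rather than switch to Node, the usual swap for require('crypto').createHash is the Web Crypto API. A minimal sketch, assuming the original code produced a SHA-256 hex digest; sha256Hex is a hypothetical name. Unlike createHash, the Web Crypto call is async, so every call site needs await.

```typescript
// Edge-safe stand-in for Node's createHash('sha256').update(text).digest('hex').
// crypto.subtle is the Web Crypto API: a global in the Edge runtime and in Node 19+.
export async function sha256Hex(text: string): Promise<string> {
  const bytes = new TextEncoder().encode(text);
  const digest = await globalThis.crypto.subtle.digest('SHA-256', bytes);
  // digest is an ArrayBuffer; render each byte as two lowercase hex digits
  return Array.from(new Uint8Array(digest), (b) => b.toString(16).padStart(2, '0')).join('');
}
```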

3. Upstream call has no timeout

You call OpenAI / Anthropic / Stripe with no AbortController or timeout. The upstream hangs for 60s, but the function’s duration limit is shorter (the default has historically been 10s on Hobby, and Pro lets you raise maxDuration to 300s), so the function is killed before the upstream replies.

Task timed out after 10.00 seconds
FUNCTION_INVOCATION_TIMEOUT

How to spot it: function logs show "Task timed out" with a duration equal to your function’s limit, which means a timeout rather than a crash.

4. Slow cold start + heavy dependencies

A function bundle in the tens of MB (uncompressed) slows cold starts enough to push requests toward the duration limit. Usual culprit: importing all of aws-sdk or firebase-admin when you only need one submodule.

How to spot it:

vercel inspect <deployment-url>
# Check the size listed for each function output; trim the large ones first

5. Database connection pool exhausted

If each invocation opens a fresh connection instead of going through a pooler, a traffic burst spins up many function instances at once. PostgreSQL defaults to max_connections = 100 and MySQL to 151, so the database runs out of slots and every subsequent invocation 500s.

Error: remaining connection slots are reserved for non-replication superuser connections

How to spot it: your DB provider’s dashboard (Supabase / Neon / PlanetScale) shows active connections pegged at the limit.

6. catch block logs but doesn’t return

try { ... } catch (e) { console.error(e) } with no return new Response(...): the function finishes without returning a response, and Vercel reports it as a 500.

How to spot it: function logs show the stack trace printed by console.error and then the 500, with none of the business log lines a completed request would produce.
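To make this failure explicit, route handlers can be wrapped so that both a thrown error and a missing return produce a logged, deliberate 500. A minimal sketch assuming Web-standard Request/Response handlers, as in Next.js App Router route files; withErrorResponse is a hypothetical helper, not a Next.js or Vercel API.

```typescript
// A handler that may (buggily) fall through without returning a Response.
type Handler = (req: Request) => Promise<Response | undefined>;

// Hypothetical wrapper: every code path ends in a Response.
export function withErrorResponse(handler: Handler) {
  return async (req: Request): Promise<Response> => {
    try {
      const res = await handler(req);
      if (!(res instanceof Response)) {
        // A catch block that only logged falls through to here
        console.error(`Handler for ${req.url} returned no Response`);
        return new Response('Internal error', { status: 500 });
      }
      return res;
    } catch (e) {
      // Log the full error for vercel logs, then still answer the client
      console.error(e);
      return new Response('Internal error', { status: 500 });
    }
  };
}

// Usage in a route file:
// export const POST = withErrorResponse(async (req) => { ... });
```

The wrapper does not make the 500 disappear; it turns a silent fall-through into a log line naming the route that returned nothing, which is the clue the raw symptom lacks.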

Shortest path to fix

Step 1: Grab the real error with vercel logs

Vercel UI logs cap at ~100 recent lines and aren’t fully searchable. Use the CLI:

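# Stream runtime logs for a deployment (the documented argument is a deployment URL or ID, not a project name)
vercel logs <deployment-url>

# Filter for failures; time-range flags such as --since differ between CLI versions, so check vercel logs --help
vercel logs <deployment-url> | grep -E 'ERROR|500|FUNCTION_'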

Or open the dashboard → Deployments → latest → Functions → click the function → Logs tab. Copy the full stack; the first couple of lines usually name the real cause.

Step 2: Reconcile env vars

# List all env var keys in production scope
vercel env ls production

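# Pull production values locally (vercel env pull defaults to Development)
vercel env pull .env.production --environment=production

# Compare key names; lines starting with > are referenced in code but missing from production
diff <(grep -oE '^[A-Z0-9_]+' .env.production | sort -u) \
     <(grep -rhoE 'process\.env\.[A-Z0-9_]+' src/ | sed 's/^process\.env\.//' | sort -u)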

Newly added env vars only take effect after a redeploy (click Redeploy in the dashboard).
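For CI, the same reconciliation is easier to maintain as a script than as a shell one-liner. A minimal sketch, assuming production values were pulled into .env.production and source lives under src/; all function names are hypothetical. It only detects the process.env.NAME form, so keys read via bracket access need to be checked by hand.

```typescript
import { readFileSync, readdirSync, statSync } from 'node:fs';
import { join } from 'node:path';

// Keys defined in a dotenv file, e.g. the one written by `vercel env pull`.
export function parseEnvKeys(dotenv: string): Set<string> {
  const keys = new Set<string>();
  for (const line of dotenv.split(/\r?\n/)) {
    const m = /^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=/.exec(line);
    if (m) keys.add(m[1]);
  }
  return keys;
}

// Names referenced as process.env.NAME; bracket access like process.env[key] is not detected.
export function findEnvRefs(source: string): Set<string> {
  const refs = new Set<string>();
  for (const m of source.matchAll(/process\.env\.([A-Za-z_][A-Za-z0-9_]*)/g)) {
    refs.add(m[1]);
  }
  return refs;
}

// Recursively collect references from source files, skipping node_modules.
export function collectRefs(dir: string, refs = new Set<string>()): Set<string> {
  for (const name of readdirSync(dir)) {
    const path = join(dir, name);
    if (statSync(path).isDirectory()) {
      if (name !== 'node_modules') collectRefs(path, refs);
    } else if (/\.(ts|tsx|js|jsx|mjs)$/.test(name)) {
      for (const key of findEnvRefs(readFileSync(path, 'utf8'))) refs.add(key);
    }
  }
  return refs;
}

// Referenced but not defined, minus variables the runtime sets itself.
export function missingKeys(
  refs: Set<string>,
  defined: Set<string>,
  allow: string[] = ['NODE_ENV'],
): string[] {
  return [...refs].filter((k) => !defined.has(k) && !allow.includes(k)).sort();
}

// CI usage (paths are assumptions):
// const missing = missingKeys(collectRefs('src'), parseEnvKeys(readFileSync('.env.production', 'utf8')));
// if (missing.length > 0) { console.error('Missing in production:', missing); process.exit(1); }
```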

Step 3: Flip Edge back to Node to verify

If the stack mentions Edge Runtime or not supported, set:

// app/api/chat/route.ts
export const runtime = 'nodejs';  // was 'edge'
export const maxDuration = 30;    // Pro plan allows up to 300

Watch it for 24 hours. If it’s stable, decide whether to make the code Edge-safe (replace axios with fetch, Node’s crypto with Web Crypto) or stay on Node.

Step 4: Wrap every upstream fetch with a timeout

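// app/api/chat/route.ts
export async function POST() {
  const controller = new AbortController();
  // Abort after 8s, leaving headroom under the function's duration limit
  const timeout = setTimeout(() => controller.abort(), 8000);

  try {
    const res = await fetch('https://api.openai.com/v1/...', {
      // method, headers and body for the upstream API go here
      signal: controller.signal,
    });
    if (!res.ok) {
      // Surface upstream failures instead of parsing an error body as success
      return new Response('Upstream error', { status: 502 });
    }
    return Response.json(await res.json());
  } catch (e) {
    // fetch rejects with an AbortError when the signal fires
    if (e instanceof Error && e.name === 'AbortError') {
      return new Response('Upstream timeout', { status: 504 });
    }
    console.error(e);
    return new Response('Internal error', { status: 500 });
  } finally {
    clearTimeout(timeout);
  }
}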

Critical: every catch branch must return a Response; console.error alone leaves the request unanswered.
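To avoid repeating the controller-plus-timer boilerplate in every route, the pattern can be factored into a helper. A minimal sketch; withTimeout is a hypothetical name, and it only works for calls that accept an AbortSignal (fetch does). On timeout the task rejects with whatever the underlying call throws on abort, which for fetch is an AbortError, so the catch block in the route above still applies.

```typescript
// Hypothetical helper: run any AbortSignal-aware task with a hard deadline.
export async function withTimeout<T>(
  task: (signal: AbortSignal) => Promise<T>,
  ms: number,
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  try {
    return await task(controller.signal);
  } finally {
    // Clear the timer so it cannot fire after the task settles
    clearTimeout(timer);
  }
}

// Usage:
// const res = await withTimeout((signal) => fetch(url, { signal }), 8000);
```

Recent runtimes also provide AbortSignal.timeout(ms), but it aborts with a TimeoutError rather than an AbortError, so an e.name check would need to accept both.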

Step 5: Use a pooled DB connection

Don’t open raw Postgres connections from serverless functions. Use Prisma Accelerate (the successor to Prisma Data Proxy), Neon’s @neondatabase/serverless driver, or Supabase’s pooler URL:

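// Supabase's transaction-mode pooler listens on port 6543 (direct connections use 5432)
import { Pool } from 'pg';

// Create the pool once at module scope so warm invocations reuse it
const pool = new Pool({
  connectionString: process.env.DATABASE_URL_POOLER,
  max: 1,  // one connection per serverless instance
});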

Prevention

  • Add an env-var lint to CI: grep every process.env.XXX reference and diff against vercel env ls production; fail if anything is missing
  • Force an 8-second timeout on every upstream call; require every catch to return a Response
  • Lint Edge functions to forbid Node-only imports (fs, path, crypto, net)
  • Run a post-deploy health check that curls critical API endpoints for 200
  • Always route DB traffic through a pooler URL; alert when active connections hit 80% of cap
  • Guard function bundle size in CI (warn at > 30MB, fail at > 50MB)
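The bundle-size guard in the last item can be a short Node script run after vercel build. A minimal sketch, assuming functions land in .vercel/output/functions as *.func directories (the Build Output API layout); it measures uncompressed bytes on disk, which will not match a compressed size limit exactly, and the function names are hypothetical.

```typescript
import { readdirSync, statSync } from 'node:fs';
import { join } from 'node:path';

const MB = 1024 * 1024;
export type Verdict = 'ok' | 'warn' | 'fail';
const RANK: Record<Verdict, number> = { ok: 0, warn: 1, fail: 2 };

// Thresholds from the list above: warn above 30MB, fail above 50MB.
export function classify(bytes: number, warnMB = 30, failMB = 50): Verdict {
  if (bytes > failMB * MB) return 'fail';
  if (bytes > warnMB * MB) return 'warn';
  return 'ok';
}

// Total size in bytes of all files under dir.
export function dirSize(dir: string): number {
  let total = 0;
  for (const name of readdirSync(dir)) {
    const path = join(dir, name);
    const st = statSync(path);
    total += st.isDirectory() ? dirSize(path) : st.size;
  }
  return total;
}

// Grade every *.func folder under root (recursing into route folders); return the worst verdict.
export function checkFunctions(root: string): Verdict {
  let worst: Verdict = 'ok';
  for (const name of readdirSync(root)) {
    const path = join(root, name);
    if (!statSync(path).isDirectory()) continue;
    let verdict: Verdict;
    if (name.endsWith('.func')) {
      const size = dirSize(path);
      verdict = classify(size);
      console.log(`${verdict}\t${(size / MB).toFixed(1)}MB\t${path}`);
    } else {
      verdict = checkFunctions(path); // e.g. functions/api holding api/*.func
    }
    if (RANK[verdict] > RANK[worst]) worst = verdict;
  }
  return worst;
}

// CI usage after `vercel build` (output path is an assumption):
// if (checkFunctions('.vercel/output/functions') === 'fail') process.exit(1);
```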

Tags: #Hosting #Debug #Troubleshooting #Vercel