Netlify Function Cold Start Times Out at 10s

A Netlify Function works locally but the first request after idle returns a 502 with 'Task timed out after 10.00 seconds' — almost always cold-start init weight or upstream DNS.

A Netlify Function that runs in 200 ms locally and 300 ms on a warm invocation returns a 502 the first time it is hit after a quiet period. The function log shows Task timed out after 10.00 seconds. You retry within a minute and the same code returns instantly. The function is not slow; it is paying a cold-start tax that, added to the handler's own work, exceeds the default 10-second synchronous function limit on Netlify. The cause is almost always heavy module initialization (a fat SDK or a Prisma client), top-level await against a slow upstream, slow DNS or TLS on the first call to an external API, or a bundle that pulls in aws-sdk v2 transitively.

Common causes

Ordered by likelihood for sync Netlify Functions on Node 18/20.

1. Heavy top-level imports pulled in at cold start

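Every import at the top of the file runs during init, before your handler. A single import { PrismaClient } from '@prisma/client' or import OpenAI from 'openai' can add 2-6 seconds to cold start: the runtime has to load, parse, and evaluate the SDK's whole module graph, and Prisma additionally depends on a large native query engine. Network work (DNS, TLS) usually waits until the first request; that case is cause 5.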

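How to spot it: Log performance.now() as the last statement before exporting the handler. In Node it counts milliseconds since the process started, so it covers every import. A console.time('init') on the first line would not: in an ES module, import statements are hoisted and evaluated before any other code runs, so the timer would start after the imports had already finished. If it logs ≥ 4 s, init is your problem.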

2. Top-level await against a slow or unreachable upstream

A pattern like const config = await fetch(CONFIG_URL).then(r => r.json()) at module scope blocks init until the upstream responds. If that upstream lives in a different region or is down, cold start eats the entire 10 s budget.

How to spot it: Search for await outside any function. Comment them out and redeploy — if cold start drops to under 2 s, this is it.
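If the function genuinely needs remote config, one option is to load it inside the handler and put a hard deadline on the request, so an unreachable upstream fails in a couple of seconds instead of consuming the whole 10 s budget. A minimal sketch; the `withTimeout` helper, its label, and the 2 s figure are illustrative assumptions, not a Netlify API.

```typescript
// Reject if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number, label = "operation"): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  // First to settle wins; clear the timer either way so it cannot hold the event loop open.
  return Promise.race([work, deadline]).finally(() => clearTimeout(timer));
}

// Hypothetical usage inside the handler, instead of a top-level await:
// const config = await withTimeout(fetch(CONFIG_URL).then((r) => r.json()), 2_000, "config fetch");
```

The helper works for any promise, including SDK calls, but it does not cancel the underlying request. For a plain fetch on Node 18+, passing `{ signal: AbortSignal.timeout(2000) }` gives a similar deadline and also aborts the request.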

3. @netlify/functions synchronous limit vs. background functions

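Standard synchronous Netlify Functions cap at 10 s of total execution, including init. A function that legitimately needs 12-20 s belongs in a background function (a -background suffix on the function name, for example process-upload-background.ts, with a 15-minute cap). Edge functions start quickly, but they run on Deno with a 50 ms CPU-time budget per request, so they suit lightweight request handling rather than heavy work.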

How to spot it: The function does real work that takes 8-15 s warm. It is not a cold-start problem — it is the wrong runtime.
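Until that work moves, a deadline guard at least replaces the opaque 502 with a response you choose. A minimal sketch, assuming the 10 s cap and an illustrative 1.5 s safety margin; `runWithinBudget`, `Outcome`, and both defaults are hypothetical names and values. It does not make the work faster, and it cannot see init time spent before the handler started, which is one reason for the margin.

```typescript
type Outcome<T> = { ok: true; value: T } | { ok: false; reason: "deadline" };

// Race the real work against a deadline set a safety margin before the platform cap.
async function runWithinBudget<T>(
  work: () => Promise<T>,
  startedAt: number, // Date.now() captured at handler entry
  capMs = 10_000, // assumed synchronous function limit
  marginMs = 1_500, // hypothetical headroom for init and response serialization
): Promise<Outcome<T>> {
  const remainingMs = capMs - marginMs - (Date.now() - startedAt);
  if (remainingMs <= 0) return { ok: false, reason: "deadline" };

  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<Outcome<T>>((resolve) => {
    timer = setTimeout(() => resolve({ ok: false, reason: "deadline" }), remainingMs);
  });
  const result = work().then((value): Outcome<T> => ({ ok: true, value }));
  try {
    // Errors thrown by `work` still propagate to the caller.
    return await Promise.race([result, deadline]);
  } finally {
    clearTimeout(timer);
  }
}
```

In a handler you would capture `Date.now()` on entry, await `runWithinBudget(...)`, and return a 503 (or hand off to a background function) when `ok` is false.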

4. Bundle inflated by aws-sdk v2 transitive dep

Some older libraries (Mailgun SDK, certain analytics packages) drag in aws-sdk v2 — a 50 MB monolith that takes 3-5 s to parse at cold start. The Netlify bundler does not always tree-shake it away.

How to spot it: ls -lah .netlify/functions-internal/<fn>/ and inspect the bundle size, or run du -sh node_modules/aws-sdk. If the bundle is > 10 MB you almost certainly have v2.

5. DNS resolution loop inside an SDK

OpenAI, Anthropic, Stripe, and Twilio SDKs open an HTTPS connection on their first call. If the function’s outbound DNS is slow to resolve api.openai.com or similar (rare, but it happens during some Netlify regional incidents), the first request after a cold start can take 4-8 s.

How to spot it: Add console.time('first-api-call') around the first SDK call and log the duration. Compare warm vs. cold.
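To compare cold and warm timings in one log stream, a module-scope set can tag the first time each label runs in a container. A sketch; `timed` and `seenLabels` are illustrative names, not Netlify or SDK APIs, and "cold" here means "first run of this label in this container", which matches a cold start when the wrapped call happens on the first invocation.

```typescript
// Module scope is evaluated once per container, so this set starts empty after every cold start.
const seenLabels = new Set<string>();

// Time an async call and tag it "cold" the first time this label runs in the container.
async function timed<T>(label: string, fn: () => Promise<T>): Promise<T> {
  const phase = seenLabels.has(label) ? "warm" : "cold";
  seenLabels.add(label);
  const start = performance.now();
  try {
    return await fn();
  } finally {
    // Logged on success and on failure, so slow calls that end in an error still show up.
    console.log(`[${label}] ${phase} ${Math.round(performance.now() - start)}ms`);
  }
}

// Hypothetical usage around the first SDK call:
// const completion = await timed("first-api-call", () => openai.chat.completions.create({ /* ... */ }));
```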

6. Synchronous filesystem reads of large bundled assets

fs.readFileSync('./prompts.json') or similar at module scope, with a 5+ MB file, takes hundreds of ms on Lambda’s cold cache and can stack with other init costs to blow the budget.

How to spot it: Grep for readFileSync outside the handler. Move large reads inside the handler with a module-level cache.
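A minimal sketch of that pattern, assuming a JSON asset shipped with the function (on Netlify, files read at runtime generally have to be bundled explicitly, for example via included_files in netlify.toml). `loadJsonOnce` is a hypothetical helper; the reader is injectable only so the sketch can be exercised without a real file.

```typescript
import { readFileSync } from "node:fs";

// Parsed files keyed by path; lives as long as the container does.
const jsonCache = new Map<string, unknown>();

function loadJsonOnce<T>(path: string, read: (p: string) => string = (p) => readFileSync(p, "utf8")): T {
  if (!jsonCache.has(path)) {
    // The synchronous read happens once, on first use inside the handler, not at init.
    jsonCache.set(path, JSON.parse(read(path)));
  }
  return jsonCache.get(path) as T;
}

// Hypothetical usage inside the handler:
// const prompts = loadJsonOnce<Record<string, string>>("./prompts.json");
```

Requests that never touch the file skip the read entirely, and warm calls reuse the parsed object.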

7. Wrong region for the data upstream

Netlify runs functions in us-east-2 (Ohio) by default; if your Postgres / Redis / KV upstream is in eu-west-1, every query pays roughly 100-150 ms of round-trip time, and a cold start that runs several init queries pays it several times over.

How to spot it: Compare the functions region shown in your site's Functions settings in the Netlify UI with your DB region.

Before you start

  • Confirm the failure is cold-start specific: wait 15-20 minutes without traffic, then call the function twice in quick succession. If the first call fails and the immediate retry succeeds, it is cold start.
  • Note the exact Node runtime (Node 18 vs Node 20 — startup time differs by ~300 ms).
  • Capture one failing function-log line including the Task timed out message and the request ID.
  • Know whether the function is a sync function, scheduled function, or background function — limits differ.

Information to collect

  • netlify.toml [functions] and [build] sections.
  • The function file’s import list (top of file).
  • Bundle size: ls -lah .netlify/functions-internal/<fn>/.
  • package.json dependencies and any peer/transitive aws-sdk.
  • Function logs around the failing request ID (init time + handler time).
  • Region of the function vs. region of upstream (DB, KV, API).

Step-by-step fix

Ordered cheapest to most invasive.

Step 1: Time the init phase

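Log the elapsed time once every import has run, and time the handler separately:

import OpenAI from "openai";
// ...other imports

// ES modules hoist every import above the module body, so a timestamp taken
// "before" the imports would actually be read after them. performance.now()
// counts from process start, so this figure covers runtime boot plus all imports.
console.log(`[init] imports done in ${Math.round(performance.now())}ms`);

export const handler = async (event) => {
  const handlerStart = performance.now();
  // ...
  console.log(`[handler] done in ${Math.round(performance.now() - handlerStart)}ms`);
  return { statusCode: 200, body: "ok" };
};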

Trigger a cold start (deploy + wait 20 min + hit). The [init] line tells you if the problem is import weight (≥ 3 s) or handler weight.

Step 2: Lazy-import heavy SDKs inside the handler

Replace top-level imports with dynamic imports gated by first use:

let _openai: import("openai").OpenAI | null = null;
async function getOpenAI() {
  if (!_openai) {
    const { default: OpenAI } = await import("openai");
    _openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
  }
  return _openai;
}

export const handler = async (event) => {
  const openai = await getOpenAI();
  // ...
};

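The handler pays the import cost on its first call, so a request that needs the SDK still pays it once. What you gain is that init drops from 4-5 s to under 500 ms, and requests that never touch the SDK skip the cost entirely. Warm calls still pay nothing.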
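The same idea generalizes to any heavy dependency. A sketch of a small helper, the hypothetical `lazy`, that memoizes the promise rather than the resolved client, so two overlapping calls in one invocation share a single import, and that forgets a failed attempt so the next call can retry.

```typescript
// Wrap an async factory so it runs at most once per container (until it fails).
function lazy<T>(factory: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | null = null;
  return () => {
    if (!pending) {
      pending = factory().catch((err) => {
        pending = null; // don't cache a failed import or constructor; retry on the next call
        throw err;
      });
    }
    return pending;
  };
}

// Hypothetical usage with the OpenAI SDK:
// const getOpenAI = lazy(async () => {
//   const { default: OpenAI } = await import("openai");
//   return new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
// });
```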

Step 3: Remove aws-sdk v2 from the bundle

Find what pulls it in:

npm ls aws-sdk

Replace the offending dep with a modular v3 client, or pin a lighter alternative:

npm uninstall mailgun-js
npm install mailgun.js form-data

Verify the bundle shrank:

netlify build
du -sh .netlify/functions-internal/<fn>/

Expect a drop from 30-60 MB to under 5 MB and cold start down 2-3 s.

Step 4: Move long work to a background function

If real handler work is the issue, rename and adjust:

netlify/functions/process-upload.ts          → 10 s cap
netlify/functions/process-upload-background.ts → 15 min cap

The -background suffix is the trigger. Calls return 202 immediately and the function continues. Pair with a status endpoint and a small KV write so the client can poll. See edge function timeout for the parallel Vercel pattern.
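A sketch of the status side of that pattern. `JobStore`, `MemoryJobStore`, `runJob`, and `statusResponse` are hypothetical; the in-memory store only stands in for your real KV so the sketch runs, since in production the background function and the status endpoint run in separate containers and cannot share memory. Responses use the same `{ statusCode, body }` shape as the handlers above.

```typescript
type JobStatus =
  | { state: "pending" }
  | { state: "done"; result: unknown }
  | { state: "failed"; error: string };

// Hypothetical KV abstraction; swap in your real store.
interface JobStore {
  get(id: string): Promise<JobStatus | undefined>;
  set(id: string, status: JobStatus): Promise<void>;
}

class MemoryJobStore implements JobStore {
  private data = new Map<string, JobStatus>();
  async get(id: string) { return this.data.get(id); }
  async set(id: string, status: JobStatus) { this.data.set(id, status); }
}

// Body of process-upload-background: record progress so the client can poll.
async function runJob(store: JobStore, id: string, work: () => Promise<unknown>): Promise<void> {
  await store.set(id, { state: "pending" });
  try {
    await store.set(id, { state: "done", result: await work() });
  } catch (err) {
    await store.set(id, { state: "failed", error: err instanceof Error ? err.message : String(err) });
  }
}

// Body of a hypothetical sync status endpoint that the client polls with ?id=...
async function statusResponse(store: JobStore, id: string | undefined): Promise<{ statusCode: number; body: string }> {
  if (!id) return { statusCode: 400, body: JSON.stringify({ error: "missing id" }) };
  const status = await store.get(id);
  if (!status) return { statusCode: 404, body: JSON.stringify({ error: "unknown job" }) };
  return { statusCode: 200, body: JSON.stringify(status) };
}
```

Writing `pending` as the first step matters: a client that polls before anything is written gets a 404 and cannot tell "not started" from "wrong id".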

Step 5: Pin function region to match your upstream

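Netlify sets the functions region per site in the UI (the Functions section of your site configuration), not in netlify.toml, and region selection may not be available on every plan. Choose the region closest to your primary DB / KV, then redeploy so the change takes effect.

netlify.toml still controls bundling, for example:

[functions]
  node_bundler = "esbuild"

Cold-start handler latency should drop by roughly the RTT you saved multiplied by the number of sequential round trips made during init.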
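As a worked example with illustrative numbers (assumed, not measured): 8 sequential init queries, 120 ms cross-region RTT, and roughly 5 ms same-region RTT give an expected saving per cold call of

```latex
\Delta t \approx N_{\text{seq}} \times \left(\text{RTT}_{\text{cross}} - \text{RTT}_{\text{same}}\right)
         = 8 \times (120\,\text{ms} - 5\,\text{ms}) = 920\,\text{ms}
```

Queries issued in parallel overlap their round trips, so count only the ones that run one after another.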

Step 6: Cache DNS-warm SDK clients on the module scope

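For sync functions where init weight is acceptable but you want warm calls fast, keep the client at module scope but make no network calls at init. Constructing the client is cheap; the first request is what pays for DNS, TCP, and TLS, and a module-scope client lets later warm calls reuse that work where the SDK keeps connections alive.

import OpenAI from "openai";

// Constructing the client performs no network I/O, so it is safe at module scope.
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
// No top-level `await openai.models.list()`: that would run on every cold start.

export const handler = async () => {
  // First HTTPS call happens here, inside the handler, not at init.
  const r = await openai.chat.completions.create({ /* ... */ });
  return { statusCode: 200, body: JSON.stringify(r) };
};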

Step 7: Add a scheduled warm-up ping (last resort)

If the function is user-facing and cold-start latency is unacceptable, ping it every 5 minutes from a Netlify scheduled function:

// netlify/functions/warm-up.ts
import type { Config } from "@netlify/functions";
export default async () => {
  await fetch(`${process.env.URL}/.netlify/functions/<critical-fn>?warmup=1`);
};
export const config: Config = { schedule: "*/5 * * * *" };

In the target function, short-circuit on event.queryStringParameters?.warmup === "1" so it returns 200 fast without doing real work.
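A minimal sketch of that short-circuit in a Lambda-style handler. `MinimalEvent` and `LambdaResponse` are trimmed, hypothetical stand-ins for the real handler types so the sketch has no package dependencies, and `doRealWork` is a placeholder for the function's actual logic.

```typescript
// Only the fields this sketch reads; the real event has many more.
type MinimalEvent = { queryStringParameters?: Record<string, string | undefined> | null };
type LambdaResponse = { statusCode: number; body: string };

function isWarmup(event: MinimalEvent): boolean {
  return event.queryStringParameters?.warmup === "1";
}

// Placeholder for the function's real work (hypothetical).
async function doRealWork(event: MinimalEvent): Promise<string> {
  return JSON.stringify({ hello: event.queryStringParameters?.name ?? "world" });
}

export const handler = async (event: MinimalEvent): Promise<LambdaResponse> => {
  // The ping only needs the container to exist. Note that SDKs loaded lazily
  // are therefore still loaded on the first real request, not by the ping.
  if (isWarmup(event)) return { statusCode: 200, body: "warm" };
  return { statusCode: 200, body: await doRealWork(event) };
};
```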

Verify

  • Cold-start cycle: deploy, wait 20 min, hit the endpoint. Total response time under 4 s.
  • Function log shows [init] imports done in <1500ms and no Task timed out.
  • Warm calls (within 60 s of each other) under 500 ms.
  • Bundle size on du -sh .netlify/functions-internal/<fn>/ under 10 MB.
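The response-time checks can be scripted. A sketch, assuming Node 18+ (global `fetch`); `probe`, `Fetcher`, and `ProbeResult` are hypothetical names, and the fetch function is injectable so the timing logic can be exercised without a deployed endpoint. The default 4 s budget matches the cold-start target above.

```typescript
type Fetcher = (url: string) => Promise<{ status: number }>;

interface ProbeResult { status: number; ms: number; pass: boolean }

// Time one request and compare it against a latency budget.
async function probe(url: string, budgetMs = 4_000, fetcher: Fetcher = (u) => fetch(u)): Promise<ProbeResult> {
  const start = performance.now();
  const res = await fetcher(url);
  const ms = performance.now() - start;
  return { status: res.status, ms, pass: res.status < 400 && ms < budgetMs };
}
```

Run it once after the 20-minute wait for the cold number, then again immediately with a 500 ms budget for the warm one; the same function can back the CI probe described below.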

Long-term prevention

  • Never put await at module top scope in serverless functions.
  • Default to lazy import() for any SDK over 200 kB; keep init dependency-free where possible.
  • Run npm ls aws-sdk after every dependency change; treat any v2 result as a release blocker.
  • Set a hard rule: functions that legitimately need > 5 s warm execution go to background or edge runtime.
  • Add a synthetic cold-start probe in CI: deploy a preview, wait 15 min, hit the endpoint, fail if response > 5 s.
  • Keep function logs flowing to a log drain so you can grep for Task timed out historically, not just live.
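For the aws-sdk rule, a CI step can fail the build directly. A sketch that walks node_modules for any installed unscoped aws-sdk package with a 2.x version; `findAwsSdkV2` is a hypothetical helper. It inspects what is installed on disk, which can include packages the bundler later drops, and it ignores the modular v3 @aws-sdk/* packages. npm ls aws-sdk remains the quicker way to see which dependency pulled it in.

```typescript
import { existsSync, readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Recursively collect paths of installed aws-sdk packages whose version is 2.x.
function findAwsSdkV2(nodeModules: string, found: string[] = []): string[] {
  if (!existsSync(nodeModules)) return found;
  for (const entry of readdirSync(nodeModules, { withFileTypes: true })) {
    if (!entry.isDirectory()) continue;
    const dir = join(nodeModules, entry.name);
    if (entry.name.startsWith("@")) {
      // Scoped packages live one level deeper: node_modules/@scope/pkg
      for (const scoped of readdirSync(dir, { withFileTypes: true })) {
        if (scoped.isDirectory()) findAwsSdkV2(join(dir, scoped.name, "node_modules"), found);
      }
      continue;
    }
    const pkgJson = join(dir, "package.json");
    if (entry.name === "aws-sdk" && existsSync(pkgJson)) {
      const pkg = JSON.parse(readFileSync(pkgJson, "utf8")) as { version?: string };
      if (pkg.version?.startsWith("2.")) found.push(dir);
    }
    // Nested node_modules hold transitive copies that npm could not hoist.
    findAwsSdkV2(join(dir, "node_modules"), found);
  }
  return found;
}
```

In CI, call it with the project's node_modules path and exit non-zero when the returned list is non-empty.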

Common pitfalls

  • Assuming the issue is “slow code” and rewriting the handler — when 90% of the time is init.
  • Bumping function memory expecting cold start to drop; it helps a little but does not fix a 50 MB bundle.
  • Using a background function for a request that needs a synchronous response — the client gets 202 and no result.
  • Forgetting that netlify dev runs functions warm in-process, so cold-start bugs only appear in production.
  • Adding warm-up pings as the primary fix while ignoring a 40 MB bundle — your bill doubles and the symptom returns the moment the warm-up fails.

FAQ

Q: Can I just raise the 10-second limit on a sync Netlify Function?

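No. 10 s is the synchronous function limit, and no netlify.toml setting raises it; any increase would have to come from Netlify itself and depends on your plan, and even then it only buys headroom. The way out is background functions (15 min), edge functions (fast startup, but a Deno runtime and a 50 ms CPU-time budget per request), or moving the slow work off the request path.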

Q: My function is 2 MB but cold start is still 6 s. Why?

Bundle size is one factor. Top-level await, SDK constructor work, and TLS handshake to upstream services often dominate. Profile with Date.now() markers around imports vs. handler — most teams discover init time is 80%+ of cold start.

Q: Does Node 20 cold-start faster than Node 18?

Slightly — typically 100-300 ms faster on a fresh container. Not enough to rescue a 12-second init. Fix the init weight first; runtime version is a rounding error compared to that.

Q: Does the same problem hit Vercel Functions?

Yes, with a different cap. See Vercel build failed and Vercel 500 errors for the equivalent Vercel patterns; the lazy-import and bundle-size fixes apply identically.

Tags: #Troubleshooting #netlify #serverless #cold-start #timeout