A Netlify Function that runs in 200 ms locally and 300 ms on a warm invocation returns a 502 the first time it is hit after a quiet period. The function log shows Task timed out after 10.00 seconds. You retry within a minute and the same code returns instantly. The function is not slow — it is paying a cold-start tax that exceeds the default 10-second synchronous function ceiling on Netlify. The cause is almost always heavy module initialization (a fat SDK, a Prisma client, an OpenAI/Anthropic client doing DNS warm-up), top-level await against a slow upstream, or a bundle that pulls in aws-sdk v2 transitively.
Common causes
Ordered by likelihood for sync Netlify Functions on Node 18/20.
1. Heavy top-level imports pulled in at cold start
Every import at the top of the file runs during init, before your handler. A single import { PrismaClient } from '@prisma/client' or import OpenAI from 'openai' can add 2-6 seconds to cold start because the SDK eagerly resolves DNS, parses large JSON manifests, or warms up TLS.
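How to spot it: Time the imports. Static import statements are hoisted and run before any other code in the file, so a console.time('init') placed on the first line starts only after they have already loaded. Start the timer in a small module that is imported first, then call console.timeEnd('init') as the last line before exporting the handler. If it logs ≥ 4 s, init is your problem.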
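To rank individual dependencies by load cost, one option is to time each one with a dynamic import in a local scratch script. The sketch below assumes a helper named timeImport (a hypothetical name, not part of any Netlify API). A module that is already in the module cache will report close to 0 ms, so run it in a fresh process.

```typescript
import { performance } from "node:perf_hooks";

// Times a single dynamic import. On first load this includes module evaluation,
// which is the cost a cold start pays for a top-level import of the same package.
async function timeImport(specifier: string): Promise<{ specifier: string; ms: number }> {
  const start = performance.now();
  await import(specifier);
  const ms = performance.now() - start;
  console.log(`[init] ${specifier} loaded in ${ms.toFixed(1)}ms`);
  return { specifier, ms };
}
```

Calling it once per suspect package (for example `await timeImport("openai")`) in a fresh Node process gives a rough per-dependency breakdown. Numbers on a laptop will be lower than on a cold container, so compare them relative to each other rather than against the 10 s ceiling.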
2. Top-level await against a slow or unreachable upstream
A pattern like const config = await fetch(CONFIG_URL).then(r => r.json()) at module scope blocks init until the upstream responds. If that upstream lives in a different region or is down, cold start eats the entire 10 s budget.
How to spot it: Search for await outside any function. Comment them out and redeploy — if cold start drops to under 2 s, this is it.
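A common replacement for module-scope await is a memoized loader that runs on first use and carries its own timeout. The following is a minimal sketch under assumptions: lazy and loadConfig are hypothetical names, CONFIG_URL stands in for the upstream mentioned above, and the 2 s timeout is an arbitrary budget.

```typescript
// Memoizes an async loader so the work runs on first use, not at module init.
function lazy<T>(loader: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (!cached) {
      cached = loader().catch((err) => {
        cached = undefined; // forget the failure so the next invocation can retry
        throw err;
      });
    }
    return cached;
  };
}

// Assumed values: the URL fallback and the 2 s timeout are placeholders.
const CONFIG_URL = process.env.CONFIG_URL ?? "https://example.com/config.json";
const loadConfig = lazy(async () => {
  const res = await fetch(CONFIG_URL, { signal: AbortSignal.timeout(2000) });
  if (!res.ok) throw new Error(`config fetch failed: ${res.status}`);
  return res.json() as Promise<unknown>;
});

const handler = async () => {
  const config = await loadConfig(); // first call pays the fetch; warm calls reuse it
  return { statusCode: 200, body: JSON.stringify({ hasConfig: config !== undefined }) };
};
```

Concurrent callers share one in-flight promise, and a failed load is not cached, so a slow upstream costs at most the timeout on each attempt instead of the whole init budget.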
3. @netlify/functions synchronous limit vs. background functions
Standard Netlify Functions cap at 10 s of total execution including init. A function that legitimately needs 12-20 s belongs in background functions (suffix -background.ts, 15-min cap) or edge functions (sub-50 ms init).
How to spot it: The function does real work that takes 8-15 s warm. It is not a cold-start problem — it is the wrong runtime.
4. Bundle inflated by aws-sdk v2 transitive dep
Some older libraries (Mailgun SDK, certain analytics packages) drag in aws-sdk v2 — a 50 MB monolith that takes 3-5 s to parse at cold start. The Netlify bundler does not always tree-shake it away.
How to spot it: ls -lah .netlify/functions-internal/<fn>/ and inspect the bundle size, or run du -sh node_modules/aws-sdk. If the bundle is > 10 MB you almost certainly have v2.
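When the total size looks wrong, it can help to see which files account for it. This sketch walks a directory and returns the largest files first; largestFiles is a hypothetical helper, and it assumes the function output is an unpacked directory rather than a zip archive.

```typescript
import { readdirSync, statSync } from "node:fs";
import { join } from "node:path";

// Recursively collects file sizes under `dir` and returns the largest ones first.
function largestFiles(dir: string, limit = 10): { path: string; bytes: number }[] {
  const out: { path: string; bytes: number }[] = [];
  const walk = (d: string): void => {
    for (const entry of readdirSync(d, { withFileTypes: true })) {
      const p = join(d, entry.name);
      if (entry.isDirectory()) walk(p);
      else if (entry.isFile()) out.push({ path: p, bytes: statSync(p).size });
    }
  };
  walk(dir);
  return out.sort((a, b) => b.bytes - a.bytes).slice(0, limit);
}

// Example (path placeholder as in the text):
// for (const f of largestFiles(".netlify/functions-internal/<fn>/")) console.log(f.bytes, f.path);
```

If the top entries sit under an aws-sdk directory, the transitive v2 dependency is the likely culprit.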
5. DNS resolution loop inside an SDK
OpenAI, Anthropic, Stripe, and Twilio SDKs open their first HTTPS connection during their first call. If the function’s outbound DNS is slow to resolve api.openai.com or similar (rare, but it happens during certain Netlify region issues), the first request after cold start can take 4-8 s.
How to spot it: Add console.time('first-api-call') around the first SDK call and log the duration. Compare warm vs. cold.
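One way to make the warm vs. cold comparison visible in the logs is a small wrapper that tags each measurement. This is a sketch, not an SDK feature: timed and isCold are hypothetical names, and it relies only on the fact that module scope persists between warm invocations of the same container.

```typescript
import { performance } from "node:perf_hooks";

// True only for the first wrapped call in a fresh container.
let isCold = true;

// Runs `fn`, logs how long it took, and records whether this was the first call.
async function timed<T>(label: string, fn: () => Promise<T>): Promise<{ value: T; ms: number; cold: boolean }> {
  const cold = isCold;
  isCold = false;
  const start = performance.now();
  const value = await fn();
  const ms = performance.now() - start;
  console.log(`[${label}] ${cold ? "cold" : "warm"} ${ms.toFixed(1)}ms`);
  return { value, ms, cold };
}
```

Wrapping the first SDK call, for example `timed("first-api-call", () => client.doSomething())` where client stands for whichever SDK you use, produces log lines that can be grepped by the cold/warm tag.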
6. Synchronous filesystem reads of large bundled assets
fs.readFileSync('./prompts.json') or similar at module scope, with a 5+ MB file, takes hundreds of ms on Lambda’s cold cache and can stack with other init costs to blow the budget.
How to spot it: Grep for readFileSync outside the handler. Move large reads inside the handler with a module-level cache.
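The "read inside the handler with a module-level cache" pattern can be sketched as follows. readJsonCached is a hypothetical helper; the prompts.json path mirrors the example above.

```typescript
import { readFileSync } from "node:fs";

// Module-level cache: survives warm invocations, empty on every cold start.
const fileCache = new Map<string, unknown>();

// Reads and parses a JSON file on first request for that path, then serves it from memory.
function readJsonCached<T>(path: string): T {
  if (!fileCache.has(path)) {
    fileCache.set(path, JSON.parse(readFileSync(path, "utf8")));
  }
  return fileCache.get(path) as T;
}

const handler = async () => {
  // The read happens here on the first invocation, not during init.
  const prompts = readJsonCached<Record<string, string>>("./prompts.json");
  return { statusCode: 200, body: JSON.stringify({ count: Object.keys(prompts).length }) };
};
```

The total work on the first request is unchanged, but it moves out of the init phase and is skipped entirely for code paths that never need the file.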
7. Wrong region for the data upstream
Functions run in us-east-1 by default; if your Postgres / Redis / KV upstream is in eu-west-1, every cold call pays 100-150 ms RTT per query, and init queries multiply that.
How to spot it: Compare your Netlify site’s function region (netlify.toml → [functions] regions) with your DB region.
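If the regions are not documented anywhere, a rough measurement from inside the function can stand in. The sketch below times a bare TCP connect, which costs roughly one round trip plus the DNS lookup; tcpConnectMs is a hypothetical helper and the 3 s timeout is an assumed default.

```typescript
import { connect } from "node:net";
import { performance } from "node:perf_hooks";

// Resolves with the time in ms taken to establish a TCP connection to host:port.
function tcpConnectMs(host: string, port: number, timeoutMs = 3000): Promise<number> {
  return new Promise((resolve, reject) => {
    const start = performance.now();
    const socket = connect({ host, port });
    socket.setTimeout(timeoutMs);
    socket.once("connect", () => {
      const ms = performance.now() - start;
      socket.destroy();
      resolve(ms);
    });
    socket.once("timeout", () => {
      socket.destroy();
      reject(new Error(`connect to ${host}:${port} timed out after ${timeoutMs}ms`));
    });
    socket.once("error", reject);
  });
}
```

Logging `await tcpConnectMs(dbHost, 5432)` once from a deployed function (dbHost being your own database hostname) gives a number to multiply by the count of init queries. A same-region upstream typically shows single-digit milliseconds.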
Before you start
- Confirm the failure is cold-start specific: hit the function, wait 15-20 minutes, hit it again. If the first call after the idle period fails and an immediate retry succeeds, it is cold start.
- Note the exact Node runtime (Node 18 vs Node 20 — startup time differs by ~300 ms).
- Capture one failing function-log line including the Task timed out message and the request ID.
- Know whether the function is a sync function, scheduled function, or background function; limits differ.
Information to collect
- netlify.toml [functions] and [build] sections.
- The function file’s import list (top of file).
- Bundle size: ls -lah .netlify/functions-internal/<fn>/.
- package.json dependencies and any peer/transitive aws-sdk.
- Function logs around the failing request ID (init time + handler time).
- Region of the function vs. region of upstream (DB, KV, API).
Step-by-step fix
Ordered cheapest to most invasive.
Step 1: Time the init phase
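Static imports are hoisted, so a Date.now() written above them would run only after they have loaded. Capture the start time in a small module that is imported first, then log once the remaining imports have run:
// ./init-start is a hypothetical one-line module: export const initStart = Date.now();
// It must be the first import so it is evaluated before the heavier modules.
import { initStart } from "./init-start";
import OpenAI from "openai";
// ...other imports
console.log(`[init] imports done in ${Date.now() - initStart}ms`);
export const handler = async (event) => {
  const handlerStart = Date.now();
  // ...
  console.log(`[handler] done in ${Date.now() - handlerStart}ms`);
};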
Trigger a cold start (deploy + wait 20 min + hit). The [init] line tells you if the problem is import weight (≥ 3 s) or handler weight.
Step 2: Lazy-import heavy SDKs inside the handler
Replace top-level imports with dynamic imports gated by first use:
let _openai: import("openai").OpenAI | null = null;
async function getOpenAI() {
  if (!_openai) {
    const { default: OpenAI } = await import("openai");
    _openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
  }
  return _openai;
}
export const handler = async (event) => {
  const openai = await getOpenAI();
  // ...
};
The handler pays the import cost on its first call, but init time drops from 4-5 s to under 500 ms. Warm calls still pay nothing.
Step 3: Remove aws-sdk v2 from the bundle
Find what pulls it in:
npm ls aws-sdk
Replace the offending dep with a modular v3 client, or pin a lighter alternative:
npm uninstall mailgun-js
npm install mailgun.js form-data
Verify the bundle shrank:
netlify build
du -sh .netlify/functions-internal/<fn>/
Expect a drop from 30-60 MB to under 5 MB and cold start down 2-3 s.
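To make the npm ls check scriptable, the JSON output can be scanned for any aws-sdk 2.x entry. This sketch assumes the nested dependencies shape that npm ls --all --json prints; findPackage and hasAwsSdkV2 are hypothetical helper names.

```typescript
// Minimal model of one node in `npm ls --json` output; other fields are ignored.
interface NpmLsNode {
  version?: string;
  dependencies?: Record<string, NpmLsNode>;
}

// Returns every occurrence of `name` in the tree with the chain of packages that pulls it in.
function findPackage(tree: NpmLsNode, name: string, trail: string[] = []): { version: string; via: string[] }[] {
  const hits: { version: string; via: string[] }[] = [];
  for (const [dep, node] of Object.entries(tree.dependencies ?? {})) {
    if (dep === name && node.version) hits.push({ version: node.version, via: trail });
    hits.push(...findPackage(node, name, [...trail, dep]));
  }
  return hits;
}

// True when any installed copy of aws-sdk is a 2.x release.
function hasAwsSdkV2(tree: NpmLsNode): boolean {
  return findPackage(tree, "aws-sdk").some((hit) => hit.version.startsWith("2."));
}
```

A typical use is to save the output with `npm ls aws-sdk --all --json > deps.json`, parse the file with JSON.parse, and fail the build when hasAwsSdkV2 returns true. The via array names the dependency chain to replace.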
Step 4: Move long work to a background function
If real handler work is the issue, rename and adjust:
netlify/functions/process-upload.ts → 10 s cap
netlify/functions/process-upload-background.ts → 15 min cap
The -background suffix is the trigger. Calls return 202 immediately and the function continues. Pair with a status endpoint and a small KV write so the client can poll. See edge function timeout for the parallel Vercel pattern.
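The status-endpoint pairing might look like the sketch below. Everything here is illustrative: StatusStore is a hypothetical interface standing in for whatever KV store you use, and MemoryStatusStore exists only to keep the example self-contained (an in-memory Map is not shared between function instances, so it would not work in production).

```typescript
type JobStatus = { state: "pending" | "done" | "failed"; detail?: string };

// Stand-in for a real KV store.
interface StatusStore {
  set(jobId: string, status: JobStatus): Promise<void>;
  get(jobId: string): Promise<JobStatus | undefined>;
}

class MemoryStatusStore implements StatusStore {
  private readonly data = new Map<string, JobStatus>();
  async set(jobId: string, status: JobStatus): Promise<void> {
    this.data.set(jobId, status);
  }
  async get(jobId: string): Promise<JobStatus | undefined> {
    return this.data.get(jobId);
  }
}

// Body of the background function: record progress around the long-running work.
async function runJob(store: StatusStore, jobId: string, work: () => Promise<void>): Promise<void> {
  await store.set(jobId, { state: "pending" });
  try {
    await work();
    await store.set(jobId, { state: "done" });
  } catch (err) {
    await store.set(jobId, { state: "failed", detail: err instanceof Error ? err.message : String(err) });
  }
}

// Body of the status endpoint the client polls.
async function statusResponse(store: StatusStore, jobId: string): Promise<{ statusCode: number; body: string }> {
  const status = await store.get(jobId);
  return status
    ? { statusCode: 200, body: JSON.stringify(status) }
    : { statusCode: 404, body: JSON.stringify({ error: "unknown job" }) };
}
```

The client generates or receives a job ID with the initial 202, then polls the status endpoint until the state is done or failed.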
Step 5: Pin function region to match your upstream
In netlify.toml:
[functions]
node_bundler = "esbuild"
[functions."*"]
# Match your primary DB / KV region
preferred_region = "us-east-2"
Redeploy. Cold-start handler latency should drop by RTT * (number of init queries).
Step 6: Cache DNS-warm SDK clients on the module scope
For sync functions where init weight is acceptable but you want warm calls fast, keep the client at module scope BUT skip any work in its constructor:
import OpenAI from "openai";
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
// No top-level `await openai.models.list()` — that runs at every cold start.
export const handler = async () => {
  // First HTTPS call here, not at init.
  const r = await openai.chat.completions.create({ /* ... */ });
};
Step 7: Add a scheduled warm-up ping (last resort)
If the function is user-facing and cold-start latency is unacceptable, ping it every 5 minutes from a Netlify scheduled function:
// netlify/functions/warm-up.ts
import type { Config } from "@netlify/functions";
export default async () => {
  await fetch(`${process.env.URL}/.netlify/functions/<critical-fn>?warmup=1`);
};
export const config: Config = { schedule: "*/5 * * * *" };
In the target function, short-circuit on event.queryStringParameters?.warmup === "1" so it returns 200 fast without doing real work.
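The short-circuit in the target function can be sketched like this. The WarmableEvent type is a local, simplified stand-in that models only the queryStringParameters field, and doRealWork is a hypothetical placeholder for the actual handler logic.

```typescript
// Simplified local event shape; only the field this sketch reads is modeled.
interface WarmableEvent {
  queryStringParameters?: Record<string, string | undefined> | null;
}

// Counter included only so the sketch can show that warm-up pings skip the real work.
let realWorkCalls = 0;

async function doRealWork(): Promise<string> {
  realWorkCalls += 1;
  return "result";
}

const handler = async (event: WarmableEvent) => {
  // Warm-up pings return immediately: the container stays warm, no upstream calls are made.
  if (event.queryStringParameters?.warmup === "1") {
    return { statusCode: 200, body: "warm" };
  }
  return { statusCode: 200, body: await doRealWork() };
};
```

Keeping the check as the first statement in the handler means a ping costs only the invocation itself, not any SDK or database traffic.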
Verify
- Cold-start cycle: deploy, wait 20 min, hit the endpoint. Total response time under 4 s.
- Function log shows [init] imports done in under 1500 ms and no Task timed out.
- Warm calls (within 60 s of each other) under 500 ms.
- Bundle size on du -sh .netlify/functions-internal/<fn>/ under 10 MB.
Long-term prevention
- Never put await at module top scope in serverless functions.
- Default to lazy import() for any SDK over 200 kB; keep init dependency-free where possible.
- Run npm ls aws-sdk after every dependency change; treat any v2 result as a release blocker.
- Set a hard rule: functions that legitimately need > 5 s warm execution go to background or edge runtime.
- Add a synthetic cold-start probe in CI: deploy a preview, wait 15 min, hit the endpoint, fail if response > 5 s.
- Keep function logs flowing to a log drain so you can grep for Task timed out historically, not just live.
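The measuring half of the synthetic cold-start probe could be as small as the sketch below. probe is a hypothetical helper; the deploy and the 15-minute wait are left to the CI pipeline, and PROBE_URL in the usage comment is an assumed environment variable.

```typescript
import { performance } from "node:perf_hooks";

// Fetches `url` once and reports whether it answered successfully within `budgetMs`.
// The request is aborted (and the promise rejects) after twice the budget.
async function probe(url: string, budgetMs: number): Promise<{ ok: boolean; status: number; ms: number }> {
  const start = performance.now();
  const res = await fetch(url, { signal: AbortSignal.timeout(budgetMs * 2) });
  await res.arrayBuffer(); // include body transfer in the measurement
  const ms = performance.now() - start;
  return { ok: res.ok && ms <= budgetMs, status: res.status, ms };
}

// Example CI usage (PROBE_URL is an assumed env var holding the preview endpoint):
// const r = await probe(process.env.PROBE_URL!, 5000);
// if (!r.ok) process.exit(1);
```

A rejected promise (timeout or network error) should be treated as a failed probe as well, so wrap the call in a try/catch in the CI script.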
Common pitfalls
- Assuming the issue is “slow code” and rewriting the handler — when 90% of the time is init.
- Bumping function memory expecting cold start to drop; it helps a little but does not fix a 50 MB bundle.
- Using a background function for a request that needs a synchronous response — the client gets 202 and no result.
- Forgetting that netlify dev runs functions warm in-process, so cold-start bugs only appear in production.
- Adding warm-up pings as the primary fix while ignoring a 40 MB bundle: your bill doubles and the symptom returns the moment the warm-up fails.
FAQ
Q: Can I just raise the 10-second limit on a sync Netlify Function?
No. 10 s is a platform hard cap on synchronous functions across all Netlify plans. The way out is background functions (15 min), edge functions (fast init, but only about 50 ms of CPU time per request), or moving the slow work off the request path.
Q: My function is 2 MB but cold start is still 6 s. Why?
Bundle size is one factor. Top-level await, SDK constructor work, and TLS handshake to upstream services often dominate. Profile with Date.now() markers around imports vs. handler; init time frequently turns out to be the bulk of cold start.
Q: Does Node 20 cold-start faster than Node 18?
Slightly — typically 100-300 ms faster on a fresh container. Not enough to rescue a 12-second init. Fix the init weight first; runtime version is a rounding error compared to that.
Q: Does the same problem hit Vercel Functions?
Yes, with a different cap. See Vercel build failed and Vercel 500 errors for the equivalent Vercel patterns; the lazy-import and bundle-size fixes apply identically.
Tags: #Troubleshooting #netlify #serverless #cold-start #timeout