Performance Regression Audit Prompts: 12 Templates for p99 Triage

When p99 spikes, you need triage, not vibes. These 12 prompt templates cover diffing perf signals, hunting N+1 queries, tracing JS bundle bloat, diagnosing render storms, and spotting DB plan changes.

“Why is the app slow?” is the perf question that gets the worst answers. A good perf-regression prompt names the metric (p50 / p99 / TTFB / LCP) and the diff window, and it forbids speculation: only file:line evidence and benchmarks count.

Who this is for

On-call engineers debugging a perf alert, leads chasing a slow PR, indie devs trying to pass Core Web Vitals before launch.

When not to use these prompts

Don’t use these without a baseline metric. “Slow” without numbers wastes everyone’s time. Don’t use them on numbers measured only in a dev environment; measure prod.

Prompt anatomy / structure formula

Every perf-regression prompt should carry six elements:

  • Role: who the AI plays (SRE / release captain / staff engineer / QA lead).
  • Context: stack / branch / failing logs / diff / dashboard URL.
  • Goal: one concrete deliverable — root cause, checklist, plan, ticket list, runbook.
  • Constraints: what the AI must not do (don’t auto-fix, don’t hallucinate file paths).
  • Output format: numbered findings, markdown table, JSON, unified diff, runnable code.
  • Examples / signal: 1-2 “good” output examples, or counter-examples.
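
The six elements above can be held in a small structure so that a prompt missing its role, goal, or constraints is rejected before it is sent. This is a minimal sketch, not a prescribed tool: the `PerfPrompt` class and its field names are hypothetical, and treating examples as optional is an assumption of the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class PerfPrompt:
    """Hypothetical container for the six prompt elements."""
    role: str
    context: str
    goal: str
    constraints: list[str]
    output_format: str
    examples: list[str] = field(default_factory=list)  # assumed optional here

    def render(self) -> str:
        # Refuse to render a prompt that is missing a required element.
        required = {
            "role": self.role,
            "context": self.context,
            "goal": self.goal,
            "output_format": self.output_format,
        }
        missing = [name for name, value in required.items() if not value.strip()]
        if not self.constraints:
            missing.append("constraints")
        if missing:
            raise ValueError("missing prompt elements: " + ", ".join(missing))

        lines = [
            f"Role: {self.role}",
            f"Context: {self.context}",
            f"Goal: {self.goal}",
            "Constraints:",
            *[f"- {c}" for c in self.constraints],
            f"Output format: {self.output_format}",
        ]
        if self.examples:
            lines.append("Examples:")
            lines.extend(f"- {e}" for e in self.examples)
        return "\n".join(lines)


prompt = PerfPrompt(
    role="SRE",
    context="Diff between the last good deploy and the current one (placeholder)",
    goal="Root cause of the p99 regression",
    constraints=["Don't auto-fix", "Don't hallucinate file paths"],
    output_format="Numbered findings",
)
print(prompt.render())
```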

Best for

  • Diff-window p99 regression triage
  • Bundle-size regression on a PR
  • N+1 hunting in a slow endpoint
  • React render-storm investigation
  • DB query plan change detection

12 copy-ready prompt templates

1. p99 diff triage

p99 latency on `{endpoint}` jumped {fromMs}ms → {toMs}ms between `{oldSha}` and `{newSha}`. List 5 likely causes in priority order. For each: (a) suspicion strength, (b) one file:line or query to inspect, (c) one cheap check. Don't propose fixes yet.

Variables to swap: endpoint, fromMs, toMs, oldSha, newSha
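
Because the templates use `{name}` placeholders, they can be filled with Python's built-in string formatting. The sketch below shows one way to refuse a half-filled prompt; the helper names `template_variables` and `fill`, and the sample values, are illustrative assumptions, and the template text is shortened.

```python
from string import Formatter

# Shortened copy of template 1; the placeholder names match "Variables to swap".
P99_TRIAGE = (
    "p99 latency on `{endpoint}` jumped {fromMs}ms → {toMs}ms between "
    "`{oldSha}` and `{newSha}`. List 5 likely causes in priority order."
)


def template_variables(template: str) -> set[str]:
    """Return the placeholder names that appear in a template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def fill(template: str, **values: object) -> str:
    """Fill a template, failing loudly if any placeholder is left unset."""
    missing = template_variables(template) - set(values)
    if missing:
        raise KeyError(f"unfilled variables: {sorted(missing)}")
    return template.format(**values)


# Sample values are made up for illustration.
print(fill(P99_TRIAGE, endpoint="/api/orders", fromMs=180, toMs=420,
           oldSha="a1b2c3d", newSha="e4f5a6b"))
```

Failing on a missing variable keeps a literal `{toMs}` from reaching the model, which tends to produce exactly the generic answer these templates are meant to avoid.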

2. PR perf risk scan

Scan this PR diff for perf risks: (1) New synchronous I/O in hot path, (2) Loops calling DB / fetch inside, (3) New large dep imported eagerly, (4) React re-render expansion (new context, unstable deps), (5) Missing index for new query. file:line + severity.

3. N+1 hunter

In the function `{functionName}` at `{filePath}`, identify N+1 patterns: (a) Loops calling DB / fetch, (b) Promise.all over single-item fetches, (c) Recursive accessors hitting ORM lazy fields. For each: rewrite as a single batched call, with code.

Variables to swap: functionName, filePath
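
For reference, here is the shape of the rewrite this prompt asks for, sketched with an in-memory stand-in for a database. `FakeDb` and its methods are hypothetical; the only point is that the looped version costs one round trip per item while the batched version costs one in total.

```python
class FakeDb:
    """In-memory stand-in for a database that counts round trips."""

    def __init__(self, authors: dict[int, str]) -> None:
        self._authors = authors
        self.queries = 0

    def get_author(self, author_id: int) -> str:
        self.queries += 1  # one round trip per call
        return self._authors[author_id]

    def get_authors(self, author_ids: set[int]) -> dict[int, str]:
        self.queries += 1  # one round trip for the whole batch
        return {i: self._authors[i] for i in author_ids}


def names_n_plus_one(db: FakeDb, posts: list[dict]) -> list[str]:
    # N+1 pattern: a lookup inside the loop, so N posts cost N queries.
    return [db.get_author(post["author_id"]) for post in posts]


def names_batched(db: FakeDb, posts: list[dict]) -> list[str]:
    # Batched: collect the ids first, fetch once, then join in memory.
    authors = db.get_authors({post["author_id"] for post in posts})
    return [authors[post["author_id"]] for post in posts]


posts = [{"author_id": 1}, {"author_id": 2}, {"author_id": 1}]
slow_db = FakeDb({1: "Ada", 2: "Linus"})
fast_db = FakeDb({1: "Ada", 2: "Linus"})
assert names_n_plus_one(slow_db, posts) == names_batched(fast_db, posts)
print(slow_db.queries, fast_db.queries)
```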

4. Bundle size regression

Bundle grew from {oldKb} → {newKb} KB. Identify the top 3 contributors: (1) New direct deps and their size, (2) Tree-shake failures (default or namespace imports that pull in a whole library, or CommonJS-only packages), (3) Polyfill bloat (target browser change?). Output: a fix per item.

Variables to swap: oldKb, newKb
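
If you can export per-module sizes from your bundle analyzer for both builds, a short script can rank the contributors before you paste anything into the prompt. A minimal sketch, assuming the sizes arrive as plain name-to-KB mappings; the function name and sample numbers are made up for illustration.

```python
def top_contributors(old: dict[str, float], new: dict[str, float],
                     n: int = 3) -> list[tuple[str, float]]:
    """Return the n modules whose size grew the most, largest growth first."""
    deltas = {
        name: new.get(name, 0.0) - old.get(name, 0.0)
        for name in old.keys() | new.keys()
    }
    grown = [(name, delta) for name, delta in deltas.items() if delta > 0]
    # Sort by growth (descending), then by name so ties are deterministic.
    return sorted(grown, key=lambda item: (-item[1], item[0]))[:n]


# Hypothetical per-module sizes in KB for the old and new builds.
old_build = {"react": 130.0, "app": 210.0}
new_build = {"react": 130.0, "app": 225.0, "moment": 72.0, "lodash": 70.0}
print(top_contributors(old_build, new_build))
```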

5. React render-storm diagnosis

Component `{component}` re-renders {nRenders} times per interaction. Diagnose: (1) Unstable prop identity (objects / arrays created in render), (2) Context provider value not memoized, (3) Parent state too coarse, (4) useEffect dep that changes each render. Output: cause + minimal fix.

Variables to swap: component, nRenders

6. DB query plan regression

Query plan for `{query}` changed: was index scan + nested loop, now is sequential scan + hash join. Diagnose: (1) Statistics stale (ANALYZE run recently?), (2) Cardinality estimate off, (3) New column / index hint mismatch, (4) Parameter sniffing (in PostgreSQL, a generic plan cached for a prepared statement). Output: most likely cause + the ANALYZE command or pg_stat_user_indexes query to confirm it.

Variables to swap: query
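
To fill in the "was X, now is Y" part with evidence rather than memory, you can diff the node types of two saved plans. The sketch below assumes both plans were captured with PostgreSQL's `EXPLAIN (FORMAT JSON)`, whose output nests child nodes under `Plans` and labels each node with `Node Type`; the sample plans are heavily trimmed and hypothetical.

```python
import json
from collections import Counter


def node_types(plan_json: str) -> Counter:
    """Count the node types in one EXPLAIN (FORMAT JSON) document."""
    root = json.loads(plan_json)[0]["Plan"]
    counts: Counter = Counter()
    stack = [root]
    while stack:
        node = stack.pop()
        counts[node["Node Type"]] += 1
        stack.extend(node.get("Plans", []))
    return counts


def plan_diff(before: str, after: str) -> dict[str, list[str]]:
    """List node types that disappeared from or appeared in the plan."""
    old, new = node_types(before), node_types(after)
    return {
        "disappeared": sorted(old - new),
        "appeared": sorted(new - old),
    }


# Trimmed, hypothetical plans; real output carries costs, rows, and more.
BEFORE = json.dumps([{"Plan": {"Node Type": "Nested Loop", "Plans": [
    {"Node Type": "Index Scan"}, {"Node Type": "Index Scan"}]}}])
AFTER = json.dumps([{"Plan": {"Node Type": "Hash Join", "Plans": [
    {"Node Type": "Seq Scan"},
    {"Node Type": "Hash", "Plans": [{"Node Type": "Seq Scan"}]}]}}])

print(plan_diff(BEFORE, AFTER))
```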

7. Cold start regression

Serverless function `{fnName}` cold start went {fromMs} → {toMs} ms. Diagnose: (1) Bundle size grew, (2) New top-level imports, (3) New connection at boot, (4) New env var fetch. Output: top 3 by likelihood + a 5-min experiment.

Variables to swap: fnName, fromMs, toMs

8. TTFB / LCP regression

LCP on `{pagePath}` went {fromMs} → {toMs} ms. Walk the waterfall: (1) Server response time, (2) Critical CSS / JS blocking, (3) Image / font payload, (4) Layout shift forcing re-render. Pick the dominant cause.

Variables to swap: pagePath, fromMs, toMs

9. Memory growth regression

Service RSS grew from {oldMb} → {newMb} MB. Diagnose: (1) New cache without eviction, (2) Closures retaining large objects, (3) Listener leaks (no removeListener on unmount / restart), (4) Buffer pools sized too large. Output: file:line for each finding.

Variables to swap: oldMb, newMb
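
Cause (1) is usually a plain dict or map that only ever grows. One common remedy is a size-bounded cache with least-recently-used eviction, sketched here with the standard library. The `BoundedCache` class is illustrative; the right bound and eviction policy depend on your workload.

```python
from collections import OrderedDict


class BoundedCache:
    """LRU cache with a hard entry limit, so memory use cannot grow unbounded."""

    def __init__(self, max_entries: int) -> None:
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self._max = max_entries
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self._max:
            self._data.popitem(last=False)  # evict the least recently used

    def __len__(self) -> int:
        return len(self._data)


cache = BoundedCache(max_entries=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" is now the most recently used entry
cache.put("c", 3)  # evicts "b"
print(len(cache), cache.get("b"))
```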

10. Slow-test regression

Test suite went from {fromMin} → {toMin} min. Identify: (1) Specific test files that grew, (2) Setup / teardown bloat, (3) Real timer / sleep introduced, (4) Parallelism reduction. Output: 3 specific cleanups.

Variables to swap: fromMin, toMin

11. Perf-fix benchmark plan

Before fixing, design a benchmark: (1) Minimal reproducible scenario, (2) Metric (median + p99), (3) Sample size, (4) Baseline run command. After fix, re-run same benchmark. Don't fix without baseline numbers.
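
A baseline run can be as small as the harness below, which reports the median and p99 over a fixed sample size. It is a sketch under assumptions: `benchmark` and `percentile` are hypothetical helpers, the percentile uses the nearest-rank method, and the default run and warmup counts are arbitrary starting points. In-process timing also only covers code you can call directly; an endpoint needs a load tool instead.

```python
import math
import statistics
import time
from typing import Callable


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def benchmark(fn: Callable[[], object], runs: int = 200,
              warmup: int = 20) -> dict[str, float]:
    """Time fn over `runs` calls after `warmup` untimed calls."""
    for _ in range(warmup):
        fn()
    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1000)
    return {
        "runs": runs,
        "median_ms": statistics.median(samples_ms),
        "p99_ms": percentile(samples_ms, 99),
    }


# Record this output before the fix, then re-run the same call after it.
print(benchmark(lambda: sorted(range(10_000), reverse=True)))
```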

12. “Slow but acceptable” decision

A regression is real but small ({deltaMs}ms). Decide: (1) Is the absolute number above target? (2) Is the user impact measurable (conversion / bounce)? (3) Is the fix more expensive than the regression? Output: SHIP / FIX / REVERT + one-line rationale.

Variables to swap: deltaMs
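
The three questions can also be written down as an explicit rule, so the team argues about the policy once instead of on every regression. The mapping below is one possible encoding and an assumption of this sketch, not a standard: above target always gets action, and below target the fix has to be cheaper than the regression to be worth it.

```python
def decide(above_target: bool, user_impact_measurable: bool,
           fix_costlier_than_regression: bool) -> tuple[str, str]:
    """Return (decision, one-line rationale) under the assumed policy."""
    if above_target:
        if fix_costlier_than_regression:
            return "REVERT", "above target and a forward fix costs more than the regression"
        return "FIX", "above target and the fix is affordable"
    if user_impact_measurable and not fix_costlier_than_regression:
        return "FIX", "within target, but users notice and the fix is cheap"
    return "SHIP", "within target and not worth the cost of a fix"


print(decide(above_target=False, user_impact_measurable=False,
             fix_costlier_than_regression=True))
```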

Common mistakes

  • Optimising without baseline numbers.
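  • Confusing p50 and p99: a p50 shift usually means the common path got slower, while a p99-only shift points at tail causes (cold caches, lock contention, GC pauses), so the fixes differ.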
  • Trusting dev-only profiles — prod hot paths differ.
  • Adding caches before fixing the actual N+1.
  • Bundle splitting without measuring what was preloaded vs lazy.
  • Memoising everything in React — adds overhead.
  • Investigating before reading the deploy diff — perf regressions are usually code, not infra.

How to push results further

  • Always anchor to a metric + sample size + diff window.
  • p99 fixes are different from p50 fixes — separate them.
  • For React, capture the Profiler trace, don’t guess from logs.
  • For DB, get the query plan before and after with EXPLAIN ANALYZE.
  • Run benchmarks 3 times before declaring victory — variance hides regressions.
  • Cache as a last resort, not first.
  • Document the regression and fix in the post-mortem so the next person doesn’t re-introduce it.

Practical depth notes

Use these prompts as starting points, not final answers. The useful extra work is to replace every generic placeholder with a real value from your incident: the endpoint or page, the metric and its baseline, the diff window (old and new SHA), and the sample size. Run at least two versions with different constraints, then compare the outputs side by side instead of accepting the first polished response.

A good result should pass three checks: it is specific enough that another person could reuse it, it cites file:line evidence or a measurement rather than speculation, and it gives you an editable artifact (a ranked cause list, a diff, a benchmark plan) rather than a broad suggestion. If the output feels generic, add one concrete reference, one forbidden pattern, and one measurable success criterion before rerunning the prompt.

FAQ

  • How big does a regression have to be to matter? Anything that pushes p99 above your SLO target matters. Below the SLO, weigh it against the cost of the fix.
  • Should I optimise before launch? Hit your Core Web Vitals targets, then ship. Optimising past those targets before launch wastes time.
  • Is React.memo always safe? No. With unstable props (objects / arrays / callbacks recreated on each render), the component still re-renders and you pay for the prop comparison on top.
  • How do I find DB index gaps? Use pg_stat_user_indexes to find unused indexes and pg_stat_user_tables to find seq-scan-heavy tables.
  • Can AI read flamegraphs? It can interpret text traces and profiler JSON. Visual flamegraphs need a vision model.
  • When should I invest in a perf budget CI gate? Once a regression has reached prod twice. Before that, manual checks are fine.

Tags: #Prompt #Coding #Performance #Audit