Performance Regression Audit Prompts: 12 Templates for p99 Triage

When p99 spikes, you need triage, not vibes. Twelve prompt templates for diffing perf signals and hunting N+1s, JS bundle bloat, render storms, and DB plan changes, current as of June 2026.

Published: May 19, 2026 Updated: Jun 14, 2026 Author: AI Productivity Guide Team

“Why is the app slow?” is the perf question that gets the worst answers. A good perf-regression prompt names the metric (p50 / p99 / TTFB / LCP), the diff window, and forbids speculation — it asks for file:line evidence and benchmark numbers, nothing else.

TL;DR

Never ask an AI “make it faster.” Feed it one metric, one diff window (oldSha…newSha), and one deliverable. The 12 templates below are pre-shaped for that.
Paste real signal, not screenshots of vibes: a Chrome trace exported to JSON/Markdown, an EXPLAIN ANALYZE plan, a git diff, or a bundle-analyzer report. Models reason over text traces well; they cannot see a visual flamegraph without a vision step.
Reach for a 1M-token model (Claude Opus 4.7 / Sonnet 4.6, Gemini 3.1 Pro, GPT-5.5) when the trace is large — a full Chrome .json trace easily blows past 100k tokens.
Core Web Vitals targets to anchor LCP/INP/CLS prompts (June 2026): LCP under 2.5s, INP under 200ms, CLS under 0.1, all measured at the 75th percentile of real users.

Who this is for

On-call engineers triaging a perf alert, leads chasing a slow PR, and indie devs trying to pass Core Web Vitals before launch. If you have a number that got worse and a diff that might explain it, these prompts turn that into a ranked hypothesis list instead of a guessing session.

When not to use these prompts

Skip them if you have no baseline metric. “Slow” without a number wastes everyone’s time, and the model will happily invent plausible-sounding causes for a regression that may not exist. Don’t run them against dev-only timings either — measure prod, where the hot paths, data volumes, and cache states are different.

Pick the right model and tool first

The prompt is only half the job; what you paste matters more.

Signal you have	What to paste	Good model fit
Frontend LCP/INP regression	Chrome trace exported as JSON or Markdown	1M-context model (Opus 4.7, Sonnet 4.6, Gemini 3.1 Pro)
Slow endpoint / N+1	The handler code + an `EXPLAIN ANALYZE` plan	Sonnet 4.6 or GPT-5.5
Bundle growth on a PR	`webpack-bundle-analyzer` report (or `rollup-plugin-visualizer` output for a Vite build) + the diff	Any frontier model
Live page, no trace yet	Let the agent record one via Chrome DevTools MCP	Claude Code / Cursor with the MCP attached

Two practical notes for June 2026:

Chrome DevTools MCP (v0.19, shipped March 2026) gives an AI agent like Claude Code or Cursor 29 live-browser tools. Its `performance_start_trace` tool records a trace and returns LCP, CLS, and FCP with render-blocking insights and a network dependency tree, so the agent gathers its own evidence instead of you exporting files. See the Chrome DevTools MCP repo.
Raw Chrome traces are huge. A real .json trace runs to hundreds of thousands of tokens. Use a converter (chperf, chrome-trace-analyzer) to summarize it to Markdown first, or paste it into a 1M-token model and ask one narrow question at a time.
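One rough way to choose between summarizing and pasting is to estimate the token count first. The sketch below assumes about 4 characters per token, a common rule of thumb rather than any model's real tokenizer. `estimate_tokens`, `fits_in_context`, the 20% headroom, and the synthetic trace are all hypothetical and only for illustration.

```python
import json

CHARS_PER_TOKEN = 4  # assumption: rough heuristic, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate the token count from the character length."""
    return len(text) // CHARS_PER_TOKEN


def fits_in_context(text: str, context_tokens: int, headroom: float = 0.2) -> bool:
    """True if the text leaves `headroom` of the window free for the prompt and the answer."""
    budget = int(context_tokens * (1 - headroom))
    return estimate_tokens(text) <= budget


# Hypothetical stand-in for an exported trace: 50,000 small events.
fake_trace = json.dumps(
    {"traceEvents": [{"name": "RunTask", "ts": i, "dur": 5} for i in range(50_000)]}
)

too_big_for_200k = not fits_in_context(fake_trace, context_tokens=200_000)
ok_for_1m = fits_in_context(fake_trace, context_tokens=1_000_000)
```

Under these assumptions the synthetic trace overflows a 200k window but fits a 1M one. For a real decision, use the tokenizer or token-counting endpoint of the model you actually call.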

Prompt anatomy

Every perf-regression prompt should carry six elements:

Role: who the AI plays (SRE / release captain / staff engineer / QA lead).
Context: stack, branch, failing logs, diff, dashboard URL.
Goal: one concrete deliverable — root cause, checklist, plan, ticket list, runbook.
Constraints: what the AI MUST NOT do (don’t auto-fix, don’t invent file paths).
Output format: numbered findings, Markdown table, JSON, unified diff, runnable code.
Signal: the actual trace, plan, or diff — not a paraphrase of it.
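The six elements above can be assembled mechanically, which makes it harder to forget one. This is a minimal sketch, not a prescribed format: the `PerfPrompt` class, its field names, and the rendered layout are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PerfPrompt:
    role: str
    context: str
    goal: str
    constraints: list[str]
    output_format: str
    signal: str  # the raw trace, plan, or diff, pasted verbatim

    def render(self) -> str:
        # Refuse to build a prompt with no evidence attached.
        if not self.signal.strip():
            raise ValueError("signal is empty: paste the actual trace, plan, or diff")
        rules = "\n".join(f"- {c}" for c in self.constraints)
        return (
            f"Role: {self.role}\n"
            f"Context: {self.context}\n"
            f"Goal: {self.goal}\n"
            f"Constraints:\n{rules}\n"
            f"Output format: {self.output_format}\n"
            f"Signal:\n{self.signal}\n"
        )


# Hypothetical values for illustration only.
example = PerfPrompt(
    role="SRE",
    context="Node API, branch release/hypothetical",
    goal="Ranked root-cause list",
    constraints=["Do not auto-fix", "Do not invent file paths"],
    output_format="Numbered findings",
    signal="diff --git a/handler.ts b/handler.ts",
)
prompt_text = example.render()
```

Raising on an empty `signal` is a deliberate choice here: a prompt without evidence is the case that invites speculation.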

12 copy-ready prompt templates

Variables use [brackets] so you can find-and-replace fast. Models current as of June 2026 (Opus 4.7, Sonnet 4.6, GPT-5.5, Gemini 3.1 Pro) all handle these templates; for the trace-heavy ones, prefer a 1M-context model.
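If you fill templates from a script instead of by hand, a small helper can substitute the bracketed variables and fail loudly on any left unfilled. This is a sketch only: `fill_template` and the sample values are hypothetical, and it assumes variable names are plain alphanumeric words.

```python
import re

# Matches [name] where name starts with a letter, e.g. [endpoint] or [fromMs].
PLACEHOLDER = re.compile(r"\[([A-Za-z][A-Za-z0-9]*)\]")


def fill_template(template: str, values: dict[str, str]) -> str:
    """Replace every [name] in the template; raise if any name has no value."""
    missing = sorted({n for n in PLACEHOLDER.findall(template) if n not in values})
    if missing:
        raise KeyError(f"unfilled variables: {', '.join(missing)}")
    return PLACEHOLDER.sub(lambda m: str(values[m.group(1)]), template)


template = "p99 latency on `[endpoint]` jumped from [fromMs]ms to [toMs]ms."
filled = fill_template(
    template,
    {"endpoint": "/api/orders", "fromMs": "180", "toMs": "420"},  # hypothetical values
)
```

Failing on a missing variable matters more than it looks: a prompt that still contains a literal `[toMs]` invites the model to guess the number.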

1. p99 diff triage

p99 latency on `[endpoint]` jumped from [fromMs]ms to [toMs]ms between `[oldSha]` and `[newSha]`. List 5 likely causes in priority order. For each: (a) suspicion strength, (b) one file:line or query to inspect, (c) one cheap check. Do not propose fixes yet. Cite only paths that appear in the diff I pasted.

Swap: endpoint, fromMs, toMs, oldSha, newSha

2. PR perf risk scan

Scan this PR diff for perf risks: (1) new synchronous I/O in a hot path, (2) loops calling DB/fetch inside, (3) a new large dep imported eagerly, (4) React re-render expansion (new context, unstable deps), (5) a missing index for a new query. Output file:line + severity. Ignore anything outside the diff.

3. N+1 hunter

In the function `[functionName]` at `[filePath]`, identify N+1 patterns: (a) loops calling DB/fetch, (b) Promise.all over single-item fetches, (c) recursive accessors hitting ORM lazy fields. For each, rewrite as a single batched call, with code.

Swap: functionName, filePath
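To make the target of this template concrete, here is a minimal sketch of pattern (a) and its batched rewrite. `FakeDb` is a hypothetical in-memory stand-in that only counts round trips; no real ORM or driver is involved.

```python
class FakeDb:
    """Hypothetical stand-in for a database that counts round trips."""

    def __init__(self, authors: dict[int, str]):
        self.authors = authors
        self.queries = 0

    def get_author(self, author_id: int) -> str:
        self.queries += 1  # one round trip per call
        return self.authors[author_id]

    def get_authors(self, author_ids: set[int]) -> dict[int, str]:
        self.queries += 1  # one round trip for the whole batch, like WHERE id IN (...)
        return {i: self.authors[i] for i in author_ids}


def titles_n_plus_one(db: FakeDb, posts: list[dict]) -> list[tuple[str, str]]:
    # N+1 shape: one author lookup per post, inside the loop.
    return [(p["title"], db.get_author(p["author_id"])) for p in posts]


def titles_batched(db: FakeDb, posts: list[dict]) -> list[tuple[str, str]]:
    # Batched shape: collect the ids, fetch once, join in memory.
    authors = db.get_authors({p["author_id"] for p in posts})
    return [(p["title"], authors[p["author_id"]]) for p in posts]


posts = [{"title": f"post {i}", "author_id": i % 3} for i in range(30)]
slow_db = FakeDb({0: "ana", 1: "bo", 2: "cy"})
fast_db = FakeDb({0: "ana", 1: "bo", 2: "cy"})
slow_result = titles_n_plus_one(slow_db, posts)
fast_result = titles_batched(fast_db, posts)
```

Both functions return the same rows, but the looped version issues one query per post while the batched version issues one in total.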

4. Bundle size regression

Bundle grew from [oldKb] to [newKb] KB. Identify the top 3 contributors: (1) new direct deps and their gzipped size, (2) tree-shake failures (default import from a CJS-only library), (3) polyfill bloat (browserslist target changed?). Output one concrete fix per item.

Swap: oldKb, newKb
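Before asking the model for contributors, it helps to hand it a per-module delta instead of two raw reports. The sketch below assumes you have already reduced each report to a module-to-KB mapping. `top_bundle_growth` and the sizes shown are hypothetical.

```python
def top_bundle_growth(
    old: dict[str, float], new: dict[str, float], n: int = 3
) -> list[tuple[str, float]]:
    """Return the n modules whose size grew the most, as (module, delta_kb) pairs."""
    deltas = {
        module: new.get(module, 0.0) - old.get(module, 0.0)
        for module in old.keys() | new.keys()
    }
    grown = [(m, round(d, 1)) for m, d in deltas.items() if d > 0]
    # Largest growth first; module name breaks ties so the order is stable.
    return sorted(grown, key=lambda item: (-item[1], item[0]))[:n]


# Hypothetical sizes in KB, not measurements of the real packages.
old_sizes = {"react-dom": 130.0, "app": 210.0, "lodash": 24.0}
new_sizes = {"react-dom": 130.0, "app": 225.5, "lodash": 71.0, "moment": 68.0}
growth = top_bundle_growth(old_sizes, new_sizes)
```

Pasting the resulting short list next to the diff keeps the model focused on the modules that actually moved.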

5. React render-storm diagnosis

Component `[component]` re-renders [nRenders] times per interaction. Diagnose: (1) unstable prop identity (objects/arrays created in render), (2) context provider value not memoized, (3) parent state too coarse, (4) a useEffect dep that changes every render. Output cause + the minimal fix.

Swap: component, nRenders

6. DB query plan regression

The plan for `[query]` changed: it was an index scan + nested loop, now a sequential scan + hash join. Diagnose: (1) stale statistics (ANALYZE overdue?), (2) cardinality estimate off, (3) new column or index mismatch, (4) parameter sniffing (in PostgreSQL, a prepared statement switching between a custom and a generic plan). Output the most likely cause + the exact `ANALYZE` command or `pg_stat_user_indexes` query to confirm it.

Swap: query
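A quick pre-check for cause (2) is to compare estimated and actual row counts in the plan text before pasting it. The sketch below parses the default text output of PostgreSQL's `EXPLAIN ANALYZE`. The helper names, the 10x threshold, and the sample plan are hypothetical, and the regex handles only the common `(cost=... rows=... width=...) (actual time=... rows=... loops=...)` shape.

```python
import re

NODE = re.compile(
    r"^\s*(?:->\s*)?(?P<node>.+?)\s+"
    r"\(cost=[\d.]+\.\.[\d.]+ rows=(?P<est>\d+) width=\d+\) "
    r"\(actual time=[\d.]+\.\.[\d.]+ rows=(?P<act>[\d.]+) loops=\d+\)"
)


def plan_nodes(plan_text: str) -> list[tuple[str, int, float]]:
    """Extract (node label, estimated rows, actual rows) for each plan node."""
    nodes = []
    for line in plan_text.splitlines():
        m = NODE.match(line)
        if m:
            nodes.append((m["node"], int(m["est"]), float(m["act"])))
    return nodes


def misestimated_nodes(plan_text: str, factor: float = 10.0):
    """Nodes whose estimate is off by at least `factor` in either direction."""
    flagged = []
    for node, est, act in plan_nodes(plan_text):
        hi, lo = max(est, act, 1), max(min(est, act), 1)
        if hi / lo >= factor:
            flagged.append((node, est, act))
    return flagged


# Hypothetical plan for illustration.
SAMPLE_PLAN = """\
Hash Join  (cost=230.00..2713.00 rows=500 width=16) (actual time=2.100..48.300 rows=98000 loops=1)
  Hash Cond: (o.user_id = u.id)
  ->  Seq Scan on orders o  (cost=0.00..1693.00 rows=100000 width=8) (actual time=0.010..12.300 rows=100000 loops=1)
  ->  Hash  (cost=155.00..155.00 rows=6000 width=8) (actual time=1.900..1.900 rows=6000 loops=1)
        ->  Seq Scan on users u  (cost=0.00..155.00 rows=6000 width=8) (actual time=0.008..0.900 rows=6000 loops=1)
"""

flagged = misestimated_nodes(SAMPLE_PLAN)
seq_scans = [n for n, _, _ in plan_nodes(SAMPLE_PLAN) if n.startswith("Seq Scan")]
```

In the sample, only the join is flagged: the planner expected 500 rows and got 98,000, which points at statistics or cardinality rather than a missing index.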

7. Cold start regression

Serverless function `[fnName]` cold start went from [fromMs] to [toMs] ms. Diagnose: (1) bundle size grew, (2) new top-level imports, (3) a new connection opened at boot, (4) a new env/secret fetch at module load. Output the top 3 by likelihood + a 5-minute experiment for each.

Swap: fnName, fromMs, toMs

8. TTFB / LCP regression

LCP on `[pagePath]` went from [fromMs] to [toMs] ms (target: under 2500ms at p75). Walk the waterfall: (1) server response time / TTFB, (2) render-blocking CSS/JS, (3) image or font payload, (4) layout shift forcing a re-render. Pick the single dominant cause and the one change that recovers the most.

Swap: pagePath, fromMs, toMs
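The [fromMs] and [toMs] values should be p75 numbers, not averages. As a sketch of what "under 2500ms at p75" means, the snippet below applies a nearest-rank 75th percentile to field samples. The function names and sample values are hypothetical; the thresholds are the ones this article quotes.

```python
import math

# Targets quoted in this article; "under" is treated as a strict comparison.
THRESHOLDS = {"lcp_ms": 2500, "inp_ms": 200, "cls": 0.1}


def p75(samples: list[float]) -> float:
    """Nearest-rank 75th percentile of the samples."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(0.75 * len(ordered))
    return ordered[rank - 1]


def passes(metric: str, samples: list[float]) -> bool:
    """True if the 75th percentile is under the target for this metric."""
    return p75(samples) < THRESHOLDS[metric]


# Hypothetical LCP samples in milliseconds.
before = [1800, 2100, 2200, 2400, 2600, 3900, 2000, 2300]
after = [1800, 2600, 2700, 3000]
```

With these numbers the first set has a p75 of 2400 ms and passes even though two page loads exceeded 2500 ms; the second has a p75 of 2700 ms and fails.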

9. Memory growth regression

Service RSS grew from [oldMb] to [newMb] MB. Diagnose: (1) a new cache without eviction, (2) closures retaining large objects, (3) listener leaks (no removeListener on unmount/restart), (4) buffer pools sized too large. Give file:line for each suspect.

Swap: oldMb, newMb
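Suspect (1) is usually the cheapest to fix. The sketch below shows one bounded alternative to a plain dict cache, a least-recently-used cache built on `collections.OrderedDict`. The class name `BoundedCache` is hypothetical, and other eviction policies such as TTL may fit your service better.

```python
from collections import OrderedDict


class BoundedCache:
    """LRU cache with a hard cap on entries, so it cannot grow without bound."""

    def __init__(self, max_entries: int):
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self.max_entries = max_entries
        self._data: OrderedDict = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict the least recently used entry

    def __len__(self) -> int:
        return len(self._data)


cache = BoundedCache(max_entries=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # touch "a", so "b" is now the oldest entry
cache.put("c", 3)  # evicts "b"
```

For pure functions, the standard library's `functools.lru_cache(maxsize=...)` gives the same bound with less code.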

10. Slow-test regression

The test suite went from [fromMin] to [toMin] min. Identify: (1) specific test files that grew, (2) setup/teardown bloat, (3) a real timer or sleep introduced, (4) reduced parallelism. Output 3 specific cleanups ranked by time saved.

Swap: fromMin, toMin

11. Perf-fix benchmark plan

Before fixing, design a benchmark: (1) a minimal reproducible scenario, (2) the metric (report median AND p99), (3) sample size and warmup, (4) the exact baseline run command. After the fix, re-run the identical benchmark 3 times. Do not declare a win without before/after numbers.
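The plan this template asks for can be as small as the harness below. It is a sketch under stated assumptions: `benchmark`, `percentile`, and `scenario` are hypothetical names, the workload is a placeholder, and the warmup and sample counts are arbitrary defaults to tune.

```python
import math
import statistics
import time


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile, e.g. pct=99 for p99."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def benchmark(fn, *, warmup: int = 20, samples: int = 200) -> dict:
    """Run fn repeatedly and report median and p99 wall-clock time in ms."""
    for _ in range(warmup):  # discard warmup runs (caches, lazy imports)
        fn()
    timings_ms = []
    for _ in range(samples):
        start = time.perf_counter()
        fn()
        timings_ms.append((time.perf_counter() - start) * 1000)
    return {
        "samples": samples,
        "median_ms": statistics.median(timings_ms),
        "p99_ms": percentile(timings_ms, 99),
    }


def scenario() -> None:
    # Hypothetical workload standing in for the code path under test.
    sorted(range(2_000, 0, -1))


result = benchmark(scenario, warmup=5, samples=50)
```

Keep the script and its arguments in the repo next to the fix, so the "identical benchmark" really is identical on the re-run.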

12. “Slow but acceptable” decision

A regression is real but small ([deltaMs]ms). Decide: (1) is the absolute p99 now above target/SLO? (2) is user impact measurable (conversion / bounce)? (3) is the fix more expensive than the regression? Output SHIP / FIX / REVERT + a one-line rationale.

Swap: deltaMs
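The three questions in this template can also be written down as an explicit rule, which is useful for checking whether the model's verdict is consistent with its own answers. The mapping below is one hypothetical policy, not the article's rule; in particular, preferring REVERT over FIX when a clean revert exists is an assumption.

```python
def decide(
    p99_ms: float,
    slo_ms: float,
    user_impact_measurable: bool,
    fix_costlier_than_regression: bool,
    clean_revert_available: bool,
) -> tuple[str, str]:
    """Return (verdict, one-line rationale) under a hypothetical policy."""
    if p99_ms > slo_ms:
        # Above SLO: something has to change; pick the faster way back.
        if clean_revert_available:
            return "REVERT", "p99 is above SLO and a clean revert restores it fastest"
        return "FIX", "p99 is above SLO and no clean revert is available"
    if user_impact_measurable and not fix_costlier_than_regression:
        return "FIX", "below SLO, but users are affected and the fix is cheap"
    return "SHIP", "below SLO and the fix costs more than the regression"


verdict, rationale = decide(
    p99_ms=410, slo_ms=500,  # hypothetical numbers
    user_impact_measurable=False,
    fix_costlier_than_regression=True,
    clean_revert_available=True,
)
```

Agree on a policy like this as a team before the incident; the prompt then only has to supply the three answers.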

Common mistakes

Optimizing without baseline numbers, so you can’t prove the fix worked.
Confusing p50 and p99 — a tail spike and a median shift have different fixes.
Trusting dev-only profiles; prod hot paths, data sizes, and caches differ.
Adding a cache before fixing the underlying N+1.
Splitting the bundle without measuring what was preloaded versus lazy-loaded.
Memoizing everything in React, which adds comparison overhead and unstable-dep bugs.
Starting the investigation before reading the deploy diff — most regressions are code, not infra.

How to push results further

Anchor every prompt to a metric, a sample size, and a diff window.
Keep p99 work separate from p50 work; they rarely share a root cause.
For React, capture the Profiler trace; don’t let the model guess from logs.
For DB, get the plan before and after with EXPLAIN (ANALYZE, BUFFERS).
Run benchmarks at least 3 times — variance hides and fakes regressions.
Treat cache as the last resort, after you’ve removed the real work.
Write the regression and its fix into the post-mortem so the next person doesn’t reintroduce it.
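The advice about repeated runs can be turned into a simple noise check. The sketch below treats a change as real only when the shift in medians exceeds the largest run-to-run spread. That rule, the `compare_runs` name, and the sample numbers are hypothetical and deliberately conservative; a proper statistical test is a better fit for high-stakes calls.

```python
import statistics


def compare_runs(before_ms: list[float], after_ms: list[float]) -> dict:
    """Compare repeated benchmark runs and say whether the delta beats the noise."""
    if len(before_ms) < 3 or len(after_ms) < 3:
        raise ValueError("need at least 3 runs on each side")
    delta = statistics.median(after_ms) - statistics.median(before_ms)
    # Noise: the widest spread seen within either set of runs.
    noise = max(max(before_ms) - min(before_ms), max(after_ms) - min(after_ms))
    if abs(delta) <= noise:
        verdict = "inconclusive"
    elif delta > 0:
        verdict = "regression"
    else:
        verdict = "improvement"
    return {"delta_ms": delta, "noise_ms": noise, "verdict": verdict}


# Hypothetical p99 readings in milliseconds from three runs each.
clear = compare_runs([410, 418, 405], [455, 470, 462])
noisy = compare_runs([410, 470, 400], [430, 415, 480])
```

In the first pair the median moves by 52 ms against 15 ms of spread, so it counts as a regression; in the second a 20 ms shift sits inside 70 ms of spread and stays inconclusive.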

FAQ

How big a regression actually matters?

Anything that pushes p99 above your SLO target. Below SLO, weigh it against fix cost using template 12.

What are the Core Web Vitals targets to optimize toward?

As of June 2026: LCP under 2.5s, INP under 200ms, CLS under 0.1, each at the 75th percentile of real users. Google’s March 2026 core update has been tightening LCP toward 2.0s, so treat 2.0s as the safer goal.

Should I optimize before launch?

Hit your Core Web Vitals targets, then ship. Pushing past the targets before launch is usually time you don’t get back.

Is React.memo always safe?

No. `memo` with unstable props (objects, arrays, inline callbacks) does more comparison work and can be slower than no memo at all.

How do I find DB index gaps?

Query `pg_stat_user_indexes` for unused indexes and `pg_stat_user_tables` for tables with heavy sequential scans, then confirm with `EXPLAIN ANALYZE`.

Can AI read a flamegraph?

It reads text traces and Profiler JSON well. A visual flamegraph image needs a vision-capable model, or convert the trace to Markdown first (chperf, chrome-trace-analyzer) and paste that.
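For the index-gap question above, the two statistics views can be queried as sketched below. The SQL uses standard columns of `pg_stat_user_indexes` and `pg_stat_user_tables`. The Python helper, its 10,000-row threshold, and the sample rows are hypothetical, and running the queries needs a PostgreSQL driver of your choice, which is not shown.

```python
# Indexes that have never been scanned since statistics were last reset.
UNUSED_INDEXES_SQL = """
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY relname, indexrelname;
"""

# Per-table scan counters; heavy sequential reads come first.
SEQ_SCAN_TABLES_SQL = """
SELECT relname, seq_scan, seq_tup_read, idx_scan, n_live_tup
FROM pg_stat_user_tables
ORDER BY seq_tup_read DESC;
"""


def heavy_seq_scan_tables(rows: list[dict], min_live_rows: int = 10_000) -> list[dict]:
    """Keep large tables that are scanned sequentially more often than via an index."""
    suspects = [
        r for r in rows
        if r["n_live_tup"] >= min_live_rows
        and r["seq_scan"] > (r["idx_scan"] or 0)  # idx_scan is NULL without indexes
    ]
    return sorted(suspects, key=lambda r: r["seq_tup_read"], reverse=True)


# Hypothetical rows shaped like the second query's result.
sample_rows = [
    {"relname": "orders", "seq_scan": 9200, "seq_tup_read": 8_100_000_000,
     "idx_scan": 310, "n_live_tup": 900_000},
    {"relname": "countries", "seq_scan": 5000, "seq_tup_read": 1_200_000,
     "idx_scan": 0, "n_live_tup": 240},
    {"relname": "events", "seq_scan": 40, "seq_tup_read": 90_000_000,
     "idx_scan": None, "n_live_tup": 2_000_000},
]
suspects = heavy_seq_scan_tables(sample_rows)
```

These counters are cumulative, so a table or index that looks idle may simply predate the last statistics reset. Treat the output as a list of candidates and confirm each one with `EXPLAIN ANALYZE`.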

Tags: #Prompt #Coding #Performance #Audit