Build Failure Investigation Prompts: 12 Templates for Red CI

Stop guessing at red CI. 12 prompt templates for narrowing build / test failures by environment, cache, dependency, flake, and order of operations.

“CI is red” is not a problem statement. A good build-failure prompt names the failing job, the diff that broke it, and the env that runs it — then bisects between code, deps, cache, and flake. Anything else is throwing wrenches at the machine.

Who this is for

Engineers on a red main branch, release captains debugging blocked merges, indie devs who lost an evening to a green-then-red CI.

When not to use these prompts

Don’t use these on a build failure whose log you haven’t read. Don’t use them to “fix” CI by silencing the failing job.

Prompt anatomy / structure formula

Every build failure prompt should carry six elements:

  • Role: who the AI plays (SRE / release captain / staff engineer / QA lead).
  • Context: stack / branch / failing logs / diff / dashboard URL.
  • Goal: one concrete deliverable — root cause, checklist, plan, ticket list, runbook.
  • Constraints: what the AI MUST NOT do (don’t auto-fix, don’t hallucinate file paths).
  • Output format: numbered findings, markdown table, JSON, unified diff, runnable code.
  • Examples / signal: 1-2 “good” output examples, or counter-examples.
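The six elements can be enforced mechanically before a prompt is sent. The following is a minimal sketch, assuming a hypothetical `BuildFailurePrompt` class whose field names simply mirror the list above; none of these names come from a real library.

```python
from dataclasses import dataclass, fields


@dataclass
class BuildFailurePrompt:
    # Field names mirror the six elements; they are illustrative, not a standard.
    role: str
    context: str
    goal: str
    constraints: str
    output_format: str
    examples: str

    def render(self) -> str:
        """Render the prompt, refusing to produce one with an empty element."""
        parts = []
        for f in fields(self):
            value = getattr(self, f.name).strip()
            if not value:
                # An empty element usually means the prompt is underspecified.
                raise ValueError(f"missing prompt element: {f.name}")
            label = f.name.replace("_", " ").capitalize()
            parts.append(f"{label}: {value}")
        return "\n".join(parts)
```

Calling `render()` on a prompt with a blank `constraints` or `examples` field raises instead of silently sending a weaker prompt, which is the point of treating the six elements as required.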

Best for

  • Narrowing a red CI to the right diff
  • Distinguishing flake from real failure
  • Diffing CI env vs local env
  • Cache-corruption investigations
  • Decision: revert vs hotfix forward

12 copy-ready prompt templates

1. Read this log, find the root

Here is the failing CI log: {log}. Identify the FIRST real error (not symptoms downstream). Output: (1) the error line, (2) the most likely cause among: code bug / dep mismatch / cache poison / env diff / flake, (3) the next 1-2 commands to confirm. No speculation — only what the log supports.

Variables to swap: log — full failing job log
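Before pasting a long log into the prompt, it can help to locate the first error yourself so you can check the AI's answer against it. This is a rough sketch, assuming the hypothetical pattern below is a reasonable match for your toolchain's error lines; real logs will need their own pattern.

```python
import re

# Hypothetical error markers; tune these to your own toolchain's log format.
ERROR_PATTERN = re.compile(
    r"\w*(error|exception)\b|\bfail(ed|ure)?\b", re.IGNORECASE
)


def first_error(log: str, context: int = 2):
    """Return (line_number, line, following_lines) for the first match, or None."""
    lines = log.splitlines()
    for index, line in enumerate(lines):
        if ERROR_PATTERN.search(line):
            # Line numbers are 1-based to match what CI log viewers display.
            return index + 1, line, lines[index + 1 : index + 1 + context]
    return None
```

The function stops at the first hit on purpose: everything after it is a candidate downstream symptom.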

2. Local vs CI env diff

I cannot reproduce a CI failure locally. Compare these envs: local node {nodeLocal}, CI node {nodeCI}, OS, env vars, lockfile drift, cached vs fresh install. Output the 5 most likely diffs and one command each to verify.

Variables to swap: nodeLocal, nodeCI
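The comparison this prompt asks for is easier when both environments are captured as flat key/value snapshots first. A minimal sketch, assuming you have already collected each snapshot into a dict (the keys used in the usage note are examples only):

```python
def diff_env(local: dict[str, str], ci: dict[str, str]) -> dict[str, tuple]:
    """Map each differing key to (local_value, ci_value); None marks a missing key."""
    differences = {}
    for key in sorted(set(local) | set(ci)):
        local_value, ci_value = local.get(key), ci.get(key)
        if local_value != ci_value:
            differences[key] = (local_value, ci_value)
    return differences
```

For example, `diff_env({"node": "20.11.0"}, {"node": "18.19.0", "CI": "true"})` reports both the version mismatch and the variable that only exists on CI. Keys that match on both sides are dropped, so the output is short enough to paste into the prompt.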

3. Flake vs real failure

A test failed once on CI but passes on retry. Decide flake vs real: (1) Look at the diff that landed before the run: does it touch the test's subject? (2) Check the failure frequency over the last 7 days. (3) Inspect the error for non-deterministic terms (timeout / Date.now / random). Output: flake probability (0-1), reasoning, and what to do next.
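Two of the three inputs to this prompt can be gathered mechanically before asking. The sketch below assumes you can export recent pass/fail results as a list of booleans; the term list is taken from the template and is not exhaustive.

```python
# Terms the template treats as hints of non-determinism (compared in lowercase).
FLAKE_TERMS = ("timeout", "timed out", "date.now", "random")


def flake_signals(error_text: str) -> list[str]:
    """Return the non-deterministic terms that appear in the error text."""
    lowered = error_text.lower()
    return [term for term in FLAKE_TERMS if term in lowered]


def failure_rate(recent_results: list[bool]) -> float:
    """Fraction of recent runs that failed; True means the run passed."""
    if not recent_results:
        return 0.0
    return recent_results.count(False) / len(recent_results)
```

Neither number is a verdict on its own. They are evidence to include in the prompt so the flake probability it returns rests on something other than the single failing run.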

4. Cache poison diagnosis

CI succeeded last run, fails now, same diff. Suspect cache. Check: (1) Last cache key change, (2) Lockfile changes that altered hoist resolution, (3) Postinstall scripts that read env, (4) Restore time anomalies. Output: most likely cache layer + the one cache flush command to try.
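To check whether a cache key should have changed, it helps to recompute it from its inputs. This is a sketch under a stated assumption: the key scheme below (OS, Node version, lockfile hash) is hypothetical, and your CI provider's actual key syntax will differ.

```python
import hashlib


def cache_key(lockfile_text: str, node_version: str, os_name: str) -> str:
    """Build a dependency-cache key from the inputs that should invalidate it."""
    # Hypothetical scheme: any change to the lockfile, Node version, or OS
    # yields a different key, so a stale cache is never restored.
    digest = hashlib.sha256(lockfile_text.encode("utf-8")).hexdigest()[:16]
    return f"deps-{os_name}-node{node_version}-{digest}"
```

If the recomputed key for the failing run equals the key for the last green run, the restored cache contents are the same and cache poison becomes less likely; if an input changed but the key did not, the key is missing that input.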

5. Dependency drift hunter

Lockfile changed in this commit. Find the actual upgraded package(s) most likely to have broken CI: list each upgrade with old → new version, type (direct / transitive), and changelog highlights for the version jump. Don't propose downgrades — propose investigation order.
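The old → new list this prompt asks for can be prepared ahead of time. A minimal sketch, assuming you have already flattened each lockfile into a name-to-version dict and know which names are direct dependencies; parsing a real lockfile is out of scope here.

```python
def upgraded_packages(
    old: dict[str, str], new: dict[str, str], direct: set[str]
) -> list[tuple[str, str, str, str]]:
    """List (name, old_version, new_version, kind) for every changed package."""
    changes = []
    # Only packages present in both snapshots count as upgrades;
    # added or removed packages are a separate question.
    for name in sorted(set(old) & set(new)):
        if old[name] != new[name]:
            kind = "direct" if name in direct else "transitive"
            changes.append((name, old[name], new[name], kind))
    return changes
```

Feeding this list into the prompt keeps the AI from guessing at version numbers and leaves it the part it is better at: reading changelogs and proposing an investigation order.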

6. Order-of-operations failure

Build passes locally and on CI in isolation, fails when run after another job. Trace likely sources: (1) shared cache between jobs, (2) env vars set by previous job, (3) DB state not reset, (4) file artifacts leaked. Output: 4 checks ordered cheapest first.

7. “Cannot find module” specialist

CI fails with `Cannot find module {modName}`. Identify the cause: (1) deps mis-listed (in devDeps but needed at build), (2) workspace package not built first, (3) case-sensitivity (works on Mac / fails on Linux), (4) path alias not resolved. Output: probable cause + fix.
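Cause (3) is cheap to rule in or out without a CI run. The sketch below assumes you pass in the list of tracked file paths (for example, the output of `git ls-files` split into lines) and the path the failing import resolves to; the function name is illustrative.

```python
from typing import Optional


def case_only_match(requested: str, tracked_files: list[str]) -> Optional[str]:
    """Return the tracked path that equals `requested` only when case is ignored."""
    if requested in tracked_files:
        return None  # exact match exists, so case is not the problem
    wanted = requested.lower()
    for path in tracked_files:
        if path.lower() == wanted:
            return path
    return None
```

A non-`None` result means the import works on a case-insensitive filesystem and fails on a case-sensitive one, which matches the Mac-versus-Linux symptom in the template.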

Variables to swap: modName

8. Out-of-memory CI diagnosis

CI failed with OOM. Decide: (1) Is the build itself heavier (new deps, larger bundle)? (2) Is a test leaking memory? (3) Is concurrency too high? Output: cheapest experiment first, usually a rerun with `NODE_OPTIONS=--max-old-space-size=4096`: if it passes, the build simply needs more heap; if it still runs out, suspect a leak or runaway growth.

9. Timeout vs hang

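A CI step hit timeout. Distinguish slow vs hung: (1) Show the last log line. If it follows a network call, the step is likely hung on a dependency fetch or external service. (2) If it is mid-test, it is likely hung on an unresolved async operation. (3) If logs progress steadily until the cutoff, the step is slow, not hung. Pick one diagnosis with evidence.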
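When the log lines carry timestamps, the slow-versus-hung question reduces to how long the step was silent before it was killed. A rough sketch, assuming you have parsed the timestamps already; the five-minute `max_quiet` default is an arbitrary illustration, not a recommended value.

```python
from datetime import datetime, timedelta


def classify_timeout(
    log_times: list[datetime],
    killed_at: datetime,
    max_quiet: timedelta = timedelta(minutes=5),  # assumed threshold
) -> str:
    """Return "hung" if the log went quiet long before the kill, else "slow"."""
    if not log_times:
        raise ValueError("no log timestamps to inspect")
    quiet = killed_at - max(log_times)
    return "hung" if quiet > max_quiet else "slow"
```

A step that logged until a minute before the cutoff was still making progress; one that went silent twenty minutes earlier was waiting on something.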

10. Revert or fix forward decision

Main is broken. Decide revert vs fix-forward: (1) Time to fix forward < 15 min? Fix forward, (2) Else revert, restore green main, re-attempt in a branch. Confirm criteria, then output the exact commands (revert SHA + open PR).

11. Bisect within a single PR’s commits

A PR has 8 commits. Last commit fails CI. Bisect to find the offending commit without running 8 builds: (1) Group commits by file area, (2) Identify the most-likely commit by static reasoning (which one introduces the imports / config touched by the failure), (3) Run CI on that commit only.
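Step (2) can be approximated by counting how many files each commit shares with the paths named in the failure. This is a sketch, assuming you supply both inputs yourself: a map of commit SHA to touched files, and the set of paths that appear in the error output.

```python
def rank_commits(
    commits: dict[str, set[str]], failing_paths: set[str]
) -> list[tuple[str, int]]:
    """Rank commits by how many files they share with the failure, highest first."""
    scored = [(sha, len(files & failing_paths)) for sha, files in commits.items()]
    # sorted() is stable, so commits with equal overlap keep their input order.
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

The top-ranked commit is the one to run CI on first. File overlap is only a heuristic: a commit that changes a shared config can break files it never touches, so treat a zero score as "less likely", not "cleared".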

12. Post-mortem from a CI outage

CI was red for 4 hours today. Generate a brief post-mortem: (1) Trigger commit, (2) Time to detect, (3) Time to revert, (4) Why the bad commit landed (test gap / missing CI check), (5) One follow-up. 200 words max. No blame.
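Items (2) and (3) are plain arithmetic on three timestamps, and computing them up front keeps the post-mortem from guessing. A minimal sketch, assuming "time to revert" is measured from detection rather than from the merge; adjust if your team defines it differently.

```python
from datetime import datetime


def _minutes(start: datetime, end: datetime) -> int:
    """Whole minutes elapsed between two timestamps."""
    return int((end - start).total_seconds() // 60)


def outage_timings(
    merged_at: datetime, detected_at: datetime, reverted_at: datetime
) -> dict[str, int]:
    """Return detection, revert, and total outage durations in minutes."""
    if not merged_at <= detected_at <= reverted_at:
        raise ValueError("timestamps must be in order: merged, detected, reverted")
    return {
        "time_to_detect_min": _minutes(merged_at, detected_at),
        "time_to_revert_min": _minutes(detected_at, reverted_at),
        "total_red_min": _minutes(merged_at, reverted_at),
    }
```

Paste the resulting numbers into the prompt alongside the trigger commit so the generated post-mortem reports measured durations.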

Common mistakes

  • Reading the LAST error line, not the first one — downstream symptoms hide the cause.
  • Adding retries to “fix” flake — masks real timing bugs.
  • Bumping dep versions to “see if it helps” — adds variables.
  • Reverting before identifying the cause — the next person re-lands the same bug.
  • Mass-clearing caches — works once, then the real fix never lands.
  • Disabling the failing test — moves the bug into production.
  • Skipping the post-mortem on long outages — same outage in 2 months.

How to push results further

  • Always start at the first error, not the last.
  • Capture the CI env (node, OS, lockfile sha) before changing anything.
  • Flakes have signatures: timeouts, Date.now, random, network. Real bugs have stack traces in your code.
  • Time-box: 15 min investigation → revert. Fix in a branch.
  • Make CI logs traceable — fail with explicit context, not just exit 1.
  • After each long outage, add one CI check that would have caught it.
  • Keep installs reproducible: use `npm ci` in CI, never `npm install`.

Practical depth notes

Use these prompts as starting points, not final answers. The useful extra work is to replace every generic placeholder with real context: the failing job, the full log, the diff that broke it, and the env that runs it. Run at least two versions with different constraints, then compare the outputs side by side instead of accepting the first polished response.

A good result should pass three checks: it is specific enough that another person could reuse it, it avoids vague praise or filler, and it gives you an editable artifact rather than a broad suggestion. If the output feels generic, add one concrete reference, one forbidden pattern, and one measurable success criterion before rerunning the prompt.

FAQ

  • Should I let AI auto-revert?: Not without sign-off. Letting the AI prepare the revert is fine when the signal is unambiguous (e.g., main red right after a merge), but a human should approve it.
  • Can AI find the offending dependency?: Often yes, by reading changelogs + lockfile diff. Confirm with a one-version pin.
  • Is “retry the test” a valid fix?: Only as a stopgap, with the test tagged as flaky. Real fixes remove the source of non-determinism.
  • How long before I revert?: 15 minutes. Beyond that you’re on someone else’s time.
  • Should every red CI block all PRs?: Block merges to main, not PR branches. Other devs need to keep working.
  • What do I do about cache poison?: Identify the poison key, flush only that key, document in the post-mortem.

Tags: #Prompt #Coding #CI #Build #Debugging