Build Failure Investigation Prompts: 12 Templates for Red CI

Stop guessing at red CI. 12 prompt templates that narrow build and test failures by environment, cache, dependency, flake, and order of operations.

Published: May 19, 2026 Updated: Jun 04, 2026 Author: AI Productivity Guide Team

“CI is red” is not a problem statement. A useful build-failure prompt names the failing job, the diff that broke it, and the environment that runs it, then bisects between code, deps, cache, and flake. Everything else is throwing wrenches at the machine.

The stakes are higher than they look. A 2026 empirical study of GitHub Actions found that 3.2% of builds get rerun, and 67.7% of those reruns turn green on retry with no code change — meaning the majority of “fix it by clicking re-run” failures are flake, not signal (arXiv 2602.02307). The prompts below exist to tell those two cases apart fast, before you waste an evening or land a real bug because you assumed it was flaky.

TL;DR

Always read the first error in the log, not the last. Downstream noise hides the cause.
Feed the AI the full log, the diff, and the CI environment (Node/OS/lockfile sha). Without those three it guesses.
About two-thirds of reruns go green with no code change, so most retried failures are flaky. Flaky is not the same as harmless: a retry can also mask a real timing or ordering bug. Use prompt 3 to decide.
Time-box: 15 minutes of investigation, then revert and fix in a branch.
Paste these into Claude Code, Cursor, ChatGPT, or Gemini — any model with a 1M-token context window (Opus 4.7, Sonnet 4.6, Gemini 3.1 Pro, GPT-5.5) can hold a full CI log plus the diff in one shot.

Who this is for

Engineers staring at a red main branch, release captains debugging a blocked merge, and indie devs who lost an evening to a green-then-red pipeline.

Which model to paste these into

CI logs are long, so context window matters more than raw reasoning here. As of June 2026:

Tool / model	Context	Why it fits CI logs
Claude Code (Opus 4.7 / Sonnet 4.6)	1M tokens	Reads the repo + workflow YAML + log together; runs the confirming command for you
Cursor (Sonnet 4.6, GPT-5.5, Gemini 3.1 Pro)	up to 1M	In-editor; can open the failing test file alongside the log
Gemini 3.1 Pro	1M tokens	Cheapest at $2/$12 per 1M tokens (in/out) for bulk log dumps
ChatGPT Plus (GPT-5.5)	~320 pages in-app	Fine for a single job log; full 1M needs the $200 Pro tier

For a giant multi-job log, prefer a 1M-context tool. For a single failing step, any of them works.

Prompt anatomy

Every build-failure prompt should carry six elements:

Role: who the AI plays (SRE, release captain, staff engineer, QA lead).
Context: stack, branch, failing log, diff, dashboard URL.
Goal: one concrete deliverable — root cause, checklist, plan, ticket list, runbook.
Constraints: what the AI must not do (don’t auto-fix, don’t invent file paths).
Output format: numbered findings, markdown table, JSON, unified diff, runnable code.
Signal: 1-2 examples of a “good” output, or a counter-example.

When not to use these prompts

Don’t reach for them on a failure you haven’t read the log of — paste the log first. And don’t use them to “fix” CI by silencing the failing job; a muted check is a future incident.

12 copy-ready prompt templates

Replace [bracketed] placeholders with your real values before sending.

1. Read this log, find the root

Here is the failing CI log: [log]. Identify the FIRST real error (not symptoms downstream). Output: (1) the error line, (2) the most likely cause among code bug / dep mismatch / cache poison / env diff / flake, (3) the next 1-2 commands to confirm. No speculation — only what the log supports.

Swap: [log] = the full failing job log
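
Before pasting, you can pre-locate the candidate first error yourself and quote it in the prompt. A minimal Python sketch, assuming a plain-text log; the regex patterns are illustrative guesses, not a complete list for any particular toolchain:

```python
import re

# Illustrative patterns only; tune them for your compiler, test runner, and package manager.
ERROR_PATTERNS = [
    re.compile(r"\berrors?\b", re.IGNORECASE),       # tsc, eslint, generic "Error:"
    re.compile(r"\bERR!"),                           # npm ERR!
    re.compile(r"^\s*FAIL\b"),                       # Jest-style failing suite
    re.compile(r"Traceback \(most recent call last\)"),
]

# Lines that mention errors but actually report success, e.g. "0 errors".
IGNORE_PATTERNS = [
    re.compile(r"\b0 errors?\b", re.IGNORECASE),
]


def first_error(log_lines):
    """Return (line_number, text) for the first error-like line, or None if nothing matches."""
    for number, line in enumerate(log_lines, start=1):
        if any(p.search(line) for p in IGNORE_PATTERNS):
            continue
        if any(p.search(line) for p in ERROR_PATTERNS):
            return number, line.rstrip()
    return None
```

For example, `first_error(open("job.log").readlines())` returns the line number to cite when you ask the model to explain that error rather than the final "exit code 1".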

2. Local vs CI environment diff

I cannot reproduce a CI failure locally. Compare these envs: local node [nodeLocal], CI node [nodeCI], OS, env vars, lockfile drift, cached vs fresh install. Output the 5 most likely diffs and one command each to verify.

Swap: [nodeLocal], [nodeCI]
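
Prompt 2 works best when you hand it facts rather than recollections. A small Python sketch that snapshots the usual suspects on each machine and diffs the two snapshots; the env var names and the `node --version` call are assumptions about your setup:

```python
import hashlib
import os
import platform
import shutil
import subprocess


def capture_env(lockfile="package-lock.json", env_keys=("CI", "NODE_ENV", "TZ", "LANG")):
    """Snapshot the facts that most often differ between a laptop and a CI runner."""
    node = shutil.which("node")
    node_version = None
    if node:
        result = subprocess.run([node, "--version"], capture_output=True, text=True)
        node_version = result.stdout.strip() or None
    lock_sha = None
    if os.path.exists(lockfile):
        with open(lockfile, "rb") as fh:
            lock_sha = hashlib.sha256(fh.read()).hexdigest()[:12]
    snapshot = {"os": platform.system(), "node": node_version, "lockfile_sha": lock_sha}
    for key in env_keys:
        snapshot[f"env.{key}"] = os.environ.get(key)
    return snapshot


def env_diff(local, ci):
    """Return sorted (key, local_value, ci_value) tuples for every key that differs."""
    keys = sorted(set(local) | set(ci))
    return [(k, local.get(k), ci.get(k)) for k in keys if local.get(k) != ci.get(k)]
```

One way to use it: run `capture_env()` locally and in a CI step, save each result as JSON, then paste the output of `env_diff` into the prompt in place of the bare Node versions.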

3. Flake vs real failure

A test failed once on CI but passes on retry. Decide flake vs real: (1) Look at the diff that landed before the run — does it touch the test's subject? (2) Check the failure frequency over the last 7 days. (3) Inspect the error for non-deterministic terms (timeout / Date.now / random / port / order). Output: probability of flake (0-1), reasoning, and the next action.
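
Steps (2) and (3) of this prompt are mechanical enough to precompute. A minimal sketch, assuming you can export per-test pass/fail history with timestamps from your CI provider (the `history` format is hypothetical). It reports signals rather than a probability, leaving the judgment to you or the model:

```python
import re
from datetime import datetime, timedelta

# Non-determinism signatures from the prompt; extend for your stack.
NONDETERMINISTIC = re.compile(
    r"\b(timeout|timed out|Date\.now|Math\.random|random|EADDRINUSE|port|order)\b",
    re.IGNORECASE,
)


def flake_signals(error_text: str, history, now: datetime, window_days: int = 7) -> dict:
    """history: iterable of (timestamp, passed) tuples for this test across CI runs."""
    cutoff = now - timedelta(days=window_days)
    recent = [passed for ts, passed in history if ts >= cutoff]
    failures = recent.count(False)
    terms = {m.group(0).lower() for m in NONDETERMINISTIC.finditer(error_text)}
    return {
        "nondeterministic_terms": sorted(terms),
        "runs_in_window": len(recent),
        "failures_in_window": failures,
        "failure_rate": failures / len(recent) if recent else None,
    }
```

A low failure rate plus a timeout or port term leans toward flake; a failure that starts right after a diff touching the test's subject leans toward a real bug, whatever the terms say.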

4. Cache poison diagnosis

CI succeeded last run and fails now on the same diff. Suspect cache. Check: (1) Last cache key change, (2) Lockfile changes that altered hoist resolution, (3) Postinstall scripts that read env, (4) Restore-time anomalies. Output: the most likely cache layer plus the single cache key to flush.
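
When a cache is suspected, check whether its key actually changes when its inputs do. A minimal Python sketch of a lockfile-keyed npm cache key, loosely modeled on the common `runner.os` plus `hashFiles('**/package-lock.json')` pattern from the actions/cache examples; the exact key format here is an assumption, not what GitHub computes:

```python
import hashlib
from pathlib import Path


def read_lockfiles(paths):
    """Read lockfiles in a stable order so the key does not depend on glob order."""
    return [Path(p).read_bytes() for p in sorted(paths)]


def cache_key(runner_os, node_version, lockfile_blobs, prefix="npm"):
    """Key that changes whenever any lockfile's bytes or the Node major version change."""
    digest = hashlib.sha256()
    for blob in lockfile_blobs:
        digest.update(hashlib.sha256(blob).digest())
    node_major = node_version.lstrip("v").split(".")[0]
    return f"{runner_os}-node{node_major}-{prefix}-{digest.hexdigest()[:16]}"
```

If the green run and the red run restored the same key even though their lockfiles or Node versions differ, the key is missing an input, and that single key is the one to flush. On GitHub Actions, `gh cache list` and `gh cache delete` can target one entry instead of the whole cache.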

5. Dependency drift hunter

The lockfile changed in this commit. Find the upgraded package(s) most likely to have broken CI: list each upgrade with old -> new version, type (direct / transitive), and changelog highlights for the version jump. Don't propose downgrades — propose investigation order.
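
You can hand the model a precomputed upgrade list instead of the raw lockfile diff. A sketch that compares two npm v2/v3 `package-lock.json` files through their `packages` map; treating only top-level entries named in the root manifest as direct is a simplification, and workspaces or linked packages may need extra handling:

```python
import json


def load_lock(path):
    """Parse a package-lock.json file."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def _versions(lock):
    """Map install path -> version from the v2/v3 'packages' section (skips the root "")."""
    return {path: meta.get("version") for path, meta in lock.get("packages", {}).items() if path}


def lockfile_drift(old_lock, new_lock):
    """List every package whose installed version changed, direct upgrades first."""
    root = new_lock.get("packages", {}).get("", {})
    direct = set(root.get("dependencies", {})) | set(root.get("devDependencies", {}))
    old, new = _versions(old_lock), _versions(new_lock)
    changes = []
    for path in sorted(set(old) | set(new)):
        before, after = old.get(path), new.get(path)
        if before == after:
            continue
        # "node_modules/a/node_modules/@scope/b" -> "@scope/b"
        name = path.rsplit("node_modules/", 1)[-1]
        is_direct = path == f"node_modules/{name}" and name in direct
        changes.append({"name": name, "path": path, "old": before, "new": after,
                        "type": "direct" if is_direct else "transitive"})
    changes.sort(key=lambda c: (c["type"] != "direct", c["name"]))
    return changes
```

For the "before" side, `git show HEAD~1:package-lock.json > old-lock.json` writes the previous version to disk so `load_lock` can read it.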

6. Order-of-operations failure

Build passes locally and on CI in isolation but fails when run after another job. Trace likely sources: (1) shared cache between jobs, (2) env vars set by a previous job, (3) DB state not reset, (4) file artifacts leaked. Output: 4 checks ordered cheapest first.

7. “Cannot find module” specialist

CI fails with `Cannot find module [modName]`. Identify the cause: (1) dep mis-listed (missing from package.json and only resolved locally through hoisting, or in devDependencies while the CI step installs with --omit=dev), (2) workspace package not built first, (3) case-sensitivity (works on macOS, fails on Linux), (4) path alias not resolved. Output: probable cause plus fix.

Swap: [modName]
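
Cause (3) is easy to check mechanically. A sketch that flags imports whose path only matches a tracked file when case is ignored; it assumes you have already resolved each import specifier to a repo-relative file path with its extension, which is the hard part and is not shown:

```python
import subprocess


def tracked_files(repo="."):
    """Paths git tracks, with their real case."""
    out = subprocess.run(["git", "ls-files"], cwd=repo, capture_output=True, text=True, check=True)
    return out.stdout.splitlines()


def case_mismatches(imported_paths, files):
    """Return (imported, actual) pairs that resolve only when case is ignored.

    These import fine on case-insensitive filesystems (macOS and Windows defaults)
    and fail with "Cannot find module" on Linux runners.
    """
    exact = set(files)
    by_lower = {f.lower(): f for f in files}
    mismatches = []
    for path in imported_paths:
        if path in exact:
            continue
        actual = by_lower.get(path.lower())
        if actual is not None:
            mismatches.append((path, actual))
    return mismatches
```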

8. Out-of-memory CI diagnosis

CI failed with OOM (heap out of memory). Decide: (1) Is the build itself heavier (new deps, larger bundle)? (2) Is a test leaking memory? (3) Is concurrency too high for the runner? Output: the cheapest experiment first. Usually that is setting NODE_OPTIONS=--max-old-space-size=4096: a pass means the build hit a memory ceiling, while another OOM points to runaway growth.
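
The heap experiment is easy to script so both runs use the same command. A sketch assuming your build is a single command such as `npm run build`; 4096 MB only makes sense if the runner actually has that much free memory, and the interpretation rule is a rough heuristic:

```python
import os
import subprocess


def heap_env(heap_mb, base=None):
    """Copy of the environment with a V8 heap ceiling appended to NODE_OPTIONS."""
    env = dict(os.environ if base is None else base)
    existing = env.get("NODE_OPTIONS", "")
    env["NODE_OPTIONS"] = f"{existing} --max-old-space-size={heap_mb}".strip()
    return env


def build_passes(cmd, heap_mb=None):
    """Run the build once; heap_mb=None runs with the current environment unchanged."""
    env = None if heap_mb is None else heap_env(heap_mb)
    return subprocess.run(cmd, env=env).returncode == 0


def interpret(default_ok, raised_ok):
    """Rough reading of the default-heap run versus the raised-heap run."""
    if default_ok:
        return "not reproduced: compare runner size and concurrency with the failing job"
    if raised_ok:
        return "ceiling: the build needs more heap than the default allows"
    return "runaway growth suspected: look for a leak or unbounded concurrency"
```

Usage: `interpret(build_passes(["npm", "run", "build"]), build_passes(["npm", "run", "build"], heap_mb=4096))`.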

9. Timeout vs hang

A CI step hit its timeout. Distinguish slow from hung: (1) If the last log line is right after a network call, it's hung waiting on a network dependency (registry, API, database). (2) If it's mid-test, it's hung on an unresolved async operation. (3) If logs progressed steadily until cutoff, it's just slow. Pick one diagnosis and cite the evidence line.
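
The three rules can be pre-applied to a timestamped log. A sketch assuming each line has already been parsed into a `(datetime, text)` pair (GitHub Actions' downloaded raw logs prefix each line with an ISO timestamp); the five-minute silence threshold and the network keywords are arbitrary starting points:

```python
import re
from datetime import datetime, timedelta

NETWORK_HINTS = re.compile(
    r"https?://|\b(fetch|GET|POST|connect|download|registry)\b", re.IGNORECASE
)


def classify_timeout(lines: list, killed_at: datetime,
                     stall: timedelta = timedelta(minutes=5)) -> tuple:
    """lines: [(timestamp, text), ...] in log order; killed_at: when the timeout fired.

    Returns (diagnosis, evidence_line); diagnosis is "slow", "hung-network",
    "hung-async", or "no-output".
    """
    if not lines:
        return "no-output", None
    last_ts, last_text = lines[-1]
    if killed_at - last_ts < stall:
        # Recent output right before the cutoff: slow, not stuck.
        return "slow", last_text
    if NETWORK_HINTS.search(last_text):
        return "hung-network", last_text
    # Long silence after a non-network line, e.g. mid-test.
    return "hung-async", last_text
```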

10. Revert or fix-forward decision

Main is broken. Decide revert vs fix-forward: (1) If the cause is known and a fix-forward takes under 15 min, fix forward. (2) Otherwise revert, restore green main, and re-attempt in a branch. Confirm which criterion applies, then output the exact commands (git revert SHA, or git revert -m 1 SHA for a merge commit, then open the PR).
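
The decision rule is simple enough to encode, which keeps the 15-minute budget honest. A sketch; the branch naming, `--no-edit`, and the `gh pr create --fill` step are one possible workflow, and it assumes you start from an up-to-date main:

```python
from dataclasses import dataclass


@dataclass
class BrokenMain:
    cause_known: bool
    fix_forward_minutes: float  # honest estimate, including the CI run
    bad_sha: str
    is_merge_commit: bool


def revert_or_fix(state, budget_minutes=15):
    """Apply the revert-or-fix-forward rule; return (decision, commands)."""
    if state.cause_known and state.fix_forward_minutes < budget_minutes:
        return "fix-forward", []
    branch = f"revert-{state.bad_sha[:8]}"
    revert = ["git", "revert", "--no-edit"]
    if state.is_merge_commit:
        revert += ["-m", "1"]  # revert relative to the first parent (the mainline)
    revert.append(state.bad_sha)
    return "revert", [
        ["git", "switch", "-c", branch],
        revert,
        ["git", "push", "-u", "origin", branch],
        ["gh", "pr", "create", "--fill"],
    ]
```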

11. Bisect within a single PR’s commits

A PR has 8 commits; the last fails CI. Bisect to the offending commit without running 8 builds: (1) Group commits by file area, (2) Identify the most-likely commit by static reasoning (which one introduces the imports or config the failure touches), (3) Run CI on that commit and on its parent to confirm: red on the suspect and green on its parent pins the break.
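
If static reasoning in step (2) gives no clear suspect, a plain binary search still beats eight builds: it needs at most three for eight commits. A sketch of the logic `git bisect` applies, assuming the PR's base is green, the last commit is red, and the failure persists once introduced; `passes` is a hypothetical stand-in for "run CI on this commit":

```python
from typing import Callable, Sequence, Tuple


def first_bad(commits: Sequence[str], passes: Callable[[str], bool]) -> Tuple[str, int]:
    """commits run oldest -> newest; the base before commits[0] is green, commits[-1] is red.

    Returns (first_bad_commit, builds_run).
    """
    if not commits:
        raise ValueError("need at least one commit")
    lo, hi = 0, len(commits) - 1  # commits[hi] is known to fail
    builds = 0
    while lo < hi:
        mid = (lo + hi) // 2
        builds += 1
        if passes(commits[mid]):
            lo = mid + 1  # break is after mid
        else:
            hi = mid      # mid is bad; the first bad commit is at or before it
    return commits[hi], builds
```

Locally, `git bisect run <test command>` automates the same search.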

12. Post-mortem from a CI outage

CI was red for 4 hours today. Write a brief blameless post-mortem: (1) Trigger commit, (2) Time to detect, (3) Time to revert, (4) Why the bad commit landed (test gap / missing CI check / skipped review), (5) One follow-up action with an owner. 200 words max.
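
Items (2) and (3) are easy to get wrong from memory. A sketch that derives them from timestamps you can pull from CI run history; the definitions used here (detect = merge to first red run, revert = first red run to revert landing) are one reasonable choice, not a standard:

```python
from datetime import datetime


def outage_timings(merged_at: datetime, first_red_at: datetime,
                   reverted_at: datetime, green_at: datetime) -> dict:
    """Whole-minute durations for the post-mortem timeline."""
    def minutes(start, end):
        return round((end - start).total_seconds() / 60)

    return {
        "time_to_detect_min": minutes(merged_at, first_red_at),
        "time_to_revert_min": minutes(first_red_at, reverted_at),
        "red_duration_min": minutes(first_red_at, green_at),
    }
```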

Common mistakes

Reading the LAST error line instead of the first — downstream symptoms bury the cause.
Adding retries to “fix” flake — this masks real timing and ordering bugs.
Bumping dep versions to “see if it helps” — every bump adds a variable.
Reverting without writing down the suspected cause: the next person re-lands the same bug.
Mass-clearing caches — works once, then the real fix never lands.
Disabling the failing test — that just moves the bug into production.
Skipping the post-mortem on a long outage — same outage returns in two months.

How to push results further

Capture the CI env (Node version, OS, lockfile sha) before you change anything.
Flakes have signatures: timeouts, Date.now, random, port collisions, test ordering, network. Real bugs carry a stack trace into your code.
Cache the global store, not node_modules. The actions/cache v4 guidance is to cache ~/.npm and let npm ci reassemble from a lockfile-keyed cache, which stays reproducible across Node versions.
Turn on deep logging when stuck: re-run the failed job with "Enable debug logging" checked, or set the ACTIONS_STEP_DEBUG and ACTIONS_RUNNER_DEBUG repository secrets to true and remove them once you have captured the failing run.
Catch workflow mistakes before they hit CI: run actionlint and dry-run jobs with nektos/act locally.
Use npm ci, never npm install, in CI so the lockfile is authoritative.
After every long outage, add one CI check that would have caught it.

FAQ

Should I let AI auto-revert? Not unattended. Letting the AI prepare a revert is acceptable when the signal is unambiguous (main goes red right after a merge), but a human should sign off on the actual git revert.
Can AI find the offending dependency? Often yes, by reading changelogs plus the lockfile diff. Confirm by pinning that one package to its previous version.
Is “retry the test” a valid fix? Only as a tag-it-flaky stopgap. About two-thirds of reruns go green with no code change, so most retried failures are flaky, but some of those “flakes” are real timing or ordering bugs that a blind retry will hide. Fix the source of non-determinism.
How long before I revert? 15 minutes. Beyond that you’re spending the whole team’s time on a red main.
Should every red CI block all PRs? Block merges to main, not work on PR branches. Other devs need to keep shipping.
What do I do about cache poison? Identify the poisoned key, flush only that key (not the whole cache), and document it in the post-mortem.

Tags: #Prompt #Coding #CI #Build #Debugging