The classic AI-drafted postmortem reads like a press release: balanced, well-organized, and stripped of every uncomfortable truth that would have made it useful. The model rounds off “we deployed unreviewed code at 5pm Friday” into “a recent change interacted unexpectedly with our infrastructure.” This workflow uses AI as a fast first-draft engine while keeping the incident commander (IC) in charge of every line where honesty matters more than comfort.
What this covers
A workflow for using AI to turn raw incident artifacts (Slack thread, oncall doc, timeline notes, metrics screenshots described in text) into a structured postmortem — Summary, Impact, Timeline, Root cause, Contributing factors, Action items — without letting the model sanitize the lessons that make the doc worth writing.
Who this is for
ICs writing the doc after a 2am page, SREs running blameless postmortem culture, engineering managers who own the action items, and small teams without a dedicated incident process where everyone is the IC sometimes.
When to reach for it
- Incidents big enough to warrant a written doc: customer impact, data loss, a multi-hour outage, or a repeated near-miss.
- Teams that already have a postmortem template. AI is faster at filling a known structure than inventing one.
- Cases where the Slack thread is the primary source of truth and re-reading 600 messages by hand would take 90 minutes.
When this is NOT the right tool
- Security incidents: the writeup is legally sensitive, and AI cannot be trusted with phrasing on liability.
- Postmortems where the root cause is interpersonal (a hand-off failure, a missed escalation): AI cannot read the room and will smooth over the part that matters.
- Tiny incidents (a 5-minute blip, no customer impact): skip the doc.
Before you start
- Collect the artifacts in one place: the Slack incident channel export, the on-call doc or runbook used, any timeline notes the IC took live, dashboard screenshots described in prose, and the actual fix (commit SHA or PR link).
- Have a template ready. The five-section pattern (Summary, Impact, Timeline, Root cause, Action items) is fine; whatever your org uses, give it to the model.
- Decide who reviews before the doc goes wide. Always at least the IC and one engineer who was hands-on during the incident.
- Block 60-90 minutes within 48 hours of the incident. Memory degrades fast and is the most valuable input.
Step by step
- Dump the artifacts. Export the Slack channel, paste the oncall doc, paste the timeline notes. Mark each block with a label so the model knows what it is reading.
- Ask for a timeline FIRST, before any narrative. “Build a timeline with timestamps and one-line events. Source each line by quoting the Slack message or note. No interpretation yet.” This catches missing data early.
- Review the timeline. Fill gaps: the IC always remembers details that did not make it into Slack. Add them with [IC note: ...] so the next pass treats them as authoritative.
- Now ask for the Summary, Impact, and Root cause sections. “Use only the timeline above. Do NOT invent any factor that is not on the timeline. If you would speculate, write ‘[needs IC input]’ instead.” This is the single most important constraint.
- Run a 5 Whys with the AI as a sparring partner, not a writer. “Here is the proximate cause. Ask me ‘why’ five times. I will answer each. Then summarize.” This keeps you doing the thinking and the AI doing the prompting.
- Ask for draft action items — but only as a list of options grouped by category (prevent, detect, respond). The IC picks which ones ship. AI tends to overload action lists; trim to 3-5 actionable items with owners.
- IC reads the whole draft against the original artifacts. Anywhere the doc is more comfortable than the artifacts justify, push back. “The Slack thread shows we ignored the alert for 22 minutes. The doc should say that.”
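The labeling in the first step can be done by hand, but a small helper keeps the labels identical from one incident to the next. The following is a minimal sketch in Python, not a prescribed tool: the function name `build_labeled_input` is hypothetical, and the label names are assumed to match the bracketed labels in the prompt template in the next section.

```python
from typing import Mapping

# Label order mirrors the prompt template in this article.
# The helper itself is illustrative, not part of any existing tool.
LABELS = ("SLACK CHANNEL EXPORT", "ONCALL DOC USED", "IC TIMELINE NOTES", "FIX")


def build_labeled_input(artifacts: Mapping[str, str]) -> str:
    """Join artifacts into one pastable string, each block under its [LABEL]."""
    missing = [label for label in LABELS if not artifacts.get(label, "").strip()]
    if missing:
        # Fail loudly rather than let the model guess at an absent artifact.
        raise ValueError(f"missing artifacts: {', '.join(missing)}")
    blocks = [f"[{label}]\n{artifacts[label].strip()}" for label in LABELS]
    return "\n\n".join(blocks)
```

Refusing to build the input when a block is empty is a deliberate choice here: a missing artifact is better caught before the timeline pass than discovered as a gap the model quietly papered over.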
A prompt that produces honest output
You are helping draft an incident postmortem. I am the IC.
Inputs:
[SLACK CHANNEL EXPORT]
{paste}
[ONCALL DOC USED]
{paste}
[IC TIMELINE NOTES]
{paste}
[FIX]
{commit SHA + PR link + one-line description}
Produce:
1. Timeline — timestamps in UTC, one line per event, each line cites the
source (Slack msg, IC note, dashboard). Do NOT include events not in
the inputs.
2. Summary (3-4 sentences). Plain language. Do not soften the cause.
3. Impact (numbers — duration, customers affected, $ if known).
4. Root cause — one paragraph, only what the timeline supports. If you
would speculate, write "[needs IC input]" and I will fill in.
5. Contributing factors — list of 2-5 items. Same speculation rule.
6. Draft action items — categorize as Prevent / Detect / Respond. List
up to 8 options; I will trim. Each item needs a candidate owner role
(not a person — "Platform team", "On-call rotation").
Rules:
- Blameless tone (no individual names in blame contexts) but NOT
blame-free. "The deploy went out without review" is fair.
- Do NOT round off uncomfortable facts. "Alert was ignored for 22 min"
stays as written.
- If a fact in the inputs contradicts itself, surface both and tag
"[conflict — needs IC]".
Quality check
- Every fact in the doc traces to a source: Slack message, IC note, dashboard, code link. Untraceable claims get cut or marked [needs IC input].
- The Root cause section names the actual mechanism, not a euphemism. “We removed the canary check to ship faster” is fine. “Our deployment process did not catch the issue” is a sanitized version of the same fact.
- Action items have a candidate owner role and a category. Lists of 12 unowned items are wishes, not work.
- The 5 Whys, if done, is in the doc as a sub-section with the IC’s actual answers. Not paraphrased.
- The IC has read every sentence and would be comfortable defending it to the team. If you would not say it aloud, do not ship it.
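The traceability check in the first item can be partly mechanized. The sketch below assumes a timeline format that this article does not specify: one event per line, with the cited source text in straight double quotes. The function name `untraceable_lines` is hypothetical. It flags any timeline line whose quoted text does not appear verbatim in the pasted artifacts, and any line with no quote at all.

```python
import re

# Matches text inside straight double quotes; curly quotes are not handled.
QUOTE = re.compile(r'"([^"]+)"')


def untraceable_lines(timeline: str, artifacts: str) -> list[str]:
    """Return timeline lines whose quoted source text is absent from the artifacts."""
    flagged = []
    for line in timeline.splitlines():
        line = line.strip()
        if not line:
            continue
        quotes = QUOTE.findall(line)
        # A line with no quote has no citation to verify, so flag it too.
        if not quotes or any(q not in artifacts for q in quotes):
            flagged.append(line)
    return flagged


if __name__ == "__main__":
    artifacts = "14:02 user_a: checkout 500s are spiking\n14:24 user_b: acking the alert now"
    timeline = (
        '14:02 UTC | error spike reported | "checkout 500s are spiking"\n'
        '14:10 UTC | rollback started | "rolling back now"\n'
    )
    for line in untraceable_lines(timeline, artifacts):
        print("UNSOURCED:", line)
```

In this sample only the 14:10 line is reported, because "rolling back now" appears nowhere in the artifacts. The match is an exact substring test, so a quote the model lightly reworded is flagged as well. That is usually what you want, but it does not replace the IC reading the draft against the originals.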
How to reuse this workflow
- Save the prompt as your team’s template. Each new incident starts from the same scaffold; only the inputs change.
- Build a tiny “incident export kit”: a script that pulls the Slack channel, runs gh pr view on the fix PR, and assembles a single pastable document. This removes 20 minutes of friction.
- After each postmortem, review which sections the AI got close on and which sections needed heavy IC rewriting. Adjust the prompt.
- Keep a running file of “sanitization patterns the model uses” — phrases it reaches for that hide truth (“interacted unexpectedly”, “process gap”). Tell future prompts to avoid them.
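An export kit along the lines described above might look like the following sketch. It rests on several assumptions that should be checked against your own setup: the Slack channel has already been exported to a JSON list of message objects with `ts`, `user`, and `text` fields; the GitHub CLI is installed and authenticated; and the function names (`format_slack_messages`, `fetch_pr_summary`, `assemble`) are hypothetical.

```python
import json
import subprocess
from datetime import datetime, timezone


def format_slack_messages(messages: list[dict]) -> str:
    """Render exported messages as 'timestamp UTC user: text', oldest first."""
    lines = []
    for msg in sorted(messages, key=lambda m: float(m["ts"])):
        when = datetime.fromtimestamp(float(msg["ts"]), tz=timezone.utc)
        stamp = when.strftime("%Y-%m-%d %H:%M:%S")
        lines.append(f"{stamp} UTC {msg.get('user', 'unknown')}: {msg.get('text', '')}")
    return "\n".join(lines)


def fetch_pr_summary(pr: str) -> str:
    """Ask the GitHub CLI for the fix PR's title, URL, and body.

    Raises subprocess.CalledProcessError if gh exits with an error.
    """
    result = subprocess.run(
        ["gh", "pr", "view", pr, "--json", "title,url,body"],
        capture_output=True,
        text=True,
        check=True,
    )
    data = json.loads(result.stdout)
    return f"{data['title']}\n{data['url']}\n\n{data['body']}"


def assemble(messages: list[dict], pr_summary: str) -> str:
    """Build one pastable document using the labels from the prompt template."""
    return (
        f"[SLACK CHANNEL EXPORT]\n{format_slack_messages(messages)}\n\n"
        f"[FIX]\n{pr_summary}"
    )
```

Converting every timestamp to UTC at export time matches the prompt's request for a UTC timeline and removes one source of off-by-hours errors. The `user` field in a raw export is typically an ID rather than a display name, so a real kit would probably need a lookup step that this sketch leaves out.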
Recommended workflow
Artifacts collected → AI builds sourced timeline → IC fills gaps → AI drafts Summary / Impact / Root cause with no-speculation rule → AI conducts 5 Whys as questioner → IC picks 3-5 action items from AI’s options → IC reads against the originals and re-hardens the language. For a 4-hour incident with 600 Slack messages, this takes 60-90 minutes vs 3+ hours by hand.
Common mistakes
- Letting AI write the Root cause from the Slack thread directly, no timeline pass. The doc ends up vague because the input was chronologically jumbled.
- Skipping the source-citation rule. The model speculates, the speculation reads plausibly, and a wrong “root cause” enters team folklore.
- Accepting AI’s softened phrasing because it sounds professional. The whole point of a postmortem is to be uncomfortable to read.
- Action items written by AI without an owner. They never get done.
- Running the 5 Whys with AI as the writer. AI converges too fast to a tidy answer. Use it as the questioner only.
- Sharing the doc without IC review. Requiring that review is the single rule that prevents most postmortem-quality regressions.
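Softened phrasing is easier to push back on when it is flagged automatically before the IC's read-through. The sketch below scans a draft for a list of known euphemisms. The two seed phrases come from this article; the function name `find_sanitized_phrases` is hypothetical, and the list is meant to grow from your own team's drafts.

```python
import re

# Seed list from this article; extend it with phrases your drafts keep producing.
SANITIZATION_PATTERNS = ("interacted unexpectedly", "process gap")


def find_sanitized_phrases(
    draft: str, patterns: tuple[str, ...] = SANITIZATION_PATTERNS
) -> list[tuple[int, str, str]]:
    """Return (line_number, phrase, line) for each softened phrase in the draft."""
    hits = []
    for number, line in enumerate(draft.splitlines(), start=1):
        for phrase in patterns:
            # Case-insensitive literal match; the phrase is escaped, not a regex.
            if re.search(re.escape(phrase), line, flags=re.IGNORECASE):
                hits.append((number, phrase, line.strip()))
    return hits


if __name__ == "__main__":
    draft = (
        "Summary\n"
        "A recent change interacted unexpectedly with our infrastructure.\n"
        "The deploy went out without review.\n"
    )
    for number, phrase, line in find_sanitized_phrases(draft):
        print(f"line {number}: '{phrase}' in: {line}")
```

A hit is a prompt for the IC to compare that sentence against the artifacts, not proof the sentence is wrong. A literal phrase list will also miss any euphemism that is not yet on it.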
FAQ
- What about blameless culture? Blameless is about not assigning fault to individuals. It is not about hiding what happened. “The deploy was unreviewed” is blameless and accurate.
- Can the AI run the postmortem meeting? No. Use the doc as the meeting input. Humans facilitate.
- What if Slack messages are in a different language? Modern models handle multilingual Slack threads well, but verify that timestamps and quoted strings line up. Mixed-language teams often have richer signal in Slack than in formal docs, so preserve both.
- Should action items go in a tracker automatically? Yes, but after IC review. AI’s “draft action items” are options, not commitments.
- Can I include customer-facing wording in the same doc? Better to keep them separate. The customer note is shorter and lawyer-reviewed; the internal postmortem is honest and detailed.