AI-Assisted Database Migrations — Reversible, Backfilled, Tested

The three things AI gets wrong on migrations: reversibility, backfill, and the NOT NULL on a big table.

Published: May 24, 2026 | Updated: Jun 04, 2026 | Author: AI Productivity Guide Team

The migrations that take prod down are almost never the obviously dangerous ones. They are the migrations that ran green in dev because the dev table had 100 rows. The classic foot-gun has a wrinkle most people get wrong: ALTER TABLE users ADD COLUMN locale TEXT NOT NULL DEFAULT 'en' is metadata-only and instant on Postgres 11+ when the default is a constant. On MySQL 8 the same statement is instant only when InnoDB can use ALGORITHM=INSTANT (8.0.12+, and not for every table format, e.g. ROW_FORMAT=COMPRESSED); when it cannot, it rebuilds the entire 40-million-row table. On Postgres with a volatile default (clock_timestamp(), gen_random_uuid()) it also rewrites; note that now() is stable rather than volatile, so it does not. The dangerous variant is adding NOT NULL to an existing column: on Postgres that forces a full-table scan under ACCESS EXCLUSIVE, which blocks reads and writes for the duration. AI is happy to write any of these verbatim without telling you which engine rewrites. The workflow below uses AI for what it is good at (writing up/down pairs, generating backfill scripts) while keeping you in charge of the three things AI gets wrong: reversibility, backfill strategy, and the engine- and scale-dependent foot-guns.

TL;DR

Ask AI for three separate artifacts — up migration, down migration, backfill script — never one merged blob. Read every line for lock risk (the rules differ by engine: see the table below). Run the whole thing on a production-shaped clone, time it, run the down, run the up again. Ship the schema-tolerant code first, then the migration, then the schema-requiring code (expand/contract). Lint the SQL in CI with Squawk so the obvious foot-guns never reach review. Budget 30–60 minutes of prep for a 2-minute migration on a large table; that ratio is correct.

What this covers

A migration workflow built around the triangle of schema change + backfill + reversibility. How to prompt AI to generate up/down pairs, how to validate them on a production-shaped clone before they touch prod, and the specific class of changes (NOT NULL on an existing column, dropping columns with reads still in flight) where you should never trust an AI-generated migration without a manual review.

Who this is for

Backend engineers shipping schema changes weekly, full-stack devs whose framework auto-generates migrations (Prisma, Drizzle, Alembic, ActiveRecord), platform teams owning shared databases, and indie devs who only do migrations occasionally and forget the gotchas between attempts.

When to reach for it

Adding columns or tables (most common, mostly safe). Renaming a column with active reads (needs the expand/contract pattern). Dropping a column (only after readers stop using it). Backfilling a denormalized field from another table. Splitting one table into two. Migrations on tables with 100k+ rows where the runtime matters.

When this is NOT the right tool

Multi-region replication topology changes — needs ops review. Migrations that are part of a regulatory data move (GDPR deletion, audit logs) — read every line yourself. Migrations on a system where you do not have a clone to test against. “Just one quick migration in prod” — there is no such thing.

Before you start

Have a clone of production with realistic row counts. Not full data — sampled if needed — but the table sizes must match. A migration that runs in 200ms on 1k rows can run 18 minutes on 40M.
Confirm your migration framework supports both up and down. Some teams disable down; if so, you need a written rollback procedure as part of the migration.
Have the application code change ready in parallel. Schema migrations almost never ship alone — they pair with code that reads / writes the new column. Plan the deploy order.
Know which migrations require a maintenance window vs which run online. Adding a nullable column is usually online. Adding NOT NULL or rewriting a unique constraint usually is not.

The triangle

Every migration has three properties. Get any wrong and you ship pain:

Schema change — the actual DDL. AI writes the syntax well but does not know your engine version or row count, so it cannot tell you which statements lock.
Backfill — populating data for the new shape. AI writes this passably but often misses batching for large tables.
Reversibility — the down migration. AI writes a syntactic down but often not a correct one (a dropped column cannot be un-dropped with its data).
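
As a rough illustration of a down migration that refuses to destroy data, here is a minimal sketch using Python's built-in sqlite3 as a stand-in for your real engine. The `users.locale` column comes from the running example; `IrreversibleMigration`, `up`, and `down` are hypothetical names, not part of any migration framework, and the final `DROP COLUMN` assumes SQLite 3.35 or newer.

```python
import sqlite3


class IrreversibleMigration(RuntimeError):
    """Raised when rolling back would silently destroy data."""


def up(conn: sqlite3.Connection) -> None:
    # Expand step: add the column as nullable so existing rows stay valid.
    conn.execute("ALTER TABLE users ADD COLUMN locale TEXT")


def down(conn: sqlite3.Connection) -> None:
    # Count rows that already carry a value before touching the schema.
    populated = conn.execute(
        "SELECT COUNT(*) FROM users WHERE locale IS NOT NULL"
    ).fetchone()[0]
    if populated:
        # Fail loudly and state the recovery procedure instead of dropping data.
        raise IrreversibleMigration(
            f"{populated} rows have a locale value; dropping the column would "
            "lose them. Export users(id, locale) first, then re-run the down."
        )
    # Nothing to lose: the plain reversal is safe (needs SQLite 3.35+).
    conn.execute("ALTER TABLE users DROP COLUMN locale")
```

In a real framework the same check would live inside that framework's down hook; the point is only that the reversal inspects the data before it drops anything.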

What actually locks (the part AI gets wrong)

The single biggest source of bad AI migrations is the assumption that “add a column” is dangerous and “change a constraint” is fine. The truth is engine-specific. This is the table to keep next to the review, accurate as of June 2026:

| Operation | Postgres 11+ | MySQL 8 / InnoDB |
| --- | --- | --- |
| `ADD COLUMN` nullable, no default | Instant, metadata-only | Instant (`ALGORITHM=INSTANT`, 8.0.12+) |
| `ADD COLUMN ... DEFAULT [constant]` | Instant, metadata-only | Instant where `ALGORITHM=INSTANT` applies (8.0.12+); otherwise a full table rebuild |
| `ADD COLUMN ... DEFAULT [volatile]` (`clock_timestamp()`, `gen_random_uuid()`) | Full table rewrite, `ACCESS EXCLUSIVE` | Full table rewrite under lock |
| Add `NOT NULL` to an existing column | Full-table scan, `ACCESS EXCLUSIVE` (use the `NOT VALID` three-step) | Table rebuild |
| `CREATE INDEX` | Blocks writes; use `CREATE INDEX CONCURRENTLY` | Online by default, but watch replica lag |
| `DROP COLUMN` | Instant, metadata-only (data reclaimed later) | Instant on 8.0.29+; table rebuild before that |
| `ALTER COLUMN TYPE` | Usually a rewrite (some no-ops, e.g. `varchar(50)`→`varchar(100)`) | Rewrite under lock |

Two consequences AI never volunteers:

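On Postgres, the safe way to make an existing column NOT NULL is the three-step NOT VALID dance, not a bare SET NOT NULL. The last step is only cheap on Postgres 12+, where SET NOT NULL accepts a validated CHECK constraint as proof and skips its own scan: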

```sql
ALTER TABLE users ADD CONSTRAINT users_locale_not_null
  CHECK (locale IS NOT NULL) NOT VALID;        -- instant, no scan
ALTER TABLE users VALIDATE CONSTRAINT users_locale_not_null;  -- SHARE UPDATE EXCLUSIVE, allows reads/writes
ALTER TABLE users ALTER COLUMN locale SET NOT NULL;           -- fast, the validated CHECK proves it
```

VALIDATE CONSTRAINT takes only a SHARE UPDATE EXCLUSIVE lock, so concurrent reads and writes keep flowing while it scans.

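A bare ALTER TABLE ... ADD CONSTRAINT validates every existing row while holding its lock: ACCESS EXCLUSIVE for a CHECK, SHARE ROW EXCLUSIVE for a foreign key (which blocks writes on both tables). Worse, Postgres’s lock queue is FIFO: while the ALTER waits for the slowest in-flight query to finish, every new query that conflicts with it queues behind it, so a 200ms validation can stall the whole table for the length of that query. Always split into ADD CONSTRAINT ... NOT VALID then VALIDATE CONSTRAINT.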
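
The Postgres red flags in this section can be turned into a rough pre-review scan. The sketch below is a toy under stated assumptions: the `RULES` list and the `red_flags` helper are hypothetical, the statement splitting is naive (it breaks on any semicolon), and it covers far less than a real linter. It only shows how mechanical these checks are.

```python
import re

# Each rule is (label, pattern matching a risky Postgres statement).
# Hypothetical rule set for illustration only.
RULES = [
    ("CREATE INDEX without CONCURRENTLY",
     re.compile(r"\bCREATE\s+(?:UNIQUE\s+)?INDEX\b(?!\s+CONCURRENTLY)", re.I)),
    ("SET NOT NULL (confirm a validated CHECK exists first)",
     re.compile(r"\bSET\s+NOT\s+NULL\b", re.I)),
    ("ADD CONSTRAINT without NOT VALID",
     re.compile(r"\bADD\s+CONSTRAINT\b(?![^;]*\bNOT\s+VALID\b)", re.I)),
    ("ALTER COLUMN TYPE",
     re.compile(r"\bALTER\s+COLUMN\s+\w+\s+(?:SET\s+DATA\s+)?TYPE\b", re.I)),
]


def red_flags(sql: str) -> list[tuple[int, str]]:
    """Return (statement_number, label) for every statement that trips a rule."""
    findings = []
    # Naive split: good enough for a sketch, wrong for semicolons inside strings.
    statements = [s.strip() for s in sql.split(";") if s.strip()]
    for number, statement in enumerate(statements, start=1):
        for label, pattern in RULES:
            if pattern.search(statement):
                findings.append((number, label))
    return findings
```

A hit is a prompt for a human to check the statement against the lock table, not a verdict; the third step of the NOT VALID sequence, for instance, is flagged on purpose so someone confirms the CHECK was validated first.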

Step by step

1. Write the goal in one sentence. “Add locale column to users, default 'en', backfill for 40M existing rows, ship code to read it.” Vague goals like “add localization support” produce sprawl.
2. Ask AI for the up/down pair AND the backfill plan as three separate artifacts. Do not let it produce one merged blob.
   ```
   git diff HEAD~1 -- migrations/  # see what AI generated
   ```
3. Read the up migration line by line against the lock table above. Specifically check: any SET NOT NULL or ADD CONSTRAINT without the NOT VALID split (red flag), any ADD COLUMN with a volatile default on Postgres or any ADD COLUMN on MySQL that cannot run with ALGORITHM=INSTANT (rewrite), any CREATE INDEX without CONCURRENTLY on Postgres (red flag), any ALTER COLUMN TYPE (usually a rewrite).
4. Read the down migration. Confirm it actually reverses the up. AI will sometimes write DROP COLUMN as a down for an ADD COLUMN, which destroys the backfilled data on rollback. If the data is recoverable from elsewhere, fine; if not, the down needs to fail loudly.
5. Validate the backfill plan. For tables over 100k rows, the backfill must be batched (e.g., 10k rows at a time, committing each batch, with a short sleep) and resumable on a key range. AI tends to produce a single UPDATE users SET locale = 'en' WHERE locale IS NULL, which holds row locks on every matched row for the length of one transaction and bloats the WAL.
6. Lint the SQL before a human even looks. Run Squawk (squawk migrations/*.sql). It is a Rust linter for Postgres migrations that flags exactly this class of issue: non-CONCURRENTLY indexes, adding a NOT NULL column, blocking constraint validation, dangerous column drops. Wire its GitHub Action so every migration PR gets the check automatically.
7. Run the full migration on the prod-shaped clone. Time it. Run the application against the migrated clone. Run the down migration. Run the up again. Time everything.
8. Plan the deploy: deploy the code that tolerates both old and new schema FIRST, then run the migration, then deploy the code that requires the new schema. Expand / contract pattern.
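
The batched, resumable backfill described in the steps above can be sketched as follows. This is a minimal illustration, not a production script: it uses sqlite3 from the standard library so it runs anywhere, `backfill_locale` and its parameters are hypothetical names, and the `?` placeholders are sqlite3's style (a Postgres or MySQL driver typically uses `%s`). It assumes `users.id` is an integer primary key.

```python
import sqlite3
import time


def backfill_locale(conn, batch_size=10_000, start_after=0, pause_s=0.0, log=print):
    """Set locale='en' where it is NULL, one primary-key range at a time.

    Returns the last id processed; pass it back as start_after to resume
    after an interruption.
    """
    last_id = start_after
    total = 0
    while True:
        # Keyset pagination: find the upper id bound of the next batch.
        upper = conn.execute(
            "SELECT MAX(id) FROM "
            "(SELECT id FROM users WHERE id > ? ORDER BY id LIMIT ?) AS batch",
            (last_id, batch_size),
        ).fetchone()[0]
        if upper is None:
            break  # no rows left beyond last_id
        cur = conn.execute(
            "UPDATE users SET locale = 'en' "
            "WHERE id > ? AND id <= ? AND locale IS NULL",
            (last_id, upper),
        )
        conn.commit()  # one short transaction per batch
        total += cur.rowcount
        last_id = upper
        log(f"backfilled through id={last_id} ({total} rows updated so far)")
        time.sleep(pause_s)  # give replicas and other writers room
    return last_id
```

Because each batch is bounded by an id range and the `WHERE locale IS NULL` guard makes the update idempotent, a crashed run can be restarted from the last logged id without redoing or double-writing earlier batches.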

A prompt that produces a real migration

```
I need a migration for {framework: Prisma / Drizzle / Alembic / etc}.

Goal: {one sentence: what schema state I want and why}

Current state:
- Engine and version: {e.g. Postgres 16 / MySQL 8.0.x}
- Table: {name}, approx {N} rows in prod
- Existing schema: {paste relevant DDL}

Produce THREE artifacts, separately:

1. UP migration — the DDL. Restate the engine and version you are
   targeting, then annotate any statement that rewrites the table or
   takes an ACCESS EXCLUSIVE lock for THAT engine. For Postgres: use
   CREATE INDEX CONCURRENTLY; to add NOT NULL to an existing column use
   the ADD CONSTRAINT ... CHECK (col IS NOT NULL) NOT VALID, then
   VALIDATE CONSTRAINT, then SET NOT NULL three-step (NOT a bare SET
   NOT NULL).

2. DOWN migration — must actually reverse 1. If reversal would lose
   backfilled data, the down should raise an error with the recovery
   procedure rather than silently destroy data.

3. BACKFILL plan — if the migration needs backfill on an existing table
   with more than 100k rows, produce a batched, resumable script (not a
   single UPDATE). 10k rows per batch. Include progress logging.

Do NOT combine these into one blob. Do NOT generate seed data. Do NOT
"clean up" anything outside the migration scope.
```

Quality check

Up migration has no full-table lock on a large table. Any NOT NULL on an existing large column uses the NOT VALID → VALIDATE → SET NOT NULL three-step. Squawk passes (or every warning is consciously waived in the PR).
Down migration actually reverses the up. Where reversal destroys data, the down errors with a recovery procedure instead.
Backfill is batched and resumable for tables over 100k rows. Single-statement backfills on large tables get rejected.
Migration ran end-to-end on the prod-shaped clone, including down, then up again. Wall-clock time noted.
Application code that requires the new schema ships AFTER the migration, not in the same PR. Expand / contract order respected.
The migration ships with a one-line note in the PR description: “Estimated lock time: < 100ms” or “Requires 5-minute maintenance window.”
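
To make the expand / contract order concrete, here is a minimal sketch of the "tolerant" code that ships first: a reader that works whether or not the `locale` column exists yet, and whether or not the backfill has reached a given row. `get_locale` and `DEFAULT_LOCALE` are hypothetical names, and sqlite3 stands in for your real driver.

```python
import sqlite3

DEFAULT_LOCALE = "en"


def get_locale(conn: sqlite3.Connection, user_id: int) -> str:
    """Read a user's locale in a way that works before and after the migration."""
    # SELECT * avoids naming a column that may not exist yet.
    cur = conn.execute("SELECT * FROM users WHERE id = ?", (user_id,))
    row = cur.fetchone()
    if row is None:
        raise LookupError(f"no user with id {user_id}")
    record = dict(zip((col[0] for col in cur.description), row))
    # Missing column (old schema) and NULL (not yet backfilled) both fall back.
    return record.get("locale") or DEFAULT_LOCALE
```

Once the migration and backfill are done and verified, a second deploy can replace this with code that selects `locale` directly and treats a missing value as a bug.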

How to reuse this workflow

Save the three-artifact prompt. It is the single biggest win — separating up / down / backfill produces dramatically safer migrations than asking for one merged answer.
Automate the danger checklist instead of remembering it. Squawk in CI catches NOT NULL additions, non-CONCURRENTLY indexes, blocking constraint validation, and risky drops on every migration PR — it does not depend on the reviewer being awake.
Keep a one-page table of “migrations we have done and how long they took on the clone.” Future estimates anchor on real numbers, not guesses.
For frameworks that auto-generate migrations (Prisma, Drizzle), do NOT trust the auto-generated file blindly. Pipe it through the AI review prompt before applying.

Recommended workflow

One-sentence goal → AI generates up/down/backfill as three artifacts → human reviews each for lock risk and data safety → run on prod-shaped clone → time everything → write deploy order (code-tolerant → migration → code-required) → ship. For a column add on a 40M-row table, expect 30-60 minutes of prep for a 2-minute migration. That is the right ratio.
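
The "run on the clone and time everything" part of this workflow can be scripted. The sketch below is one possible shape, assuming each migration direction is a plain function that takes a connection; `rehearse`, `add_email_index`, and `drop_email_index` are hypothetical names, and the in-memory SQLite database is only a stand-in for a production-shaped clone.

```python
import sqlite3
import time
from typing import Callable

Step = Callable[[sqlite3.Connection], None]


def rehearse(conn: sqlite3.Connection, up: Step, down: Step) -> dict[str, float]:
    """Run up, down, then up again and return wall-clock seconds for each."""
    timings: dict[str, float] = {}
    for label, step in (("up", up), ("down", down), ("up_again", up)):
        started = time.perf_counter()
        step(conn)
        conn.commit()
        timings[label] = time.perf_counter() - started
    return timings


def add_email_index(conn: sqlite3.Connection) -> None:
    # Example "up": any schema change could go here.
    conn.execute("CREATE INDEX idx_users_email ON users (email)")


def drop_email_index(conn: sqlite3.Connection) -> None:
    # Example "down": the exact reversal of the up.
    conn.execute("DROP INDEX idx_users_email")


if __name__ == "__main__":
    clone = sqlite3.connect(":memory:")
    clone.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    clone.executemany(
        "INSERT INTO users (email) VALUES (?)",
        [(f"u{i}@example.com",) for i in range(50_000)],
    )
    for label, seconds in rehearse(clone, add_email_index, drop_email_index).items():
        print(f"{label}: {seconds * 1000:.1f} ms")
```

If the down step raises, the rehearsal stops there, which is the outcome you want to discover on the clone rather than during a rollback in prod. The numbers only mean something when the clone's row counts match production.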

Common mistakes

Assuming ADD COLUMN ... NOT NULL DEFAULT 'foo' is always cheap (or always fatal). On Postgres 11+ a constant default is instant; on MySQL 8 it is instant only where ALGORITHM=INSTANT applies and a full table rebuild where it does not; a volatile default rewrites on both. Know your engine before you trust the migration.
Running a bare SET NOT NULL or ADD CONSTRAINT on an existing large column in Postgres. Both scan the whole table while holding a heavy lock (ACCESS EXCLUSIVE for SET NOT NULL and for a CHECK), and every new query queues behind them. Use the NOT VALID → VALIDATE split.
Trusting the auto-generated down migration. Especially for ADD COLUMN, where the “obvious” down is DROP COLUMN — destroying any data added since.
Backfilling with a single UPDATE. Batched updates with sleeps are non-negotiable for large tables.
Shipping code that requires the new schema in the same PR as the migration. Either the migration fails and the code is broken, or the migration succeeds but the deploy ordering is fragile. Always two PRs.
Testing the migration on dev with 100 rows, then running on prod with 40M. Lock contention and runtime are non-linear.
Skipping the down test. If you have never run your down migration, it does not work.
Dropping a column while there are still readers. Always two-phase: stop reading, deploy, then drop.

FAQ

What about ORMs that auto-generate migrations (Prisma, Drizzle)?

Useful for the DDL skeleton; not trustworthy on backfill or lock risk. Always review the generated file with the prompt above before applying.

My framework does not support down migrations.

Then your “down” is a written recovery procedure attached to the PR. Test it on a clone.

What if I have to drop a column with reads still in flight?

Two-phase. Phase 1: stop reading, deploy, monitor for residual reads. Phase 2: drop. Never combined.

Can AI help write the backfill script?

Yes, with the batched / resumable / progress-logged constraints. Without those constraints, AI writes a single-statement UPDATE that locks the table.

What about online schema change tools (gh-ost, pt-online-schema-change)?

For large MySQL tables they avoid the table-rewrite lock by copying into a shadow table. They differ in how they capture writes: `pt-online-schema-change` uses triggers (synchronous, keeps the shadow strictly consistent, supports `--resume` if a run dies) while `gh-ost` is triggerless: it tails the binlog (needs `ROW` format), throttles on replica lag, and lets you control the cut-over, but a dead run is lost with no resume. AI can draft the invocation; the hard constraints (a unique/primary key is required, no foreign keys with gh-ost, watching replica lag) you confirm against the tool docs.

How do I handle multiple migrations in one deploy?

Run them in order on the clone, end-to-end. Migrations are not always commutative. Pair the right code change with the right migration.

Which AI tool should write these?

Any frontier model handles the DDL. For an agent that can read your existing schema files, run the migration against a local clone, and read the timing output back, an agentic coding tool matters more than raw model rank: Claude Code (running Claude Opus 4.7 / Sonnet 4.6) or Cursor (Sonnet 4.6, Opus 4.7, GPT-5.5, Gemini 3.1 Pro) as of June 2026. See [Claude Code vs Cursor](/en/articles/claude-code-vs-cursor/). Whatever you use, the engine-specific lock review is yours, not the model’s.

Tags: #AI coding #Workflow
