AI-Assisted Database Migrations — Reversible, Backfilled, Tested

The three things AI gets wrong on migrations: reversibility, backfill, and the NOT NULL on a big table.

The migrations that take prod down are almost never the obviously dangerous ones. They are the migrations that ran green in dev because the dev table had 100 rows. The classic foot-gun: ALTER TABLE users ADD COLUMN locale TEXT NOT NULL DEFAULT 'en' on a 40-million-row table, locking writes for 18 minutes during a Tuesday afternoon traffic spike. AI is happy to write that migration verbatim. The workflow below uses AI for what AI is good at (writing up/down pairs, generating backfill scripts) while keeping you in charge of the three things AI gets wrong: reversibility, backfill strategy, and the scale-dependent foot-guns.

What this covers

A migration workflow built around the triangle of schema change + backfill + reversibility. How to prompt AI to generate up/down pairs, how to validate them on a production-shaped clone before they touch prod, and the specific class of changes (NOT NULL on a large table, dropping columns with reads still in flight) where you should never trust an AI-generated migration without a manual review.

Who this is for

Backend engineers shipping schema changes weekly, full-stack devs whose framework auto-generates migrations (Prisma, Drizzle, Alembic, ActiveRecord), platform teams owning shared databases, and indie devs who only do migrations occasionally and forget the gotchas between attempts.

When to reach for it

Adding columns or tables (most common, mostly safe). Renaming a column with active reads (needs the expand/contract pattern). Dropping a column (only after readers stop using it). Backfilling a denormalized field from another table. Splitting one table into two. Migrations on tables with 100k+ rows where the runtime matters.

When this is NOT the right tool

Multi-region replication topology changes — needs ops review. Migrations that are part of a regulatory data move (GDPR deletion, audit logs) — read every line yourself. Migrations on a system where you do not have a clone to test against. “Just one quick migration in prod” — there is no such thing.

Before you start

  • Have a clone of production with realistic row counts. Not full data — sampled if needed — but the table sizes must match. A migration that runs in 200ms on 1k rows can run 18 minutes on 40M.
  • Confirm your migration framework supports both up and down. Some teams disable down; if so, you need a written rollback procedure as part of the migration.
  • Have the application code change ready in parallel. Schema migrations almost never ship alone — they pair with code that reads / writes the new column. Plan the deploy order.
  • Know which migrations require a maintenance window vs which run online. Adding a nullable column is usually online. Adding NOT NULL or rewriting a unique constraint usually is not.

The triangle

Every migration has three properties. Get any wrong and you ship pain:

  1. Schema change — the actual DDL. AI writes this well.
  2. Backfill — populating data for the new shape. AI writes this passably but often misses batching for large tables.
  3. Reversibility — the down migration. AI writes a syntactic down but often not a correct one (a dropped column cannot be un-dropped with data).

Step by step

  1. Write the goal in one sentence. “Add locale column to users, default 'en', backfill for 40M existing rows, ship code to read it.” Vague goals like “add localization support” produce sprawl.
  2. Ask AI for the up/down pair AND the backfill plan as three separate artifacts. Do not let it produce one merged blob.
    git diff HEAD~1 -- migrations/  # see what AI generated
  3. Read the up migration line by line. Specifically check: any NOT NULL on an existing large table (red flag), any ALTER TABLE that takes a table-level lock (red flag), any index built without CONCURRENTLY (red flag on Postgres).
  4. Read the down migration. Confirm it actually reverses the up. AI will sometimes write DROP COLUMN as a down for an ADD COLUMN — that destroys the backfilled data on rollback. If the data is recoverable from elsewhere, fine; if not, the down needs to fail loudly.
  5. Validate the backfill plan. For tables over 100k rows, the backfill must be batched (e.g., 10k rows at a time with sleeps) and resumable. AI tends to produce a single UPDATE users SET locale = 'en' WHERE locale IS NULL — that locks the table.
  6. Run the full migration on the prod-shaped clone. Time it. Run the application against the migrated clone. Run the down migration. Run the up again. Time everything.
  7. Plan the deploy: deploy the code that tolerates both old and new schema FIRST, then run the migration, then deploy the code that requires the new schema. Expand / contract pattern.

A prompt that produces a real migration

I need a migration for \{framework: Prisma / Drizzle / Alembic / etc\}.

Goal: \{one sentence — what schema state I want and why\}

Current state:
- Table: \{name\}, approx \{N\} rows in prod
- Existing schema: \{paste relevant DDL\}

Produce THREE artifacts, separately:

1. UP migration — the DDL. Annotate any statement that takes a
   table-level lock or that rewrites the whole table. For Postgres,
   prefer CONCURRENTLY for indexes. For any NOT NULL on an existing
   column, split into add-nullable, backfill, then SET NOT NULL.

2. DOWN migration — must actually reverse 1. If reversal would lose
   backfilled data, the down should raise an error with the recovery
   procedure rather than silently destroy data.

3. BACKFILL plan — if the migration needs backfill on an existing table
   with more than 100k rows, produce a batched, resumable script (not a
   single UPDATE). 10k rows per batch. Include progress logging.

Do NOT combine these into one blob. Do NOT generate seed data. Do NOT
"clean up" anything outside the migration scope.

Quality check

  • Up migration has no full-table lock on a large table. Any NOT NULL on an existing large column is split into the three-step expand pattern.
  • Down migration actually reverses the up. Where reversal destroys data, the down errors with a recovery procedure instead.
  • Backfill is batched and resumable for tables over 100k rows. Single-statement backfills on large tables get rejected.
  • Migration ran end-to-end on the prod-shaped clone, including down, then up again. Wall-clock time noted.
  • Application code that requires the new schema ships AFTER the migration, not in the same PR. Expand / contract order respected.
  • The migration ships with a one-line note in the PR description: “Estimated lock time: < 100ms” or “Requires 5-minute maintenance window.”

How to reuse this workflow

  • Save the three-artifact prompt. It is the single biggest win — separating up / down / backfill produces dramatically safer migrations than asking for one merged answer.
  • Build a “migration danger checklist” — NOT NULL on large table, non-CONCURRENTLY index, table-level locks, dropping a column with live reads. Run it on every diff.
  • Keep a one-page table of “migrations we have done and how long they took on the clone.” Future estimates anchor on real numbers, not guesses.
  • For frameworks that auto-generate migrations (Prisma, Drizzle), do NOT trust the auto-generated file blindly. Pipe it through the AI review prompt before applying.

One-sentence goal → AI generates up/down/backfill as three artifacts → human reviews each for lock risk and data safety → run on prod-shaped clone → time everything → write deploy order (code-tolerant → migration → code-required) → ship. For a column add on a 40M-row table, expect 30-60 minutes of prep for a 2-minute migration. That is the right ratio.

Common mistakes

  • Letting AI write ADD COLUMN ... NOT NULL DEFAULT 'foo' on a multi-million-row table. Locks writes for the duration. Split into nullable add, backfill, SET NOT NULL.
  • Trusting the auto-generated down migration. Especially for ADD COLUMN, where the “obvious” down is DROP COLUMN — destroying any data added since.
  • Backfilling with a single UPDATE. Batched updates with sleeps are non-negotiable for large tables.
  • Shipping code that requires the new schema in the same PR as the migration. Either the migration fails and the code is broken, or the migration succeeds but the deploy ordering is fragile. Always two PRs.
  • Testing the migration on dev with 100 rows, then running on prod with 40M. Lock contention and runtime are non-linear.
  • Skipping the down test. If you have never run your down migration, it does not work.
  • Dropping a column while there are still readers. Always two-phase: stop reading, deploy, then drop.

FAQ

  • What about ORMs that auto-generate migrations (Prisma, Drizzle)?: Useful for the DDL skeleton; not trustworthy on backfill or lock risk. Always review the generated file with the prompt above before applying.
  • My framework does not support down migrations.: Then your “down” is a written recovery procedure attached to the PR. Test it on a clone.
  • What if I have to drop a column with reads still in flight?: Two-phase. Phase 1: stop reading, deploy, monitor for residual reads. Phase 2: drop. Never combined.
  • Can AI help write the backfill script?: Yes, with the batched / resumable / progress-logged constraints. Without those constraints, AI writes a single-statement UPDATE that locks the table.
  • What about online schema change tools (gh-ost, pt-online-schema-change)?: AI can help write the invocation, but read the tool docs yourself for the constraints (must have primary key, replica lag, etc).
  • How do I handle multiple migrations in one deploy?: Run them in order on the clone, end-to-end. Migrations are not always commutative. Pair the right code change with the right migration.

Tags: #AI coding #Workflow