The deploy is mid-flight, the migration runner is stuck at ALTER TABLE orders ADD COLUMN ..., and nothing else is happening. CPU is flat, no errors. Postgres takes an ACCESS EXCLUSIVE lock for most DDL and waits behind any other transaction touching the table — even a sleepy SELECT from an analyst or an open pg_dump. Worse, while your migration waits, every new query queues behind it. Fix by finding the blocker, terminating it, and re-running with a lock_timeout so future migrations fail fast instead of stalling.
Common causes
Ordered by hit rate.
1. Long-running transaction holds a competing lock
A reporting query, a forgotten psql session, or an ORM transaction left open is holding a ROW EXCLUSIVE or ACCESS SHARE lock on the target table.
How to spot it: pg_stat_activity shows a long-running statement against the same table.
2. pg_dump running concurrently
pg_dump holds ACCESS SHARE on every table for the entire dump duration. ALTER TABLE will wait until the dump completes.
How to spot it: An application_name = 'pg_dump' session in pg_stat_activity.
3. Autovacuum on the same table
A non-aggressive autovacuum holds locks weak enough for queries but blocks DDL. Worse, VACUUM FULL or CLUSTER takes an ACCESS EXCLUSIVE itself.
How to spot it: autovacuum: VACUUM table in pg_stat_activity.query.
4. Idle in transaction from the app
Someone called BEGIN and never closed it (often via a debugger or a dropped connection that did not signal close).
How to spot it: state = 'idle in transaction' for many minutes.
5. The DDL itself is doing work, not blocked
ADD COLUMN ... DEFAULT non_constant (older Postgres) or ADD CONSTRAINT ... NOT VALID followed by VALIDATE CONSTRAINT does a full table scan. It is not stuck — it is working.
How to spot it: The session is active, not waiting on a lock. wait_event_type IS NULL.
Shortest path to fix
Step 1: Identify the blocker
SELECT
blocked.pid AS blocked_pid,
blocked.query AS blocked_query,
blocking.pid AS blocking_pid,
blocking.usename AS blocking_user,
blocking.application_name,
blocking.state,
age(now(), blocking.xact_start) AS xact_age,
blocking.query AS blocking_query
FROM pg_stat_activity blocked
JOIN pg_stat_activity blocking
ON blocking.pid = ANY(pg_blocking_pids(blocked.pid))
WHERE blocked.query ILIKE 'ALTER TABLE%';
This returns who is blocking the migration. pg_blocking_pids() is exact and lighter than older joining-against-pg_locks recipes.
Step 2: Terminate the blocker (with care)
For most app sessions you can safely cancel:
-- Try cancel first (less brutal)
SELECT pg_cancel_backend(12345);
-- If it does not release, terminate
SELECT pg_terminate_backend(12345);
Do not blindly kill pg_dump if it is part of your backup pipeline — pause the migration and let the dump finish, or schedule the migration after the dump.
Step 3: Re-run with a lock_timeout
Avoid future stalls by making the migration fail fast and retry.
SET lock_timeout = '5s';
SET statement_timeout = '0'; -- DDL itself can take long
ALTER TABLE orders ADD COLUMN shipping_method text;
In a migration runner config (Flyway, Alembic, Prisma):
# alembic.ini snippet — execute before each migration
sqlalchemy.connect_args = { "options": "-c lock_timeout=5s" }
-- Flyway: set in the migration script itself
SET lock_timeout TO '5s';
If the lock cannot be acquired, the statement aborts with canceling statement due to lock timeout. Retry with backoff. Trains of small retries beat one long stall.
Step 4: Use a safer DDL pattern
For ADD COLUMN on a large table:
-- Postgres 11+: adding a nullable column is metadata-only
ALTER TABLE orders ADD COLUMN shipping_method text;
-- Adding a NOT NULL needs two steps
ALTER TABLE orders ADD COLUMN shipping_method text;
UPDATE orders SET shipping_method = 'standard' WHERE shipping_method IS NULL; -- in batches
ALTER TABLE orders ALTER COLUMN shipping_method SET NOT NULL;
For new indexes, use CREATE INDEX CONCURRENTLY (no ACCESS EXCLUSIVE):
CREATE INDEX CONCURRENTLY orders_shipping_method_idx
ON orders (shipping_method);
Note: cannot be inside a transaction; some migration runners need an opt-out (Alembic: op.create_index(..., postgresql_concurrently=True) with op.execute("COMMIT")).
Step 5: Validate the migration completed
\d+ orders -- column present
SELECT relname, n_live_tup FROM pg_stat_user_tables WHERE relname = 'orders';
SELECT * FROM pg_locks WHERE relation = 'orders'::regclass; -- no leftover locks
Prevention
- Every migration sets
lock_timeoutbetween 2 s and 30 s and is retry-safe. - Schedule migrations outside backup and heavy reporting windows.
- Use
CREATE INDEX CONCURRENTLYand avoidADD COLUMN ... DEFAULT non_constanton large tables. - Alert on
idle in transactionover 60 s — these are the usual suspects. - For risky migrations, dry-run on a recent prod-sized clone (e.g.,
pg_dump+pg_restoreto staging).