AI-Generated SQL Locks a Hot Table for Minutes

Q: Why is `FOR UPDATE SKIP LOCKED` important?

Without `SKIP LOCKED`, a chunked worker blocks behind whatever locked the same rows last. `SKIP LOCKED` makes workers walk around contended rows and keep moving — essential for parallel jobs.

Q: My ORM generated the bulk query, not the AI. Same fix?

Yes. ActiveRecord `update_all`, Sequelize `bulkUpdate`, Prisma `updateMany` all default to a single-statement bulk write. Configure batching at the ORM level or rewrite the call.

Q: I cancelled the query but latency stayed high. Why?

Cancelling a large `UPDATE`/`DELETE` rolls it back, and the rollback also touches every changed row and holds locks while it unwinds, so recovery is not instant. Watch the backend disappear from `pg_stat_activity` before assuming it is done. This is a strong argument for batching: a 5000-row batch rolls back in milliseconds.

Q: `FOR UPDATE` vs `FOR NO KEY UPDATE` — which should I use?

Use `FOR NO KEY UPDATE` unless you are changing a primary key or a column referenced by a foreign key on those rows. It is a weaker lock that still prevents two workers from grabbing the same row, but does not block inserts of child rows that reference these via FK — fewer false conflicts on a busy table.

Q: How do I find what is blocking what, not just that something is blocked?

Query `pg_locks` joined to `pg_stat_activity`, or use the blocking-pid helper: `SELECT pid, pg_blocking_pids(pid) AS blocked_by, query FROM pg_stat_activity WHERE cardinality(pg_blocking_pids(pid)) > 0;`. The `blocked_by` array tells you exactly which backend to cancel first.

AI wrote an UPDATE or DELETE that holds long locks on a production Postgres table and blocks reads and writes. Cancel it, then rewrite it into lock-friendly batches.

Published: May 24, 2026 Updated: Jun 17, 2026 Author: AI Productivity Guide Team 🌐 查看中文版本

Your events table is taking writes at 800/sec. The AI wrote a “quick cleanup script” — a single UPDATE events SET status = 'archived' WHERE created_at < now() - interval '90 days'. You run it. Forty seconds later your pager fires: write latency on events jumped from 5ms to 12 seconds, the ingest queue is backing up, customer dashboards are timing out. The query is still running, holding row locks on three million rows, and blocking every concurrent writer. AI-generated SQL reads cleanly because it expresses intent. What it does not model is concurrency, lock granularity, and the cost of holding one transaction open against a hot table.

Fastest fix (do this first): cancel the running statement with SELECT pg_cancel_backend(<pid>) to stop the bleeding, then re-run the cleanup as small batches (LIMIT 5000 + FOR UPDATE SKIP LOCKED in a loop) instead of one giant statement. The rest of this page is how to find the right pid, rewrite the query, and stop it recurring. Examples use PostgreSQL (current stable is PostgreSQL 18 as of June 2026); the same patterns apply to MySQL/InnoDB with different syntax.

Which bucket are you in?

Identify the pattern before you touch anything — the fix differs by cause.

Symptom you see	Likely cause	Jump to
One statement active 30s+, hundreds of blocked sessions	Unbounded UPDATE/DELETE	Cause 1, Step 2
Reads fine, UPDATEs queue behind your txn	`FOR UPDATE` with no `LIMIT`	Cause 2, Step 4
Lock count far exceeds affected rows; `EXPLAIN` shows `Seq Scan`	Missing index	Cause 3, Step 3
`idle in transaction` session held for tens of seconds	App-side work inside `BEGIN`/`COMMIT`	Cause 4, Step 5
Relation-level lock (not tuple) on a low-cardinality predicate	Lock escalation pressure	Cause 5
Source and destination tables both locked	Writable CTE	Cause 6
Everything blocks, including `SELECT`	`VACUUM FULL` / non-concurrent `REINDEX`	Cause 7, Step 7

Common causes

Ordered by how often each shows up in real outages.

1. Unbounded UPDATE or DELETE in a single transaction

The AI writes one statement that touches millions of rows. Postgres takes a row lock per row and an EXCLUSIVE lock per page. Any concurrent writer to the same rows waits.

How to spot it: SELECT * FROM pg_stat_activity WHERE state = 'active' shows your query running for 30s+; pg_locks shows hundreds of blocked sessions.

2. `SELECT ... FOR UPDATE` without LIMIT

The AI thinks FOR UPDATE makes the read “safe”. Without a LIMIT or WHERE-key clause, it locks every row matching the predicate, often the whole table. Worse, plain FOR UPDATE takes the strongest row lock — it also blocks concurrent INSERTs of rows that reference these via a foreign key. If you are not modifying a key column, SELECT ... FOR NO KEY UPDATE is a lighter lock that lets those referencing inserts through.

How to spot it: Reads on the table look fine but UPDATEs (and FK-child inserts) queue up. pg_locks shows FOR UPDATE / FOR NO KEY UPDATE tuple locks held by your transaction.

3. Implicit table scan due to missing index

UPDATE orders SET ... WHERE customer_email = ? with no index on customer_email does a full table scan, locking every row in the process even if only 3 rows match.

How to spot it: EXPLAIN ANALYZE of the same UPDATE shows Seq Scan. The lock count is way larger than the affected row count.

4. Long-running transaction wrapping the query

The AI wraps everything in a BEGIN/COMMIT block: read data, process in app, write back. If the “process in app” step takes 90 seconds, locks are held for 90 seconds.

How to spot it: SELECT now() - xact_start FROM pg_stat_activity WHERE state = 'idle in transaction' returns large intervals.

5. Lock-escalating predicate on enums / low-cardinality columns

UPDATE events SET status = 'archived' WHERE status = 'pending' with only two distinct status values. Half the table matches; the database may upgrade to a table-level lock.

How to spot it: pg_locks shows a ShareUpdateExclusive or Exclusive lock at the relation level rather than tuple level.

6. CTE that the planner materializes and locks defensively

WITH old AS (DELETE FROM events WHERE ... RETURNING *) INSERT INTO archive SELECT * FROM old — Postgres holds locks across both operations. On large result sets, archive writes block too.

How to spot it: The query is structured as a writable CTE; both source and destination tables show locks.

7. `VACUUM FULL` or `REINDEX` (non-CONCURRENT) generated by AI for “cleanup”

VACUUM FULL takes an ACCESS EXCLUSIVE lock — it blocks everything including reads. AI sometimes suggests this as a “performance fix”.

How to spot it: Migration / maintenance script contains VACUUM FULL <table>. App-wide outage for the duration.

Before you start

Check whether the query is actually still running. If it is, decide quickly whether to let it finish or kill it.
Identify the table’s current write traffic. A 10-second lock on a write-once table is fine; a 1-second lock on a 1000/sec table is an incident.
Confirm you have a non-prod environment with realistic data volume to test the rewritten version.
Have your pg_cancel_backend / pg_terminate_backend privileges ready before running anything risky in prod.

Information to collect

The exact SQL the AI generated.
Output of EXPLAIN (ANALYZE, BUFFERS) for the query, ideally against prod-like data.
Output of SELECT pid, state, wait_event, query, now() - query_start AS dur FROM pg_stat_activity WHERE datname = current_database() ORDER BY dur DESC.
Output of SELECT * FROM pg_locks WHERE NOT granted during the incident.
Table size and index list: \d+ <table>.
Concurrent write rate on the table.

Step-by-step fix

Ordered: stop the bleeding, then re-architect.

Step 1: Decide whether to kill the running query

If app latency is up and the query is the cause:

SELECT pid, query, now() - query_start AS dur
  FROM pg_stat_activity
  WHERE query LIKE 'UPDATE events%' AND state = 'active';

SELECT pg_cancel_backend(<pid>);
-- If cancel does not work within 10s:
SELECT pg_terminate_backend(<pid>);

pg_cancel_backend is graceful (sends a cancel request, like Ctrl-C); pg_terminate_backend is forceful (closes the connection and rolls back). Use cancel first. Note: cancelling a huge UPDATE/DELETE triggers a rollback that itself takes time and holds locks while it unwinds, so latency may not recover the instant you fire the cancel. If pg_terminate_backend also stalls, the backend is likely stuck in a kernel call (disk I/O) and you may have to wait it out rather than escalate to kill -9, which can crash the whole instance.

Step 2: Rewrite bulk updates to chunked batches

Replace:

UPDATE events SET status = 'archived'
  WHERE created_at < now() - interval '90 days';

With:

DO $$
DECLARE
  rows_affected int := 1;
BEGIN
  WHILE rows_affected > 0 LOOP
    WITH batch AS (
      SELECT id FROM events
        WHERE created_at < now() - interval '90 days'
          AND status <> 'archived'
        ORDER BY id
        LIMIT 5000
        FOR UPDATE SKIP LOCKED
    )
    UPDATE events SET status = 'archived'
      WHERE id IN (SELECT id FROM batch);
    GET DIAGNOSTICS rows_affected = ROW_COUNT;
    PERFORM pg_sleep(0.1);
  END LOOP;
END $$;

FOR UPDATE SKIP LOCKED lets concurrent writers proceed by skipping rows you have temporarily locked. The pg_sleep gives the planner / vacuum / replication slack.

Step 3: Add the missing index before the next big update

If EXPLAIN shows a sequential scan:

CREATE INDEX CONCURRENTLY idx_events_created_at_status
  ON events(created_at) WHERE status <> 'archived';

Partial indexes are excellent for cleanup-style queries: small index, fast lookups, and they cover only the rows still needing work. Two caveats with CONCURRENTLY: it cannot run inside a BEGIN/COMMIT block (run it on its own), and if the build fails — for example it gets cancelled, or hits a uniqueness violation in the second scan — it leaves an INVALID index behind. Check \d <table> or SELECT indexrelid::regclass FROM pg_index WHERE NOT indisvalid; and clean up with DROP INDEX CONCURRENTLY idx_events_created_at_status; before retrying. Long-running transactions will also stall the build, since CONCURRENTLY waits for them to finish.

Step 4: Bound `FOR UPDATE` queries

Replace:

SELECT * FROM orders WHERE status = 'pending' FOR UPDATE;

With:

SELECT * FROM orders WHERE status = 'pending'
  ORDER BY id LIMIT 100 FOR NO KEY UPDATE SKIP LOCKED;

You process 100 at a time, skip rows other workers grabbed, and never accidentally lock the whole pending set. Use FOR NO KEY UPDATE rather than plain FOR UPDATE unless you actually change a primary/foreign key on these rows — it is a weaker lock that does not block inserts of FK-referencing children. This is the canonical queue/worker pattern: SKIP LOCKED lets N workers pull disjoint batches with zero contention.

Step 5: Move long processing out of the transaction

Replace:

BEGIN;
SELECT * FROM orders WHERE id = ? FOR UPDATE;
-- 90 seconds of app-side work
UPDATE orders SET ... WHERE id = ?;
COMMIT;

With:

BEGIN;
UPDATE orders SET processing_started_at = now() WHERE id = ? AND processing_started_at IS NULL;
COMMIT;
-- 90 seconds of app-side work, no transaction held
BEGIN;
UPDATE orders SET ... WHERE id = ?;
COMMIT;

Optimistic claiming via processing_started_at replaces the long FOR UPDATE.

Step 6: Set timeouts on the session

Before any AI-generated bulk operation in prod:

SET statement_timeout = '30s';
SET lock_timeout = '5s';
SET idle_in_transaction_session_timeout = '60s';

If the query exceeds the budget it aborts cleanly — far better than a 10-minute lock. lock_timeout is the one that saves you here: it caps how long a statement waits to acquire a lock, so a stuck migration fails fast instead of joining the pile-up. The SET above only covers the current session; to make it impossible for any ad-hoc query (including pasted AI suggestions) to hold locks forever, set them at the role level so they apply to every connection:

ALTER ROLE app_writer SET statement_timeout = '30s';
ALTER ROLE app_writer SET lock_timeout = '5s';
ALTER ROLE app_writer SET idle_in_transaction_session_timeout = '60s';

Step 7: Replace `VACUUM FULL` with `pg_repack` or table rewrites

If the AI suggested VACUUM FULL:

# Online table reorganization; only a brief ACCESS EXCLUSIVE lock at the final swap
pg_repack -t events -d mydb

pg_repack rebuilds the table into a shadow copy while it stays readable and writable, and only takes a short ACCESS EXCLUSIVE lock for the final swap (seconds, regardless of table size). pg_squeeze is a newer alternative that uses logical replication instead of triggers, so it puts less load on the primary. Both are extensions you install. As of June 2026 a native, non-blocking REPACK CONCURRENTLY command was previewed for a future PostgreSQL release (targeted at PostgreSQL 19) but is not yet shipped, so an extension is still required today. For most maintenance, a regular autovacuum tune is enough — VACUUM FULL is almost never the right answer on a hot prod table because it takes a full ACCESS EXCLUSIVE lock that blocks reads too.

Verify

EXPLAIN ANALYZE on the rewritten query shows index usage and bounded work.
A test run on a prod clone with concurrent write load completes without write-latency spikes.
During the real run, pg_stat_activity never shows your query running longer than the chunk budget.
No idle in transaction sessions from your job persist beyond a few seconds.
App latency dashboards stay flat during the operation.

Long-term prevention

Add to CLAUDE.md / .cursorrules: “Never generate bulk UPDATE / DELETE without LIMIT + loop. Always assume the target table has concurrent writes.”
Use a job framework (Sidekiq, BullMQ, pgboss) to manage chunked bulk operations rather than ad-hoc SQL scripts.
Set statement_timeout, lock_timeout, and idle_in_transaction_session_timeout at the role level so even a careless AI query cannot hold locks forever.
Add a wrapper script for “any DML hitting more than N rows must use the batch helper” — enforce it in code review.
Keep an internal cookbook of safe SQL patterns the AI can reference: chunked update, SKIP LOCKED queue, partial index for cleanup.
For Postgres, enable log_lock_waits = on and a low deadlock_timeout, then alert on slow log entries.

Common pitfalls

Running the AI’s “quick cleanup query” during peak traffic because it looks small.
Assuming WHERE with no index “is fast on a small table” — small in dev, huge in prod.
Killing a query mid-run on a replica setup without realizing the lock had already replicated.
Setting statement_timeout only on the migration script but not on ad-hoc psql sessions where AI suggestions get pasted.
Trusting that BEGIN; ... COMMIT; is “safer” — it actually extends lock duration.
Letting AI generate a CTE with writable subqueries on a hot table without reviewing the lock implications.

For related issues see AI migration fails on prod schema, AI removed working logic, and AI tests pass but feature is broken. The authoritative references for the lock modes here are the PostgreSQL docs on explicit locking and SELECT FOR UPDATE / SKIP LOCKED.

FAQ

Q: The AI’s query looks like every example on Stack Overflow. Why is it bad here?

Stack Overflow examples are usually toy databases. Production tables have 100x the rows, concurrent writers, and replication. The same SQL is safe at one scale and catastrophic at another.

Q: Can I just add a LIMIT 1000 and re-run repeatedly?

That is exactly the chunked-batch approach above, plus a loop and SKIP LOCKED. Manually re-running is fine for one-offs; scripted loops are required for cleanup jobs.

Q: Why is FOR UPDATE SKIP LOCKED important?

Without SKIP LOCKED, a chunked worker blocks behind whatever locked the same rows last. SKIP LOCKED makes workers walk around contended rows and keep moving — essential for parallel jobs.

Q: My ORM generated the bulk query, not the AI. Same fix?

Yes. ActiveRecord update_all, Sequelize bulkUpdate, Prisma updateMany all default to a single-statement bulk write. Configure batching at the ORM level or rewrite the call.

Q: I cancelled the query but latency stayed high. Why?

Cancelling a large UPDATE/DELETE rolls it back, and the rollback also touches every changed row and holds locks while it unwinds, so recovery is not instant. Watch the backend disappear from pg_stat_activity before assuming it is done. This is a strong argument for batching: a 5000-row batch rolls back in milliseconds.

Q: FOR UPDATE vs FOR NO KEY UPDATE — which should I use?

Use FOR NO KEY UPDATE unless you are changing a primary key or a column referenced by a foreign key on those rows. It is a weaker lock that still prevents two workers from grabbing the same row, but does not block inserts of child rows that reference these via FK — fewer false conflicts on a busy table.

Q: How do I find what is blocking what, not just that something is blocked?

Query pg_locks joined to pg_stat_activity, or use the blocking-pid helper: SELECT pid, pg_blocking_pids(pid) AS blocked_by, query FROM pg_stat_activity WHERE cardinality(pg_blocking_pids(pid)) > 0;. The blocked_by array tells you exactly which backend to cancel first.

Tags: #Troubleshooting #AI coding #sql #Database #locks

Which bucket are you in?

Common causes

1. Unbounded UPDATE or DELETE in a single transaction

2. SELECT ... FOR UPDATE without LIMIT

3. Implicit table scan due to missing index

4. Long-running transaction wrapping the query

5. Lock-escalating predicate on enums / low-cardinality columns

6. CTE that the planner materializes and locks defensively

7. VACUUM FULL or REINDEX (non-CONCURRENT) generated by AI for “cleanup”

Before you start

Information to collect

Step-by-step fix

Step 1: Decide whether to kill the running query

Step 2: Rewrite bulk updates to chunked batches

Step 3: Add the missing index before the next big update

Step 4: Bound FOR UPDATE queries

Step 5: Move long processing out of the transaction

Step 6: Set timeouts on the session

Step 7: Replace VACUUM FULL with pg_repack or table rewrites

Verify

Long-term prevention

Common pitfalls

FAQ

Related Articles

AI Added a Route That Bypasses Auth Middleware

AI Invented a Wrong API Signature That Does Not Exist

AI Migration Works on Dev, Fails on Prod Schema: The Fix

AI Keeps Using Deprecated Syntax Despite Lint Errors

AI Runs npm in a pnpm or Yarn Project (Lockfile Fix)

AI Suggested a Stale Dependency: Fix It Fast

2. `SELECT ... FOR UPDATE` without LIMIT

7. `VACUUM FULL` or `REINDEX` (non-CONCURRENT) generated by AI for “cleanup”

Step 4: Bound `FOR UPDATE` queries

Step 7: Replace `VACUUM FULL` with `pg_repack` or table rewrites