Postgres Autovacuum Stalled by a Long-Running Transaction

Table bloat grows, queries get slower, and pg_stat_progress_vacuum shows nothing running. A single forgotten transaction is holding back the entire vacuum horizon.

Your Postgres dashboard shows a steadily climbing dead-tuple count. The largest table has tripled in disk size over two weeks but row count is flat. You ran VACUUM ANALYZE manually and it returned in seconds without complaining. Autovacuum logs say it ran. Yet pg_stat_user_tables.n_dead_tup keeps rising and p99 query latency on that table has crept from 40 ms to 900 ms.

The culprit is almost never autovacuum itself. It is one stuck transaction somewhere in the cluster pinning the global xmin horizon so vacuum cannot actually reclaim any rows, even when it appears to run successfully.

Common causes

Ordered by hit rate, highest first.

1. An idle-in-transaction session held open for hours

Some worker opened a transaction with BEGIN, did one query, and then went to sleep waiting on an external API. The connection is alive, the transaction never committed.

How to spot it: Run SELECT pid, state, xact_start, now() - xact_start AS age, query FROM pg_stat_activity WHERE state = 'idle in transaction' ORDER BY xact_start; Any row with age over a few minutes is a smoking gun.

2. A replication slot with no consumer

A logical replication slot was created for a CDC pipeline that died, was deleted, or fell behind. The slot keeps advancing catalog_xmin backwards so vacuum cannot remove rows newer than the slot.

How to spot it: SELECT slot_name, active, restart_lsn, confirmed_flush_lsn, pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS lag FROM pg_replication_slots; A slot with active = false and growing lag is the bug.

3. A long analytics query running for hours

A BI tool or a pg_dump started a REPEATABLE READ snapshot for an export. Until it finishes, no row visible to that snapshot can be cleaned up.

How to spot it: Look for long-running SELECT in pg_stat_activity with state = 'active' and a low backend_xmin.

4. Prepared transaction left behind by 2PC

A two-phase-commit transaction got into the prepared state and nobody ever called COMMIT PREPARED or ROLLBACK PREPARED. It survives reconnects and restarts.

How to spot it: SELECT * FROM pg_prepared_xacts; Any row here is almost certainly the bug. They are extremely sticky.

5. A standby with hot_standby_feedback = on and a slow query

A replica is running a long query and feeding its xmin back to the primary. The primary’s vacuum horizon is held back by what is happening on the replica.

How to spot it: On the primary, check SELECT application_name, backend_xmin FROM pg_stat_replication; and compare with txid_current(). If backend_xmin is far behind, your replica is the brake.

6. Autovacuum is running but getting cancelled

On heavily-locked tables, autovacuum keeps starting and then yielding when a session asks for a conflicting lock. It looks like it ran in logs but actually completed zero work.

How to spot it: Check pg_stat_user_tables.autovacuum_count versus last_autovacuum. If count is climbing but n_dead_tup never drops, vacuums are being cancelled. The Postgres log will show canceling autovacuum task messages.

7. autovacuum_freeze_max_age reached and an anti-wraparound vacuum is stuck

Once the cluster gets close to transaction-ID wraparound, autovacuum launches aggressive freeze workers that cannot be cancelled. If one of these is blocked, everything stalls.

How to spot it: SELECT datname, age(datfrozenxid) FROM pg_database; Values above 200 million mean you are in the danger zone.

Shortest path to fix

Step 1: Find the oldest xmin holder

This is the single most useful query. Run it first.

SELECT 'idle_in_tx' AS source, pid::text, xact_start::text, query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
UNION ALL
SELECT 'long_query', pid::text, xact_start::text, query
FROM pg_stat_activity
WHERE state = 'active' AND xact_start < now() - interval '5 min'
UNION ALL
SELECT 'prepared_xact', gid, prepared::text, '2PC'
FROM pg_prepared_xacts
UNION ALL
SELECT 'repl_slot', slot_name, xmin::text, 'replication'
FROM pg_replication_slots
WHERE xmin IS NOT NULL
ORDER BY 3;

The earliest entry is what is holding back vacuum.

Step 2: Kill or unblock the offender

For an idle-in-transaction session, ask owners first, then terminate.

SELECT pg_terminate_backend(12345);

For an orphan replication slot:

SELECT pg_drop_replication_slot('dead_cdc_slot');

For a prepared transaction:

ROLLBACK PREPARED 'transaction-gid-here';

Step 3: Verify the xmin horizon moved

SELECT now(), txid_current(), txid_snapshot_xmin(txid_current_snapshot());

txid_snapshot_xmin should now be close to txid_current. If still far behind, you have a second holder; repeat step 1.

Step 4: Manually vacuum the bloated table aggressively

Once the horizon is unblocked, force a vacuum that actually reclaims pages.

VACUUM (VERBOSE, ANALYZE) public.orders;
-- if you can afford a brief exclusive lock and you want disk back:
VACUUM (FULL, VERBOSE) public.orders;

VACUUM FULL rewrites the table and takes an ACCESS EXCLUSIVE lock — do this in a maintenance window or use pg_repack for online rebuilds.

Step 5: Lower the idle-in-transaction timeout cluster-wide

ALTER SYSTEM SET idle_in_transaction_session_timeout = '5min';
ALTER SYSTEM SET statement_timeout = '30min';  -- per session if needed
SELECT pg_reload_conf();

This terminates forgotten transactions automatically rather than waiting for a human.

Step 6: Add an autovacuum monitor query

-- alert when oldest xact age > 30 min
SELECT max(extract(epoch FROM now() - xact_start)) AS oldest_tx_seconds
FROM pg_stat_activity
WHERE xact_start IS NOT NULL;

Wire this into Prometheus or whatever you use; page on > 1800.

Step 7: Tune autovacuum on the hot table

Default autovacuum thresholds are conservative for write-heavy tables. For a 50M-row table that gets 1M updates a day, lower the scale factor.

ALTER TABLE public.orders SET (
  autovacuum_vacuum_scale_factor = 0.02,
  autovacuum_vacuum_cost_limit = 2000,
  autovacuum_naptime = 10
);

When this is not on you

A managed provider (RDS, Cloud SQL, Aurora) may run its own internal long transactions for backups or major-version upgrades. These hold xmin briefly and you cannot kill them. If pg_stat_activity shows a session owned by rdsadmin or cloudsqladmin as the only old transaction, just wait it out.

Easy to misdiagnose as

A slow disk. The symptom is identical: queries get slower, IOPS climb, dashboards look saturated. People reach for bigger instances. The actual problem is that the index is full of dead tuples Postgres cannot remove, so every scan touches 5x more pages than it should. Bigger disks make this last longer but not better.

Another common misdiagnosis: blaming the query planner. The plans got worse because statistics are stale, but the statistics are stale because analyze cannot do its job when the table is full of dead rows.

Prevention

  • Set idle_in_transaction_session_timeout cluster-wide. There is almost never a legitimate reason to hold a transaction idle for hours.
  • Alert on oldest_tx_seconds, n_dead_tup ratio, and pg_replication_slots lag.
  • Never create a replication slot without a matching cleanup story; treat orphan slots like memory leaks.
  • For tables over 10M rows with significant update load, override autovacuum_vacuum_scale_factor to 0.01-0.02 instead of relying on the global default of 0.2.
  • Run a weekly SELECT * FROM pg_stat_user_tables ORDER BY n_dead_tup DESC LIMIT 20; as a health check.

FAQ

  • Will killing a backend cause data loss? No. Postgres rolls back the transaction cleanly. The application sees a connection error.
  • Should I use VACUUM FULL regularly? No. It is an exclusive-lock rewrite. Use pg_repack for routine bloat cleanup; reserve VACUUM FULL for emergencies.

Tags: #Backend #Troubleshooting #infra #postgres #Database #autovacuum #bloat