Your app handles 200 RPS comfortably, then traffic doubles and you start seeing one of two errors: FATAL: remaining connection slots are reserved for non-replication superuser connections, or FATAL: sorry, too many clients already. Postgres has hit max_connections. New requests fail or hang waiting for a free slot. Existing requests stall when their connection turns out to be dead. Fix it by sizing the pool correctly, putting PgBouncer in transaction mode in front of Postgres, killing idle-in-transaction connections, and routing read traffic to replicas.
Common causes
Ordered from most to least frequently seen.
1. Each app instance opens its own large pool
10 app instances, each with pool size 20 = 200 connections. Postgres default max_connections is 100. You overflow as soon as all instances ramp up.
How to spot it: compare SELECT count(*) FROM pg_stat_activity WHERE backend_type = 'client backend' against SHOW max_connections. A count near the limit means overflow is imminent.
2. Connections leak from missing transaction commits
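A code path opens a transaction, throws an error, and never commits or rolls back. The connection sits in idle in transaction (or idle in transaction (aborted) if a statement failed) until something closes it, so its pool slot is never returned.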
How to spot it: SELECT state, count(*) FROM pg_stat_activity GROUP BY state. High count of idle in transaction = leak.
3. Long-running queries hold connections
A reporting query that takes 5 minutes holds a connection. 20 such queries fill the pool.
How to spot it: SELECT pid, query_start, query FROM pg_stat_activity WHERE state = 'active' ORDER BY query_start LIMIT 10. Queries running over 30s suggest stragglers.
4. No PgBouncer in front of Postgres
App pools connect directly to Postgres. Every app instance owns its own real Postgres connections. No multiplexing.
How to spot it: client_addr in pg_stat_activity points at the app servers, not at a PgBouncer host.
5. Serverless functions create new connections per invocation
Vercel / Lambda functions without connection pooling create a fresh connection per cold start. Burst traffic = burst of new connections.
How to spot it: pg_stat_activity shows many short-lived connections from random source ports.
6. Read traffic hitting primary
All read queries go to the primary. Could be served by replicas, freeing primary connections for writes.
How to spot it: Read-heavy queries are 80 percent of pg_stat_activity. Replica usage near zero.
Before you start
- Confirm the symptom: too many clients or remaining connection slots reserved.
- Note current max_connections and total connections in use.
- Identify the most recent app deploys that may have changed pool size or query patterns.
- Check if Postgres itself is healthy: CPU, memory, replication lag.
- Plan for safe restart: connection draining strategy, brief downtime window.
Information to collect
- SHOW max_connections, SHOW shared_buffers, server flavor (RDS, Cloud SQL, Supabase, self-hosted).
- SELECT state, count(*) FROM pg_stat_activity GROUP BY state.
- App-side pool config: pool size, idle timeout, max lifetime.
- PgBouncer config if present.
- Slow query log over last hour.
Step-by-step fix
Step 1: Size pool correctly
Rule: app_instances * pool_size_per_instance <= 0.7 * max_connections
// pg-pool config
import { Pool } from 'pg';
const pool = new Pool({
  max: 7, // per app instance: 10 instances x 7 = 70 connections
  idleTimeoutMillis: 30000, // close connections idle for 30s
  connectionTimeoutMillis: 2000, // fail fast when no connection frees up within 2s
});
With 10 app instances and max_connections = 100, the rule caps each pool at 7 (10 x 7 = 70), leaving 30 slots of headroom.
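The sizing rule reduces to a small calculation that can live next to the pool config. The sketch below is one way to express it; the function name and the default of 70 percent are illustrative choices taken from the rule in this step, not part of any library.

```typescript
// Largest per-instance pool size that keeps the whole fleet within a share
// of max_connections. headroomPercent = 70 mirrors the 0.7 factor in the rule.
function maxPoolSizePerInstance(
  maxConnections: number,
  appInstances: number,
  headroomPercent: number = 70,
): number {
  if (appInstances <= 0) {
    throw new Error("appInstances must be positive");
  }
  // Multiply before dividing so the budget stays in integer arithmetic.
  const budget = Math.floor((maxConnections * headroomPercent) / 100);
  return Math.floor(budget / appInstances);
}

console.log(maxPoolSizePerInstance(100, 10)); // 7
console.log(maxPoolSizePerInstance(100, 20)); // 3
```

Recompute the value whenever the instance count changes; autoscaling from 10 to 20 instances halves the safe pool size.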
Step 2: Put PgBouncer in front of Postgres
# pgbouncer.ini
[databases]
mydb = host=postgres-primary port=5432 dbname=app
[pgbouncer]
listen_port = 6432
listen_addr = *
auth_type = md5
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25
reserve_pool_size = 5
server_idle_timeout = 600
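The app connects to PgBouncer on port 6432. PgBouncer keeps up to 25 real Postgres connections per database/user pair (plus 5 in reserve) and multiplexes up to 1000 client connections over them.
Transaction pool mode is the most common choice: the server connection goes back to the pool after each transaction. Avoid session mode unless you need session-state features such as session-level SET, session-level advisory locks, or LISTEN/NOTIFY.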
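Because default_pool_size applies per database/user pair, it helps to check that PgBouncer's worst case still fits under max_connections. The sketch below does that arithmetic; the function names and the figure of two database/user pairs are assumptions for illustration.

```typescript
interface PgBouncerPoolSettings {
  defaultPoolSize: number; // default_pool_size
  reservePoolSize: number; // reserve_pool_size
}

// Worst case: every (database, user) pool is full and has also drawn its reserve.
function worstCaseServerConnections(
  databaseUserPairs: number,
  settings: PgBouncerPoolSettings,
): number {
  return databaseUserPairs * (settings.defaultPoolSize + settings.reservePoolSize);
}

function fitsWithin(maxConnections: number, needed: number): boolean {
  return needed <= maxConnections;
}

// Values from the pgbouncer.ini above; two pairs is a hypothetical setup,
// for example one app role and one reporting role on the same database.
const settings: PgBouncerPoolSettings = { defaultPoolSize: 25, reservePoolSize: 5 };
const needed = worstCaseServerConnections(2, settings);
console.log(needed, fitsWithin(100, needed)); // 60 true
```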
Step 3: Kill idle-in-transaction connections
-- See offenders
SELECT pid, now() - state_change AS idle_for, query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
AND state_change < now() - interval '5 minutes'
ORDER BY state_change;
-- Kill them
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle in transaction'
AND state_change < now() - interval '5 minutes';
Set a server-side guard:
ALTER SYSTEM SET idle_in_transaction_session_timeout = '60s';
SELECT pg_reload_conf();
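Terminating sessions treats the symptom; the leak itself is usually fixed in app code by making sure every transaction ends in COMMIT or ROLLBACK. The helper below is a minimal sketch of that pattern. The TxClient and TxPool interfaces and the withTransaction name are hypothetical; they are meant to mirror the parts of a node-postgres pool used here, which is an assumption to verify in your project.

```typescript
// Minimal shapes for what this sketch needs from a pool and its clients.
interface TxClient {
  query(sql: string): Promise<unknown>;
  release(): void;
}
interface TxPool {
  connect(): Promise<TxClient>;
}

// Runs `work` inside a transaction and always ends it, then returns the
// connection to the pool, whether `work` succeeds or throws.
async function withTransaction<T>(
  pool: TxPool,
  work: (client: TxClient) => Promise<T>,
): Promise<T> {
  const client = await pool.connect();
  try {
    await client.query("BEGIN");
    const result = await work(client);
    await client.query("COMMIT");
    return result;
  } catch (err) {
    // Best-effort rollback so the session does not linger in "idle in transaction".
    await client.query("ROLLBACK").catch(() => undefined);
    throw err;
  } finally {
    client.release();
  }
}
```

Routing all transactional code through one helper like this also gives a single place to add logging when a rollback happens.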
Step 4: Add statement timeouts
-- Per database
ALTER DATABASE app SET statement_timeout = '30s';
-- Per role for reporting
ALTER ROLE reporting SET statement_timeout = '5min';
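Per-transaction in app code. Run this on a checked-out client inside a transaction: a plain SET sent through pool.query only affects whichever pooled connection happens to run it, and session-level settings do not carry over reliably under PgBouncer transaction mode.
await client.query("SET LOCAL statement_timeout = '10s'");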
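One way to scope a timeout to a single unit of work is to set it transaction-locally on the same connection that runs the query. The sketch below uses set_config(name, value, true), the function form of SET LOCAL, so the value can be passed as a parameter. The interfaces and the queryWithTimeout name are hypothetical stand-ins for the pool in use.

```typescript
interface TimeoutClient {
  query(sql: string, params?: unknown[]): Promise<unknown>;
  release(): void;
}
interface TimeoutPool {
  connect(): Promise<TimeoutClient>;
}

// Runs one statement under a transaction-local statement_timeout.
// The setting is discarded at COMMIT or ROLLBACK, so it does not leak
// to whoever borrows the connection next.
async function queryWithTimeout(
  pool: TimeoutPool,
  timeoutMs: number,
  sql: string,
  params: unknown[] = [],
): Promise<unknown> {
  if (!Number.isInteger(timeoutMs) || timeoutMs < 0) {
    throw new Error("timeoutMs must be a non-negative integer");
  }
  const client = await pool.connect();
  try {
    await client.query("BEGIN");
    await client.query("SELECT set_config('statement_timeout', $1, true)", [`${timeoutMs}ms`]);
    const result = await client.query(sql, params);
    await client.query("COMMIT");
    return result;
  } catch (err) {
    await client.query("ROLLBACK").catch(() => undefined);
    throw err;
  } finally {
    client.release();
  }
}
```

Because the timeout lives and dies with the transaction, this approach also behaves the same with or without a transaction-mode pooler in between.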
Step 5: Route reads to replicas
import { Pool } from 'pg';
const primary = new Pool({ host: 'postgres-primary' });
const replica = new Pool({ host: 'postgres-replica' });
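// Heuristic only: plain SELECTs go to the replica, everything else to the primary.
// Locking reads (FOR UPDATE, FOR SHARE, ...) fail on a read-only standby, and
// EXPLAIN ANALYZE actually executes its statement, so both stay on the primary.
function getPool(query: string): Pool {
  const isSelect = /^\s*SELECT\b/i.test(query);
  const takesRowLock = /\bFOR\s+(NO\s+KEY\s+UPDATE|UPDATE|KEY\s+SHARE|SHARE)\b/i.test(query);
  return isSelect && !takesRowLock ? replica : primary;
}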
Tag queries explicitly when ambiguous. Reads on replicas free primary for writes.
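Explicit tagging can be as simple as having the caller state its intent instead of inferring it from SQL text. The sketch below is one possible shape; QueryIntent, RoutingContext, and choosePool are hypothetical names, and the needsFreshData flag is an added assumption to cover reads that must not hit a lagging replica.

```typescript
type QueryIntent = "read" | "write";

interface RoutingContext {
  intent: QueryIntent;
  // True when the caller just wrote data it is about to read back;
  // a lagging replica could return stale rows in that case.
  needsFreshData?: boolean;
}

// Generic over the pool type so the sketch does not depend on a specific driver.
function choosePool<P>(ctx: RoutingContext, primary: P, replica: P): P {
  if (ctx.intent === "write" || ctx.needsFreshData) {
    return primary;
  }
  return replica;
}

console.log(choosePool({ intent: "read" }, "primary", "replica")); // replica
console.log(choosePool({ intent: "read", needsFreshData: true }, "primary", "replica")); // primary
console.log(choosePool({ intent: "write" }, "primary", "replica")); // primary
```

Defaulting anything uncertain to the primary keeps mistakes safe: the cost is a missed offload, not a failed or stale query.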
Step 6: Use serverless-friendly drivers
For Vercel / Lambda:
// Use a driver that supports HTTP / connection pooling at the edge
import { neon } from '@neondatabase/serverless';
const sql = neon(process.env.DATABASE_URL);
// or Supabase
import { createClient } from '@supabase/supabase-js';
const supabase = createClient(url, key);
These avoid native Postgres connections in serverless contexts.
Step 7: Monitor and alert
-- Metric query, run every 30s
SELECT
count(*) FILTER (WHERE state = 'active') AS active,
count(*) FILTER (WHERE state = 'idle') AS idle,
count(*) FILTER (WHERE state = 'idle in transaction') AS idle_tx,
count(*) FILTER (WHERE wait_event_type = 'Lock') AS waiting
FROM pg_stat_activity;
Alert when the total (active + idle + idle_tx) reaches 80 percent of max_connections or idle_tx exceeds 5.
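The alert rule can be evaluated in whatever job runs the metric query. The sketch below assumes the four counts have already been read into an object; ConnectionSnapshot and connectionAlerts are hypothetical names, and the thresholds are the ones stated in this step.

```typescript
interface ConnectionSnapshot {
  active: number;
  idle: number;
  idleTx: number;  // the idle_tx column
  waiting: number; // sessions waiting on a lock
}

// Returns one message per threshold crossed; an empty array means healthy.
function connectionAlerts(
  s: ConnectionSnapshot,
  maxConnections: number,
  maxIdleTx: number = 5,
): string[] {
  const alerts: string[] = [];
  // waiting is not added to the total: lock waiters are typically
  // already counted under the active state.
  const total = s.active + s.idle + s.idleTx;
  // Compare in integers: total >= 80% of maxConnections.
  if (total * 100 >= maxConnections * 80) {
    alerts.push(`connections at ${total}/${maxConnections}`);
  }
  if (s.idleTx > maxIdleTx) {
    alerts.push(`idle in transaction: ${s.idleTx}`);
  }
  return alerts;
}

console.log(connectionAlerts({ active: 50, idle: 25, idleTx: 5, waiting: 3 }, 100));
// [ 'connections at 80/100' ]
```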
Verify
- Run a load test at 2x typical peak; pg_stat_activity count should stay under 70 percent of max_connections.
- Confirm PgBouncer logs show high multiplexing ratio (client connections vs server connections).
- After 10 minutes of normal traffic, idle-in-transaction count should be near zero.
- Replica receives a meaningful share of read traffic (verify via pg_stat_statements).
Long-term prevention
- Standardize pool config across all app instances and document the math.
- Make PgBouncer (or equivalent like Supavisor / RDS Proxy) the default for production.
- Set idle_in_transaction_session_timeout and statement_timeout defaults at the database level.
- Have a read replica from day one; route reporting and analytics there.
- Monthly review of pool exhaustion alerts; tighten what trips most.
Common pitfalls
- Raising max_connections to 500 to “fix” the problem: every connection is its own backend process, and the memory cost adds up (often cited as roughly 10 MB per connection, varying with workload).
- Using PgBouncer in session mode when transaction would work: this gives up most of the pooling benefit.
- Setting statement_timeout too aggressively and breaking legitimate long-running migrations.
- Killing connections without identifying the leak source: it will recur.
FAQ
What pool size should each app instance have? Start at 5 to 10. Most apps need fewer than they think. Raise only with evidence of contention.
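Does PgBouncer work with prepared statements? Yes, in transaction mode with PgBouncer 1.21+, which added support for protocol-level prepared statements (controlled by max_prepared_statements). Older versions could not track prepared statements across pooled server connections, so they broke in transaction mode.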
Should I use connection pooling in serverless? Yes, but not an in-process pool inside each function. Use an HTTP-based serverless driver (Neon, Supabase) or an external pooler such as RDS Proxy instead of native Postgres pooling.