RabbitMQ Consumers Connected But Not Processing — Queue Depth Growing

RabbitMQ shows healthy consumer connections but the queue keeps growing. Fix prefetch, unacked messages, and dead-letter routing for stuck consumers.

The RabbitMQ console shows healthy consumers connected to the queue. Messages are being delivered. But the queue depth keeps climbing and downstream work is not happening. The most common reason is that the consumer is holding messages in the “unacknowledged” bucket: it pulled prefetch_count messages and has not acked them, so RabbitMQ will not push it any more. The other common reason is a consumer that crashes after pulling a message but before acking it. With no dead-letter routing, the message is redelivered to the next worker, which crashes the same way, forever. The fix is an explicit basic_qos prefetch, a dead-letter exchange, and an alert on “unacked” depth rather than just “ready” depth.

Common causes

Ordered by hit rate.

1. Prefetch exhausted by unacked messages

The consumer set prefetch_count = 100 but processes slowly. All 100 messages sit unacked while the consumer works through them one at a time, and the broker stops delivering to that consumer until it acks some.

How to spot it: In the RabbitMQ UI, the queue’s Unacked column is at or near the consumer’s prefetch total; Ready keeps growing.
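A toy, tick-based model makes the pattern concrete. It is not how the broker is implemented, only the credit rule behind it: the broker pushes a message to a consumer only while that consumer's unacked count is below its prefetch. The publish and ack rates are hypothetical.

```python
from typing import Tuple


def simulate_prefetch(prefetch: int, published_per_tick: int,
                      acked_per_tick: int, ticks: int) -> Tuple[int, int]:
    """Return (ready, unacked) after `ticks` steps of a one-consumer model."""
    ready = unacked = 0
    for _ in range(ticks):
        ready += published_per_tick
        # The consumer finishes (and acks) at most acked_per_tick messages.
        unacked -= min(acked_per_tick, unacked)
        # The broker only pushes while the consumer has unused prefetch credit.
        pushed = min(ready, prefetch - unacked)
        ready -= pushed
        unacked += pushed
    return ready, unacked
```

With prefetch 100, 5 messages published per tick and 1 acked per tick, Unacked pins at 100 after about 25 ticks and Ready grows by 4 every tick after that, which is exactly the UI picture described above. A consumer that acks faster than the publish rate keeps Unacked small and Ready at zero.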

2. Consumer crashes between pull and ack

Worker pulled a message, panicked or got OOM-killed, never sent basic_nack. RabbitMQ holds it as unacked until the channel closes, then redelivers it to a new worker that hits the same crash. Poison pill loop.

How to spot it: redelivered=True on every message. CPU spikes per delivery. No progress.
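The loop is easy to see in a toy model. This sketch is not broker code: a handler that raises stands in for a worker that dies before acking, and delivery_limit plays the role of the x-delivery-limit cap covered in Step 2 (the broker's exact counting rules vary between versions). All names are hypothetical.

```python
from typing import Callable, Optional, Tuple


def deliver_until_settled(
    handler: Callable[[bytes], None],
    body: bytes,
    delivery_limit: Optional[int] = None,
    max_deliveries: int = 1000,
) -> Tuple[str, int]:
    """Toy model of a broker redelivering one message to a crashing consumer."""
    failures = 0
    for attempt in range(1, max_deliveries + 1):
        try:
            handler(body)
            return ("acked", attempt)
        except Exception:
            # Models a worker dying before it acks: the channel closes and the
            # broker requeues the message for the next worker.
            failures += 1
            if delivery_limit is not None and failures >= delivery_limit:
                return ("dead-lettered", attempt)
    # Without a cap, the model only stops because we stop it.
    return ("still looping", max_deliveries)
```

A handler that always raises never settles without a limit; with delivery_limit=5 it is dead-lettered on the fifth delivery.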

3. Manual acks turned off — auto-ack hides failures

auto_ack=True (a.k.a. “no-ack mode”) acks each message on delivery. If processing then fails, the message is gone but the work was never done. Queue depth looks fine; the data is lost silently.

How to spot it: No unacked count growing, no errors in the broker, but downstream systems report missing work.

4. Single-threaded consumer with high prefetch

A Python consumer using one thread and prefetch=200 still processes only one message at a time. The other 199 sit unacked in its local buffer, where no other consumer can pick them up.

How to spot it: CPU per worker is near zero, processing rate matches one slow job at a time.
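The cost of a high prefetch on a single thread is easy to put a number on. A quick sketch, assuming every message takes the same hypothetical seconds_per_message and each thread picks up the next buffered message as soon as it finishes one:

```python
def last_message_wait(prefetch: int, concurrency: int, seconds_per_message: float) -> float:
    """Seconds the last message in a full prefetch buffer waits before work starts."""
    # Messages are worked off `concurrency` at a time, so the last one waits
    # for every full batch ahead of it.
    batches_ahead = (prefetch - 1) // concurrency
    return batches_ahead * seconds_per_message


print(last_message_wait(200, 1, 2.0))  # 398.0
print(last_message_wait(8, 4, 2.0))    # 2.0
```

With one thread and prefetch=200, the tail of the buffer waits more than six minutes while no other consumer can reach it. With 4 threads and prefetch = 4 * 2 = 8, it waits one job's length.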

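5. Connection throttled by flow control or a resource alarm

A memory or disk watermark was exceeded, or a publisher is sending faster than the broker can absorb. The broker blocks publishing connections and throttles fast publishers with credit flow. Connections still look open but messages do not flow, and a consumer that shares a connection with a throttled publisher stalls along with it.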

How to spot it: the state column of rabbitmqctl list_connections name state shows flow, blocking or blocked, or the management UI shows a node memory/disk alarm.

Shortest path to fix

Step 1: Set prefetch explicitly

Tie prefetch to actual concurrency. A common rule: prefetch = concurrency * 2.

Python (pika):

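import pika

def process(body):
    # Placeholder for your real work; raise an exception on failure.
    print(body)

connection = pika.BlockingConnection(pika.ConnectionParameters('rabbitmq'))
channel = connection.channel()
# On RabbitMQ, global_qos=False (the default) applies this limit per consumer, not per channel.
channel.basic_qos(prefetch_count=10)

def on_message(ch, method, properties, body):
    try:
        process(body)
        ch.basic_ack(delivery_tag=method.delivery_tag)   # ack only after success
    except Exception:
        # requeue=False dead-letters the message if the queue has a DLX (Step 2); without one it is dropped.
        ch.basic_nack(delivery_tag=method.delivery_tag, requeue=False)

channel.basic_consume(queue='jobs', on_message_callback=on_message, auto_ack=False)
channel.start_consuming()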

Node (amqplib):

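const amqp = require('amqplib');
const handleJob = async (content) => { /* placeholder for your real work; throw on failure */ };

async function main() {
  const connection = await amqp.connect('amqp://rabbitmq');
  const channel = await connection.createChannel();
  await channel.prefetch(10);   // per consumer (global=false by default)
  await channel.consume('jobs', async (msg) => {
    if (msg === null) return;   // the broker cancelled this consumer
    try {
      await handleJob(msg.content);   // renamed from process, which shadows Node's global
      channel.ack(msg);
    } catch (e) {
      channel.nack(msg, false, false);   // allUpTo=false, requeue=false: dead-letter
    }
  }, { noAck: false });
}
main().catch(console.error);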
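A prefetch of 10 only helps if the worker actually runs several jobs at once; a single-threaded consumer still works one message at a time (cause 4). A minimal sketch of a threaded pika consumer, assuming pika 1.x, where BlockingConnection.add_callback_threadsafe is the way to hand an ack back to the connection's thread. CONCURRENCY, handle_job and make_on_message are hypothetical names.

```python
import functools
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

CONCURRENCY = 4  # hypothetical number of worker threads


def handle_job(body: bytes) -> None:
    """Hypothetical stand-in for the real (slow) work."""


def make_on_message(connection, work: Callable[[bytes], None], executor: ThreadPoolExecutor):
    """Build a pika on_message_callback that runs `work` on a thread pool."""

    def on_message(ch, method, properties, body):
        tag = method.delivery_tag

        def run() -> None:
            try:
                work(body)
                settle = functools.partial(ch.basic_ack, delivery_tag=tag)
            except Exception:
                settle = functools.partial(ch.basic_nack, delivery_tag=tag, requeue=False)
            # pika channels are not thread-safe, so the ack/nack is handed back
            # to the connection's own thread instead of being sent from here.
            connection.add_callback_threadsafe(settle)

        executor.submit(run)

    return on_message


def main() -> None:
    import pika  # requires pika 1.x and a reachable broker

    connection = pika.BlockingConnection(pika.ConnectionParameters("rabbitmq"))
    channel = connection.channel()
    # Prefetch tied to real concurrency, following the concurrency * 2 rule.
    channel.basic_qos(prefetch_count=CONCURRENCY * 2)
    executor = ThreadPoolExecutor(max_workers=CONCURRENCY)
    channel.basic_consume(
        queue="jobs",
        on_message_callback=make_on_message(connection, handle_job, executor),
        auto_ack=False,
    )
    channel.start_consuming()
```

Call main() to run it against a broker. The pool's internal queue is unbounded, but the prefetch window caps how many messages can be waiting in it.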

Step 2: Set up a dead-letter exchange

Poison-pill messages must go somewhere. Configure the work queue to dead-letter on nack/reject.

# Declare DLX and DLQ
rabbitmqadmin declare exchange name=jobs.dlx type=fanout
rabbitmqadmin declare queue name=jobs.dlq
rabbitmqadmin declare binding source=jobs.dlx destination=jobs.dlq

# Declare main queue as a quorum queue with DLX and delivery limit
rabbitmqadmin declare queue name=jobs durable=true arguments='{"x-queue-type":"quorum","x-dead-letter-exchange":"jobs.dlx","x-delivery-limit":5}'
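The same topology can be declared from application code at startup. A sketch assuming pika 1.x keyword arguments for exchange_declare, queue_declare and queue_bind; it declares the work queue as a quorum queue so that x-delivery-limit takes effect. RabbitMQ rejects a re-declaration with different arguments (PRECONDITION_FAILED), so use this in place of the CLI commands, not on top of an existing classic jobs queue. The constant and function names are hypothetical.

```python
from typing import Any, Dict

DLX = "jobs.dlx"
DLQ = "jobs.dlq"
WORK_QUEUE = "jobs"


def work_queue_arguments(delivery_limit: int = 5) -> Dict[str, Any]:
    return {
        "x-queue-type": "quorum",        # x-delivery-limit only applies to quorum queues
        "x-dead-letter-exchange": DLX,   # nack(requeue=False) and over-limit messages go here
        "x-delivery-limit": delivery_limit,
    }


def declare_topology(channel, delivery_limit: int = 5) -> None:
    """Declare the DLX, the DLQ, their binding, and the work queue on a pika channel."""
    channel.exchange_declare(exchange=DLX, exchange_type="fanout", durable=True)
    channel.queue_declare(queue=DLQ, durable=True)
    channel.queue_bind(queue=DLQ, exchange=DLX)
    channel.queue_declare(
        queue=WORK_QUEUE, durable=True, arguments=work_queue_arguments(delivery_limit)
    )
```

Call declare_topology(channel) once after connection.channel(). It only uses the channel's own methods, which also makes it easy to test with a stub.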

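x-delivery-limit only takes effect on quorum queues (declared with "x-queue-type":"quorum"; see Step 5). It caps redelivery attempts: after 5 failed deliveries, for example a consumer crashing before it acks, the message is dead-lettered to jobs.dlq instead of requeued. No more poison-pill loops. A nack with requeue=False does not wait for the limit; it dead-letters at once.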

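Classic queues have no delivery limit, and a plain requeue does not add an x-death entry, so the count has to come from a retry loop: nack with requeue=False into a retry queue whose message TTL dead-letters back to the work queue. Each pass adds to the count in the x-death header; the consumer reads that count and sends the message to the DLQ once it passes N attempts.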
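x-death entries are only written when a message is dead-lettered, so this count only grows if failed messages cycle back through a retry queue. A minimal sketch of the check, assuming pika hands the header over as properties.headers["x-death"], a list of dicts with "queue" and "count" keys. death_count, should_give_up and MAX_ATTEMPTS are hypothetical names.

```python
from typing import Any, Mapping, Optional

MAX_ATTEMPTS = 5  # hypothetical retry budget


def death_count(headers: Optional[Mapping[str, Any]], queue: str) -> int:
    """Total times the message was dead-lettered out of `queue`, per x-death."""
    if not headers:
        return 0
    total = 0
    for entry in headers.get("x-death") or []:
        name = entry.get("queue")
        if isinstance(name, bytes):  # tolerate bytes-decoded table values
            name = name.decode("utf-8", "replace")
        if name == queue:
            total += int(entry.get("count", 0))
    return total


def should_give_up(headers: Optional[Mapping[str, Any]], queue: str,
                   max_attempts: int = MAX_ATTEMPTS) -> bool:
    """True once the message has failed out of `queue` max_attempts times."""
    return death_count(headers, queue) >= max_attempts
```

In the on_message callback, call should_give_up(properties.headers, "jobs") before processing, and route the message to the DLQ instead of retrying when it returns True.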

Step 3: Alert on unacked count, not just ready depth

rabbitmqctl list_queues name messages_ready messages_unacknowledged consumers
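The same check can be scripted for a cron job or quick triage. A sketch assuming rabbitmqctl's default tab-separated output for exactly the four columns above; prefetch_per_consumer is whatever you passed to basic_qos. It flags a queue when its consumers hold a full prefetch window while Ready still has work, the signature of cause 1. The names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueueStats:
    name: str
    ready: int
    unacked: int
    consumers: int


def parse_list_queues(output: str) -> List[QueueStats]:
    """Parse `rabbitmqctl list_queues name messages_ready messages_unacknowledged consumers`."""
    rows = []
    for line in output.splitlines():
        parts = line.split("\t")
        if len(parts) != 4:
            continue  # banner lines such as "Listing queues ..."
        name, *counts = parts
        if not all(c.strip().isdigit() for c in counts):
            continue  # the column-header row
        ready, unacked, consumers = (int(c) for c in counts)
        rows.append(QueueStats(name, ready, unacked, consumers))
    return rows


def stuck_queues(rows: List[QueueStats], prefetch_per_consumer: int) -> List[str]:
    """Queues whose consumers hold a full prefetch window while Ready still has work."""
    return [
        r.name for r in rows
        if r.consumers > 0 and r.ready > 0
        and r.unacked >= r.consumers * prefetch_per_consumer
    ]
```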

Prometheus alerting rules, assuming per-queue metrics from RabbitMQ's built-in rabbitmq_prometheus plugin (per-object metrics enabled):

- alert: RabbitMQUnackedHigh
  expr: rabbitmq_queue_messages_unacked{queue="jobs"} > 50 and rate(rabbitmq_queue_messages_published_total[5m]) > 0
  for: 10m
  labels:
    severity: warning
- alert: RabbitMQQueueDepthGrowing
  # messages_ready is a gauge: use deriv(), not rate()
  expr: deriv(rabbitmq_queue_messages_ready[10m]) > 0 and rabbitmq_queue_consumers > 0
  for: 15m

Step 4: Check the broker is not in flow control

rabbitmqctl list_connections name state user
rabbitmqctl status | grep -E 'mem|disk'

If state is flow, the broker is throttling that connection's publishing rate; blocking or blocked means a memory or disk alarm is in effect. Raise the memory/disk watermark or shed load.
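A sketch that pulls throttled connections out of the list_connections output, assuming the default tab-separated name/state/user columns from the command above. The set of states is the one described in this step; the function name is hypothetical.

```python
from typing import List, Tuple

THROTTLED_STATES = {"flow", "blocking", "blocked"}


def throttled_connections(output: str) -> List[Tuple[str, str, str]]:
    """(name, state, user) rows from `rabbitmqctl list_connections name state user`
    whose state indicates throttling or a resource alarm."""
    hits = []
    for line in output.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # banner lines
        name, state, user = (p.strip() for p in parts)
        if state in THROTTLED_STATES:  # the header row's "state" never matches
            hits.append((name, state, user))
    return hits
```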

# Inspect alarms
rabbitmq-diagnostics alarms

Step 5: Use quorum queues for durability

Classic mirrored queues are deprecated; quorum queues handle redelivery and persistence more predictably.

rabbitmqadmin declare queue name=jobs durable=true arguments='{"x-queue-type":"quorum","x-dead-letter-exchange":"jobs.dlx","x-delivery-limit":5}'

Available since RabbitMQ 3.8; 3.13 is the current stable as of this writing.

Prevention

  • Always use auto_ack=false; ack only after success, nack to DLQ on failure.
  • prefetch_count = worker concurrency times 2; never leave it unset, since the default (0) means unlimited.
  • Every work queue has a dead-letter exchange and a delivery limit.
  • Alert on messages_unacknowledged and on poison-pill redelivery patterns.
  • Migrate to quorum queues for new work queues.
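The checklist can be enforced at startup or in review. A sketch that audits one queue's declaration arguments and consumer settings against it; the function name and messages are hypothetical, and the alerting item is left out because it lives in Prometheus, not in the declaration.

```python
from typing import Any, Dict, List


def audit(queue_args: Dict[str, Any], auto_ack: bool, prefetch: int, concurrency: int) -> List[str]:
    """Return the checklist items this queue/consumer setup violates."""
    problems = []
    if auto_ack:
        problems.append("auto_ack is on: failures are lost silently")
    if prefetch == 0:
        problems.append("prefetch is unset (0 = unlimited)")
    elif prefetch > concurrency * 2:
        problems.append(f"prefetch {prefetch} exceeds 2 x concurrency ({concurrency * 2})")
    if "x-dead-letter-exchange" not in queue_args:
        problems.append("no dead-letter exchange")
    if queue_args.get("x-queue-type") != "quorum":
        problems.append("not a quorum queue")
    elif "x-delivery-limit" not in queue_args:
        problems.append("no delivery limit")
    return problems
```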

Tags: #Backend #Troubleshooting #rabbitmq