RabbitMQ Consumers Connected But Not Processing — Queue Depth Growing

RabbitMQ shows healthy consumers but the queue keeps growing. Fix prefetch, unacked messages, consumer_timeout, and dead-letter routing on RabbitMQ 4.x.

Published: May 24, 2026 Updated: Jun 18, 2026 Author: AI Productivity Guide Team

The RabbitMQ console shows healthy consumers connected to the queue. Messages are being delivered. But the queue depth keeps climbing and downstream work is not happening.

Fastest fix: look at the queue’s Unacked column in the management UI. If it sits at or near your prefetch_count and never drains, your consumer pulled a batch and never ack’d it, so the broker stopped pushing. Set basic_qos(prefetch_count=N) to match real concurrency, ack only after success, send failures to a dead-letter queue with basic_nack(requeue=False), and confirm the queue’s delivery-limit is set so poison messages get evicted instead of looping. On RabbitMQ 4.x quorum queues that limit defaults to 20; classic queues have no built-in limit, so you must enforce one in application code.

The second most common cause on long-running jobs is the broker’s own consumer_timeout (default 30 minutes, i.e. 1800000 ms) reclaiming the delivery before your worker finishes, which redelivers the message to the next worker and repeats forever. As of RabbitMQ 4.3 (4.3.0 shipped 23 Apr 2026) this timeout is enforced only on quorum queues and the new JMS queues; classic queues and streams no longer evaluate it at all.

Which bucket are you in

Read the queue row in the management UI (http://<host>:15672, Queues tab) or via rabbitmqctl list_queues. Match the pattern:

Symptom	Likely cause	Jump to
`Unacked` pinned near prefetch, `Ready` climbing	Prefetch exhausted by slow/never-ack’d messages	Cause 1
Every message `redelivered=true`, CPU spikes per delivery	Crash or `PRECONDITION_FAILED` between pull and ack	Cause 2, Cause 6
`Unacked` near zero, errors none, but data missing downstream	`auto_ack` mode silently dropping failures	Cause 3
Worker CPU near zero, one job at a time	Single-threaded consumer with high prefetch	Cause 4
Connection `state` shows `flow`, node alarm shown	Memory/disk watermark — broker throttling	Cause 5
Quorum queue, same message returns ~30 min apart; log says `... has timed out waiting for a consumer acknowledgement ...`	`consumer_timeout` hit on a long job	Cause 6
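The first row of the table can be checked mechanically. The sketch below is only an illustration: it assumes you already have one queue's counters as a dict using the management API field names messages_ready, messages_unacknowledged and consumers, and that you know the total prefetch across that queue's consumers. The function name and the 0.9 "near the ceiling" threshold are hypothetical choices, not RabbitMQ settings.

```python
def prefetch_exhausted(queue, prefetch_total, near=0.9):
    """Return True when a queue looks like the first table row (Cause 1).

    queue          -- dict with messages_ready, messages_unacknowledged, consumers
    prefetch_total -- prefetch_count summed over the queue's consumers
    near           -- assumed fraction of prefetch_total that counts as "pinned"
    """
    ready = queue["messages_ready"]
    unacked = queue["messages_unacknowledged"]
    return (
        queue["consumers"] > 0
        and prefetch_total > 0          # 0 means "unlimited", nothing to pin against
        and ready > 0                   # work is waiting behind the unacked batch
        and unacked >= near * prefetch_total
    )


sample = {"messages_ready": 5000, "messages_unacknowledged": 100, "consumers": 1}
print(prefetch_exhausted(sample, prefetch_total=100))  # True
```

A single sample cannot show that Ready is climbing, so treat a True result as a prompt to look at the Unacked column over a few minutes rather than as a diagnosis.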

Common causes

Ordered by hit rate.

Cause 1. Prefetch exhausted by unacked messages

The consumer asked for prefetch_count = 100 but processes slowly. 100 messages sit unacked while the consumer chews through one of them. The broker stops delivering to that channel.

How to spot it: in the management UI, the queue’s Unacked column is at or near the consumer’s prefetch total; Ready keeps growing.

Cause 2. Consumer crashes between pull and ack

Worker pulled a message, panicked or got OOM-killed, never sent basic_nack. RabbitMQ holds it as unacked until the channel closes, then redelivers it to a new worker that hits the same crash. Poison-pill loop.

How to spot it: redelivered=true on every message, and (on quorum queues) a rising x-delivery-count header. CPU spikes per delivery. No progress.

Cause 3. Manual acks turned off — auto-ack hides failures

auto_ack=True (a.k.a. “no-ack mode”) ack’s on delivery. If processing fails, the message is gone but the work was not done. Queue depth looks fine; the data is lost silently.

How to spot it: no unacked count growing, no errors in the broker, but consumers report silent data loss.

Cause 4. Single-threaded consumer with high prefetch

A Python consumer using one thread and prefetch=200 will only ever process one at a time. The other 199 are holding the slot and counting as unacked.

How to spot it: CPU per worker is near zero, processing rate matches one slow job at a time.
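A rough way to see why this hurts is to estimate how long the last prefetched message sits before it can be acked. The sketch below is plain arithmetic under simplifying assumptions: every job takes the same time and the worker runs jobs in fixed-size rounds. The function names are hypothetical. It also assumes the broker's acknowledgement clock starts at delivery rather than when your code picks the message up, which is worth confirming against the consumers docs for your version.

```python
import math


def worst_case_wait_seconds(prefetch, concurrency, job_seconds):
    """Upper bound on the time before the last prefetched message is acked."""
    rounds = math.ceil(prefetch / concurrency)   # batches needed to drain the buffer
    return rounds * job_seconds


def risks_consumer_timeout(prefetch, concurrency, job_seconds, timeout_ms=1_800_000):
    """True if the buffered backlog alone could outlast consumer_timeout."""
    return worst_case_wait_seconds(prefetch, concurrency, job_seconds) * 1000 > timeout_ms


print(worst_case_wait_seconds(200, 1, 3))    # 600
print(risks_consumer_timeout(200, 1, 10))    # True
```

With one thread, prefetch=200 and 3-second jobs, the last message waits about 10 minutes while other consumers sit idle. At 10 seconds per job the same buffer takes longer than the default 30-minute timeout.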

Cause 5. Channel paused by broker due to flow control

Memory or disk watermark exceeded; the broker raises a resource alarm and blocks publishers (and effectively stalls delivery). Consumers look connected but messages do not flow. The default memory high watermark is vm_memory_high_watermark.relative = 0.6 (60% of detected RAM) and the default disk_free_limit.absolute is 50MB.

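How to spot it: rabbitmqctl list_connections name state shows blocking or blocked (a resource alarm is in effect) or a persistent flow (per-connection credit throttling), or the management UI shows the node memory/disk alarm banner.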
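As a worked example of the relative watermark, the arithmetic below shows where the memory alarm would trip on a given host. It is a sketch only: the function name is hypothetical, and it assumes the broker detects the same amount of RAM you pass in, which may not hold inside a memory-limited container.

```python
GIB = 1024 ** 3


def memory_alarm_threshold_bytes(total_ram_bytes, relative=0.6):
    """Bytes of broker memory use at which the memory alarm would fire."""
    return int(total_ram_bytes * relative)


threshold = memory_alarm_threshold_bytes(16 * GIB)
print(round(threshold / GIB, 1))  # 9.6
```

On a 16 GiB host with the 0.6 default, publishers are blocked once the broker uses roughly 9.6 GiB.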

Cause 6. consumer_timeout reclaims the delivery mid-job

This is the one most people miss. RabbitMQ enforces an acknowledgement timeout on consumers; if a delivery is not ack’d within consumer_timeout (default 1800000 ms = 30 minutes), the broker reclaims it. The timeout is checked roughly once a minute, so in practice it fires about a minute past the configured value, and values below 5 minutes are not recommended.

The behavior changed across versions, so check yours:

Before 4.3: any queue type can hit it; the broker closes the whole channel with a PRECONDITION_FAILED exception, and every delivery on that channel (from all consumers) is requeued.
4.3+: the timeout is enforced only on quorum queues and the new JMS queues — classic queues and streams no longer evaluate it. On a quorum queue the broker now uses a targeted, graceful path: for AMQP 0.9.1 clients that advertise consumer_cancel_notify it sends a basic.cancel to just the timed-out consumer and leaves the channel and other consumers intact (falling back to closing the channel only for clients without that capability); for AMQP 1.0 it releases the message with DISPOSITION(state=released) instead of detaching the link.

Either way the in-flight message is redelivered, and a job that legitimately takes longer than the timeout will loop forever.

How to spot it: broker or client logs show a line like Consumer ... on channel N and queue 'jobs' ... has timed out waiting for a consumer acknowledgement of a delivery with delivery tag = ... (and PRECONDITION_FAILED on pre-4.3 or older clients). The same message keeps coming back roughly 30 minutes apart. If you are on 4.3+ and see this, the queue is a quorum queue.

Shortest path to fix

Step 1: Set prefetch explicitly

Tie prefetch to actual concurrency. A common rule: prefetch = concurrency * 2.

Python (pika):

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters('rabbitmq'))
channel = connection.channel()
channel.basic_qos(prefetch_count=10)   # per-consumer by default (pika's global_qos=False)

def on_message(ch, method, properties, body):
    try:
        process(body)   # process() stands in for your own job handler
        ch.basic_ack(delivery_tag=method.delivery_tag)   # ack only after success
    except Exception:
        ch.basic_nack(delivery_tag=method.delivery_tag, requeue=False)  # send to DLX

channel.basic_consume(queue='jobs', on_message_callback=on_message, auto_ack=False)
channel.start_consuming()

Node (amqplib):

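await channel.prefetch(10);
await channel.consume('jobs', async (msg) => {
  if (msg === null) return;            // consumer was cancelled by the broker
  try {
    await handleJob(msg.content);      // handleJob is your own handler; `process` is a Node global, not a function
    channel.ack(msg);
  } catch (e) {
    channel.nack(msg, false, false);   // dead-letter, no requeue
  }
}, { noAck: false });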
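If you want real concurrency behind that prefetch in Python, one common approach is a thread pool. pika's BlockingConnection is not thread-safe, so worker threads should not ack directly; they hand the ack back to the connection thread with add_callback_threadsafe. The sketch below assumes a pika BlockingConnection is passed in. The names make_on_message, start_consumer and handle_job are hypothetical, and prefetch = concurrency * 2 simply follows the rule of thumb above.

```python
import functools
from concurrent.futures import ThreadPoolExecutor


def make_on_message(connection, pool, handle_job):
    """Build a pika callback that runs handle_job(body) on a worker thread."""

    def settle(ch, delivery_tag, ok):
        # Runs on the connection's own thread, so channel calls are safe here.
        if ok:
            ch.basic_ack(delivery_tag=delivery_tag)
        else:
            ch.basic_nack(delivery_tag=delivery_tag, requeue=False)  # to the DLX

    def work(ch, delivery_tag, body):
        try:
            handle_job(body)
            ok = True
        except Exception:
            ok = False
        # Hand the ack/nack back to the connection thread.
        connection.add_callback_threadsafe(
            functools.partial(settle, ch, delivery_tag, ok)
        )

    def on_message(ch, method, properties, body):
        pool.submit(work, ch, method.delivery_tag, body)

    return on_message


def start_consumer(connection, queue, handle_job, concurrency=5):
    """Wire prefetch to the pool size; blocks in start_consuming()."""
    channel = connection.channel()
    channel.basic_qos(prefetch_count=concurrency * 2)
    pool = ThreadPoolExecutor(max_workers=concurrency)
    channel.basic_consume(
        queue=queue,
        on_message_callback=make_on_message(connection, pool, handle_job),
        auto_ack=False,
    )
    channel.start_consuming()
```

The design choice is that the pool size, not the prefetch value, decides how many jobs run at once; prefetch only keeps a small buffer ready so workers do not idle between deliveries.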

Step 2: Set up a dead-letter exchange

Poison-pill messages must go somewhere. Configure the work queue to dead-letter on nack/reject, and set a delivery limit so a message that keeps failing is evicted instead of looping.

# Declare DLX and DLQ
rabbitmqadmin declare exchange name=jobs.dlx type=fanout
rabbitmqadmin declare queue name=jobs.dlq
rabbitmqadmin declare binding source=jobs.dlx destination=jobs.dlq

# Declare main queue (quorum) with DLX + delivery limit
rabbitmqadmin declare queue name=jobs arguments='{"x-queue-type":"quorum","x-dead-letter-exchange":"jobs.dlx","x-delivery-limit":5}'

On RabbitMQ 4.x quorum queues already apply a default delivery-limit of 20 (set in 4.0). Set x-delivery-limit explicitly if you want a different cap. When a message exceeds the limit it is dropped, or dead-lettered if a DLX is configured, so confirm you have a DLX before lowering the limit. As of RabbitMQ 4.3 the limit counts delivery-count (failed delivery attempts) rather than every requeue, so an unlimited number of explicit basic_nack(requeue=True) returns no longer trip the limit on their own.

For classic queues there is no built-in delivery limit, so enforce it in app code: read the x-death header count (set by the DLX cycle) and basic_nack(requeue=False) once it passes N attempts.
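A minimal sketch of that check for classic queues follows. It assumes the message headers arrive as a dict (as pika exposes them on properties.headers) and that each x-death entry carries queue, reason and count fields. The helper names and the max_attempts default are hypothetical.

```python
def death_count(headers, queue, reason="rejected"):
    """How many times this message was dead-lettered from `queue` for `reason`."""
    for entry in (headers or {}).get("x-death", []):
        if entry.get("queue") == queue and entry.get("reason") == reason:
            return int(entry.get("count", 0))
    return 0   # never dead-lettered from this queue, or no headers at all


def should_give_up(headers, queue, max_attempts=5):
    """True once the message has failed on `queue` at least max_attempts times."""
    return death_count(headers, queue) >= max_attempts


headers = {"x-death": [{"queue": "jobs", "reason": "rejected", "count": 3}]}
print(should_give_up(headers, "jobs", max_attempts=3))  # True
```

In a consumer callback you would call should_give_up(properties.headers, "jobs") before processing. What happens to a message you give up on, for example routing it to a separate parking queue, depends on how your DLX and retry queues are wired.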

Step 3: Alert on unacked count, not just ready depth

rabbitmqctl list_queues name messages_ready messages_unacknowledged consumers

Use the built-in rabbitmq_prometheus plugin (enable with rabbitmq-plugins enable rabbitmq_prometheus; it serves /metrics on port 15692). Sample rules:

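- alert: RabbitMQUnackedHigh
  # "and on()" ignores labels when matching: the global counter has no queue label
  expr: rabbitmq_queue_messages_unacked{queue="jobs"} > 50 and on() sum(rate(rabbitmq_global_messages_received_total[5m])) > 0
  for: 10m
  labels:
    severity: warning
- alert: RabbitMQQueueDepthGrowing
  # messages_ready is a gauge, so use deriv() rather than rate()
  expr: deriv(rabbitmq_queue_messages_ready[10m]) > 0 and rabbitmq_queue_consumers > 0
  for: 15m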

Per-queue series like rabbitmq_queue_messages_ready and rabbitmq_queue_messages_unacked come from the /metrics/per-object (or /metrics/detailed, prefixed rabbitmq_detailed_) endpoints; the default /metrics endpoint is aggregated, so enable per-object metrics if your dashboard filters by queue=.

Step 4: Check the broker is not in flow control

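rabbitmqctl list_connections name state user
rabbitmqctl status | grep -E 'mem|disk'
rabbitmq-diagnostics alarms

If state is flow, the broker is applying credit-based throttling to that connection; blocking or blocked means a resource alarm is in effect. If rabbitmq-diagnostics alarms reports a memory or disk alarm, you are over the watermark: free disk, reduce memory pressure, or (carefully) raise vm_memory_high_watermark.relative / disk_free_limit.absolute in rabbitmq.conf and restart the node (or apply the change at runtime with rabbitmqctl set_vm_memory_high_watermark / set_disk_free_limit).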

Step 5: Fix consumer_timeout for long jobs

If the same message is redelivered every ~30 minutes and the logs say ... has timed out waiting for a consumer acknowledgement ..., your job runs longer than the default consumer_timeout. Note that on RabbitMQ 4.3+ only quorum queues (and JMS queues) enforce this, so if you are seeing it there, the queue is a quorum queue. Three correct fixes, in order of preference:

Best for occasional long jobs: do not hold the delivery for the whole job. Ack early and track completion yourself, or chunk the work so each ack lands well under the timeout.
Raise the global timeout in rabbitmq.conf (value in milliseconds), then restart the node (or apply at runtime with rabbitmqctl eval 'application:set_env(rabbit, consumer_timeout, 3600000).'):

# rabbitmq.conf — 1 hour
consumer_timeout = 3600000

On 4.3+ you can also set the timeout per quorum queue via a policy or the x-consumer-timeout queue argument, so one slow queue does not force a high global value. Disabling the timeout entirely (consumer_timeout set to a non-positive value, or unset) is discouraged because a genuinely stuck consumer then holds the delivery indefinitely and can block on-disk compaction.
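The per-queue option can be sketched from Python as a small arguments builder. This is an illustration under assumptions: the function names and the 1.5x headroom factor are hypothetical, the 5-minute floor follows the recommendation under Cause 6, and x-consumer-timeout is used as described above for 4.3+.

```python
def consumer_timeout_ms(longest_job_seconds, headroom=1.5, floor_ms=5 * 60 * 1000):
    """Timeout with headroom over the longest expected job, never below 5 minutes."""
    return max(floor_ms, int(longest_job_seconds * headroom * 1000))


def quorum_queue_arguments(dlx, delivery_limit=None, timeout_ms=None):
    """Build the arguments dict for declaring a quorum work queue."""
    args = {"x-queue-type": "quorum", "x-dead-letter-exchange": dlx}
    if delivery_limit is not None:
        args["x-delivery-limit"] = delivery_limit
    if timeout_ms is not None:
        args["x-consumer-timeout"] = timeout_ms   # per-queue timeout, 4.3+ only
    return args


args = quorum_queue_arguments("jobs.dlx", delivery_limit=5,
                              timeout_ms=consumer_timeout_ms(3600))
print(args["x-consumer-timeout"])  # 5400000
```

With pika, the dict would be passed as channel.queue_declare(queue="jobs", durable=True, arguments=args). Redeclaring an existing queue with different arguments is rejected by the broker, so for a queue that already exists a policy is usually the easier route.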

Step 6: Use quorum queues for durability

Classic mirrored (HA) queues were removed in RabbitMQ 4.0 — quorum queues are now the only replicated/highly-available queue type, and they handle redelivery limits and persistence predictably.

rabbitmqadmin declare queue name=jobs durable=true arguments='{"x-queue-type":"quorum","x-dead-letter-exchange":"jobs.dlx","x-delivery-limit":5}'

Quorum queues have been available since RabbitMQ 3.8. As of June 2026 the current stable line is 4.3.x (4.3.2 shipped 15 Jun 2026); 4.2.x reaches end of community support on 31 Jul 2026, so plan an upgrade if you are still on it. If you are on a pre-4.0 cluster with mirrored classic queues, follow the official migrate-mirrored-classic-to-quorum guide before upgrading. See the RabbitMQ consumers docs for the exact consumer_timeout semantics and the 4.3 release highlights for the queue-type change.

How to confirm it’s fixed

rabbitmqctl list_queues name messages_ready messages_unacknowledged consumers: messages_ready should be falling, and messages_unacknowledged should stay at or below prefetch × consumers while turning over as acks arrive, not sit frozen at that ceiling.
Watch a known-bad (“poison”) message: it should land in jobs.dlq after the delivery limit, not keep reappearing in jobs.
For long jobs, confirm no new ... has timed out waiting for a consumer acknowledgement ... lines appear in the broker log after a full job duration.
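The first check can be automated by comparing two snapshots of the same queue taken a few minutes apart. The sketch below assumes each snapshot is a dict shaped like the management API's queue object, with messages_ready and a cumulative message_stats.ack counter; how you fetch the snapshots is up to you, and the function names are hypothetical.

```python
def acked_total(sample):
    """Cumulative ack count; message_stats may be missing on an idle queue."""
    return sample.get("message_stats", {}).get("ack", 0)


def is_draining(before, after):
    """True if the backlog shrank and acks advanced between two snapshots."""
    ready_falling = after["messages_ready"] < before["messages_ready"]
    acks_advancing = acked_total(after) > acked_total(before)
    return ready_falling and acks_advancing


before = {"messages_ready": 5000, "message_stats": {"ack": 100}}
after = {"messages_ready": 4200, "message_stats": {"ack": 950}}
print(is_draining(before, after))  # True
```

A falling Ready count without advancing acks usually means messages were only moved into Unacked, which is the original symptom rather than a fix.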

Prevention

Always use manual acks (auto_ack=False in pika, noAck: false in amqplib); ack only after success, nack to the DLQ on failure.
Set prefetch_count to worker concurrency times 2; never leave the client default.
Every work queue has a dead-letter exchange and an explicit (or default-20) delivery limit.
Set consumer_timeout deliberately if any job can run more than 30 minutes (per quorum queue via x-consumer-timeout on 4.3+, since only quorum/JMS queues enforce it there).
Alert on messages_unacknowledged and on rising x-delivery-count / redelivery patterns.
Use quorum queues for all new work queues; classic mirrored queues no longer exist in 4.x.

FAQ

Why is my queue growing if consumers are connected and the consumer count is non-zero? Connected is not the same as consuming. The usual cause is that delivered messages are stuck in the Unacked bucket because the consumer never ack’d them, so the broker will not push past prefetch_count. Check the Unacked column first.

What is a safe prefetch_count? Tie it to real concurrency, not to throughput hopes. A single-threaded worker should use a small value (often 1–10); a pool of N workers can use roughly N × 2. A high prefetch on a slow or single-threaded consumer just parks messages as unacked and starves other consumers.

Why does the same message keep coming back about every 30 minutes? That is the consumer_timeout (default 1800000 ms). The job runs longer than the timeout, the broker reclaims the unacked delivery, and it is redelivered. Raise consumer_timeout (globally in rabbitmq.conf, or per quorum queue via the x-consumer-timeout argument / consumer-timeout policy as of 4.3), or ack earlier and track completion yourself. On RabbitMQ 4.3+ this only happens on quorum and JMS queues; classic queues and streams no longer enforce the timeout.

Do I still need to write poison-message code on RabbitMQ 4.x? On quorum queues, no — they apply a default delivery-limit of 20 (configurable via x-delivery-limit) and route over-limit messages to a DLX if one is configured. On classic queues you still enforce it in application code using the x-death header count.

Are classic mirrored queues still an option for high availability? No. Classic queue mirroring was removed in RabbitMQ 4.0. Quorum queues are the supported replicated/HA queue type; migrate before upgrading to a 4.x cluster.

Tags: #Backend #Troubleshooting #rabbitmq