Redis Cluster Failover Stuck: No Replica Gets Promoted

A master node died but no replica took over and the cluster sits in fail state. Fix it by checking master quorum, the FAIL flag, replica eligibility, and forcing a manual takeover.

Published: May 23, 2026 Updated: Jun 18, 2026 Author: AI Productivity Guide Team

Your Redis Cluster lost a master and you expected a replica to take over in about 15 to 30 seconds. A few minutes later the cluster is still broken: CLUSTER INFO shows cluster_state:fail, CLUSTER NODES shows the dead master flagged master,fail while its replica is still slave, and commands for keys in that master’s hash-slot range fail with CLUSTERDOWN The cluster is down or with a MOVED/TRYAGAIN error. That is a stuck failover.

Fastest fix: confirm a majority of master nodes are alive and can see each other (a 3-master cluster needs at least 2 reachable), confirm the surviving replica is current, then run CLUSTER FAILOVER FORCE on that replica. If the old master is confirmed dead and you have lost master quorum, CLUSTER FAILOVER TAKEOVER is the last resort. The rest of this guide explains how to find the actual cause before you reach for TAKEOVER, because the wrong shortcut here causes split-brain and data loss.

This applies to true Redis Cluster (cluster-enabled yes), not Sentinel. The two are different products with different fixes — the next section helps you tell them apart in five seconds.

Cluster vs Sentinel: get this right first

Redis ships two completely different high-availability designs, and the fix differs:

Redis Cluster (this article) shards data across multiple masters and handles failover itself through a gossip protocol. There are no Sentinel processes. Replicas drive the election and a majority of the master nodes vote. Diagnostic commands are CLUSTER INFO and CLUSTER NODES.
Redis Sentinel is a separate set of sentinel processes that monitor a single non-sharded master/replica group and use replica-priority plus a Sentinel quorum. Diagnostic commands are SENTINEL MASTERS and SENTINEL SLAVES.

If you run redis-cli -p 26379 SENTINEL MASTERS and get a reply, you are on Sentinel, not Cluster, and the relevant settings are replica-priority and the per-master quorum. Everything below assumes true Redis Cluster (nodes started with cluster-enabled yes). Note that replica-priority is read by Sentinel only and is ignored in Cluster mode — setting it does nothing for cluster failover.

How cluster failover actually works

Knowing the mechanism tells you where it can stall. As of June 2026 (Redis 7.x and 8.x), the sequence is:

1. A node stops answering pings. After cluster-node-timeout (default 15000 ms) the detecting node marks it PFAIL (probable fail) and gossips that.
2. When a majority of master nodes report the same node as PFAIL, it is promoted to FAIL. Only then can a failover start.
3. Each eligible replica of the dead master computes a small election delay (500ms + random(0-500ms) + rank * 1000ms, where rank 0 is the replica with the highest replication offset, i.e. the most up-to-date) and then requests votes from the masters. The fixed 500 ms exists so the FAIL state has time to propagate before any vote is requested; the random part desynchronizes replicas; the rank ensures the freshest replica asks first.
4. A replica wins when a majority of masters grant it a vote in the same config epoch. A 3-master cluster needs 2 votes; a 5-master cluster needs 3. The winner takes a new, higher configEpoch and claims the dead master’s slots.

Anything that breaks step 2 (no master majority sees FAIL) or step 4 (no master majority can vote) leaves you stuck in fail with a slave that never gets promoted. The exact rules are in the Redis Cluster specification.
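The vote and delay arithmetic above can be sanity-checked with a few lines of shell. This is a minimal sketch, not something Redis ships: the function names master_majority and election_delay_range_ms are made up for illustration, and the delay formula is the one quoted in the sequence above.

```shell
# Votes needed to win an election: a strict majority of the masters.
# (Hypothetical helper; the argument is the number of masters.)
master_majority() {
  echo $(( $1 / 2 + 1 ))
}

# Earliest and latest election delay in ms for a replica of a given rank,
# from 500 + random(0..500) + rank * 1000.
election_delay_range_ms() {
  min=$(( 500 + $1 * 1000 ))
  max=$(( min + 500 ))
  echo "$min $max"
}

master_majority 3             # prints 2
master_majority 5             # prints 3
election_delay_range_ms 0     # prints 500 1000
election_delay_range_ms 1     # prints 1500 2000
```

Only the master count feeds master_majority, which is why adding replicas never changes how many votes are needed.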

Common causes

Ordered by how often they cause a stuck failover.

#	Cause	One-line check	FAIL flag set?
1	Lost master quorum (too many masters down or partitioned)	`CLUSTER NODES` on a survivor shows half or more of the masters as `fail?` or `disconnected`	No
2	Master stuck at `fail?` (PFAIL), never promoted to `fail`	`CLUSTER NODES` shows `fail?` not `fail`	No
3	Dead master had no replica at all	`CLUSTER NODES`: no `slave` lines pointing at it	n/a
4	Replica too far behind (validity window exceeded)	`INFO replication` link down longer than the validity window	Yes
5	Replica blocked with `cluster-replica-no-failover yes`	`CONFIG GET cluster-replica-no-failover` returns `yes`	Yes
6	New replica not yet known to a majority of masters	`CLUSTER NODES` on each master does not list the replica	Yes
7	`cluster-require-full-coverage yes` masks recovery (cluster refuses all queries)	`CONFIG GET cluster-require-full-coverage` returns `yes`	Yes

1. Lost master quorum

This is the most common real cause. Cluster failover needs a majority of masters, not nodes overall. Adding replicas never changes the math. In a 3-master cluster, if 2 masters are down or partitioned, the one survivor cannot reach majority and no replica can be promoted.

How to spot it: run CLUSTER INFO and CLUSTER NODES from a healthy node. If cluster_state:fail is reported and only one of three masters is reachable, you have lost quorum. Restore connectivity to the other masters; do not try to “fix” this with TAKEOVER on every shard at once.
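Counting masters by eye gets error-prone on larger clusters, so a small filter over CLUSTER NODES output can help. The sketch below is illustrative only: quorum_view and sample_nodes are hypothetical names, the node IDs and addresses are sample data, and it assumes the standard CLUSTER NODES layout (flags in column 3, slot ranges from column 9 onward).

```shell
# Reads CLUSTER NODES output on stdin and prints "<healthy> <total> <needed>"
# for slot-owning masters, as seen by the node that produced the output.
quorum_view() {
  awk '
    {
      is_master = 0; is_down = 0
      n = split($3, flags, ",")
      for (i = 1; i <= n; i++) {
        if (flags[i] == "master") is_master = 1
        if (flags[i] == "fail" || flags[i] == "fail?") is_down = 1
      }
      # Only masters that own at least one slot (column 9 onward) are counted.
      if (is_master && NF >= 9) {
        total++
        if (!is_down) healthy++
      }
    }
    END { printf "%d %d %d\n", healthy, total, int(total / 2) + 1 }
  '
}

# Sample data: one surviving master, two suspected masters, one replica.
sample_nodes() {
  cat <<'EOF'
e7d1eecce10fd6bb5eb35b9f99a514335d9ba9ca 10.0.0.1:6379@16379 myself,master - 0 0 1 connected 0-5460
67ed2db8d677e59ec4a4cefb06858cf2a1a89fa1 10.0.0.2:6379@16379 master,fail? - 1426238300000 1426238295000 2 disconnected 5461-10922
292f8b365bb7edb5e285caf0b7e6ddc7265d2f4f 10.0.0.3:6379@16379 master,fail? - 1426238300000 1426238295000 3 disconnected 10923-16383
07c37dfeb235213a872192d90877d0cd55635b91 10.0.0.4:6379@16379 slave 292f8b365bb7edb5e285caf0b7e6ddc7265d2f4f 0 1426238317239 3 connected
EOF
}

sample_nodes | quorum_view    # prints 1 3 2: one healthy master, two needed
```

Against a live cluster, pipe the output of redis-cli CLUSTER NODES into quorum_view instead of the sample. The result reflects only that one node's view, so run it from more than one node before drawing conclusions.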

2. Master stuck at `fail?`

A node marked fail? is suspected (PFAIL) but a majority of masters have not yet confirmed it as fail. Until it reaches fail, no election begins. This happens when only one master can see the failure (asymmetric partition).

How to spot it: CLUSTER NODES shows master,fail? rather than master,fail. Check why other masters still consider it reachable (one-way firewall rule, security group, MTU/packet drops on the cluster bus port).
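The difference between fail? and fail is a single character, so it is worth extracting it mechanically rather than reading it off a wide terminal. A minimal sketch, assuming the flags sit in column 3 of CLUSTER NODES output; failure_flags is a hypothetical helper name and the node line is sample data.

```shell
# Reads CLUSTER NODES output on stdin and prints one line per suspected or
# failed node: "<node-id> <address> PFAIL" or "<node-id> <address> FAIL".
failure_flags() {
  awk '
    {
      state = ""
      n = split($3, flags, ",")
      # Compare whole flags so that "nofailover" is not mistaken for "fail".
      for (i = 1; i <= n; i++) {
        if (flags[i] == "fail?") state = "PFAIL"
        if (flags[i] == "fail")  state = "FAIL"
      }
      if (state != "") print $1, $2, state
    }'
}

printf '%s\n' \
  '292f8b365bb7edb5e285caf0b7e6ddc7265d2f4f 10.0.0.3:6379@16379 master,fail? - 1426238300000 1426238295000 3 disconnected 10923-16383' \
  | failure_flags
# prints 292f8b365bb7edb5e285caf0b7e6ddc7265d2f4f 10.0.0.3:6379@16379 PFAIL
```

Running the same filter on every master and comparing the results shows which masters still consider the node reachable, which is where the asymmetric partition is.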

3. Dead master had no replica

You cannot promote what does not exist. If the failed master had zero replicas, the cluster simply loses those slots.

How to spot it: run CLUSTER NODES and look for slave <dead-master-id> lines. None means no candidate.
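The same check can be scripted. The following sketch lists the replicas that point at a given master ID; replicas_of is a hypothetical helper, the IDs are sample data, and it assumes the master ID appears in column 4 of CLUSTER NODES output.

```shell
# Reads CLUSTER NODES output on stdin and prints the address of every replica
# whose master column (column 4) matches the node ID given as the argument.
replicas_of() {
  awk -v master_id="$1" '
    $4 == master_id {
      n = split($3, flags, ",")
      for (i = 1; i <= n; i++)
        if (flags[i] == "slave") { print $2; break }
    }'
}

printf '%s\n' \
  '292f8b365bb7edb5e285caf0b7e6ddc7265d2f4f 10.0.0.3:6379@16379 master,fail - 1426238300000 1426238295000 3 disconnected 10923-16383' \
  '07c37dfeb235213a872192d90877d0cd55635b91 10.0.0.4:6379@16379 slave 292f8b365bb7edb5e285caf0b7e6ddc7265d2f4f 0 1426238317239 3 connected' \
  | replicas_of 292f8b365bb7edb5e285caf0b7e6ddc7265d2f4f
# prints 10.0.0.4:6379@16379
```

Empty output means the dead master has no promotion candidate at all.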

4. Replica too far behind

A replica is only eligible if its data is recent enough. The validity window is (cluster-node-timeout * cluster-replica-validity-factor) / 1000 seconds plus repl-ping-replica-period. With the defaults (cluster-node-timeout 15000, cluster-replica-validity-factor 10, repl-ping-replica-period 10) that is 15 * 10 + 10 = 160 seconds. A replica that was disconnected from its master for longer than that will not try to promote itself.

How to spot it: INFO replication on the replica. Check master_link_status, master_link_down_since_seconds, and master_last_io_seconds_ago. If the down time exceeds the validity window, it was ruled ineligible.
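To compare a replica against the window without doing the arithmetic by hand, something like the following sketch can help. It follows the formula given in the redis.conf comments, (node-timeout * validity-factor) + repl-ping-replica-period, so the default window comes out at 160 seconds. The helper names validity_window_s and replica_freshness are hypothetical, and the check is an approximation of what the server does internally, not a replacement for it.

```shell
# Validity window in seconds. Arguments: node timeout (ms), validity factor,
# replica ping period (s). A factor of 0 disables the check.
validity_window_s() {
  if [ "$2" -eq 0 ]; then echo unlimited; return; fi
  echo $(( $1 * $2 / 1000 + $3 ))
}

# Reads INFO replication on stdin and prints "eligible" or "stale" for the
# window (in seconds) given as the argument. redis-cli output uses CRLF line
# endings, so carriage returns are stripped first.
replica_freshness() {
  tr -d '\r' | awk -F: -v window="$1" '
    $1 == "master_link_down_since_seconds" { down = $2 }
    END {
      if (window == "unlimited" || down + 0 <= window + 0) print "eligible"
      else print "stale"
    }'
}

validity_window_s 15000 10 10    # prints 160
printf 'role:slave\r\nmaster_link_status:down\r\nmaster_link_down_since_seconds:200\r\n' \
  | replica_freshness 160        # prints stale
```

If master_link_down_since_seconds is absent (the link is up), the sketch treats the replica as eligible.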

5. Replica explicitly blocked from failover

cluster-replica-no-failover yes tells a replica never to promote itself automatically. If every replica of the dead master has this set, automatic failover cannot happen.

How to spot it: CONFIG GET cluster-replica-no-failover on each replica. yes = blocked. (Note again: replica-priority is a Sentinel setting and has no effect here.)

6. New replica unknown to the masters

A replica can only be elected if a majority of masters already know it as a replica. A node added moments before the outage may not have propagated yet.

How to spot it: run CLUSTER NODES (or CLUSTER REPLICAS <master-id>) on each master and confirm the replica appears, before relying on it.

7. `cluster-require-full-coverage` hides recovery

With this set to yes (the default), the entire cluster refuses commands while any slot is uncovered, so even shards that are healthy look dead. It does not block failover itself, but it makes a partial outage look like a total one and pressures you into a hasty TAKEOVER.

How to spot it: CONFIG GET cluster-require-full-coverage. yes plus one uncovered slot range = whole cluster returns CLUSTERDOWN.

Before you start

Confirm it is actually stuck, not just slow. A normal cluster failover completes in roughly cluster-node-timeout plus a few seconds (default about 15 to 20 seconds, up to 30).
Identify which master(s) are fail and the affected hash-slot ranges (16384 total).
Note which key prefixes are unreachable from the app.
Capture state before changing anything: CLUSTER NODES from at least 3 nodes, CLUSTER INFO, and INFO replication from every replica.
Have a rollback plan: snapshot (BGSAVE or RDB/AOF copy) before touching the cluster.

Step-by-step fix

Step 1: Confirm the FAIL flag and master quorum

# From a healthy node
redis-cli -h <healthy-node> -p 6379 CLUSTER INFO
redis-cli -h <healthy-node> -p 6379 CLUSTER NODES

In CLUSTER NODES, the dead master should read master,fail. If it reads master,fail?, the cluster has not confirmed the failure yet (cause #2) and the problem is connectivity between masters, not the replica. Count how many masters you can actually reach: you need a majority alive before any failover can complete.

Step 2: Restore the cluster bus and connectivity between masters

Cluster nodes talk on two ports: the client port (6379) and the cluster-bus port, which is the client port plus 10000 (16379) unless overridden by cluster-port. Both must be open between every pair of nodes.

# Reachability on both ports, from each node to each other node
redis-cli -h <other-master> -p 6379 PING
nc -vz <other-master> 16379    # cluster bus port

# Inspect firewall rules
iptables -L -n | grep -E '6379|16379'
# AWS: confirm the security group allows 6379 and 16379 between all node subnets

# Example: open the bus port if blocked
ufw allow from <cluster-subnet> to any port 16379 proto tcp

A blocked cluster-bus port is a classic cause of a master stuck at fail?: clients work but gossip and votes do not flow.
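With more than a handful of nodes it is easy to miss a pair, so it can help to generate the list of bus-port checks rather than type them. A minimal sketch under stated assumptions: bus_port is a hypothetical helper, the addresses are placeholders, and a cluster-port value of 0 (or none) is taken to mean "client port plus 10000". The loop only prints the commands so they can be reviewed before being run from each node.

```shell
# Cluster-bus port for a node. Arguments: client port, optional cluster-port
# setting (0 or omitted means client port + 10000).
bus_port() {
  override=${2:-0}
  if [ "$override" -ne 0 ]; then
    echo "$override"
  else
    echo $(( $1 + 10000 ))
  fi
}

# Print (do not run) one bus-port reachability check per node.
for node in 10.0.0.1:6379 10.0.0.2:6379 10.0.0.3:6379; do
  host=${node%:*}
  port=${node#*:}
  echo "nc -vz $host $(bus_port "$port")"
done
# prints:
#   nc -vz 10.0.0.1 16379
#   nc -vz 10.0.0.2 16379
#   nc -vz 10.0.0.3 16379
```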

Step 3: Check that an eligible replica exists

# On a master, list replicas of the dead master
redis-cli -h <master> -p 6379 CLUSTER REPLICAS <dead-master-node-id>

# On the candidate replica, check freshness and link state
redis-cli -h <replica> -p 6379 INFO replication
# Look for:
#   role:slave
#   master_link_status:down
#   master_link_down_since_seconds
#   slave_repl_offset

If no replica is listed (cause #3) you cannot fail over to anything; you must add a node and rebuild the shard. If the link was down longer than the validity window (cause #4), the replica was ruled ineligible — either wait for it to resync or widen the window:

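# Widen the eligibility window (about node_timeout * factor / 1000 seconds, plus the replica ping period).
# CONFIG SET changes only the node it is sent to, so repeat it on every replica of the dead master.
redis-cli -h <replica> -p 6379 CONFIG SET cluster-replica-validity-factor 20
# Set to 0 to disable the freshness check entirely (accepts a stale replica)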

Step 4: Unblock replicas that opted out of failover

# Check on each replica of the dead master
redis-cli -h <replica> -p 6379 CONFIG GET cluster-replica-no-failover

# If it returns "yes", re-enable automatic promotion
redis-cli -h <replica> -p 6379 CONFIG SET cluster-replica-no-failover no
redis-cli -h <replica> -p 6379 CONFIG REWRITE

Do not bother touching replica-priority in cluster mode — it is read only by Sentinel and is ignored here.

Step 5: Force a manual failover

If the master is dead, masters can see each other, and a current replica exists but still has not promoted, trigger it from the replica:

# Run ON the chosen replica node
redis-cli -h <replica> -p 6379 CLUSTER FAILOVER FORCE

The three modes, from safest to most aggressive:

CLUSTER FAILOVER — coordinates with the old master to avoid data loss. Requires the old master to be reachable, so it is useless when the master is dead.
CLUSTER FAILOVER FORCE — skips the handshake with the (unreachable) master but still needs a majority of masters to authorize the new epoch. This is the normal choice when a master has crashed.
CLUSTER FAILOVER TAKEOVER — skips cluster authorization entirely. The replica unilaterally takes the next config epoch and claims the slots. Use only when you have lost master quorum and cannot recover it, because it can cause split-brain if the old master comes back.

Only use TAKEOVER when all of these are true:

The old master is confirmed dead (not just partitioned).
You cannot restore a majority of masters in an acceptable time.
You accept losing any writes that had not replicated.

CLUSTER FAILOVER (with or without FORCE) returns OK immediately but does not guarantee success: per the CLUSTER FAILOVER docs, it only schedules the failover. TAKEOVER is the only synchronous mode. If the scheduled failover never completes, check the replica log:

Manual failover timed out — the replica gave up after a few seconds. Re-check master quorum and the bus port.
Currently unable to failover: Waiting for votes, but majority still not reached — masters are reachable for the replica but are not granting votes. This is almost always a master-side connectivity problem (bus port blocked between masters) or the replica not being known to a majority of masters (cause #6).
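Scanning the replica log for these two messages can be scripted. The sketch below matches only the message text quoted above; failover_log_hints is a hypothetical helper, and the hint wording is this guide's advice restated, not output from Redis.

```shell
# Reads a replica log on stdin and prints a hint for each known symptom.
failover_log_hints() {
  awk '
    /Manual failover timed out/                         { timed_out = 1 }
    /Waiting for votes, but majority still not reached/ { no_votes = 1 }
    END {
      if (timed_out) print "timeout: re-check master quorum and the cluster-bus port"
      if (no_votes)  print "no-votes: check bus connectivity between masters and that every master lists this replica"
      if (!timed_out && !no_votes) print "none: no known failover symptom found"
    }'
}

printf 'Manual failover timed out.\n' | failover_log_hints
# prints timeout: re-check master quorum and the cluster-bus port
```

To use it, redirect the replica's log file into the function; the file location depends on the logfile setting of that node.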

Step 6: Verify the promotion

# On the former replica
redis-cli -h <replica> -p 6379 ROLE                 # should now print "master"
redis-cli -h <replica> -p 6379 INFO replication     # role:master

# From any node
redis-cli -h <node> -p 6379 CLUSTER INFO
# Expected:
#   cluster_state:ok
#   cluster_slots_assigned:16384
#   cluster_slots_ok:16384
redis-cli -h <node> -p 6379 CLUSTER NODES           # promoted node now flagged "master"
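The expected values above can be turned into a pass/fail check that is convenient to run against every node in turn. A minimal sketch: cluster_healthy is a hypothetical helper that reads CLUSTER INFO output on stdin and looks only at the three fields listed above.

```shell
# Reads CLUSTER INFO on stdin. Prints "healthy" and returns 0 only when the
# state is ok and all 16384 slots are assigned and ok; otherwise prints what
# it saw and returns 1. Carriage returns from redis-cli are stripped first.
cluster_healthy() {
  tr -d '\r' | awk -F: '
    $1 == "cluster_state"          { state = $2 }
    $1 == "cluster_slots_assigned" { assigned = $2 }
    $1 == "cluster_slots_ok"       { ok = $2 }
    END {
      if (state == "ok" && assigned == 16384 && ok == 16384) {
        print "healthy"
        exit 0
      }
      printf "unhealthy: state=%s assigned=%s ok=%s\n", state, assigned, ok
      exit 1
    }'
}

printf 'cluster_state:ok\r\ncluster_slots_assigned:16384\r\ncluster_slots_ok:16384\r\n' \
  | cluster_healthy    # prints healthy
```

Because the function's exit status follows the check, it can gate later steps in a script, for example by only re-enabling application traffic once every node reports healthy.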

Step 7: Decide on `cluster-require-full-coverage`

redis-cli -h <node> -p 6379 CONFIG GET cluster-require-full-coverage
# Allow the rest of the cluster to keep serving covered slots during a partial outage
# (CONFIG SET is per node: repeat both commands below on every node)
redis-cli -h <node> -p 6379 CONFIG SET cluster-require-full-coverage no
redis-cli -h <node> -p 6379 CONFIG REWRITE

Trade-off: commands for keys in an uncovered slot still fail, but the healthy shards keep serving instead of the whole cluster returning CLUSTERDOWN. Choose no to keep the healthy shards available, yes if the application must never see a partially available keyspace.

How to confirm it’s fixed

CLUSTER INFO shows cluster_state:ok and cluster_slots_ok:16384 from every node.
CLUSTER NODES shows the promoted node as master and the old master either gone or now slave.
ROLE on the promoted node prints master.
The app can read and write keys in the previously affected slot range with no CLUSTERDOWN/MOVED loops.
Once the old node returns and re-attaches as a replica, INFO replication on it shows master_link_status:up and the offset gap closes.

Long-term prevention

Run at least 3 masters across 3 availability zones so you keep master majority through one zone loss. (Quorum is per-master-count, so 3 masters tolerate 1 down.)
Give every master at least one replica, ideally in a different zone, so every shard has a promotion candidate.
Open both the client port and the cluster-bus port (port + 10000) between all nodes; a blocked bus port is the top cause of fail? that never reaches fail.
Tune cluster-node-timeout deliberately. Lower (for example 5000 ms) detects faster but risks false failovers on brief network blips; the default 15000 ms is conservative.
Monitor replica lag and the validity window; alert when a replica’s master_link_down_since_seconds approaches cluster-node-timeout * cluster-replica-validity-factor / 1000.
Run a failover drill monthly: kill a master and confirm automatic promotion completes within roughly cluster-node-timeout plus a few seconds.

Common pitfalls

Treating a Cluster like Sentinel: tuning replica-priority or Sentinel quorum does nothing for cluster failover.
Running CLUSTER FAILOVER TAKEOVER while the old master is only partitioned (not dead) — when it returns you get split-brain and divergent writes.
Forgetting the cluster-bus port (16379 by default). Clients connect fine, gossip and votes do not, and the master sits at fail? forever.
Forgetting CONFIG REWRITE after a CONFIG SET — the change reverts on restart.
Putting all masters in one AZ — a single zone outage drops below master majority and nothing can fail over.

FAQ

Why didn’t my replica auto-promote even though the master is clearly dead? Most often a majority of masters cannot confirm the failure (the dead master is stuck at fail? because the cluster-bus port is blocked) or you have lost master quorum. Auto-promotion needs a majority of masters to mark the node fail and then to vote, regardless of how many replicas you have.

CLUSTER FAILOVER FORCE vs TAKEOVER — which do I use? Use FORCE whenever a majority of masters are still alive; it gets a proper authorized config epoch. Use TAKEOVER only when you have lost master quorum and cannot get it back, and you accept split-brain risk if the old master returns.

Does replica-priority control which replica wins in a cluster? No. That setting is for Sentinel. In Redis Cluster the most up-to-date replica (lowest rank, highest replication offset) elects first; to exclude a replica entirely set cluster-replica-no-failover yes on it.

How long should a healthy failover take? Roughly cluster-node-timeout (default 15 s) for the node to reach FAIL, plus a sub-second election delay, so about 15 to 20 seconds end to end with defaults, up to ~30 s. Anything past a couple of minutes is genuinely stuck.

Should I lower cluster-replica-validity-factor? Lowering it makes failover harder (a replica must be fresher to qualify); raising it or setting it to 0 lets a more stale replica promote. For caches, a higher value or 0 favors availability; for session/data stores, keep the default and prefer waiting for resync.

My replica log says Waiting for votes, but majority still not reached — what now? The replica is trying to get elected but the masters are not voting. The replica can reach the masters, but either the masters cannot agree the old master is FAIL (bus port blocked between masters, so the dead node sits at fail?), or this replica is not yet known to a majority of masters (cause #6). Fix master-to-master connectivity on the bus port (16379 by default) and confirm the replica shows up in CLUSTER NODES on every master, then retry CLUSTER FAILOVER FORCE.

Tags: #Backend #Troubleshooting #redis
