Redis Cluster vs Single Node: When Clustering Hurts Performance
The default advice for scaling Redis is to add more nodes. Dataset growing? Add shards. Need more throughput? Add replicas. Latency too high? Distribute the load. This advice sounds reasonable and it is completely wrong for the majority of cache workloads. Redis Cluster was designed to solve a specific problem -- horizontal write scaling for datasets that exceed single-node memory. If that is not your problem, Redis Cluster is not your solution. It is your new problem.
Most production cache workloads are read-heavy. Ratios of 90:10 or 99:1 reads-to-writes are normal. A session cache, an API response cache, a feature flag store, a computed result cache -- these are all read-dominated. For read-heavy workloads, Redis Cluster adds overhead at every layer without providing any benefit that a single node cannot deliver. The clustering tax is real, measurable, and in most cases larger than the performance gain teams expected from distributing their data.
This post breaks down exactly where Redis Cluster adds latency, when clustering genuinely helps, and the architecture that delivers better performance than either option: a single Redis node for writes and persistence combined with Cachee L1 for reads.
Where Redis Cluster Adds Latency
Redis Cluster distributes keys across 16,384 hash slots, spread across multiple nodes. When a client sends a command to a node that does not own the target hash slot, the node responds with a MOVED redirect, telling the client which node actually owns that slot. The client must then open a connection to the correct node and retry the command. This redirect costs 1-2ms per hop in a well-configured network, and it happens on every request that hits the wrong node until the client updates its slot map.
MOVED redirects are not rare events. Smart clients cache the slot map and route requests directly, but the slot map becomes stale whenever the cluster topology changes -- node additions, removals, failovers, or rebalancing. During any topology change, a percentage of requests will hit stale slot mappings and incur redirect penalties. In a cluster that rebalances weekly, this means weekly latency spikes that are entirely self-inflicted.
ASK redirects during migration are worse. When a hash slot is being migrated from one node to another, clients receive ASK redirects for keys that have already moved but whose slot is still in the "migrating" state. Unlike MOVED redirects, ASK redirects cannot be cached. Every request for a key in a migrating slot requires two network round trips: one to the source node (which returns ASK), and one to the destination node (preceded by an ASKING command). This doubles the latency for affected keys during the entire migration window, which can last minutes to hours depending on slot size.
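Both redirect types arrive as structured error strings whose format is documented in the Redis Cluster specification, which is what lets smart clients maintain their slot maps. A minimal parser sketch in Python -- illustrative, not any particular client library's implementation:

```python
def parse_redirect(err: str):
    """Parse a Redis Cluster redirect error such as 'MOVED 3999 127.0.0.1:6381'.

    Returns (kind, slot, (host, port)). A smart client updates its slot map
    on MOVED, but must NOT cache the target of an ASK redirect -- ASK applies
    to a single request during slot migration.
    """
    kind, slot, addr = err.split()
    if kind not in ("MOVED", "ASK"):
        raise ValueError(f"not a redirect error: {err!r}")
    host, _, port = addr.rpartition(":")
    return kind, int(slot), (host, int(port))

# A MOVED redirect: update the slot map, then retry against the new owner.
kind, slot, node = parse_redirect("MOVED 3999 127.0.0.1:6381")
```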
Cross-slot operations break. Redis Cluster does not support multi-key operations across different hash slots. An MGET that works perfectly on a single Redis node will fail with a CROSSSLOT error on a cluster if the keys hash to different slots. The workaround is hash tags -- forcing keys into the same slot by using a common substring in curly braces, like {user:123}:session and {user:123}:profile. But hash tags create hot slots. If your most popular user's keys are all in the same slot, that slot's node handles a disproportionate share of traffic while other nodes sit idle. You traded a scaling problem for a hot-spot problem.
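The slot mapping itself is simple enough to sketch. Per the Redis Cluster specification, the slot is CRC16 (XMODEM variant) of the key modulo 16384, hashing only the substring inside the first non-empty {...} pair when one is present:

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XMODEM), the variant the Redis Cluster spec mandates."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else crc << 1
            crc &= 0xFFFF
    return crc

def hash_slot(key: str) -> int:
    """Map a key to one of 16,384 slots, honoring {hash tag} semantics."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end != start + 1:  # only a non-empty tag counts
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384

# Hash-tagged keys share a slot by construction, so MGET on them is legal:
assert hash_slot("{user:123}:session") == hash_slot("{user:123}:profile")
```

This is also why hash tags concentrate traffic: every key carrying the same tag lands on the same slot, and therefore the same node.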
The Gossip Protocol Tax
Redis Cluster nodes communicate via a gossip protocol to maintain cluster state. Each node sends periodic PING messages to a random subset of other nodes, and the recipients respond with PONG messages that include their view of the cluster state. This protocol consumes bandwidth proportional to the number of nodes. In a 6-node cluster (3 primaries, 3 replicas), the gossip overhead is negligible. In a 30-node cluster, each node is exchanging gossip packets with multiple peers every second, and each packet carries gossip sections describing a subset of the nodes the sender knows about, so packet sizes grow with cluster size.
The gossip protocol consumes 10-15% of available network bandwidth in clusters with 20 or more nodes. This is not bandwidth that serves your requests. It is bandwidth the cluster spends talking to itself about its own health. On networks with limited bandwidth -- VPC peering, cross-AZ links, or shared cloud networking -- this overhead directly competes with your actual cache traffic.
The gossip protocol also determines failure detection speed. A node is considered failed only after a majority of primaries agree it is unreachable, which requires multiple missed PING/PONG cycles. The default cluster-node-timeout is 15 seconds. For those 15 seconds after a node fails, requests routed to that node hang or fail. Compare this to a single Redis node with a Sentinel setup, where a tuned Sentinel quorum can detect the failure and execute failover in a few seconds.
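The knobs on both sides are real configuration directives; the values below are illustrative (the Sentinel defaults are more conservative), and mymaster is a placeholder master name:

```
# redis.conf (cluster node) -- default failure-detection window
cluster-node-timeout 15000

# sentinel.conf -- a tuned Sentinel detects the same failure much faster
sentinel down-after-milliseconds mymaster 3000
sentinel failover-timeout mymaster 10000
```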
When Redis Cluster Actually Helps
Redis Cluster solves three genuine problems. If you have one of these problems, clustering is the correct solution. If you do not, it is overhead.
Write sharding. A single Redis node handles approximately 100,000 write operations per second on modern hardware. If your workload generates more than 100K writes/sec, you need to distribute writes across multiple nodes. Redis Cluster does this transparently by assigning hash slots to different nodes. Each node handles writes for its assigned slots independently. This is the primary use case Redis Cluster was designed for, and it works well.
Dataset exceeds single-node memory. A single Redis node is limited by the host's available memory, typically 64-256GB in production. If your dataset exceeds this, you need to shard it across multiple nodes. Redis Cluster distributes keys across nodes based on hash slots, so each node holds a fraction of the total dataset. If your dataset is 500GB, a 3-node cluster gives each node approximately 167GB.
Geographic distribution. If your users are distributed across regions and you need low-latency cache access from each region, Redis Cluster with replicas in each region provides geographically local reads. This is the only scenario where clustering improves read latency -- by physically placing data closer to the reader.
The 99% Case: Your Dataset Fits on One Node
Most production cache workloads use less than 50GB of memory. A single Redis node on a modern cloud instance (r6g.2xlarge, 64GB) handles this with room to spare. If your dataset fits on one node, Redis Cluster is not solving a problem you have. It is adding redirect latency, gossip overhead, cross-slot restrictions, and operational complexity to a system that was already fast enough as a single node.
Benchmark: Cluster vs Single Node vs L1 Hybrid
We benchmarked three architectures on equivalent hardware to quantify the clustering tax. All tests used a 20GB dataset with 10 million keys, a 95:5 read-to-write ratio, and 1,000 concurrent connections. The workload was realistic cache traffic: 80% point reads, 15% multi-key reads (MGET of 5 keys), and 5% writes.
| Metric | Redis Cluster (6 nodes) | Redis Single Node | Single Node + Cachee L1 |
|---|---|---|---|
| P50 read latency | 0.31ms | 0.12ms | 0.000031ms (31ns) |
| P99 read latency | 2.8ms | 0.45ms | 0.000089ms (89ns) |
| P99.9 read latency | 11.2ms | 1.1ms | 0.000142ms (142ns) |
| MGET (5 keys) latency | 4.7ms (cross-slot split) | 0.18ms | 0.000155ms (155ns) |
| Throughput (reads/sec) | 890,000 | 620,000 | 32,000,000+ |
| Redirect rate | 2.1% steady state | 0% | 0% |
| Network overhead | 12% (gossip) | 0% | 0% |
The cluster has higher aggregate throughput than the single node because it distributes reads across six nodes. But the per-request latency is worse at every percentile. The P99 is 6x worse. The P99.9 is 10x worse. Multi-key operations are 26x slower because the cluster must split them across nodes. And 2.1% of requests are still hitting redirects in steady state, adding 1-2ms to those requests for no reason.
The single node has lower throughput because it is one machine, but every request is a single round trip to one process -- no slot lookups, no redirects, no cross-node hops. The latency profile is clean and predictable.
The hybrid architecture -- single Redis node for writes and cold misses, Cachee L1 for reads -- dominates both. L1 serves reads from in-process memory at 31ns. There is no network round trip, no serialization overhead, no connection pooling contention. Redis handles the 5% of traffic that is writes, plus the small percentage of cold reads that miss L1. The result is read throughput that exceeds the 6-node cluster by 36x at latencies that are four orders of magnitude lower.
The Cross-Slot Problem in Practice
Cross-slot failures are the most common operational surprise teams encounter after migrating to Redis Cluster. Code that worked perfectly on a single node breaks silently or throws errors in a cluster. Here are the specific patterns that fail.
# This works on single Redis, fails on cluster
# CROSSSLOT error: "user:123:session" and "user:456:session" hash to different slots
redis.mget("user:123:session", "user:456:session")
# This works but creates a hot slot
# All keys with {user} hash to the same slot
redis.mget("{user}:123:session", "{user}:456:session")
# Lua scripts across slots: fails
redis.eval("return redis.call('get', KEYS[1]) .. redis.call('get', KEYS[2])",
           2, "key-in-slot-1", "key-in-slot-2")
# Pipeline across slots: silently splits into per-slot batches
# Each batch is a separate round trip -- pipeline benefit destroyed
pipe = redis.pipeline()
for key in thousand_keys:  # keys hash to different slots
    pipe.get(key)
results = pipe.execute()  # N round trips instead of 1
The pipeline case is particularly insidious. A pipeline on single Redis batches all commands into a single network round trip. On a cluster, the client library silently splits the pipeline by slot, sending each group to its owning node separately. A 1,000-key pipeline that was one round trip on single Redis becomes potentially hundreds of round trips on a cluster, one per unique slot. The code looks the same. The performance is orders of magnitude worse.
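A common client-side mitigation is to make the splitting explicit: bucket keys by slot yourself so each bucket becomes one batched command to its owning node, bounding round trips by the number of distinct slots rather than hiding the fan-out. A sketch of the grouping logic (the CRC16/slot functions follow the Redis Cluster specification; how you dispatch each bucket depends on your client library):

```python
from collections import defaultdict

def crc16(data: bytes) -> int:
    """CRC16-CCITT (XMODEM), the variant used for Redis Cluster key slots."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else crc << 1
            crc &= 0xFFFF
    return crc

def hash_slot(key: str) -> int:
    """Slot for a key, honoring {hash tag} semantics."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384

def group_by_slot(keys):
    """Bucket keys by slot; each bucket can go to its owner in one MGET."""
    buckets = defaultdict(list)
    for key in keys:
        buckets[hash_slot(key)].append(key)
    return buckets

# Round trips are bounded by the number of distinct slots, not the key count:
buckets = group_by_slot(["{user:123}:session", "{user:123}:profile",
                         "user:456:session"])
```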
Hash Tags Create Hot Spots
The standard advice for cross-slot issues is to use hash tags: {user:123}:session and {user:123}:profile hash to the same slot because Redis only hashes the portion between the first pair of curly braces. This guarantees multi-key operations on the same user work correctly. It also guarantees that all of a user's data lands on a single node. If that user generates 10% of your traffic, one node handles 10% of your traffic while five nodes handle the remaining 90%. You have traded a correctness problem for a hot-spot problem, and the hot spot is harder to diagnose because it appears as random latency spikes on one node.
Rebalancing: The Latency Event You Scheduled
Redis Cluster rebalancing migrates hash slots from one node to another. This happens when you add nodes, remove nodes, or manually rebalance to fix uneven key distribution. During migration, every key in the moving slot must be individually transferred from the source node to the destination node. For each key, the source node serializes the value, sends it over the network, and the destination node deserializes and stores it.
While a slot is being migrated, reads for already-migrated keys in that slot incur ASK redirects (two round trips). Reads for not-yet-migrated keys work normally on the source node. Writes are more complex: new writes to the migrating slot go to the source node but must be forwarded to the destination. The entire migration window -- which scales linearly with the number of keys in the slot and their sizes -- is a period of degraded performance for every key in that slot.
A slot with 100,000 keys averaging 1KB each takes 15-45 seconds to migrate, depending on network speed and node load. During that window, every request for a key in that slot pays a latency penalty. If you have 16,384 slots and need to migrate 1,000 of them to rebalance after adding a node, you are looking at hours of degraded performance across roughly 6% of your keyspace. This is not a failure scenario. This is a planned operation that you initiated to improve your cluster.
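A back-of-envelope model makes the window concrete. The per-key cost below is an assumption, chosen to be consistent with the 15-45 second figure above:

```python
def migration_window_seconds(num_keys: int, per_key_ms: float) -> float:
    """Linear model: each key is serialized, shipped, and stored exactly once."""
    return num_keys * per_key_ms / 1000.0

# 100,000 keys at an assumed 0.15-0.45ms of serialize+network+store per key
# reproduces the 15-45 second window:
low = migration_window_seconds(100_000, 0.15)   # ~15 seconds
high = migration_window_seconds(100_000, 0.45)  # ~45 seconds
```

Because the model is linear, doubling either the key count or the per-key cost doubles the degraded-performance window.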
The Better Architecture: Single Node + L1
The architecture that outperforms both Redis Cluster and single Redis for read-heavy workloads is straightforward. Use a single Redis node as the write path and persistence layer. Use Cachee L1 as the read path. L1 is an in-process cache tier that runs inside your application's memory space. There is no network round trip for reads. There is no serialization or deserialization. There is no connection pooling. The read path is a hash table lookup in your own process memory, completing in 31 nanoseconds.
use cachee::L1Cache;
// Initialize L1 with Redis as the backing store
let l1 = L1Cache::builder()
.max_entries(1_000_000) // 1M entries in process memory
.backing_store("redis://single-node:6379")
.populate_on_miss(true) // L1 miss -> fetch from Redis -> populate L1
.invalidation_channel("cachee:invalidate") // Redis Pub/Sub for coherence
.build()
.await?;
// Read path: 31ns from L1, falls through to Redis on miss
let session = l1.get("user:123:session").await?;
// Write path: writes to Redis, L1 populated on next read or via Pub/Sub
l1.set("user:123:session", &session_data).await?;
// Multi-key reads: no cross-slot problem, all in local memory
let keys = vec!["user:123:session", "user:456:session", "user:789:session"];
let sessions = l1.mget(&keys).await?; // 155ns, not 4.7ms
The L1 tier absorbs 99%+ of reads because cache workloads follow power-law distributions: a small number of keys account for the majority of reads. The hottest 1 million keys fit comfortably in L1 (typically 1-4GB of process memory depending on value sizes). Cold misses fall through to the single Redis node, which handles them at its normal 0.12ms latency. The single Redis node also handles all writes and provides persistence via RDB or AOF.
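A rough sizing sketch shows where the 1-4GB figure comes from; the per-entry overhead below is an assumption for illustration:

```python
def l1_footprint_gb(entries: int, avg_key_bytes: int, avg_value_bytes: int,
                    per_entry_overhead: int = 100) -> float:
    """Rough L1 sizing; the 100-byte per-entry overhead is an assumption."""
    return entries * (avg_key_bytes + avg_value_bytes + per_entry_overhead) / 1e9

# 1M entries with ~50-byte keys and ~1-4KB values lands in the 1-4GB range:
small = l1_footprint_gb(1_000_000, 50, 850)    # 1.0 GB
large = l1_footprint_gb(1_000_000, 50, 3850)   # 4.0 GB
```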
Invalidation Without Complexity
The obvious concern with an L1 cache is coherence: how do you invalidate L1 entries when the backing data changes? Cachee L1 uses Redis Pub/Sub on the single node to broadcast invalidations. When a write occurs, the writer publishes the invalidated key on the cachee:invalidate channel. All L1 instances subscribed to that channel evict the key immediately. The invalidation propagation time is the Redis Pub/Sub delivery latency, typically under 0.5ms.
This is simpler than cluster topology management. There is no gossip protocol, no slot map, no MOVED redirects. There is one Pub/Sub channel. When a key changes, every L1 instance drops it. The next read for that key misses L1, fetches from Redis, and populates L1 again. Total coherence with zero operational complexity.
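The pattern can be sketched in a few lines of Python. This is an illustrative sketch of read-through caching with channel-based invalidation, not Cachee's implementation; the redis-py wiring in the trailing comment uses the standard Pub/Sub API:

```python
import threading

class LocalL1:
    """Minimal read-through L1 with Pub/Sub-style invalidation (illustrative)."""

    def __init__(self, backing_get):
        self._store = {}
        self._lock = threading.Lock()
        self._backing_get = backing_get  # e.g. redis_client.get

    def get(self, key):
        with self._lock:
            if key in self._store:
                return self._store[key]   # L1 hit: in-process memory
        value = self._backing_get(key)    # L1 miss: fall through to Redis
        with self._lock:
            self._store[key] = value      # populate on miss
        return value

    def on_invalidate(self, key):
        """Invoke for each message on the invalidation channel."""
        with self._lock:
            self._store.pop(key, None)    # next read refetches from backing

# Real wiring with redis-py would look roughly like (not executed here):
#   r = redis.Redis(host="single-node")
#   ps = r.pubsub()
#   ps.subscribe("cachee:invalidate")
#   for msg in ps.listen():
#       if msg["type"] == "message":
#           l1.on_invalidate(msg["data"].decode())
```

Writers publish the changed key on the channel after writing to Redis; every subscribed process evicts it and refetches on the next read.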
For workloads where even Pub/Sub invalidation is too complex, computation fingerprinting provides an alternative coherence model. Instead of invalidating by key, Cachee fingerprints the computation that produced the cached value. If the inputs change, the fingerprint changes, and the old cached value is never returned because no request will match the old fingerprint. There is no explicit invalidation at all. Coherence is a property of the fingerprint, not of an invalidation protocol.
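The idea can be illustrated with a hash over the computation's identity and its inputs. The exact construction below (SHA-256 over a canonical JSON encoding, with a version field standing in for the computation's code) is an assumption for illustration, not Cachee's actual fingerprint scheme:

```python
import hashlib
import json

def fingerprint(fn_name: str, fn_version: str, inputs: dict) -> str:
    """Derive a cache key from the computation's identity and inputs.

    If any input (or the computation itself) changes, the fingerprint
    changes, so stale values are simply never looked up -- no explicit
    invalidation message is needed.
    """
    blob = json.dumps(
        {"fn": fn_name, "version": fn_version, "inputs": inputs},
        sort_keys=True, separators=(",", ":"),
    ).encode()
    return hashlib.sha256(blob).hexdigest()

cache = {}
key = fingerprint("price_quote", "v1", {"sku": "A-42", "currency": "USD"})
cache[key] = 19.99  # the cached result is bound to these exact inputs
# A changed input produces a different key; the old entry is unreachable:
assert fingerprint("price_quote", "v1", {"sku": "A-42", "currency": "EUR"}) != key
```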
When to Use Each Architecture
The decision is not complicated. It depends on two variables: your dataset size relative to single-node memory, and your write volume relative to single-node capacity.
| Your Situation | Architecture | Why |
|---|---|---|
| Dataset < 50GB, writes < 100K/sec | Single Redis + Cachee L1 | 99% of workloads. L1 absorbs reads at 31ns. Single node handles writes and cold misses. |
| Dataset > 256GB | Redis Cluster + Cachee L1 | You need sharding for data volume. L1 still absorbs reads and hides cluster latency. |
| Writes > 100K/sec sustained | Redis Cluster | You need write sharding. This is the problem Cluster was designed for. |
| Multi-region, low-latency reads | Redis Cluster with geo-replicas | Physical proximity requires distribution. L1 on top further reduces latency. |
| Read-heavy, latency-sensitive | Single Redis + Cachee L1 | Cache bottleneck is network round trip. Eliminate it with L1. |
Notice that even in the cases where Redis Cluster is justified, adding Cachee L1 on top still improves the architecture. The L1 tier absorbs the read traffic that clustering was never designed to optimize. The cluster handles the writes and data distribution that it was designed for. Each component does what it is good at.
Beyond Performance: Attestation at the L1 Tier
Cachee L1 is not just a faster cache. Every entry in L1 carries a post-quantum attestation -- triple PQ signatures that prove the cached value is authentic and unmodified. Redis, whether single node or cluster, stores plaintext values with no integrity verification. An attacker who compromises Redis can modify cached values and every client will trust the modified data. Cachee L1 entries are cryptographically bound to the computation that produced them. A modified entry fails signature verification and is rejected, even if the backing store is compromised.
Migration Path: Cluster to L1 Hybrid
If you are currently running Redis Cluster and want to move to the single-node-plus-L1 architecture, the migration is low-risk because L1 is additive. You do not need to tear down your cluster on day one. The migration has three phases.
Phase 1: Add L1 alongside your existing cluster. Deploy Cachee L1 in your application processes with your existing Redis Cluster as the backing store. L1 starts absorbing reads immediately. Measure the L1 hit rate and the reduction in cluster traffic. This phase has zero risk because L1 is a transparent read-through cache. If L1 misses, the request falls through to your cluster exactly as before.
Phase 2: Measure and validate. After two weeks of L1 operation, you will have clear data: what percentage of reads hit L1, how much cluster traffic remains, and whether a single node can handle the residual load. If L1 absorbs 95% of reads and your write volume is under 50K/sec, a single node handles the remaining traffic comfortably.
Phase 3: Consolidate. Stand up a single Redis node with the same dataset. Point the L1 backing store at the single node. Drain traffic from the cluster. Decommission cluster nodes. Your operational complexity drops from managing a multi-node cluster with gossip, slots, and rebalancing to managing one Redis instance and one Pub/Sub channel.
# Phase 1: L1 with existing cluster (zero risk)
[backing_store]
type = "redis_cluster"
nodes = ["redis-1:6379", "redis-2:6379", "redis-3:6379"]
# Phase 3: L1 with single node (after validation)
[backing_store]
type = "redis"
url = "redis://single-node:6379"
invalidation_channel = "cachee:invalidate"
The operational savings alone justify the migration. A single Redis node requires monitoring one instance, one replication stream (to a standby), and one failover path. A Redis Cluster requires monitoring N nodes, N replication streams, the gossip protocol health, slot distribution balance, and migration events. Every additional node is additional surface area for failures, alerts, and on-call pages. Reducing from six nodes to one is not just a performance improvement. It is an operational complexity reduction that your team will feel every week.
Redis Cluster is a powerful tool for the specific problem it was designed to solve: horizontal write scaling and memory sharding beyond single-node limits. For the 99% of cache workloads that are read-heavy with datasets under 50GB, it adds latency at every percentile, wastes bandwidth on gossip, breaks multi-key operations, and introduces rebalancing as a regular latency event. The better architecture uses the simplest possible write path -- a single Redis node -- and the fastest possible read path -- Cachee L1 at 31ns. Your cluster is making your cache slower. The numbers prove it.
The Bottom Line
Redis Cluster adds 1-2ms per redirect, wastes 10-15% of bandwidth on gossip, breaks multi-key operations, and introduces rebalancing latency. For read-heavy workloads (99% of cache use cases), a single Redis node for writes plus Cachee L1 for reads delivers 31ns read latency -- four orders of magnitude faster than cluster reads -- with zero cross-slot restrictions, zero gossip overhead, and zero rebalancing events. Stop scaling horizontally when your problem is vertical.