
How to Speed Up Redis: 7 Production Fixes

April 27, 2026 | 12 min read | Engineering

Redis is fast. Everyone knows this. It is an in-memory data store that operates in microseconds. But "Redis is fast" is a statement about the engine, not about your deployment. In production, Redis latency is a function of value sizes, connection management, access patterns, and network topology. The engine might be capable of sub-millisecond responses, but your application is seeing 2-5ms P99 latencies and you do not know why.

This is not a theoretical problem. We have profiled dozens of production Redis deployments and found the same seven issues in almost every one. These are not obscure configuration bugs. They are structural patterns in how applications use Redis that silently degrade performance from microseconds to milliseconds. Each fix below is ranked by typical impact, starting with the highest-leverage change.

The good news is that six of these seven fixes are free. They require code changes but no additional infrastructure. The seventh -- L1 tiering -- requires a small architectural change, but it delivers the largest improvement by far: 10-20x latency reduction on hot reads.

300 us -- typical Redis GET latency
31 ns -- in-process L1 latency
10-20x -- latency reduction (Fix #4)

Fix 1: Eliminate Large Keys (Impact: High)

The Problem

Every byte you store in Redis must be serialized on write, transmitted over TCP, deserialized on read, and transmitted back over TCP. For a 64-byte session token, this overhead is negligible. For a 4KB JSON object, it is the dominant cost. For a 50KB cached API response, it is catastrophic.

The serialization tax scales linearly with value size. A 64-byte value takes approximately 0.3 microseconds to serialize with MessagePack. A 1KB value takes 5 microseconds. A 10KB value takes 48 microseconds. That is before the network round-trip. On a typical cross-AZ connection with 150 microseconds of base latency, a 10KB value adds another 80 microseconds of transfer time on a 1 Gbps link. Your 300-microsecond Redis GET just became a 478-microsecond operation, and payload handling -- serialization plus transfer time, not Redis itself -- accounts for the added latency.

The Measurement

Run redis-cli --bigkeys to find your largest keys. Then use DEBUG OBJECT <key> to see the serialized size of individual keys. Any value over 1KB deserves scrutiny. Any value over 10KB is almost certainly hurting your P99 latency. Run redis-cli --stat during peak traffic and watch the instantaneous throughput. If you see it drop below 100K ops/sec on a modern instance, large values are likely the cause.

The Fix

Break large values into smaller pieces or store only the fields you actually read. If you are caching a 12KB user profile JSON object but only reading the display name and avatar URL on most requests, store those two fields as separate hash fields. Use HGET or HMGET to fetch only what you need. If you must store large objects, compress them with LZ4 or Zstandard before writing. LZ4 compresses typical JSON 2-4x with negligible CPU cost (200ns for 4KB payloads). Zstandard compresses 4-8x but costs 5-10 microseconds. Choose based on your latency budget.
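A sketch of both approaches -- hot fields stored in a hash, and LZ4 compression for values that must stay whole -- assuming redis-py and the lz4 package (key and field names here are illustrative, not from the deployment above):

import json
import lz4.frame
import redis

r = redis.Redis()

# Option A: store only the hot fields as a hash and read them selectively.
r.hset("user:1:profile", mapping={"display_name": "Ada", "avatar_url": "https://cdn.example/a.png"})
display_name, avatar_url = r.hmget("user:1:profile", ["display_name", "avatar_url"])  # bytes values

# Option B: if the full object must be cached, compress it before writing.
profile = {"display_name": "Ada", "avatar_url": "https://cdn.example/a.png", "bio": "long text"}
r.set("user:1:profile:full", lz4.frame.compress(json.dumps(profile).encode()))

# On read, decompress and parse.
raw = r.get("user:1:profile:full")
if raw is not None:
    profile = json.loads(lz4.frame.decompress(raw))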

Expected Improvement

Eliminating values over 1KB typically reduces P99 latency by 30-50%. In one production deployment, replacing an 8KB cached user object with a 200-byte hash containing only the hot fields reduced median Redis latency from 1.2ms to 0.38ms.

Fix 2: Use MGET and Pipelines Instead of Individual GETs (Impact: High)

The Problem

Every Redis command incurs a network round-trip. On a local connection, that round-trip is 50-100 microseconds. On a cross-AZ connection, it is 150-500 microseconds. If your application sends 10 individual GET commands to render a page, you pay 10 round-trips. On a cross-AZ link at 200 microseconds per round-trip, that is 2 milliseconds of pure network latency before Redis even processes a single command. Redis processes each command in under 1 microsecond. The network is 200x more expensive than the computation.

The Measurement

Instrument your Redis client to count the number of round-trips per application request. Most Redis clients have hooks or middleware for this. If your average page load triggers more than 3 individual Redis commands, you are leaving performance on the table. Monitor with redis-cli monitor for 30 seconds during peak traffic (be careful, this impacts performance) and count how many commands arrive in rapid sequential bursts from the same client -- those bursts are pipeline candidates.

The Fix

Use MGET for fetching multiple keys of the same type. Use pipelines for fetching multiple keys of different types or for mixed read/write operations. A pipeline sends all commands to Redis in a single write, then reads all responses in a single read. The total latency is one round-trip plus the processing time for all commands, instead of N round-trips.

# Before: 5 round-trips (5 x 200us = 1000us network latency)
r.get("user:1:name")
r.get("user:1:email")
r.get("user:1:avatar")
r.get("user:1:role")
r.get("user:1:prefs")

# After: 1 round-trip (200us network latency)
r.mget("user:1:name", "user:1:email", "user:1:avatar",
       "user:1:role", "user:1:prefs")

Expected Improvement

Replacing N individual GETs with a single MGET or pipeline reduces network latency from N * RTT to 1 * RTT. For 5 sequential GETs on a 200-microsecond link, this saves 800 microseconds per application request. On a 10-GET sequence, it saves 1.8 milliseconds. We have seen page-level P50 latency drop from 4.2ms to 1.1ms purely from pipeline adoption, with zero changes to Redis configuration or infrastructure.

Fix 3: Connection Pooling (Impact: Medium-High)

The Problem

Creating a new TCP connection to Redis costs 1-3 milliseconds. This includes the TCP three-way handshake (1 RTT), TLS negotiation if enabled (1-2 additional RTTs), and the Redis AUTH command (1 RTT). If your application creates a new connection for each request, you are paying this cost on every request. Even if your application reuses connections within a request, you may be creating new connections on every cold start, deployment, or connection timeout.

The more insidious problem is connection churn under load. If your connection pool is too small, requests queue waiting for a free connection. If it is too large, Redis spends CPU time managing thousands of file descriptors, and context switching between client connections adds latency to every operation. Redis executes commands on a single thread (the I/O threads added in Redis 6 only handle socket reads and writes), so connection management overhead directly reduces the time available for processing commands.

The Measurement

Run INFO clients and check connected_clients and total_connections_received. If total_connections_received is growing rapidly (more than a few per minute during steady-state traffic), you have connection churn. If connected_clients exceeds 500, you likely have pool sizing issues. Check blocked_clients -- any non-zero value means clients are waiting for resources, which is a red flag.

The Fix

Use a connection pool with a fixed maximum size. For most applications, 10-50 connections per application instance is sufficient. Set idle timeout to 300 seconds (not lower -- reconnection costs more than maintaining an idle connection). Enable TCP keepalive with a 60-second interval to detect dead connections before they cause timeouts. If using Redis 6 or later, enable I/O threads (io-threads 4) to parallelize connection handling across multiple cores. This does not make Redis multi-threaded for command processing, but it offloads read/write syscalls to separate threads, which reduces latency under high connection counts.
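
A hedged sketch of that configuration with redis-py; the endpoint is hypothetical and the numbers mirror the guidance above, so tune them to your own measured concurrency:

import redis

# One fixed-size pool shared by the whole process, sized to measured concurrency.
pool = redis.ConnectionPool(
    host="redis.internal", port=6379,   # hypothetical endpoint
    max_connections=30,                 # match actual concurrency, not a worst-case guess
    socket_keepalive=True,              # detect dead connections before they cause timeouts
    socket_connect_timeout=1.0,
    socket_timeout=0.5,
    health_check_interval=60,           # ping connections idle longer than 60s before reuse
)
r = redis.Redis(connection_pool=pool)

# Server side (redis.conf), Redis 6 or later:
#   io-threads 4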

Expected Improvement

Proper connection pooling eliminates the 1-3ms connection setup cost on every request. For applications that were creating connections per-request, this alone can reduce P99 from 5ms to under 1ms. Even for applications with existing pools, tuning pool size to match actual concurrency reduces connection management overhead inside Redis by 10-30%. We measured a 22% throughput increase on a Redis 7 instance by reducing pool size from 200 to 30 connections (matching actual concurrency) and enabling 4 I/O threads.

Fix 4: Move Hot Reads to In-Process L1 (Impact: Highest)

The Problem

This is the single highest-impact fix and the one that most teams overlook. Redis is a network service. Every GET requires a round-trip over TCP, regardless of how fast Redis processes the command. On a local connection, this minimum latency is 50-100 microseconds. On a cross-AZ connection, it is 150-500 microseconds. No amount of Redis tuning can reduce this below the network floor.

But most applications have a small set of keys that account for a disproportionate share of reads. Session tokens, feature flags, user permissions, rate limit counters -- these are read on every request and change infrequently. They are perfect candidates for in-process caching. An in-process hash map lookup takes 31 nanoseconds. That is 10,000x faster than a cross-AZ Redis GET.

The Measurement

Use redis-cli --hotkeys (requires maxmemory-policy set to an LFU variant) to identify your most frequently accessed keys. Alternatively, sample your application logs to find which keys are read most often. In our experience, 5-15% of keys account for 80-95% of reads. These are your L1 candidates. Calculate the percentage of your total Redis operations that hit these hot keys -- that percentage is approximately the fraction of your Redis traffic that L1 can absorb.

Before and After: The L1 Tiering Effect

In a production deployment with 45,000 Redis GETs per second, we identified 2,400 unique hot keys (session data, feature flags, config) that accounted for 87% of all reads. Before L1 tiering, median latency was 310 microseconds and P99 was 1.8ms. After adding an in-process L1 cache for those 2,400 keys: median latency dropped to 31 nanoseconds for L1 hits (87% of reads) and remained at 310 microseconds for L1 misses (13% of reads). Weighted P50 fell to 67 microseconds. P99 fell from 1.8ms to 0.34ms. Redis CPU utilization dropped from 73% to 11% because 87% of reads never reached Redis at all.

The Fix

Add an in-process cache tier in front of Redis. On every read, check the L1 cache first. If the key is present and not expired, return the value without touching Redis. On L1 miss, read from Redis and populate L1. Use a short TTL (5-60 seconds depending on consistency requirements) to bound staleness. For keys that change rarely (feature flags, configuration), longer TTLs are safe. For keys that change per-request (rate limit counters), do not cache in L1.

# Without L1: every read hits Redis (300us)
value = redis.get("session:abc123")

# With L1: 87% of reads hit in-process cache (31ns)
value = l1_cache.get("session:abc123")
if value is None:
    value = redis.get("session:abc123")  # 300us, only 13% of reads
    l1_cache.set("session:abc123", value, ttl=30)

Expected Improvement

The improvement is proportional to your L1 hit rate. At 80% L1 hit rate with 300-microsecond Redis latency and 31-nanosecond L1 latency, your weighted average latency drops from 300 microseconds to 60 microseconds -- a 5x reduction. At 95% L1 hit rate, it drops to 15 microseconds -- a 20x reduction. The Redis server's CPU utilization drops by the same percentage because those reads never reach it. This is the only fix on this list that reduces both client-side latency and server-side load simultaneously.

L1 Hit Rate | Weighted Avg Latency | Reduction vs No L1 | Redis Load Reduction
0% (no L1)  | 300 us               | 1x (baseline)      | 0%
50%         | 150 us               | 2x                 | 50%
70%         | 90 us                | 3.3x               | 70%
80%         | 60 us                | 5x                 | 80%
90%         | 30 us                | 10x                | 90%
95%         | 15 us                | 20x                | 95%
99%         | 3 us                 | 100x               | 99%
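
The table is just a weighted average of the two latencies; a quick sanity check in Python:

# Weighted average latency as a function of L1 hit rate.
REDIS_US = 300.0   # Redis GET latency (microseconds)
L1_US = 0.031      # in-process lookup (31 ns)

def weighted_latency_us(hit_rate):
    return hit_rate * L1_US + (1 - hit_rate) * REDIS_US

for h in (0.5, 0.8, 0.95, 0.99):
    print(f"{h:.0%} hit rate -> {weighted_latency_us(h):.0f} us")
# 50% -> 150 us, 80% -> 60 us, 95% -> 15 us, 99% -> 3 us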

Fix 5: Enable Lazy Expiry Instead of Active Expiry (Impact: Medium)

The Problem

Redis expires keys in two ways: lazily (when a client accesses an expired key, Redis deletes it and returns nil) and actively (Redis periodically scans the keyspace for expired keys and deletes them proactively). The active expiry process runs 10 times per second by default. On each cycle, it samples 20 random keys from the set of keys with TTLs set. If more than 25% of sampled keys are expired, it immediately repeats the cycle. This can cascade: if you have a large number of keys expiring simultaneously (common with cache stampede patterns), the active expiry loop can run continuously for hundreds of milliseconds, blocking all other Redis operations.

The Measurement

Check INFO stats and look at expired_keys and evicted_keys. If expired_keys is growing by thousands per second, active expiry is doing significant work. Use redis-cli --latency-history to look for periodic latency spikes. If you see spikes every 100 milliseconds (matching the 10 Hz active expiry cycle), active expiry is likely the cause. Also check SLOWLOG GET 10 -- if you see commands taking 10-100ms when they normally take microseconds, active expiry is blocking the event loop.

The Fix

In Redis 4.0 and later, configure lazyfree-lazy-expire yes to move the actual memory deallocation to a background thread. The key is still logically deleted during the active expiry scan, but the memory free happens asynchronously, which reduces the time the main thread spends on expiry. Additionally, if your workload can tolerate slightly stale expired keys, you can reduce the active expiry aggressiveness by setting hz to a lower value (default is 10; try 5 or even 2). This reduces how often Redis scans for expired keys but means expired keys may linger slightly longer before being reclaimed.

The most effective fix is to avoid mass expiry entirely. Instead of setting all cache keys to the same TTL, add a random jitter of 10-20% to each TTL. This spreads expirations over time and prevents the cascade effect where active expiry enters its tight loop. A TTL of 300 seconds becomes a TTL of 270-330 seconds, randomly chosen per key.
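
A sketch of per-write TTL jitter with redis-py (the key name and value here are illustrative):

import random
import redis

r = redis.Redis()

BASE_TTL = 300   # nominal TTL in seconds
JITTER = 0.10    # +/-10%: a 300s TTL becomes 270-330s

def jittered_ttl(base=BASE_TTL, jitter=JITTER):
    return int(base * random.uniform(1 - jitter, 1 + jitter))

# Each write gets its own expiry, so keys written in the same burst
# do not all expire in the same active-expiry cycle.
r.set("cache:search:abc123", b"serialized-response", ex=jittered_ttl())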

Expected Improvement

For workloads with heavy TTL usage (caching, sessions, rate limiting), enabling lazy free and adding TTL jitter typically eliminates the periodic P99 spikes caused by active expiry. In one deployment, P99 dropped from 8.2ms to 1.1ms after adding a 15% random TTL jitter to all cache keys. The median latency was unchanged because the median was not affected by the periodic expiry spikes -- but the tail was dramatically improved.

Fix 6: Shard by Access Pattern, Not by Key Hash (Impact: Medium)

The Problem

Redis Cluster shards keys by CRC16 hash. This distributes keys evenly across shards by name, but it does not distribute them evenly by access frequency. If keys "session:hot-user-1" and "session:hot-user-2" hash to the same shard, that shard handles disproportionate traffic while other shards idle. Key distribution is not the same as load distribution. A perfectly balanced cluster by key count can be wildly imbalanced by operations per second.

The problem is worse with hash tags. If you use hash tags like {user:123}:session and {user:123}:prefs to co-locate related keys on the same shard, you are intentionally concentrating all operations for a user on a single shard. For most users, this is fine. For your heaviest users (enterprise accounts, bots, power users), this creates a hotspot that no amount of horizontal scaling can fix because all keys for that user must be on the same shard.

The Measurement

Use redis-cli --cluster info to see key distribution across shards. Then use INFO commandstats on each shard to compare operations per second. If one shard is handling 3-5x the operations of the average shard, you have a hotspot. Use --hotkeys on the hot shard to identify which keys are causing the imbalance. Often, a single key or a small cluster of related keys accounts for the majority of the hot shard's traffic.

The Fix

Move the hottest keys off the hot shard by changing their hash tag or key naming scheme. If a few keys are responsible for the hotspot, consider moving them to a dedicated Redis instance (or to in-process L1, which eliminates the problem entirely). For more systematic load balancing, use read replicas for read-heavy shards and direct read traffic to replicas using READONLY mode. Note that replicas add replication lag (typically 0.1-1ms), so this trades load balancing for a small consistency window.

The deeper fix is to separate your cache topology from your data model. Instead of sharding by key name (which ties data model changes to infrastructure changes), shard by access pattern: put high-frequency low-latency keys on one cluster, low-frequency large-value keys on another, and ephemeral rate-limit keys on a third. Each cluster can be sized and tuned independently for its specific workload.
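
One way to express that separation in application code -- a sketch with redis-py, where the hostnames and key prefixes are hypothetical stand-ins for your own topology:

import redis

# Three independently sized deployments, one per access pattern.
hot_small = redis.Redis(host="redis-hot.internal")        # sessions, flags, permissions
bulk_large = redis.Redis(host="redis-bulk.internal")      # large, infrequently read blobs
ratelimit = redis.Redis(host="redis-ratelimit.internal")  # ephemeral counters

def client_for(key: str) -> redis.Redis:
    # Route by key prefix (i.e. by access pattern), not by a hash of the full key.
    if key.startswith(("session:", "flag:", "perm:")):
        return hot_small
    if key.startswith("ratelimit:"):
        return ratelimit
    return bulk_large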

Expected Improvement

Eliminating hot shards typically improves P99 on the affected shard by 40-60%. For the overall system, the improvement depends on how much traffic hit the hot shard. In a 6-shard cluster where one shard handled 35% of all operations (vs the expected 17%), rebalancing reduced system-wide P99 from 2.4ms to 1.1ms because the hot shard was the P99-determining factor for the entire cluster.

Fix 7: Monitor with SLOWLOG and Fix the Top Offenders (Impact: Ongoing)

The Problem

Redis has a built-in slow query log that records every command taking longer than a configurable threshold. The default threshold is 10 milliseconds (10,000 microseconds), which is far too high to catch most performance issues. A command taking 1 millisecond is 1000x slower than a typical GET, but it will not appear in the slow log at the default threshold. You are blind to a wide range of performance problems.

The Measurement

Set the slow log threshold to 100 microseconds: CONFIG SET slowlog-log-slower-than 100. Set the log size to hold at least 1000 entries: CONFIG SET slowlog-max-len 1000. Then check the log periodically with SLOWLOG GET 20. You will almost certainly find commands you did not expect. Common offenders include KEYS * (scans the entire keyspace -- never use in production), SMEMBERS on large sets, HGETALL on large hashes, SORT operations, LRANGE 0 -1 on long lists, and Lua scripts that do too much work in a single EVAL call.

The Fix

This is not a single fix but an ongoing discipline. Review the slow log weekly. For each slow command, ask four questions. Is the command necessary (can you replace HGETALL with HMGET for the specific fields you need)? Is the data structure appropriate (if you are calling SMEMBERS on a 10,000-member set, maybe it should be a sorted set queried with ZRANGEBYSCORE on a range)? Can the operation be batched or pipelined (are you calling SISMEMBER 50 times in a loop instead of SMISMEMBER once)? And should the key exist in Redis at all (large, rarely-accessed data might belong in the database, not the cache)?

# Set slow log to catch anything over 100us
CONFIG SET slowlog-log-slower-than 100
CONFIG SET slowlog-max-len 1000

# Check the offenders
SLOWLOG GET 20

# Example output:
# 1) (integer) 142
#    (integer) 1745712000
#    (integer) 8234         <-- 8.2ms!
#    1) "HGETALL"
#    2) "user:9823:full_profile"
#
# Fix: replace HGETALL with HMGET for the 3 fields you need
# Result: 8.2ms -> 0.12ms
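
The corresponding application-side change for that slow log entry, sketched with redis-py (the field names are illustrative):

import redis

r = redis.Redis()

# Before: pulls the entire multi-kilobyte hash across the wire on every request.
profile = r.hgetall("user:9823:full_profile")
name, avatar, role = profile[b"name"], profile[b"avatar"], profile[b"role"]

# After: fetch only the fields this code path actually uses.
name, avatar, role = r.hmget("user:9823:full_profile", ["name", "avatar", "role"])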

Expected Improvement

Each slow command fix has a localized impact, but the cumulative effect is significant. Teams that adopt weekly slow log review typically see their P99 decrease by 5-15% per month for the first 3-6 months as they systematically eliminate the long tail of slow operations. The discipline also prevents regression: new code that introduces a slow Redis pattern shows up in the slow log within days, not after a production incident.

Putting It All Together

The seven fixes above are ordered by individual impact, but the compounding effect of applying multiple fixes is greater than the sum of their parts. Eliminating large keys (Fix 1) makes pipelines more effective (Fix 2) because smaller payloads fit more efficiently into pipeline batches. Connection pooling (Fix 3) reduces the baseline latency, which makes the L1 hit/miss delta even more dramatic (Fix 4). Lazy expiry (Fix 5) eliminates the periodic spikes that inflate P99, and sharding by access pattern (Fix 6) ensures that no single shard is the bottleneck. Slow log monitoring (Fix 7) catches regressions in all of the above.

Fix                         | Effort        | P99 Impact              | Throughput Impact
1. Eliminate large keys     | Medium        | 30-50% reduction        | 20-40% increase
2. MGET / pipelines         | Low           | 40-60% reduction        | 2-5x increase
3. Connection pooling       | Low           | 20-40% reduction        | 10-30% increase
4. In-process L1 cache      | Medium        | 10-20x reduction        | 5-20x increase
5. Lazy expiry + TTL jitter | Low           | 50-80% spike reduction  | Minimal
6. Shard by access pattern  | High          | 40-60% on hot shard     | Proportional
7. SLOWLOG monitoring       | Low (ongoing) | 5-15% per month         | Varies

If you can only do one thing, do Fix 4. Adding an in-process L1 cache in front of Redis is the single highest-leverage change you can make. It reduces latency by 10-20x on hot reads, reduces Redis server load proportionally, and works regardless of your Redis deployment topology. The other six fixes optimize how you use Redis. Fix 4 eliminates the need to use Redis for your hottest data entirely.

The Bottom Line

Redis is not slow. Your usage of Redis is slow. Large values add serialization tax. Sequential GETs waste network round-trips. Connection churn wastes handshake time. Active expiry causes periodic spikes. Hot shards create bottlenecks. And through all of it, every single read pays the network latency floor -- 50 to 500 microseconds that no Redis optimization can eliminate. The fix for the network floor is not a faster network. It is an in-process L1 cache at 31 nanoseconds that absorbs your hottest reads before they ever touch the wire.

In-process L1 caching at 31 nanoseconds. Drop your Redis P99 by 10-20x.
