
Redis Slows Above 1KB: Latency by Value Size

April 18, 2026 | 10 min read | Engineering

Every engineering team that uses Redis as a primary cache eventually discovers the same thing: latency is not constant. It feels constant when every value is a 64-byte session token or a small JSON blob. The network round-trip dominates, serialization is negligible, and Redis delivers consistent sub-millisecond GETs. The illusion breaks the moment your values grow.

We benchmarked Redis 7.2 GET latency across nine value sizes, from 64 bytes to 1 megabyte, in the same availability zone on AWS. The results are not surprising if you understand the protocol, but they are devastating if you assumed your cache latency would stay flat as your payloads grew.

This post presents the raw numbers, explains the physics behind them, identifies the exact size threshold where Redis transitions from "fast" to "problem," and shows what to do about it.

The Benchmark: Redis GET Latency by Value Size

Test setup: Redis 7.2.4 on an r7g.xlarge (4 vCPU, 32 GB) in us-east-1a. Client on a c7g.xlarge in the same AZ. Single persistent connection, pipelining disabled, 100,000 GETs per value size, P50 and P99 reported. All values are random bytes to prevent compression effects.

| Value Size | Redis P50 | Redis P99 | In-Process P50 | Slowdown vs In-Process |
|-----------:|----------:|----------:|---------------:|-----------------------:|
| 64 B | 0.29ms | 0.52ms | 31ns | 9,355x |
| 512 B | 0.31ms | 0.55ms | 31ns | 10,000x |
| 1 KB | 0.35ms | 0.61ms | 31ns | 11,290x |
| 4 KB | 0.50ms | 0.87ms | 31ns | 16,129x |
| 10 KB | 0.74ms | 1.30ms | 31ns | 23,871x |
| 50 KB | 1.52ms | 2.80ms | 31ns | 49,032x |
| 100 KB | 2.70ms | 5.10ms | 31ns | 87,097x |
| 500 KB | 6.80ms | 14.20ms | 31ns | 219,355x |
| 1 MB | 12.50ms | 28.00ms | 31ns | 403,226x |

Two things jump out of this table. First, the in-process column is constant at 31 nanoseconds regardless of value size. Second, once past the network floor, Redis latency scales nearly linearly with value size: going from 100 KB to 1 MB multiplies P50 by roughly 4.6x, and doubling from 500 KB to 1 MB nearly doubles it. At 1 MB, the P99 is 28 milliseconds -- longer than most database queries, longer than most HTTP requests to downstream services, and certainly longer than anything you would expect from a "fast" cache.

At 1 MB: Redis P99 is 28ms, in-process is 31ns -- a 403,226x slowdown factor.

Why This Happens: The Three Linear Costs

Redis latency for a GET operation is the sum of five components: key lookup, value serialization in Redis, TCP transfer, client-side deserialization, and the base network round-trip. The key lookup is O(1) and constant -- Redis's hash table is extremely fast regardless of value size. The network round-trip is constant for a given network path. That leaves three components that scale linearly with payload size.

Cost 1: Serialization in Redis (Server-Side)

When Redis processes a GET, it must serialize the value into the RESP (Redis Serialization Protocol) wire format. For a bulk string, this means writing the $ prefix, the length as an ASCII integer, a CRLF, the raw bytes, and another CRLF. The length prefix is trivial. The raw bytes are the cost. Redis calls addReplyBulkLen and addReplyBulkCBuffer, which copy the value into the output buffer. For a 64-byte value, this copy takes less than a microsecond. For a 1 MB value, it takes measurably longer because the CPU must touch every byte of the value to copy it into the network buffer.
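The RESP framing described above can be sketched in a few lines (a simplified Python illustration of the wire format, not Redis's actual C code):

```python
def encode_bulk_string(value: bytes) -> bytes:
    """Frame a value as a RESP bulk string: '$', length, CRLF, payload, CRLF."""
    return b"$" + str(len(value)).encode("ascii") + b"\r\n" + value + b"\r\n"

# The framing bytes are constant-size; copying the payload is the linear cost.
frame = encode_bulk_string(b"hello")   # b"$5\r\nhello\r\n"
```

The length prefix adds single-digit bytes of overhead regardless of value size, which is why framing itself is never the bottleneck -- the payload copy is.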

This is a memcpy at its core. memcpy throughput on modern hardware is roughly 20-40 GB/s for cache-resident data, but Redis values are often not in the CPU cache -- they live in main memory. At main memory bandwidth of ~50 GB/s on a Graviton3, copying 1 MB takes approximately 20 microseconds. Not the dominant cost, but not free.

Cost 2: TCP Transfer (Network)

This is the largest contributor to the scaling behavior. A 64-byte RESP response fits in a single TCP segment. The cost is one network round-trip: approximately 0.1-0.2ms within the same AZ. A 1 MB RESP response requires approximately 700 TCP segments (at 1460-byte MSS). Even with TCP window scaling and Nagle disabled, the kernel must fragment the response, compute checksums for each segment, and the NIC must transmit each one. The receiver must reassemble them.

The transfer time for 1 MB over a 10 Gbps network link is approximately 0.8ms for the raw bits. But that is the theoretical minimum. In practice, kernel buffer copies, interrupt coalescing, and TCP flow control add overhead. The measured transfer time for 1 MB within the same AZ consistently lands between 5-10ms depending on concurrent network traffic and instance type.
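The segment-count and wire-time arithmetic above is easy to reproduce (assumed values: a 1460-byte MSS and a 10 Gbps link, matching the setup described):

```python
MSS = 1460          # typical TCP payload bytes per segment (assumed)
LINK_GBPS = 10      # same-AZ link speed used in the estimate above

def segments(payload_bytes: int) -> int:
    # Segments needed to carry the payload, ignoring headers and retransmits
    return -(-payload_bytes // MSS)   # ceiling division

def wire_time_ms(payload_bytes: int) -> float:
    # Theoretical serialization delay for the raw bits on the link
    return payload_bytes * 8 / (LINK_GBPS * 1e9) * 1e3

seg_1mb = segments(1024 * 1024)       # 719 segments for a 1 MB response
wire_1mb = wire_time_ms(1024 * 1024)  # ~0.84ms theoretical minimum
```

The gap between the ~0.84ms theoretical floor and the 5-10ms measured transfer is the kernel and flow-control overhead described above.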

This cost is fundamental. No amount of Redis tuning, connection pooling, or pipelining can eliminate the time required to physically move bytes across a network. If the value is 1 MB, you are paying for 1 MB of network transfer. Period.

Cost 3: Deserialization on the Client (Application-Side)

Your Redis client library must parse the RESP response, allocate memory for the value, and copy the bytes into your application's memory space. For most Redis clients (Jedis, redis-py, ioredis, redis-rs), this involves at least one memory allocation proportional to value size and one copy. Some clients add additional overhead: redis-py decodes bytes to strings by default, Jedis copies into a Java byte array (triggering GC pressure for large values), and ioredis creates a Buffer object.
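On the client side, a minimal bulk-reply parser makes the allocation-plus-copy cost visible (a simplified sketch; real clients also handle partial reads, errors, and nil replies):

```python
def parse_bulk_reply(buf: bytes) -> bytes:
    """Parse a complete RESP bulk-string reply into a new bytes object."""
    assert buf[:1] == b"$", "not a bulk string reply"
    header_end = buf.index(b"\r\n")
    length = int(buf[1:header_end])        # the ASCII length prefix
    start = header_end + 2
    # This slice allocates `length` bytes and copies them: the linear client cost.
    return buf[start:start + length]
```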

For a 64-byte value, the allocation and copy are negligible. For a 1 MB value, you are allocating 1 MB of heap memory per GET. At 1,000 GETs per second of 1 MB values, your application is allocating 1 GB/s of heap memory purely for cache reads. In garbage-collected languages (Java, Go, Python, JavaScript), this creates significant GC pressure that further degrades P99 latency.

The Compound Effect

These three costs do not just add -- they compound under concurrency. When 50 concurrent connections each request a 100 KB value simultaneously, Redis's single-threaded event loop must serialize all 50 responses sequentially. The last response waits for all 49 preceding serializations to complete. This is why Redis P99 latency for large values is dramatically worse than P50: the tail is caused by queueing behind other large-value serializations.

The 1KB Inflection Point

Look at the benchmark table again. Between 64 bytes and 512 bytes, latency increases by only 0.02ms (from 0.29ms to 0.31ms). The serialization and transfer costs at these sizes are swallowed by the base network round-trip. Redis feels constant. Your monitoring dashboards show a flat line.

At 1 KB, the latency is 0.35ms -- a 21% increase over 64 bytes. This is the inflection point. Below 1 KB, the network round-trip dominates and value size is irrelevant. Above 1 KB, the three linear costs begin to dominate and latency scales with payload.

At 4 KB, latency has increased 72% over the 64-byte baseline. At 10 KB, it has increased 155%. The relationship is approximately linear beyond the inflection point: every additional 10 KB adds roughly 0.2ms of latency at P50 in the same AZ.
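Fitting a line through the post-inflection P50s from the benchmark table makes the linear model concrete (a rough least-squares sketch; the fitted intercept exceeds the 64-byte latency because sub-inflection points are excluded):

```python
# Post-inflection P50s from the benchmark table: (size in KB, latency in ms)
points = [(1, 0.35), (4, 0.50), (10, 0.74), (50, 1.52), (100, 2.70)]

# Ordinary least-squares fit: latency = base + slope * size_kb
n = len(points)
sx = sum(x for x, _ in points)
sy = sum(y for _, y in points)
sxx = sum(x * x for x, _ in points)
sxy = sum(x * y for x, y in points)

slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # ~0.023 ms per KB
base = (sy - slope * sx) / n                       # fixed cost, ~0.41 ms
```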

This inflection matters because many real-world cached values sit right at or above it. A JWT with custom claims is 1-2 KB. A serialized user profile with preferences is 2-8 KB. A GraphQL response is 5-50 KB. An API response with nested resources is 10-100 KB. A cached HTML fragment is 20-200 KB. All of these are above the inflection point. All of them pay the linear tax.

When Redis Is Fine

Despite the data above, Redis remains an excellent choice for specific workloads. The performance degradation does not matter if the workload does not trigger it.

Small Values, Any Frequency

Session tokens (64 bytes), feature flags (a few bytes), rate limit counters (8 bytes), boolean cache entries ("is this email verified?"), and similar small values perform well in Redis. At 64 bytes, the 0.29ms latency is acceptable for the vast majority of applications. If every value in your cache is under 512 bytes, the value-size scaling effect is invisible.

Large Values, Low Frequency

If you cache a 100 KB rendered report that gets accessed 10 times per minute, the 2.7ms latency per access is not going to break anything. The total Redis time for that key is 27ms per minute. Your system will not notice. The problem arises when large values are accessed at high frequency -- hundreds or thousands of times per second per key.

Shared State Across Processes

Redis's strength is shared mutable state. If five application servers must agree on the value of a key, Redis provides a single source of truth with atomic operations. In-process caches are per-process by definition. For use cases that genuinely require shared state (distributed locks, global counters, pub/sub), Redis's network overhead is the cost of coordination, and it is worth paying.

When Redis Breaks

Large Values at High Frequency

An API gateway caching GraphQL responses at 20 KB average, 5,000 requests per second. Each request hits the cache 2-3 times. That is 10,000-15,000 Redis GETs per second of 20 KB values. At 0.9ms per GET, you are consuming 9-13.5 seconds of cumulative Redis time per second. You need more than 9 Redis connections just to keep up, and that is before accounting for P99 spikes that will queue behind each other.

Multiple Cache Lookups Per Request

A typical API request might look up an auth token (1 KB), user profile (4 KB), user preferences (2 KB), feature flags (200 bytes), and a rate limit counter (8 bytes). Total: five Redis GETs. At small sizes, five GETs take 1.5ms total. But if the user profile grows to 8 KB (adding avatar URL, notification settings, payment methods), and you add a cached permission matrix at 3 KB, each request now accumulates 2.5-3ms of pure cache latency. That is 20-30% of a 10ms request budget spent talking to your "fast" cache.

Cross-AZ Deployments

The numbers above are same-AZ. Cross-AZ latency adds 0.5-1ms of base round-trip time. A 10 KB value that takes 0.74ms same-AZ takes 1.5-2.0ms cross-AZ. Many production deployments run application servers in multiple AZs for resilience, with Redis in a single AZ. Half of your fleet pays the cross-AZ penalty on every cache access.

The Hidden P99 Problem

Redis is single-threaded for command processing. A single 1 MB GET takes 12.5ms to serialize and transfer. During that 12.5ms, every other client on that Redis instance is blocked. If you have even a small percentage of large-value GETs mixed with small-value GETs, the large values create P99 spikes for the small values. One 1 MB value per second can push your 64-byte GET P99 from 0.52ms to 13ms. The large value poisons the tail latency of everything else on that Redis instance.
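A toy FIFO simulation of a single-threaded server reproduces this head-of-line effect (assumed numbers: 0.3ms small GETs arriving every millisecond, one 12.5ms large GET injected mid-stream):

```python
# FIFO model of a single-threaded server: 0.3ms small GETs arrive every 1ms,
# with one 12.5ms large GET (a 1 MB value) injected as request number 50.
SMALL_MS, LARGE_MS, GAP_MS = 0.3, 12.5, 1.0
arrivals = [(i * GAP_MS, LARGE_MS if i == 50 else SMALL_MS) for i in range(100)]

finish = 0.0
latencies = []
for arrival, service in arrivals:
    start = max(arrival, finish)        # wait for every request ahead in line
    finish = start + service
    latencies.append(finish - arrival)  # queueing delay plus own service time

small = [lat for (_, svc), lat in zip(arrivals, latencies) if svc == SMALL_MS]
# Small GETs queued behind the large one see ~11.8ms instead of 0.3ms.
```

In this model the backlog drains slowly because new small requests keep arriving while the queue clears, so more than a dozen small GETs inherit double-digit-millisecond latency from a single large value.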

Why In-Process Caching Is Constant at 31ns

An in-process cache eliminates all three linear costs simultaneously. There is no serialization because the value is already in the application's memory space -- a GET returns a pointer or reference, not a copy of bytes. There is no TCP transfer because there is no network -- the value lives in the same process. There is no deserialization because the value was never serialized in the first place.

The 31 nanosecond measurement is the time for a hash table lookup and a pointer dereference. The hash computation takes approximately 10-15ns (depending on key length). The table lookup takes 5-10ns (one or two cache-line reads). The pointer dereference takes 5-10ns. This cost is the same whether the value behind the pointer is 64 bytes or 64 megabytes, because the GET operation never touches the value bytes. It returns a reference.

This is not a minor optimization. At 100 KB, in-process is 87,097 times faster than Redis. At 1 MB, it is 403,226 times faster. These are not percentages. These are orders of magnitude.

An in-process read involves 0 bytes serialized per GET, 0 memory allocations, and 0 network round-trips.

Practical Migration: From Redis to In-Process L0

You do not need to replace Redis entirely. The most effective architecture is a tiered cache where in-process serves as L0 (hot data) and Redis serves as L1 (warm/shared data). Here is how to migrate incrementally.

Step 1: Identify Your Large-Value Keys

Run redis-cli --bigkeys to find the largest keys in your Redis instance. Alternatively, enable slowlog and look for GET commands with high latency -- they correlate with large values. Sort by access frequency times value size. The keys with the highest product of (frequency x size) are your highest-impact migration candidates.

# Find your biggest keys
redis-cli --bigkeys

# Check slowlog for large-value GETs
redis-cli slowlog get 100

# Monitor real-time latency per command
redis-cli --latency-history
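Once you have sizes and access rates, ranking candidates by frequency times size is a one-liner (the inventory below is hypothetical; in practice, sizes come from `--bigkeys` and rates from your own metrics):

```python
# Hypothetical inventory: (key, value size in bytes, GETs per second).
candidates = [
    ("session:abc", 64, 5_000),
    ("graphql:home", 20_000, 800),
    ("report:q3", 100_000, 2),
]

# Impact = bytes moved through Redis per second; migrate the biggest movers first.
ranked = sorted(candidates, key=lambda k: k[1] * k[2], reverse=True)
# "graphql:home" tops the list at 16 MB/s despite not being the largest key.
```

Note that the largest key is not the highest-impact key: a rarely read 100 KB report moves less data than a hot 20 KB response.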

Step 2: Add an In-Process L0 Layer

Install Cachee as a sidecar or library. Configure it as a read-through cache in front of Redis. On GET, Cachee checks L0 first (31ns). On L0 miss, it falls through to Redis, fetches the value, and promotes it to L0 for subsequent reads. No application code changes required -- Cachee speaks RESP, so your existing Redis client connects to localhost:6380 instead of your Redis endpoint.

# Install
brew tap h33ai-postquantum/tap
brew install cachee

# Start as L0 in front of Redis
cachee init --upstream redis://your-redis:6379
cachee start

# Point your app at Cachee (RESP-compatible)
# Change: redis://your-redis:6379
# To:     redis://localhost:6380
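The read-through flow can be sketched generically (this illustrates the pattern, not Cachee's implementation; `upstream_get` stands in for a Redis client call):

```python
class ReadThroughL0:
    """Pattern sketch: check local memory first, fall back to the upstream, promote on miss."""
    def __init__(self, upstream_get):
        self._l0 = {}
        self._upstream_get = upstream_get  # e.g. a Redis client's GET

    def get(self, key):
        if key in self._l0:
            return self._l0[key]           # L0 hit: no network round-trip
        value = self._upstream_get(key)    # L0 miss: pay the Redis latency once
        if value is not None:
            self._l0[key] = value          # promote for subsequent reads
        return value

upstream_calls = []
def fake_redis_get(key):
    upstream_calls.append(key)
    return b"cached-payload"

l0 = ReadThroughL0(fake_redis_get)
l0.get("user:1")                           # miss: goes to the upstream
l0.get("user:1")                           # hit: served from local memory
assert upstream_calls == ["user:1"]        # upstream consulted exactly once
```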

Step 3: Tune CacheeLFU Admission

CacheeLFU automatically promotes frequently accessed values to L0 and evicts cold entries. For large-value workloads, the admission scoring already factors access frequency against recency. A 100 KB value accessed 1,000 times per second stays in L0. A 100 KB value accessed once per hour evicts to Redis L1. No manual TTL tuning is required, but you can configure the L0 memory budget based on your instance size.
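CacheeLFU's exact scoring is internal to the product; a generic frequency-based admission policy illustrating the idea might look like this (illustrative sketch only):

```python
from collections import Counter

class FrequencyAdmission:
    """Generic sketch: a new key enters L0 only if hotter than the coldest resident."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.freq = Counter()   # access counts (real systems use a compact sketch)
        self.resident = {}

    def record_access(self, key):
        self.freq[key] += 1

    def offer(self, key, value):
        self.record_access(key)
        if key in self.resident or len(self.resident) < self.capacity:
            self.resident[key] = value
            return True
        victim = min(self.resident, key=lambda k: self.freq[k])
        if self.freq[key] > self.freq[victim]:
            del self.resident[victim]       # evict the colder key to L1
            self.resident[key] = value
            return True
        return False                        # cold key stays in Redis

adm = FrequencyAdmission(capacity=1)
adm.offer("hot", b"v1")                     # admitted while L0 has room
for _ in range(5):
    adm.record_access("hot")                # "hot" accrues frequency
admitted = adm.offer("cold", b"v2")         # rejected: colder than the resident
```

The design choice matters for large values: admitting every key on access would let a one-off 100 KB read evict hundreds of hot small entries, so admission must be earned by frequency.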

Step 4: Monitor the Impact

After migration, check three metrics: L0 hit rate (target 85%+ for hot-path keys), P99 latency reduction (expect 10-100x improvement for large-value keys), and Redis load reduction (expect 40-70% fewer GETs hitting Redis). Cachee exposes these via cachee status and a Prometheus-compatible metrics endpoint.

Step 5: Expand Coverage

Start with the top 10 large-value keys. Measure the impact. Then expand to the top 50, then 200. Each expansion reduces Redis load further and improves tail latency. For most applications, 200-500 hot keys in L0 cover 80-90% of cache traffic by volume.

The Numbers That Matter

If your cached values are under 512 bytes and you access them infrequently, Redis is fine. Do not over-optimize. But if any of the following are true, the value-size latency scaling is actively degrading your application:

- You serve values above the 1 KB inflection point at high frequency
- A single request performs multiple cache lookups
- Application servers reach Redis across availability zones
- Occasional large values share an instance with latency-sensitive small values

In each of these cases, adding an in-process L0 cache in front of Redis eliminates the serialization, transfer, and deserialization costs for your hottest keys. The values that dominate your latency budget -- the large, frequently accessed ones -- move from milliseconds to nanoseconds. Redis continues to serve as durable shared state. It just stops being the bottleneck.

The Bottom Line

Redis GET latency is not constant. It scales linearly with value size above 1 KB. At 100 KB, you are paying 2.7ms per GET -- nearly 10x the baseline. At 1 MB, you are paying 12.5ms. In-process caching delivers 31 nanoseconds regardless of value size because there is no serialization, no network transfer, and no deserialization. The value is a pointer dereference in local memory. For large-value, high-frequency workloads, this is not an optimization. It is a category change.

Stop paying the linear tax on large values. 31ns reads at any payload size.

Install Cachee | Large-Value Caching Guide