
Cache Serialization Cost: Why Large Values Lag

April 18, 2026 | 12 min read | Engineering

You added caching to speed up your application. It worked for session tokens and feature flags. Then you cached a 100 KB API response and the latency barely improved. You assumed the cache was cold, warmed it up, and measured again. Still slow. The value is in Redis. Redis is fast. Why is your cached read taking over a millisecond?

Because the cache read is not just a read. It is six distinct operations, and at least three of them scale linearly with the size of the value. Serialization -- the process of converting data structures to byte streams and back -- is the hidden tax on every cache operation. For small values, the tax is invisible. For large values, it dominates your latency budget.

This article dissects every step of a Redis GET, benchmarks each step in isolation, compares serialization formats, and explains why in-process caching eliminates the problem entirely.

The Six Steps Inside Every Redis GET

When your application calls redis.get("user:profile:12345"), the following operations execute in sequence. Each one has a cost. For a 64-byte session token, the total cost is dominated by the network round trip. For a 100 KB API response, the cost profile changes dramatically.

Step 1: Client Serializes the Key to RESP

The Redis Serialization Protocol (RESP) is a text-based protocol. Your client library converts the GET command and key into RESP wire format. For a simple key like "user:profile:12345", this produces:

*2\r\n$3\r\nGET\r\n$18\r\nuser:profile:12345\r\n

This step is fast and constant-size regardless of value size because you are only serializing the key and command. Cost: 0.1-0.5 microseconds. This step is not the problem.
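As a sketch, the encoding can be reproduced in a few lines of Python. This is an illustrative encoder, not any particular client library's implementation:

```python
def encode_resp_command(*parts: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    out = [f"*{len(parts)}\r\n".encode()]
    for part in parts:
        data = part.encode()
        # Each element: $<byte length>\r\n<data>\r\n
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)

wire = encode_resp_command("GET", "user:profile:12345")
print(wire)  # b'*2\r\n$3\r\nGET\r\n$18\r\nuser:profile:12345\r\n'
```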

Step 2: TCP Send

The serialized RESP command is sent over a TCP connection to the Redis server. If connection pooling is configured (it should be), this reuses an existing connection. The command is small (typically under 100 bytes), so it fits in a single TCP segment. Cost: variable, dominated by kernel syscall overhead (~1-5 microseconds for the write() syscall) plus network latency (50-300 microseconds same-AZ, 500-2000 microseconds cross-AZ). For this analysis, we will use 150 microseconds as a representative same-AZ network latency.

Step 3: Redis Finds the Value

Redis receives the command, parses it, and performs a hash table lookup. Redis uses a chained hash table with incremental rehashing. The lookup is O(1) amortized. For a database with 1 million keys, the lookup takes 0.5-2 microseconds, including hash computation and pointer chasing. This step does not depend on value size. Redis finds the pointer to the value; it does not touch the value data yet.

Step 4: Redis Serializes the Value to RESP

This is where value size starts to matter. Redis must convert the stored value into RESP wire format for transmission. For a bulk string response, RESP produces:

$[length]\r\n[data bytes]\r\n

The length prefix is trivial. The data bytes are a memcpy from Redis's internal SDS (Simple Dynamic String) buffer into the output buffer. For a 64-byte value, this memcpy is negligible -- a few cache lines, a few nanoseconds. For a 100 KB value, Redis must copy 100,000 bytes into the output buffer. On modern hardware, memcpy throughput is approximately 10-20 GB/s, so 100 KB copies in about 5-10 microseconds.

But the real cost is not the memcpy itself. It is the interaction with Redis's event loop. Redis is single-threaded. While it is writing 100 KB into the output buffer and flushing it to the TCP socket, it is not processing any other commands. For a 1 MB value, the serialization and write phase blocks the event loop for 50-100 microseconds. Every other client waiting for a response from this Redis instance is stalled for that duration.

Step 5: TCP Receive

The RESP-encoded response travels back over the network. For a 64-byte value, this is a single TCP segment. For a 100 KB value, this is approximately 70 TCP segments (at a typical MSS of 1,460 bytes). The kernel must reassemble these segments, copy them from kernel buffers to userspace, and deliver them to the client. Cost: network latency (150 microseconds) plus transfer time. Transfer time for 100 KB at 10 Gbps: ~80 microseconds. At 1 Gbps: ~800 microseconds. This step scales linearly with value size.
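Those figures follow from simple arithmetic; a quick sanity check of the segment count (at an MSS of 1,460 bytes) and the raw wire-transfer time, ignoring protocol overhead:

```python
import math

def transfer_time_us(payload_bytes: int, link_gbps: float) -> float:
    """Raw wire-transfer time in microseconds (payload bits only, no headers)."""
    return payload_bytes * 8 * 1e6 / (link_gbps * 1e9)

segments = math.ceil(100_000 / 1460)  # 69 segments -- ~70 in round numbers
print(segments, transfer_time_us(100_000, 10), transfer_time_us(100_000, 1))
# 69 80.0 800.0
```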

Step 6: Client Deserializes the RESP Response

The client library receives the raw RESP bytes and must parse them back into a usable data structure. For a simple string value, this is a buffer slice -- nearly free. But most applications do not store raw strings. They store serialized objects: JSON, MessagePack, Protobuf, or application-specific binary formats. The client library first parses the RESP envelope (trivial), then the application code deserializes the payload from its chosen format into an in-memory object.

This is where the real serialization cost lives. Deserializing 100 KB of JSON involves parsing every byte, allocating strings, building hash maps for objects, and converting numbers from text to native types. This is CPU-intensive work that scales linearly -- and sometimes super-linearly -- with value size.

6 steps per Redis GET · 3 of them scale with value size · 0 such steps in-process

Benchmarking Serialization Formats at Scale

We benchmarked four common serialization formats at multiple value sizes. The benchmark measures only the serialization and deserialization time -- no network, no Redis, no I/O. Pure CPU cost of converting an in-memory data structure (a nested user profile object with arrays, strings, integers, and nested objects) to bytes and back.

| Format | 1 KB serialize | 1 KB deserialize | 10 KB serialize | 10 KB deserialize | 100 KB serialize | 100 KB deserialize |
|---|---|---|---|---|---|---|
| JSON | 2.1 us | 3.8 us | 18 us | 35 us | 175 us | 340 us |
| MessagePack | 1.4 us | 2.1 us | 12 us | 19 us | 115 us | 185 us |
| Protobuf | 0.8 us | 1.2 us | 7 us | 11 us | 68 us | 108 us |
| RESP (raw bytes) | 0.3 us | 0.4 us | 2.5 us | 3.2 us | 24 us | 31 us |
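A minimal harness for the JSON case can be reproduced with only the standard library. The payload below is an illustrative profile-like object, not the benchmark's exact fixture, so absolute numbers will differ by machine and payload shape:

```python
import json, time

def bench_us(fn, arg, iters=200):
    """Average microseconds per call over `iters` iterations."""
    t0 = time.perf_counter_ns()
    for _ in range(iters):
        fn(arg)
    return (time.perf_counter_ns() - t0) / iters / 1000

# An illustrative nested profile-like object, a few tens of KB as JSON.
profile = {"id": 12345, "name": "user", "tags": ["tag"] * 100,
           "history": [{"ts": i, "event": "view", "path": "/p/%d" % i}
                       for i in range(500)]}
encoded = json.dumps(profile)

ser = bench_us(json.dumps, profile)
deser = bench_us(json.loads, encoded)
print(f"{len(encoded)} bytes: serialize {ser:.1f} us, deserialize {deser:.1f} us")
```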

At 1 KB, serialization cost is barely measurable against network latency. JSON round-trip (serialize + deserialize) costs 5.9 microseconds. The network costs 300+ microseconds. Serialization is 2% of total latency. Nobody notices.

At 100 KB, the picture inverts. JSON round-trip costs 515 microseconds. The network costs 300-400 microseconds. Serialization is now 55-60% of total latency. You have spent more CPU time converting data to and from bytes than you spent moving those bytes across the network.

The JSON Tax at 100 KB

A 100 KB JSON value round-trips through serialize + network + deserialize in approximately 915 microseconds (175us serialize + 300us network + 100us RESP overhead + 340us deserialize). Of that, 515 microseconds -- 56% -- is pure serialization work. Switching from JSON to Protobuf reduces the serialization portion to 176 microseconds, but that is still 30% of total latency. The only way to eliminate this cost entirely is to not serialize at all.

Why Deserialization Is More Expensive Than Serialization

In every format, deserialization is more expensive than serialization. The reasons are structural:

- Serialization walks a known, already-built structure and writes bytes sequentially into one output buffer.
- Deserialization must parse and validate every input byte, discover the structure as it goes, and allocate a fresh object for every string, array, and map it encounters.
- Text formats add number parsing and escape handling on the read side, which costs more than the formatting done on write.

For a 100 KB JSON payload with many string fields, deserialization allocates dozens of heap objects and processes thousands of escape-eligible characters. The cost is not just CPU cycles -- it is cache pollution. The deserializer touches memory in an unpredictable pattern (allocating objects scattered across the heap), which thrashes L1/L2 CPU caches and slows down subsequent application code.

The Full Cost Breakdown: Redis GET of a 100 KB Value

Putting all six steps together for a 100 KB value stored as JSON, accessed from the same availability zone:

| Step | Operation | Cost | % of Total |
|---|---|---|---|
| 1 | Client serializes key to RESP | 0.5 us | <0.1% |
| 2 | TCP send (command) | 155 us | 16.5% |
| 3 | Redis hash lookup | 1.5 us | 0.2% |
| 4 | Redis serializes value to RESP + write | 35 us | 3.7% |
| 5 | TCP receive (100 KB payload) | 230 us | 24.5% |
| 6 | Client deserializes JSON | 340 us | 36.3% |
| -- | Application-level JSON parse | 175 us | 18.7% |
| | Total | 937 us | 100% |

The two serialization steps (Redis RESP serialization + client JSON deserialization) plus the application-level JSON parsing account for 550 microseconds -- 58.7% of total latency. The actual network transfer is 385 microseconds. The hash lookup that represents the "cache" part of the operation is 1.5 microseconds.

Your 100 KB cached value takes 937 microseconds to retrieve. Of that, less than 0.2% is the cache doing its job (finding the value). The remaining 99.8% is overhead: moving data across the network and converting it between formats.

In-Process: Zero Serialization Because There Is Nothing to Serialize

An in-process cache stores values in the same address space as your application. When you call cache.get("user:profile:12345"), the cache performs a hash lookup (the same 1.5 microsecond operation Redis performs) and returns a pointer to the value. There is no Step 1 (no RESP command to construct). No Step 2 (no TCP send). No Step 4 (no RESP serialization). No Step 5 (no TCP receive). No Step 6 (no deserialization).

The value is already in the application's memory. It is already an in-memory data structure -- the same struct, the same object, the same bytes that the application works with. Accessing it is a pointer dereference, not a data transformation. The cost is the hash lookup itself: 31 nanoseconds.

There is no serialization because there is nothing to convert. The cache entry IS the application object. Asking "how long does it take to serialize?" is like asking "how long does it take to convert a variable into itself?" The answer is zero. Not "very fast." Zero.
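The point can be made concrete with a plain dictionary. This sketch illustrates the concept only, not Cachee's DashMap-based implementation:

```python
cache = {}  # in-process tier: keys map to live Python objects

def cache_set(key, value):
    cache[key] = value      # stores a reference; no encoding, no copy

def cache_get(key):
    return cache.get(key)   # returns the same object; nothing to parse

profile = {"id": 12345, "bio": "x" * 100_000}  # ~100 KB payload
cache_set("user:profile:12345", profile)

hit = cache_get("user:profile:12345")
assert hit is profile       # identical object, not a reconstructed copy
```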

31 ns in-process GET (any size) · 937 us Redis GET (100 KB JSON) · 30,226x latency difference

When Serialization Cost Matters and When It Does Not

Not every application caches 100 KB values. Not every application is latency-sensitive. Here is a practical guide for when serialization overhead is a real problem versus when you can ignore it.

Serialization Does Not Matter When:

- Values are small (roughly 1 KB or less). At that size, serialization is about 2% of total latency, buried under the network round trip.
- The workload is latency-insensitive: batch jobs, background refreshes, and async pipelines absorb a few hundred extra microseconds unnoticed.
- Values are raw bytes or pre-rendered strings that need no application-level decoding.

Serialization Matters When:

- Values exceed roughly 10 KB, the crossover point where serialization begins to rival the network transfer itself.
- Large values sit on the request hot path and are deserialized on every access.
- The latency budget is measured in single-digit milliseconds, where a 500-microsecond JSON parse is a meaningful fraction.

Choosing a Serialization Format (If You Must Serialize)

If you cannot move to in-process caching (multi-instance deployments that need shared cache state), choosing the right serialization format reduces but does not eliminate the overhead.

| Format | Pros | Cons | Best For |
|---|---|---|---|
| JSON | Human-readable, universal support, debuggable | Slowest, largest wire size, no schema | Development, debugging, small values |
| MessagePack | Binary JSON, 1.5-2x faster than JSON, compact | Still schemaless, requires library | Drop-in JSON replacement |
| Protobuf | Schema-enforced, 2-3x faster, smallest wire size | Requires .proto files, code generation, versioning discipline | Production large-value caching |
| FlatBuffers/Cap'n Proto | Zero-copy access, no deserialization step | Complex API, alignment requirements, not human-debuggable | Extreme latency sensitivity |
| Raw bytes (RESP only) | No application serialization, direct buffer access | No structure, application must interpret bytes | Binary blobs, pre-rendered content |

FlatBuffers and Cap'n Proto deserve special mention. These formats store data in a wire-compatible format that can be accessed directly without deserialization. You read fields from the buffer in place, following offsets instead of parsing. This eliminates Step 6 (client deserialization) at the cost of a more complex programming model. However, you still pay for Steps 2-5 (network transfer). The serialization savings are real but the network cost remains.

The Architecture Decision: Eliminate Steps, Not Optimize Them

There are two approaches to the serialization problem:

  1. Optimize serialization. Switch from JSON to Protobuf. Use connection pooling. Enable pipelining. Use Unix domain sockets instead of TCP. Compress large values. Each optimization reduces a step's cost by some percentage.
  2. Eliminate serialization. Move the cache in-process. Remove Steps 1-6 entirely. The cost goes from 937 microseconds to 31 nanoseconds. No optimization of the individual steps can compete with removing them.

Option 1 is incremental improvement. Switching from JSON to Protobuf reduces serialization cost from 515 microseconds to 176 microseconds at 100 KB. That is a 2.9x improvement. Impressive in isolation. But total latency goes from 937 microseconds to 598 microseconds. You have optimized one component of a six-step pipeline. The other five steps still execute.

Option 2 is architectural elimination. Total latency goes from 937 microseconds to 31 nanoseconds. That is a 30,226x improvement. Not because in-process hash tables are magically faster, but because five of the six steps no longer exist. You do not optimize what you remove.

Zero Serialization with Cachee

Cachee stores values in-process as native memory. A GET is a DashMap lookup and a pointer dereference. No RESP encoding. No TCP transfer. No JSON parsing. No Protobuf decoding. The value is already the data structure your application uses. 31 nanoseconds at any value size. The 100 KB API response, the 17 KB PQ signature, the 200 KB rendered report -- all accessed at the same cost as a 64-byte session token.

Practical Steps

1. Measure Your Serialization Overhead

Before changing anything, measure. Instrument your Redis client to log the time spent in serialization and deserialization separately from network time. Most client libraries have hooks or middleware for this. You may be surprised -- teams often assume "Redis is slow" when the actual bottleneck is their JSON library.

# Python example: measure serialization cost
import time, json, redis

r = redis.Redis()
raw = r.get("large:key")       # Returns bytes (RESP already parsed), or None on a miss
assert raw is not None, "key not set -- warm the cache before measuring"

t0 = time.perf_counter_ns()
obj = json.loads(raw)           # This is your deserialization cost
t1 = time.perf_counter_ns()

print(f"Deserialization: {(t1-t0)/1000:.1f} us for {len(raw)} bytes")

2. Profile Value Size Distribution

Run redis-cli --bigkeys or use MEMORY USAGE key to understand the size distribution of your cached values. If 90% of your keys are under 1 KB, serialization is not your problem. If 10% of your keys are over 10 KB and those keys are accessed frequently, those are your candidates for in-process caching.
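With redis-py, the same survey can be scripted. `scan_iter` and `memory_usage` are standard redis-py methods; the bucket boundaries and sample cap below are illustrative:

```python
def bucket(size_bytes: int) -> str:
    """Assign a cached value's size to a coarse bucket."""
    if size_bytes < 1_000:
        return "<1KB"
    if size_bytes < 10_000:
        return "1-10KB"
    if size_bytes < 100_000:
        return "10-100KB"
    return ">=100KB"

def survey(redis_client, sample=10_000):
    """Walk keys with SCAN and bucket each value's MEMORY USAGE.

    redis_client is assumed to be a redis-py Redis instance.
    """
    counts = {"<1KB": 0, "1-10KB": 0, "10-100KB": 0, ">=100KB": 0}
    for i, key in enumerate(redis_client.scan_iter(count=1000)):
        if i >= sample:
            break               # cap the walk on very large keyspaces
        counts[bucket(redis_client.memory_usage(key) or 0)] += 1
    return counts

# Usage: import redis; print(survey(redis.Redis()))
```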

3. Move Hot Large Values In-Process

You do not need to replace Redis entirely. Identify the large values that are accessed most frequently and move them to an in-process L0 tier. Keep Redis as L1 for shared state and less-frequently-accessed data. The hot path avoids serialization entirely. The warm path still uses Redis but handles only the values where serialization cost is tolerable.

# Install Cachee
brew tap h33ai-postquantum/tap
brew install cachee

# Start with RESP compatibility
cachee init
cachee start

# Point hot-path reads at localhost:6380 (Cachee)
# Point warm-path reads at your Redis cluster
# No serialization on the hot path. Zero.
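The tiering described above can be sketched in a few lines. `tiered_get`, the JSON payload format, and the unbounded promotion are all illustrative; a production L0 also needs eviction and TTLs:

```python
import json

l0 = {}  # in-process L0 tier: native objects, zero serialization on a hit

def tiered_get(key, redis_client):
    """L0 in-process lookup first; fall back to Redis (L1) and promote."""
    obj = l0.get(key)
    if obj is not None:
        return obj               # hot path: pointer dereference only
    raw = redis_client.get(key)  # warm path: network + RESP
    if raw is None:
        return None              # miss in both tiers
    obj = json.loads(raw)        # pay the deserialization cost once
    l0[key] = obj                # promote: later reads skip all six steps
    return obj
```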

4. If You Must Use Network Cache: Switch to Protobuf

If architectural constraints prevent in-process caching (shared state across multiple instances, for example), at minimum switch from JSON to Protobuf for values over 10 KB. The 2.9x serialization speedup is free performance. The schema enforcement also catches bugs at compile time that JSON catches at runtime (or never).

Conclusion

Serialization is invisible at small value sizes and dominant at large ones. The crossover happens around 10 KB, where serialization begins to consume more time than the network transfer itself. At 100 KB, serialization is the majority of your cache latency. At 1 MB, it is overwhelming.

The fix is not faster serialization. The fix is no serialization. An in-process cache eliminates the concept of format conversion from the read path. The value does not travel across a network. It does not transform between representations. It exists once, in memory, and your application reads it directly. That is why large values lag in network caches and why they do not in Cachee.

Stop serializing. Start reading from memory. 31ns for any value size.
