Every cache request travels through multiple layers before a response reaches your application. Each layer adds latency. Most engineering teams focus on the cache engine itself, but the engine is rarely the bottleneck. The overhead is in the infrastructure surrounding it.
Network Round-Trip: 0.5-3ms
The single largest contributor to cache latency is the network. When your application calls Redis, the request must traverse the TCP stack, cross a network boundary (even if it is a loopback interface on the same host), reach the Redis process, and return. On localhost, this takes 0.3-0.5ms. In a same-AZ deployment, expect 0.5-1ms. Cross-AZ adds 1-3ms. Cross-region introduces 10-80ms. No amount of Redis tuning can eliminate the network round-trip.
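You can measure this floor yourself. The sketch below stands up a loopback TCP echo server as a stand-in for Redis and times a PING-style round-trip; the absolute numbers are an illustration and will vary with your kernel, hardware, and load, but even on localhost the result is never zero.

```python
# Sketch: measure the loopback TCP round-trip time a cache request pays
# even when client and server share a host. The echo server is a stand-in
# for Redis; real figures depend on kernel, hardware, and load.
import socket
import threading
import time

def run_echo_server(server: socket.socket) -> None:
    conn, _ = server.accept()
    with conn:
        while data := conn.recv(64):
            conn.sendall(data)

server = socket.socket()
server.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]
threading.Thread(target=run_echo_server, args=(server,), daemon=True).start()

client = socket.create_connection(("127.0.0.1", port))
client.sendall(b"warmup")        # warm the connection before timing
client.recv(64)

samples = []
for _ in range(200):
    start = time.perf_counter()
    client.sendall(b"PING")
    client.recv(64)
    samples.append(time.perf_counter() - start)
client.close()

rtt_us = sorted(samples)[len(samples) // 2] * 1e6  # median, microseconds
print(f"median loopback RTT: {rtt_us:.0f} us")
```

Swap the loopback address for a same-AZ or cross-AZ host and the median climbs exactly as the ranges above predict, with no change to the code path at all.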
Serialization and Deserialization: 0.05-0.2ms
Before your data can travel over the network, it must be serialized into a wire format (typically RESP for Redis). On the other end, the response must be deserialized back into your application's native data structures. For simple key-value pairs, this adds 50-100 microseconds. For complex JSON objects, nested structures, or large payloads, serialization alone can exceed 200 microseconds. This cost is paid on every request, in both directions.
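The gap between a flat key-value pair and a nested payload is easy to demonstrate. This sketch times full encode-decode round-trips using the stdlib json module as a stand-in for whatever encoder your client library uses; the payload shapes are invented for illustration.

```python
# Sketch: compare serialization round-trip cost for a small key-value pair
# vs. a nested payload. Uses stdlib json as a stand-in encoder; payloads
# are hypothetical examples.
import json
import timeit

small = {"user:42": "alice"}
nested = {
    "user": {
        "id": 42,
        "name": "alice",
        "orders": [{"sku": f"item-{i}", "qty": i} for i in range(50)],
    }
}

def roundtrip(obj):
    # Encode to the wire format and decode back, as happens on every request.
    return json.loads(json.dumps(obj))

n = 10_000
small_us = timeit.timeit(lambda: roundtrip(small), number=n) / n * 1e6
nested_us = timeit.timeit(lambda: roundtrip(nested), number=n) / n * 1e6
print(f"small payload:  {small_us:.1f} us per round-trip")
print(f"nested payload: {nested_us:.1f} us per round-trip")
```

The nested payload costs several times more per round-trip, which is why flattening hot-path values, or caching them pre-serialized, pays off even when the cache engine itself is fast.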
Redis Single-Thread Queuing: 0.1-1ms
Redis processes commands on a single thread. Under moderate load (50-100K ops/sec), commands queue behind each other. At 100K ops/sec, the thread has roughly 10 microseconds of budget per command, so anything queued behind a command waits at least that long. At peak loads approaching Redis's throughput ceiling, queuing delay spikes to 0.5-1ms. Even with Redis 6+ I/O threading for reads, command execution remains single-threaded. Slow commands (KEYS, SORT, large MGET batches) block the entire pipeline, causing tail latency spikes that cascade across your application. See how this compares in our Redis latency reduction guide.
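The non-linear spike near the ceiling falls out of basic queueing theory. The sketch below uses an M/M/1 model, which is an assumption (real Redis arrivals are burstier, which makes waits worse, not better), with a hypothetical 150K ops/sec throughput ceiling:

```python
# Sketch: approximate single-thread queuing delay with an M/M/1 model.
# The model and the 150K ops/sec ceiling are assumptions for illustration;
# real arrival patterns are burstier than Poisson.
def queue_wait_us(arrival_rate: float, service_rate: float) -> float:
    """Average time a command spends waiting in queue, in microseconds.

    M/M/1 waiting time: Wq = rho / (mu * (1 - rho)), where rho = lambda/mu.
    """
    rho = arrival_rate / service_rate          # utilization of the thread
    if rho >= 1.0:
        return float("inf")                    # saturated: queue grows without bound
    return rho / (service_rate * (1.0 - rho)) * 1e6

SERVICE_RATE = 150_000  # assumed throughput ceiling, ops/sec (~6.7 us/command)

for load in (50_000, 100_000, 148_000):
    wait = queue_wait_us(load, SERVICE_RATE)
    print(f"{load:>7} ops/sec -> avg queue wait {wait:.0f} us")
```

Note how the wait stays in the tens of microseconds through moderate load, then explodes toward half a millisecond in the last few percent before saturation. This is why headroom matters more than average utilization for tail latency.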