Why Your Cache Is Slower Than Your Compute
Redis adds 300us per call. Your compute takes 2us.
The cache is the bottleneck.
Your Cache Takes 150x Longer Than Your Compute
You added a cache to speed things up. It became the slowest part of your pipeline.
Latency waterfall: single cached read
You optimized your algorithm down to 2 microseconds. Then you wrapped it in a Redis call that takes 300 microseconds. The cache is not accelerating your application. It is dominating your latency budget.
Every Distributed Cache Shares the Same Flaw
Redis, Memcached, ElastiCache, DAX, Memorystore. Different names. Same architecture. Same bottleneck: the network round trip.
Every cache read follows the same path: serialize your key, open a TCP connection (or reuse one from the pool), send bytes over the network, wait for the cache server to process, receive the response, deserialize the value, return it to your application. This costs 200-500 microseconds minimum, regardless of how fast the cache server is internally.
| Cache Solution | Read Latency (64B) | Bottleneck | Why |
|---|---|---|---|
| Redis (same AZ) | ~310 us | TCP round trip | serialize + TCP + RESP parse + deserialize |
| ElastiCache (cross-AZ) | ~500+ us | Network hop | Same as Redis + AZ transit latency |
| DynamoDB DAX | ~200 us | SDK overhead | SDK serialization + TCP + item marshalling |
| GCP Memorystore | ~200 us | Network | Managed Redis = same TCP overhead |
The pattern is always the same:
```
// Every distributed cache does this:
serialize(key)            // ~5 us
  -> TCP send             // ~50 us
  -> server lookup        // ~10 us
  -> TCP receive          // ~50 us
  -> RESP parse           // ~5 us
  -> deserialize(value)   // ~5-200 us (scales with payload)
// Total: 125-320 us MINIMUM
// And that's same-AZ, warm connection, no contention
```
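You can see this overhead on your own machine. Here is a minimal sketch (not part of Cachee) that times a 64-byte GET against a local Redis using the redis crate; the crate version in the comment, the key name, and the iteration count are assumptions, and your absolute numbers will vary with kernel, connection state, and hardware.

```rust
// Minimal sketch: measure the round-trip cost of a Redis GET against a
// local server. Assumes the `redis` crate (e.g. redis = "0.25") in
// Cargo.toml and a Redis instance listening on 127.0.0.1:6379.
use redis::Commands;
use std::time::Instant;

fn main() -> redis::RedisResult<()> {
    let client = redis::Client::open("redis://127.0.0.1/")?;
    let mut con = client.get_connection()?;

    // Seed a 64-byte value so the payload matches the table above.
    let value = vec![0u8; 64];
    let _: () = con.set("bench:key", value)?;

    // Warm the connection once, then time the GETs themselves.
    let _: Vec<u8> = con.get("bench:key")?;

    let iters: u32 = 10_000;
    let start = Instant::now();
    for _ in 0..iters {
        let _: Vec<u8> = con.get("bench:key")?;
    }
    let per_call = start.elapsed() / iters;
    println!("avg GET round trip: {:?}", per_call);
    Ok(())
}
```

Swap the 64-byte value for a 50 KB or 1 MB one to watch the latency grow with payload, as in the payload table below.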
In-Process L1: 31 Nanoseconds
Same address space. Zero network. Zero serialization. A hash lookup and a pointer dereference.
Redis vs Cachee: 64-byte value
Same key, same value, same hardware. The only difference is architecture.
```rust
// Network round trip on every call
let val = redis_client
    .get("session:abc123") // 310 us
    .await?;

// What actually happens:
// 1. Serialize key to RESP
// 2. TCP send to Redis server
// 3. Redis hashtable lookup
// 4. TCP receive response
// 5. Parse RESP protocol
// 6. Deserialize value
// Total: ~310 us
```
```rust
// Same address space. No network.
let val = cachee
    .get("session:abc123"); // 31 ns

// What actually happens:
// 1. Hash the key
// 2. Pointer dereference
//
// That's it.
//
// Total: 31 ns
```
Cachee lives in your process. Your data is already in your address space. A read is a hash computation (key to bucket) and a pointer dereference (bucket to value). No syscalls. No context switches. No serialization. No TCP. No protocol parsing. The CPU never leaves your process. 31 nanoseconds is not a trick. It is what a hash lookup costs when you remove everything else.
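The read path is short enough to sketch in a few lines. This is an illustration of the idea, not Cachee's actual internals or API: it assumes the dashmap crate and wraps values in Arc so a hit hands back a reference-counted pointer rather than a copy of the payload.

```rust
// Sketch of an in-process L1 read path: one hash, one bucket lookup,
// one Arc clone (a pointer copy plus a refcount bump). Not Cachee's real API.
use dashmap::DashMap;
use std::sync::Arc;

struct L1Cache {
    map: DashMap<String, Arc<Vec<u8>>>,
}

impl L1Cache {
    fn new() -> Self {
        Self { map: DashMap::new() }
    }

    fn put(&self, key: &str, value: Vec<u8>) {
        self.map.insert(key.to_owned(), Arc::new(value));
    }

    // The whole read: hash the key, find the bucket, clone the Arc.
    // No syscall, no serialization, no network -- the value never moves.
    fn get(&self, key: &str) -> Option<Arc<Vec<u8>>> {
        self.map.get(key).map(|entry| Arc::clone(entry.value()))
    }
}

fn main() {
    let cache = L1Cache::new();
    cache.put("session:abc123", vec![0u8; 64]);
    assert!(cache.get("session:abc123").is_some());
}
```

Cloning an Arc costs the same whether the value behind it is 64 bytes or 1 megabyte, which is why an in-process read can stay flat as payloads grow.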
Numbers That Scale With Payload
Redis latency grows with payload size because serialization and network transfer scale linearly. Cachee latency stays at 31ns because it stores references, not copies. The gap widens as payloads grow.
| Payload Size | Use Case | Redis Latency | Cachee Latency | Speedup |
|---|---|---|---|---|
| 64 B | Session token | 310 us | 31 ns | 10,000x |
| 1 KB | JWT / API response | 360 us | 31 ns | 11,613x |
| 4.5 KB | PQ session (ML-KEM + ML-DSA) | 520 us | 31 ns | 16,774x |
| 50 KB | SLH-DSA public key bundle | 1.42 ms | 31 ns | 45,806x |
| 1 MB | STARK proof / model weights | 12.5 ms | 31 ns | 403,226x |
Benchmarked on AWS Graviton4 c8g.metal-48xl, 192 vCPUs. Redis 7.2 on same instance (localhost). Cachee in-process DashMap. 1M iterations, p50 reported.
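For a ballpark on your own hardware, a rough harness in the same spirit (DashMap, 1M iterations) looks like the sketch below. It is not the benchmark behind the published numbers: it reports a loop average rather than a true p50, because reading the clock on every call would itself cost tens of nanoseconds.

```rust
// Rough harness: average in-process read cost over 1M iterations.
// Times the whole loop and divides; a serious p50 measurement needs a
// benchmarking framework that subtracts timer overhead.
use dashmap::DashMap;
use std::hint::black_box;
use std::sync::Arc;
use std::time::Instant;

fn main() {
    let map: DashMap<String, Arc<Vec<u8>>> = DashMap::new();
    map.insert("session:abc123".to_owned(), Arc::new(vec![0u8; 64]));

    let iters: u32 = 1_000_000;
    let start = Instant::now();
    for _ in 0..iters {
        let hit = map.get("session:abc123").map(|e| Arc::clone(e.value()));
        black_box(hit); // keep the optimizer from deleting the read
    }
    let per_read = start.elapsed() / iters;
    println!("avg in-process read: {:?}", per_read);
}
```

Run it with a release build; a debug build will inflate the numbers badly.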
Redis vs ElastiCache vs DAX vs Cachee
Every dimension. One table.
| Feature | Redis / ElastiCache | DynamoDB DAX | Memcached | Cachee |
|---|---|---|---|---|
| Read latency (64B) | 310 us | 200 us | 250 us | 31 ns |
| Read latency (50KB) | 1.42 ms | ~800 us | ~1.1 ms | 31 ns |
| Serialization required | Yes | Yes | Yes | No |
| Network hop | Yes (TCP) | Yes (TCP) | Yes (UDP/TCP) | No (in-process) |
| Latency scales with payload | Yes (linear) | Yes | Yes | No (constant 31ns) |
| Separate infrastructure | Yes | Yes | Yes | No (library) |
| Post-quantum attestation | No | No | No | Yes (3 PQ families) |
| Eviction policy | LRU / LFU / random | TTL-based | LRU only | CacheeLFU |
| Cost at 1B ops/month | $500-2,000/mo | $800-3,000/mo | $300-1,500/mo | $5,000/mo* |
| What you're really paying for | Separate servers + network | AWS managed infra | Separate servers | PQ attestation per op |
* Cachee Core: $0.000005/op. 1B ops = $5,000/mo. Includes PQ attestation. Redis/ElastiCache pricing is infrastructure cost only with no attestation.
Run It Yourself
Install: brew tap h33ai-postquantum/tap && brew install cachee
Before and After
Redis is a database you use as a cache. Cachee is a cache that lives where your data lives. The distinction is architectural, and it is why the performance gap is 10,000x, not 10x.
Frequently Asked
Why does Redis latency scale with payload size?
Redis latency scales with payload size because every operation requires serialization, a network round trip (TCP or Unix socket), and deserialization. A 64-byte value takes ~310 microseconds. A 50 KB value (such as an SLH-DSA public key) takes ~1.42 milliseconds. A 1 MB value takes ~12.5 milliseconds. The serialization and network transfer costs dominate.
An in-process cache eliminates both costs. Reads are a pointer dereference at 31 nanoseconds regardless of payload size, because the data is already in your application's address space. No bytes cross a network boundary. No serialization occurs.
What is the biggest source of cache latency, and how do you eliminate it?
The single largest source of cache latency is the network round trip. Even with Redis on localhost, you pay ~100-300 microseconds per call for TCP overhead, serialization, and protocol parsing.
To eliminate this, use an in-process L1 cache that stores data in the same address space as your application. Cachee provides 31-nanosecond reads with zero network hops, zero serialization, and zero protocol overhead. For data that must be shared across processes, use a tiered architecture: L1 in-process (31ns) backed by L2 distributed (Redis/ElastiCache) for cache misses only, as sketched below.
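A sketch of that tiering, with a hypothetical L2Cache trait standing in for your Redis/ElastiCache client (Cachee's real federation API and your L2 client will differ):

```rust
// Sketch of a tiered read-through cache: L1 in-process, L2 over the network.
// `L2Cache` and `NullL2` are hypothetical stand-ins, not a real client API.
use dashmap::DashMap;
use std::sync::Arc;

trait L2Cache {
    fn get(&self, key: &str) -> Option<Vec<u8>>; // pays the network round trip
}

struct TieredCache<L2> {
    l1: DashMap<String, Arc<Vec<u8>>>,
    l2: L2,
}

impl<L2: L2Cache> TieredCache<L2> {
    fn get(&self, key: &str) -> Option<Arc<Vec<u8>>> {
        // Fast path: in-process hit, no network, nanoseconds.
        if let Some(hit) = self.l1.get(key) {
            return Some(Arc::clone(hit.value()));
        }
        // Slow path: go to L2 once, then keep the value in-process
        // so every later read takes the fast path.
        let value = Arc::new(self.l2.get(key)?);
        self.l1.insert(key.to_owned(), Arc::clone(&value));
        Some(value)
    }
}

// Dummy L2 so the sketch compiles standalone.
struct NullL2;
impl L2Cache for NullL2 {
    fn get(&self, _key: &str) -> Option<Vec<u8>> { None }
}

fn main() {
    let cache = TieredCache { l1: DashMap::new(), l2: NullL2 };
    assert!(cache.get("session:abc123").is_none()); // miss in both tiers
}
```

Only misses pay the network round trip; once a key lands in L1, every subsequent read takes the in-process fast path.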
What is an in-process cache?
An in-process cache stores cached data in the same memory space as your application, eliminating network round trips, serialization, and protocol overhead. Instead of sending a request over TCP to a separate cache server (Redis, Memcached), an in-process cache performs a hash lookup and pointer dereference -- completing in nanoseconds rather than microseconds.
Cachee is an in-process L1 cache that delivers 31-nanosecond reads with post-quantum attestation, CacheeLFU eviction, and optional L2 federation for distributed deployments.
Is the 31-nanosecond read latency a real, reproducible number?
Yes. 31 nanoseconds is a measured, reproducible benchmark on production hardware (AWS Graviton4, c8g.metal-48xl, 192 vCPUs). It represents a DashMap hash lookup plus pointer dereference -- no network, no serialization, no protocol parsing.
The number is consistent across payload sizes because the cache stores references, not copies. The lookup cost is the hash computation plus one pointer dereference. This is fundamentally different from Redis, which must traverse a network stack, parse the RESP protocol, and deserialize the value on every call.
Run it yourself: brew tap h33ai-postquantum/tap && brew install cachee && cachee bench
Your cache is the bottleneck. Remove it.
31ns reads. Zero network. PQ-attested. Drop-in replacement.
Install Cachee Computation Caching