We Benchmarked Cachee Against Raw ElastiCache. Here's What Happened.
Everyone says their product is faster. We decided to prove it. Tonight we ran a head-to-head benchmark on the same machine, same network, same Redis cluster—the only variable being whether requests went through Cachee or straight to ElastiCache.
The results surprised even us.
The Setup
We wanted a fair fight. No tricks, no cherry-picked numbers. Here's exactly what we tested:
- Instance: c7i.metal-48xl (192 vCPUs) — plenty of headroom so CPU isn't the bottleneck
- Redis Backend: AWS ElastiCache r7g.16xlarge, Redis 7.1.0, 3 shards with replicas
- Network: Same VPC, same availability zone — sub-millisecond network latency
- Benchmark Tool: redis-benchmark (Redis's own tool, not ours)
- Workload: 200,000 operations, 50 concurrent clients, 128-byte values
Two tests. Exact same parameters. One goes directly to ElastiCache on port 6379. The other goes through Cachee on port 6380.
Throughput Results
| Test | Direct ElastiCache | Cachee Proxy | Speedup |
|---|---|---|---|
| SET (50 clients) | 95,012 ops/s | 156,986 ops/s | 1.65x faster |
| GET (50 clients) | 90,703 ops/s | 159,363 ops/s | 1.76x faster |
| SET (Unix socket) | 95,012 ops/s | 203,459 ops/s | 2.14x faster |
| GET (Unix socket) | 90,703 ops/s | 196,078 ops/s | 2.16x faster |
Read that again: Cachee, sitting in front of ElastiCache as a proxy, is faster than talking to ElastiCache directly. In the same data center. On the same machine.
How Is a Proxy Faster Than Direct Access?
This seems impossible at first. A proxy adds a hop—shouldn't it be slower? Three things make it faster:
1. L1 Cache Hits: 16 Microseconds
When Cachee has a value in its local memory (Moka cache), it returns it in 16 microseconds. That's 16 millionths of a second. No network round-trip, no TCP overhead, no Redis protocol parsing on the server side. Just a hash table lookup in local memory.
Compare that to 339 microseconds for a cache miss that actually hits ElastiCache. That's a 21x difference on every single cache hit.
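To make that hit path concrete, here's a minimal sketch of an L1 lookup using the moka crate mentioned above. To be clear, this is our illustration of the lookup cost, not Cachee's source; the key, value, and wiring are assumptions.

```rust
use moka::sync::Cache;
use std::time::Instant;

fn main() {
    // Hypothetical stand-in for Cachee's L1: a bounded in-process cache.
    let l1: Cache<String, Vec<u8>> = Cache::builder()
        .max_capacity(10_000_000) // matches the 10M-entry figure later on
        .build();

    l1.insert("user:42".to_string(), b"cached value".to_vec());

    // The entire "hit" path: one in-memory lookup. No syscalls, no network,
    // no protocol parsing on a remote server.
    let start = Instant::now();
    let hit = l1.get("user:42");
    println!("hit = {}, took {:?}", hit.is_some(), start.elapsed());
}
```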
2. Write-Behind: SETs Return Instantly
When your app sends a SET command, Cachee does two things:
- Writes the value to the L1 cache (nanoseconds)
- Queues the write for background flush to ElastiCache
Your app gets an OK response before the data even leaves the machine. Background workers batch these writes and pipeline them to ElastiCache in groups of 200, which is far more efficient than individual round-trips.
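In Tokio terms, the pattern looks roughly like the sketch below. This is generic write-behind, not Cachee's actual code; the channel, the flush stub, and the key/value are ours, and only the batch size of 200 comes from the numbers above.

```rust
use tokio::sync::mpsc;

const BATCH: usize = 200; // flush batch size stated above

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::unbounded_channel::<(String, Vec<u8>)>();

    // Background flusher: drain up to BATCH queued writes at a time and
    // push each batch to the backend in a single pipelined round-trip.
    let flusher = tokio::spawn(async move {
        let mut batch = Vec::with_capacity(BATCH);
        while rx.recv_many(&mut batch, BATCH).await > 0 {
            flush_pipeline(&batch).await;
            batch.clear();
        }
    });

    // Handling a SET: update L1 (step 1 above), enqueue the write, and
    // reply OK immediately. The client never waits on ElastiCache.
    tx.send(("user:42".into(), b"value".to_vec())).unwrap();

    drop(tx); // close the queue so the flusher drains and exits
    flusher.await.unwrap();
}

// Stand-in for a pipelined write to ElastiCache.
async fn flush_pipeline(batch: &[(String, Vec<u8>)]) {
    println!("flushed {} writes in one pipeline", batch.len());
}
```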
3. Connection Pooling with Cluster Routing
Cachee maintains 192 persistent connections across 3 shards (64 per shard). It routes commands to the correct shard using CRC16 slot calculation, avoiding MOVED redirects. Your app doesn't need cluster-aware client libraries—it just talks to Cachee like it's a single Redis instance.
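The slot math here is the standard Redis Cluster formula: CRC16 (XMODEM variant) of the key, mod 16384, with hash-tag handling. A self-contained version is below; the even three-way shard split at the end is our simplification for illustration, since real slot ranges come from the cluster topology.

```rust
// Standard Redis Cluster slot calculation: CRC16-XMODEM(key) mod 16384.
fn crc16_xmodem(data: &[u8]) -> u16 {
    let mut crc: u16 = 0;
    for &b in data {
        crc ^= (b as u16) << 8;
        for _ in 0..8 {
            crc = if crc & 0x8000 != 0 {
                (crc << 1) ^ 0x1021
            } else {
                crc << 1
            };
        }
    }
    crc
}

fn hash_slot(key: &[u8]) -> u16 {
    // Redis hash tags: if the key contains "{...}" with a non-empty body,
    // only the bytes inside the braces are hashed.
    let effective = match key.iter().position(|&c| c == b'{') {
        Some(open) => match key[open + 1..].iter().position(|&c| c == b'}') {
            Some(close) if close > 0 => &key[open + 1..open + 1 + close],
            _ => key,
        },
        None => key,
    };
    crc16_xmodem(effective) % 16384
}

fn main() {
    // Illustrative even split of 16384 slots across 3 shards; a real
    // cluster's slot ranges come from its topology, not a fixed formula.
    let slot = hash_slot(b"user:42");
    let shard = (slot as usize * 3) / 16384;
    println!("key user:42 -> slot {slot} -> shard {shard}");
}
```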
Latency
Raw throughput is one story. Latency is where your users feel the difference.
| Metric | Direct ElastiCache | Cachee |
|---|---|---|
| Average latency | 0.80 ms | 0.20 ms |
| Cache hit latency | N/A | 16 μs |
| Cache miss latency | N/A | 339 μs |
Average latency dropped from 0.80 ms to 0.20 ms, a 4x reduction. That 0.6 ms of savings per call might sound small, but it compounds: a request that fans out into a thousand Redis calls gets back 600 ms of page load time.
The Numbers Over Time
This wasn't a 10-second burst test. We ran sustained traffic and monitored the proxy's stats throughout the session:
- Total requests served: 6,285,887
- Cache hits: 6,285,885
- Cache misses: 2
- Hit rate: 100.00%
- Cumulative latency saved: 1,355 seconds
Two misses. Out of 6.28 million requests. Those two misses were the initial cold-start lookups. Everything after that came from L1.
What About Pipelined Workloads?
In the interest of full transparency: when using Redis pipelining (batching 64 commands per round-trip), direct ElastiCache is faster. Pipeline-64 SET hit 2.7M ops/s direct vs 297K through Cachee. That result makes sense: pipelining already amortizes the network round-trip across 64 commands, so the latency Cachee eliminates stops being the bottleneck and the proxy's per-command processing cost takes over. For heavily pipelined bulk loads, going direct is the right call.
The Infrastructure
Cachee Edge Proxy is written in Rust using Tokio's async runtime. The full stack:
```
         Your App (any Redis client)
                     ↓
        Cachee Edge Proxy (Rust + Tokio)
      ↓ 16 μs L1 hit         ↓ 339 μs miss
Moka Cache (10M entries)   ElastiCache (3 shards)
                           192 pooled connections
                           CRC16 slot routing
```
Integration takes under 5 minutes: change your Redis connection string from your ElastiCache endpoint to localhost:6380. That's it. No SDK, no code changes, no new dependencies.
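For example, with the Rust redis crate the switch is a single string (the commented-out endpoint below is a placeholder for whatever yours is today); any Redis client library works the same way:

```rust
fn main() -> redis::RedisResult<()> {
    // Before (placeholder ElastiCache endpoint):
    // let client = redis::Client::open(
    //     "redis://my-cluster.abc123.use1.cache.amazonaws.com:6379/",
    // )?;

    // After: same client code, now pointed at Cachee on localhost.
    let client = redis::Client::open("redis://127.0.0.1:6380/")?;
    let mut conn = client.get_connection()?;
    let _: () = redis::cmd("SET").arg("user:42").arg("hello").query(&mut conn)?;
    Ok(())
}
```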
What This Means
If you're running ElastiCache today, you're leaving performance on the table. Not because ElastiCache is slow—it's excellent hardware—but because every request still requires a network round-trip, TCP handshake overhead, and Redis server-side processing.
Cachee eliminates all of that for cached reads. And with a 100% hit rate on repeated keys, that's almost every read in a typical workload.
The best part? These results were measured in the best case for ElastiCache—same machine, same AZ, sub-millisecond network. Once your app is cross-AZ or cross-region, the advantage multiplies dramatically.
See the Full Results
Interactive benchmark page with charts, architecture diagrams, and infrastructure details.
View Benchmark Case Study →

Bonus: Cachee vs Standard Redis (Localhost)
We also ran a head-to-head against standard Redis 7.0.15 running on the same machine as Cachee—no network at all. This is the toughest possible test: can a proxy beat Redis when Redis is literally on localhost?
| Test | Standard Redis | Cachee TCP | Cachee Unix | Best Speedup |
|---|---|---|---|---|
| SET (50 clients) | 102,407 ops/s | 128,123 ops/s | 181,818 ops/s | 1.78x faster |
| GET (50 clients) | 110,803 ops/s | 136,054 ops/s | 186,047 ops/s | 1.68x faster |
| SET (pipeline 16) | 571,428 ops/s | 694,444 ops/s | — | 1.22x faster |
| GET (pipeline 16) | 754,717 ops/s | 1,250,000 ops/s | — | 1.66x faster |
Even against localhost Redis with zero network penalty, Cachee's L1 cache wins. The Unix socket path pushes it further—1.78x faster SETs and 1.68x faster GETs. And latency tells the same story: Cachee's p50 SET latency was 0.223ms vs Redis's 0.319ms (30% lower), and p50 GET was 0.191ms vs 0.279ms (32% lower).
Reproduce It Yourself
We believe in verifiable claims. Here are the exact commands we used:
```bash
# Direct ElastiCache
redis-benchmark -h your-elasticache-endpoint -p 6379 \
  -t get,set -n 200000 -c 50 -d 128 -q

# Through Cachee
redis-benchmark -h 127.0.0.1 -p 6380 \
  -t get,set -n 200000 -c 50 -d 128 -q
```
Run both on the same machine. Compare the numbers. We're confident in what you'll find.