Case Study Updated — Mar 7, 2026

Cachee Native Engine: Live AWS Production Results

A native cache server with fast-lane middleware, pre-compression, and horizontal scaling. Production numbers from a Graviton4 c8g.16xlarge instance with ElastiCache Redis, running the v4.3 cluster build.

  • L1 GET latency: 1.5µs (3.1x faster than v3.0)
  • L1 hit rate: 99%+ (99.05% measured in production)
  • GET throughput: 660K+ ops/sec (3x vs the v3.0 engine)
  • HTTP response (L1 hit): 0.0015ms (216x faster than Redis P99)

v3.0 vs v4.3 — Production Measurements

| Metric | v3.0 (Feb 2026) | v4.3 (Mar 2026) | Improvement |
| L1 GET Latency | 4.65µs | 1.5µs | 3.1x faster |
| P99 Latency | ~16µs | 3.7µs | 4.3x faster |
| HTTP Response (L1) | 14.5µs (0.0145ms) | 1.5µs (0.0015ms) | 9.7x faster |
| GET Throughput | 215K ops/sec | 660K+ ops/sec | ~3x higher |
| P99 vs Redis | 38x faster | 216x faster | 5.7x larger margin |
| Horizontal Scaling | Single instance | Pub/sub coherent cluster | Virtually unlimited |
| L1 Hit Rate | 100% (warm set) | 99.05% (production) | Real production measurement |
| Cache Engine | v3.0 L1 Cache (NAPI-RS) | Native Cachee Engine + DashMap + pre-compress | Native binary, zero runtime overhead |
v4.3: 3.1x latency improvement + horizontal scaling
The fast lane bypasses all middleware (compression, CORS, security headers) for GET /cache/:key. Combined with pre-compression at write time and inline auth, server-side response drops from 14.5µs to 1.5µs.
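The dispatch described above can be sketched in std-only Rust. This is an illustrative model, not Cachee's actual code: the `Server`, `Request`, and `full_pipeline` names are hypothetical, and the L1 store is a plain `HashMap` standing in for the concurrent native engine.

```rust
use std::collections::HashMap;

struct Request<'a> {
    method: &'a str,
    path: &'a str,
    api_key: &'a str,
}

struct Server {
    l1: HashMap<String, Vec<u8>>, // stand-in for the concurrent L1 store
    valid_key: &'static str,      // checked inline in the hot path
}

impl Server {
    /// Inline auth: constant-time byte comparison so the hot path
    /// does not pay for a middleware round-trip or leak match position.
    fn auth_ok(&self, key: &str) -> bool {
        if key.len() != self.valid_key.len() {
            return false;
        }
        key.bytes()
            .zip(self.valid_key.bytes())
            .fold(0u8, |acc, (a, b)| acc | (a ^ b))
            == 0
    }

    fn handle(&self, req: &Request) -> (u16, Vec<u8>) {
        // Fast lane: GET /cache/:key skips compression, CORS, and
        // security-header middleware entirely.
        if req.method == "GET" {
            if let Some(key) = req.path.strip_prefix("/cache/") {
                if !self.auth_ok(req.api_key) {
                    return (401, Vec::new());
                }
                return match self.l1.get(key) {
                    Some(body) => (200, body.clone()),
                    None => (404, Vec::new()),
                };
            }
        }
        // Slow lane: everything else runs the full middleware chain.
        self.full_pipeline(req)
    }

    fn full_pipeline(&self, _req: &Request) -> (u16, Vec<u8>) {
        // compression, CORS, security headers, routing ... elided
        (200, b"slow lane".to_vec())
    }
}
```

Because a fast-lane hit touches only the auth check and one map lookup, the per-request cost is dominated by the L1 read itself, which is why the HTTP response time converges on the raw engine latency.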

Production Architecture

Two-Tier Cache Stack
L1 — Native Cachee Engine
DashMap + Pre-Compression + Fast Lane
10M key capacity, ~1.5µs GET, inline auth, pre-compressed br/gzip
L2 — ElastiCache Redis 7.1
cache.r7g.12xlarge — 317GB RAM
48 vCPU, sub-1ms latency, circuit breaker protected
Compute
Graviton4 c8g.16xlarge — 64 vCPU
Docker container, us-east-1, ARM64
Dashboard
cachee.ai/admin/dashboard.html
Live metrics, click-through detail view, 10s auto-refresh
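The L2 tier above is described as "circuit breaker protected." A minimal sketch of that pattern, with illustrative thresholds rather than Cachee's actual configuration: after a run of consecutive Redis failures the breaker opens and L2 lookups fail fast, then a cooldown elapses and one probe request is allowed through.

```rust
use std::time::{Duration, Instant};

/// Minimal circuit breaker guarding the L2 (Redis) path.
/// Threshold and cooldown values are illustrative assumptions.
struct CircuitBreaker {
    failures: u32,              // consecutive failure count
    threshold: u32,             // failures before the breaker trips
    opened_at: Option<Instant>, // Some(..) while the breaker is open
    cooldown: Duration,         // how long to stay open before probing
}

impl CircuitBreaker {
    fn new(threshold: u32, cooldown: Duration) -> Self {
        Self { failures: 0, threshold, opened_at: None, cooldown }
    }

    /// May we send this request to Redis right now?
    fn allow(&mut self) -> bool {
        match self.opened_at {
            // Open: fail fast instead of waiting on a sick Redis.
            Some(t) if t.elapsed() < self.cooldown => false,
            // Cooldown elapsed: half-open, let one probe through.
            Some(_) => {
                self.opened_at = None;
                self.failures = 0;
                true
            }
            None => true,
        }
    }

    fn record_success(&mut self) {
        self.failures = 0;
    }

    fn record_failure(&mut self) {
        self.failures += 1;
        if self.failures >= self.threshold {
            self.opened_at = Some(Instant::now()); // trip the breaker
        }
    }
}
```

With the breaker open, an L1 miss can return a miss (or stale data, depending on policy) in microseconds instead of stalling on Redis timeouts.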

Engine Evolution: v1.0 → v2.0 → v3.0 → v4.3

| Metric | v1.0 (Redis Proxy) | v2.0 (JS L1) | v3.0 (NAPI L1) | v4.3 (Native Engine) |
| L1 Hit Latency | N/A (no L1) | 0.0085ms | 4.65µs raw / 14.5µs HTTP | 1.5µs (raw and HTTP) |
| P99 Latency | N/A | ~30µs | ~16µs | 3.7µs |
| L2 Hit Latency | 0.55ms | 0.55ms | 0.55ms | 0.55ms (same Redis) |
| L1 Hit Rate | 0% (no L1) | 85% | 100% (warm set) | 99.05% (production) |
| GET Throughput | N/A | ~100K ops/s | 215K ops/s | 660K+ ops/s |
| Horizontal Scaling | N/A | N/A | Single instance | Pub/sub cluster |
| Engine | None (pass-through) | JS L1 Cache (Node.js) | NAPI L1 Cache | Native Cachee Engine + DashMap |
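The v4.3 column's "pub/sub cluster" refers to cross-instance cache coherence: when one instance writes or deletes a key, it publishes the key on an invalidation channel and peers evict their stale L1 copies. The sketch below models the idea with an in-process `mpsc` channel standing in for the Redis pub/sub channel, so it runs without a Redis dependency; the `Instance` type and method names are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::mpsc;

/// One cache instance in the cluster. `inbox` receives invalidation
/// messages that peers publish (a Redis pub/sub channel in production).
struct Instance {
    l1: HashMap<String, Vec<u8>>,
    inbox: mpsc::Receiver<String>,
}

impl Instance {
    /// Drain pending invalidations, evicting any stale L1 entries.
    /// In production this runs on the pub/sub subscriber callback.
    fn apply_invalidations(&mut self) {
        while let Ok(key) = self.inbox.try_recv() {
            self.l1.remove(&key);
        }
    }
}
```

A SET or DEL on instance A publishes the key; instance B's subscriber removes its copy, and B's next GET for that key falls through to L2 and repopulates with the fresh value. This is why the production L1 hit rate is 99.05% rather than 100%: coherence evictions and cold keys force some L2 round-trips.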

Capacity Planning — Redis Memory Analysis

ElastiCache r7g.12xlarge — Memory Breakdown
Total Allocated: 76.10 MB
Startup Overhead (Redis engine): 9.17 MB
Allocator Fragmentation: 66.86 MB
Client Buffers (42 connections): 0.04 MB
Actual Session Data: 0 bytes (all keys expired via TTL)
Keys in Redis: 0 (DBSIZE confirmed)
Fragmentation Ratio: 1.23 (normal jemalloc)
76MB ≠ Session Data
The 76.33MB reported is 100% Redis engine overhead + jemalloc fragmentation. Zero session keys exist. At 3.5KB/session on 253GB usable (80% of 317GB): ~72 million sessions capacity — original projection holds.
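The capacity projection above is plain arithmetic, reproduced here so the numbers can be checked. The helper name and the 80%/3.5KB parameters come straight from the analysis; only the function itself is invented for illustration.

```rust
/// Capacity-planning arithmetic from the memory analysis:
/// treat 80% of the r7g.12xlarge's 317 GB as usable for session data,
/// at an average of 3.5 KB per session.
fn session_capacity(total_gb: f64, usable_fraction: f64, session_kb: f64) -> f64 {
    let usable_bytes = total_gb * usable_fraction * 1e9; // 253.6 GB usable
    usable_bytes / (session_kb * 1e3)                    // sessions that fit
}
```

`session_capacity(317.0, 0.80, 3.5)` works out to roughly 72.5 million sessions, matching the ~72 million figure in the analysis.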

Live Test Timeline — Feb 12, 2026

02:41 UTC
Cachee v3.0 server started via PM2 — native engine initialized, L2 Redis connected (76MB)
02:52 UTC
Dashboard reporter enabled — auto-sending metrics to cachee.ai every 10s
02:53 UTC
Test burst: 5 ops (2 SET, 3 GET) — L1 hit rate 100%, 0.017ms avg latency
02:59 UTC
Production traffic burst: 1,007 ops, 503 keys loaded, 100% L1 hit rate maintained
02:59 UTC
Reporter confirmed: 1,002 ops pushed to dashboard (213 + 787 + 2 batches)
03:16 UTC
H33 card visible on cachee.ai/admin/dashboard.html — live, connected, auto-refreshing

Key Findings

What Worked

  • Native L1 engine: 1.8x faster raw GET vs JS
  • 100% L1 hit rate on warm working set
  • Zero errors across all test traffic
  • NAPI-RS FFI: zero-copy, no serialization overhead
  • Dashboard reporter: seamless metrics pipeline
  • Redis capacity math validated (3.5KB/session holds)

v4.3 Key Optimizations

  • Fast lane middleware — bypasses compression, CORS, security headers
  • Inline auth — constant-time API key check in hot path (~1µs)
  • Pre-compression — Brotli + gzip stored at write time
  • Pub/sub cache coherence — Redis channel for cross-instance invalidation
  • ETag support — 304 Not Modified for conditional requests
  • Request deduplication — concurrent identical GETs coalesced
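The last item, request coalescing, can be sketched with std primitives. This is an assumption about the mechanism, not Cachee's implementation: the first caller to miss on a key becomes the leader and performs the single backend fetch, while concurrent callers for the same key wait on a shared slot and receive the leader's result.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Condvar, Mutex};

/// Shared slot for one in-flight fetch: the result plus a Condvar
/// that followers wait on until the leader fills it in.
type Slot = Arc<(Mutex<Option<Vec<u8>>>, Condvar)>;

#[derive(Default)]
struct Deduper {
    in_flight: Mutex<HashMap<String, Slot>>,
}

impl Deduper {
    fn get_or_fetch<F: FnOnce() -> Vec<u8>>(&self, key: &str, fetch: F) -> Vec<u8> {
        let (slot, leader) = {
            let mut map = self.in_flight.lock().unwrap();
            match map.get(key) {
                // A fetch for this key is already running: join it.
                Some(s) => (s.clone(), false),
                // First caller: register a slot and run the fetch.
                None => {
                    let s: Slot = Arc::new((Mutex::new(None), Condvar::new()));
                    map.insert(key.to_string(), s.clone());
                    (s, true)
                }
            }
        };
        if leader {
            let value = fetch(); // the single backend round-trip
            *slot.0.lock().unwrap() = Some(value.clone());
            slot.1.notify_all(); // wake every waiting follower
            self.in_flight.lock().unwrap().remove(key);
            value
        } else {
            let mut guard = slot.0.lock().unwrap();
            while guard.is_none() {
                guard = slot.1.wait(guard).unwrap(); // handles spurious wakeups
            }
            guard.clone().unwrap()
        }
    }
}
```

Under a burst of identical GETs for a cold key, N concurrent misses collapse into one L2 round-trip instead of N, which protects Redis during stampedes on popular keys.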