Not everything fits in RAM. Hybrid tiering keeps your hottest keys in RAM at 1.5µs while warm keys live on NVMe at 10–50µs (about 20µs on average), still 50–250x faster than a 1–5ms network round-trip to Redis. The eviction engine decides what goes where.
Your data is growing faster than your RAM budget. The gap between RAM speed and network speed is massive, and nothing fills it. Until now.
CPU caches have used L1/L2/L3 hierarchies for decades. Fast and small at the top, slow and large at the bottom. Your data layer should work the same way.
Cachee's W-TinyLFU eviction engine already tracks access frequency and recency for every key. Hybrid tiering gives it a new option: instead of evicting to Redis, demote to NVMe.
When RAM is full and a new key arrives, the eviction engine identifies the least valuable key in RAM. Instead of dropping it entirely (forcing a Redis round-trip on the next access), the engine writes it to NVMe asynchronously. The key remains accessible at 10–50µs instead of 1–5ms.
Demotion is non-blocking. The write to NVMe happens via io_uring in the background. The hot path — serving the new key that triggered the eviction — is never delayed.
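The demotion path can be sketched roughly as follows. This is a toy illustration, not Cachee's implementation. The `TieredCache` class and its `ram_capacity` parameter are hypothetical, a plain LRU order stands in for W-TinyLFU, a dict stands in for the NVMe store, and a background thread with a queue stands in for the io_uring write.

```python
import queue
import threading
from collections import OrderedDict


class TieredCache:
    """Toy two-tier cache: entries evicted from RAM are demoted, not dropped."""

    def __init__(self, ram_capacity):
        self.ram_capacity = ram_capacity
        self.ram = OrderedDict()        # assumption: LRU order stands in for W-TinyLFU
        self.nvme = {}                  # assumption: dict stands in for an NVMe store
        self._pending = queue.Queue()   # demotions waiting to be written
        self._writer = threading.Thread(target=self._drain, daemon=True)
        self._writer.start()

    def _drain(self):
        # Background writer; stands in for asynchronous io_uring submission.
        while True:
            key, value = self._pending.get()
            self.nvme[key] = value
            self._pending.task_done()

    def set(self, key, value):
        if key in self.ram:
            self.ram.move_to_end(key)   # refresh recency for an existing key
        self.ram[key] = value
        if len(self.ram) > self.ram_capacity:
            victim, victim_value = self.ram.popitem(last=False)
            # Enqueue and return at once; the caller never waits on the write.
            self._pending.put((victim, victim_value))

    def flush(self):
        # For demos and tests only: block until queued demotions have landed.
        self._pending.join()
```

With `ram_capacity=2`, setting keys `a`, `b`, `c` and then calling `flush()` leaves `b` and `c` in RAM and `a` on the simulated NVMe tier. A production design would also need to keep a victim readable while its write is in flight and to invalidate stale NVMe copies, both of which this sketch omits.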
When a key is accessed on NVMe, the engine tracks the hit. After a configurable promotion threshold (default: 3 accesses), the key is promoted back to RAM. A key that was warm becomes hot again, and the hierarchy adapts.
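A minimal sketch of the promotion rule, assuming the key is promoted on the hit that reaches the threshold (the text could also be read as "on the hit after"). `PromotionTracker` and `record_hit` are hypothetical names used only for illustration.

```python
class PromotionTracker:
    """Counts NVMe hits per key and signals when a key should move to RAM."""

    def __init__(self, threshold=3):
        self.threshold = threshold  # default of 3 taken from the text above
        self.hits = {}

    def record_hit(self, key):
        """Record one NVMe hit; return True when the key reaches the threshold."""
        count = self.hits.get(key, 0) + 1
        if count >= self.threshold:
            self.hits.pop(key, None)  # reset: the key is leaving the NVMe tier
            return True
        self.hits[key] = count
        return False
```

With the default threshold, the first two calls to `record_hit("user:42")` return `False` and the third returns `True`, after which the counter for that key starts over.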
The eviction engine learns which keys are "warm" (accessed occasionally) versus "hot" (accessed constantly). Hot keys stay in RAM. Warm keys move to NVMe. Cold keys are evicted from NVMe and are served from L2, or miss entirely.
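The hot/warm/cold split implies a read path that falls through the tiers in order. A minimal sketch, assuming plain dicts for the RAM and NVMe tiers and a caller-supplied `fetch_from_l2` function (all hypothetical names, not Cachee's API):

```python
def tiered_get(key, ram, nvme, fetch_from_l2):
    """Look a key up tier by tier and report which tier answered."""
    if key in ram:                      # hot: served from RAM
        return ram[key], "ram"
    if key in nvme:                     # warm: served from NVMe
        return nvme[key], "nvme"
    value = fetch_from_l2(key)          # cold: e.g. a Redis GET; None on a miss
    return value, ("l2" if value is not None else "miss")
```

The returned tier name is there only to make the fall-through visible; an application would normally see just the value.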
The tiering abstraction is designed as a pluggable backend. RAM, NVMe, and future storage technologies all implement the same interface.
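One way such a shared interface could look is sketched below. The `TierBackend` name and its three methods are assumptions made for illustration, and `FileBackend` (one file per key) is only a stand-in for an NVMe-backed store.

```python
import os
from abc import ABC, abstractmethod
from typing import Optional


class TierBackend(ABC):
    """Hypothetical common interface that every storage tier implements."""

    @abstractmethod
    def get(self, key: str) -> Optional[bytes]: ...

    @abstractmethod
    def put(self, key: str, value: bytes) -> None: ...

    @abstractmethod
    def delete(self, key: str) -> None: ...


class RamBackend(TierBackend):
    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


class FileBackend(TierBackend):
    """One file per key in a directory; a stand-in for an NVMe tier."""

    def __init__(self, directory):
        self._dir = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, key):
        # Hex-encode the key so any string is a safe file name.
        return os.path.join(self._dir, key.encode("utf-8").hex())

    def get(self, key):
        try:
            with open(self._path(key), "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def put(self, key, value):
        with open(self._path(key), "wb") as f:
            f.write(value)

    def delete(self, key):
        try:
            os.remove(self._path(key))
        except FileNotFoundError:
            pass
```

Because both classes satisfy the same interface, the eviction logic can move a value between tiers with a `get` on one backend and a `put` on another, without knowing which storage medium is underneath.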
NVMe is 100x cheaper than RAM per GB. Hybrid tiering lets you keep the same effective working set at a fraction of the cost.
| Tier | Cost/GB/Month | Latency | Recommended Size |
|---|---|---|---|
| RAM (L1) | $5–10 | 1.5µs | Hot 5% of keys |
| NVMe (L1.5) | $0.05–0.10 | 10–50µs | Warm 30% of keys |
| Redis (L2) | $0.50–1.00 | 1–5ms | Cold 65% of keys |

Example: a 100GB working set split across the three tiers.

| Tier | Size | Monthly Cost | Latency | Key Distribution |
|---|---|---|---|---|
| RAM (L1) | 5GB | $25–50 | 1.5µs | Hot keys |
| NVMe (L1.5) | 30GB | $1.50–3 | 20µs avg | Warm keys |
| Redis (L2) | 65GB | $32–65 | 2ms avg | Cold keys |
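The result is an 80–90% cost reduction compared with keeping the full 100GB in RAM, with sub-50µs P99 latency for the 35% of your working set held in RAM and on NVMe, most of which would otherwise hit Redis at 1–5ms.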
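The cost figures can be checked with a few lines of arithmetic. The sketch below uses only the per-GB prices and the 5/30/65GB split from the tables above; the function and variable names are illustrative.

```python
# Per-GB monthly price ranges (low, high) in USD, copied from the tier table.
PRICE_PER_GB = {
    "ram": (5.00, 10.00),
    "nvme": (0.05, 0.10),
    "redis": (0.50, 1.00),
}

# Example split of a 100GB working set, in GB per tier.
SPLIT_GB = {"ram": 5, "nvme": 30, "redis": 65}


def monthly_cost(split_gb, prices=PRICE_PER_GB):
    """Return the (low, high) monthly cost for a given GB-per-tier split."""
    low = sum(gb * prices[tier][0] for tier, gb in split_gb.items())
    high = sum(gb * prices[tier][1] for tier, gb in split_gb.items())
    return low, high


tiered = monthly_cost(SPLIT_GB)          # about $59 to $118 per month
all_ram = monthly_cost({"ram": 100})     # $500 to $1,000 per month
# Fractional saving at the low and high ends of the price range (about 0.88).
savings = tuple(1 - t / r for t, r in zip(tiered, all_ram))
```

Both ends of the range work out to a saving of roughly 88%, which sits inside the 80–90% band. The low end comes to $59 rather than $58 because the example table rounds the Redis figure of $32.50 down to $32.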
Hybrid tiering suits any workload whose working set is larger than available RAM and whose access pattern follows a power law, where a small share of keys receives most of the traffic.
Hybrid memory tiering is a cache storage hierarchy that uses multiple tiers of storage with different speed and cost characteristics. Hot keys stay in RAM at 1.5µs latency. Warm keys are demoted to NVMe SSDs at 10–50µs latency. Cold keys fall through to Redis or your database. This mirrors how CPU L1/L2/L3 caches work, applied to your data layer, enabling 100x larger working sets without compromising speed for your most accessed keys.
RAM delivers cache reads at approximately 1.5 microseconds. NVMe SSDs deliver random reads at 10–50 microseconds (about 20 microseconds on average), which is 50–250x faster than a network round-trip to Redis at 1–5 milliseconds. NVMe fills a massive latency gap between in-process RAM and remote Redis that most caching setups leave unused. It is 100x cheaper per GB than RAM while still delivering sub-millisecond reads.
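The speedup range follows directly from the latency figures. A short worked check, assuming the 20µs average NVMe latency quoted in the example cost table; the constant and function names are illustrative.

```python
# Latencies in microseconds, taken from the figures quoted on this page.
NVME_US = (10, 50)          # NVMe random read, best to worst
NVME_AVG_US = 20            # average NVMe latency from the example table
REDIS_US = (1_000, 5_000)   # Redis network round-trip, 1-5 ms


def speedup_range(fast_us, slow_us):
    """(min, max) speedup of a fast tier over a slow tier, both as (lo, hi)."""
    return slow_us[0] / fast_us[1], slow_us[1] / fast_us[0]


# At the 20µs average, NVMe is 50x to 250x faster than Redis.
at_average = speedup_range((NVME_AVG_US, NVME_AVG_US), REDIS_US)

# Across the full 10-50µs range, the bounds widen to 20x and 500x.
full_range = speedup_range(NVME_US, REDIS_US)
```

The 50–250x figure therefore corresponds to the average NVMe latency; in the least favorable pairing (50µs NVMe against a 1ms Redis round-trip) the advantage is closer to 20x.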
Cachee's W-TinyLFU eviction engine already tracks access frequency and recency for every key. When RAM is full, instead of evicting a key entirely to Redis, the engine demotes it to the NVMe tier. Keys that are accessed occasionally (warm) stay on NVMe at 10–50 microsecond latency. Keys that are accessed frequently (hot) are promoted back to RAM. Keys that are rarely accessed (cold) are eventually evicted from NVMe to the L2 Redis tier. The promotion and demotion decisions are automatic and transparent to the application.
For a 100GB working set, hybrid tiering can reduce monthly costs by 80–90%. Instead of keeping all 100GB in RAM at $500–1,000/month, you keep 5GB in RAM (hot keys), 30GB on NVMe (warm keys), and 65GB in Redis (cold keys) for approximately $59–118/month total. NVMe storage costs $0.05–0.10 per GB/month compared to $5–10 per GB/month for RAM, making it 100x cheaper while still being 50–250x faster than network-based Redis.
No application changes are required. Hybrid tiering is completely transparent to the application. Your GET and SET commands work exactly the same way. The tiering engine handles promotion, demotion, and eviction automatically behind the same cache interface. The only observable difference is that some cache hits will return at NVMe latency (10–50µs) instead of RAM latency (1.5µs), but both are sub-millisecond and invisible to end users.
Hybrid memory tiering. 1.5µs for hot keys. 10–50µs for warm keys. 100x larger working sets at 80–90% lower cost. Same cache interface.