New Capability

10GB in RAM. 1TB on NVMe.
Same Cache Interface.

Not everything fits in RAM. Hybrid tiering keeps your hottest keys at 1.5µs in RAM while warm keys live on NVMe at 10–50µs — still 50–250x faster than a network round-trip to Redis. The eviction engine decides what goes where.

1.5µs
RAM Tier
10–50µs
NVMe Tier
100x
Larger Working Sets
Transparent
To Your App
The Problem

Working Sets Don't Fit in RAM Anymore

Your data is growing faster than your RAM budget. The gap between RAM speed and network speed is massive, and nothing fills it. Until now.

💰
RAM Is Expensive at Scale
10GB of RAM costs $50–100/month. 100GB costs $500–1,000/month. 1TB is impractical for most teams. Working sets larger than available RAM force misses to Redis at 1–5ms, and every miss is a network round-trip that your users feel.
1TB RAM = $5,000–10,000/month
📊
The 95/5 Rule
95% of your reads hit 5% of your keys. The other 95% of keys are accessed occasionally but still need sub-millisecond response. Currently, they get 1–5ms Redis round-trips. These warm keys are too expensive to keep in RAM and too important to let miss to the network.
95% of keys need a faster home than Redis
NVMe: 100x Cheaper, 50x Faster
NVMe SSDs deliver 10–50µs random reads at 100x lower cost per GB than RAM, and 50–250x faster than a network round-trip to Redis. That leaves a massive middle ground between 1.5µs and 1–5ms that no other caching system exploits. Until now.
NVMe fills the gap nobody else uses
The Hierarchy

The CPU Cache Hierarchy, Applied to Your Data Layer

CPU caches have used L1/L2/L3 hierarchies for decades. Fast and small at the top, slow and large at the bottom. Your data layer should work the same way.

Cachee Memory Hierarchy
L0 (Future): Zero-Copy Shared Memory
Direct memory mapping between processes. No copy, no serialization.
<1µs

L1 (Hot): RAM (W-TinyLFU)
Current Cachee in-process cache. DashMap with W-TinyLFU eviction. Your hottest 5% of keys.
1.5µs

L1.5 (Warm, NEW): NVMe SSD
Hybrid tiering. Memory-mapped NVMe with io_uring async I/O. Your warm 30% of keys.
10–50µs

L2 (Cold): Redis / ElastiCache
Network round-trip to shared remote cache. Your cold 65% of keys.
1–5ms

L3 (Origin): Database
Source of truth. Disk I/O plus query execution. The fallback you want to avoid.
5–50ms
Design Principle
Fast & small at the top. Slow & large at the bottom.
Exactly how CPU L1/L2/L3 caches work. Your data layer should be no different.
How It Works

W-TinyLFU Already Knows What's Hot

Cachee's W-TinyLFU eviction engine already tracks access frequency and recency for every key. Hybrid tiering gives it a new option: instead of evicting to Redis, demote to NVMe.

Demotion: RAM → NVMe

When RAM is full and a new key arrives, the eviction engine identifies the least valuable key in RAM. Instead of dropping it entirely (forcing a Redis miss on next access), the engine writes it to NVMe asynchronously. The key remains accessible at 10–50µs instead of 1–5ms.

Demotion is non-blocking. The write to NVMe happens via io_uring in the background. The hot path — serving the new key that triggered the eviction — is never delayed.

Promotion: NVMe → RAM

When a key is accessed on NVMe, the engine tracks the hit. After a configurable promotion threshold (default: 3 accesses), the key is promoted back to RAM. A key that was warm becomes hot again, and the hierarchy adapts.

The eviction engine learns which keys are "warm" (accessed occasionally) versus "hot" (accessed constantly). Hot stays in RAM. Warm moves to NVMe. Cold evicts from NVMe to L2 or misses entirely.
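The demotion/promotion flow above can be sketched in a few lines of Rust. This is an illustrative model only, assuming a plain per-key hit counter and the default promotion threshold of 3; `TierManager`, `demote`, and `on_read` are hypothetical names for this sketch, not Cachee's actual API (which tracks frequency with W-TinyLFU rather than a simple counter).

```rust
use std::collections::HashMap;

/// Which tier a key currently lives in.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Tier {
    Ram,
    Nvme,
}

/// Hypothetical bookkeeping: count hits on NVMe-resident keys and promote
/// a key back to RAM once it reaches the promotion threshold.
struct TierManager {
    placement: HashMap<String, Tier>,
    nvme_hits: HashMap<String, u32>,
    promote_threshold: u32,
}

impl TierManager {
    fn new(promote_threshold: u32) -> Self {
        Self {
            placement: HashMap::new(),
            nvme_hits: HashMap::new(),
            promote_threshold,
        }
    }

    /// RAM is full: demote a key to NVMe instead of dropping it entirely.
    fn demote(&mut self, key: &str) {
        self.placement.insert(key.to_string(), Tier::Nvme);
        self.nvme_hits.insert(key.to_string(), 0);
    }

    /// Record a read and return the tier that served it.
    fn on_read(&mut self, key: &str) -> Tier {
        match self.placement.get(key).copied() {
            Some(Tier::Nvme) => {
                let hits = self.nvme_hits.entry(key.to_string()).or_insert(0);
                *hits += 1;
                if *hits >= self.promote_threshold {
                    // Warm key became hot again: move it back to RAM.
                    self.placement.insert(key.to_string(), Tier::Ram);
                    self.nvme_hits.remove(key);
                }
                Tier::Nvme
            }
            // RAM hit (or first insert of a new key).
            _ => {
                self.placement.insert(key.to_string(), Tier::Ram);
                Tier::Ram
            }
        }
    }
}
```

With a threshold of 3, a demoted key is served from NVMe for its first three accesses; the fourth access finds it back in RAM.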

# Read path (transparent to application)
GET user:123:profile

# Internal resolution order:
#   1. Check RAM (DashMap)        → HIT at 1.5µs   → return
#   2. Check NVMe (io_uring read) → HIT at 10–50µs → return + async promote
#   3. Check Redis L2             → HIT at 1–5ms   → return + write to RAM
#   4. Cache miss                 → fetch from origin

# Write path (always goes to RAM first)
SET user:123:profile <value>
# Writes to RAM. If RAM is full, least-valuable key demoted to NVMe async.
Architecture

Pluggable Storage Backend

The tiering abstraction is designed as a pluggable backend. RAM, NVMe, and future storage technologies all implement the same interface.

trait StorageBackend {
    fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
    fn put(&self, key: &[u8], value: &[u8]);
    fn delete(&self, key: &[u8]);
    fn capacity(&self) -> usize;
}

// Current backends
struct RamBackend;    // DashMap — current Cachee L1 (unchanged)
struct NvmeBackend;   // Memory-mapped NVMe + io_uring async I/O (NEW)

// Future backends
struct CxlBackend;    // CXL-attached memory (~300ns, byte-addressable)
struct OptaneBackend; // Intel Optane persistent memory
struct CloudBackend;  // EBS io2, Azure Ultra Disk, etc.
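To show how the interface composes, here is a minimal self-contained sketch: a toy backend implementing the trait with a mutex-wrapped `HashMap` (a stand-in for DashMap or a memory-mapped NVMe file), plus a fall-through read that checks each tier in order. `MapBackend` and `tiered_get` are illustrative names invented for this sketch, not part of Cachee.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// The pluggable interface, as shown above.
trait StorageBackend {
    fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
    fn put(&self, key: &[u8], value: &[u8]);
    fn delete(&self, key: &[u8]);
    fn capacity(&self) -> usize;
}

/// Toy backend: a mutex-wrapped HashMap standing in for either tier.
/// Purely illustrative; a real NVMe backend would do file-backed I/O.
struct MapBackend {
    map: Mutex<HashMap<Vec<u8>, Vec<u8>>>,
    capacity: usize,
}

impl MapBackend {
    fn new(capacity: usize) -> Self {
        Self { map: Mutex::new(HashMap::new()), capacity }
    }
}

impl StorageBackend for MapBackend {
    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.map.lock().unwrap().get(key).cloned()
    }
    fn put(&self, key: &[u8], value: &[u8]) {
        self.map.lock().unwrap().insert(key.to_vec(), value.to_vec());
    }
    fn delete(&self, key: &[u8]) {
        self.map.lock().unwrap().remove(key);
    }
    fn capacity(&self) -> usize {
        self.capacity
    }
}

/// Tiered read: check each backend in order (RAM first, then NVMe, ...).
fn tiered_get(tiers: &[&dyn StorageBackend], key: &[u8]) -> Option<Vec<u8>> {
    tiers.iter().find_map(|t| t.get(key))
}
```

Because every tier presents the same trait, adding a new storage technology means adding one `impl StorageBackend` block; the lookup path is unchanged.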
💻
RAM Backend (Current)
DashMap with W-TinyLFU eviction. Zero changes to the existing Cachee L1 path. Sub-2µs reads. The hot tier you already know.
1.5µs reads — unchanged
💾
NVMe Backend (New)
Memory-mapped file on NVMe with io_uring for zero-copy async I/O. LRU eviction within the tier. 10–50µs random reads at 100x lower cost per GB than RAM.
10–50µs reads — new tier
🔮
Future Backends
CXL-attached memory at 300ns latency. Intel Optane persistent memory. Cloud-specific storage (EBS io2, Azure Ultra Disk). Same interface, new tiers as hardware evolves.
Pluggable — add new tiers without code changes
Cost

80–90% Cost Reduction at Scale

NVMe is 100x cheaper than RAM per GB. Hybrid tiering lets you keep the same effective working set at a fraction of the cost.

Tier          Cost/GB/Month   Latency   Recommended Size
RAM (L1)      $5–10           1.5µs     Hot 5% of keys
NVMe (L1.5)   $0.05–0.10      10–50µs   Warm 30% of keys
Redis (L2)    $0.50–1.00      1–5ms     Cold 65% of keys

Example: 100GB Working Set

Tier          Size   Monthly Cost   Latency    Key Distribution
RAM (L1)      5GB    $25–50         1.5µs      Hot keys
NVMe (L1.5)   30GB   $1.50–3        20µs avg   Warm keys
Redis (L2)    65GB   $32–65         2ms avg    Cold keys
Hybrid Tiering
$58–118/mo
<50µs P99 for 35% of keys
All-RAM (Traditional)
$500–1,000/mo
Or constant Redis misses for 95% of keys

80–90% cost reduction with sub-50µs P99 latency for 35% of your working set that would otherwise hit Redis at 1–5ms.
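The totals in the comparison follow directly from the per-GB rates quoted earlier. A quick sketch reproducing the arithmetic (`hybrid_total` and `all_ram_total` are names invented here; the computed low end is $59/mo, matching the $58 shown up to rounding):

```rust
/// Tier sizing and $/GB/month rates from the tables above:
/// (name, size in GB, low rate, high rate).
const TIERS: [(&str, f64, f64, f64); 3] = [
    ("RAM (L1)",    5.0,  5.00, 10.00),
    ("NVMe (L1.5)", 30.0, 0.05, 0.10),
    ("Redis (L2)",  65.0, 0.50, 1.00),
];

/// Total monthly (low, high) cost across all three tiers.
fn hybrid_total() -> (f64, f64) {
    TIERS
        .iter()
        .fold((0.0, 0.0), |(lo, hi), (_, gb, l, h)| (lo + gb * l, hi + gb * h))
}

/// All-RAM baseline: the full 100 GB at RAM rates ($5–10/GB/month).
fn all_ram_total() -> (f64, f64) {
    (100.0 * 5.0, 100.0 * 10.0)
}
```

`hybrid_total()` comes out to roughly $59–118/month versus $500–1,000/month all-RAM, which is where the 80–90% reduction figure comes from.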

Use Cases

Who Benefits

Any workload with a working set larger than available RAM and power-law access patterns.

🛒
Large Catalog E-Commerce
Millions of products, but 5% drive 80% of traffic. Hot products in RAM, long-tail catalog on NVMe at 20µs instead of 2ms from Redis.
🧠
Recommendation Engines
Millions of embeddings with power-law access. Popular item vectors in RAM, the full embedding table on NVMe. No more OOM kills on model updates.
📱
IoT Device State
Millions of device states, but recently active devices are hot. Active devices in RAM, dormant devices on NVMe. Wake-up reads at 30µs, not 3ms.
🔍
Enterprise Search
Millions of document indices, but trending topics drive most queries. Trending in RAM, the full index on NVMe. Sub-millisecond for every query, not just popular ones.
The question isn't whether you can fit your working set in RAM.
It's whether you need to.
FAQ

Frequently Asked Questions

What is hybrid memory tiering in caching?

Hybrid memory tiering is a cache storage hierarchy that uses multiple tiers of storage with different speed and cost characteristics. Hot keys stay in RAM at 1.5µs latency. Warm keys are demoted to NVMe SSDs at 10–50µs latency. Cold keys fall through to Redis or your database. This mirrors how CPU L1/L2/L3 caches work, applied to your data layer, enabling 100x larger working sets without compromising speed for your most accessed keys.

How does NVMe compare to RAM and Redis for cache latency?

RAM delivers cache reads at approximately 1.5 microseconds. NVMe SSDs deliver random reads at 10–50 microseconds, which is 50–250x faster than a network round-trip to Redis at 1–5 milliseconds. NVMe fills a massive latency gap between in-process RAM and remote Redis that no other caching system exploits. It is 100x cheaper per GB than RAM while still delivering sub-millisecond reads.

How does the eviction engine decide what goes on NVMe vs RAM?

Cachee's W-TinyLFU eviction engine already tracks access frequency and recency for every key. When RAM is full, instead of evicting a key entirely to Redis, the engine demotes it to the NVMe tier. Keys that are accessed occasionally (warm) stay on NVMe at 10–50 microsecond latency. Keys that are accessed frequently (hot) are promoted back to RAM. Keys that are rarely accessed (cold) are eventually evicted from NVMe to the L2 Redis tier. The promotion and demotion decisions are automatic and transparent to the application.

How much cost savings does hybrid tiering provide?

For a 100GB working set, hybrid tiering can reduce monthly costs by 80–90%. Instead of keeping all 100GB in RAM at $500–1,000/month, you keep 5GB in RAM (hot keys), 30GB on NVMe (warm keys), and 65GB in Redis (cold keys) for approximately $58–118/month total. NVMe storage costs $0.05–0.10 per GB/month compared to $5–10 per GB for RAM, making it 100x cheaper while still being 50x faster than network-based Redis.

Does hybrid tiering require application code changes?

No. Hybrid tiering is completely transparent to the application. Your GET and SET commands work exactly the same way. The tiering engine handles promotion, demotion, and eviction automatically behind the same cache interface. The only observable difference is that some cache hits will return at NVMe latency (10–50µs) instead of RAM latency (1.5µs), but both are sub-millisecond and invisible to end users.

Stop Choosing Between Speed and Scale.
Get Both.

Hybrid memory tiering. 1.5µs for hot keys. 10–50µs for warm keys. 100x larger working sets at 80–90% lower cost. Same cache interface.

Start Free Trial Schedule Demo