New Capability

10GB in RAM. 1TB on NVMe.
Same Cache Interface.

Not everything fits in RAM. Hybrid tiering keeps your hottest keys at 1.5µs in RAM while warm keys live on NVMe at 10–50µs — still 50–250x faster than a network round-trip to Redis. The eviction engine decides what goes where.

1.5µs
RAM Tier
10–50µs
NVMe Tier
100x
Larger Working Sets
Transparent
To Your App
The Problem

Working Sets Don't Fit in RAM Anymore

Your data is growing faster than your RAM budget. The gap between RAM speed and network speed is massive, and nothing fills it. Until now.

💰
RAM Is Expensive at Scale
10GB of RAM costs $50–100/month. 100GB costs $500–1,000/month. 1TB is impractical for most teams. Working sets larger than available RAM force misses to Redis at 1–5ms, and every miss is a network round-trip that your users feel.
1TB RAM = $5,000–10,000/month
📊
The 95/5 Rule
95% of your reads hit 5% of your keys. The other 95% of keys are accessed occasionally but still need sub-millisecond response. Currently, they get 1–5ms Redis round-trips. These warm keys are too expensive to keep in RAM and too important to let miss to the network.
95% of keys need a faster home than Redis
NVMe: 100x Cheaper, 50x Faster
NVMe SSDs deliver 10–50µs random reads at 100x lower cost per GB than RAM, and 50–250x faster than a network round-trip to Redis. That leaves a massive middle ground between 1.5µs and 1–5ms that no other caching system exploits. Until now.
NVMe fills the gap nobody else uses
The Hierarchy

The CPU Cache Hierarchy, Applied to Your Data Layer

CPU caches have used L1/L2/L3 hierarchies for decades. Fast and small at the top, slow and large at the bottom. Your data layer should work the same way.

Cachee Memory Hierarchy
L0 (Future): Zero-Copy Shared Memory
Direct memory mapping between processes. No copy, no serialization.
<1µs

L1 (Hot): RAM (W-TinyLFU)
Current Cachee in-process cache. DashMap with W-TinyLFU eviction. Your hottest 5% of keys.
1.5µs

L1.5 (Warm, NEW): NVMe SSD
Hybrid tiering. Memory-mapped NVMe with io_uring async I/O. Your warm 30% of keys.
10–50µs

L2 (Cold): Redis / ElastiCache
Network round-trip to shared remote cache. Your cold 65% of keys.
1–5ms

L3 (Origin): Database
Source of truth. Disk I/O plus query execution. The fallback you want to avoid.
5–50ms
Design Principle
Fast & small at the top. Slow & large at the bottom.
Exactly how CPU L1/L2/L3 caches work. Your data layer should be no different.
How It Works

W-TinyLFU Already Knows What's Hot

Cachee's W-TinyLFU eviction engine already tracks access frequency and recency for every key. Hybrid tiering gives it a new option: instead of evicting to Redis, demote to NVMe.

Demotion: RAM → NVMe

When RAM is full and a new key arrives, the eviction engine identifies the least valuable key in RAM. Instead of dropping it entirely (forcing a Redis miss on next access), the engine writes it to NVMe asynchronously. The key remains accessible at 10–50µs instead of 1–5ms.

Demotion is non-blocking. The write to NVMe happens via io_uring in the background. The hot path — serving the new key that triggered the eviction — is never delayed.

Promotion: NVMe → RAM

When a key is accessed on NVMe, the engine tracks the hit. After a configurable promotion threshold (default: 3 accesses), the key is promoted back to RAM. A key that was warm becomes hot again, and the hierarchy adapts.

The eviction engine learns which keys are "warm" (accessed occasionally) versus "hot" (accessed constantly). Hot stays in RAM. Warm moves to NVMe. Cold evicts from NVMe to L2 or misses entirely.
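The demotion/promotion flow above can be sketched in a few lines of Rust. This is an illustrative model only, assuming a plain per-key hit counter and the default promotion threshold of 3; `TierManager`, `demote`, and `on_read` are hypothetical names for this sketch, not Cachee's actual API (which tracks frequency with W-TinyLFU rather than a simple counter).

```rust
use std::collections::HashMap;

/// Which tier a key currently lives in.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Tier {
    Ram,
    Nvme,
}

/// Hypothetical bookkeeping: count hits on NVMe-resident keys and promote
/// a key back to RAM once it reaches the promotion threshold.
struct TierManager {
    placement: HashMap<String, Tier>,
    nvme_hits: HashMap<String, u32>,
    promote_threshold: u32,
}

impl TierManager {
    fn new(promote_threshold: u32) -> Self {
        Self {
            placement: HashMap::new(),
            nvme_hits: HashMap::new(),
            promote_threshold,
        }
    }

    /// RAM is full: demote a key to NVMe instead of dropping it entirely.
    fn demote(&mut self, key: &str) {
        self.placement.insert(key.to_string(), Tier::Nvme);
        self.nvme_hits.insert(key.to_string(), 0);
    }

    /// Record a read and return the tier that served it.
    fn on_read(&mut self, key: &str) -> Tier {
        match self.placement.get(key).copied() {
            Some(Tier::Nvme) => {
                let hits = self.nvme_hits.entry(key.to_string()).or_insert(0);
                *hits += 1;
                if *hits >= self.promote_threshold {
                    // Warm key became hot again: move it back to RAM.
                    self.placement.insert(key.to_string(), Tier::Ram);
                    self.nvme_hits.remove(key);
                }
                Tier::Nvme
            }
            // RAM hit (or first insert of a new key).
            _ => {
                self.placement.insert(key.to_string(), Tier::Ram);
                Tier::Ram
            }
        }
    }
}
```

With a threshold of 3, a demoted key is served from NVMe for its first three accesses; the fourth access finds it back in RAM.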

# Read path (transparent to application)
GET user:123:profile

# Internal resolution order:
#   1. Check RAM (DashMap)        → HIT at 1.5µs   → return
#   2. Check NVMe (io_uring read) → HIT at 10–50µs → return + async promote
#   3. Check Redis L2             → HIT at 1–5ms   → return + write to RAM
#   4. Cache miss                 → fetch from origin

# Write path (always goes to RAM first)
SET user:123:profile <value>
# Writes to RAM. If RAM is full, least-valuable key demoted to NVMe async.
Architecture

Pluggable Storage Backend

The tiering abstraction is designed as a pluggable backend. RAM, NVMe, and future storage technologies all implement the same interface.

trait StorageBackend {
    fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
    fn put(&self, key: &[u8], value: &[u8]);
    fn delete(&self, key: &[u8]);
    fn capacity(&self) -> usize;
}

// Current backends
struct RamBackend;    // DashMap — current Cachee L1 (unchanged)
struct NvmeBackend;   // Memory-mapped NVMe + io_uring async I/O (NEW)

// Future backends
struct CxlBackend;    // CXL-attached memory (~300ns, byte-addressable)
struct OptaneBackend; // Intel Optane persistent memory
struct CloudBackend;  // EBS io2, Azure Ultra Disk, etc.
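To show how the interface composes, here is a minimal self-contained sketch: a toy backend implementing the trait with a mutex-wrapped `HashMap` (a stand-in for DashMap or a memory-mapped NVMe file), plus a fall-through read that checks each tier in order. `MapBackend` and `tiered_get` are illustrative names invented for this sketch, not part of Cachee.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// The pluggable interface, as shown above.
trait StorageBackend {
    fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
    fn put(&self, key: &[u8], value: &[u8]);
    fn delete(&self, key: &[u8]);
    fn capacity(&self) -> usize;
}

/// Toy backend: a mutex-wrapped HashMap standing in for either tier.
/// Purely illustrative; a real NVMe backend would do file-backed I/O.
struct MapBackend {
    map: Mutex<HashMap<Vec<u8>, Vec<u8>>>,
    capacity: usize,
}

impl MapBackend {
    fn new(capacity: usize) -> Self {
        Self { map: Mutex::new(HashMap::new()), capacity }
    }
}

impl StorageBackend for MapBackend {
    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.map.lock().unwrap().get(key).cloned()
    }
    fn put(&self, key: &[u8], value: &[u8]) {
        self.map.lock().unwrap().insert(key.to_vec(), value.to_vec());
    }
    fn delete(&self, key: &[u8]) {
        self.map.lock().unwrap().remove(key);
    }
    fn capacity(&self) -> usize {
        self.capacity
    }
}

/// Tiered read: check each backend in order (RAM first, then NVMe, ...).
fn tiered_get(tiers: &[&dyn StorageBackend], key: &[u8]) -> Option<Vec<u8>> {
    tiers.iter().find_map(|t| t.get(key))
}
```

Because every tier presents the same trait, adding a new storage technology means adding one `impl StorageBackend` block; the lookup path is unchanged.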
💻
RAM Backend (Current)
DashMap with W-TinyLFU eviction. Zero changes to the existing Cachee L1 path. Sub-2µs reads. The hot tier you already know.
1.5µs reads — unchanged
💾
NVMe Backend (New)
Memory-mapped file on NVMe with io_uring for zero-copy async I/O. LRU eviction within the tier. 10–50µs random reads at 100x lower cost per GB than RAM.
10–50µs reads — new tier
🔮
Future Backends
CXL-attached memory at 300ns latency. Intel Optane persistent memory. Cloud-specific storage (EBS io2, Azure Ultra Disk). Same interface, new tiers as hardware evolves.
Pluggable — add new tiers without code changes
Cost

80–90% Cost Reduction at Scale

NVMe is 100x cheaper than RAM per GB. Hybrid tiering lets you keep the same effective working set at a fraction of the cost.

Tier          Cost/GB/Month   Latency   Recommended Size
RAM (L1)      $5–10           1.5µs     Hot 5% of keys
NVMe (L1.5)   $0.05–0.10      10–50µs   Warm 30% of keys
Redis (L2)    $0.50–1.00      1–5ms     Cold 65% of keys

Example: 100GB Working Set

Tier          Size   Monthly Cost   Latency    Key Distribution
RAM (L1)      5GB    $25–50         1.5µs      Hot keys
NVMe (L1.5)   30GB   $1.50–3        20µs avg   Warm keys
Redis (L2)    65GB   $32–65         2ms avg    Cold keys
Hybrid Tiering
$58–118/mo
<50µs P99 for 35% of keys
All-RAM (Traditional)
$500–1,000/mo
Or constant Redis misses for 95% of keys

80–90% cost reduction with sub-50µs P99 latency for 35% of your working set that would otherwise hit Redis at 1–5ms.
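The totals in the comparison follow directly from the per-GB rates quoted earlier. A quick sketch reproducing the arithmetic (`hybrid_total` and `all_ram_total` are names invented here; the computed low end is $59/mo, matching the $58 shown up to rounding):

```rust
/// Tier sizing and $/GB/month rates from the tables above:
/// (name, size in GB, low rate, high rate).
const TIERS: [(&str, f64, f64, f64); 3] = [
    ("RAM (L1)",    5.0,  5.00, 10.00),
    ("NVMe (L1.5)", 30.0, 0.05, 0.10),
    ("Redis (L2)",  65.0, 0.50, 1.00),
];

/// Total monthly (low, high) cost across all three tiers.
fn hybrid_total() -> (f64, f64) {
    TIERS
        .iter()
        .fold((0.0, 0.0), |(lo, hi), (_, gb, l, h)| (lo + gb * l, hi + gb * h))
}

/// All-RAM baseline: the full 100 GB at RAM rates ($5–10/GB/month).
fn all_ram_total() -> (f64, f64) {
    (100.0 * 5.0, 100.0 * 10.0)
}
```

`hybrid_total()` comes out to roughly $59–118/month versus $500–1,000/month all-RAM, which is where the 80–90% reduction figure comes from.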

Use Cases

Who Benefits

Any workload with a working set larger than available RAM and power-law access patterns.

🛒
Large Catalog E-Commerce
Millions of products, but 5% drive 80% of traffic. Hot products in RAM, long-tail catalog on NVMe at 20µs instead of 2ms from Redis.
🧠
Recommendation Engines
Millions of embeddings with power-law access. Popular item vectors in RAM, the full embedding table on NVMe. No more OOM kills on model updates.
📱
IoT Device State
Millions of device states, but recently active devices are hot. Active devices in RAM, dormant devices on NVMe. Wake-up reads at 30µs, not 3ms.
🔍
Enterprise Search
Millions of document indices, but trending topics drive most queries. Trending in RAM, the full index on NVMe. Sub-millisecond for every query, not just popular ones.
The question isn't whether you can fit your working set in RAM.
It's whether you need to.
FAQ

Frequently Asked Questions

What is hybrid memory tiering in caching?

Hybrid memory tiering is a cache storage hierarchy that uses multiple tiers of storage with different speed and cost characteristics. Hot keys stay in RAM at 1.5µs latency. Warm keys are demoted to NVMe SSDs at 10–50µs latency. Cold keys fall through to Redis or your database. This mirrors how CPU L1/L2/L3 caches work, applied to your data layer, enabling 100x larger working sets without compromising speed for your most accessed keys.

How does NVMe compare to RAM and Redis for cache latency?

RAM delivers cache reads at approximately 1.5 microseconds. NVMe SSDs deliver random reads at 10–50 microseconds, which is 50–250x faster than a network round-trip to Redis at 1–5 milliseconds. NVMe fills a massive latency gap between in-process RAM and remote Redis that no other caching system exploits. It is 100x cheaper per GB than RAM while still delivering sub-millisecond reads.

How does the eviction engine decide what goes on NVMe vs RAM?

Cachee's W-TinyLFU eviction engine already tracks access frequency and recency for every key. When RAM is full, instead of evicting a key entirely to Redis, the engine demotes it to the NVMe tier. Keys that are accessed occasionally (warm) stay on NVMe at 10–50 microsecond latency. Keys that are accessed frequently (hot) are promoted back to RAM. Keys that are rarely accessed (cold) are eventually evicted from NVMe to the L2 Redis tier. The promotion and demotion decisions are automatic and transparent to the application.

How much cost savings does hybrid tiering provide?

For a 100GB working set, hybrid tiering can reduce monthly costs by 80–90%. Instead of keeping all 100GB in RAM at $500–1,000/month, you keep 5GB in RAM (hot keys), 30GB on NVMe (warm keys), and 65GB in Redis (cold keys) for approximately $58–118/month total. NVMe storage costs $0.05–0.10 per GB/month compared to $5–10 per GB for RAM, making it 100x cheaper while still being 50x faster than network-based Redis.

Does hybrid tiering require application code changes?

No. Hybrid tiering is completely transparent to the application. Your GET and SET commands work exactly the same way. The tiering engine handles promotion, demotion, and eviction automatically behind the same cache interface. The only observable difference is that some cache hits will return at NVMe latency (10–50µs) instead of RAM latency (1.5µs), but both are sub-millisecond and invisible to end users.

Stop Choosing Between Speed and Scale.
Get Both.

Hybrid memory tiering. 1.5µs for hot keys. 10–50µs for warm keys. 100x larger working sets at 80–90% lower cost. Same cache interface.

Start Free Trial Schedule Demo