Engine Upgrade

Readers Never Block Writers.
Writers Never Block Readers.

MVCC gives every read a consistent snapshot of the cache state. Write operations proceed concurrently without blocking any reader. At 96 workers on Graviton4, contention drops to zero.

Zero Read Contention
Consistent Snapshots
96-Worker Tested
Sub-µs Unchanged
The Problem

Per-Shard Locking Has a Ceiling

DashMap is fast. But at extreme concurrency with mixed read-write workloads, sharded locks still create measurable jitter. The problem is not speed — it is determinism.

🔒
Same-Shard Read/Write Collisions
DashMap shards reduce contention but same-shard read/write collisions still happen at extreme concurrency. At 96 workers doing tight FHE/NTT loops, even microseconds of lock contention compound. The probability of a same-shard collision scales quadratically with worker count.
96 workers = statistically frequent collisions
⚡
HFT Can't Tolerate Jitter
A 2µs P99 spike from lock contention is unacceptable when your tick-to-trade budget is 10µs. Deterministic latency requires zero contention, not low contention. One bad tail-latency event per thousand reads can cost real money when each read is a trading decision.
10µs tick-to-trade budget leaves no room for jitter
📈
Write-Heavy Workloads Suffer Most
Feature stores, position updates, and real-time pricing with 30-50% write ratios see contention spikes that pure-read benchmarks don't reveal. The benchmarks that show sub-microsecond reads are 100% read workloads. Add 30% writes and the P99 story changes completely.
30% writes = contention spikes invisible in read-only benchmarks
How It Works

Multi-Version Concurrency Control for the Cache Engine

Each write creates a new version of the value. Readers see a consistent snapshot at their read timestamp. Old versions are garbage-collected after all active readers complete. No locks on the read path. Writes are serialized per-key via atomic version counters — not per-shard.

Version Chain — Per-Key Snapshots

Current (v3): price:AAPL, epoch=1042, $187.50
Previous (v2): price:AAPL, epoch=1041, $187.48
Expired (v1): price:AAPL, epoch=1040 (GC eligible)

Read path: acquire epoch → find version ≤ epoch → return value

Zero locks: the read path is completely lock-free, not "mostly lock-free" like DashMap.

Lock-Free Read Path

The read path is completely lock-free. A reader acquires the current global epoch (a single atomic load), then traverses the version chain to find the most recent version whose epoch is less than or equal to the reader's. No mutex, no read-write lock, no compare-and-swap retry loop. The reader never waits on any writer.

DashMap is mostly lock-free for reads — until a concurrent write to the same shard acquires the write lock. MVCC removes the “mostly”. Reads are unconditionally non-blocking regardless of concurrent write activity.
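The read path above can be sketched in a few lines. This is a hypothetical, simplified model, not the engine's actual internals: the names (Version, GLOBAL_EPOCH, snapshot_read) and the single-threaded chain layout are assumptions for illustration. The point it demonstrates is the shape of the algorithm: one atomic load for the epoch, then a plain pointer walk with no mutex, no read-write lock, and no CAS retry loop.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical sketch of a per-key version chain. Real engine types
// and names are assumptions; only the algorithm shape is the point.
struct Version {
    value: String,              // the cached value (Bytes in the real struct)
    epoch: u64,                 // global epoch at write time
    prev: Option<Box<Version>>, // older version, if still retained
}

// The global epoch counter that readers snapshot with one atomic load.
static GLOBAL_EPOCH: AtomicU64 = AtomicU64::new(0);

// Read path: acquire epoch, walk the chain to the newest version whose
// epoch is <= the reader's snapshot, return it. No locks anywhere.
fn snapshot_read(head: &Version) -> Option<&str> {
    let my_epoch = GLOBAL_EPOCH.load(Ordering::Acquire); // single atomic load
    let mut cur = Some(head);
    while let Some(v) = cur {
        if v.epoch <= my_epoch {
            return Some(&v.value);
        }
        cur = v.prev.as_deref();
    }
    None // no version visible at this snapshot
}
```

A reader that acquired its epoch before a concurrent write simply stops at the older version in the chain; it never observes the in-flight write and never waits for it.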

Per-Key Write Serialization

Writes create a new version and swap the head pointer via atomic CAS (compare-and-swap). Serialization is per-key, not per-shard. Two writers updating different keys in the same shard proceed in parallel with zero coordination. This is a fundamental improvement over shard-level write locks.

Write latency increases by approximately 1µs (0.001ms): the cost of allocating a new version struct and performing the atomic swap. For workloads where write latency is not the bottleneck (which is most of them), this is invisible.
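The per-key CAS publish described above can be sketched as follows. This is a minimal illustrative model, not the engine's implementation: the Entry and Version types and the `write`/`latest` names are assumptions, and version reclamation (handled by epoch-based GC in the real engine) is omitted. What it shows is why serialization is per-key: the CAS retries only when another writer raced on this same key's head pointer.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

// Hypothetical sketch: one cache entry with an atomic head pointer to
// its version chain. Names and layout are illustrative assumptions.
struct Version {
    value: u64,         // simplified payload for the sketch
    epoch: u64,         // global epoch at write time
    prev: *mut Version, // older version in the chain
}

struct Entry {
    head: AtomicPtr<Version>, // per-key head; no per-shard lock involved
}

impl Entry {
    fn new() -> Self {
        Entry { head: AtomicPtr::new(ptr::null_mut()) }
    }

    // Publish a new version with a single CAS on this key's head.
    // Retries only if another writer updated *this key* concurrently;
    // writers to other keys never touch this pointer at all.
    fn write(&self, value: u64, epoch: u64) {
        let mut cur = self.head.load(Ordering::Acquire);
        loop {
            let new = Box::into_raw(Box::new(Version { value, epoch, prev: cur }));
            match self.head.compare_exchange(cur, new, Ordering::AcqRel, Ordering::Acquire) {
                Ok(_) => return,
                Err(observed) => {
                    // Lost the race on this key: free our node and retry
                    // against the head the other writer just published.
                    unsafe { drop(Box::from_raw(new)) };
                    cur = observed;
                }
            }
        }
    }

    // Current version's value (readers would walk the chain by epoch).
    fn latest(&self) -> Option<u64> {
        let p = self.head.load(Ordering::Acquire);
        if p.is_null() { None } else { Some(unsafe { (*p).value }) }
    }
}
```

Two Entry instances, even if they would hash to the same shard in a sharded design, have independent head pointers here, so their writers proceed with zero coordination.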

Technical Detail

Version Chains and Epoch-Based GC

Each key maintains a version chain from newest to oldest. Old versions are garbage-collected when all active readers have advanced past them.

// Version struct per key (24 bytes overhead per version)
struct Version {
    value: Bytes,   // the cached value
    timestamp: u64, // write timestamp
    epoch: u64,     // global epoch at write time
}
// Chain: [v3 (current)] → [v2] → [v1 (expired)]
// GC reclaims v1 when all active readers have epoch > v1.epoch
Version Retention
Configurable via mvcc.max_versions. Default: 2 versions per key. Higher values allow readers with older snapshots to continue operating, at the cost of memory.
Default: 2 versions — configurable at runtime
Garbage Collection
Epoch-based GC runs on a background thread every 100µs (configurable). When all active readers have epoch greater than a version's epoch, that version is eligible for reclamation. GC is non-blocking — it never pauses reads or writes.
Background GC every 100µs — zero reader impact
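The reclamation rule the GC thread applies can be stated as a small pure function. This is a sketch of the rule only, under stated assumptions: the function name and the slice-based inputs are hypothetical, and the real engine tracks reader epochs through its own bookkeeping rather than a slice scan. A version is reclaimable exactly when every active reader's epoch is strictly greater than that version's epoch, and the newest version is never reclaimed.

```rust
// Hypothetical sketch of the epoch-based GC rule. Inputs: epochs of a
// key's retained versions (newest first) and epochs of all active
// readers. Output: epochs of versions that are safe to reclaim.
fn reclaimable(version_epochs: &[u64], reader_epochs: &[u64]) -> Vec<u64> {
    // "All readers have epoch > v.epoch" is equivalent to
    // "v.epoch < min(reader epochs)". No readers means no constraint.
    let min_reader = reader_epochs.iter().copied().min().unwrap_or(u64::MAX);
    version_epochs
        .iter()
        .copied()
        .skip(1) // never reclaim the current (newest) version
        .filter(|&e| e < min_reader)
        .collect()
}
```

In the chain diagram above, a reader still holding epoch 1041 pins v2 in place, so only v1 (epoch 1040) is reclaimable; once that reader advances, v2 becomes eligible as well. The GC thread only ever frees versions no reader can still see, which is why it never pauses reads or writes.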

Memory Overhead

Keys | Versions per Key | Version Overhead
1M   | 2                | 48 MB
10M  | 2                | 480 MB
10M  | 4                | 960 MB
100M | 2                | 4.8 GB

For 10M keys with 2 versions each, overhead is ~480MB. This is the cost of zero read contention. For workloads where microsecond-level P99 determinism matters, the tradeoff is overwhelmingly positive.
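The table is simple arithmetic: keys × versions per key × 24 bytes per version. A one-line helper (the function name is illustrative, not part of any API) makes the calculation reproducible for your own key counts:

```rust
// Back-of-envelope version overhead: 24 bytes per retained version
// (pointer + timestamp + epoch), times keys, times versions per key.
fn version_overhead_bytes(keys: u64, versions_per_key: u64) -> u64 {
    const BYTES_PER_VERSION: u64 = 24;
    keys * versions_per_key * BYTES_PER_VERSION
}
```

For example, 10M keys at the default 2 versions gives 480,000,000 bytes, the ~480MB row in the table; raising mvcc.max_versions to 4 doubles it.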

Before / After

Measured Impact at 96-Worker Scale

96 workers, Graviton4 (c8g.metal-48xl), 30% write ratio. The P50 is unchanged. The P99 drops by 55%.

Metric                         | Without MVCC (DashMap)  | With MVCC
Read latency (P50)             | ~1.5µs                  | ~1.5µs (unchanged)
Read latency (P99, 30% writes) | ~4µs (shard contention) | ~1.8µs (zero contention)
Write latency                  | ~13µs                   | ~14µs (+~1µs version creation)
P99 jitter                     | baseline                | 55% reduction
Read-path locks                | Per-shard RwLock        | Zero (completely lock-free)
Use Cases

Who Benefits Most

MVCC is not for every workload. It is for workloads where microsecond-level P99 determinism is a requirement, not a nice-to-have.

💹
HFT / Algorithmic Trading
Tick-to-trade latency budgets of 10µs leave no room for lock contention jitter. Position lookups, order book state, and risk limits must be readable with deterministic latency while concurrent price updates stream in. A 2µs P99 spike can cost more than the infrastructure.
Deterministic latency = deterministic P&L
🧠
ML Feature Stores
Streaming features produce 30-50% write ratios as models ingest real-time signals. Feature lookups during inference must not block on concurrent feature writes. MVCC guarantees that inference reads always see a consistent snapshot, even during heavy feature ingestion.
High write ratio + low-latency reads = MVCC territory
💰
Real-Time Pricing Engines
Continuous price updates from multiple market feeds write to the cache while downstream services read prices for order validation, display, and risk calculation. Without MVCC, a price write can block a price read on the same shard. With MVCC, the read always succeeds instantly with the most recent consistent snapshot.
Continuous writes + concurrent reads at scale
📱
IoT Device State
Millions of devices writing state (temperature, location, status) while fleet management queries read aggregate and per-device state. The write volume is enormous, the read latency requirement is strict, and the devices never stop. MVCC lets queries run without waiting for state updates.
Millions of writers, zero reader blocking
Composition

Composes With Everything

MVCC is not a standalone feature. It is a concurrency layer that composes with every other Cachee primitive to eliminate contention at every level.

🌐
MVCC + Coherence
Coherence handles cross-instance consistency. MVCC handles within-instance concurrency. Together, they give you zero-contention reads that are also guaranteed consistent across every instance in the cluster. A write propagated via coherence creates a new version on each instance without blocking any local reader.
Cross-instance consistency + zero local contention
📜
MVCC + Cache Contracts
Cache Contracts define freshness guarantees and automatic refresh behavior. When a contract-driven refresh writes a new value, MVCC ensures that concurrent reads see the previous consistent version until the write completes. No reader ever sees a partially-written or in-flight value.
Consistent snapshot reads during contract refreshes
🔮
MVCC + Speculative Pre-Fetch
Speculative pre-fetch writes predicted keys to L1 ahead of demand. These writes happen concurrently with normal read traffic. Without MVCC, pre-fetch writes to a hot shard can momentarily block reads. With MVCC, pre-fetched values are written as new versions without any read-path impact.
Pre-fetch writes never block concurrent reads
Low contention is not zero contention.
For workloads where microseconds are P&L, the difference matters.
FAQ

Frequently Asked Questions

What is MVCC in caching?

MVCC (Multi-Version Concurrency Control) is a concurrency technique where each write creates a new version of a value instead of overwriting it in place. Readers see a consistent snapshot at their read timestamp and never block on concurrent writes. This eliminates read-write contention entirely, which is critical for high-concurrency workloads like HFT, ML feature stores, and IoT state management.

Does MVCC add memory overhead?

Yes. Each additional version of a key adds approximately 24 bytes of overhead (pointer, timestamp, epoch). With the default configuration of 2 versions per key, 10 million keys add roughly 480 MB of version overhead. Old versions are garbage-collected automatically via epoch-based GC once all active readers have advanced past them. The overhead is configurable via mvcc.max_versions.

Does it change the API?

No. MVCC is transparent to the client. GET, SET, HGET, HSET, and all other commands work identically. The only additions are configuration parameters for enabling and tuning the feature, set via CONFIG SET: mvcc.enabled true, mvcc.max_versions 2, and mvcc.gc_interval_us 100. Existing applications require zero code changes.

How does it differ from DashMap's sharded locking?

DashMap divides keys into shards, each protected by a read-write lock. Reads within the same shard as a concurrent write must wait for the write lock to release. At high worker counts (64-96+), same-shard collisions become statistically frequent and add 1-4 microseconds of P99 jitter. MVCC eliminates this entirely: readers acquire an epoch number (a single atomic load) and read from the version chain without any lock. The read path is completely lock-free, not just “mostly lock-free.”

Is MVCC configurable?

Yes. Three configuration parameters control MVCC behavior: mvcc.enabled (true/false) turns the feature on or off, mvcc.max_versions (default: 2) controls how many versions are retained per key before garbage collection, and mvcc.gc_interval_us (default: 100) sets how often the background GC thread scans for reclaimable versions. All three can be changed at runtime via CONFIG SET.

Stop Tolerating Lock Contention.
Enable MVCC. Ship Deterministic Latency.

Zero-contention reads under concurrent writes. Consistent snapshots at every read. Epoch-based garbage collection with zero reader impact. One config flag to enable.
