Overview
MVCC is an optional engine-level feature that replaces DashMap's per-shard read-write locking with a version-chain architecture for cache values. When enabled, each write to a key creates a new version of the value instead of overwriting in place. Readers acquire a global epoch and read from the version chain without any lock. Old versions are reclaimed by a background garbage collector when no active reader can see them.
The result is zero read-write contention. At 96 workers on Graviton4 (c8g.metal-48xl) with a 30% write ratio, P99 read latency drops from ~4µs (DashMap shard contention) to ~1.8µs (MVCC, zero contention) — a 55% jitter reduction.
MVCC is recommended when P99 read latency under concurrent writes is a measured problem. For read-only or read-heavy workloads (<5% writes), DashMap shard contention is negligible and MVCC adds memory overhead without measurable benefit.
Architecture
MVCC introduces three structural changes to the cache engine: version chains per key, epoch-based snapshot reads, and per-key write serialization via atomic CAS.
Version Chain
Each key maintains a singly-linked chain of versions, ordered newest to oldest. Each version is a fixed-size struct holding the write timestamp, the epoch at which the version was created, and a pointer to the next (older) version; the value payload is stored separately.
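A minimal Rust sketch of this layout (field names and types are assumptions; the document specifies only the timestamp/epoch/next structural fields, with the value payload held in a separate allocation):

```rust
use std::sync::Arc;

// Illustrative per-key version chain. The 24-byte structural overhead quoted
// later in this document corresponds to timestamp + epoch + next (3 x 8 bytes);
// the value payload is a separate allocation, not counted as version overhead.
struct Version {
    timestamp: u64,             // wall-clock write time
    epoch: u64,                 // global epoch at which this version was created
    next: Option<Arc<Version>>, // older version, or None at the chain tail
    value: Arc<[u8]>,           // payload, stored and accounted separately
}

// Build a two-version chain: head (newest) -> tail (oldest).
fn two_version_chain() -> Arc<Version> {
    let old = Arc::new(Version {
        timestamp: 100,
        epoch: 1,
        next: None,
        value: Arc::from(&b"old"[..]),
    });
    Arc::new(Version {
        timestamp: 200,
        epoch: 2,
        next: Some(old),
        value: Arc::from(&b"new"[..]),
    })
}
```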
Read Path
The read path executes three steps with zero locks:
- Acquire epoch: Load the current global epoch via `AtomicU64::load(Ordering::Acquire)`. This is a single CPU instruction on ARM — no fence, no lock, no CAS.
- Find version: Traverse the version chain from head, returning the first version whose epoch ≤ reader_epoch. This guarantees the reader sees a consistent snapshot as of the moment it started.
- Return value: Clone the value bytes and return. The reader never modifies the version chain.
The read path has no mutex, no read-write lock, no compare-and-swap retry loop. It is unconditionally non-blocking regardless of concurrent write activity on the same key, same shard, or any other key. This is the fundamental difference from DashMap, where a concurrent write to the same shard acquires an exclusive lock that blocks readers.
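The three steps above can be sketched as follows (a simplified single-chain model, not the engine's actual code; `read` and the `Version` fields are illustrative names):

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicU64, Ordering};

// Simplified model of the lock-free read path.
struct Version {
    epoch: u64,
    value: Arc<[u8]>,
    next: Option<Arc<Version>>,
}

fn read(global_epoch: &AtomicU64, head: &Arc<Version>) -> Option<Vec<u8>> {
    // Step 1: acquire the reader's epoch -- one atomic load, no lock, no CAS.
    let reader_epoch = global_epoch.load(Ordering::Acquire);
    // Step 2: the first version with epoch <= reader_epoch is the snapshot.
    let mut cur = Some(head.clone());
    while let Some(v) = cur {
        if v.epoch <= reader_epoch {
            // Step 3: clone the value bytes; readers never mutate the chain.
            return Some(v.value.to_vec());
        }
        cur = v.next.clone();
    }
    None
}
```

A reader whose epoch predates the newest version simply walks past it to an older, still-visible version, which is how concurrent writers never block reads.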
Write Path
The write path creates a new version and swaps the head pointer:
- Allocate version: Create a new `Version` struct with the new value, current timestamp, and current global epoch.
- Atomic swap: Set `new_version.next = current_head`, then `CAS(head, current_head, new_version)`. On CAS failure (another writer raced on the same key), retry with the updated head.
- Increment epoch: `global_epoch.fetch_add(1, Ordering::Release)`. This makes the new version visible to all subsequent readers.
Write serialization is per-key, not per-shard. Two writers updating different keys — even keys that hash to the same DashMap shard — proceed in parallel with zero coordination. This is a strict improvement over shard-level write locks.
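A hedged sketch of the allocate/CAS/epoch-bump sequence (illustrative names, not the engine's code; reclamation of replaced versions is the GC's job and is omitted, so this sketch deliberately leaks old heads):

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, AtomicU64, Ordering};

// Raw-pointer chain so the head can be swapped with a single CAS.
struct Version {
    epoch: u64,
    value: Vec<u8>,
    next: *mut Version, // older version, or null at the chain tail
}

fn write_version(head: &AtomicPtr<Version>, global_epoch: &AtomicU64, value: Vec<u8>) {
    // Allocate the new version, stamped with the current global epoch.
    let new = Box::into_raw(Box::new(Version {
        epoch: global_epoch.load(Ordering::Acquire),
        value,
        next: ptr::null_mut(),
    }));
    let mut cur = head.load(Ordering::Acquire);
    loop {
        // Link the new version in front of the observed head...
        unsafe { (*new).next = cur };
        // ...and publish it; on CAS failure another writer raced us on this
        // key, so retry against the head it installed.
        match head.compare_exchange_weak(cur, new, Ordering::AcqRel, Ordering::Acquire) {
            Ok(_) => break,
            Err(observed) => cur = observed,
        }
    }
    // Make the new version visible to all subsequent readers.
    global_epoch.fetch_add(1, Ordering::Release);
}
```

Note that the CAS retry loop only contends with writers on the same key; writers to other keys touch different head pointers and never interact.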
Configuration
MVCC is controlled by three runtime-configurable parameters. No restart required.
| Parameter | Default | Description |
|---|---|---|
| `mvcc.enabled` | `false` | Enable or disable MVCC. When disabled, the engine uses standard DashMap shard locking. |
| `mvcc.max_versions` | `2` | Maximum number of versions retained per key. Higher values allow readers with older snapshots to continue but increase memory usage. |
| `mvcc.gc_interval_us` | `100` | How often the background GC thread scans version chains for reclaimable versions, in microseconds. |
Setting `mvcc.enabled` to false at runtime triggers a GC pass that collapses all version chains to single versions. This is non-blocking but may take several GC cycles to complete. During the transition, reads continue to see consistent snapshots.
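Assuming the `CONFIG SET mvcc.*` command form mentioned under API Changes, enabling MVCC with the default tuning might look like this (exact command syntax is an assumption; values are the documented defaults):

```
CONFIG SET mvcc.enabled true
CONFIG SET mvcc.max_versions 2
CONFIG SET mvcc.gc_interval_us 100
```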
Memory Overhead
Each additional version of a key adds 24 bytes of structural overhead (timestamp, epoch, next pointer). The value payload is stored separately and is not counted in version overhead. The table below summarizes total version overhead at common scales.
| Keys | Versions per Key | Version Overhead |
|---|---|---|
| 1M | 2 | 48 MB |
| 10M | 2 | 480 MB |
| 10M | 4 | 960 MB |
| 100M | 2 | 4.8 GB |
For capacity planning, add the version overhead to your existing memory footprint. At 10M keys with 2 versions, the 480 MB overhead is the cost of zero read contention. For workloads where microsecond-level P99 determinism translates to revenue, this is an efficient tradeoff.
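The table's arithmetic reduces to a one-line helper (24 bytes per retained version, per the structural overhead described above; the value payload is extra):

```rust
// Back-of-envelope version-overhead calculator matching the table above:
// 24 bytes (timestamp + epoch + next pointer) per retained version.
fn version_overhead_bytes(keys: u64, versions_per_key: u64) -> u64 {
    const BYTES_PER_VERSION: u64 = 24;
    keys * versions_per_key * BYTES_PER_VERSION
}
```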
Performance Impact
Measured on c8g.metal-48xl (192 vCPUs, Graviton4), 96 workers, 70% reads / 30% writes.
| Metric | Without MVCC | With MVCC |
|---|---|---|
| Read latency (P50) | 1.5µs | 1.5µs (unchanged) |
| Read latency (P99, 96 workers, 30% writes) | ~4µs | ~1.8µs |
| Write latency | 13µs | 14µs (+1µs) |
| Memory per key (additional) | 0 bytes | 24 bytes per version |
| P99 jitter reduction | — | 55% |
The P50 is unchanged because the common case — reads that do not collide with concurrent writes on the same shard — was already fast. MVCC eliminates the uncommon-but-critical case where a read arrives during a concurrent write to the same shard. Write latency increases by ~1µs (the cost of allocating a version struct and performing one atomic CAS).
Garbage Collection
MVCC uses epoch-based garbage collection to reclaim old versions. The GC is non-blocking and runs on a dedicated background thread.
How It Works
- Epoch tracking: Each read operation captures the current global epoch at start. The GC maintains a registry of all active readers and their epochs.
- Minimum epoch: The GC computes the minimum epoch across all active readers. Any version with an epoch less than this minimum is invisible to all active readers.
- Reclamation: The GC scans version chains and removes versions that are both (a) below the minimum active epoch and (b) beyond the `max_versions` retention limit. Reclaimed memory is returned to the allocator.
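The visibility rule in steps 2–3 can be expressed as a pair of small helpers (an illustrative sketch, not engine code; names are assumptions):

```rust
// Minimum epoch across all active readers. With no active readers,
// everything up to the current epoch is reclaimable.
fn min_active_epoch(reader_epochs: &[u64], current_epoch: u64) -> u64 {
    reader_epochs.iter().copied().min().unwrap_or(current_epoch)
}

// A version is reclaimable only if it is (a) invisible to every active
// reader and (b) past the max_versions retention limit. `position_in_chain`
// is 0 for the head, counting toward older versions.
fn reclaimable(version_epoch: u64, position_in_chain: usize,
               min_epoch: u64, max_versions: usize) -> bool {
    version_epoch < min_epoch && position_in_chain >= max_versions
}
```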
GC Characteristics
- Non-blocking: The GC thread never acquires any lock that could block a reader or writer. It operates on version chain tails that are guaranteed unreachable by active readers.
- Latency: Under sustained load, versions are GC'd within 100–500µs of becoming unreachable (depending on `gc_interval_us` and active reader lifetime).
- Pressure: At very high write rates (>1M writes/sec per key), version chains can grow faster than GC reclaims them. The `max_versions` limit provides a hard cap: once the chain reaches `max_versions`, the oldest version is dropped immediately on the next write, regardless of active readers.
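The hard-cap behavior can be illustrated with a toy chain of epochs (a `Vec`, newest first, stands in for the real linked chain; this is a sketch, not engine code):

```rust
// On write, the new version is prepended and the chain is truncated to
// max_versions, dropping the oldest version regardless of active readers.
fn push_version(chain: &mut Vec<u64>, new_epoch: u64, max_versions: usize) {
    chain.insert(0, new_epoch);
    chain.truncate(max_versions);
}
```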
If active readers hold epochs for extended periods (e.g., long-running transactions or slow consumers), versions cannot be reclaimed and memory will grow. Monitor the `mvcc.versions_pending_gc` metric. If it grows continuously, reduce reader hold times or decrease `gc_interval_us` so the GC scans more frequently.
API Changes
MVCC is transparent to the client. No existing commands change behavior.
- GET / SET / DEL: Work identically. GET returns the most recent version visible at the reader's epoch. SET creates a new version. DEL marks the key as deleted at the current epoch.
- HGET / HSET / HDEL: Work identically. Hash field operations create per-field versions.
- MGET / MSET: Work identically. Each key in the batch is versioned independently.
- New commands: Only `CONFIG SET mvcc.*` for enabling and tuning MVCC. No new data commands.
Existing client libraries, SDKs, and application code require zero changes. MVCC is an engine-internal optimization that is invisible at the protocol level.
Limitations
MVCC is not a universal improvement. It adds overhead that is only justified when P99 jitter under concurrent writes is a measured problem.
- Memory overhead: 24 bytes per version per key. At 100M keys with 2 versions, this is 4.8 GB of version overhead alone. Size your instances accordingly.
- Not needed for read-heavy workloads: With <5% writes, DashMap shard contention is negligible. MVCC adds memory overhead for no measurable latency improvement.
- GC pressure at extreme write rates: Workloads exceeding 1M writes/sec to individual hot keys can outpace GC. The `max_versions` cap prevents unbounded memory growth, but readers with very old epochs may see version-not-found errors if their version was forcibly reclaimed.
- Write latency increase: Each write incurs an additional ~1µs for version allocation and atomic CAS. For write-latency-critical workloads, measure the impact before enabling.
Measure your P99 read latency under your actual workload (read/write ratio, worker count, key distribution). If P99 is materially higher than P50 under mixed workloads, MVCC will improve it. If P99 is already close to P50, DashMap's sharded locking is sufficient and MVCC adds unnecessary overhead.