Overview

Most cache misses are predictable. If key B is always accessed right after key A, then a request for A tells you B will be needed next; if B is not in the cache, you can load it before the application asks. Speculative Pre-Fetch exploits this predictability by maintaining a co-occurrence model of key access patterns and proactively loading predicted keys on cache misses.

The PrefetchEngine records access sequences in a ring buffer, builds a co-occurrence matrix (DashMap<String, HashMap<String, u32>>), and uses it to predict which keys will be accessed next. When a cache miss occurs, the engine calls predict_next to identify likely follow-up keys and loads them in the background via the configured source.

When to Use

Enable speculative pre-fetch when your workload has predictable access sequences: user → session → permissions, product-list → product-detail → reviews, or any pattern where keys are accessed in a consistent order. The model learns these patterns automatically from your traffic.

Co-Occurrence Model

```rust
struct PrefetchEngine {
    // Co-occurrence: key A → { key B: count, key C: count, ... }
    model: DashMap<String, HashMap<String, u32>>,
    // Ring buffer of recent accesses (fixed-size, wraps around)
    access_log: RingBuffer<String>,
    // Maximum number of tracked keys in the model
    max_keys: usize,
    // Accuracy tracking
    metrics: PrefetchMetrics,
}
```

Building the Model

On every GET (hit or miss), the engine appends the key to the ring buffer and updates co-occurrence counts. If key A was accessed within the last N accesses before key B, the model increments model[A][B]. The ring buffer window (default: 8 keys) controls how far back the model looks for co-occurrences.
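The update step can be sketched as follows. This is a simplified, single-threaded version built on std collections; the real engine uses a concurrent DashMap and a fixed-size ring buffer, and the `Model` and `record_access` names here are illustrative, not the engine's actual API.

```rust
use std::collections::{HashMap, VecDeque};

// Simplified co-occurrence model: a VecDeque stands in for the ring buffer.
struct Model {
    counts: HashMap<String, HashMap<String, u32>>,
    window: VecDeque<String>, // recent accesses, oldest first
    window_size: usize,       // look-back window (prefetch.window_size, default 8)
}

impl Model {
    fn record_access(&mut self, key: &str) {
        // Every key still in the window preceded `key` within the last
        // window_size accesses, so increment counts[prev][key] for each.
        for prev in &self.window {
            *self
                .counts
                .entry(prev.clone())
                .or_default()
                .entry(key.to_string())
                .or_insert(0) += 1;
        }
        // Append to the window, dropping the oldest entry when full.
        if self.window.len() == self.window_size {
            self.window.pop_front();
        }
        self.window.push_back(key.to_string());
    }
}
```

After the sequence `user:42, session:42, perms:42`, this sketch yields `counts["user:42"]["session:42"] = 1` and `counts["user:42"]["perms:42"] = 1`, matching the direction of the arrows in the example below.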

Example

```text
# Access sequence: user:42, session:42, perms:42, dashboard:42
# After training, the model learns:
#   user:42    → { session:42: 47, perms:42: 45, dashboard:42: 43 }
#   session:42 → { perms:42: 46, dashboard:42: 44 }
#   perms:42   → { dashboard:42: 45 }
```

predict_next

Given a key, return the top-N most likely next keys (by co-occurrence count).

```rust
fn predict_next(&self, key: &str, top_n: usize) -> Vec<(String, u32)> {
    match self.model.get(key) {
        Some(cooccurrences) => {
            let mut pairs: Vec<_> = cooccurrences.iter().collect();
            pairs.sort_by(|a, b| b.1.cmp(&a.1)); // highest count first
            pairs
                .into_iter()
                .take(top_n)
                .map(|(k, &v)| (k.clone(), v))
                .collect()
        }
        None => vec![],
    }
}
```

On-Miss Pre-Fetch

When a cache miss occurs, the PrefetchEngine runs predict_next for the missed key and checks which predicted keys are also not in the cache. For each predicted-but-missing key, it initiates a background fetch from the configured source.

  1. Cache miss for key A: Normal miss processing (fetch from source, return to client).
  2. Predict: Call predict_next(A, top_3) → returns [B, C, D] with co-occurrence counts.
  3. Filter: Check which of [B, C, D] are already in the cache. Suppose B is present; C and D are missing.
  4. Pre-fetch: Asynchronously load C and D from the source into the cache.
  5. Result: When the application requests C or D next, they are already in the cache — a hit instead of a miss.
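The miss path above can be sketched as a single synchronous function. This is a simplified stand-in: inserting into a `HashSet` plays the role of the background fetch, and `prefetch_on_miss`, `top_n`, and `min_count` are illustrative names mirroring the config parameters.

```rust
use std::collections::{HashMap, HashSet};

// Simplified, synchronous sketch of on-miss pre-fetching.
// Returns the keys that were pre-fetched (for illustration only).
fn prefetch_on_miss(
    model: &HashMap<String, HashMap<String, u32>>,
    cache: &mut HashSet<String>,
    missed: &str,
    top_n: usize,
    min_count: u32, // prefetch.min_count noise filter
) -> Vec<String> {
    // predict_next: collect co-occurrences and sort by count, descending.
    let mut predicted: Vec<(&String, &u32)> = model
        .get(missed)
        .map(|c| c.iter().collect())
        .unwrap_or_default();
    predicted.sort_by(|a, b| b.1.cmp(&a.1));

    let mut fetched = Vec::new();
    for (key, &count) in predicted.into_iter().take(top_n) {
        // Only pre-fetch strong signals that are not already cached.
        if count >= min_count && !cache.contains(key) {
            cache.insert(key.clone()); // stands in for the async source fetch
            fetched.push(key.clone());
        }
    }
    fetched
}
```

With the trained model from the example above and `session:42` already cached, a miss on `user:42` would pre-fetch `perms:42` and `dashboard:42`.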
```text
# Query what the model would predict for a key
PREFETCH_PREDICT "user:42"
# → [
#   {"key": "session:42",   "count": 47, "in_cache": true},
#   {"key": "perms:42",     "count": 45, "in_cache": false},
#   {"key": "dashboard:42", "count": 43, "in_cache": false}
# ]
```

Decay & Model Bounding

Without decay, the co-occurrence model would grow unbounded and reflect patterns from weeks or months ago. The engine uses two mechanisms to keep the model current and bounded.

Exponential Decay

Every 60 seconds (configurable via prefetch.decay_interval_s), the engine halves all co-occurrence counts. This causes old patterns to fade and recent patterns to dominate. After 5 decay cycles (~5 minutes at the default interval), an old pattern's count is reduced to about 3% (1/32) of its original value.

Decay Example

```text
# t=0:    model["user:42"]["session:42"] = 100
# t=60s:  model["user:42"]["session:42"] = 50   (halved)
# t=120s: model["user:42"]["session:42"] = 25   (halved)
# t=180s: model["user:42"]["session:42"] = 12   (halved)
# t=300s: model["user:42"]["session:42"] = 3    (~3% of original)
```
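One decay pass can be sketched as the loop below, using a plain HashMap in place of the engine's DashMap. Pruning entries that reach zero is an assumption here; the real engine may keep them.

```rust
use std::collections::HashMap;

// One decay pass: halve every count, then drop zeroed entries
// (pruning is an assumption, not confirmed engine behavior).
fn decay(model: &mut HashMap<String, HashMap<String, u32>>) {
    for cooccurrences in model.values_mut() {
        for count in cooccurrences.values_mut() {
            *count /= 2; // integer halving: 100 → 50 → 25 → 12 → 6 → 3
        }
        cooccurrences.retain(|_, &mut c| c > 0);
    }
    model.retain(|_, c| !c.is_empty());
}
```

Note that integer halving reaches 3 via 100 → 50 → 25 → 12 → 6 → 3, so the t=240s step (6) sits between the last two lines of the decay example above.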

Bounded Model (100K Keys)

The model tracks at most prefetch.max_keys (default: 100,000) distinct keys. When the limit is reached, the engine evicts the key with the lowest total co-occurrence count before inserting a new one. This keeps memory bounded regardless of key cardinality.
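The eviction choice can be sketched as a full scan for the lowest total count. The real engine may well track totals incrementally rather than rescanning; `evict_coldest` is an illustrative name.

```rust
use std::collections::HashMap;

// Evict the key whose co-occurrence counts sum to the lowest total.
// O(n) scan for clarity; a production engine would likely cache totals.
fn evict_coldest(model: &mut HashMap<String, HashMap<String, u32>>) {
    let coldest = model
        .iter()
        .min_by_key(|(_, cooccurrences)| cooccurrences.values().sum::<u32>())
        .map(|(key, _)| key.clone());
    if let Some(key) = coldest {
        model.remove(&key);
    }
}
```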

High-Cardinality Keys

If your workload has millions of unique keys (e.g., per-user keys), the 100K model limit means only the most active keys are tracked. Increase prefetch.max_keys if you need broader coverage, but be aware of memory overhead: each tracked key consumes approximately 200–400 bytes in the model (key string + HashMap of co-occurrences), so the default 100,000-key model costs roughly 20–40 MB.

Configuration

| Parameter | Default | Description |
|---|---|---|
| `prefetch.enabled` | `false` | Enable the speculative pre-fetch engine |
| `prefetch.max_keys` | `100000` | Maximum number of keys tracked in the co-occurrence model |
| `prefetch.window_size` | `8` | Ring buffer look-back window for co-occurrence counting |
| `prefetch.top_n` | `3` | Number of keys to predict and potentially pre-fetch per miss |
| `prefetch.decay_interval_s` | `60` | How often to halve co-occurrence counts (seconds) |
| `prefetch.min_count` | `3` | Minimum co-occurrence count to trigger a pre-fetch (noise filter) |
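Assuming a TOML-style configuration file (the on-disk format is not shown in this section, so treat this as a sketch), enabling the engine with a stricter noise filter might look like:

```toml
[prefetch]
enabled = true
max_keys = 100000
window_size = 8
top_n = 3
decay_interval_s = 60
min_count = 5   # raised from the default 3 to filter weak co-occurrence signals
```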

Accuracy Metrics

```text
PREFETCH_STATS
# → {
#   "predictions_made": 84200,
#   "prefetches_triggered": 52100,
#   "prefetch_hits": 41680,    // prefetched key was actually accessed
#   "prefetch_wasted": 10420,  // prefetched but never accessed (evicted)
#   "accuracy_pct": 80.0,      // prefetch_hits / prefetches_triggered
#   "model_keys": 87432,
#   "ring_buffer_size": 8
# }
```
| Metric | Description | Target |
|---|---|---|
| `accuracy_pct` | Percentage of pre-fetched keys that were subsequently accessed | >70% = good |
| `prefetch_wasted` | Pre-fetched keys that expired or were evicted before being read | <30% = acceptable |
| `miss_rate_reduction` | Percentage reduction in cache misses due to pre-fetching | >15% = meaningful impact |

Tuning for Accuracy

If accuracy is below 50%, your workload may not have predictable access patterns. Increase prefetch.min_count to only pre-fetch keys with strong co-occurrence signals, or reduce prefetch.top_n to be more selective. If accuracy is above 80%, consider increasing top_n to capture more predictable keys.