Overview

Most cache misses are predictable. If key B is always accessed right after key A, then a request for A tells you B will be needed next; if B is not in the cache, you can load it before the application asks. Speculative Pre-Fetch exploits this predictability by maintaining a co-occurrence model of key access patterns and proactively loading predicted keys on cache misses.

The PrefetchEngine records access sequences in a ring buffer, builds a co-occurrence matrix (DashMap<String, HashMap<String, u32>>), and uses it to predict which keys will be accessed next. When a cache miss occurs, the engine calls predict_next to identify likely follow-up keys and loads them in the background via the configured source.

When to Use

Enable speculative pre-fetch when your workload has predictable access sequences: user → session → permissions, product-list → product-detail → reviews, or any pattern where keys are accessed in a consistent order. The model learns these patterns automatically from your traffic.

Co-Occurrence Model

```rust
struct PrefetchEngine {
    // Co-occurrence: key A → { key B: count, key C: count, ... }
    model: DashMap<String, HashMap<String, u32>>,
    // Ring buffer of recent accesses (fixed-size, wraps around)
    access_log: RingBuffer<String>,
    // Maximum number of tracked keys in the model
    max_keys: usize,
    // Accuracy tracking
    metrics: PrefetchMetrics,
}
```

Building the Model

On every GET (hit or miss), the engine appends the key to the ring buffer and updates co-occurrence counts. If key A was accessed within the last N accesses before key B, the model increments model[A][B]. The ring buffer window (default: 8 keys) controls how far back the model looks for co-occurrences.
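The update step can be sketched as follows. This is a simplified, single-threaded version built on std collections; the real engine uses a concurrent DashMap and a fixed-size ring buffer, and the `Model` and `record_access` names here are illustrative, not the engine's actual API.

```rust
use std::collections::{HashMap, VecDeque};

// Simplified co-occurrence model: a VecDeque stands in for the ring buffer.
struct Model {
    counts: HashMap<String, HashMap<String, u32>>,
    window: VecDeque<String>, // recent accesses, oldest first
    window_size: usize,       // look-back window (prefetch.window_size, default 8)
}

impl Model {
    fn record_access(&mut self, key: &str) {
        // Every key still in the window preceded `key` within the last
        // window_size accesses, so increment counts[prev][key] for each.
        for prev in &self.window {
            *self
                .counts
                .entry(prev.clone())
                .or_default()
                .entry(key.to_string())
                .or_insert(0) += 1;
        }
        // Append to the window, dropping the oldest entry when full.
        if self.window.len() == self.window_size {
            self.window.pop_front();
        }
        self.window.push_back(key.to_string());
    }
}
```

After the sequence `user:42, session:42, perms:42`, this sketch yields `counts["user:42"]["session:42"] = 1` and `counts["user:42"]["perms:42"] = 1`, matching the direction of the arrows in the example below.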

Example

```text
# Access sequence: user:42, session:42, perms:42, dashboard:42
# After training, the model learns:
#   user:42    → { session:42: 47, perms:42: 45, dashboard:42: 43 }
#   session:42 → { perms:42: 46, dashboard:42: 44 }
#   perms:42   → { dashboard:42: 45 }
```

predict_next

Given a key, return the top-N most likely next keys (by co-occurrence count).

```rust
fn predict_next(&self, key: &str, top_n: usize) -> Vec<(String, u32)> {
    match self.model.get(key) {
        Some(cooccurrences) => {
            let mut pairs: Vec<_> = cooccurrences.iter().collect();
            pairs.sort_by(|a, b| b.1.cmp(&a.1)); // highest count first
            pairs
                .into_iter()
                .take(top_n)
                .map(|(k, &v)| (k.clone(), v))
                .collect()
        }
        None => vec![],
    }
}
```

On-Miss Pre-Fetch

When a cache miss occurs, the PrefetchEngine runs predict_next for the missed key and checks which predicted keys are also not in the cache. For each predicted-but-missing key, it initiates a background fetch from the configured source.

  1. Cache miss for key A: Normal miss processing (fetch from source, return to client).
  2. Predict: Call predict_next(A, top_3) → returns [B, C, D] with co-occurrence counts.
  3. Filter: Check which of [B, C, D] are already in the cache. Suppose B is present; C and D are missing.
  4. Pre-fetch: Asynchronously load C and D from the source into the cache.
  5. Result: When the application requests C or D next, they are already in the cache — a hit instead of a miss.
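The miss path above can be sketched as a single synchronous function. This is a simplified stand-in: inserting into a `HashSet` plays the role of the background fetch, and `prefetch_on_miss`, `top_n`, and `min_count` are illustrative names mirroring the config parameters.

```rust
use std::collections::{HashMap, HashSet};

// Simplified, synchronous sketch of on-miss pre-fetching.
// Returns the keys that were pre-fetched (for illustration only).
fn prefetch_on_miss(
    model: &HashMap<String, HashMap<String, u32>>,
    cache: &mut HashSet<String>,
    missed: &str,
    top_n: usize,
    min_count: u32, // prefetch.min_count noise filter
) -> Vec<String> {
    // predict_next: collect co-occurrences and sort by count, descending.
    let mut predicted: Vec<(&String, &u32)> = model
        .get(missed)
        .map(|c| c.iter().collect())
        .unwrap_or_default();
    predicted.sort_by(|a, b| b.1.cmp(&a.1));

    let mut fetched = Vec::new();
    for (key, &count) in predicted.into_iter().take(top_n) {
        // Only pre-fetch strong signals that are not already cached.
        if count >= min_count && !cache.contains(key) {
            cache.insert(key.clone()); // stands in for the async source fetch
            fetched.push(key.clone());
        }
    }
    fetched
}
```

With the trained model from the example above and `session:42` already cached, a miss on `user:42` would pre-fetch `perms:42` and `dashboard:42`.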
```text
# Query what the model would predict for a key
PREFETCH_PREDICT "user:42"
# → [
#   {"key": "session:42",   "count": 47, "in_cache": true},
#   {"key": "perms:42",     "count": 45, "in_cache": false},
#   {"key": "dashboard:42", "count": 43, "in_cache": false}
# ]
```

Decay & Model Bounding

Without decay, the co-occurrence model would grow unbounded and reflect patterns from weeks or months ago. The engine uses two mechanisms to keep the model current and bounded.

Exponential Decay

Every 60 seconds (configurable via prefetch.decay_interval_s), the engine halves all co-occurrence counts. This causes old patterns to fade and recent patterns to dominate. After 5 decay cycles (~5 minutes at the default interval), an old pattern's count is reduced to about 3% (1/32) of its original value.

Decay Example

```text
# t=0:    model["user:42"]["session:42"] = 100
# t=60s:  model["user:42"]["session:42"] = 50   (halved)
# t=120s: model["user:42"]["session:42"] = 25   (halved)
# t=180s: model["user:42"]["session:42"] = 12   (halved)
# t=300s: model["user:42"]["session:42"] = 3    (~3% of original)
```
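One decay pass can be sketched as the loop below, using a plain HashMap in place of the engine's DashMap. Pruning entries that reach zero is an assumption here; the real engine may keep them.

```rust
use std::collections::HashMap;

// One decay pass: halve every count, then drop zeroed entries
// (pruning is an assumption, not confirmed engine behavior).
fn decay(model: &mut HashMap<String, HashMap<String, u32>>) {
    for cooccurrences in model.values_mut() {
        for count in cooccurrences.values_mut() {
            *count /= 2; // integer halving: 100 → 50 → 25 → 12 → 6 → 3
        }
        cooccurrences.retain(|_, &mut c| c > 0);
    }
    model.retain(|_, c| !c.is_empty());
}
```

Note that integer halving reaches 3 via 100 → 50 → 25 → 12 → 6 → 3, so the t=240s step (6) sits between the last two lines of the decay example above.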

Bounded Model (100K Keys)

The model tracks at most prefetch.max_keys (default: 100,000) distinct keys. When the limit is reached, the engine evicts the key with the lowest total co-occurrence count before inserting a new one. This keeps memory bounded regardless of key cardinality.
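The eviction choice can be sketched as a full scan for the lowest total count. The real engine may well track totals incrementally rather than rescanning; `evict_coldest` is an illustrative name.

```rust
use std::collections::HashMap;

// Evict the key whose co-occurrence counts sum to the lowest total.
// O(n) scan for clarity; a production engine would likely cache totals.
fn evict_coldest(model: &mut HashMap<String, HashMap<String, u32>>) {
    let coldest = model
        .iter()
        .min_by_key(|(_, cooccurrences)| cooccurrences.values().sum::<u32>())
        .map(|(key, _)| key.clone());
    if let Some(key) = coldest {
        model.remove(&key);
    }
}
```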

High-Cardinality Keys

If your workload has millions of unique keys (e.g., per-user keys), the 100K model limit means only the most active keys are tracked. Increase prefetch.max_keys if you need broader coverage, but be aware of memory overhead: each tracked key consumes approximately 200–400 bytes in the model (key string + HashMap of co-occurrences), so the default 100,000-key model costs roughly 20–40 MB.

Configuration

| Parameter | Default | Description |
|---|---|---|
| `prefetch.enabled` | `false` | Enable the speculative pre-fetch engine |
| `prefetch.max_keys` | `100000` | Maximum number of keys tracked in the co-occurrence model |
| `prefetch.window_size` | `8` | Ring buffer look-back window for co-occurrence counting |
| `prefetch.top_n` | `3` | Number of keys to predict and potentially pre-fetch per miss |
| `prefetch.decay_interval_s` | `60` | How often to halve co-occurrence counts (seconds) |
| `prefetch.min_count` | `3` | Minimum co-occurrence count to trigger a pre-fetch (noise filter) |
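Assuming a TOML-style configuration file (the on-disk format is not shown in this section, so treat this as a sketch), enabling the engine with a stricter noise filter might look like:

```toml
[prefetch]
enabled = true
max_keys = 100000
window_size = 8
top_n = 3
decay_interval_s = 60
min_count = 5   # raised from the default 3 to filter weak co-occurrence signals
```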

Accuracy Metrics

```text
PREFETCH_STATS
# → {
#   "predictions_made": 84200,
#   "prefetches_triggered": 52100,
#   "prefetch_hits": 41680,    // prefetched key was actually accessed
#   "prefetch_wasted": 10420,  // prefetched but never accessed (evicted)
#   "accuracy_pct": 80.0,      // prefetch_hits / prefetches_triggered
#   "model_keys": 87432,
#   "ring_buffer_size": 8
# }
```
| Metric | Description | Target |
|---|---|---|
| `accuracy_pct` | Percentage of pre-fetched keys that were subsequently accessed | >70% = good |
| `prefetch_wasted` | Pre-fetched keys that expired or were evicted before being read | <30% = acceptable |
| `miss_rate_reduction` | Percentage reduction in cache misses due to pre-fetching | >15% = meaningful impact |

Tuning for Accuracy

If accuracy is below 50%, your workload may not have predictable access patterns. Increase prefetch.min_count to only pre-fetch keys with strong co-occurrence signals, or reduce prefetch.top_n to be more selective. If accuracy is above 80%, consider increasing top_n to capture more predictable keys.