Overview
Most cache misses are predictable. If key B is always accessed shortly after key A, then a request for A tells you B will be needed next; if B is not in the cache, it can be loaded before the application asks for it. Speculative Pre-Fetch exploits this predictability by maintaining a co-occurrence model of key access patterns and proactively loading predicted keys on cache misses.
The PrefetchEngine records access sequences in a ring buffer, builds a co-occurrence matrix (DashMap<String, HashMap<String, u32>>), and uses it to predict which keys will be accessed next. When a cache miss occurs, the engine calls predict_next to identify likely follow-up keys and loads them in the background via the configured source.
Enable speculative pre-fetch when your workload has predictable access sequences: user → session → permissions, product-list → product-detail → reviews, or any pattern where keys are accessed in a consistent order. The model learns these patterns automatically from your traffic.
Co-Occurrence Model
Building the Model
On every GET (hit or miss), the engine appends the key to the ring buffer and updates co-occurrence counts. If key A was accessed within the last N accesses before key B, the model increments model[A][B]. The ring buffer window (default: 8 keys) controls how far back the model looks for co-occurrences.
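The update described above can be sketched as follows. This is a simplified, single-threaded illustration with assumed names (the real engine uses DashMap for concurrent access); it shows the ring-buffer window and the model[A][B] increment:

```rust
use std::collections::{HashMap, VecDeque};

// Simplified sketch of the co-occurrence model; names are illustrative.
struct Model {
    window: VecDeque<String>,                      // ring buffer of recent keys
    window_size: usize,                            // look-back window (default 8)
    counts: HashMap<String, HashMap<String, u32>>, // model[A][B]
}

impl Model {
    fn new(window_size: usize) -> Self {
        Model { window: VecDeque::new(), window_size, counts: HashMap::new() }
    }

    // On every GET: credit each key still in the window as a predecessor
    // of `key`, then append `key` to the ring buffer.
    fn record_access(&mut self, key: &str) {
        for prev in &self.window {
            if prev != key {
                *self.counts
                    .entry(prev.clone())
                    .or_default()
                    .entry(key.to_string())
                    .or_insert(0) += 1;
            }
        }
        self.window.push_back(key.to_string());
        if self.window.len() > self.window_size {
            self.window.pop_front(); // drop the oldest key
        }
    }
}

fn main() {
    let mut m = Model::new(8);
    for k in ["user:1", "session:1", "perms:1"] {
        m.record_access(k);
    }
    // "user:1" preceded "session:1" within the window exactly once.
    assert_eq!(m.counts["user:1"]["session:1"], 1);
    // Both earlier keys preceded "perms:1".
    assert_eq!(m.counts["session:1"]["perms:1"], 1);
}
```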
predict_next
Given a key, return the top-N most likely next keys (by co-occurrence count).
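A minimal sketch of this lookup, with illustrative names (the real signature may differ), ranking successors by count and applying the prefetch.min_count noise filter:

```rust
use std::collections::HashMap;

// Sketch of predict_next: rank a key's co-occurring successors by count,
// drop those below min_count, and return the top N. Illustrative names only.
fn predict_next(
    model: &HashMap<String, HashMap<String, u32>>,
    key: &str,
    top_n: usize,
    min_count: u32,
) -> Vec<(String, u32)> {
    let mut candidates: Vec<(String, u32)> = model
        .get(key)
        .map(|succ| {
            succ.iter()
                .filter(|(_, &c)| c >= min_count) // noise filter (prefetch.min_count)
                .map(|(k, &c)| (k.clone(), c))
                .collect()
        })
        .unwrap_or_default();
    candidates.sort_by(|a, b| b.1.cmp(&a.1)); // highest count first
    candidates.truncate(top_n);
    candidates
}

fn main() {
    let mut model = HashMap::new();
    model.insert(
        "product-list".to_string(),
        HashMap::from([
            ("product-detail".to_string(), 9),
            ("reviews".to_string(), 4),
            ("checkout".to_string(), 1), // below min_count, filtered out
        ]),
    );
    let preds = predict_next(&model, "product-list", 3, 3);
    assert_eq!(preds[0].0, "product-detail");
    assert_eq!(preds.len(), 2); // "checkout" filtered by min_count
}
```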
On-Miss Pre-Fetch
When a cache miss occurs, the PrefetchEngine runs predict_next for the missed key and checks which predicted keys are also not in the cache. For each predicted-but-missing key, it initiates a background fetch from the configured source.
- Cache miss for key A: Normal miss processing (fetch from source, return to client).
- Predict: Call predict_next(A, top_3) → returns [B, C, D] with co-occurrence counts.
- Filter: Check which of [B, C, D] are already in the cache. Suppose B is present; C and D are missing.
- Pre-fetch: Asynchronously load C and D from the source into the cache.
- Result: When the application requests C or D next, they are already in the cache — a hit instead of a miss.
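The steps above can be sketched as a synchronous control-flow illustration. The real engine fetches in the background via the configured source; here a closure stands in for the source, and all names are assumptions:

```rust
use std::collections::{HashMap, HashSet};

// Sketch of the on-miss flow: look up predicted successors of the missed
// key, skip keys already cached, and fetch the rest into the cache.
// Returns the keys that were pre-fetched. Illustrative names only.
fn on_miss(
    missed: &str,
    predictions: &HashMap<String, Vec<String>>, // pre-ranked successors
    cache: &mut HashSet<String>,
    fetch: impl Fn(&str) -> String, // stand-in for the configured source
) -> Vec<String> {
    let mut prefetched = Vec::new();
    if let Some(preds) = predictions.get(missed) {
        for key in preds {
            if !cache.contains(key) {
                let _value = fetch(key); // load from source (async in practice)
                cache.insert(key.clone());
                prefetched.push(key.clone());
            }
        }
    }
    prefetched
}

fn main() {
    let predictions = HashMap::from([(
        "A".to_string(),
        vec!["B".to_string(), "C".to_string(), "D".to_string()],
    )]);
    let mut cache = HashSet::from(["B".to_string()]); // B already cached
    let got = on_miss("A", &predictions, &mut cache, |k| format!("value-of-{k}"));
    assert_eq!(got, vec!["C".to_string(), "D".to_string()]);
    assert!(cache.contains("C") && cache.contains("D"));
}
```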
Decay & Model Bounding
Without decay, the co-occurrence model would grow unbounded and reflect patterns from weeks or months ago. The engine uses two mechanisms to keep the model current and bounded.
Exponential Decay
Every 60 seconds (configurable), the engine halves all co-occurrence counts. This causes old patterns to fade and recent patterns to dominate. After 5 decay cycles (~5 minutes), an old pattern's count is reduced to 1/32, about 3% of its original value.
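With integer counts, halving is a right shift, and small counts decay all the way to zero, which also prunes noise. A minimal sketch of the arithmetic (the real engine applies this across the whole model on a timer):

```rust
// Exponential decay sketch: one halving per decay cycle.
fn decay(count: u32, cycles: u32) -> u32 {
    count >> cycles // integer halving per cycle
}

fn main() {
    // After 5 cycles, a count of 320 falls to 320 / 2^5 = 10 (~3%).
    assert_eq!(decay(320, 5), 10);
    // Counts of 31 or less reach zero within 5 cycles and can be pruned.
    assert_eq!(decay(31, 5), 0);
}
```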
Bounded Model (100K Keys)
The model tracks at most prefetch.max_keys (default: 100,000) distinct keys. When the limit is reached, the engine evicts the key with the lowest total co-occurrence count before inserting a new one. This keeps memory bounded regardless of key cardinality.
If your workload has millions of unique keys (e.g., per-user keys), the 100K model limit means only the most active keys are tracked. Increase prefetch.max_keys if you need broader coverage, but be aware of memory overhead: each tracked key consumes approximately 200–400 bytes in the model (key string + HashMap of co-occurrences).
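The eviction rule above can be sketched as a scan for the key with the lowest total outgoing co-occurrence count. This is an illustrative simplification (the capacity check is elided, and a production engine would likely avoid a full scan per insert):

```rust
use std::collections::HashMap;

// Sketch of bounded-model eviction: drop the key whose outgoing
// co-occurrence counts sum to the least. Illustrative names only.
fn evict_coldest(model: &mut HashMap<String, HashMap<String, u32>>) {
    let coldest = model
        .iter()
        .min_by_key(|(_, succ)| succ.values().sum::<u32>())
        .map(|(k, _)| k.clone());
    if let Some(key) = coldest {
        model.remove(&key);
    }
}

fn main() {
    let mut model = HashMap::from([
        ("hot".to_string(), HashMap::from([("x".to_string(), 50u32)])),
        ("cold".to_string(), HashMap::from([("y".to_string(), 2u32)])),
    ]);
    // Assume prefetch.max_keys is reached; evict before inserting a new key.
    evict_coldest(&mut model);
    assert!(model.contains_key("hot"));
    assert!(!model.contains_key("cold"));
}
```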
Configuration
| Parameter | Default | Description |
|---|---|---|
| prefetch.enabled | false | Enable the speculative pre-fetch engine |
| prefetch.max_keys | 100000 | Maximum number of keys tracked in the co-occurrence model |
| prefetch.window_size | 8 | Ring buffer look-back window for co-occurrence counting |
| prefetch.top_n | 3 | Number of keys to predict and potentially pre-fetch per miss |
| prefetch.decay_interval_s | 60 | How often to halve co-occurrence counts (seconds) |
| prefetch.min_count | 3 | Minimum co-occurrence count to trigger a pre-fetch (noise filter) |
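Assuming a TOML-style configuration file (the file format and section layout are assumptions; the key names match the table above), enabling the engine with defaults might look like:

```toml
# Hypothetical config fragment; adapt to your deployment's config format.
[prefetch]
enabled = true
max_keys = 100000
window_size = 8
top_n = 3
decay_interval_s = 60
min_count = 3
```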
Accuracy Metrics
| Metric | Description | Target |
|---|---|---|
| accuracy_pct | Percentage of pre-fetched keys that were subsequently accessed | >70% = good |
| prefetch_wasted | Pre-fetched keys that expired or were evicted before being read | <30% = acceptable |
| miss_rate_reduction | Percentage reduction in cache misses due to pre-fetching | >15% = meaningful impact |
If accuracy is below 50%, your workload may not have predictable access patterns. Increase prefetch.min_count to only pre-fetch keys with strong co-occurrence signals, or reduce prefetch.top_n to be more selective. If accuracy is above 80%, consider increasing top_n to capture more predictable keys.