Smart Eviction

Your Cache Should Know What's
Expensive to Re-Fetch.

LRU evicts based on recency. LFU evicts based on frequency. Neither knows that a 200ms ML recommendation is 200x more expensive to re-compute than a 1ms database lookup. Cost-aware eviction does.

Origin Cost Tracking
ML-Optimized Eviction Scoring
Per-Key Cost Scoring
Total System Cost Optimization
The Problem

Why LRU/LFU Get Eviction Wrong

Every traditional eviction policy shares the same blind spot: it treats all cached values as equally expensive to replace. They are not.

LRU looks at when a key was last accessed. LFU looks at how often a key is accessed. Both are measuring demand. Neither measures supply cost. A configuration flag that takes 1ms to re-fetch from your database and a complex aggregation query that takes 200ms to recompute sit in the same cache with the same eviction priority. When memory pressure hits, LRU evicts whichever was touched least recently. If that happens to be your expensive aggregation, your system just traded a microsecond cache hit for a 200ms origin fetch.

This is not a theoretical problem. In production workloads, the cost distribution of cached values is wildly uneven. Most keys are cheap to re-fetch: simple database lookups, static configuration, session metadata. A small percentage of keys are expensive: machine learning inference results, complex JOIN queries across sharded databases, third-party API responses with rate limits, computed analytics aggregations.

When LRU evicts an expensive key to make room for a cheap one, total system cost increases. The hit rate number stays high because the cheap key is now cached, but the next request for the expensive key triggers a costly origin fetch. Your dashboard shows a healthy cache. Your P99 latency tells a different story.
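The blind spot is easy to see in code. The sketch below is a hypothetical few-line LRU, not any real cache's implementation: it tracks only recency, so under pressure the eviction falls on whichever key was touched longest ago, regardless of what that key costs to re-fetch.

```typescript
// A minimal cost-blind LRU, for illustration only.
class LruCache {
  private order: string[] = []; // least-recently-used first

  access(key: string): void {
    this.order = this.order.filter(k => k !== key);
    this.order.push(key); // most recently used goes to the end
  }

  // LRU evicts purely on recency; origin cost is never consulted.
  evictOne(): string | undefined {
    return this.order.shift();
  }
}

const lru = new LruCache();
lru.access('ml:recommend:user_429'); // 200ms to re-compute at the origin
lru.access('config:feature_flags');  // 1ms to re-fetch
lru.access('session:meta:abc');      // 2ms to re-fetch

// The expensive ML result happens to be least recently used, so it goes first:
const evicted = lru.evictOne();
console.log(evicted); // → "ml:recommend:user_429"
```

One idle interval is all it takes: the 200ms key is gone and the next request pays full origin price.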

LRU Eviction (Cost-Blind)
ml:recommend:user_429 200ms origin / EVICTED
analytics:dashboard:q4 150ms origin / EVICTED
config:feature_flags 1ms origin / KEPT
session:meta:abc 2ms origin / KEPT
Cost-Aware Eviction (Cachee)
ml:recommend:user_429 200ms origin / KEPT
analytics:dashboard:q4 150ms origin / KEPT
config:feature_flags 1ms origin / EVICTED
session:meta:abc 2ms origin / EVICTED

Traditional caching tools optimize for hit rate because it is the only metric they can measure. Cost-aware eviction optimizes for what actually matters: the total cost your system pays for cache misses. Learn how our AI caching layer makes this possible.

How It Works

How Cost-Aware Eviction Works

Four mechanisms working together. No configuration required. The ML layer already has the data it needs.

Cost-Aware Eviction Pipeline
Step 1: Track Origin → Step 2: Score Keys → Step 3: Rank Eviction → Step 4: Evict Cheapest

Origin Fetch Cost Tracking

Every cache miss is an observation. When a key misses and the origin fetch completes, Cachee records the latency as the cost of that key. Over time, a rolling average builds a per-key cost profile. A key whose origin fetch consistently takes 200ms gets a cost score of 200. A key that resolves in 1ms gets a cost score of 1.
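Cachee's internal profiler is not public, but the rolling-average idea can be sketched with an exponentially weighted moving average, which keeps a per-key cost profile in constant memory. The class name, the smoothing factor, and the 1ms default for unseen keys are illustrative assumptions, not the product's actual internals.

```typescript
// Sketch: per-key origin cost tracking via an exponentially weighted
// moving average (EWMA). One number per key, updated on every miss.
class OriginCostTracker {
  private costs = new Map<string, number>();
  constructor(private alpha = 0.2) {} // weight given to the newest observation

  // Called on every cache miss, once the origin fetch completes.
  recordMiss(key: string, originLatencyMs: number): void {
    const prev = this.costs.get(key);
    const next = prev === undefined
      ? originLatencyMs
      : this.alpha * originLatencyMs + (1 - this.alpha) * prev;
    this.costs.set(key, next);
  }

  // Cost score used by the eviction ranker; assume 1ms for unseen keys.
  costOf(key: string): number {
    return this.costs.get(key) ?? 1;
  }
}

const tracker = new OriginCostTracker();
for (const ms of [210, 195, 200]) tracker.recordMiss('ml:recommend:user_429', ms);
tracker.recordMiss('config:feature_flags', 1);
console.log(tracker.costOf('ml:recommend:user_429')); // stabilizes near 200
```

An EWMA is a natural fit here: it adapts when origin latency drifts, and old observations fade out without being stored.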

This is not additional instrumentation. The predictive pre-warming system already observes origin latency for every key as part of its prediction pipeline. Cost-aware eviction simply feeds that existing data into the eviction scoring function. Zero overhead. Zero configuration.

Composite Eviction Scoring

Each key receives a composite eviction score that combines recency, frequency, and origin cost. The formula weights cost heavily: a key that is accessed infrequently but costs 200ms to re-fetch will survive eviction longer than a key accessed often but costing only 1ms. The exact weighting is tuned by the ML layer based on your workload characteristics.

When memory pressure triggers eviction, keys are ranked by their composite score. The lowest-scoring keys, those that are cheap to re-fetch, infrequently accessed, and have been idle longest, are evicted first. Expensive keys remain in cache even if they have not been accessed recently, because the cost of a miss on those keys is disproportionately high.
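In Cachee the weighting is ML-tuned per workload, so no fixed formula is authoritative. As a sketch of the ranking mechanics with assumed weights, a composite score might combine the three factors like this:

```typescript
// Illustrative composite eviction score. The real weights are ML-tuned
// per workload; the fixed weights here are assumptions for the sketch.
interface KeyStats {
  key: string;
  msSinceLastAccess: number; // recency (lower = hotter)
  hitsPerMinute: number;     // frequency
  originCostMs: number;      // rolling-average origin fetch latency
}

// Higher score = survives eviction longer. Cost is weighted heavily.
function evictionScore(s: KeyStats, wRecency = 1, wFreq = 2, wCost = 5): number {
  const recency = 1 / (1 + s.msSinceLastAccess / 1000); // decays with idle time
  return wRecency * recency + wFreq * s.hitsPerMinute + wCost * s.originCostMs;
}

// Under memory pressure, evict the lowest-scoring keys first.
function evictionOrder(stats: KeyStats[]): string[] {
  return [...stats]
    .sort((a, b) => evictionScore(a) - evictionScore(b))
    .map(s => s.key);
}

const order = evictionOrder([
  { key: 'ml:recommend:user_429', msSinceLastAccess: 60000, hitsPerMinute: 2,  originCostMs: 200 },
  { key: 'config:feature_flags',  msSinceLastAccess: 500,   hitsPerMinute: 50, originCostMs: 1 },
]);
console.log(order[0]); // → "config:feature_flags"
```

Note the outcome: the config flag is hot (50 hits/minute, touched half a second ago) yet still ranks first for eviction, because its 1ms origin cost cannot offset a 200x cost gap.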

Origin Latency Tracking
Every cache miss records origin fetch time. Rolling average builds per-key cost profiles automatically.
Per-key cost resolution

📊 Composite Scoring
Recency + frequency + origin cost combined into a single eviction priority score. ML-tuned weighting.
3-factor scoring model

🧠 ML Weight Tuning
The scoring weights adapt to your workload. Cost-dominated workloads weight cost higher. Access-dominated workloads balance all three.
Continuous optimization

Zero Overhead
Cost data comes from the existing prediction pipeline. No extra instrumentation, no added latency, no configuration.
0 additional latency

Cost-aware eviction is enabled by default for all Cachee deployments. The ML layer learns your origin cost distribution within 60 seconds of deployment. See the full AI caching architecture for details on how prediction and eviction work together.

The Math

The Math That Proves It

Hit rate is a vanity metric. Total miss cost is the metric that determines your system's actual performance. Here is a scenario that makes the difference concrete.

Consider two eviction strategies applied to the same workload of 100 requests. Strategy A achieves a 99% hit rate, meaning 1 miss. But that 1 miss is an expensive ML inference result that costs 200ms to re-compute from the origin. Total miss cost: 200ms across 100 requests, or 2ms per request amortized.

Strategy B achieves a 97% hit rate, meaning 3 misses. But those 3 misses are all cheap configuration lookups that cost 1ms each to re-fetch. Total miss cost: 3ms across 100 requests, or 0.03ms per request amortized.

Strategy A looks better on every dashboard. Strategy B costs your system 67x less in actual compute and latency. Every monitoring tool you have will tell you Strategy A is superior. Every user waiting for that 200ms re-computation will disagree. Cost-aware eviction chooses Strategy B because it optimizes for the right metric.
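The arithmetic is short enough to check directly; every number below comes straight from the scenario above.

```typescript
const requests = 100;

// Strategy A: 99% hit rate, but the single miss is a 200ms ML inference.
const totalMissCostA = 1 * 200;               // 200ms total
const amortizedA = totalMissCostA / requests; // 2ms per request

// Strategy B: 97% hit rate, but all three misses are 1ms config lookups.
const totalMissCostB = 3 * 1;                 // 3ms total
const amortizedB = totalMissCostB / requests; // 0.03ms per request

// Despite the "worse" hit rate, Strategy B's total miss cost is ~67x lower.
const costReduction = totalMissCostA / totalMissCostB;
console.log(amortizedA, amortizedB, Math.round(costReduction)); // → 2 0.03 67
```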

Total Miss Cost Comparison (per 100 requests)
Strategy A: Hit Rate 99% | 1 miss × 200ms | Total Miss Cost 200ms (2ms/request amortized)
Strategy B: Hit Rate 97% | 3 misses × 1ms | Total Miss Cost 3ms (0.03ms/request amortized)
Cost Reduction: 67x lower
Eviction Score Breakdown (Higher = Survives Longer)
ML Recommendation (200ms origin): 950
Analytics Query (150ms origin): 820
API Gateway Response (30ms origin): 450
Session Metadata (5ms origin): 150
Config Flag (1ms origin): 50

The score bars above show how cost-aware eviction ranks keys. High origin cost pushes the score up, making expensive keys survive eviction. Cheap keys are evicted first, minimizing total system cost. See our hit rate optimization guide for how this works alongside other ML-driven cache strategies.

Competitive Advantage

Why Nobody Else Can Build This

Cost-aware eviction sounds obvious once you hear it. Building it requires something no traditional cache has: per-key origin cost data.

Redis does not know that ml:recommend:user_429 takes 200ms to compute at the origin. Memcached does not know that config:feature_flags takes 1ms. They are key-value stores. They store bytes and evict bytes. The cost of producing those bytes is invisible to them.

To implement cost-aware eviction, you need to observe origin fetch latency for every key, maintain a rolling cost profile, and feed that data into the eviction algorithm. Traditional caches would need a complete architectural overhaul to add this capability. Even then, they would be bolting instrumentation onto a system that was not designed for it.

Cachee's predictive pre-warming system already observes origin latency for every key it manages. It needs this data to predict which keys are worth pre-warming and when. Cost-aware eviction is the natural extension of data we already collect. No new infrastructure. No additional latency. Just a smarter eviction function that uses data we already have.

Capability | Redis / Memcached | Cachee
Eviction Algorithm | LRU / LFU (fixed) | Cost-aware (ML-tuned)
Origin Cost Tracking | Not available | Per-key, real-time
Cost-Weighted Scoring | Not possible | Recency + freq + cost
Adaptive Weighting | Static algorithm | ML per workload
Optimizes For | Hit rate | Total system cost
Configuration | maxmemory-policy flag | Zero-config, automatic

See how Cachee's full ML pipeline compares to traditional caches in our detailed comparison, or explore the enterprise deployment options for production-scale cost-aware eviction.

Integration

Cost-Aware Eviction Is Already On

No flags to set. No policies to configure. Cost-aware eviction activates automatically when the ML layer has enough origin cost data, typically within 60 seconds of deployment.

// Cost-aware eviction is automatic. No configuration needed.
// Just use Cachee normally — the ML layer handles the rest.

import { Cachee } from '@cachee/sdk';

const cache = new Cachee({ apiKey: 'ck_live_your_key_here' });

// Expensive origin fetch — ML learns this costs 200ms
const recs = await cache.get('ml:recommend:user_429', {
  origin: () => computeRecommendations(429) // ~200ms
});

// Cheap origin fetch — ML learns this costs 1ms
const flags = await cache.get('config:feature_flags', {
  origin: () => db.query('SELECT * FROM flags') // ~1ms
});

// Under memory pressure, config:feature_flags is evicted first.
// ml:recommend:user_429 survives — it's 200x more expensive to re-fetch.
// Your system pays 1ms to re-fetch the config instead of 200ms for the ML result.
1. Deploy
Install the SDK and add your API key. Cachee sits in front of your existing cache as an overlay layer. Your Redis or Memcached stays in place.
2. Observe
The ML layer observes origin fetch latency for every key. Within 60 seconds, it builds a per-key cost profile and begins scoring eviction candidates by cost.
3. Optimize
Eviction decisions now factor in origin cost. Expensive keys survive longer. Cheap keys are evicted first. Total system cost drops without any manual tuning.
Impact

Where Cost-Aware Eviction Has the Biggest Impact

Cost-aware eviction delivers the most value when your cached data has a wide cost distribution. These workloads see the largest improvement.

🤖
ML/AI Inference Caching
Recommendation engines, fraud detection models, and NLP pipelines produce results that take 50-500ms to compute. Evicting these cached results triggers expensive re-inference. Cost-aware eviction keeps ML outputs in cache and evicts cheap metadata instead.
50-500ms origin cost protected
📈
Analytics Aggregations
Dashboard queries that aggregate millions of rows, compute percentiles, or join across shards take 100-300ms. A simple key-value lookup takes 1ms. Cost-aware eviction knows the difference and protects the expensive aggregation.
100-300ms queries preserved
🌐
Third-Party API Responses
External API calls are subject to network latency, rate limits, and quotas. A Stripe payment method lookup or a geocoding API response costs 50-200ms and may be rate-limited. Cost-aware eviction treats these as high-value cache entries.
Rate-limit-aware caching
🗄
Multi-Region Data
Cross-region database reads add 30-80ms of network latency on top of query execution time. Keys sourced from distant regions are inherently more expensive to re-fetch. Cost-aware eviction factors in the full round-trip cost.
Cross-region cost awareness

Stop Optimizing for Hit Rate.
Start Optimizing for Cost.

Cost-aware eviction is enabled by default on every Cachee deployment. Start with the free tier, deploy in under 5 minutes, and watch your total miss cost drop while your expensive data stays cached.

Start Free Trial · Compare Eviction Policies