
Cache Hit Rate: Why 60% Is Costing You

April 27, 2026 | 13 min read | Engineering

Your cache hit rate is 63%. Your monitoring dashboard shows it in green. Someone on the team calls it "healthy." No alerts fire. The number has been stable for months. Nobody questions it because caching is supposed to be a performance optimization, and 63% sounds like a solid optimization -- you are avoiding 63% of your database queries. That is good, right?

It is not good. A 63% hit rate means 37% of all requests bypass your cache entirely and hit the database at full latency. If your database query takes 15 milliseconds and your cache takes 0.3 milliseconds, a 63% hit rate gives you a weighted average response time of 5.7 milliseconds. At a 99% hit rate, that same workload averages 0.45 milliseconds. You are 12.7x slower than you need to be. That gap is not a rounding error. It is the difference between a responsive application and one that feels sluggish under load.

This post walks through the math of cache hit rates, explains why most caches plateau at 60-70% with LRU eviction, and shows how frequency-based admission gets you to 99% or higher.

60%: Typical Hit Rate (LRU)
99%+: Hit Rate (CacheeLFU)
14x: Latency Difference

The Math That Changes Everything

The weighted average response time for a cached system is a simple formula: avg_latency = (hit_rate * cache_latency) + (miss_rate * origin_latency). The miss rate is 1 - hit_rate. The formula is linear, but its impact on user experience is dominated by the miss term, because the origin latency is typically 50-500x the cache latency: halving the miss rate nearly halves the average latency.

Let us use concrete numbers. Your cache responds in 0.3 milliseconds (a typical Redis or in-process lookup). Your origin -- the database, the upstream API, the computation -- responds in 15 milliseconds. Here is what happens at each hit rate level.

Hit Rate | Miss Rate | Weighted Avg Latency | vs 99% Hit Rate | DB Load (relative)
50%      | 50%       | 7.65 ms              | 17.0x slower    | 50x more queries
60%      | 40%       | 6.18 ms              | 13.7x slower    | 40x more queries
70%      | 30%       | 4.71 ms              | 10.5x slower    | 30x more queries
80%      | 20%       | 3.24 ms              | 7.2x slower     | 20x more queries
90%      | 10%       | 1.77 ms              | 3.9x slower     | 10x more queries
95%      | 5%        | 1.04 ms              | 2.3x slower     | 5x more queries
99%      | 1%        | 0.45 ms              | 1.0x (baseline) | 1x (baseline)
99.9%    | 0.1%      | 0.31 ms              | 0.7x (faster)   | 0.1x
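
Reproducing the table for your own latencies takes only a few lines. Here is a minimal Python sketch using the same 0.3 ms cache and 15 ms origin figures as above; it is illustrative, so swap in your own measurements:

def avg_latency_ms(hit_rate, cache_ms=0.3, origin_ms=15.0):
    """avg_latency = hit_rate * cache_latency + miss_rate * origin_latency"""
    return hit_rate * cache_ms + (1.0 - hit_rate) * origin_ms

baseline = avg_latency_ms(0.99)
for hit_rate in (0.50, 0.60, 0.70, 0.80, 0.90, 0.95, 0.99, 0.999):
    latency = avg_latency_ms(hit_rate)
    print(f"{hit_rate:6.1%}  {latency:5.2f} ms  {latency / baseline:5.1f}x vs 99%")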

The table reveals two insights that most teams miss. First, the difference between 60% and 99% is not 39 percentage points of improvement. It is 13.7x in actual latency. Going from 60% to 70% saves 1.47 ms and cuts latency by about 24%. Going from 90% to 99% saves a similar 1.32 ms, but cuts the remaining latency by about 75%. The absolute savings per percentage point is constant, yet each point above 90% removes a far larger share of whatever latency is left, which is what users and downstream timeouts actually feel.

Second, look at the database load column. At 60% hit rate, your database handles 40% of all traffic. At 99% hit rate, it handles 1%. That is a 40x reduction in database queries. Your database servers, your connection pools, your read replicas -- all of that infrastructure exists because your cache is not good enough. A better cache is cheaper than more database capacity.

Why LRU Plateaus at 60-70%

If higher hit rates are so valuable, why does every team's cache seem to stabilize around 60-70%? The answer is LRU (Least Recently Used) eviction, which is the default eviction policy in virtually every caching system: Redis, Memcached, Guava, Caffeine before W-TinyLFU, and most hand-rolled in-process caches.

LRU has a fundamental structural weakness: it treats every access as equally important. A key that has been accessed 10,000 times in the last hour has the same eviction priority as a key that was accessed once 30 seconds ago, as long as the once-accessed key was accessed more recently. This means a burst of new, never-to-be-seen-again keys can evict the most valuable entries in your cache.

The Three Killers of LRU Hit Rate

Cold starts. When your application deploys or your cache restarts, every key is a miss. LRU provides no mechanism to prioritize which keys to warm first. Keys enter the cache in the order they are requested, and the first requests after a cold start are often background jobs, health checks, or admin pages -- not the hot user-facing data that benefits most from caching. By the time your hot keys arrive, the cache may already be half-full with cold, low-value entries that will need to be evicted.

Scan resistance failure. LRU has zero scan resistance. A sequential scan of a large key space -- a batch job reading all users, a report generation query iterating through all orders, a background task syncing data -- will pass through the cache and evict every hot entry along the way. After the scan completes, the cache is filled with keys from the scan that will never be accessed again, and all the valuable hot keys are gone. Your hit rate drops from 65% to 15% in minutes and takes 30-60 minutes to recover as hot keys trickle back in.

One-hit-wonder pollution. In most workloads, a significant fraction of keys are accessed exactly once. Research from CDN providers shows that 60-75% of URLs receive exactly one request. Search engine caches show similar patterns: the long tail of unique queries vastly outnumbers the head of popular queries. LRU caches these one-hit-wonder keys with the same priority as keys that are accessed thousands of times. They occupy cache space until evicted by the next batch of one-hit-wonders, creating a steady churn that caps your hit rate at the ratio of hot keys to total keys. If 30% of your key space is hot (accessed more than once) and 70% is one-hit-wonders, LRU's steady-state hit rate will stabilize around 60-70% regardless of cache size.

The LRU Ceiling

LRU eviction creates a structural ceiling on hit rate. In workloads where 60-75% of keys are accessed only once (the typical pattern for web applications, APIs, and CDNs), LRU cannot exceed 60-70% hit rate regardless of cache size. Doubling the cache size does not double the hit rate -- it adds 3-5 percentage points at best, because the additional space is immediately filled with more one-hit-wonder keys. The problem is not capacity. The problem is admission: LRU admits every key unconditionally, regardless of whether it is likely to be accessed again.
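
Unconditional admission is easy to see in a toy sketch. The Python below (illustrative only, not a benchmark; key names and sizes are made up) warms an LRU cache with a hot working set, runs a one-pass scan of unique keys through it, and then probes the hot keys again. Every probe misses, because the scan flushed the entire hot set:

# Toy demonstration of LRU scan pollution. An OrderedDict stands in for an
# LRU cache: move_to_end on access, popitem(last=False) evicts the LRU entry.
from collections import OrderedDict

CAPACITY = 1_000

def lru_access(cache: OrderedDict, key: str) -> bool:
    """Return True on hit, False on miss; evict the LRU entry when full."""
    if key in cache:
        cache.move_to_end(key)
        return True
    cache[key] = object()          # placeholder value
    if len(cache) > CAPACITY:
        cache.popitem(last=False)  # evict least recently used
    return False

cache = OrderedDict()

# Warm the cache with 1,000 hot keys, accessed many times each.
for _ in range(100):
    for i in range(1_000):
        lru_access(cache, f"hot:{i}")

# A batch job scans 5,000 keys that will never be requested again.
for i in range(5_000):
    lru_access(cache, f"scan:{i}")

# Probe the hot keys: every single one is now a miss.
hot_hits = sum(lru_access(cache, f"hot:{i}") for i in range(1_000))
print(f"hot-key hits after scan: {hot_hits} / 1000")   # prints 0 / 1000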

The Metric That Actually Matters: Weighted Miss Cost

Hit rate alone is an incomplete metric because it treats all misses as equal. But misses are not equal. A miss on a 1ms database query costs 1ms. A miss on a 50ms API call costs 50ms. A miss on a 200ms ML inference costs 200ms. A cache that has a 95% hit rate but misses on the 200ms calls is worse than a cache that has a 90% hit rate but catches all the expensive operations.

The metric that captures this is weighted miss cost: the average latency penalty per request caused by cache misses. The formula is: weighted_miss_cost = miss_rate * avg_miss_latency. Or, more precisely, weighted across different origin latencies: weighted_miss_cost = SUM(miss_rate_i * miss_latency_i * request_fraction_i) where i indexes each distinct origin.

Consider two caching strategies for an application with three origin types: a 2ms database query (60% of traffic), a 15ms API call (30% of traffic), and a 100ms ML inference (10% of traffic). Strategy A achieves 95% hit rate across all origins uniformly. Strategy B achieves 90% overall hit rate but 99.5% on ML inferences, 93% on API calls, and 88% on database queries.

# Strategy A: 95% uniform hit rate across all origins
weighted_miss_cost_A = (
    0.05 * 2 * 0.60 +     # DB misses:  5% miss rate, 2 ms origin,   60% of traffic
    0.05 * 15 * 0.30 +    # API misses: 5% miss rate, 15 ms origin,  30% of traffic
    0.05 * 100 * 0.10     # ML misses:  5% miss rate, 100 ms origin, 10% of traffic
)   # = 0.06 + 0.225 + 0.50 = 0.785 ms per request

# Strategy B: 90% overall hit rate, but 99.5% on ML inference
weighted_miss_cost_B = (
    0.12 * 2 * 0.60 +     # DB misses (88% hit)
    0.07 * 15 * 0.30 +    # API misses (93% hit)
    0.005 * 100 * 0.10    # ML misses (99.5% hit)
)   # = 0.144 + 0.315 + 0.05 = 0.509 ms per request

Strategy B has a lower overall hit rate (90% vs 95%) but a 35% lower weighted miss cost (0.509ms vs 0.785ms). It is the better caching strategy because it prioritizes caching the expensive operations. Hit rate is a vanity metric. Weighted miss cost is the operational metric.

To track weighted miss cost, you need two pieces of data per cache key: the origin latency (how long the miss takes to resolve) and the access frequency (how often this key is requested). Most cache systems track neither. They report a single aggregate hit rate number and call it a day. This blind spot is why so many teams have no idea that their 65% hit rate is costing them millions of dollars in unnecessary infrastructure.

How Frequency-Based Eviction Gets to 99%

The structural fix for LRU's plateau is to replace recency-based eviction with frequency-based eviction. Instead of evicting the least recently used entry, evict the least frequently used entry. This simple change has a profound effect on hit rate because it directly addresses the three killers described above.

Cold starts are handled naturally. After a restart, keys enter the cache as they are requested. One-hit-wonder keys enter with a frequency count of 1. Hot keys that are accessed repeatedly quickly accumulate higher frequency counts. When eviction pressure begins, the one-hit-wonder keys are evicted first because they have the lowest frequency. The cache self-organizes to retain the most valuable entries without any explicit warming logic.

Scan resistance is built in. A sequential scan reads each key once. Each scanned key enters the cache with a frequency count of 1. The hot keys already in the cache have frequency counts in the hundreds or thousands. When the cache fills up and eviction begins, the scanned keys are evicted (frequency 1) while the hot keys are retained (frequency 1000). The scan passes through the cache without disturbing the hot set. Hit rate remains stable throughout the scan.

One-hit-wonders are filtered automatically. A key accessed once has a frequency of 1. A key accessed 500 times has a frequency of 500. Under eviction pressure, the frequency-1 keys are evicted to make room for new arrivals. One-hit-wonders never accumulate enough frequency to displace genuinely hot entries. The cache converges to the working set of frequently accessed keys, which approximates what an optimal policy would retain for a stable, frequency-skewed workload.
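
To make the contrast concrete, here is the same toy scan scenario from the LRU sketch above, run against a deliberately naive frequency-based evictor (a minimal LFU for illustration, not CacheeLFU itself). The scan keys enter with a count of 1 and are evicted almost immediately, so the hot set survives intact:

# Toy frequency-based eviction. Evicting the minimum-count entry is O(n)
# here; real implementations use a sketch plus sampling or a heap, but the
# behavior under a scan is the point.
from collections import Counter

CAPACITY = 1_000
counts: Counter = Counter()
cache: dict = {}

def lfu_access(key: str) -> bool:
    """Return True on hit, False on miss; evict the least-frequent entry when full."""
    counts[key] += 1
    if key in cache:
        return True
    if len(cache) >= CAPACITY:
        victim = min(cache, key=lambda k: counts[k])  # lowest frequency goes first
        del cache[victim]
    cache[key] = object()  # placeholder value
    return False

# Warm with 1,000 hot keys, accessed 100 times each (frequency ~100).
for _ in range(100):
    for i in range(1_000):
        lfu_access(f"hot:{i}")

# The same 5,000-key scan: each scan key arrives with frequency 1 and is
# evicted as soon as the next scan key needs its slot.
for i in range(5_000):
    lfu_access(f"scan:{i}")

# Probe the hot keys: nearly all of them are still resident.
hot_hits = sum(lfu_access(f"hot:{i}") for i in range(1_000))
print(f"hot-key hits after scan: {hot_hits} / 1000")   # prints 999 / 1000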

CacheeLFU: Constant Memory, No Drift

Pure LFU has a known problem: frequency counts grow without bound and old counts never decay. A key that was hot last week but is cold today retains its high frequency count and cannot be evicted. This causes the cache to fill with historically hot but currently cold keys, which is the opposite of what we want.

CacheeLFU solves this with periodic count decay. Every N operations, all frequency counts are halved (right-shifted by 1 bit). This ensures that recent access patterns outweigh historical patterns. A key that was accessed 10,000 times last week but zero times this week will have its count decayed to 0 within a few decay cycles, making it eligible for eviction. A key accessed 100 times per minute will maintain a healthy count despite the periodic halving because new accesses replenish the count faster than decay reduces it.

The memory overhead of CacheeLFU is constant: 512 KiB regardless of the number of entries. This is achieved through a count-min sketch data structure that approximates frequency counts using a fixed-size array of counters. The sketch does not grow as entries are added. It uses 4 hash functions and a 128K-entry counter array, giving it an accuracy within 0.1% for the top 1000 most frequent keys and within 1% for the top 10,000. This accuracy is more than sufficient for eviction decisions, where you only need to distinguish "frequently accessed" from "rarely accessed," not provide exact counts.
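
To make the sketch idea concrete, here is a simplified count-min sketch with periodic halving. This is an illustrative toy in the spirit of the description above, not CacheeLFU's implementation; the decay interval and the hashing scheme are assumptions:

# A simplified count-min sketch with periodic decay (illustrative only).
from array import array
import hashlib

class FrequencySketch:
    """Fixed-size frequency estimator: 4 rows x 32K columns of 4-byte counters,
    i.e. 128K counters occupying 512 KiB no matter how many keys are seen."""
    ROWS = 4
    COLS = 32 * 1024
    DECAY_EVERY = 1_000_000   # assumed value; halve all counters every N increments

    def __init__(self) -> None:
        self.rows = [array("I", [0] * self.COLS) for _ in range(self.ROWS)]
        self.ops = 0

    def _columns(self, key: str):
        # Derive one column index per row from independent slices of a 16-byte hash.
        digest = hashlib.blake2b(key.encode(), digest_size=16).digest()
        return [int.from_bytes(digest[4 * r: 4 * r + 4], "little") % self.COLS
                for r in range(self.ROWS)]

    def increment(self, key: str) -> None:
        for row, col in zip(self.rows, self._columns(key)):
            row[col] += 1
        self.ops += 1
        if self.ops % self.DECAY_EVERY == 0:
            self._decay()

    def estimate(self, key: str) -> int:
        # Count-min: collisions only inflate counts, so the minimum across rows
        # is the tightest (over-)estimate of the true frequency.
        return min(row[col] for row, col in zip(self.rows, self._columns(key)))

    def _decay(self) -> None:
        # Periodic halving (right shift) so recent accesses outweigh stale history.
        for row in self.rows:
            for i in range(len(row)):
                row[i] >>= 1

A cache built on a sketch like this compares estimate(candidate) against estimate(victim) when deciding what to admit or evict, rather than maintaining an exact counter per entry.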

For comparison, a per-entry counter approach at 10 million keys requires 40-80 MB of frequency tracking overhead (4-8 bytes per entry). CacheeLFU uses 512 KiB at 10 million keys -- a 160x memory reduction. At 100 million keys, the per-entry approach requires 400-800 MB. CacheeLFU still uses 512 KiB. The memory is constant because the sketch is a fixed-size probabilistic data structure, not a per-entry counter.

Key Count   | Per-Entry Counter | CacheeLFU (sketch) | Memory Ratio
100,000     | 0.8 MB            | 512 KiB            | 1.6x
1,000,000   | 8 MB              | 512 KiB            | 16x
10,000,000  | 80 MB             | 512 KiB            | 160x
100,000,000 | 800 MB            | 512 KiB            | 1,600x

From 60% to 99%: A Production Case Study

We measured the impact of switching from LRU to CacheeLFU on a production workload with the following characteristics: 2.3 million unique keys, 45,000 requests per second, a power-law access distribution where 8% of keys accounted for 82% of requests, and a one-hit-wonder rate of 67% (67% of keys accessed exactly once within any 1-hour window). The cache was sized to hold 500,000 entries -- approximately 22% of the total key space.

LRU Results

With LRU eviction, the cache stabilized at a 63% hit rate after warm-up. During background batch jobs (which ran every 4 hours), hit rate dropped to 31% for 15-20 minutes as the scan evicted hot keys. Recovery to 60%+ took approximately 45 minutes after each batch run. The one-hit-wonder keys continuously churned through the cache, consuming approximately 40% of cache capacity at any given time while providing zero cache value on subsequent accesses. The effective cache utilization -- the fraction of cache capacity occupied by keys that would be accessed again before eviction -- was only 38%.

CacheeLFU Results

After switching to CacheeLFU with the same 500,000-entry capacity, the cache stabilized at a 99.2% hit rate. During background batch jobs, hit rate dipped to 98.7% -- the scan keys entered the cache with frequency 1 but were immediately evicted in favor of the hot keys with frequency counts in the hundreds. Recovery was instantaneous because the hot keys were never evicted. The one-hit-wonder keys were admitted to the cache but evicted within seconds, before they could displace any valuable entries. Effective cache utilization rose to 94%.

Metric                     | LRU                             | CacheeLFU        | Improvement
Steady-state hit rate      | 63%                             | 99.2%            | +36.2 pp
Hit rate during batch scan | 31%                             | 98.7%            | +67.7 pp
Recovery time after scan   | 45 min                          | 0 min            | Eliminated
Weighted avg latency       | 5.85 ms                         | 0.45 ms          | 13x reduction
DB queries/sec (misses)    | 16,650                          | 360              | 46x reduction
Effective utilization      | 38%                             | 94%              | 2.5x
Eviction overhead memory   | 0 bytes (LRU pointers in entry) | 512 KiB (sketch) | +512 KiB constant

The most dramatic number is the database query reduction: from 16,650 misses per second to 360 misses per second. That is a 46x reduction in database load, achieved by changing only the eviction policy. No additional cache servers. No larger cache size. No application code changes. The same 500,000-entry cache, with a different eviction algorithm, reduced database queries by 98%.

How to Measure and Improve Your Hit Rate

If you are running a Redis-backed cache, you can check your current hit rate by running INFO stats and looking at keyspace_hits and keyspace_misses. The hit rate is keyspace_hits / (keyspace_hits + keyspace_misses). This gives you the aggregate number. For per-key or per-prefix hit rates, you need application-level instrumentation.
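
From application code, the same aggregate number can be pulled with the redis-py client (assuming a reachable Redis instance; the INFO counters are cumulative since the server started or since the last CONFIG RESETSTAT):

import redis

r = redis.Redis(host="localhost", port=6379)
stats = r.info("stats")
hits, misses = stats["keyspace_hits"], stats["keyspace_misses"]
total = hits + misses
hit_rate = hits / total if total else 0.0
print(f"hit rate: {hit_rate:.1%}  ({hits:,} hits, {misses:,} misses)")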

Step 1: Instrument per-prefix hit rates

Group your cache keys by prefix (e.g., session:, user:, product:, api:) and track hit/miss counts per prefix. This tells you which categories of data are well-cached and which are not. A 95% hit rate on session keys and a 20% hit rate on product keys suggests very different problems than a uniform 60% across all prefixes.

Step 2: Measure origin latency per prefix

For each prefix category, measure the average origin latency on a miss. This lets you calculate the weighted miss cost per prefix. If your product cache has a 20% hit rate but the origin query takes 2ms, the weighted miss cost is 1.6ms -- annoying but manageable. If your API cache has an 80% hit rate but the origin call takes 200ms, the weighted miss cost is 40ms -- catastrophic despite the seemingly reasonable hit rate.

Step 3: Prioritize by weighted miss cost

Sort your prefix categories by weighted miss cost descending. The top of the list is where you should focus. Improving the hit rate of your highest-weighted-miss-cost category delivers the most latency reduction per unit of effort. Often, the highest-cost category is one that nobody was watching because its raw hit rate looked acceptable.
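
Putting steps 1 through 3 together, a minimal sketch of the instrumentation might look like the following. The wrapper, the prefix convention, and the loader are placeholders for whatever your application actually uses:

import time
from collections import defaultdict

class PrefixStats:
    """Per-prefix hit/miss counters plus total miss latency (steps 1 and 2)."""
    def __init__(self) -> None:
        self.hits = defaultdict(int)
        self.misses = defaultdict(int)
        self.miss_ms_total = defaultdict(float)

    def record(self, prefix: str, hit: bool, miss_ms: float = 0.0) -> None:
        if hit:
            self.hits[prefix] += 1
        else:
            self.misses[prefix] += 1
            self.miss_ms_total[prefix] += miss_ms

    def by_weighted_miss_cost(self):
        """Step 3: rank prefixes by miss_rate * avg_miss_latency, highest first."""
        rows = []
        for prefix in set(self.hits) | set(self.misses):
            total = self.hits[prefix] + self.misses[prefix]
            miss_rate = self.misses[prefix] / total
            avg_miss_ms = (self.miss_ms_total[prefix] / self.misses[prefix]
                           if self.misses[prefix] else 0.0)
            rows.append((prefix, miss_rate * avg_miss_ms, miss_rate, avg_miss_ms))
        return sorted(rows, key=lambda row: row[1], reverse=True)

def cached_get(cache: dict, stats: PrefixStats, key: str, loader):
    """Look up a key, recording per-prefix hits, misses, and miss latency."""
    prefix = key.split(":", 1)[0] + ":"
    if key in cache:
        stats.record(prefix, hit=True)
        return cache[key]
    start = time.perf_counter()
    value = loader(key)                          # origin: DB query, API call, etc.
    miss_ms = (time.perf_counter() - start) * 1000
    stats.record(prefix, hit=False, miss_ms=miss_ms)
    cache[key] = value
    return value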

Step 4: Replace LRU with frequency-based eviction

If your cache system supports pluggable eviction policies, switch to LFU or a frequency-aware variant. If it does not (stock Redis, for example, only supports LRU and a limited LFU mode), consider an in-process L1 cache with CacheeLFU in front of your existing cache. The L1 provides frequency-based admission and eviction at 31-nanosecond latency, and falls through to your existing Redis on L1 miss. Even if your Redis remains at 63% hit rate, the L1 absorbs the hottest keys at 99%+ hit rate, dramatically reducing the weighted average latency.
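
A minimal sketch of that two-tier layout, with a plain dict standing in for the L1 (a real L1 would apply the frequency-based admission and eviction described above) and redis-py as the existing cache:

import redis

class TwoTierCache:
    """In-process L1 in front of an existing Redis L2.

    Lookup order: L1 (no network) -> Redis (one round trip) -> origin loader.
    The dict L1 and the fixed TTL are placeholders, not CacheeLFU itself.
    """
    def __init__(self, redis_client: redis.Redis, loader, l1_capacity: int = 100_000):
        self.l1: dict = {}
        self.redis = redis_client
        self.loader = loader
        self.l1_capacity = l1_capacity

    def get(self, key: str):
        if key in self.l1:                       # L1 hit: in-process lookup
            return self.l1[key]
        value = self.redis.get(key)              # L2 lookup
        if value is None:
            value = self.loader(key)             # miss everywhere: hit the origin
            self.redis.set(key, value, ex=300)   # example TTL; tune per data type
        if len(self.l1) < self.l1_capacity:      # naive admission; replace with a
            self.l1[key] = value                 # frequency-based policy in practice
        return value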

The Compounding Effect on Infrastructure Costs

Cache hit rate has a direct, linear relationship to database infrastructure cost. At 60% hit rate, 40% of your traffic hits the database. At 99% hit rate, 1% does. If your database infrastructure costs $50,000 per month to handle the 40% miss traffic, a 99% cache hit rate would reduce that to $1,250 per month. The $48,750 monthly savings pays for a lot of engineering effort to improve your eviction policy.

The relationship also holds for CPU cost. Each cache miss triggers origin computation that consumes CPU. A 60% hit rate means 40% of your CPU cycles are spent on redundant computation that a better cache would eliminate. In a compute-heavy workload (ML inference, PDF rendering, image processing), the CPU savings from a 99% hit rate can be the largest single cost reduction available to your engineering team.

There is a secondary compounding effect: connection pool pressure. Each cache miss requires a connection to the origin database or API. At 60% hit rate with 45,000 requests per second, you need connection pool capacity for 18,000 queries per second. At 99% hit rate, you need capacity for 450 queries per second. Connection pools can be 40x smaller. Database replicas can be reduced. Connection timeout errors disappear because the pool is never exhausted.

The Bottom Line

A 60% cache hit rate is not healthy. It means 40% of your requests pay full origin latency, your database handles 40x more queries than necessary, and your infrastructure costs are 10-40x higher than they need to be. LRU eviction creates a structural ceiling at 60-70% because it cannot filter one-hit-wonders and has no scan resistance. Frequency-based eviction with CacheeLFU breaks through this ceiling to 99%+ hit rates using 512 KiB of constant memory. The metric to track is not hit rate alone -- it is weighted miss cost: miss_rate multiplied by miss_latency. That is the number that maps directly to user experience and infrastructure spend.

CacheeLFU eviction. 512 KiB constant memory. 99%+ hit rates.

brew install cachee