
How to Debug Cache Performance Issues in Production

December 21, 2025 • 7 min read • Debugging Guide

Your application was fast yesterday. Today it's slow. Requests that took 50ms now take 2 seconds. The cache is the likely culprit, but where do you start? This guide provides a systematic approach to diagnosing and fixing cache performance issues in production without causing more problems.

Symptoms and Root Causes

Common cache performance symptoms and what they indicate:

Symptom                     Likely Cause
------------------------    --------------------------------
Slow response times         Low hit rate, cache stampede
Database overload           Cache misses, poor hit rate
Memory errors               Cache full, aggressive eviction
Intermittent slowness       Cache stampede at expiry
High CPU on cache server    Inefficient queries, large keys

Step 1: Check Hit Rate

Hit rate is the first metric to examine. Below 85% typically indicates a problem.

# Redis hit rate
redis-cli INFO stats | grep -E "keyspace_hits|keyspace_misses"

# Calculate hit rate
keyspace_hits:125423
keyspace_misses:8234

hit_rate = 125423 / (125423 + 8234) = 93.8%

# Low hit rate (<85%) indicates:
# - Insufficient cache size
# - Poor TTL configuration
# - Cache not warming properly
# - Keys not structured optimally
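
You can also compute this from application code. A minimal sketch, assuming the ioredis client (any client that exposes the raw INFO output works the same way):

// Sketch: compute hit rate from INFO stats (ioredis assumed)
const Redis = require('ioredis');
const redis = new Redis();

async function hitRate() {
  const info = await redis.info('stats');  // raw INFO text
  const stats = Object.fromEntries(
    info.split('\r\n')
      .filter((line) => line.includes(':'))
      .map((line) => line.split(':'))
  );
  const hits = Number(stats.keyspace_hits);
  const misses = Number(stats.keyspace_misses);
  return hits / (hits + misses);  // 0.938 for the numbers above
}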

Quick Hit Rate Fixes

# 1. Check cache memory usage
redis-cli INFO memory

# If memory is maxed out:
used_memory_human:8.50G
maxmemory_human:8.00G
# → Increase memory or optimize eviction

# 2. Check eviction stats
evicted_keys:45231  # High = cache too small

# 3. Examine key distribution
redis-cli --bigkeys
# Identifies large keys consuming memory

Step 2: Identify Cache Stampedes

Cache stampede: many requests miss the same cold or just-expired key simultaneously, and every one of them falls through to your backend at once, overwhelming it.

Detecting Stampedes

# Symptom: Periodic latency spikes
# Check application logs for patterns

# Look for:
# - Latency spikes every N minutes (matching TTL)
# - Database query spikes
# - Multiple identical cache miss logs

# Example log pattern indicating stampede:
2025-12-21 10:00:00 Cache miss: product:123
2025-12-21 10:00:00 Cache miss: product:123
2025-12-21 10:00:00 Cache miss: product:123
[... 50 more identical misses in same second]

Stampede Mitigation

// Add request coalescing
const inFlightRequests = new Map();

async function getCached(key) {
  const cached = await cache.get(key);
  if (cached !== null && cached !== undefined) return cached;

  // Check if another request is already fetching
  if (inFlightRequests.has(key)) {
    return inFlightRequests.get(key);
  }

  // Fetch once and share the result with concurrent requests
  const promise = fetchFromDatabase(key)
    .then(async (data) => {
      await cache.set(key, data, { ttl: 300 });
      return data;
    })
    .finally(() => {
      // Always clean up, even on failure, so a failed fetch
      // doesn't permanently block this key
      inFlightRequests.delete(key);
    });

  inFlightRequests.set(key, promise);
  return promise;
}
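
Coalescing deduplicates concurrent misses within one process. Across multiple app instances, adding jitter to TTLs helps too, since it stops a hot key from expiring on every instance at the same moment. A small sketch (the 10% jitter fraction is an arbitrary starting point):

// Spread key expiry out with randomized TTL jitter
function jitteredTtl(baseSeconds, jitterFraction = 0.1) {
  const jitter = baseSeconds * jitterFraction * Math.random();
  return Math.round(baseSeconds + jitter);  // e.g. 300 → 300-330 seconds
}

await cache.set(key, data, { ttl: jitteredTtl(300) });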

Step 3: Analyze Slow Cache Operations

Sometimes the cache itself is slow. Diagnose using slowlog and latency tracking.

# Redis SLOWLOG - shows slow commands
redis-cli SLOWLOG GET 10

# Example output:
1) 1) (integer) 12      # Log entry ID
   2) (integer) 1640000000  # Timestamp
   3) (integer) 15234   # Execution time (microseconds)
   4) 1) "GET"
      2) "user:profile:12345:preferences:settings"

# Slow operations indicate:
# - Large values being retrieved
# - Network latency
# - Blocking operations (KEYS, SMEMBERS on large sets)

Common Slow Operations

# BAD: KEYS command in production (blocks Redis)
KEYS user:*  # Scans all keys - O(N) operation

# GOOD: Use SCAN instead
SCAN 0 MATCH user:* COUNT 100  # Non-blocking; repeat with the returned cursor until it is 0

# BAD: Retrieving entire large set
SMEMBERS large:set  # Returns all members at once

# GOOD: Use SSCAN for large sets
SSCAN large:set 0 COUNT 100

# BAD: Large value storage
SET config:data "{ ... 10MB JSON ... }"

# GOOD: Compress or break into smaller chunks
SET config:data:compressed [compressed 1MB data]
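
From application code, avoid wrapping KEYS at all. If you use ioredis, it exposes SCAN as a stream; a sketch:

// Sketch: iterate matching keys with SCAN, not KEYS (ioredis assumed)
const stream = redis.scanStream({ match: 'user:*', count: 100 });

stream.on('data', (keys) => {
  // Each event delivers one batch of matching keys
  for (const key of keys) {
    console.log(key);
  }
});

stream.on('end', () => console.log('scan complete'));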

Step 4: Monitor Memory Usage

Memory issues cause evictions, which hurt hit rate and performance.

# Check Redis memory stats
redis-cli INFO memory

# Key metrics:
used_memory_human:7.5G       # Actual memory used
maxmemory_human:8.0G         # Configured limit
mem_fragmentation_ratio:1.23 # >1.5 indicates fragmentation

# Memory policy
maxmemory_policy:allkeys-lru # How Redis evicts

# If memory is full:
# Option 1: Increase memory
# Option 2: Optimize data structures
# Option 3: Reduce TTLs
# Option 4: Implement better eviction policy
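
These checks are easy to automate. A minimal sketch, assuming ioredis, that parses INFO memory and flags the conditions above:

// Sketch: flag memory pressure from INFO memory (ioredis assumed)
async function checkMemory(redis) {
  const info = await redis.info('memory');
  const fields = Object.fromEntries(
    info.split('\r\n')
      .filter((line) => line.includes(':'))
      .map((line) => line.split(':'))
  );
  const used = Number(fields.used_memory);   // bytes
  const max = Number(fields.maxmemory);      // bytes; 0 = no limit
  const frag = Number(fields.mem_fragmentation_ratio);

  if (max > 0 && used / max > 0.9) console.warn('memory above 90% of maxmemory');
  if (frag > 1.5) console.warn(`high fragmentation ratio: ${frag}`);
}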

Finding Memory Hogs

# Find largest keys
redis-cli --bigkeys

# Output shows:
[00.00%] Biggest string: user:123:session (512KB)
[00.00%] Biggest list: notifications:456 (2048 items)
[00.00%] Biggest hash: product:789 (10MB)

# Investigate large keys
redis-cli MEMORY USAGE product:789
# Shows: (integer) 10485760 bytes

# Fix: Compress or restructure
# Before:
SET product:789 [10MB JSON]

# After: Store only essential fields
HSET product:789 id 789 name "Product" price 49.99
# Reference full data in database
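
If a large value genuinely belongs in the cache, compress it before the SET. A sketch using Node's built-in zlib, with ioredis assumed for the Buffer-returning getBuffer:

// Sketch: gzip large values before caching (Node zlib; ioredis assumed)
const { gzipSync, gunzipSync } = require('zlib');

async function setCompressed(redis, key, value, ttlSeconds) {
  const payload = gzipSync(JSON.stringify(value));
  await redis.set(key, payload, 'EX', ttlSeconds);
}

async function getCompressed(redis, key) {
  const payload = await redis.getBuffer(key);  // Buffer, or null on miss
  if (!payload) return null;
  return JSON.parse(gunzipSync(payload).toString('utf8'));
}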

Step 5: Detect Inefficient Key Patterns

Poor key naming and structure lead to slow operations and memory waste.

# Anti-pattern: Using KEYS to find related data
KEYS user:123:*  # Scans ALL keys - very slow

# Better: Use hash tags for co-location (Redis Cluster)
# Keys sharing the {123} tag hash to the same slot
user:{123}:profile
user:{123}:preferences
user:{123}:sessions

# Best: Use Redis hashes for structured data
HSET user:123 profile {JSON} preferences {JSON}

# Anti-pattern: Very long key names
SET user_profile_data_for_authenticated_user_id_123_with_preferences "data"

# Better: Concise key names
SET user:123:profile "data"
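
A simple way to enforce concise, consistent names is to route every key through one helper module instead of building strings ad hoc; a minimal sketch:

// Sketch: centralize key construction so formats stay consistent
const keys = {
  userProfile: (id) => `user:${id}:profile`,
  userPrefs: (id) => `user:${id}:prefs`,
  productSummary: (id) => `product:${id}:summary`,
};

// Usage: no hand-built key strings scattered across the codebase
const profile = await cache.get(keys.userProfile(123));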

Step 6: Trace End-to-End Latency

Identify where time is spent in cache operations.

// Instrument cache operations
async function getCached(key) {
  const startTime = Date.now();

  // Network latency to cache
  const cacheStart = Date.now();
  const cached = await cache.get(key);
  const cacheLatency = Date.now() - cacheStart;

  if (cached) {
    metrics.record('cache.latency', cacheLatency, { result: 'hit' });
    return cached;
  }

  // Database latency on miss
  const dbStart = Date.now();
  const data = await database.query(key);
  const dbLatency = Date.now() - dbStart;

  // Cache write latency
  const writeStart = Date.now();
  await cache.set(key, data, { ttl: 300 });
  const writeLatency = Date.now() - writeStart;

  metrics.record('cache.latency', cacheLatency, { result: 'miss' });
  metrics.record('db.latency', dbLatency);
  metrics.record('cache.write_latency', writeLatency);

  const totalLatency = Date.now() - startTime;
  if (totalLatency > 100) {
    logger.warn(`Slow cache operation: ${totalLatency}ms`, {
      key,
      cacheLatency,
      dbLatency,
      writeLatency
    });
  }

  return data;
}
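
One caveat: Date.now() has millisecond resolution, which is coarse next to the single-digit-millisecond latencies a healthy cache should show. Node's perf_hooks provides fractional milliseconds:

// Fractional-millisecond timing via perf_hooks (Node.js)
const { performance } = require('perf_hooks');

const start = performance.now();
const cached = await cache.get(key);
const cacheLatency = performance.now() - start;  // e.g. 1.42 ms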

Step 7: Check Connection Pool Health

Connection pool exhaustion causes slow cache operations.

// Monitor connection pool stats
const poolStats = cache.getPoolStats();

console.log({
  total: poolStats.totalConnections,     // All connections
  active: poolStats.activeConnections,   // Currently in use
  idle: poolStats.idleConnections,       // Available
  waiting: poolStats.waitingRequests     // Queued requests
});

// Warning signs:
// - waiting > 0: Pool is saturated
// - active ≈ max: Need larger pool
// - idle = 0: Pool too small for load

// Fix: Increase pool size
// (option names are illustrative; they vary by client library)
const cache = new Redis({
  host: 'localhost',
  maxRetriesPerRequest: 3,
  // Increase connection pool
  connectionPool: {
    min: 10,
    max: 100  // Was 20, increased
  }
});
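
Turning those warning signs into an alert is straightforward. A sketch that polls the (hypothetical) getPoolStats() from above:

// Poll pool health and warn on saturation (getPoolStats assumed from above)
setInterval(() => {
  const stats = cache.getPoolStats();
  if (stats.waitingRequests > 0) {
    logger.warn('Cache connection pool saturated', stats);
  }
}, 10000);  // every 10 seconds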

Step 8: Examine Eviction Patterns

Understand what's being evicted and why.

# Track evicted keys
redis-cli INFO stats | grep evicted
evicted_keys:12453  # Total evictions since start

# Set up eviction monitoring
CONFIG SET notify-keyspace-events Ee
# E = keyevent notifications, e = evicted events

# Subscribe to eviction notifications
redis-cli --csv PSUBSCRIBE '__keyevent@0__:evicted'

# Log evicted keys to understand patterns
# High eviction of frequently-accessed keys = memory too small
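
To actually log those evictions from application code, subscribe on a dedicated connection (a subscribed Redis client can't issue other commands). A sketch assuming ioredis:

// Sketch: log evicted keys via keyspace notifications (ioredis assumed)
const Redis = require('ioredis');
const subscriber = new Redis();  // dedicated connection for subscribing

subscriber.psubscribe('__keyevent@0__:evicted');

subscriber.on('pmessage', (pattern, channel, key) => {
  // The message payload is the name of the evicted key
  console.log(`evicted: ${key}`);
});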

Production Debugging Checklist

Immediate Actions (5 minutes)

  1. Check hit rate: redis-cli INFO stats
  2. Check memory: redis-cli INFO memory
  3. Check slow operations: redis-cli SLOWLOG GET 10
  4. Review recent deployments or traffic changes

Deep Investigation (30 minutes)

  1. Analyze access patterns with redis-cli MONITOR (sample briefly!)
  2. Check for cache stampedes in application logs
  3. Examine key distribution with --bigkeys
  4. Review eviction policy and tune if needed
  5. Trace end-to-end latency in application code

Long-Term Fixes (hours to days)

  1. Optimize data structures (use hashes, compress large values)
  2. Implement request coalescing for stampede prevention
  3. Adjust TTLs based on observed access patterns
  4. Scale cache infrastructure (more memory/nodes)
  5. Add monitoring and alerting for key metrics

Essential Monitoring Metrics

Set up continuous monitoring for these metrics:

// Key cache metrics to track
const metrics = {
  // Performance
  hitRate: 0.95,              // Target: >90%
  p50Latency: 2,              // milliseconds
  p95Latency: 5,              // milliseconds
  p99Latency: 15,             // milliseconds

  // Capacity
  memoryUsage: 0.75,          // 75% of max
  evictionRate: 100,          // evictions/second
  connectionPoolUtilization: 0.60,

  // Health
  errorRate: 0.001,           // 0.1%
  timeouts: 5,                // timeouts/minute
  connectionFailures: 0
};

// Alert thresholds
if (metrics.hitRate < 0.85) alert('Low cache hit rate');
if (metrics.p99Latency > 50) alert('High cache latency');
if (metrics.memoryUsage > 0.9) alert('Cache memory high');
if (metrics.evictionRate > 1000) alert('High eviction rate');

Conclusion

Debugging cache performance issues requires systematic analysis: start with hit rate, identify stampedes, analyze slow operations, monitor memory, trace latency, and check connection pools. Most issues fall into a few categories: insufficient memory, cache stampedes, inefficient operations, or poor key design.

The key is having good monitoring in place before problems occur. Track hit rate, latency percentiles, memory usage, and eviction rates continuously. When issues arise, use the debugging checklist to quickly identify and resolve root causes.

Automated Cache Performance Monitoring

Cachee.ai includes built-in performance monitoring with automatic anomaly detection and optimization recommendations.

Start Free Trial