
Cache Consistency in Microservices: Eventual vs Strong

December 21, 2025 • 7 min read • Distributed Systems

Cache consistency is one of the hardest problems in distributed microservices architectures. Update data in one service, and five other services are left holding stale caches. Choose strong consistency, and you lose much of the performance benefit of caching. This guide helps you navigate the consistency spectrum and choose the right approach for each use case.

The Cache Consistency Problem

Consider an e-commerce system with separate microservices:

// Product Service updates price
await db.products.update(
  { id: 123 },
  { price: 49.99 }
);

// Problem: These caches are now stale:
// - Product Service cache
// - Cart Service cache (has old price)
// - Recommendation Service cache (has old price)
// - Search Service cache (has old price)

// How do they know to invalidate?

This is the classic distributed cache invalidation problem. Under a network partition, the CAP theorem forces a choice between consistency and availability; most caching systems choose availability, which is exactly what creates these consistency challenges.

Consistency Models Explained

Strong Consistency

Every read returns the most recent write. All services see the same data at the same time.

Eventual Consistency

Reads may return stale data temporarily, but all replicas converge to the same state eventually.

Bounded Staleness

Stale data is allowed, but only within defined limits (time or version bounds).
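Bounded staleness can be enforced at read time by recording when each entry was cached and refusing to serve entries past the bound. A minimal sketch, assuming an in-memory Map cache and a hypothetical fetchFromSource helper standing in for a call to the owning service:

```javascript
// Hedged sketch: bounded-staleness read. The in-memory Map cache and
// fetchFromSource are illustrative stand-ins, not a real client API.
const cache = new Map();

async function fetchFromSource(id) {
  // Stand-in for a call to the owning service / database.
  return { id, price: 49.99 };
}

async function getWithBoundedStaleness(id, maxStalenessMs) {
  const entry = cache.get(id);
  const now = Date.now();

  // Serve from cache only while the entry is inside the staleness bound.
  if (entry && now - entry.cachedAt <= maxStalenessMs) {
    return entry.value;
  }

  // Bound exceeded (or cache miss): refresh from the source of truth.
  const value = await fetchFromSource(id);
  cache.set(id, { value, cachedAt: now });
  return value;
}
```

The bound becomes an explicit, tunable contract: a 5-second bound behaves close to strong consistency, a 5-minute bound behaves close to plain TTL caching.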

Pattern 1: Time-Based Invalidation (Eventual)

The simplest approach: cache with short TTLs and accept brief inconsistency.

// Product Service
async function updateProduct(id, data) {
  await db.products.update(id, data);

  // Invalidate own cache
  await cache.delete(`product:${id}`);

  // Other services will get fresh data after TTL expires
  // Max staleness = TTL (e.g., 60 seconds)
}

// Cart Service (different microservice)
async function getProduct(id) {
  let product = await cache.get(`product:${id}`);

  if (!product) {
    product = await productServiceAPI.getProduct(id);
    // Cache for 60 seconds
    await cache.set(`product:${id}`, product, { ttl: 60 });
  }

  return product;  // May be up to 60s stale
}

When to Use TTL-Based Invalidation

TTL-based invalidation fits non-critical data where a staleness window equal to the TTL is acceptable: product descriptions, recommendations, search results. It requires no cross-service coordination, but there is no way to force a refresh before the TTL expires, so the TTL is your worst-case staleness.
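One practical refinement when relying on TTLs: entries cached at the same moment also expire at the same moment, producing a synchronized wave of refresh traffic against the source service. A small sketch that jitters the TTL to spread expiry out; the ±20% jitter fraction here is an illustrative choice, not a recommendation:

```javascript
// Hedged sketch: randomize each entry's TTL around a base value so
// that simultaneous cache fills don't expire simultaneously.
function jitteredTtl(baseTtlSeconds, jitterFraction = 0.2) {
  // Random factor in roughly [-jitterFraction, +jitterFraction)
  const jitter = (Math.random() * 2 - 1) * jitterFraction;
  return Math.max(1, Math.round(baseTtlSeconds * (1 + jitter)));
}

// Usage: cache for ~60 seconds, actually somewhere in [48, 72]
// await cache.set(`product:${id}`, product, { ttl: jitteredTtl(60) });
```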

Pattern 2: Event-Driven Invalidation (Eventual)

Publish cache invalidation events when data changes. Other services subscribe and invalidate their caches.

// Product Service
async function updateProduct(id, data) {
  await db.products.update(id, data);

  // Invalidate local cache
  await cache.delete(`product:${id}`);

  // Publish invalidation event
  await eventBus.publish('product.updated', {
    productId: id,
    timestamp: Date.now(),
    fields: ['price', 'stock']
  });
}

// Cart Service (subscriber)
eventBus.subscribe('product.updated', async (event) => {
  // Invalidate cached product data
  await cache.delete(`product:${event.productId}`);
  // Wildcard keys need pattern invalidation, not a plain delete
  await cache.invalidatePattern(`cart:*:product:${event.productId}`);

  console.log(`Invalidated cache for product ${event.productId}`);
});

// Recommendation Service (subscriber)
eventBus.subscribe('product.updated', async (event) => {
  // Invalidate recommendation caches that include this product
  await cache.invalidatePattern(`recommendations:*:${event.productId}`);
});

Event-Driven Invalidation Benefits

Invalidations propagate within milliseconds instead of waiting out a TTL, and publishers stay loosely coupled: the Product Service doesn't need to know which services cache its data.

Challenges

Event delivery is typically at-least-once or best-effort. A lost event leaves a cache stale indefinitely, so keep a fallback TTL as a safety net, and be prepared to handle duplicate and out-of-order events in subscribers.
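The biggest operational risk with event-driven invalidation is a handler that fails mid-invalidation, leaving a stale cache behind. One possible mitigation, sketched here with illustrative attempt counts, delays, and a deadLetter array that are assumptions rather than a prescribed design: retry the handler with exponential backoff and park hard failures for later replay.

```javascript
// Hedged sketch: run an invalidation handler with retries and a
// dead-letter list. All names and tuning values are illustrative.
const deadLetter = [];

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function handleWithRetry(event, handler, maxAttempts = 3, baseDelayMs = 100) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await handler(event);
      return true; // invalidation succeeded
    } catch (err) {
      if (attempt === maxAttempts) {
        // Give up: record for replay instead of silently staying stale.
        deadLetter.push({ event, error: String(err) });
        return false;
      }
      await sleep(baseDelayMs * 2 ** (attempt - 1)); // exponential backoff
    }
  }
}
```

Combined with a fallback TTL, this bounds staleness even when the event bus or a subscriber misbehaves.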

Pattern 3: Write-Through Cache (Strong)

Updates go through a centralized cache layer that maintains consistency.

// Centralized Cache Service
class CacheService {
  async get(key) {
    const cached = await redis.get(key);
    if (cached) return JSON.parse(cached);

    // Cache miss: fetch from database
    const data = await database.get(key);
    await this.set(key, data);
    return data;
  }

  async set(key, value, ttl = 3600) {
    // Write to database first
    await database.set(key, value);

    // Then update cache
    await redis.setex(key, ttl, JSON.stringify(value));

    // All readers get consistent data
  }

  async delete(key) {
    await database.delete(key);
    await redis.del(key);
  }
}

// All services use centralized cache
const cache = new CacheService();

// Product Service
await cache.set('product:123', { price: 49.99 });

// Cart Service reads immediately
const product = await cache.get('product:123');
// Guaranteed to see updated price

Trade-Offs

Every write pays for both a database update and a cache update, and the centralized cache service becomes a shared dependency and potential bottleneck for every service. If the cache write fails after the database write succeeds, readers can still see stale data until the entry is repaired or expires, so the two writes need failure handling.

Pattern 4: Version-Based Consistency

Include version numbers in cache keys to ensure correct data is used.

// Product Service maintains version
async function updateProduct(id, data) {
  const version = await db.products.incrementVersion(id);

  await db.products.update(id, data);

  // Cache with version in key
  await cache.set(`product:${id}:v${version}`, data, { ttl: 3600 });

  // Publish new version
  await eventBus.publish('product.updated', {
    productId: id,
    version: version
  });
}

// Cart Service
const currentVersions = new Map();  // per-product version

eventBus.subscribe('product.updated', (event) => {
  currentVersions.set(event.productId, event.version);
});

async function getProduct(id) {
  // Always fetch with the latest known version for this product
  const version = currentVersions.get(id) || 1;
  const key = `product:${id}:v${version}`;
  let product = await cache.get(key);

  if (!product) {
    product = await productServiceAPI.getProduct(id);
    await cache.set(key, product, { ttl: 3600 });
  }

  return product;
}

Pattern 5: Read Repair

Detect stale data during reads and update automatically.

async function getProduct(id) {
  const cached = await cache.get(`product:${id}`);

  if (cached) {
    // Background validation: is cache stale?
    validateCache(id, cached.updatedAt)
      .then(async (isStale) => {
        if (isStale) {
          // Repair cache in background
          const fresh = await productServiceAPI.getProduct(id);
          await cache.set(`product:${id}`, fresh);
        }
      })
      .catch((err) => logger.warn(`Read repair failed for product:${id}`, err));

    return cached;  // Return cached immediately
  }

  // Cache miss
  const product = await productServiceAPI.getProduct(id);
  await cache.set(`product:${id}`, product, { ttl: 300 });
  return product;
}

async function validateCache(id, cachedTimestamp) {
  // Check if source data is newer
  const lastModified = await productServiceAPI.getLastModified(id);
  return lastModified > cachedTimestamp;
}

Pattern 6: Hybrid Consistency Levels

Use different consistency models for different data types within the same system.

const CONSISTENCY_POLICIES = {
  'product.price': 'eventual',        // Can be briefly stale
  'product.description': 'eventual',   // Can be briefly stale
  'inventory.count': 'strong',        // Must be accurate
  'user.balance': 'strong',           // Financial data
  'user.profile': 'eventual',         // Can be stale
};

function getCacheConsistency(dataType) {
  return CONSISTENCY_POLICIES[dataType] || 'eventual';
}

async function getData(type, id) {
  const consistency = getCacheConsistency(type);

  if (consistency === 'strong') {
    // Always read from source with cache-aside
    return await getWithStrongConsistency(type, id);
  } else {
    // Use cached data with TTL
    return await getWithEventualConsistency(type, id);
  }
}
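The dispatcher above assumes two helpers for the strong and eventual paths. One possible shape for them, sketched with in-memory Map stand-ins for the database and cache; the keys, TTL, and sample data are illustrative assumptions:

```javascript
// Hedged sketch: the two consistency paths the dispatcher assumes.
// database and cache are in-memory stand-ins for real stores.
const database = new Map([
  ['inventory.count:42', 7],
  ['product.price:42', 49.99],
]);
const cache = new Map();

async function getWithStrongConsistency(type, id) {
  // Read the source of truth, then refresh the cache so later
  // eventual reads start from current data.
  const key = `${type}:${id}`;
  const value = database.get(key);
  cache.set(key, { value, cachedAt: Date.now() });
  return value;
}

async function getWithEventualConsistency(type, id, ttlMs = 60000) {
  const key = `${type}:${id}`;
  const entry = cache.get(key);
  if (entry && Date.now() - entry.cachedAt < ttlMs) {
    return entry.value; // possibly stale, but within the TTL window
  }
  const value = database.get(key);
  cache.set(key, { value, cachedAt: Date.now() });
  return value;
}
```

Note the asymmetry: the strong path always pays a source read, while the eventual path can return data that the database has already moved past.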

Monitoring Consistency

Track consistency metrics across services:

// Consistency lag metric
async function measureConsistencyLag() {
  const sourceData = await database.get('product:123');
  const cachedData = await cache.get('product:123');

  if (cachedData) {
    const lag = sourceData.updatedAt - cachedData.updatedAt;
    metrics.recordConsistencyLag('product', lag);

    if (lag > 5000) {  // >5 seconds stale
      logger.warn(`High consistency lag: ${lag}ms for product:123`);
    }
  }
}

// Stale read detection
async function detectStaleReads() {
  // Track version mismatches
  metrics.increment('cache.stale_reads', {
    service: 'cart',
    resource: 'product'
  });
}

Decision Framework

Use Strong Consistency When:

Data is financial or transactional (account balances, payments), drives irreversible decisions (inventory counts, order acceptance), or a stale read would surface as a correctness bug rather than a cosmetic lag.

Use Eventual Consistency When:

Brief staleness is harmless — product descriptions, user profiles, recommendations, search results — and read latency and availability matter more than freshness.

Use Hybrid Approach When:

A single system handles both kinds of data, which is the common case. Apply per-field policies, as in the CONSISTENCY_POLICIES map above.

Conclusion

Cache consistency in microservices is about choosing the right trade-off for each use case. TTL-based invalidation works for most non-critical data. Event-driven invalidation reduces staleness while maintaining loose coupling. Write-through caches provide strong consistency at the cost of performance. Version-based systems prevent stale data usage.

The best architectures use different consistency models for different data types: strong consistency for critical data, eventual consistency for everything else. Monitor consistency lag continuously and adjust TTLs and invalidation strategies based on observed behavior.

Automatic Consistency Management

Cachee.ai intelligently manages cache consistency across microservices with ML-powered invalidation timing.

Start Free Trial
