Microservices Caching Patterns: Complete Architecture Guide
Caching in microservices is fundamentally different from caching in a monolith. With dozens of services, each potentially caching data, you face challenges around consistency, coordination, and cache invalidation across service boundaries.
This guide covers proven patterns for implementing caching in microservices architectures.
The Microservices Caching Challenge
In monoliths, caching is straightforward—one application, one cache. Microservices introduce complexity:
- Data ownership: Which service owns which cached data?
- Cross-service consistency: How do you invalidate when data changes?
- Network overhead: Remote cache calls add latency
- Failure isolation: Cache failures shouldn't cascade
Pattern 1: Service-Local Caching
When to Use
Each service maintains its own cache for data it owns or frequently accesses.
// Order Service caches its own orders
// (LocalCache stands in for any in-process cache with a size cap and TTL support)
class OrderService {
  constructor(db) {
    this.db = db;
    this.cache = new LocalCache({ maxSize: 10000 });
  }

  async getOrder(orderId) {
    const cached = this.cache.get(`order:${orderId}`);
    if (cached) return cached;

    const order = await this.db.findOrder(orderId);
    this.cache.set(`order:${orderId}`, order, { ttl: 300 }); // cache for 5 minutes
    return order;
  }
}
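LocalCache is not a specific library; any in-process cache with a size cap and per-entry TTL (such as lru-cache) fits. A minimal sketch of what the example assumes:

class LocalCache {
  constructor({ maxSize = 1000 } = {}) {
    this.maxSize = maxSize;
    this.entries = new Map(); // key -> { value, expiresAt }
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt && entry.expiresAt < Date.now()) {
      this.entries.delete(key); // expired
      return undefined;
    }
    return entry.value;
  }

  set(key, value, { ttl } = {}) {
    // Evict the oldest entry once full (Map preserves insertion order)
    if (this.entries.size >= this.maxSize && !this.entries.has(key)) {
      const oldestKey = this.entries.keys().next().value;
      this.entries.delete(oldestKey);
    }
    this.entries.set(key, {
      value,
      expiresAt: ttl ? Date.now() + ttl * 1000 : null,
    });
  }
}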
Pros: Simple, no network calls, failure isolated
Cons: Memory per instance, no sharing between replicas
Pattern 2: Distributed Cache Layer
When to Use
Shared cache cluster (Redis, Memcached) accessible by all services.
// Shared Redis cache (example assumes the ioredis client)
const Redis = require('ioredis');
const redis = new Redis(process.env.REDIS_URL);

async function getCachedUser(userId) {
  const cached = await redis.get(`user:${userId}`);
  if (cached) return JSON.parse(cached);

  const user = await userService.getUser(userId);
  await redis.setex(`user:${userId}`, 3600, JSON.stringify(user)); // expire after 1 hour
  return user;
}
Pros: Shared across replicas, consistent view
Cons: Network latency, single point of failure if not clustered
Pattern 3: Cache-Aside with Events
When to Use
Services publish events when data changes; consuming services invalidate their caches.
// User Service publishes an event whenever a user changes
async function updateUser(userId, updates) {
  await db.updateUser(userId, updates);
  await cache.delete(`user:${userId}`);

  // Notify other services
  await eventBus.publish('user.updated', { userId, updates });
}

// Order Service subscribes and invalidates its own cached copies
eventBus.subscribe('user.updated', async ({ userId }) => {
  // deletePattern is a helper that removes all keys matching the pattern
  await cache.deletePattern(`orders:user:${userId}:*`);
});
Pros: Cross-service consistency, decoupled
Cons: Event infrastructure required, eventual consistency
Pattern 4: API Gateway Caching
When to Use
Cache responses at the API gateway level before requests reach services.
# NGINX configuration (Kong and other gateways offer equivalent caching plugins)
location /api/products {
    proxy_cache api_cache;
    proxy_cache_valid 200 5m;
    # Include the Authorization header in the key so one user's cached
    # response is never served to another
    proxy_cache_key $request_uri$http_authorization;
    proxy_pass http://product-service;
}
Pros: Transparent to services, reduces service load
Cons: Limited cache logic, coarse invalidation
Pattern 5: Sidecar Caching
When to Use
Deploy cache proxy as sidecar container alongside each service.
In Kubernetes, deploy a caching sidecar:
containers:
  - name: order-service
    image: order-service:latest
  - name: cache-sidecar
    image: cachee-sidecar:latest
    ports:
      - containerPort: 6380
Pros: Local cache access, consistent caching logic
Cons: Resource overhead per pod
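Because the sidecar listens inside the pod, the service talks to it over loopback like a local cache. A sketch assuming an ioredis client, the containerPort above, and the same db handle used in earlier examples:

const Redis = require('ioredis');

// The sidecar is reachable on the pod's loopback interface
const sidecarCache = new Redis({ host: '127.0.0.1', port: 6380 });

async function getOrder(orderId) {
  const cached = await sidecarCache.get(`order:${orderId}`);
  if (cached) return JSON.parse(cached);

  const order = await db.findOrder(orderId);
  await sidecarCache.setex(`order:${orderId}`, 300, JSON.stringify(order));
  return order;
}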
Cross-Service Cache Coordination
When Service A caches data from Service B, you need coordination:
Option 1: TTL-Based Staleness
Accept that cached data may be stale for up to the TTL duration. Simple but imprecise.
Option 2: Event-Driven Invalidation
Service B publishes change events; Service A subscribes and invalidates.
Option 3: Cache Versioning
Include version in cache keys; bump version on changes.
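A sketch of the versioning approach, assuming a Redis-backed cache: a per-entity version counter becomes part of every dependent key, so bumping it orphans the old entries, which then age out via TTL instead of being deleted one by one.

// Readers include the current version in the cache key
async function getCachedOrdersForUser(userId) {
  const version = (await redis.get(`user:${userId}:version`)) || '0';
  const key = `orders:user:${userId}:v${version}`;

  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached);

  const orders = await db.findOrdersByUser(userId);
  await redis.setex(key, 600, JSON.stringify(orders));
  return orders;
}

// Writers bump the version instead of tracking every dependent key
async function onUserUpdated(userId) {
  await redis.incr(`user:${userId}:version`);
}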
Handling Cache Failures
Cache failures shouldn't break your services:
async function getUserWithFallback(userId) {
  try {
    const cached = await cache.get(`user:${userId}`);
    if (cached) return cached;
  } catch (error) {
    // Cache unavailable - proceed to database
    logger.warn('Cache unavailable', { error });
  }
  // Fallback to database
  return await db.getUser(userId);
}
Monitoring Distributed Caches
Track these metrics across services (a minimal instrumentation sketch follows the list):
- Hit rate per service: Identify which services benefit most
- Cross-service latency: Measure cache network overhead
- Invalidation lag: Time between change and cache update
- Memory usage: Prevent cache from consuming too much RAM
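As a starting point, each service can expose hit/miss counters for its own cache and let the dashboard compute hit rate as hits / (hits + misses). A sketch using the prom-client library; the metric and service names are illustrative:

const client = require('prom-client');

const cacheRequests = new client.Counter({
  name: 'cache_requests_total',
  help: 'Cache lookups by service and outcome',
  labelNames: ['service', 'result'], // result: hit | miss
});

async function getWithMetrics(key) {
  const cached = await cache.get(key);
  cacheRequests.inc({ service: 'order-service', result: cached ? 'hit' : 'miss' });
  return cached;
}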
Conclusion
Effective microservices caching requires choosing the right pattern for each use case. Start with service-local caching for owned data, add distributed caching for shared data, and use events for cross-service coordination.
The key principle: each service should own its caching strategy while coordinating with others through well-defined events, not direct cache manipulation.
Simplify microservices caching
Cachee.ai provides unified caching with automatic cross-service coordination.
Start Free Trial