The number-one objection engineering teams raise against L1 or sidecar caching is always the same: “What happens when Service A updates a value and Service B still has the old copy?” It is a legitimate concern, and it is the reason most teams never move beyond a centralized Redis cluster. But centralized caching means network hops on every read, and network hops mean latency floors you cannot break through. Cache coherence eliminates the objection — giving you the sub-microsecond speed of L1 with the consistency guarantees of a centralized store.
Why Teams Are Afraid of L1 Caching
The architecture is simple: instead of routing every cache read through a shared Redis instance over the network, you embed a cache directly in each service instance — in-process, zero network hops, sub-microsecond reads. This is L1 caching. It is the single fastest way to serve data. And almost nobody uses it in production for shared or mutable data.
The reason is not performance. L1 caching is orders of magnitude faster than any networked cache. The reason is consistency. If you have 20 instances of your API service, each with its own L1 cache, and a user updates their email address through one instance, the other 19 instances still have the old email. For the duration of the TTL, 95% of your fleet is serving stale data. Not “eventually consistent” stale — completely unaware that anything changed stale.
This is a real problem, and it cannot be solved by shortening TTLs, adding background refresh jobs, or hoping that users always hit the same instance. It requires a mechanism that propagates invalidations across instances in real time. That mechanism is cache coherence.
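To make the failure mode concrete, here is a minimal sketch of a plain TTL-based L1 cache shared by nothing. The class and key names are illustrative, not any real API; the point is that a write through one instance leaves every other instance's copy untouched until the TTL runs out.

```python
import time

class L1Cache:
    """A minimal in-process cache with TTL expiry (illustrative sketch)."""
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self.store[key]  # expired: drop the entry and report a miss
            return None
        return value

    def set(self, key, value):
        self.store[key] = (value, time.monotonic() + self.ttl)

# Two service instances, each with its own private L1 cache.
a, b = L1Cache(ttl_seconds=60), L1Cache(ttl_seconds=60)
a.set("user:jane:email", "jane@old.com")
b.set("user:jane:email", "jane@old.com")

# A write through instance A updates only A's copy...
a.set("user:jane:email", "jane@newdomain.com")
# ...so instance B keeps serving the stale value for up to 60 seconds.
```

Nothing in this design ever tells instance B that its copy is dead; that is the gap coherence fills.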
The Coherence Channel
When any Cachee instance writes, deletes, or invalidates a key, it broadcasts a lightweight invalidation message over the coherence channel. Every other Cachee instance in the cluster receives this message and invalidates its local copy of the key. The propagation completes in sub-millisecond time.
# Instance A writes a new value
SET user:jane:email "jane@newdomain.com"
# Within <1ms, every other instance invalidates its local copy
# Instance B, C, D, ... all drop user:jane:email from their L1
# Next read on any instance fetches fresh data
There is no pub/sub infrastructure to deploy. No Kafka cluster, no Redis Pub/Sub channel, no custom message broker. The coherence channel is built into the Cachee runtime and operates over a lightweight gossip protocol optimized for cache invalidation messages.
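The write-path behavior can be sketched as follows. This is not the Cachee API: the class and method names are invented for illustration, and the direct peer call stands in for delivery over the gossip channel.

```python
class CoherentCache:
    """Sketch: an L1 cache that broadcasts an invalidation to peers on write."""
    def __init__(self):
        self.store = {}
        self.peers = []

    def connect(self, *peers):
        self.peers = list(peers)

    def set(self, key, value):
        self.store[key] = value
        for peer in self.peers:
            peer.on_invalidate(key)  # in Cachee this rides the gossip channel

    def on_invalidate(self, key):
        self.store.pop(key, None)    # drop the local copy; next read refetches

    def get(self, key):
        return self.store.get(key)

a, b = CoherentCache(), CoherentCache()
a.connect(b)
b.connect(a)

b.store["user:jane:email"] = "jane@old.com"   # B cached it on an earlier read
a.set("user:jane:email", "jane@newdomain.com")
# B's stale copy was dropped; B's next read is a miss that refetches fresh data.
```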
The design has three critical properties:
- Sub-millisecond propagation: Invalidation messages are small (key name + sequence number) and propagated via a gossip protocol that reaches all nodes in O(log N) rounds. For a 100-instance cluster, that is 7 rounds, each measured in microseconds.
- Built-in deduplication: Every invalidation carries a monotonic sequence number. If an instance receives the same invalidation twice (due to gossip fan-out), it is silently dropped. No duplicate processing, no thundering herds.
- Partition tolerance with TTL fallback: If a network partition temporarily prevents an instance from receiving invalidation messages, its cached values still expire via TTL as a safety backstop. When the partition heals, automatic reconciliation catches up any missed invalidations. Coherence is the fast path. TTL is the safety net.
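The deduplication property can be sketched with a receiver that remembers which (origin, sequence) pairs it has already applied. A real implementation would likely keep a compact per-origin high-water mark rather than an unbounded set; this simplified version just shows why a duplicate gossip delivery is harmless.

```python
class InvalidationReceiver:
    """Sketch: deduplicate gossip fan-out using (origin, sequence) pairs."""
    def __init__(self):
        self.seen = set()   # (origin_id, seq) pairs already applied
        self.applied = []   # keys actually invalidated, in arrival order

    def receive(self, origin, seq, key):
        if (origin, seq) in self.seen:
            return False    # duplicate delivery: silently dropped
        self.seen.add((origin, seq))
        self.applied.append(key)
        return True

r = InvalidationReceiver()
first = r.receive("node-a", 1, "user:jane:email")   # applied
second = r.receive("node-a", 1, "user:jane:email")  # fan-out duplicate, dropped
```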
Cross-Key + Cross-Instance: The Full Picture
Coherence handles cross-instance propagation — making sure every copy of a given key stays synchronized. But what about cross-key propagation — making sure that when a base key is invalidated, every derived key that depends on it is also invalidated?
That is the job of the causal dependency graph. And coherence composes with it. Here is the full cascade:
- CDC detects a database change — a row in the `users` table is updated. CDC auto-invalidation fires on the connected Cachee instance and invalidates `user:jane:email`.
- Dependency graph cascades — the dependency graph on that instance identifies that `user:jane:dashboard` and `user:jane:profile-card` depend on `user:jane:email`. Both derived keys are invalidated locally.
- Coherence propagates — the coherence channel broadcasts all three invalidations (the base key + both derived keys) to every other Cachee instance in the cluster.
- Remote dependency graphs cascade — on each receiving instance, the dependency graph evaluates whether there are additional local-only derived keys that need invalidation. Any further cascades are handled locally.
A single row change in PostgreSQL results in correct, immediate invalidation of every base key and every derived key, on every instance, with zero application code involved. The entire path — database commit to full cluster consistency — completes in under 1ms.
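The four steps above can be sketched end to end. Everything here is illustrative (the class, the key names, the direct peer calls standing in for the coherence channel); the shape of the flow is what matters: a local transitive cascade, a broadcast of every dropped key, and a local-only re-cascade on each receiver.

```python
class Node:
    """Sketch of one instance: local store, dependency graph, peer broadcast."""
    def __init__(self):
        self.store = {}
        self.deps = {}   # base key -> set of derived keys that depend on it
        self.peers = []

    def cascade(self, key):
        """Drop `key` and, transitively, every key derived from it."""
        seen, stack = set(), [key]
        while stack:
            k = stack.pop()
            if k in seen:
                continue
            seen.add(k)
            self.store.pop(k, None)
            stack.extend(self.deps.get(k, ()))
        return seen

    def on_db_change(self, key):
        """CDC fires here: cascade locally, then broadcast each invalidation."""
        for k in self.cascade(key):
            for peer in self.peers:
                peer.on_remote_invalidate(k)

    def on_remote_invalidate(self, key):
        self.cascade(key)   # re-cascade for any local-only derived keys

a, b = Node(), Node()
a.peers, b.peers = [b], [a]
a.deps["user:jane:email"] = {"user:jane:dashboard", "user:jane:profile-card"}
b.store = {"user:jane:email": "jane@old.com", "user:jane:dashboard": "<html>"}

a.on_db_change("user:jane:email")   # the PostgreSQL row change lands here
```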
Why Not Just Use Redis Pub/Sub?
The most common DIY approach to L1 cache coherence is: write through to Redis, subscribe to a Redis Pub/Sub channel for invalidation events, and invalidate the local L1 when a message arrives. This works in demos. It fails in production for several reasons:
- Message loss: Redis Pub/Sub is fire-and-forget. If an instance is disconnected or busy when a message is published, the message is lost. There is no replay. The L1 cache on that instance becomes permanently stale until the TTL expires. Cachee’s gossip protocol has built-in retry and reconciliation.
- No dependency awareness: Redis Pub/Sub sends you a key name. It does not know about derived keys. You must build and maintain the dependency logic in every subscriber. Cachee’s dependency graph handles it automatically.
- Additional infrastructure: You need a Redis cluster (or an additional Redis instance) just for the pub/sub channel. This is an entire distributed system to manage, scale, and pay for — all for a coordination role that should be built into the cache itself.
- Latency overhead: Pub/Sub messages traverse the network to Redis, are fanned out to all subscribers, and traverse the network back. Round-trip times measured in milliseconds. Cachee’s gossip protocol operates peer-to-peer with no central broker.
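The message-loss failure mode is easy to reproduce in miniature. The sketch below models any fire-and-forget channel (Redis Pub/Sub behaves this way): a message published while a subscriber is disconnected is gone, and the subscriber's L1 stays stale after it reconnects. The class names are invented for illustration.

```python
class FireAndForgetBus:
    """Sketch: a fire-and-forget channel with no buffering and no replay."""
    def __init__(self):
        self.subscribers = []

    def publish(self, key):
        for sub in self.subscribers:
            if sub.connected:
                sub.invalidate(key)
            # else: the message is simply lost — no queue, no replay

class L1Subscriber:
    def __init__(self, bus):
        self.cache = {}
        self.connected = True
        bus.subscribers.append(self)

    def invalidate(self, key):
        self.cache.pop(key, None)

bus = FireAndForgetBus()
sub = L1Subscriber(bus)
sub.cache["user:jane:email"] = "jane@old.com"

sub.connected = False            # transient disconnect: failover, GC pause...
bus.publish("user:jane:email")   # invalidation published during the gap
sub.connected = True             # reconnected, but the event is gone forever
```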
Partition Tolerance and Auto-Reconciliation
Network partitions happen. A Cachee instance may be temporarily unable to reach the coherence channel due to a network split, a host migration, or a transient infrastructure issue. The system is designed for this.
During a partition, the isolated instance continues to serve cached data with TTL-based expiration as a safety backstop. Stale data is bounded by the TTL, just as it would be in a traditional cache. When the partition heals, the instance performs automatic reconciliation — comparing its local state against the cluster’s invalidation log and catching up any missed invalidations. The reconciliation is incremental (only missed events, not a full sync) and completes in milliseconds.
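The reconciliation step can be sketched with a sequence-numbered invalidation log and a per-instance high-water mark. This is a simplified model under assumed semantics (a single totally ordered log, names invented for illustration), but it shows why catch-up is incremental: the instance replays only the events past the last sequence number it applied.

```python
class InvalidationLog:
    """Sketch: an append-only log of invalidation events, indexed by sequence."""
    def __init__(self):
        self.events = []            # list index doubles as the sequence number

    def append(self, key):
        self.events.append(key)

    def since(self, seq):
        """Only the events after `seq` — incremental, never a full sync."""
        return list(enumerate(self.events))[seq + 1:]

class Instance:
    def __init__(self, log):
        self.log = log
        self.cache = {}
        self.applied_seq = -1       # high-water mark of applied invalidations

    def reconcile(self):
        """Called when a partition heals: catch up on missed invalidations."""
        for seq, key in self.log.since(self.applied_seq):
            self.cache.pop(key, None)
            self.applied_seq = seq

log = InvalidationLog()
node = Instance(log)
node.cache["user:jane:email"] = "jane@old.com"

log.append("user:jane:email")   # published while node was partitioned away
node.reconcile()                # partition heals: replay only the missed event
```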
The coherence protocol does not require strong consensus. It uses eventual consistency with bounded staleness: invalidations arrive in sub-millisecond time under normal conditions, and within a TTL-bounded window under partitioned conditions. For cache coherence, this is the optimal tradeoff: you get the speed of L1, the consistency of a centralized store under normal operation, and graceful degradation when the network misbehaves.
What This Unlocks
With coherence, L1 caching is no longer a single-instance optimization. It is a fleet-wide architecture. Every instance reads from its local cache with sub-microsecond latency. Every write propagates to every instance in sub-millisecond time. The consistency model is the same as a centralized cache, but the performance model is that of in-process memory access.
This is the missing piece that has kept L1 caching in the “interesting but impractical” category for a decade. Coherence makes it practical. Combined with CDC and dependency graphs, it makes it production-grade.
Related Reading
- Causal Dependency Graphs
- CDC Auto-Invalidation
- Why Every Distributed System Needs an L1 Cache Layer
- How Cachee Works