Engineering

Causal Dependency Graphs: The Cache Invalidation Primitive Nobody Has Built

“There are only two hard things in computer science: cache invalidation and naming things.” Everyone quotes it. Nobody has actually solved the first one. The reason is not that invalidation is inherently complex — it is that caches have never had a way to express why a value exists. When a cache has no model of causality, every invalidation strategy degrades to either guessing (TTL), over-killing (pattern flush), or scattering dependency knowledge across every service that writes data. We built the missing primitive. It is a directed acyclic graph of key dependencies, evaluated on every invalidation event, and it changes what is safe to cache.

The Problem: Derived Keys Have No Parents

Consider a common production scenario. You have a key called user:123:dashboard. The value behind that key is not a single database row. It is an aggregation — assembled from users.name, users.prefs, orders.recent, and billing.plan. Four tables, possibly four different services, combined into one expensive computation that you want to cache because it takes 200ms to rebuild.

Now orders.recent changes. A new order comes in. The dashboard is stale. But the cache does not know this. It has no concept that user:123:dashboard is derived from user:123:orders. It stores keys and values. The relationship between them is invisible.
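The scenario above can be sketched in a few lines. This is a toy, assuming a plain dict as the cache and a dict as the database; the table and key names mirror the example, and the 200ms aggregation is just a dict build. The point is that the cached view has no link back to its sources:

```python
# Toy illustration of the scenario above; names mirror the text, not any real API.
db = {
    "users.name": "Ada",
    "users.prefs": {"theme": "dark"},
    "orders.recent": ["order-1"],
    "billing.plan": "enterprise",
}
cache = {}

def get_dashboard(user_id="123"):
    key = f"user:{user_id}:dashboard"
    if key in cache:
        return cache[key]            # hit: the expensive rebuild is skipped
    view = {                         # the 200ms aggregation over four sources
        "name": db["users.name"],
        "prefs": db["users.prefs"],
        "orders": list(db["orders.recent"]),  # snapshot at build time
        "plan": db["billing.plan"],
    }
    cache[key] = view
    return view

get_dashboard()                         # warm the cache
db["orders.recent"].append("order-2")   # a new order comes in
stale = get_dashboard()                 # the cache still serves the old view
```

After the write, `stale["orders"]` is still `["order-1"]`: the cache has no way to know the derived key should be dropped.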

This is the fundamental gap. And it produces a predictable set of bad outcomes that every engineering team recognizes:

- Stale reads: after a write, readers keep seeing the old derived value until a TTL happens to expire.
- Over-invalidation: teams flush broad key patterns to stay safe, destroying hit rates.
- Scattered knowledge: every service that writes data must know every derived key that depends on it.
- Under-caching: the most expensive computations are never cached at all, because serving them stale is too risky.

That last point is worth sitting with. The entire purpose of a cache is to eliminate redundant computation. But the computations that cost the most — dashboards, reports, aggregated views, composite API responses — are precisely the ones that teams refuse to cache because invalidation is too dangerous. The cache is protecting you from cheap lookups and abandoning you on expensive ones.

Why Every Existing Strategy Fails

The standard toolkit for cache invalidation was designed for a world of simple, single-source keys. None of it accounts for derived values.

TTL is a timer, not an invalidation strategy. After a write, you can keep serving stale data for up to the full TTL. For a 300-second TTL, that is five minutes of wrong answers after every write. Shorter TTLs improve freshness but destroy hit rates. You are not solving invalidation; you are choosing how much staleness you can tolerate.

Pattern invalidation (KEYS user:123:*) does not know which keys are actually affected by a given change. If billing.plan changes, the affected derived keys do not share a prefix with billing. Pattern matching is a lexicographic operation applied to a causal problem. It either over-invalidates (nuclear option: flush everything matching a broad pattern) or under-invalidates (misses derived keys with different prefixes).

Application-code invalidation requires that every writer know every consumer. In a monolith, this is tedious but tractable. In a microservices architecture, it is a distributed coordination problem. The orders service writes to the database. The dashboard service caches a derived value. The orders service must somehow know about the dashboard cache key, and every other derived key that depends on order data, across every other service. This breaks at the second microservice.
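The coordination problem can be made concrete with a sketch. Assume a write handler in the orders service; all service and key names here are hypothetical, and `cache_delete` stands in for a network DEL. Correctness hinges on this one function enumerating keys owned by other teams:

```python
# Sketch of write-side invalidation across services; everything here is hypothetical.
invalidated = []

def cache_delete(key):
    invalidated.append(key)  # stand-in for a network DEL against the cache

def on_order_written(user_id):
    cache_delete(f"user:{user_id}:orders")          # its own key: easy
    # ...but the orders service must also know about every derived key
    # that OTHER services build on top of order data, forever:
    cache_delete(f"user:{user_id}:dashboard")       # owned by the dashboard service
    cache_delete(f"user:{user_id}:weekly-summary")  # owned by the analytics service

on_order_written("123")
```

Every new derived key added by any team silently breaks this list, which is exactly why the approach fails at the second microservice.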

Pub/sub notification (Redis Pub/Sub, Kafka, etc.) moves the problem but does not solve it. Now every consumer must subscribe to the right channels and know what to invalidate when a message arrives. The dependency knowledge is still scattered — it just moved from write-side code to subscription handlers.

CDC-based auto-invalidation, which Cachee already provides, solves the table-to-key mapping. When a database row changes, the corresponding cache key is invalidated automatically. But CDC operates at the source level. It does not handle key-to-key dependencies: derived keys that are built from other cached keys rather than directly from a database row.

The Primitive: A Directed Acyclic Graph of Key Dependencies

The solution is to give the cache a dependency model. When you cache a derived value, you declare what it depends on:

SET user:123:dashboard <value> DEPENDS_ON user:123:profile user:123:orders billing:plan:enterprise

The cache records these relationships in a directed acyclic graph (DAG). user:123:dashboard has three incoming edges: one from user:123:profile, one from user:123:orders, and one from billing:plan:enterprise. When any of those source keys is invalidated — by CDC, by explicit DEL, by TTL expiry, by eviction — the cache walks the graph and invalidates every downstream key.

billing:plan:enterprise ──▶ user:123:dashboard
user:123:orders ──────────▶ user:123:dashboard
user:123:profile ─────────▶ user:123:dashboard

Invalidate any source → all dependents are automatically invalidated

The graph is transitive. If user:123:dashboard depends on user:123:orders, and user:123:weekly-summary depends on user:123:dashboard, then invalidating user:123:orders cascades through the entire chain. The summary is invalidated even though it has no direct relationship to the orders key. The cache traces the causal path automatically.
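The mechanics described above fit in a short sketch: a reverse index from each source key to its dependents, and an invalidation walk over it. This is a minimal in-memory toy, not Cachee's implementation; the class and method names are invented for illustration:

```python
from collections import defaultdict

class DepGraphCache:
    """Toy cache with DEPENDS_ON-style cascade invalidation (illustrative only)."""

    def __init__(self):
        self.store = {}
        self.dependents = defaultdict(set)  # source key -> derived keys

    def set(self, key, value, depends_on=()):
        self.store[key] = value
        for source in depends_on:           # record the incoming edges
            self.dependents[source].add(key)

    def invalidate(self, key):
        """Drop key, then walk the graph to drop every downstream key."""
        removed, stack, seen = [], [key], set()
        while stack:
            k = stack.pop()
            if k in seen:
                continue
            seen.add(k)
            if k in self.store:
                del self.store[k]
                removed.append(k)
            stack.extend(self.dependents.pop(k, ()))  # cascade transitively
        return removed

cache = DepGraphCache()
cache.set("user:123:profile", "...")
cache.set("user:123:orders", "...")
cache.set("user:123:dashboard", "...",
          depends_on=["user:123:profile", "user:123:orders"])
cache.set("user:123:weekly-summary", "...",
          depends_on=["user:123:dashboard"])

removed = cache.invalidate("user:123:orders")
```

Invalidating the orders key removes the dashboard and, transitively, the weekly summary, while the untouched profile key survives. The `seen` set keeps the walk safe even if the declared edges accidentally form a cycle.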

This is the same model that build systems have used for decades. make tracks file dependencies and rebuilds only what changed. Bazel constructs a DAG of build targets and propagates invalidation transitively. The insight is not new. What is new is applying it to cache invalidation — and composing it with CDC, cross-instance coherence, and trigger systems to produce something that no existing cache offers.

Key insight: Most teams do not cache their most expensive computations because invalidation is too hard. The dependency graph makes it safe to cache everything. The cache itself enforces correctness — not application code, not TTLs, not hope.

How It Composes With Existing Cachee Features

The dependency graph is not a standalone feature. Its power comes from composition with the primitives Cachee already provides.

CDC + Dependency Graph

CDC auto-invalidation detects database row changes and invalidates the corresponding base cache key. The dependency graph picks up where CDC stops. When CDC invalidates user:123:orders, the graph automatically propagates that invalidation to user:123:dashboard, user:123:weekly-summary, and every other derived key in the cascade. Zero application code. The database change flows through CDC into the graph and out to every affected key.
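The CDC-to-graph handoff can be sketched as two steps: map the row change to its base key (what CDC auto-invalidation already does), then cascade through the dependents. The event shape and table-to-key mapping below are assumptions for illustration, not Cachee's actual CDC format:

```python
from collections import defaultdict

# Hypothetical cache state and edges; not Cachee's internal representation.
store = {"user:123:orders": "...", "user:123:dashboard": "...",
         "user:123:weekly-summary": "...", "user:456:orders": "..."}
dependents = defaultdict(set)
dependents["user:123:orders"].add("user:123:dashboard")
dependents["user:123:dashboard"].add("user:123:weekly-summary")

def base_key(event):
    # Step 1: table-to-key mapping -- what CDC auto-invalidation provides.
    return f"user:{event['row']['user_id']}:{event['table']}"

def on_cdc_event(event):
    # Step 2: the graph picks up where CDC stops and cascades key-to-key.
    stack = [base_key(event)]
    while stack:
        key = stack.pop()
        store.pop(key, None)
        stack.extend(dependents.pop(key, ()))

on_cdc_event({"table": "orders", "row": {"user_id": "123"}})
```

One simulated row change removes the base key and both derived keys for user 123, while user 456's keys are untouched; no application code sits between the database write and the cascade.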

Coherence + Dependency Graph

Cache coherence handles cross-instance propagation — ensuring that when a key is invalidated on one Cachee instance, every other instance in the cluster invalidates it too. The dependency graph handles cross-key propagation. Together: when a database row changes, CDC fires on one instance, coherence propagates the base invalidation to all instances, and the dependency graph on each instance cascades it to all derived keys. A single row change in PostgreSQL results in correct, immediate invalidation of every derived key on every instance. No application code involved.

Triggers + Dependency Graph

Cache triggers fire custom logic on invalidation events — logging, webhooks, pre-warming. With the dependency graph, ON_INVALIDATE triggers fire for every key in the cascade. A single database write can trigger a webhook for the dashboard, a log entry for the summary, and a pre-warm call for the analytics rollup. The trigger system gives you visibility and control over the entire cascade.
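The per-key firing over a cascade can be sketched as a callback table consulted during the invalidation walk. The trigger registration shape and handler names below are invented for illustration:

```python
from collections import defaultdict

# Hypothetical edges and trigger registrations; not Cachee's trigger API.
dependents = defaultdict(set)
dependents["user:123:orders"] = {"user:123:dashboard"}
dependents["user:123:dashboard"] = {"user:123:rollup"}

fired = []
triggers = {  # key -> ON_INVALIDATE-style callback
    "user:123:dashboard": lambda k: fired.append(("webhook", k)),
    "user:123:rollup":    lambda k: fired.append(("prewarm", k)),
}

def invalidate(key):
    stack = [key]
    while stack:
        k = stack.pop()
        if k in triggers:
            triggers[k](k)  # fires for EVERY key in the cascade, not just the root
        stack.extend(dependents.pop(k, ()))

invalidate("user:123:orders")
```

A single invalidation at the root fires the dashboard webhook and the rollup pre-warm call, giving per-key visibility across the whole cascade.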

Why Nobody Else Has Built This

This is not an oversight. It is a structural gap in how caches have been designed.

Redis is a data structure server. It stores keys and values. It has no concept of key relationships because it was built as a single source of truth, not as a derived-value cache. The dependency problem does not exist when every key is independently authoritative.

Memcached is a flat key-value store with TTL-based expiration. There is no eviction callback, no invalidation propagation, and no metadata model that could support dependency tracking.

Application-level caches (Caffeine, Guava, lru_cache) operate within a single process and have no dependency model. They are eviction engines, not invalidation engines. They decide what to remove when memory is full, not what becomes stale when upstream data changes.

Hazelcast and other distributed caches provide near-cache and replication but treat every key as independent. They solve the distribution problem but not the derivation problem.

Every L1 and sidecar cache in production has this problem. None have solved it. The combination of DAG-based invalidation with CDC source detection and cross-instance coherence propagation is novel, and it is the reason the L1 cache category has not broken into enterprise production at scale. Without a dependency model, teams cannot trust the cache with their most important data. With one, they can.

What This Changes

The dependency graph does not make caching faster. It makes caching safe. The performance characteristics of an L1 cache are already extraordinary — sub-microsecond reads, zero network hops, AI-driven prefetching. The bottleneck has never been speed. It has been trust. Teams do not cache derived values because they cannot guarantee correctness. The dependency graph is the correctness guarantee.

With it, the calculus changes:

- Dashboards, reports, and aggregated views can be cached, because any source change invalidates them immediately.
- Composite API responses stay correct with zero write-side bookkeeping.
- Derived keys built from other cached keys become as safe to cache as keys backed by a single row.

These are the workloads that cost the most and benefit the most from caching. They have been off-limits for the entire history of key-value caches. They are not off-limits anymore.

The line between demo and production: L1 caching without dependency graphs is a demo. L1 caching with dependency graphs is production infrastructure. This is the difference between “engineers experiment with it” and “it runs the business.”
