
CDC Auto-Invalidation: The End of Stale Cache Data

TTL is a lie. If you set a 300-second TTL, any write can leave you serving stale data for up to 300 seconds afterward. Shorten the TTL and your hit rate collapses. Lengthen it and your users see ghosts of data that changed minutes ago. This is not a tradeoff you should have to make. Change Data Capture eliminates it entirely, invalidating cache keys within a millisecond of the underlying database row changing, with zero application code.

The TTL Tradeoff Is a Trap

Every caching tutorial starts the same way: set a TTL, move on. It sounds reasonable until you consider what TTL actually means. A TTL of 60 seconds does not mean your data is “mostly fresh.” It means that after every single write, there is a window of up to a full 60 seconds during which your cache is confidently serving the wrong answer.

Teams try to optimize this. They shorten TTLs to 5 seconds. Hit rates drop from 95% to 40%. The cache is barely doing anything. They lengthen TTLs to 600 seconds. Hit rates climb, but users see stale prices, stale inventory counts, stale profile data. Support tickets come in. Engineers add manual invalidation code, then more manual invalidation code, then give up and stop caching that endpoint entirely.

The fundamental problem is that TTL is time-based when the actual invalidation signal is event-based. Data does not become stale after N seconds. It becomes stale when someone writes a new value to the database. The cache should react to the write, not to a countdown timer.
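To make the staleness window concrete, here is a toy model in plain Python (illustrative only, not Cachee code) comparing a countdown timer against write-driven invalidation:

```python
# Minimal model of the staleness window (illustrative, not Cachee code).
TTL = 60          # seconds
fill_time = 0     # entry cached at t=0
write_time = 5    # database row changes at t=5

# Time-based expiry: the entry survives until fill_time + TTL,
# so every read in between serves the old value.
stale_window_ttl = (fill_time + TTL) - write_time   # 55 seconds of wrong answers

# Event-based invalidation: the write itself evicts the entry,
# so the stale window collapses to the propagation delay.
propagation_delay = 0.001                            # ~1 ms, per the claim above
stale_window_cdc = propagation_delay

print(stale_window_ttl)   # 55
print(stale_window_cdc)   # 0.001
```

The earlier the write lands in the TTL window, the longer the cache lies; event-driven invalidation makes the window independent of timing entirely.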

CDC: The Event Your Cache Has Been Missing

Change Data Capture is a database feature that streams every row-level change as a structured event. PostgreSQL exposes it through the Write-Ahead Log (WAL). MySQL provides it through the binary log. DynamoDB offers DynamoDB Streams. Every major database has a change stream. Almost nobody connects it to their cache.

Cachee connects to your database’s change stream and maps row changes to cache key invalidations. The configuration is a single line:

CDC MAP users.email -> user:{email}

That line tells Cachee: when the email column in the users table changes, invalidate the cache key constructed by interpolating the email value. When jane@example.com updates her profile, the key user:jane@example.com is invalidated — not in 60 seconds, not when a background job runs, but within the same millisecond that PostgreSQL commits the transaction.
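The rule syntax is Cachee's; its semantics can be sketched in a few lines of plain Python (hypothetical names and event shape, not Cachee's implementation):

```python
# Hypothetical sketch of what "CDC MAP users.email -> user:{email}" means.
# A rule binds a (table, column) pair to a key template; a change event
# on that column fills the template from the row's values.
RULES = {("users", "email"): "user:{email}"}

def keys_to_invalidate(event):
    """Map a row-change event to the cache keys it makes stale."""
    keys = []
    for (table, column), template in RULES.items():
        if event["table"] == table and column in event["changed"]:
            keys.append(template.format(**event["row"]))
    return keys

event = {
    "table": "users",
    "changed": {"email", "name"},
    "row": {"email": "jane@example.com", "name": "Jane"},
}
print(keys_to_invalidate(event))  # ['user:jane@example.com']
```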

Multiple mappings compose naturally:

CDC MAP users.id       -> user:{id}
CDC MAP users.id       -> user:{id}:profile
CDC MAP orders.user_id -> user:{user_id}:orders
CDC MAP products.sku   -> product:{sku}
CDC MAP products.sku   -> product:{sku}:price

One table change can invalidate multiple keys. Multiple table changes can invalidate the same key. The mapping is declarative — you state what depends on what, and the CDC connector handles the rest.
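Because rules are independent of each other, the fan-out falls out of a simple list scan. A sketch with the mappings above (again hypothetical code, not Cachee's):

```python
# Declarative rules: (table, column, key template). Several rules may share
# a (table, column) pair, so one change fans out to multiple keys.
RULES = [
    ("users",    "id",      "user:{id}"),
    ("users",    "id",      "user:{id}:profile"),
    ("orders",   "user_id", "user:{user_id}:orders"),
    ("products", "sku",     "product:{sku}"),
    ("products", "sku",     "product:{sku}:price"),
]

def keys_for(table, changed_columns, row):
    return [tpl.format(**row)
            for t, col, tpl in RULES
            if t == table and col in changed_columns]

# A new row in orders (all columns "changed") invalidates the user's order list.
print(keys_for("orders", {"id", "user_id", "total"},
               {"id": 555, "user_id": 123, "total": 42}))
# ['user:123:orders']
```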

Zero code, real numbers: Teams that switch from TTL-based invalidation to CDC typically see cache hit rates climb from 85% to 99%+. Not because they are caching more aggressively — but because they no longer need conservative TTLs that throw away perfectly valid cache entries.

Why Hit Rates Jump From 85% to 99%

The math is straightforward. With TTL-based caching, you set short TTLs on frequently-changing data to limit staleness. Short TTLs mean frequent expirations. Frequent expirations mean cache misses. Misses mean database hits, latency spikes, and wasted compute.

With CDC, the TTL becomes irrelevant for correctness. You can set it to infinity — or more practically, to a very long value as a safety backstop — because the actual invalidation is driven by writes. A cache entry for user:jane@example.com stays valid for hours, days, or weeks if no one updates that row. The moment someone does, it invalidates instantly. You get the hit rate of aggressive caching with the correctness of no caching at all.

This is not theoretical. In production workloads where reads outnumber writes 100:1, the difference between a 60-second TTL and a CDC-driven approach is the difference between an 85% hit rate and a 99.3% hit rate. That gap translates directly into fewer database connections, lower p99 latency, and significantly reduced infrastructure cost.
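The arithmetic behind that gap can be sketched with a toy single-key model (the 15-expirations figure is an assumption chosen to match the numbers above, not a measurement):

```python
# Toy model: reads and writes on one hot key (illustrative arithmetic only).
reads_per_write = 100     # reads outnumber writes 100:1

# CDC: the only miss is the first read after each write.
cdc_hit_rate = (reads_per_write - 1) / reads_per_write        # 0.99

# TTL: the entry also expires on a timer. If the TTL fires, say, 15 times
# between writes (e.g. a 60s TTL against a ~15-minute write interval),
# each expiry throws away a still-valid entry and costs an extra miss.
ttl_expirations_per_write = 15
ttl_hit_rate = (reads_per_write - ttl_expirations_per_write) / reads_per_write  # 0.85

print(f"TTL: {ttl_hit_rate:.0%}  CDC: {cdc_hit_rate:.0%}")
```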

How It Works Under the Hood

Cachee’s CDC connector runs as a logical replication subscriber for PostgreSQL (or binlog reader for MySQL, or stream consumer for DynamoDB). It does not poll. It does not scan. It receives events as the database produces them, in commit order, with exactly-once delivery guarantees.

For each event, the connector:

  1. Extracts the changed columns from the WAL record.
  2. Matches against the CDC MAP rules to determine which cache keys are affected.
  3. Emits invalidation commands to the local Cachee instance.
  4. Propagates the invalidation through the coherence channel to every other Cachee instance in the cluster, when cross-instance coherence is enabled.

The entire path — from database commit to cache invalidation on every instance — completes in under 1ms. There is no queue backlog, no polling interval, no eventual consistency window measured in seconds.
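The four steps amount to a consumer loop. Here is a hypothetical sketch of its structure, with plain dicts standing in for cache instances and an iterable standing in for the replication stream (the real connector speaks the database's replication protocol):

```python
# Hypothetical sketch of the connector's event loop, not Cachee's source.
RULES = [("users", "email", "user:{email}")]

def affected_keys(event):
    # Step 2: match the changed columns against the declarative rules.
    return [tpl.format(**event["row"])
            for table, col, tpl in RULES
            if event["table"] == table and col in event["changed"]]

def run_connector(stream, local_cache, peers):
    for event in stream:                       # Step 1: events arrive in commit order
        for key in affected_keys(event):
            local_cache.pop(key, None)         # Step 3: invalidate locally
            for peer in peers:                 # Step 4: propagate to the cluster
                peer.pop(key, None)

# Demo: one committed UPDATE empties the key on both instances.
local = {"user:jane@example.com": {"name": "Jane"}}
peer  = {"user:jane@example.com": {"name": "Jane"}}
stream = [{"table": "users", "changed": {"email"},
           "row": {"email": "jane@example.com"}}]
run_connector(stream, local, [peer])
print(local, peer)  # {} {}
```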

Composition: CDC + Dependency Graphs + Coherence

CDC invalidates base keys — the ones that map directly to database rows. But many of your most valuable cache entries are derived keys: dashboards assembled from multiple tables, API responses combining data from several services, aggregated metrics built from raw event streams.

This is where CDC composes with Cachee’s causal dependency graph. When CDC invalidates a base key like user:123:orders, the dependency graph automatically cascades that invalidation to every derived key that declared a dependency on it — user:123:dashboard, user:123:weekly-summary, the aggregated team-level view. One database write, zero application code, complete correctness across the entire key hierarchy.
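The cascade itself is a graph traversal. A minimal sketch, assuming the graph is stored as base key to dependents (the key names mirror the examples above; the structure is illustrative, not Cachee's):

```python
from collections import deque

# Hypothetical dependency graph, inverted for invalidation:
# base key -> derived keys that were built from it.
DEPENDENTS = {
    "user:123:orders":    ["user:123:dashboard", "user:123:weekly-summary"],
    "user:123:dashboard": ["team:7:overview"],
}

def cascade(base_key):
    """Return every key made stale by invalidating base_key (BFS, cycle-safe)."""
    stale, queue = {base_key}, deque([base_key])
    while queue:
        for dep in DEPENDENTS.get(queue.popleft(), []):
            if dep not in stale:
                stale.add(dep)
                queue.append(dep)
    return stale

print(sorted(cascade("user:123:orders")))
# ['team:7:overview', 'user:123:dashboard', 'user:123:orders', 'user:123:weekly-summary']
```

The visited set makes the traversal safe even if derived keys ever form a cycle, and breadth-first order keeps the cascade one hop at a time.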

And because CDC events propagate through the coherence channel, the invalidation reaches every instance. A row change in PostgreSQL results in correct, immediate invalidation of every base key and every derived key, on every Cachee node in the cluster. The entire pipeline — database commit to full cluster invalidation — is automatic, sub-millisecond, and requires exactly zero lines of invalidation code in your application.

The end of invalidation code: You should not have hand-written cache invalidation logic in your application. CDC MAP rules replace it. Dependency graphs cascade it. Coherence propagates it. The cache manages its own correctness.

What This Means for Your Architecture

CDC auto-invalidation changes what is safe to cache. Endpoints that were too dangerous to cache — pricing data, inventory counts, user preferences, billing state — become trivially cacheable. The staleness risk drops from “up to the full TTL after a write” to “under 1ms after the write commits.”

It also changes how teams think about caching as a discipline. Instead of engineers writing invalidation logic and hoping they covered every write path, the cache itself subscribes to the source of truth and reacts in real time. The database is the authority. The cache is the subscriber. The relationship is explicit, auditable, and automatic.

TTL is not an invalidation strategy. It is a confession that your cache does not know when data changes. CDC gives it that knowledge.
