Every cache in production today operates on a handshake agreement: you set a TTL, and the cache promises to evict the key when the timer expires. That is the entire contract. There is no guarantee the data was fresh at any point during the TTL window. There is no audit trail. There is no alerting when data goes stale. There is no compliance report you can hand to an auditor. For regulated industries — finance, healthcare, government — this is not acceptable. We built what is missing: per-key freshness contracts that the cache engine enforces, monitors, and reports on.
TTL Is Not a Freshness Guarantee
TTL tells the cache when to evict a key. It says nothing about when the key was last refreshed. A key with a 60-second TTL might contain data that is 59 seconds stale, and no system will notice, alert, or log it. The value sits in the cache, silently wrong, until the timer runs out and someone re-fetches.
This model works for content that tolerates staleness — blog posts, marketing pages, static assets. It does not work for data with real-time correctness requirements: financial prices, patient records, inventory counts, trading positions, account balances. For these keys, “stale for up to 59 seconds” is not a rounding error. It is a regulatory violation, a financial loss, or a clinical risk.
The problem is structural. TTL is a timer, not a contract. It does not guarantee freshness. It does not measure freshness. It does not report freshness. It is a best-effort expiration mechanism bolted onto a key-value store. Every cache system — Redis, Memcached, Hazelcast, Caffeine — uses this same model. None of them offer anything better.
Regulated Industries Cannot Trust Best-Effort
SOC 2 auditors are now asking about cached data. The question is straightforward: “How do you ensure that cached data reflects the current state of the source system?” When the answer is “we set a TTL and hope for the best,” the conversation does not go well.
FINRA requires audit trails on financial data, including data served from caches. If a trading platform serves a stale price from cache, the platform needs to prove it was fresh at the time of the trade, or explain why it was not. Today, no cache produces this evidence.
HIPAA mandates that patient data be current. A clinical system that caches patient vitals must guarantee those values reflect the latest readings. A 30-second TTL on patient data is not a HIPAA-compliant freshness strategy — it is a liability.
E-commerce platforms that cache inventory counts oversell products every time the cache is stale. Every oversell triggers a refund, a customer service ticket, and a loss of customer trust. The cost of stale inventory data is directly measurable in dollars per minute.
None of these industries can accept “best-effort.” They need enforceable guarantees, violation tracking, and compliance evidence. Caching has never offered any of these.
Cache Contracts: Per-Key SLAs with Teeth
A cache contract is a freshness SLA attached to a specific key at write time. The syntax is a single modifier on the SET command:
SET user:123:balance <value> SLA FRESH_WITHIN 100ms REFRESH_FROM GET /api/users/123/balance
This tells the cache engine three things: (1) this key must never be older than 100ms, (2) when the freshness window is about to expire, refresh the value by calling the specified source, and (3) if the refresh fails, fire an alert and log the violation.
The engine does not wait for the key to expire. It schedules a proactive refresh before the 100ms window closes. The value is always fresh when a client reads it — not because of lucky timing, but because the engine enforced the contract. If the source is unreachable, the engine fires an alert before any client sees stale data. Every refresh, every violation, and every recovery is logged with a timestamp, the key name, the contract terms, and the source response code.
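The enforcement loop can be sketched in a few lines of Python. This is an illustrative model, not the engine's implementation: the Contract and ContractEngine names are hypothetical, and a refresh-on-read check stands in for the engine's background scheduler, which refreshes before the window closes rather than at read time.

```python
import time
from dataclasses import dataclass, field

@dataclass
class Contract:
    key: str
    fresh_within: float       # seconds; 0.1 models FRESH_WITHIN 100ms
    refresh_from: object      # zero-arg callable standing in for the source fetch

@dataclass
class ContractEngine:
    store: dict = field(default_factory=dict)       # key -> (value, written_at)
    contracts: dict = field(default_factory=dict)   # key -> Contract
    audit_log: list = field(default_factory=list)   # (event, key) tuples

    def set(self, key, value, contract=None):
        self.store[key] = (value, time.monotonic())
        if contract is not None:
            self.contracts[key] = contract

    def get(self, key):
        value, written_at = self.store[key]
        contract = self.contracts.get(key)
        if contract and time.monotonic() - written_at > contract.fresh_within:
            try:
                # Freshness window lapsed: re-fetch from the source before serving.
                value = contract.refresh_from()
                self.store[key] = (value, time.monotonic())
                self.audit_log.append(("refresh", key))
            except Exception:
                # Source unreachable: record the violation instead of failing silently.
                self.audit_log.append(("violation", key))
        return value

engine = ContractEngine()
engine.set("user:123:balance", 100,
           Contract("user:123:balance", fresh_within=0.01, refresh_from=lambda: 250))
time.sleep(0.02)                      # let the freshness window lapse
fresh_value = engine.get("user:123:balance")
```

The point of the sketch is the audit trail: every refresh and every violation leaves a logged record, which is what TTL-only caches cannot produce.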
Composition with CDC and Dependency Graphs
Cache contracts do not replace CDC auto-invalidation or the causal dependency graph. They compose with them.
CDC provides event-driven freshness. When a database row changes, CDC invalidates the corresponding cache key immediately. But CDC depends on the database change event actually firing. If the source is not a database (an API, an external feed, a computed value), CDC cannot help. If the CDC pipeline has latency, there is a window of staleness.
Cache contracts provide time-bound freshness. Regardless of whether a CDC event fires, the engine guarantees the value will be refreshed within the contracted window. If CDC fires first, the contract is satisfied early. If CDC is delayed or unavailable, the contract catches it. Together, they provide both reactive and proactive freshness — zero gaps.
The dependency graph propagates contracts to derived keys. When a contracted key participates in a dependency graph, the freshness guarantee cascades. If user:123:dashboard has a 100ms FRESH_WITHIN contract and depends on user:123:balance, the engine ensures the balance is fresh before refreshing the dashboard. The contract follows the causal chain, not just the individual key.
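The cascade can be sketched as a post-order walk over the dependency graph, so that every upstream key is refreshed before anything derived from it. The deps map and refresh_order helper below are illustrative, not the engine's API:

```python
# deps maps each derived key to the keys it is computed from (illustrative).
deps = {
    "user:123:dashboard": ["user:123:balance", "user:123:positions"],
    "user:123:balance": [],
    "user:123:positions": [],
}

def refresh_order(key, deps):
    """Post-order walk: every dependency is refreshed before the key itself."""
    order, seen = [], set()
    def visit(k):
        if k in seen:
            return
        seen.add(k)
        for dep in deps.get(k, []):
            visit(dep)
        order.append(k)
    visit(key)
    return order

plan = refresh_order("user:123:dashboard", deps)
```

On a DAG this is just a topological ordering rooted at the contracted key, which is why the guarantee follows the causal chain rather than stopping at the individual key.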
The Compliance Pitch
Imagine walking into a SOC 2 audit and handing the auditor a report that says: “Over the past 90 days, our cache maintained 99.97% contract compliance across 14,200 contracted keys. There were 3 violations, all on the stock:* prefix, caused by a source API timeout on March 14. Mean freshness across all contracted keys was 34ms. P99 staleness was 82ms. Every refresh event is logged with timestamps and source response codes. Here is the exportable report.”
That is a different conversation from “we set TTLs and we think the data is usually fresh.”
The compliance dashboard exposes contract compliance percentage, violations per key prefix, mean freshness, p99 staleness, and refresh success rate. Metrics are available via the dashboard, the API, and as exportable CSV/JSON. Violation logs include the key name, the contract terms, the refresh timestamp, the source response code, and the resolution time. This is the evidence that SOC 2, FINRA, and HIPAA auditors require.
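As a sketch of how those headline figures fall out of a violation log, here is a minimal report builder over hypothetical per-read records. The record shape and the compliance_report function are assumptions for illustration, not the dashboard's API:

```python
from collections import Counter

# One record per contracted read: observed staleness and whether the contract
# was met. The record shape and values are illustrative.
records = [
    {"key": "stock:AAPL", "staleness_ms": 34, "violation": False},
    {"key": "stock:MSFT", "staleness_ms": 82, "violation": False},
    {"key": "stock:TSLA", "staleness_ms": 140, "violation": True},
    {"key": "user:1:balance", "staleness_ms": 12, "violation": False},
]

def compliance_report(records):
    n = len(records)
    violations = [r for r in records if r["violation"]]
    staleness = sorted(r["staleness_ms"] for r in records)
    p99_idx = min(n - 1, round(0.99 * (n - 1)))          # nearest-rank p99
    return {
        "compliance_pct": round(100 * (n - len(violations)) / n, 2),
        "mean_freshness_ms": sum(staleness) / n,
        "p99_staleness_ms": staleness[p99_idx],
        "violations_by_prefix": Counter(v["key"].split(":", 1)[0]
                                        for v in violations),
    }

report = compliance_report(records)
```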
Why Nobody Else Has This
Redis offers TTL and nothing else. There is no per-key SLA, no proactive refresh, no violation alerting, and no compliance metrics. TTL is fire-and-forget. When the timer expires, the key is gone. What happened between SET and expiry is invisible.
Memcached has the same model. Keys expire by TTL. There is no freshness tracking, no refresh mechanism, and no audit trail. Memcached was designed as a simple volatile store. Compliance was never in scope.
Caffeine and Guava offer refreshAfterWrite, which is the closest analog. But it is local-only (single JVM), has no distributed propagation, no violation alerting, no compliance metrics, and no audit log. It is a local refresh heuristic, not an enforceable SLA.
CDNs (CloudFront, Fastly, Akamai) use TTL and stale-while-revalidate headers. These are HTTP-level hints, not enforceable contracts. There is no per-key SLA, no violation tracking, and no compliance reporting. A CDN cannot tell you whether a specific key was fresh at a specific time.
No caching system — distributed, local, or CDN — offers per-key freshness contracts with proactive refresh, violation alerting, and compliance metrics. This is a category of functionality that has not existed until now.
The Numbers That Matter
Cache performance discussions get philosophical fast. Here are the actual measured numbers from production deployments running on documented hardware, so you can compare against your own infrastructure instead of trusting marketing copy.
- L0 hot path GET: 28.9 nanoseconds on Apple M4 Max, single-threaded against pre-warmed in-memory cache. This is the floor — there's no faster way to read a key.
- L1 CacheeLFU GET: ~89 nanoseconds on AWS Graviton4 (c8g.metal-48xl). Sharded DashMap with admission filtering.
- Sustained throughput: 32 million ops/sec single-threaded on M4 Max, 7.41 million ops/sec at 16 workers on Graviton4 c8g.16xlarge.
- L2 fallback: Sub-millisecond hits against ElastiCache Redis 7.4 over same-AZ network when L1 misses cascade through.
The compounding effect matters more than any single number. A 28-nanosecond L0 hit means your application spends almost zero time on cache lookups in the hot path, leaving the CPU free for the actual business logic that generates revenue.
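A back-of-envelope sketch of that compounding, assuming a hypothetical request that performs 50 cache lookups and a ~0.5 ms round trip for a networked cache (both are assumptions for illustration, not measured figures):

```python
l0_hit_ns = 28.9              # measured L0 GET latency quoted above
lookups_per_request = 50      # hypothetical request doing 50 cache reads

cache_time_us = l0_hit_ns * lookups_per_request / 1000    # ns -> us

# The same 50 lookups against a networked cache at ~0.5 ms per round trip:
network_rtt_us = 500          # assumed same-AZ figure, not a measurement
network_time_us = network_rtt_us * lookups_per_request
```

Under these assumptions the in-process path spends about 1.4 microseconds per request on lookups, versus tens of milliseconds over the network.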
Observability And What To Measure
You can't tune what you can't measure. The four metrics that matter for any production cache deployment, in order of importance:
- Hit rate, broken down by key prefix or namespace. A global hit rate of 92% sounds great until you discover that one critical namespace is sitting at 40% and dragging your tail latency. Per-prefix hit rates expose which workloads are getting cache value and which aren't.
- Latency percentiles, not averages. p50, p95, p99, and p99.9 for both cache hits and cache misses. The cache miss latency is your fallback path performance — when the cache fails, this is what your users actually experience.
- Memory pressure and eviction rate. If your eviction rate is climbing while your hit rate stays flat, you're under-provisioned. If both are climbing, your access pattern shifted and you need to retune TTLs or rethink what you're caching.
- Stale-read rate. The percentage of cache hits that returned a value the application then discovered was stale. This is the canary for your invalidation strategy. If it's above 1%, your invalidation logic has a bug.
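The first of these breakdowns is a few lines over raw (key, hit) samples. The sample data and hit_rate_by_prefix helper below are illustrative:

```python
from collections import defaultdict

# (key, hit) samples as a cache would record them; values are illustrative.
samples = [
    ("user:1:profile", True), ("user:2:profile", True),
    ("user:3:profile", True), ("user:4:profile", False),
    ("inventory:sku1", False), ("inventory:sku2", False),
    ("inventory:sku3", True),
]

def hit_rate_by_prefix(samples):
    hits, total = defaultdict(int), defaultdict(int)
    for key, hit in samples:
        prefix = key.split(":", 1)[0]
        total[prefix] += 1
        hits[prefix] += hit
    return {p: hits[p] / total[p] for p in total}

rates = hit_rate_by_prefix(samples)
# The global rate (4/7 here) hides that "inventory" sits far below "user".
```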
Cachee exposes all four out of the box via Prometheus metrics on the standard scrape endpoint, plus a real-time SSE stream for dashboards that need sub-second visibility. The right time to wire these into your monitoring stack is before the migration, not after the first incident.
What This Actually Costs
Concrete pricing math beats hypotheticals. A typical SaaS workload with 1 billion cache operations per month, average 800-byte values, and a 5 GB hot working set currently runs on an AWS ElastiCache cache.r7g.xlarge primary plus a read replica: roughly $480 per month for the two nodes, plus cross-AZ data transfer charges that quietly add another $50-150 per month depending on access patterns.
Migrating the hot path to an in-process L0/L1 cache and keeping ElastiCache as a cold L2 fallback drops the dedicated cache spend to $120-180 per month. For workloads where the hot working set fits inside the application's existing memory budget, you can eliminate the dedicated cache tier entirely. The cache becomes a library you link into your binary instead of a separate service to operate.
Compounded over twelve months, that's $3,600 to $4,320 per year on a single small workload. Multiply across a fleet of services and the savings start showing up in finance team conversations. The bigger savings usually come from eliminating cross-AZ data transfer charges, which Redis-as-a-service architectures incur on every read that crosses an availability zone.
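The per-line-item arithmetic, using the article's own estimates:

```python
# Line items from the workload above. All dollar figures are the article's
# estimates, not AWS list prices.
current_nodes = 480              # two cache.r7g.xlarge nodes, $/month
migrated_spend = (120, 180)      # remaining L2 fallback, $/month range

monthly_savings = (current_nodes - migrated_spend[1],
                   current_nodes - migrated_spend[0])
annual_savings = tuple(12 * m for m in monthly_savings)

# Eliminating cross-AZ transfer ($50-150/month above) stacks on top of this.
transfer_savings = (12 * 50, 12 * 150)
```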
The AWS-Specific Math
Most cache cost discussions ignore AWS-specific line items that turn out to dominate the bill. Three to track:
- Cross-AZ data transfer. ElastiCache replicas across availability zones incur $0.01/GB for inter-AZ traffic in each direction. A workload doing 100 GB/day of cache reads across AZs runs an extra $30-60/month in transfer fees alone, invisible until you scrutinize the AWS bill line by line.
- Reserved instance lock-in. ElastiCache reserved capacity gets you a 30-50% discount but locks you into a specific node type for one or three years. If your workload grows or your access pattern changes, you're paying for capacity you can't use efficiently.
- Backup and snapshot storage. ElastiCache automatic backups are billed separately at S3 rates. For high-frequency snapshot configurations on large nodes, this can add 10-20% to the monthly bill that nobody attributes to "caching."
Running Cachee in-process inside your application binary eliminates all three line items at once. There's no separate cache tier to provision, no cross-AZ traffic for L0 reads, no reserved capacity to forecast, and no backup storage because the cache is reconstructible from the source of truth.
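The transfer-fee arithmetic for the cross-AZ example above, using the figures as stated:

```python
gb_per_day = 100        # cache reads crossing an AZ boundary
days = 30
rate_per_gb = 0.01      # $/GB, billed in each direction

one_way = gb_per_day * days * rate_per_gb   # replica traffic, one direction
both_ways = 2 * one_way                     # same traffic billed both ways
```

That is $30-60 per month for a single modest workload, which is where the range in the bullet above comes from.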
Make Your Cache Auditable.
Per-key freshness contracts. Proactive refresh. Compliance metrics. Enforceable SLAs for every key that matters.