AI inference at scale requires caching outputs. But cached AI outputs have zero provenance. Which model? Which version? Which training data? Cryptographic computation fingerprints change that.
AI inference at scale requires caching outputs. A sepsis risk model that runs in 200ms cannot run on every API call when the same patient vitals produce the same result. A credit scoring model that costs $0.03 per inference cannot recompute for every microservice that needs the score. A content moderation model that processes 10M posts per hour cannot evaluate each one from scratch. So you cache the results.
But cached AI outputs have zero provenance. Which model produced this result? Which version? Which training data? Which prompt? Which temperature? When the EU AI Act requires traceability for high-risk AI decisions, cached inference results are a compliance gap.
When a patient sues over an AI clinical recommendation, the cached result has no proof of origin. When a credit decision is challenged for bias, the cached score can't prove what rules produced it. When the EU AI Act mandates Article 12 logging for high-risk systems, cached inference is a blind spot. The cache layer has become the weakest link in AI governance.
Models update. Training data changes. Parameters shift. But cached results from the previous version continue to serve until TTL expires. A credit model retrained to remove a biased feature still serves cached scores computed with that feature. A medical model updated with new clinical trial data still returns old predictions from cache. Stale AI outputs are invisible — indistinguishable from fresh inference.
Current AI caching: store the response as a Redis key-value pair. No provenance. No integrity. No audit trail. If the model updates, stale cached results continue to serve. If the training data changes, old results persist until TTL expires. The cache has no concept of model lineage — it's just bytes in, bytes out.
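The failure mode described above can be reproduced in a few lines. This is a minimal sketch, not Redis itself: `NaiveInferenceCache` is a hypothetical dict-backed stand-in for a key-value cache whose key is derived from the prompt alone, and the `now` parameter exists only to make the TTL behaviour easy to follow.

```python
import hashlib
import time


class NaiveInferenceCache:
    """Hypothetical dict-backed stand-in for a prompt-keyed key-value cache."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, expires_at)

    def get_or_compute(self, prompt, model, now=None):
        now = time.time() if now is None else now
        # The key depends on the prompt only: no model version, no parameters.
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        hit = self.store.get(key)
        if hit is not None and hit[1] > now:
            return hit[0]  # served from cache, whatever model produced it
        value = model(prompt)
        self.store[key] = (value, now + self.ttl)
        return value


cache = NaiveInferenceCache(ttl_seconds=3600)
model_v1 = lambda prompt: "score=0.71 (v1)"
model_v2 = lambda prompt: "score=0.64 (v2)"  # retrained model

first = cache.get_or_compute("applicant 42", model_v1, now=0)
# The model has been replaced, but the v1 result is still served until the TTL expires.
stale = cache.get_or_compute("applicant 42", model_v2, now=10)
fresh = cache.get_or_compute("applicant 42", model_v2, now=4000)
```

Nothing in the stored value or the key records that `stale` came from the earlier model, which is the lineage gap the paragraph above describes.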
Semantic caches are worse. Embeddings of personal data often can't be fully "deleted" without rebuilding the index. A user requests data deletion under GDPR, but their query embedding is still entangled in the similarity index. You can delete the key, but the semantic ghost remains.
An attacker crafts a prompt that produces a malicious response. The response gets cached. Every subsequent user with a similar query gets the poisoned response. In a plain Redis cache: no way to detect the poisoning. No way to trace the origin. No way to prove what happened. No way to identify which users received the poisoned result. The cache amplifies the attack surface from one user to every user.
Cachee changes the AI caching model. Instead of storing raw inference results with no lineage, Cachee stores signed, fingerprinted computation results that cryptographically bind every output to the exact model, parameters, and data that produced it.
The computation fingerprint is SHA3-256(model_version || prompt_hash || temperature || top_p || system_prompt_hash || training_data_hash), so every inference result is bound to its exact inputs. Change ANY parameter and the fingerprint changes. Results stored under the old fingerprint no longer match and are not served, which makes stale inference architecturally impossible for every input the fingerprint covers.
Every cached inference is signed by three independent post-quantum signature families (ML-DSA-65, FALCON-512, SLH-DSA). Modification is detectable. Authenticity is provable. Cache poisoning is identifiable. No trust assumption required.
Every state change is recorded: when each result was produced, verified, and served. Who read it. When the model updated. When the result was superseded. Tamper-evident. Delete an entry? Detectable. Modify an entry? Detectable. Reorder entries? Detectable.
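One common way to get these tamper-evidence properties is a hash chain, in which each entry commits to the hash of the one before it. The sketch below is an illustration under that assumption, not Cachee's actual log format: `append_entry`, `verify_chain`, the all-zero genesis value, and the JSON serialization are all hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first entry's prev_hash


def _entry_hash(prev_hash, event):
    body = json.dumps(event, sort_keys=True)
    return hashlib.sha3_256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log, event):
    """Append an event, chaining it to the hash of the previous entry."""
    prev = log[-1]["entry_hash"] if log else GENESIS
    log.append({"prev_hash": prev, "event": event,
                "entry_hash": _entry_hash(prev, event)})


def verify_chain(log, expected_head=None):
    """Recompute every link; optionally compare against an anchored head hash."""
    prev = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev:
            return False  # an entry was deleted or reordered
        if entry["entry_hash"] != _entry_hash(prev, entry["event"]):
            return False  # an entry was modified
        prev = entry["entry_hash"]
    return expected_head is None or prev == expected_head


log = []
append_entry(log, {"action": "produced", "key": "k1"})
append_entry(log, {"action": "served", "key": "k1", "reader": "svc-a"})
append_entry(log, {"action": "superseded", "key": "k1"})
head = log[-1]["entry_hash"]

intact = verify_chain(log, expected_head=head)
after_delete = verify_chain(log[:1] + log[2:])
after_reorder = verify_chain([log[1], log[0], log[2]])
```

Note that removing entries from the tail leaves a shorter but internally consistent chain, so detecting truncation requires comparing against a head hash stored somewhere else, which is what `expected_head` stands in for here.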
Every cached result moves through a lifecycle: Active → Superseded (model updated) → Revoked (bias detected). Three caching patterns are supported: exact match, semantic similarity, and result-only. Verification modes: AlwaysVerify for medical/financial AI, Probabilistic for general inference. The cache knows when its contents are stale and acts on it.
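The lifecycle and verification modes can be sketched as a small state machine. Everything below is illustrative: the Python names, the rule that only Active results are servable, the direct Active → Revoked transition, and the 10% sampling rate for Probabilistic mode are assumptions, not documented Cachee behaviour.

```python
import random
from enum import Enum


class State(Enum):
    ACTIVE = "Active"
    SUPERSEDED = "Superseded"  # model updated
    REVOKED = "Revoked"        # e.g. bias detected


class VerifyMode(Enum):
    ALWAYS_VERIFY = "AlwaysVerify"
    PROBABILISTIC = "Probabilistic"


# Assumption: transitions only move forward, and Active may be revoked directly.
ALLOWED = {
    State.ACTIVE: {State.SUPERSEDED, State.REVOKED},
    State.SUPERSEDED: {State.REVOKED},
    State.REVOKED: set(),
}


def transition(current, target):
    """Return the new state, or raise if the lifecycle does not allow the move."""
    if target not in ALLOWED[current]:
        raise ValueError(f"{current.value} -> {target.value} is not allowed")
    return target


def servable(state):
    # Assumption: only Active results are returned to callers.
    return state is State.ACTIVE


def should_verify(mode, sample_rate=0.1, rng=random.random):
    """AlwaysVerify checks every read; Probabilistic checks a sampled fraction."""
    if mode is VerifyMode.ALWAYS_VERIFY:
        return True
    return rng() < sample_rate


state = transition(State.ACTIVE, State.SUPERSEDED)  # model was updated
can_serve = servable(state)                         # superseded result is held back
```

Passing `rng` explicitly keeps the sampling decision deterministic in tests while defaulting to `random.random` in normal use.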
AI provenance is no longer a metadata tag. It's a mathematical property of the storage layer.
Run it yourself: brew install cachee && cachee-gold-demo
Reconstruct exactly what the model produced at any point in time. When a contested decision is challenged — "what did the AI recommend on March 14th?" — the answer is one command: AUDITLOG. The computation fingerprint proves which model, which version, which parameters. Tamper-evident. Independently verifiable.
AI agents with verifiable, tamper-evident memory. Every observation, every decision, every action cached with a computation fingerprint and three PQ signatures. Agent behavior becomes auditable. Memory becomes provable. Trust becomes mathematical.
Articles 12-15 of the EU AI Act require logging, transparency, human oversight, and accuracy for high-risk AI systems. Cachee satisfies these at the cache layer: hash-chained audit trails (Art. 12), computation fingerprints binding results to model versions (Art. 13), lifecycle state machine with manual override (Art. 14), fingerprint invalidation on model change (Art. 15).
Full chain from training data → model → inference → cache → verification. Bias audit capability: prove which training data version produced contested results. Encrypted inference: cache results without exposing the input data. The cache becomes the governance layer, not just the performance layer.
| Regulation | Requirement | Cachee Implementation |
|---|---|---|
| EU AI Act — Article 12 | Logging & traceability | Hash-chained audit log, Merkle anchoring, AUDITVERIFY |
| EU AI Act — Article 13 | Transparency | Computation fingerprints binding results to model version + parameters |
| EU AI Act — Article 14 | Human oversight | Lifecycle state machine with manual Revoke + Override states |
| EU AI Act — Article 15 | Accuracy & robustness | Fingerprint invalidation on model/data change, cache poisoning detection |
| NIST AI RMF | Govern, Map, Measure, Manage | Governance lineage from training data to verification, provenance scoring |
| ISO 42001 | AI management system | Tamper-evident audit trail, lifecycle controls, deterministic replay |
| FDA AI/ML Guidance | Predetermined change control | Superseded state on model update, version-bound fingerprints |
| HIPAA (AI + ePHI) | Technical safeguards (45 CFR 164.312) | Encrypted inference caching, access controls, audit logging, integrity |
One architecture. Many manifestations.
Deploy Cachee in your VPC. Inference provenance built into the cache layer.
Every result signed. Every access audited. Every model version tracked.