Which Model Produced This Output? Cachee Knows.
A patient receives a medical recommendation from an AI system. Three months later, the recommendation is challenged. The hospital needs to answer one question: which model version produced this output? Not which model is running now. Which model was running then, on that day, with those parameters, producing that specific result. The answer determines liability, regulatory compliance, and whether the recommendation can be defended or must be retracted.
Traditional caches cannot answer this question. Redis stores a key and a value. Memcached stores a key and a value. ElastiCache stores a key and a value. None of them store provenance -- the record of what produced the value, when it was produced, under what conditions, and whether it has been modified since. The value exists in the cache with no history, no attribution, and no verification. When the question is "which model produced this output," the cache is silent.
This silence is the problem Cachee solves. Every cached value in Cachee carries a computation fingerprint that binds it to its exact production conditions: model name, model version, prompt hash, temperature, top_p, system prompt hash, and hardware class. The fingerprint is not metadata stored alongside the value. It is the cache key. The fingerprint is the identity of the computation, and the value is the result of that identity. Change any production condition and the identity changes. The old result is never returned under the new identity.
The Provenance Problem in AI Caching
The provenance problem has three dimensions: attribution (what produced this result), integrity (has this result been modified), and temporality (when was this result produced and is it still valid). Traditional caches address none of these dimensions. They store values as opaque byte sequences with optional time-to-live expiration. There is no concept of "what produced this value" because the cache does not care. It stores what you give it and returns what you ask for.
This design worked when caches stored database query results. If a cached database result was wrong, you could re-query the database and get the correct result. The cache was a performance optimization over a deterministic source of truth. AI inference is different. The model that produced a cached result three months ago may no longer exist. The provider may have updated the model weights. The system prompt may have changed. The temperature setting may have been adjusted. There is no deterministic source of truth to re-query. The cached result is the only record of what the model produced under those specific conditions.
When that cached result is challenged -- by a regulator, a patient, a borrower, a job applicant, or a plaintiff -- you need to answer specific questions. Not "we used GPT-4" but "we used gpt-4-0613-checkpoint-20260301 at temperature 0.0 with system prompt version 47 on A100-80GB hardware." Each of these details matters because each affects the output. A different temperature produces different text. A different system prompt produces different behavior. A different model checkpoint produces different knowledge and different biases. Without provenance, you cannot distinguish between any of these conditions.
Anatomy of a Computation Fingerprint
The Cachee computation fingerprint is a SHA3-256 hash computed over seven fields. Each field captures one dimension of the computation's identity. The choice of SHA3-256 is deliberate: it provides 128-bit collision resistance, meaning an attacker would need approximately 2^128 operations to find two different sets of production conditions that produce the same fingerprint. This is computationally infeasible with any known or foreseeable technology. The fingerprint is, for all practical purposes, a unique identifier for a specific computation under specific conditions.
Field 1: model_name
The canonical name of the model. This distinguishes between model families: "gpt-4" is not "gpt-3.5-turbo" is not "claude-3-opus." The model name is the coarsest-grained provenance field. It answers the question "which model family was used?" but not "which specific version of that model family?"
Field 2: model_version
The specific version identifier for the model. This is the field that answers "which exact model produced this output." For OpenAI models, this is the checkpoint identifier (e.g., "gpt-4-0613"). For self-hosted models, this is the model weights hash or deployment version. For fine-tuned models, this includes the fine-tuning job identifier. The model_version field is what makes provenance actionable: when a model is updated, every cached result from the previous version has a different fingerprint, and the cache automatically treats requests under the new version as misses.
Field 3: prompt_hash
The SHA3-256 hash of the complete prompt text, including any template wrapping, few-shot examples, retrieved context (for RAG applications), and dynamic content. The prompt is hashed rather than stored directly for two reasons: privacy (the prompt may contain PII or sensitive information) and efficiency (prompts can be thousands of tokens, while the hash is always 32 bytes). The hash preserves the binding between the prompt and the result without storing the prompt itself.
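What matters in practice is that the hash is taken over the fully rendered prompt, exactly as the model receives it. A minimal sketch, assuming a simple string template (the template and helper below are illustrative, not part of the Cachee API):
# Sketch: prompt_hash over the fully rendered prompt (illustrative helper,
# not part of the Cachee API)
import hashlib

def compute_prompt_hash(template, few_shot_examples, retrieved_context, user_input):
    # Render first, then hash: template wrapping, few-shot examples,
    # and retrieved context all change the hash if they change
    rendered = template.format(
        examples="\n".join(few_shot_examples),
        context=retrieved_context,
        input=user_input,
    )
    return hashlib.sha3_256(rendered.encode()).hexdigest()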
Field 4: temperature
The sampling temperature used for inference. Temperature directly affects output randomness. At temperature 0.0, the model produces effectively deterministic output (greedy selection of the highest-probability token at each step, up to the hardware-level variation captured by Field 7). At temperature 1.0, the model samples from the full probability distribution. A result produced at temperature 0.0 is fundamentally different from a result produced at temperature 0.7, even with the same prompt and model. The fingerprint captures this difference.
Field 5: top_p
The nucleus sampling parameter. Like temperature, top_p affects output randomness by limiting the sampling pool to the smallest set of highest-probability tokens whose cumulative probability reaches the threshold. A top_p of 0.1 restricts sampling to a narrow set of high-probability tokens. A top_p of 1.0 considers all tokens. The fingerprint captures top_p because it affects output independently of temperature.
Field 6: system_prompt_hash
The SHA3-256 hash of the system prompt. The system prompt defines the model's behavior, constraints, persona, and safety guardrails. Changing the system prompt can completely change the output for the same user prompt. This field is hashed separately from the prompt_hash because the system prompt typically changes at a different cadence than user prompts. When your team updates the system prompt to add safety guardrails or change the model's persona, every cached result from the old system prompt is automatically invalidated. This prevents the dangerous scenario where old, ungoverned results are served after governance updates.
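To see why this works, consider a one-sentence governance edit. A minimal sketch (the prompt text is invented for illustration):
# Sketch: a one-sentence guardrail edit changes system_prompt_hash,
# which changes every fingerprint built from it (prompt text is invented)
import hashlib

v46 = "You are a clinical assistant. Summarize patient risk factors."
v47 = v46 + " Never make a medical diagnosis."

assert hashlib.sha3_256(v46.encode()).hexdigest() != \
       hashlib.sha3_256(v47.encode()).hexdigest()
# Every cached result produced under v46 now misses under v47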
Field 7: hardware_class
The class of hardware that ran the inference. Different GPUs can produce different floating-point results due to differences in instruction sets, precision handling, and parallelism strategies. An A100 may produce slightly different output than an H100 for the same model and prompt. The hardware_class field ensures that results are attributed to the correct computational environment. For reproducibility audits, this field is critical: it proves not just what model ran, but where it ran.
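How the field is populated is deployment-specific. A minimal sketch for a CUDA deployment, assuming PyTorch is available (the normalization map below is illustrative):
# Sketch: deriving hardware_class at inference time (the normalization
# map is illustrative; extend it to the device classes you actually run)
import torch

def hardware_class() -> str:
    name = torch.cuda.get_device_name(0).lower()  # e.g. "nvidia a100-sxm4-80gb"
    if "a100" in name and "80gb" in name:
        return "a100-80gb"
    if "h100" in name:
        return "h100-80gb"
    return name.replace(" ", "-")  # fall back to the raw device name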
# Fingerprint computation — every field that affects output is captured
fingerprint = SHA3-256(
model_name = "gpt-4",
model_version = "gpt-4-0613-checkpoint-20260301",
prompt_hash = SHA3-256("Assess cardiovascular risk for patient with..."),
temperature = "0.0",
top_p = "1.0",
system_prompt_hash = SHA3-256("You are a board-certified cardiologist..."),
hardware_class = "a100-80gb"
)
# Result: a1b2c3d4e5f6... (32 bytes, 64 hex chars)
# Change ANY field → completely different fingerprint → cache miss
Fingerprint = Identity + Audit + Integrity
The computation fingerprint serves three functions simultaneously. It is the cache key (determining whether a cached result exists for these exact conditions). It is the audit identifier (linking the result to its provenance). And it is the integrity anchor (any change to production conditions changes the fingerprint, automatically invalidating stale results). One hash, three guarantees, backed by SHA3-256 collision resistance.
Use Case: Regulatory Inquiry on a Medical Recommendation
A healthcare AI system uses an LLM to generate preliminary risk assessments for patients. The results are cached because many patients present with similar risk profiles, and recomputing the assessment for each patient costs $0.08 per API call. With caching, the cost drops to $0.000005 per cache hit -- a 16,000x reduction. The system serves 500,000 assessments per month, with a 55 percent cache hit rate, saving approximately $22,000 per month in API costs.
Six months after deployment, a regulator receives a complaint about a risk assessment. The patient argues that the assessment was biased because it did not account for a known limitation in the model's training data. The regulator sends an inquiry: which model produced this assessment, what version of the model was running, what was the system prompt, and has the assessment been modified since it was produced?
Without provenance, the healthcare organization faces a forensic reconstruction project. They search deployment logs to determine which model version was running on the date of the assessment. They search configuration management to determine which system prompt was active. They search application logs to determine the request parameters. Each of these searches may produce incomplete or contradictory information because logs are mutable and configuration history is often overwritten. The reconstruction takes two weeks and costs $75,000 in engineering and legal time. The result is a best-effort reconstruction that the regulator may or may not accept.
With Cachee, retrieving the evidence takes minutes. The compliance team pulls the computation fingerprint from the cached result. Its fields record the exact model version (gpt-4-0613-checkpoint-20260301), the system prompt hash (which maps to system prompt version 42 in the version control system), the temperature (0.0, deterministic), and the hardware class (a100-80gb). The AUDITLOG command produces the complete lifecycle: when the result was first cached, every time it was served, and whether it was ever verified or modified. The triple PQ signatures on the cached entry prove it has not been tampered with since creation. The regulator receives independently verifiable evidence within 48 hours instead of a reconstructed narrative after two weeks, and the total cost is four hours of compliance team time.
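Sketched in code, the inquiry response looks like this (client.auditlog and the event fields are assumed bindings for the AUDITLOG command, not confirmed API; get_entry_metadata and verify_signatures follow the integration examples later in this piece):
# Sketch of the inquiry response (client.auditlog and the event fields
# are assumed bindings for the AUDITLOG command)
entry = client.get_entry_metadata(fingerprint)
print(entry.fields["model_version"])       # gpt-4-0613-checkpoint-20260301
print(entry.fields["system_prompt_hash"])  # resolves to prompt v42 in VCS
print(entry.created_at, entry.state)
for event in client.auditlog(fingerprint):
    # creation, every serve, every verification, every state transition
    print(event.timestamp, event.kind, event.actor)
assert entry.verify_signatures()           # triple PQ signatures intact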
Use Case: Bias Investigation Across Model Versions
A financial services company uses an LLM for credit risk narratives. The model is updated quarterly. After an update, the company notices that denial narratives for certain demographic groups have changed in tone. An internal bias investigation needs to answer: which results were produced by the old model version, which by the new, and how do the outputs differ?
With traditional caching, this investigation is impossible. The cache stores the latest value for each key. When the model was updated, new results overwrote old results. The old results are gone. The investigation cannot compare old and new outputs because the old outputs no longer exist.
With Cachee, the model version is part of the fingerprint. Results from the old model version have different fingerprints than results from the new version. Both exist in the cache simultaneously (the old results are in SUPERSEDED state, not deleted). The investigation team queries all cached results whose fingerprint fields record the old model version, then all results recording the new version, and compares the outputs directly. The supersession chain in the data lineage verification system records exactly when each old result was superseded by a new one and which new fingerprint replaced it.
# Query cached results by model version — both old and new coexist
# All results from old model version (now in SUPERSEDED state)
old_results = client.query_by_field(
field="model_version",
value="gpt-4-0613-checkpoint-20260101",
include_superseded=True
)
# All results from new model version (ACTIVE state)
new_results = client.query_by_field(
field="model_version",
value="gpt-4-0613-checkpoint-20260401"
)
# Supersession chains link old → new for same prompts
for old in old_results:
chain = client.supersession_chain(old.fingerprint)
# chain.successor = the new result that replaced this one
# chain.transition_time = when the replacement happened
# chain.transition_authority = what triggered the replacement
Use Case: Model Rollback After Quality Regression
A company deploys a new model version and discovers that output quality has regressed for a specific category of queries. They need to roll back to the previous model version and identify which cached results from the new version need to be recomputed.
With Cachee, the rollback is surgical. Because the model version is part of the fingerprint, you can identify every cached result produced by the new model version. The REVOKE command marks all results with the new model version fingerprint as revoked, with a transition authority of "quality_regression_rollback" and a transition proof that links to the quality report. Revoked results are never served. Subsequent requests automatically miss the cache and trigger fresh inference against the rolled-back model version, which has a different fingerprint and therefore creates new cache entries.
The audit trail records every step: which results were revoked, when, why, and by whom. Which new results replaced them. Which model version produced the replacements. This is the kind of verifiable computation record that regulators and internal auditors require.
# Surgical model rollback — revoke all results from problematic version
# Identify all results from the regressed model version
affected = client.query_by_field(
field="model_version",
value="gpt-4-0613-checkpoint-20260401"
)
# Revoke with audit trail
for result in affected:
client.revoke(
fingerprint=result.fingerprint,
authority="quality_team",
reason="quality_regression_rollback",
proof_reference="QA-2026-0514-regression-report"
)
# Subsequent requests with the old model version will:
# 1. Miss cache (different fingerprint due to different model_version)
# 2. Run fresh inference against rolled-back model
# 3. Create new cache entries with old model version fingerprint
# 4. Audit trail records the complete rollback + recomputation
Why SHA3-256 for Fingerprints
The choice of SHA3-256 for computation fingerprints is not arbitrary. SHA3-256 provides 128 bits of collision resistance, meaning an attacker needs approximately 2^128 operations to find two different inputs that produce the same hash. To put this in perspective: if ten billion computers each computed one billion hashes per second, the search would still take on the order of 10^12 years. The age of the universe is approximately 1.38 x 10^10 years; the collision search would outlast it roughly eighty times over. Collision is not a practical concern.
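The arithmetic is easy to check. A back-of-envelope sketch, assuming ten billion machines at one billion hashes per second each:
# Back-of-envelope collision math (machine count and hash rate are
# assumptions for illustration)
ops = 2**128               # ~3.4e38 evaluations to expect a collision
rate = 10**10 * 10**9      # ten billion machines at 1e9 hashes/second
seconds = ops / rate       # ~3.4e19 seconds
years = seconds / 3.156e7  # ~1.1e12 years, roughly 80x the age of the universe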
SHA3-256 was chosen over SHA-256 because SHA3 is based on the Keccak sponge construction, which is structurally different from SHA-2's Merkle-Damgård construction. This structural independence means that a cryptographic advance that weakens SHA-2 does not necessarily weaken SHA3. For a system designed for long-term audit trail integrity, this structural diversity is a meaningful security advantage. The fingerprints created today need to be verifiable years from now, and SHA3's independent construction provides confidence that they will be.
The fingerprint is also computed deterministically: the same fields in the same order always produce the same hash. This means that any party with the fingerprint fields can independently recompute the fingerprint and verify that it matches. This is what makes the audit trail independently verifiable rather than self-reported. The compliance audit infrastructure relies on this determinism: a regulator can take the claimed production conditions, recompute the fingerprint, and verify that it matches the fingerprint stored in the cache.
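A minimal sketch of that verification step (the canonical field serialization below is illustrative; Cachee defines the actual encoding):
# Sketch: independent recomputation (the field serialization here is
# illustrative; the canonical encoding is defined by Cachee)
import hashlib

def recompute_fingerprint(fields: dict) -> str:
    # Deterministic: same fields in the same order, same hash
    canonical = "\n".join(f"{k}={fields[k]}" for k in sorted(fields))
    return hashlib.sha3_256(canonical.encode()).hexdigest()

claimed = {
    "model_name": "gpt-4",
    "model_version": "gpt-4-0613-checkpoint-20260301",
    "prompt_hash": "<disclosed in the audit response>",
    "temperature": "0.0",
    "top_p": "1.0",
    "system_prompt_hash": "<disclosed in the audit response>",
    "hardware_class": "a100-80gb",
}
recomputed = recompute_fingerprint(claimed)
# Compare against the fingerprint stored with the cache entry:
# a match is evidence; a mismatch means the claimed conditions are wrong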
Fingerprint Change Detection: Automatic Invalidation
The most powerful property of the computation fingerprint is what happens when a field changes. If any field in the fingerprint changes -- model version, system prompt, temperature, anything -- the fingerprint changes. When the fingerprint changes, the cache treats the request as a miss. The old result is never served under the new conditions. This is not a cache invalidation policy that you configure. It is a mathematical property of the hash function. Different inputs produce different outputs. Always.
This automatic invalidation solves the most dangerous problem in AI caching: serving stale results after conditions change. Consider a scenario where your team updates the system prompt to add a safety guardrail that prevents the model from making medical diagnoses. With a traditional cache, the old results -- which include medical diagnoses -- are still cached under the same keys. Users continue to receive medical diagnoses from cache even though the live model would refuse to produce them. With Cachee, the system prompt change updates the system_prompt_hash field. Every fingerprint changes. Every cache lookup misses. Fresh inference runs against the new system prompt. The old, ungoverned results are never served.
The old results are not deleted. They transition to SUPERSEDED state with a supersession chain that links them to the new results. This means you can always go back and audit what the system produced before the change. The audit trail records the transition: when the old results were superseded, what triggered the supersession (system prompt update), and what new results replaced them.
Integration: Python and Rust Examples
Adding provenance to your AI inference pipeline requires constructing the fingerprint fields at inference time and using Cachee's attested read/write operations. The following examples show the complete integration pattern in both Python and Rust.
# Python — complete provenance integration
import cachee
import hashlib
client = cachee.Client("cachee://inference-cluster:6380")
class ProvenanceConfig:
"""Capture all fields that affect inference output"""
def __init__(self, model_name, model_version, system_prompt,
temperature=0.0, top_p=1.0, hardware_class="a100-80gb"):
self.model_name = model_name
self.model_version = model_version
self.system_prompt = system_prompt
self.system_prompt_hash = hashlib.sha3_256(
system_prompt.encode()
).hexdigest()
self.temperature = temperature
self.top_p = top_p
self.hardware_class = hardware_class
def fingerprint_fields(self, prompt):
return {
"model_name": self.model_name,
"model_version": self.model_version,
"prompt_hash": hashlib.sha3_256(prompt.encode()).hexdigest(),
"temperature": str(self.temperature),
"top_p": str(self.top_p),
"system_prompt_hash": self.system_prompt_hash,
"hardware_class": self.hardware_class,
}
# Production configuration — change any field and cache automatically misses
config = ProvenanceConfig(
model_name="gpt-4",
model_version="gpt-4-0613-checkpoint-20260501",
system_prompt="You are a financial risk analyst. Never provide investment advice.",
temperature=0.0,
hardware_class="a100-80gb"
)
def assess_risk(prompt):
fields = config.fingerprint_fields(prompt)
# get_verified checks triple PQ signatures before returning
result = client.get_verified(fields)
if result:
return result # Provenance guaranteed by fingerprint
# Cache miss — run fresh inference
    output = call_llm(prompt, config)  # call_llm: your model/provider inference call, not shown here
# set_attested signs with ML-DSA-65 + FALCON-512 + SLH-DSA
client.set_attested(fields, output, ttl=86400)
return output
def who_produced_this(fingerprint):
"""Answer the question: which model produced this output?"""
entry = client.get_entry_metadata(fingerprint)
return {
"model": entry.fields["model_name"],
"version": entry.fields["model_version"],
"system_prompt_hash": entry.fields["system_prompt_hash"],
"temperature": entry.fields["temperature"],
"hardware": entry.fields["hardware_class"],
"created_at": entry.created_at,
"state": entry.state, # ACTIVE, SUPERSEDED, REVOKED, EXPIRED
"signatures_valid": entry.verify_signatures(),
}
// Rust — complete provenance integration
use cachee::{Client, EntryMetadata, FingerprintFields};
use sha3::{Sha3_256, Digest};
struct ProvenanceConfig {
model_name: String,
model_version: String,
system_prompt_hash: String,
temperature: f32,
top_p: f32,
hardware_class: String,
}
impl ProvenanceConfig {
fn new(model_name: &str, model_version: &str,
system_prompt: &str, temperature: f32) -> Self {
Self {
model_name: model_name.to_string(),
model_version: model_version.to_string(),
system_prompt_hash: hex::encode(Sha3_256::digest(system_prompt.as_bytes())),
temperature,
top_p: 1.0,
hardware_class: "a100-80gb".to_string(),
}
}
fn fingerprint_fields(&self, prompt: &str) -> FingerprintFields {
FingerprintFields {
model_name: self.model_name.clone(),
model_version: self.model_version.clone(),
prompt_hash: hex::encode(Sha3_256::digest(prompt.as_bytes())),
temperature: self.temperature.to_string(),
top_p: self.top_p.to_string(),
system_prompt_hash: self.system_prompt_hash.clone(),
hardware_class: self.hardware_class.clone(),
}
}
}
async fn who_produced_this(
client: &Client,
fingerprint: &str,
) -> Result<EntryMetadata, cachee::Error> { // error type assumed from the client crate
// Returns full provenance: model, version, parameters, timestamps,
// state, signature verification status
client.get_entry_metadata(fingerprint).await
}
The Cost of Not Knowing
When you cannot answer "which model produced this output," every contested AI decision becomes a liability event. The cost is not theoretical. Healthcare organizations face medical malpractice claims where the AI recommendation is central to the case. Financial services companies face fair lending investigations where the model version and training data are material to the complaint. Employment platforms face discrimination lawsuits where the system prompt and model behavior are subject to discovery.
In each of these scenarios, the first question is attribution: which model, which version, which parameters, which conditions. If you cannot answer this question with independently verifiable evidence, you are in a defensive posture from the start. You are asking the regulator, the court, or the plaintiff to trust your reconstructed narrative. Trust is not evidence. Cryptographic provenance is evidence.
The computation fingerprint is not a logging enhancement. It is the structural difference between a cache that stores opaque values and a cache that stores provenance-bound results. Between a system that says "we think this model produced this output" and a system that says "here is the SHA3-256 fingerprint that binds this output to these exact production conditions, signed by three independent post-quantum algorithms, with a hash-chained audit trail of every event in its lifecycle." The first is a narrative. The second is proof. When the question is "which model produced this output," you want proof.
The Bottom Line
Traditional caches store values without provenance. When a cached AI result is challenged, they cannot answer "which model produced this." Cachee's computation fingerprint binds every cached result to its exact model name, model version, prompt hash, temperature, top_p, system prompt hash, and hardware class. SHA3-256 makes collision computationally infeasible. Change any parameter and the fingerprint changes -- the old result is never served under new conditions. Supersession chains preserve historical results for audit. Triple PQ signatures prove integrity. The fingerprint is the cache key, the audit identifier, and the integrity anchor. One hash answers the question every regulator will ask.
Every cached AI result should carry proof of its origin. Cachee computation fingerprints make provenance a structural property, not an afterthought.