Abstract. This paper specifies the architecture by which Cachee evolves from a high-performance post-quantum cache engine into a cryptographic evidence infrastructure. The core primitive — H33-74, a 58-byte receipt signed by three independent post-quantum signature families whose security rests on three independent hardness assumptions — becomes the foundation for long-term archival, third-party portability, and regulatory zero-knowledge query. We define the Cachee Archive Bundle, a self-contained verification artifact that any party can validate with nothing but NIST public specifications. We introduce the regulator key architecture, in which a scoped cryptographic key grants a regulator the ability to ask verifiable questions about encrypted data without decrypting it. We present the cross-instance federation model, the deprecation-aware verification path, and the implementation roadmap. Patent pending.

Contents

Part I: Foundation
  1. Executive Summary
  2. The H33-74 Primitive
  3. The Cachee Archive Bundle
  4. Content-Addressed Storage
  5. Immutable Append-Only Architecture
Part II: Portability and Verification
  6. Third-Party Verification Model
  7. Verification CLI: cachee-verify
  8. Public Verification Specification
  9. Witness Delivery API
  10. Cross-Instance Federation
Part III: Lifecycle and Compliance
  11. Signature Family Status Registry
  12. Deprecation-Aware Verification
  13. HATS Tier Enforcement
  14. Genesis-Sealed Entries
  15. Cold Storage and Tiered Economics
Part IV: Regulator Key Architecture
  16. The Three Key Types
  17. Zero-Knowledge Query Capability
  18. Industry Applications
Part V: Implementation
  19. Gap Closure Matrix
  20. Implementation Roadmap

Part I: Foundation
The primitives, formats, and storage architecture

1. Executive Summary

Cachee began as a high-performance, post-quantum cache engine — an in-process alternative to Redis that delivers 31-nanosecond reads and signs every operation with three independent post-quantum signature families. Production deployments have demonstrated 1,667,875 authenticated operations per second on a single node, each carrying an H33-74 cryptographic receipt. The cache engine is shipping. This paper describes what comes next.

The H33-74 primitive — a 58-byte receipt combining ML-DSA-65, FALCON-512, and SLH-DSA-SHA2-128f signatures — is not merely an integrity check on cached data. It is a general-purpose cryptographic attestation that proves a specific computation occurred, at a specific time, with a specific input, and that the proof is unforgeable unless three independent mathematical hardness assumptions are simultaneously broken. This property makes H33-74 suitable for a much larger role than cache verification.

This paper specifies the architecture for that larger role. We define the Cachee Archive Bundle, a self-contained file that any party — court, regulator, auditor, counterparty — can verify with nothing but NIST public specifications and no dependency on H33 or Cachee. We introduce content-addressed storage where the retrieval key is deterministically derived from the primitive itself. We specify the witness delivery API, cross-instance federation, and the signature family deprecation path. Most significantly, we present the regulator key architecture: a scoped cryptographic capability that allows a regulator to ask verifiable questions about encrypted data and receive cryptographically proven answers without the data ever being decrypted or moved. Every query, every proof, and every answer itself receives an H33-74 attestation. The system produces evidence about evidence, recursively, with no trust assumptions beyond mathematics.

Patent pending. This document is confidential to H33.ai, Inc.

2. The H33-74 Primitive

H33-74 is a 58-byte compact receipt that cryptographically attests to a computation. The name reflects the full on-chain commitment size: 74 bytes, consisting of the 58-byte primitive plus a 16-byte retrieval pointer that enables any holder to locate the full signature bundle in Cachee's content-addressed storage. The primitive is the atomic unit of evidence in the Cachee evidence infrastructure.

2.1 Byte Layout

Field           Size   Description
version         1 B    Primitive format version. Current: 0x01
timestamp       8 B    Nanosecond-precision UNIX timestamp at attestation time
content_hash    32 B   SHA3-256 hash of the attested content
mldsa_prefix    2 B    First 2 bytes of the ML-DSA-65 signature (FIPS 204)
falcon_prefix   2 B    First 2 bytes of the FALCON-512 signature
slhdsa_prefix   2 B    First 2 bytes of the SLH-DSA-SHA2-128f signature (FIPS 205)
metadata        11 B   Computation type (2 B), tenant ID (4 B), flags (1 B), reserved (4 B)
Total           58 B
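The layout above can be packed and parsed with Python's `struct` module. This is a minimal sketch under stated assumptions, not the reference encoding. It assumes fields appear in table order with no padding, that integers are big-endian (the table does not fix byte order), and that the timestamp is an unsigned 64-bit nanosecond count. The class and constant names are hypothetical.

```python
import struct
from dataclasses import dataclass

# Hypothetical encoding of the 58-byte H33-74 primitive.
# Assumptions: table order, no padding, big-endian integers.
#   B  version (1)            Q  timestamp_ns (8)       32s content_hash
#   2s mldsa / falcon / slhdsa prefixes (2 bytes each)
#   H  computation_type (2)   I  tenant_id (4)          B   flags (1)    4s reserved
PRIMITIVE_FMT = ">BQ32s2s2s2sHIB4s"
PRIMITIVE_LEN = struct.calcsize(PRIMITIVE_FMT)  # 58 bytes


@dataclass(frozen=True)
class Primitive:
    version: int
    timestamp_ns: int
    content_hash: bytes
    mldsa_prefix: bytes
    falcon_prefix: bytes
    slhdsa_prefix: bytes
    computation_type: int
    tenant_id: int
    flags: int
    reserved: bytes = bytes(4)

    def pack(self) -> bytes:
        """Serialize to exactly PRIMITIVE_LEN bytes."""
        # struct silently pads or truncates "Ns" fields, so check lengths explicitly.
        for name, value, size in (("content_hash", self.content_hash, 32),
                                  ("mldsa_prefix", self.mldsa_prefix, 2),
                                  ("falcon_prefix", self.falcon_prefix, 2),
                                  ("slhdsa_prefix", self.slhdsa_prefix, 2),
                                  ("reserved", self.reserved, 4)):
            if len(value) != size:
                raise ValueError(f"{name} must be {size} bytes")
        return struct.pack(PRIMITIVE_FMT, self.version, self.timestamp_ns,
                           self.content_hash, self.mldsa_prefix, self.falcon_prefix,
                           self.slhdsa_prefix, self.computation_type, self.tenant_id,
                           self.flags, self.reserved)

    @classmethod
    def unpack(cls, raw: bytes) -> "Primitive":
        """Parse a 58-byte primitive and reject any other length."""
        if len(raw) != PRIMITIVE_LEN:
            raise ValueError(f"expected {PRIMITIVE_LEN} bytes, got {len(raw)}")
        return cls(*struct.unpack(PRIMITIVE_FMT, raw))
```

With this field order, the content hash occupies bytes 9 through 40. It sits after the 1-byte version and the 8-byte timestamp.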

2.2 Security Model

The three signature prefixes serve as binding commitments to the full signatures stored alongside the primitive. A verifier who holds the 58-byte primitive and the corresponding full signature bundle can confirm that (a) the content hash matches the attested data, (b) each full signature was produced by the claimed signer, and (c) the signature prefixes in the primitive match the full signatures, preventing substitution attacks.

The choice of three signature families is deliberate. ML-DSA-65 (FIPS 204) relies on the hardness of the Module Learning With Errors and Module Short Integer Solution problems over module lattices. FALCON-512 relies on the hardness of the short integer solution problem over NTRU lattices. SLH-DSA-SHA2-128f (FIPS 205) relies only on the security properties of its underlying hash functions. Although ML-DSA-65 and FALCON-512 are both lattice-based, they rest on different problems over differently structured lattices (module lattices versus NTRU lattices). An adversary who forges an H33-74 attestation must break all three assumptions simultaneously. These are three independent mathematical bets, a structural redundancy that no single-algorithm attestation can provide.

2.3 On-Chain Commitment

The 74-byte on-chain commitment consists of the 58-byte primitive plus the 16-byte content-addressed storage pointer. Equivalently, it holds two parts. The first is 32 bytes for the SHA3-256 content hash, the same hash embedded in the primitive. The second is 42 bytes of retrieval information: the primitive's remaining 26 bytes (version, timestamp, signature prefixes, and metadata) plus the 16-byte pointer. This 74-byte value is sufficient to locate, retrieve, and fully verify the corresponding archive bundle from any Cachee instance that holds it. The on-chain anchor is optional; the primitive is self-contained for verification regardless of whether an anchor exists. Patent pending.
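The 32 + 42 byte accounting can be checked mechanically. This sketch makes three assumptions. First, the commitment is the primitive followed by the pointer; the text says only "plus", not the order. Second, the content hash sits at offsets 9 to 40 of the primitive, per the Section 2.1 field order. Third, the pointer is passed in as opaque bytes, since its derivation is not given here. All names are hypothetical.

```python
PRIMITIVE_LEN = 58
POINTER_LEN = 16
COMMITMENT_LEN = PRIMITIVE_LEN + POINTER_LEN  # 74
HASH_START, HASH_END = 9, 41  # after version (1 B) and timestamp (8 B), assumed order


def build_commitment(primitive: bytes, pointer: bytes) -> bytes:
    """Concatenate primitive and storage pointer (the ordering is an assumption)."""
    if len(primitive) != PRIMITIVE_LEN or len(pointer) != POINTER_LEN:
        raise ValueError("primitive must be 58 bytes and pointer 16 bytes")
    return primitive + pointer


def split_commitment(commitment: bytes) -> tuple[bytes, bytes]:
    """Return (content_hash, retrieval_info): the 32-byte hash and the other 42 bytes."""
    if len(commitment) != COMMITMENT_LEN:
        raise ValueError("commitment must be 74 bytes")
    content_hash = commitment[HASH_START:HASH_END]
    retrieval_info = commitment[:HASH_START] + commitment[HASH_END:]
    return content_hash, retrieval_info
```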

3. The Cachee Archive Bundle

The Cachee Archive Bundle (CAB) is the canonical self-contained verification artifact. It is a single file containing everything required to verify an attestation. There are no external dependencies, no Cachee account, no API calls, and no trust in H33. A court, regulator, auditor, or opposing counsel who receives a CAB file can verify it using two things. The first is public specifications: FIPS 202 for SHA3-256, FIPS 204 for ML-DSA, FIPS 205 for SLH-DSA, and the FALCON specification. The second is standard cryptographic libraries.

The design principle is permanence. A CAB produced today must be verifiable ten years from now even if H33 no longer exists, even if Cachee is no longer maintained, and even if the verifier has never heard of either. The bundle contains the public keys, the full signatures, the content hash, and the verification algorithm version. Nothing external is required.

3.1 Binary Format

Field             Size      Description
magic             4 B       CAB1 (Cachee Archive Bundle version 1)
format_version    2 B       Bundle format version (current: 0x0001)
primitive         58 B      The H33-74 primitive (Section 2)
content_hash      32 B      SHA3-256 of the attested content
computation_type  2 B       Enum: 0x01 CacheWrite, 0x02 BiometricAuth, 0x03 FHECompute, ... 0x12 PostQuantumMigration
timestamp         8 B       Nanosecond-precision UNIX timestamp
pk_mldsa65        1,952 B   ML-DSA-65 signer public key (FIPS 204)
pk_falcon512      897 B     FALCON-512 signer public key
pk_slhdsa         32 B      SLH-DSA-SHA2-128f signer public key (FIPS 205)
sig_mldsa65       3,309 B   ML-DSA-65 full signature
sig_falcon512     690 B     FALCON-512 full signature
sig_slhdsa        17,088 B  SLH-DSA-SHA2-128f full signature
metadata_len      4 B       Length of CBOR metadata section
metadata          variable  CBOR-encoded key-value pairs (tenant, tags, labels)
anchor_flag       1 B       0x01 if on-chain anchor present, 0x00 otherwise
on_chain_anchor   74 B      On-chain commitment (present only if anchor_flag == 0x01)
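A parser for this layout is short because every field except metadata has a fixed size. The sketch below walks the table in order, checks the magic and format version, and leaves the CBOR metadata undecoded. It assumes big-endian integers; the verification specification (Section 8) is the authority on endianness. The function and constant names are hypothetical. Summing the fixed-size fields this way gives FIXED_LEN = 24,079 bytes, including the 1-byte anchor flag.

```python
# Fixed-size CAB1 fields in Section 3.1 order: (name, size in bytes).
# Assumption: multi-byte integers are big-endian.
FIXED_FIELDS = [
    ("magic", 4), ("format_version", 2), ("primitive", 58), ("content_hash", 32),
    ("computation_type", 2), ("timestamp", 8),
    ("pk_mldsa65", 1952), ("pk_falcon512", 897), ("pk_slhdsa", 32),
    ("sig_mldsa65", 3309), ("sig_falcon512", 690), ("sig_slhdsa", 17088),
    ("metadata_len", 4),
]
ANCHOR_FLAG_LEN = 1
ANCHOR_LEN = 74
FIXED_LEN = sum(size for _, size in FIXED_FIELDS) + ANCHOR_FLAG_LEN  # 24,079 bytes


def parse_cab(data: bytes) -> dict[str, bytes]:
    """Split a CAB1 bundle into raw fields; raise ValueError on malformed input."""
    fields: dict[str, bytes] = {}
    offset = 0
    for name, size in FIXED_FIELDS:
        chunk = data[offset:offset + size]
        if len(chunk) != size:
            raise ValueError(f"truncated bundle at field {name}")
        fields[name] = chunk
        offset += size

    if fields["magic"] != b"CAB1":
        raise ValueError("not a Cachee Archive Bundle")
    if int.from_bytes(fields["format_version"], "big") != 0x0001:
        raise ValueError("unsupported format_version")

    meta_len = int.from_bytes(fields["metadata_len"], "big")
    fields["metadata"] = data[offset:offset + meta_len]  # CBOR, not decoded here
    offset += meta_len
    if len(fields["metadata"]) != meta_len or offset >= len(data):
        raise ValueError("truncated metadata or missing anchor_flag")

    anchor_flag = data[offset]
    offset += ANCHOR_FLAG_LEN
    if anchor_flag == 0x01:
        fields["on_chain_anchor"] = data[offset:offset + ANCHOR_LEN]
        offset += ANCHOR_LEN
        if len(fields["on_chain_anchor"]) != ANCHOR_LEN:
            raise ValueError("truncated on_chain_anchor")
    elif anchor_flag != 0x00:
        raise ValueError("anchor_flag must be 0x00 or 0x01")

    if offset != len(data):
        raise ValueError("trailing bytes after bundle")
    return fields
```

Rejecting trailing bytes is a deliberate choice in this sketch. It means a bundle has exactly one valid parse.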

3.2 Size Analysis

The fixed portion of the bundle totals 24,079 bytes uncompressed, approximately 23.5 KiB. This portion comprises the magic, version, primitive, content hash, computation type, timestamp, three public keys, three signatures, metadata length, and anchor flag. With the optional 74-byte on-chain anchor, the total is 24,153 bytes. The variable metadata section typically adds 100-500 bytes. General-purpose compression such as zstd recovers little from a single bundle. The public keys and signature bodies, including the 17,088-byte SLH-DSA signature, are high-entropy, so storage estimates should assume roughly the uncompressed size.

This means a million attestations require approximately 24 GB of storage. At current cloud storage rates ($0.023/GB/month for S3 Standard), a million attestations cost about $0.55/month to store. Evidence is cheap. The absence of evidence is expensive.

3.3 Versioning

The format_version field enables forward-compatible evolution. A version 2 bundle might add a fourth signature family or expand the metadata schema. Verifiers that understand version 1 can still verify version 1 bundles indefinitely. The magic bytes CAB1 identify the file as a Cachee Archive Bundle regardless of the internal version. Future major format changes would use CAB2, ensuring no ambiguity.

3.4 Computation Type Registry

The 2-byte computation type field identifies the nature of the attested operation. The registry is append-only and itself versioned under H33-74 attestation. Key values include:

  0x01 CacheWrite
  0x02 BiometricAuth
  0x03 FHECompute
  0x12 PostQuantumMigration

4. Content-Addressed Storage

Every Cachee Archive Bundle is stored at a key deterministically derived from the primitive and content it attests. The storage key is computed as:

storage_key = SHA3-256(primitive || content_hash)

where primitive is the 58-byte H33-74 primitive and content_hash is the 32-byte SHA3-256 hash of the attested content (which is also embedded within the primitive, creating a double-binding). The concatenation is unambiguous because the primitive is fixed-length.
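The derivation is a single hash call. This sketch uses Python's `hashlib`, which provides SHA3-256. It also checks the double-binding by comparing the supplied hash against the one embedded in the primitive. The embedded offset (bytes 9 to 40) assumes the Section 2.1 field order. The function name is hypothetical.

```python
import hashlib

PRIMITIVE_LEN = 58
HASH_LEN = 32
# Offsets of content_hash inside the primitive, assuming the Section 2.1 field order
# (1-byte version, 8-byte timestamp, then the 32-byte hash).
EMBEDDED_HASH = slice(9, 41)


def storage_key(primitive: bytes, content_hash: bytes) -> bytes:
    """storage_key = SHA3-256(primitive || content_hash)."""
    if len(primitive) != PRIMITIVE_LEN or len(content_hash) != HASH_LEN:
        raise ValueError("primitive must be 58 bytes and content_hash 32 bytes")
    # Double-binding: the separately supplied hash must equal the one inside the primitive.
    if primitive[EMBEDDED_HASH] != content_hash:
        raise ValueError("content_hash does not match the hash embedded in the primitive")
    return hashlib.sha3_256(primitive + content_hash).digest()
```

The key depends only on these 90 bytes. Any holder of the primitive can therefore recompute it offline, with no lookup service.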

This design has three critical properties. First, determinism: any party holding the 74-byte on-chain commitment (which contains the primitive) can independently compute the storage key and retrieve the bundle without knowing any Cachee-specific identifiers, without querying a name server, and without trusting a lookup table. The retrieval path is mathematical, not organizational.

Second, collision resistance: SHA3-256 provides 128-bit collision resistance. Two distinct attestations could share a storage key only if someone found a SHA3-256 collision, which is computationally infeasible. Each attestation therefore occupies a unique, deterministic location.

Third, deduplication: suppose the same content is attested with the same primitive. That would require the same timestamp, signature prefixes, metadata, and content hash. The storage key is then identical and the bundle is stored once. In practice, nanosecond timestamps make accidental duplication effectively impossible. The property is still valuable for replicated storage systems that must detect and eliminate redundant copies during synchronization.

Content-addressed storage also eliminates an entire class of administrative failures. There are no namespace collisions, no tenant-specific key prefixes that can be misconfigured, and no mapping tables that can become inconsistent. The address is the content. If you have the primitive, you have the address. If you have the address, you can verify what you retrieve. The system is self-authenticating.

5. Immutable Append-Only Architecture

Once a Cachee Archive Bundle is written to the evidence store, it cannot be modified or deleted through the standard API. There is no UPDATE operation. There is no DELETE operation. The write path is append-only: new attestations are added; existing attestations are permanent.

This is not a software policy that could be overridden by an administrator with database access. The content-addressed storage model makes modification self-defeating. The storage key is derived from the primitive and the content hash. Changing either one yields a different SHA3-256 key, so the modified bundle would be stored at a different address and the original bundle at the original address remains intact. Changing any other signed field leaves the address unchanged but causes the bundle to fail verification. Such fields include a public key, a full signature, or the signed metadata. Any attempt to overwrite the original address with altered content is therefore detectable.

Deletion is handled through a privileged operation — TOMBSTONE — that does not remove the bundle but instead appends a tombstone record at a new address. The tombstone record itself receives an H33-74 attestation. It contains the storage key of the tombstoned bundle, the reason for tombstoning, the identity of the operator who authorized it, and a timestamp. The original bundle remains in storage, retrievable by any party that holds its address. The tombstone record is discoverable through the bundle's metadata chain. This means deletion produces more evidence, not less.
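The write path described above can be illustrated with a small in-memory store. It exposes `put`, `get`, and `tombstone`, and has no update or delete operation. This sketch makes two assumptions, both labeled in the code. First, the tombstone record is serialized as JSON. Second, it is addressed by the SHA3-256 of its own bytes. In the system described here, the tombstone instead receives its own H33-74 attestation, which is not modeled. All names are hypothetical.

```python
import hashlib
import json
import time


class AppendOnlyStore:
    """Illustrative evidence store: no UPDATE and no DELETE, only appends."""

    def __init__(self) -> None:
        self._objects: dict[bytes, bytes] = {}

    def put(self, key: bytes, bundle: bytes) -> None:
        existing = self._objects.get(key)
        if existing is not None and existing != bundle:
            raise PermissionError("address already holds different content")
        self._objects[key] = bundle  # re-putting identical bytes is a no-op

    def get(self, key: bytes) -> bytes:
        return self._objects[key]

    def tombstone(self, key: bytes, reason: str, operator: str) -> bytes:
        """Append a tombstone record at a new address; the original stays in place."""
        if key not in self._objects:
            raise KeyError("unknown storage key")
        # Assumption: JSON record addressed by its own SHA3-256 (stand-in for an H33-74 attestation).
        record = json.dumps({"tombstoned": key.hex(), "reason": reason,
                             "operator": operator, "timestamp_ns": time.time_ns()},
                            sort_keys=True).encode()
        record_key = hashlib.sha3_256(record).digest()
        self.put(record_key, record)
        return record_key
```

After a tombstone, the store holds one more object than before. This is the sense in which deletion produces more evidence, not less.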

The legal significance of this architecture is direct. An evidence store that permits silent modification or deletion is vulnerable to spoliation claims. An evidence store where every mutation is itself attested, where the original is never destroyed, and where the mutation history is cryptographically chained provides a foundation for legal defensibility that no traditional database can match. Courts and regulators can verify not only the evidence itself but the complete chain of custody, including any attempts to suppress or alter it. Patent pending.

Part II: Portability and Verification
Independence from H33, Cachee, or any single party

6. Third-Party Verification Model

The defining property of the Cachee Archive Bundle is that verification requires zero trust in H33 and zero dependency on Cachee. A bundle can be handed to any party — an insurance carrier, a financial regulator, opposing counsel in litigation, a reinsurer, a counterparty in a trade, an auditor conducting a compliance review — and that party can verify the attestation completely and independently.

Verification requires three things. The first two are contained within the bundle itself; the third is a set of public standards:

  1. The signer's public keys. The bundle contains the full ML-DSA-65 public key (1,952 bytes), the full FALCON-512 public key (897 bytes), and the full SLH-DSA-SHA2-128f public key (32 bytes). The verifier does not need to contact a key server, query a certificate authority, or trust a public key infrastructure. The keys are in the file.
  2. The full signatures. The bundle contains the complete ML-DSA-65 signature (3,309 bytes), the complete FALCON-512 signature (690 bytes), and the complete SLH-DSA-SHA2-128f signature (17,088 bytes). The verifier does not need to request signatures from an API or retrieve them from a separate store. The signatures are in the file.
  3. The verification algorithms. ML-DSA-65 is specified in FIPS 204, SLH-DSA-SHA2-128f in FIPS 205, and SHA3-256 in FIPS 202. FALCON-512 is specified in the FALCON specification, which NIST is standardizing as FN-DSA (FIPS 206). These are public documents available at no cost. Any conforming implementation will produce the same verification result, whether liboqs, pqcrypto, or a verifier's own code.

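The verification procedure is deterministic. If the verifier holds the attested content, it first computes the SHA3-256 hash of that content and confirms it matches the content_hash field. It then confirms that each signature prefix in the primitive matches the corresponding full signature. Finally, it verifies each of the three signatures against its public key and the signed message. The signed message is the content hash and metadata, assembled as defined in the verification specification (Section 8). If all three verify, the attestation is valid. If any fails, the attestation is invalid. There is no partial validity, no scoring, and no judgment call. The answer is binary. Section 12 describes how this rule adapts once a signature family is deprecated.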
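The control flow of this procedure is sketched below. It is not a real verifier. Python's standard library has no post-quantum signature code, so each family's verifier is injected as a hypothetical callable `(public_key, message, signature) -> bool`. In practice such a callable would wrap a conforming FIPS 204, FALCON, or FIPS 205 implementation. The signed message is passed in already assembled, because its exact construction belongs to the verification specification. The prefix check mirrors item (c) in Section 2.2.

```python
import hashlib
import hmac
from typing import Callable, Mapping, Optional

# Hypothetical interface: (public_key, message, signature) -> bool.
Verifier = Callable[[bytes, bytes, bytes], bool]
FAMILIES = ("ML-DSA-65", "FALCON-512", "SLH-DSA-SHA2-128f")


def verify_attestation(content_hash: bytes, message: bytes,
                       public_keys: Mapping[str, bytes],
                       signatures: Mapping[str, bytes],
                       prefixes: Mapping[str, bytes],
                       verifiers: Mapping[str, Verifier],
                       content: Optional[bytes] = None) -> bool:
    """All-three-must-pass check; `message` is the signed message built per the spec."""
    # Step 1, only if the verifier holds the content: recompute the content hash.
    if content is not None:
        if not hmac.compare_digest(hashlib.sha3_256(content).digest(), content_hash):
            return False
    for family in FAMILIES:
        sig = signatures[family]
        # Step 2: the 2-byte prefix in the primitive must match the full signature.
        if sig[:2] != prefixes[family]:
            return False
        # Step 3: the full signature must verify under the bundled public key.
        if not verifiers[family](public_keys[family], message, sig):
            return False
    return True
```

Injecting the verifiers keeps the verdict logic separate from any particular library. This matches the point that any conforming implementation should reach the same result.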

This model eliminates the "vendor lock" problem that plagues proprietary attestation systems. If an organization attests a million records using Cachee and subsequently decides to stop using Cachee, every attestation remains independently verifiable. The bundles are files. They can be copied, archived, distributed, and verified using any tool that implements the NIST specifications. H33 cannot revoke them, cannot invalidate them, and cannot prevent their verification. The evidence belongs to whoever holds it.

7. Verification CLI: cachee-verify

cachee-verify is a standalone, open-source command-line tool written in Rust and published to crates.io. Its sole purpose is to verify Cachee Archive Bundles. It makes no network calls. It has no Cachee dependency. It does not require a Cachee account, an API key, or an internet connection. It is a single statically-linked binary that takes a CAB file as input and outputs a pass or fail verdict.

7.1 Usage

The interface is deliberately minimal:

cachee-verify bundle.cab

On success, the tool outputs the attestation metadata (timestamp, computation type, content hash, signer key fingerprints) and exits with code 0. On failure, it outputs the specific verification failure (which signature failed, whether the content hash mismatched, whether the format is invalid) and exits with code 1. There are no configuration files, no environment variables, and no flags required for basic operation.

7.2 Implementation

The tool implements the three NIST verification algorithms directly. ML-DSA-65 verification follows FIPS 204 Section 6. SLH-DSA-SHA2-128f verification follows FIPS 205 Section 10. FALCON-512 verification follows the NIST draft specification Section 3.3. The SHA3-256 hash computation follows FIPS 202. All implementations are constant-time to prevent timing side-channel attacks, though for a verification-only tool the threat model is limited.

The tool's dependency tree is intentionally narrow: the Rust standard library, a SHA3 implementation (the sha3 crate), and the three PQ signature verification implementations. No TLS library, no HTTP client, no serialization framework beyond the CAB binary format parser. The total compiled binary size is under 4 MB.

7.3 Auditability

Because cachee-verify is open source, any party can audit the verification logic. A regulator who does not trust H33's implementation can build the tool from source, inspect every line of code, and confirm that the verification procedure matches the NIST specifications exactly. Alternatively, they can implement their own verifier from scratch using the public verification specification (Section 8). The specification is the authority, not the tool.

8. Public Verification Specification

The Cachee Archive Bundle Verification Specification is a published, versioned document that describes exactly how to verify a CAB file using only open-source tools and NIST standards. It is independent of cachee-verify, independent of Cachee, and independent of H33. Any competent cryptographic engineer can implement a verifier from this specification alone.

The specification covers: the binary format of the CAB file (byte offsets, field sizes, endianness); the procedure for extracting the three public keys and three signatures; the construction of the signed message (how the content hash and metadata are assembled into the message that was signed); the verification algorithm for each signature family (with references to the specific NIST specification sections); and the overall verdict logic (all three must pass for a valid attestation).

The specification is versioned in lockstep with the CAB format version. Version 1.0 of the specification corresponds to CAB1 format version 0x0001. Future format versions will receive corresponding specification updates. Old specification versions remain valid for old bundle versions indefinitely. A version 1.0 bundle will always be verifiable using the version 1.0 specification, regardless of what future versions introduce.

The specification is published at a stable URL, archived in multiple public repositories, and registered with the Internet Archive's Wayback Machine. If H33 ceases to exist, the specification remains available. The permanence of the evidence infrastructure extends to the verification documentation itself.

9. Witness Delivery API

When an attestation is created, the witness delivery system pushes the complete Cachee Archive Bundle to designated recipients simultaneously. The delivery is synchronous with the attestation — the attestation is not considered complete until all designated witnesses have received their copies. This ensures that no single party can create an attestation and then selectively withhold it.

9.1 Delivery Model

At attestation creation time, the caller specifies a witness list: a set of recipient endpoints that will receive the CAB file. Each recipient gets an independent, complete copy of the bundle. The delivery uses mutual TLS with post-quantum key exchange (ML-KEM-768) to protect bundles in transit. Each delivery is itself attested — the delivery receipt (timestamp, recipient, bundle hash, delivery status) receives its own H33-74 primitive.

9.2 Insurance Example

Consider a cyber insurance attestation. The policyholder's system attests that a specific security control is in place. The witness list includes the carrier, the reinsurer, the policyholder's own archive, and optionally the regulator. All four parties receive the same bundle at the same time. No party can later claim they were not informed. No party can present a different version of the attestation. The bundle is self-verifying, so each party can independently confirm the attestation without contacting any other party.

9.3 Failure Handling

If a witness endpoint is unreachable, the delivery system retries with exponential backoff for a configurable period (default: 72 hours). Failed deliveries are recorded with their own H33-74 attestation, creating an auditable record of delivery attempts. After the retry period expires, the failure is permanently recorded. The attestation itself remains valid regardless of delivery status — delivery is a distribution concern, not a validity concern.
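A retry schedule of this kind can be computed up front. Only the 72-hour default window comes from the text. The 1-minute initial delay, the doubling factor, and the 1-hour cap are hypothetical parameters chosen for illustration.

```python
from datetime import timedelta


def retry_schedule(window: timedelta = timedelta(hours=72),
                   initial: timedelta = timedelta(minutes=1),
                   factor: float = 2.0,
                   cap: timedelta = timedelta(hours=1)) -> list[timedelta]:
    """Offsets from the first failed delivery at which retries are attempted.

    The delay grows geometrically up to `cap`; no retry is scheduled past `window`.
    """
    offsets: list[timedelta] = []
    delay, elapsed = initial, timedelta(0)
    while elapsed + delay <= window:
        elapsed += delay
        offsets.append(elapsed)
        delay = min(delay * factor, cap)
    return offsets
```

With these parameters, the delays run 1, 2, 4, 8, 16, and 32 minutes and then level off at one hour. This gives 76 attempts inside the 72-hour window. Capping the delay keeps a long outage from stretching the gap between attempts indefinitely.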

Witnesses can also pull bundles on demand using the content-addressed retrieval key derived from the 74-byte on-chain commitment. The push delivery is a convenience and a legal safeguard; the pull retrieval is always available as a fallback.

10. Cross-Instance Federation

The Cachee evidence infrastructure is not a single centralized service. It is a federation protocol that allows independent Cachee deployments to synchronize attestation records. A policyholder can run their own Cachee instance and mirror every attestation from their carrier's instance. A regulator can run an archival instance that receives attestation bundles from every regulated entity in their jurisdiction. No single instance is authoritative. Every instance that holds a bundle can verify it independently.

10.1 Federation Architecture

Federation follows the D-Cachee architecture specified in the patent (FIG 23). Instances discover each other through a distributed hash table (DHT) where the routing key is derived from the content-addressed storage key of each bundle. When an attestation is created on one instance, the DHT routes the bundle to all instances that have registered interest in that attestation's tenant, computation type, or content-address prefix.

Each federated instance maintains a Merkle tree of all bundles it holds. Synchronization between instances proceeds by comparing Merkle roots. If the roots differ, the instances walk the tree to identify divergent subtrees and exchange only the missing bundles. This makes synchronization bandwidth-efficient: two instances that share 99% of their bundles transfer only the 1% difference. The cost of locating that difference grows only logarithmically with the total bundle count. The synchronization protocol itself is attested; each sync operation produces an H33-74 primitive recording what was synchronized, between which instances, and at what time.
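The paper does not fix the tree shape, so this sketch assumes one: a 16-ary prefix tree over the hex nibbles of each 32-byte storage key, two levels deep (256 leaf buckets). Keying the tree on storage keys makes both instances place a given bundle in the same subtree. In the real protocol, each side computes only its own node hashes and exchanges them over the network. Here both key sets are local for illustration, and hashes are recomputed rather than cached. All names are hypothetical.

```python
import hashlib

FANOUT = 16     # one hex nibble per level (assumption)
MAX_DEPTH = 2   # leaves are buckets keyed by the first two nibbles (assumption)


def nibble(key: bytes, depth: int) -> int:
    """The depth-th hex nibble of a storage key."""
    byte = key[depth // 2]
    return byte >> 4 if depth % 2 == 0 else byte & 0x0F


def children(keys: set[bytes], depth: int) -> list[set[bytes]]:
    buckets: list[set[bytes]] = [set() for _ in range(FANOUT)]
    for key in keys:
        buckets[nibble(key, depth)].add(key)
    return buckets


def subtree_hash(keys: set[bytes], depth: int = 0) -> bytes:
    """Merkle hash of the subtree holding `keys`; independent of insertion order."""
    h = hashlib.sha3_256()
    if depth == MAX_DEPTH:
        h.update(b"leaf")
        for key in sorted(keys):  # fixed-length keys, so concatenation is unambiguous
            h.update(key)
    else:
        h.update(b"node")
        for child in children(keys, depth):
            h.update(subtree_hash(child, depth + 1))
    return h.digest()


def keys_to_fetch(local: set[bytes], remote: set[bytes], depth: int = 0) -> set[bytes]:
    """Keys the local instance lacks, found by descending only into differing subtrees."""
    if subtree_hash(local, depth) == subtree_hash(remote, depth):
        return set()  # identical subtree: nothing below here is exchanged
    if depth == MAX_DEPTH:
        return remote - local
    missing: set[bytes] = set()
    for local_child, remote_child in zip(children(local, depth), children(remote, depth)):
        missing |= keys_to_fetch(local_child, remote_child, depth + 1)
    return missing
```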

10.2 Consistency Model

Federation provides eventual consistency with cryptographic verification. An attestation created on instance A will eventually appear on instance B if B has registered interest. The propagation delay depends on network conditions and DHT routing, but is typically sub-second within the same cloud region and under 5 seconds across regions. Critically, consistency is verifiable: instance B can confirm that every bundle it receives from instance A is valid by verifying the H33-74 primitive and all three signatures. A compromised instance A cannot inject invalid bundles into instance B.

10.3 Partition Tolerance

If two instances are partitioned (unable to communicate), each continues to accept and store attestations independently. When the partition heals, the Merkle-based synchronization protocol reconciles both sides. No attestations are lost. No attestations are duplicated (content-addressed storage prevents duplication by construction). The federation is partition-tolerant because each instance is fully self-sufficient — it can create, store, and verify attestations without any other instance.

Part III: Lifecycle and Compliance
Deprecation, rotation, tiering, and regulatory conformance

11. Signature Family Status Registry

Post-quantum cryptography is a young field. While the three signature families used in H33-74 rest on independent hardness assumptions and are considered secure by current analysis, the history of cryptography teaches that algorithms are eventually weakened or broken. The Cachee evidence infrastructure must survive the deprecation of one or more signature families without invalidating the evidence already produced.

The Signature Family Status Registry (SFSR) is a maintained, versioned, append-only record of the status of each signature family used in H33-74. Each entry in the registry records a family identifier, a status (active, deprecated, or revoked), an effective date, a reason, and a reference to the cryptanalytic publication or NIST advisory that motivated the status change. The registry itself is attested under H33-74 — every update produces a new attestation, creating an unforgeable history of status changes.
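The registry entry fields listed above map directly onto a record type. The key query is "what was this family's status on a given date?", which the verification output in Section 12.3 depends on. This sketch assumes that every family has an explicit initial entry. The attestation of each registry update is not modeled. The class and method names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

STATUSES = ("active", "deprecated", "revoked")


@dataclass(frozen=True)
class RegistryEntry:
    family: str        # e.g. "FALCON-512"
    status: str        # one of STATUSES
    effective: date    # date the status takes effect
    reason: str
    reference: str     # cryptanalytic publication or NIST advisory


class StatusRegistry:
    """Append-only status log; there is deliberately no update or delete method."""

    def __init__(self) -> None:
        self._entries: list[RegistryEntry] = []

    def append(self, entry: RegistryEntry) -> None:
        if entry.status not in STATUSES:
            raise ValueError(f"unknown status {entry.status!r}")
        self._entries.append(entry)

    def history(self, family: str) -> tuple[RegistryEntry, ...]:
        return tuple(e for e in self._entries if e.family == family)

    def status_at(self, family: str, when: date) -> str:
        """Status in force on `when`: the latest entry effective on or before that date."""
        in_force = [e for e in self.history(family) if e.effective <= when]
        if not in_force:
            raise KeyError(f"no registry entry for {family} on {when}")
        # Stable sort: among same-date entries, the one appended last wins.
        return sorted(in_force, key=lambda e: e.effective)[-1].status
```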

11.1 Status Definitions

11.2 Registry Governance

Status changes are triggered by NIST advisories, peer-reviewed cryptanalytic publications, or H33 security team assessment. No status change is made unilaterally by H33 without a published justification. The registry is publicly auditable — any party can retrieve the current registry state and verify the attestation chain that produced it. This prevents H33 from silently deprecating a family to force re-attestation fees.

12. Deprecation-Aware Verification

When a signature family transitions from active to deprecated, the verification logic adapts. The core rule is two-of-three sufficiency: an attestation is considered valid if at least two of the three signature families verify successfully, provided that neither of the two successful families has been revoked.

This rule acknowledges a practical reality: if FALCON-512 is deprecated due to advances in NTRU lattice cryptanalysis, that does not retroactively invalidate the ML-DSA-65 and SLH-DSA signatures on existing attestations. Those two signatures still rest on independent, unbroken assumptions. The attestation remains valid, with a notation that one of three families is deprecated.

12.1 Supersession and Re-Attestation

When a family is deprecated, Cachee provides a re-attestation workflow. The data owner can submit existing content for re-attestation under the current family set (which may include a replacement family). The new attestation references the original attestation's storage key, creating a supersession chain. The original attestation is not modified or deleted — it remains in storage with its original signatures. The new attestation adds a fresh three-family signature set reflecting the current state of cryptographic confidence.

12.2 Proactive Notification

When the SFSR is updated, Cachee pushes notifications to all tenants with attestations signed by the affected family. The notification includes the registry update (itself attested), the number of affected attestations, and the re-attestation API endpoint. Tenants can initiate bulk re-attestation through the API or schedule it at their convenience. The notification delivery is itself attested through the witness delivery system (Section 9).

12.3 Verification Output

The verification output (from cachee-verify or any spec-compliant implementation) includes the status of each family at the time of verification. A bundle verified when all three families are active produces: VALID (3/3 active). The same bundle verified after one family is deprecated produces: VALID (2/3 active, 1/3 deprecated: FALCON-512). If two families are deprecated, the result is: VALID (1/3 active, 2/3 deprecated) with a warning recommending re-attestation. If two families are revoked, the result is: INSUFFICIENT (1/3 active, 2/3 revoked) — the attestation can no longer be trusted.
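The verdict rules in Section 12 reduce to counting. This sketch applies the two-of-three rule as stated: a family counts toward validity if its signature verified and the family is not revoked. It raises a re-attestation warning when fewer than two active families verified. This section does not say how an outright signature failure on an active family should be reported, so the sketch simply does not count it. The summary strings follow the examples above, extended to name every non-active family. Names are hypothetical.

```python
from enum import Enum


class Status(Enum):
    ACTIVE = "active"
    DEPRECATED = "deprecated"
    REVOKED = "revoked"


def verdict(results: dict[str, tuple[bool, Status]]) -> tuple[str, str, bool]:
    """results maps family name -> (signature verified?, SFSR status at verification time).

    Returns (outcome, summary line, re-attestation warning).
    """
    total = len(results)
    # Two-of-three rule: a family counts if its signature verified and it is not revoked.
    counted = [f for f, (ok, st) in results.items() if ok and st is not Status.REVOKED]
    active_ok = [f for f, (ok, st) in results.items() if ok and st is Status.ACTIVE]
    outcome = "VALID" if len(counted) >= 2 else "INSUFFICIENT"

    parts = []
    for status in Status:  # report in the order active, deprecated, revoked
        names = [f for f, (_, st) in results.items() if st is status]
        if names:
            part = f"{len(names)}/{total} {status.value}"
            if status is not Status.ACTIVE:
                part += ": " + ", ".join(names)
            parts.append(part)

    # Recommend re-attestation when validity no longer rests on two active families.
    warn = outcome == "VALID" and len(active_ok) < 2
    return outcome, f"{outcome} ({', '.join(parts)})", warn
```

For example, if FALCON-512 is deprecated and both other families are active, the function returns ("VALID", "VALID (2/3 active, 1/3 deprecated: FALCON-512)", False).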

13. HATS Tier Enforcement

HATS is a publicly available technical conformance standard for continuous AI trustworthiness; certification under HATS provides independently verifiable evidence that a system satisfies the standard's defined controls. Cachee enforces HATS tier requirements at write time, ensuring that every attestation meets the cryptographic requirements of its designated tier.

13.1 Tier Definitions

Tier 2: FIPS-Strict

Requires ML-DSA-65 (FIPS 204) and SLH-DSA-SHA2-128f (FIPS 205). Both algorithms are NIST-standardized. FALCON-512 is included if available but not required, as it is pending final NIST standardization. This tier satisfies federal procurement requirements where only finalized FIPS algorithms are acceptable.

Tier 3: Full Three-Family

Requires all three families: ML-DSA-65, FALCON-512, and SLH-DSA-SHA2-128f. This tier provides maximum cryptographic redundancy — three independent hardness assumptions — and is the default for Cachee evidence infrastructure deployments. All H33 production systems operate at Tier 3.

13.2 Write-Time Validation

When an attestation is submitted, Cachee validates that the required signatures are present and valid before storing the bundle. A Tier 3 tenant cannot accidentally produce a Tier 2 attestation. The validation is enforced at the binary level — the attestation engine will not produce an H33-74 primitive without the required signatures, regardless of API parameters or configuration overrides.

13.3 Per-Tenant Configuration

Each tenant is assigned a HATS tier at provisioning time. The tier is recorded in the tenant's configuration and cannot be downgraded through the API (upgrading from Tier 2 to Tier 3 is permitted). Tier changes are themselves attested. Compliance reporting is available per tenant, showing the total number of attestations, the tier of each, and any attestations that were produced during a tier transition.
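The write-time check (13.2) and the no-downgrade rule (13.3) can be expressed in a few lines. Only Tiers 2 and 3 are modeled, since they are the tiers defined in this section. The input `verified_families` is a hypothetical set of families whose signatures were present and verified. The text places this enforcement inside the attestation engine itself; the sketch only mirrors the rule. Names are hypothetical.

```python
FAMILIES_BY_TIER = {
    2: frozenset({"ML-DSA-65", "SLH-DSA-SHA2-128f"}),                # FIPS-Strict
    3: frozenset({"ML-DSA-65", "FALCON-512", "SLH-DSA-SHA2-128f"}),  # Full Three-Family
}


def check_write(tier: int, verified_families: set[str]) -> None:
    """Refuse to store a bundle that lacks any signature its tenant's tier requires."""
    required = FAMILIES_BY_TIER[tier]
    missing = required - verified_families
    if missing:
        raise ValueError(f"Tier {tier} write rejected; missing: {sorted(missing)}")


def change_tier(current: int, requested: int) -> int:
    """Tier changes through the API may only move upward."""
    if requested not in FAMILIES_BY_TIER:
        raise ValueError(f"unknown tier {requested}")
    if requested < current:
        raise PermissionError("HATS tier cannot be downgraded through the API")
    return requested
```

A Tier 2 tenant may still supply a FALCON-512 signature. The check only requires the listed families to be present; extra families are allowed.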

13.4 FIPS Mode

Tenants operating in FIPS mode have additional constraints: all cryptographic operations use FIPS-validated modules, random number generation uses DRBG per SP 800-90A, and key storage follows SP 800-57 guidelines. FIPS mode is orthogonal to HATS tier — a tenant can be Tier 2 without FIPS mode (using compliant algorithms but not validated modules) or Tier 3 with FIPS mode (maximum security posture).

14. Genesis-Sealed Entries

A standard Cachee attestation proves that specific content existed at a specific time and was signed by a specific set of keys. A genesis-sealed entry goes further: it proves that the input to a computation was authentic, that the computation was correct, and that the output was signed. It is the complete chain from source data to attested result.

14.1 Genesis Seal

The genesis seal is a field in the Cachee Archive Bundle's metadata section that contains a cryptographic commitment to the input data. For a biometric authentication, the genesis seal commits to the encrypted biometric template. For an FHE computation, it commits to the encrypted input ciphertext. For a regulatory query, it commits to the query parameters. The genesis seal proves that the attested result was computed from a specific input, without revealing the input itself.
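The paper does not specify which commitment scheme the genesis seal uses. One standard possibility is a salted hash commitment, SHA3-256(nonce || input), sketched below as an assumption. Binding rests on SHA3-256 collision resistance. Hiding rests on keeping the 32-byte random nonce secret, which stops an observer from confirming guesses about low-entropy inputs. Function names are hypothetical.

```python
import hashlib
import hmac
import secrets

NONCE_LEN = 32  # fixed length keeps nonce || data unambiguous


def commit(data: bytes) -> tuple[bytes, bytes]:
    """Return (commitment, nonce); publish the commitment, keep the nonce private."""
    nonce = secrets.token_bytes(NONCE_LEN)
    return hashlib.sha3_256(nonce + data).digest(), nonce


def open_commitment(commitment: bytes, data: bytes, nonce: bytes) -> bool:
    """Check that (data, nonce) opens the commitment."""
    if len(nonce) != NONCE_LEN:
        return False
    return hmac.compare_digest(hashlib.sha3_256(nonce + data).digest(), commitment)
```

Opening the commitment reveals the input. Proving correctness without revealing it is the role of the ZK proof in Section 14.2, not of the seal itself.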

14.2 ZK Proof Attachment

The bundle metadata includes an optional ZK proof field that carries a ZK-STARK proof of computation correctness. The proof demonstrates that the attested output was correctly computed from the genesis-sealed input according to a specified computation. The proof is generated using the ZK-STARK proving system, which requires no trusted setup and whose security relies solely on collision-resistant hash functions — the same assumption class as SLH-DSA.

14.3 Completeness Levels

The completeness level is recorded in the bundle metadata and reported during verification. A verifier can enforce minimum completeness requirements — for example, a regulator might require complete entries for financial attestations but accept attested entries for operational logs.
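The full list of completeness levels is not enumerated here; the text names only attested and complete entries. Below is a minimal sketch of a verifier-side minimum-completeness check. It assumes only those two levels exist, ranks complete above attested, and uses a hypothetical per-category policy table that mirrors the regulator example above.

```python
from enum import IntEnum


class Completeness(IntEnum):
    # Only the two levels named in the text; ranking COMPLETE above ATTESTED is assumed.
    ATTESTED = 1
    COMPLETE = 2


# Hypothetical verifier policy: minimum completeness per record category.
MIN_COMPLETENESS = {
    "financial": Completeness.COMPLETE,
    "operational_log": Completeness.ATTESTED,
}


def meets_policy(category: str, level: Completeness) -> bool:
    """True if a bundle's recorded completeness satisfies the verifier's minimum."""
    # Unknown categories fall back to the strictest named level (fail closed).
    required = MIN_COMPLETENESS.get(category, Completeness.COMPLETE)
    return level >= required
```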

15. Cold Storage and Tiered Economics

Evidence has different access patterns over its lifetime. A cyber insurance attestation produced yesterday might be queried multiple times per day. The same attestation two years from now might be accessed once per quarter for compliance reporting. Ten years from now, it might be accessed only if litigation arises. The Cachee evidence infrastructure supports three storage tiers that optimize cost without compromising verifiability.

15.1 Tier Definitions

Tier | Age | Access Latency | Storage Backend | Relative Cost
Hot | < 90 days | < 10 ms | In-process Cachee + SSD | 1.0x
Warm | 90 days – 2 years | < 100 ms | S3 Standard / Azure Blob Hot | 0.3x
Cold | > 2 years | < 12 hours (retrieval) | S3 Glacier / Azure Archive / on-prem tape | 0.02x

15.2 Tiering Mechanics

Bundles are automatically promoted and demoted based on age and access frequency. The CacheeLFU admission policy, which governs the hot tier's in-process cache, extends to the warm tier: bundles that are accessed frequently remain warm regardless of age. A hot-tier bundle that goes 90 days without access is demoted to warm. A warm-tier bundle that reaches two years without access is demoted to cold. Access at any tier resets the demotion clock.
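Below is a minimal sketch of the demotion rule, built on four assumptions:
- The clock measures time since a bundle's last access, or since its creation if it was never accessed.
- Any access returns the bundle to hot.
- "2 years" means 730 days.
- The CacheeLFU frequency signal is not modeled.

`storage_tier` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

HOT_IDLE_LIMIT = timedelta(days=90)
WARM_IDLE_LIMIT = timedelta(days=2 * 365)  # "2 years", approximated as 730 days


def storage_tier(last_access: datetime, now: datetime) -> str:
    """Pick a tier from idle time since last access (or creation).

    Assumes any access resets the clock, which promotes the bundle back to hot.
    Access frequency (CacheeLFU) is deliberately not modeled here.
    """
    idle = now - last_access
    if idle < HOT_IDLE_LIMIT:
        return "hot"
    if idle < WARM_IDLE_LIMIT:
        return "warm"
    return "cold"
```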

15.3 Export and Portability

Tenants can export bundles to their own storage at any time. The export produces standard CAB files — the same format used for witness delivery and third-party verification. A tenant who exports their bundles to on-premises storage retains full verification capability. The bundles are self-contained; they do not reference Cachee storage locations, Cachee APIs, or Cachee-specific identifiers.

15.4 Cost Reporting

Each tenant receives monthly cost reporting that breaks down storage by tier, attestation count, average bundle size (before and after compression), and cost per attestation. The reporting enables tenants to optimize their retention policies and budget for long-term evidence storage. At current cloud storage rates, the per-attestation storage cost is approximately $0.00000018/month (hot), $0.00000005/month (warm), and $0.000000004/month (cold). Evidence is not a cost problem. It is an architecture problem.
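The quoted per-attestation rates translate directly into a monthly bill. The sketch below uses `decimal.Decimal` so the tiny rates are not distorted by binary floating point. `monthly_cost` is a hypothetical helper, and the rates are the ones quoted above, not live cloud prices.

```python
from decimal import Decimal

# Per-attestation monthly storage cost (USD), as quoted in this subsection.
COST_PER_ATTESTATION_MONTH = {
    "hot": Decimal("0.00000018"),
    "warm": Decimal("0.00000005"),
    "cold": Decimal("0.000000004"),
}


def monthly_cost(counts: dict[str, int]) -> Decimal:
    """Total monthly storage cost for a mapping of tier -> attestation count."""
    return sum(
        (COST_PER_ATTESTATION_MONTH[tier] * n for tier, n in counts.items()),
        Decimal("0"),
    )
```

At these rates, one billion attestations cost about $180 per month held hot, $50 held warm and $4 held cold. That roughly matches the 1.0x / 0.3x / 0.02x ratios in the tier table.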

15.5 Deduplication

Content-addressed storage provides automatic deduplication across tenants (for shared attestations, such as federated bundles received from multiple sources) and within tenants (for re-attestations that reference the same content). Deduplication is transparent — each tenant sees their full attestation set, but the underlying storage holds each unique bundle exactly once. Deduplication savings are reported in the monthly cost breakdown.
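The toy content-addressed store below shows why deduplication falls out of the addressing scheme. The key derivation follows the G3 description in the gap matrix, SHA3-256(primitive || content_hash), assuming content_hash is itself SHA3-256 of the bundle content. `DedupStore` and its methods are hypothetical.

Because the key includes the primitive, this sketch only collapses byte-identical attestations, such as one federated bundle received from several sources. It does not model how re-attestations of the same content share storage.

```python
import hashlib
from collections import defaultdict


def storage_key(primitive: bytes, content: bytes) -> str:
    """SHA3-256(primitive || content_hash); content_hash assumed to be SHA3-256."""
    content_hash = hashlib.sha3_256(content).digest()
    return hashlib.sha3_256(primitive + content_hash).hexdigest()


class DedupStore:
    """Toy store: one physical copy per key, many tenant references."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}
        self.tenant_index: dict[str, set[str]] = defaultdict(set)
        self.logical_writes = 0

    def put(self, tenant: str, primitive: bytes, content: bytes) -> str:
        key = storage_key(primitive, content)
        self.logical_writes += 1
        # setdefault keeps the first copy, so identical bundles are stored once.
        self.blobs.setdefault(key, primitive + content)
        self.tenant_index[tenant].add(key)  # each tenant still sees its full set
        return key

    def deduplicated_writes(self) -> int:
        """Logical writes that did not create a new physical blob."""
        return self.logical_writes - len(self.blobs)
```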

Part IV: Regulator Key Architecture Zero-knowledge query over encrypted evidence

16. The Three Key Types

The Cachee evidence infrastructure introduces a three-key model that separates data ownership, regulatory query capability, and audit access into cryptographically distinct roles. Each key type grants specific capabilities and is constrained by cryptographic enforcement — not by access control lists, not by software policies, and not by trust in the platform operator. The constraints are mathematical.

Owner Key

Capability: Full access. Can encrypt and decrypt all data. Can create attestations. Can issue regulator keys and auditor keys. Can revoke issued keys. Can export bundles. Can configure witness delivery and federation.

Scope: Unrestricted within the tenant's data.

Distribution: Never shared. Held exclusively by the data owner (the organization that created the attestation). Stored in hardware security modules (HSMs) or equivalent secure key storage. The owner key is the root of authority for the tenant's evidence store.

Cryptographic basis: an ML-KEM-768 key pair for key encapsulation, used to establish the keys that encrypt and decrypt tenant data, plus the three H33-74 signing key pairs (ML-DSA-65, FALCON-512, SLH-DSA) for attestation creation.

Regulator Key

Capability: Scoped zero-knowledge query. Can trigger specific ZK-STARK proofs against encrypted data within the defined scope. Cannot decrypt data. Cannot see plaintext. Cannot create attestations. Cannot issue keys to other parties.

Scope: Defined at issuance by the data owner. Scope parameters include: which computation types can be queried, which date ranges, which data categories, and which proof types can be requested. The scope is cryptographically bound to the key — a regulator key issued for AML queries on 2026 transaction data cannot be used to query biometric records or 2025 data.

Distribution: Issued by the data owner to a specific regulator for a specific purpose. The issuance is attested under H33-74. The regulator key is revocable by the data owner at any time; revocation is also attested.

Cryptographic basis: A derived key that enables the holder to submit encrypted queries to the ZK-STARK proving system. The derived key is mathematically incapable of decrypting the underlying data — it can only trigger proof generation within its scope.
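Below is a minimal sketch of the scope semantics, using the AML example above. The field names and the canonical-JSON digest are assumptions. The digest is the kind of value a scope binding could sign over, and `permits` only illustrates which queries fall inside a scope. In the architecture described here that check is enforced cryptographically, not by a Python predicate.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import date


@dataclass(frozen=True)
class RegulatorScope:
    # Scope parameters listed in the text; the field names are hypothetical.
    computation_types: tuple[str, ...]
    data_categories: tuple[str, ...]
    proof_types: tuple[str, ...]
    start: date
    end: date  # inclusive

    def digest(self) -> str:
        """Canonical SHA3-256 digest of the scope (sorted keys, ISO dates)."""
        canon = json.dumps(asdict(self), sort_keys=True, default=str)
        return hashlib.sha3_256(canon.encode()).hexdigest()

    def permits(self, computation: str, category: str, proof_type: str, day: date) -> bool:
        """Illustrative membership test for a single query."""
        return (
            computation in self.computation_types
            and category in self.data_categories
            and proof_type in self.proof_types
            and self.start <= day <= self.end
        )
```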

Auditor Key

Capability: Read-only access to proof outputs and attestation records. Can verify that proofs were generated correctly. Can read the results of regulator queries (the yes/no answers with their proofs). Cannot trigger new proofs. Cannot decrypt data. Cannot modify any record.

Scope: Defined at issuance. Typically broader than a regulator key (an auditor may see all proof outputs across all computation types) but with strictly less capability (cannot trigger new queries).

Distribution: Issued by the data owner to auditors, compliance officers, or oversight bodies. Multiple auditor keys can be issued with different scopes. Issuance and revocation are attested.

Cryptographic basis: A read-only derived key that can verify ZK-STARK proofs and read attestation metadata but cannot interact with the proving system or the encrypted data store.
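The three descriptions above amount to a capability table. The sketch below models that table as a `Flag` enum with hypothetical capability names. The owner's "full access" is modeled as every capability. This captures the policy only: the document's claim is that these limits are enforced by the key material itself, which a lookup table cannot demonstrate.

```python
import functools
import operator
from enum import Flag, auto


class Cap(Flag):
    ENCRYPT_DECRYPT = auto()
    CREATE_ATTESTATION = auto()
    ISSUE_REVOKE_KEYS = auto()
    EXPORT_BUNDLES = auto()
    CONFIGURE_DELIVERY = auto()  # witness delivery and federation
    TRIGGER_PROOF = auto()
    READ_PROOF_OUTPUTS = auto()
    READ_ATTESTATIONS = auto()
    VERIFY_PROOFS = auto()


# Capability table transcribed from the owner, regulator and auditor descriptions.
KEY_CAPS = {
    "owner": functools.reduce(operator.or_, Cap),  # "full access": every capability
    "regulator": Cap.TRIGGER_PROOF,
    "auditor": Cap.READ_PROOF_OUTPUTS | Cap.READ_ATTESTATIONS | Cap.VERIFY_PROOFS,
}


def allowed(role: str, cap: Cap) -> bool:
    """True if the key type's capability set includes `cap`."""
    return cap in KEY_CAPS[role]
```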

16.1 Key Lifecycle

Every key lifecycle event — generation, issuance, scope definition, use, revocation — is recorded as an H33-74 attestation. The complete history of who held what key, with what scope, for what duration, is cryptographically chained and independently verifiable. This creates an unforgeable audit trail of regulatory access that satisfies both the regulator (who can prove they had authorized access) and the data owner (who can prove they maintained control over access grants).
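Below is a minimal sketch of what "cryptographically chained" can mean, assuming each lifecycle event embeds the SHA3-256 digest of the previous event. `KeyEvent`, `append` and `verify_chain` are hypothetical. A bare hash chain cannot protect its newest event. In the architecture described here each event is also an H33-74 attestation, and that signing step is omitted.

```python
import hashlib
import json
from dataclasses import dataclass

GENESIS_PREV = "0" * 64  # hypothetical predecessor value for the first event


@dataclass(frozen=True)
class KeyEvent:
    kind: str    # e.g. "generation", "issuance", "scope_definition", "use", "revocation"
    key_id: str
    detail: str
    prev: str    # hex SHA3-256 digest of the previous event

    def digest(self) -> str:
        body = json.dumps(
            {"kind": self.kind, "key_id": self.key_id, "detail": self.detail, "prev": self.prev},
            sort_keys=True,
        )
        return hashlib.sha3_256(body.encode()).hexdigest()


def append(chain: list, kind: str, key_id: str, detail: str) -> None:
    """Append an event linked to the digest of the current tip."""
    prev = chain[-1].digest() if chain else GENESIS_PREV
    chain.append(KeyEvent(kind, key_id, detail, prev))


def verify_chain(chain: list) -> bool:
    """Every event must point at the digest of the event before it."""
    expected = GENESIS_PREV
    for event in chain:
        if event.prev != expected:
            return False
        expected = event.digest()
    return True
```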

16.2 Revocation

A data owner can revoke any regulator key or auditor key at any time. Revocation takes effect immediately — the revoked key can no longer trigger proofs or access proof outputs. The revocation is attested and pushed to all federated instances. Proofs generated before revocation remain valid and verifiable; the revocation affects future capability, not past evidence.
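The sketch below shows the split between future capability and past evidence, assuming each proof carries a trustworthy (attested) generation time. `RevocationRegistry` is hypothetical and local. Propagation to federated instances is not modeled.

```python
from datetime import datetime, timezone


class RevocationRegistry:
    """Hypothetical registry mapping key_id -> revocation time."""

    def __init__(self) -> None:
        self._revoked_at: dict[str, datetime] = {}

    def revoke(self, key_id: str, at: datetime) -> None:
        # Keep the earliest revocation if the same key is revoked twice.
        current = self._revoked_at.get(key_id)
        if current is None or at < current:
            self._revoked_at[key_id] = at

    def may_use(self, key_id: str, now: datetime) -> bool:
        """Future capability: a revoked key can no longer trigger or read proofs."""
        revoked = self._revoked_at.get(key_id)
        return revoked is None or now < revoked

    def proof_remains_valid(self, proof_generated_at: datetime, key_id: str) -> bool:
        """Past evidence: a proof generated before revocation stays valid."""
        return self.may_use(key_id, proof_generated_at)
```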

17. Zero-Knowledge Query Capability

This is the breakthrough. A regulator holds a scoped key. The regulated entity holds encrypted data attested under H33-74. The regulator asks a question. The system produces a cryptographically verified answer. The regulator never sees the data. The answer is unforgeable. The underlying data never moves.

17.1 The Problem

Regulators have historically had two options when overseeing regulated entities. The first is trust: accept the regulated entity's self-reported compliance assertions at face value. This is efficient but unreliable — it is the model that fails during every financial crisis, every data breach, and every compliance scandal. The second is full disclosure: demand that the regulated entity hand over all relevant data for the regulator to inspect directly. This is reliable but creates massive privacy, security, and competitive risks. It also creates a data handling burden for the regulator that scales with the number of regulated entities.

Zero-knowledge query provides a third option: mathematical certainty without disclosure. The regulator gets a proven answer to a specific question. The proof is unforgeable — it is a ZK-STARK proof that the regulated entity cannot fabricate without actually having the data that satisfies the query. The underlying data remains encrypted, in place, under the data owner's control.

17.2 Query Execution Flow

  1. The regulator constructs a query within their key's scope. Example: "Is the aggregate AML exposure across all accounts below the regulatory threshold?"
  2. The query is encrypted under the regulator key and submitted to the Cachee instance holding the attested data.
  3. The Cachee instance executes the query against the encrypted data using the FHE computation engine. The computation operates on encrypted values — at no point is the data decrypted.
  4. The computation produces an encrypted result (in this case, a boolean: yes or no).
  5. A ZK-STARK proof is generated proving that the computation was executed correctly — that the result genuinely reflects the query applied to the attested data, and that no intermediate values were altered.
  6. The encrypted result is decrypted under the regulator key's scope to produce the plaintext answer (yes or no). The scope limitation ensures only the answer is revealed, not any intermediate values or the underlying data.
  7. The answer, the ZK-STARK proof, and the query metadata are packaged into a Cachee Archive Bundle and attested under H33-74.
  8. The bundle is delivered to the regulator (who now has a proven answer) and to the data owner (who now has an auditable record that the query was executed).
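The eight steps can be read as a fixed pipeline. The sketch below orchestrates it, with every cryptographic operation passed in as a hypothetical callable rather than implemented: FHE evaluation, ZK-STARK proving, scoped reveal and H33-74 attestation. Encrypting and submitting the query (step 2) is left out. What the sketch does show is the ordering: the scope check comes first, and the proof is generated before the answer is revealed.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class QueryResult:
    answer: bool
    proof: bytes
    query_metadata: dict


def run_regulator_query(
    query: dict,
    in_scope: Callable[[dict], bool],             # step 1: query must fit the key's scope
    evaluate_encrypted: Callable[[dict], bytes],  # steps 3-4: FHE evaluation -> encrypted result
    prove: Callable[[dict, bytes], bytes],        # step 5: ZK-STARK proof of correct evaluation
    reveal_answer: Callable[[bytes], bool],       # step 6: scoped reveal of the answer only
    attest: Callable[[QueryResult], bytes],       # step 7: package as a bundle, sign under H33-74
) -> bytes:
    """Run the query flow; all cryptography is delegated to injected stand-ins."""
    if not in_scope(query):
        raise PermissionError("query outside the regulator key's scope")
    encrypted_result = evaluate_encrypted(query)
    proof = prove(query, encrypted_result)
    answer = reveal_answer(encrypted_result)
    bundle = attest(QueryResult(answer, proof, {"query": query}))
    return bundle  # step 8: delivered to both the regulator and the data owner
```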

17.3 Query Examples

"Is AML exposure below threshold?"

The regulator asks whether a financial institution's aggregate anti-money-laundering exposure is below a specified regulatory threshold. The answer is yes or no, delivered with a ZK-STARK proof. The regulator never sees transaction data, account balances, customer identities, or any other financial record. The institution cannot fabricate a false "yes" because the proof is computed over the attested data — the same data whose H33-74 attestation the regulator can independently verify.

"Does this AI meet HATS conformance?"

The regulator asks whether an AI system meets the controls defined in the HATS conformance standard. The answer is a conformance score with proof. The regulator never sees model weights, training data, architecture details, or proprietary algorithms. The proof demonstrates that the conformance assessment was executed correctly over the attested system configuration.

"Was this drug manufactured within temperature compliance?"

The FDA asks whether a pharmaceutical batch was manufactured and stored within required temperature ranges throughout the supply chain. The answer is pass or fail with proof. The regulator never sees batch records, manufacturing schedules, supplier relationships, or production volumes. The proof is computed over temperature sensor attestations that were H33-74 signed at the time of measurement.

"Does this financial instrument carry adequate collateral?"

The regulator asks whether a specific financial instrument is backed by collateral that meets regulatory minimums. The answer is yes or no with proof. The regulator never sees portfolio composition, position sizes, counterparty identities, or trading strategies. The proof is computed over attested position data.

17.4 Proof Properties

The ZK-STARK proofs used in regulatory queries have specific properties that make them suitable for this application:

17.5 Attestation Recursion

Every regulator query execution produces its own H33-74 primitive. Every proof is stored in a Cachee Archive Bundle. Every bundle is content-addressed and immutable. This means the system produces evidence about evidence: the attestation of the original data is one layer; the attestation of the regulator query and its proof is a second layer; the attestation of the auditor's verification of the proof is a third layer. Each layer is independently verifiable. The recursion terminates at the H33-74 primitive — the atomic unit of evidence that is self-verifying given only NIST specifications.
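Below is a minimal sketch of the layering, assuming each bundle records the content address of the bundle it attests to. `Layer` and `walk_to_base` are hypothetical. Addresses are length-prefixed SHA3-256 digests, and the H33-74 signatures that would sit on each layer are omitted.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Layer:
    label: str
    payload: bytes
    parent: Optional[str]  # content address of the layer below; None at the base

    def address(self) -> str:
        h = hashlib.sha3_256()
        for part in (self.label.encode(), (self.parent or "").encode(), self.payload):
            h.update(len(part).to_bytes(8, "big"))  # length prefix keeps fields unambiguous
            h.update(part)
        return h.hexdigest()


def walk_to_base(store: dict, top: str) -> list:
    """Follow parent links from the top layer down, re-checking every address."""
    labels = []
    addr = top
    while addr is not None:
        layer = store[addr]
        if layer.address() != addr:
            raise ValueError(f"content address mismatch at {addr[:12]}")
        labels.append(layer.label)
        addr = layer.parent
    return labels
```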

"Regulators have always had two options: trust the regulated entity or demand full disclosure. This gives them a third option. Mathematical certainty without either."

18. Industry Applications

The regulator key architecture and zero-knowledge query capability apply across every industry where a regulator must oversee encrypted or confidential data. The following examples illustrate the pattern: a specific regulator, a specific question, and a specific answer — all without data disclosure.

18.1 Financial Regulation (Fed, OCC, FINRA, SEC)

Financial regulators require ongoing visibility into risk exposure, capital adequacy, and market conduct. Zero-knowledge query allows a bank examiner to verify capital ratios, stress test compliance, and AML thresholds without accessing customer transaction data. The SEC can verify that a broker-dealer's net capital computation is correct without seeing position-level detail. FINRA can verify trade reporting completeness without accessing trade records. Each query produces an attested proof that the examiner can present in enforcement proceedings as cryptographically verified evidence — stronger than self-reported compliance and obtained without the data handling risks of full disclosure.

18.2 Healthcare (FDA, HIPAA)

Healthcare regulators face an acute tension between oversight and patient privacy. HIPAA constrains data access while the FDA requires manufacturing compliance verification. Zero-knowledge query resolves this: the FDA can verify that a drug manufacturing facility maintained temperature, humidity, and contamination controls throughout a production run without accessing patient-linked batch records or proprietary manufacturing processes. HIPAA auditors can verify that access controls are correctly implemented without accessing the protected health information those controls protect. The proof demonstrates compliance; the data stays encrypted.

18.3 AI Governance (EU AI Act)

The EU AI Act requires conformity assessments for high-risk AI systems. These assessments currently demand disclosure of training data characteristics, model architecture, and performance metrics — information that constitutes core intellectual property. Zero-knowledge query allows a notified body to verify that a model's bias metrics fall within regulatory limits, that training data meets representativeness requirements, and that performance thresholds are satisfied, all without accessing the model, the training data, or the performance logs. The AI developer retains their intellectual property. The regulator gets mathematical proof of conformance. The attestation record provides evidence for ongoing post-market surveillance.

18.4 Insurance (HATS Cyber Insurance)

Cyber insurance underwriting requires assessment of the policyholder's security posture. Current practice relies on questionnaires (unreliable) or full security audits (expensive and invasive). Zero-knowledge query enables continuous, automated verification: the carrier's regulator key can query whether specific security controls are in place, whether patching cadence meets policy requirements, and whether incident response times fall within SLA. The policyholder's security architecture is never disclosed to the carrier. The proofs are attested and delivered to all parties through the witness delivery system (Section 9). Claims disputes become resolvable through cryptographic evidence rather than conflicting testimony.

18.5 Tax and Customs

Tax authorities and customs agencies require verification of declared values, origin claims, and duty calculations. Zero-knowledge query allows customs to verify that a declared value is consistent with the importer's purchase records without accessing the purchase records. Tax authorities can verify that a deduction claim is supported by underlying documentation without accessing the documentation. Transfer pricing compliance can be verified across jurisdictions without disclosing intercompany pricing to any single jurisdiction. Each verification produces an attested proof that can be presented in a tax court or trade tribunal as cryptographic evidence.

Part V: Implementation Gap analysis and delivery roadmap

19. Gap Closure Matrix

The following table maps the seven architectural gaps between Cachee's current shipping state (a post-quantum cache engine with H33-74 attestation) and the full evidence infrastructure described in this paper. Each gap corresponds to one or more sections of this specification and is classified as built (shipping in production), in progress (under active development), or planned (specified but not yet in development).

Gap | Description | Sections | Status
G1: Archive Bundle Format | Self-contained CAB file format with three-family signatures, public keys, and metadata. Verifiable without Cachee. | 3, 4 | Built
G2: Verification CLI | Standalone cachee-verify tool. Open source, no network, no Cachee dependency. Published to crates.io. | 7, 8 | In Progress
G3: Content-Addressed Storage | Deterministic storage keys derived from SHA3-256(primitive || content_hash). Immutable append-only write path. | 4, 5 | In Progress
G4: Witness Delivery | Synchronous push of CAB files to designated recipients at attestation time. Attested delivery receipts. | 9 | Planned
G5: Federation | DHT-routed bundle synchronization across independent Cachee instances. Merkle-based reconciliation. | 10 | Planned
G6: Family Lifecycle | Signature family status registry. Deprecation-aware verification. Two-of-three sufficiency. Re-attestation workflow. | 11, 12 | Planned
G7: Regulator Keys + ZK Query | Three key types (owner, regulator, auditor). Scoped ZK-STARK query over encrypted data. Attested proofs. | 16, 17, 18 | Planned

19.1 Built Components

The H33-74 primitive is shipping in production at 1,667,875 attestations per second on a single Graviton4 node. The three-family signature generation (ML-DSA-65 + FALCON-512 + SLH-DSA-SHA2-128f) is integrated into the Cachee attestation pipeline. The Cachee Archive Bundle binary format (Section 3) is implemented and producing valid CAB files. Content-addressed storage key derivation is implemented. The ZK-STARK proving system is operational for biometric authentication and FHE computation verification. These components form the foundation on which the remaining gaps are closed.

19.2 Dependencies

G2 (Verification CLI) depends on G1 (Archive Bundle Format), which is complete. G4 (Witness Delivery) and G5 (Federation) depend on G3 (Content-Addressed Storage). G6 (Family Lifecycle) is independent and can proceed in parallel. G7 (Regulator Keys) depends on the ZK-STARK proving system (built) and the FHE computation engine (built), but requires new key derivation and scope-binding cryptography.
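The dependencies stated above form a small graph, and any valid delivery order must be a topological order of it. The sketch below uses Python's standard `graphlib`, with the dependency map transcribed from this subsection. G3 is shown without prerequisites because none are stated, and the already-built ZK-STARK and FHE components are left out.

```python
from graphlib import TopologicalSorter

# Gap -> prerequisite gaps, as stated in 19.2 (built components omitted).
DEPENDS_ON = {
    "G1": set(),
    "G2": {"G1"},
    "G3": set(),
    "G4": {"G3"},
    "G5": {"G3"},
    "G6": set(),
    "G7": set(),
}


def delivery_order(deps: dict) -> list:
    """One valid ordering in which every gap follows its prerequisites."""
    return list(TopologicalSorter(deps).static_order())
```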

20. Implementation Roadmap

The evidence infrastructure is delivered in four phases, each building on the previous and each producing independently useful capability.

Phase 1 — Q2 2026: Archive Bundle + Verification CLI

Finalize the CAB v1 binary format specification. Publish cachee-verify to crates.io as an open-source Rust crate. Publish the Public Verification Specification (Section 8) as a versioned document. Implement content-addressed storage with the immutable append-only write path. Deliverable: any Cachee customer can export a CAB file and hand it to a third party for independent verification.

Phase 2 — Q3 2026: Storage + Cold Export + Lifecycle

Implement tiered storage (hot, warm, cold) with automatic demotion and promotion. Implement cold export to S3, Azure Blob, and on-premises storage. Implement the Signature Family Status Registry and the deprecation-aware verification path. Implement per-tenant HATS tier enforcement at write time. Deliverable: production-grade evidence storage with regulatory-compliant retention and lifecycle management.

Phase 3 — Q4 2026: Federation + Witness Delivery + Family Rotation

Implement the D-Cachee federation protocol (FIG. 23 of the pending patent application) with DHT-based bundle routing. Implement the witness delivery API with attested delivery receipts. Implement the re-attestation workflow for signature family rotation. Deliverable: multi-party evidence infrastructure where insurers, policyholders, reinsurers, and regulators each hold independently verifiable copies of attested records.

Phase 4 — Q1 2027: Regulator Keys + ZK Query + Auditor Access

Implement the three-key model (owner, regulator, auditor) with cryptographic scope binding. Implement zero-knowledge query execution over FHE-encrypted attested data. Implement the ZK-STARK proof generation and packaging pipeline for regulatory queries. Implement auditor key access to proof outputs. Deliverable: regulators can ask verifiable questions about encrypted data and receive cryptographically proven answers without data disclosure. Patent pending.

20.1 Patent Coverage

Patent pending. The regulator key architecture and zero-knowledge query capability against FHE-encrypted data with scope-limited key issuance represent novel claims not yet present in any published system. The H33-74 primitive, the Cachee Archive Bundle format, the content-addressed evidence storage model, the witness delivery mechanism, the deprecation-aware verification path, and the cross-instance federation protocol are covered under the pending patent. Claims 124-125 specifically cover batched Merkle response attestation used in the federation synchronization protocol.

20.2 Conclusion

Cachee began as a cache. It became a post-quantum cache. It is becoming a cryptographic evidence infrastructure. The H33-74 primitive (58 bytes, three independent hardness assumptions) cannot be forged without simultaneously breaking module-lattice problems (ML-DSA), NTRU-lattice problems (FALCON), and hash-function security (SLH-DSA). It is the atomic unit of evidence for the post-quantum era. The Cachee Archive Bundle makes that evidence portable, permanent, and independently verifiable. The regulator key architecture makes it queryable without disclosure. Every layer produces attestations about the layer below, recursively, with no trust assumptions beyond mathematics.

The infrastructure is being built. The primitives are shipping. The specification is this document.

* * *