BFV · CKKS · Post-Quantum Signed · 10,000x Slowdown Eliminated

FHE Caching

Fully homomorphic encryption is 10,000x slower than plaintext.
Cache the decrypted result. Serve it at 31 nanoseconds.
The layer that eliminates redundant homomorphic computation.

10,000x
FHE Slowdown vs Plaintext
31ns
Cached Result
100%
Accuracy (Same Computation)
3
PQ Families Signed
Definition

FHE caching stores the results of fully homomorphic encryption computations. FHE operations are 1,000 to 10,000x slower than their plaintext equivalents. A single encrypted inner product on BFV-128 takes roughly 1 millisecond. The decrypted result of that computation, once cached, can be retrieved in 31 nanoseconds. A cached result eliminates the recomputation entirely -- no key generation, no encryption, no homomorphic evaluation, no decryption. The same answer, from cache, in nanoseconds.

The Gap

100 milliseconds vs 31 nanoseconds

The same computation result. One path re-encrypts, re-evaluates, and re-decrypts. The other remembers.

FHE Computation (encrypt + evaluate + decrypt) 100,000,000 ns
Re-compute every time
Cached Result (hash lookup + pointer dereference) 31 ns
3,225,806x
No encryption. No homomorphic evaluation. No decryption. Just the answer, from cache.
Critical Distinction

What Gets Cached: The Output, Not the Ciphertext

This is the most important architectural decision in FHE caching. You do not cache the ciphertext. You cache the decrypted result.

Ciphertexts are enormous. A single BFV-128 ciphertext with N=4096 and one 56-bit modulus is 65,536 bytes. BFV-256 with N=32768 and multiple moduli can exceed 2 megabytes per ciphertext. Caching ciphertexts would consume gigabytes of memory for even modest workloads -- and you would still need to decrypt them on every access.

The decrypted result -- the actual answer -- is typically a few bytes to a few kilobytes. A biometric match verdict is 1 byte. An aggregated statistic is 8 bytes. An ML inference result is a few hundred bytes. That is what gets cached.

Do Not Cache: Ciphertexts

65KB+
Per ciphertext (BFV-128, N=4096)

Too large. Still requires decryption on every read. 2 × N × M × 8 bytes per ciphertext (two ring elements of N 8-byte coefficients, where M is the number of RNS moduli). A 10,000-entry cache would consume 650MB to 20GB.

Cache This: Decrypted Results

1-512B
Per cached result

Tiny. Ready to use. No decryption needed on read. A 10,000-entry cache fits in 5MB. The result is signed by three PQ families and bound to its computation fingerprint.
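The sizing arithmetic behind the two cards above can be sketched directly. This is a back-of-envelope calculation using the figures quoted on this page (a BFV/CKKS ciphertext is two ring elements of N coefficients, 8 bytes per RNS modulus); the helper names are illustrative, not part of any Cachee API.

```python
def ciphertext_bytes(n: int, moduli: int) -> int:
    """Two ring elements of n coefficients, 8 bytes per RNS modulus."""
    return 2 * n * moduli * 8

def cache_footprint_mb(entries: int, entry_bytes: int) -> float:
    """Total cache size in megabytes for a given entry size."""
    return entries * entry_bytes / 1e6

bfv128 = ciphertext_bytes(4096, 1)              # 65,536 bytes per ciphertext
print(cache_footprint_mb(10_000, bfv128))       # ~655 MB of cached ciphertexts
print(cache_footprint_mb(10_000, 512))          # ~5 MB of cached decrypted results
```

Same 10,000 entries, two orders of magnitude apart: that gap is why the output, not the ciphertext, goes in the cache.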

FHE Pipeline Deep Dive

Five Stages. All Five Eliminated by Caching.

A typical FHE computation has five stages. On a cache hit, none of them execute.

Key generation: 15 ms (15%)
Encryption (NTT + sampling): 20 ms (20%)
Homomorphic evaluation: 45 ms (45%)
Key switching / relinearization: 10 ms (10%)
Decryption (INTT + rounding): 10 ms (10%)
Total FHE Computation: 100 ms
CACHED: 31 ns

The green sliver is 0.000031% of the red bars above. That is the cached path.

Identity

The Computation Fingerprint

The cache key is not a simple string. It is a computation fingerprint -- a cryptographic digest of everything that determines the FHE computation result. If any input changes, the fingerprint changes, and the cache misses. If everything is the same, the fingerprint matches, and the cached result is returned.

fingerprint = SHA3-256(
    input_ciphertext_hash    // hash of the encrypted input(s)
 || fhe_scheme               // BFV or CKKS
 || poly_degree              // N (ring dimension: 4096, 8192, 32768)
 || modulus_chain            // Q (ciphertext modulus chain)
 || plaintext_modulus        // t (65537 for BFV, N/A for CKKS)
 || computation_function     // hash of the function applied
 || engine_version           // Cachee engine semver
)

The fingerprint binds the cached result to the exact computation that produced it. Two different input ciphertexts produce different fingerprints. The same inputs with different FHE parameters (say, N=4096 vs N=8192) produce different fingerprints. The same everything with a different engine version produces a different fingerprint -- so a buggy engine's results are never served after an upgrade.
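A minimal sketch of how such a fingerprint could be assembled with a standard SHA3-256 implementation. The field names follow the listing above; the delimiter scheme and function signature are assumptions for illustration, not the Cachee wire format.

```python
import hashlib

def fhe_fingerprint(input_ciphertext_hash: bytes, fhe_scheme: str,
                    poly_degree: int, modulus_chain: str,
                    plaintext_modulus: str, computation_function: str,
                    engine_version: str) -> str:
    """Digest every parameter that determines the FHE result.
    Any change to any field changes the cache key."""
    h = hashlib.sha3_256()
    h.update(input_ciphertext_hash)
    for field in (fhe_scheme, str(poly_degree), modulus_chain,
                  plaintext_modulus, computation_function, engine_version):
        h.update(b"||" + field.encode())  # delimiter avoids field-boundary collisions
    return h.hexdigest()

input_hash = hashlib.sha3_256(b"encrypted biometric template").digest()
fp_a = fhe_fingerprint(input_hash, "BFV", 4096, "Q=56bit", "t=65537",
                       "inner_product", "v1.2.3")
fp_b = fhe_fingerprint(input_hash, "BFV", 8192, "Q=56bit", "t=65537",
                       "inner_product", "v1.2.3")  # only N differs
assert fp_a != fp_b  # different parameters, different cache key
```

Because the digest is deterministic, anyone holding the inputs and parameters can recompute it and check it against a cache entry.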

Why Size Matters

FHE Parameter Impact: Why Cache the Output

Ciphertext size grows with security level and ring dimension. The decrypted result stays small. This is why you cache the output, not the ciphertext.

Scheme | Ring Dim (N) | Security | Ciphertext Size | Typical Computation | Decrypted Result | Cache Impact
BFV-128 | 4,096 | 128-bit | 65 KB (1 modulus) | Biometric match | 1 byte (bool) | 65,000x smaller
BFV-256 | 32,768 | 256-bit | 2+ MB (multi-moduli) | Private database query | ~200 bytes (row) | 10,000x smaller
CKKS | 8,192 | 128-bit | 130 KB (1 modulus) | ML inference | ~512 bytes (logits) | 260x smaller
CKKS (deep) | 32,768 | 128-bit | 4+ MB (multi-level) | Neural network (10+ layers) | ~1 KB (output vector) | 4,000x smaller

The ratio is always lopsided. FHE ciphertexts are measured in kilobytes to megabytes. Decrypted results are measured in bytes. Caching the output instead of the ciphertext is not an optimization -- it is the only viable architecture.

Architecture

Before and After

Without FHE Caching
Plaintext input arrives
Generate keys (if needed)
Encrypt input (NTT + CBD sampling)
Homomorphic evaluation (45ms+)
Decrypt result (INTT + rounding)
Return plaintext result
↻ Repeat on every request. Same input. Same result. Same 100ms cost.
With FHE Caching
Input arrives
Compute fingerprint (SHA3-256)
Check cache (31ns)
Hit? Return instantly. Miss? Full FHE pipeline, then cache result.
Encrypt once. Evaluate once. Cache forever.

After the first FHE computation, every subsequent identical query is a 31-nanosecond in-process cache lookup. No encryption. No homomorphic evaluation. No decryption. The result is a signed, fingerprinted fact.
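The before/after flow above reduces to a small wrapper: fingerprint, check, and only run the pipeline on a miss. This is a minimal sketch of the integration pattern; the FHE pipeline is stubbed out, and the dict stands in for the in-process L1 cache.

```python
import hashlib

cache: dict[str, bytes] = {}  # stand-in for the in-process L1 cache

def run_fhe_pipeline(plaintext_input: bytes) -> bytes:
    """Stub for encrypt -> homomorphic evaluate -> decrypt (~100 ms)."""
    return b"verified:true"  # placeholder decrypted result

def cached_fhe(plaintext_input: bytes, params: str, fn: str, version: str) -> bytes:
    # 1. Compute the fingerprint from inputs + parameters (SHA3-256).
    fp = hashlib.sha3_256(plaintext_input + params.encode()
                          + fn.encode() + version.encode()).hexdigest()
    # 2. Check the cache: a hit skips all five FHE stages.
    if fp in cache:
        return cache[fp]
    # 3. Miss: run the full pipeline once, then cache the decrypted result.
    result = run_fhe_pipeline(plaintext_input)
    cache[fp] = result
    return result

r1 = cached_fhe(b"template-128d", "BFV,N=4096", "inner_product", "v1.2.3")  # miss
r2 = cached_fhe(b"template-128d", "BFV,N=4096", "inner_product", "v1.2.3")  # hit
```

The pipeline itself is untouched; the cache check wraps around it, which is why this drops into existing FHE deployments without changing the cryptography.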

Trust Model

How Do You Trust a Cached FHE Result?

The computation fingerprint proves the cached result came from a specific FHE computation with specific parameters. The signatures prove it was not tampered with.

🔒
ML-DSA-65
Lattice-based (MLWE). NIST standard. Successor to Dilithium. 3,309-byte signatures.
🔐
FALCON-512
NTRU lattice-based. Compact signatures (666 bytes). Distinct mathematical basis from ML-DSA.
🔑
SLH-DSA
Stateless hash-based. No lattice assumptions. 17,088-byte signatures. Minimal attack surface.
Three independent hardness assumptions
  • The fingerprint is deterministic. Anyone can recompute it from the inputs and parameters. If the fingerprint matches, the cache entry was produced by exactly this computation.
  • The result is signed. Three independent post-quantum signature families attest the cached result. Forging a cached result requires breaking all three simultaneously.
  • The result is independently verifiable. The cachee-verify tool validates the H33-74 receipt against the signatures and fingerprint with no network call, no Cachee account, and no trust in any third party.

You don't trust the cache. You verify the receipt once, then trust the math.

Applications

Where FHE Caching Applies

🧠
Private ML Inference
Run a neural network on encrypted patient data. The inference takes 500ms. The same patient's re-query hits cache at 31ns. No model or data exposed.
🗃
Encrypted Database Queries
Query an encrypted database without decrypting it. The homomorphic query takes 200ms. Repeated queries for the same predicate are served from cache.
📊
Privacy-Preserving Analytics
Compute aggregations over encrypted data -- sum, average, count -- without seeing individual records. Cache the aggregated result for dashboard refreshes.
👤
Biometric Matching
Compare an encrypted biometric template against encrypted enrolled templates. The FHE inner product takes 1ms per comparison. Cache the match result permanently.
🛡
Confidential Computing
Compute on encrypted data from multiple parties without a trusted third party. Cache the joint computation result so re-runs never re-encrypt.
💰
Encrypted Financial Analysis
Risk scoring, credit checks, fraud detection on encrypted financial records. Same client, same model, same result -- served from cache on repeat queries.
Live Demo

FHE Cache Flow

cachee-fhe-demo
[1] Input arrives: biometric template, 128 dimensions
[2] Compute fingerprint: SHA3-256(input_hash || BFV || N=4096 || Q=56bit || t=65537 || fn=inner_product || v1.2.3)
[3] Cache lookup: MISS
[4] FHE pipeline: encrypt 20ms + evaluate 45ms + decrypt 10ms = 75ms total
[5] Cache result: signed ML-DSA-65 + FALCON-512 + SLH-DSA
[6] H33-74 receipt: 58 bytes stored
 
[7] Same input arrives again
[8] Cache lookup: HIT -- 31ns
    2,419,354x faster. Zero FHE recomputation.

Run it yourself: brew install cachee && cachee-fhe-demo

The Economics

The Cost of Not Caching FHE

FHE is computationally expensive by design. The security comes from the hardness of lattice problems, and evaluating circuits over encrypted data requires polynomial arithmetic in large rings. This cost is unavoidable on the first computation. It is entirely avoidable on every subsequent identical computation.

Workload | FHE Scheme | Queries/Day | CPU Cost Without Cache | CPU Cost With Cache
Biometric Auth | BFV-128 | 1M | 27.8 CPU-hours | 0.15 CPU-hours
Private ML | CKKS | 100K | 13.9 CPU-hours | 0.008 CPU-hours
Encrypted DB | BFV-256 | 500K | 27.8 CPU-hours | 0.04 CPU-hours
Analytics | CKKS | 50K | 6.9 CPU-hours | 0.004 CPU-hours

Assumes a 95% cache hit rate (conservative for stable workloads). At a 99% hit rate, the residual compute cost drops another 5x. The cache itself consumes negligible memory -- decrypted results are bytes, not kilobytes.
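The without-cache column follows directly from queries per day times per-query FHE latency; the with-cache column depends on the assumed hit rate, since only misses pay the FHE cost. A quick sanity check (latencies are the per-query figures used elsewhere on this page):

```python
def cpu_hours(queries_per_day: int, fhe_ms: float) -> float:
    """Daily CPU time spent on FHE at a given per-query latency."""
    return queries_per_day * fhe_ms / 1000 / 3600

def cached_cpu_hours(queries_per_day: int, fhe_ms: float, hit_rate: float) -> float:
    """With a cache, only the miss fraction pays the FHE cost."""
    return cpu_hours(queries_per_day, fhe_ms) * (1 - hit_rate)

print(round(cpu_hours(1_000_000, 100), 1))  # biometric auth at 100 ms: 27.8
print(round(cpu_hours(100_000, 500), 1))    # private ML at 500 ms:     13.9
```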

Scheme Comparison

BFV vs CKKS: What Changes

BFV (Exact Arithmetic)

Exact
Integer computations, perfect caching

BFV operates on integers modulo a plaintext modulus t. The decrypted result is exact -- the same inputs always produce the same output. FHE caching for BFV has 100% correctness: the cached result is bit-for-bit identical to a fresh computation. Ideal for biometric matching, encrypted search, and integer predicates.

CKKS (Approximate Arithmetic)

~2⁻⁴⁰
Floating-point, precision-bounded caching

CKKS operates on approximate real numbers. Decrypted results have a precision bound (typically 40+ bits of accuracy). The fingerprint includes the precision level, and the cached result is valid within that precision. For ML inference and analytics, this precision is far beyond what the application needs.
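Precision-bounded caching means a cached CKKS value is accepted if it agrees with a fresh decryption within the scheme's error bound. A minimal sketch, assuming a 40-bit precision level (the tolerance constant and comparison rule here are illustrative):

```python
import math

PRECISION_BITS = 40                 # assumed CKKS accuracy from the fingerprint
TOL = 2.0 ** -PRECISION_BITS        # absolute error bound, ~9.1e-13

def ckks_results_match(cached: float, fresh: float) -> bool:
    """A cached CKKS result is valid if it agrees with a fresh
    decryption within the precision bound recorded at cache time."""
    return math.isclose(cached, fresh, rel_tol=0.0, abs_tol=TOL)

assert ckks_results_match(0.7312894, 0.7312894 + 2e-13)  # inside the bound
assert not ckks_results_match(0.7312894, 0.7312994)      # real drift: recompute
```

For BFV no such tolerance is needed: the decrypted integers are bit-for-bit identical, so a straight equality check suffices.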

Security

What the Fingerprint Prevents

Each field in the computation fingerprint exists to prevent a specific class of cache confusion attack. Remove any field, and the cache becomes unsound.

Without input_hash

Different encrypted inputs could share a cache entry. Patient A's diagnosis could be returned for Patient B's query.

Without computation_hash

Different functions could be confused. A "sum" result could be returned for an "average" query on the same data.

Without parameter_hash

Results from different security levels could be mixed. A BFV-128 result (less precise) could be served for a BFV-256 query.

Without engine_version

A result from a buggy prior version could be served after an upgrade. Version pinning ensures old results expire with the old engine.

Without fhe_scheme

A BFV exact integer result could be returned for a CKKS approximate query, or vice versa. Different schemes produce semantically different results.

Without signatures

A cache poisoning attack could insert a forged result. With three PQ signature families, forgery requires breaking all three simultaneously.

Install

Get Started

brew tap h33ai-postquantum/tap && brew install cachee
cachee init && cachee start

# Cache an FHE computation result
SET fhe:biometric_match_user42 "verified:true" FP <fingerprint_hex>

# Retrieve at 31ns -- no FHE recomputation
GETVERIFIED fhe:biometric_match_user42

# Verify the cached result independently
cachee-verify fhe:biometric_match_user42

140+ Redis-compatible commands. Drop-in for existing infrastructure. The FHE pipeline does not change -- you add a cache check before encryption and a cache write after decryption.

Encrypt once. Evaluate once. Cache the result. Serve it forever at 31ns.

Install Cachee Computation Caching

Deep Dives

Knowledge Base

Explore Verifiable Computation Infrastructure

Every page in the Cachee knowledge base. Proven computation, not cached data.

Post-Quantum Caching
The category definition. Run computation once, serve forever.
ZK Proof Caching
Cache STARK and SNARK verification. 294x speedup.
Computation Fingerprinting
Identity for results. Provenance, not just output.
Cache Attestation
Signed cache entries. Three PQ families per SET.
PQ Key Exchange Caching
ML-KEM at 31ns. Session tickets for post-quantum TLS.
Proof Reuse
Verify once, serve forever. Architecture for verified results.
Cache Bottleneck
Why your cache is slower than your compute.
Redis vs In-Process L1
31ns vs 1ms. The network hop you do not need.
PQ Key Size Reference
Every post-quantum key, ciphertext, and signature size.