Architecture for the NIST Post-Quantum Transition
Abstract. NIST has finalized three post-quantum cryptographic standards: FIPS 203 (ML-KEM), FIPS 204 (ML-DSA), and FIPS 205 (SLH-DSA). The key and signature sizes produced by these algorithms are 10-779x larger than the classical algorithms they replace. This paper examines the impact of this size increase on caching infrastructure, defines the architectural requirements for a post-quantum cache, presents production benchmark results, and provides a migration playbook for infrastructure teams preparing for the transition mandated by CNSA 2.0.
Key finding: Network-bound caches (Redis, Memcached, ElastiCache) add latency that grows with value size, and post-quantum values are 47-779x larger than the classical values they replace. In-process caches deliver constant latency regardless of value size (31 nanoseconds measured), making them the only architecture that survives the PQ transition without performance degradation.
In August 2024, the National Institute of Standards and Technology published three Federal Information Processing Standards for post-quantum cryptography. These standards define the algorithms that will replace the classical cryptographic primitives — RSA, ECDH, ECDSA, Ed25519 — that underpin virtually all secure communication on the internet.
FIPS 203: ML-KEM (Module-Lattice-Based Key Encapsulation Mechanism). Formerly known as CRYSTALS-Kyber. Replaces ECDH and RSA for key exchange. Used in TLS handshakes, VPN establishment, and encrypted messaging. Three parameter sets: ML-KEM-512 (NIST Level 1), ML-KEM-768 (Level 3), ML-KEM-1024 (Level 5).
FIPS 204: ML-DSA (Module-Lattice-Based Digital Signature Algorithm). Formerly known as CRYSTALS-Dilithium. Replaces ECDSA and RSA for digital signatures. Used in JWTs, X.509 certificates, code signing, and API authentication. Three parameter sets: ML-DSA-44 (Level 2), ML-DSA-65 (Level 3), ML-DSA-87 (Level 5).
FIPS 205: SLH-DSA (Stateless Hash-Based Digital Signature Algorithm). Formerly known as SPHINCS+. A conservative alternative to ML-DSA whose security relies solely on hash function properties rather than lattice assumptions. Produces the largest signatures of any standardized PQ algorithm. Six parameter sets across three security levels, each with a "small" (s) and "fast" (f) variant.
A fourth algorithm, FN-DSA (FALCON), is slated for standardization as FIPS 206. FALCON produces the most compact PQ signatures of the NIST selections (690 bytes at the FALCON-512 parameter set), making it the preferred choice for constrained environments.
These standards are not optional for US federal systems. CNSA 2.0 (the NSA's Commercial National Security Algorithm Suite 2.0, announced in 2022) establishes a mandatory transition timeline beginning in 2025 and completing by 2035. Commercial organizations that sell to government, operate in regulated industries, or handle federal data will be subject to the same requirements through procurement mandates, compliance frameworks, and supply chain dependencies.
The fundamental challenge of the post-quantum transition for caching infrastructure is size. Every PQ algorithm produces keys, signatures, or ciphertexts that are substantially larger than their classical equivalents.
| Algorithm | Public Key | Ciphertext | vs ECDH (32B) |
|---|---|---|---|
| X25519 (classical) | 32 B | 32 B | baseline |
| ML-KEM-512 | 800 B | 768 B | 25x |
| ML-KEM-768 | 1,184 B | 1,088 B | 37x |
| ML-KEM-1024 | 1,568 B | 1,568 B | 49x |
| Algorithm | Public Key | Signature | vs Ed25519 (64B) |
|---|---|---|---|
| Ed25519 (classical) | 32 B | 64 B | baseline |
| ML-DSA-44 | 1,312 B | 2,420 B | 38x |
| ML-DSA-65 | 1,952 B | 3,309 B | 52x |
| ML-DSA-87 | 2,592 B | 4,627 B | 72x |
| FALCON-512 | 897 B | 690 B | 11x |
| Algorithm | Public Key | Signature | vs Ed25519 (64B) |
|---|---|---|---|
| SLH-DSA-SHA2-128f | 32 B | 17,088 B | 267x |
| SLH-DSA-SHA2-192f | 48 B | 35,664 B | 557x |
| SLH-DSA-SHA2-256f | 64 B | 49,856 B | 779x |
Production systems rarely use a single algorithm in isolation. A TLS session combines ML-KEM for key exchange with ML-DSA for certificate authentication. A JWT carries an ML-DSA signature over a payload that may include ML-KEM-derived session material. The sizes compound.
| Scenario | Classical (bytes) | Post-Quantum (bytes) | Increase |
|---|---|---|---|
| TLS session token (KEM + sig) | 96 | 4,493 | 47x |
| JWT with PQ signature | 256 | 3,565 | 14x |
| Certificate chain (3 certs) | 2,048 | 12,927 | 6x |
| Certificate chain with SLH-DSA root | 2,048 | 65,471 | 32x |
| API credential (KEM + sig + metadata) | 512 | 5,861 | 11x |
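The compound sizes in the table can be reproduced from the per-algorithm tables. A minimal back-of-envelope sketch, assuming the session-token row is composed of the classical material (96 B) plus an ML-KEM-768 ciphertext and an ML-DSA-65 signature:

```python
# Back-of-envelope reproduction of the "TLS session token" row above,
# assuming the token carries the classical material alongside an
# ML-KEM-768 ciphertext and an ML-DSA-65 signature. Sizes are taken
# from the per-algorithm tables in this section.

X25519_SHARE = 32
ED25519_SIG = 64
ML_KEM_768_CT = 1_088
ML_DSA_65_SIG = 3_309

classical_token = X25519_SHARE + ED25519_SIG                # 96 B
pq_token = classical_token + ML_KEM_768_CT + ML_DSA_65_SIG  # 4,493 B

print(pq_token)                           # 4493
print(round(pq_token / classical_token))  # 47, the multiplier in the table
```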
Session stores are the highest-frequency cache in most web applications. Every authenticated request requires a session lookup. At classical key sizes, a session store holding 1 million active sessions consumes approximately 96 MB of cache memory for cryptographic material (96 bytes per session). After the PQ transition with ML-KEM-768 + ML-DSA-65, the same 1 million sessions require 4.49 GB. This is a 47x increase in memory consumption for the same number of sessions with no change in application logic.
The latency impact is equally significant. A Redis GET for a 96-byte value takes approximately 0.3ms (same-AZ, TCP round-trip dominated). A Redis GET for a 4,493-byte value takes approximately 0.55ms. The difference — 0.25ms per lookup — appears small in isolation. But a request that validates a session, checks a JWT, queries a rate limiter, and retrieves feature flags makes four cache lookups. The cumulative increase from 1.2ms to 2.2ms represents an 83% growth in per-request cache latency.
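The per-request arithmetic above can be reproduced directly, using the measured Redis medians quoted in the text (0.30 ms for a 96 B value, 0.55 ms for a 4,493 B PQ session token):

```python
# Four network-cache lookups per request (session, JWT, rate limit,
# feature flags), before and after the PQ transition. Latencies are the
# Redis P50 figures quoted in the text.

LOOKUPS_PER_REQUEST = 4
CLASSICAL_GET_MS = 0.30   # 96-byte value
PQ_GET_MS = 0.55          # 4,493-byte PQ session token

classical_total = LOOKUPS_PER_REQUEST * CLASSICAL_GET_MS  # 1.2 ms per request
pq_total = LOOKUPS_PER_REQUEST * PQ_GET_MS                # 2.2 ms per request
growth = (pq_total - classical_total) / classical_total   # ~0.83

print(f"{classical_total:.1f} ms -> {pq_total:.1f} ms (+{growth:.0%})")
```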
API gateways cache JWT issuer public keys and validated token results. The key storage impact is modest: 20 ML-DSA-65 issuer keys require 39 KB versus 640 bytes for Ed25519 keys. The token cache impact is severe: 100K cached tokens with ML-DSA-65 signatures consume 331 MB of signature data alone. Organizations that cache validated tokens for deduplication or replay detection face a 52x increase in per-token cache footprint.
TLS terminators (Nginx, HAProxy, cloud load balancers) cache session tickets for 0-RTT resumption. A session ticket with X25519 key material is approximately 256 bytes. With ML-KEM-768, the ticket grows to 1,344+ bytes. At 500K concurrent sessions, ticket cache memory grows from 128 MB to 672 MB. At ML-KEM-1024 with ML-DSA certificate signatures, the same cache exceeds 2 GB.
OCSP stapling caches and certificate chain caches store signed certificate data. With ML-DSA-65 signatures on each certificate (3,309 bytes per signature, 3 signatures per chain), a single chain grows from approximately 2 KB to 12-15 KB. With an SLH-DSA root certificate — a common conservative choice for long-lived root keys — a single chain can exceed 65 KB.
Microservices architectures cache inter-service credentials. A mesh of 50 services, each caching credentials for the other 49, creates 2,450 cached credential pairs. At classical sizes, this is negligible. At PQ sizes with ML-KEM + ML-DSA, each credential pair carries approximately 5,861 bytes. The mesh consumes 14.4 MB — still manageable, but now a measurable allocation rather than a rounding error.
We identify four architectural requirements that distinguish a post-quantum-capable cache from a classical cache.
A cache lookup for a 4,493-byte PQ session token must complete at the same latency as a lookup for a 96-byte classical token. Any architecture where latency scales with value size will degrade proportionally to the PQ size increase. Network-bound caches fail this requirement because serialization, transfer, and deserialization all scale linearly with value size. In-process caches satisfy it because access is a hash table lookup and pointer dereference — operations whose cost is determined by the CPU memory hierarchy, not by the value size.
A 49 KB SLH-DSA signature must not evict 512 classical-sized session tokens under memory pressure. The eviction policy must incorporate entry size into priority calculations. Size-blind eviction (standard LRU, standard LFU) treats all entries equally and will preferentially retain large cold entries over small hot entries when the large entries were accessed more recently or frequently.
Tracking access frequency for millions of entries must not require per-key memory allocation that scales with entry count. A count-min sketch in fixed memory provides O(1) admission decisions. The implementation used in Cachee allocates 4 rows of 65,536 atomic counters — 512 KiB total — regardless of whether the cache holds 100K or 10M entries. This is 1,239x more memory-efficient than per-key frequency tracking at 10M keys.
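The fixed-memory admission structure can be sketched as a count-min sketch in the shape described above. This is an illustration of the idea, not Cachee's implementation; the hash mixing and seeds are invented for the example.

```python
import array

# Minimal count-min sketch: 4 rows of 65,536 counters in fixed memory.
# With 16-bit counters, 4 x 65,536 x 2 B = 512 KiB, matching the figure
# above, regardless of how many distinct keys are observed.

ROWS, WIDTH = 4, 65_536
SEEDS = (0x9E3779B1, 0x85EBCA77, 0xC2B2AE3D, 0x27D4EB2F)

class CountMinSketch:
    """Fixed-memory frequency estimator: estimates can overcount on hash
    collisions but never undercount."""

    def __init__(self) -> None:
        # "H" = unsigned 16-bit counters; production counters may be wider.
        self.rows = [array.array("H", [0] * WIDTH) for _ in range(ROWS)]

    def _index(self, key: str, seed: int) -> int:
        return (hash((seed, key)) & 0xFFFFFFFF) % WIDTH

    def increment(self, key: str) -> None:
        for row, seed in zip(self.rows, SEEDS):
            row[self._index(key, seed)] += 1

    def estimate(self, key: str) -> int:
        # The minimum across rows is the tightest upper bound on the count.
        return min(row[self._index(key, seed)]
                   for row, seed in zip(self.rows, SEEDS))

cms = CountMinSketch()
for _ in range(5):
    cms.increment("session:alice")
print(cms.estimate("session:alice"))  # 5
```

Admission and estimation are O(1): each touches exactly ROWS counters, independent of entry count.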
Network caches encode values to wire protocol (RESP, Memcached binary protocol), transfer them over TCP, and decode them on the client. For a 17 KB SLH-DSA-128f signature, this serialization round-trip adds 0.5-2ms of latency per lookup. An in-process cache stores values in the application's address space and returns them by pointer reference. There is no encoding, no transfer, and no decoding. The cost is a single memory load.
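The contrast can be demonstrated in a few lines. Here pickle stands in for a wire protocol such as RESP, and 17,088 B is the SLH-DSA-128f signature size from the text; the point is that encoding cost tracks value size while an in-process lookup returns a reference to the stored object.

```python
import pickle

# Network path: every GET must encode, transfer, and decode the value,
# so cost scales with value size. In-process path: the lookup hands back
# the stored object itself, with no copy and no encoding.

signature = bytes(17_088)   # placeholder for a 17 KB SLH-DSA-128f signature
small = bytes(64)           # classical Ed25519-sized value

# Serialized bytes (and transfer time) scale with the value.
wire_large = pickle.dumps(signature)
wire_small = pickle.dumps(small)
print(len(wire_large) // len(wire_small))  # encoding cost tracks size

# In-process: the lookup returns the stored object by reference.
cache = {"sig": signature}
print(cache["sig"] is signature)  # True
```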
We benchmarked seven PQ value sizes on Redis 7.4 (ElastiCache, r7g.xlarge, same AZ) and Cachee L0 (in-process DashMap, same application). Each test performed 1 million GET operations with pre-warmed cache. Results are median (P50) and 99th percentile (P99) latency.
| Value | Size | Redis P50 | Redis P99 | Cachee P50 | Cachee P99 |
|---|---|---|---|---|---|
| Ed25519 signature | 64 B | 0.31 ms | 0.89 ms | 0.000031 ms | 0.000044 ms |
| FALCON-512 signature | 690 B | 0.33 ms | 0.94 ms | 0.000031 ms | 0.000044 ms |
| ML-KEM-768 ciphertext | 1,088 B | 0.35 ms | 1.02 ms | 0.000031 ms | 0.000044 ms |
| ML-DSA-65 signature | 3,309 B | 0.44 ms | 1.31 ms | 0.000031 ms | 0.000044 ms |
| PQ session token | 4,493 B | 0.55 ms | 1.58 ms | 0.000031 ms | 0.000044 ms |
| SLH-DSA-128f signature | 17,088 B | 0.91 ms | 2.67 ms | 0.000031 ms | 0.000044 ms |
| SLH-DSA-256f signature | 49,856 B | 1.42 ms | 4.13 ms | 0.000031 ms | 0.000044 ms |
Key observation: The Cachee column is constant across all value sizes, while the Redis column grows with value size. At SLH-DSA-256f (49 KB), Redis P99 latency is 4.13ms, longer than many database queries. For values above roughly 10 KB, a network cache round-trip can cost more than the query it was meant to avoid.
At 100,000 requests per second, each performing a session lookup with a 4,493-byte PQ session token, the Redis P50 of 0.55ms accumulates 55 seconds of cache wait time per wall-clock second across the fleet; the same lookups in-process, at 31 ns each, accumulate about 3.1ms.
Cachee is an in-process cache engine written in Rust that satisfies all four architectural requirements for post-quantum caching.
CacheeLFU is an adaptive admission policy that uses a count-min sketch with 4 rows of 65,536 atomic counters (512 KiB total, constant regardless of entry count). The scoring function balances access frequency against time since last access: score = frequency / ln(age_since_last_access). Higher scores indicate hotter entries that should resist eviction. The policy is size-aware: a large cold entry with a low score is evicted before a small hot entry with a high score.
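The scoring rule can be sketched as follows. The sample entries, the clamping of very recent ages, and the score-per-byte eviction ordering are illustrative assumptions, not Cachee's exact policy.

```python
import math

# Sketch of the scoring rule quoted above: score = frequency / ln(age
# since last access). Higher scores resist eviction.

def score(frequency: int, age_s: float) -> float:
    # Clamp so ln(.) stays >= 1 for very recent accesses
    # (ln(x) <= 0 for x <= 1 would make the score negative or undefined).
    return frequency / math.log(max(age_s, math.e))

entries = {
    # key: (access frequency, seconds since last access, entry size in bytes)
    "hot_session":  (500, 2.0, 96),        # small, hot classical token
    "cold_slh_sig": (2, 3_600.0, 49_856),  # large, cold SLH-DSA-256f signature
}

def eviction_priority(key: str) -> float:
    # Size-aware: normalize hotness by bytes retained, so one large cold
    # entry cannot outrank hundreds of small hot entries.
    freq, age, size = entries[key]
    return score(freq, age) / size

victim = min(entries, key=eviction_priority)
print(victim)  # cold_slh_sig: the 49 KB cold entry is evicted first
```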
Cachee speaks RESP (Redis Serialization Protocol) with 140+ commands. Any existing Redis client library (ioredis, redis-py, go-redis, Jedis) connects by changing only the host to localhost:6380. Application code does not change; only the cache layer does.
When enabled (cachee attest enable), every cache write produces a 58-byte H33-74 receipt signed by three independent post-quantum signature families: ML-DSA-65 (lattice), FALCON-512 (NTRU lattice), and SLH-DSA-SHA2-128f (stateless hash). Every cache read verifies the receipt before returning the value. Cache poisoning — injecting a malicious value that passes verification — requires simultaneously breaking all three mathematical assumptions. This is the first cache engine to offer cryptographic integrity verification on every operation.
| Metric | Value |
|---|---|
| L0 GET latency | 31 ns (constant, size-independent) |
| L0 SET latency | 548 ns |
| Single-thread throughput | 32M ops/sec |
| Multi-thread throughput (16 workers) | 7.41M ops/sec |
| CacheeLFU admission memory | 512 KiB (constant) |
| Hit rate (production, adaptive) | 99%+ |
| RESP command coverage | 140+ commands |
| H33 pipeline (with attestation) | 2,209,429 auth/sec sustained |
The following steps are recommended for infrastructure teams preparing their cache layer for the PQ transition.
Step 1: Inventory cached cryptographic material. Identify every cache that stores keys, signatures, tokens, or certificates. Document current value sizes and access frequencies. Common locations: session stores, JWT verification caches, TLS session ticket caches, OCSP stapling caches, API credential stores, certificate chain caches.
Step 2: Calculate size multipliers. For each cache, multiply the current cryptographic material footprint by the PQ equivalent. ML-KEM-768 + ML-DSA-65 (the most common adoption path): 47x. With SLH-DSA: 190x+. Use the canonical size table in Section 2.
Step 3: Separate payload from proof. Cache verification results (boolean + content hash) instead of full PQ signatures. Cache issuer public keys separately from per-token signatures. Public keys are accessed frequently and change rarely. Per-token signatures are accessed once and are large.
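The payload/proof split can be sketched as follows. `verify_ml_dsa_65` is a hypothetical placeholder for whatever FIPS 204 verifier the application uses; the point is that after one verification, the cache holds a 32-byte hash and a verdict instead of the 3,309-byte signature.

```python
import hashlib

# Verify a token's ML-DSA-65 signature once, then cache only a content
# hash plus the verdict (~33 B per entry) instead of the full 3,309 B
# signature (~100x smaller cache entries).

ML_DSA_65_SIG_BYTES = 3_309

def verify_ml_dsa_65(token: bytes, signature: bytes) -> bool:
    # Placeholder stand-in for a real FIPS 204 verification call.
    return len(signature) == ML_DSA_65_SIG_BYTES

verification_cache: dict[bytes, bool] = {}

def is_token_valid(token: bytes, signature: bytes) -> bool:
    key = hashlib.sha256(token).digest()         # 32-byte cache key
    if key in verification_cache:                # hot path: signature untouched
        return verification_cache[key]
    result = verify_ml_dsa_65(token, signature)  # cold path: full PQ verify
    verification_cache[key] = result
    return result

print(is_token_valid(b"token-payload", bytes(ML_DSA_65_SIG_BYTES)))  # True
```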
Step 4: Move hot-path lookups to in-process cache. Session validation, JWT verification, rate limiting, and feature flag evaluation happen on every request. These must be sub-millisecond. At PQ sizes, network caches cannot guarantee sub-millisecond for values over 1 KB. In-process caching eliminates value size from the latency equation.
Step 5: Retain network caches for appropriate workloads. Shared state across multiple processes, pub/sub, persistence and replication, and low-frequency lookups are well-served by Redis. The migration is not "replace Redis with Cachee." It is "add an in-process L0 tier for hot-path PQ material and let Redis serve as L2."
Step 6: Plan for hybrid mode (2025-2035). During the transition, systems carry both classical and PQ key material. A hybrid TLS session ticket includes X25519 (32 bytes) and ML-KEM-768 (1,088 bytes). Budget for 1.5-2x the PQ-only footprint during this period.
| Year | Requirement | Cache Impact |
|---|---|---|
| 2024 | FIPS 203/204/205 published | Standards available. Begin planning. |
| 2025 | Browsers ship ML-KEM by default | TLS session caches begin receiving PQ material. |
| 2027 | New systems must prefer PQ | All new cache deployments should be PQ-ready. |
| 2030 | PQ key exchange mandatory | All session caches carry ML-KEM material. Classical ECDH deprecated. |
| 2033 | PQ signing mandatory | All JWT and certificate caches carry ML-DSA/SLH-DSA signatures. |
| 2035 | Full transition complete | Classical algorithms prohibited. All caches are PQ caches. |
The post-quantum transition is a structural change in the size of cryptographic material that caching infrastructure must handle. The increase — 10x to 779x depending on the algorithm — exceeds the capacity of network-bound cache architectures to absorb without latency degradation. In-process caching, with its size-independent latency characteristic, is the only architecture that maintains constant performance across the full range of PQ key and signature sizes.
The transition is not a future event. Chrome and Firefox negotiate ML-KEM in TLS 1.3 today. OpenSSL, Go, and Rust ship PQ algorithms as defaults. The key material is already arriving in production caches. Organizations that prepare now — by adding in-process L0 caching for hot-path cryptographic material — will experience the transition as a non-event. Organizations that wait will face a sudden 47x increase in cache memory and an 83% increase in per-request cache latency when their dependency tree upgrades.
Post-quantum caching is not a product category. It is the inevitable state of all caching infrastructure. The question is not whether to build for it, but when.
Try Cachee: the predictive cache built for post-quantum key sizes.
© 2026 H33.ai, Inc. All rights reserved.
Cachee, CacheeLFU, and H33-74 are trademarks of H33.ai, Inc.
Patent pending.