Abstract. NIST has finalized three post-quantum cryptographic standards: FIPS 203 (ML-KEM), FIPS 204 (ML-DSA), and FIPS 205 (SLH-DSA). The key and signature sizes produced by these algorithms are 10-779x larger than the classical algorithms they replace. This paper examines the impact of this size increase on caching infrastructure, defines the architectural requirements for a post-quantum cache, presents production benchmark results, and provides a migration playbook for infrastructure teams preparing for the transition mandated by CNSA 2.0.

Key finding: Network-bound caches (Redis, Memcached, ElastiCache) add latency proportional to value size. At post-quantum key sizes, this latency scales 47-779x. In-process caches deliver constant latency regardless of value size (31 nanoseconds measured), making them the only architecture that survives the PQ transition without performance degradation.

Contents

  1. Background: The NIST Post-Quantum Standards
  2. The Size Problem: Byte-Level Analysis
  3. Infrastructure Impact: Five Cache Components
  4. Architectural Requirements for Post-Quantum Caching
  5. Production Benchmarks: Network vs In-Process at PQ Sizes
  6. Reference Architecture: Cachee
  7. Migration Playbook
  8. CNSA 2.0 Timeline and Compliance
  9. Conclusion

1. Background: The NIST Post-Quantum Standards

In August 2024, the National Institute of Standards and Technology published three Federal Information Processing Standards for post-quantum cryptography. These standards define the algorithms that will replace the classical cryptographic primitives — RSA, ECDH, ECDSA, Ed25519 — that underpin virtually all secure communication on the internet.

FIPS 203: ML-KEM (Module-Lattice-Based Key Encapsulation Mechanism). Formerly known as CRYSTALS-Kyber. Replaces ECDH and RSA for key exchange. Used in TLS handshakes, VPN establishment, and encrypted messaging. Three parameter sets: ML-KEM-512 (NIST Level 1), ML-KEM-768 (Level 3), ML-KEM-1024 (Level 5).

FIPS 204: ML-DSA (Module-Lattice-Based Digital Signature Algorithm). Formerly known as CRYSTALS-Dilithium. Replaces ECDSA and RSA for digital signatures. Used in JWTs, X.509 certificates, code signing, and API authentication. Three parameter sets: ML-DSA-44 (Level 2), ML-DSA-65 (Level 3), ML-DSA-87 (Level 5).

FIPS 205: SLH-DSA (Stateless Hash-Based Digital Signature Algorithm). Formerly known as SPHINCS+. A conservative alternative to ML-DSA whose security relies solely on hash function properties rather than lattice assumptions. Produces the largest signatures of any standardized PQ algorithm. Twelve parameter sets across three security levels and two hash families (SHA2 and SHAKE), each with a "small" (s) and "fast" (f) variant.

A fourth algorithm, FN-DSA (FALCON), is expected to be standardized in 2025. FALCON produces the most compact PQ signatures (690 bytes for FALCON-512), making it the preferred choice for constrained environments.

These standards are not optional for US federal systems. CNSA 2.0 (Committee on National Security Systems Advisory Memorandum 02-2022) establishes a mandatory transition timeline beginning in 2025 and completing by 2035. Commercial organizations that sell to government, operate in regulated industries, or handle federal data will be subject to the same requirements through procurement mandates, compliance frameworks, and supply chain dependencies.

2. The Size Problem: Byte-Level Analysis

The fundamental challenge of the post-quantum transition for caching infrastructure is size. Every PQ algorithm produces keys, signatures, or ciphertexts that are substantially larger than their classical equivalents.

2.1 Key Encapsulation (FIPS 203)

| Algorithm          | Public Key | Ciphertext | vs ECDH (32 B) |
|--------------------|------------|------------|----------------|
| X25519 (classical) | 32 B       | 32 B       | baseline       |
| ML-KEM-512         | 800 B      | 768 B      | 25x            |
| ML-KEM-768         | 1,184 B    | 1,088 B    | 37x            |
| ML-KEM-1024        | 1,568 B    | 1,568 B    | 49x            |

2.2 Digital Signatures (FIPS 204)

| Algorithm           | Public Key | Signature | vs Ed25519 (64 B) |
|---------------------|------------|-----------|-------------------|
| Ed25519 (classical) | 32 B       | 64 B      | baseline          |
| ML-DSA-44           | 1,312 B    | 2,420 B   | 38x               |
| ML-DSA-65           | 1,952 B    | 3,309 B   | 52x               |
| ML-DSA-87           | 2,592 B    | 4,627 B   | 72x               |
| FALCON-512          | 897 B      | 690 B     | 11x               |

2.3 Hash-Based Signatures (FIPS 205)

| Algorithm          | Public Key | Signature | vs Ed25519 (64 B) |
|--------------------|------------|-----------|-------------------|
| SLH-DSA-SHA2-128f  | 32 B       | 17,088 B  | 267x              |
| SLH-DSA-SHA2-192f  | 48 B       | 35,664 B  | 557x              |
| SLH-DSA-SHA2-256f  | 64 B       | 49,856 B  | 779x              |

2.4 Compounding Effect

Production systems rarely use a single algorithm in isolation. A TLS session combines ML-KEM for key exchange with ML-DSA for certificate authentication. A JWT carries an ML-DSA signature over a payload that may include ML-KEM-derived session material. The sizes compound.

| Scenario                               | Classical (bytes) | Post-Quantum (bytes) | Increase |
|----------------------------------------|-------------------|----------------------|----------|
| TLS session token (KEM + sig)          | 96                | 4,493                | 47x      |
| JWT with PQ signature                  | 256               | 3,565                | 14x      |
| Certificate chain (3 certs)            | 2,048             | 12,927               | 6x       |
| Certificate chain with SLH-DSA root    | 2,048             | 65,471               | 32x      |
| API credential (KEM + sig + metadata)  | 512               | 5,861                | 11x      |
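The compound sizes follow directly from the per-algorithm tables. A quick sketch, assuming the 4,493-byte session token is an ML-KEM-768 public key plus an ML-DSA-65 signature (the text does not state its exact composition):

```python
# Component sizes in bytes, taken from the FIPS 203/204 tables above.
ML_KEM_768_PK = 1_184
ML_DSA_65_SIG = 3_309
X25519_PK = 32
ED25519_SIG = 64

# Assumed composition of the "TLS session token (KEM + sig)" row.
classical_token = X25519_PK + ED25519_SIG   # 96 bytes
pq_token = ML_KEM_768_PK + ML_DSA_65_SIG    # 4,493 bytes

print(pq_token, round(pq_token / classical_token))  # 4493 47
```

The 47x multiplier quoted throughout the paper is this ratio rounded to the nearest integer.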

3. Infrastructure Impact: Five Cache Components

3.1 Session Stores

Session stores are the highest-frequency cache in most web applications. Every authenticated request requires a session lookup. At classical key sizes, a session store holding 1 million active sessions consumes approximately 96 MB of cache memory for cryptographic material (96 bytes per session). After the PQ transition with ML-KEM-768 + ML-DSA-65, the same 1 million sessions require 4.49 GB. This is a 47x increase in memory consumption for the same number of sessions with no change in application logic.

The latency impact is equally significant. A Redis GET for a 96-byte value takes approximately 0.3ms (same-AZ, TCP round-trip dominated). A Redis GET for a 4,493-byte value takes approximately 0.55ms. The difference — 0.25ms per lookup — appears small in isolation. But a request that validates a session, checks a JWT, queries a rate limiter, and retrieves feature flags makes four cache lookups. The cumulative increase from 1.2ms to 2.2ms represents an 83% growth in per-request cache latency.

3.2 JWT Verification Caches

API gateways cache JWT issuer public keys and validated token results. The key storage impact is modest: 20 ML-DSA-65 issuer keys require 39 KB versus 640 bytes for Ed25519 keys. The token cache impact is severe: 100K cached tokens with ML-DSA-65 signatures consume 331 MB of signature data alone. Organizations that cache validated tokens for deduplication or replay detection face a 52x increase in per-token cache footprint.

3.3 TLS Session Ticket Caches

TLS terminators (Nginx, HAProxy, cloud load balancers) cache session tickets for 0-RTT resumption. A session ticket with X25519 key material is approximately 256 bytes. With ML-KEM-768, the ticket grows to 1,344+ bytes. At 500K concurrent sessions, ticket cache memory grows from 128 MB to 672 MB. At ML-KEM-1024 with ML-DSA certificate signatures, the same cache exceeds 2 GB.

3.4 Certificate Chain Caches

OCSP stapling caches and certificate chain caches store signed certificate data. With ML-DSA-65 signatures on each certificate (3,309 bytes per signature, 3 signatures per chain), a single chain grows from approximately 2 KB to 12-15 KB. With an SLH-DSA root certificate — a common conservative choice for long-lived root keys — a single chain can exceed 65 KB.

3.5 API Credential Stores

Microservices architectures cache inter-service credentials. A mesh of 50 services, each caching credentials for the other 49, creates 2,450 cached credential pairs. At classical sizes, this is negligible. At PQ sizes with ML-KEM + ML-DSA, each credential pair carries approximately 5,861 bytes. The mesh consumes 14.4 MB — still manageable, but now a measurable allocation rather than a rounding error.
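The mesh arithmetic is easy to check:

```python
SERVICES = 50
CRED_PAIR_BYTES = 5_861  # approximate PQ credential pair size from the text

pairs = SERVICES * (SERVICES - 1)   # each service caches the other 49
total_bytes = pairs * CRED_PAIR_BYTES

print(pairs, round(total_bytes / 1e6, 1))  # 2450 14.4
```

Note the quadratic term: doubling the mesh to 100 services roughly quadruples the credential footprint.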

4. Architectural Requirements for Post-Quantum Caching

We identify four architectural requirements that distinguish a post-quantum-capable cache from a classical cache.

Requirement 1: Size-Independent Latency

A cache lookup for a 4,493-byte PQ session token must complete at the same latency as a lookup for a 96-byte classical token. Any architecture where latency scales with value size will degrade proportionally to the PQ size increase. Network-bound caches fail this requirement because serialization, transfer, and deserialization all scale linearly with value size. In-process caches satisfy it because access is a hash table lookup and pointer dereference — operations whose cost is determined by the CPU memory hierarchy, not by the value size.

Requirement 2: Size-Aware Eviction

A 49 KB SLH-DSA signature must not evict 512 classical-sized session tokens under memory pressure. The eviction policy must incorporate entry size into priority calculations. Size-blind eviction (standard LRU, standard LFU) treats all entries equally and will preferentially retain large cold entries over small hot entries when the large entries were accessed more recently or frequently.
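A size-aware score can be as simple as dividing an entry's heat by its footprint, so a large entry must be proportionally hotter than a small one to earn the same residency. The function below is an illustrative sketch, not Cachee's actual policy:

```python
def eviction_priority(frequency: int, size_bytes: int) -> float:
    """Higher score = keep; lower score = evict first.

    Dividing frequency by size means a 49 KB signature must be
    ~519x hotter than a 96-byte token to score the same.
    """
    return frequency / size_bytes

hot_small = eviction_priority(frequency=500, size_bytes=96)       # hot session token
cold_large = eviction_priority(frequency=520, size_bytes=49_856)  # SLH-DSA-256f sig

# Size-blind LFU would keep the large entry (520 > 500 accesses);
# the size-aware score evicts it first.
assert cold_large < hot_small
```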

Requirement 3: Constant-Memory Admission Control

Tracking access frequency for millions of entries must not require per-key memory allocation that scales with entry count. A count-min sketch in fixed memory provides O(1) admission decisions. The implementation used in Cachee allocates 4 rows of 65,536 atomic counters — 512 KiB total — regardless of whether the cache holds 100K or 10M entries. This is 1,239x more memory-efficient than per-key frequency tracking at 10M keys.
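At the stated dimensions, 4 rows of 65,536 counters in 512 KiB implies 16-bit counters. A self-contained sketch of the idea (the hash derivation and saturation handling here are assumptions of this sketch, not Cachee's implementation):

```python
import hashlib
from array import array

ROWS, WIDTH = 4, 65_536  # 4 x 65,536 u16 counters = 512 KiB, entry-count independent

class CountMinSketch:
    def __init__(self) -> None:
        self.table = [array("H", [0] * WIDTH) for _ in range(ROWS)]

    def _indexes(self, key: bytes):
        # Derive one index per row from slices of a single BLAKE2b hash.
        digest = hashlib.blake2b(key, digest_size=2 * ROWS).digest()
        for r in range(ROWS):
            yield int.from_bytes(digest[2 * r : 2 * r + 2], "little") % WIDTH

    def record(self, key: bytes) -> None:
        for r, i in enumerate(self._indexes(key)):
            if self.table[r][i] < 0xFFFF:  # saturate rather than wrap the u16
                self.table[r][i] += 1

    def estimate(self, key: bytes) -> int:
        # Min over rows bounds the overestimate caused by hash collisions.
        return min(self.table[r][i] for r, i in enumerate(self._indexes(key)))

cms = CountMinSketch()
for _ in range(5):
    cms.record(b"session:alice")
assert cms.estimate(b"session:alice") == 5  # never undercounts
assert cms.estimate(b"session:bob") <= 5    # collisions can only inflate
```

The memory bound holds regardless of key cardinality: the table is allocated once and never grows.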

Requirement 4: Zero Serialization Overhead

Network caches encode values to wire protocol (RESP, Memcached binary protocol), transfer them over TCP, and decode them on the client. For a 17 KB SLH-DSA-128f signature, this serialization round-trip adds 0.5-2ms of latency per lookup. An in-process cache stores values in the application's address space and returns them by pointer reference. There is no encoding, no transfer, and no decoding. The cost is a single memory load.
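The serialization step is easiest to see by framing a value the way a Redis GET reply does, as a RESP bulk string. A minimal encoder:

```python
def resp_bulk_string(value: bytes) -> bytes:
    """Frame a value as a RESP bulk string: $<len>\\r\\n<payload>\\r\\n.

    Every network round-trip pays to build, send, and parse this frame;
    an in-process cache returns a reference to the value instead.
    """
    return b"$" + str(len(value)).encode() + b"\r\n" + value + b"\r\n"

sig = b"\x00" * 17_088        # stand-in for an SLH-DSA-128f signature
frame = resp_bulk_string(sig)
print(len(frame) - len(sig))  # framing bytes added per reply
```

The framing itself is small; the cost is that all 17 KB must be copied into the frame, written to a socket, read on the other side, and parsed back out on every lookup.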

5. Production Benchmarks: Network vs In-Process at PQ Sizes

We benchmarked seven PQ value sizes on Redis 7.4 (ElastiCache, r7g.xlarge, same AZ) and Cachee L0 (in-process DashMap, same application). Each test performed 1 million GET operations with pre-warmed cache. Results are median (P50) and 99th percentile (P99) latency.

| Value                  | Size     | Redis P50 | Redis P99 | Cachee P50  | Cachee P99  |
|------------------------|----------|-----------|-----------|-------------|-------------|
| Ed25519 signature      | 64 B     | 0.31 ms   | 0.89 ms   | 0.000031 ms | 0.000044 ms |
| FALCON-512 signature   | 690 B    | 0.33 ms   | 0.94 ms   | 0.000031 ms | 0.000044 ms |
| ML-KEM-768 ciphertext  | 1,088 B  | 0.35 ms   | 1.02 ms   | 0.000031 ms | 0.000044 ms |
| ML-DSA-65 signature    | 3,309 B  | 0.44 ms   | 1.31 ms   | 0.000031 ms | 0.000044 ms |
| PQ session token       | 4,493 B  | 0.55 ms   | 1.58 ms   | 0.000031 ms | 0.000044 ms |
| SLH-DSA-128f signature | 17,088 B | 0.91 ms   | 2.67 ms   | 0.000031 ms | 0.000044 ms |
| SLH-DSA-256f signature | 49,856 B | 1.42 ms   | 4.13 ms   | 0.000031 ms | 0.000044 ms |

Key observation: The Cachee column is constant across all value sizes, while the Redis column grows roughly linearly with value size. At SLH-DSA-256f (49 KB), Redis P99 latency is 4.13ms, longer than many database queries. For values above roughly 10 KB, a network cache becomes a net negative on request latency.

5.1 Throughput at PQ Session Sizes

Consider a workload of 100,000 requests per second, each performing a session lookup on a 4,493-byte PQ session token.
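The transfer volume implied by that load follows from the token sizes in Section 2.4:

```python
RPS = 100_000
PQ_TOKEN_BYTES = 4_493       # PQ session token (Section 2.4)
CLASSICAL_TOKEN_BYTES = 96

# A network cache must move every value over TCP on every lookup;
# an in-process cache returns a pointer and moves nothing.
classical_mb_per_sec = RPS * CLASSICAL_TOKEN_BYTES / 1e6
pq_mb_per_sec = RPS * PQ_TOKEN_BYTES / 1e6

print(round(classical_mb_per_sec, 1), round(pq_mb_per_sec, 1))  # 9.6 449.3
```

Roughly 449 MB/s of sustained cache traffic, versus under 10 MB/s at classical sizes, before accounting for replies to other lookups on the same request path.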

6. Reference Architecture: Cachee

Cachee is an in-process cache engine written in Rust that satisfies all four architectural requirements for post-quantum caching.

6.1 Tiered Storage

6.2 CacheeLFU Admission

CacheeLFU is an adaptive admission policy that uses a count-min sketch with 4 rows of 65,536 atomic counters (512 KiB total, constant regardless of entry count). The scoring function balances access frequency against time since last access: score = frequency / ln(age_since_last_access). Higher scores indicate hotter entries that should resist eviction. The policy is size-aware: a large cold entry with a low score is evicted before a small hot entry with a high score.
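The stated scoring function can be sketched directly. Guarding the age at e (so the logarithm stays at least 1) is an assumption of this sketch; the text does not specify behavior for very recent accesses:

```python
import math

def cachee_lfu_score(frequency: int, age_seconds: float) -> float:
    """score = frequency / ln(age_since_last_access), per the formula above.

    The max(age, e) guard avoids division by zero or negative scores
    for ages under one second; this guard is an assumption.
    """
    return frequency / math.log(max(age_seconds, math.e))

# A frequently used entry touched a minute ago outscores a rarely
# used entry last seen an hour ago.
hot = cachee_lfu_score(frequency=200, age_seconds=60)
cold = cachee_lfu_score(frequency=5, age_seconds=3_600)
assert hot > cold
```

The logarithm makes the age penalty grow slowly, so frequency dominates until an entry has been cold for a long time.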

6.3 Protocol Compatibility

Cachee speaks RESP (Redis Serialization Protocol) with 140+ commands. Any existing Redis client library (ioredis, redis-py, go-redis, Jedis) connects without modification by changing the host to localhost:6380. Application code does not change. The cache layer changes.

6.4 Post-Quantum Attestation

When enabled (cachee attest enable), every cache write produces a 58-byte H33-74 receipt signed by three independent post-quantum signature families: ML-DSA-65 (lattice), FALCON-512 (NTRU lattice), and SLH-DSA-SHA2-128f (stateless hash). Every cache read verifies the receipt before returning the value. Cache poisoning — injecting a malicious value that passes verification — requires simultaneously breaking all three mathematical assumptions. This is the first cache engine to offer cryptographic integrity verification on every operation.

6.5 Performance Summary

| Metric                               | Value                            |
|--------------------------------------|----------------------------------|
| L0 GET latency                       | 31 ns (constant, size-independent) |
| L0 SET latency                       | 548 ns                           |
| Single-thread throughput             | 32M ops/sec                      |
| Multi-thread throughput (16 workers) | 7.41M ops/sec                    |
| CacheeLFU admission memory           | 512 KiB (constant)               |
| Hit rate (production, adaptive)      | 99%+                             |
| RESP command coverage                | 140+ commands                    |
| H33 pipeline (with attestation)      | 2,209,429 auth/sec sustained     |

7. Migration Playbook

The following steps are recommended for infrastructure teams preparing their cache layer for the PQ transition.

Step 1: Inventory cached cryptographic material. Identify every cache that stores keys, signatures, tokens, or certificates. Document current value sizes and access frequencies. Common locations: session stores, JWT verification caches, TLS session ticket caches, OCSP stapling caches, API credential stores, certificate chain caches.

Step 2: Calculate size multipliers. For each cache, multiply the current cryptographic material footprint by the PQ equivalent. ML-KEM-768 + ML-DSA-65 (the most common adoption path): 47x. With SLH-DSA: 190x+. Use the canonical size table in Section 2.

Step 3: Separate payload from proof. Cache verification results (boolean + content hash) instead of full PQ signatures. Cache issuer public keys separately from per-token signatures. Public keys are accessed frequently and change rarely. Per-token signatures are accessed once and are large.
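Caching the verdict plus a content hash instead of the full signature can be sketched as follows (the 33-byte entry layout is an assumption of this sketch):

```python
import hashlib

ML_DSA_65_SIG_BYTES = 3_309

def verification_cache_entry(token: bytes, valid: bool) -> tuple[bytes, bool]:
    """Cache (content hash, verdict) instead of the full PQ signature."""
    return hashlib.sha256(token).digest(), valid

entry = verification_cache_entry(b"header.payload.signature", True)
entry_bytes = len(entry[0]) + 1  # 32-byte digest + 1-byte verdict

print(entry_bytes, ML_DSA_65_SIG_BYTES // entry_bytes)  # 33 100
```

A repeat lookup hashes the incoming token and compares digests, skipping the expensive ML-DSA verification while storing roughly 1% of the bytes.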

Step 4: Move hot-path lookups to in-process cache. Session validation, JWT verification, rate limiting, and feature flag evaluation happen on every request. These must be sub-millisecond. At PQ sizes, network caches cannot guarantee sub-millisecond for values over 1 KB. In-process caching eliminates value size from the latency equation.

Step 5: Retain network caches for appropriate workloads. Shared state across multiple processes, pub/sub, persistence and replication, and low-frequency lookups are well-served by Redis. The migration is not "replace Redis with Cachee." It is "add an in-process L0 tier for hot-path PQ material and let Redis serve as L2."

Step 6: Plan for hybrid mode (2025-2035). During the transition, systems carry both classical and PQ key material. A hybrid TLS session ticket includes X25519 (32 bytes) and ML-KEM-768 (1,088 bytes). Budget for 1.5-2x the PQ-only footprint during this period.

8. CNSA 2.0 Timeline and Compliance

| Year | Requirement                       | Cache Impact                                                        |
|------|-----------------------------------|---------------------------------------------------------------------|
| 2024 | FIPS 203/204/205 published        | Standards available. Begin planning.                                |
| 2025 | Browsers ship ML-KEM by default   | TLS session caches begin receiving PQ material.                     |
| 2027 | New systems must prefer PQ        | All new cache deployments should be PQ-ready.                       |
| 2030 | PQ key exchange mandatory         | All session caches carry ML-KEM material. Classical ECDH deprecated. |
| 2033 | PQ signing mandatory              | All JWT and certificate caches carry ML-DSA/SLH-DSA signatures.     |
| 2035 | Full transition complete          | Classical algorithms prohibited. All caches are PQ caches.          |

9. Conclusion

The post-quantum transition is a structural change in the size of cryptographic material that caching infrastructure must handle. The increase — 10x to 779x depending on the algorithm — exceeds the capacity of network-bound cache architectures to absorb without latency degradation. In-process caching, with its size-independent latency characteristic, is the only architecture that maintains constant performance across the full range of PQ key and signature sizes.

The transition is not a future event. Chrome and Firefox negotiate ML-KEM in TLS 1.3 today. OpenSSL, Go, and Rust ship PQ algorithms as defaults. The key material is already arriving in production caches. Organizations that prepare now — by adding in-process L0 caching for hot-path cryptographic material — will experience the transition as a non-event. Organizations that wait will face a sudden 47x increase in cache memory and an 83% increase in per-request cache latency when their dependency tree upgrades.

Post-quantum caching is not a product category. It is the inevitable state of all caching infrastructure. The question is not whether to build for it, but when.

Try Cachee: the predictive cache built for post-quantum key sizes.


© 2026 H33.ai, Inc. All rights reserved.
Cachee, CacheeLFU, and H33-74 are trademarks of H33.ai, Inc.
Patent pending.