Architecture for the NIST Post-Quantum Transition
Abstract. NIST has finalized three post-quantum cryptographic standards: FIPS 203 (ML-KEM), FIPS 204 (ML-DSA), and FIPS 205 (SLH-DSA). The key and signature sizes produced by these algorithms are 10-779x larger than the classical algorithms they replace. This paper examines the impact of this size increase on caching infrastructure, defines the architectural requirements for a post-quantum cache, presents production benchmark results, and provides a migration playbook for infrastructure teams preparing for the transition mandated by CNSA 2.0.
Key finding: Network-bound caches (Redis, Memcached, ElastiCache) add latency that grows with value size, and post-quantum values are 47-779x larger than the classical values they replace. In-process caches deliver constant latency regardless of value size (31 nanoseconds measured), making them the only architecture that survives the PQ transition without performance degradation.
In August 2024, the National Institute of Standards and Technology published three Federal Information Processing Standards for post-quantum cryptography. These standards define the algorithms that will replace the classical cryptographic primitives — RSA, ECDH, ECDSA, Ed25519 — that underpin virtually all secure communication on the internet.
FIPS 203: ML-KEM (Module-Lattice-Based Key Encapsulation Mechanism). Formerly known as CRYSTALS-Kyber. Replaces ECDH and RSA for key exchange. Used in TLS handshakes, VPN establishment, and encrypted messaging. Three parameter sets: ML-KEM-512 (NIST Level 1), ML-KEM-768 (Level 3), ML-KEM-1024 (Level 5).
FIPS 204: ML-DSA (Module-Lattice-Based Digital Signature Algorithm). Formerly known as CRYSTALS-Dilithium. Replaces ECDSA and RSA for digital signatures. Used in JWTs, X.509 certificates, code signing, and API authentication. Three parameter sets: ML-DSA-44 (Level 2), ML-DSA-65 (Level 3), ML-DSA-87 (Level 5).
FIPS 205: SLH-DSA (Stateless Hash-Based Digital Signature Algorithm). Formerly known as SPHINCS+. A conservative alternative to ML-DSA whose security relies solely on hash function properties rather than lattice assumptions. Produces the largest signatures of any standardized PQ algorithm. Six parameter sets across three security levels, each with a "small" (s) and "fast" (f) variant.
A fourth algorithm, FN-DSA (FALCON), is slated for standardization as FIPS 206. FALCON produces the most compact PQ signatures of the NIST selections (690 bytes at the FALCON-512 parameter set), making it the preferred choice for constrained environments.
These standards are not optional for US federal systems. CNSA 2.0 (the NSA's Commercial National Security Algorithm Suite 2.0, announced in 2022) establishes a mandatory transition timeline beginning in 2025 and completing by 2035. Commercial organizations that sell to government, operate in regulated industries, or handle federal data will be subject to the same requirements through procurement mandates, compliance frameworks, and supply chain dependencies.
The fundamental challenge of the post-quantum transition for caching infrastructure is size. Every PQ algorithm produces keys, signatures, or ciphertexts that are substantially larger than their classical equivalents.
| Algorithm | Public Key | Ciphertext | vs ECDH (32B) |
|---|---|---|---|
| X25519 (classical) | 32 B | 32 B | baseline |
| ML-KEM-512 | 800 B | 768 B | 25x |
| ML-KEM-768 | 1,184 B | 1,088 B | 37x |
| ML-KEM-1024 | 1,568 B | 1,568 B | 49x |
| Algorithm | Public Key | Signature | vs Ed25519 (64B) |
|---|---|---|---|
| Ed25519 (classical) | 32 B | 64 B | baseline |
| ML-DSA-44 | 1,312 B | 2,420 B | 38x |
| ML-DSA-65 | 1,952 B | 3,309 B | 52x |
| ML-DSA-87 | 2,592 B | 4,627 B | 72x |
| FALCON-512 | 897 B | 690 B | 11x |
| Algorithm | Public Key | Signature | vs Ed25519 (64B) |
|---|---|---|---|
| SLH-DSA-SHA2-128f | 32 B | 17,088 B | 267x |
| SLH-DSA-SHA2-192f | 48 B | 35,664 B | 557x |
| SLH-DSA-SHA2-256f | 64 B | 49,856 B | 779x |
Production systems rarely use a single algorithm in isolation. A TLS session combines ML-KEM for key exchange with ML-DSA for certificate authentication. A JWT carries an ML-DSA signature over a payload that may include ML-KEM-derived session material. The sizes compound.
| Scenario | Classical (bytes) | Post-Quantum (bytes) | Increase |
|---|---|---|---|
| TLS session token (KEM + sig) | 96 | 4,493 | 47x |
| JWT with PQ signature | 256 | 3,565 | 14x |
| Certificate chain (3 certs) | 2,048 | 12,927 | 6x |
| Certificate chain with SLH-DSA root | 2,048 | 65,471 | 32x |
| API credential (KEM + sig + metadata) | 512 | 5,861 | 11x |
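The compound sizes in the table can be reproduced from the per-algorithm tables. A minimal back-of-envelope sketch, assuming the session-token row is composed of the classical material (96 B) plus an ML-KEM-768 ciphertext and an ML-DSA-65 signature:

```python
# Back-of-envelope reproduction of the "TLS session token" row above,
# assuming the token carries the classical material alongside an
# ML-KEM-768 ciphertext and an ML-DSA-65 signature. Sizes are taken
# from the per-algorithm tables in this section.

X25519_SHARE = 32
ED25519_SIG = 64
ML_KEM_768_CT = 1_088
ML_DSA_65_SIG = 3_309

classical_token = X25519_SHARE + ED25519_SIG                # 96 B
pq_token = classical_token + ML_KEM_768_CT + ML_DSA_65_SIG  # 4,493 B

print(pq_token)                           # 4493
print(round(pq_token / classical_token))  # 47, the multiplier in the table
```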
Session stores are the highest-frequency cache in most web applications. Every authenticated request requires a session lookup. At classical key sizes, a session store holding 1 million active sessions consumes approximately 96 MB of cache memory for cryptographic material (96 bytes per session). After the PQ transition with ML-KEM-768 + ML-DSA-65, the same 1 million sessions require 4.49 GB. This is a 47x increase in memory consumption for the same number of sessions with no change in application logic.
The latency impact is equally significant. A Redis GET for a 96-byte value takes approximately 0.3ms (same-AZ, TCP round-trip dominated). A Redis GET for a 4,493-byte value takes approximately 0.55ms. The difference — 0.25ms per lookup — appears small in isolation. But a request that validates a session, checks a JWT, queries a rate limiter, and retrieves feature flags makes four cache lookups. The cumulative increase from 1.2ms to 2.2ms represents an 83% growth in per-request cache latency.
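The per-request arithmetic above can be reproduced directly, using the measured Redis medians quoted in the text (0.30 ms for a 96 B value, 0.55 ms for a 4,493 B PQ session token):

```python
# Four network-cache lookups per request (session, JWT, rate limit,
# feature flags), before and after the PQ transition. Latencies are the
# Redis P50 figures quoted in the text.

LOOKUPS_PER_REQUEST = 4
CLASSICAL_GET_MS = 0.30   # 96-byte value
PQ_GET_MS = 0.55          # 4,493-byte PQ session token

classical_total = LOOKUPS_PER_REQUEST * CLASSICAL_GET_MS  # 1.2 ms per request
pq_total = LOOKUPS_PER_REQUEST * PQ_GET_MS                # 2.2 ms per request
growth = (pq_total - classical_total) / classical_total   # ~0.83

print(f"{classical_total:.1f} ms -> {pq_total:.1f} ms (+{growth:.0%})")
```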
API gateways cache JWT issuer public keys and validated token results. The key storage impact is modest: 20 ML-DSA-65 issuer keys require 39 KB versus 640 bytes for Ed25519 keys. The token cache impact is severe: 100K cached tokens with ML-DSA-65 signatures consume 331 MB of signature data alone. Organizations that cache validated tokens for deduplication or replay detection face a 52x increase in per-token cache footprint.
TLS terminators (Nginx, HAProxy, cloud load balancers) cache session tickets for 0-RTT resumption. A session ticket with X25519 key material is approximately 256 bytes. With ML-KEM-768, the ticket grows to 1,344+ bytes. At 500K concurrent sessions, ticket cache memory grows from 128 MB to 672 MB. At ML-KEM-1024 with ML-DSA certificate signatures, the same cache exceeds 2 GB.
OCSP stapling caches and certificate chain caches store signed certificate data. With ML-DSA-65 signatures on each certificate (3,309 bytes per signature, 3 signatures per chain), a single chain grows from approximately 2 KB to 12-15 KB. With an SLH-DSA root certificate — a common conservative choice for long-lived root keys — a single chain can exceed 65 KB.
Microservices architectures cache inter-service credentials. A mesh of 50 services, each caching credentials for the other 49, creates 2,450 cached credential pairs. At classical sizes, this is negligible. At PQ sizes with ML-KEM + ML-DSA, each credential pair carries approximately 5,861 bytes. The mesh consumes 14.4 MB — still manageable, but now a measurable allocation rather than a rounding error.
We identify four architectural requirements that distinguish a post-quantum-capable cache from a classical cache.
A cache lookup for a 4,493-byte PQ session token must complete at the same latency as a lookup for a 96-byte classical token. Any architecture where latency scales with value size will degrade proportionally to the PQ size increase. Network-bound caches fail this requirement because serialization, transfer, and deserialization all scale linearly with value size. In-process caches satisfy it because access is a hash table lookup and pointer dereference — operations whose cost is determined by the CPU memory hierarchy, not by the value size.
A 49 KB SLH-DSA signature must not evict 512 classical-sized session tokens under memory pressure. The eviction policy must incorporate entry size into priority calculations. Size-blind eviction (standard LRU, standard LFU) treats all entries equally and will preferentially retain large cold entries over small hot entries when the large entries were accessed more recently or frequently.
Tracking access frequency for millions of entries must not require per-key memory allocation that scales with entry count. A count-min sketch in fixed memory provides O(1) admission decisions. The implementation used in Cachee allocates 4 rows of 65,536 atomic counters — 512 KiB total — regardless of whether the cache holds 100K or 10M entries. This is 1,239x more memory-efficient than per-key frequency tracking at 10M keys.
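The fixed-memory admission structure can be sketched as a count-min sketch in the shape described above. This is an illustration of the idea, not Cachee's implementation; the hash mixing and seeds are invented for the example.

```python
import array

# Minimal count-min sketch: 4 rows of 65,536 counters in fixed memory.
# With 16-bit counters, 4 x 65,536 x 2 B = 512 KiB, matching the figure
# above, regardless of how many distinct keys are observed.

ROWS, WIDTH = 4, 65_536
SEEDS = (0x9E3779B1, 0x85EBCA77, 0xC2B2AE3D, 0x27D4EB2F)

class CountMinSketch:
    """Fixed-memory frequency estimator: estimates can overcount on hash
    collisions but never undercount."""

    def __init__(self) -> None:
        # "H" = unsigned 16-bit counters; production counters may be wider.
        self.rows = [array.array("H", [0] * WIDTH) for _ in range(ROWS)]

    def _index(self, key: str, seed: int) -> int:
        return (hash((seed, key)) & 0xFFFFFFFF) % WIDTH

    def increment(self, key: str) -> None:
        for row, seed in zip(self.rows, SEEDS):
            row[self._index(key, seed)] += 1

    def estimate(self, key: str) -> int:
        # The minimum across rows is the tightest upper bound on the count.
        return min(row[self._index(key, seed)]
                   for row, seed in zip(self.rows, SEEDS))

cms = CountMinSketch()
for _ in range(5):
    cms.increment("session:alice")
print(cms.estimate("session:alice"))  # 5
```

Admission and estimation are O(1): each touches exactly ROWS counters, independent of entry count.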
Network caches encode values to wire protocol (RESP, Memcached binary protocol), transfer them over TCP, and decode them on the client. For a 17 KB SLH-DSA-128f signature, this serialization round-trip adds 0.5-2ms of latency per lookup. An in-process cache stores values in the application's address space and returns them by pointer reference. There is no encoding, no transfer, and no decoding. The cost is a single memory load.
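The contrast can be demonstrated in a few lines. Here pickle stands in for a wire protocol such as RESP, and 17,088 B is the SLH-DSA-128f signature size from the text; the point is that encoding cost tracks value size while an in-process lookup returns a reference to the stored object.

```python
import pickle

# Network path: every GET must encode, transfer, and decode the value,
# so cost scales with value size. In-process path: the lookup hands back
# the stored object itself, with no copy and no encoding.

signature = bytes(17_088)   # placeholder for a 17 KB SLH-DSA-128f signature
small = bytes(64)           # classical Ed25519-sized value

# Serialized bytes (and transfer time) scale with the value.
wire_large = pickle.dumps(signature)
wire_small = pickle.dumps(small)
print(len(wire_large) // len(wire_small))  # encoding cost tracks size

# In-process: the lookup returns the stored object by reference.
cache = {"sig": signature}
print(cache["sig"] is signature)  # True
```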
We benchmarked seven PQ value sizes on Redis 7.4 (ElastiCache, r7g.xlarge, same AZ) and Cachee L0 (in-process DashMap, same application). Each test performed 1 million GET operations with pre-warmed cache. Results are median (P50) and 99th percentile (P99) latency.
| Value | Size | Redis P50 | Redis P99 | Cachee P50 | Cachee P99 |
|---|---|---|---|---|---|
| Ed25519 signature | 64 B | 0.31 ms | 0.89 ms | 0.000031 ms | 0.000044 ms |
| FALCON-512 signature | 690 B | 0.33 ms | 0.94 ms | 0.000031 ms | 0.000044 ms |
| ML-KEM-768 ciphertext | 1,088 B | 0.35 ms | 1.02 ms | 0.000031 ms | 0.000044 ms |
| ML-DSA-65 signature | 3,309 B | 0.44 ms | 1.31 ms | 0.000031 ms | 0.000044 ms |
| PQ session token | 4,493 B | 0.55 ms | 1.58 ms | 0.000031 ms | 0.000044 ms |
| SLH-DSA-128f signature | 17,088 B | 0.91 ms | 2.67 ms | 0.000031 ms | 0.000044 ms |
| SLH-DSA-256f signature | 49,856 B | 1.42 ms | 4.13 ms | 0.000031 ms | 0.000044 ms |
Key observation: The Cachee column is constant across all value sizes, while the Redis column grows with value size. At SLH-DSA-256f (49 KB), Redis P99 latency is 4.13ms, longer than many database queries. For values above roughly 10 KB, a network cache round-trip can cost more than the query it was meant to avoid.
At 100,000 requests per second, each performing a session lookup with a 4,493-byte PQ session token, the Redis P50 of 0.55ms accumulates 55 seconds of cache wait time per wall-clock second across the fleet; the same lookups in-process, at 31 ns each, accumulate about 3.1ms.
Cachee is an in-process cache engine written in Rust that satisfies all four architectural requirements for post-quantum caching.
CacheeLFU is an adaptive admission policy that uses a count-min sketch with 4 rows of 65,536 atomic counters (512 KiB total, constant regardless of entry count). The scoring function balances access frequency against time since last access: score = frequency / ln(age_since_last_access). Higher scores indicate hotter entries that should resist eviction. The policy is size-aware: a large cold entry with a low score is evicted before a small hot entry with a high score.
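The scoring rule can be sketched as follows. The sample entries, the clamping of very recent ages, and the score-per-byte eviction ordering are illustrative assumptions, not Cachee's exact policy.

```python
import math

# Sketch of the scoring rule quoted above: score = frequency / ln(age
# since last access). Higher scores resist eviction.

def score(frequency: int, age_s: float) -> float:
    # Clamp so ln(.) stays >= 1 for very recent accesses
    # (ln(x) <= 0 for x <= 1 would make the score negative or undefined).
    return frequency / math.log(max(age_s, math.e))

entries = {
    # key: (access frequency, seconds since last access, entry size in bytes)
    "hot_session":  (500, 2.0, 96),        # small, hot classical token
    "cold_slh_sig": (2, 3_600.0, 49_856),  # large, cold SLH-DSA-256f signature
}

def eviction_priority(key: str) -> float:
    # Size-aware: normalize hotness by bytes retained, so one large cold
    # entry cannot outrank hundreds of small hot entries.
    freq, age, size = entries[key]
    return score(freq, age) / size

victim = min(entries, key=eviction_priority)
print(victim)  # cold_slh_sig: the 49 KB cold entry is evicted first
```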
Cachee speaks RESP (Redis Serialization Protocol) with 140+ commands. Any existing Redis client library (ioredis, redis-py, go-redis, Jedis) connects by changing only the host to localhost:6380. Application code does not change; only the cache layer does.
When enabled (cachee attest enable), every cache write produces a 58-byte H33-74 receipt signed by three independent post-quantum signature families: ML-DSA-65 (lattice), FALCON-512 (NTRU lattice), and SLH-DSA-SHA2-128f (stateless hash). Every cache read verifies the receipt before returning the value. Cache poisoning — injecting a malicious value that passes verification — requires simultaneously breaking all three mathematical assumptions. This is the first cache engine to offer cryptographic integrity verification on every operation.
| Metric | Value |
|---|---|
| L0 GET latency | 31 ns (constant, size-independent) |
| L0 SET latency | 548 ns |
| Single-thread throughput | 32M ops/sec |
| Multi-thread throughput (16 workers) | 7.41M ops/sec |
| CacheeLFU admission memory | 512 KiB (constant) |
| Hit rate (production, adaptive) | 99%+ |
| RESP command coverage | 140+ commands |
| H33 pipeline (with attestation) | 2,209,429 auth/sec sustained |
The following steps are recommended for infrastructure teams preparing their cache layer for the PQ transition.
Step 1: Inventory cached cryptographic material. Identify every cache that stores keys, signatures, tokens, or certificates. Document current value sizes and access frequencies. Common locations: session stores, JWT verification caches, TLS session ticket caches, OCSP stapling caches, API credential stores, certificate chain caches.
Step 2: Calculate size multipliers. For each cache, multiply the current cryptographic material footprint by the PQ equivalent. ML-KEM-768 + ML-DSA-65 (the most common adoption path): 47x. With SLH-DSA: 190x+. Use the canonical size table in Section 2.
Step 3: Separate payload from proof. Cache verification results (boolean + content hash) instead of full PQ signatures. Cache issuer public keys separately from per-token signatures. Public keys are accessed frequently and change rarely. Per-token signatures are accessed once and are large.
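The payload/proof split can be sketched as follows. `verify_ml_dsa_65` is a hypothetical placeholder for whatever FIPS 204 verifier the application uses; the point is that after one verification, the cache holds a 32-byte hash and a verdict instead of the 3,309-byte signature.

```python
import hashlib

# Verify a token's ML-DSA-65 signature once, then cache only a content
# hash plus the verdict (~33 B per entry) instead of the full 3,309 B
# signature (~100x smaller cache entries).

ML_DSA_65_SIG_BYTES = 3_309

def verify_ml_dsa_65(token: bytes, signature: bytes) -> bool:
    # Placeholder stand-in for a real FIPS 204 verification call.
    return len(signature) == ML_DSA_65_SIG_BYTES

verification_cache: dict[bytes, bool] = {}

def is_token_valid(token: bytes, signature: bytes) -> bool:
    key = hashlib.sha256(token).digest()         # 32-byte cache key
    if key in verification_cache:                # hot path: signature untouched
        return verification_cache[key]
    result = verify_ml_dsa_65(token, signature)  # cold path: full PQ verify
    verification_cache[key] = result
    return result

print(is_token_valid(b"token-payload", bytes(ML_DSA_65_SIG_BYTES)))  # True
```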
Step 4: Move hot-path lookups to in-process cache. Session validation, JWT verification, rate limiting, and feature flag evaluation happen on every request. These must be sub-millisecond. At PQ sizes, network caches cannot guarantee sub-millisecond for values over 1 KB. In-process caching eliminates value size from the latency equation.
Step 5: Retain network caches for appropriate workloads. Shared state across multiple processes, pub/sub, persistence and replication, and low-frequency lookups are well-served by Redis. The migration is not "replace Redis with Cachee." It is "add an in-process L0 tier for hot-path PQ material and let Redis serve as L2."
Step 6: Plan for hybrid mode (2025-2035). During the transition, systems carry both classical and PQ key material. A hybrid TLS session ticket includes X25519 (32 bytes) and ML-KEM-768 (1,088 bytes). Budget for 1.5-2x the PQ-only footprint during this period.
| Year | Requirement | Cache Impact |
|---|---|---|
| 2024 | FIPS 203/204/205 published | Standards available. Begin planning. |
| 2025 | Browsers ship ML-KEM by default | TLS session caches begin receiving PQ material. |
| 2027 | New systems must prefer PQ | All new cache deployments should be PQ-ready. |
| 2030 | PQ key exchange mandatory | All session caches carry ML-KEM material. Classical ECDH deprecated. |
| 2033 | PQ signing mandatory | All JWT and certificate caches carry ML-DSA/SLH-DSA signatures. |
| 2035 | Full transition complete | Classical algorithms prohibited. All caches are PQ caches. |
The post-quantum transition is a structural change in the size of cryptographic material that caching infrastructure must handle. The increase — 10x to 779x depending on the algorithm — exceeds the capacity of network-bound cache architectures to absorb without latency degradation. In-process caching, with its size-independent latency characteristic, is the only architecture that maintains constant performance across the full range of PQ key and signature sizes.
The transition is not a future event. Chrome and Firefox negotiate ML-KEM in TLS 1.3 today. OpenSSL, Go, and Rust ship PQ algorithms as defaults. The key material is already arriving in production caches. Organizations that prepare now — by adding in-process L0 caching for hot-path cryptographic material — will experience the transition as a non-event. Organizations that wait will face a sudden 47x increase in cache memory and an 83% increase in per-request cache latency when their dependency tree upgrades.
Post-quantum caching is not a product category. It is the inevitable state of all caching infrastructure. The question is not whether to build for it, but when.
Try Cachee: the predictive cache built for post-quantum key sizes.
© 2026 H33.ai, Inc. All rights reserved.
Cachee, CacheeLFU, and H33-74 are trademarks of H33.ai, Inc.
Patent pending.