FIPS 203 Caching: ML-KEM Key Encapsulation at 31 Nanoseconds
FIPS 203 was finalized by NIST in August 2024, standardizing ML-KEM (Module-Lattice-Based Key Encapsulation Mechanism, formerly known as CRYSTALS-Kyber) as the primary post-quantum key exchange standard. Every TLS handshake, every session establishment, every key agreement in your infrastructure will eventually use ML-KEM. The transition is already underway: Chrome, Firefox, and Cloudflare have deployed ML-KEM-768 in TLS 1.3 hybrid key exchange since 2024. Your servers are negotiating post-quantum key encapsulation right now, whether you configured it or not.
The operational impact is substantial. An ML-KEM-768 ciphertext is 1,088 bytes. A classical X25519 key share is 32 bytes. That is a 34x increase in the key exchange payload for every TLS handshake. At 100,000 TLS handshakes per second -- a modest rate for a production API gateway -- that is 104 MB/sec of ciphertext data flowing through your infrastructure just for key exchange. If you are caching these ciphertexts in Redis, you are pushing 104 MB/sec of cache traffic over the network. This is unsustainable.
But you should not be caching the ciphertext at all. The ciphertext is the envelope. The shared secret is the payload. ML-KEM always produces a 32-byte shared secret regardless of parameter set. At every security level -- ML-KEM-512, ML-KEM-768, ML-KEM-1024 -- the shared secret is exactly 32 bytes. Cache the 32-byte shared secret, not the 1,088-byte ciphertext. Use in-process caching at 31 nanoseconds with zero network transfer. This is the difference between a cache infrastructure that buckles under post-quantum key exchange and one that does not even notice the transition.
ML-KEM Parameter Sets: The Complete Reference
FIPS 203 defines three parameter sets targeting NIST security levels 1, 3, and 5. Unlike ML-DSA where Level 2 is the lowest option, ML-KEM starts at Level 1, giving organizations a broader range of security-performance tradeoffs. The parameter sets differ significantly in key sizes and ciphertext sizes, but critically, the shared secret is always 32 bytes.
| Parameter | ML-KEM-512 (Level 1) | ML-KEM-768 (Level 3) | ML-KEM-1024 (Level 5) |
|---|---|---|---|
| NIST Security Level | Level 1 (AES-128) | Level 3 (AES-192) | Level 5 (AES-256) |
| Public Key Size | 800 bytes | 1,184 bytes | 1,568 bytes |
| Secret Key Size | 1,632 bytes | 2,400 bytes | 3,168 bytes |
| Ciphertext Size | 768 bytes | 1,088 bytes | 1,568 bytes |
| Shared Secret Size | 32 bytes | 32 bytes | 32 bytes |
| Module Rank (k) | 2 | 3 | 4 |
| Polynomial Degree (n) | 256 | 256 | 256 |
| Modulus (q) | 3,329 | 3,329 | 3,329 |
| Noise Parameter (eta1) | 3 | 2 | 2 |
| Noise Parameter (eta2) | 2 | 2 | 2 |
| Compressed Coefficients (du) | 10 | 10 | 11 |
| Compressed Coefficients (dv) | 4 | 4 | 5 |
| Encapsulation Time | ~25 us | ~40 us | ~55 us |
| Decapsulation Time | ~30 us | ~50 us | ~70 us |
The key insight from this table is in the "Shared Secret Size" row. Every parameter set produces exactly 32 bytes. The ciphertext grows from 768 to 1,568 bytes as security level increases, but the output that your application actually uses -- the shared secret that derives session keys for symmetric encryption -- is constant. This fact is the foundation of the correct caching strategy.
The TLS 1.3 Session Resumption Problem
TLS 1.3 with ML-KEM changes the economics of session establishment. In classical TLS 1.3 with X25519, the key share in the ClientHello is 32 bytes and the key share in the ServerHello is 32 bytes. The total key exchange payload is 64 bytes. With ML-KEM-768 hybrid key exchange (X25519 + ML-KEM-768), the client sends an additional 1,184-byte ML-KEM public key (encapsulation key), and the server responds with a 1,088-byte ciphertext. The total key exchange payload grows from 64 bytes to 2,336 bytes -- a 36x increase.
TLS 1.3 session resumption exists specifically to avoid repeating the full handshake. A client that has previously established a session can resume it using a pre-shared key (PSK) derived from the original handshake, skipping the key exchange entirely. Session resumption saves the 2,336 bytes of ML-KEM key exchange data and the 90 microseconds of encapsulation/decapsulation computation. For this to work, the server must cache the session state, including the shared secret derived from the original ML-KEM key exchange.
Here is where caching becomes critical. At 100,000 new TLS handshakes per second, with a 24-hour session ticket lifetime, the server accumulates up to 8.64 billion session entries. In practice, unique sessions are far fewer due to client reuse, but even at a 10% unique rate, that is 864 million session entries. If you cache the full ML-KEM ciphertext (1,088 bytes per entry), you need roughly 875 GB of cache storage. If you cache just the derived shared secret (32 bytes per entry), you need 25.7 GB. The difference is 34x in storage and determines whether your session cache fits on a single server or requires a distributed cluster.
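The storage arithmetic can be checked in a few lines. This is a back-of-envelope sketch using the rates stated above (the 10% unique-session figure is the article's assumption); sizes are reported in GiB, matching the article's binary-unit rounding.

```rust
// Session-cache sizing: 100K handshakes/sec, 24h ticket lifetime,
// 10% unique-session rate, ML-KEM-768 ciphertext = 1,088 B, secret = 32 B.
fn unique_sessions(handshakes_per_sec: u64, lifetime_secs: u64, unique_pct: u64) -> u64 {
    handshakes_per_sec * lifetime_secs * unique_pct / 100
}

fn gib(bytes: u64) -> f64 {
    bytes as f64 / (1u64 << 30) as f64
}

fn main() {
    let sessions = unique_sessions(100_000, 86_400, 10);
    assert_eq!(sessions, 864_000_000);
    // Full ciphertexts land near 875 GiB; shared secrets near 25.7 GiB.
    println!("ciphertexts: {:.0} GiB", gib(sessions * 1_088));
    println!("secrets:     {:.1} GiB", gib(sessions * 32));
}
```

The 34x ratio between the two totals is exactly the 1,088 B / 32 B size ratio, independent of the session count.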
The Network Bandwidth Wall
Even if storage were free, network bandwidth is not. Consider what happens when session resumption lookups go through Redis.
At 100,000 TLS session resumptions per second, each requiring a Redis GET to retrieve the cached session state:
| Cached Data | Size per Entry | Bandwidth at 100K/sec | Network Impact |
|---|---|---|---|
| Full ciphertext (ML-KEM-512) | 768 B | 73 MB/sec (585 Mbps) | Significant |
| Full ciphertext (ML-KEM-768) | 1,088 B | 104 MB/sec (832 Mbps) | Severe |
| Full ciphertext (ML-KEM-1024) | 1,568 B | 150 MB/sec (1.2 Gbps) | Unsustainable |
| Shared secret (any level) | 32 B | 3.1 MB/sec (24.4 Mbps) | Negligible |
| Cachee L1 (in-process) | 32 B | 0 MB/sec (0 Mbps) | Zero network |
At ML-KEM-768, caching full ciphertexts in Redis consumes 832 Mbps of network bandwidth for cache reads alone. On a standard 10 Gbps network interface, that is 8.3% of total network capacity consumed by cache traffic. Add the cache writes, the application traffic, the database queries, and the inter-service communication, and you are approaching network saturation from cache operations alone.
With Cachee's in-process L1 tier, the network impact is zero. The shared secret is retrieved from the application's own memory space in 31 nanoseconds. There is no serialization, no deserialization, no TCP round-trip, no network contention. The cache lookup is a hash table read in the same address space as the TLS termination code.
104 MB/sec of Redis Traffic for Key Exchange Caching
At 100,000 TLS handshakes per second with ML-KEM-768 ciphertexts cached in Redis, you are consuming 832 Mbps of network bandwidth for cache operations alone. This scales linearly with handshake rate. At 500,000 handshakes per second, it is 4.16 Gbps. The network becomes the bottleneck before the CPU does. Cache the 32-byte shared secret in-process, not the 1,088-byte ciphertext over the network.
The Correct Caching Strategy: Shared Secret Caching
ML-KEM is a key encapsulation mechanism. The ciphertext is the transport. The shared secret is the result. Once decapsulation is complete, the ciphertext has served its purpose. There is no reason to cache it. The application never needs the ciphertext again -- it needs the shared secret to derive session keys.
The cache key should be derived from the session identifier, not from the ciphertext itself. In TLS 1.3, the session ticket is the natural cache key. The cached value is the 32-byte shared secret (or, more precisely, the resumption master secret derived from the shared secret and the handshake transcript). The computation fingerprint binds the cached value to the specific key exchange computation that produced it.
```rust
use cachee::{CacheeEngine, ComputationFingerprint, CacheContract};

// Define the key exchange caching contract
let contract = CacheContract::new("ml-kem-768-decaps")
    .freshness_ms(86_400_000)      // 24 hours (TLS session ticket lifetime)
    .strict_mode(true)             // never serve stale session keys
    .attestation_required(true);

// After successful ML-KEM-768 decapsulation
let shared_secret: [u8; 32] = ml_kem_768_decaps(&secret_key, &ciphertext);

// Fingerprint: bind the shared secret to the session and key exchange
let fingerprint = ComputationFingerprint::new()
    .input(&session_ticket_id)     // unique session identifier
    .input(&client_hello_random)   // 32 bytes
    .input(&server_hello_random)   // 32 bytes
    .computation("ml-kem-768-decaps")
    .version("fips-203-final")
    .hardware_class("graviton4")
    .finalize();                   // SHA3-256 -> 32 bytes

// Cache the 32-byte shared secret
engine.put(&fingerprint, &shared_secret, &contract);

// Later, on session resumption:
match engine.get(&fingerprint) {
    Some(cached_secret) => {
        // Cache hit: 31ns lookup, zero network, zero decapsulation
        // Saved: ~50us of ML-KEM-768 decapsulation
        // Saved: 1,088 bytes of ciphertext transfer
        derive_session_keys(&cached_secret)
    }
    None => {
        // Cache miss: full handshake required
        perform_full_handshake()
    }
}
```
The pattern is straightforward. After the initial ML-KEM key exchange succeeds, cache the 32-byte shared secret keyed by the session identifier. On session resumption, look up the shared secret by session identifier. If the cache returns a hit, derive session keys from the cached secret and skip the entire ML-KEM key exchange. If the cache returns a miss, perform a full handshake. The savings per cache hit: 50 microseconds of decapsulation computation and 1,088 bytes of ciphertext transfer.
Security Model for Shared Secret Caching
Caching shared secrets is a security-sensitive operation. The shared secret is the root of all session key material. If it is compromised, every message in the session can be decrypted. The caching strategy must address several security requirements.
Cache entry confidentiality. The cached shared secret must not be readable by unauthorized parties. In Redis, any authenticated client can read any key. If a service that handles user profiles shares a Redis instance with the TLS termination service, a compromised profile service can read cached shared secrets and decrypt TLS sessions. Cachee's Owner/Regulator/Auditor key hierarchy prevents this: only the TLS termination service's Owner key can read the cached shared secret. Regulator and Auditor keys can verify that the cache entry exists and is properly attested without reading the secret itself.
Cache entry integrity. If an attacker modifies a cached shared secret, they can perform a key substitution attack. The TLS termination service would derive session keys from the attacker's chosen secret, allowing the attacker to decrypt and modify all subsequent traffic. Cachee's triple PQ signatures on every cache entry prevent modification: any change to the cached value invalidates the signatures, causing the cache entry to be rejected on read and triggering a full handshake instead.
Forward secrecy preservation. TLS 1.3 provides forward secrecy through ephemeral key exchange. Each session uses a fresh ML-KEM key pair, so compromising one session's key material does not compromise other sessions. Caching shared secrets does not break forward secrecy as long as the cache enforces its freshness window. When the session ticket expires, the cached shared secret must be deleted -- not just expired, but cryptographically invalidated. Cachee's state machine transitions provide this: an expired entry moves to the Expired state with a TransitionProof, and the value is no longer readable.
Harvest-now-decrypt-later defense. The entire point of migrating to ML-KEM is to defend against quantum computers that could break classical key exchange. If you cache the shared secret using classical encryption (AES-GCM with an RSA-wrapped key, for example), you have rebuilt the vulnerability you were trying to eliminate. The cached shared secret must be protected with post-quantum cryptography. Cachee uses ML-DSA-65, FALCON-512, and SLH-DSA-SHA2-128f for cache entry attestation, ensuring that the cache layer does not become the weak link in the PQ migration chain.
Scale Analysis: 100K, 500K, and 1M Handshakes per Second
Let us model three production scenarios to understand the caching requirements at different scales. All scenarios assume ML-KEM-768 (the parameter set that Chrome, Firefox, and Cloudflare have deployed) with a 24-hour session ticket lifetime and 70% session resumption rate (30% full handshakes, 70% resumptions from cache).
100,000 Handshakes per Second
Full handshakes: 30,000/sec requiring ML-KEM-768 encapsulation and decapsulation. CPU cost: 30,000 * 50us = 1.5 CPU-seconds/sec (1.5 cores for decapsulation). Network cost for ciphertext: 30,000 * 1,088 B = 31 MB/sec inbound (from clients).
Session resumptions: 70,000/sec requiring cache lookup. With Redis caching full ciphertexts: 70,000 * 1,088 B = 73 MB/sec cache read traffic. With Cachee caching 32-byte shared secrets in-process: 70,000 * 31ns = 2.17ms of CPU time (negligible), zero network.
Unique cached sessions (24 hours): ~260 million entries (30K new/sec * 86,400 sec = 2.6 billion, reduced by the 10% unique-session rate assumed earlier). Cache memory with full ciphertexts: 263 GB. Cache memory with shared secrets: 7.7 GB. Cache memory with Cachee (shared secret + 80B metadata): 28 GB.
500,000 Handshakes per Second
Full handshakes: 150,000/sec. CPU: 7.5 cores for decapsulation. Network: 156 MB/sec inbound ciphertext.
Session resumptions: 350,000/sec. Redis full ciphertext: 364 MB/sec (2.9 Gbps) cache traffic. Cachee in-process: 10.85ms CPU, zero network.
Unique cached sessions: ~1.3 billion entries. Full ciphertexts: 1.31 TB. Shared secrets only: 38.6 GB. Cachee (shared secret + metadata): 140 GB.
At 500K handshakes per second, Redis ciphertext caching consumes 2.9 Gbps of network bandwidth for cache reads alone. This is nearly a third of a 10 Gbps link. Adding cache writes, application traffic, and database queries makes this approach non-viable on standard network infrastructure.
1,000,000 Handshakes per Second
Full handshakes: 300,000/sec. CPU: 15 cores for decapsulation. Network: 312 MB/sec inbound ciphertext.
Session resumptions: 700,000/sec. Redis full ciphertext: 728 MB/sec (5.8 Gbps) cache traffic. Cachee in-process: 21.7ms CPU, zero network.
Unique cached sessions: ~2.6 billion entries. Full ciphertexts: 2.63 TB. Shared secrets only: 77.2 GB. Cachee (shared secret + metadata): 280 GB.
| Scale | Redis (Full CT) | Redis (Secret) | Cachee L1 (Secret) |
|---|---|---|---|
| 100K/sec network | 73 MB/sec | 2.1 MB/sec | 0 MB/sec |
| 500K/sec network | 364 MB/sec | 10.7 MB/sec | 0 MB/sec |
| 1M/sec network | 728 MB/sec | 21.4 MB/sec | 0 MB/sec |
| 100K/sec memory | 263 GB | 7.7 GB | 28 GB* |
| 500K/sec memory | 1.31 TB | 38.6 GB | 140 GB* |
| 1M/sec memory | 2.63 TB | 77.2 GB | 280 GB* |
*Cachee memory includes 80 bytes of attestation metadata per entry (computation fingerprint, state, PQ signature references). The memory cost is higher than raw shared secret caching, but every entry is independently verifiable and tamper-proof. The tradeoff is 3.5x more memory for full compliance and integrity guarantees on every cached session key.
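The network rows of the table can be reproduced with a short loop. This sketch uses the scenario assumptions from above (resumptions are 70% of the handshake rate) and reports MiB/s with the article's binary-unit rounding, so the results land within about a MiB/s of the table's figures.

```rust
// Cache-read bandwidth for session resumptions: resumptions/sec times
// bytes per cached entry, converted to MiB/s.
fn resumption_mib_per_sec(handshakes_per_sec: u64, entry_bytes: u64) -> f64 {
    let resumptions = handshakes_per_sec * 70 / 100;
    (resumptions * entry_bytes) as f64 / (1u64 << 20) as f64
}

fn main() {
    for rate in [100_000u64, 500_000, 1_000_000] {
        println!(
            "{:>9}/sec  full ciphertext: {:>5.0} MiB/s   secret: {:>4.1} MiB/s",
            rate,
            resumption_mib_per_sec(rate, 1_088), // Redis full-ciphertext column
            resumption_mib_per_sec(rate, 32),    // Redis shared-secret column
        );
    }
}
```

The in-process column needs no formula: the lookup never leaves the address space, so its network cost is identically zero at every rate.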
Zero Network for Session Resumption
TLS 1.3 session resumption with ML-KEM-768 is the highest-volume caching use case in post-quantum infrastructure. At 100K handshakes/sec, caching ciphertexts in Redis consumes 73 MB/sec of network bandwidth. At 1M handshakes/sec, it consumes 728 MB/sec. Cachee's in-process L1 tier eliminates all cache network traffic by serving the 32-byte shared secret from the application's own memory space in 31 nanoseconds. The network is not the bottleneck because the network is not involved.
Implementation: ML-KEM Session Cache with Cachee
The following configuration demonstrates a production-ready ML-KEM session cache. The key design decisions are: cache the shared secret (not the ciphertext), use in-process L1 for zero-network lookups, set freshness to match session ticket lifetime, and enable attestation for integrity verification on every cached session key.
```toml
# cachee.toml -- ML-KEM session cache configuration

[engine]
l1_max_entries = 50_000_000        # 50M sessions in-process
l1_eviction = "CacheeLfu"          # evict least-frequently-used sessions
l2_enabled = false                 # L1-only for session keys (no network)

[contracts.tls-session-mlkem768]
computation = "ml-kem-768-session"
freshness_ms = 86_400_000          # 24 hours (session ticket lifetime)
strict_mode = true                 # never serve expired session keys
attestation_required = true        # PQ-sign every cached secret

[attestation]
enabled = true
algorithms = ["ML-DSA-65", "FALCON-512", "SLH-DSA-SHA2-128f"]
fingerprint_hash = "SHA3-256"
fingerprint_fields = ["session_id", "client_random", "server_random",
                      "computation", "version", "hardware_class"]

[state_machine]
states = ["Active", "Expired"]
require_transition_proof = true
on_expiry = "cryptographic_invalidation"   # zero memory, not just TTL
```
The on_expiry = "cryptographic_invalidation" setting is critical for session key caching. When a session ticket expires, the cached shared secret is not just removed from the hash table -- it is cryptographically invalidated with a state transition proof. This means an attacker who gains access to cache memory after expiration cannot recover the shared secret from residual memory. The transition from Active to Expired zeroes the secret and records the invalidation proof.
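The zero-on-expiry behavior can be sketched in plain Rust. The CachedSecret type below is a hypothetical illustration, not the Cachee API, and the TransitionProof machinery is elided; the sketch shows only the core idea that an expired read wipes the secret in place rather than merely hiding it.

```rust
use std::time::{Duration, Instant};

// Hypothetical illustration: a cached shared secret that is overwritten
// in place the moment its freshness window lapses, so expired entries
// cannot be recovered from residual memory.
struct CachedSecret {
    bytes: [u8; 32],
    expires_at: Instant,
}

impl CachedSecret {
    fn new(bytes: [u8; 32], ttl: Duration) -> Self {
        Self { bytes, expires_at: Instant::now() + ttl }
    }

    /// Readable only while fresh; an expired read zeroes the secret.
    fn get(&mut self) -> Option<&[u8; 32]> {
        if Instant::now() >= self.expires_at {
            self.invalidate();
            None
        } else {
            Some(&self.bytes)
        }
    }

    fn invalidate(&mut self) {
        for b in self.bytes.iter_mut() {
            // Volatile write so the compiler cannot optimize away the wipe.
            unsafe { std::ptr::write_volatile(b, 0) };
        }
    }
}

fn main() {
    // TTL of zero: the entry is already expired, so the read misses
    // and the secret has been zeroed in place.
    let mut entry = CachedSecret::new([0x42; 32], Duration::from_secs(0));
    assert!(entry.get().is_none());
    assert_eq!(entry.bytes, [0u8; 32]);
    println!("expired entry zeroed");
}
```

A production implementation would additionally record the state transition (Active to Expired) with its proof, as the configuration above requires.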
Hybrid Key Exchange Caching
Most current ML-KEM deployments use hybrid key exchange: X25519 + ML-KEM-768. The classical X25519 provides security today, and ML-KEM-768 provides security against future quantum computers. The cached value should be the combined shared secret, not the individual shared secrets from each algorithm.
```rust
// Hybrid key exchange: X25519 + ML-KEM-768
let x25519_secret: [u8; 32] = x25519_diffie_hellman(&x25519_sk, &peer_x25519_pk);
let mlkem_secret: [u8; 32] = ml_kem_768_decaps(&mlkem_sk, &mlkem_ciphertext);

// Combine the two secrets into a single 32-byte value (illustrative; the
// TLS 1.3 hybrid key exchange design concatenates both shared secrets
// before feeding them into the key schedule)
let combined_secret: [u8; 32] = hkdf_extract(
    &[x25519_secret.as_ref(), mlkem_secret.as_ref()].concat(),
    b"tls13-hybrid-key",
);

// Cache the combined 32-byte secret
let fingerprint = ComputationFingerprint::new()
    .input(&session_ticket_id)
    .input(&client_hello_random)
    .input(&server_hello_random)
    .computation("hybrid-x25519-mlkem768")
    .version("fips-203-final")
    .hardware_class("graviton4")
    .finalize();

engine.put(&fingerprint, &combined_secret, &contract);

// On resumption: one 31ns lookup replaces both X25519 DH and ML-KEM decaps
// Saved: ~80us total (30us X25519 + 50us ML-KEM-768)
// Saved: 1,120 bytes of key exchange payload (32 X25519 + 1,088 ML-KEM)
```
Caching the combined secret is more efficient than caching individual secrets because it eliminates both the classical and post-quantum key exchange computations on resumption. A single 31-nanosecond cache lookup replaces approximately 80 microseconds of cryptographic computation -- a 2,580x speedup per session resumption.
Migration Considerations
Organizations migrating from classical-only TLS to hybrid or PQ-only TLS face a transition period where the session cache must handle both types of sessions. The cache should be structured to accommodate this without requiring separate cache infrastructure.
Computation type separation. Use different computation types in the fingerprint for classical, hybrid, and PQ-only sessions. This ensures that a cached classical shared secret is never accidentally used for a PQ session or vice versa.
Freshness window alignment. Classical X25519 session tickets and ML-KEM session tickets may have different lifetimes based on the organization's key rotation policy. Set the cache contract freshness window per computation type, not globally. Classical sessions might use 4-hour tickets while PQ sessions use 24-hour tickets during the transition period.
Memory budgeting. During migration, the cache holds both classical and PQ sessions. Classical sessions are smaller (32 bytes for the X25519 shared secret) and PQ sessions require additional metadata. Budget cache memory for the peak of the migration when both types coexist at maximum volume.
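The computation-type separation above can be sketched as a small dispatch table. The tags and lifetimes here are illustrative assumptions (not Cachee-defined values); the point is that each session type gets its own fingerprint computation tag and its own freshness window, so a classical secret can never be served for a PQ session.

```rust
// Per-session-type cache parameters during the classical -> PQ migration.
enum KexKind {
    Classical,
    Hybrid,
    PqOnly,
}

// Distinct computation tags guarantee distinct fingerprints per type.
fn computation_tag(kind: &KexKind) -> &'static str {
    match kind {
        KexKind::Classical => "x25519-session",
        KexKind::Hybrid => "hybrid-x25519-mlkem768-session",
        KexKind::PqOnly => "ml-kem-768-session",
    }
}

// Freshness follows the ticket lifetime policy, per type rather than globally.
fn freshness_ms(kind: &KexKind) -> u64 {
    match kind {
        KexKind::Classical => 4 * 3_600 * 1_000, // 4h tickets during transition
        _ => 24 * 3_600 * 1_000,                 // 24h for hybrid and PQ-only
    }
}

fn main() {
    println!("{} -> {} ms", computation_tag(&KexKind::Classical), freshness_ms(&KexKind::Classical));
    println!("{} -> {} ms", computation_tag(&KexKind::Hybrid), freshness_ms(&KexKind::Hybrid));
}
```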
The transition from classical to post-quantum key exchange is the largest cryptographic infrastructure change since the move from RSA to elliptic curves. The key sizes are larger. The ciphertexts are larger. The computation is more expensive. But the output -- the shared secret -- is the same size it has always been: 32 bytes. Cache the output. Do not cache the transport. Use in-process caching to eliminate the network overhead that makes PQ key exchange sizes unmanageable. And ensure that the cache itself provides post-quantum integrity guarantees, because a session cache protected only by classical cryptography is a harvest-now-decrypt-later target for the same adversary you are deploying ML-KEM to defend against.
The FIPS 203 Caching Formula
ML-KEM-768 ciphertexts are 1,088 bytes. The shared secret is always 32 bytes. At 100K TLS handshakes/sec, caching ciphertexts in Redis pushes 104 MB/sec of network traffic. Caching 32-byte shared secrets in-process with Cachee pushes zero. The lookup takes 31 nanoseconds. Every cached session resumption saves 50 microseconds of ML-KEM decapsulation, 1,088 bytes of ciphertext transfer, and the network round-trip to Redis. At scale, this is the difference between ML-KEM being a performance crisis and being invisible.
ML-KEM-768 ciphertexts are 1,088 bytes. Shared secrets are 32 bytes. Cachee serves them at 31ns with zero network and PQ attestation.
Get Started with PQ Key Exchange Caching