FALCON vs Dilithium vs SPHINCS+: Cache Guide
NIST selected three post-quantum digital signature families for standardization. Not one. Not two. Three. This was deliberate. Each family relies on a different mathematical hardness assumption. If one assumption breaks -- if someone discovers an efficient quantum algorithm for lattice problems, or finds a way to invert hash functions faster than brute force -- the families built on other assumptions survive. The diversity is the security. But diversity comes with a practical cost: the families differ radically in key and signature sizes, performance characteristics, and cache implications. Choosing which to deploy means choosing your cache architecture.
This guide compares FALCON (FN-DSA), ML-DSA (Dilithium), and SLH-DSA (SPHINCS+) through the lens that matters most for production systems: how much memory does each family consume in cache, how fast are the operations that caching eliminates, and what does the memory table look like at 100,000, 1 million, and 10 million cached entries. The answer determines whether your cache fits in a single server's RAM, requires a distributed cluster, or breaks your infrastructure budget entirely.
Family 1: FALCON (FN-DSA) -- NTRU Lattices
The Math
FALCON's security is based on the hardness of the NTRU lattice problem. Specifically, it relies on the difficulty of finding short vectors in NTRU-structured lattices. The NTRU problem is distinct from the Module Learning With Errors (MLWE) problem that underpins ML-DSA. While both are lattice problems, they involve different lattice structures and different reduction techniques. An algorithm that breaks MLWE does not necessarily break NTRU, and vice versa. This independence is what makes combining FALCON with ML-DSA meaningful for defense in depth.
FALCON uses a "hash-and-sign" paradigm with a trapdoor sampler based on the GPV framework (Gentry, Peikert, Vaikuntanathan). The signer generates a lattice basis (the private key), uses it to sample a short lattice vector that is close to the hash of the message, and outputs this short vector as the signature. The verifier checks that the signature is indeed a short vector in the NTRU lattice defined by the public key. The "short" criterion is what makes forgery hard: finding a short vector in an NTRU lattice without the trapdoor is believed to be computationally infeasible.
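For reference, the signing and verification relations can be written out explicitly. This is a simplified restatement using the notation of the Falcon specification rather than this guide: f and g are the secret short polynomials, h the public key, r a random salt, and β the norm bound. All arithmetic is in Z_q[x]/(x^n + 1), with n = 512 and q = 12289 for FALCON-512; signature compression and the exact bound values are omitted.

```latex
\begin{aligned}
\text{Public key:}\quad & h = g\, f^{-1} \bmod q && f, g \text{ short secret polynomials} \\
\text{Target:}\quad & c = \mathrm{HashToPoint}(r \,\|\, m) && r \text{ is a random salt} \\
\text{Signing:}\quad & \text{find short } (s_1, s_2) \text{ with } s_1 + s_2\, h \equiv c \pmod{q} && \text{requires the trapdoor} \\
\text{Verification:}\quad & s_1 = c - s_2\, h \bmod q, \quad \text{accept iff } \lVert (s_1, s_2) \rVert^2 \le \lfloor \beta^2 \rfloor &&
\end{aligned}
```

Only (r, s2) is transmitted; the verifier recomputes s1 from the public key, which is why a single polynomial multiplication and a norm check are enough to verify.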
The Sizes
| Parameter | FALCON-512 (Level 1) | FALCON-1024 (Level 5) |
|---|---|---|
| Public key | 897 bytes | 1,793 bytes |
| Private key | 1,281 bytes | 2,305 bytes |
| Signature | 690 bytes (avg) | 1,330 bytes (avg) |
| Sig + PK (cached) | 1,587 bytes | 3,123 bytes |
FALCON signatures are variable-length because the Gaussian-sampled signature coefficients are stored in a compressed encoding, and the encoded size depends on the coefficient values drawn from the random coins. The 690-byte figure for FALCON-512 is the average; signatures can range from approximately 650 to 710 bytes. The 1,330-byte figure for FALCON-1024 varies in the same way. This has a minor cache implication: the cache must either reserve space for the maximum signature size rather than the average, or use variable-length value storage with the extra memory-management complexity that brings.
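A minimal sketch of the variable-length option, assuming a single-process Python cache: values are appended to one buffer and indexed by offset and length, so each entry costs exactly its own size. The class name and the 650-710 byte spread used in the demo are illustrative assumptions, not a real library. The hidden cost is visible in `put`: overwritten or deleted entries leave holes that a production allocator would have to reclaim with compaction.

```python
import random
from typing import Dict, Optional, Tuple

class VarLenArena:
    """Append-only byte arena: each value occupies exactly len(value) bytes."""

    def __init__(self) -> None:
        self._buf = bytearray()
        self._index: Dict[bytes, Tuple[int, int]] = {}  # key -> (offset, length)

    def put(self, key: bytes, value: bytes) -> None:
        # Overwriting a key leaves the old bytes behind as a hole (no compaction here).
        self._index[key] = (len(self._buf), len(value))
        self._buf.extend(value)

    def get(self, key: bytes) -> Optional[bytes]:
        loc = self._index.get(key)
        if loc is None:
            return None
        offset, length = loc
        return bytes(self._buf[offset:offset + length])

    def used_bytes(self) -> int:
        return len(self._buf)

if __name__ == "__main__":
    PK_LEN, MAX_SIG_LEN = 897, 710  # FALCON-512 public key; assumed worst-case signature
    rng = random.Random(1)
    arena = VarLenArena()
    n = 10_000
    for i in range(n):
        sig_len = rng.randint(650, 710)  # illustrative spread of signature lengths
        arena.put(i.to_bytes(4, "big"), bytes(PK_LEN + sig_len))
    fixed = n * (PK_LEN + MAX_SIG_LEN)
    print(f"variable-length: {arena.used_bytes():,} bytes")
    print(f"max-size slots:  {fixed:,} bytes")
```

The gap between the two totals is the padding a max-size layout pays; whether that beats the bookkeeping of a variable-length store depends on how often entries are replaced.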
The Performance
FALCON-512 key generation takes 8-12 milliseconds -- significantly slower than ML-DSA's sub-millisecond keygen. The reason is that FALCON keygen must build an NTRU trapdoor basis: it samples short polynomials f and g, then solves the NTRU equation fG - gF = q for the complementary pair F and G, which involves recursive field-norm computations and basis reduction. Signing takes 0.5-2 milliseconds, dominated by the Gaussian sampling step. Verification takes 0.1-0.3 milliseconds, which is competitive with ML-DSA and faster than SLH-DSA.
The slow keygen is rarely a cache concern because keys are generated infrequently and cached for their entire lifetime. The signing speed matters for systems that sign frequently (transaction processing, real-time attestation), and the 0.5-2 millisecond range is acceptable for most applications. The verification speed is the primary cache target: caching verification results eliminates the 0.1-0.3 millisecond cost on cache hits, replacing it with a 35-nanosecond hash map lookup.
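The result-caching idea can be sketched as a memoizing wrapper around a verify function. This is a minimal single-threaded Python illustration, not a specific product's API: the verifier passed in is a hypothetical stand-in for a real FALCON (or any other) verification call taking public key, message, and signature. The fingerprint is SHA-256 over length-prefixed fields so that different splits of the same bytes cannot map to the same key. Eviction, TTLs, and concurrency are deliberately left out.

```python
import hashlib
from typing import Callable, Dict

# (public_key, message, signature) -> True if the signature is valid
VerifyFn = Callable[[bytes, bytes, bytes], bool]

def fingerprint(pk: bytes, msg: bytes, sig: bytes) -> bytes:
    """32-byte SHA-256 digest over length-prefixed fields."""
    h = hashlib.sha256()
    for part in (pk, msg, sig):
        h.update(len(part).to_bytes(8, "big"))  # length prefix keeps field boundaries unambiguous
        h.update(part)
    return h.digest()

class VerificationCache:
    """Memoizes verify() results keyed by fingerprint; no eviction policy."""

    def __init__(self, verify: VerifyFn) -> None:
        self._verify = verify
        self._results: Dict[bytes, bool] = {}
        self.hits = 0
        self.misses = 0

    def verify(self, pk: bytes, msg: bytes, sig: bytes) -> bool:
        fp = fingerprint(pk, msg, sig)
        cached = self._results.get(fp)
        if cached is not None:
            self.hits += 1          # hit: dictionary lookup, no cryptography
            return cached
        self.misses += 1            # miss: pay the full verification cost once
        result = self._verify(pk, msg, sig)
        self._results[fp] = result  # invalid results are cached too
        return result

if __name__ == "__main__":
    # Hypothetical stand-in verifier; a real deployment would call a FALCON library here.
    def toy_verify(pk: bytes, msg: bytes, sig: bytes) -> bool:
        return sig == hashlib.sha256(pk + msg).digest()

    cache = VerificationCache(toy_verify)
    pk, msg = b"public-key", b"hello"
    sig = hashlib.sha256(pk + msg).digest()
    print(cache.verify(pk, msg, sig), cache.verify(pk, msg, sig), cache.hits, cache.misses)
```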
The Cache Implication
FALCON is the most cache-friendly PQ signature family. At 1,587 bytes per cached entry (signature + public key), one million entries require 1.59 GB. This fits comfortably in a single server's RAM. Even 10 million entries at 15.9 GB is feasible on a high-memory instance. The compact signatures mean that FALCON-based systems have the lowest cache memory footprint of any PQ signature family, which directly translates to lower infrastructure costs and simpler cache architectures.
Family 2: ML-DSA (Dilithium) -- Module Lattices
The Math
ML-DSA's security is based on the Module Learning With Errors (MLWE) problem. MLWE is a generalization of the Learning With Errors (LWE) problem to module lattices, which are lattices defined over polynomial rings. Key generation expands a public matrix A from a random seed, samples two secret short vectors s1 and s2, and publishes (a compressed form of) t = A·s1 + s2 together with the seed as the public key. To sign, the signer commits to a random masking vector, derives a challenge from the message hash and that commitment, and combines the challenge with s1 to form the response; this is the "Fiat-Shamir with aborts" technique. The verifier uses A and t to recompute the commitment from the response and checks that it reproduces the same challenge and that the response is short.
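The relations behind Fiat-Shamir with aborts can be written compactly. This is the textbook simplification of the scheme, assuming no public-key compression: it ignores the t1/t0 split, the hint vector, and the exact encodings that FIPS 204 specifies. Here s1 and s2 are the secret short vectors, t the public value, y the random masking vector, c the sparse challenge polynomial, μ the message representative, and γ1, γ2, β the bounds fixed by each parameter set.

```latex
\begin{aligned}
\text{KeyGen:}\quad & t = A s_1 + s_2 \\
\text{Sign:}\quad & w = A y, \quad c = H\big(\mu \,\|\, \mathrm{HighBits}(w)\big), \quad z = y + c\, s_1 \\
\text{Abort if:}\quad & \lVert z \rVert_\infty \ge \gamma_1 - \beta \ \text{ or } \ \lVert \mathrm{LowBits}(w - c\, s_2) \rVert_\infty \ge \gamma_2 - \beta \\
\text{Verify:}\quad & A z - c\, t = A y + c\, A s_1 - c\, A s_1 - c\, s_2 = w - c\, s_2 \\
& \text{accept iff } \lVert z \rVert_\infty < \gamma_1 - \beta \ \text{ and } \ c = H\big(\mu \,\|\, \mathrm{HighBits}(A z - c\, t)\big)
\end{aligned}
```

The LowBits abort condition is what guarantees that HighBits(w - c·s2) equals HighBits(w) for an honest signature, so the verifier recomputes exactly the challenge the signer used.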
The "Fiat-Shamir with aborts" technique is the key to ML-DSA's simplicity. The signer generates a candidate signature and checks whether it leaks information about the secret key. If it does (the "abort" condition), the signer discards the candidate and tries again. This rejection sampling approach is simpler to implement in constant time than FALCON's Gaussian sampling, which is why ML-DSA is the default recommendation for most implementations. The abort probability is calibrated so that signing succeeds within a small number of attempts (roughly 4 to 5 on average, depending on the parameter set), keeping the expected signing time low.
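The cost of aborts follows a geometric distribution: if each candidate is accepted with probability p, the expected number of attempts is 1/p and the chance of needing more than k attempts is (1 - p)^k. A small simulation makes both visible; the acceptance probability of 0.2 is an illustrative assumption, not an exact ML-DSA parameter, and the "rejection check" is a coin flip standing in for the real norm tests.

```python
import random

def expected_attempts(p_accept: float) -> float:
    """Mean of a geometric distribution: expected tries until the first accepted candidate."""
    return 1.0 / p_accept

def tail_probability(p_accept: float, k: int) -> float:
    """Probability that more than k attempts are needed, i.e. the first k all abort."""
    return (1.0 - p_accept) ** k

def simulate_sign_attempts(p_accept: float, rng: random.Random) -> int:
    """Count attempts until a simulated candidate passes the rejection check."""
    attempts = 1
    while rng.random() >= p_accept:  # candidate would leak key information: abort, retry
        attempts += 1
    return attempts

if __name__ == "__main__":
    p = 0.2  # illustrative acceptance probability, not an exact ML-DSA parameter
    rng = random.Random(42)
    trials = [simulate_sign_attempts(p, rng) for _ in range(100_000)]
    print(f"expected attempts: {expected_attempts(p):.2f}")
    print(f"simulated mean:    {sum(trials) / len(trials):.2f}")
    print(f"P(more than 20):   {tail_probability(p, 20):.4f}")
```

With p = 0.2 the mean is five attempts, while roughly 1 in 87 signatures needs more than 20. Both numbers can be computed in advance, which is what makes the signing latency profile plannable.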
The Sizes
| Parameter | ML-DSA-44 (Level 2) | ML-DSA-65 (Level 3) | ML-DSA-87 (Level 5) |
|---|---|---|---|
| Public key | 1,312 bytes | 1,952 bytes | 2,592 bytes |
| Private key | 2,560 bytes | 4,032 bytes | 4,896 bytes |
| Signature | 2,420 bytes | 3,309 bytes | 4,627 bytes |
| Sig + PK (cached) | 3,732 bytes | 5,261 bytes | 7,219 bytes |
ML-DSA signatures are fixed-length, which simplifies cache memory management. Each signature at a given security level is exactly the same size, enabling fixed-size slab allocation for cache entries. This eliminates memory fragmentation that can occur with FALCON's variable-length signatures.
The Performance
ML-DSA key generation is fast: under 0.5 milliseconds for all parameter sets. Signing takes 1-3 milliseconds (including the expected number of rejection sampling iterations). Verification takes 0.3-0.5 milliseconds for ML-DSA-65. These timings are consistent and predictable, which simplifies capacity planning. The verification time is the cache target: at 0.3-0.5 milliseconds per ML-DSA-65 verification versus 35 nanoseconds for a cached lookup, caching delivers roughly an 8,600-14,300x speedup on hits.
The predictable performance of ML-DSA is one of its strongest practical advantages. FALCON's Gaussian sampling can occasionally produce outlier signing times (up to 5-10 milliseconds in rare cases). ML-DSA's rejection sampling has a known expected number of iterations, and the probability of needing many more than that falls off geometrically, so the tail of the signing-latency distribution can be computed in advance. For systems that require predictable worst-case latency (financial transactions, real-time systems), ML-DSA's consistent timing profile is preferable.
The Cache Implication
ML-DSA is the middle ground: larger than FALCON, smaller than SLH-DSA. At ML-DSA-65 (Level 3), one million cached entries at 5,261 bytes each require 5.26 GB. This is feasible on a single server with 16+ GB of RAM but leaves less headroom for application memory than FALCON. At 10 million entries, 52.6 GB requires a high-memory instance or distributed caching. ML-DSA-87 at Level 5 is even larger: 7,219 bytes per entry, 7.22 GB for one million entries, 72.2 GB for 10 million entries.
The fixed-length signatures simplify cache entry sizing. Each cache entry for ML-DSA-65 is exactly 5,261 bytes (signature + public key) plus metadata. This enables a slab allocator with fixed 6 KB slabs (5,261 bytes payload + metadata + alignment padding), eliminating fragmentation and simplifying memory accounting.
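A minimal sketch of such a fixed-slot slab, assuming one preallocated buffer and a LIFO free list; the class and its methods are illustrative, not a particular allocator's API. Because every slot has the same size, a freed slot is always reusable whole and the buffer never fragments.

```python
from typing import List

class FixedSlab:
    """Preallocated buffer split into equal-size slots, with a LIFO free list."""

    def __init__(self, slot_size: int, slots: int) -> None:
        self.slot_size = slot_size
        self._mem = bytearray(slot_size * slots)                # one contiguous allocation
        self._free: List[int] = list(range(slots - 1, -1, -1))  # pop() hands out slot 0 first
        self._lengths: List[int] = [0] * slots                  # payload length per slot

    def alloc(self, payload: bytes) -> int:
        if len(payload) > self.slot_size:
            raise ValueError("payload does not fit in a slot")
        if not self._free:
            raise MemoryError("slab is full")
        slot = self._free.pop()
        start = slot * self.slot_size
        self._mem[start:start + len(payload)] = payload  # same-length slice write
        self._lengths[slot] = len(payload)
        return slot

    def read(self, slot: int) -> bytes:
        start = slot * self.slot_size
        return bytes(self._mem[start:start + self._lengths[slot]])

    def free(self, slot: int) -> None:
        # A freed slot is reused whole; there is nothing to coalesce, so no fragmentation.
        self._lengths[slot] = 0
        self._free.append(slot)

if __name__ == "__main__":
    SLOT, PAYLOAD = 6 * 1024, 1_952 + 3_309  # 6 KB slot; ML-DSA-65 public key + signature
    slab = FixedSlab(SLOT, slots=1_000)
    slot = slab.alloc(bytes(PAYLOAD))
    print(f"slot {slot}: {SLOT - PAYLOAD} spare bytes for metadata and padding")
```

With 6,144-byte slots holding 5,261-byte payloads, 883 bytes per slot remain for metadata and padding, so one million ML-DSA-65 entries occupy about 6.14 GB of slab memory rather than the 5.26 GB payload figure.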
Family 3: SLH-DSA (SPHINCS+) -- Hash Functions
The Math
SLH-DSA is fundamentally different from FALCON and ML-DSA. It relies on no lattice assumption whatsoever. Its security rests solely on the underlying hash function (from the SHA-2 family or SHAKE256). As long as the hash function retains properties such as preimage resistance, second-preimage resistance, and pseudorandomness, SLH-DSA is secure. This makes SLH-DSA the most conservative choice: it survives even if both NTRU lattices and module lattices are broken by a future algorithm. The only way to break SLH-DSA is to break the hash function, and SHA-2 and SHA-3 have withstood years of intensive cryptanalysis with no attacks approaching practical threat levels.
SLH-DSA constructs a signature using a hypertree of one-time signatures (WOTS+ instances) and a few-time signature scheme (FORS). The signer uses the message hash to select a FORS key pair, signs the message with FORS, and then authenticates the FORS public key using a chain of WOTS+ signatures arranged in a hypertree. The result is a signature that contains multiple WOTS+ signatures, FORS signature components, and Merkle authentication paths. This structure makes SLH-DSA signatures large -- much larger than FALCON or ML-DSA signatures.
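The Merkle authentication paths mentioned above are what let a verifier check one leaf against a root using only log2(leaves) sibling hashes. A simplified sketch with plain SHA-256 and an 8-leaf tree, the subtree size of SLH-DSA-128f (whose hypertree layers have height 3); FIPS 205 uses tweakable hashes keyed with addresses and a public seed, which this omits, and the leaf strings are stand-ins for WOTS+ public keys.

```python
import hashlib
from typing import List

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(leaves: List[bytes]) -> List[List[bytes]]:
    """All tree levels, leaf hashes first and the root last (len(leaves) must be a power of two)."""
    levels = [[h(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels

def auth_path(levels: List[List[bytes]], index: int) -> List[bytes]:
    """Sibling hashes from the leaf level up to just below the root."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])  # the sibling differs only in the lowest index bit
        index //= 2
    return path

def root_from_path(leaf: bytes, index: int, path: List[bytes]) -> bytes:
    """What a verifier does: rebuild the root from one leaf and its authentication path."""
    node = h(leaf)
    for sibling in path:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node

if __name__ == "__main__":
    leaves = [f"wots-pk-{i}".encode() for i in range(8)]  # stand-ins for 8 WOTS+ public keys
    levels = build_levels(leaves)
    path = auth_path(levels, 5)
    print(len(path), root_from_path(leaves[5], 5, path) == levels[-1][0])
```

An SLH-DSA signature carries one such path per hypertree layer, plus the FORS paths, which is part of why the signatures are so much larger than the keys.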
The Sizes
| Parameter | SLH-DSA-128f (Level 1) | SLH-DSA-192f (Level 3) | SLH-DSA-256f (Level 5) |
|---|---|---|---|
| Public key | 32 bytes | 48 bytes | 64 bytes |
| Private key | 64 bytes | 96 bytes | 128 bytes |
| Signature | 17,088 bytes | 35,664 bytes | 49,856 bytes |
| Sig + PK (cached) | 17,120 bytes | 35,712 bytes | 49,920 bytes |
The size asymmetry is remarkable. SLH-DSA public keys are tiny -- 32 bytes for Level 1, the same as a classical Ed25519 key. But the signatures are enormous: 17,088 bytes for SLH-DSA-128f, which is 24.8x larger than FALCON-512's 690-byte signature and 5.2x larger than ML-DSA-65's 3,309-byte signature. The "f" suffix denotes the "fast" variant, which optimizes for signing and verification speed at the cost of larger signatures. The "s" (small) variants have signatures roughly 40-55% smaller (7,856 bytes for SLH-DSA-128s) but sign an order of magnitude or more slower.
The Performance
SLH-DSA-128f key generation is fast: approximately 0.1 milliseconds. Signing takes 3-10 milliseconds, depending on the parameter set. Verification takes 1-3 milliseconds, which is significantly slower than both FALCON (0.1-0.3 ms) and ML-DSA (0.3-0.5 ms). The slow verification is a direct consequence of the signature structure: the verifier must recompute the WOTS+ chains at every hypertree layer, the FORS roots, and the Merkle authentication paths, which adds up to thousands of hash function invocations per signature.
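The chain recomputation that dominates verification can be illustrated with a single Winternitz-style hash chain. This is a simplified sketch with w = 16, the Winternitz parameter FIPS 205 uses: the signer reveals the chain value at position d, and the verifier hashes it the remaining w - 1 - d times to reach the public chain end. Real WOTS+ uses tweakable hashes and checksum chains (a lone chain like this one is forgeable by hashing forward), so treat it only as a model of the hash count. The estimate function assumes the SLH-DSA-128f values of 35 chains per WOTS+ signature and 22 hypertree layers, and ignores FORS and Merkle-path hashing.

```python
import hashlib

W = 16  # Winternitz parameter: chain positions 0..W-1

def chain(x: bytes, steps: int) -> bytes:
    """Apply SHA-256 `steps` times (simplified; no addresses or seeds)."""
    for _ in range(steps):
        x = hashlib.sha256(x).digest()
    return x

def keygen(secret: bytes) -> bytes:
    """Public chain end = secret hashed W-1 times."""
    return chain(secret, W - 1)

def sign_digit(secret: bytes, d: int) -> bytes:
    """Reveal the chain value at position d (0 <= d < W)."""
    return chain(secret, d)

def verify_digit(public_end: bytes, d: int, sig: bytes) -> bool:
    """Complete the chain: W-1-d more hashes must land on the public end."""
    return chain(sig, W - 1 - d) == public_end

def avg_wots_verify_hashes(chains: int, layers: int, w: int = W) -> float:
    """Rough average chain hashes per verification, assuming uniform digits."""
    return chains * (w - 1) / 2 * layers

if __name__ == "__main__":
    secret = hashlib.sha256(b"demo secret").digest()
    public_end = keygen(secret)
    sig = sign_digit(secret, 5)
    print(verify_digit(public_end, 5, sig))
    print(avg_wots_verify_hashes(chains=35, layers=22))  # rough SLH-DSA-128f WOTS+ estimate
```

Under these assumptions the WOTS+ chains alone cost about 5,800 hash calls per verification, which is why result caching pays off so heavily for this family.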
The slow verification makes caching even more valuable for SLH-DSA than for the other families. At 2 milliseconds per SLH-DSA-128f verification versus 35 nanoseconds for a cached lookup, the speedup is approximately 57,000x. For a system processing 100,000 SLH-DSA verifications per second with a 90% cache hit rate, caching saves 180 seconds of CPU time per second -- the equivalent of freeing 180 CPU cores from verification duty. No other optimization comes close to this impact.
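The arithmetic behind these figures fits in a small calculator. The 100,000 verifications per second, 90% hit rate, 2 ms verification, and 35 ns lookup are the example values from the paragraph above, not measurements; the function names are illustrative.

```python
def speedup(verify_s: float, lookup_s: float) -> float:
    """How many times faster a cache hit is than a fresh verification."""
    return verify_s / lookup_s

def cpu_seconds_saved_per_second(rate_per_s: float, hit_rate: float,
                                 verify_s: float, lookup_s: float) -> float:
    """CPU-seconds of verification avoided each wall-clock second by cache hits."""
    return rate_per_s * hit_rate * (verify_s - lookup_s)

def cpu_seconds_remaining_per_second(rate_per_s: float, hit_rate: float,
                                     verify_s: float) -> float:
    """CPU-seconds still spent each second on cache misses."""
    return rate_per_s * (1.0 - hit_rate) * verify_s

if __name__ == "__main__":
    rate, hit, verify, lookup = 100_000, 0.90, 2e-3, 35e-9  # example values from the text
    print(f"speedup: {speedup(verify, lookup):,.0f}x")
    print(f"saved:   {cpu_seconds_saved_per_second(rate, hit, verify, lookup):.1f} CPU-s/s")
    print(f"misses:  {cpu_seconds_remaining_per_second(rate, hit, verify):.1f} CPU-s/s")
```

The miss path still costs about 20 CPU-seconds per second in this example, so the hit rate, not just the lookup speed, sets the remaining verification load.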
The Cache Implication
SLH-DSA is the most cache-hostile PQ signature family. At 17,120 bytes per cached entry for SLH-DSA-128f, one million entries require 17.12 GB. At 10 million entries, 171.2 GB exceeds the memory capacity of all but the largest server instances. SLH-DSA-256f at Level 5 is even worse: 49,920 bytes per entry, 49.9 GB for one million entries, 499 GB for 10 million entries. These numbers make network-attached caching completely infeasible and in-process caching a serious memory challenge.
The saving grace for SLH-DSA caching is that almost all of an entry's size is the signature; the public keys are tiny (32-64 bytes). Drop the signature from the cache and keep only the verification result (fingerprint + valid/invalid + metadata, 41 bytes per entry), and cache memory falls to 41 MB per million entries -- the same as for any other scheme. What to cache depends on whether you need to re-verify from the cached material or only need the verification result. For most production systems, caching only the verification result is sufficient: the 17 KB signatures are transmitted but not stored in cache.
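A sketch of that fixed 41-byte layout using Python's `struct` module. Treating the 8 metadata bytes as an expiry timestamp is an assumption, since the text does not say what the metadata holds. The 41 MB-per-million figure counts these payload bytes only; whatever index structure holds the entries adds its own overhead.

```python
import hashlib
import struct
import time

# 32-byte fingerprint | 1-byte result | 8-byte metadata (here: hypothetical expiry, Unix seconds)
ENTRY = struct.Struct("<32sBQ")  # "<" = no alignment padding, so size is exactly 41

def pack_entry(fingerprint: bytes, valid: bool, expires_at: int) -> bytes:
    if len(fingerprint) != 32:
        raise ValueError("fingerprint must be 32 bytes")
    return ENTRY.pack(fingerprint, 1 if valid else 0, expires_at)

def unpack_entry(raw: bytes):
    fp, result, expires_at = ENTRY.unpack(raw)
    return fp, bool(result), expires_at

if __name__ == "__main__":
    fp = hashlib.sha256(b"pk || msg || sig").digest()
    raw = pack_entry(fp, True, int(time.time()) + 3600)
    print(len(raw), "bytes per entry;", ENTRY.size * 1_000_000, "bytes per million entries")
```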
The Complete Comparison
| Attribute | FALCON-512 | ML-DSA-65 | SLH-DSA-128f |
|---|---|---|---|
| NIST standard | FIPS 206 | FIPS 204 | FIPS 205 |
| Hardness assumption | NTRU lattices | Module LWE | Hash functions |
| Security level | Level 1 | Level 3 | Level 1 |
| Public key | 897 B | 1,952 B | 32 B |
| Signature | 690 B | 3,309 B | 17,088 B |
| Keygen time | 8-12 ms | <0.5 ms | ~0.1 ms |
| Sign time | 0.5-2 ms | 1-3 ms | 3-10 ms |
| Verify time | 0.1-0.3 ms | 0.3-0.5 ms | 1-3 ms |
| Sig+PK per entry | 1,587 B | 5,261 B | 17,120 B |
| 1M entries | 1.59 GB | 5.26 GB | 17.12 GB |
| 10M entries | 15.9 GB | 52.6 GB | 171.2 GB |
| Cache speedup | ~2,900-8,600x | ~8,600-14,300x | ~28,600-85,700x |
The Three Hardness Assumptions
The reason NIST standardized three families -- not just the smallest (FALCON) or the most standard (ML-DSA) or the most conservative (SLH-DSA) -- is defense in depth through mathematical diversity. Each family's security rests on a different mathematical problem. Breaking the combined attestation of all three families requires breaking NTRU lattices, module lattices, and hash functions simultaneously. These are three independent mathematical bets.
This independence is not hypothetical. The history of cryptography includes multiple instances where a hardness assumption that was believed to be secure was broken by a new algorithm. The RSA problem was undermined (though not broken outright) by advances in integer factoring. The discrete logarithm problem in specific groups was broken by index calculus methods. Lattice problems have seen continuous improvement in attack algorithms (BKZ, sieving, algebraic approaches), though no practical break exists for the NIST-standardized parameter sizes. Hash functions have been remarkably resilient -- no practical preimage or collision attacks exist for SHA-256 or SHA-3 -- which is why SLH-DSA is considered the most conservative choice.
If you sign with all three families, an attacker must break all three to forge a signature. If NTRU lattices fall (breaking FALCON), ML-DSA and SLH-DSA still protect you. If module lattices fall (breaking ML-DSA), FALCON and SLH-DSA still protect you. If both NTRU and module lattices fall (both are lattice problems, and a general lattice breakthrough could affect both), SLH-DSA still protects you because it uses no lattice mathematics at all.
Cache Cost of Three-Family Attestation
Signing with all three families produces three signatures per attestation: FALCON-512 (690 B) + ML-DSA-65 (3,309 B) + SLH-DSA-128f (17,088 B) = 21,087 bytes of signature data. Add three public keys: 897 + 1,952 + 32 = 2,881 bytes. Total per entry: 23,968 bytes. At 1 million entries: 23.97 GB. This is the cost of maximum diversified security. For most applications, you can cache only the verification results (41 bytes per entry) and store the full signatures in cold storage.
Cache Memory at Scale
The following table shows the cache memory required for each family at three scale points: 100,000 entries (a medium-traffic application), 1 million entries (a large-traffic application), and 10 million entries (a hyperscale deployment). These figures assume caching the full signature + public key per entry.
| Scale | FALCON-512 | ML-DSA-65 | SLH-DSA-128f | All Three |
|---|---|---|---|---|
| 100K entries | 159 MB | 526 MB | 1.71 GB | 2.40 GB |
| 1M entries | 1.59 GB | 5.26 GB | 17.12 GB | 23.97 GB |
| 10M entries | 15.9 GB | 52.6 GB | 171.2 GB | 239.7 GB |
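The table is plain arithmetic: bytes per entry times entry count, in decimal gigabytes, ignoring per-entry metadata and index overhead. A few lines of Python reproduce it from the per-family sizes quoted earlier; the dictionary and function names are illustrative.

```python
# Signature + public key per cached entry, in bytes (values from the tables above).
PER_ENTRY_BYTES = {
    "FALCON-512": 690 + 897,
    "ML-DSA-65": 3_309 + 1_952,
    "SLH-DSA-128f": 17_088 + 32,
}
PER_ENTRY_BYTES["All Three"] = sum(PER_ENTRY_BYTES.values())  # 23,968 bytes

def cache_gb(entry_bytes: int, entries: int) -> float:
    """Full-material cache size in decimal gigabytes (1 GB = 10**9 bytes), payload only."""
    return entry_bytes * entries / 1e9

if __name__ == "__main__":
    for entries in (100_000, 1_000_000, 10_000_000):
        cells = ", ".join(f"{name} {cache_gb(size, entries):.2f} GB"
                          for name, size in PER_ENTRY_BYTES.items())
        print(f"{entries:>10,} entries: {cells}")
```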
At 100,000 entries, all three families are individually cacheable on a standard server. Even the combined three-family attestation at 2.40 GB is manageable. At 1 million entries, FALCON remains comfortable (1.59 GB), ML-DSA is feasible but requires dedicated memory (5.26 GB), SLH-DSA is a significant allocation (17.12 GB), and the combined attestation at 23.97 GB requires a high-memory instance. At 10 million entries, only FALCON fits comfortably on a single high-memory server (15.9 GB). ML-DSA requires a high-memory instance or distributed caching (52.6 GB), and SLH-DSA at 171.2 GB and the combined attestation at 239.7 GB are impractical for full-material caching on all but the largest single servers.
The practical implication is that hyperscale deployments should cache verification results (41 bytes per entry, 41 MB per million) rather than full cryptographic material. The full material can be stored in L2 warm storage or fetched from the originating key server when re-verification is required. This verification-result-only caching strategy is uniform across all three families because the cached result is always the same size: a 32-byte fingerprint plus a 1-byte result plus 8 bytes of metadata.
The Recommendation Matrix
The choice of signature family depends on your primary constraint. The following matrix maps common deployment scenarios to the recommended family and the cache architecture implications.
High-Frequency, Compact Signatures: FALCON
If your system generates and verifies signatures at high frequency (100,000+ per second) and bandwidth or storage is a constraint (mobile clients, IoT devices, high-throughput APIs), FALCON is the right choice. Its 690-byte signatures minimize bandwidth consumption and cache memory. The trade-off is implementation complexity: FALCON's Gaussian sampler must be implemented in constant time to avoid side-channel attacks, and key generation is slower than ML-DSA. Use FALCON when you have the cryptographic engineering expertise to implement it correctly and signature compactness is a primary requirement.
Standard Compliance, Broad Adoption: ML-DSA
If your primary concern is standards compliance (FIPS 204), implementation simplicity, and broad ecosystem support, ML-DSA is the default choice. It is the most widely implemented PQ signature scheme, available in most major cryptographic libraries and languages. The 3,309-byte signatures at Level 3 are larger than FALCON's but manageable at scale. Use ML-DSA when you need a "safe default" that every auditor, regulator, and compliance framework recognizes.
Maximum Security, Conservative Assumptions: SLH-DSA
If your threat model includes the possibility that lattice-based cryptography could be broken (either by a new algorithm or by unexpected advances in quantum computing that specifically target lattice problems), SLH-DSA is the only standardized choice whose security does not depend on any lattice assumption. Its reliance on hash functions alone means it survives any lattice breakthrough. The cost is massive signatures (17,088 bytes at Level 1) and slow verification (1-3 ms). Use SLH-DSA when the data being signed must remain unforgeable for decades and you cannot accept any risk of lattice-based signature compromise.
All Three Together: H33-74
If your threat model requires surviving the simultaneous compromise of any single or pair of hardness assumptions, deploy all three families together. Each attestation carries three signatures: FALCON (NTRU lattices), ML-DSA (module lattices), and SLH-DSA (hash functions). Breaking the attestation requires breaking all three -- NTRU lattices AND module lattices AND hash functions. The H33-74 attestation format packages all three signatures plus their public keys into a 74-byte on-chain commitment (via Merkle compression), with the full material stored off-chain. This approach provides maximum diversified security with minimal on-chain footprint.
For caching, the H33-74 approach caches the composite verification result: a single fingerprint over all three proofs, mapped to a single valid/invalid result. The cache entry is 41 bytes regardless of how many signature families are involved. The verification cost on a cache miss is the sum of all three families' verification times (approximately 1.4-3.8 ms total, from 0.1 + 0.3 + 1 ms up to 0.3 + 0.5 + 3 ms), so cache hits are even more valuable, with a combined speedup of roughly 40,000-109,000x.
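A sketch of the composite-result idea, assuming each family's verifier is available as a function of (public key, message, signature). The scheme names and verifier callables are hypothetical placeholders, and this does not model the H33-74 Merkle commitment or its encoding. One fingerprint covers the message and all proofs in a canonical order, the cached boolean is the AND of every family's verification, and an attestation missing any required family is rejected outright.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

Verify = Callable[[bytes, bytes, bytes], bool]  # (public_key, message, signature) -> valid?
Proof = Tuple[str, bytes, bytes]                 # (scheme name, public_key, signature)

def _absorb(h, part: bytes) -> None:
    h.update(len(part).to_bytes(8, "big"))  # length-prefix every field
    h.update(part)

def composite_fingerprint(msg: bytes, proofs: List[Proof]) -> bytes:
    """One 32-byte fingerprint over the message and every (scheme, pk, sig) triple."""
    h = hashlib.sha256()
    _absorb(h, msg)
    for scheme, pk, sig in sorted(proofs):  # canonical order, independent of input order
        _absorb(h, scheme.encode())
        _absorb(h, pk)
        _absorb(h, sig)
    return h.digest()

class CompositeCache:
    def __init__(self, verifiers: Dict[str, Verify]) -> None:
        self._verifiers = verifiers
        self._results: Dict[bytes, bool] = {}

    def verify(self, msg: bytes, proofs: List[Proof]) -> bool:
        schemes = [scheme for scheme, _, _ in proofs]
        if sorted(schemes) != sorted(self._verifiers):
            return False  # every required family must appear exactly once
        fp = composite_fingerprint(msg, proofs)
        if fp in self._results:
            return self._results[fp]  # hit: no signature verification at all
        # AND composition: the attestation is valid only if every family verifies.
        ok = all(self._verifiers[s](pk, msg, sig) for s, pk, sig in proofs)
        self._results[fp] = ok
        return ok
```

Rejecting attestations that omit a family matters: without that check, a forger who broke one assumption could simply leave out the proofs they cannot produce.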
The Bottom Line
FALCON-512 (690B) is the most cache-friendly: 1.59 GB per million entries. ML-DSA-65 (3,309B) is the standard default: 5.26 GB per million entries. SLH-DSA-128f (17,088B) is the most conservative: 17.12 GB per million entries. All three together require 23.97 GB per million entries for full material caching, or 41 MB per million entries for verification-result-only caching. The signature family determines your cache architecture. Choose FALCON for compactness, ML-DSA for compliance, SLH-DSA for maximum conservative security, and all three for defense in depth across three independent hardness assumptions.
Cache any PQ signature family at 35 nanoseconds. FALCON, ML-DSA, SLH-DSA, or all three.