FALCON-512 Caching: 690 Bytes at 31ns

May 1, 2026 | 16 min read | Engineering

Post-quantum signatures come in three families, each built on a different mathematical hardness assumption. FALCON-512, based on NTRU lattices, produces the smallest signatures of any NIST post-quantum standard at its security level: 690 bytes. Compare that to ML-DSA-65 at 3,309 bytes or SLH-DSA-128f at 17,088 bytes. When you are caching millions of signatures for high-frequency verification, the size difference is not academic. It is the difference between 690 MB and 17 GB for the same million entries.

This post is a deep dive into FALCON-512 from the perspective of caching infrastructure. We cover the NTRU lattice structure that enables compact signatures, the constant-time sampling challenge that makes key generation expensive, the verification cost that makes caching valuable, and the concrete memory math at scale. FALCON-512 is the most cache-friendly post-quantum signature scheme, and understanding why requires understanding both the cryptography and the systems engineering.

At a glance: 690 B FALCON-512 signature | 31 ns cached verification | 4.8x smaller than ML-DSA-65

NTRU Lattices: Why FALCON Is Different

FALCON is built on NTRU lattices, which are fundamentally different from the module lattices used by ML-DSA (CRYSTALS-Dilithium). The distinction matters because it explains both FALCON's compact output and its implementation complexity.

An NTRU lattice is defined by a pair of polynomials (f, g) in the ring Z[x]/(x^n + 1), where n is a power of 2 (512 for FALCON-512). The public key is h = g/f mod q, where q is a modulus (12289 for FALCON-512). The private key is the pair (f, g) along with a precomputed "NTRU tree" that enables efficient sampling. The security assumption is that given h, it is computationally infeasible to recover f and g. This is the NTRU assumption, which is distinct from the Module-LWE assumption used by ML-DSA and the hash-function assumption used by SLH-DSA.

The signature scheme works by hash-then-sign. To sign a message m, the signer hashes the message (together with a random salt) to obtain a target polynomial c in Z_q[x]/(x^n + 1), then uses the private key to find a short vector (s1, s2) such that s1 + s2*h = c mod q. The signature is the compressed form of s2 (s1 can be recomputed by the verifier as s1 = c - s2*h mod q). The shortness of (s1, s2) is the proof that the signer knows the private key: only someone who knows the NTRU trapdoor (f, g) can find short vectors close to c in the NTRU lattice.
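The verifier's side of that equation can be sketched in a few lines. The following is a toy illustration only, assuming schoolbook multiplication instead of the NTT and a caller-supplied bound; the function names are hypothetical and the code omits hashing, decompression, and every other part of real FALCON verification.

```rust
/// Multiply two polynomials in Z_q[x]/(x^n + 1) by schoolbook convolution.
/// Real implementations use the NTT; this O(n^2) loop is only for clarity.
fn negacyclic_mul(a: &[i64], b: &[i64], q: i64) -> Vec<i64> {
    let n = a.len();
    assert_eq!(n, b.len());
    let mut out = vec![0i64; n];
    for i in 0..n {
        for j in 0..n {
            let prod = (a[i] * b[j]).rem_euclid(q);
            let k = i + j;
            if k < n {
                out[k] = (out[k] + prod).rem_euclid(q);
            } else {
                // x^n = -1, so terms that wrap around change sign.
                out[k - n] = (out[k - n] - prod).rem_euclid(q);
            }
        }
    }
    out
}

/// Map a residue in [0, q) to its centered representative around zero.
fn center(x: i64, q: i64) -> i64 {
    if x > q / 2 { x - q } else { x }
}

/// Recover s1 = c - s2*h mod q and return it with the squared norm of (s1, s2).
fn recover_s1_and_norm_sq(c: &[i64], s2: &[i64], h: &[i64], q: i64) -> (Vec<i64>, i64) {
    let s2h = negacyclic_mul(s2, h, q);
    let s1: Vec<i64> = c
        .iter()
        .zip(&s2h)
        .map(|(ci, ti)| center((ci - ti).rem_euclid(q), q))
        .collect();
    let norm_sq: i64 = s1.iter().chain(s2.iter()).map(|v| v * v).sum();
    (s1, norm_sq)
}

/// Accept only if the recovered vector is no longer than the bound.
fn toy_verify(c: &[i64], s2: &[i64], h: &[i64], q: i64, bound_sq: i64) -> bool {
    recover_s1_and_norm_sq(c, s2, h, q).1 <= bound_sq
}
```

The point of the sketch is that the verifier never sees s1 on the wire: it reconstructs s1 from c, s2, and the public key h, and accepts only when the pair is short.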

The reason FALCON signatures are small is that NTRU lattices have particularly good geometry for short vector sampling. The trapdoor (f, g) defines a "nice" basis for the lattice, and the Gaussian sampling procedure (called "fast Fourier sampling" or "tree sampling") produces vectors that are close to the target with high probability. The resulting signature vector s2 has small coefficients that can be compressed efficiently. In contrast, ML-DSA uses a rejection sampling approach over module lattices, which produces larger signatures because the coefficients have wider distributions and require more bits to encode.

FALCON-512 Size Breakdown

The sizes for FALCON-512 are precisely specified. The public key is 897 bytes: 1 byte for the header (log(n) and format flags) plus 896 bytes for the polynomial h encoded as 512 coefficients of 14 bits each, packed into a byte string. The signature is 690 bytes on average: 41 bytes of header (including a 40-byte salt/nonce used in the hash) plus approximately 649 bytes for the compressed s2 polynomial. The compression uses a variable-length encoding optimized for the Gaussian distribution of the coefficients. Individual signatures vary slightly in size (the NIST specification allows up to 666 bytes for the compressed polynomial, plus the 41-byte header, for a maximum of 707 bytes), but the average is 690 bytes.
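The public key arithmetic above (512 coefficients at 14 bits each, plus one header byte) can be checked with a small bit-packing sketch. The big-endian bit order and the function names here are assumptions made for illustration; this is not the reference encoder.

```rust
const N: usize = 512;
const COEFF_BITS: usize = 14;

/// Public key size: one header byte plus N coefficients of 14 bits each.
fn public_key_bytes() -> usize {
    1 + N * COEFF_BITS / 8
}

/// Pack coefficients (each below 2^14) into a byte string, most significant bit first.
fn pack_14bit(coeffs: &[u16]) -> Vec<u8> {
    let mut out = Vec::with_capacity((coeffs.len() * COEFF_BITS + 7) / 8);
    let mut acc: u32 = 0; // bit accumulator
    let mut acc_bits = 0; // number of pending bits in the accumulator
    for &c in coeffs {
        assert!(c < (1 << COEFF_BITS));
        acc = (acc << COEFF_BITS) | c as u32;
        acc_bits += COEFF_BITS;
        while acc_bits >= 8 {
            acc_bits -= 8;
            out.push((acc >> acc_bits) as u8);
        }
    }
    if acc_bits > 0 {
        // Pad the final partial byte with zero bits.
        out.push((acc << (8 - acc_bits)) as u8);
    }
    out
}

/// Inverse of `pack_14bit`: read `n` coefficients of 14 bits each.
fn unpack_14bit(bytes: &[u8], n: usize) -> Vec<u16> {
    let mut out = Vec::with_capacity(n);
    let mut acc: u32 = 0;
    let mut acc_bits = 0;
    for &b in bytes {
        acc = (acc << 8) | b as u32;
        acc_bits += 8;
        if acc_bits >= COEFF_BITS && out.len() < n {
            acc_bits -= COEFF_BITS;
            out.push(((acc >> acc_bits) & 0x3FFF) as u16);
        }
    }
    out
}
```

Because 512 * 14 is a multiple of 8, the packed polynomial needs no padding bits, which is why the public key lands on exactly 896 bytes plus the header.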

The private key is 1,281 bytes: a 1-byte header plus the encoded polynomials f, g, and F. The fourth basis polynomial G, which completes the extended NTRU basis satisfying fG - gF = q, is recomputed from the other three when the key is loaded, and the tree structure needed for fast Fourier sampling is rebuilt at the same time. The private key is larger than the public key because it carries three polynomials of the NTRU trapdoor rather than the single polynomial h.

PQ Signature Scheme	Public Key	Signature	Private Key	NIST Level	Hardness Assumption
FALCON-512	897 B	690 B	1,281 B	1	NTRU lattices
ML-DSA-65 (Dilithium)	1,952 B	3,309 B	4,032 B	3	Module-LWE lattices
SLH-DSA-128f (SPHINCS+)	32 B	17,088 B	64 B	1	Hash functions

The comparison is stark. FALCON-512 produces signatures that are 4.8x smaller than ML-DSA-65 and 24.8x smaller than SLH-DSA-128f. The public keys are larger than SLH-DSA's (897 B vs 32 B) but smaller than ML-DSA's (897 B vs 1,952 B). For caching, the signature size is what matters most because signatures are what you cache and transmit repeatedly. Public keys are typically cached once per identity and rarely retransmitted.

Why FALCON-512 Is Ideal for High-Frequency Operations

High-frequency signing and verification scenarios include authentication tokens (signed every session, verified on every request), WebSocket session attestations (signed at connection establishment, verified periodically), IoT device attestations (signed by each device, verified by the gateway), and API request signatures (signed per request, verified at the gateway and at each downstream service).

In all of these scenarios, the system is producing and verifying thousands to millions of signatures per second. The size of each signature directly affects two costs: network bandwidth (transmitting the signature) and memory (caching the signature). It also affects how much data the verifier must parse and decompress, although total verification time depends mainly on each scheme's arithmetic rather than on signature size. FALCON-512's 690-byte signature minimizes the bandwidth and memory costs.

Network Bandwidth

At 100,000 signatures per second transmitted over the network, FALCON-512 consumes 69 MB/s of bandwidth for the signature data alone. ML-DSA-65 consumes 331 MB/s. SLH-DSA-128f consumes 1.71 GB/s. A typical 10 Gbps internal link carries 1.25 GB/s, so the FALCON signatures take about 5.5% of capacity, the ML-DSA signatures about 26%, and the SLH-DSA signatures, at 13.7 Gbps, already exceed the link. At 1 million signatures per second, FALCON-512 would still fit at 5.5 Gbps, while ML-DSA-65 at 26.5 Gbps would need multiple links for signature data alone.
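The bandwidth figures are simple enough to recompute for your own signature rate and link speed. A minimal sketch, with hypothetical function names, that keeps the byte-to-bit conversion explicit because link rates are quoted in bits per second:

```rust
/// Bytes per second consumed by signature payloads alone.
fn signature_bytes_per_sec(sigs_per_sec: u64, sig_bytes: u64) -> u64 {
    sigs_per_sec * sig_bytes
}

/// Fraction of a link consumed by signature data.
/// `link_bits_per_sec` is the link rate in bits per second (10 Gbps = 10_000_000_000).
/// A result above 1.0 means the signatures alone do not fit on the link.
fn link_utilization(sigs_per_sec: u64, sig_bytes: u64, link_bits_per_sec: u64) -> f64 {
    let bits = signature_bytes_per_sec(sigs_per_sec, sig_bytes) * 8;
    bits as f64 / link_bits_per_sec as f64
}
```

Calling `link_utilization` with the three signature sizes from the table (690, 3,309, and 17,088 bytes) gives a like-for-like comparison at any request rate.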

Memory for Caching

This is the critical metric for caching infrastructure. The memory required to cache N signatures is N times the signature size plus per-entry overhead (cache metadata, hash map bucket pointers, etc.). With a typical per-entry overhead of 72 bytes (32-byte key + 8-byte pointer + 8-byte TTL + 8-byte frequency counter + 16-byte alignment), the total per-entry cost is signature_size + 72 bytes.

Cached Entries	FALCON-512 (690 B)	ML-DSA-65 (3,309 B)	SLH-DSA-128f (17,088 B)
100,000	76 MB	338 MB	1.72 GB
1,000,000	762 MB	3.38 GB	17.2 GB
10,000,000	7.62 GB	33.8 GB	172 GB

At 1 million cached entries, FALCON-512 requires 762 MB. This fits comfortably in the memory of a single server. ML-DSA-65 at 3.38 GB is feasible but constrains what else can run on the machine. SLH-DSA-128f at 17.2 GB requires either a high-memory instance or a distributed cache, which adds network latency and defeats the purpose of in-process caching. At 10 million entries, only FALCON-512 remains viable for in-process caching on standard server hardware (16-32 GB RAM).
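The table follows directly from the per-entry formula. A small sketch, assuming the 72-byte overhead breakdown given above and using hypothetical function names, that reproduces the rows and also inverts the calculation for a fixed memory budget:

```rust
/// Per-entry overhead from the text:
/// 32-byte key + 8-byte pointer + 8-byte TTL + 8-byte frequency counter + 16-byte alignment.
const ENTRY_OVERHEAD_BYTES: u64 = 32 + 8 + 8 + 8 + 16;

/// Total bytes needed to cache `entries` values of `value_bytes` each.
fn cache_bytes(entries: u64, value_bytes: u64) -> u64 {
    entries * (value_bytes + ENTRY_OVERHEAD_BYTES)
}

/// Largest number of entries that fits in `budget_bytes`.
fn max_entries(budget_bytes: u64, value_bytes: u64) -> u64 {
    budget_bytes / (value_bytes + ENTRY_OVERHEAD_BYTES)
}
```

The overhead constant is the part most likely to differ in practice, since it depends on the map implementation and allocator, so treat it as a parameter to measure rather than a fixed fact.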

Verification Time

FALCON-512 verification involves decompressing the signature to recover s2, computing s1 = c - s2*h mod q (a polynomial multiplication in Z_q[x]/(x^512 + 1), which can be done via NTT in O(n log n) time), and checking that the norm of (s1, s2) is below a specified bound. The total verification time on production hardware is approximately 1.2 microseconds. ML-DSA-65 verification is faster at approximately 0.8 microseconds; it also relies on NTT-based polynomial arithmetic, applied to a matrix-vector product over the module lattice rather than to a single ring multiplication. SLH-DSA-128f verification is the slowest at approximately 3.5 microseconds because it involves multiple hash tree traversals.

However, the absolute verification time is less important than the cached verification time when caching is available. With in-process caching, all three schemes verify at 31 nanoseconds (the cache lookup cost). The difference is in what you are caching: at 31 nanoseconds per lookup, the cost is dominated by memory, not compute. And FALCON-512's 690-byte signatures require 4.8x less memory to cache than ML-DSA-65's 3,309-byte signatures.

The Constant-Time Sampling Challenge

FALCON's primary implementation challenge is constant-time Gaussian sampling during signing. The signing operation requires sampling a short vector from a discrete Gaussian distribution over the NTRU lattice. The canonical sampling algorithm, fast Fourier sampling over the NTRU tree, involves floating-point arithmetic and conditional branches that depend on the private key and the message. A naive implementation leaks timing information that can be exploited to recover the private key.

Constant-time implementations of FALCON signing must replace floating-point arithmetic with fixed-point arithmetic (to avoid variable-time floating-point operations), eliminate all data-dependent branches (using constant-time conditional moves), and implement constant-time table lookups for the CDT (cumulative distribution table) used in Gaussian sampling. These requirements make FALCON signing significantly more complex to implement correctly than ML-DSA signing, which uses uniform rejection sampling that is straightforward to make constant-time.
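Two of those building blocks, the constant-time conditional move and the constant-time table scan, look roughly like this. This is a sketch of the pattern only: the table values are made up and are not FALCON's distribution table, the function names are hypothetical, and plain Rust gives no guarantee that the compiler will keep the code branch-free, which is why production code typically relies on audited primitives and inspects the generated assembly.

```rust
/// Branch-free select: returns `a` if `choice` is 1 and `b` if `choice` is 0.
fn ct_select(a: u64, b: u64, choice: u64) -> u64 {
    let mask = choice.wrapping_neg(); // 0 -> all zeros, 1 -> all ones
    (a & mask) | (b & !mask)
}

/// Returns 1 if x < y and 0 otherwise, without branching.
/// Valid only when both inputs are below 2^63.
fn ct_lt(x: u64, y: u64) -> u64 {
    x.wrapping_sub(y) >> 63
}

/// Toy thresholds in decreasing order. Illustrative values, NOT FALCON's real table.
const TOY_CDT: [u64; 3] = [1 << 62, 1 << 60, 1 << 50];

/// Count how many thresholds exceed the random input `r`.
/// Every entry is read on every call, so the memory access pattern and the
/// number of operations do not depend on the sampled value.
fn ct_cdt_sample(r: u64) -> u64 {
    let r = r & (u64::MAX >> 1); // keep r below 2^63 so ct_lt is valid
    let mut z = 0u64;
    for &t in TOY_CDT.iter() {
        z += ct_lt(r, t);
    }
    z
}
```

The contrast with a naive sampler is that there is no early exit: an implementation that stops scanning at the first threshold above `r` would leak the sampled value through its running time.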

The key generation is even more expensive because it involves computing the NTRU basis (f, g, F, G) and building the sampling tree. FALCON-512 key generation takes approximately 8-12 milliseconds on production hardware, compared to 0.15 milliseconds for ML-DSA-65. This is a one-time cost per key pair, but it means that FALCON key pairs should be cached aggressively. Generating a new key pair for every session would add 8-12 milliseconds of latency and burn CPU on an operation whose output can be reused millions of times.

Caching Key Pairs

The caching strategy for FALCON key pairs is different from the caching strategy for signatures. Key pairs are long-lived (days to months), few in number (one per identity or service), and expensive to generate. Signatures are short-lived (minutes to hours), numerous (one per operation), and relatively cheap to generate (approximately 0.5 milliseconds for FALCON-512).

For key pairs, the cache entry is the full (public_key, private_key) tuple: 897 + 1,281 = 2,178 bytes per entry. At 10,000 cached key pairs, this is 21.8 MB. The cache key is the identity or service identifier. The TTL matches the key rotation policy (typically 24 hours to 90 days). The eviction policy is irrelevant at this scale because 10,000 entries fit trivially in memory.
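A key pair cache along these lines can be sketched with the standard library alone. The `KeyPair` struct and `KeyPairCache` type are hypothetical stand-ins, the `generate` closure takes the place of real FALCON key generation, and a single-threaded `HashMap` is assumed where a production service would use a concurrent map.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Placeholder for a (public_key, private_key) tuple: 897 + 1,281 bytes for FALCON-512.
#[derive(Clone, Debug, PartialEq)]
struct KeyPair {
    public_key: Vec<u8>,
    private_key: Vec<u8>,
}

/// Cache keyed by identity or service identifier, with a TTL matching the rotation policy.
struct KeyPairCache {
    ttl: Duration,
    entries: HashMap<String, (KeyPair, Instant)>,
}

impl KeyPairCache {
    fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Return the cached pair for `identity`, or call `generate` if it is missing or expired.
    fn get_or_generate<F>(&mut self, identity: &str, generate: F) -> KeyPair
    where
        F: FnOnce() -> KeyPair,
    {
        let now = Instant::now();
        if let Some((pair, created)) = self.entries.get(identity) {
            if now.duration_since(*created) < self.ttl {
                return pair.clone();
            }
        }
        // Expensive path: only taken once per identity per TTL window.
        let pair = generate();
        self.entries.insert(identity.to_string(), (pair.clone(), now));
        pair
    }
}
```

No eviction logic is included because, as noted above, the entry count is small enough that only TTL expiry matters. A real deployment would also need to consider how private key material is protected while it sits in memory.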

For signatures, the cache entry depends on your use case. If you are caching the full signature for retransmission, the entry is 690 bytes. If you are caching only the verification result (boolean: valid or invalid), the entry is 33 bytes (32-byte content hash + 1-byte result). The verification-result approach is 21x more memory-efficient and is the correct choice when the signature itself does not need to be retransmitted.

Caching Verification Results vs. Full Signatures

There are two distinct caching strategies for FALCON-512 signatures, and the choice depends on your architecture.

Strategy 1: Cache the full signature. The cache stores the complete 690-byte signature, keyed by the content hash (32 bytes). This strategy is appropriate when downstream services need the actual signature bytes -- for example, when the signature is part of an attestation chain and must be forwarded to the next verifier. The memory cost is 690 + 72 = 762 bytes per entry (762 MB per million entries). The cache hit returns the signature without needing to re-sign, saving the 0.5-millisecond signing cost.

Strategy 2: Cache the verification boolean. The cache stores only whether the signature was valid (1 byte), keyed by the computation fingerprint (32 bytes). The fingerprint is SHA3-256(signature_bytes || public_key_hash || message_hash). This strategy is appropriate when the verifier only needs to know whether the signature is valid and does not need to retransmit the signature bytes. The memory cost is 33 + 72 = 105 bytes per entry (105 MB per million entries). The cache hit skips the 1.2-microsecond verification.

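// Assumed for this sketch: SIGNATURE_CACHE is a DashMap<[u8; 32], FalconSignature>,
// VERIFY_CACHE is a DashMap<[u8; 32], bool>, sha3_256(&[u8]) returns [u8; 32],
// pk.hash() returns [u8; 32], and FalconSignature implements Clone.

// Strategy 1: Cache full signature, keyed by the message hash
fn get_or_sign(message: &[u8], sk: &FalconPrivateKey) -> FalconSignature {
    let key = sha3_256(message);
    if let Some(sig) = SIGNATURE_CACHE.get(&key) {
        return sig.clone();  // 31ns, saves 0.5ms signing
    }
    let sig = falcon_sign(sk, message);  // 0.5ms
    SIGNATURE_CACHE.insert(key, sig.clone());  // keep one copy, return the other
    sig
}

// Strategy 2: Cache verification result, keyed by
// SHA3-256(signature_bytes || public_key_hash || message_hash)
fn verify_cached(sig: &[u8], pk: &FalconPublicKey, msg: &[u8]) -> bool {
    let mut input = Vec::with_capacity(sig.len() + 64);
    input.extend_from_slice(sig);
    input.extend_from_slice(&pk.hash());
    input.extend_from_slice(&sha3_256(msg));
    let fp = sha3_256(&input);
    if let Some(result) = VERIFY_CACHE.get(&fp) {
        return *result;  // 31ns, saves 1.2us verification
    }
    let result = falcon_verify(pk, msg, sig);  // 1.2us
    VERIFY_CACHE.insert(fp, result);
    result
}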

In most production deployments, both strategies are used simultaneously. The signing service caches full signatures to avoid redundant signing. The verifying services cache verification results to avoid redundant verification. The two caches operate independently and have different sizes, TTLs, and access patterns.

FALCON-512 in a Three-Family Post-Quantum Stack

FALCON-512 belongs to one of three post-quantum signature families selected by NIST for standardization. A robust post-quantum deployment uses all three families, each built on a different hardness assumption. Where a message carries signatures from all three, an attacker must break NTRU lattices (FALCON), module-LWE lattices (ML-DSA), and hash-based signatures (SLH-DSA) simultaneously. These are three distinct mathematical bets. If any one assumption holds, such a combined signature remains secure.

In this three-family architecture, each family serves a different role. FALCON-512 handles high-frequency operations: auth tokens, session attestations, and API request signatures. Its compact 690-byte signature minimizes bandwidth and cache memory in the hot path. ML-DSA-65 handles medium-frequency operations: certificate signing, document attestation, and inter-service authentication. Its simpler implementation and FIPS 204 standardization make it the default choice when implementation complexity is a concern. SLH-DSA-128f handles low-frequency, high-assurance operations: root certificate signing, long-lived attestations, and conservative fallback authentication. Its hash-only security assumption provides the ultimate fallback if lattice-based assumptions are broken by quantum algorithms beyond Shor's.

From a caching perspective, FALCON-512 generates the most cache entries (high-frequency operations produce many signatures), ML-DSA-65 generates a moderate number, and SLH-DSA-128f generates very few. The total cache memory is dominated by the FALCON entries because they are the most numerous, even though each individual entry is the smallest. This is the ideal distribution: the most frequent entries are the smallest, and the largest entries are the least frequent.

The Redis Problem at 690 Bytes

A natural question is whether to use Redis (or any external cache) for FALCON-512 signature caching. At 690 bytes per signature, the value size is well within Redis's comfort zone (Redis handles values up to 512 MB). The problem is not value size. The problem is latency.

A Redis GET for a 690-byte value takes approximately 120-180 microseconds on a typical deployment (same-AZ, AWS ElastiCache). This includes network round-trip (80-120 microseconds), Redis command processing (10-20 microseconds), and serialization/deserialization (10-20 microseconds). The in-process DashMap lookup takes 31 nanoseconds. The ratio is approximately 4,000x to 6,000x.

For FALCON-512 verification caching (where the cached value is 1 byte, not 690 bytes), Redis is even more wasteful. You are paying 150 microseconds of network latency to retrieve a single boolean that could be looked up in 31 nanoseconds from in-process memory. The FALCON-512 verification itself only takes 1.2 microseconds. Using Redis to cache a 1.2-microsecond operation adds 150 microseconds of latency -- you are 125x slower with the cache than without it.

The only scenario where Redis makes sense for FALCON signature caching is when you need to share cached signatures across multiple processes or machines. In that case, the 150-microsecond Redis latency is compared against the alternative of re-signing (0.5 milliseconds) in each process independently. Redis saves 0.35 milliseconds per cross-process signature cache hit, which is a meaningful saving if the same signature would otherwise be regenerated by many different processes. For verification results the comparison goes the other way: re-verifying locally takes 1.2 microseconds, so fetching a shared boolean over the network is slower than recomputing it.
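The trade-off in this section reduces to comparing lookup cost against recompute cost. A minimal sketch of that comparison, with hypothetical function names and all latencies expressed in nanoseconds:

```rust
/// Expected cost per request when a cache sits in front of an operation.
/// Every request pays the lookup; a miss additionally pays the full operation.
fn expected_latency_ns(hit_rate: f64, lookup_ns: f64, compute_ns: f64) -> f64 {
    lookup_ns + (1.0 - hit_rate) * compute_ns
}

/// A cache can only help if even a guaranteed hit is cheaper than recomputing.
fn cache_can_help(lookup_ns: f64, compute_ns: f64) -> bool {
    lookup_ns < compute_ns
}
```

With the figures used in this post, a 31 ns in-process lookup passes this check against both the 1.2-microsecond verification and the 0.5-millisecond signing cost, while a 150-microsecond network lookup passes it only against signing.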

But even in the cross-process scenario, the optimal architecture is a two-tier cache: an in-process DashMap (L1, 31 nanoseconds) backed by a shared cache (L2, 150 microseconds). The L1 cache handles the 90%+ of lookups that are process-local, and the L2 cache handles the cross-process sharing. Most FALCON verification lookups never leave L1.
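The two-tier lookup order can be sketched as follows. The `SharedCache` trait and `MockShared` type are hypothetical stand-ins for a Redis client, and a plain `HashMap` stands in for the in-process DashMap so that the example needs only the standard library.

```rust
use std::collections::HashMap;

/// Stand-in for a shared cache such as Redis. Hypothetical trait for this sketch.
trait SharedCache {
    fn get(&mut self, key: &[u8; 32]) -> Option<Vec<u8>>;
    fn set(&mut self, key: [u8; 32], value: Vec<u8>);
}

/// In-memory mock that counts simulated network round trips.
#[derive(Default)]
struct MockShared {
    store: HashMap<[u8; 32], Vec<u8>>,
    round_trips: usize,
}

impl SharedCache for MockShared {
    fn get(&mut self, key: &[u8; 32]) -> Option<Vec<u8>> {
        self.round_trips += 1;
        self.store.get(key).cloned()
    }
    fn set(&mut self, key: [u8; 32], value: Vec<u8>) {
        self.round_trips += 1;
        self.store.insert(key, value);
    }
}

struct TwoTierCache<S: SharedCache> {
    l1: HashMap<[u8; 32], Vec<u8>>, // in-process tier
    l2: S,                          // shared tier
}

impl<S: SharedCache> TwoTierCache<S> {
    fn new(l2: S) -> Self {
        Self { l1: HashMap::new(), l2 }
    }

    /// Check L1, then L2, then compute. Each level fills the levels above it.
    fn get_or_compute<F: FnOnce() -> Vec<u8>>(&mut self, key: [u8; 32], compute: F) -> Vec<u8> {
        if let Some(v) = self.l1.get(&key) {
            return v.clone(); // process-local hit, no network
        }
        let value = match self.l2.get(&key) {
            Some(v) => v,
            None => {
                let v = compute();
                self.l2.set(key, v.clone());
                v
            }
        };
        self.l1.insert(key, value.clone());
        value
    }
}
```

After the first lookup for a key, every later lookup in the same process is served from L1, so the shared tier is touched once per key per process rather than once per request. The sketch leaves out TTLs and invalidation, which a real two-tier cache needs to keep the tiers consistent.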

FALCON-512 Standardization Status

FALCON is being standardized by NIST as FN-DSA (FFT over NTRU-Lattice-Based Digital Signature Algorithm), but as of early 2026, the final FN-DSA standard (FIPS 206) has not been published. ML-DSA (FIPS 204) and SLH-DSA (FIPS 205) are published and final. This means FALCON has been selected for standardization but does not yet have a finalized FIPS standard. For deployments that require FIPS compliance today, ML-DSA-65 is the compliant choice. For deployments that prioritize signature compactness and are willing to track the FN-DSA standardization timeline, FALCON-512 is the better choice for high-frequency operations.

Memory Math at Scale

Let us work through the complete memory calculation for a production deployment using FALCON-512 for authentication token signatures. The system has 500,000 concurrent users, each with an active session that produces a FALCON-512 signature. Each signature is verified by 5 downstream services, and each service caches the verification result (not the full signature).

Per-service verification cache: 500,000 entries at 105 bytes each (33 bytes data + 72 bytes overhead) = 52.5 MB. Across 5 services: 262.5 MB total. With a CacheeLFU eviction policy and a cache capacity of 1 million entries per service, the cache is 50% full at steady state, leaving headroom for traffic spikes.

Compare this to ML-DSA-65 at the same scale. If caching full signatures: 500,000 entries at 3,381 bytes each (3,309 + 72) = 1.69 GB per service. Across 5 services: 8.45 GB. If caching verification booleans: 500,000 entries at 105 bytes each = 52.5 MB per service (same as FALCON, because the verification boolean is the same size regardless of signature scheme). The advantage of FALCON-512 manifests when you need to cache the full signature rather than just the verification result.

For signing services that cache full signatures to avoid re-signing, the difference is dramatic. A signing service caching 1 million FALCON-512 signatures uses 762 MB. The same service caching 1 million ML-DSA-65 signatures uses 3.38 GB. The same service caching 1 million SLH-DSA-128f signatures uses 17.2 GB. At FALCON-512's size, a single-server signing cache is practical up to approximately 10 million entries (7.6 GB). At SLH-DSA-128f's size, you hit memory limits at well under 1 million entries on standard hardware.

When to Choose FALCON-512 Over ML-DSA-65

The decision between FALCON-512 and ML-DSA-65 for high-frequency operations comes down to four factors: signature size (FALCON wins), implementation complexity (ML-DSA wins), standardization status (ML-DSA wins today, FALCON catches up when FN-DSA publishes), and the specific hardness assumption you want to rely on (NTRU vs Module-LWE -- both are lattice-based but mathematically distinct).

Choose FALCON-512 when signature volume is high (more than 10,000 signatures per second), cache memory is constrained (less than 4 GB available for signature caching), network bandwidth is limited (FALCON saves 4.8x bandwidth per signature), or you need the NTRU hardness assumption as a diversification bet against potential Module-LWE weaknesses.

Choose ML-DSA-65 when FIPS 204 compliance is required today, implementation simplicity is a priority (no constant-time Gaussian sampling to get right), key generation speed matters (0.15ms vs 8-12ms), or you are already using Module-LWE for key exchange (ML-KEM) and want a consistent algebraic foundation.

In a three-family deployment that uses both FALCON and ML-DSA alongside SLH-DSA, the choice is not either/or. FALCON handles the hot path (high-frequency, cache-sensitive). ML-DSA handles the warm path (medium-frequency, compliance-sensitive). SLH-DSA handles the cold path (low-frequency, maximum-assurance). Each family is cached independently, with cache sizes and TTLs appropriate to their access patterns.
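The hot, warm, and cold split can be expressed as a small routing table. This is an illustrative sketch with hypothetical type and function names; the signature sizes come from the comparison table earlier in this post.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Scheme {
    Falcon512,
    MlDsa65,
    SlhDsa128f,
}

/// How often an operation class signs or verifies.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Path {
    Hot,  // auth tokens, session attestations, API request signatures
    Warm, // certificate signing, document attestation, inter-service auth
    Cold, // root certificates, long-lived attestations, fallback auth
}

/// Pick the signature family for an operation class.
fn scheme_for(path: Path) -> Scheme {
    match path {
        Path::Hot => Scheme::Falcon512,
        Path::Warm => Scheme::MlDsa65,
        Path::Cold => Scheme::SlhDsa128f,
    }
}

/// Signature size in bytes, used to size each family's cache independently.
fn signature_bytes(scheme: Scheme) -> usize {
    match scheme {
        Scheme::Falcon512 => 690,
        Scheme::MlDsa65 => 3_309,
        Scheme::SlhDsa128f => 17_088,
    }
}
```

Keeping the mapping in one place makes it straightforward to give each family its own cache capacity and TTL, as described above.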

The Bottom Line

FALCON-512 produces the smallest post-quantum signatures at 690 bytes -- 4.8x smaller than ML-DSA-65 and 24.8x smaller than SLH-DSA-128f. At scale, this size advantage translates to roughly 4.8x less signature data in cache and on the network, and the ability to cache 10 million signatures in 7.6 GB of in-process memory. Cached verification at 31 nanoseconds replaces the 1.2-microsecond verification cost on every cache hit. For high-frequency post-quantum signing workloads, FALCON-512 with in-process caching is the optimal combination of compact output, fast cached verification, and memory efficiency.

690-byte PQ signatures cached at 31 nanoseconds. FALCON-512 meets in-process caching.

brew install cachee