SLH-DSA Signatures: Cache 49KB at 31ns
SLH-DSA is the conservative choice in post-quantum cryptography. Its security depends entirely on hash functions -- no lattices, no structured algebraic assumptions, no number theory that a future quantum algorithm might exploit. If SHA-256 and SHAKE-256 remain preimage-resistant, SLH-DSA remains secure. That guarantee comes with a cost: the signatures are enormous. SLH-DSA-128f produces 17,088-byte signatures. SLH-DSA-192f produces 35,664 bytes. SLH-DSA-256f produces 49,856 bytes -- nearly 50 kilobytes per signature. Caching these signatures requires a fundamentally different strategy than caching the 690-byte signatures produced by FALCON-512.
This post covers SLH-DSA from the caching perspective: why the signatures are so large, why they exist despite their size, the "f" versus "s" variant tradeoff, when SLH-DSA is the right choice, and the caching strategy that makes it practical. The core insight: do not cache the full 49 KB signature. Cache the 33-byte verification result instead. That is a 1,511x reduction in cache memory, and it delivers 31-nanosecond lookups regardless of the original signature size.
Why SLH-DSA Exists
The post-quantum standardization effort produced three signature families, each built on a different hardness assumption. FALCON uses NTRU lattices. ML-DSA uses module lattices (Module-LWE). SLH-DSA uses hash functions. The three families exist because the cryptographic community is not certain which assumptions will survive the quantum era. Lattice-based cryptography is efficient and well-studied, but the lattice problems (NTRU, Module-LWE) are structured algebraic problems. A future algorithmic breakthrough -- perhaps exploiting the ring or module structure -- could weaken or break lattice-based schemes without affecting hash-based schemes.
SLH-DSA is the insurance policy. Its security reduces to the preimage resistance and collision resistance of standard hash functions: each parameter set is instantiated with either SHA-2 or SHAKE256. These hash functions have been studied for decades, are well understood, and are not known to be vulnerable to quantum attacks beyond Grover's algorithm (which provides at most a quadratic speedup, addressed by doubling the output length). If every lattice-based assumption is broken tomorrow, SLH-DSA still works.
This makes SLH-DSA the appropriate choice for operations where security must survive the longest timescale: root certificates (valid for 20-30 years), compliance attestations (must be verifiable decades later), archival signatures (documents that must remain authenticated for the lifetime of the archive), and conservative security postures where the organization prefers hash-based security even at the cost of larger signatures.
The Size Problem: Anatomy of 49 KB
SLH-DSA signatures are large because the signature scheme is fundamentally a many-time signature built on top of a one-time signature scheme (WOTS+), organized into a hypertree of Merkle trees. The signature must include authentication paths through multiple levels of this tree structure, and each authentication path includes multiple hash values.
SLH-DSA Architecture
The SLH-DSA structure has three layers. At the bottom is WOTS+ (Winternitz One-Time Signature), which produces one-time signatures of roughly 0.5-2 KB each, depending on the hash output length. WOTS+ signatures are organized into XMSS trees (eXtended Merkle Signature Scheme), where each XMSS tree is a Merkle tree whose leaves are WOTS+ public keys. Multiple XMSS trees are organized into a hypertree, where each non-leaf XMSS tree signs the root of the XMSS tree below it.
An SLH-DSA signature includes a FORS (Forest of Random Subsets) signature that signs the message, an authentication path through the bottom-level XMSS tree, WOTS+ signatures at each hypertree level, and authentication paths through each intermediate XMSS tree up to the root. The total signature size is determined by the security parameter (which sets the hash output length), the height of the hypertree (which determines how many XMSS levels exist), and the number of FORS trees (which determines the FORS signature size).
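As a sketch, the signature size can be reproduced from the FIPS 205 parameters: n (hash output bytes), h (total hypertree height), d (number of XMSS layers), a (FORS tree height), and k (number of FORS trees), assuming the standard Winternitz parameter w = 16.

```rust
// Sketch: SLH-DSA signature size from its FIPS 205 parameters.
// Components: randomizer R (n bytes) + FORS signature (k trees, each an
// n-byte secret value plus an a-node auth path) + hypertree signature
// (d WOTS+ signatures of `len` n-byte chains, plus h auth-path nodes total).
fn slh_dsa_sig_bytes(n: usize, h: usize, d: usize, a: usize, k: usize) -> usize {
    let len = 2 * n + 3; // WOTS+ chains for w = 16: 2n message nibbles + 3 checksum nibbles
    n * (1 + k * (a + 1) + d * len + h)
}
```

With the Level 1 "f" parameters from FIPS 205 (n = 16, h = 66, d = 22, a = 6, k = 33) this yields 17,088 bytes, matching the table below; the other five parameter sets reproduce their table entries the same way.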
| SLH-DSA Variant | Public Key | Signature | Private Key | NIST Level | Signing Time |
|---|---|---|---|---|---|
| SLH-DSA-128f | 32 B | 17,088 B | 64 B | 1 | ~36 ms |
| SLH-DSA-128s | 32 B | 7,856 B | 64 B | 1 | ~520 ms |
| SLH-DSA-192f | 48 B | 35,664 B | 96 B | 3 | ~68 ms |
| SLH-DSA-192s | 48 B | 16,224 B | 96 B | 3 | ~1,100 ms |
| SLH-DSA-256f | 64 B | 49,856 B | 128 B | 5 | ~130 ms |
| SLH-DSA-256s | 64 B | 29,792 B | 128 B | 5 | ~2,800 ms |
Note the remarkable asymmetry: public keys are tiny (32-64 bytes), private keys are small (64-128 bytes), but signatures are enormous (7,856-49,856 bytes). This is the opposite of FALCON, where keys are larger but signatures are compact. The asymmetry comes from the tree structure: the public key is just the root hash of the hypertree, but the signature must include authentication paths through the entire tree, which requires transmitting many intermediate hash values.
The "f" vs "s" Tradeoff
Each SLH-DSA parameter set comes in two variants: "f" (fast signing) and "s" (small signatures). The tradeoff is in how the hypertree is divided into XMSS layers. The "s" variants use fewer, taller layers: the signature carries fewer WOTS+ signatures and authentication paths, so it is smaller, but signing must recompute exponentially more leaves in each tall tree, so it is slower. The "f" variants use more, shallower layers: the signature carries more WOTS+ signatures, so it is larger, but each shallow tree has far fewer leaves to recompute, so signing is much faster.
For SLH-DSA-256: the "f" variant signs in approximately 130 milliseconds and produces 49,856-byte signatures. The "s" variant signs in approximately 2,800 milliseconds (21.5x slower) and produces 29,792-byte signatures (40% smaller). Verification is far cheaper than signing for both variants -- on the order of single-digit milliseconds -- with the "s" variants verifying somewhat faster because their signatures contain fewer WOTS+ signatures to check.
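A crude operation count makes the asymmetry concrete. Under the simplifying assumption that signing cost is dominated by leaf derivations -- rebuilding the k FORS trees of height a, plus one XMSS tree of height h/d per hypertree layer -- the variants compare as follows (a back-of-envelope model, not a benchmark):

```rust
// Approximate count of leaf derivations during signing (dominant-cost model,
// an assumption for illustration): k FORS trees of 2^a leaves each, plus
// d XMSS trees of 2^(h/d) leaves each.
fn signing_leaf_derivations(h: u64, d: u64, a: u64, k: u64) -> u64 {
    k * (1 << a) + d * (1 << (h / d))
}
```

For SLH-DSA-256s (h=64, d=8, a=14, k=22) this gives 362,496 leaf derivations; for SLH-DSA-256f (h=68, d=17, a=9, k=35) it gives 18,192 -- a roughly 20x gap, in line with the measured 21.5x signing-time ratio.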
For caching, the "f" variant is generally preferred because the larger signature size is offset by the much faster signing speed. The signature size only matters for cache memory if you are caching full signatures (which, as we will argue, you should not be doing). If you are caching verification booleans (33 bytes), the "f" and "s" variants have identical cache footprints.
The Caching Strategy: Verification Booleans, Not Full Signatures
The single most important insight for SLH-DSA caching is: do not cache the full signature. Cache the verification result.
A full SLH-DSA-256f signature is 49,856 bytes. With 72 bytes of cache overhead, each full-signature cache entry is 49,928 bytes. The verification boolean (valid/invalid) plus a content fingerprint is 33 bytes. With 72 bytes of cache overhead, each verification-boolean cache entry is 105 bytes.
The ratio is 49,928 / 105 = 475x. You can cache 475 verification results in the same memory as a single full signature. Or equivalently: 1 million SLH-DSA-256f verification results fit in 105 MB, while 1 million full signatures require 49.9 GB.
| Cached Entries | Full Signature (49,928 B) | Verification Boolean (105 B) | Reduction |
|---|---|---|---|
| 100,000 | 4.99 GB | 10.5 MB | 475x |
| 1,000,000 | 49.9 GB | 105 MB | 475x |
| 10,000,000 | 499 GB | 1.05 GB | 475x |
At 10 million entries, full-signature caching requires 499 GB -- roughly half a terabyte of RAM. This is not feasible on any single-server deployment. Verification-boolean caching requires 1.05 GB -- trivially feasible on any modern server.
For SLH-DSA-128f (17,088-byte signatures), the numbers are less extreme but still compelling. Full-signature caching: 17,160 bytes per entry, 17.2 GB per million entries. Verification-boolean caching: 105 bytes per entry, 105 MB per million entries. The reduction is 163x.
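The per-entry arithmetic behind these numbers is simple enough to sketch. The 72-byte figure is this post's assumed per-entry cache overhead:

```rust
const ENTRY_OVERHEAD: usize = 72; // assumed per-entry cache overhead, as used throughout this post

// Total bytes one cache entry occupies: stored value plus overhead.
fn entry_bytes(value_bytes: usize) -> usize {
    value_bytes + ENTRY_OVERHEAD
}

// Total cache memory for a given entry count.
fn cache_total_bytes(entries: usize, value_bytes: usize) -> usize {
    entries * entry_bytes(value_bytes)
}
```

entry_bytes(49_856) gives 49,928 and entry_bytes(33) gives 105, so the full-signature-to-boolean ratio for SLH-DSA-256f is 475x, and 1 million boolean entries total 105 MB.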
The Fingerprint for SLH-DSA Verification Caching
The computation fingerprint for a cached SLH-DSA verification result is a SHA3-256 hash over the signature bytes, the public key, and the message hash. Because SLH-DSA public keys are tiny (32-64 bytes), the fingerprint computation is dominated by hashing the signature bytes.
fingerprint = SHA3-256(
slh_dsa_signature_bytes || // 17,088 - 49,856 bytes
public_key || // 32 - 64 bytes
sha3_256(message) // 32 bytes (message pre-hashed)
)
// Fingerprint computation time (SHA3-256 at ~1 GB/s):
//   SLH-DSA-128f (17 KB sig): ~17 us
//   SLH-DSA-256f (49 KB sig): ~50 us
// Cache lookup: 31 ns
// Total cache hit cost: ~17-50 us, dominated by the fingerprint
For SLH-DSA-256f, the fingerprint computation takes approximately 50 microseconds, because SHA3-256 must process the full 49,856-byte signature; next to that, the 31-nanosecond cache lookup is negligible, so a cache hit costs roughly what the fingerprint costs. That is still one to two orders of magnitude cheaper than the millisecond-scale full verification, so the cache pays for itself on every hit. For SLH-DSA-128f, the fingerprint computation is approximately 17 microseconds, with a similar margin over full verification.
Note that the fingerprint for SLH-DSA is more expensive to compute than for FALCON-512 (where the 690-byte signature hashes in under a microsecond) or ML-DSA-65 (where the 3,309-byte signature hashes in approximately 3 microseconds). This is an inherent consequence of the large signature size. Even though you are not caching the full signature, you must still hash it to compute the fingerprint. The hashing cost is small compared to the full verification cost, but for SLH-DSA it is what dominates the cost of a cache hit.
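As a layout sketch, the fingerprint is a single hash over the three fields. The example below uses std's DefaultHasher purely so it runs without external crates; it is a 64-bit non-cryptographic stand-in, and a real deployment would use SHA3-256 (e.g. the RustCrypto sha3 crate) to get collision resistance:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// Stand-in fingerprint: hash(signature || public_key || message_hash).
// DefaultHasher is NOT collision-resistant; it only illustrates the layout.
fn fingerprint(sig: &[u8], pk: &[u8], msg_hash: &[u8; 32]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(sig);      // dominant cost: 7,856-49,856 signature bytes
    h.write(pk);       // 32-64 bytes
    h.write(msg_hash); // 32-byte pre-hashed message
    h.finish()
}
```

The fingerprint is deterministic for identical inputs and changes when any field changes, which is all the cache key requires (beyond collision resistance, supplied by the real hash).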
When Full-Signature Caching Is Necessary
There are scenarios where you must cache the full SLH-DSA signature, not just the verification result. These scenarios are rare but important.
Attestation chains. When SLH-DSA signatures are part of an attestation chain that must be forwarded to downstream verifiers, each node in the chain needs the full signature bytes, not just a boolean. The H33-74 attestation pipeline handles this by compressing the multi-family attestation into a 74-byte proof, but the underlying SLH-DSA signature must be available for any verifier that wants to independently verify the attestation from scratch.
Archival storage with verification replay. Archival systems that store signed documents must store the full signature so that future verifiers can re-verify the document without relying on a cache. In this case, the "cache" is really persistent storage, and the 49 KB per signature is the cost of archival. SLH-DSA is often chosen for archival specifically because its hash-based security is the most conservative long-term bet.
Certificate transparency logs. Transparency logs that include SLH-DSA-signed certificates must store the full certificate including the signature. A certificate with an SLH-DSA-256f signature is approximately 50-52 KB (49,856 bytes of signature plus 1-2 KB of certificate metadata). At 100 million certificates, this is roughly 5 TB of storage. Compression offers little relief: SLH-DSA signatures consist almost entirely of hash outputs, which are statistically indistinguishable from random bytes and barely compress, so the storage cost must simply be budgeted.
In all these scenarios, the caching strategy is different from verification-boolean caching. You cache the full signature on disk or in a high-capacity memory tier, and you cache the verification boolean in the in-process L1 cache for fast verification. The two caches serve different purposes: the full-signature cache enables retransmission and archival, while the verification-boolean cache enables fast re-verification without re-parsing the signature and re-running the full verification pipeline.
Redis at 49 KB: The 1.4ms Problem
Redis latency scales with value size because the server must serialize the value into the RESP protocol, transmit it over TCP, and deserialize it on the client. For small values (under 1 KB), the latency is dominated by the network round-trip and is approximately 100-150 microseconds. For large values, the serialization and transmission time becomes significant.
| Value Size | Redis GET Latency | In-Process Lookup | Ratio |
|---|---|---|---|
| 33 B (verification boolean) | 110 us | 31 ns | 3,548x |
| 7,856 B (SLH-DSA-128s) | 420 us | 31 ns | 13,548x |
| 17,088 B (SLH-DSA-128f) | 680 us | 31 ns | 21,935x |
| 29,792 B (SLH-DSA-256s) | 1,050 us | 31 ns | 33,871x |
| 35,664 B (SLH-DSA-192f) | 1,200 us | 31 ns | 38,710x |
| 49,856 B (SLH-DSA-256f) | 1,400 us | 31 ns | 45,161x |
At 49,856 bytes, a Redis GET takes 1.4 milliseconds -- 1,400 microseconds to retrieve a value that can be looked up in 31 nanoseconds from in-process memory. The ratio is 45,161x. Using Redis to cache full SLH-DSA-256f signatures barely beats having no cache at all: the 1.4-millisecond retrieval is the same order of magnitude as the millisecond-scale verification it is supposed to replace, so the external round-trip consumes most of the savings.
This is the fundamental problem with external caching for large post-quantum values. The cache is supposed to be dramatically faster than the computation it replaces. When the cached value is 49 KB, the network transmission cost alone is comparable to the computation cost, and Redis degenerates into a marginal optimization at best -- while consuming 49 KB of cache memory per entry.
In-process caching does not have this problem. The DashMap lookup returns a reference to the value in shared memory. There is no serialization, no network transmission, no TCP overhead. The 31-nanosecond cost is the time to hash the key, traverse the hash map buckets, and return a pointer. The value size does not affect the lookup latency because no data is copied or transmitted. A 33-byte verification boolean and a 49,856-byte full signature have the same lookup latency: 31 nanoseconds.
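A minimal illustration of the size-independence claim, using std's HashMap as a stand-in for DashMap: get hands back a borrowed reference, so no value bytes are copied on lookup.

```rust
use std::collections::HashMap;

// Lookup returns a reference to the stored value; the value itself is never
// copied, so a 33-byte record and a 49,856-byte record cost the same to find.
fn lookup_len(cache: &HashMap<[u8; 32], Vec<u8>>, key: &[u8; 32]) -> Option<usize> {
    cache.get(key).map(|v| v.len()) // borrow only, no clone of the Vec
}
```

The same holds for DashMap's get, which returns a guarded reference into the shard rather than a copy of the value.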
SLH-DSA in the Three-Family Stack
In a three-family post-quantum deployment, SLH-DSA serves as the conservative anchor. Its role is not high-frequency signing (that is FALCON's role) or compliance-critical medium-frequency signing (that is ML-DSA's role). Its role is maximum-assurance, low-frequency signing where the security guarantee must survive the longest timescale.
Typical SLH-DSA operations in a production stack include root certificate signing (once per year or per multi-year rotation), long-lived attestations (once per deployment, valid for months), policy signatures (once per policy update), and fallback authentication (only used if lattice-based schemes are compromised). These operations produce a small number of SLH-DSA signatures (hundreds to thousands per year, not millions per second). The caching requirement is correspondingly modest: a few thousand verification-boolean entries at 105 bytes each is well under 1 MB.
The three families together create a security model based on three independent hardness assumptions. An attacker must break NTRU lattices (FALCON), module-LWE lattices (ML-DSA), and hash preimage resistance (SLH-DSA) simultaneously. These are three independent mathematical bets. If any one assumption holds, the system remains secure. SLH-DSA provides the most conservative bet: hash functions have no known quantum vulnerability beyond Grover's quadratic speedup, which is fully addressed by the parameter sizing.
SLH-DSA Signing Is Slow
SLH-DSA signing is orders of magnitude slower than FALCON or ML-DSA. SLH-DSA-256f takes approximately 130 milliseconds per signature. SLH-DSA-256s takes approximately 2,800 milliseconds (nearly 3 seconds). Compare this to FALCON-512 at 0.5 milliseconds or ML-DSA-65 at 0.8 milliseconds. This signing cost makes SLH-DSA inappropriate for high-frequency operations. It is a low-frequency, high-assurance scheme. If you find yourself needing to sign more than 10 SLH-DSA operations per second, you should reconsider your architecture: the frequent operations should use FALCON or ML-DSA, and SLH-DSA should be reserved for rare, high-value operations.
The Verification-Boolean Cache Architecture
The recommended caching architecture for SLH-DSA uses verification booleans exclusively. Full signatures are stored in a separate persistence layer (disk, object storage, or a specialized archive) but are never cached in the hot-path L1 cache.
// SLH-DSA verification cache: 105 bytes/entry
// 1M entries = 105 MB, 10M entries = 1.05 GB
fn verify_slh_dsa_cached(
sig: &[u8], // 17,088 - 49,856 bytes
pk: &[u8], // 32 - 64 bytes
msg: &[u8], // variable
) -> bool {
    // Step 1: Compute fingerprint (~17-50 us depending on sig size)
    let fp = sha3_256_concat(sig, pk, &sha3_256(msg));
    // Step 2: Cache lookup (31 ns)
    if let Some(result) = SLH_DSA_CACHE.get(&fp) {
        return result; // Total: ~17-50 us, fingerprint-dominated
    }
    // Step 3: Full SLH-DSA verification (milliseconds)
let result = slh_dsa_verify(pk, msg, sig);
// Step 4: Cache the result
SLH_DSA_CACHE.insert(fp, result);
result
}
// For full-signature retrieval (archival, forwarding):
fn get_full_signature(attestation_id: &[u8; 32]) -> Option<Vec<u8>> {
// NOT in L1 cache -- read from persistent storage
SIGNATURE_STORE.get(attestation_id)
}
The separation between the verification cache (in-process, 105 bytes per entry, 31ns lookup) and the signature store (persistent, 17-50 KB per entry, millisecond access) is the key architectural decision. The verification cache handles the hot path: "is this signature valid?" The signature store handles the cold path: "give me the full signature bytes for retransmission or archival." These two use cases have completely different access patterns, size requirements, and latency expectations, and they should be served by different storage tiers.
Memory Comparison Across All SLH-DSA Variants
The following table shows the cache memory for all six SLH-DSA parameter sets, comparing full-signature caching versus verification-boolean caching at 1 million entries.
| SLH-DSA Variant | Signature Size | Full Sig Cache (1M) | Verify Boolean (1M) | Reduction |
|---|---|---|---|---|
| SLH-DSA-128f | 17,088 B | 17.2 GB | 105 MB | 163x |
| SLH-DSA-128s | 7,856 B | 7.93 GB | 105 MB | 75x |
| SLH-DSA-192f | 35,664 B | 35.7 GB | 105 MB | 340x |
| SLH-DSA-192s | 16,224 B | 16.3 GB | 105 MB | 155x |
| SLH-DSA-256f | 49,856 B | 49.9 GB | 105 MB | 475x |
| SLH-DSA-256s | 29,792 B | 29.9 GB | 105 MB | 285x |
The verification-boolean column is constant at 105 MB because the boolean size (33 bytes data + 72 bytes overhead = 105 bytes) does not depend on the signature size. This is the key insight: when you decouple the cached representation from the signature representation, the cache size becomes independent of the cryptographic parameter choices. You can switch from SLH-DSA-128f to SLH-DSA-256f (tripling the signature size) without any change to the cache memory requirement.
This decoupling also means you can use the same cache infrastructure for all three PQ signature families. FALCON-512 verification booleans, ML-DSA-65 verification booleans, and SLH-DSA-256f verification booleans all have the same 105-byte per-entry cost. The cache does not need to know which signature scheme was used. It stores a fingerprint (32 bytes) and a result (1 byte), and it returns the result when queried with the same fingerprint. The cryptographic diversity is in the verification function, not in the cache.
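A sketch of that family-agnostic cache, using std's HashMap (an in-process deployment would use a concurrent map such as DashMap, as elsewhere in this post):

```rust
use std::collections::HashMap;

// One verification-boolean cache for all signature families: each entry is a
// 32-byte fingerprint and a 1-byte result, regardless of which scheme
// produced the underlying signature.
struct VerifyCache {
    entries: HashMap<[u8; 32], bool>,
}

impl VerifyCache {
    fn new() -> Self {
        VerifyCache { entries: HashMap::new() }
    }
    fn get(&self, fp: &[u8; 32]) -> Option<bool> {
        self.entries.get(fp).copied()
    }
    fn put(&mut self, fp: [u8; 32], valid: bool) {
        self.entries.insert(fp, valid);
    }
}
```

The cache never inspects the signature scheme: FALCON, ML-DSA, and SLH-DSA results are indistinguishable once reduced to a fingerprint and a boolean.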
The Bottom Line
SLH-DSA signatures are the largest in post-quantum cryptography: 17,088 bytes at Level 1, 35,664 bytes at Level 3, and 49,856 bytes at Level 5. Caching full signatures is impractical at scale -- 1 million SLH-DSA-256f signatures require 49.9 GB. The solution is to cache verification booleans instead: 33 bytes per entry (a 1,511x reduction for SLH-DSA-256f), 105 MB per million entries, 31 nanoseconds per lookup. Redis takes 1.4 milliseconds for a 49 KB GET -- some 45,000x slower than the in-process lookup. In-process caching is not optional for SLH-DSA. It is the only viable strategy. SLH-DSA exists as the conservative anchor in a three-family post-quantum stack, and its caching strategy reflects its role: low-frequency, high-assurance, verification-boolean-only.
Cache SLH-DSA verification at 31 nanoseconds. 1,511x smaller than caching full signatures.
brew install cachee