Cache PDF Previews at 31 Nanoseconds
Every SaaS product that handles documents has the same problem. Users upload PDFs -- contracts, invoices, quarterly reports, compliance filings, tax forms, insurance claims -- and expect to see a preview instantly. Not in two seconds. Not after a loading spinner. Instantly. The preview needs to render before the user's finger lifts from the trackpad.
The standard architecture for this is server-side rendering. The user uploads a PDF. Your backend runs a render pipeline -- Poppler, pdf.js on Node, Apache PDFBox, or a commercial engine like PSPDFKit -- to produce either a rasterized image (PNG/JPEG thumbnail) or rendered HTML that mirrors the document's layout. That rendered output is the preview. It gets stored, and every subsequent viewer receives the cached preview instead of waiting for a fresh render.
The rendering itself is expensive. A 10-page contract takes 200-800ms to render to HTML depending on complexity, fonts, and embedded images. A rasterized PNG thumbnail of the first page takes 50-150ms. You cannot afford to render on every view. Caching the preview is not optional. It is the entire architecture.
The question is where you cache it, and what that costs you in latency.
The Preview Payload Problem
A rendered PDF preview is not a small value. A first-page HTML preview of a typical business document -- with layout divs, positioned text spans, embedded SVG paths for vector graphics, and base64-encoded images for logos and signatures -- runs 100 to 300 KB. A rasterized JPEG thumbnail at 1200px width runs 40-120 KB. A multi-page preview (first three pages rendered to HTML) can hit 500 KB to 1 MB.
These are not session tokens. These are not feature flags. These are large binary or text payloads that need to be served to every viewer of the document, often within milliseconds of a click.
When you store a 200 KB preview in Redis and fetch it with a GET, the latency is approximately 3.2 milliseconds within the same availability zone. That number comes from three components: the network round-trip (~0.3ms baseline), the serialization and TCP transfer of 200 KB of payload (~2.5ms), and the deserialization on the client side (~0.4ms). Cross-AZ, the same operation takes 8.5ms.
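That decomposition can be written as a tiny cost model. The constants below are the article's own estimates, not benchmarks, and the linear-transfer assumption is a simplification:

```python
# Back-of-envelope model of a same-AZ Redis GET for a large cached value.
def redis_get_latency_ms(payload_kb,
                         rtt_ms=0.3,                    # network round-trip baseline
                         transfer_ms_per_kb=2.5 / 200,  # TCP transfer of the payload
                         deser_ms=0.4):                 # client-side deserialization
    return rtt_ms + payload_kb * transfer_ms_per_kb + deser_ms

print(round(redis_get_latency_ms(200), 2))  # 3.2 (same-AZ, 200 KB preview)
```

The payload-proportional middle term is why preview size, not request count alone, drives the Redis cost.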
For a single viewer loading a single document, 3.2ms sounds acceptable. It is not. Here is why.
The Concurrency Math That Breaks Redis
Document management SaaS products do not serve one viewer at a time. A contract management platform has dozens or hundreds of users reviewing documents simultaneously. An invoice processing system has automated workflows pulling previews in parallel. A compliance platform has auditors opening 20 documents in rapid succession.
Consider a document management SaaS serving 100 preview requests per second at peak -- roughly 100 concurrent viewers each opening a document. Each request triggers a Redis GET of approximately 200 KB. At 3.2ms per GET, the Redis instance spends 320 milliseconds of cumulative processing time every second just serving PDF previews. That is 32% of a single Redis thread's capacity consumed by one feature.
Scale to 500 concurrent viewers -- a modest number for a B2B SaaS with 10,000 active users -- and Redis is spending 1.6 seconds per second on preview GETs alone. A single-threaded Redis instance cannot keep up. You either shard (adding operational complexity and cross-shard latency) or you accept that preview latency will degrade under load.
Redis Concurrency Wall for Document Previews
At 200 KB per preview and 3.2ms per GET, a single Redis instance saturates at approximately 312 preview GETs per second before latency starts climbing. At 500 concurrent viewers with 2 preview loads per minute each, you need ~17 GETs per second -- well within the limit. But add thumbnail grids (10 documents visible per page, each needing a preview), and the same 500 users generate 170 GETs per second. Add multi-page previews at 500 KB each, and you are at the wall.
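The saturation arithmetic from this section, reproduced as a quick script (the figures are the article's own estimates):

```python
GET_MS = 3.2                         # same-AZ Redis GET for a 200 KB preview
saturation = 1000 / GET_MS           # single-threaded ceiling: ~312 GETs/s
per_user_gets = round(500 * 2 / 60)  # 500 viewers, 2 loads/min -> ~17 GETs/s
grid_gets = per_user_gets * 10       # 10-thumbnail grids -> 170 GETs/s
print(int(saturation), per_user_gets, grid_gets)  # 312 17 170
```

At 170 GETs per second you are already past half the ceiling with a single feature, before any other Redis traffic is counted.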
In-process caching eliminates this bottleneck entirely. A Cachee L0 GET is 31 nanoseconds regardless of value size. The same 500 concurrent viewers generating 170 GETs per second consume 5.27 microseconds of cumulative processing time per second. Not milliseconds. Microseconds. The cache layer is invisible in your performance profile.
The Document Preview Architecture
Let us walk through the complete lifecycle of a PDF preview in a well-architected document management system.
Step 1: Upload and Render
A user uploads a PDF. Your backend receives the file, stores it in object storage (S3, R2, GCS), and enqueues a render job. The render worker pulls the PDF, runs it through your rendering engine, and produces the preview -- a 200 KB HTML fragment representing the first page, or a 100 KB JPEG thumbnail, or both.
The render takes 200-800ms depending on document complexity. This is a one-time cost. Every subsequent viewer sees the cached result.
Step 2: Cache the Preview
The rendered preview is written to Cachee with a key derived from the document ID and render parameters:
# Cache the rendered HTML preview
cachee set "doc:preview:{doc_id}:html:page1" "$(cat rendered_preview.html)"
# Cache the JPEG thumbnail
cachee set "doc:preview:{doc_id}:thumb:1200w" "$(base64 thumbnail.jpg)"
# Both are served at 31ns on subsequent reads
cachee get "doc:preview:{doc_id}:html:page1"
The preview is now in L0 -- the hot tier. The first viewer after upload sees the freshly rendered preview at 31ns. The hundredth viewer sees it at 31ns. The ten-thousandth viewer sees it at 31ns. There is no degradation.
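Because Cachee speaks RESP, the same write can come from application code through any Redis client instead of the CLI. A minimal sketch with helper names that are illustrative, not part of any API; the client would be redis-py pointed at Cachee on localhost:6380 as described later in this post:

```python
def preview_key(doc_id, kind="html", page=1):
    """Key scheme from this post: doc:preview:{doc_id}:{kind}:page{n}."""
    return f"doc:preview:{doc_id}:{kind}:page{page}"

def cache_preview(client, doc_id, html, ttl_days=7):
    # `client` is any RESP client -- e.g. redis.Redis(host="localhost",
    # port=6380) from redis-py, connected to Cachee instead of Redis.
    client.set(preview_key(doc_id), html, ex=ttl_days * 86400)

print(preview_key("inv-2026-0418"))  # doc:preview:inv-2026-0418:html:page1
```

The render worker calls `cache_preview` once after rendering; every viewer after that reads the cached entry.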
Step 3: Subsequent Viewers Hit Cache
When a user opens a document, the application checks Cachee for the preview. The lookup is a hash computation on the key string and a pointer dereference into the DashMap. The 200 KB HTML fragment is returned as a reference -- no copy, no serialization, no network transfer. The application streams it directly to the HTTP response.
For comparison, the Redis path would be: application serializes the key, sends it over TCP to Redis, Redis looks up the key, Redis serializes 200 KB of value data, Redis sends it over TCP, application receives and deserializes 200 KB. Six steps, 3.2ms. The in-process path is: hash the key, dereference the pointer. Two steps, 31ns. That is a 103,000x difference.
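A minimal sketch of that read path, with a plain dict standing in for the in-process hot tier and a hypothetical `render_first_page` callback as the miss fallback:

```python
_l0 = {}  # stand-in for the in-process hot tier (hash map lookup, no network)

def get_preview(doc_id, render_first_page):
    key = f"doc:preview:{doc_id}:html:page1"
    html = _l0.get(key)                    # hash the key, dereference -- a hit
    if html is None:                       # miss: pay the expensive path once
        html = render_first_page(doc_id)   # 200-800 ms render (or a Redis GET)
        _l0[key] = html                    # warm the hot tier for later viewers
    return html

print(get_preview("c-1", lambda _id: "<div>rendered</div>"))  # <div>rendered</div>
```

The first call pays the render; every later call for the same document is a dictionary hit.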
Step 4: Eviction and Tiering
Not every document is active. A contract signed six months ago rarely gets viewed. A quarterly report from two years ago is accessed once a year during audits. Keeping every preview in L0 is wasteful. This is where CacheeLFU admission control operates.
CacheeLFU tracks access frequency per key using a count-min sketch -- 512 KiB of constant memory regardless of how many documents exist in the system. Documents accessed frequently (active contracts, recent invoices, current reports) stay in L0 at 31ns. Documents accessed rarely get demoted to L1 at 59ns, then evict entirely. When an evicted document is accessed again, Cachee fetches from L2 (your Redis fallback or re-renders from the original PDF) and auto-promotes back to L0 if access frequency warrants it.
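The frequency-tracking idea is easy to see in miniature. Below is a toy count-min sketch -- far smaller than the 512 KiB production sketch described above, and not Cachee's implementation -- showing how per-key access estimates fit in constant memory:

```python
import hashlib

class CountMinSketch:
    """Toy count-min sketch: fixed memory, approximate per-key counts.
    Estimates may overcount (hash collisions) but never undercount."""
    def __init__(self, width=1024, depth=4):
        self.width, self.depth = width, depth
        self.rows = [[0] * width for _ in range(depth)]

    def _idx(self, key, row):
        h = hashlib.blake2b(f"{row}:{key}".encode(), digest_size=8).digest()
        return int.from_bytes(h, "big") % self.width

    def add(self, key):
        for r in range(self.depth):
            self.rows[r][self._idx(key, r)] += 1

    def estimate(self, key):
        return min(self.rows[r][self._idx(key, r)] for r in range(self.depth))

cms = CountMinSketch()
for _ in range(50):
    cms.add("doc:preview:hot-contract")
cms.add("doc:preview:cold-report")
print(cms.estimate("doc:preview:hot-contract"))  # at least 50
```

An eviction policy compares estimates like these to decide which previews deserve L0 and which can fall to L1 or out entirely.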
Working Set Math for Document Management
Let us size a real deployment. A mid-market document management SaaS with:
- 50,000 active documents in the system
- 200 KB average preview size (HTML render of first page)
- Total working set: 10 GB (50,000 x 200 KB)
10 GB is too much for L0 on most application servers. You do not need all of it in L0. Document access follows a power law: 20% of documents receive 80% of views. The hot set is approximately 10,000 documents.
| Tier | Documents | Memory | Read Latency | Content |
|---|---|---|---|---|
| L0 (hot) | 10,000 (20%) | 2 GB | 31ns | Active contracts, recent invoices, current reports |
| L1 (warm) | 15,000 (30%) | 3 GB | 59ns | Documents accessed in last 30 days |
| L2 (Redis fallback) | 25,000 (50%) | 5 GB (Redis) | 3.2ms | Older documents, re-rendered on access |
With this tiering, 80% of preview requests hit L0 at 31ns. Another 15% hit L1 at 59ns. Only 5% fall through to Redis at 3.2ms. The average latency across all preview requests is: (0.80 x 31ns) + (0.15 x 59ns) + (0.05 x 3,200,000ns) = 24.8ns + 8.85ns + 160,000ns = 160,034ns, or about 0.16ms. Compare to a pure Redis architecture where every request takes 3.2ms. That is a 20x improvement in average preview latency with only 2 GB of application memory dedicated to L0.
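The weighted-average calculation above, reproduced as a script:

```python
# (share of requests, read latency in ns) per tier, from the table above
tiers = [
    (0.80, 31),          # L0 hot
    (0.15, 59),          # L1 warm
    (0.05, 3_200_000),   # L2 Redis fallback
]
avg_ns = sum(share * lat for share, lat in tiers)
print(round(avg_ns))                # 160034 ns, about 0.16 ms
print(round(3_200_000 / avg_ns))    # ~20x better than all-Redis
```

Note that the average is completely dominated by the 5% of requests that fall through to Redis; shrinking that tail is worth more than shaving either in-process tier.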
Multi-Page Previews and Thumbnail Grids
Document viewers often render more than one page. A user scrolling through a 30-page contract needs previews for each page as they scroll. A document list page shows thumbnail grids with 10-20 document thumbnails visible simultaneously.
Scroll-Ahead Rendering
When a user opens a document, you render the first three pages immediately and cache them. As the user scrolls, you render pages 4-6 in the background and cache them. Each page is a separate cache entry: doc:preview:{doc_id}:html:page{n}. The user scrolls into a pre-cached page at 31ns. They never wait for a render after the initial load.
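A sketch of the scroll-ahead policy, with a dict standing in for the cache and a hypothetical `render_page` call that would run on a background worker:

```python
cache = {}

def page_key(doc_id, n):
    # Per-page key scheme from this post: doc:preview:{doc_id}:html:page{n}
    return f"doc:preview:{doc_id}:html:page{n}"

def on_scroll_to(doc_id, page, total_pages, render_page, lookahead=3):
    # Pre-render the next few pages so the user scrolls into cache hits.
    for n in range(page + 1, min(page + lookahead, total_pages) + 1):
        key = page_key(doc_id, n)
        if key not in cache:                      # skip already-rendered pages
            cache[key] = render_page(doc_id, n)   # background render, cached

on_scroll_to("contract-42", 1, 30, lambda d, n: f"<div>page {n}</div>")
print(sorted(cache))  # pages 2-4 pre-rendered
```

The lookahead of three pages is an assumption to tune against your render cost and typical scroll speed.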
Thumbnail Grids
A document list page showing 20 thumbnails requires 20 cache reads. With Redis, that is 20 x 1.5ms (thumbnails are smaller, ~80 KB each) = 30ms of cache latency just to render the list. With in-process Cachee, it is 20 x 31ns = 620 nanoseconds. The list renders before the browser finishes parsing the HTML response.
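The grid load can be expressed as one batched read. A sketch with an illustrative `load_grid` helper; `client` is any RESP client (redis-py against Cachee, or Redis itself):

```python
def thumb_keys(doc_ids, width=400):
    # Thumbnail key scheme from this post: doc:preview:{doc_id}:thumb:{width}w
    return [f"doc:preview:{d}:thumb:{width}w" for d in doc_ids]

def load_grid(client, doc_ids):
    # One MGET for the whole grid instead of 20 sequential round-trips;
    # in-process, each of the 20 lookups is a hash plus a dereference.
    return client.mget(thumb_keys(doc_ids))

print(thumb_keys(["contract-001"])[0])  # doc:preview:contract-001:thumb:400w
```

Batching matters most on the network path; in-process, the 20 reads are cheap either way.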
The Post-Quantum Angle: Signed Document Previews
Document management systems increasingly need to prove preview integrity. When a user views a contract preview, the system must guarantee that the preview matches the original PDF -- that no intermediary has altered the rendered output. This is especially critical in legal, healthcare, and financial services where document tampering has regulatory consequences.
The standard approach is to sign the preview with a digital signature. The server renders the PDF, hashes the rendered output, signs the hash, and stores the signature alongside the preview. When a viewer requests the preview, the application returns both the preview and the signature. The viewer's client verifies the signature before displaying the preview.
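The flow can be sketched with the signing primitive left pluggable, since the scheme (Ed25519 today, ML-DSA or SLH-DSA for post-quantum) is a deployment choice; here `sign` and `verify` are whatever your crypto library provides, and the helper names are illustrative:

```python
import hashlib

def preview_digest(html: bytes) -> bytes:
    # The signature covers a hash of the rendered output, not the raw PDF.
    return hashlib.sha256(html).digest()

def sign_and_package(html: bytes, sign):
    # `sign` is your signer: Ed25519 yields 64 B, ML-DSA-65 3,309 B,
    # SLH-DSA-128f 17,088 B. The package is what gets cached and served.
    return {"preview": html, "sig": sign(preview_digest(html))}

def verify_package(pkg, verify):
    # `verify` returns False (or raises) if the preview was tampered with.
    return verify(pkg["sig"], preview_digest(pkg["preview"]))
```

The server calls `sign_and_package` once at render time; each viewer's client calls `verify_package` before display.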
With classical signatures (Ed25519, ECDSA), the signature is 64 bytes. Negligible overhead on a 200 KB preview. With post-quantum signatures, the story changes dramatically.
ML-DSA-65 (Dilithium) produces a 3,309-byte signature. That adds 1.6% to the cached payload. Manageable. SLH-DSA-SHA2-128f (SPHINCS+) produces a 17,088-byte signature. That adds 8.5% to the cached payload -- the preview entry grows from 200 KB to 217 KB. At 10,000 documents in L0, the signature overhead alone consumes 163 MB of additional memory.
| Signature Scheme | Sig Size | Preview + Sig | L0 Overhead (10K docs) | Redis GET Latency |
|---|---|---|---|---|
| Ed25519 (classical) | 64 B | 200.06 KB | 0.6 MB | 3.20ms |
| ML-DSA-65 | 3,309 B | 203.2 KB | 31.5 MB | 3.25ms |
| SLH-DSA-128f | 17,088 B | 216.7 KB | 163 MB | 3.47ms |
| ML-DSA-65 + SLH-DSA-128f (hybrid) | 20,397 B | 219.9 KB | 194 MB | 3.52ms |
The in-process latency remains unchanged. 31 nanoseconds whether the cached entry is 200 KB (unsigned), 203 KB (ML-DSA signed), or 220 KB (hybrid signed). The value size does not enter the latency equation. Redis, however, now takes 3.52ms per GET for the hybrid-signed preview -- an additional 320 microseconds per request compared to unsigned, purely from transferring the extra signature bytes over TCP.
Cache the Signature with the Preview
In-process caching makes it practical to store preview + signature as a single cache entry. No separate lookup for the signature. No second Redis GET. One 31ns read returns the preview and its PQ integrity proof together. At 194 MB of total signature overhead for 10K documents, the memory cost is trivial on a 16 GB application server. The alternative -- separate signature lookups or on-demand re-signing -- adds latency and complexity that in-process caching eliminates entirely.
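One way to hold preview and signature as a single value is a length-prefixed layout; the framing here is an assumption for illustration, not a Cachee feature:

```python
def pack_entry(preview: bytes, sig: bytes) -> bytes:
    # Single cache value: 4-byte big-endian signature length, then the
    # signature bytes, then the rendered preview.
    return len(sig).to_bytes(4, "big") + sig + preview

def unpack_entry(entry: bytes):
    n = int.from_bytes(entry[:4], "big")
    return entry[4:4 + n], entry[4 + n:]   # (signature, preview)

sig, html = unpack_entry(pack_entry(b"<div>page 1</div>", b"\x00" * 3309))
print(len(sig))  # 3309 -- an ML-DSA-65-sized signature rides along in one read
```

One read, one value, and the integrity proof arrives with the preview it protects.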
Implementation: Tiered Document Preview Cache
Here is the architecture for a production document preview cache using Cachee with tiered eviction.
# Initialize Cachee with 2GB L0 budget for previews
cachee init --l0-memory 2G
# On document upload: render and cache
# (called from your render worker after PDF processing)
cachee set "doc:preview:inv-2026-0418:html:page1" \
"$(cat /tmp/renders/inv-2026-0418-page1.html)" \
--ttl 7d
# On document view: read from cache
# Returns in 31ns if in L0, auto-promotes from L2 if evicted
PREVIEW=$(cachee get "doc:preview:inv-2026-0418:html:page1")
# Cache with PQ signature (ML-DSA-65)
cachee set "doc:preview:inv-2026-0418:sig:mldsa65" \
"$(cat /tmp/renders/inv-2026-0418-page1.sig)"
# Bulk thumbnail load for document list (20 docs)
cachee mget \
"doc:preview:contract-001:thumb:400w" \
"doc:preview:contract-002:thumb:400w" \
  "doc:preview:contract-003:thumb:400w"
# ... remaining 17 keys elided; returns all 20 thumbnails in ~620ns total
# Check preview cache stats
cachee status --prefix "doc:preview:"
Cachee speaks RESP. Any Redis client library in any language -- redis-py, ioredis, Lettuce, go-redis -- connects to localhost:6380 and works without code changes. Your existing Redis-based preview cache migrates by changing the connection string. The application code stays the same. The latency drops from 3.2ms to 31ns.
When to Re-Render vs. Cache Forever
PDF previews are immutable for a given document version. If the source PDF does not change, the preview does not change. Set a long TTL (7-30 days) and let CacheeLFU handle eviction based on access frequency, not time. When a user uploads a new version of a document, invalidate the old preview key and render the new version.
The exception is dynamic previews that include viewer-specific annotations, highlights, or redactions. These are per-viewer and should be cached with a composite key: doc:preview:{doc_id}:v{version}:user:{user_id}:page{n}. The working set grows linearly with active viewers, but CacheeLFU automatically evicts low-frequency viewer-specific previews while keeping the base preview hot.
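A small builder for the composite key scheme; the function name is illustrative:

```python
def versioned_key(doc_id, version, page, user_id=None):
    # Composite scheme from this post:
    # doc:preview:{doc_id}:v{version}[:user:{user_id}]:page{n}
    # The base preview omits the user segment; annotated/redacted
    # per-viewer renders include it.
    base = f"doc:preview:{doc_id}:v{version}"
    if user_id is not None:
        base += f":user:{user_id}"
    return f"{base}:page{page}"

print(versioned_key("c-9", 2, 1))                 # base preview, version 2
print(versioned_key("c-9", 2, 1, user_id="u-7"))  # viewer-specific render
```

Because the version lives in the key, uploading a new document version naturally misses every old entry; stale previews then age out through frequency-based eviction rather than requiring a sweep of explicit deletes.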
The Bottom Line
Document preview caching is a large-value problem masquerading as a simple caching problem. At 200 KB per preview, the value size exceeds the payload threshold where network caches perform well. Redis adds 3.2ms per GET -- more than most database queries. For a document-centric SaaS, this latency appears on every page load, every scroll event, every thumbnail grid render.
In-process caching at 31ns eliminates the problem. CacheeLFU tiered eviction keeps 2 GB of hot previews in L0 and lets the long tail fall through to Redis or re-render. Post-quantum document signatures add 17-20 KB per preview, which changes nothing for in-process latency but adds measurable overhead to every Redis GET.
Your users open documents and see previews before they finish clicking. That is the experience you are building. A 3.2ms cache hit does not deliver it. A 31ns cache hit does.
Serve PDF previews at 31ns. Cut your preview cache-hit latency by a factor of 103,000.