A shared cache fronting six independent SaaS tenants has one failure mode nobody wants to write about: the noisy neighbor. One tenant with a pathological workload — a flood of unique keys, a cold-start burst, an accidental cache-buster loop — poisons the admission counters for everyone else on the same instance. Their hot data gets evicted, their hit rates collapse, and the cache's eviction policy starts making bad decisions about keys it has no business comparing to. We fixed this with a single data-structural change: partition the admission sketch by tenant, give each tenant their own disjoint slice of the counter array, and let the memory footprint stay flat regardless of how many tenants you add. The result is that one 512 KiB count-min sketch now provides admission control for six (or sixty, or six hundred) independent tenants without any cross-pollination. This post is the math.
The problem is old and the fix is new. Shared cache instances have been a thing forever — every CDN, every SaaS backend, every white-label infrastructure product runs shared caches. The noisy-neighbor pathology is well-documented and usually handled by either (a) giving each tenant their own cache instance (expensive, doesn't scale), (b) applying per-tenant quota limits that only partially constrain the issue, or (c) accepting that a bad tenant degrades everyone and building alerting to catch it quickly. None of these are satisfying. We wanted a primitive that made the problem go away at the data structure level — no quotas, no per-tenant instances, no alerts — and we ended up building it inside Cachee's admission sketch.
The noisy neighbor problem, precisely
Let's describe the pathology exactly, because the shape of the fix depends on understanding exactly what's broken. A count-min sketch is a fixed-size table of counters used to approximate frequency counts for an unbounded key universe. In the CacheeLFU admission policy Cachee uses, the sketch answers one question on every eviction: how often has this candidate been accessed compared to the victim I'd have to evict to make room? If the candidate is hotter than the victim, admit. If colder, reject.
For a single-tenant cache, this works beautifully. The sketch accumulates access frequency for every key in the workload, and eviction decisions are informed by real per-key history. Cachee's default sketch has 65,536 cells per row, 4 rows, and stores 2-byte atomic counters — 512 KiB total. The collision probability is low enough that even large keyspaces get reasonable accuracy, and the fixed memory footprint is the whole point of a sketch.
Now put six tenants on the same cache. Every tenant's keys hash into the same 65,536-cell table. Tenant A writes a billion unique keys in a streaming workload. Every one of those keys increments cells in the shared table. Tenant B, meanwhile, has ten hot keys that are their entire working set. Before tenant A shows up, tenant B's hot keys have high frequency counts in their cells and win every admission comparison. After tenant A's flood, tenant B's hot-key cells now share values with whatever tenant A keys happened to collide with them — and because tenant A's flood added hundreds of thousands of increments across the entire table, tenant B's once-distinctive hot-key counts are now buried in the noise.
Concretely: tenant B's hot key used to have a cell value of 500 (reflecting 500 reads). After tenant A's flood, the same cell has a value of 8,423 (500 from tenant B plus 7,923 from tenant A keys that collided into the same cell). When tenant B tries to admit a new candidate, the sketch says the victim (hot key) has frequency 8,423. New candidates rarely beat that. The cache stops admitting legitimate tenant B traffic.
This is the noisy neighbor, and it's worse than the simpler per-key eviction pollution you might expect — because the admission sketch is what decides whether a candidate even gets a chance to be cached, not just which victim gets dropped. A poisoned admission sketch can turn a perfectly healthy cache into a stone wall for legitimate tenants without evicting a single key.
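The pathology is easy to reproduce in miniature. The toy below is a single-row, 16-cell counter table with a stand-in hash — nothing like Cachee's real sketch, just enough to show the mechanism: tenant B's hot-key cell reads exactly 500 before tenant A's flood and a strictly larger number after it, even though tenant B never touched the key again.

```rust
// Toy illustration of admission-counter pollution in a SHARED table.
// One row, 16 cells, u32 counters -- not Cachee's real structure.
const CELLS: usize = 16;

fn cell(key: u64) -> usize {
    // Cheap multiplicative hash folded into [0, CELLS); a stand-in
    // for a real hash function.
    (key.wrapping_mul(0x9E37_79B9_7F4A_7C15) >> 60) as usize
}

fn flood_pollutes_shared_counters() -> (u32, u32) {
    let mut counters = [0u32; CELLS];

    // Tenant B: one hot key, 500 reads.
    let hot_key = 42u64;
    for _ in 0..500 {
        counters[cell(hot_key)] += 1;
    }
    let before = counters[cell(hot_key)];

    // Tenant A: a flood of 10,000 unique keys into the SAME table.
    for k in 1_000_000..1_010_000u64 {
        counters[cell(k)] += 1;
    }
    let after = counters[cell(hot_key)];
    (before, after)
}
```

In a shared table the flood inflates tenant B's cell through collisions alone; in the partitioned sketch described below, the flood lands in tenant A's slice and `after` stays equal to `before`.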
Why we didn't take the obvious fixes
The classical fixes for multi-tenant cache pathology are all available, and we rejected all of them for specific reasons before we started designing the new primitive.
One cache instance per tenant. This is the simplest thing that works. Every tenant gets their own Cachee, their own memory budget, their own admission sketch. The noisy neighbor can't cross instance boundaries because there are no shared instances. It works perfectly, and it's how most multi-tenant SaaS deployments start. The reason we rejected it: memory doesn't compose. Six tenants means six 512 KiB sketches means 3 MiB of admission state. Sixty tenants means 30 MiB. Six hundred tenants means 300 MiB. White-label deployments and shared-infrastructure products routinely serve hundreds or thousands of tenants, and the "one cache per tenant" model makes admission-state memory grow linearly with tenant count precisely when you want it flat.
Per-tenant quota limits. You keep one shared cache but enforce that no single tenant can consume more than some fraction of the cells, eviction budget, or admission attempts per second. This partially works — it caps the worst-case damage from a noisy neighbor — but it doesn't prevent the pathology. Tenant A still pollutes tenant B's admission counters, just more slowly. And quotas are usually enforced at the application layer, which means the damage happens inside the cache before the quota kicks in. Plus, quota tuning is notoriously annoying: set it too low and legitimate tenants get throttled; set it too high and noisy tenants still cause outages.
Separate caches for separate tenant tiers. A compromise: group tenants by expected workload (hot, warm, cold) and give each tier its own cache. It reduces the blast radius but doesn't eliminate it — a noisy neighbor within the hot-tier group still poisons everyone else in that tier. And you've introduced a classification problem: which tier does a new tenant belong to? The classification is usually wrong for edge cases, and edge cases are exactly when the noisy neighbor pathology bites.
None of these fixes address the root cause. The root cause is that the sketch's cell space is shared, and shared resources get polluted by the noisiest user. The fix has to be at the data structure level: make the cells unshared, without multiplying memory.
Width partitioning: the primitive
Here's the fix, in one sentence: divide the sketch's total width across tenants so that each tenant owns a disjoint slice of the cell array, and hash keys only into the slice that belongs to the key's tenant.
The math is trivial and the implementation is a one-line change to the cell indexing function, but the consequences are worth walking through carefully. Start with the single-tenant version:
```rust
// Single-tenant sketch: hash fold into [0, width) per row
fn cell_index(row: usize, hash: u64, width_mask: usize) -> usize {
    row * (width_mask + 1) + ((hash as usize) & width_mask)
}
```
Now the multi-tenant version:
```rust
// Multi-tenant sketch: fold tenant id into [0, num_tenants),
// then fold hash into [0, per_tenant_width) *inside* that tenant's slice
fn cell_index(
    tenant: u32,
    row: usize,
    hash: u64,
    per_tenant_width: usize,
    total_width: usize,
    num_tenants: usize,
) -> usize {
    let tenant_slot = (tenant as usize) & (num_tenants - 1);
    let tenant_offset = tenant_slot * per_tenant_width;
    let cell_within = (hash as usize) & (per_tenant_width - 1);
    row * total_width + tenant_offset + cell_within
}
```
The difference is one extra mask, multiply, and add in the indexing path. The total allocation is unchanged — we still have total_width × depth × 2 bytes of counter storage. What changes is that tenant A's hash can now only ever land in tenant A's slice, and tenant B's hash only in tenant B's slice. There is no physical cell that both tenants can touch.

This gives us cross-tenant isolation at the data structure level, essentially for free: the extra arithmetic in the cell-index computation costs a handful of nanoseconds per hash operation, unmeasurable against the background noise of the rest of the cache hot path.
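The disjointness claim is mechanically checkable. The sketch below restates the `cell_index` function from above so the example is self-contained, then sweeps every in-slice hash value for two adjacent tenants and confirms they index into non-overlapping ranges:

```rust
// Same indexing scheme as the multi-tenant cell_index above,
// restated here so the check is self-contained.
fn cell_index(
    tenant: u32,
    row: usize,
    hash: u64,
    per_tenant_width: usize,
    total_width: usize,
    num_tenants: usize,
) -> usize {
    let tenant_slot = (tenant as usize) & (num_tenants - 1);
    let tenant_offset = tenant_slot * per_tenant_width;
    let cell_within = (hash as usize) & (per_tenant_width - 1);
    row * total_width + tenant_offset + cell_within
}

fn slices_are_disjoint() -> bool {
    let (total, tenants) = (65_536, 64);
    let per_tenant = total / tenants; // 1,024 cells per tenant per row
    // Sweep every hash value that matters for in-slice placement:
    // tenant 0 must stay in [0, 1024), tenant 1 in [1024, 2048).
    (0..per_tenant as u64).all(|h| {
        let a = cell_index(0, 0, h, per_tenant, total, tenants);
        let b = cell_index(1, 0, h, per_tenant, total, tenants);
        a < per_tenant && (per_tenant..2 * per_tenant).contains(&b)
    })
}
```

Because the tenant offset is a multiple of the per-tenant width and the in-slice fold is masked to that width, no hash value can push an index across a slice boundary.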
The memory invariance property
Here's the counterintuitive part. The memory footprint of the multi-tenant sketch is identical to a single-tenant sketch of the same total width. One 512 KiB sketch split across 64 tenants takes the same memory as one 512 KiB sketch serving a single tenant. More tenants doesn't mean more memory. This is the property that makes the primitive useful for shared-infrastructure products.
Compare to "one cache per tenant" math:
| Architecture | 1 tenant | 6 tenants | 64 tenants | 512 tenants |
|---|---|---|---|---|
| One cache per tenant | 512 KiB | 3.0 MiB | 32 MiB | 256 MiB |
| Width-partitioned sketch | 512 KiB | 512 KiB | 512 KiB | 512 KiB |

The partitioned sketch's memory footprint is a constant 512 KiB regardless of tenant count. The per-tenant width shrinks as tenants multiply — 65,536 cells per row for one tenant, 1,024 cells per row for 64 tenants, 128 cells per row for 512 tenants (tenant counts are powers of two, as the indexing scheme requires) — but the total memory stays flat. This is why the primitive is the right answer for high-density multi-tenant deployments.
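The table's arithmetic reduces to two one-line functions: total bytes depend only on width and depth, and tenants only divide the width. The constants below mirror this post's defaults (4 rows, 2-byte counters):

```rust
// Memory arithmetic for the width-partitioned sketch. Total bytes are
// a function of total width and depth only, never of tenant count.
const DEPTH: usize = 4;         // rows
const COUNTER_BYTES: usize = 2; // 2-byte atomic counters

fn sketch_bytes(total_width: usize) -> usize {
    total_width * DEPTH * COUNTER_BYTES
}

// Tenants split the width; they do not add to it.
fn per_tenant_width(total_width: usize, num_tenants: usize) -> usize {
    total_width / num_tenants
}
```

For the default 65,536-cell width, `sketch_bytes` returns 524,288 bytes (512 KiB) whether `per_tenant_width` is called with 1 tenant or 64.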
The accuracy trade-off
Memory invariance doesn't come free. The trade-off is per-tenant accuracy, and it's worth being honest about exactly what degrades and by how much.
The count-min sketch error bound is standard: with width w cells per row and depth d rows, the per-estimate error is bounded by approximately e / w times the total number of operations, with confidence roughly 1 - exp(-d). For Cachee's default 65,536-wide, 4-deep single-tenant sketch, that's an error bound of roughly 4 × 10^-5 per operation with about 98% confidence. For a workload with a million operations, the typical overestimate per key is on the order of 40 — meaning a key observed once might be reported as frequency 40 due to collision noise.
When we partition 65,536 cells across 64 tenants, each tenant gets 1,024 cells per row. The per-tenant error bound becomes approximately e / 1024, or about 2.6 × 10^-3 per operation — 64 times higher than the single-tenant case, exactly the width ratio. That sounds alarming until you remember that the per-tenant operation count is also divided: instead of one tenant doing a million operations against 65,536 cells, each of 64 tenants runs their own, smaller stream against their own 1,024 cells.
For a typical multi-tenant workload where tenants are roughly comparable in size, the practical accuracy per tenant stays in the useful range. A tenant with 10,000 operations gets a typical overestimate of 26 — higher than the single-tenant case at the same op count, but still accurate enough for admission decisions. The key insight is that count-min sketches degrade gracefully under narrower widths, not catastrophically. You get coarser counts, not random noise.
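The bound used throughout this section is the standard count-min guarantee: an overestimate of at most (e / w) · N with probability about 1 − exp(−d), where w is the width, d the depth, and N the tenant's operation count. A one-line helper makes the arithmetic reproducible (this is just the formula, not Cachee's code):

```rust
// Standard count-min overestimate bound: (e / width) * ops.
// Holds with probability about 1 - exp(-depth).
fn error_bound(width: usize, ops: u64) -> f64 {
    (std::f64::consts::E / width as f64) * ops as f64
}
```

Plugging in the post's numbers: `error_bound(65_536, 1_000_000)` is about 41, and `error_bound(1_024, 10_000)` is about 27, matching the single-tenant and per-tenant figures above.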
The break-even point where width-partitioning stops being worth it is when per-tenant width drops below about 16 cells per row. At that point, the collision rate becomes too high and admission decisions become essentially random. Cachee's multi-tenant sketch constructor refuses to build a sketch with fewer than 16 cells per tenant — it panics at construction time with a clear error message — because a broken sketch is worse than no sketch at all.
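A sketch of what that construction-time guard can look like — the function and constant names here are illustrative, not Cachee's actual API:

```rust
// Illustrative version of the constructor's partitioning check.
// Names are hypothetical; the real MultiTenantSketch may differ.
const MIN_PER_TENANT_WIDTH: usize = 16;

fn validate_partitioning(total_width: usize, num_tenants: usize) -> usize {
    // Both dimensions must be powers of two so bitmask folds work.
    assert!(
        total_width.is_power_of_two() && num_tenants.is_power_of_two(),
        "total width and tenant count must be powers of two"
    );
    let per_tenant = total_width / num_tenants;
    // Fail loudly at startup rather than silently degrade admission.
    assert!(
        per_tenant >= MIN_PER_TENANT_WIDTH,
        "per-tenant width {} is below the {}-cell floor; \
         increase total width or reduce tenant count",
        per_tenant, MIN_PER_TENANT_WIDTH
    );
    per_tenant
}
```

With the default 65,536-cell width, the guard accepts up to 4,096 tenants (16 cells each) and panics beyond that.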
Cross-tenant isolation: the proof
A design property is only real if you can test it. Here's the test we ship with the multi-tenant sketch — it's in src/admission/multi_tenant.rs tests, and it runs on every cargo test invocation:
```rust
#[test]
fn tenant_isolation_prevents_cross_pollution() {
    // Tenant 0 floods with 1000 hits on "hot". Tenant 1 should
    // NOT see any frequency for "hot" — their partitions are disjoint.
    let sketch = MultiTenantSketch::new(65536, 16);
    for _ in 0..1000 {
        sketch.record_for(0, "hot");
    }
    let tenant_0_freq = sketch.estimate_for(0, "hot");
    let tenant_1_freq = sketch.estimate_for(1, "hot");
    assert!(tenant_0_freq >= 1000,
        "tenant 0 should see at least 1000 hits, got {}", tenant_0_freq);
    assert_eq!(tenant_1_freq, 0,
        "tenant 1 must not see tenant 0's writes, got {}", tenant_1_freq);
}
```
This test is the isolation guarantee, turned into machine-checkable code. Tenant 0 records the key "hot" a thousand times. We then ask the sketch what frequency it observed for the same literal string key from tenant 1's perspective. The answer must be exactly zero. Not "low" — zero. If tenant 1 sees any frequency for a key that only tenant 0 wrote to, the partitioning is broken and the test fails. It passes deterministically on every run because the cell indexing function mathematically cannot produce an overlap — the two tenants operate on disjoint memory regions.
We also ship a test that verifies the memory invariance property:
```rust
#[test]
fn memory_footprint_independent_of_tenant_count() {
    let s1 = MultiTenantSketch::new(65536, 1);
    let s64 = MultiTenantSketch::new(65536, 64);
    assert_eq!(s1.total_cells(), s64.total_cells(),
        "total cells must match regardless of tenant count");
    let diff = s1.memory_bytes().abs_diff(s64.memory_bytes());
    assert!(diff < 64,
        "memory footprints should differ by at most struct alignment");
}
```
One sketch serves 1 tenant. Another serves 64 tenants. Both allocate the same 65,536 × 4 × 2 = 524,288 bytes of counter storage. The total cells match exactly. The only difference in memory footprint is a handful of bytes of struct metadata (a few extra usize fields tracking the partition parameters), which the test asserts is under 64 bytes. The memory invariance property holds at the byte level.
The Auth1 use case, concretely
This primitive was built for a specific production problem: caching session validations for six independent Auth1 tenants from a single shared Cachee instance. Auth1 is the authentication service that sits behind Cachee, H33, RevMine, Mirror1, L100, and BabyZilla. All six are independent SaaS products, all six use Auth1 for session tokens, and all six sit behind the same front-end infrastructure that validates those tokens on every authenticated request.
The question: can one Cachee instance cache session validations for all six tenants without any one tenant's traffic pattern polluting another tenant's admission decisions? Before the multi-tenant sketch, the answer was "yes, but carefully — you have to monitor for noisy-neighbor effects and potentially split the instance if a tenant misbehaves." After the multi-tenant sketch, the answer is "yes, unconditionally, by construction."
The production wire-up looks like this:
```rust
use cachee_core::admission::multi_tenant::MultiTenantSketch;
use std::sync::Arc;

// 65,536 total cells per row ÷ 64 tenant slots → 1,024 cells per tenant
// per row; 65,536 × 4 rows × 2 bytes = 512 KiB total, isolated per tenant.
let admission_sketch = Arc::new(MultiTenantSketch::new(65_536, 64));

// Stable tenant IDs for the Auth1 tenant set.
fn tenant_id_for(api_key: &str) -> u32 {
    match api_key {
        k if k.starts_with("auth1_pk_cachee_") => 0,
        k if k.starts_with("auth1_pk_h33_") => 1,
        k if k.starts_with("auth1_pk_revmine_") => 2,
        k if k.starts_with("auth1_pk_mirror1_") => 3,
        k if k.starts_with("auth1_pk_l100_") => 4,
        k if k.starts_with("auth1_pk_babyzilla_") => 5,
        _ => 63, // default bucket
    }
}

// In the session-validation middleware, record the access and read the
// tenant-scoped frequency for admission decisions:
let tenant = tenant_id_for(api_key);
admission_sketch.record_for(tenant, jwt_hash);
let freq = admission_sketch.estimate_for(tenant, jwt_hash);
```
Six real tenants plus a default bucket for anything unrecognized (to catch misconfigurations without crashing). Each tenant's session tokens get recorded into their own slice of the 65,536-cell width. A flood of token validations from Cachee's own traffic cannot affect H33's admission decisions. BabyZilla can onboard a million new users in an afternoon without displacing RevMine's hot-session state. The tenants share the same physical memory region and the same Cachee process, but their admission counters are guaranteed disjoint by the cell-indexing math.
Why partition by width and not by depth
If you've read count-min sketch literature, you might wonder why we partition along the width axis (cells per row) and not along the depth axis (number of rows). Partitioning by depth would give each tenant fewer hash rows but full-width cells, which is an arguably cleaner approach from an information-theoretic standpoint because it preserves the independence of the hash functions.
We chose width partitioning for three reasons, in order of importance.
Cache-line locality. The hot path of an admission sketch read or write touches four cells per operation (one per row). If those cells are far apart in memory, each one probably lives on a different cache line, which means four cache-line loads per operation. If the cells are within the same row's slice, they have better locality — adjacent cells in the same slice are more likely to fit on the same cache line. Width partitioning preserves this locality; depth partitioning would scatter a tenant's cells across rows.
Simpler indexing arithmetic. Width partitioning is a single multiply and add (row * total_width + tenant_offset + cell_within). Depth partitioning would require computing which subset of rows the tenant owns and dispatching hashes into those rows specifically, which is an extra branch on every operation. Branches are cheap but they're not free, and the cell-indexing function is on a path where every nanosecond is counted.
Power-of-two alignment. We require both total width and tenant count to be powers of two so we can use bitmask operations instead of modulo. Width partitioning keeps the per-tenant width as a power of two automatically (because total width and tenant count are both powers of two, their quotient is too). Depth partitioning would break this because dividing 4 rows by, say, 6 tenants is not a clean integer — you'd need overlap rules or tenant-to-row assignment tables, which is more state and more branches.
Width partitioning is mathematically equivalent to depth partitioning for the isolation property we care about, but it's computationally cheaper and simpler. The choice is the kind of detail that looks arbitrary until you profile both approaches and see the width-partitioned version hitting L1 cache more predictably.
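The power-of-two requirement buys exactly one thing: the modulo in every hash fold becomes a bitwise AND, which avoids an integer division on the hot path. A minimal illustration (not Cachee's code):

```rust
// For w = 2^k, `hash % w` and `hash & (w - 1)` are identical;
// the mask form avoids an integer division on the hot path.
fn fold_mod(hash: u64, width: u64) -> u64 {
    hash % width
}

fn fold_mask(hash: u64, width: u64) -> u64 {
    debug_assert!(width.is_power_of_two());
    hash & (width - 1)
}
```

And because a power of two divided by a power of two is itself a power of two, the per-tenant width inherits the property automatically, so the same mask trick works inside each slice.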
What breaks if you scale past per-tenant width of 16
Every architectural choice has a limit, and the width-partitioned sketch's limit is the minimum per-tenant width. We ship with a hard lower bound of 16 cells per row per tenant — any construction request that would produce a smaller partition panics at startup with an explicit error message. This is worth documenting because operators sometimes try to push the sketch past its sane configuration and it's better to panic loudly than to silently degrade.
The math behind the bound: with 16 cells per row and 4 rows, the sketch has 64 counters per tenant. A tenant running 1,000 operations puts an average of over 60 increments into each of its 16 cells per row (1,000 ÷ 16) — a collision density high enough that the count-min minimum-across-rows estimate is no longer reliably accurate, and per-tenant admission decisions start to lose their signal. At 8 cells per row, nearly every operation collides with another key and the sketch is effectively reporting noise. At 4 cells per row, there's no signal at all.
The 16-cell floor is our "this is when the primitive stops being useful" line, and we enforce it in the constructor so an accidental misconfiguration fails loudly instead of silently corrupting admission decisions. If you need more tenants than your total width supports at 16 cells per tenant, the answer is to increase total width (which costs memory proportionally) or to split your tenant base across multiple Cachee instances (which costs instance count proportionally). Both options are honest about the trade-off.
For the default 65,536-wide sketch, the hard ceiling is 4,096 tenants. At 4,096 tenants, each one gets 16 cells per row × 4 rows = 64 counters. This is almost certainly past the point where you'd want to be using a single shared sketch anyway — at that tenant density, you're probably running several sketches on several Cachee instances and load-balancing across them. But the number is there as a ceiling, and the primitive degrades gracefully until you hit it.
What this enables that wasn't possible before
The width-partitioned sketch unlocks a specific class of deployment: high-density shared infrastructure where many tenants share one Cachee instance without risking cross-pollination. This is the architecture behind white-label SaaS products, shared CDN edge caches, and multi-customer authentication services — exactly the category of product that wants to amortize infrastructure cost across a customer base while preserving per-customer isolation guarantees.
Before this primitive, the defensive choice was always "one cache per tenant" because nothing else gave you cross-tenant isolation at the admission layer. After this primitive, the defensive choice becomes "one shared sketch per instance, partitioned across tenants" — the same isolation guarantee, but with flat memory costs. A white-label product serving 200 customers can run one Cachee process with 200 slices of a 512 KiB sketch instead of 200 Cachee processes with 200 full sketches. The memory savings are linear in tenant count, which means they compound exactly where they matter: at scale.
The headline claim is 64 tenants in one 512 KiB sketch, and the test that proves it runs on every cargo test invocation of cachee-core. Both halves of the claim are load-bearing: the memory footprint is constant, and the isolation is machine-checkable. Put those together and you have a primitive that changes the economics of multi-tenant caching for the class of products where a shared instance was previously unsafe.
This is the kind of change that doesn't show up in a benchmark number — it shows up in the deployment architecture you can choose. Before the primitive, white-label products had to choose between operational complexity (one cache per tenant, linear ops cost) and correctness risk (one shared cache, noisy neighbor hazard). After the primitive, they don't have to choose. One shared cache with width-partitioned admission is both operationally simple and correctness-complete. That's what ships in Cachee v1.0.
Related Reading
- How We Made Session Validation 30 Nanoseconds: Cachee in Front of Auth1
- How Cachee Runs Admission Tracking for 10 Million Keys in 512 Kilobytes
- Cachee + Auth1 Integration Guide
64 tenants in 512 kilobytes. By construction.
Cachee's multi-tenant admission sketch gives you cross-tenant isolation at the data structure level, with flat memory costs regardless of how many customers you serve. Shared infrastructure, without the noisy neighbor.