
How We Made Session Validation 30 Nanoseconds

Every authenticated request to a production API has to answer the same question: is this session still valid? For most services, that question costs 15 to 40 milliseconds of round-trip latency per request because the answer lives in an auth database. We wired Cachee's L0 hot tier in front of Auth1's session validation endpoint and turned that 15-40 millisecond lookup into a 30 nanosecond atomic read on cache hit, while keeping every Auth1 semantic intact — revocation, refresh, logout, sliding expiry. This post is how we built it, what we measured, and the production code that's running in front of every Auth1 tenant today.

The mechanism is simple: hash the JWT, look it up in Cachee, return the cached session on hit, fall through to Auth1 on miss, populate the cache with the remaining session TTL. What's not simple is getting every detail right — the key format, the serialization, the TTL tuning, the eviction policy, the invalidation-on-logout path, and the failure mode that you don't cache. This is the kind of integration that looks like ten lines of middleware code until you actually ship it, and then you spend a week discovering all the edge cases.

The 59x math that started the project

Auth1 runs as a shared service behind the full Cachee-H33-RevMine-Mirror1-L100-BabyZilla tenant set. At steady state across all tenants, Auth1 was handling around ten thousand session validations per second against its backing database. Each validation took 15 to 40 milliseconds depending on query cache state, connection pool contention, and cross-AZ hop variance. The database was doing fine on load, but every front-end API server was paying a latency tax on every authenticated request — and latency taxes compound when your users care about end-to-end response time.

The math that sold the project was one line on a whiteboard: at 10,000 requests per second with 60-second session tokens and ~170 new unique sessions arriving per second, a validation cache with TTL matching the token expiry would handle ~170 Auth1 calls per second instead of 10,000. That's a 59x reduction in Auth1 load, with zero change to application semantics and zero compromise on security. The reason it works: the same user makes many requests per session, and the same session can be validated once and reused until it expires.
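The whiteboard model is worth writing down, since the same arithmetic predicts the miss rate for any workload (the request rate and session-arrival rate below are the figures from above):

```rust
// Whiteboard model: with a TTL equal to the token's remaining lifetime,
// each new session costs exactly one Auth1 validation (its first miss);
// every later request for the same token is served from the cache.
fn predicted_load(requests_per_sec: f64, new_sessions_per_sec: f64) -> (f64, f64) {
    let auth1_calls_per_sec = new_sessions_per_sec; // one miss per new token
    let reduction = requests_per_sec / auth1_calls_per_sec;
    let hit_rate = 1.0 - auth1_calls_per_sec / requests_per_sec;
    (reduction, hit_rate)
}

fn main() {
    let (reduction, hit_rate) = predicted_load(10_000.0, 170.0);
    println!("Auth1 reduction: {:.0}x", reduction);           // ~59x
    println!("predicted hit rate: {:.1}%", hit_rate * 100.0); // ~98.3%
}
```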

Before we started, we wrote down the explicit success criteria so we wouldn't spend weeks chasing tail optimizations that didn't matter. Three numbers, all measurable from production traffic:

  1. A steady-state cache hit rate above 99% on session validations.
  2. P99 latency on the cached-hit path under one microsecond.
  3. At least the whiteboard's 59x reduction in traffic to Auth1.

All three targets are now in steady-state production. This post walks through how each one was achieved and what trade-offs we made along the way.

Why Cachee in front of Auth1 instead of another approach

Three alternatives were on the whiteboard. Each was rejected for a specific reason worth documenting.

Option A: make Auth1 faster. This sounds obvious and is always wrong. You can shave two or three milliseconds off a database-backed validation, maybe five if you add a Redis cluster in front of the auth service, but you don't get from 15 milliseconds to 30 nanoseconds with database engineering. The physics of network round-trips stops you at around 100 microseconds best case, still more than three thousand times slower than a 30 nanosecond in-process read. This is the same lesson Cachee started with: if you can avoid the network, the network is always the expensive part.

Option B: put Redis in front of Auth1. Closer but still wrong. ElastiCache adds 300-400 microseconds per round trip in the best case, and an application server talking to it still pays connection pool checkout, serialization, kernel network stack traversal, and response parse time. A cached Redis read is about ten times faster than a database validation, which is nice, but it's still four orders of magnitude slower than an in-process memory load. And the operational burden is not zero: you're maintaining a cluster, watching its failover behavior, and paying for it.

Option C: put Cachee in front of Auth1. Cachee is an in-process Rust cache engine that we built for exactly this class of problem. The L0 hot tier is a sharded RwLock<HashMap<u64, entry>> that reads at roughly 30 nanoseconds on Graviton4 hardware under normal load. There's no network, no serialization, no connection pool. The validated session lives in the same address space as the API handler, and accessing it is a single atomic load followed by a refcount bump on the bytes::Bytes clone. This is the option we picked.

The asymmetry that makes this work: session validation is read-heavy, not write-heavy. A token is issued once and read hundreds or thousands of times over its lifetime. Cachee is built for exactly this shape — the cost of populating the cache is amortized over every subsequent read, and the populate cost only happens once per token per session, not once per request.

The architecture

The integration lives entirely in an axum middleware between the client request and the application handler. Every request that arrives with a Bearer token gets routed through one function that either returns a cached session immediately or falls through to Auth1 for a fresh validation. On successful validation, the session is stored in Cachee with a TTL matching Auth1's reported expires_in — so the cache entry expires exactly when the token would have become invalid anyway.

The complete flow, top to bottom:

  1. Extract the JWT. Parse the Authorization: Bearer <jwt> header. Reject early on any malformed or missing header — no cache lookup required.
  2. Derive the cache key. SHA-256 the JWT and take the first 16 bytes. Prefix with session: for namespace discipline. This is the key we'll use for all Cachee operations on this token.
  3. Check Cachee. Call cache.get(&key). On a hit, deserialize the cached session bytes via bincode, attach the session to the request as an axum extension, and call next.run(request). Done. Total added latency: roughly 80 nanoseconds end-to-end.
  4. On miss, validate with Auth1. POST to /api/auth/session with the bearer token. Parse the response. On a valid session, continue to step 5. On an invalid session (401 or malformed response), reject the request immediately and do not cache the failure.
  5. Populate Cachee. Compute the TTL from Auth1's expires_in field. Serialize the session with bincode. Call cache.set(key, bytes, Some(ttl)). The entry lives in Cachee until the token expires or someone explicitly logs out.
  6. Attach to request and continue. Same as step 3.

That's the happy path. The revocation and logout paths are just as important, and we'll get to them in a moment. First, let's look at the middleware code.

The middleware

Here's the core function that runs on every request. This is running in production today, in a service behind Cachee's own front door.

use axum::{
    extract::{Request, State},
    http::StatusCode,
    middleware::Next,
    response::Response,
};
use bytes::Bytes;
use sha2::{Digest, Sha256};
use std::sync::Arc;

#[derive(Clone)]
pub struct AuthState {
    pub cache: Arc<cachee_core::CacheeEngine>,
    pub auth1: Arc<Auth1Client>,
}

fn token_key(jwt: &str) -> String {
    let hash = Sha256::digest(jwt.as_bytes());
    format!("session:{}", hex::encode(&hash[..16]))
}

pub async fn auth1_middleware(
    State(state): State<AuthState>,
    mut request: Request,
    next: Next,
) -> Result<Response, StatusCode> {
    let jwt = request
        .headers()
        .get("authorization")
        .and_then(|h| h.to_str().ok())
        .and_then(|s| s.strip_prefix("Bearer "))
        .ok_or(StatusCode::UNAUTHORIZED)?;

    let cache_key = token_key(jwt);

    // Fast path: Cachee lookup
    if let Some((bytes, _level)) = state.cache.get(&cache_key) {
        if let Ok(session) = bincode::deserialize::<Auth1Session>(&bytes) {
            request.extensions_mut().insert(session);
            return Ok(next.run(request).await);
        }
    }

    // Slow path: hit Auth1 for validation
    let session = state
        .auth1
        .verify_session(jwt)
        .await
        .map_err(|_| StatusCode::UNAUTHORIZED)?;

    // Populate Cachee with the remaining TTL from Auth1. Skip caching if
    // serialization fails; caching empty bytes would force a deserialize
    // failure (and an Auth1 round trip) on every subsequent hit.
    let ttl_seconds = session.expires_in.unwrap_or(900) as u32;
    if let Ok(bytes) = bincode::serialize(&session) {
        state.cache.set(cache_key, Bytes::from(bytes), Some(ttl_seconds));
    }

    request.extensions_mut().insert(session);
    Ok(next.run(request).await)
}

This is the whole integration on the request side. About forty lines of Rust. Every detail in here was chosen for a reason, and the reasons are more interesting than the code itself.

Why SHA-256 the JWT before using it as a key

The first thing that jumps out if you're reading this code carefully is that we don't use the JWT directly as the cache key. Instead, we hash it with SHA-256 and take the first 16 bytes. Three reasons, in order of how much they cost us.

Log safety. Cache keys show up in debug logs. If someone enables verbose tracing in production for a hot bug, the logs are going to spill cache keys into your log aggregator. A hashed key is useless to anyone who happens to read those logs — you can't reverse it into a usable credential. Raw JWTs in logs are a credential leak waiting to happen. This alone is reason enough.

Length bound. JWTs can be 500 bytes to 4 KB depending on the claims they carry. A Cachee key needs to live in the DashMap's shard as a String, and a 4 KB key multiplies your cache's memory footprint by the total active session count. A 32-byte SHA-256 digest (or 16 bytes if we truncate, which is safe for this use case) bounds the memory regardless of what Auth1 decides to put in the token.

Consistent key shape. Downstream tooling — Prometheus exporters, debug dumps, operational scripts — doesn't need to handle variable-length keys. Every Cachee entry for this cache has the same key format. That's valuable when you're writing the tenth operational script and you don't want to rediscover the key format every time.
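The memory argument is easy to quantify. A quick sketch (one million sessions is an illustrative number, not a Cachee limit):

```rust
// Compare cache-key memory for raw JWT keys vs hashed keys at a given
// session count. 4 KB is the worst-case JWT size quoted above; the hashed
// key is "session:" (8 bytes) plus 16 digest bytes hex-encoded (32 chars).
fn key_memory_mb(sessions: u64, key_bytes: u64) -> u64 {
    sessions * key_bytes / (1u64 << 20)
}

fn main() {
    let sessions = 1_000_000;
    let raw = key_memory_mb(sessions, 4_096); // raw JWT as the key
    let hashed = key_memory_mb(sessions, "session:".len() as u64 + 32);
    println!("raw JWT keys: {} MB, hashed keys: {} MB", raw, hashed);
    // raw JWT keys: 3906 MB, hashed keys: 38 MB
}
```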

The first 16 bytes of a SHA-256 hash is a 128-bit value, which puts the birthday bound for an accidental collision at around 2^64 keys and preimage resistance at 2^128: more than enough for a cache key that's never treated as authoritative. The actual session validation is still done by Auth1 on miss, so even a theoretical collision can't forge a valid session.
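For concreteness, here's the birthday bound on those truncated keys (ten million concurrent sessions is an arbitrary, generous assumption):

```rust
// Among n uniformly random b-bit keys, the probability of any collision
// is approximately n^2 / 2^(b+1) (the birthday bound).
fn collision_probability(n: f64, bits: i32) -> f64 {
    n * n / (2.0 * 2.0_f64.powi(bits))
}

fn main() {
    let p = collision_probability(1.0e7, 128); // ten million cached sessions
    println!("p(any collision) ~ {:e}", p);    // ~1.5e-25
}
```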

Why bincode serialization and not JSON

The cached session is stored as bincode-serialized bytes, not JSON. This is a performance choice that matters more than it looks like it should.

JSON serialization of a typical session object takes 800-2000 nanoseconds on the kind of CPU a production API server runs on. Deserialization is similar. For a request that's serving a 100-microsecond handler, spending 4 microseconds round-tripping the session through JSON is noise, but spending that on every cached lookup defeats the point of having a cache in the first place. Bincode does the same thing in 50-200 nanoseconds because it's a binary format that mirrors the in-memory layout of the struct with minimal framing. We pay a one-time bincode cost on cache population and a small bincode cost on every hit, and the numbers add up in our favor.

The other reason is that bincode gives us deterministic byte-level comparison. If two cached sessions serialize to the same bytes, they're semantically identical. This matters for cache sharding, invalidation, and debug tooling that hashes cache entries for comparison. JSON doesn't give you that — key ordering and whitespace can vary, and you end up with the same logical object serializing to different bytes in different contexts.

Why the TTL matches Auth1's expires_in

When we populate the cache on validation success, we use the session's expires_in field as the TTL, not a hardcoded number. This is the single most important line in the integration.

Consider two alternatives. If we use a short hardcoded TTL like 60 seconds, we'd re-validate every session every minute regardless of how long it had left to live. That's fine for security (we'd always pick up revocations within a minute) but it gives us a lower hit rate and more Auth1 load than necessary. If we use a long hardcoded TTL like 24 hours, we'd cache sessions past their natural expiry, which means a request could succeed against Cachee with a token that Auth1 would reject. That's a security regression.

The right answer is: the cache entry lives exactly as long as the token is valid, and not one millisecond longer. When Auth1 tells us a token has 847 seconds left, we set TTL to 847 seconds. When that entry expires in Cachee, the next request for the same token falls through to Auth1, which will either issue a fresh session (refresh flow) or return 401 (expired). Either way, the behavior matches what Auth1 alone would have done.

The subtle benefit is that tokens approaching expiry automatically spend less time in Cachee, which means a burst of near-expired traffic naturally flows back to Auth1 without any explicit handling. You get graceful degradation for free.
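The middleware above falls back to 900 seconds when expires_in is absent. A slightly more defensive variant, sketched here with illustrative bounds (not Auth1 or Cachee constants), refuses to cache unknown or already-expired tokens and caps suspicious values:

```rust
/// Turn Auth1's reported expires_in into a cache TTL, or None for
/// "don't cache". The cap is illustrative, not an Auth1 or Cachee constant.
fn session_ttl(expires_in: Option<u64>) -> Option<u32> {
    const MAX_TTL: u64 = 86_400; // cap against a misconfigured expires_in
    match expires_in {
        None | Some(0) => None, // unknown or already expired: skip the cache
        Some(secs) => Some(secs.min(MAX_TTL) as u32),
    }
}

fn main() {
    assert_eq!(session_ttl(Some(847)), Some(847)); // normal: exact remaining lifetime
    assert_eq!(session_ttl(Some(0)), None);        // expired: never cache
    assert_eq!(session_ttl(None), None);           // unknown: fall through every time
    assert_eq!(session_ttl(Some(10_000_000)), Some(86_400)); // capped
    println!("ok");
}
```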

The logout path

Caching creates one explicit risk: if a user logs out, the Cachee entry still says "this token is valid" until its TTL naturally expires. The window between logout and TTL expiry is a valid-looking stale-hit window, bounded by the token's remaining lifetime. For most products that's a few minutes and the user has already left the building, but if you're serving a bank or a medical records system, a few minutes of post-logout validity is not OK.

The fix is a two-step logout handler: revoke at Auth1, then purge the local Cachee entry.

pub async fn logout_handler(
    State(state): State<AuthState>,
    headers: axum::http::HeaderMap,
) -> Result<Response, StatusCode> {
    let jwt = headers
        .get("authorization")
        .and_then(|h| h.to_str().ok())
        .and_then(|s| s.strip_prefix("Bearer "))
        .ok_or(StatusCode::UNAUTHORIZED)?;

    // 1. Revoke at Auth1
    state.auth1.logout(jwt).await.ok();

    // 2. Purge the Cachee entry so the next request can't hit a stale cache
    let key = token_key(jwt);
    state.cache.delete(&key);

    Ok(Response::new("logged out".into()))
}

Both steps matter. The Auth1 revocation is the authoritative source-of-truth change — it's what guarantees that if the token is used against any other Cachee instance (or against Auth1 directly) it will be rejected. The Cachee delete is the local optimization — it guarantees that this instance of the service cannot return a stale validation for a token the user just revoked.

What about distributed invalidation across multiple API server instances? Each instance has its own Cachee. If you have ten API servers and a user logs out from one of them, the other nine still have cached sessions. This is where the TTL-bounded nature of Cachee entries matters: the worst-case post-logout validity window is the token's remaining TTL. For typical 15-minute session tokens, the worst case is about 15 minutes across instances that didn't see the logout directly. For most products, that's fine. For products where it isn't fine, the standard fix is a small distributed invalidation channel — publish logout events to a shared stream, subscribe to it on each Cachee-fronted instance, and call cache.delete in response. We don't do this in the base integration because the complexity is only justified when the TTL-bounded window isn't acceptable, and in most production apps it is.
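If you do need cross-instance invalidation, the shape of the subscriber is small. A self-contained sketch, with an in-process channel standing in for the shared stream (Redis pub/sub, Kafka, whatever you already run) and a plain HashMap standing in for the Cachee engine:

```rust
use std::collections::HashMap;
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

// A plain map stands in for the per-instance Cachee engine.
type LocalCache = Arc<Mutex<HashMap<String, Vec<u8>>>>;

// Each API instance runs one listener against the shared logout stream
// (an mpsc channel stands in here for Redis pub/sub, Kafka, etc.) and
// purges its local entry for every revoked session key it sees.
fn spawn_invalidation_listener(
    cache: LocalCache,
    events: mpsc::Receiver<String>,
) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        for key in events {
            cache.lock().unwrap().remove(&key);
        }
    })
}

fn main() {
    let cache: LocalCache = Arc::new(Mutex::new(HashMap::new()));
    cache
        .lock()
        .unwrap()
        .insert("session:abc123".to_string(), vec![1, 2, 3]);

    let (tx, rx) = mpsc::channel();
    let listener = spawn_invalidation_listener(Arc::clone(&cache), rx);

    // Some other instance handled the logout and published the key.
    tx.send("session:abc123".to_string()).unwrap();
    drop(tx); // close the stream so the listener thread exits
    listener.join().unwrap();

    assert!(cache.lock().unwrap().get("session:abc123").is_none());
    println!("purged");
}
```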

Why we don't cache failures

When Auth1 returns a 401 on validation, the middleware propagates the 401 to the client and does not cache the failure. This is deliberate and it's worth unpacking why.

The naive optimization is: if Auth1 says this token is invalid, cache that negative result so subsequent requests for the same bad token don't have to hit Auth1. Cheap to implement, helpful against brute-force token flooding. Sounds obviously good.

It isn't. Here's the attack: an adversary floods your service with randomized bogus tokens. Each one misses Cachee, hits Auth1, returns 401, and then (in the naive negative-cache model) gets stored as a negative entry in Cachee. The adversary can burn through your cache's capacity in seconds, evicting legitimate active sessions to make room for garbage negative entries. Your hit rate collapses, your Auth1 load spikes, and your legitimate users experience the 15-40 millisecond tax on every request while your cache is polluted.

The fix is what Cachee already does for you: the bloom filter. When a token is seen for the first time, Cachee's bloom filter returns possibly present or definitely not present. For a genuinely novel bogus token, the bloom filter typically says "definitely not present" and the request falls through to Auth1 at essentially zero cost. For a repeat lookup of the same bogus token within a small window, the bloom filter says "possibly present" and the main cache returns a miss (because we didn't cache the failure), so we still fall through to Auth1 — but the Auth1 failure path is fast, because the auth service has its own caches. The end result: flood attacks burn attacker bandwidth, not your cache capacity. No explicit negative caching required.
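Cachee's bloom filter is internal, but the behavior described above is easy to picture with a toy version. This is an illustrative minimal filter, not Cachee's implementation:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Toy bloom filter for illustration -- not Cachee's implementation.
struct Bloom {
    bits: Vec<u64>,
    k: u64, // number of hash probes per key
}

impl Bloom {
    fn new(num_bits: usize, k: u64) -> Self {
        Bloom { bits: vec![0; (num_bits + 63) / 64], k }
    }

    fn probe(&self, key: &str, i: u64) -> usize {
        let mut h = DefaultHasher::new();
        (key, i).hash(&mut h); // seed each probe with its index
        (h.finish() as usize) % (self.bits.len() * 64)
    }

    fn insert(&mut self, key: &str) {
        for i in 0..self.k {
            let idx = self.probe(key, i);
            self.bits[idx / 64] |= 1u64 << (idx % 64);
        }
    }

    /// true  = "possibly present" (fall through to the main cache)
    /// false = "definitely not present" (novel token, straight to Auth1)
    fn contains(&self, key: &str) -> bool {
        (0..self.k).all(|i| {
            let idx = self.probe(key, i);
            self.bits[idx / 64] >> (idx % 64) & 1 == 1
        })
    }
}

fn main() {
    let mut seen = Bloom::new(1 << 16, 4);
    seen.insert("session:deadbeef");
    assert!(seen.contains("session:deadbeef"));   // repeat token: possibly present
    assert!(!seen.contains("session:0badtoken")); // novel bogus token: definitely not
    println!("ok");
}
```

A real filter also needs periodic reset or rotation so it doesn't saturate; that detail is omitted here.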

The numbers

Here's what we measured in production on the Auth1 tenant that handles the bulk of Cachee.ai's own user traffic. All numbers are from a 15-minute observation window during normal weekday traffic — nothing cherry-picked.

Metric                                  Value
--------------------------------------  -------
Requests/sec (peak during window)       8,420
Unique active sessions                  1,203
Cachee hit rate (steady state)          99.41%
L0 hit rate (fraction of all hits)      91.7%
L1 hit rate (fraction of all hits)      8.3%
P50 Cachee hit latency                  ~30 ns
P99 Cachee hit latency                  ~80 ns
Auth1 calls/sec (miss path)             ~50
Auth1 reduction vs no-cache             168x
Cachee admission rejection rate         0.03%

The 168x reduction is higher than the 59x back-of-envelope estimate because real traffic has more per-session repeat requests than the whiteboard model assumed. The more a user interacts with your API in a session, the higher the reduction — and modern apps with live dashboards, websocket pings, and polling endpoints drive this number up aggressively.

The number that matters for operations: cachee_admission_rejection_rate at 0.03%. Cachee's CacheeLFU admission sketch is doing essentially nothing under this workload — the cache is large enough that no eviction pressure exists for active sessions. When it does fire, it's protecting hot sessions from cold one-visit tokens, which is exactly what a bloom-filtered front-end should do. We monitor this metric in Grafana because an anomalous spike is a leading indicator of attack traffic or cache sizing problems.

Prometheus metrics

Cachee exposes a built-in prometheus_metrics() method that emits standard exposition format in O(1) time — every field is a single atomic load, no iteration of the cache, no lock acquisition. We wire it into an axum route and let the metrics pipeline scrape it every 15 seconds:

use axum::{routing::get, Router};

async fn metrics_handler(State(state): State<AuthState>) -> String {
    state.cache.prometheus_metrics()
}

let app = Router::new()
    .route("/metrics", get(metrics_handler))
    .with_state(state);

The resulting metrics let us build an Auth1 + Cachee dashboard with the numbers that matter for an integration like this: overall hit rate, L0 share of hits, admission rejection rate, memory footprint, and the count of expired entries being reaped proactively by the background sweeper. Anomalies in any of these metrics are early warnings for real operational problems.

The background sweeper

Cachee's base expiry model is lazy: expired entries are removed when somebody tries to read them. That's fine for hot keys that get read constantly, but it leaves long-lived cold entries taking up space until eviction pressure fires. For a session cache, that means tokens from users who logged out an hour ago but never made another request are still taking up cache slots.

The fix is Cachee's BackgroundSweeper helper. It spawns a dedicated std::thread that calls sweep_expired(sample_size) on a timer. Drop the handle to stop the thread cleanly. The engine itself doesn't own any threads; the helper is the explicit opt-in.

use cachee_core::BackgroundSweeper;
use std::time::Duration;

let _sweeper = BackgroundSweeper::start(
    Arc::clone(&state.cache),
    Duration::from_secs(30),
    1000, // Sweep up to 1000 entries per pass
);

With a 30-second sweep interval and a 1000-entry sample per pass, we reap expired sessions within about a minute of their TTL firing, without any lock contention with the hot path. The sweep runs in the background, scans a bounded sample of the cache, collects expired keys, and removes them after the iterator is dropped. Memory stays tight under low-traffic conditions, and the hot path pays zero cost for the sweeper's existence.

What this gets you in production

The integration ships as a single axum middleware (or an Express middleware if you're in JavaScript, or a request hook in whatever framework you use). Total net-new code in the application is maybe fifty lines. Total net-new infrastructure is zero — Cachee lives in-process, so there's no new service to deploy, no new cluster to maintain, no new cost line in your AWS bill.

What you get in exchange: a 99% hit rate on session validations, sub-microsecond p99 cached lookups, and a 50-170x reduction in traffic to your auth service. Your API response times get visibly faster for authenticated users. Your auth service runs cooler. Your observability dashboard has a new metric that nobody on your team ever has to explain because it's just the number showing how much Cachee is saving you every second.

This is the pattern we're running in production today on every Auth1 tenant. It's open-source, runs on any hardware that can compile Rust, and the benchmark numbers hold up on everything from an M4 Max laptop to bare-metal Graviton4. If you're running Auth1 and you haven't added Cachee in front yet, you're leaving the 99% hit rate on the table.
