How Predictive Caching Works | 31ns Hits, 99% Hit Rate

Architecture

Request Lifecycle: Before vs After

Watch how a data request travels through your stack. Every hop adds latency you are paying for. Then see what happens when Cachee intercepts the chain.

👤

User Request

0ms

🌐

API Gateway

2.5ms

App Server

5ms

🔴

Redis Cache

12ms

Cache Miss → DB

25ms

Response

3ms

Total Request Latency

47.5ms

6 hops · 2 network round-trips · 1 database query

👤

User Request

0ms

🌐

API Gateway

0.5ms

Cachee L1

0.001ms

Response

0.3ms

🔴

Redis

skipped

Database

skipped

Total Request Latency

0.801ms

3 hops · 0 database queries · 95% served from L1 · 59x faster

👤

User Request

0ms

AI Pre-Fetched

0.001ms

Instant Response

~0ms

🌐

Gateway

bypassed

🔴

Redis

bypassed

Database

bypassed

Total Request Latency

0.001ms

Data pre-fetched by AI · Already in L1 before request arrives · 47,500x faster

Auto-cycling views

The Pipeline

Four Steps. Sub-Millisecond.

Every request that hits Cachee passes through a four-stage pipeline. Each stage is optimized for high-performance execution. The entire pipeline completes before most systems finish a single network hop.

01

AI Prediction

ML models analyze access patterns in real-time, predicting which data your application will request next. Models train continuously on your traffic, reaching 95%+ accuracy within hours.

~50ns prediction

02

Tiered Storage

Hot data lives in L1 (sub-microsecond). Warm data in L2 (single-digit microsecond). Cold data in L3 (sub-millisecond). AI manages promotion and eviction across all tiers automatically.

31ns L1 hit

03

Consistency Engine

Write-through invalidation with causal ordering ensures stale data is never served. 31ns propagation across all cache tiers. Distributed consistency protocol for distributed deployments.

<1µs propagation

04

Adaptive Tuning

The system continuously optimizes itself. Cache sizes, eviction policies, TTLs, and prefetch aggressiveness are all adjusted in real-time based on workload characteristics. Zero manual tuning required.

Continuous

Try It

See It Running Live

Deploy Cachee in your environment in minutes. Our CLI handles configuration, connection, and optimization automatically.

$ npm install -g @cachee/cli

$ cachee init --project my-app

Detecting infrastructure... connected ✓

Generating config... done

$ cachee deploy --watch

Deploying Cachee overlay...

L1 cache initialized (optimized)

AI model training started on live traffic

Status: ACTIVE | Hit rate: 87% (warming) | Latency: 3.2ns

Status: OPTIMIZED | Hit rate: 95%+ | Latency: sub-microsecond

Origin load reduced by 94.7% | Est. savings: $2,847/mo

Open Full Demo Start Free Trial

Integration

Three Ways to Deploy

Don't rip out your Redis stack. Every integration model wraps your existing infrastructure and makes it dramatically faster — in under an hour.

Managed Cloud

Your app

→ SDK

Cachee

0.46ms

Your DB

3–12ms

Best for: new projects, teams with no infra ops

Sidecar Container

Your app

localhost

Sidecar

31ns

Your DB

3–12ms

Best for: latency-critical apps, Docker/K8s shops

Self-Hosted

Your app

→ Agent

Your infra

~0.01ms

Cachee AI

cloud ctrl

Best for: regulated industries, air-gapped, enterprise

What changes in your stack

Pick the model that fits how you deploy.

Every model wraps your existing infrastructure. Nothing gets ripped out.

Managed Cloud

BEFORE

Your App

↓

Redis / DB

AFTER

Your App

↓

Cachee SDK (new)

↓ L1 miss only

Redis / DB (unchanged)

Sidecar Container

BEFORE

Your App

↓ network hop

Redis (remote)

AFTER

Your App (no code change)

↓ localhost

Cachee Sidecar (new container)

↓ L1 miss only

Redis / DB (unchanged)

Self-Hosted

BEFORE

Your App

↓

Your Cache Infra

AFTER

Your App (no code change)

↓

Cachee Agent (on your infra)

↓ cache

↗ metrics only

Your Infra

Cachee AI

Choose your integration model

Recommended

Managed Cloud

Cachee provisions and runs your cache infrastructure. You get an API key. Point our SDK at it. Done.

< 1hr

from signup to first cache hit

Zero infrastructure to manage — we run it
SDK available for Node.js, Python, Go, and Java
Automatic scaling, patching, and failover
99.99% SLA with global redundancy
Works alongside your existing Redis — no migration

Lowest Latency

Sidecar Container

Deploy a Cachee agent container alongside your app. Cache calls go over localhost — no network hop.

31ns

p99 cache hit latency

Redis-protocol compatible — change one line in your config
One Docker image, one env var — that's the full setup
Works in Docker Compose and Kubernetes pod specs
AI optimization runs in cloud; data never leaves your host
Required for Growth tier and above

Enterprise

Self-Hosted

Run the Cachee agent on your own bare-metal or VPC. Cachee's AI connects via the control plane — your data never leaves.

Air-gapped

data sovereignty, your rules

Full data isolation — cache data stays on your infra
Deploy in regulated environments — data never leaves your infra
Control plane handles AI decisions, you handle the hardware
Bring your own cloud account (AWS, GCP, Azure)
Dedicated Cachee solutions engineer during onboarding

Overlay

Already using ElastiCache, CloudFlare KV, or Redis Cloud? Cachee sits in front as an L1 acceleration layer — zero code changes.

10-50x

faster cache hits vs origin provider

Works with ElastiCache, CloudFlare KV, Redis Cloud, Azure, GCP, Upstash
Swap one connection string — your app connects to Cachee instead
L1 serves hot data in ~10us, misses forwarded to your existing backend
Reduces API calls and costs to your underlying cache provider
No migration — your existing cache stays as the durable L2 backend

Managed Cloud — Setup Guide

Avg setup: 18 minutes

Node.js quickstart

bash

npm install @cachee/sdk

javascript

import { CacheeClient } from '@cachee/sdk'

const cache = new CacheeClient({
  apiKey: process.env.CACHEE_API_KEY,
  region: 'auto',        // nearest edge
  fallback: 'local',     // in-memory if offline
  timeout: 2000
})

// Set a key
await cache.set('user:1234', userData, { ttl: 300 })

// Get a key
const user = await cache.get('user:1234')

// Batch set
await cache.mset({ 'a': 1, 'b': 2, 'c': 3 })

python

from cachee import CacheeClient

cache = CacheeClient(api_key="your_key")

await cache.set("order:99", order_data, ttl=60)
result = await cache.get("order:99")

Steps to go live

1

Create your free account

Sign up at cachee.ai/start — no credit card required. Your account is active immediately.

2 min

2

Copy your API key

Your API key is generated on first login. Copy it to your environment variables as CACHEE_API_KEY.

1 min

3

Install the SDK

Run npm install @cachee/sdk or pip install cachee. Import the client and initialize it with your key.

5 min

4

Make your first cache call

Call cache.set() and cache.get() wherever you were calling Redis. The API is deliberately identical.

10 min

5

Watch the dashboard populate

Your hit rate, latency, and request volume appear in the dashboard within 60 seconds of your first call.

instant

Sidecar Container — Setup Guide

Avg setup: 12 minutes

Docker Compose

docker-compose.yml

services:
  your-app:
    image: your-app:latest
    environment:
      # Point your Redis client here instead:
      REDIS_HOST: cachee-sidecar
      REDIS_PORT: "6379"

  cachee-sidecar:
    image: cacheeai/sidecar:latest
    environment:
      CACHEE_API_KEY: ${CACHEE_API_KEY}
    # No ports exposed — localhost only

kubernetes (pod spec)

containers:
  - name: your-app
    image: your-app:latest
    env:
      - name: REDIS_HOST
        value: "localhost"
      - name: REDIS_PORT
        value: "6379"

  - name: cachee-sidecar
    image: cacheeai/sidecar:latest
    env:
      - name: CACHEE_API_KEY
        valueFrom:
          secretKeyRef:
            name: cachee-secrets
            key: api-key

How the sidecar works

1

Pull the image

docker pull cacheeai/sidecar:latest — it's under 50MB. No root, no surprise dependencies.

1 min

2

Add it to your compose or pod spec

Paste the 5-line snippet alongside your existing app container. Set your CACHEE_API_KEY env var.

3 min

3

Change one line in your app config

Point REDIS_HOST to cachee-sidecar (Compose) or localhost (Kubernetes). Your existing Redis client works without any code changes.

2 min

4

The sidecar handles everything else

On startup it authenticates to the Cachee control plane, downloads your configuration, and begins serving the Redis protocol on port 6379.

auto

5

Supported Redis commands

SET, GET, DEL, EXISTS, MGET, MSET, EXPIRE, TTL. Any unsupported command returns a clear error — never a silent hang.

Self-Hosted — Setup Guide

Avg setup: 45 min with dedicated engineer

Connect your infrastructure

bash — generate connection token

# From the Cachee dashboard → Self-Hosted → New Token
# Token expires in 24 hours, single-use

CACHEE_CONNECT_TOKEN="ct_live_xxxxxxxxxxxx"

# Run the agent on your infrastructure
docker run -d \
  -e CACHEE_CONNECT_TOKEN=$CACHEE_CONNECT_TOKEN \
  -e CACHEE_REGION="us-east-1" \
  -p 6379:6379 \
  cacheeai/agent:latest

verify connection

# Within 60 seconds the dashboard shows CONNECTED.
# The connect token is automatically invalidated.

# Test the connection:
redis-cli -h localhost -p 6379 PING
# → PONG (served by Cachee agent)

What stays where

1

Your data never leaves your infrastructure

All cache data — keys, values, TTLs — lives entirely on your hardware or VPC. Cachee has zero access to cache contents.

2

What the control plane does receive

Operational metrics only: hit rate, latency percentiles, memory utilization, request count. No keys, no values, ever.

3

AI optimization runs on telemetry, not data

Cachee's AI models analyze usage patterns from operational metrics and push eviction and pre-warming decisions back to your agent.

4

Works in air-gapped environments

The agent can operate in restricted-egress networks. Configure a proxy for control plane sync if direct outbound is not allowed.

5

Dedicated onboarding engineer included

All self-hosted accounts are assigned a Cachee solutions engineer who runs the first deployment with your team live.

Overlay — Setup Guide

Avg setup: 10 minutes

Deploy Cachee in front of your existing cache

docker-compose.yml

services:
  your-app:
    image: your-app:latest
    environment:
      # Point at Cachee instead of your old cache:
      REDIS_HOST: cachee-overlay
      REDIS_PORT: "6379"

  cachee-overlay:
    image: cacheeai/proxy:latest
    environment:
      CACHEE_API_KEY: ${CACHEE_API_KEY}
      # Your existing cache becomes the L2 backend:
      UPSTREAM: ${YOUR_EXISTING_CACHE_ENDPOINT}
    ports:
      - "6379:6379"

supported backends

# ElastiCache / Redis Cloud / Azure / GCP / Upstash:
UPSTREAM=redis://your-elasticache.abc.cache.amazonaws.com:6379

# CloudFlare Workers KV (HTTP adapter):
UPSTREAM=cloudflare://ACCOUNT_ID/NAMESPACE_ID
UPSTREAM_CF_TOKEN=${CF_API_TOKEN}

How the overlay works

1

Deploy the Cachee proxy

One container, one env var for your API key, one for your existing cache endpoint. The proxy speaks Redis protocol on port 6379.

3 min

2

Swap one connection string

Change your app's REDIS_HOST from your existing cache to the Cachee proxy. Zero code changes needed — your existing Redis client works as-is.

2 min

3

Hot data served from L1 in ~10us

Cachee's in-memory L1 (Tiny-Cachee engine) serves frequently accessed keys without touching your backend. Cache misses are forwarded transparently.

4

Reduce costs and API calls

With 90%+ hit rates on the L1, you cut 90% of calls to your existing provider — lowering both latency and cost.

5

No migration required

Your existing cache keeps all its data. Cachee only accelerates reads — writes pass through to your backend for durability.

Which model is right for your team?

Side-by-side comparison of all four deployment models across the metrics that matter.

Capability	Managed Cloud	Sidecar	Overlay	Self-Hosted
Setup time	< 1 hour	12 minutes	10 minutes	45 minutes
p99 cache hit latency	0.46ms	31ns	~0.01ms	~0.01ms
Infrastructure to manage	✓ None — we run it	⊘ One container	⊘ One container	— Your hardware
Existing Redis client works	⊘ SDK change	✓ Zero code change	✓ Zero code change	✓ Zero code change
Keep existing cache provider	—	—	✓ Your L2 backend	—
Reduces provider API costs	—	—	✓ Up to 90%	—
AI pre-warming & optimization	✓	✓	✓	✓
Automatic scaling	✓ Fully managed	✓ Managed	✓ Managed	⊘ Controlled
Multi-region failover	✓	✓	✓	✓
Dedicated solutions engineer	—	—	—	✓ Included
Available on tier	Starter +	Growth +	Starter +	Enterprise

Already using a cache? Overlay it. Starting fresh? Go Managed.

Every deployment model uses the same control plane. Start with Overlay to accelerate your existing ElastiCache or CloudFlare KV, or go Managed for a turnkey solution.

Start Free Trial Try It Free

Capabilities

Platform Capabilities

Every feature is designed for production workloads at scale. No toy benchmarks. No asterisks. These are the capabilities running in production today.

Native Engine

High-performance data paths. No garbage collection pauses. No runtime overhead. The entire hot path runs in CPU cache lines, delivering consistent nanosecond latency under load.

31ns average L1 hit latency

AI Prediction Engine

Proprietary ML models trained on your access patterns. Predicts next-access with 95%+ accuracy. Models update every 30 seconds without downtime. Custom per-tenant model isolation.

95%+ hit rate in production

3-Tier Storage

L1 (sub-microsecond), L2 (single-digit microsecond), L3 (sub-millisecond). AI manages data placement across tiers. Hot data automatically promoted, cold data evicted. No manual tuning.

128x storage reduction vs raw

Overlay Architecture

Deploys alongside your existing Redis, Memcached, or database. No migration. No code changes. Cachee optimizes requests transparently and serves from L1 when possible.

Zero code changes required

Multi-Region Sync

Distributed consistency protocol across regions. Sub-millisecond local reads with automatic conflict resolution. Causal ordering guarantees prevent stale reads after writes.

Global consistency in <5ms

Enterprise Security

AES-256 encryption at rest and in transit. Role-based access. Audit logging. Tenant isolation with zero data leakage. Self-hosted option keeps all cache data on your infrastructure.

Encryption + tenant isolation

Comparison

How Cachee Compares

Side-by-side with the caching solutions you already know. Same metrics, same workloads, independently verifiable.

Metric	Redis	Memcached	CloudFront	Cachee
Read Latency (p50)	0.8 - 2ms	0.5 - 1ms	5 - 50ms	31ns
Read Latency (p99)	5 - 15ms	3 - 8ms	50 - 200ms	12ns
Throughput	500K ops/s	1M ops/s	N/A (CDN)	32M ops/s
AI Prediction	None	None	None	95%+ accuracy
Auto-Tuning	Manual TTLs	Manual config	Basic TTLs	Fully autonomous
Network Hops	2-3 hops	2-3 hops	1-4 hops	Near-zero
GC Pauses	Rare (C)	None (C)	Varies	None
Origin Load Reduction	60 - 80%	60 - 75%	40 - 70%	95%+
Deploy Complexity	Moderate	Moderate	Low (CDN)	1 command overlay

ROI Calculator

Calculate Your Savings

Input your current infrastructure metrics. See exactly what changes when Cachee deploys. All calculations use conservative estimates based on production deployments.

Monthly Requests

100M

Current Infra Spend ($/month)

$85,000

Average Response Latency (ms)

47ms

Current Cache Hit Rate (%)

65%

$0

Monthly Savings

$0 annually

0ms

New Avg Latency

0x faster

0x

ROI Multiplier

Based on Scale tier ($500/mo)

Benchmarks

Production Benchmarks

These numbers come from production deployments, not synthetic benchmarks. Measured on real infrastructure under real workloads. All benchmarks are independently reproducible.

L1 Cache Read Latency

0

nanoseconds (p50)

Redis: 800,000ns Cachee: 31ns

Operations Per Second

0

million ops/sec (single node)

Redis: 0.5M ops/s Cachee: 32M ops/s

L1 Hit Rate (After Training)

0

percent (production average)

Redis: ~65% typical Cachee: 95%+

Origin Load Reduction

0

percent fewer database queries

Redis: ~70% reduction Cachee: 95%+ reduction

Infrastructure Economics

Four Metrics Shift the Moment You Deploy

Memory utilization rises because Cachee is actively using it. Everything else -- server hits, infrastructure cost, response latency -- drops dramatically.

▲ GOES UP

0%

Memory Utilization

Cachee actively uses L1 memory to store predicted data. Higher utilization = more cache hits = fewer expensive backend calls.

▼ GOES DOWN

0%

Database / Origin Hits

95%+ of requests served from L1 memory. Your database goes from handling millions of queries to handling thousands.

▼ GOES DOWN

0%

Infrastructure Spend

Fewer database replicas, smaller Redis clusters, less compute. Enterprises typically see 40-70% infrastructure cost reduction.

▲ GOES UP

0x

Request Performance

P99 latency drops from tens of milliseconds to sub-millisecond. Same hardware handles orders of magnitude more throughput.

P&L Impact (100M requests/month)

Representative enterprise running on a standard AWS stack. These are the line items that change when Cachee deploys.

Line Item	Before Cachee	After Cachee	Delta
ElastiCache / Redis Cluster	$18,000/mo	$4,500/mo	−$13,500
RDS / Aurora Database	$32,000/mo	$12,000/mo	−$20,000
Compute (EC2 / ECS / Lambda)	$24,000/mo	$10,000/mo	−$14,000
Data Transfer / CDN	$11,000/mo	$4,500/mo	−$6,500
DevOps Hours (cache mgmt)	60 hrs/mo ($12,000)	4 hrs/mo ($800)	−$11,200
Cachee Platform Cost	—	$500/mo	+$500
NET MONTHLY IMPACT	$97,000/mo	$32,300/mo	−$64,700/mo

$776,400 annual savings · 129x ROI on Scale tier

Representative figures based on typical enterprise deployment. Actual results vary by infrastructure configuration, workload patterns, and scale.

Data Engine

Beyond GET/SET: Native Data Structures

Cachee does not just cache strings. It natively stores hashes, sorted sets, sets, and lists — and executes Lua scripts in-process. All 50+ Redis-compatible commands run inside the Cachee engine with zero external dependencies.

Tiered Architecture with Native Data Types

Your App

Application Request (GET, HGET, ZADD, EVAL...)

↓

Cachee L1

All data types in-process 1.5µs

↓ L1 miss only

Optional Redis L2

Network fallback ~1ms

↓ miss

Database

Origin query ~15ms

Hashes, Sets, Sorted Sets, Lists

Native support for every Redis data structure. HSET, ZADD, SADD, LPUSH — all running in the same process as your cache, no serialization, no network round-trips.

All data types at in-process speed

Embedded Lua Scripting

EVAL executes Lua 5.4 scripts inside a sandboxed runtime. Complex multi-key logic runs atomically without leaving the process — no network latency, no external interpreter.

Sandboxed Lua 5.4 runtime

In-Process Transactions

MULTI/EXEC runs atomic command batches with zero network overhead. Combined with Lua scripting, this gives you full transactional semantics at cache speed.

MULTI/EXEC with zero network hops

Enterprise Data Engine details → Read the engineering blog post →

Full Platform

The Complete Cachee Architecture

Five memory tiers, sixteen features, one API. Every request traverses the fastest path available.

                L0
                Zero-Copy Shared Memory
                <1ns (pointer deref, multi-process)

                L1
                In-Process RAM (Cachee-FLU)
                1.5µs (DashMap, 177+ commands)

                L1.5
                NVMe SSD
                10–50µs (hybrid tiering, io_uring)

                L2
                Redis / ElastiCache
                1–5ms (network cache)

                L3
                Database
                5–50ms (source of truth)
            

How Cachee Actually Works

Request Lifecycle: Before vs After

Four Steps. Sub-Millisecond.

See It Running Live

Three Ways to Deploy

Pick the model that fits how you deploy.

Managed Cloud — Setup Guide

Node.js quickstart

Steps to go live

Create your free account

Copy your API key

Install the SDK

Make your first cache call

Watch the dashboard populate

Sidecar Container — Setup Guide

Docker Compose

How the sidecar works

Pull the image

Add it to your compose or pod spec

Change one line in your app config

The sidecar handles everything else

Supported Redis commands

Self-Hosted — Setup Guide

Connect your infrastructure

What stays where

Your data never leaves your infrastructure

What the control plane does receive

AI optimization runs on telemetry, not data

Works in air-gapped environments

Dedicated onboarding engineer included

Overlay — Setup Guide

Deploy Cachee in front of your existing cache

How the overlay works

Deploy the Cachee proxy

Swap one connection string

Hot data served from L1 in ~10us

Reduce costs and API calls

No migration required

Which model is right for your team?

Already using a cache? Overlay it. Starting fresh? Go Managed.

Platform Capabilities

How Cachee Compares

Calculate Your Savings

Production Benchmarks

Four Metrics Shift the Moment You Deploy

P&L Impact (100M requests/month)

Beyond GET/SET: Native Data Structures

Tiered Architecture with Native Data Types

The Complete Cachee Architecture

Data Integrity

Intelligence

Composition

Engine

Ready to See the Difference?