We Built a Redis-Compatible Data Engine in Rust. Here's Why.

A customer needed hashes, sorted sets, lists, and Lua scripting. The standard answer was “add a Redis instance.” We asked a different question: what if the cache engine spoke every Redis command natively, in-process, at sub-microsecond latency — without any external dependency? No sidecar. No container. No connection pool. No network hop. Just native Rust data structures behind the same RESP protocol your application already speaks. So we built it. Over 50 commands, from HSET to EVAL, running inside the same process as your application. Here is why we did it, what it looks like, and what it means for teams that are tired of managing Redis alongside their cache.

The Problem That Sparked It

V100 is a video platform running on our infrastructure. They were already using Cachee for GET/SET caching at 1.5 microseconds per lookup — standard key-value operations that replaced a Redis cluster and cut their P99 from 3ms to effectively zero. The problem was that their workload had outgrown simple key-value. They needed rate limiting for API endpoints, which requires sorted sets and atomic INCR. They needed session state for authenticated users, which requires hash maps with field-level reads and writes. They needed job queues for video transcoding pipelines, which requires list operations like LPUSH and RPOP. And they needed custom scoring logic for content recommendations, which requires Lua scripts that read multiple keys, compute a score, and write the result — atomically.

The conventional answer was obvious: stand up a Redis instance alongside Cachee. Use Cachee for the hot GET/SET path and Redis for everything else. But that answer introduced exactly the problems Cachee was designed to eliminate. Another container to deploy, monitor, and restart when it OOM-kills at 3 AM. Another connection pool to exhaust under load. Another failure mode in the request path. And critically, another 0.5–1ms of network latency per call — even on localhost, because TCP serialization does not care that your Redis is on the same machine. Every sorted set operation, every hash lookup, every Lua script execution would pay that tax. V100 was doing 40,000 rate-limit checks per second. At 1ms each, that is 40 seconds of cumulative latency per second spent on network overhead alone. We decided to build the commands natively instead.

What We Built

We implemented the full spectrum of Redis data structures and commands as native Rust types, operating on in-process memory with zero serialization and zero network overhead. Every command executes in the same address space as the cache engine itself.

Hashes are backed by DashMap<String, DashMap<String, String>> — a concurrent hash map of concurrent hash maps. This gives us lock-free concurrent reads across all hash fields. HSET, HGET, HGETALL, HMGET, and HDEL all operate without global locks. Multiple threads can read different fields of the same hash simultaneously without contention, which is not something Redis can do — Redis is single-threaded and processes commands sequentially.

Sorted sets use a BTreeMap with IEEE 754 score ordering, providing O(log n) insert and range queries — the same algorithmic complexity as Redis’s skip lists, but without the memory overhead of skip list node pointers. ZADD supports NX (only add if not exists) and XX (only update if exists) flags. ZRANGE, ZRANGEBYSCORE, ZREM, ZREMRANGEBYSCORE, and ZCARD all work as expected.

Sets are backed by DashSet for concurrent membership checks. SADD, SISMEMBER, SMEMBERS, SREM, and SCARD operate with the same lock-free read semantics as hashes.

Lists use VecDeque for O(1) push and pop from both ends. LPUSH, RPUSH, LPOP, RPOP, LRANGE, and LLEN cover the queue and stack patterns that job systems depend on.

Transactions use a per-connection command queue. MULTI starts buffering commands; EXEC executes them atomically; DISCARD clears the buffer. This gives applications the same batching and atomicity guarantees they expect from Redis transactions.

Beyond data structures, we implemented SCAN with cursor-based iteration, MATCH pattern filtering, and COUNT hints. We implemented INCR, DECR, APPEND, and MSETNX as native atomic operations. We implemented a Pub/Sub engine backed by Tokio broadcast channels, with PUBLISH delivering messages to all subscribed connections. All told, over 50 commands — zero external dependencies.

// V100's rate limiter — entirely in-process, no Redis needed

// Before: Redis sidecar (TCP localhost)
const count = await redis.incr(`rate:${userId}`);       // ~1ms (TCP round-trip)
await redis.expire(`rate:${userId}`, 60);              // ~1ms (another round-trip)
// Total: ~2ms per rate check. At 40K checks/sec = 80 seconds of latency/sec.

// After: Cachee native INCR + EXPIRE (in-process)
const count = await cachee.incr(`rate:${userId}`);      // ~0.0015ms (memory access)
await cachee.expire(`rate:${userId}`, 60);             // ~0.0005ms (same process)
// Total: ~0.002ms per rate check. At 40K checks/sec = 0.08 seconds of latency/sec.
        

Why In-Process Beats a Sidecar

The performance difference between a Redis sidecar and native in-process commands is not a marginal optimization. It is a structural elimination of overhead. A Redis sidecar running on the same machine still communicates over TCP localhost. Every command requires: serializing the command into RESP format, writing it to a TCP socket, the kernel copying the bytes between process memory spaces, Redis parsing the command, executing it, serializing the response, writing it back to the socket, the kernel copying the response bytes, and your client deserializing the result. That is 0.5–1ms minimum, regardless of how trivial the operation is. An INCR on a single integer key — arguably the simplest possible operation — still pays the full TCP round-trip cost.

Cachee’s native implementation eliminates every layer of that stack. The command is a direct function call in the same memory space. The data structure is a Rust type in the same heap. There is no serialization because there is no wire. There is no kernel copy because there is no socket. The result is 0.0015ms per operation — a 660x improvement over same-machine Redis. But the performance gap is only part of the story. In-process execution also eliminates entire categories of operational failure.

No connection pool. There is no pool to exhaust, no pool to tune, no pool exhaustion cascading into application-wide failures under traffic spikes.
No separate container. Nothing to monitor for OOM, nothing to restart at 3 AM, nothing to coordinate version upgrades with. Your cache is part of your application binary.
No serialization overhead. Objects stay as Rust types — DashMap entries, BTreeMap nodes, VecDeque elements. They are never converted to bytes and back.
Concurrent reads without single-threading. Redis processes all commands on a single thread. DashMap provides lock-free concurrent reads — multiple application threads can read different hash fields or sorted set ranges simultaneously without blocking each other.

50+ Redis Commands

0.0015ms Per Operation

0 Dependencies

172MB Docker Image

The Lua Decision

Lua scripting was the most debated feature internally. Adding an embedded scripting runtime to a cache engine is a significant decision — it increases the binary size, introduces a new execution context, and creates surface area for bugs. We built it anyway because the alternative was worse: without scripting, customers who needed atomic multi-step logic would be forced to add a Redis instance just for EVAL, even if they were using Cachee for everything else. One missing command would negate the entire “zero external dependencies” value proposition.

We embedded Lua 5.4 directly using the mlua crate with vendored compilation — no system Lua dependency, no dynamic linking, no version mismatch issues across deployment environments. The Lua runtime is compiled into the Cachee binary itself. Scripts execute atomically against the in-process data store, which means redis.call() inside a Lua script is not a network round-trip to a separate Redis process — it is a direct function call into the same DashMap and BTreeMap structures that back every other command. A Lua script that calls redis.call('GET', key) followed by redis.call('SET', key, new_value) completes in microseconds, not milliseconds.

Security was non-negotiable. The embedded Lua environment is fully sandboxed: os, io, debug, require, loadfile, and dofile are all removed. Scripts cannot access the filesystem, open network connections, or invoke system calls. A configurable instruction count limit (default equivalent to 5 seconds of execution) prevents infinite loops from consuming resources. Scripts are cached by SHA-256 hash, so EVALSHA works for repeat execution without retransmitting the script body over the wire. The redis.call() bridge supports GET, SET, DEL, INCR, EXISTS, HGET, and HSET against the native store — the same commands your scripts already use.

-- V100's content scoring script — runs atomically in-process
local views = tonumber(redis.call('HGET', KEYS[1], 'views')) or 0
local likes = tonumber(redis.call('HGET', KEYS[1], 'likes')) or 0
local age_hours = tonumber(ARGV[1])
local score = (likes * 10 + views) / (age_hours + 2) ^ 1.5
redis.call('ZADD', 'trending', score, KEYS[1])
return tostring(score)
-- In Redis: 3 network round-trips (2x HGET + 1x ZADD) = ~3ms
-- In Cachee: 3 direct function calls = ~0.005ms
        

What This Means for Customers

The practical impact is that you do not need Redis at all if your workload fits in memory. Every pattern that traditionally required a Redis instance alongside your cache can now run natively inside Cachee with no additional infrastructure.

Rate limiting: INCR + EXPIRE, entirely in-process. No TCP round-trip per request. At sub-microsecond latency, rate checks become effectively free — they add less overhead than a single log statement.

Leaderboards: ZADD + ZRANGE, with O(log n) inserts and range queries backed by BTreeMap. V100 runs a real-time trending content board with 100,000 entries, updating scores and pulling top-50 ranges at 0.0015ms per operation.

Session state: HSET + HGET, no network. Store user sessions as hashes with field-level reads and writes. Authentication middleware that checks HGET session:abc123 user_id completes in 1.5 microseconds instead of 1 millisecond.

Job queues: LPUSH + RPOP, no external broker. Video transcoding jobs push to a list, workers pop from the other end. For workloads that do not need persistence guarantees (and most in-memory job queues do not), this eliminates an entire dependency.

Custom logic: Lua scripts that read, compute, and write atomically. Recommendation scoring, conditional updates, multi-key transactions — all executing as direct function calls against in-process data structures.

The numbers: 50+ commands. 0.0015ms access. Zero external dependencies. 172MB Docker image. Drop-in RESP compatibility — your existing Redis client library works without changes. See benchmarks for full performance data, or compare against standalone Redis, KeyDB, and Dragonfly.

What We Didn’t Build (Yet)

We are deliberate about scope. There are Redis features we chose not to implement in V1, either because they require fundamentally different architecture or because they conflict with Cachee’s design philosophy as a cache engine rather than a database.

WATCH (optimistic locking) — scoped for V2. Requires tracking key modifications across connection boundaries, which interacts with DashMap’s sharding in non-trivial ways. We want to get the concurrency semantics right rather than ship a subtly broken implementation.
Native SUBSCRIBE handler — PUBLISH works and delivers to all subscribed connections via Tokio broadcast channels. Full SUBSCRIBE with blocking connection semantics is partially implemented and forwards to upstream Redis where available.
Persistence — this is a cache engine, not a database. If you need durability, configure upstream Redis as an L2 tier. Cachee handles the hot path; Redis handles the cold storage and persistence layer. Mixing cache semantics with durability guarantees creates the worst of both worlds.
Cluster mode — Cachee operates as a single-node, in-process engine. Sharding across multiple nodes is a different problem that requires consensus protocols, slot migration, and cross-node communication — exactly the network overhead we built this to eliminate. For distributed workloads, deploy Cachee per-node with Redis as the shared L2.

Design principle: We build what makes sense in-process and defer what does not. Persistence and sharding are network-bound problems. Hashes, sorted sets, lists, and Lua scripts are compute-bound problems. Cachee handles compute-bound workloads at sub-microsecond latency. Network-bound workloads stay with purpose-built systems. This is not a limitation — it is architecture. Read more about our caching architecture and how predictive pre-warming keeps hit rates above 99%.

50+ Redis Commands. Zero Redis Dependency. Sub-Microsecond Access.

Native hashes, sorted sets, lists, Lua scripting, and transactions — all running in-process at 0.0015ms per operation.

Start Free Trial Schedule Demo

We Built a Redis-Compatible Data Engine in Rust. Here’s Why.