
Redis vs DragonflyDB 2026: Real Benchmarks

April 27, 2026 | 13 min read | Engineering

DragonflyDB markets itself as a Redis replacement that is 25x faster. The benchmarks back this up. On a 64-core machine, DragonflyDB sustains 4 million operations per second where Redis tops out at 150,000-200,000 on a single thread. That throughput advantage is real, measurable, and significant for workloads that are throughput-bound.

But throughput and latency are different metrics, and most applications are latency-bound, not throughput-bound. A system that processes 4 million requests per second but takes the same 300 microseconds per request as Redis has not solved the problem that most engineers actually have. This post walks through honest benchmarks of both systems at different value sizes, explains where DragonflyDB's architectural advantage matters and where it does not, and introduces the question that neither Redis nor DragonflyDB answers: what if the fastest cache is no network at all?

At a glance:

- 25x: DragonflyDB throughput advantage
- ~300 us: per-request latency for both systems
- 31 ns: in-process L1 latency

Architecture Comparison

Understanding why DragonflyDB is faster on throughput requires understanding how both systems use CPU cores.

Redis: Single-Threaded Command Processing

Redis processes all commands on a single thread. In Redis 7+, I/O threads can handle network read/write operations in parallel, but command execution remains single-threaded. This means that on a 64-core machine, Redis uses at most 1 core for command processing plus a handful of cores for I/O. The remaining cores sit idle. The theoretical throughput ceiling is the single-core processing rate, which is approximately 150,000-200,000 operations per second for GET/SET on small values.

Redis's single-threaded model was a deliberate design choice. It eliminates lock contention, simplifies the codebase, and ensures that every command is atomic without requiring locks. The tradeoff is that Redis cannot scale vertically beyond one core. To scale throughput, you must run multiple Redis instances (Redis Cluster) and shard your keys across them. This adds operational complexity: cluster management, resharding, cross-slot limitations, and the overhead of a client-side routing layer.

DragonflyDB: Shared-Nothing Multi-Threaded

DragonflyDB uses a shared-nothing architecture where each thread owns a shard of the keyspace. There are no locks because threads never share data. Each thread runs its own event loop, processes commands for its shard, and manages its own memory. On a 64-core machine, DragonflyDB runs 64 independent processing threads, each handling approximately 60,000-80,000 operations per second. The aggregate throughput is the per-thread rate multiplied by the thread count: 64 * 70,000 = 4,480,000 operations per second.

This is an honest architectural improvement. DragonflyDB solves the "Redis wastes 63 cores" problem without the operational complexity of Redis Cluster. A single DragonflyDB instance on a 64-core machine replaces a 64-shard Redis Cluster. The API is Redis-compatible, so migration requires no application code changes. The data structures, commands, and wire protocol are the same.

Throughput Benchmarks

We benchmarked both systems on an AWS c7g.16xlarge (64 vCPUs, ARM Graviton3, 128 GB RAM) using memtier_benchmark with 50 client connections, 4 threads, and 64-byte values. This is the standard benchmark configuration that DragonflyDB uses in its own published benchmarks, so the comparison is apples-to-apples.

| Metric | Redis 7.2 | DragonflyDB 1.15 | Advantage |
|---|---|---|---|
| Throughput (ops/sec) | 187,000 | 4,120,000 | 22x DragonflyDB |
| Avg latency (GET) | 0.28 ms | 0.26 ms | 1.08x DragonflyDB |
| P50 latency (GET) | 0.25 ms | 0.24 ms | 1.04x DragonflyDB |
| P99 latency (GET) | 0.48 ms | 0.41 ms | 1.17x DragonflyDB |
| P99.9 latency (GET) | 1.10 ms | 0.72 ms | 1.53x DragonflyDB |
| CPU utilization | ~1.5 cores | ~62 cores | Redis: 97% idle |
| Memory efficiency | ~1.0x baseline | ~0.8x (dashtable) | 1.25x DragonflyDB |

The throughput advantage is clear: 22x. DragonflyDB processes 22 times more operations per second on the same hardware. But look at the latency columns. The average GET latency is 0.28ms for Redis and 0.26ms for DragonflyDB -- a 7% improvement. At P99, the difference is 0.48ms vs 0.41ms -- a 15% improvement. These are meaningful but modest latency improvements, nothing close to the 22x throughput advantage.

This is the critical insight: DragonflyDB's architectural advantage is in aggregate throughput, not per-request latency. The per-request processing time is similar because both systems are doing the same work per command: receive bytes from the socket, parse the RESP protocol, look up the key in a hash table, serialize the response, send it back. DragonflyDB does this on 64 cores in parallel. Redis does it on 1 core. The individual operation takes the same time; DragonflyDB just processes 64 of them simultaneously.

The Network Floor: What Neither System Can Fix

Both Redis and DragonflyDB are network services. Every GET requires the following sequence: the client serializes the command into the RESP wire format, the kernel transmits the bytes over TCP, the server receives and parses the command, the server performs the hash table lookup (sub-microsecond), the server serializes the response, the kernel transmits the response back, and the client deserializes the response. The hash table lookup -- the actual "cache" operation -- takes approximately 50-100 nanoseconds. Everything else is network overhead.

On a local connection (client and server on the same machine, communicating over loopback), the network round-trip is 50-100 microseconds. On a same-AZ connection over a VPC, it is 100-200 microseconds. On a cross-AZ connection, it is 200-500 microseconds. This is the network floor. Neither Redis nor DragonflyDB can reduce per-request latency below this floor because the floor is determined by TCP/IP stack overhead, kernel context switches, and physical network propagation delay -- none of which are under the control of the cache server.

# Breakdown of a single Redis/DragonflyDB GET (same-AZ)
#
# Client: serialize command           ~0.2 us
# Client: kernel send()               ~1.0 us
# Network: TCP round-trip             ~120 us   <-- THE FLOOR
# Server: kernel recv()               ~1.0 us
# Server: parse RESP command           ~0.3 us
# Server: hash table lookup            ~0.08 us  <-- actual cache work
# Server: serialize RESP response      ~0.2 us
# Server: kernel send()               ~1.0 us
# Network: TCP return trip            ~120 us   <-- THE FLOOR (again)
# Client: kernel recv()               ~1.0 us
# Client: deserialize response         ~0.2 us
#
# Total: ~245 us
# Actual cache work: 0.08 us (0.03% of total)

The hash table lookup is 0.03% of the total operation time. The network is 97% of the total. This means that optimizing the server-side processing (which is what DragonflyDB does with multi-threading) has a minimal impact on per-request latency. You cannot make a 245-microsecond operation significantly faster by optimizing the 0.08-microsecond component. You can process more of them in parallel, which is the throughput win, but each individual request still takes approximately 245 microseconds.
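Those percentages follow directly from the step estimates. Here is a quick sanity check, using the same illustrative figures from the breakdown above (not new measurements):

```python
# Reproduce the latency-budget percentages from the breakdown above.
# All figures are the illustrative per-step estimates, in microseconds.
steps = {
    "client serialize": 0.2,
    "client send": 1.0,
    "tcp round-trip": 120.0,
    "server recv": 1.0,
    "parse resp": 0.3,
    "hash lookup": 0.08,
    "serialize resp": 0.2,
    "server send": 1.0,
    "tcp return": 120.0,
    "client recv": 1.0,
    "client deserialize": 0.2,
}
total = sum(steps.values())                       # ~245 us
cache_pct = steps["hash lookup"] / total * 100    # ~0.03%
network_pct = (steps["tcp round-trip"] + steps["tcp return"]) / total * 100
print(f"total={total:.2f} us, cache={cache_pct:.3f}%, network={network_pct:.1f}%")
```

The two TCP legs alone account for roughly 98% of the budget; every server-side step combined is a rounding error next to them.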

Where Value Size Changes the Calculus

The benchmarks above use 64-byte values, which is the standard benchmark payload but not representative of many production workloads. When values get larger, the serialization and network transfer costs grow linearly, and the picture changes in important ways.

| Value Size | Redis Avg Latency | DragonflyDB Avg Latency | In-Process L1 Latency | Network % of Total |
|---|---|---|---|---|
| 64 B | 0.28 ms | 0.26 ms | 0.000031 ms (31 ns) | 97% |
| 256 B | 0.30 ms | 0.27 ms | 0.000035 ms | 95% |
| 1 KB | 0.35 ms | 0.31 ms | 0.000042 ms | 91% |
| 4 KB | 0.52 ms | 0.44 ms | 0.000065 ms | 85% |
| 16 KB | 1.10 ms | 0.82 ms | 0.000180 ms | 78% |
| 64 KB | 3.40 ms | 2.10 ms | 0.000650 ms | 71% |
| 256 KB | 12.80 ms | 7.30 ms | 0.002400 ms | 65% |

Several patterns emerge from this data. First, at small values (64B-1KB), both systems are dominated by network latency. The serialization cost is negligible. DragonflyDB's per-request latency advantage is minimal (7-11%) because there is simply not much server-side work to optimize.

Second, at medium values (4KB-64KB), DragonflyDB's multi-threaded I/O starts to show a meaningful per-request latency advantage. At 64KB values, DragonflyDB is 38% faster than Redis per-request. The reason is that serialization and network I/O at larger sizes benefit from DragonflyDB's parallel I/O threads more than Redis's limited I/O thread model. The server-side work (serializing a 64KB response) is no longer negligible, and DragonflyDB handles it more efficiently.

Third, at large values (256KB+), both systems are slow. A single GET of a 256KB value takes 7-13 milliseconds. At this point, you are transferring a quarter megabyte over TCP for every cache read. Neither system is designed for this. If your values are this large, caching them in a network-accessible store is inherently expensive, and the question is whether you should be caching them at all, or whether you should restructure your data to avoid storing 256KB blobs in a key-value cache.

Fourth, and most importantly, look at the L1 column. In-process L1 cache latency is 31 nanoseconds for a 64-byte value and 2.4 microseconds for a 256KB value. The in-process L1 is 9,000x faster than Redis and 8,400x faster than DragonflyDB at 64 bytes. At 256KB, it is 5,333x faster than Redis and 3,042x faster than DragonflyDB. The advantage narrows at larger values (because memcpy cost grows linearly with size) but remains orders of magnitude faster at every value size.

When DragonflyDB Is the Right Choice

DragonflyDB is a better Redis for workloads that are throughput-bound rather than latency-bound. Specifically, DragonflyDB is the right choice when:

- Your Redis Cluster has grown to 8+ shards and the operational complexity is becoming a burden. DragonflyDB replaces the cluster with a single instance.
- Your throughput exceeds what a single Redis instance can handle (200K+ ops/sec) and you do not want to manage sharding.
- You are using Redis as a shared data store across many services, and the aggregate query volume is high even if each individual service's query volume is low.
- Your values are in the 4KB-64KB range, where DragonflyDB's parallel I/O provides a meaningful per-request latency improvement over Redis.

DragonflyDB is also a clean operational win for teams that have outgrown single-instance Redis but find Redis Cluster too complex. A single DragonflyDB instance that uses all 64 cores eliminates cluster topology management, slot migration, cross-slot restrictions, and client-side routing. That operational simplicity has real value even if the latency improvement per request is modest.

When DragonflyDB Is Not the Answer

DragonflyDB does not solve the latency problem for applications where per-request cache latency is the bottleneck. If your application makes 3-5 cache reads per incoming request and each read takes 250-300 microseconds, your cache-related latency is 750-1500 microseconds per request. Switching from Redis to DragonflyDB reduces this to 700-1400 microseconds -- a 5-7% improvement. That is not transformative.

DragonflyDB also does not help with the cache hit rate problem. If your hit rate is 60%, DragonflyDB serves that 60% faster in aggregate (throughput) but at similar per-request latency. The 40% of requests that miss the cache still hit your database at 15ms regardless of whether the cache is Redis or DragonflyDB. The root cause of your performance problem is not the cache engine. It is the cache hit rate.
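To make this concrete, here is the effective read latency as a function of hit rate, using this post's figures (~0.28 ms per cache hit, ~15 ms per database miss); the numbers are illustrative:

```python
# Effective read latency as a function of cache hit rate.
# cache_ms and db_ms default to this post's figures: 0.28 ms per
# cache hit (Redis, same-AZ) and 15 ms per database miss.
def effective_latency_ms(hit_rate: float, cache_ms: float = 0.28,
                         db_ms: float = 15.0) -> float:
    return hit_rate * cache_ms + (1.0 - hit_rate) * db_ms

for rate in (0.60, 0.90, 0.99):
    print(f"hit rate {rate:.0%}: {effective_latency_ms(rate):.2f} ms avg")
```

At a 60% hit rate the average is about 6.2 ms. Swapping in DragonflyDB's 0.26 ms hit latency moves that figure by roughly 0.01 ms; raising the hit rate to 90% moves it by more than 4 ms. The hit rate is the lever, not the engine.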

And DragonflyDB cannot help with the fundamental physics problem: network round-trips have a minimum latency that no software optimization can eliminate. Same-AZ TCP round-trip time is 100-200 microseconds. That is the floor for any network-accessible cache, no matter how fast the server-side processing is.

The Benchmark Blind Spot

Published benchmarks for both Redis and DragonflyDB use memtier_benchmark with 50+ client connections running pipelines of 10-100 commands. This measures aggregate throughput under saturation, which is the scenario that maximizes DragonflyDB's advantage. But most production applications are not saturating their cache server. They are making 5,000-50,000 operations per second -- well within Redis's single-threaded capacity. For these workloads, Redis and DragonflyDB perform nearly identically per-request. The 25x throughput advantage is real but irrelevant if your bottleneck is the 250-microsecond per-request latency, not the 200K ops/sec throughput ceiling.

The Third Option: In-Process L1 + Network L2

The Redis vs DragonflyDB comparison assumes that the cache must be a network service. This assumption is so deeply embedded in how we think about caching that most engineers never question it. But it is not a law of physics. It is an architectural choice, and it is the wrong choice for your hottest data.

An in-process L1 cache lives inside your application process. It is a hash map in your process's memory space. A lookup takes 31 nanoseconds -- the time for a few pointer dereferences and a hash computation. There is no TCP. There is no serialization. There is no kernel context switch. There is no network round-trip. The data is in your L2/L3 CPU cache, one memory access away.

The objection is always the same: "but in-process caches cannot be shared across instances." This is correct. An in-process cache on Instance A does not contain data cached by Instance B. But this objection conflates two different requirements. Reads do not need to be shared. If Instance A reads the same session token 1000 times, each of those reads can be served from Instance A's local L1 at 31 nanoseconds. The data does not need to come from a shared server. Writes do need to be shared, because a write on Instance A must eventually be visible on Instance B. But writes are 10-100x less frequent than reads in most workloads.
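A minimal sketch of such an in-process L1 -- a dict with per-entry expiry timestamps -- might look like this. The class and method names are illustrative, not any specific library's API, and a production version would also need bounded size (e.g. LRU eviction) and thread-safety:

```python
import time
from typing import Any, Optional


class L1Cache:
    """Minimal in-process L1 cache: a dict plus per-entry expiry times.

    A hit is a hash lookup and a monotonic clock read -- no TCP,
    no serialization, no kernel context switch.
    """

    def __init__(self, ttl_seconds: float = 10.0):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # lazily evict expired entries on read
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._store[key] = (time.monotonic() + self.ttl, value)

    def invalidate(self, key: str) -> None:
        self._store.pop(key, None)
```

Usage is a plain method call in the application's own process: `cache.set("session:abc", token)` followed by `cache.get("session:abc")`, with the TTL bounding how stale a locally cached entry can be.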

The architecture that captures both requirements is L1 + L2 tiering. L1 is an in-process cache for reads. L2 is a network cache (Redis, DragonflyDB, or any RESP-compatible server) for writes and L1 misses. On a read, check L1 first (31 nanoseconds). If L1 misses, fall through to L2 (250+ microseconds). On a write, write to L2 (which propagates to all instances on their next L1 miss), and optionally invalidate or update L1 locally.

# L1 + L2 tiered architecture
#
# Read path (99% of operations):
#   1. Check L1 (31 ns) -- hit? Return immediately
#   2. L1 miss: Check L2/Redis (250 us) -- hit? Populate L1, return
#   3. L2 miss: Query origin (15 ms) -- populate L2 and L1, return
#
# Write path (1% of operations):
#   1. Write to L2/Redis (250 us)
#   2. Invalidate L1 locally (31 ns)
#   3. Other instances see new value on next L1 miss
#
# Consistency model:
#   - L1 TTL: 5-30 seconds (bounds staleness)
#   - L2: source of truth for shared state
#   - Reads within TTL window may be stale by at most TTL seconds
#   - Writes are immediately consistent on the writing instance

This architecture makes the Redis vs DragonflyDB choice less important. If 90% of your reads hit L1 at 31 nanoseconds, the L2 latency only matters for the remaining 10%. Whether that 10% takes 280 microseconds (Redis) or 260 microseconds (DragonflyDB) is a 2-microsecond difference in your weighted average latency. The L1 layer absorbs the vast majority of reads and makes the L2 engine choice nearly irrelevant for latency-sensitive workloads.
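That weighted-average arithmetic, using this post's figures (31 ns L1 hits, ~280 us Redis or ~260 us DragonflyDB on the L1 miss path), is easy to check:

```python
# Weighted average read latency at a given L1 hit rate, in microseconds.
# l1_us defaults to 0.031 us (31 ns), the in-process L1 figure used here.
def weighted_avg_us(l1_hit_rate: float, l2_us: float,
                    l1_us: float = 0.031) -> float:
    return l1_hit_rate * l1_us + (1.0 - l1_hit_rate) * l2_us

redis_avg = weighted_avg_us(0.90, 280.0)      # ~28.0 us
dragonfly_avg = weighted_avg_us(0.90, 260.0)  # ~26.0 us
print(f"Redis L2: {redis_avg:.1f} us, DragonflyDB L2: {dragonfly_avg:.1f} us, "
      f"difference: {redis_avg - dragonfly_avg:.1f} us")
```

At a 90% L1 hit rate, the choice of L2 engine shifts the weighted average by about 2 microseconds, while the L1 layer has already cut it by an order of magnitude.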

The Honest Comparison

Let us put all three options side by side for a typical production workload: 30,000 operations per second, 80% reads and 20% writes, 512-byte average value size, same-AZ deployment.

| Metric | Redis 7.2 | DragonflyDB 1.15 | L1 + Redis L2 |
|---|---|---|---|
| Throughput capacity | 187K ops/sec | 4.1M ops/sec | 11.7M ops/sec (L1 reads) |
| Avg read latency | 0.30 ms | 0.27 ms | 0.031 ms (at 90% L1 hit) |
| P99 read latency | 0.52 ms | 0.43 ms | 0.32 ms (L1 miss path) |
| Write latency | 0.30 ms | 0.27 ms | 0.30 ms (writes go to L2) |
| Operational complexity | Low (single instance) or High (cluster) | Low (single instance) | Medium (L1 lib + Redis) |
| Cross-instance consistency | Immediate | Immediate | Bounded staleness (TTL) |
| Network dependency | Full | Full | Reads: 10% network, 90% local |
| Hardware utilization | 1-2 cores | All cores | Minimal (L1 is in-app) |

DragonflyDB is a better Redis. It uses all cores, eliminates cluster complexity, and provides a modest per-request latency improvement at larger values. If your choice is between Redis and DragonflyDB and nothing else, DragonflyDB is the better choice for most workloads that have outgrown a single Redis instance.

But framing the choice as Redis vs DragonflyDB misses the larger opportunity. Both are network caches. Both are bound by the network floor. Both require serialization, TCP round-trips, and kernel context switches on every operation. The highest-impact optimization for read-heavy workloads is not a faster network cache. It is eliminating the network for the reads that matter most.

Making the Right Choice for Your Workload

The decision framework is straightforward once you separate throughput from latency. If your primary constraint is throughput -- you are hitting Redis's single-threaded ceiling and need more aggregate operations per second -- DragonflyDB is a clean win. It replaces a multi-shard Redis Cluster with a single multi-threaded instance that uses all available cores. The migration is API-compatible and operationally simpler.

If your primary constraint is per-request latency -- your application feels slow because each cache read takes 250-500 microseconds and you make multiple reads per request -- neither Redis nor DragonflyDB will solve your problem because the latency is in the network, not the server. The fix is an in-process L1 cache at 31 nanoseconds for your hottest reads, with either Redis or DragonflyDB as an L2 for writes and cold reads.

If you need both throughput and latency, the answer is L1 + DragonflyDB as L2. The L1 absorbs 90%+ of reads at 31 nanoseconds (eliminating the throughput pressure on L2), and DragonflyDB handles the remaining 10% of reads plus all writes with multi-threaded efficiency. This combination gives you sub-microsecond average read latency and millions of operations per second of aggregate capacity, without the operational complexity of a Redis Cluster.

The Bottom Line

DragonflyDB is 22-25x faster than Redis on throughput. Per-request latency is 7-38% better depending on value size. Both are honest, measurable improvements. But both systems share the same fundamental constraint: they are network services, and the network is 97% of the total per-request latency. If your bottleneck is throughput, DragonflyDB is the right upgrade from Redis. If your bottleneck is latency, the right upgrade is an in-process L1 cache at 31 nanoseconds that eliminates the network entirely for your hottest reads. DragonflyDB is a better Redis. An L1 cache is a different layer.

In-process L1 caching at 31ns. Eliminate the network for your hottest reads.
