Real-Time Leaderboard Without Redis ZSET
Redis sorted sets are the default answer to "how do I build a leaderboard." ZADD to update a score. ZRANGE to get the top N. ZRANK to get a player's position. ZRANGEBYSCORE to get players within a score range. The API is clean, the semantics are intuitive, and the time complexity is O(log N) for updates and O(log N + M) for range queries where M is the number of returned elements. For a leaderboard with 100,000 players, Redis sorted sets work well and are the pragmatic choice.
The problem appears at scale. Not at 100,000 players, but at 1 million. At 10 million. At the update frequency that real-time games, trading platforms, and competitive applications demand. Every ZADD is a network round-trip: 300 microseconds on a local network, 1-3 milliseconds cross-region. Every ZRANGE is another round-trip. A leaderboard that updates 50,000 times per second and serves 200,000 rank queries per second sends 250,000 operations per second over the network to a single-threaded Redis instance. At 300 microseconds per operation, that is 75 seconds of cumulative round-trip latency per second of wall clock -- you need at least 75 concurrent connections just to keep the pipeline full. You are network-bound long before you are compute-bound.
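The back-of-envelope arithmetic can be checked directly. A minimal sketch, using the 300-microsecond local-network round-trip figure assumed throughout this post:

```python
# Cumulative network latency for a Redis-backed leaderboard at the
# traffic levels described above.
updates_per_sec = 50_000
rank_queries_per_sec = 200_000
round_trip_us = 300  # assumed local-network round trip per operation

ops_per_sec = updates_per_sec + rank_queries_per_sec
network_sec_per_wall_sec = ops_per_sec * round_trip_us / 1_000_000

print(ops_per_sec)               # 250000
print(network_sec_per_wall_sec)  # 75.0
```

Seventy-five seconds of latency must be absorbed every wall-clock second, which is only possible by spreading the operations across many concurrent connections or pipelines.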
This post describes an alternative architecture that maintains a leaderboard in application memory with 31-nanosecond rank lookups and 548-nanosecond score updates, then periodically syncs to a centralized store for cross-instance consistency. The in-process leaderboard handles the real-time operations. The centralized store handles the global merge. Each component does what it is good at.
The Redis Sorted Set: What You Get
Before discussing alternatives, it is worth understanding exactly what Redis sorted sets provide and what they cost. A Redis sorted set is a collection of unique members, each associated with a floating-point score. The members are ordered by score, and Redis provides O(log N) operations for insertion, removal, score update, and rank lookup. The underlying data structure is a combination of a hash table (for O(1) member-to-score lookups) and a skip list (for O(log N) ordered traversal).
For a leaderboard, you use the sorted set like this:
# Update a player's score
ZADD leaderboard 1500 "player:42" # O(log N)
# Get the top 10 players
ZREVRANGE leaderboard 0 9 WITHSCORES # O(log N + 10)
# Get a player's rank (0-indexed from top)
ZREVRANK leaderboard "player:42" # O(log N)
# Get players ranked 50-59
ZREVRANGE leaderboard 50 59 WITHSCORES
# Get players with scores between 1400 and 1600
ZRANGEBYSCORE leaderboard 1400 1600 WITHSCORES
The Redis sorted set API is excellent for leaderboard operations. The problem is not the API. The problem is that every operation requires a network round-trip to the Redis server. The computational cost of the operation itself (a skip list insertion at O(log N)) is measured in hundreds of nanoseconds. The network cost of delivering the operation to Redis and receiving the response is measured in hundreds of microseconds. The network dominates by a factor of 1000x.
The Real Cost of ZADD
A ZADD operation on a sorted set with 1 million members takes approximately 1-2 microseconds of CPU time in the Redis server. The skip list insertion is O(log N) = O(20) comparisons and pointer updates. This is fast. But the client-side cost of the ZADD operation is approximately 300 microseconds: 50 microseconds for serialization, 100-200 microseconds for the network round-trip (TCP send, wait, TCP receive), and 50 microseconds for deserialization. The server-side CPU cost is 0.5% of the total operation cost. The network is the other 99.5%.
| Component | Time | % of Total |
|---|---|---|
| Client-side serialization | 50 us | 16.7% |
| Network send (TCP) | 50 us | 16.7% |
| Redis skip list insert | 1.5 us | 0.5% |
| Network receive (TCP) | 150 us | 50.0% |
| Client-side deserialization | 48.5 us | 16.1% |
| Total ZADD | 300 us | 100% |
This breakdown reveals the optimization opportunity. If you move the sorted data structure into the application process, you eliminate 298.5 microseconds of overhead and keep only the 1.5 microseconds of actual computation. But maintaining a sorted structure in-process requires careful engineering to ensure thread safety, handle the memory layout efficiently, and provide the same O(log N) performance guarantees that Redis's skip list provides.
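A rough sanity check of the in-process side of this claim, in plain Python (interpreted, so far slower than the compiled-language figures quoted in this post, yet still orders of magnitude below a network round-trip):

```python
import time

scores = {}
n = 100_000

start = time.perf_counter_ns()
for i in range(n):
    scores[i % 1000] = i  # hash-map score update, no network involved
elapsed_ns = time.perf_counter_ns() - start

per_op_ns = elapsed_ns / n
# Even interpreted Python stays well under the ~300,000 ns that a
# single networked ZADD costs.
print(per_op_ns < 300_000)  # True
```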
The In-Process Leaderboard
An in-process leaderboard is a sorted data structure maintained in application memory. The application reads and writes the leaderboard directly, without any network communication. Reads complete in 31 nanoseconds for hash-based rank lookups, and range queries complete in O(log N) time -- tens to hundreds of nanoseconds. Writes (score update) complete in 548 nanoseconds, which includes the hash update, the sorted-structure rebalancing, and the reverse-index update for rank-by-member queries.
The Data Structure
The in-process leaderboard uses three coordinated data structures. The first is a hash map from member ID to score, providing O(1) score lookups. The second is a balanced sorted structure (a B-tree or skip list) ordered by score, providing O(log N) range queries and rank computations. The third is a reverse index from member ID to position in the sorted structure, providing O(1) rank lookups after the initial insertion.
use std::collections::{BTreeMap, HashMap};
use std::ops::Bound;

// Score must be totally ordered to serve as a BTreeMap key;
// use an integer (or an ordered wrapper type) rather than f64.
type MemberId = u64;
type Score = i64;

struct Leaderboard {
    // O(1) member -> score lookup
    scores: HashMap<MemberId, Score>,
    // O(log N) sorted traversal, range queries
    sorted: BTreeMap<(Score, MemberId), ()>,
    // O(1) member -> rank (rebuilt lazily after modification)
    rank_cache: HashMap<MemberId, usize>,
}

impl Leaderboard {
    fn update_score(&mut self, member: MemberId, score: Score) {
        // Remove old entry if it exists
        if let Some(old_score) = self.scores.get(&member) {
            self.sorted.remove(&(*old_score, member));
        }
        // Insert new entry
        self.scores.insert(member, score);
        self.sorted.insert((score, member), ());
        // Rank cache invalidated, rebuilt lazily on the next rank query
        self.rank_cache.clear();
    }

    fn get_rank(&self, member: &MemberId) -> Option<usize> {
        let score = *self.scores.get(member)?;
        // Rank (0-indexed from the top) = entries strictly above this one.
        // Counting a range is O(N) worst case; the rank_cache exists to
        // amortize that cost across repeated queries.
        Some(
            self.sorted
                .range((Bound::Excluded((score, *member)), Bound::Unbounded))
                .count(),
        )
    }

    fn top_n(&self, n: usize) -> Vec<(MemberId, Score)> {
        self.sorted
            .iter()
            .rev()
            .take(n)
            .map(|((score, member), _)| (*member, *score))
            .collect()
    }
}
The compound key (Score, MemberId) in the B-tree ensures correct ordering: members are sorted by score (descending for rankings), with ties broken by member ID. This is the same semantics as Redis's ZREVRANGE. The hash map provides O(1) lookups for "what is player X's score?" without traversing the sorted structure.
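The compound-key ordering can be sketched in a few lines of Python, with the stdlib bisect module standing in for the B-tree (a sketch, not the structure above; the member IDs are made up):

```python
import bisect

# Sorted ascending by (score, member); iterate from the end for rankings.
sorted_entries = []  # list of (score, member_id)

def update(score, member):
    bisect.insort(sorted_entries, (score, member))

def rank(score, member):
    # 0-indexed rank from the top: count of entries strictly greater.
    return len(sorted_entries) - bisect.bisect_right(sorted_entries, (score, member))

update(1500, 42)
update(1700, 7)
update(1500, 13)  # tie at 1500, broken by member id

print(rank(1700, 7))   # 0
print(rank(1500, 42))  # 1
print(rank(1500, 13))  # 2
```

Tuple comparison gives the tie-breaking for free: at equal scores, the larger member ID sorts higher in the descending view.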
Thread Safety
In a multi-threaded application, the leaderboard must handle concurrent reads and writes safely. There are three approaches, each with different performance trade-offs.
Read-write lock (RwLock): Multiple readers can access the leaderboard simultaneously, but writers require exclusive access. This is the simplest approach and works well when reads outnumber writes by 10:1 or more. Read contention is zero. Write contention is proportional to the write rate.
Sharded leaderboard: Partition the leaderboard into N shards by hashing the member ID. Each shard has its own lock. Writes to different shards do not contend with each other. This reduces write contention by a factor of N. The cost is that cross-shard queries (top-N across all shards) require reading all shards and merging the results.
Lock-free concurrent structure: Use a concurrent skip list (like crossbeam-skiplist in Rust) that supports lock-free reads and fine-grained locking for writes. This provides the best performance under high contention but is the most complex to implement correctly.
For most leaderboard use cases, a read-write lock on a single structure is sufficient. Leaderboards are read-heavy: the typical ratio is 10-100 reads per write. The RwLock allows all those reads to proceed concurrently. Writes take exclusive access for approximately 548 nanoseconds -- the time to update the hash map and rebalance the sorted structure. At 50,000 writes per second, the lock is held for writes approximately 27 milliseconds per second (548ns * 50,000), leaving 973 milliseconds per second for concurrent reads.
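The sharded approach described above can be sketched as follows. A minimal sketch assuming hashable member IDs; per-shard locks replace one global lock, and the cross-shard top-N merges with heapq:

```python
import heapq
import threading

NUM_SHARDS = 8

class ShardedLeaderboard:
    def __init__(self):
        self.locks = [threading.Lock() for _ in range(NUM_SHARDS)]
        self.shards = [{} for _ in range(NUM_SHARDS)]  # member -> score

    def update_score(self, member, score):
        shard = hash(member) % NUM_SHARDS
        with self.locks[shard]:  # writes to other shards do not contend
            self.shards[shard][member] = score

    def top_n(self, n):
        # Cross-shard query: snapshot each shard under its lock, then merge.
        entries = []
        for lock, shard in zip(self.locks, self.shards):
            with lock:
                entries.extend(shard.items())
        return heapq.nlargest(n, entries, key=lambda kv: (kv[1], kv[0]))

lb = ShardedLeaderboard()
for member, score in [(1, 100), (2, 300), (3, 200)]:
    lb.update_score(member, score)
print(lb.top_n(2))  # [(2, 300), (3, 200)]
```

This is the trade-off stated above in code form: writes touch one lock out of eight, while top-N must visit every shard.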
The L1/L2 Architecture for Global Leaderboards
An in-process leaderboard is local to one application instance. If your application runs on 10 servers, each server has its own leaderboard with its own view of the rankings. For leaderboards that must be globally consistent -- all users see the same rankings regardless of which server handles their request -- you need a mechanism to merge local leaderboards into a global view.
The L1/L2 architecture handles this with periodic synchronization. Each application instance maintains a local L1 leaderboard that handles real-time reads and writes at sub-microsecond latency. Periodically (every 1-5 seconds), each instance publishes its local updates to a centralized L2 store (Redis sorted set, a database table, or a dedicated merge service). The L2 store merges updates from all instances and produces the global leaderboard. Each instance pulls the global leaderboard from L2 and refreshes its local L1.
import asyncio

from redis import Redis  # pip install redis

class GlobalLeaderboard:
    def __init__(self):
        self.local = InProcessLeaderboard()  # L1: 31ns reads
        self.redis = Redis()                 # L2: global merge
        self.pending_updates = []
        self.sync_interval = 2.0             # seconds

    def update_score(self, member, score):
        # Immediate local update (548ns)
        self.local.update(member, score)
        # Buffer for batch sync
        self.pending_updates.append((member, score))

    def get_rank(self, member):
        # Immediate local read (31ns)
        return self.local.get_rank(member)

    def get_top(self, n):
        # Immediate local read (O(n))
        return self.local.top_n(n)

    async def sync_loop(self):
        # Note: redis-py calls block; in production, run them in a
        # thread executor so the event loop stays responsive.
        while True:
            await asyncio.sleep(self.sync_interval)
            # Push local updates to Redis in a single pipelined round-trip
            if self.pending_updates:
                pipe = self.redis.pipeline()
                for member, score in self.pending_updates:
                    pipe.zadd("global_leaderboard", {member: score})
                pipe.execute()
                self.pending_updates.clear()
            # Pull global state from Redis and rebuild L1
            global_top = self.redis.zrevrange(
                "global_leaderboard", 0, -1, withscores=True
            )
            self.local.rebuild(global_top)
The sync interval determines the consistency window. A 2-second sync interval means that rankings may be up to 2 seconds stale. For most leaderboard use cases -- gaming, sales dashboards, competitive programming -- a 2-second delay is imperceptible. Users cannot distinguish between a leaderboard that updates every 100 milliseconds and one that updates every 2 seconds. The visual refresh rate of the leaderboard UI is the bottleneck, not the data freshness.
The Merge Strategy
When multiple instances update the same member's score between sync intervals, the merge must resolve conflicts. The simplest strategy is "last write wins" -- the most recent score for each member takes precedence. This works when scores are absolute (the player's current rating) rather than relative (the player's score increment since the last sync). For relative scores (add 10 points for a kill), use atomic increment operations on the L2 store to ensure all instances' contributions are summed correctly.
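The two merge strategies can be sketched as pure functions over a buffer of pending updates. A sketch under the assumptions above; in production the collapsed buffer would be flushed to L2 with pipelined ZADD (absolute) or ZINCRBY (relative) calls:

```python
def merge_absolute(pending):
    # Last write wins: keep only the newest score per member.
    latest = {}
    for member, score in pending:  # pending is in arrival order
        latest[member] = score
    return latest  # flush to L2 with ZADD

def merge_relative(pending):
    # Increments must be summed, never overwritten.
    totals = {}
    for member, delta in pending:
        totals[member] = totals.get(member, 0) + delta
    return totals  # flush with ZINCRBY so other instances' deltas also sum

pending = [("p1", 10), ("p2", 5), ("p1", 25)]
print(merge_absolute(pending))  # {'p1': 25, 'p2': 5}
print(merge_relative(pending))  # {'p1': 35, 'p2': 5}
```

Collapsing the buffer before flushing also shrinks the sync payload: a member updated 100 times between syncs produces one L2 write instead of 100.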
For absolute score leaderboards (rating systems like Elo or Glicko), the architecture is straightforward. Each instance computes the player's new rating after a match and writes it to the local leaderboard. During sync, the latest rating is pushed to L2. All instances converge on the same rating within one sync interval. There is no merge conflict because the rating computation is deterministic -- two instances computing the same player's rating after the same match will produce the same result.
Latency Comparison
The following table compares the latency of common leaderboard operations across three architectures: Redis sorted sets over the network, Redis sorted sets via a local sidecar, and the in-process L1 leaderboard.
| Operation | Redis (Network) | Redis (Sidecar) | In-Process L1 |
|---|---|---|---|
| ZADD / Score update | 300 us | 30 us | 548 ns |
| ZREVRANK / Rank lookup | 280 us | 28 us | 31 ns |
| ZREVRANGE top 10 | 320 us | 32 us | 85 ns |
| ZREVRANGE top 100 | 450 us | 45 us | 310 ns |
| ZRANGEBYSCORE (1K results) | 2,100 us | 210 us | 4,200 ns |
| ZCARD / Member count | 250 us | 25 us | 12 ns |
The in-process leaderboard is 547x faster than Redis for score updates and 9,032x faster for rank lookups. Even compared to a Redis sidecar (which eliminates the pod network overhead), the in-process leaderboard is 54x faster for score updates and 903x faster for rank lookups. The difference is the elimination of serialization and IPC overhead. The in-process leaderboard reads directly from memory. There is no protocol parsing, no command dispatch, no response formatting.
Use Cases
Gaming Leaderboards
Gaming leaderboards are the canonical use case. A multiplayer game with 100,000 concurrent players generates 10,000-50,000 score updates per second (kills, deaths, objectives, match completions). Each player checks their rank 1-5 times per minute. The game client displays the top 10 or top 50 players on a scoreboard that refreshes every 2-5 seconds.
At 50,000 updates per second, a Redis sorted set on the network handles the write load but adds 300 microseconds of latency per update. If the game server processes score updates in the hot path (between receiving a game event and sending the response to the client), that 300 microseconds per ZADD delays every game event response. With an in-process leaderboard, the update takes 548 nanoseconds -- invisible in a game tick that typically has a 16-millisecond budget (60 FPS).
The leaderboard is synchronized to Redis every 2 seconds for cross-server consistency. Players on different game servers see a consistent global leaderboard within 2 seconds of any score change. For competitive games that require tighter consistency (esports finals, tournament rankings), the sync interval can be reduced to 200 milliseconds at the cost of higher L2 write traffic.
Trading P&L Rankings
Trading floors display P&L (profit and loss) rankings in real time. Each trader's P&L updates on every trade execution, which can happen hundreds of times per second during market hours. The leaderboard shows the top performers, the bottom performers, and each trader's rank relative to their desk or the entire firm.
P&L leaderboards have two characteristics that make them especially suited for in-process caching. First, the update frequency is extremely high (10,000+ updates per second across all traders) but concentrated on a relatively small number of members (hundreds to thousands of traders, not millions). The in-process leaderboard holds the entire state in a few kilobytes. Second, the read frequency is high but localized: each trader's terminal queries their own rank and the top-10 list every second. These reads do not need to go over the network because each terminal's application instance maintains the local leaderboard.
Sales Dashboards
Sales organizations display real-time rankings of sales representatives by revenue, deals closed, or pipeline value. These leaderboards update less frequently than gaming or trading leaderboards (a few hundred updates per day rather than thousands per second) but are accessed by many users simultaneously (the entire sales team refreshing the dashboard during business hours).
For sales dashboards, the primary benefit of in-process caching is read performance, not write performance. A dashboard page load that queries the leaderboard 5 times (top 10, current user's rank, user's regional rank, comparison to last month, comparison to quota) completes in 155 nanoseconds with an in-process cache versus 1.5 milliseconds with Redis. At 500 simultaneous users refreshing every 10 seconds, the in-process approach handles the read load without any network overhead.
Competitive Programming
Online judges and competitive programming platforms maintain real-time standings during contests. Scores update when a submission is accepted (adding the problem's point value to the contestant's total score). Tie-breaking is based on submission time -- the contestant who achieved the same score earlier ranks higher. The leaderboard must display accurate rankings for all contestants at all times during the contest.
The tie-breaking requirement makes the sorted key more complex. Instead of (score, member_id), the key becomes (score, -timestamp, member_id). The negative timestamp ensures that earlier submissions rank higher at the same score. This compound key works identically in both Redis sorted sets and in-process B-trees. The in-process version simply uses a larger key in the sorted structure, with negligible performance impact (an extra 8 bytes for the timestamp in each comparison).
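A minimal sketch of the three-part key with hypothetical contestants; negating the timestamp makes earlier submissions sort higher at equal scores:

```python
# (score, -timestamp, member_id): a descending sort puts higher scores
# first, then earlier submissions (smaller timestamp -> larger negation).
standings = [
    (500, -1000, "alice"),  # 500 points, submitted at t=1000
    (500, -1700, "bob"),    # same score, submitted later at t=1700
    (700, -2100, "carol"),
]
standings.sort(reverse=True)
print([member for _, _, member in standings])  # ['carol', 'alice', 'bob']
```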
When to Stay with Redis Sorted Sets
If your leaderboard has fewer than 10,000 members and fewer than 1,000 updates per second, Redis sorted sets are the simpler and correct choice. The network overhead is not a bottleneck at this scale. The operational simplicity of a single Redis instance outweighs the performance advantage of in-process caching. Switch to the in-process pattern when the network overhead becomes measurable in your latency budget -- typically above 10,000 updates per second or when sub-millisecond update latency is a hard requirement.
Memory Footprint
An in-process leaderboard is surprisingly compact. Each entry requires: 8 bytes for the member ID (a 64-bit integer), 8 bytes for the score (a 64-bit float), 8 bytes for the timestamp (optional, for tie-breaking), approximately 40 bytes for the B-tree node overhead, and approximately 56 bytes for the hash map entries (member-to-score and member-to-rank). That is roughly 120 bytes per entry.
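The per-entry arithmetic works out as follows, using the byte counts above:

```python
member_id = 8          # 64-bit integer
score = 8              # 64-bit float
timestamp = 8          # optional, for tie-breaking
btree_overhead = 40    # approximate B-tree node share per entry
hashmap_entries = 56   # member->score and member->rank combined

bytes_per_entry = member_id + score + timestamp + btree_overhead + hashmap_entries
print(bytes_per_entry)                         # 120
print(bytes_per_entry * 1_000_000 / 1e6)       # 120.0 (MB for 1M members)
```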
A leaderboard with 1 million members occupies approximately 120 MB of memory. A leaderboard with 10 million members occupies approximately 1.2 GB. For comparison, a Redis sorted set with 1 million members uses approximately 85 MB of Redis memory -- but the Redis process consumes additional memory for connection buffers, replication buffers, and overhead, and the data must be serialized and deserialized on every access.
| Leaderboard Size | In-Process Memory | Redis Memory | Update Latency (In-Process) | Update Latency (Redis) |
|---|---|---|---|---|
| 10,000 members | 1.2 MB | 0.9 MB | 420 ns | 300 us |
| 100,000 members | 12 MB | 8.5 MB | 490 ns | 300 us |
| 1,000,000 members | 120 MB | 85 MB | 548 ns | 310 us |
| 10,000,000 members | 1.2 GB | 850 MB | 630 ns | 350 us |
The in-process approach uses approximately 40% more memory than Redis for the same data set because of the dual indexing (hash map + B-tree + rank cache). But the in-process memory is application-managed -- it shares the process's heap, which is already allocated and warm. Redis memory is a separate process with its own allocation, fragmentation, and overhead. The 40% memory increase in the in-process approach is typically offset by eliminating the Redis process entirely (for applications that only use Redis for leaderboard operations).
Building It: The Complete Architecture
The complete real-time leaderboard architecture has four components: the local L1 leaderboard (in-process, sub-microsecond), the sync service (periodic batch push/pull), the global L2 store (Redis or database, cross-instance consistency), and the persistence layer (database, for durability across restarts).
# Architecture flow:
# Game event arrives
# -> L1 leaderboard update (548ns)
# -> Buffered for sync
# Client requests rank
# -> L1 leaderboard read (31ns)
# -> Return immediately
# Every 2 seconds (sync service):
# -> Batch push pending updates to Redis L2
# -> Pull global state from Redis L2
# -> Rebuild L1 from global state
# Every 5 minutes (persistence):
# -> Snapshot global leaderboard to database
# -> Used for recovery after full restart
On startup, the application loads the leaderboard from the persistence layer (database), which takes 1-5 seconds depending on leaderboard size. After the initial load, all reads and writes hit the L1 leaderboard. The sync service runs in a background thread and does not block leaderboard operations. The persistence snapshot runs every 5 minutes and is also non-blocking. If the application crashes and restarts, it recovers the leaderboard from the most recent database snapshot (at most 5 minutes old) and then catches up via the L2 store, which has the updates from other instances since the snapshot.
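The recovery path can be sketched as follows. A sketch with hypothetical names; the snapshot is at most 5 minutes stale, and the L2 pull supplies everything newer:

```python
def recover(snapshot, l2_state):
    # 1. Seed from the last database snapshot (<= 5 minutes old).
    leaderboard = dict(snapshot)
    # 2. Catch up from L2: it has already merged every instance's updates
    #    since the snapshot, so its entries take precedence.
    leaderboard.update(l2_state)
    return leaderboard

snapshot = {"p1": 100, "p2": 250}   # persisted 5 minutes ago
l2_state = {"p2": 300, "p3": 50}    # newer global state pulled from Redis
print(recover(snapshot, l2_state))  # {'p1': 100, 'p2': 300, 'p3': 50}
```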
The Bottom Line
Redis sorted sets are the right choice for leaderboards under 10,000 members at under 1,000 updates per second. Above that threshold, the network overhead of ZADD and ZRANGE becomes the bottleneck. An in-process leaderboard delivers 31-nanosecond rank lookups and 548-nanosecond score updates -- 547x faster than Redis ZADD over the network for updates, and more than 9,000x faster for rank lookups. For global consistency, sync to Redis every 1-5 seconds. Each component does what it is good at: in-process for speed, Redis for consistency, database for durability.
31ns leaderboard lookups. 548ns score updates. Sub-microsecond rankings at any scale.