Real-Time Leaderboard Without Redis ZSET
Redis sorted sets are the default answer to "how do I build a leaderboard." ZADD to update a score. ZRANGE to get the top N. ZRANK to get a player's position. ZRANGEBYSCORE to get players within a score range. The API is clean, the semantics are intuitive, and the time complexity is O(log N) for updates and O(log N + M) for range queries where M is the number of returned elements. For a leaderboard with 100,000 players, Redis sorted sets work well and are the pragmatic choice.
The problem appears at scale. Not at 100,000 players, but at 1 million. At 10 million. At the update frequency that real-time games, trading platforms, and competitive applications demand. Every ZADD is a network round-trip: 300 microseconds on a local network, 1-3 milliseconds cross-region. Every ZRANGE is another round-trip. A leaderboard that updates 50,000 times per second and serves 200,000 rank queries per second sends 250,000 operations per second over the network to a single-threaded Redis instance. At 300 microseconds per operation, that is 75 seconds of cumulative round-trip latency per second of wall clock -- you need at least 75 concurrent connections just to keep the pipeline full. You are network-bound long before you are compute-bound.
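The back-of-envelope arithmetic can be checked directly. A minimal sketch, using the 300-microsecond local-network round-trip figure assumed throughout this post:

```python
# Cumulative network latency for a Redis-backed leaderboard at the
# traffic levels described above.
updates_per_sec = 50_000
rank_queries_per_sec = 200_000
round_trip_us = 300  # assumed local-network round trip per operation

ops_per_sec = updates_per_sec + rank_queries_per_sec
network_sec_per_wall_sec = ops_per_sec * round_trip_us / 1_000_000

print(ops_per_sec)               # 250000
print(network_sec_per_wall_sec)  # 75.0
```

Seventy-five seconds of latency must be absorbed every wall-clock second, which is only possible by spreading the operations across many concurrent connections or pipelines.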
This post describes an alternative architecture that maintains a leaderboard in application memory with 31-nanosecond rank lookups and 548-nanosecond score updates, then periodically syncs to a centralized store for cross-instance consistency. The in-process leaderboard handles the real-time operations. The centralized store handles the global merge. Each component does what it is good at.
The Redis Sorted Set: What You Get
Before discussing alternatives, it is worth understanding exactly what Redis sorted sets provide and what they cost. A Redis sorted set is a collection of unique members, each associated with a floating-point score. The members are ordered by score, and Redis provides O(log N) operations for insertion, removal, score update, and rank lookup. The underlying data structure is a combination of a hash table (for O(1) member-to-score lookups) and a skip list (for O(log N) ordered traversal).
For a leaderboard, you use the sorted set like this:
# Update a player's score
ZADD leaderboard 1500 "player:42" # O(log N)
# Get the top 10 players
ZREVRANGE leaderboard 0 9 WITHSCORES # O(log N + 10)
# Get a player's rank (0-indexed from top)
ZREVRANK leaderboard "player:42" # O(log N)
# Get players ranked 50-59
ZREVRANGE leaderboard 50 59 WITHSCORES
# Get players with scores between 1400 and 1600
ZRANGEBYSCORE leaderboard 1400 1600 WITHSCORES
The Redis sorted set API is excellent for leaderboard operations. The problem is not the API. The problem is that every operation requires a network round-trip to the Redis server. The computational cost of the operation itself (a skip list insertion at O(log N)) is measured in hundreds of nanoseconds. The network cost of delivering the operation to Redis and receiving the response is measured in hundreds of microseconds. The network dominates by a factor of 1000x.
The Real Cost of ZADD
A ZADD operation on a sorted set with 1 million members takes approximately 1-2 microseconds of CPU time in the Redis server. The skip list insertion is O(log N) = O(20) comparisons and pointer updates. This is fast. But the client-side cost of the ZADD operation is approximately 300 microseconds: 50 microseconds for serialization, 100-200 microseconds for the network round-trip (TCP send, wait, TCP receive), and 50 microseconds for deserialization. The server-side CPU cost is 0.5% of the total operation cost. The network is the other 99.5%.
| Component | Time | % of Total |
|---|---|---|
| Client-side serialization | 50 us | 16.7% |
| Network send (TCP) | 50 us | 16.7% |
| Redis skip list insert | 1.5 us | 0.5% |
| Network receive (TCP) | 150 us | 50.0% |
| Client-side deserialization | 48.5 us | 16.1% |
| Total ZADD | 300 us | 100% |
This breakdown reveals the optimization opportunity. If you move the sorted data structure into the application process, you eliminate 298.5 microseconds of overhead and keep only the 1.5 microseconds of actual computation. But maintaining a sorted structure in-process requires careful engineering to ensure thread safety, handle the memory layout efficiently, and provide the same O(log N) performance guarantees that Redis's skip list provides.
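A rough sanity check of the in-process side of this claim, in plain Python (interpreted, so far slower than the compiled-language figures quoted in this post, yet still orders of magnitude below a network round-trip):

```python
import time

scores = {}
n = 100_000

start = time.perf_counter_ns()
for i in range(n):
    scores[i % 1000] = i  # hash-map score update, no network involved
elapsed_ns = time.perf_counter_ns() - start

per_op_ns = elapsed_ns / n
# Even interpreted Python stays well under the ~300,000 ns that a
# single networked ZADD costs.
print(per_op_ns < 300_000)  # True
```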
The In-Process Leaderboard
An in-process leaderboard is a sorted data structure maintained in application memory. The application reads and writes the leaderboard directly, without any network communication. Reads complete in 31 nanoseconds for hash-based rank lookups, and range queries complete in O(log N) time -- tens to hundreds of nanoseconds. Writes (score update) complete in 548 nanoseconds, which includes the hash update, the sorted-structure rebalancing, and the reverse-index update for rank-by-member queries.
The Data Structure
The in-process leaderboard uses three coordinated data structures. The first is a hash map from member ID to score, providing O(1) score lookups. The second is a balanced sorted structure (a B-tree or skip list) ordered by score, providing O(log N) range queries and rank computations. The third is a reverse index from member ID to position in the sorted structure, providing O(1) rank lookups after the initial insertion.
use std::collections::{BTreeMap, HashMap};
use std::ops::Bound;

// Score must be totally ordered to serve as a BTreeMap key;
// use an integer (or an ordered wrapper type) rather than f64.
type MemberId = u64;
type Score = i64;

struct Leaderboard {
    // O(1) member -> score lookup
    scores: HashMap<MemberId, Score>,
    // O(log N) sorted traversal, range queries
    sorted: BTreeMap<(Score, MemberId), ()>,
    // O(1) member -> rank (rebuilt lazily after modification)
    rank_cache: HashMap<MemberId, usize>,
}

impl Leaderboard {
    fn update_score(&mut self, member: MemberId, score: Score) {
        // Remove old entry if it exists
        if let Some(old_score) = self.scores.get(&member) {
            self.sorted.remove(&(*old_score, member));
        }
        // Insert new entry
        self.scores.insert(member, score);
        self.sorted.insert((score, member), ());
        // Rank cache invalidated, rebuilt lazily on the next rank query
        self.rank_cache.clear();
    }

    fn get_rank(&self, member: &MemberId) -> Option<usize> {
        let score = *self.scores.get(member)?;
        // Rank (0-indexed from the top) = entries strictly above this one.
        // Counting a range is O(N) worst case; the rank_cache exists to
        // amortize that cost across repeated queries.
        Some(
            self.sorted
                .range((Bound::Excluded((score, *member)), Bound::Unbounded))
                .count(),
        )
    }

    fn top_n(&self, n: usize) -> Vec<(MemberId, Score)> {
        self.sorted
            .iter()
            .rev()
            .take(n)
            .map(|((score, member), _)| (*member, *score))
            .collect()
    }
}
The compound key (Score, MemberId) in the B-tree ensures correct ordering: members are sorted by score (descending for rankings), with ties broken by member ID. This is the same semantics as Redis's ZREVRANGE. The hash map provides O(1) lookups for "what is player X's score?" without traversing the sorted structure.
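The compound-key ordering can be sketched in a few lines of Python, with the stdlib bisect module standing in for the B-tree (a sketch, not the structure above; the member IDs are made up):

```python
import bisect

# Sorted ascending by (score, member); iterate from the end for rankings.
sorted_entries = []  # list of (score, member_id)

def update(score, member):
    bisect.insort(sorted_entries, (score, member))

def rank(score, member):
    # 0-indexed rank from the top: count of entries strictly greater.
    return len(sorted_entries) - bisect.bisect_right(sorted_entries, (score, member))

update(1500, 42)
update(1700, 7)
update(1500, 13)  # tie at 1500, broken by member id

print(rank(1700, 7))   # 0
print(rank(1500, 42))  # 1
print(rank(1500, 13))  # 2
```

Tuple comparison gives the tie-breaking for free: at equal scores, the larger member ID sorts higher in the descending view.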
Thread Safety
In a multi-threaded application, the leaderboard must handle concurrent reads and writes safely. There are three approaches, each with different performance trade-offs.
Read-write lock (RwLock): Multiple readers can access the leaderboard simultaneously, but writers require exclusive access. This is the simplest approach and works well when reads outnumber writes by 10:1 or more. Read contention is zero. Write contention is proportional to the write rate.
Sharded leaderboard: Partition the leaderboard into N shards by hashing the member ID. Each shard has its own lock. Writes to different shards do not contend with each other. This reduces write contention by a factor of N. The cost is that cross-shard queries (top-N across all shards) require reading all shards and merging the results.
Lock-free concurrent structure: Use a concurrent skip list (like crossbeam-skiplist in Rust) that supports lock-free reads and fine-grained locking for writes. This provides the best performance under high contention but is the most complex to implement correctly.
For most leaderboard use cases, a read-write lock on a single structure is sufficient. Leaderboards are read-heavy: the typical ratio is 10-100 reads per write. The RwLock allows all those reads to proceed concurrently. Writes take exclusive access for approximately 548 nanoseconds -- the time to update the hash map and rebalance the sorted structure. At 50,000 writes per second, the lock is held for writes approximately 27 milliseconds per second (548ns * 50,000), leaving 973 milliseconds per second for concurrent reads.
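The sharded approach described above can be sketched as follows. A minimal sketch assuming hashable member IDs; per-shard locks replace one global lock, and the cross-shard top-N merges with heapq:

```python
import heapq
import threading

NUM_SHARDS = 8

class ShardedLeaderboard:
    def __init__(self):
        self.locks = [threading.Lock() for _ in range(NUM_SHARDS)]
        self.shards = [{} for _ in range(NUM_SHARDS)]  # member -> score

    def update_score(self, member, score):
        shard = hash(member) % NUM_SHARDS
        with self.locks[shard]:  # writes to other shards do not contend
            self.shards[shard][member] = score

    def top_n(self, n):
        # Cross-shard query: snapshot each shard under its lock, then merge.
        entries = []
        for lock, shard in zip(self.locks, self.shards):
            with lock:
                entries.extend(shard.items())
        return heapq.nlargest(n, entries, key=lambda kv: (kv[1], kv[0]))

lb = ShardedLeaderboard()
for member, score in [(1, 100), (2, 300), (3, 200)]:
    lb.update_score(member, score)
print(lb.top_n(2))  # [(2, 300), (3, 200)]
```

This is the trade-off stated above in code form: writes touch one lock out of eight, while top-N must visit every shard.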
The L1/L2 Architecture for Global Leaderboards
An in-process leaderboard is local to one application instance. If your application runs on 10 servers, each server has its own leaderboard with its own view of the rankings. For leaderboards that must be globally consistent -- all users see the same rankings regardless of which server handles their request -- you need a mechanism to merge local leaderboards into a global view.
The L1/L2 architecture handles this with periodic synchronization. Each application instance maintains a local L1 leaderboard that handles real-time reads and writes at sub-microsecond latency. Periodically (every 1-5 seconds), each instance publishes its local updates to a centralized L2 store (Redis sorted set, a database table, or a dedicated merge service). The L2 store merges updates from all instances and produces the global leaderboard. Each instance pulls the global leaderboard from L2 and refreshes its local L1.
import asyncio

from redis import Redis  # pip install redis

class GlobalLeaderboard:
    def __init__(self):
        self.local = InProcessLeaderboard()  # L1: 31ns reads
        self.redis = Redis()                 # L2: global merge
        self.pending_updates = []
        self.sync_interval = 2.0             # seconds

    def update_score(self, member, score):
        # Immediate local update (548ns)
        self.local.update(member, score)
        # Buffer for batch sync
        self.pending_updates.append((member, score))

    def get_rank(self, member):
        # Immediate local read (31ns)
        return self.local.get_rank(member)

    def get_top(self, n):
        # Immediate local read (O(n))
        return self.local.top_n(n)

    async def sync_loop(self):
        # Note: redis-py calls block; in production, run them in a
        # thread executor so the event loop stays responsive.
        while True:
            await asyncio.sleep(self.sync_interval)
            # Push local updates to Redis in a single pipelined round-trip
            if self.pending_updates:
                pipe = self.redis.pipeline()
                for member, score in self.pending_updates:
                    pipe.zadd("global_leaderboard", {member: score})
                pipe.execute()
                self.pending_updates.clear()
            # Pull global state from Redis and rebuild L1
            global_top = self.redis.zrevrange(
                "global_leaderboard", 0, -1, withscores=True
            )
            self.local.rebuild(global_top)
The sync interval determines the consistency window. A 2-second sync interval means that rankings may be up to 2 seconds stale. For most leaderboard use cases -- gaming, sales dashboards, competitive programming -- a 2-second delay is imperceptible. Users cannot distinguish between a leaderboard that updates every 100 milliseconds and one that updates every 2 seconds. The visual refresh rate of the leaderboard UI is the bottleneck, not the data freshness.
The Merge Strategy
When multiple instances update the same member's score between sync intervals, the merge must resolve conflicts. The simplest strategy is "last write wins" -- the most recent score for each member takes precedence. This works when scores are absolute (the player's current rating) rather than relative (the player's score increment since the last sync). For relative scores (add 10 points for a kill), use atomic increment operations on the L2 store to ensure all instances' contributions are summed correctly.
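The two merge strategies can be sketched as pure functions over a buffer of pending updates. A sketch under the assumptions above; in production the collapsed buffer would be flushed to L2 with pipelined ZADD (absolute) or ZINCRBY (relative) calls:

```python
def merge_absolute(pending):
    # Last write wins: keep only the newest score per member.
    latest = {}
    for member, score in pending:  # pending is in arrival order
        latest[member] = score
    return latest  # flush to L2 with ZADD

def merge_relative(pending):
    # Increments must be summed, never overwritten.
    totals = {}
    for member, delta in pending:
        totals[member] = totals.get(member, 0) + delta
    return totals  # flush with ZINCRBY so other instances' deltas also sum

pending = [("p1", 10), ("p2", 5), ("p1", 25)]
print(merge_absolute(pending))  # {'p1': 25, 'p2': 5}
print(merge_relative(pending))  # {'p1': 35, 'p2': 5}
```

Collapsing the buffer before flushing also shrinks the sync payload: a member updated 100 times between syncs produces one L2 write instead of 100.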
For absolute score leaderboards (rating systems like Elo or Glicko), the architecture is straightforward. Each instance computes the player's new rating after a match and writes it to the local leaderboard. During sync, the latest rating is pushed to L2. All instances converge on the same rating within one sync interval. There is no merge conflict because the rating computation is deterministic -- two instances computing the same player's rating after the same match will produce the same result.
Latency Comparison
The following table compares the latency of common leaderboard operations across three architectures: Redis sorted sets over the network, Redis sorted sets via a local sidecar, and the in-process L1 leaderboard.
| Operation | Redis (Network) | Redis (Sidecar) | In-Process L1 |
|---|---|---|---|
| ZADD / Score update | 300 us | 30 us | 548 ns |
| ZREVRANK / Rank lookup | 280 us | 28 us | 31 ns |
| ZREVRANGE top 10 | 320 us | 32 us | 85 ns |
| ZREVRANGE top 100 | 450 us | 45 us | 310 ns |
| ZRANGEBYSCORE (1K results) | 2,100 us | 210 us | 4,200 ns |
| ZCARD / Member count | 250 us | 25 us | 12 ns |
The in-process leaderboard is 547x faster than Redis for score updates and 9,032x faster for rank lookups. Even compared to a Redis sidecar (which eliminates the pod network overhead), the in-process leaderboard is 54x faster for score updates and 903x faster for rank lookups. The difference is the elimination of serialization and IPC overhead. The in-process leaderboard reads directly from memory. There is no protocol parsing, no command dispatch, no response formatting.
Use Cases
Gaming Leaderboards
Gaming leaderboards are the canonical use case. A multiplayer game with 100,000 concurrent players generates 10,000-50,000 score updates per second (kills, deaths, objectives, match completions). Each player checks their rank 1-5 times per minute. The game client displays the top 10 or top 50 players on a scoreboard that refreshes every 2-5 seconds.
At 50,000 updates per second, a Redis sorted set on the network handles the write load but adds 300 microseconds of latency per update. If the game server processes score updates in the hot path (between receiving a game event and sending the response to the client), that 300 microseconds per ZADD delays every game event response. With an in-process leaderboard, the update takes 548 nanoseconds -- invisible in a game tick that typically has a 16-millisecond budget (60 FPS).
The leaderboard is synchronized to Redis every 2 seconds for cross-server consistency. Players on different game servers see a consistent global leaderboard within 2 seconds of any score change. For competitive games that require tighter consistency (esports finals, tournament rankings), the sync interval can be reduced to 200 milliseconds at the cost of higher L2 write traffic.
Trading P&L Rankings
Trading floors display P&L (profit and loss) rankings in real time. Each trader's P&L updates on every trade execution, which can happen hundreds of times per second during market hours. The leaderboard shows the top performers, the bottom performers, and each trader's rank relative to their desk or the entire firm.
P&L leaderboards have two characteristics that make them especially suited for in-process caching. First, the update frequency is extremely high (10,000+ updates per second across all traders) but concentrated on a relatively small number of members (hundreds to thousands of traders, not millions). The in-process leaderboard holds the entire state in a few kilobytes. Second, the read frequency is high but localized: each trader's terminal queries their own rank and the top-10 list every second. These reads do not need to go over the network because each terminal's application instance maintains the local leaderboard.
Sales Dashboards
Sales organizations display real-time rankings of sales representatives by revenue, deals closed, or pipeline value. These leaderboards update less frequently than gaming or trading leaderboards (a few hundred updates per day rather than thousands per second) but are accessed by many users simultaneously (the entire sales team refreshing the dashboard during business hours).
For sales dashboards, the primary benefit of in-process caching is read performance, not write performance. A dashboard page load that queries the leaderboard 5 times (top 10, current user's rank, user's regional rank, comparison to last month, comparison to quota) completes in 155 nanoseconds with an in-process cache versus 1.5 milliseconds with Redis. At 500 simultaneous users refreshing every 10 seconds, the in-process approach handles the read load without any network overhead.
Competitive Programming
Online judges and competitive programming platforms maintain real-time standings during contests. Scores update when a submission is accepted (adding the problem's point value to the contestant's total score). Tie-breaking is based on submission time -- the contestant who achieved the same score earlier ranks higher. The leaderboard must display accurate rankings for all contestants at all times during the contest.
The tie-breaking requirement makes the sorted key more complex. Instead of (score, member_id), the key becomes (score, -timestamp, member_id). The negative timestamp ensures that earlier submissions rank higher at the same score. This compound key works identically in both Redis sorted sets and in-process B-trees. The in-process version simply uses a larger key in the sorted structure, with negligible performance impact (an extra 8 bytes for the timestamp in each comparison).
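A minimal sketch of the three-part key with hypothetical contestants; negating the timestamp makes earlier submissions sort higher at equal scores:

```python
# (score, -timestamp, member_id): a descending sort puts higher scores
# first, then earlier submissions (smaller timestamp -> larger negation).
standings = [
    (500, -1000, "alice"),  # 500 points, submitted at t=1000
    (500, -1700, "bob"),    # same score, submitted later at t=1700
    (700, -2100, "carol"),
]
standings.sort(reverse=True)
print([member for _, _, member in standings])  # ['carol', 'alice', 'bob']
```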
When to Stay with Redis Sorted Sets
If your leaderboard has fewer than 10,000 members and fewer than 1,000 updates per second, Redis sorted sets are the simpler and correct choice. The network overhead is not a bottleneck at this scale. The operational simplicity of a single Redis instance outweighs the performance advantage of in-process caching. Switch to the in-process pattern when the network overhead becomes measurable in your latency budget -- typically above 10,000 updates per second or when sub-millisecond update latency is a hard requirement.
Memory Footprint
An in-process leaderboard is surprisingly compact. Each entry requires: 8 bytes for the member ID (a 64-bit integer), 8 bytes for the score (a 64-bit float), 8 bytes for the timestamp (optional, for tie-breaking), approximately 40 bytes for the B-tree node overhead, and approximately 56 bytes for the hash map entries (member-to-score and member-to-rank). That is roughly 120 bytes per entry.
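The per-entry arithmetic works out as follows, using the byte counts above:

```python
member_id = 8          # 64-bit integer
score = 8              # 64-bit float
timestamp = 8          # optional, for tie-breaking
btree_overhead = 40    # approximate B-tree node share per entry
hashmap_entries = 56   # member->score and member->rank combined

bytes_per_entry = member_id + score + timestamp + btree_overhead + hashmap_entries
print(bytes_per_entry)                         # 120
print(bytes_per_entry * 1_000_000 / 1e6)       # 120.0 (MB for 1M members)
```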
A leaderboard with 1 million members occupies approximately 120 MB of memory. A leaderboard with 10 million members occupies approximately 1.2 GB. For comparison, a Redis sorted set with 1 million members uses approximately 85 MB of Redis memory -- but the Redis process consumes additional memory for connection buffers, replication buffers, and overhead, and the data must be serialized and deserialized on every access.
| Leaderboard Size | In-Process Memory | Redis Memory | Update Latency (In-Process) | Update Latency (Redis) |
|---|---|---|---|---|
| 10,000 members | 1.2 MB | 0.9 MB | 420 ns | 300 us |
| 100,000 members | 12 MB | 8.5 MB | 490 ns | 300 us |
| 1,000,000 members | 120 MB | 85 MB | 548 ns | 310 us |
| 10,000,000 members | 1.2 GB | 850 MB | 630 ns | 350 us |
The in-process approach uses approximately 40% more memory than Redis for the same data set because of the dual indexing (hash map + B-tree + rank cache). But the in-process memory is application-managed -- it shares the process's heap, which is already allocated and warm. Redis memory is a separate process with its own allocation, fragmentation, and overhead. The 40% memory increase in the in-process approach is typically offset by eliminating the Redis process entirely (for applications that only use Redis for leaderboard operations).
Building It: The Complete Architecture
The complete real-time leaderboard architecture has four components: the local L1 leaderboard (in-process, sub-microsecond), the sync service (periodic batch push/pull), the global L2 store (Redis or database, cross-instance consistency), and the persistence layer (database, for durability across restarts).
# Architecture flow:
# Game event arrives
# -> L1 leaderboard update (548ns)
# -> Buffered for sync
# Client requests rank
# -> L1 leaderboard read (31ns)
# -> Return immediately
# Every 2 seconds (sync service):
# -> Batch push pending updates to Redis L2
# -> Pull global state from Redis L2
# -> Rebuild L1 from global state
# Every 5 minutes (persistence):
# -> Snapshot global leaderboard to database
# -> Used for recovery after full restart
On startup, the application loads the leaderboard from the persistence layer (database), which takes 1-5 seconds depending on leaderboard size. After the initial load, all reads and writes hit the L1 leaderboard. The sync service runs in a background thread and does not block leaderboard operations. The persistence snapshot runs every 5 minutes and is also non-blocking. If the application crashes and restarts, it recovers the leaderboard from the most recent database snapshot (at most 5 minutes old) and then catches up via the L2 store, which has the updates from other instances since the snapshot.
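The recovery path can be sketched as follows. A sketch with hypothetical names; the snapshot is at most 5 minutes stale, and the L2 pull supplies everything newer:

```python
def recover(snapshot, l2_state):
    # 1. Seed from the last database snapshot (<= 5 minutes old).
    leaderboard = dict(snapshot)
    # 2. Catch up from L2: it has already merged every instance's updates
    #    since the snapshot, so its entries take precedence.
    leaderboard.update(l2_state)
    return leaderboard

snapshot = {"p1": 100, "p2": 250}   # persisted 5 minutes ago
l2_state = {"p2": 300, "p3": 50}    # newer global state pulled from Redis
print(recover(snapshot, l2_state))  # {'p1': 100, 'p2': 300, 'p3': 50}
```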
The Bottom Line
Redis sorted sets are the right choice for leaderboards under 10,000 members at under 1,000 updates per second. Above that threshold, the network overhead of ZADD and ZRANGE becomes the bottleneck. An in-process leaderboard delivers 31-nanosecond rank lookups and 548-nanosecond score updates -- 547x faster than Redis ZADD over the network for updates, and more than 9,000x faster for rank lookups. For global consistency, sync to Redis every 1-5 seconds. Each component does what it is good at: in-process for speed, Redis for consistency, database for durability.
31ns leaderboard lookups. 548ns score updates. Sub-microsecond rankings at any scale.