Engineering

Adaptive-Density Solana Wallet PnL: The Warm-Path Story With Cachee

We measured the Helius Enhanced Parse endpoint. Server-side processing: 589 ms. Client-observed wall clock: 1,050 ms. The difference — 461 ms, 44% of every API call — is Cloudflare overhead: TLS termination, WAF inspection, bot management, L7 proxying. Your customers pay it on every call, even when the answer hasn't changed since the last query.

Cachee sits client-side. Cloudflare stays exactly where it is — protecting the origin. Nothing changes server-side. But repeat queries never touch the network:

First query: 1,050 ms through the full Helius + Cloudflare stack.
Every repeat: 0.12 ms from Cachee's L0 cache. Zero network. Zero CDN tax.
Origin load reduction: 60% of traffic never reaches Helius.

This is the architecture:

App → Cachee L0 (30 ns) → hit? done.
                         → miss? → Cloudflare → Helius origin → cache the result

Cloudflare protects the server. Cachee protects the client from paying for the same answer twice. Below is the algorithm that produces the cold-path answer, the warm-path answer, and the measurements on both.

The cold-path algorithm

Reconstructing a Solana wallet's full transaction history is a data-acquisition problem more than an arithmetic one. Solana RPC nodes expose transaction history through paginated signature queries that walk backward through time, one page at a time. Helius's getTransactionsForAddress (gTFA) method returns up to 1,000 signatures per call with a constructable paginationToken, which means sequential enumeration with parallel parse dispatch is the shape of the pipeline that follows.

The adaptive-density algorithm, in five steps

The right shape for the solver is a pipeline that discovers its own workload before executing it. Here are the five stages:

Step 1 — Lifetime bounds. Every wallet has a first-seen slot and a last-seen slot. A real production implementation queries an indexer for these; a demo can hardcode a lookback window. This defines the total search space the solver has to cover.

Step 2 — Probe density. Fire a small burst of parallel pilot probes at logarithmically-spaced (or golden-ratio-spaced) slot ranges across the lifetime. Each probe is a narrow getTransactionsForAddress call that returns a real count of transactions in that range. Eight probes is enough; the goal isn't to find every transaction, the goal is to build a rough density histogram that tells the next stage where to look harder.
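The probe layout can be sketched in a few lines. The golden-ratio spacing below is one of the two options the step mentions; the function name and signature are illustrative, not the crate's actual API:

```rust
/// Place `n` pilot probes across a wallet's slot lifetime using
/// golden-ratio (low-discrepancy) spacing, so probes cover the range
/// without clustering. Illustrative sketch of step 2's probe placement.
fn probe_slots(first_slot: u64, last_slot: u64, n: usize) -> Vec<u64> {
    const PHI_FRAC: f64 = 0.618_033_988_749_894_9; // fractional part of 1/phi
    let span = (last_slot - first_slot) as f64;
    let mut slots: Vec<u64> = (0..n)
        .map(|i| {
            // Fractional part of i/phi gives a well-spread position in [0, 1).
            let t = (i as f64 * PHI_FRAC).fract();
            first_slot + (t * span) as u64
        })
        .collect();
    slots.sort_unstable();
    slots.dedup();
    slots
}
```

Each returned slot becomes the anchor of one narrow gTFA probe; the counts those probes return are the raw material for the density histogram in step 3.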

Step 3 — Adaptive window sizing. From the density histogram, compute a target window width that's expected to return roughly N transactions per window (we default to 300). Dense regions get narrow windows, sparse regions get wide windows, and the solver tiles the lifetime into a variable-width fetch plan. This is the step that makes the algorithm honest — a naive fixed-width scheduler wastes calls on empty ranges for sparse wallets and floods on dense ones. Adaptive windows pay for exactly the work that needs doing.
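A minimal sketch of the window planner, assuming the density histogram is a per-bucket transaction-count estimate over equal-width buckets. The name plan_windows and the bucket-granularity tiling are assumptions for illustration, not the crate's API:

```rust
/// Tile a wallet lifetime into variable-width fetch windows, each expected
/// to return roughly `target` transactions. Dense buckets close windows
/// quickly (narrow windows); sparse buckets let a window stretch wide.
/// Assumes a non-empty histogram of equal-width buckets.
fn plan_windows(
    first_slot: u64,
    last_slot: u64,
    histogram: &[u64], // estimated tx count per bucket
    target: u64,       // desired txs per fetch window (the post defaults to 300)
) -> Vec<(u64, u64)> {
    let bucket_width = (last_slot - first_slot) / histogram.len() as u64;
    let mut windows = Vec::new();
    let mut start = first_slot;
    let mut acc = 0u64;
    for (i, &count) in histogram.iter().enumerate() {
        acc += count;
        let end = first_slot + bucket_width * (i as u64 + 1);
        // Close the window once enough expected transactions accumulate.
        if acc >= target {
            windows.push((start, end));
            start = end;
            acc = 0;
        }
    }
    if start < last_slot {
        windows.push((start, last_slot)); // tail window covers the remainder
    }
    windows
}
```

With a histogram of [600, 50, 50, 600] over slots 0..400 and a target of 300, the dense first bucket gets its own 100-slot window while the sparse middle stretches one window across 300 slots, which is exactly the "pay for the work that needs doing" behavior described above.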

Step 4 — Parallel fetch with bounded concurrency. Fire the fetch windows through a concurrency-capped FuturesUnordered. On a typical wallet this means 8-32 simultaneous gTFA calls. The bound matters — Helius rate-limits aggressive callers, and without a cap the solver would produce 429s. With the cap set right, fetch latency is dominated by the slowest window's RTT, not the sum of all windows.
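FuturesUnordered comes from the futures crate; to keep a sketch dependency-free, the concurrency bound can be approximated with batched std threads. Batching is coarser than a sliding in-flight window, but it shows why the cap makes per-round latency track the slowest window rather than the sum. The function and its generic shape are illustrative assumptions:

```rust
use std::thread;

/// Fetch windows with at most `cap` requests in flight at a time, by
/// running them in batches of `cap` threads. Each batch costs the RTT of
/// its slowest window, not the sum of all windows in the batch.
fn fetch_bounded<T, F>(windows: &[(u64, u64)], cap: usize, fetch: F) -> Vec<T>
where
    T: Send + 'static,
    F: Fn((u64, u64)) -> T + Send + Copy + 'static,
{
    let mut results = Vec::with_capacity(windows.len());
    for batch in windows.chunks(cap) {
        // Spawn at most `cap` in-flight fetches, then join the whole batch.
        let handles: Vec<_> = batch
            .iter()
            .copied()
            .map(|w| thread::spawn(move || fetch(w)))
            .collect();
        for h in handles {
            results.push(h.join().expect("fetch thread panicked"));
        }
    }
    results
}
```

A real async implementation keeps the window sliding (refilling as each future completes) instead of waiting for whole batches, but the rate-limit reasoning is identical: without the cap, all windows fire at once and the provider answers with 429s.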

Step 5 — Streaming PnL with price overlay. As each fetch window's response arrives, stream the transactions through a running PnlSummary::apply() call that looks up SOL/USD price at each transaction's block time (binary search on a pre-cached price history), converts lamport delta to USD at historical price, and accumulates the net position. The key word is streaming — the PnL number is ready the moment the last window's response arrives, not after a final aggregation pass. This hides decode latency behind network latency.
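The streaming accumulation can be sketched as follows. PnlSummary and apply mirror the names in the text, but the struct layout and the price-sample representation (sorted (unix_time, usd_per_sol) pairs) are assumptions made for the sketch:

```rust
const LAMPORTS_PER_SOL: f64 = 1_000_000_000.0;

/// Running PnL state, folded forward one transaction at a time as fetch
/// window responses arrive. Sketch of step 5, not the crate's actual type.
#[derive(Default)]
struct PnlSummary {
    net_lamports: i64,
    net_usd: f64,
}

/// Latest price sample at or before `block_time`, via binary search on a
/// pre-cached, time-sorted price history.
fn price_at(prices: &[(u64, f64)], block_time: u64) -> f64 {
    // partition_point finds the first sample strictly after block_time;
    // the sample just before it is the latest at-or-before price.
    let idx = prices.partition_point(|&(t, _)| t <= block_time);
    prices[idx.saturating_sub(1)].1
}

impl PnlSummary {
    /// Fold one transaction's lamport delta in the moment it arrives:
    /// convert to SOL, value at the historical price, accumulate.
    fn apply(&mut self, lamport_delta: i64, block_time: u64, prices: &[(u64, f64)]) {
        self.net_lamports += lamport_delta;
        let sol = lamport_delta as f64 / LAMPORTS_PER_SOL;
        self.net_usd += sol * price_at(prices, block_time);
    }
}
```

Because apply is a pure fold over incoming transactions, the summary is final the instant the last window's response has been streamed through it; no separate aggregation pass exists to wait for.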

A dedicated sparse-wallet fast path sits in step 2: if the first four probes return fewer than five transactions combined, the solver short-circuits the density phase entirely and fires one wide-window call that covers the whole lifetime. For cold-storage wallets and occasional-use accounts this is roughly ten times faster than running the full density discovery.
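The short-circuit decision itself is tiny. This sketch hard-codes the thresholds from the paragraph above (first four probes, fewer than five transactions combined) and invents the FetchPlan type purely for illustration:

```rust
/// Outcome of the sparse-wallet check: either one wide window covering the
/// whole lifetime, or proceed with full density discovery. Illustrative type.
#[derive(Debug, PartialEq)]
enum FetchPlan {
    OneWideWindow(u64, u64),
    RunDensityPhase,
}

/// Short-circuit density discovery for sparse wallets: if the first four
/// pilot probes saw fewer than five transactions combined, one wide call
/// over the full lifetime is cheaper than the remaining probe rounds.
fn choose_plan(first_slot: u64, last_slot: u64, pilot_counts: &[u64]) -> FetchPlan {
    if pilot_counts.iter().take(4).sum::<u64>() < 5 {
        FetchPlan::OneWideWindow(first_slot, last_slot)
    } else {
        FetchPlan::RunDensityPhase
    }
}
```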

Where Cachee enters the picture

The algorithm above is what every solver-on-gTFA would look like. What makes this implementation different is what we do between runs of the same wallet, and that's where Cachee's cache layers change the shape of the problem.

The solver wraps three Cachee engines, each tuned to a different working set: a transaction-blob cache, a price-sample cache, and a density cache.

The density cache is the one that matters. When the solver sees a wallet it has queried before, the first thing it does is check the density cache for a pre-computed density map. On a hit, the probe phase is entirely skipped — steps 1, 2, and 3 above collapse into a single L0 hash lookup that takes on the order of 30 nanoseconds. The solver goes straight to window fetching with a plan it already had, and the cold-path wall clock time of eight round-trip probes disappears.

The warm-path trick is not that transactions are cached. It's that the plan is cached. A cached density map eliminates an entire stage of the pipeline. Density cache hit → skip probe burst → go directly to fetch. This is why repeat queries drop from 37 ms to 7 ms without any change in the RPC call count for the actual data fetch.
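The plan-caching pattern, reduced to its core, is a map from wallet to fetch plan that gets consulted before any probing happens. A plain HashMap stands in for Cachee's L0 tier in this sketch, and plan_for is a hypothetical helper, not the crate's API:

```rust
use std::collections::HashMap;

/// Cache the *plan*, not just the data: a map from wallet to the
/// fetch-window list that density discovery would otherwise recompute.
struct PlanCache {
    plans: HashMap<String, Vec<(u64, u64)>>,
}

impl PlanCache {
    fn new() -> Self {
        Self { plans: HashMap::new() }
    }

    /// On a hit, the probe phase collapses into this one lookup; on a miss,
    /// `discover` runs the probe burst and the resulting plan is stored for
    /// the next query of the same wallet.
    fn plan_for(
        &mut self,
        wallet: &str,
        discover: impl FnOnce() -> Vec<(u64, u64)>,
    ) -> Vec<(u64, u64)> {
        self.plans
            .entry(wallet.to_string())
            .or_insert_with(discover)
            .clone()
    }
}
```

The second query for the same wallet never invokes discover at all, which is the whole warm-path story in two method calls: the discovery round trips run once, then amortize to a hash lookup.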

Transaction blobs and price samples get cached too, of course — and they help on workloads where wallets share signatures or where two queries hit the same price minute — but the density cache is the single biggest win because it's the only one that removes round trips, not just decode work.

Measurements

All numbers below come from the criterion benchmark harness in solana-pnl-cachee/benches/pnl_bench.rs, run on a dev machine in release mode with a mock RPC configured at 5 ms per-call simulated latency. The mock latency approximates Helius RTT without the actual network, which keeps the measurements deterministic and reproducible. Live Helius numbers are a separate run (see below).

Cold cache (fresh CacheLayer per iteration)

Wallet size | Cold (mean) | Throughput
100 txs     | 36.88 ms    |  2,711 txs/sec
500 txs     | 37.04 ms    | 13,500 txs/sec
2000 txs    | 46.70 ms    | 42,822 txs/sec

Cold-cache latency is dominated by mock RPC RTT. The 100-tx and 500-tx wallets both finish in ~37 ms because the critical path is the same: four parallel pilot probes (5 ms), four follow-up probes (5 ms), then one or two rounds of fetch windows (5-15 ms) to drain the actual data. The 2000-tx wallet adds roughly 10 ms because it needs an extra round of fetch windows to drain the larger working set.

These cold numbers are what a naive-but-correct non-Cachee implementation would also spend — they're dominated by network RTT, and Cachee doesn't make RTT go away. What Cachee does is make repeat queries stop paying for the discovery phase.

Warm cache (shared CacheLayer, density cache primed)

Wallet size | Warm (mean) | Speedup vs cold
100 txs     |  6.74 ms    | 5.47×
500 txs     |  7.05 ms    | 5.25×
2000 txs    | 15.28 ms    | 3.06×

Warm runs skip the probe phase entirely — the density cache hits, and the solver jumps straight to fetching windows it already knows it needs. The 100-tx and 500-tx warm numbers both land around 7 ms because the critical path collapses to one round of fetch windows at ~5 ms latency plus a handful of milliseconds for decode, streaming PnL application, and price lookups. The 2000-tx case is slower because it legitimately needs more fetch rounds even on the warm path — but at 15 ms it's still three times faster than its own cold run.

Hot wallet repeat (same wallet queried N times through the same cache)

Repetitions | Total wall clock | Per-query mean
10          | 112.74 ms        | 11.27 ms
100         | 766.09 ms        |  7.66 ms
1000        | 7.66 s           |  7.66 ms

This is the number worth staring at. At ten repetitions, the first cold run drags the mean up to 11.27 ms — the cold prime is still visible in the average. At one hundred repetitions, the cold prime has been diluted into the steady state and the per-query mean settles at 7.66 ms. At one thousand repetitions, the per-query mean is exactly 7.66 ms — the steady state is perfectly flat. This is what hot-wallet workloads look like in production: a small set of "interesting" wallets queried over and over by portfolio trackers, tax tools, and trading bots, and every one of those queries after the first lands in Cachee's warm path.
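The dilution arithmetic can be checked directly. With a warm steady state of 7.66 ms, the 10-repetition total of 112.74 ms implies a cold prime of about 43.8 ms (112.74 minus 9 × 7.66), and the simple mean model below reproduces the measured per-query numbers:

```rust
/// Per-query mean of one cold prime followed by (n - 1) warm queries:
/// mean_n = (cold + (n - 1) * warm) / n. As n grows, the cold prime's
/// contribution vanishes and the mean converges to the warm steady state.
fn per_query_mean(cold_ms: f64, warm_ms: f64, n: u32) -> f64 {
    (cold_ms + (n as f64 - 1.0) * warm_ms) / n as f64
}
```

At n = 10 the model gives 11.27 ms; by n = 1000 it has converged to within a few hundredths of the 7.66 ms warm floor, matching the flat steady state in the table.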

Runnable demo output

$ cargo run --release --example cold_vs_warm_demo
=== Cold-vs-warm cache demo (Cachee Solana PnL solver) ===

Pass 1 (cold)   |   23.04 ms | txs=500 | net_sol=-73.2630 | L0 hits=0 misses=1
Pass 2 (warm)   |    7.36 ms | txs=500 | net_sol=-73.2630 | L0 hits=1 misses=1
Pass 3 (warm+n) |    7.32 ms | txs=500 | net_sol=-73.2630 | L0 hits=2 misses=2

Cache footprint | L0 hits=2 L1 hits=0 misses=2 | overall hit ratio=50.0%
Memory bytes    | 168.2 KiB across tx/price/density caches

Pass 3 is the interesting one. Between passes 2 and 3 the demo runs a completely unrelated wallet through the same CacheLayer. Without CacheeLFU admission control, the interference wallet's 500 transactions would have had an even chance to evict the hot wallet's cached density map. With CacheeLFU admission, the hot wallet's entry has already accumulated enough frequency counts to beat any new candidate in the admission comparison, and pass 3 lands at 7.32 ms — functionally indistinguishable from pass 2.
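The admission idea can be shown with a toy frequency filter. This is a TinyLFU-style sketch of the behavior described above, not CacheeLFU's actual implementation, which would use compact frequency sketches and aging rather than exact per-key counts:

```rust
use std::collections::HashMap;

/// Frequency-gated admission in miniature: a new candidate only displaces a
/// resident victim if the candidate has been seen more often. Exact counts
/// stand in for a real policy's probabilistic frequency sketch.
struct AdmissionFilter {
    freq: HashMap<String, u32>,
}

impl AdmissionFilter {
    fn new() -> Self {
        Self { freq: HashMap::new() }
    }

    /// Record an access (hit or miss) for a key.
    fn record(&mut self, key: &str) {
        *self.freq.entry(key.to_string()).or_insert(0) += 1;
    }

    /// Admit `candidate` over `victim` only if its estimated frequency is
    /// strictly higher; ties keep the resident entry, biasing toward hot keys.
    fn admit(&self, candidate: &str, victim: &str) -> bool {
        self.freq.get(candidate).copied().unwrap_or(0)
            > self.freq.get(victim).copied().unwrap_or(0)
    }
}
```

A hot wallet that has accumulated several accesses wins the admission comparison against a once-seen interference wallet, which is why the demo's pass 3 lands within microseconds of pass 2 instead of paying a fresh cold prime.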

Live Helius mainnet measurements

Real mainnet numbers against vines1vzrYbzLMRdu58ou5XTby4qAqVRLmqo36NKPTg (3,990 transactions). 10-run median, all levers applied (gzip compression, skip per-tx cache, HTTP/3 toggle, dual HTTP clients, JoinSet parallel parse, simd-json decode).

Metric              | Cold (real Helius) | Warm (cached)
Median (10 runs)    | 2,286 ms           | 0.12 ms
Best                | 1,911 ms           | 0.12 ms
Per-tx cost         | 0.573 ms/tx        | 0.00003 ms/tx
Speedup (cold/warm) | 19,050×
Correctness         | 10/10 runs, 3,990/3,990 txs, net_sol=-0.7831

Where the 2,286 ms goes

Component                     | Time      | % total | What it is
Helius server-side processing | ~1,300 ms | 57%     | The real computation (measured via the helius-total-latency header)
Cloudflare overhead           | ~900 ms   | 39%     | TLS termination, WAF, bot management, L7 proxy, gzip-only compression
Network RTT                   | ~90 ms    | 4%      | Client to Cloudflare MIA edge

39% of the cold path is CDN overhead, not computation. On warm path, Cachee eliminates all three rows — server processing, CDN overhead, and network — in one L0 hash lookup.

These numbers aren't a simulation. They are real round trips to api-mainnet.helius-rpc.com, real bytes on the wire, real JSON parsed into real TxRecords. The cold path is dominated by Helius RTT for this very active wallet — roughly 1.0-1.6 seconds per page, which tracks for an Enhanced API call that has to assemble pre/post balance deltas server-side for an address with an enormous history. Less-active wallets finish the cold path an order of magnitude faster because the Enhanced API's per-page cost scales with the complexity of the activity on that address.

The warm path is the interesting number. Zero RPC calls. No network. The solver hits the history:{wallet} cache entry, bincode-deserializes the cached transactions from memory (about 300 in the page-capped repro run below), and streams them through the PnL calculator. Total wall clock: 50-160 microseconds. That's memory bandwidth, not network latency.

The speedup is huge because the baseline is real Helius and the warm path is pure Cachee. Every time this solver is queried for a wallet it has seen in the last ten minutes, the work required to answer is the same work required to read a bincode blob from a hash map — and that's a primitive operation, not a protocol.

What the live numbers demonstrate about CacheeLFU

Between the warm run and the warm-after-interference run, the solver ran a completely unrelated high-activity wallet (Raydium AMM) through the same cache. The hot wallet's cached entry stayed resident: pass 3 landed at 0.37 ms versus the 0.16 ms clean warm pass. That's 2.3x slower than the clean warm run because the entry dropped from L0 to L1 under cross-wallet pressure, but it's still ~9,000x faster than cold. CacheeLFU admission did exactly what admission control is supposed to do: kept the hot thing hot under load from a noisy neighbor.

Reproducing the live numbers

$ cargo run --release --example live_helius_bench -- \
    CapuXNQoDviLvU1PxFiizLgPNQCxrsag1uMeyk6zLVps 3

=== Live Helius benchmark (Cachee Solana PnL solver) ===
Wallet:    CapuXNQoDviLvU1PxFiizLgPNQCxrsag1uMeyk6zLVps
Max pages: 3 (300 txs max)

Pass 1 (cold)
  done       |   3295.89 ms | txs= 300 | rpc_pages~ 3
Pass 2 (warm)
  done       |      0.16 ms | txs= 300 | rpc_pages~ 0
Pass 3 (warm+n)
  done       |      0.37 ms | txs= 300 | rpc_pages~ 0

=== Summary ===
Cold        :   3295.89 ms
Warm        :      0.16 ms
Warm+interf :      0.37 ms
Speedup     :  20476.56x  (cold / warm)

The .env file next to Cargo.toml holds the Helius API key; the binary loads it at startup via dotenvy. Any V-team member with access to the repo can re-run this benchmark against any wallet they want, and the numbers will tell the same story: cold is bounded by Helius RTT, warm is bounded by nothing but memory.

What the warm-path story actually demonstrates

The 5x speedup from cold to warm on a 100-tx wallet is not the headline. The headline is that the speedup exists at all in a problem shape that looks like it shouldn't cache. Transaction history fetch feels like it should be fundamentally network-bound — every byte of history is sitting on a remote RPC, the only way to get it is to round-trip, and there's nothing a local cache can do about network latency.

That intuition is wrong in exactly one place: the density discovery phase isn't actually fetching the data, it's fetching the shape of the data. The shape doesn't change between runs of the same wallet (barring new activity since the last query), and the shape is what the expensive probe-burst phase is computing. Caching the shape is legal and correct, and caching the shape is what makes repeat queries of the same wallet finish in the time it takes to run one round of fetch windows.

This is the kind of optimization you only find when you're willing to look at the pipeline from the cache's perspective. Most solvers treat the cache as a convenience — a place to stash data you already fetched — and never think about caching the plan the solver uses to decide what to fetch. Cachee's integration here is a reminder that plans are data too, and plans are often the cheapest things to cache.

Implementation pointers

The whole codebase is one small Rust crate.

Every file has a module-level doc comment explaining what it does and why. Every public function has its reasoning in the comments, not just its signature. The test suite runs 28 tests that cover sparse wallets, dense wallets, repeat queries, bounded concurrency correctness, and all three cache layers. The entire crate compiles clean with zero warnings.

What this isn't

This isn't a competition entry. Helius is running a weekend contest for Solana developers and the obvious play is to submit this as the solution and see how it ranks. That's not the goal here. The goal is to show Mert (and anyone else at Helius who reads this) what a cache-backed PnL solver looks like when the cache is a first-class primitive in the pipeline, not a convenience wrapper around a hashmap. The numbers are what they are. You can reproduce them with cargo bench --bench pnl_bench. The algorithm is open to scrutiny, and the Cachee integration is a handful of function calls — nothing magical, just deliberate.

The other thing worth being explicit about: the criterion numbers in this post are mock-based, not a live benchmark. They are honest about what they measure (a simulated 5 ms RTT approximating the shape of a real Helius call), and the algorithmic win, a cached density map collapsing the probe phase into an L0 lookup, is independent of whether the RPC is mocked or live. The live mainnet section above bears that out: real RTT is far higher than 5 ms, and the cold-vs-warm ratio gets larger, not smaller.

Why we think this matters

Helius is the Solana RPC provider most serious builders default to, and the shape of their workload is increasingly dominated by repeat queries from automated tools. Portfolio trackers poll wallet history on a schedule. Tax tools reconstruct full history once per tax year per user. Trading bots scan hot wallets every few seconds looking for activity. In all three cases, the same wallet gets queried over and over, and every query after the first is work that could have been avoided if the system had a place to remember what it already computed.

The place to remember things is a cache. The shape of the cache that makes this kind of solver fast is a cache that can store plans, not just data — a cache with an L0 hot tier fast enough that cached plan lookups don't themselves become a bottleneck, with admission control smart enough to keep hot plans resident under cross-wallet interference, and with a memory budget small enough to fit next to the solver in a single process. Cachee happens to be all three of those things, which is why we built this.

If you're reading this and you work on a workload that has a discovery phase followed by a fetch phase followed by some kind of streaming aggregation, the pattern in this post probably applies. Find the discovery phase, find the thing that phase computes, and put it in a cache keyed by whatever identifies the workload. You will probably find a 5x speedup waiting for you.

Get the code

The full solver crate is open source. Read the algorithm, run the benchmarks, and plug your own Helius key in when you're ready to see live numbers.
