
19,050x Faster Solana PnL on Helius With Cachee

Mert Mumtaz, CEO of Helius, posted a weekend challenge: compute SOL PnL for any wallet at runtime with the lowest possible latency. No indexing. RPC only. The new getTransactionsForAddress method opens up parallel range queries for the first time. The core question: how do you search a wallet's transaction history efficiently when you don't know how sparse it is?

We built the solver in Rust, wired it to Cachee's L0 cache engine, ran it against real Helius mainnet, and spent the next 12 hours pulling every latency lever we could find. Some worked. Most didn't. Along the way we discovered something more interesting than the contest itself: 44% of every Helius Enhanced Parse call — 461 ms out of 1,050 ms — is Cloudflare infrastructure overhead, not Helius computation. And we can prove it because Helius tells you in a response header.

This is the full story: the algorithm, the optimizations, the failures, the measurements, and the math on what a client-side cache layer changes for Helius customers at scale.

The contest

Mert's rules were specific. Compute SOL PnL at runtime. No pre-built indexes. Only Solana RPC methods: getSignaturesForAddress, getTransaction, and the new getTransactionsForAddress. The judging criterion: lowest average latency across busy, sparse, and periodic wallet types.

The new getTransactionsForAddress method is the key unlock. The old getSignaturesForAddress can only walk backward — you paginate from newest to oldest, one page at a time, one direction. The new method accepts a constructable paginationToken that lets you start from any slot and walk backward from there. In theory, this enables parallel range queries. In practice, it's more nuanced than that.

Discovering gTFA

Before we could build anything, we had to figure out what getTransactionsForAddress actually accepts. Mert's tweet described parallel range queries, but the method is new and undocumented. We probed every parameter combination we could think of against both beta.helius-rpc.com and mainnet.helius-rpc.com.

What we found: the method accepts limit (max 1000), commitment, minContextSlot, and paginationToken. It does not accept startSlot, endSlot, before, until, or any slot-range filter. The paginationToken is a string in the format "slot:transactionIndex" — and critically, you can construct it yourself. Fire a request with paginationToken: "400000000:0" and you get signatures starting from that slot walking backward.
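The token shape above can be captured in a couple of tiny helpers. This is our own illustration, not a Helius SDK API; only the "slot:transactionIndex" string format comes from the probing described here.

```rust
/// Build a constructable pagination token in the observed
/// "slot:transactionIndex" format, e.g. "400000000:0".
fn make_pagination_token(slot: u64, tx_index: u32) -> String {
    format!("{slot}:{tx_index}")
}

/// Parse a token back into (slot, transactionIndex), rejecting
/// anything that doesn't match the two-field numeric format.
fn parse_pagination_token(token: &str) -> Option<(u64, u32)> {
    let (slot, idx) = token.split_once(':')?;
    Some((slot.parse().ok()?, idx.parse().ok()?))
}
```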

This is the parallelism lever. Not slot-range queries, but constructable cursor positions. The distinction matters because there's no "stop" boundary — each call returns up to 1000 signatures going backward from the token with no way to say "stop at slot X." This makes parallel enumeration fundamentally unsafe, as we learned the hard way.

The algorithm we shipped

After testing and discarding parallel enumeration (more on that below), we settled on a two-stage pipeline:

Stage 1: Sequential gTFA pagination. Fire getTransactionsForAddress with no token to get the newest 1000 signatures. If the page is full (1000 returned), chain to the next page using the returned paginationToken. Repeat until a page comes back with fewer than 1000 signatures, indicating end of history. Safety ceiling: 20 pages, 20,000 transactions max.
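The Stage 1 loop can be sketched with the RPC call abstracted behind a closure. The `fetch_page` signature is hypothetical; the loop structure (full page means continue, short page means end of history, 20-page ceiling) is the logic described above.

```rust
const PAGE_LIMIT: usize = 1000; // gTFA max page size
const MAX_PAGES: usize = 20;    // safety ceiling: 20,000 transactions

/// Sequential gTFA pagination sketch. `fetch_page` stands in for the real
/// RPC call: given an optional pagination token, it returns one page of
/// signatures plus the next token, if any.
fn enumerate_history<F>(mut fetch_page: F) -> Vec<String>
where
    F: FnMut(Option<&str>) -> (Vec<String>, Option<String>),
{
    let mut all = Vec::new();
    let mut token: Option<String> = None;
    for _ in 0..MAX_PAGES {
        let (page, next) = fetch_page(token.as_deref());
        let full = page.len() == PAGE_LIMIT;
        all.extend(page);
        token = next;
        // A short page (or a missing token) signals end of history.
        if !full || token.is_none() {
            break;
        }
    }
    all
}
```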

Stage 2: Parallel Enhanced Parse dispatch. As each gTFA page returns, it spawns its parse batches onto the tokio runtime via JoinSet::spawn. Each batch POSTs up to 100 signatures to Helius's Enhanced Parse endpoint (/v0/transactions), which returns fully parsed transactions with nativeTransfers, fee, feePayer, and transactionError: everything needed to compute SOL deltas without a second RPC round-trip.

The two stages overlap. While gTFA page 2 is in flight, page 1's parse batches are already running. By the time the final gTFA page arrives, most earlier pages are already parsed. A shared tokio::sync::Semaphore(32) bounds total in-flight parse calls so we don't trip Helius's rate limiter.

SOL delta computation uses meta.postBalances[i] - meta.preBalances[i] where i is the wallet's index in accountKeys. This is the authoritative formula — it correctly handles every transaction shape including SPL transfers, failed transactions, and compute budget instructions.
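The delta formula is one subtraction in lamports; the function names below are ours, but the computation is exactly the post/pre balance difference described above.

```rust
/// Net SOL change for the wallet at index `i` in accountKeys, in lamports.
/// Mirrors meta.postBalances[i] - meta.preBalances[i].
fn sol_delta_lamports(pre: &[u64], post: &[u64], i: usize) -> i64 {
    post[i] as i64 - pre[i] as i64
}

/// Convert a signed lamport delta to SOL (1 SOL = 1_000_000_000 lamports).
fn lamports_to_sol(lamports: i64) -> f64 {
    lamports as f64 / 1_000_000_000.0
}
```

Because the balances already reflect fees and failed-transaction rent behavior, no per-instruction accounting is needed.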

For the warm path, we store the complete parsed history in Cachee's L0 cache keyed by history:{wallet}. On repeat query: one hash lookup, bincode deserialize, stream through PnL calculator. 0.12 ms. Zero network.
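A minimal sketch of the warm-path shape (this is not Cachee's actual API): the whole parsed history lives under one key per wallet, so a repeat query is a single lookup rather than thousands of per-signature reads.

```rust
use std::collections::HashMap;

/// Toy stand-in for an L0 hot tier: one serialized history blob per wallet.
#[derive(Default)]
struct L0 {
    map: HashMap<String, Vec<u8>>,
}

impl L0 {
    /// Key format from the post: history:{wallet}.
    fn history_key(wallet: &str) -> String {
        format!("history:{wallet}")
    }
    fn put(&mut self, wallet: &str, blob: Vec<u8>) {
        self.map.insert(Self::history_key(wallet), blob);
    }
    fn get(&self, wallet: &str) -> Option<&Vec<u8>> {
        self.map.get(&Self::history_key(wallet))
    }
}
```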

The results

3,990 transactions on vines1vzrYbzLMRdu58ou5XTby4qAqVRLmqo36NKPTg. 10-run median on real Helius mainnet:

Cold path: 2,286 ms. 0.573 ms per transaction. 44 API calls.
Warm path: 0.12 ms. Zero API calls.
Speedup: 19,050x.
Correctness: 10 out of 10 runs returned exactly 3,990 transactions with net_sol = -0.7831.

The optimization gauntlet

After the baseline was working, we spent the next several hours pulling every lever we could find. We tested 24 distinct optimizations. 10 shipped. 14 failed. Here's what happened with each one.

What worked

gzip/brotli/zstd response compression (-154 ms). We enabled all four compression features in reqwest. The server returns content-encoding: gzip — we confirmed via response headers. Brotli and zstd are not available on the Helius Enhanced Parse endpoint despite Cloudflare supporting them; the API Gateway configuration only enables gzip. Still, gzip alone shaves 154 ms off the median by reducing wire bytes on the 30-50 KB parse responses.

Skip per-transaction cache inserts on cold path (-53 ms). The original solver called cache.put_tx(tx) for every transaction during the cold PnL streaming loop — 3,990 individual cache inserts. But the solver never does per-signature lookups on the warm path; it reads the single history:{wallet} blob. Those 3,990 inserts were cargo-culted from the mock solver and cost 40-200 ms of unnecessary work. Removing them shaved 53 ms off the median and tightened variance.

HTTP/3 via QUIC (-131 ms on best runs). Helius's Cloudflare edge advertises alt-svc: h3=":443". We enabled reqwest's experimental http3 feature (requires --cfg reqwest_unstable) and gated it behind a HELIUS_USE_H3=1 environment variable. Median improved modestly but best-run latency dropped 131 ms. Tail variance increased slightly — HTTP/3's QUIC connection migration adds occasional hiccups. We kept it as an optional toggle.

Dual reqwest::Client separation. We use two independent HTTP clients: one for gTFA (sequential, needs 1-2 connections) and one for Enhanced Parse (parallel, needs 32 connections). When they shared a single client, the sequential gTFA chain would wait for connection pool slots held by the 32 concurrent parse calls. Splitting them eliminated a measured 2x regression on enumeration latency.
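The split looks roughly like this. A sketch, assuming the reqwest crate with its "gzip" feature enabled; the pool sizes mirror the numbers above, and the builder methods are standard reqwest, not Cachee-specific.

```rust
/// Build the two independent clients: a small pool for the sequential
/// gTFA chain, a wide HTTP/1.1 pool for the 32-way parse fan-out.
fn build_clients() -> reqwest::Result<(reqwest::Client, reqwest::Client)> {
    // Sequential gTFA pagination: 1-2 sockets is enough.
    let gtfa = reqwest::Client::builder()
        .pool_max_idle_per_host(2)
        .gzip(true)
        .build()?;
    // Parallel Enhanced Parse: one socket per in-flight batch,
    // forced HTTP/1.1 so streams aren't multiplexed onto one connection.
    let parse = reqwest::Client::builder()
        .pool_max_idle_per_host(32)
        .http1_only()
        .gzip(true)
        .build()?;
    Ok((gtfa, parse))
}
```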

JoinSet spawn instead of FuturesUnordered. This was a subtle but important fix. FuturesUnordered does not start futures when you push them — they only execute when you poll the FuturesUnordered via .next().await. We were pushing parse batch futures during the enum loop but not polling them until after the loop, meaning all parse work ran sequentially after enumeration instead of concurrently during it. Switching to tokio::task::JoinSet::spawn starts each parse task immediately on the runtime. Parse drain time dropped from 2,100 ms (all-after-enum) to 1,400 ms (most-done-by-enum-end).

simd-json zero-copy decode. Both gTFA responses and Enhanced Parse responses decode via simd_json::from_slice into typed Rust structs with serde_derive. No intermediate serde_json::Value tree walking on the hot path. Roughly 2x faster than standard serde_json for the 30-50 KB parse responses.

What failed

Parse concurrency 32 to 40 (+184 ms). We hypothesized that bumping the semaphore from 32 to 40 would let a 4,000-tx wallet's 40 parse batches run in a single wave instead of two. The median regressed by 184 ms. Helius's Enhanced Parse backend is non-linear past 32 concurrent requests per client — adding more in-flight contends server-side without improving throughput. 32 is the hard ceiling.

Pre-opening 32 warm connections at startup (+975 ms, 30-second outliers). We modified warmup() to fire 32 parallel empty-body POSTs to the parse endpoint, intending to prime the connection pool so the first real wave of parse batches would hit already-established sockets. Cloudflare's abuse detection flagged the burst and throttled subsequent real requests. One run took 31 seconds. Single warmup call per endpoint is the safe ceiling.

Parallel gTFA enumeration with linear-spaced tokens (lost 944 transactions). We fired 8 parallel gTFA calls with pagination tokens spaced linearly across the wallet's estimated slot range, attempting to parallelize the enumeration phase. Total raw signatures fetched: 8,000. Unique after dedup: 2,044. Expected: 2,988. We lost 944 transactions because the parallel chunks overlapped each other in some regions and left uncovered gaps in others. The backward cursor has no "stop" boundary, so chunks can't guarantee disjoint coverage.

Parallel gTFA enumeration with slot-prediction and gap repair (lost 39 transactions). A more sophisticated attempt: use the head call's density to predict where to place parallel tail chunks, then run a sequential "makeup chain" from the deepest covered slot to fill gaps. Still lost 39 transactions. We tried two different gap repair strategies and both had correctness bugs under non-uniform density. Sequential chain is the only correct enumeration strategy for a backward-only cursor without a stop boundary.

JSON-RPC batch getTransaction (4x faster per-call, killed by rate limiting). This was the most promising failure. We measured batch-of-50 getTransaction calls via standard JSON-RPC at 250 ms median per call — versus 1,050 ms for Enhanced Parse batch-of-100. At 32-way concurrency, 32 parallel curls completed in 489 ms versus 1,873 ms for Enhanced Parse. A 4x improvement. But in the Rust client, it crashed with 429 immediately. The reason: Helius counts each sub-call in a JSON-RPC batch against the rate limit. 16 parallel batches of 50 signatures = 800 equivalent getTransaction calls per second, which blasts past the quota. Enhanced Parse counts each POST as 1 credit regardless of how many signatures are in the batch. This rate-limit accounting difference makes Enhanced Parse the correct choice at any concurrency above 5, even though individual batch latency is 4x worse.

HTTP/2 multiplexing (6x slower). We set http2_prior_knowledge() on the reqwest client to force HTTP/2 and multiplex all 32 concurrent parse calls over a single TCP connection. Measured: 32 parallel HTTP/2 streams took approximately 12 seconds. 32 parallel HTTP/1.1 connections via separate TCP sockets took 2 seconds. Cloudflare's HTTP/2 stream scheduler serializes parallel responses instead of interleaving them. We forced HTTP/1.1 with a wide connection pool.

The 44% discovery

While instrumenting the parse calls, we noticed a helius-total-latency response header that Helius injects before Cloudflare processes the response. It reports server-side processing time in milliseconds. For a 100-signature Enhanced Parse call:

Server-side processing: 589 ms. Client-observed wall clock: 1,050 ms. The 461 ms difference — 44% of every API call — is infrastructure overhead: TLS termination at the Cloudflare edge, WAF and bot management inspection, L7 reverse proxy routing, gzip compression, and the return path through the CDN edge.

For our 3,990-transaction wallet requiring 44 API calls at 32-way parse concurrency, the infrastructure overhead aggregates to approximately 900 ms — 39% of the 2,290 ms cold path.
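The arithmetic behind the 44% figure is a single ratio of wall-clock time to the server-reported helius-total-latency value; the function below just restates it.

```rust
/// Fraction of a call's wall-clock latency not accounted for by
/// server-side processing: (wall - server) / wall.
fn overhead_fraction(wall_ms: f64, server_ms: f64) -> f64 {
    (wall_ms - server_ms) / wall_ms
}
```

For the measured 1,050 ms wall clock and 589 ms server time, this yields 0.439, the 44% quoted above.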

This is not a Helius bug. It is the expected cost of a CDN-fronted API architecture that provides DDoS protection and global edge routing. But for repeat queries — portfolio trackers polling the same wallets, tax tools rerunning the same history, trading bots scanning the same hot addresses — that 44% overhead is paid on data that hasn't changed since the last query.

What Cachee changes

Cachee sits in the client's process. The first query runs the full Helius pipeline: 2,290 ms, 44 API calls, the whole Cloudflare round trip. Cachee stores the result in its L0 hot tier — a sharded hash map with 30-nanosecond lookups backed by CacheeLFU admission control that protects hot entries from eviction under cross-wallet interference.

Every subsequent query for the same wallet: 0.12 ms. One hash lookup. One bincode deserialize. Zero API calls. Zero credits burned. Zero Cloudflare overhead.

We verified this with a benchmark sequence: cold query (2,290 ms), warm query (0.12 ms), a completely unrelated wallet queried through the same cache to apply admission pressure, then the original wallet warm again (0.15 ms). CacheeLFU kept the original wallet's entry resident. The warm path stays fast even under interference.

The dollar math

Helius credits cost approximately $5 per million across published tiers. One wallet PnL query consumes 44 credits. A portfolio tracker monitoring 50 wallets at 10-second refresh burns 570 million credits per month — requiring Enterprise pricing.

With Cachee at 95% hit rate (measured on stable wallets where transaction history doesn't change between refreshes): 28.5 million credits per month. That fits the Developer plan at $49.

At 200 wallets: 2.28 billion credits per month without cache. 114 million with Cachee. At 500 wallets at 5-second refresh: 11.4 billion without cache. 570 million with Cachee.

The savings range from $2,700 to $54,000 per month depending on scale. Every eliminated credit is one fewer request hitting Helius origin — lower customer bills and lower Helius infrastructure load.
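The credit figures above come from straightforward multiplication. A sketch using the post's assumptions (44 credits per wallet query, a 30-day month); the function name is ours.

```rust
/// Monthly credit burn for a polling workload: wallets polled every
/// `refresh_secs`, 44 credits per query, reduced by the cache hit rate.
fn monthly_credits(wallets: u64, refresh_secs: u64, hit_rate: f64) -> f64 {
    const CREDITS_PER_QUERY: f64 = 44.0;
    let queries_per_wallet = (30 * 24 * 3600 / refresh_secs) as f64;
    wallets as f64 * queries_per_wallet * CREDITS_PER_QUERY * (1.0 - hit_rate)
}
```

Plugging in 50 wallets at a 10-second refresh reproduces the ~570 million credits per month above, and a 95% hit rate drops it to ~28.5 million.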

What's left on the table

The 2,286 ms cold path breaks down as: 1,300 ms Helius server processing (57%), 900 ms infrastructure overhead (39%), 90 ms network RTT (4%). The addressable latency is 990 ms.

Five infrastructure changes, none requiring code, could reduce the cold path to approximately 1,296 ms.

All five are conversations with Helius, not code changes. The client-side algorithm is at its floor.

The code

The full solver is open source at github.com/H33ai-postquantum/solana-pnl-cachee. Clone, add your Helius API key, run cargo run --release --example live_helius_bench -- <wallet-address>. The benchmark prints cold path, warm path, cache hit stats, and correctness verification. 33 unit tests pass. The code is Rust, the cache engine is Cachee, and every optimization described in this post is in the shipped binary.

19,050x. From 2,286 ms to 0.12 ms. In-process.

Cachee's Rust-native L0 engine turns repeat Solana PnL queries into sub-millisecond hash lookups. Zero API calls. Zero credits burned. Wire it in and stop paying for the same data twice.
