AI Infrastructure

How Stripe Could Save $10M/Year on Fraud ML Feature Lookups

Every fraud decision at Stripe, PayPal, Block, Mastercard, and Visa starts with the same bottleneck: fetching features. The ML model itself runs in 1–2 milliseconds. But before that model can score a transaction, it needs 5–10 real-time feature lookups — user embeddings, merchant embeddings, device fingerprints, velocity aggregates, and graph features — each taking 1–5ms over the network. Feature fetching consumes roughly 90% of the total inference latency. At billions of transactions per year, that overhead translates into tens of millions of dollars in wasted compute and, worse, double-digit milliseconds of added latency on every decision that degrades the checkout experience.

The Anatomy of a Fraud Decision

When a transaction hits Stripe’s fraud detection pipeline, the ML model does not operate on the raw transaction data alone. It requires a rich context vector assembled from multiple feature stores. A typical fraud scoring request needs: a user embedding (128–512 dimensions representing transaction history, behavioral patterns, and risk profile), a merchant embedding (category risk, chargeback rate, average transaction value), a device fingerprint embedding (browser fingerprint, IP geolocation, device reputation), velocity features (transaction count in last 1/5/15/60 minutes, amount velocity, geographic velocity), and graph features (connections to known fraud rings, shared payment instruments, network clustering coefficients).

Each of these features lives in a different data store. The user embedding sits in a vector database or feature store like Feast, Tecton, or an internal system. The velocity features require real-time aggregation over sliding windows. The graph features demand traversal of a relationship database. Every single feature lookup is a network round-trip. At Stripe’s scale — processing hundreds of millions of transactions per day — every additional millisecond of feature fetch latency compounds into staggering infrastructure costs.
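The cost of treating every lookup as its own round-trip can be sketched in a few lines. This is an illustrative model, not Stripe's pipeline: the feature names and per-lookup latencies are taken from the breakdown in this post, and everything here is hypothetical.

```python
# Hypothetical per-lookup network latencies (ms), mirroring the
# breakdown in this post; names and timings are illustrative.
FEATURE_LATENCY_MS = {
    "user_embedding": 3.2,
    "merchant_embedding": 2.4,
    "device_fingerprint": 1.8,
    "velocity_features": 2.7,
    "graph_features": 4.1,
}
MODEL_INFERENCE_MS = 1.5

def score_transaction_latency_ms() -> float:
    """Sequential feature fetches: each lookup pays a full round-trip."""
    fetch_ms = sum(FEATURE_LATENCY_MS.values())
    return fetch_ms + MODEL_INFERENCE_MS

total = score_transaction_latency_ms()
fetch = total - MODEL_INFERENCE_MS
print(f"fetch={fetch:.1f}ms, total={total:.1f}ms "
      f"({fetch / total:.0%} of latency is feature fetching)")
```

Even with perfectly parallel fetches, the slowest lookup (4.1ms here) still dominates the 1.5ms of model time; serialization only makes it worse.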

Typical fraud scoring latency breakdown (no cache)

    User Embedding        3.2ms
    Merchant Embedding    2.4ms
    Device Fingerprint    1.8ms
    Velocity Features     2.7ms
    Graph Features        4.1ms
    ML Inference          1.5ms
    ---------------------------
    Total                15.7ms

Look at those numbers. The model itself — the actual fraud scoring computation — takes 1.5ms. Feature fetching takes 14.2ms. That is 90% of the total latency spent just assembling the input. The model is not the bottleneck. The data pipeline feeding it is.

The $10M Math

Stripe processes roughly 10 billion transactions per year. Each transaction requires fraud scoring, which means 10 billion feature-fetch cycles. At 10 feature lookups per transaction and an average network round-trip of 2ms per lookup, that is 200 billion milliseconds — or 200 million seconds — of compute time spent purely on feature fetching. At typical cloud compute costs of $0.05 per vCPU-second, the direct compute cost of feature lookups alone approaches $10 million per year. This does not account for the infrastructure cost of the feature stores themselves, the network bandwidth, or the opportunity cost of slower checkout times increasing cart abandonment.
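The arithmetic above is easy to reproduce. All inputs below are this post's stated assumptions (transaction volume, lookups per transaction, round-trip time, and the assumed compute rate), not measured figures.

```python
# Back-of-envelope reproduction of the $10M figure, using the
# article's stated assumptions. Every value here is an assumption.
transactions_per_year = 10_000_000_000   # ~10B Stripe transactions
lookups_per_txn = 10                     # features fetched per score
round_trip_ms = 2                        # avg network round-trip
cost_per_vcpu_second = 0.05              # assumed compute rate

total_ms = transactions_per_year * lookups_per_txn * round_trip_ms
total_seconds = total_ms / 1000          # 200 million seconds
annual_cost = total_seconds * cost_per_vcpu_second
print(f"{total_seconds:,.0f}s of fetch time -> ${annual_cost:,.0f}/year")
```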

PayPal, Block (Cash App), Mastercard, and Visa face identical economics. Mastercard’s Decision Intelligence product scores over 125 billion transactions annually. Visa’s Advanced Authorization processes 65,000 transactions per second at peak. Every one of these systems has the same architectural bottleneck: the model is fast, but the feature pipeline is slow.

90% Latency from features
1.5ms Model inference
14.2ms Feature fetching
$10M+ Annual compute waste

L1 Caching Eliminates the Bottleneck

The key insight is that fraud features exhibit extreme temporal locality. A user making three purchases in an hour will trigger three fraud checks, and all three will need the same user embedding. A popular merchant like Amazon or Walmart will have its merchant embedding requested thousands of times per second. Device fingerprints repeat across sessions. The hot set of features — the subset that serves 95% of lookups — is surprisingly small relative to the total feature corpus.

An L1 in-process cache exploits this locality. Instead of making a network round-trip to the feature store for every lookup, the fraud scoring service maintains a local cache of hot features in its own process memory. A cache hit returns the feature in 1.5 microseconds — not 1.5 milliseconds, microseconds. That is a 1,333x speedup over a typical 2ms network lookup. At 10 features per transaction, a fully L1-cached score completes feature assembly in 15 microseconds total instead of 14.2 milliseconds. The feature store becomes the L2 fallback for cold features.
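The hit/miss structure is the classic cache-aside pattern. The sketch below is minimal and hypothetical — `slow_store_fetch` stands in for whatever L2 feature store (Feast, Tecton, an internal system) the service actually calls, and a production cache would add TTLs, eviction, and metrics.

```python
import time
from typing import Any, Callable

class L1FeatureCache:
    """Minimal in-process cache sketch: dict hit path, feature-store
    fallback on miss. Real systems add TTLs, eviction, and metrics."""

    def __init__(self, fetch_from_store: Callable[[str], Any]):
        self._cache: dict[str, Any] = {}
        self._fetch = fetch_from_store   # L2: network feature store

    def get(self, key: str) -> Any:
        value = self._cache.get(key)     # hit: in-process, microseconds
        if value is None:
            value = self._fetch(key)     # miss: network, milliseconds
            self._cache[key] = value
        return value

# Hypothetical L2 fetch standing in for Feast/Tecton/an internal store.
def slow_store_fetch(key: str) -> list[float]:
    time.sleep(0.002)                    # simulate a 2ms round-trip
    return [0.0] * 128                   # placeholder embedding

cache = L1FeatureCache(slow_store_fetch)
cache.get("user:42")                     # cold: pays the network trip
t0 = time.perf_counter()
cache.get("user:42")                     # warm: in-process dict lookup
hit_us = (time.perf_counter() - t0) * 1e6
print(f"L1 hit in ~{hit_us:.1f}µs")
```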

With L1 feature caching (95% hit rate; hit path shown)

    User Embedding (L1)       0.0015ms
    Merchant Embedding (L1)   0.0015ms
    Device Fingerprint (L1)   0.0015ms
    Velocity Features (L1)    0.0015ms
    Graph Features (L1)       0.0015ms
    ML Inference              1.5ms
    ----------------------------------
    Total                     1.5075ms

Total fraud scoring latency drops from 15.7ms to 1.5ms. The feature fetch time becomes invisible. The ML model is now the bottleneck again — which is exactly where you want it. Your data scientists can focus on model accuracy instead of fighting infrastructure latency.

Pre-Warming by Popularity

The architecture for fraud feature caching follows a tiered model: L1 (in-process) holds the hottest features, L2 (feature store) holds the full corpus. The critical engineering decision is the pre-warming strategy. Cachee’s predictive warming layer learns access patterns and pre-loads features before they are requested. For fraud scoring, this means pre-warming merchant embeddings for the top 10,000 merchants by transaction volume (which cover 80%+ of all transactions), pre-loading user embeddings for active users based on time-of-day patterns, and maintaining device fingerprint caches that refresh on session boundaries.
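A pre-warming pass is structurally simple: walk a popularity-ranked list and load each entry into L1 before traffic arrives. The sketch below is hypothetical — the merchant IDs, the ranked list, and `fetch_embedding` are placeholders for whatever ranking job and feature store a real deployment uses.

```python
from typing import Callable

# Hypothetical pre-warming loop: load the top-N merchant embeddings
# into L1 before the service takes traffic.
def prewarm(cache: dict,
            top_merchants: list[str],
            fetch_embedding: Callable[[str], list[float]]) -> None:
    for merchant_id in top_merchants:
        cache[merchant_id] = fetch_embedding(merchant_id)

l1: dict = {}
prewarm(l1,
        ["merchant:amzn", "merchant:wmt"],      # placeholder IDs
        lambda m: [0.1] * 512)                  # placeholder embedding
print(len(l1), "embeddings pre-loaded")
```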

Pre-warming math: The top 10,000 merchants by volume cover 80%+ of all Stripe transactions. At 512 dimensions × 4 bytes per float, each embedding is 2KB. 10,000 merchant embeddings = 20MB in L1. The top 1M active users add another 2GB. Total L1 footprint under 3GB — trivial on modern servers — covering 95%+ of all feature lookups.
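The footprint math above checks out directly from the stated assumptions (512 dimensions, 4-byte floats, 10,000 merchants, 1M active users):

```python
# Sizing the L1 footprint from the article's stated assumptions.
dims = 512
bytes_per_float = 4
embedding_bytes = dims * bytes_per_float            # 2,048 B ≈ 2KB

merchants = 10_000
active_users = 1_000_000
merchant_mb = merchants * embedding_bytes / 1e6     # ~20 MB
user_gb = active_users * embedding_bytes / 1e9      # ~2 GB
print(f"merchants: {merchant_mb:.1f} MB, users: {user_gb:.2f} GB")
```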

Velocity features require special handling because they change with every transaction. But the update pattern is predictable: a user’s 1-minute velocity counter increments by 1 per transaction. Rather than re-fetching from the feature store, the L1 cache can apply the delta locally and write-back asynchronously. This eliminates the most latency-sensitive feature lookup entirely.
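The local-delta-plus-async-write-back idea can be sketched as follows. This is a simplified, hypothetical counter: a real system would shard counters, expire sliding windows, and flush on a timer rather than on demand.

```python
from collections import defaultdict

class VelocityCounter:
    """Sketch of locally applied velocity deltas with deferred
    write-back. Real systems shard, window, and flush on a timer."""

    def __init__(self):
        self._counts = defaultdict(int)   # L1 copy of per-user counters
        self._dirty: set = set()          # keys pending write-back

    def record_transaction(self, user_id: str) -> int:
        self._counts[user_id] += 1        # apply the delta locally
        self._dirty.add(user_id)          # mark for async write-back
        return self._counts[user_id]      # no feature-store fetch

    def flush(self, store: dict) -> None:
        """Write dirty counters back to the store (a dict stands in)."""
        for key in self._dirty:
            store[key] = self._counts[key]
        self._dirty.clear()

vc = VelocityCounter()
vc.record_transaction("user:42")
vc.record_transaction("user:42")
store: dict = {}
vc.flush(store)
print(store)                              # {'user:42': 2}
```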

The Broader Impact on Fraud Detection

Faster feature assembly does not just save money — it improves fraud detection accuracy. When feature lookups take 15ms, engineering teams are forced to limit the number of features per model to stay within latency budgets. At 1.5ms total, you can feed the model 50 features instead of 10. More features mean better model accuracy, lower false positive rates, and fewer legitimate transactions incorrectly declined. Visa reports that each 1% improvement in false positive rates saves merchants $1.2 billion annually in lost sales. The latency reduction enables accuracy improvements that dwarf the direct compute savings.

For companies building real-time fraud detection systems, the feature lookup bottleneck is not a theoretical concern — it is the single largest drag on both latency and cost. L1 feature caching eliminates it. The model becomes the bottleneck. The feature pipeline becomes invisible. And the $10M+ in annual compute waste becomes available for actual improvements to fraud detection accuracy.
