Stripe Radar evaluates fraud risk on every transaction that flows through Stripe’s payment infrastructure. Each evaluation checks multiple signals: user embeddings, merchant embeddings, device fingerprints, velocity features, geographic features, and graph features. But here is the constraint that most people outside Stripe do not appreciate: Radar is currently limited to checking 3–5 signals per transaction because each feature fetch from the feature store takes 2–5ms, and the authorization window is approximately 100ms. L1 caching at 1.5µs per feature changes this from 3–5 signals to 30 signals in 45 microseconds. That is not just faster; it is fundamentally better fraud detection.
The Signal Ceiling Problem
Stripe’s fraud detection team has some of the best ML researchers in the industry. Their models are sophisticated. Their training data is massive. But the deployed model is constrained by the real-time feature pipeline, not by model quality. When a transaction hits Radar, the system has roughly 100ms to return a fraud score before the authorization timeout. The ML model inference itself takes 1–2ms, leaving roughly 98ms for feature collection. On paper that sounds generous, but each fetch costs 2–5ms (network round-trip to a feature store, serialization, deserialization), tail latency pushes individual fetches to 5–8ms at p99, and the budget must also absorb retries and headroom for downstream steps. In practice, engineering teams budget conservatively and limit Radar to 3–5 critical signals to maintain p99 latency targets.
This creates an artificial ceiling on fraud detection quality. Stripe’s data scientists know that adding graph features, second-order velocity aggregates, cross-merchant pattern features, and behavioral biometrics would improve the model. The offline evaluation shows it. But the real-time pipeline cannot support it. Every additional signal adds 2–5ms of latency. The model is bottlenecked by the speed of data, not the quality of the algorithm.
The Architecture: 5 Signals in 5–8ms vs 30 Signals in 45µs
Today, a Stripe Radar evaluation looks like this: the transaction arrives, and the fraud scoring service makes parallel requests to the feature store for the user’s embedding, the merchant’s risk profile, the device fingerprint, a velocity counter, and a geographic risk score. Even with parallel fetches, the slowest request determines the total latency. At p99, a single feature fetch can take 5–8ms due to tail latency in the feature store, and because the fan-out waits for the maximum of five draws from that distribution, the feature assembly step’s p99 is at least as bad as any single fetch’s: approximately 5–8ms. That sounds manageable, until you realize it leaves zero room for additional signals.
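The tail amplification in a fan-out is worth making concrete: if each fetch independently meets its p99 target 99% of the time, all five meet it only about 95% of the time, so the combined p99 sits further out in the tail. A quick simulation with a toy latency distribution (illustrative only, not Stripe's measured numbers):

```python
import random

# Toy model of a feature-store fetch: ~2ms floor plus an exponential tail.
# Illustrative only; real feature-store latency is not exponential.
def fetch_latency_ms(rng: random.Random) -> float:
    return 2.0 + rng.expovariate(1.0)

def percentile(samples: list[float], p: float) -> float:
    ordered = sorted(samples)
    return ordered[int(p * (len(ordered) - 1))]

rng = random.Random(0)
single = [fetch_latency_ms(rng) for _ in range(100_000)]
# A 5-way parallel fan-out completes when its slowest fetch completes.
fanout = [max(fetch_latency_ms(rng) for _ in range(5)) for _ in range(100_000)]

# Analytically: five independent fetches all beat the single-fetch p99
# only 0.99**5 ~ 95.1% of the time.
print(f"P(all 5 under the single-fetch p99): {0.99 ** 5:.3f}")
print(f"single fetch p99:  {percentile(single, 0.99):.2f} ms")
print(f"5-way fan-out p99: {percentile(fanout, 0.99):.2f} ms")
```

The simulated fan-out p99 comes out noticeably above the single-fetch p99, which is why teams budget for the 5–8ms figure rather than the typical-case latency.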
With an L1 in-process cache, each feature lookup completes in 1.5 microseconds. Thirty features in serial take 45 microseconds. Thirty features. Not five. The latency budget consumed by feature assembly drops from 5–8ms to 0.045ms, leaving more than 99% of the authorization window for additional model complexity, ensemble methods, or simply a reliability margin.
- Current: 5 signals, network feature store
- With L1: 30 signals, in-process cache
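To ground the microsecond figure, here is a minimal sketch of what "in-process" means: the hot features sit in a plain dictionary inside the scoring process, so a lookup is a hash probe with no network, serialization, or syscall on the path. A bare CPython dict probe lands well under 1.5µs; a production cache with TTLs and staleness checks costs more, which is roughly where a 1.5µs per-feature budget comes from. All keys and values below are hypothetical:

```python
import time

# Hypothetical in-process L1: hot features keyed by (entity, feature name).
l1_cache = {
    ("card_4242", "velocity_1m"): 3,
    ("card_4242", "amount_velocity_5m"): 187.50,
    ("merchant_981", "risk_score"): 0.12,
}

# Time a large batch of lookups to amortize timer overhead.
keys = list(l1_cache) * 100_000
start = time.perf_counter_ns()
for key in keys:
    l1_cache[key]
elapsed_ns = time.perf_counter_ns() - start

per_lookup_ns = elapsed_ns / len(keys)
print(f"~{per_lookup_ns:.0f} ns per lookup")
```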
More Signals = Better Detection = Fewer False Positives
The relationship between signal count and fraud detection quality is well-established in the literature and in Stripe’s own published research. Each additional signal provides an incremental improvement in the model’s ability to distinguish legitimate transactions from fraudulent ones. The first 5 signals capture the most obvious fraud patterns. Signals 6–15 catch the sophisticated fraud that slips through basic checks. Signals 16–30 are where the real value emerges: cross-merchant velocity, graph-based features (shared devices, linked accounts, transaction chains), behavioral biometrics, and second-order risk aggregates.
| Signal Count | Feature Assembly Latency | False Positive Rate | Improvement |
|---|---|---|---|
| 5 signals (current) | ~3.2ms (parallel, typical) | ~2.5% | Baseline |
| 15 signals (L1, serial) | 0.0225ms | ~1.8% | -28% FP rate |
| 30 signals (L1, serial) | 0.045ms | ~1.2% | -52% FP rate |
False positives are the silent killer of payment revenue. Every legitimate transaction declined by Radar is lost revenue for the merchant and a damaged customer relationship. At Stripe’s volume of hundreds of millions of transactions per month, a false positive rate reduction from 2.5% to 1.2% means tens of millions of additional legitimate transactions approved each year. At an average transaction value of $65, a 1.3 percentage point reduction in false positives across that volume translates into billions of dollars in additional approved revenue for merchants annually.
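The revenue arithmetic in rough form. The monthly volume and average ticket below are illustrative assumptions, not Stripe disclosures:

```python
MONTHLY_TXNS = 200_000_000   # assumed: "hundreds of millions" per month
AVG_TXN_VALUE = 65.00        # assumed average transaction value
FP_BEFORE = 0.025            # ~2.5% false positive rate at 5 signals
FP_AFTER = 0.012             # ~1.2% false positive rate at 30 signals

# Legitimate transactions no longer declined, and the revenue they carry.
recovered_txns_per_year = MONTHLY_TXNS * 12 * (FP_BEFORE - FP_AFTER)
recovered_revenue = recovered_txns_per_year * AVG_TXN_VALUE
print(f"legitimate transactions recovered per year: {recovered_txns_per_year:,.0f}")
print(f"additional approved revenue per year: ${recovered_revenue:,.0f}")
```

With these inputs the recovered volume is about 31 million transactions and roughly $2B per year; scale the assumptions to taste.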
What 30 Signals Looks Like
With L1 caching, Stripe Radar could evaluate all of the following on every single transaction in under 50 microseconds:

- Identity and device: user embedding (transaction history encoding), merchant risk embedding, device fingerprint hash, IP reputation score, BIN risk classification, email domain risk, phone number risk, shipping/billing address mismatch
- Velocity: 1-minute transaction velocity, 5-minute amount velocity, 15-minute geographic velocity, 1-hour unique merchant count, cross-merchant velocity (same card, different merchants), refund velocity, click-to-purchase velocity
- Graph: graph clustering coefficient, shared-device linkage count, cross-network fraud ring membership
- History and account: payment method age, account age risk score, prior chargeback count, 3D Secure history, AVS match quality, historical decline rate, merchant category risk for user
- Behavioral and contextual: transaction amount deviation from user mean, time-of-day anomaly score, session duration anomaly, cart composition risk, promotion abuse score
Every one of these signals exists in Stripe’s data. The models that use them perform better in offline evaluation. The only reason they are not all in the real-time pipeline is latency. L1 caching removes that constraint. The feature store becomes an L2 fallback for cold features, while the hot features — the ones that matter for 95%+ of transactions — live in-process at 1.5 microseconds per lookup.
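The L1-over-L2 read path described above is the classic cache-aside pattern. A sketch with hypothetical names; the feature store client and the TTL policy are placeholders:

```python
import time
from typing import Any, Optional

class L1FeatureCache:
    """In-process cache with TTL; falls back to a feature store (L2) on miss."""

    def __init__(self, feature_store, ttl_seconds: float = 30.0):
        self._store = feature_store      # L2: network feature store client
        self._ttl = ttl_seconds
        self._entries: dict[tuple, tuple[Any, float]] = {}

    def get(self, entity: str, feature: str) -> Optional[Any]:
        key = (entity, feature)
        hit = self._entries.get(key)
        if hit is not None:
            value, expires_at = hit
            if time.monotonic() < expires_at:
                return value            # hot path: in-process, microseconds
        value = self._store.fetch(entity, feature)  # cold path: milliseconds
        self._entries[key] = (value, time.monotonic() + self._ttl)
        return value

# Stub L2 for demonstration; counts how often the network path is taken.
class StubFeatureStore:
    def __init__(self):
        self.fetches = 0
    def fetch(self, entity, feature):
        self.fetches += 1
        return 0.42

store = StubFeatureStore()
cache = L1FeatureCache(store, ttl_seconds=30.0)
cache.get("card_4242", "velocity_1m")   # miss -> one L2 fetch
cache.get("card_4242", "velocity_1m")   # hit  -> served in-process
print(store.fetches)  # 1
```

The TTL bounds staleness: velocity counters might tolerate seconds, while slowly-changing embeddings can live in L1 for minutes.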
The Infrastructure Savings
Beyond detection quality, there is a direct infrastructure cost reduction. When feature assembly takes 5ms at p99, Stripe needs enough fraud scoring servers to handle peak transaction volume with headroom for that latency. When feature assembly takes 0.045ms, each server can process transactions faster, which means fewer servers for the same throughput. At Stripe’s scale, reducing fraud scoring latency by 3–5ms per transaction translates into millions of dollars in annual infrastructure savings — on top of the merchant revenue recovered from fewer false positives.
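A back-of-envelope way to see the server-count effect is Little's law: average requests in flight equal arrival rate times latency. Every figure below, including the per-server concurrency, is an illustrative assumption:

```python
import math

def servers_needed(txn_per_sec: float, service_time_ms: float,
                   concurrency_per_server: int = 100) -> int:
    # Little's law: L = lambda * W (requests in flight = rate x latency).
    in_flight = txn_per_sec * (service_time_ms / 1000.0)
    return math.ceil(in_flight / concurrency_per_server)

PEAK_TPS = 50_000  # assumed peak transaction rate

before = servers_needed(PEAK_TPS, service_time_ms=7.0)   # ~2ms model + ~5ms features
after = servers_needed(PEAK_TPS, service_time_ms=2.1)    # ~2ms model + 45µs features
print(before, after)
```

Cutting service time from ~7ms to ~2ms cuts the in-flight request count, and therefore the fleet size, by more than half under these assumptions.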
For Stripe, the question is not whether to speed up feature lookups. The question is how much better Radar could be if feature latency were not the constraint. With L1 caching, the answer is: 10x more signals, 52% fewer false positives, and sub-2ms total fraud scoring. The model becomes the bottleneck again, which is exactly where it should be.