There are 21 billion connected devices generating telemetry right now. Alongside them, the $82 billion A2P messaging industry pushes billions of SMS and RCS messages per day. These two industries look different on the surface, but underneath they share an identical bottleneck: state lookups at scale. Every device shadow read, every opt-out check, every carrier route decision is a cache hit or miss that determines whether the system operates in real time or falls behind.
Cachee is an AI-powered L1 caching layer that resolves these state lookups in 1.5 microseconds — roughly 2,000x faster than a typical Redis or cloud cache round-trip. For IoT platforms managing millions of device twins and messaging providers routing billions of daily messages, that speed difference is the gap between real-time decisioning and stale-data liability.
The State Lookup Tax in IoT
Every IoT platform maintains device shadows — server-side representations of each device's last known state. Firmware version, connectivity status, sensor values, desired configuration, last heartbeat timestamp. When a device sends telemetry or a control plane issues a command, the platform reads that shadow to decide what to do next. At 1 million devices averaging 10 reads per second, that is 10 million state lookups per second. At the 3–5ms latency of a typical Redis or DynamoDB read, those lookups accumulate 30,000–50,000 seconds of read-wait time every wall-clock second. The platform is not processing data. It is waiting for data.
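The figures above follow directly from the stated fleet size, read rate, and latency; this snippet just reruns the arithmetic.

```python
# Back-of-envelope check of the state-lookup load described above.
devices = 1_000_000              # fleet size
reads_per_device = 10            # average shadow reads per device per second
lookups_per_sec = devices * reads_per_device   # 10,000,000 lookups/s

# At 3-5 ms per read, cumulative wait time accrued every wall-clock second:
for latency_ms in (3, 5):
    wait_seconds = lookups_per_sec * latency_ms / 1000
    print(f"{latency_ms} ms reads -> {wait_seconds:,.0f} s of waiting per second")
```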
The consequences are architectural. Engineers add read replicas. They shard by device ID. They build local caches with hand-rolled invalidation logic that breaks under partition. Each layer of workaround adds operational complexity and still does not solve the core problem: cloud cache reads are milliseconds away, and real-time IoT decisioning needs microseconds.
The Compliance Problem in A2P Messaging
The A2P messaging side has its own version of the same problem, but with a legal consequence attached. Every outbound SMS requires between 6 and 12 state lookups before it can be sent: opt-out status, DNC registry check, campaign compliance verification, carrier routing decision, throughput throttle state, sender reputation score, content filter evaluation, and delivery confirmation write-back.
At a target throughput of 1,000 messages per second on a single server, those 6–12 lookups at 1–3ms each add 6–36ms of read latency to every message, more than consuming the available processing window. The server spends more time reading compliance state than actually sending messages. Worse, stale opt-out data is not just a performance issue. Under the TCPA, sending a single message to a number that has opted out carries a penalty of $500 to $1,500 per violation. At messaging scale, even a 50ms delay in propagating an opt-out across your cache layer creates real legal exposure.
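The per-message overhead quoted above can be verified in a few lines, using the lookup counts and latencies from this section.

```python
# Per-message compliance overhead at traditional cache latency.
checks_low, checks_high = 6, 12       # state lookups per outbound SMS
lat_low_ms, lat_high_ms = 1, 3        # per-lookup latency in milliseconds

best = checks_low * lat_low_ms        # 6 ms of read latency per message
worst = checks_high * lat_high_ms     # 36 ms of read latency per message
print(f"read overhead per message: {best}-{worst} ms")

# Sustaining 1,000 msg/s accrues this much cumulative wait per wall-second:
mps = 1_000
print(f"cumulative wait at {mps} msg/s: {best * mps / 1000}-{worst * mps / 1000} s")
```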
How Cachee Solves Both
Cachee deploys as an L1 caching tier between your application and your existing cache or database. It intercepts read requests and serves hot data from in-process memory in 1.5 microseconds. Cold or evicted keys cascade transparently to your existing Redis, DynamoDB, or Memcached backend. There is no data model change and no application rewrite.
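Cachee's client API is not shown in this post, so the class and method names below are hypothetical; the sketch only illustrates the read-through pattern the paragraph describes: serve hits from process memory, fall through to the existing backend on a miss.

```python
import time

class L1ReadThrough:
    """Illustrative in-process L1 tier with TTL, cascading to a slower backend.
    Names are hypothetical stand-ins, not Cachee's actual API."""

    def __init__(self, backend_get, ttl_seconds=5.0):
        self._store = {}                  # key -> (value, expiry timestamp)
        self._backend_get = backend_get   # e.g. a Redis or DynamoDB read
        self._ttl = ttl_seconds

    def get(self, key):
        hit = self._store.get(key)
        if hit is not None and hit[1] > time.monotonic():
            return hit[0]                      # served from process memory
        value = self._backend_get(key)         # cold or evicted: cascade down
        self._store[key] = (value, time.monotonic() + self._ttl)
        return value
```

In a real deployment, `backend_get` would wrap your existing Redis, DynamoDB, or Memcached client, and invalidation would be push-based rather than TTL-based; the point is that the application code calling `get` does not change.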
For IoT: Predictive Device Shadow Warming
Cachee's AI prediction engine learns device access patterns — which devices report telemetry on regular intervals, which are bursty, and which cluster into correlated groups (all sensors on the same factory floor, all vehicles on the same route). It pre-warms device shadows into L1 before the read request arrives. The result is a 99.05% L1 hit rate that holds even during fleet-wide firmware rollouts and mass reconnection events that would normally crush a traditional cache tier.
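The prediction engine itself is proprietary and not described in this post, but the pre-warming idea can be sketched with a toy interval predictor (all names hypothetical): devices with a regular reporting cadence are flagged for warming just before their next expected read.

```python
class IntervalPredictor:
    """Toy stand-in for an access-pattern predictor: tracks each device's
    inter-arrival gap with an EWMA and flags shadows for pre-warming shortly
    before the next expected report. Entirely illustrative."""

    def __init__(self):
        self._last_seen = {}    # device_id -> timestamp of last read
        self._interval = {}     # device_id -> smoothed inter-arrival gap (s)

    def observe(self, device_id, now):
        last = self._last_seen.get(device_id)
        if last is not None:
            gap = now - last
            prev = self._interval.get(device_id, gap)
            self._interval[device_id] = 0.8 * prev + 0.2 * gap  # EWMA smoothing
        self._last_seen[device_id] = now

    def due_for_warming(self, device_id, now, margin=0.1):
        """True when the next read is predicted within `margin` seconds."""
        last = self._last_seen.get(device_id)
        gap = self._interval.get(device_id)
        if last is None or gap is None:
            return False
        return now >= last + gap - margin
```

A real engine would also model burstiness and correlated groups (the factory-floor and vehicle-route clusters mentioned above), which a per-device EWMA cannot capture.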
For platforms running at scale, this changes the infrastructure equation. A workload that previously required 50 Redis read replicas to stay under 5ms p99 can consolidate to a fraction of that, because 99% of reads never leave the application process. The 80/20 rule applies aggressively to IoT — 20% of devices generate 80% of the events. Cachee keeps that hot 20% permanently in L1 and predicts when cold devices are about to wake up.
For Messaging: Real-Time Compliance State
In the messaging pipeline, Cachee eliminates the compliance bottleneck by serving opt-out status, DNC registry lookups, and carrier routing state in 1.5µs per check. A message that previously required 24ms of state read overhead (8 checks at 3ms each) now completes all 8 checks in 12 microseconds. Per-server throughput jumps from 100 messages per second (MPS) to over 3,100 MPS, a 31x improvement on the same hardware.
The AI pre-warming layer is particularly valuable for opt-out compliance. When a consumer texts STOP, the opt-out record is written to the database and simultaneously pushed into Cachee's L1 tier. Every subsequent routing decision for that phone number sees the opt-out status in 1.5µs with zero propagation lag. There is no window where a stale cache could cause a violation.
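The handler below is a sketch of that flow with hypothetical `db` and `l1_cache` interfaces (Cachee's client API is not shown in this post). The part that matters is the ordering: write the durable record first, then push into L1 immediately, so no routing decision has to wait for a cache miss to see the opt-out.

```python
def handle_stop(phone_number, db, l1_cache):
    """Illustrative STOP handler: persist the opt-out, then push it into the
    L1 tier. `db` and `l1_cache` are hypothetical stand-ins for your
    datastore and the Cachee client."""
    db.write_opt_out(phone_number)                 # durable record first
    l1_cache.put(f"optout:{phone_number}", True)   # push; don't wait for a miss

def may_send(phone_number, l1_cache):
    # Routing-time check: an in-process L1 read, not a network round-trip.
    return not l1_cache.get(f"optout:{phone_number}", default=False)
```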
Edge-to-Cloud: Where It Gets Interesting
IoT architectures increasingly split processing between edge gateways and cloud backends. A factory floor gateway handles local control loops while syncing to the cloud for analytics and fleet management. The challenge is state consistency: the edge needs device state at microsecond latency for real-time control, while the cloud needs aggregated state for dashboards and alerting.
Cachee runs at both tiers. At the edge, it serves local device shadows from L1 memory in 1.5µs for real-time control loop reads. In the cloud, it absorbs the aggregation read load that would otherwise hammer your database. The AI prediction engine at each tier learns its own local access pattern: the edge layer knows which devices on its floor are about to report, and the cloud layer knows which dashboards are being watched and pre-warms their underlying queries.
This dual-tier architecture enables what was previously impractical: true closed-loop control where the edge reads state, runs inference, and issues commands within a single millisecond — while the cloud maintains full visibility without the read load that traditionally comes with it.
The Infrastructure Savings
The cost math is direct. For a messaging platform sending 1 billion messages per day with 8 state lookups per message at 3ms each, the current infrastructure requires enough servers to absorb 24ms of read overhead per message. At 100 MPS per server, that is roughly 115 servers running 24/7 just for compliance state checks. With Cachee reducing the overhead to 12µs per message, the same throughput runs on 4 servers. That is a 96% infrastructure reduction, before accounting for the Redis read replicas that can be consolidated once 99% of reads serve from L1.
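The server counts follow directly from the stated rates; this snippet reruns the arithmetic end to end.

```python
# Infrastructure math from this section, verified.
messages_per_day = 1_000_000_000
avg_mps = messages_per_day / 86_400      # ~11,574 messages per second

servers_before = avg_mps / 100           # at 100 MPS per server  -> ~116
servers_after = avg_mps / 3_100          # at 3,100 MPS per server -> ~4

print(f"before: {servers_before:.0f} servers, after: {servers_after:.1f} servers")
print(f"reduction: {1 - servers_after / servers_before:.1%}")
```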
For IoT platforms, the savings compound with device count. Every device added to the fleet adds 10+ reads per second to the cache layer. At Redis latency, scaling requires proportionally more cache nodes. With Cachee absorbing hot reads in-process, the cache tier scales sub-linearly. Add devices, and the L1 hit rate stays at 99% because the AI prediction engine adapts to the new access patterns. The cache infrastructure that used to grow with the fleet now grows with the cold fraction of the fleet — which, thanks to the 80/20 rule, is a much smaller number.
IoT and messaging are converging. RCS is replacing SMS. Devices are becoming conversational. The platform that handles both needs state lookups that are fast enough for real-time control and compliant enough for TCPA. At 1.5 microseconds, Cachee delivers both.
Ready to Eliminate the State Lookup Tax?
See how Cachee's 1.5µs reads transform IoT and messaging infrastructure economics.
Explore IoT Solutions | Start Free Trial