Caching IoT Sensor Streams: 10,000 Devices at 1KB/sec
A factory floor with 10,000 sensors. Each sensor produces a 1KB JSON payload once per second. Temperature readings, vibration amplitudes, pressure gauges, humidity levels -- the stream never stops. The cache sitting between those sensors and the application layer has two jobs: serve the latest value for any device instantly, and hold a 60-second sliding window of readings for anomaly detection. Both jobs require sub-millisecond reads under continuous write pressure.
This is not a hypothetical workload. It is the baseline for any serious industrial IoT deployment, any fleet management system with GPS trackers, any smart building with thousands of environmental sensors. The numbers scale linearly: 50,000 devices is the same problem at 5x, and 100,000 devices is the same problem at 10x. If your caching architecture cannot handle 10,000 devices at 1KB/sec, it cannot handle the next order of magnitude either.
This post walks through the specific caching patterns required for IoT sensor streams, the math that makes Redis fail at this workload, and why in-process caching turns a 3.5-second-per-second write bottleneck into a 5-millisecond afterthought.
The Write Pressure Problem
Start with the arithmetic. 10,000 devices, each producing one reading per second, each reading approximately 1KB. That is 10,000 SET operations per second. Each SET carries a 1KB payload. Total ingest: 10MB per second, continuously, 24 hours a day, 365 days a year.
Redis can handle 10,000 SETs per second on a single instance. That is well within its documented throughput ceiling. But throughput is not the problem. Latency is the problem.
Every Redis SET follows the same path: the application serializes the value into a byte buffer, opens a TCP connection (or reuses a pooled one), sends the RESP-encoded command across the network, Redis receives and parses the command, writes the value to its hash table, sends the acknowledgment back, and the client deserializes the response. For a 1KB value in the same availability zone, this round-trip costs approximately 0.35 milliseconds.
At 10,000 operations per second, the cumulative write latency is 10,000 x 0.35ms = 3,500 milliseconds. Your write pipeline is spending 3.5 seconds of wall-clock latency budget per second of real time. Even with connection pooling and 10 parallel writers, each writer is consuming 350ms of every second on cache writes alone. That is 35% of each writer's time budget spent on serialization, TCP, and deserialization -- not on processing sensor data, not on anomaly detection, not on anything useful.
In-process caching eliminates the entire TCP round-trip. A 1KB SET to an in-process hash map takes approximately 548 nanoseconds: the hash computation, the key comparison, and the value copy into local memory. No serialization. No TCP. No deserialization. At 10,000 operations per second, the cumulative write time is 10,000 x 548ns = 5.48 milliseconds.
That is a 638x reduction in overhead. The in-process write pipeline finishes all 10,000 SETs in 5.48 milliseconds, leaving 994.52 milliseconds of every second available for actual work. The Redis pipeline consumes 3.5x more time than exists in the time window. It must be parallelized just to keep up, which adds connection management complexity, retry logic, and back-pressure handling -- none of which solves the fundamental problem that every byte is traveling across a network it does not need to cross.
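The budget arithmetic above can be reproduced in a few lines. This is a sketch using the post's measured constants -- 0.35ms per networked Redis SET, 548ns per in-process SET -- which are assumptions that will vary with hardware, network, and payload size:

```python
# Back-of-envelope write budget for 10,000 devices at 1 reading/sec.
# The per-operation costs are the post's measurements, not universal truths.
DEVICES = 10_000
REDIS_SET_S = 0.35e-3   # seconds per 1KB SET over the network (same AZ)
INPROC_SET_S = 548e-9   # seconds per 1KB SET into an in-process hash map

redis_budget = DEVICES * REDIS_SET_S    # cumulative latency per wall second
inproc_budget = DEVICES * INPROC_SET_S

print(f"Redis:      {redis_budget:.2f} s of write latency per second")
print(f"In-process: {inproc_budget * 1e3:.2f} ms of write time per second")
print(f"Reduction:  {redis_budget / inproc_budget:.0f}x")
```

The point the numbers make: the Redis budget exceeds the one-second window itself, so the pipeline cannot keep up without parallelism, while the in-process budget is a rounding error.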
Pattern 1: Latest-Value Store
The first and most common IoT cache pattern is the latest-value store. Every device has a key like device:{id}:latest. The value is the most recent reading -- a JSON blob between 0.5KB and 2KB containing the sensor measurement, a timestamp, and device metadata. The TTL is 30 seconds. If the TTL expires, the device is considered offline.
Dashboards and monitoring systems query the latest-value store constantly. An operations dashboard showing all 10,000 devices refreshes every 2-5 seconds. Each refresh reads 100 to 1,000 device values (paginated or filtered by zone, building, or sensor type). Alert systems poll specific device keys at higher frequency -- 10-50 times per second for critical sensors.
The read pattern for a dashboard refresh with Redis: 100-1,000 GET operations at 0.35ms each. That is 35 to 350 milliseconds per dashboard refresh, purely for cache reads. For a dashboard refreshing every 2 seconds, Redis consumes 35-350ms of each 2-second cycle -- up to 17.5% of the available time budget, just reading cached values.
In-process reads change the math entirely. The same 100-1,000 GETs at 31 nanoseconds each cost 3.1 to 31 microseconds. A dashboard refresh that takes 350ms from Redis takes 31 microseconds in-process. That is an 11,290x reduction in read latency for the dashboard query.
# Latest-value key structure
device:sensor-4821:latest = {
"device_id": "sensor-4821",
"type": "temperature",
"value": 72.4,
"unit": "fahrenheit",
"timestamp": "2026-04-18T14:23:01.445Z",
"zone": "building-3-floor-2",
"battery": 0.87
}
TTL: 30 seconds (stale = device offline)
The TTL behavior is particularly important for IoT. When a sensor goes silent -- battery dies, network partition, physical failure -- the cache entry expires after 30 seconds, and the dashboard immediately shows the device as offline. There is no stale data problem because the cache is the authoritative source for "what is this device reading right now." If the key does not exist, the device is not reporting. This pattern does not work if cache reads are slow enough to affect dashboard responsiveness, because operators need to see failures in real time.
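A minimal sketch of the latest-value pattern: a plain Python dict with lazy TTL expiry, where a missing or expired key means the device is offline. The class name and API here are illustrative, not Cachee's actual interface:

```python
import time

class LatestValueStore:
    """Toy in-process latest-value cache with per-entry TTL.
    A sketch of the pattern, not a production implementation."""

    def __init__(self, ttl_seconds: float = 30.0):
        self.ttl = ttl_seconds
        self._data = {}  # key -> (expiry_timestamp, reading)

    def set_latest(self, device_id: str, reading: dict) -> None:
        key = f"device:{device_id}:latest"
        self._data[key] = (time.monotonic() + self.ttl, reading)

    def get_latest(self, device_id: str):
        """Return the reading, or None if missing/expired (= device offline)."""
        key = f"device:{device_id}:latest"
        entry = self._data.get(key)
        if entry is None:
            return None
        expiry, reading = entry
        if time.monotonic() > expiry:
            del self._data[key]  # lazy expiry on read
            return None
        return reading

store = LatestValueStore(ttl_seconds=30.0)
store.set_latest("sensor-4821", {"type": "temperature", "value": 72.4})
print(store.get_latest("sensor-4821"))  # live reading
print(store.get_latest("sensor-9999"))  # None -> device offline
```

The lazy-expiry-on-read design mirrors the semantics the pattern needs: the dashboard never has to distinguish "never reported" from "stopped reporting" -- both read as None.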
Pattern 2: Sliding Window for Anomaly Detection
The second pattern is more demanding. Each device maintains a circular buffer of the last 60 readings -- one reading per second for the last minute. The key is device:{id}:window. The value is a serialized array of 60 readings, each approximately 1KB. Total value size: roughly 60KB per device. Across 10,000 devices, the sliding window store holds 600MB of data.
Anomaly detection queries this window constantly. For each incoming reading, the system retrieves the device's 60-second window, computes the mean and standard deviation, and checks whether the new reading falls outside 2 standard deviations. If it does, an alert fires. This means one read of the 60KB window per device per second -- 10,000 reads of 60KB values per second.
| Operation | Redis | In-Process | Difference |
|---|---|---|---|
| Single 60KB GET | 1.0ms | 31ns | 32,258x |
| 10K anomaly checks/sec | 10,000ms | 0.31ms | 32,258x |
| Write (append to window) | 1.2ms | 580ns | 2,069x |
| 10K window updates/sec | 12,000ms | 5.8ms | 2,069x |
The Redis column reveals the fundamental impossibility. Reading 10,000 60KB values per second from Redis requires 10 seconds of cumulative read time per second. Even with 20 parallel connections, each connection spends 500ms of every second on cache reads. Add the 12 seconds of write time for updating the windows, and the cache layer alone requires 22 seconds of work per second. You need at least 22 parallel connections to Redis just to keep up, and that is before accounting for P99 spikes, connection overhead, and the single-threaded serialization bottleneck inside Redis itself.
In-process, the entire anomaly detection pipeline -- 10,000 reads and 10,000 writes per second -- consumes 6.11 milliseconds. The remaining 993.89 milliseconds are available for the actual statistical computation, alerting logic, and database persistence.
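The anomaly-check loop described above can be sketched with a per-device circular buffer -- here `collections.deque` with `maxlen=60`, which drops the oldest reading automatically. The 10-reading warm-up threshold is an assumption added to keep the early standard deviation meaningful:

```python
from collections import deque
from statistics import mean, pstdev

WINDOW = 60      # readings kept per device (one per second)
MIN_POINTS = 10  # assumed warm-up before the stddev is trusted

windows = {}  # device_id -> deque of recent readings

def is_anomalous(device_id: str, value: float, sigmas: float = 2.0) -> bool:
    """Check value against the device's sliding window, then append it."""
    win = windows.setdefault(device_id, deque(maxlen=WINDOW))
    anomalous = False
    if len(win) >= MIN_POINTS:
        mu, sd = mean(win), pstdev(win)
        anomalous = sd > 0 and abs(value - mu) > sigmas * sd
    win.append(value)  # maxlen=WINDOW evicts the oldest reading
    return anomalous

for v in [70.0, 70.2] * 6:                # 12 in-range baseline readings
    assert not is_anomalous("sensor-4821", v)
print(is_anomalous("sensor-4821", 95.0))  # True: far outside 2 sigma
```

Storing raw floats in a deque rather than a 60KB serialized array is itself part of the in-process win: the "read the window" step is a pointer dereference, not a deserialization.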
The 600MB Question
Holding 600MB of sliding window data in-process is feasible on modern server hardware (64-256GB RAM is standard for IoT gateways and aggregation servers). On constrained edge hardware, you can reduce the window to 30 seconds (300MB), use half-precision floats (halving the value size), or shard the window across multiple processes. The key insight is that 600MB of in-process memory replaces 600MB of Redis memory plus the network overhead to access it -- you are not adding memory, you are relocating it from a separate process to the one that actually uses it.
Pattern 3: MQTT Bridge Caching
Most IoT deployments use MQTT as the transport protocol. Devices publish readings to topics like sensors/building-3/floor-2/temperature/sensor-4821. An MQTT broker (Mosquitto, HiveMQ, EMQX) receives these messages and fans them out to subscribers. The problem is burst traffic. When a fleet of sensors reconnects after a network partition, or when a new subscriber comes online and requests retained messages for 10,000 topics, the broker generates a burst that can overwhelm downstream consumers.
A cache layer between the MQTT broker and the application absorbs these bursts. The architecture is straightforward:
MQTT Broker
|
v
Cachee (in-process, CacheeLFU eviction)
|
+---> Application Logic (anomaly detection, dashboards)
|
+---> Database (TimescaleDB, InfluxDB for persistence)
The MQTT subscriber writes every incoming message to the in-process cache before processing it. The application reads from the cache, never directly from the MQTT broker. This decouples the consumption rate from the production rate. If the broker delivers 50,000 messages in a 1-second burst (reconnection storm), the cache absorbs them at 548ns each -- 27.4 milliseconds total for the burst. The application processes them at its own pace, reading from the cache at 31ns per GET.
Without the cache, the application must either process messages at the broker's delivery rate (risking dropped messages if it falls behind), or implement its own buffering logic (reinventing what a cache already does). With the cache, the application has a clean separation: the MQTT subscriber is a writer, the application is a reader, and the cache is the buffer between them. The subscriber never blocks on application processing. The application never blocks on broker delivery.
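A sketch of the burst-absorption behavior, with a plain dict standing in for the cache and the MQTT delivery simulated in a loop. A real subscriber (e.g. a paho-mqtt client) would perform the same cache write inside its message callback; everything here is illustrative:

```python
from collections import deque
import json

cache = {}         # topic -> latest payload (stands in for the in-process cache)
pending = deque()  # topics awaiting application processing

def on_mqtt_message(topic: str, payload: bytes) -> None:
    """Writer side: the subscriber's message callback. It only writes
    to the in-process cache -- it never blocks on application logic."""
    cache[topic] = payload
    pending.append(topic)

# Simulated reconnection storm: 50,000 retained messages in one burst,
# spread across 10,000 device topics.
for i in range(50_000):
    on_mqtt_message(f"sensors/b3/f2/temp/sensor-{i % 10_000}",
                    json.dumps({"value": 70.0 + i % 5}).encode())

# Reader side: the application drains the backlog at its own pace.
processed = 0
while pending:
    reading = json.loads(cache[pending.popleft()])
    processed += 1
print(processed)  # 50000
```

The writer and reader never call each other; the cache (plus a pending queue) is the only shared state, which is exactly the decoupling the paragraph above describes.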
This pattern also eliminates redundant MQTT subscriptions. If three different application components need the latest reading for sensor-4821 -- the dashboard, the anomaly detector, and the alerting engine -- only one MQTT subscription is required. All three read from the cache. Without caching, each component subscribes independently, and the broker delivers the same message three times. At 10,000 devices, that triples the broker's fan-out load from 10,000 messages/sec to 30,000 messages/sec.
Industrial IoT: Mixed Sensor Workloads
Real manufacturing deployments do not have 10,000 identical sensors. They have mixed workloads: high-frequency vibration sensors producing 1-10KB samples at up to 1KHz, medium-frequency temperature and pressure sensors at 1-10 Hz, and low-frequency environmental sensors (humidity, air quality) at 0.1 Hz. A typical factory floor might have 500 vibration sensors, 2,000 temperature/pressure sensors, and 500 environmental sensors.
The cache workload for this mix:
| Sensor Type | Count | Freq (Hz) | Size | Writes/sec | MB/sec |
|---|---|---|---|---|---|
| Vibration | 500 | 100 | 2KB | 50,000 | 100 |
| Temperature/Pressure | 2,000 | 1 | 0.5KB | 2,000 | 1 |
| Environmental | 500 | 0.1 | 0.5KB | 50 | 0.025 |
| Total | 3,000 | -- | -- | 52,050 | 101 |
Vibration sensors dominate: 50,000 writes per second at 2KB each. Redis at 0.35ms per SET: 17.5 seconds of write time per second. Completely impossible on a single instance. You need at least 18 Redis instances just for the write side, before reads.
In-process at 548ns per SET: 50,000 x 548ns = 27.4ms. All 52,050 writes fit in 28.5 milliseconds per second. One process. No sharding. No connection pooling. No retry logic.
CacheeLFU handles the mixed-frequency workload naturally. The 500 vibration sensors that a technician is actively monitoring on a dashboard stay in L0 -- their access frequency is high, so CacheeLFU scores them above the admission threshold. The 2,500 environmental and temperature sensors that nobody is currently viewing evict to make room. When a shift supervisor opens the temperature dashboard for building 3, those sensors' CacheeLFU scores rise as reads begin, and they get promoted to L0 automatically. No manual cache warming. No TTL tuning. The access pattern drives the eviction policy.
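As an illustration of the frequency-driven behavior described above, here is a toy LFU cache. This is not CacheeLFU's actual scoring or admission logic -- just the simplest possible demonstration that read frequency, not manual tuning, decides what stays resident:

```python
from collections import Counter

class ToyLFU:
    """Toy frequency-based cache. Illustrates access-driven eviction;
    NOT CacheeLFU's actual algorithm."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = {}
        self.hits = Counter()  # read frequency per key

    def get(self, key):
        if key in self.data:
            self.hits[key] += 1
            return self.data[key]
        return None

    def put(self, key, value):
        if key not in self.data and len(self.data) >= self.capacity:
            # Evict the least-frequently-read key.
            victim = min(self.data, key=lambda k: self.hits[k])
            del self.data[victim]
        self.data[key] = value

cache = ToyLFU(capacity=2)
cache.put("vibration-17", "sample")  # actively watched sensor
cache.put("humidity-3", "sample")    # nobody is viewing this one
for _ in range(50):
    cache.get("vibration-17")        # dashboard reads keep its score high
cache.put("temp-204", "sample")      # new sensor: humidity-3 is evicted
print(sorted(cache.data))            # ['temp-204', 'vibration-17']
```

When the supervisor's dashboard starts reading the temperature keys, their hit counts rise and they stop being eviction candidates -- the access pattern drives residency, with no warming step.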
Edge Deployment: Constrained Hardware
IoT gateways sit at the edge of the network, aggregating sensor data before forwarding it to the cloud. These gateways run on constrained hardware: Raspberry Pi 4 (4-8GB RAM), NVIDIA Jetson Nano (4GB), industrial ARM single-board computers (2-8GB), or ruggedized x86 boxes (8-16GB). Every megabyte of RAM matters. Every process matters.
Redis on a 4GB gateway is a problem. The Redis server process itself consumes 50-100MB at startup. With 10,000 keys of 1KB each, it holds 10MB of data but uses 30-50MB of overhead for the hash table, per-key metadata, and output buffers. Under load with 10 concurrent connections, Redis memory usage climbs to 150-300MB. On a 4GB device where the operating system takes 500MB and the application takes 500MB-1GB, Redis claims 5-7% of total memory just for its process overhead -- memory that holds zero bytes of your data.
Cachee runs in-process. There is no separate daemon. There is no TCP listener. The cache is a library linked into your application. The memory it uses is the memory your data occupies -- a hash map of keys to values with minimal per-entry overhead (approximately 80 bytes per entry for the hash bucket, key storage, and CacheeLFU metadata). For 10,000 1KB entries, the total memory footprint is 10,000 x (1,024 + 80) = 10.8MB. Compare that to Redis at 150-300MB for the same data.
On a 4GB Raspberry Pi, the difference between 10.8MB and 200MB is the difference between running your anomaly detection model in the remaining memory and not running it at all. Edge IoT is a zero-sum game with RAM. Every byte Redis uses for process overhead is a byte your ML inference engine cannot use.
There is also the operational complexity. Redis on edge hardware means managing a separate systemd service, monitoring its health, handling restarts, configuring persistence (RDB/AOF on SD cards with limited write endurance), and debugging connectivity issues between the application and the Redis socket. In-process caching eliminates all of this. The cache starts when the application starts and stops when the application stops. There is no separate process to monitor, no socket to connect to, no persistence configuration to tune.
Post-Quantum Device Attestation Caching
As IoT deployments grow, device authentication becomes a security bottleneck. Each device must prove its identity before its readings are accepted. Traditional TLS client certificates work but are vulnerable to quantum computing attacks on RSA and ECDSA. Post-quantum signature schemes -- ML-DSA (Dilithium), FALCON, SLH-DSA (SPHINCS+) -- provide quantum-resistant device attestation, but they come with larger signatures.
An ML-DSA-65 signature is 3,309 bytes. A FALCON-512 signature is 897 bytes. Verifying these signatures is computationally expensive: ML-DSA verification takes approximately 150 microseconds, and FALCON verification takes approximately 60 microseconds. If every sensor reading requires signature verification, 10,000 readings per second means 10,000 verification operations -- 1.5 seconds of CPU time per second for ML-DSA alone.
The solution is to cache the attestation result. When device sensor-4821 presents a valid ML-DSA signature, the cache stores the verification result at key device:sensor-4821:attested with a TTL of 5 minutes. For the next 5 minutes, every reading from that device is authenticated with a 31-nanosecond cache lookup instead of a 150-microsecond signature verification. The device re-attests every 5 minutes -- 12 verifications per hour instead of 3,600.
# Without attestation caching
10,000 devices x 1 reading/sec x 150us verification
= 1,500,000us = 1.5 seconds of CPU per second
(100% of one core, just for signature verification)
# With attestation caching (5-minute TTL)
10,000 devices x 1 reading/sec x 31ns cache lookup
= 310,000ns = 0.31ms of CPU per second
+ 10,000 devices x (1 verify / 300 sec) x 150us
= 5,000us = 5ms per second for re-attestation
Total: 5.31ms vs 1,500ms = 282x reduction
This is not optional for PQ-secured IoT deployments. Without caching, the CPU cost of post-quantum signature verification scales linearly with device count and reading frequency. At 10,000 devices, it consumes an entire CPU core. At 100,000 devices, it consumes 10 cores. Caching reduces the verification load by a factor proportional to the TTL divided by the reading interval -- with a 5-minute TTL and 1-second readings, that is a 300x reduction in verification operations, bringing the CPU cost from cores to fractions of a core.
The attestation cache must be in-process. Using Redis for attestation lookups means every reading incurs a 0.35ms network round-trip to check device identity -- adding 3,500ms of cumulative latency per second back into the pipeline. The entire point of caching the attestation is to make it fast enough to be invisible. At 31 nanoseconds, it is.
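A sketch of the cache-first attestation check, with a stub standing in for the real post-quantum verification. The function names, the stub's logic, and the TTL handling are all illustrative assumptions -- a real system would call an ML-DSA or FALCON verifier from a PQ crypto library here:

```python
import time

ATTEST_TTL = 300.0  # 5 minutes, per the pattern above

attested = {}  # device_id -> attestation expiry (monotonic timestamp)

def expensive_verify(device_id: str, signature: bytes) -> bool:
    """Stand-in for a real ML-DSA/FALCON verification (~60-150us).
    Placeholder logic only -- NOT a real signature check."""
    return len(signature) > 0

def is_authenticated(device_id: str, signature: bytes) -> bool:
    """Cache-first attestation: a dict lookup on the hot path, full
    signature verification only on a miss or after TTL expiry."""
    expiry = attested.get(device_id)
    if expiry is not None and time.monotonic() < expiry:
        return True                              # cached: nanosecond path
    if expensive_verify(device_id, signature):   # miss: full PQ verify
        attested[device_id] = time.monotonic() + ATTEST_TTL
        return True
    return False

assert is_authenticated("sensor-4821", b"\x01" * 3309)  # verifies, then caches
assert is_authenticated("sensor-4821", b"")             # served from cache
```

Note the second call succeeds without re-verifying: within the TTL, readings ride on the cached result, which is exactly the 300x reduction in verification operations described above. Revocation is the inverse: delete the device's cache entry and the next reading is forced through full verification.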
Security Note
Caching attestation results does not weaken security. The TTL ensures that a compromised device is locked out within 5 minutes of key revocation (publish the revocation to the cache, or let the TTL expire). The cached value is a boolean "this device presented a valid signature" -- it contains no secret key material. The signature itself is not cached; only the verification result is. Re-attestation every 5 minutes is more frequent than most TLS certificate rotation schedules.
Putting It Together: The Full IoT Cache Architecture
A production IoT caching deployment combines all four patterns into a single in-process cache instance:
- Latest-value store: 10,000 keys, 1KB each, 30-second TTL. 10MB total. 10K writes/sec, 100-1,000 reads/sec per dashboard.
- Sliding window: 10,000 keys, 60KB each, no TTL (circular buffer). 600MB total. 10K reads + 10K writes/sec for anomaly detection.
- MQTT bridge buffer: Absorbs burst traffic at 548ns per write. Decouples broker delivery from application consumption.
- Attestation cache: 10,000 keys, ~100 bytes each (boolean + metadata), 5-minute TTL. 1MB total. 10K reads/sec, 33 writes/sec (re-attestations).
Total in-process memory: approximately 611MB. Total write overhead: 33.8 milliseconds per second. Total read overhead for anomaly detection + dashboard: 0.34 milliseconds per second. The cache layer consumes 3.4% of each second. The remaining 96.6% is available for business logic.
Compare that to the Redis architecture: 22+ seconds of cumulative latency per second for the same workload, requiring 23+ parallel connections, a dedicated Redis instance with 2-4GB of RAM, monitoring infrastructure, connection pool management, retry logic, and operational overhead that scales with device count.
# Install Cachee on your IoT gateway or aggregation server
brew tap h33ai-postquantum/tap
brew install cachee
# Initialize with IoT-appropriate settings
cachee init --mode inprocess --memory-limit 700MB
# Start the cache (runs in your application process)
cachee start
10,000 devices. 10,000 writes per second. 5.48 milliseconds of cache overhead. The rest is yours.