Real-Time Analytics with Distributed Caching
Real-time analytics dashboards need to process millions of events and serve insights in milliseconds. Traditional databases struggle with this requirement, but distributed caching enables sub-second query performance even at massive scale. This guide shows you how to architect high-performance analytics systems using caching strategies.
The Real-Time Analytics Challenge
Modern analytics dashboards face unique performance constraints:
- High query frequency: Dashboards auto-refresh every 5-30 seconds
- Complex aggregations: GROUP BY, COUNT, SUM, AVG across millions of rows
- Time-series data: Rolling windows, percentiles, trend calculations
- Concurrent users: Hundreds of users viewing the same or different dashboards
Without caching, analytical queries can take 2-10 seconds each. With 20 widgets per dashboard refreshing every 10 seconds, you'd need massive database clusters to handle the load.
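The arithmetic behind that claim is worth making explicit. A back-of-envelope sketch, where the widget and user counts are illustrative assumptions rather than measurements:

```javascript
// Back-of-envelope load estimate; widget/user counts are illustrative
// assumptions, not measurements.
const widgetsPerDashboard = 20;
const refreshIntervalSeconds = 10;
const concurrentUsers = 200;

// Each widget fires one aggregation query per refresh, so without caching
// the database absorbs every one of these directly.
const queriesPerSecond =
  (widgetsPerDashboard / refreshIntervalSeconds) * concurrentUsers;

console.log(`${queriesPerSecond} heavy aggregation queries/sec`); // 400
```

At multi-second query times, 400 concurrent aggregations per second is far beyond what a single analytical database comfortably serves.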
Strategy 1: Time-Bucketed Aggregation Caching
Pre-aggregate metrics into time buckets and cache them separately. This is the foundation of fast analytics:
// Cache structure for time-series metrics
class TimeSeriesCache {
  constructor(cache) {
    this.cache = cache;
  }

  async getMetric(metric, start, end, granularity) {
    const buckets = this.generateBuckets(start, end, granularity);
    const cacheKeys = buckets.map(b =>
      `metrics:${metric}:${granularity}:${b.timestamp}`
    );

    // Fetch all buckets in parallel
    const values = await this.cache.mget(cacheKeys);

    // Find buckets that missed the cache
    const missing = buckets.filter((b, i) => values[i] === null);
    if (missing.length > 0) {
      // Compute the missing aggregations from the database
      const computed = await this.computeAggregations(metric, missing);

      // Cache each bucket with an appropriate TTL
      await Promise.all(computed.map(({ key, value, ttl }) =>
        this.cache.set(key, value, ttl)
      ));

      // Merge cached and freshly computed results
      return this.mergeResults(values, computed);
    }

    return values;
  }

  generateBuckets(start, end, granularity) {
    // Generate time buckets (minutely, hourly, daily, etc.)
    const buckets = [];
    let current = this.roundDown(start, granularity);
    while (current < end) {
      buckets.push({ timestamp: current });
      current = this.addInterval(current, granularity);
    }
    return buckets;
  }
}
Choosing the Right Granularity
Match cache granularity to query patterns:
- 1-minute buckets: Real-time dashboards, last hour views
- 1-hour buckets: Daily dashboards, last 7 days views
- 1-day buckets: Historical reports, monthly/yearly views
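The class above assumes roundDown and addInterval helpers, which are not shown. A minimal sketch of both, under the assumption that timestamps are epoch milliseconds and granularity is one of the named intervals from the list above:

```javascript
// Hypothetical helpers assumed by generateBuckets; timestamps are epoch
// milliseconds and granularities map to fixed interval lengths.
const GRANULARITY_MS = {
  minute: 60 * 1000,
  hour: 60 * 60 * 1000,
  day: 24 * 60 * 60 * 1000,
};

// Align a timestamp to the start of its bucket
function roundDown(timestampMs, granularity) {
  const interval = GRANULARITY_MS[granularity];
  return Math.floor(timestampMs / interval) * interval;
}

// Advance to the next bucket boundary
function addInterval(timestampMs, granularity) {
  return timestampMs + GRANULARITY_MS[granularity];
}
```

Note that fixed-length intervals ignore daylight-saving shifts; daily buckets aligned to a local calendar need a date library instead.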
Strategy 2: Layered Cache Architecture
Use multiple cache layers with different TTLs to balance freshness against performance:
class LayeredAnalyticsCache {
  constructor() {
    // Hot cache: last 5 minutes, 30s TTL. An in-process Map has no TTL
    // of its own, so entries must be evicted by a timer or on read
    // (omitted here for brevity).
    this.hotCache = new Map();
    // Warm cache: last hour, 5min TTL
    this.warmCache = new Redis({ db: 0 });
    // Cold cache: historical data, 24h TTL
    this.coldCache = new Redis({ db: 1 });
  }

  async getAggregation(query, timeRange) {
    const key = this.buildKey(query, timeRange);
    const age = Date.now() - timeRange.end;

    // Recent data: check the in-process hot cache first
    if (age < 300000) { // 5 minutes
      let value = this.hotCache.get(key);
      if (value) return value;
      value = await this.computeAndCache(key, query, this.hotCache, 30);
      return value;
    }

    // Last hour: use the warm cache
    if (age < 3600000) { // 1 hour
      return this.getOrCompute(key, query, this.warmCache, 300);
    }

    // Historical: use the cold cache
    return this.getOrCompute(key, query, this.coldCache, 86400);
  }
}
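The getOrCompute helper is referenced above but not shown. A minimal sketch, assuming a node-redis v4-style client (string values, `set` with an `EX` option) and a computeQuery callback that runs the database aggregation:

```javascript
// Minimal get-or-compute: cache API (get, set with { EX }) follows
// node-redis v4; computeQuery is an assumed callback that hits the DB.
async function getOrCompute(cache, key, computeQuery, ttlSeconds) {
  const cached = await cache.get(key);
  if (cached !== null && cached !== undefined) {
    return JSON.parse(cached); // cache hit: skip the database entirely
  }
  const value = await computeQuery();
  await cache.set(key, JSON.stringify(value), { EX: ttlSeconds });
  return value;
}
```

Under heavy concurrency you would also want to deduplicate in-flight computations (a "single-flight" lock), otherwise a popular expired key triggers a stampede of identical queries.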
Strategy 3: Incremental Aggregation
Instead of recomputing entire aggregations, update them incrementally as new data arrives:
// Incremental counter pattern
async function updateMetricCounter(cache, event) {
  const key = `metrics:${event.type}:${getCurrentHour()}`;

  // Atomic increment
  await cache.incr(key);

  // Set a TTL on first write (-1 means the key exists with no TTL)
  const ttl = await cache.ttl(key);
  if (ttl === -1) {
    await cache.expire(key, 7200); // 2 hours
  }
}

// Incremental average calculation
// Note: this read-modify-write is not atomic, so concurrent writers can
// lose updates. It also assumes the cache client (de)serializes objects.
async function updateMetricAverage(cache, event) {
  const key = `metrics:${event.type}:${getCurrentHour()}`;
  const data = await cache.get(key) || { sum: 0, count: 0 };
  data.sum += event.value;
  data.count += 1;
  await cache.set(key, data, 7200);
  return data.sum / data.count;
}
Strategy 4: Query Result Caching with Smart Invalidation
Cache entire query results with automatic invalidation when underlying data changes:
class AnalyticsQueryCache {
  async executeQuery(sql, params) {
    const queryHash = this.hashQuery(sql, params);
    const cacheKey = `query:${queryHash}`;

    // Try the cache first
    const cached = await this.cache.get(cacheKey);
    if (cached) {
      return { data: cached, source: 'cache' };
    }

    // Execute the query
    const result = await this.database.query(sql, params);

    // Determine the TTL from the query's characteristics
    const ttl = this.calculateTTL(sql);

    // Cache with tags for invalidation (the tags option and
    // invalidateByTag are provided by the cache wrapper, not Redis itself)
    const tags = this.extractTables(sql);
    await this.cache.set(cacheKey, result, ttl, { tags });

    return { data: result, source: 'database' };
  }

  calculateTTL(sql) {
    // Recent data: shorter TTL
    if (sql.includes('last_hour') || sql.includes('today')) {
      return 60; // 1 minute
    }
    // Historical data: longer TTL
    if (sql.includes('last_month') || sql.includes('last_year')) {
      return 3600; // 1 hour
    }
    return 300; // 5 minutes default
  }

  async invalidateTable(tableName) {
    // Invalidate all cached queries touching this table
    await this.cache.invalidateByTag(tableName);
  }
}
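Tagged set and invalidateByTag are not native Redis operations. One common way to build them, assuming a node-redis v4 client, is to keep a Redis set per tag listing the cache keys it covers:

```javascript
// Tag-based invalidation on plain Redis: each tag keys a set of the
// cache entries it covers (sAdd/sMembers/del are node-redis v4 names).
async function setWithTags(cache, key, value, ttlSeconds, tags) {
  await cache.set(key, JSON.stringify(value), { EX: ttlSeconds });
  await Promise.all(tags.map((tag) => cache.sAdd(`tag:${tag}`, key)));
}

async function invalidateByTag(cache, tag) {
  const keys = await cache.sMembers(`tag:${tag}`);
  if (keys.length > 0) {
    await cache.del(keys); // drop every cached query touching this tag
  }
  await cache.del(`tag:${tag}`); // then drop the tag index itself
}
```

One caveat: cache entries expire on their own TTLs while tag sets do not, so the tag sets accumulate stale key names; giving each tag set its own generous TTL keeps them bounded.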
Strategy 5: Probabilistic Data Structures
For approximate analytics (unique visitors, distinct counts), use space-efficient probabilistic structures:
// HyperLogLog for cardinality estimation. PFADD/PFCOUNT are built into
// Redis, so no extra library is required.
async function trackUniqueVisitors(cache, pageId, userId) {
  const key = `analytics:unique:${pageId}:${getCurrentDay()}`;
  await cache.pfadd(key, userId);

  // Get the approximate count (0.81% standard error, ~12 KB per key)
  const uniqueCount = await cache.pfcount(key);
  return uniqueCount;
}

// Bloom filter for "has user seen this?" checks. Requires the RedisBloom
// module; false positives are possible, false negatives are not.
async function hasUserSeenContent(cache, userId, contentId) {
  const key = `analytics:seen:${userId}`;
  const exists = await cache.bf.exists(key, contentId);
  if (!exists) {
    await cache.bf.add(key, contentId);
  }
  return exists;
}
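Daily HyperLogLog keys also compose: PFCOUNT over several keys returns the cardinality of their union, so weekly or monthly uniques need no extra bookkeeping. A small sketch, assuming a node-redis v4 client (pfCount accepts an array of keys):

```javascript
// Approximate uniques across several days: PFCOUNT over multiple
// HyperLogLog keys returns the union cardinality, deduplicating users
// who visited on more than one day (pfCount is the node-redis v4 name).
async function weeklyUniqueVisitors(cache, pageId, dayKeys) {
  const keys = dayKeys.map((day) => `analytics:unique:${pageId}:${day}`);
  return cache.pfCount(keys);
}
```

PFMERGE can likewise persist the union into a single rollup key if the weekly figure is queried often.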
Real-World Example: E-commerce Analytics Dashboard
A condensed implementation of a real-time sales dashboard:
class EcommerceDashboard {
  async getDashboardData() {
    const now = Date.now();
    const oneHourAgo = now - 3600000;

    // Fetch all metrics in parallel
    const [revenue, orders, topProducts, conversionRate] = await Promise.all([
      this.getRevenue(oneHourAgo, now),
      this.getOrderCount(oneHourAgo, now),
      this.getTopProducts(oneHourAgo, now, 10),
      this.getConversionRate(oneHourAgo, now)
    ]);

    return { revenue, orders, topProducts, conversionRate };
  }

  async getRevenue(start, end) {
    // 1-minute granularity for the last hour
    const buckets = this.getMinuteBuckets(start, end);
    const revenueByMinute = await Promise.all(
      buckets.map(minute => this.cache.get(`revenue:${minute}`))
    );

    return {
      // Guard against cache misses (nulls) and string values when summing
      total: revenueByMinute.reduce((a, b) => a + (Number(b) || 0), 0),
      timeseries: revenueByMinute
    };
  }
}
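The getMinuteBuckets helper assumed above is just the minute-granularity case of the bucket generation from Strategy 1. A minimal sketch returning minute-aligned epoch-millisecond timestamps covering [start, end):

```javascript
// Hypothetical getMinuteBuckets helper: minute-aligned epoch-millisecond
// timestamps whose buckets cover the half-open range [startMs, endMs).
function getMinuteBuckets(startMs, endMs) {
  const MINUTE = 60 * 1000;
  const buckets = [];
  for (let t = Math.floor(startMs / MINUTE) * MINUTE; t < endMs; t += MINUTE) {
    buckets.push(t);
  }
  return buckets;
}
```

An hour-long range therefore yields 60 (or 61, if start is mid-minute) cache keys, which the parallel mget pattern from Strategy 1 fetches in one round trip.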
Performance Metrics and Monitoring
Track these KPIs to optimize your analytics caching:
- Cache hit rate: Target 85%+ for analytics queries
- P95 query latency: Should be under 100ms with caching
- Data freshness: Track average lag between event and visibility
- Memory efficiency: Bytes stored per metric data point
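The first KPI falls out of two counters most cache clients already expose; a trivial sketch against the 85% target above:

```javascript
// Cache hit rate from hit/miss counters; 0.85+ is the target above.
function cacheHitRate(hits, misses) {
  const total = hits + misses;
  return total === 0 ? 0 : hits / total;
}
```

Track it per query family rather than globally: a 95% hit rate on cheap lookups can mask a 40% hit rate on the expensive aggregations that actually hurt.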
Conclusion
Real-time analytics with distributed caching transforms database-crushing workloads into sub-second user experiences. By combining time-bucketed aggregations, layered caching, incremental updates, and smart invalidation, you can serve thousands of concurrent dashboard users with minimal infrastructure.
Start with simple time-bucketed caching for your most expensive queries, add incremental aggregation as you scale, and leverage ML-powered caching systems to automatically optimize TTLs and prefetch patterns.
Power Your Analytics with Intelligent Caching
Cachee AI automatically optimizes analytics query caching with ML-powered TTL prediction and aggregation pattern recognition.
Start Free Trial