API Rate Limiting with Intelligent Caching
Rate limiting protects your APIs from abuse, manages costs for third-party services, and ensures fair resource allocation. But traditional rate limiting implementations are either too simple (inaccurate), too slow (database lookups per request), or too expensive (dedicated infrastructure). Intelligent caching solves all three problems.
Why Rate Limiting Needs Caching
Consider a high-traffic API handling 10,000 requests/second. Every request needs rate limit verification:
- Database-based: 10K queries/sec = database overload
- In-memory: Works per-server, but fails in distributed systems
- Cached counters: Fast, accurate, distributed-ready
Caching rate limit state enables sub-millisecond checks while maintaining accuracy across distributed systems.
Rate Limiting Algorithms
1. Token Bucket (Most Common)
Each user has a bucket that fills with tokens at a fixed rate. Requests consume tokens. When the bucket is empty, requests are rejected.
// Redis-backed token bucket (refillRate = tokens added per second)
async function checkRateLimit(userId, limit, refillRate) {
  const key = `ratelimit:${userId}`;
  const now = Date.now();
  // Get current state
  const data = await cache.get(key);
  let tokens = limit;
  let lastRefill = now;
  if (data) {
    ({ tokens, lastRefill } = JSON.parse(data));
    // Refill tokens based on time elapsed
    const elapsed = (now - lastRefill) / 1000;
    tokens = Math.min(limit, tokens + elapsed * refillRate);
  }
  // Try to consume a token
  if (tokens >= 1) {
    tokens -= 1;
    // Note: this GET-then-SET is not atomic; see "Race Conditions" below for an atomic variant
    await cache.set(key, JSON.stringify({
      tokens,
      lastRefill: now
    }), { ttl: 3600 });
    return { allowed: true, remaining: Math.floor(tokens) };
  }
  return { allowed: false, remaining: 0 };
}
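As a quick usage sketch, assuming an Express-style app, the cache client above, and a hypothetical loadUsers() handler, the check sits at the top of a request handler; the limit here is 100 requests refilled over an hour:
// Usage sketch: 100-token bucket refilled at 100 tokens/hour (~0.028 tokens/sec).
// The route and loadUsers() are illustrative, not part of the examples above.
app.get('/api/users', async (req, res) => {
  const { allowed, remaining } = await checkRateLimit(req.userId, 100, 100 / 3600);
  if (!allowed) {
    return res.status(429).json({ error: 'Rate limit exceeded' });
  }
  res.json({ remaining, users: await loadUsers() });
});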
2. Sliding Window Counter
More accurate than fixed windows, this approach tracks requests over a rolling time period:
-- Redis Lua script for an atomic sliding window
local key = KEYS[1]
local window = tonumber(ARGV[1]) -- seconds
local limit = tonumber(ARGV[2])
local now = tonumber(ARGV[3]) -- current time, in the same unit as window (seconds)
-- Remove entries older than the window
redis.call('ZREMRANGEBYSCORE', key, 0, now - window)
-- Count current requests in the window
local current = redis.call('ZCARD', key)
if current < limit then
  -- Add this request; in production use a unique member (e.g. timestamp plus a request id)
  -- so two requests arriving at the same instant are not collapsed into one entry
  redis.call('ZADD', key, now, now)
  redis.call('EXPIRE', key, window)
  return { 1, limit - current - 1 }
else
  return { 0, 0 }
end
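Assuming an ioredis-style client (redis) with the script source stored in a string (slidingWindowScript), a thin wrapper might look like the sketch below; the current time is passed in seconds to match the window unit:
// Sketch: run the sliding-window script atomically via EVAL with an ioredis-style client.
async function slidingWindowLimit(userId, limit, windowSeconds) {
  const [allowed, remaining] = await redis.eval(
    slidingWindowScript,            // Lua source above
    1,                              // number of KEYS
    `ratelimit:${userId}`,          // KEYS[1]
    windowSeconds,                  // ARGV[1]
    limit,                          // ARGV[2]
    Math.floor(Date.now() / 1000)   // ARGV[3], current time in seconds
  );
  return { allowed: allowed === 1, remaining };
}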
3. Fixed Window Counter
Simplest implementation, but allows bursts at window boundaries:
async function fixedWindowLimit(userId, limit) {
  const key = `ratelimit:${userId}`;
  const window = Math.floor(Date.now() / 60000); // 1-minute windows
  const windowKey = `${key}:${window}`;
  const count = await cache.incr(windowKey);
  if (count === 1) {
    // First request in this window, set expiry
    await cache.expire(windowKey, 60);
  }
  return {
    allowed: count <= limit,
    remaining: Math.max(0, limit - count)
  };
}
Distributed Rate Limiting Challenges
Race Conditions
Multiple servers checking limits simultaneously can exceed quotas:
# Problem: Two servers check simultaneously
Server A: reads count=99, checks (99 < 100), allows request
Server B: reads count=99, checks (99 < 100), allows request
Result: 101 requests allowed (quota exceeded!)
-- Solution: atomic check-and-increment in a Lua script
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local current = redis.call('INCR', key)
if current > limit then
  redis.call('DECR', key)
  return 0
else
  return 1
end
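With an ioredis-style client the script can be registered once and then invoked like a built-in command, so the check-and-increment happens in a single atomic round trip (a sketch; atomicIncrScript holds the Lua source above and the command name is illustrative):
// Sketch: register the atomic script with ioredis, then call it per request.
redis.defineCommand('atomicIncrLimit', {
  numberOfKeys: 1,
  lua: atomicIncrScript
});

async function tryConsume(userId, limit) {
  const allowed = await redis.atomicIncrLimit(`ratelimit:${userId}`, limit);
  return allowed === 1;
}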
Cache Consistency
Rate limit counters need a single source of truth; replicas that only converge eventually let quotas drift between servers:
- Use a single Redis instance or cluster per region as the authoritative counter store
- Avoid local caching of rate limit state on application servers
- Implement a read-through pattern for slower-changing data such as quotas and tiers (see the sketch below)
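A minimal read-through sketch, assuming per-user quota configuration lives in a database table (the quotas table and column names are illustrative) and the shared Redis instance is the cache in front of it:
// Read-through: serve quota config from cache, fall back to the database on a miss.
async function getQuotaConfig(userId) {
  const cacheKey = `quota:${userId}`;
  const cached = await cache.get(cacheKey);
  if (cached) {
    return JSON.parse(cached);
  }
  const rows = await database.query(
    'SELECT requests, window FROM quotas WHERE user_id = ?', [userId]
  );
  const config = rows[0] || { requests: 100, window: 3600 };
  await cache.set(cacheKey, JSON.stringify(config), { ttl: 300 });
  return config;
}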
Intelligent Rate Limiting Strategies
1. Tiered Rate Limits
Different limits for different user tiers:
const RATE_LIMITS = {
  free: { requests: 100, window: 3600 },
  pro: { requests: 1000, window: 3600 },
  enterprise: { requests: 10000, window: 3600 }
};

async function getRateLimit(userId) {
  const tier = await cache.get(`user:${userId}:tier`);
  // Fall back to the free tier for missing or unknown tiers
  return RATE_LIMITS[tier] || RATE_LIMITS.free;
}
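Wiring this into the token bucket above might look like the following sketch; the refill rate is derived from the tier's window so the bucket refills its full quota once per window:
// Sketch: tier-aware check built from getRateLimit and checkRateLimit above.
async function checkTieredLimit(userId) {
  const { requests, window } = await getRateLimit(userId);
  // Express requests-per-window as tokens per second for the bucket refill
  return checkRateLimit(userId, requests, requests / window);
}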
2. Endpoint-Specific Limits
Expensive endpoints get stricter limits:
const ENDPOINT_LIMITS = {
  '/api/search': 10,  // Expensive full-text search
  '/api/export': 5,   // Resource-intensive export
  '/api/users': 100,  // Cheap user lookup
};

async function checkEndpointLimit(userId, endpoint) {
  // Per-minute limits; the composite identifier keeps a separate counter per endpoint
  const limit = ENDPOINT_LIMITS[endpoint] || 50;
  return checkRateLimit(`${userId}:${endpoint}`, limit, limit / 60);
}
3. Adaptive Rate Limiting
Automatically adjust limits based on system load:
async function getAdaptiveLimit(userId, baseLimit) {
  // Cached metrics come back as strings, so parse before comparing
  const systemLoad = parseFloat(await cache.get('metrics:cpu_usage')) || 0;
  if (systemLoad > 80) {
    // System under stress, reduce limits
    return Math.floor(baseLimit * 0.5);
  } else if (systemLoad < 30) {
    // System idle, allow more requests
    return Math.floor(baseLimit * 1.5);
  }
  return baseLimit;
}
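A sketch of how this plugs into the tier-aware helpers above; the wiring and helper names follow the earlier examples:
// Sketch: scale the user's tier quota by the current system load before checking.
async function checkAdaptiveLimit(userId) {
  const { requests, window } = await getRateLimit(userId);
  const limit = await getAdaptiveLimit(userId, requests);
  return checkRateLimit(userId, limit, limit / window);
}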
4. Burst Allowance
Allow short bursts while maintaining average rate:
async function checkWithBurst(userId, sustained, burst) {
  // Example: sustained = 100 req/hour, burst = 20 req/minute
  // Two token buckets: a slow hourly bucket and a fast per-minute bucket
  const sustainedOk = await checkRateLimit(
    `${userId}:hour`, sustained, sustained / 3600
  );
  const burstOk = await checkRateLimit(
    `${userId}:minute`, burst, burst / 60
  );
  return sustainedOk.allowed && burstOk.allowed;
}
Performance Optimization
Batch Rate Limit Checks
For internal API calls, batch multiple checks:
async function batchCheckLimits(userIds, limit) {
  // Reads the fixed-window counters from fixedWindowLimit above in one round trip
  const window = Math.floor(Date.now() / 60000);
  const pipeline = cache.pipeline();
  userIds.forEach(id => {
    pipeline.get(`ratelimit:${id}:${window}`);
  });
  const results = await pipeline.exec();
  return results.map(([err, count], i) => ({
    userId: userIds[i],
    allowed: count ? parseInt(count, 10) < limit : true
  }));
}
Local Caching for Quota Information
Cache user tier information locally (not counters):
const tierCache = new Map();

async function getUserTier(userId) {
  if (tierCache.has(userId)) {
    return tierCache.get(userId);
  }
  const rows = await database.query(
    'SELECT tier FROM users WHERE id = ?', [userId]
  );
  const tier = rows[0] ? rows[0].tier : 'free';
  // Cache tier for 5 minutes
  tierCache.set(userId, tier);
  setTimeout(() => tierCache.delete(userId), 300000);
  return tier;
}
Rate Limit Response Headers
Inform clients about their rate limit status:
// Assumes a limiter that returns { allowed, total, remaining, resetAt, retryAfter },
// i.e. slightly richer metadata than the minimal token bucket shown earlier
app.use(async (req, res, next) => {
  const limit = await checkRateLimit(req.userId);
  res.setHeader('X-RateLimit-Limit', limit.total);
  res.setHeader('X-RateLimit-Remaining', limit.remaining);
  res.setHeader('X-RateLimit-Reset', limit.resetAt);
  if (!limit.allowed) {
    res.setHeader('Retry-After', limit.retryAfter);
    return res.status(429).json({
      error: 'Rate limit exceeded',
      retryAfter: limit.retryAfter
    });
  }
  next();
});
Advanced: ML-Powered Rate Limiting
Detect abuse patterns using ML instead of simple thresholds:
async function detectAnomalousUsage(userId) {
  const pattern = await getAccessPattern(userId);
  const features = {
    requests_per_hour: pattern.requestRate,
    unique_endpoints: pattern.endpointDiversity,
    error_rate: pattern.errorRate,
    time_distribution: pattern.timeVariance,
    geographic_diversity: pattern.ipDiversity
  };
  const abuseScore = await mlModel.predict(features);
  if (abuseScore > 0.8) {
    // Likely abuse, apply stricter limits (here: 10% of the normal quota)
    return reduceRateLimit(userId, 0.1);
  }
  // Usage looks normal; leave limits unchanged
  return null;
}
Monitoring Rate Limits
Track these metrics to optimize rate limiting (a minimal collection sketch follows the list):
- Rejection rate: What % of requests hit limits?
- User distribution: Are limits too strict/loose?
- Burst patterns: Do users hit burst limits often?
- Cache hit rate: Is rate limit cache effective?
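Collection can reuse the same Redis instance; the sketch below assumes the cache client from earlier and illustrative key names, counting allowed and rejected decisions per hour so rejection rate can be charted later:
// Sketch: count allowed vs. rejected decisions per hour and per endpoint.
async function recordRateLimitDecision(userId, endpoint, allowed) {
  const hour = Math.floor(Date.now() / 3600000);
  const status = allowed ? 'allowed' : 'rejected';
  const pipeline = cache.pipeline();
  pipeline.incr(`metrics:ratelimit:${status}:${hour}`);
  pipeline.incr(`metrics:ratelimit:${status}:${endpoint}:${hour}`);
  pipeline.expire(`metrics:ratelimit:${status}:${hour}`, 86400);
  await pipeline.exec();
}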
Conclusion
Effective API rate limiting requires fast, accurate, distributed-ready implementation. Intelligent caching provides the foundation: sub-millisecond checks, atomic operations, and scalable infrastructure. By combining token buckets, sliding windows, and ML-powered anomaly detection, you can build rate limiters that protect your APIs while maintaining excellent user experience.
Start with simple token buckets cached in Redis, then add sophistication as your API scales: tiered limits, endpoint-specific quotas, adaptive throttling, and intelligent abuse detection.
Built-In Intelligent Rate Limiting
Cachee.ai includes distributed rate limiting with automatic tier management and abuse detection.