Kubernetes Caching Best Practices with Sidecar Patterns
Kubernetes has transformed how we deploy applications, but caching in a containerized, orchestrated environment requires new patterns. This guide shows you how to implement high-performance caching using sidecar containers, ensuring fast local access with minimal complexity and operational overhead.
Why Sidecar Caching for Kubernetes?
Traditional centralized caching (Redis cluster, Memcached) works in Kubernetes, but sidecars offer compelling advantages:
- Ultra-low latency: localhost access eliminates network hops (0.1ms vs. 2-5ms)
- Resource isolation: Cache memory doesn't compete with application memory
- Zero network overhead: Shared volumes or localhost communication
- Automatic scaling: Cache capacity scales with pod replicas
- Simplified networking: No need to manage separate cache clusters
The sidecar pattern places a cache container alongside your application container in the same pod, sharing the network namespace and optionally storage volumes.
Pattern 1: Basic Sidecar Cache
Start with a simple Redis sidecar for local caching:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
    spec:
      containers:
      # Main application container
      - name: app
        image: myapp:v1.0
        ports:
        - containerPort: 8080
        env:
        - name: CACHE_URL
          value: "redis://localhost:6379"
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "1Gi"
            cpu: "1000m"
      # Sidecar cache container
      - name: cache
        image: redis:7-alpine
        ports:
        - containerPort: 6379
        command:
        - redis-server
        - --maxmemory
        - "256mb"
        - --maxmemory-policy
        - allkeys-lru
        resources:
          requests:
            memory: "256Mi"
            cpu: "100m"
          limits:
            memory: "512Mi"
            cpu: "200m"
Pattern 2: Shared Volume Cache for Persistence
For caches that should survive container restarts, back Redis with an emptyDir volume. An emptyDir lives as long as the pod, so data written there survives a cache container crash or restart, though not pod rescheduling; with medium: Memory the volume is a RAM-backed tmpfs, trading durability for speed:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server-persistent
spec:
  template:
    spec:
      volumes:
      - name: cache-storage
        emptyDir:
          medium: Memory  # Use RAM for fast access
          sizeLimit: 512Mi
      containers:
      - name: app
        image: myapp:v1.0
        volumeMounts:
        - name: cache-storage
          mountPath: /cache
      - name: cache
        image: redis:7-alpine
        command:
        - redis-server
        - --dir
        - /data
        - --appendonly
        - "yes"
        - --appendfsync
        - "everysec"
        volumeMounts:
        - name: cache-storage
          mountPath: /data
        resources:
          requests:
            memory: "512Mi"
            cpu: "100m"
Pattern 3: Multi-Tier Sidecar Cache
Combine in-memory and persistent caching with two sidecar containers:
spec:
  template:
    spec:
      volumes:
      # Backs the L2 cache's append-only file
      - name: cache-storage
        emptyDir: {}
      containers:
      # Application
      - name: app
        image: myapp:v1.0
        env:
        - name: L1_CACHE_URL
          value: "redis://localhost:6379"
        - name: L2_CACHE_URL
          value: "redis://localhost:6380"
      # L1: Fast in-memory cache
      - name: l1-cache
        image: redis:7-alpine
        ports:
        - containerPort: 6379
        command:
        - redis-server
        - --port
        - "6379"
        - --maxmemory
        - "128mb"
        - --maxmemory-policy
        - volatile-lru
        resources:
          limits:
            memory: "256Mi"
      # L2: Larger persistent cache
      - name: l2-cache
        image: redis:7-alpine
        ports:
        - containerPort: 6380
        command:
        - redis-server
        - --port
        - "6380"
        - --maxmemory
        - "512mb"
        - --appendonly
        - "yes"
        volumeMounts:
        - name: cache-storage
          mountPath: /data
        resources:
          limits:
            memory: "768Mi"
Application code for multi-tier caching (shown here with the ioredis client):
const Redis = require('ioredis');

class TieredCache {
  constructor() {
    this.l1 = new Redis(6379); // Fast, small
    this.l2 = new Redis(6380); // Slower, larger
  }

  async get(key) {
    // Try L1 first
    let value = await this.l1.get(key);
    if (value) return value;

    // Try L2
    value = await this.l2.get(key);
    if (value) {
      // Promote to L1 with a short TTL
      await this.l1.set(key, value, 'EX', 300);
      return value;
    }
    return null;
  }

  async set(key, value, ttl = 3600) {
    // Write to both tiers; L1 keeps a shorter TTL
    await Promise.all([
      this.l1.set(key, value, 'EX', Math.min(ttl, 300)),
      this.l2.set(key, value, 'EX', ttl),
    ]);
  }
}
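Usage looks like any read-through cache; hot keys served from L2 are promoted back into L1 automatically (the session value below is just a placeholder):

const cache = new TieredCache();

async function getSession(sessionId) {
  let session = await cache.get(`session:${sessionId}`);
  if (!session) {
    // Placeholder for a real database lookup on a full miss
    session = JSON.stringify({ userId: 42 });
    await cache.set(`session:${sessionId}`, session, 3600);
  }
  return JSON.parse(session);
}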
Pattern 4: Service Mesh Integration
Integrate with Istio or Linkerd for advanced traffic management and observability:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  template:
    metadata:
      annotations:
        # Istio sidecar injection
        sidecar.istio.io/inject: "true"
        # Custom cache sidecar
        sidecar.istio.io/userVolume: '{"cache-storage":{"emptyDir":{}}}'
        sidecar.istio.io/userVolumeMount: '{"cache-storage":{"mountPath":"/cache"}}'
    spec:
      containers:
      - name: app
        image: myapp:v1.0
      - name: cache
        image: cachee-ai/sidecar:latest
        ports:
        - containerPort: 6379
          name: cache
        env:
        - name: CACHE_SIZE_MB
          value: "512"
        - name: ML_OPTIMIZATION
          value: "true"
Service Mesh Benefits
- Automatic mTLS: Encrypted cache communication
- Traffic policies: Rate limiting, circuit breaking
- Observability: Built-in metrics and tracing
- Zero-trust security: Fine-grained access control
Pattern 5: Init Container for Cache Warming
Pre-populate the cache before the main application starts. Ordinary init containers run to completion one at a time, so the cache itself has to run as a native sidecar (an init container with restartPolicy: Always, enabled by default since Kubernetes 1.29); that way it is already up when the warming init container runs and keeps running for the life of the pod:
spec:
  template:
    spec:
      initContainers:
      # Cache runs as a native sidecar: starts first, stays up with the pod
      - name: cache
        image: redis:7-alpine
        restartPolicy: Always
        ports:
        - containerPort: 6379
        startupProbe:
          exec:
            command: ["redis-cli", "ping"]
          periodSeconds: 2
      # Warm cache: runs after the cache is ready, before the app starts
      - name: cache-warmer
        image: myapp:v1.0
        command: ["node", "warm-cache.js"]
        env:
        - name: CACHE_URL
          value: "redis://localhost:6379"
      containers:
      - name: app
        image: myapp:v1.0
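What warm-cache.js does depends on your application; here is a sketch of one possible version, where the hot-key list and loadFromDb are hypothetical placeholders:

const Redis = require('ioredis');

const cache = new Redis(process.env.CACHE_URL);

// Hypothetical list of keys worth preloading before traffic arrives
const HOT_KEYS = ['config:feature-flags', 'catalog:top-products'];

async function loadFromDb(key) {
  // Placeholder for a real database or upstream API lookup
  return JSON.stringify({ key, warmedAt: Date.now() });
}

async function main() {
  for (const key of HOT_KEYS) {
    const value = await loadFromDb(key);
    await cache.set(key, value, 'EX', 3600);
  }
  await cache.quit(); // disconnect so the init container can exit cleanly
}

main().catch((err) => {
  console.error('cache warming failed:', err);
  process.exit(1);
});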
Resource Management Best Practices
Right-Sizing Cache Containers
# Calculate optimal cache size
apiVersion: v1
kind: ConfigMap
metadata:
  name: cache-sizing
data:
  sizing-guide: |
    For each pod:
    - L1 cache: 10-20% of app memory (hot data)
    - L2 cache: 50-100% of app memory (warm data)

    Example: 2GB app container
    - L1: 256MB (Redis sidecar)
    - L2: 1GB (Redis sidecar with persistence)

    Total pod memory: ~3.5GB
    - App: 2GB
    - L1 cache: 256MB
    - L2 cache: 1GB
    - Overhead: ~250MB
CPU Allocation
# Cache sidecar CPU allocation (typical workloads)
resources:
  requests:
    cpu: "100m"    # 0.1 core baseline
  limits:
    cpu: "500m"    # Burst to 0.5 core

# For high-throughput applications
resources:
  requests:
    cpu: "250m"    # 0.25 core baseline
  limits:
    cpu: "1000m"   # Burst to 1 core
Monitoring and Observability
Expose the cache metrics port through a Service and scrape it with a Prometheus ServiceMonitor:
apiVersion: v1
kind: Service
metadata:
  name: api-server-cache-metrics
  labels:
    app: api-server
spec:
  selector:
    app: api-server
  ports:
  - name: cache-metrics
    port: 9121
    targetPort: 9121
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: cache-sidecar
spec:
  selector:
    matchLabels:
      app: api-server
  endpoints:
  - port: cache-metrics
    interval: 30s
Add the Redis exporter as an additional sidecar container:
- name: cache-exporter
  image: oliver006/redis_exporter:latest
  ports:
  - containerPort: 9121
  env:
  - name: REDIS_ADDR
    value: "localhost:6379"
  resources:
    requests:
      memory: "32Mi"
      cpu: "10m"
    limits:
      memory: "64Mi"
      cpu: "50m"
Zero-Downtime Cache Updates
Use rolling updates with readiness probes:
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    spec:
      containers:
      - name: cache
        image: redis:7-alpine
        readinessProbe:
          exec:
            command:
            - redis-cli
            - ping
          initialDelaySeconds: 5
          periodSeconds: 5
        livenessProbe:
          exec:
            command:
            - redis-cli
            - ping
          initialDelaySeconds: 30
          periodSeconds: 10
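On the application side, keep the cache out of the critical path of your own readiness check so a cache hiccup doesn't pull the pod from rotation. A hedged sketch, assuming an Express app and an ioredis client:

const express = require('express');
const Redis = require('ioredis');

const app = express();
const cache = new Redis(process.env.CACHE_URL, {
  connectTimeout: 1000,
  maxRetriesPerRequest: 1,
});

app.get('/readyz', async (req, res) => {
  let cacheStatus = 'ok';
  try {
    await cache.ping(); // fails fast thanks to the short timeouts above
  } catch (err) {
    cacheStatus = 'degraded'; // the app can still serve from the database
  }
  // Report Ready either way; the cache is an optimization, not a hard dependency
  res.status(200).json({ app: 'ok', cache: cacheStatus });
});

app.listen(8080);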
Common Pitfalls to Avoid
- Over-allocating memory: Set maxmemory lower than container limits
- No eviction policy: Always configure maxmemory-policy
- Missing resource limits: Cache can starve application of resources
- Ignoring persistence: Use volumes for caches that need to survive restarts
- No monitoring: Deploy exporters and track hit rates, memory usage
Conclusion
Sidecar caching in Kubernetes offers the best of both worlds: the simplicity of local caching with the scalability of cloud-native infrastructure. Start with a basic Redis sidecar, add persistence as needed, and consider multi-tier patterns for high-traffic applications.
For production deployments, combine sidecar caching with proper resource limits, monitoring, service mesh integration, and ML-powered optimization to achieve 94%+ hit rates with minimal operational overhead.
Deploy ML-Optimized Sidecar Caching
Cachee AI's Kubernetes operator automatically deploys and configures optimized cache sidecars with ML-powered TTL optimization and predictive prefetching.