MEC Architecture: Deploying Cachee at the 5G Edge
Multi-access Edge Computing (MEC) moves computation from centralized clouds to the network edge, inside the carrier's own infrastructure, meters from the cell tower. Cachee deploys natively in this environment as a containerized service in the carrier's MEC Kubernetes cluster, sitting between the User Plane Function (UPF) and application servers.
Zero changes to the 5G core. Zero disruption to existing traffic flows. Under 30 seconds to deploy.
The Four-Layer Architecture
Cachee slots into the existing 5G MEC stack without requiring modifications to any layer above or below it. Here is how each layer fits together.
User Plane
UE devices (phones, IoT sensors, AR headsets) connect through the 5G Radio Access Network (gNodeB) and are routed by the User Plane Function (UPF). This layer is entirely standard 3GPP. Cachee does not touch it.
- UE Devices — Smartphones, tablets, IoT, AR/VR headsets
- gNodeB (5G RAN) — Radio access, beamforming, scheduling
- UPF — Packet routing, traffic steering to MEC or core
Cachee MEC Layer
This is where Cachee lives. Four services run as containers within the carrier's MEC Kubernetes cluster (a simplified request-path sketch follows the list):
- AI Prediction Engine — LSTM + Transformer + Reinforcement Learning models that predict which content will be requested up to 30 minutes ahead
- L1 In-Process Cache — 1.21ns access latency, in-memory hot store for predicted and recently-accessed content
- Edge Proxy — NVMe-backed secondary cache with connection pooling to origin servers
- Compliance Engine — Enforces 30+ regulatory frameworks (GDPR, HIPAA, PCI-DSS, local telecom regulations) at the edge before data leaves the MEC boundary
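The sketch below shows how one request moves through these four services. It is illustrative only: the service objects, method names, and the jurisdiction field are assumptions for this sketch, not Cachee's published API.

# Illustrative request path through the four Cachee MEC services.
# All names here (l1_cache, edge_proxy, compliance, request fields)
# are hypothetical stand-ins, not Cachee's actual interfaces.

def handle_request(request, l1_cache, edge_proxy, compliance):
    """Route one UE request through the MEC layer."""
    cached = l1_cache.get(request.key)
    if cached is not None:
        # Hit path: content never leaves the MEC boundary; compliance
        # rules are enforced before the response crosses any border.
        return compliance.filter(cached, request.jurisdiction)

    # Miss path: the edge proxy fetches from origin over pooled
    # connections, then warms L1 so the next request takes the hit path.
    response = edge_proxy.fetch_from_origin(request)
    l1_cache.put(request.key, response)
    return compliance.filter(response, request.jurisdiction)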
Carrier Core
The 5G core network functions remain untouched. Cachee operates entirely in the user plane—the control plane never knows it exists.
- AMF / SMF — Access and session management (unchanged)
- Network Slicing — Dedicated slices can be configured to route Cachee-optimized traffic
- Carrier CDN — Coexists with Cachee; Cachee handles dynamic/API content that CDNs cannot
Origin / Cloud
The backend servers where content originates. Cachee reduces the load on these by 94-98% through predictive caching.
- AWS / GCP / Azure — Cloud-hosted application backends
- Content Origins — Video, game assets, API responses
- API Backends — Dynamic content served through Cachee's predictive pre-fetch
Architecture Diagram
5G MEC Architecture with Cachee
============================================================
LAYER 1: USER PLANE
+----------+      +-----------+      +-------+
|    UE    | ---> |  gNodeB   | ---> |  UPF  |
| Devices  |      | (5G RAN)  |      |       |
+----------+      +-----------+      +---+---+
                                         |
                       traffic steering  |
                                         v
-----------------------------------------+-------------------
LAYER 2: CACHEE MEC (Kubernetes)         |
 +---------------------------------------+---------------+
 |                                       |               |
 |  +------------------+     +-----------+------------+  |
 |  |  AI Prediction   |     |  L1 In-Process Cache   |  |
 |  |  Engine          |---->|  (1.21ns latency)      |  |
 |  |  LSTM+Trans.+RL  |     +-----------+------------+  |
 |  +------------------+                 |               |
 |          +------- HIT (94-98%) -------+               |
 |          |                MISS (2-6%) |               |
 |          v                            v               |
 |  +--------------------+   +-----------+------------+  |
 |  |  Compliance        |   |  Edge Proxy            |  |
 |  |  Engine (30+ regs) |   |  (NVMe backing)        |  |
 |  +--------------------+   +-----------+------------+  |
 |                                       |               |
 +---------------------------------------+---------------+
                                         |
-----------------------------------------+-------------------
LAYER 3: CARRIER CORE                    |
 +-------------+   +-----------+         |
 | AMF / SMF   |   | Network   |         |
 | (unchanged) |   | Slicing   |         |
 +-------------+   +-----------+         |
                                         |
-----------------------------------------+-------------------
LAYER 4: ORIGIN / CLOUD                  |
 +----------+   +----------+   +---------+------+
 | AWS/GCP/ |   | Content  |   |      API       |
 |  Azure   |   | Origins  |   |    Backends    |
 +----------+   +----------+   +----------------+
Request Flow: Cache Hit (94-98% of Requests)
The vast majority of requests never leave the MEC boundary. Here is the timing breakdown for a cache hit:
| Step | Component | Cumulative Time |
|---|---|---|
| 1. UE Request | Device sends request | t = 0 ms |
| 2. 5G RAN | gNodeB radio processing | t = 4 ms |
| 3. UPF Steering | Traffic routed to MEC | t = 7 ms |
| 4. Cachee AI Check | Prediction engine lookup | t = 7.5 ms |
| 5. L1 Cache HIT | In-process memory return | t = 7.5 ms |
| 6. Response to UE | Return through RAN | t ≈ 10.5 ms |
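Steps 4 and 5 share a cumulative time because an L1 hit is an in-process hash lookup: nanoseconds against the milliseconds of the radio path. A quick scale check with a plain Python dict standing in for the L1 hot store (absolute numbers vary by hardware and implementation):

import time

# Plain in-process dict as a stand-in for the L1 hot store. The point
# is the ns-vs-ms scale, not the exact figure, which is hardware-dependent.
l1 = {f"key-{i}": b"payload" for i in range(1_000_000)}

start = time.perf_counter_ns()
for _ in range(1_000_000):
    l1.get("key-4242")
elapsed = time.perf_counter_ns() - start
print(f"avg lookup: {elapsed / 1_000_000:.1f} ns")  # tens of ns; invisible next to a 4 ms radio hop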
Request Flow: Cache Miss (2-6% of Requests)
When the AI engine does not have a prediction and the L1 cache does not contain the requested content, the request falls through to origin:
| Step | Component | Cumulative Time |
|---|---|---|
| 1. UE Request | Device sends request | t = 0 ms |
| 2. 5G RAN | gNodeB radio processing | t = 4 ms |
| 3. UPF Steering | Traffic routed to MEC | t = 7 ms |
| 4. Cachee AI Check | Prediction engine — MISS | t = 7.5 ms |
| 5. Edge Proxy Forward | Request forwarded to origin | t = 8 ms |
| 6. Origin Fetch | Cloud backend responds | t = 20–35 ms |
| 7. Cache + Respond | Store in L1, return to UE | t = 25–40 ms |
Even on a miss, Cachee is still faster than a traditional architecture because the edge proxy maintains persistent connection pools to origin servers. Critically, the fetched content is now cached: every subsequent request for the same content takes the 10.5 ms hit path.
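The connection-pool effect is standard HTTP keep-alive: reusing an established TCP/TLS connection to origin skips handshake round-trips on every miss. A minimal sketch using requests.Session as a stand-in for the edge proxy's pooling (the function and its arguments are illustrative, not Cachee's internals):

import requests

# A Session reuses underlying TCP/TLS connections (urllib3 pooling),
# the same effect the edge proxy's persistent pools provide: no new
# handshake per miss, just the origin's response time.
session = requests.Session()

def fetch_and_cache(l1_cache, key, origin_url):
    """Miss path: fetch over a pooled connection, then warm L1."""
    response = session.get(origin_url, timeout=5)
    response.raise_for_status()
    l1_cache[key] = response.content  # next request takes the ~10.5 ms hit path
    return response.content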
AI-Powered Predictive Pre-Fetching
The most powerful component of the Cachee MEC layer is the AI Prediction Engine. Rather than waiting for a cache miss to occur, Cachee predicts what content will be requested up to 30 minutes ahead and pre-fetches it into the L1 cache.
The engine combines three models (see the sketch after this list):
- LSTM (Long Short-Term Memory) — Learns temporal access patterns. If users in a cell typically stream certain content at 6pm, the model pre-fetches it by 5:30pm.
- Transformer — Captures cross-content correlations. When a user loads a game lobby page, the model pre-fetches the most probable next assets (match data, player profiles, map textures).
- Reinforcement Learning — Continuously optimizes cache eviction and pre-fetch decisions based on real hit/miss feedback. The model improves with every request.
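A hedged sketch of how the three signals might combine into pre-fetch decisions. The model objects, score methods, and blend weights are assumptions for illustration; the source only states that the three models jointly drive pre-fetching up to 30 minutes ahead.

# Illustrative pre-fetch cycle. Model interfaces and weights are
# hypothetical; only the three-model combination is from the document.

def prefetch_cycle(candidates, lstm, transformer, rl_policy, l1_cache, fetch):
    for key in candidates:
        temporal = lstm.score(key)           # temporal pattern: e.g. the 6pm streaming surge
        correlated = transformer.score(key)  # cross-content: e.g. lobby page -> map textures
        blended = 0.5 * temporal + 0.5 * correlated
        # The RL policy, trained on real hit/miss feedback, makes the
        # final prefetch-or-skip call (and drives eviction elsewhere).
        if key not in l1_cache and rl_policy.should_prefetch(key, blended):
            l1_cache[key] = fetch(key)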
Network Slice Integration
5G network slicing allows carriers to create dedicated virtual networks for different traffic types. Cachee integrates with the slicing architecture to provide differentiated caching behavior:
| Slice Type | Cachee Behavior | Typical Use Case |
|---|---|---|
| eMBB (Enhanced Mobile Broadband) | Aggressive pre-fetch, large L1 allocation | 4K/8K video, cloud gaming |
| URLLC (Ultra-Reliable Low-Latency) | Minimal processing overhead, priority L1 path | V2X, industrial control |
| mMTC (Massive Machine-Type Comms) | High-cardinality key handling, compact values | IoT sensor networks |
| Custom Enterprise | Compliance-first routing, tenant isolation | Private 5G, campus networks |
Carriers can dedicate a low-latency slice specifically for Cachee-optimized traffic. Requests on this slice get priority UPF steering to the MEC layer, bypassing the carrier core entirely for cached content. The result is the lowest possible latency path: radio to cache to radio, with nothing in between.
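In configuration terms, the table above reduces to a per-slice policy map. Here is a sketch of what such a mapping could look like; the field names and values are assumptions, not Cachee's actual configuration schema.

from dataclasses import dataclass

@dataclass
class SlicePolicy:
    """Hypothetical per-slice caching knobs implied by the table above."""
    prefetch_aggressiveness: float  # 0.0 = reactive only, 1.0 = maximum pre-fetch
    l1_share: float                 # fraction of L1 capacity reserved for the slice
    priority_path: bool             # skip non-essential processing for lowest latency

SLICE_POLICIES = {
    "embb":       SlicePolicy(prefetch_aggressiveness=0.9, l1_share=0.5, priority_path=False),
    "urllc":      SlicePolicy(prefetch_aggressiveness=0.2, l1_share=0.2, priority_path=True),
    "mmtc":       SlicePolicy(prefetch_aggressiveness=0.4, l1_share=0.1, priority_path=False),
    "enterprise": SlicePolicy(prefetch_aggressiveness=0.5, l1_share=0.2, priority_path=True),
}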
Deployment: Under 30 Seconds
Cachee ships as a set of container images that deploy into any Kubernetes-based MEC platform. The deployment manifest is straightforward:
# cachee-mec-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cachee-mec
  namespace: edge-services
spec:
  replicas: 3
  selector:
    matchLabels:
      app: cachee-mec
  template:
    metadata:
      labels:
        app: cachee-mec   # must match spec.selector.matchLabels
    spec:
      containers:
        - name: cachee-ai-engine
          image: cachee/mec-ai:latest
          resources:
            requests: { cpu: "4", memory: "8Gi" }
            limits: { cpu: "8", memory: "16Gi" }
          env:
            - name: PREDICTION_WINDOW_MIN
              value: "30"
            - name: L1_CACHE_MAX_ENTRIES
              value: "10000000"
        - name: cachee-edge-proxy
          image: cachee/mec-proxy:latest
          resources:
            requests: { cpu: "2", memory: "4Gi" }
          volumeMounts:
            - name: nvme-cache
              mountPath: /cache
        - name: cachee-compliance
          image: cachee/mec-compliance:latest
          env:
            - name: REGULATIONS
              value: "gdpr,hipaa,pci-dss,ccpa"
      volumes:
        - name: nvme-cache
          hostPath:
            path: /mnt/nvme0
A single kubectl apply of this manifest and Cachee is live at the edge. Container orchestration handles rolling updates, health checks, and auto-scaling. No carrier network engineer needs to touch the 5G core configuration.
Why MEC Instead of Central Cloud?
The physics are simple. A centralized cloud cache sits 20-50 ms away from the user (through the 5G core, transport network, and cloud ingress). A MEC-deployed cache sits 3-7 ms away (just the radio hop). Cachee's L1 cache at the MEC edge adds effectively zero latency on top of that radio hop.
For applications that need sub-15 ms response times (cloud gaming, AR overlays, real-time collaboration), the difference between 50 ms and 10.5 ms is the difference between usable and unusable. MEC is the only deployment model that gets there.
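The request-flow tables above make this a weighted average. Using the document's own figures (a 96% hit rate picked from the 94-98% range, ~10.5 ms hits, 25-40 ms misses), expected latency stays close to the hit path:

# Expected end-to-end latency from the document's figures; the 96% hit
# rate and the 32.5 ms miss midpoint are representative picks.
hit_rate, hit_ms = 0.96, 10.5
miss_ms = (25 + 40) / 2  # midpoint of the 25-40 ms miss range

expected = hit_rate * hit_ms + (1 - hit_rate) * miss_ms
print(f"expected latency: {expected:.2f} ms")  # ~11.38 ms
# versus 20-50 ms for every request through a centralized cloud cache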
What Cachee Does Not Replace
Cachee is not a CDN, not a 5G core component, and not a replacement for origin servers. It is a predictive caching layer that:
- Sits transparently in the data path
- Reduces origin load by 94-98%
- Adds AI-driven intelligence to content delivery at the edge
- Enforces compliance before data crosses network boundaries
The 5G core, carrier CDN, and cloud backends all continue to function exactly as they do today. Cachee simply intercepts the requests it can serve faster and lets everything else pass through unchanged.
See the Full Architecture
Interactive diagrams, deployment guides, and integration details for carrier MEC environments.
Explore 5G Telecom →