MEC Architecture: Deploying Cachee at the 5G Edge
Multi-access Edge Computing (MEC) moves computation from centralized clouds to the network edge, inside the carrier's own infrastructure, meters from the cell tower. Cachee deploys natively in this environment as a containerized service in the carrier's MEC Kubernetes cluster, sitting between the User Plane Function (UPF) and application servers.
Zero changes to the 5G core. Zero disruption to existing traffic flows. Under 30 seconds to deploy.
The Four-Layer Architecture
Cachee slots into the existing 5G MEC stack without requiring modifications to any layer above or below it. Here is how each layer fits together.
User Plane
UE devices (phones, IoT sensors, AR headsets) connect through the 5G Radio Access Network (gNodeB) and are routed by the User Plane Function (UPF). This layer is entirely standard 3GPP. Cachee does not touch it.
- UE Devices — Smartphones, tablets, IoT, AR/VR headsets
- gNodeB (5G RAN) — Radio access, beamforming, scheduling
- UPF — Packet routing, traffic steering to MEC or core
Cachee MEC Layer
This is where Cachee lives. Four services run as containers within the carrier's MEC Kubernetes cluster (a simplified request-path sketch follows the list):
- AI Prediction Engine — LSTM + Transformer + Reinforcement Learning models that predict which content will be requested up to 30 minutes ahead
- L1 In-Process Cache — 1.21ns access latency, in-memory hot store for predicted and recently-accessed content
- Edge Proxy — NVMe-backed secondary cache with connection pooling to origin servers
- Compliance Engine — Enforces 30+ regulatory frameworks (GDPR, HIPAA, PCI-DSS, local telecom regulations) at the edge before data leaves the MEC boundary
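The sketch below shows how one request moves through these four services. It is illustrative only: the service objects, method names, and the jurisdiction field are assumptions for this sketch, not Cachee's published API.

# Illustrative request path through the four Cachee MEC services.
# All names here (l1_cache, edge_proxy, compliance, request fields)
# are hypothetical stand-ins, not Cachee's actual interfaces.

def handle_request(request, l1_cache, edge_proxy, compliance):
    """Route one UE request through the MEC layer."""
    cached = l1_cache.get(request.key)
    if cached is not None:
        # Hit path: content never leaves the MEC boundary; compliance
        # rules are enforced before the response crosses any border.
        return compliance.filter(cached, request.jurisdiction)

    # Miss path: the edge proxy fetches from origin over pooled
    # connections, then warms L1 so the next request takes the hit path.
    response = edge_proxy.fetch_from_origin(request)
    l1_cache.put(request.key, response)
    return compliance.filter(response, request.jurisdiction)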
Carrier Core
The 5G core network functions remain untouched. Cachee operates entirely in the user plane—the control plane never knows it exists.
- AMF / SMF — Access and session management (unchanged)
- Network Slicing — Dedicated slices can be configured to route Cachee-optimized traffic
- Carrier CDN — Coexists with Cachee; Cachee handles dynamic/API content that CDNs cannot
Origin / Cloud
The backend servers where content originates. Cachee reduces the load on these by 94-98% through predictive caching.
- AWS / GCP / Azure — Cloud-hosted application backends
- Content Origins — Video, game assets, API responses
- API Backends — Dynamic content served through Cachee's predictive pre-fetch
Architecture Diagram
5G MEC Architecture with Cachee
============================================================
LAYER 1: USER PLANE
+----------+      +-----------+      +-------+
|    UE    | ---> |  gNodeB   | ---> |  UPF  |
| Devices  |      | (5G RAN)  |      |       |
+----------+      +-----------+      +---+---+
                                         |
                       traffic steering  |
                                         v
-----------------------------------------+-------------------
LAYER 2: CACHEE MEC (Kubernetes)         |
 +---------------------------------------+---------------+
 |                                       |               |
 |  +------------------+     +-----------+------------+  |
 |  |  AI Prediction   |     |  L1 In-Process Cache   |  |
 |  |  Engine          |---->|  (1.21ns latency)      |  |
 |  |  LSTM+Trans.+RL  |     +-----------+------------+  |
 |  +------------------+                 |               |
 |          +------- HIT (94-98%) -------+               |
 |          |                MISS (2-6%) |               |
 |          v                            v               |
 |  +--------------------+   +-----------+------------+  |
 |  |  Compliance        |   |  Edge Proxy            |  |
 |  |  Engine (30+ regs) |   |  (NVMe backing)        |  |
 |  +--------------------+   +-----------+------------+  |
 |                                       |               |
 +---------------------------------------+---------------+
                                         |
-----------------------------------------+-------------------
LAYER 3: CARRIER CORE                    |
 +-------------+   +-----------+         |
 | AMF / SMF   |   | Network   |         |
 | (unchanged) |   | Slicing   |         |
 +-------------+   +-----------+         |
                                         |
-----------------------------------------+-------------------
LAYER 4: ORIGIN / CLOUD                  |
 +----------+   +----------+   +---------+------+
 | AWS/GCP/ |   | Content  |   |      API       |
 |  Azure   |   | Origins  |   |    Backends    |
 +----------+   +----------+   +----------------+
Request Flow: Cache Hit (94-98% of Requests)
The vast majority of requests never leave the MEC boundary. Here is the timing breakdown for a cache hit:
| Step | Component | Cumulative Time |
|---|---|---|
| 1. UE Request | Device sends request | t = 0 ms |
| 2. 5G RAN | gNodeB radio processing | t = 4 ms |
| 3. UPF Steering | Traffic routed to MEC | t = 7 ms |
| 4. Cachee AI Check | Prediction engine lookup | t = 7.5 ms |
| 5. L1 Cache HIT | In-process memory return | t = 7.5 ms |
| 6. Response to UE | Return through RAN | t ≈ 10.5 ms |
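Steps 4 and 5 share a cumulative time because an L1 hit is an in-process hash lookup: nanoseconds against the milliseconds of the radio path. A quick scale check with a plain Python dict standing in for the L1 hot store (absolute numbers vary by hardware and implementation):

import time

# Plain in-process dict as a stand-in for the L1 hot store. The point
# is the ns-vs-ms scale, not the exact figure, which is hardware-dependent.
l1 = {f"key-{i}": b"payload" for i in range(1_000_000)}

start = time.perf_counter_ns()
for _ in range(1_000_000):
    l1.get("key-4242")
elapsed = time.perf_counter_ns() - start
print(f"avg lookup: {elapsed / 1_000_000:.1f} ns")  # tens of ns; invisible next to a 4 ms radio hop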
Request Flow: Cache Miss (2-6% of Requests)
When the AI engine does not have a prediction and the L1 cache does not contain the requested content, the request falls through to origin:
| Step | Component | Cumulative Time |
|---|---|---|
| 1. UE Request | Device sends request | t = 0 ms |
| 2. 5G RAN | gNodeB radio processing | t = 4 ms |
| 3. UPF Steering | Traffic routed to MEC | t = 7 ms |
| 4. Cachee AI Check | Prediction engine — MISS | t = 7.5 ms |
| 5. Edge Proxy Forward | Request forwarded to origin | t = 8 ms |
| 6. Origin Fetch | Cloud backend responds | t = 20–35 ms |
| 7. Cache + Respond | Store in L1, return to UE | t = 25–40 ms |
Even on a miss, Cachee is still faster than a traditional architecture because the edge proxy maintains persistent connection pools to origin servers. Critically, the fetched content is now cached: every subsequent request for the same content takes the 10.5 ms hit path.
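The connection-pool effect is standard HTTP keep-alive: reusing an established TCP/TLS connection to origin skips handshake round-trips on every miss. A minimal sketch using requests.Session as a stand-in for the edge proxy's pooling (the function and its arguments are illustrative, not Cachee's internals):

import requests

# A Session reuses underlying TCP/TLS connections (urllib3 pooling),
# the same effect the edge proxy's persistent pools provide: no new
# handshake per miss, just the origin's response time.
session = requests.Session()

def fetch_and_cache(l1_cache, key, origin_url):
    """Miss path: fetch over a pooled connection, then warm L1."""
    response = session.get(origin_url, timeout=5)
    response.raise_for_status()
    l1_cache[key] = response.content  # next request takes the ~10.5 ms hit path
    return response.content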
AI-Powered Predictive Pre-Fetching
The most powerful component of the Cachee MEC layer is the AI Prediction Engine. Rather than waiting for a cache miss to occur, Cachee predicts what content will be requested up to 30 minutes ahead and pre-fetches it into the L1 cache.
The engine combines three models (see the sketch after this list):
- LSTM (Long Short-Term Memory) — Learns temporal access patterns. If users in a cell typically stream certain content at 6pm, the model pre-fetches it by 5:30pm.
- Transformer — Captures cross-content correlations. When a user loads a game lobby page, the model pre-fetches the most probable next assets (match data, player profiles, map textures).
- Reinforcement Learning — Continuously optimizes cache eviction and pre-fetch decisions based on real hit/miss feedback. The model improves with every request.
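A hedged sketch of how the three signals might combine into pre-fetch decisions. The model objects, score methods, and blend weights are assumptions for illustration; the source only states that the three models jointly drive pre-fetching up to 30 minutes ahead.

# Illustrative pre-fetch cycle. Model interfaces and weights are
# hypothetical; only the three-model combination is from the document.

def prefetch_cycle(candidates, lstm, transformer, rl_policy, l1_cache, fetch):
    for key in candidates:
        temporal = lstm.score(key)           # temporal pattern: e.g. the 6pm streaming surge
        correlated = transformer.score(key)  # cross-content: e.g. lobby page -> map textures
        blended = 0.5 * temporal + 0.5 * correlated
        # The RL policy, trained on real hit/miss feedback, makes the
        # final prefetch-or-skip call (and drives eviction elsewhere).
        if key not in l1_cache and rl_policy.should_prefetch(key, blended):
            l1_cache[key] = fetch(key)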
Network Slice Integration
5G network slicing allows carriers to create dedicated virtual networks for different traffic types. Cachee integrates with the slicing architecture to provide differentiated caching behavior:
| Slice Type | Cachee Behavior | Typical Use Case |
|---|---|---|
| eMBB (Enhanced Mobile Broadband) | Aggressive pre-fetch, large L1 allocation | 4K/8K video, cloud gaming |
| URLLC (Ultra-Reliable Low-Latency) | Minimal processing overhead, priority L1 path | V2X, industrial control |
| mMTC (Massive Machine-Type Comms) | High-cardinality key handling, compact values | IoT sensor networks |
| Custom Enterprise | Compliance-first routing, tenant isolation | Private 5G, campus networks |
Carriers can dedicate a low-latency slice specifically for Cachee-optimized traffic. Requests on this slice get priority UPF steering to the MEC layer, bypassing the carrier core entirely for cached content. The result is the lowest possible latency path: radio to cache to radio, with nothing in between.
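In configuration terms, the table above reduces to a per-slice policy map. Here is a sketch of what such a mapping could look like; the field names and values are assumptions, not Cachee's actual configuration schema.

from dataclasses import dataclass

@dataclass
class SlicePolicy:
    """Hypothetical per-slice caching knobs implied by the table above."""
    prefetch_aggressiveness: float  # 0.0 = reactive only, 1.0 = maximum pre-fetch
    l1_share: float                 # fraction of L1 capacity reserved for the slice
    priority_path: bool             # skip non-essential processing for lowest latency

SLICE_POLICIES = {
    "embb":       SlicePolicy(prefetch_aggressiveness=0.9, l1_share=0.5, priority_path=False),
    "urllc":      SlicePolicy(prefetch_aggressiveness=0.2, l1_share=0.2, priority_path=True),
    "mmtc":       SlicePolicy(prefetch_aggressiveness=0.4, l1_share=0.1, priority_path=False),
    "enterprise": SlicePolicy(prefetch_aggressiveness=0.5, l1_share=0.2, priority_path=True),
}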
Deployment: Under 30 Seconds
Cachee ships as a set of container images that deploy into any Kubernetes-based MEC platform. The deployment manifest is straightforward:
# cachee-mec-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cachee-mec
  namespace: edge-services
spec:
  replicas: 3
  selector:
    matchLabels:
      app: cachee-mec
  template:
    metadata:
      labels:
        app: cachee-mec   # must match spec.selector.matchLabels
    spec:
      containers:
        - name: cachee-ai-engine
          image: cachee/mec-ai:latest
          resources:
            requests: { cpu: "4", memory: "8Gi" }
            limits: { cpu: "8", memory: "16Gi" }
          env:
            - name: PREDICTION_WINDOW_MIN
              value: "30"
            - name: L1_CACHE_MAX_ENTRIES
              value: "10000000"
        - name: cachee-edge-proxy
          image: cachee/mec-proxy:latest
          resources:
            requests: { cpu: "2", memory: "4Gi" }
          volumeMounts:
            - name: nvme-cache
              mountPath: /cache
        - name: cachee-compliance
          image: cachee/mec-compliance:latest
          env:
            - name: REGULATIONS
              value: "gdpr,hipaa,pci-dss,ccpa"
      volumes:
        - name: nvme-cache
          hostPath:
            path: /mnt/nvme0
A single kubectl apply of this manifest and Cachee is live at the edge. Container orchestration handles rolling updates, health checks, and auto-scaling. No carrier network engineer needs to touch the 5G core configuration.
Why MEC Instead of Central Cloud?
The physics are simple. A centralized cloud cache sits 20-50 ms away from the user (through the 5G core, transport network, and cloud ingress). A MEC-deployed cache sits 3-7 ms away (just the radio hop). Cachee's L1 cache at the MEC edge adds effectively zero latency on top of that radio hop.
For applications that need sub-15 ms response times (cloud gaming, AR overlays, real-time collaboration), the difference between 50 ms and 10.5 ms is the difference between usable and unusable. MEC is the only deployment model that gets there.
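The request-flow tables above make this a weighted average. Using the document's own figures (a 96% hit rate picked from the 94-98% range, ~10.5 ms hits, 25-40 ms misses), expected latency stays close to the hit path:

# Expected end-to-end latency from the document's figures; the 96% hit
# rate and the 32.5 ms miss midpoint are representative picks.
hit_rate, hit_ms = 0.96, 10.5
miss_ms = (25 + 40) / 2  # midpoint of the 25-40 ms miss range

expected = hit_rate * hit_ms + (1 - hit_rate) * miss_ms
print(f"expected latency: {expected:.2f} ms")  # ~11.38 ms
# versus 20-50 ms for every request through a centralized cloud cache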
What Cachee Does Not Replace
Cachee is not a CDN, not a 5G core component, and not a replacement for origin servers. It is a predictive caching layer that:
- Sits transparently in the data path
- Reduces origin load by 94-98%
- Adds AI-driven intelligence to content delivery at the edge
- Enforces compliance before data crosses network boundaries
The 5G core, carrier CDN, and cloud backends all continue to function exactly as they do today. Cachee simply intercepts the requests it can serve faster and lets everything else pass through unchanged.
See the Full Architecture
Interactive diagrams, deployment guides, and integration details for carrier MEC environments.
Explore 5G Telecom →