Architecture

How to Cache GraphQL Without Losing Your Mind

GraphQL is great for clients, terrible for caching. Every query is unique. POST requests bypass HTTP caches. Nested resolvers multiply database calls exponentially. And the traditional cache-key strategies that work beautifully for REST — URL path + query parameters — break completely because the same data gets requested through dozens of different query shapes. If you have tried to cache a GraphQL API and felt like the entire caching model was working against you, you are not wrong. It is. The protocol was designed for flexibility at the client layer, and that flexibility is fundamentally hostile to every caching primitive the industry has relied on for twenty years.

Why GraphQL Breaks Traditional Caching

With REST, caching is straightforward. A GET /api/users/123 request always returns the same shape. You cache the URL, set a TTL, and move on. CDNs, reverse proxies, and browser caches all understand this model natively. The URL is the cache key. GraphQL destroys this contract in four distinct ways, and each one is enough to invalidate a traditional caching strategy on its own.

First: POST to a single endpoint. Every GraphQL request is a POST to /graphql. There is no URL path to cache. CDNs like CloudFront, Fastly, and Cloudflare are built to cache GET requests by URL — they do not inspect POST bodies by default. Your entire CDN layer, the one you are paying thousands per month for, is useless for GraphQL out of the box. You can configure some CDNs to hash request bodies, but now you are in custom territory with no standardization and limited tooling.

Second: query shape determines response shape. Two clients requesting the same user might send entirely different queries. One asks for { user(id: 123) { name, email } }. Another asks for { user(id: 123) { name, avatar, posts { title } } }. Same underlying data, different query strings, different cache keys, zero hit rate between them. In a real application with mobile, web, and internal dashboard clients, the number of unique query shapes scales combinatorially. A schema with 50 fields and 10 relationships can produce millions of structurally unique queries, all requesting overlapping subsets of the same data.

Third: nested resolvers create the N+1 problem on steroids. A single GraphQL query like { users { posts { comments { author { name } } } } } might trigger 1 query for users, 20 queries for posts, 200 queries for comments, and 200 queries for comment authors. That is 421 database calls from one client request. DataLoader helps batch these, but batching is not caching — the same author is fetched repeatedly across different requests, and each resolver operates independently with no shared cache layer unless you build one yourself.
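The distinction between batching and caching is easy to see in code. Below is a deliberately synchronous toy batcher (the real DataLoader library batches per event-loop tick and is async); the point is that batching collapses many loads in one request into one database call, but a fresh request starts from zero:

```javascript
// Toy synchronous batcher: collect keys, then resolve them in one batch.
class TinyBatcher {
  constructor(batchFn) {
    this.batchFn = batchFn;
    this.keys = [];
  }
  load(key) {
    this.keys.push(key);
  }
  flush() {
    const unique = [...new Set(this.keys)];
    const rows = this.batchFn(unique); // one query for the whole batch
    const byKey = new Map(unique.map((k, i) => [k, rows[i]]));
    const out = this.keys.map((k) => byKey.get(k));
    this.keys = [];
    return out;
  }
}

let dbCalls = 0;
const authorBatcher = new TinyBatcher((ids) => {
  dbCalls += 1; // one SELECT ... WHERE id IN (...) instead of one per comment
  return ids.map((id) => ({ id, name: `author-${id}` }));
});

// 200 comments referencing 5 distinct authors:
const comments = Array.from({ length: 200 }, (_, i) => ({ authorId: i % 5 }));
comments.forEach((c) => authorBatcher.load(c.authorId));
const authors = authorBatcher.flush();
// 200 author objects resolved, 1 database call -- but the next request
// repeats that call, because nothing was cached across requests.
```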

Fourth: personalized and public data coexist in the same query. A query like { viewer { name, feed { posts { title, likeCount, viewerHasLiked } } } } mixes public data (post titles, like counts) with user-specific data (viewerHasLiked). You cannot cache the entire response because the personalized fields change per user. You cannot skip caching entirely because 90% of the response is public and shared. REST APIs typically separate these into different endpoints. GraphQL encourages combining them into a single query, which makes response-level caching nearly impossible without granular field awareness.
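The granular field awareness that would be required looks something like this sketch: partition a selection set into shared fields (cacheable for everyone) and per-viewer fields (resolved fresh each time). The `PER_VIEWER` set is a hypothetical annotation; in practice you would mark these fields in the schema:

```javascript
// Hypothetical set of viewer-specific fields, normally declared in the schema.
const PER_VIEWER = new Set(['viewerHasLiked']);

// Split a flat selection set into a cacheable part and a per-user part.
function splitSelection(fields) {
  return {
    shared: fields.filter((f) => !PER_VIEWER.has(f)),
    perViewer: fields.filter((f) => PER_VIEWER.has(f)),
  };
}

const { shared, perViewer } = splitSelection([
  'title',
  'likeCount',
  'viewerHasLiked',
]);
// shared fields can be served from a shared cache; perViewer fields cannot.
```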

The core problem: REST gives you one cache key per resource. GraphQL gives you infinite query permutations per resource. Traditional caching assumes a stable relationship between request and response shape. GraphQL breaks that assumption by design.

The 3 Approaches That Kinda Work

The GraphQL ecosystem has produced three caching strategies over the past several years. Each one solves part of the problem while introducing its own set of trade-offs. None of them fully address the fundamental shape-variance issue, but understanding them is essential before you can see why a different approach is needed.

1. Response-Level Caching (Hash the Query)

The simplest approach: take the entire GraphQL query string, hash it (SHA-256 or similar), and use that hash as the cache key. If the exact same query comes in again, serve the cached response. This works, and it is trivially easy to implement — a middleware that hashes request.body and checks Redis before forwarding to the GraphQL server.

The problem is hit rate. In a real application with multiple clients and varying query patterns, the same data is requested through different query shapes constantly. A mobile client requesting { user { name, avatar } } and a web client requesting { user { name, email, avatar, role } } produce entirely different hashes despite sharing three overlapping fields. Your cache stores two nearly-identical responses, serves neither to the other client, and your hit rate hovers around 15–30% in practice. You are paying for cache storage and getting minimal benefit.

```javascript
// Response-level: simple but low hit rate.
// Note: hash the query string plus serialized variables; hashing
// `query + variables` directly would stringify the variables object
// as "[object Object]" and collide across requests.
const cacheKey = sha256(query + JSON.stringify(variables));
const cached = await redis.get(cacheKey);
if (cached) return JSON.parse(cached);
// Hit rate: ~20% in production with multiple clients
```

2. Resolver-Level Caching (Cache Per Field)

A smarter approach: instead of caching entire responses, cache at the resolver level. Each resolver checks a cache before hitting the database, using a key derived from the type, field name, and arguments. User:123:posts is cached once and reused regardless of what other fields the client requested alongside it.

This dramatically improves hit rates — typically 50–70% — because the caching is field-granular. But managing it is painful. A schema with 50 types and 200 fields means 200 resolvers, each with its own TTL, invalidation logic, and cache key strategy. When a user updates their profile, which cached resolvers need invalidation? User:123:name, User:123:email, User:123:profile, every Post:*:author that references User 123, every Feed:* that includes posts from User 123. The invalidation graph grows with schema complexity, and a single missed edge means stale data.
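A minimal sketch of the pattern, with a `Map` standing in for Redis (the key scheme and `invalidateUser` helper are illustrative). The caching side is short; the invalidation side is where the pain lives:

```javascript
const cache = new Map();

// Wrap a resolver: check the cache under Type:id:field before resolving.
function cachedResolver(type, id, field, resolve) {
  const key = `${type}:${id}:${field}`;
  if (cache.has(key)) return cache.get(key);
  const value = resolve();
  cache.set(key, value);
  return value;
}

// Invalidation fan-out on a user update. Clearing User:123:* is the easy
// part; the hard part is every Post:*:author and Feed:* entry that embeds
// this user's data, which this naive sweep does not know about.
function invalidateUser(id) {
  for (const key of [...cache.keys()]) {
    if (key.startsWith(`User:${id}:`)) cache.delete(key);
  }
}

let dbHits = 0;
cachedResolver('User', 123, 'name', () => { dbHits += 1; return 'Alice'; });
const hit = cachedResolver('User', 123, 'name', () => { dbHits += 1; return 'Alice'; });
// Second call served from cache: dbHits stays at 1.
invalidateUser(123);
```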

3. Persisted Queries (Client Sends Query ID)

The most disciplined approach: pre-register every query at build time, assign each one a unique ID, and have the client send only the ID at runtime. The server looks up the full query from a registry. Because the set of queries is finite and known, you can cache each one individually with tailored TTLs, pre-warm them, and even serve them from CDNs by converting the POST to a GET with the query ID in the URL path.

This works extremely well for teams that control both the client and server. Apollo Client supports automatic persisted queries out of the box. But it requires client-side changes — every client must adopt the persisted query protocol. Third-party consumers of your API, internal tools with ad-hoc queries, and GraphiQL explorers all break. You also lose the flexibility that made GraphQL attractive in the first place: clients can no longer construct arbitrary queries. You have traded the query language for a fixed RPC interface with extra steps.

The trade-off spectrum: Response-level is easy but ineffective (~20% hit rate). Resolver-level is effective but operationally complex (200+ TTLs to manage). Persisted queries are excellent but require client buy-in and sacrifice flexibility.

The ML Approach: Field-Level Prediction

There is a fourth approach that none of the GraphQL-native tools offer, because it requires a capability that sits outside the GraphQL execution engine: machine learning over access patterns. Instead of caching responses or individual resolver results, you normalize every query into its constituent fields, cache individual entities in a normalized store, and reassemble responses from cached entities at query time.

This is conceptually similar to how Apollo Client’s normalized cache works on the frontend — except applied server-side, across all clients, with a prediction layer that learns which fields are requested together and pre-warms them before queries arrive. When Cachee’s predictive engine observes that 85% of queries requesting User.name also request User.avatar and User.email, it pre-caches all three together. When a new query shape arrives requesting User.name and User.role, User.name is already sitting in L1 cache — only User.role triggers a resolver call.

The model continuously adapts. As client patterns shift — a new mobile app version drops avatar in favor of profilePicUrl, the dashboard team adds a lastLoginAt field — the prediction engine detects the shift within minutes and adjusts pre-warming priorities. There is no manual TTL management, no invalidation graph to maintain, no list of 200 resolvers to configure. The system observes, predicts, and caches. The result is a 95%+ hit rate even on APIs with heavy personalization and constantly evolving query shapes, because the caching operates at the entity-field level rather than the query level.

```javascript
// Traditional: cache the full response (low reuse)
cache[sha256(query)] = fullResponse;   // ~20% hit rate

// ML field-level: cache normalized entities (high reuse)
cache["User:123:name"] = "Alice";      // shared across ALL queries
cache["User:123:email"] = "a@b.com";   // pre-warmed by prediction
cache["User:123:avatar"] = "cdn/...";  // 95%+ hit rate
// Reassemble any query shape from cached fields
```

Invalidation becomes trivial in this model. When User:123 updates their email, you invalidate exactly one cache entry: User:123:email. Every query that touches that field gets fresh data on the next request. Every query that does not touch that field is unaffected. Compare that to the resolver-level approach where a single user update might require invalidating dozens of cached resolver results across multiple types and relationships. Field-level normalization with ML prediction turns GraphQL’s biggest caching weakness — infinite query shapes — into a strength, because more query diversity means more training signal for the prediction model.
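The mechanics of normalize-and-reassemble fit in a few lines. A sketch with a `Map` as the field store (the `put`/`assemble` helpers are illustrative, not Cachee's API): any query shape is answered by pulling its fields from the store and resolving only the misses, and invalidation deletes exactly one key.

```javascript
// Normalized field store: one entry per entity field.
const store = new Map();

function put(entity, field, value) {
  store.set(`${entity}:${field}`, value);
}

// Assemble an arbitrary selection from cached atoms, resolving only misses.
function assemble(entity, fields, resolveField) {
  let misses = 0;
  const out = {};
  for (const f of fields) {
    const key = `${entity}:${f}`;
    if (!store.has(key)) {
      misses += 1;
      store.set(key, resolveField(f)); // only cache misses hit a resolver
    }
    out[f] = store.get(key);
  }
  return { out, misses };
}

put('User:123', 'name', 'Alice');
put('User:123', 'email', 'a@b.com');

// A brand-new query shape still reuses the cached atoms:
const { out, misses } = assemble('User:123', ['name', 'role'], () => 'member');
// Only `role` missed. Invalidating an email update touches exactly one key:
store.delete('User:123:email');
```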

Real Impact: Before and After

Here is what this looks like on a production GraphQL API serving a mobile app, a web dashboard, and two internal services — roughly 12,000 unique query shapes per day. The API has 85 types, 340 fields, and heavy use of nested relationships.

Before: Uncached GraphQL API

Parse & validate query: 2 ms
Resolve root fields (3): 9 ms
Resolve nested (DataLoader): 54 ms
Resolve deeply nested: 42 ms
Serialize response: 13 ms
Total (40 resolver calls): 120 ms avg

After: Cachee Field-Level ML Cache

Parse & validate query: 2 ms
L1 field assembly (38 hits): 0.06 ms
Resolve cache misses (2): 6 ms
Serialize response: 0.3 ms
Total (2 resolver calls): 8 ms avg

Forty resolver calls dropped to two. Average response time fell from 120ms to 8ms. The 38 fields that hit L1 cache were served in 60 microseconds combined — faster than a single database round-trip. And because the prediction engine had already pre-warmed the most likely fields based on historical query patterns, the cache was populated before the queries arrived.

15× Faster Response
95% Field Hit Rate
95% Fewer DB Calls
60µs L1 Assembly
The key insight: You do not cache GraphQL queries. You cache the entities and fields that queries are composed of. When every field is individually cached and pre-warmed by ML prediction, the query shape becomes irrelevant — any permutation assembles from the same cached atoms. That is how you get 95% hit rates on an API with 12,000 unique query shapes per day.
