Caching GraphQL Responses: Solving the N+1 Size Problem
GraphQL promised flexible data fetching. The client asks for exactly what it needs, the server assembles exactly that, and nothing more crosses the wire. This works beautifully for the client. It is a caching nightmare for the server.
In REST, the caching story is straightforward. Every endpoint returns a predictable payload shape. GET /api/products/123 always returns the same fields. The URL is the cache key. The response size is stable. CDN headers work. Redis works. Everything works because the cache key and the response shape are deterministic.
GraphQL breaks every one of these assumptions. The same endpoint -- typically POST /graphql -- serves an infinite number of query shapes. One client requests { product(id: 123) { name, price } } and gets back 200 bytes. Another client requests the same product with images, reviews, related items, seller details, and inventory across five warehouses, and gets back 85 KB. Same endpoint. Same product ID. Entirely different cache entries. Traditional URL-based caching has no mechanism to distinguish them because the URL is identical for every request.
This is the N+1 size problem: not the classic N+1 query problem that DataLoader solves, but the fact that N different query shapes against the same data produce N different response sizes, and each one needs its own cache entry with its own size-dependent latency profile.
Why Existing Solutions Do Not Solve This
The GraphQL ecosystem has produced several caching approaches over the past decade. Each one addresses part of the problem while leaving the performance-critical part untouched.
Apollo Client-Side Cache
Apollo Client normalizes GraphQL responses into a flat entity cache in the browser. When a subsequent query requests data that overlaps with a previous query, Apollo serves it from memory without a network request. This is genuinely useful for client-side performance. It does nothing for server-side latency. If your API serves 10,000 unique clients, each one builds its own cache independently. The server still assembles and serializes the full response for every request from every client. Client-side caching optimizes the wrong layer for server performance.
DataLoader and Batch Loading
DataLoader solves the N+1 query problem at the database layer. When a GraphQL resolver needs to fetch 50 products, DataLoader batches those 50 individual database queries into a single SELECT ... WHERE id IN (...). This is essential and every production GraphQL server should use it. But DataLoader operates at the resolver level -- it caches and batches individual field resolutions during a single request lifecycle. It does not cache the assembled response. After DataLoader efficiently fetches all 50 products, the server still JSON-serializes the 80 KB response, sends it over the wire, and throws away all that assembled data when the request ends. The next identical query starts from scratch.
Redis Response Cache
Some teams cache the full GraphQL response in Redis. This works conceptually: hash the query, store the response. The problem is payload size. GraphQL responses are large and variable. A product listing page might produce 80 KB of JSON. A dashboard with time-series data might produce 120 KB. An admin analytics query might produce 200 KB. Redis latency scales linearly with value size above 1 KB, and these responses sit deep into the linear zone.
| GraphQL Response Size | Redis GET P50 | Redis GET P99 | In-Process P50 | Overhead vs In-Process |
|---|---|---|---|---|
| 20 KB (simple list) | 0.85ms | 1.50ms | 31ns | 27,419x |
| 50 KB (product page) | 1.52ms | 2.80ms | 31ns | 49,032x |
| 80 KB (catalog page) | 2.10ms | 3.90ms | 31ns | 67,742x |
| 120 KB (dashboard) | 2.90ms | 5.50ms | 31ns | 93,548x |
| 150 KB (analytics) | 3.80ms | 7.10ms | 31ns | 122,581x |
At 120 KB, Redis is adding 2.9 milliseconds to every cached response. That is not caching. That is a bottleneck wearing a caching costume. And the P99 at 5.5ms is worse than many direct database queries.
CDN Caching
CDNs cache HTTP responses by URL and request method. GraphQL uses POST requests to a single endpoint. POST requests are not cached by CDNs by default. Some CDNs support persisted queries (where the query is mapped to a GET request with a query hash in the URL), but this requires the client to register queries ahead of time, which eliminates the dynamic query flexibility that is GraphQL's entire value proposition. It is also limited to read operations and breaks for any query that includes user-specific variables.
The Fundamental Mismatch
REST caching is URL-keyed and size-predictable. GraphQL is query-shape-keyed and size-variable. Every caching layer built for REST -- CDNs, reverse proxies, Redis URL patterns -- assumes fixed endpoints with stable response sizes. GraphQL violates both assumptions simultaneously.
Query Hash Caching: The Right Cache Key
The first step to caching GraphQL responses is generating a deterministic cache key from the query itself. Two clients sending the same logical query -- even with different whitespace, field ordering, or alias names -- should hit the same cache entry.
The process has three steps. First, normalize the query by sorting all field selections alphabetically, stripping comments and extraneous whitespace, and canonicalizing alias names. Second, combine the normalized query string with the serialized variables (also sorted by key). Third, hash the combined string with SHA-256 to produce a fixed-length cache key.
```python
import hashlib
import json

def graphql_cache_key(query: str, variables: dict | None = None) -> str:
    """Generate a deterministic cache key for a GraphQL query.

    Normalize the query to ensure identical logical queries
    produce identical cache keys regardless of formatting.
    """
    # Step 1: Normalize whitespace and sort fields
    normalized = normalize_graphql_query(query)
    # Step 2: Sort and serialize variables deterministically
    vars_str = ""
    if variables:
        vars_str = json.dumps(variables, sort_keys=True, separators=(',', ':'))
    # Step 3: SHA-256 hash of normalized query + variables
    combined = f"{normalized}|{vars_str}"
    return hashlib.sha256(combined.encode()).hexdigest()

def normalize_graphql_query(query: str) -> str:
    """Strip comments, collapse whitespace, sort field selections."""
    # Remove comments
    lines = [line.split('#')[0] for line in query.strip().splitlines()]
    # Collapse whitespace
    collapsed = ' '.join(' '.join(lines).split())
    # Sort fields within selection sets (omitted here for brevity;
    # a full implementation would parse the AST and sort selections)
    return collapsed
```
This gives you a 64-character hex string as a cache key for any query shape. Two clients sending { product(id: 123) { name price } } and { product(id: 123) { price name } } produce the same key once field sorting is in place (the simplified normalizer above stubs that step out; a full implementation sorts selections at the AST level). Two clients sending the same query with { "id": 123 } and {"id":123} produce the same key. The normalization ensures determinism regardless of cosmetic differences in the query text.
The query hash is stable, compact, and fast to compute. SHA-256 of a typical GraphQL query string takes 200-500 nanoseconds. This is not the bottleneck. The bottleneck is what you do with the value once you have the key.
The Size Problem: Real Query Payloads
Consider the queries that actually drive traffic in a production GraphQL API.
A product listing page queries 50 products. Each product has a name, description, price, currency, 3 image URLs, average rating, review count, seller name, and availability status. That is roughly 1.6 KB per product. Fifty products: 80 KB of JSON.
An operations dashboard queries 30 metrics. Each metric has a name, current value, unit, trend direction, a sparkline of 24 hourly data points, and comparison values for previous period. That is roughly 4 KB per metric. Thirty metrics: 120 KB of JSON.
A mobile app home feed queries 20 content cards. Each card has a title, body, author, timestamp, 2 image URLs, engagement counts, and a nested array of 5 comments with author and body. That is roughly 3 KB per card. Twenty cards: 60 KB of JSON.
None of these are pathological queries. They are the normal, everyday queries that drive the majority of traffic in any GraphQL API. And they all produce responses that sit squarely in the latency scaling zone for Redis.
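The back-of-envelope sizes above are simple multiplication. A quick sketch (the per-entity sizes are the article's estimates, not measurements):

```python
# Per-entity size estimates (KB) from the scenarios above -- all assumptions
scenarios = {
    "product_listing": (50, 1.6),  # 50 products x ~1.6 KB each
    "ops_dashboard":   (30, 4.0),  # 30 metrics  x ~4 KB each
    "mobile_feed":     (20, 3.0),  # 20 cards    x ~3 KB each
}
totals_kb = {name: round(n * kb) for name, (n, kb) in scenarios.items()}
# totals_kb == {'product_listing': 80, 'ops_dashboard': 120, 'mobile_feed': 60}
```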
Partial Response Caching: Finer Granularity, More Lookups
One response to the size problem is to cache at a finer granularity. Instead of caching the entire 80 KB product listing response, cache each individual resolver result. product:123 maps to a 1.6 KB JSON fragment. product:124 maps to another 1.6 KB fragment. When the product listing query arrives, resolve it from 50 cache lookups instead of one.
The granularity is better. Individual product data can be shared across different query shapes -- the product listing page, the product detail page, the search results page, and the recommendation widget all reference product:123. Cache hit rates improve because a single cache entry serves multiple query shapes.
But partial response caching introduces a different problem: the number of lookups multiplies. A product listing page with 50 products requires 50 cache lookups. If each product also fetches its seller (another lookup) and its top review (another lookup), that is 150 lookups per request.
| Approach | Lookups per Request | Redis Total Latency | In-Process Total Latency | Ratio |
|---|---|---|---|---|
| Full response cache (80KB) | 1 | 2.10ms | 31ns | 67,742x |
| Per-entity (50 products) | 50 | 17.50ms | 1.55us | 11,290x |
| Per-entity + relations | 150 | 52.50ms | 4.65us | 11,290x |
At 50 individual Redis lookups at 0.35ms each (1.6 KB values are in the small-but-not-trivial range), the aggregate cache latency is 17.5 milliseconds. At 150 lookups, it is 52.5 milliseconds. That is not a cache. That is a second database.
With pipelining, you can reduce the 50-lookup cost to roughly one round-trip plus serialization time -- approximately 3-5ms. But pipelining requires batching all keys upfront, which conflicts with GraphQL's tree-resolution model where child resolver keys depend on parent resolver results. The dependency chain between resolvers limits how much you can pipeline in practice.
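The dependency constraint is easy to see in code. In this sketch, a dict and a counter stand in for Redis and its round trips (the data shapes are hypothetical): seller keys are only known after the product values arrive, so even perfect pipelining costs one round trip per level of the resolver tree, not one total.

```python
# Fake backing store standing in for Redis
store = {f"product:{i}": {"name": f"P{i}", "seller_id": i % 5} for i in range(50)}
store.update({f"seller:{i}": {"name": f"S{i}"} for i in range(5)})

round_trips = 0

def mget(keys):
    """Stand-in for a pipelined Redis MGET: one round trip per call."""
    global round_trips
    round_trips += 1
    return [store.get(k) for k in keys]

# Phase 1: product keys are known upfront -> one pipelined batch
products = mget([f"product:{i}" for i in range(50)])

# Phase 2: seller keys depend on phase-1 results -> a second batch
seller_keys = {f"seller:{p['seller_id']}" for p in products}
sellers = mget(sorted(seller_keys))

assert round_trips == 2  # bounded by resolver-tree depth, not key count
```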
In-process caching resolves this entirely. Fifty lookups at 31 nanoseconds each take 1.55 microseconds. One hundred fifty lookups take 4.65 microseconds. The number of lookups becomes irrelevant because each lookup is a pointer dereference in local memory. You get the granularity benefits of per-entity caching (shared entries across query shapes, better hit rates) without the latency penalty of per-entity network round-trips.
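A crude micro-measurement shows why lookup count stops mattering in-process (a plain dict stands in for an L0 cache here; absolute numbers are machine-dependent):

```python
import time

l0 = {f"entity:{i}": {"id": i} for i in range(150)}
keys = list(l0)

start = time.perf_counter()
for _ in range(1000):                     # 1000 simulated requests
    results = [l0[k] for k in keys]       # 150 in-process lookups each
elapsed = time.perf_counter() - start

per_lookup_ns = elapsed / (1000 * 150) * 1e9
# Typically tens of nanoseconds per lookup on commodity hardware --
# the whole 150-lookup request stays in the microsecond range.
assert len(results) == 150
```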
The N+1 Size Problem, Precisely
In REST, you cache N endpoints with predictable response sizes. In GraphQL, you cache N query shapes with variable response sizes, or you cache N x M entity fragments with M lookups per request. Either way, the total bytes accessed per request and the number of cache operations per request are both unpredictable and both scale with query complexity. This is the N+1 size problem: not a single large value, but an unpredictable number of variable-size values per request, each one paying the network tax independently.
CacheeLFU for GraphQL: Size-Aware Eviction
Standard LRU and LFU eviction policies treat all cache entries as equal-cost. Evicting a 120 KB dashboard response and evicting a 1.6 KB product entry free the same "slot" in the cache. But they do not free the same amount of memory, and they do not have the same reload cost.
CacheeLFU factors both access frequency and entry size into its eviction decisions. A 120 KB dashboard response that is accessed 500 times per second stays in L0 indefinitely -- it is hot, and evicting it would cost 120 KB of re-serialization from the database on every miss. A 120 KB admin analytics query accessed once per hour evicts quickly -- it occupies significant memory for negligible cache benefit.
Critically, CacheeLFU prevents a single large entry from evicting many small hot entries. If the cache needs to free 120 KB of memory, it will not evict 60 hot 2 KB product entities (which would cause 60 future cache misses) to make room for one cold 120 KB analytics response. The eviction score accounts for the number of hot entries that would be displaced per byte of memory freed.
This matters specifically for GraphQL because query shapes have such extreme size variance. A single API can serve 200-byte queries and 200 KB queries from the same endpoint. Without size-aware eviction, the large cold queries continuously evict the small hot queries, destroying hit rates for the traffic that actually matters.
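CacheeLFU's actual scoring is internal to the library, but the core idea can be sketched as a frequency-per-byte score; everything below (names, numbers, the formula itself) is illustrative, not Cachee's real implementation:

```python
entries = {
    # key: (size_bytes, hits_per_minute) -- illustrative workload
    "hot_product_page": (30_000, 18_000),
    "hot_search":       (45_000,  7_500),
    "cold_analytics":  (150_000,      1),
}

def eviction_score(size_bytes, hits):
    """Toy size-aware score: cache value delivered per byte held.
    Lower score -> evict first."""
    return hits / size_bytes

ranked = sorted(entries, key=lambda k: eviction_score(*entries[k]))
assert ranked[0] == "cold_analytics"  # large AND cold: first out the door
```

Under a pure LRU policy, the 150 KB analytics entry could displace dozens of small hot entries on a single admission; weighting by size makes that trade explicit.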
How CacheeLFU Prioritizes GraphQL Traffic
- L0 (in-process, 31ns): Product pages, search results, navigation menus, user profiles -- high-frequency queries with 2-50 KB responses. These stay resident because their access frequency dominates the eviction score.
- L1 (Redis or disk, 0.3-3ms): Admin dashboards, monthly reports, one-off analytics -- low-frequency queries with large responses. These evict from L0 naturally and fall through to L1 on the rare occasions they are requested.
The result is that 85-95% of GraphQL traffic -- the product pages, the search results, the mobile home feeds -- resolve from L0 at 31 nanoseconds. The remaining 5-15% -- admin tools, analytics exports, infrequent reports -- resolve from L1 at Redis-tier latency. The hot path is fast. The cold path is acceptable. Nothing is slow.
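The tiering behavior reduces to a simple fall-through read path. A sketch with dicts standing in for the L0 and L1 tiers (Cachee's real API may differ; this only shows the control flow, including promotion on access):

```python
l0 = {}                                # in-process tier (fast, small)
l1 = {"report:jan": b"big-json-blob"}  # network/disk tier (slow, large)

def tiered_get(key):
    value = l0.get(key)       # in-process lookup: nanosecond class
    if value is not None:
        return value, "L0"
    value = l1.get(key)       # Redis/disk lookup: millisecond class
    if value is not None:
        l0[key] = value       # promote on access; eviction reclaims it later
        return value, "L1"
    return None, "miss"

assert tiered_get("report:jan")[1] == "L1"  # first access falls through
assert tiered_get("report:jan")[1] == "L0"  # promoted: now served in-process
```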
Implementation: Query Hash Cache with Cachee
Here is a complete implementation for a Node.js GraphQL server using Apollo Server and Cachee as the L0 cache layer. The approach uses query hash caching at the full-response level for maximum simplicity, with CacheeLFU handling eviction automatically.
```javascript
const { ApolloServer } = require('@apollo/server');
const { createHash } = require('crypto');

// Cachee client (RESP-compatible, localhost:6380)
const cacheeClient = createCacheeClient('localhost', 6380);

// Query hash middleware for Apollo Server
const cachePlugin = {
  async requestDidStart({ request }) {
    // Skip mutations and subscriptions
    const query = request.query?.trim() ?? '';
    if (query.startsWith('mutation') || query.startsWith('subscription')) {
      return {};
    }

    // Generate deterministic cache key. Note: an array replacer sorts
    // (and filters) top-level keys only; deeply nested variables would
    // need a recursive canonical serializer.
    const normalized = normalizeQuery(request.query);
    const varsStr = request.variables
      ? JSON.stringify(request.variables, Object.keys(request.variables).sort())
      : '';
    const cacheKey = createHash('sha256')
      .update(`${normalized}|${varsStr}`)
      .digest('hex');

    return {
      async responseForOperation() {
        // Check L0 cache (31ns if hit)
        const cached = await cacheeClient.get(`gql:${cacheKey}`);
        if (cached) {
          return { body: { kind: 'single', singleResult: JSON.parse(cached) } };
        }
        return null; // Cache miss, continue to resolvers
      },
      async willSendResponse({ response }) {
        // Only cache complete, error-free single results
        if (response.body.kind !== 'single' || response.body.singleResult.errors) {
          return;
        }
        // Store in L0 with 60s TTL
        // CacheeLFU handles eviction if memory is constrained
        const body = JSON.stringify(response.body.singleResult);
        await cacheeClient.set(`gql:${cacheKey}`, body, 'EX', 60);
      }
    };
  }
};

function normalizeQuery(query) {
  return query
    .replace(/#[^\n]*/g, '') // strip comments
    .replace(/\s+/g, ' ')    // collapse whitespace
    .trim();
}
```
The cacheeClient connects to Cachee on localhost:6380, which speaks RESP and is compatible with any Redis client library. No code changes to your Redis client are needed -- just change the port. Cachee handles L0 in-process caching with CacheeLFU eviction. Cache hits return in 31 nanoseconds. Cache misses fall through to your resolvers, and the assembled response is stored in L0 for subsequent requests.
The Difference in Production Numbers
Consider a production GraphQL API serving 5,000 requests per second, with the following query distribution:
- 60% product queries (average 30 KB response) -- 3,000 req/s
- 25% search queries (average 45 KB response) -- 1,250 req/s
- 10% dashboard queries (average 100 KB response) -- 500 req/s
- 5% admin/analytics (average 150 KB response) -- 250 req/s
With Redis as the cache layer, assuming 90% hit rate across all query types:
| Query Type | Hits/sec | Redis Latency/Hit | Total Redis Time/sec |
|---|---|---|---|
| Product (30KB) | 2,700 | 1.10ms | 2,970ms |
| Search (45KB) | 1,125 | 1.40ms | 1,575ms |
| Dashboard (100KB) | 450 | 2.70ms | 1,215ms |
| Admin (150KB) | 225 | 3.80ms | 855ms |
| Total | 4,500 | | 6,615ms |
You are consuming 6.6 seconds of cumulative Redis time per second just for cache reads. That requires at least 7 concurrent Redis connections to sustain, and any queueing on the single-threaded Redis event loop will push P99 latency well beyond the P50 numbers shown above.
With Cachee L0 at the same 90% hit rate:
| Query Type | Hits/sec | L0 Latency/Hit | Total L0 Time/sec |
|---|---|---|---|
| Product (30KB) | 2,700 | 31ns | 0.084ms |
| Search (45KB) | 1,125 | 31ns | 0.035ms |
| Dashboard (100KB) | 450 | 31ns | 0.014ms |
| Admin (150KB) | 225 | 31ns | 0.007ms |
| Total | 4,500 | | 0.140ms |
Total L0 cache time: 0.14 milliseconds per second. That is 47,250 times less cumulative cache overhead. No connection pool. No queueing. No serialization. The cache layer effectively disappears from your latency budget.
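The cumulative figures in both tables fall out of the same arithmetic; a short sketch reproduces them from the stated hit rates and per-hit latencies:

```python
# (hit-rate-adjusted hits/sec, Redis ms/hit) per query type, from the tables
workload = {
    "product":   (2700, 1.10),
    "search":    (1125, 1.40),
    "dashboard": (450,  2.70),
    "admin":     (225,  3.80),
}

redis_ms_per_sec = sum(hits * ms for hits, ms in workload.values())
l0_ms_per_sec = sum(hits * 31e-6 for hits, _ in workload.values())  # 31ns/hit

assert round(redis_ms_per_sec) == 6615          # 6.6s of Redis time per second
assert abs(l0_ms_per_sec - 0.1395) < 0.001      # ~0.14ms of L0 time per second
```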
When to Use Which Strategy
Full response caching with query hashes is the simplest approach and the right default. It requires no changes to your resolver logic, works with any GraphQL server framework, and provides the highest single-lookup cache benefit. Use this when your query cardinality is bounded (most production APIs have 50-200 distinct query shapes that account for 95% of traffic) and your responses are under 200 KB.
Per-entity partial caching is the right choice when query cardinality is high (analytics tools, graph explorers, ad-hoc query interfaces) and entity reuse across query shapes is significant. It requires resolver-level cache integration and more cache lookups per request, which makes in-process L0 even more critical -- 150 Redis lookups per request is untenable, but 150 in-process lookups at 31ns each is invisible.
Hybrid caching -- full-response L0 for the top 50 query shapes, per-entity L0 for the long tail -- gives you the best of both approaches. The hot queries resolve in a single 31ns lookup. The rare queries assemble from cached entities, avoiding full resolver execution while still benefiting from shared entity data.
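The hybrid strategy is mostly routing logic. A sketch of the control flow, with dicts standing in for the two cache layers; the shape names, thresholds, and the `assemble`/`execute` callables are all illustrative:

```python
full_response_cache = {}  # query-hash -> serialized response (hot shapes)
entity_cache = {"product:1": {"name": "A"}, "product:2": {"name": "B"}}

TOP_SHAPES = {"hash_product_listing", "hash_search"}  # top ~50 shapes in practice

def resolve(query_hash, entity_keys, assemble, execute):
    """Route hot shapes to the full-response cache; assemble the long
    tail from per-entity entries, executing resolvers only on gaps."""
    if query_hash in TOP_SHAPES:
        if query_hash in full_response_cache:
            return full_response_cache[query_hash]   # single L0 lookup
        response = execute()
        full_response_cache[query_hash] = response
        return response
    if all(k in entity_cache for k in entity_keys):
        return assemble([entity_cache[k] for k in entity_keys])
    return execute()

calls = []
execute = lambda: (calls.append(1), "full")[1]
assemble = lambda entities: entities

r1 = resolve("hash_product_listing", [], assemble, execute)  # miss -> execute
r2 = resolve("hash_product_listing", [], assemble, execute)  # full-response hit
r3 = resolve("rare_shape", ["product:1", "product:2"], assemble, execute)

assert r2 == "full" and len(calls) == 1              # resolvers ran once
assert r3 == [{"name": "A"}, {"name": "B"}]          # assembled from entities
```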
The Bottom Line
GraphQL's variable-size responses break every assumption that REST caching relies on. URL-based caching does not work because the URL is always the same. CDN caching does not work because GraphQL uses POST. Redis works but adds 0.8-3.8ms per GET for typical 20-150 KB GraphQL responses. Partial response caching improves hit rates but multiplies lookups, making Redis latency even worse. In-process L0 caching with CacheeLFU solves both problems: 31ns per lookup regardless of response size, size-aware eviction that keeps hot queries resident, and zero network overhead. For GraphQL APIs serving more than a few hundred requests per second, the difference between Redis and in-process is not an optimization. It is the difference between a cache that helps and a cache that is your bottleneck.
31ns GraphQL response caching. Any query shape. Any response size.