
Database Query Caching: Stop Hitting Postgres

April 27, 2026 | 14 min read | Engineering

Your database is doing too much work. Not because it is poorly tuned or because your queries are unoptimized. It is doing too much work because 80% of the queries it processes are repeated reads -- the same user profile, the same product catalog entry, the same configuration lookup, the same dashboard aggregation. Every one of these queries parses SQL, plans an execution strategy, reads from disk or buffer pool, assembles the result set, serializes it, and sends it back over the network. A simple primary key lookup takes 1-5 milliseconds. A join across three tables takes 5-15 milliseconds. A dashboard aggregation with GROUP BY takes 50-200 milliseconds. And the result is the same result the database returned five seconds ago for the exact same query.

Database query caching eliminates this redundant work. Instead of sending every read to the database, you check a cache first. If the cache has the result, you return it immediately. If not, you query the database and populate the cache. The cached lookup takes 31 nanoseconds in-process. The database query takes 1-200 milliseconds depending on complexity. That is a 30,000x to 6,000,000x difference. At 10,000 queries per second, caching 80% of reads means 8,000 queries per second are served in nanoseconds instead of milliseconds. Your database handles 2,000 queries per second instead of 10,000. It runs cooler, responds faster to the queries that do reach it, and you delay your next database scaling event by months or years.

80% repeated read queries · 31 ns cached query lookup · 15 ms average database round-trip

Why Databases Are Not Caches

Postgres has a buffer pool. MySQL had a query cache (deprecated in 5.7 and removed in 8.0). Both databases cache frequently accessed pages in memory. So why add another cache in front of them?

Because the database buffer pool caches disk pages, not query results. When Postgres caches a page in shared_buffers, it avoids a disk read on the next access. But it still has to parse the SQL, plan the query, traverse the index, read the cached page, apply visibility checks (MVCC), assemble the result set, and send it over the network. The buffer pool eliminates disk I/O. It does not eliminate query processing. A primary key lookup that hits the buffer pool still takes 0.5-1 millisecond because of query planning, index traversal, tuple assembly, and network serialization.

MySQL's old query cache was closer to what we are describing -- it cached the actual result set keyed by the SQL text. But MySQL removed it in version 8.0 because it had fundamental design flaws. The query cache used a global mutex that serialized all cache lookups and invalidations. At high concurrency (more than 8 cores), the mutex contention made the query cache slower than not having a cache at all. Additionally, the cache was invalidated at the table level: any write to a table invalidated every cached query that referenced that table, regardless of whether the write affected the cached rows.

An application-level query cache avoids both problems. There is no global mutex because the cache is per-process or per-thread with lock-free concurrent access. Invalidation is at the key level, not the table level: you invalidate specific cached results when the underlying data changes, not every result that touches the same table.

Pattern 1: Read-Through Cache

Read-through caching is the simplest and most common pattern. The application checks the cache before querying the database. On a cache hit, it returns the cached result. On a cache miss, it queries the database, stores the result in the cache, and returns it. The cache is populated lazily -- only when a query is actually needed.

def get_user(user_id):
    cache_key = f"user:{user_id}"

    # Check cache first (31ns); compare against None so falsy results
    # (e.g. an empty dict) still count as hits
    if (cached := cache.get(cache_key)) is not None:
        return cached

    # Cache miss: query database (1-5ms)
    user = db.query("SELECT * FROM users WHERE id = %s", user_id)

    # Populate cache with TTL
    cache.set(cache_key, user, ttl=300)  # 5 minutes

    return user

The TTL (time-to-live) determines how long the cached result is valid. A short TTL (30-60 seconds) ensures that cached data is never more than a minute stale. A long TTL (5-30 minutes) reduces database load further but increases the staleness window. The right TTL depends on how frequently the underlying data changes and how tolerant your application is of stale reads.

When to Use Read-Through

Read-through works best for data that is read frequently and written infrequently. User profiles, product catalog entries, configuration settings, and permission lookups are classic read-through candidates. These are queried hundreds or thousands of times per second but updated a few times per day. A 5-minute TTL means the cache handles 99.9% of reads and the database only sees the occasional cache miss plus the writes.

Invalidation Strategy

The simplest invalidation strategy for read-through is TTL-based: let cached entries expire naturally. When a user updates their profile, the old cached profile will be served for up to TTL seconds, after which the next read will hit the database and pick up the new data. If this staleness is unacceptable, add explicit invalidation: when a write occurs, delete the corresponding cache entry so the next read triggers a database query.

def update_user(user_id, data):
    # Write to database
    db.execute("UPDATE users SET ... WHERE id = %s", user_id, data)

    # Invalidate cache entry
    cache.delete(f"user:{user_id}")

    # Next read will miss the cache and fetch fresh data

Pattern 2: Write-Through Cache

Write-through caching writes to both the cache and the database simultaneously. When the application writes data, the write goes to the cache and the database in the same operation. This ensures the cache always has the latest data, eliminating the staleness window that exists in read-through with TTL-based invalidation.

def update_user(user_id, data):
    # Write to database
    db.execute("UPDATE users SET ... WHERE id = %s", user_id, data)

    # Write to cache (same data, same moment)
    updated_user = db.query("SELECT * FROM users WHERE id = %s", user_id)
    cache.set(f"user:{user_id}", updated_user, ttl=300)

    return updated_user

Write-through adds latency to write operations because the application must update both the database and the cache. But reads are always served from cache with fresh data. This pattern works well when write latency is less critical than read consistency -- for example, an admin panel where settings are updated infrequently but read on every request.

The Consistency Problem

Write-through has a subtle consistency problem: what happens if the database write succeeds but the cache write fails? The cache will serve stale data until the TTL expires. What happens if the cache write succeeds but the database write fails? The cache will serve data that does not exist in the database. Both scenarios are edge cases, but they need to be handled.

The safest approach is to write to the database first, then update the cache. If the database write fails, you do not update the cache, and the cache continues to serve the old (correct) data. If the cache write fails after a successful database write, you delete the cache entry (invalidate), so the next read goes to the database and picks up the fresh data. This is "write-through with invalidation fallback" and it ensures that the cache never serves data that is newer than what is in the database.
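A sketch of that fallback, reusing the hypothetical db and cache clients from the examples above (CacheError stands in for whatever exception your cache client raises):

def update_user(user_id, data):
    # 1. Database first: if this fails, nothing else runs and the
    #    cache keeps serving the old (correct) value
    db.execute("UPDATE users SET ... WHERE id = %s", user_id, data)

    updated_user = db.query("SELECT * FROM users WHERE id = %s", user_id)
    try:
        # 2. Best-effort cache update
        cache.set(f"user:{user_id}", updated_user, ttl=300)
    except CacheError:
        # 3. Fallback: drop the entry so the next read fetches fresh
        #    data; if the delete also fails, TTL expiry bounds staleness
        cache.delete(f"user:{user_id}")

    return updated_user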

Pattern 3: Cache-Aside

Cache-aside (also called "lazy-loading" or "look-aside") is similar to read-through but with an important distinction: the application manages the cache explicitly rather than the cache being transparent. The application decides what to cache, when to cache it, and when to invalidate it. This gives the application full control over caching behavior at the cost of more complex application code.

class UserService:
    def __init__(self, db, cache):
        self.db = db
        self.cache = cache

    def get_user(self, user_id):
        # Application explicitly checks cache
        cache_key = f"user:{user_id}"
        user = self.cache.get(cache_key)
        if user is not None:
            return user

        # Application explicitly queries database
        user = self.db.query("SELECT * FROM users WHERE id = %s", user_id)

        # Application explicitly decides what to cache
        if user.is_active:  # Only cache active users
            self.cache.set(cache_key, user, ttl=300)

        return user

    def deactivate_user(self, user_id):
        self.db.execute("UPDATE users SET is_active = false WHERE id = %s", user_id)
        # Application explicitly invalidates related caches
        self.cache.delete(f"user:{user_id}")
        self.cache.delete(f"user_permissions:{user_id}")
        self.cache.delete(f"user_sessions:{user_id}")

Cache-aside is the most flexible pattern because the application has full control. It can choose to cache only certain records (active users, not deactivated ones). It can choose different TTLs for different record types. It can invalidate related cache entries when a write occurs (deleting user permissions and sessions when a user is deactivated). This flexibility comes at the cost of more code and more places where caching logic can go wrong.

When to Use Cache-Aside

Cache-aside works best when the caching logic is complex or conditional. If you need to cache some records but not others, if different records need different TTLs, or if writes require invalidating multiple related cache entries, cache-aside gives you the control to implement these policies. If your caching is straightforward (cache everything with the same TTL, invalidate on write), read-through is simpler and less error-prone.

Pattern 4: Materialized View Cache

Dashboard queries are the most expensive queries in most applications. A dashboard that shows "total revenue by region for the last 30 days" requires scanning millions of rows, joining tables, grouping by region, and computing aggregates. This query might take 200 milliseconds to 2 seconds depending on data volume. And every user who opens the dashboard runs the same query, producing the same result.

Materialized view caching pre-computes these expensive aggregations and caches the results. Instead of running the aggregation query on every dashboard load, you run it once (on a schedule or on demand) and cache the result. Dashboard reads hit the cache at 31 nanoseconds instead of hitting the database at 200+ milliseconds.

def refresh_revenue_dashboard():
    """Runs every 5 minutes via scheduler"""

    # Expensive aggregation query (200ms - 2s)
    result = db.query("""
        SELECT region, SUM(amount) as total_revenue,
               COUNT(*) as order_count,
               AVG(amount) as avg_order
        FROM orders
        WHERE created_at > NOW() - INTERVAL '30 days'
        GROUP BY region
        ORDER BY total_revenue DESC
    """)

    # Cache the pre-computed result (31ns reads for 5 minutes)
    cache.set("dashboard:revenue_by_region", result, ttl=300)


def get_revenue_dashboard():
    """Called on every dashboard page load"""
    result = cache.get("dashboard:revenue_by_region")  # 31ns
    if result is None:
        # Cold start: the scheduler has not populated the cache yet
        refresh_revenue_dashboard()
        result = cache.get("dashboard:revenue_by_region")
    return result

The key insight is that dashboard data does not need to be real-time. A dashboard that refreshes every 5 minutes is acceptable for virtually all business use cases. The CEO checking revenue numbers does not need sub-second freshness. They need the numbers to load instantly when they open the page. A 5-minute refresh interval eliminates 100% of dashboard query load from the database while providing an acceptable freshness guarantee.
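How refresh_revenue_dashboard gets invoked is an operational choice: cron, a job queue, or a scheduler library all work. As a minimal in-process sketch (the run_every helper is hypothetical, not part of any library):

import threading
import time

def run_every(interval_seconds, fn):
    # Run fn immediately, then repeat on a fixed interval in a
    # daemon thread so it exits with the process
    def loop():
        while True:
            fn()
            time.sleep(interval_seconds)
    threading.Thread(target=loop, daemon=True).start()

run_every(300, refresh_revenue_dashboard)  # 5-minute refresh cycle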

The Large Result Problem

Dashboard queries often return large result sets. A revenue breakdown by region and product category might produce 50KB of aggregated data. Storing this in a network cache like Redis means 50KB must be serialized, sent over the network, received by the application, and deserialized on every dashboard load. With a network cache, this takes 1.5 milliseconds. With an in-process cache, the result is already in application memory as a native data structure. The lookup takes 31 nanoseconds. There is no serialization, no network transfer, no deserialization. The data is returned by pointer reference.

For applications with many dashboards or complex reporting pages, the difference between 1.5 milliseconds (network cache) and 31 nanoseconds (in-process cache) per query adds up quickly. A reporting page that runs 20 cached queries loads in 30 milliseconds with a network cache or 0.62 microseconds with an in-process cache. The in-process version feels instantaneous. The network cache version has a perceptible delay.

Dashboard Query              | Database | Network Cache | In-Process Cache
Revenue by region (5 KB)     | 200 ms   | 0.8 ms        | 31 ns
Revenue by product (20 KB)   | 350 ms   | 1.2 ms        | 31 ns
Revenue by category (50 KB)  | 500 ms   | 1.5 ms        | 31 ns
Full dashboard (20 queries)  | 3,000 ms | 24 ms         | 0.62 µs

Pattern 5: Query Result Deduplication

Query result deduplication takes a different approach from the previous patterns. Instead of caching by key (user ID, dashboard name), it caches by query fingerprint. The fingerprint is a hash of the SQL text and its parameters. Any query with the same SQL and the same parameters produces the same fingerprint, and the cached result is returned.

import hashlib
import json

def query_cached(sql, params, ttl=60):
    # Compute query fingerprint: hash of the SQL text plus the
    # parameters in a canonical (order-stable) encoding
    canonical = json.dumps(params, sort_keys=True, default=str)
    fingerprint = hashlib.sha3_256((sql + canonical).encode()).hexdigest()

    # Check cache (31ns)
    if (result := cache.get(fingerprint)) is not None:
        return result

    # Cache miss: run query (1-200ms)
    result = db.query(sql, params)

    # Cache result
    cache.set(fingerprint, result, ttl=ttl)

    return result

Query result deduplication is useful when the same query is executed by multiple code paths. In a large application, the same SQL might be called from the API layer, the background worker, and the admin panel. Each code path might have its own caching logic (or none at all). Query fingerprinting catches duplicates regardless of which code path triggers the query.
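Because the cache key is derived from the query itself, call sites that never coordinate still share one entry. For example, with the query_cached helper above:

# API layer
orders = query_cached("SELECT * FROM orders WHERE user_id = %s", [42])

# Background worker, different module: same SQL, same params,
# same fingerprint -- served from cache
orders = query_cached("SELECT * FROM orders WHERE user_id = %s", [42])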

The Fingerprint Design

The query fingerprint must include everything that affects the result. The SQL text is obvious. The parameters are obvious. But there are less obvious factors that need to be included. The database schema version matters because a query that returns three columns today might return four columns after a migration. The transaction isolation level matters because a query in READ COMMITTED might return different results than the same query in SERIALIZABLE. The role or user context matters if the database applies row-level security policies.

fingerprint = SHA3-256(
    sql_text ||
    canonical_params ||
    schema_version ||
    isolation_level ||
    role_context
)

Including all of these factors in the fingerprint ensures that the cache never returns a result that was generated under different conditions. A query cached under one schema version is not returned after a migration. A query cached in one transaction context is not returned in a different context.
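In code, assembling that fingerprint might look like the sketch below. How you obtain schema_version, the isolation level, and the role context depends on your application; the function shown here is illustrative, not a fixed API.

import hashlib
import json

def query_fingerprint(sql, params, schema_version, isolation_level, role):
    # Canonicalize parameters so logically equal inputs hash identically
    canonical_params = json.dumps(params, sort_keys=True, default=str)
    # NUL separators prevent ambiguous concatenations (e.g. "ab" + "c"
    # vs "a" + "bc") from producing the same fingerprint
    material = "\x00".join([
        sql,
        canonical_params,
        str(schema_version),
        isolation_level,
        role,
    ])
    return hashlib.sha3_256(material.encode("utf-8")).hexdigest()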

The Architecture: Cachee L1 + Postgres

The five patterns above compose into a layered architecture with Cachee as the L1 cache and Postgres as the source of truth. The architecture handles the full spectrum of read patterns: hot reads (user profiles, config), warm reads (product catalog, permissions), and cold reads (reporting, search).

# Application startup
cachee init --capacity 1000000 --eviction cachee-lfu

# Read-through for user profiles (Pattern 1)
# TTL: 5 minutes, invalidated on write
cache.configure("user:*", ttl=300, pattern="read-through")

# Write-through for settings (Pattern 2)
# Always fresh in cache after write
cache.configure("settings:*", ttl=600, pattern="write-through")

# Materialized view for dashboards (Pattern 4)
# Refreshed every 5 minutes by scheduler
cache.configure("dashboard:*", ttl=300, pattern="materialized")

# Query dedup for ad-hoc queries (Pattern 5)
# 60-second TTL, fingerprint-based
cache.configure("query:*", ttl=60, pattern="fingerprint")

This architecture reduces database query volume by 80% for a typical web application. The remaining 20% of queries that reach the database are genuine cache misses: first-time reads, writes, and queries for data that changes too frequently to cache. The database runs at 20% of its previous query volume, which means lower CPU utilization, more connection headroom, and faster response times for the queries that do reach it.

Invalidation: The Hard Problem

Phil Karlton famously said there are only two hard things in computer science: cache invalidation and naming things. He was right about cache invalidation. It is the single hardest part of any caching system, and it is where most caching implementations fail.

The core problem is knowing when to invalidate. When a row in the database changes, which cached entries depend on that row? For a read-through cache keyed by primary key, the answer is simple: invalidate the entry with that primary key. But for a query result cache, the answer is much harder. A query like SELECT * FROM orders WHERE user_id = 42 AND status = 'pending' depends on every row in the orders table where user_id = 42. When any of those rows change, the cached result is potentially stale.

Three Invalidation Strategies

TTL-based invalidation is the simplest. Every cached entry has a TTL. When the TTL expires, the entry is evicted. The next read triggers a database query and refreshes the cache. This approach accepts bounded staleness (up to TTL seconds) in exchange for simplicity. No write-path logic is needed. No dependency tracking is needed. The cache simply expires entries on a schedule.

Write-triggered invalidation is more precise. When the application writes to the database, it explicitly invalidates the cache entries that depend on the changed data. This requires the application to know which cache entries depend on which database tables and rows. For simple key-value lookups, this is straightforward. For complex queries with joins, it requires maintaining a dependency graph that maps database tables to cache keys.
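In its simplest form, the dependency graph is a static map from table to the cache-key templates that must be dropped when a row in that table changes (the mapping below is hypothetical):

# Hypothetical table -> dependent cache-key templates
CACHE_DEPENDENTS = {
    "users": ["user:{id}", "user_permissions:{id}", "user_sessions:{id}"],
    "settings": ["settings:{id}"],
}

def invalidate_for_write(table, row):
    # Called from the write path after a successful commit
    for template in CACHE_DEPENDENTS.get(table, []):
        cache.delete(template.format(**row))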

Change Data Capture (CDC) invalidation is the most sophisticated. A CDC pipeline (using Postgres logical replication, Debezium, or a similar tool) monitors the database write-ahead log and emits events for every row change. A cache invalidation service consumes these events and invalidates the corresponding cache entries. This approach decouples the write path from cache invalidation, which means the application code does not need to know about the cache at all. Writes go to the database. The CDC pipeline handles invalidation. Reads go to the cache. The three components are independent.
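A sketch of the consuming side, assuming Debezium is publishing Postgres row changes to Kafka with its default JSON envelope (topic naming follows Debezium's server.schema.table convention; kafka-python is one client option among many):

import json
from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "dbserver.public.users",  # Debezium topic: <server>.<schema>.<table>
    value_deserializer=lambda v: json.loads(v) if v else None,
)

for message in consumer:
    if message.value is None:  # Kafka tombstone after a delete
        continue
    change = message.value["payload"]
    # "after" is null for deletes; fall back to the pre-image
    row = change["after"] or change["before"]
    cache.delete(f"user:{row['id']}")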

Strategy        | Complexity | Staleness | Best For
TTL-based       | Low        | Up to TTL | Config, catalogs, dashboards
Write-triggered | Medium     | Near-zero | User profiles, sessions
CDC-based       | High       | Near-zero | Large systems, many writers

What Not to Cache

Not every database query should be cached. Some queries return data that changes too frequently, is too large, or is too unique to benefit from caching.

Transactional data in active workflows. An order that is being processed moves through multiple states (pending, processing, shipped, delivered) in rapid succession. Caching the order status would return stale data during the most critical phase of the customer experience. Let these queries go directly to the database.

Full-text search results. Search queries have extremely high cardinality (millions of unique search terms). The cache hit rate for individual search queries is very low, so most entries would be evicted before they are ever accessed again. Cache the underlying documents, not the search results.

Data with per-request freshness requirements. Financial balances, inventory counts, and rate limit counters must be current on every read. Caching these values, even with a 1-second TTL, can lead to overdrafts, oversells, or rate limit bypasses. Use the database (or a specialized store) for these values.

Large result sets that would exhaust cache memory. A query that returns 10MB of data would consume 10MB of cache capacity for a single entry. If the query is rarely repeated, that is 10MB of wasted cache space that could hold thousands of smaller entries with higher reuse rates. Apply size limits to cached entries and let oversized results go directly to the database.
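A simple guard for that last case, using serialized size as the measure (the 256 KB cap is an arbitrary example, not a recommendation):

import pickle

MAX_ENTRY_BYTES = 256 * 1024  # example cap; tune to your cache capacity

def set_if_small(key, value, ttl):
    # Oversized results are simply not cached, so reads for them
    # always go to the database
    if len(pickle.dumps(value)) <= MAX_ENTRY_BYTES:
        cache.set(key, value, ttl=ttl)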

Measuring the Impact

Before implementing database query caching, measure your current query profile. You need three numbers: the total query volume, the percentage of repeated reads, and the average query latency. Postgres provides these through pg_stat_statements.

-- Top 10 most frequently executed queries
-- (before Postgres 13, the columns are mean_time / total_time)
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 10;

-- Percentage of total time spent on repeated queries
SELECT
    SUM(CASE WHEN calls > 100 THEN total_exec_time ELSE 0 END) /
    SUM(total_exec_time) * 100 AS pct_repeated
FROM pg_stat_statements;

In most applications, you will find that 10-20 unique queries account for 80% of total query volume. These are your caching candidates. A user profile lookup that runs 500,000 times per day at 2 ms each consumes 1,000 seconds of database time per day. Cached at 31 ns, those same 500,000 lookups cost 15.5 milliseconds of total cache time per day (500,000 × 31 ns), and the database sees only the occasional cache miss plus the writes.

The Bottom Line

80% of database queries are repeated reads that produce the same result. Five caching patterns -- read-through, write-through, cache-aside, materialized views, and query deduplication -- eliminate these redundant queries. Cached results are returned in 31 nanoseconds instead of 1-200 milliseconds from the database. The database handles 20% of its previous query volume. Cache the query result, not just the connection.

31ns query results instead of 15ms database round-trips. Cache what Postgres keeps re-computing.
