Problem
Every data read that bypasses a cache hits the origin: a database, a remote API, or an expensive computation. At scale, the origin cannot serve all reads directly:
Without cache:
1,000 req/s × 10ms DB latency = 10,000ms of DB time/second
= 10 DB connections fully saturated just for reads
With 95% cache hit rate:
1,000 req/s × 5% miss rate = 50 cache misses/s hitting the DB
= 0.5 DB connections needed for reads
= 20× reduction in DB load

The problem is not just "add a cache and it works." The hard problems are cache invalidation, cold starts, thundering herd on misses, and choosing the right placement strategy.
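The connection arithmetic above is Little's law (required concurrency = arrival rate × per-request service time). A minimal sketch to reproduce it:

```python
def db_connections_needed(req_per_s: float, miss_rate: float, db_latency_s: float) -> float:
    """Little's law: concurrent DB connections = arrival rate x service time (reads only)."""
    return req_per_s * miss_rate * db_latency_s

no_cache = db_connections_needed(1_000, 1.00, 0.010)    # 10 connections saturated
with_cache = db_connections_needed(1_000, 0.05, 0.010)  # roughly 0.5 connections
```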
---
Why It Matters (Latency, Throughput, Cost)
The fundamental cache equation:
effective_latency = hit_rate × cache_latency + (1 - hit_rate) × origin_latency

Concrete example with Redis (0.5ms) and PostgreSQL (10ms):
hit_rate=0.50: 0.50×0.5ms + 0.50×10ms = 0.25 + 5.00 = 5.25ms
hit_rate=0.80: 0.80×0.5ms + 0.20×10ms = 0.40 + 2.00 = 2.40ms
hit_rate=0.95: 0.95×0.5ms + 0.05×10ms = 0.475 + 0.50 = 0.975ms
hit_rate=0.99: 0.99×0.5ms + 0.01×10ms = 0.495 + 0.10 = 0.595ms

Going from 80% to 99% hit rate: a 4× latency reduction (2.40ms → 0.60ms).
The cache hit rate is everything. A 50% hit rate barely helps. Target ≥90%.
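A sketch that reproduces the numbers above directly from the cache equation (the 0.5 ms and 10 ms defaults are the Redis and PostgreSQL latencies assumed in this section):

```python
def effective_latency(hit_rate: float, cache_ms: float = 0.5, origin_ms: float = 10.0) -> float:
    """Expected per-read latency for a given cache hit rate."""
    return hit_rate * cache_ms + (1 - hit_rate) * origin_ms

for hr in (0.50, 0.80, 0.95, 0.99):
    print(f"hit_rate={hr:.2f}: {effective_latency(hr):.3f} ms")
```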
Cost:
- Redis on AWS ElastiCache: ~$0.025/GB-hour
- RDS PostgreSQL: ~$0.25/GB-hour
- Read replica: ~$0.125/GB-hour
Per GB-hour, serving reads from cache is roughly 5× cheaper than a read replica and 10× cheaper than the primary.
---
Mental Model
Think of the memory hierarchy:

Level           Latency          Capacity    Cost/GB
────────────────────────────────────────────────────
CPU L1 cache    4 cycles         32 KB       $10,000+
CPU L2 cache    12 cycles        256 KB      $10,000+
CPU L3 cache    40 cycles        8 MB        $10,000+
RAM             200 cycles       32-512 GB   $5-20
NVMe SSD        10,000 cycles    1-8 TB      $0.10-0.50
Network HDD     100,000 cycles   many TB     $0.02-0.10
Remote DB       500,000 cycles   ---         varies
────────────────────────────────────────────────────
Application cache (Redis): a network hop away, ≈ 0.5 ms

Caching moves data "up" the hierarchy. Redis at 0.5ms is to PostgreSQL at 10ms what L1 cache is to RAM.
---
Underlying Theory (OS / CN / DSA / Math Linkage)
Zipf Distribution: Why Caches Work
Real-world access patterns follow a Zipf (power law) distribution: the most popular item is accessed proportionally more than the second most popular, which is more than the third, etc.
P(rank = k) ∝ 1/k^s  (s ≈ 1 for web workloads)
Rank 1 item: 30% of all accesses
Rank 2 item: 15% of all accesses
Rank 3 item: 10% of all accesses
Top 20% of items: ~80% of all accesses → Pareto principle

If you cache only the top 20% of your item catalog, you achieve roughly an 80% hit rate. This is why a small cache (10% of data size) can yield excellent hit rates.
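Under Zipf with s = 1, the expected hit rate from caching the k most popular of N items is the harmonic-number ratio H(k)/H(N); the exact figure depends on catalog size N, so the 10,000-item catalog below is an illustrative assumption:

```python
def harmonic(n: int) -> float:
    # H(n) = 1 + 1/2 + ... + 1/n  (≈ ln(n) + 0.577 for large n)
    return sum(1.0 / i for i in range(1, n + 1))

def zipf_hit_rate(cached_items: int, total_items: int) -> float:
    """Fraction of requests served from cache when the top-k items are cached (s = 1)."""
    return harmonic(cached_items) / harmonic(total_items)

# Caching the top 20% of a 10,000-item catalog lands near the 80/20 rule:
rate = zipf_hit_rate(2_000, 10_000)  # ≈ 0.84
```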
LRU Algorithm: Doubly Linked List + Hash Map
The Least Recently Used (LRU) eviction algorithm in O(1) time:
Data structure:
hash_map: key → node (for O(1) lookup)
doubly_linked_list: (for O(1) move-to-front and evict-from-tail)

┌──────┐    ┌──────┐    ┌──────┐    ┌──────┐
│ MRU  │◄──►│  B   │◄──►│  C   │◄──►│ LRU  │
│ head │    │      │    │      │    │ tail │
└──────┘    └──────┘    └──────┘    └──────┘
(most recently used)            (evict next)
get(key):
  1. Look up node in hash_map → O(1)
  2. Move node to head of list → O(1) (pointer updates)
  3. Return value

put(key, value):
  1. If key exists: update value, move to head → O(1)
  2. If cache full: remove tail node from list AND hash_map → O(1)
  3. Insert new node at head, add to hash_map → O(1)

LFU vs LRU
- LRU (Least Recently Used): evicts the item not accessed for the longest time. Simple, works well for temporal locality.
- LFU (Least Frequently Used): evicts the item with the fewest total accesses. Better for Zipf workloads but complex (requires frequency counter + min-heap or bucket structure).
- Redis: approximates LRU by sampling 5 random keys and evicting the best candidate (exact LRU is too costly). Note that Redis's default maxmemory-policy is noeviction; LRU must be enabled explicitly (e.g. allkeys-lru).
ARC (Adaptive Replacement Cache): self-tuning between LRU and LFU. Used in ZFS, Solaris.
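To make the LFU bookkeeping concrete, here is a toy O(1) LFU sketch using frequency buckets (an illustration of the bucket structure mentioned above, not Redis's actual implementation):

```python
from collections import defaultdict, OrderedDict

class LFUCache:
    """Toy O(1) LFU: evicts the least-frequently-used key; ties broken by LRU."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self._vals: dict = {}                      # key -> value
        self._freq: dict = {}                      # key -> access count
        self._buckets = defaultdict(OrderedDict)   # count -> keys in LRU order
        self._min_freq = 0

    def _touch(self, key):
        # Move key from its current frequency bucket to the next one up
        f = self._freq[key]
        del self._buckets[f][key]
        if not self._buckets[f]:
            del self._buckets[f]
            if self._min_freq == f:
                self._min_freq = f + 1
        self._freq[key] = f + 1
        self._buckets[f + 1][key] = None

    def get(self, key):
        if key not in self._vals:
            return None
        self._touch(key)
        return self._vals[key]

    def put(self, key, value):
        if self.capacity <= 0:
            return
        if key in self._vals:
            self._vals[key] = value
            self._touch(key)
            return
        if len(self._vals) >= self.capacity:
            # Evict the LRU key within the lowest-frequency bucket
            evict, _ = self._buckets[self._min_freq].popitem(last=False)
            if not self._buckets[self._min_freq]:
                del self._buckets[self._min_freq]
            del self._vals[evict]
            del self._freq[evict]
        self._vals[key] = value
        self._freq[key] = 1
        self._buckets[1][key] = None
        self._min_freq = 1
```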
---
Cache Placement Strategies
1. Cache-Aside (Lazy Loading): Most Common
```python
async def get_user(user_id: int) -> dict | None:
    cache_key = f"user:{user_id}"
    # 1. Check cache first
    cached = await redis.get(cache_key)
    if cached:
        return json.loads(cached)  # cache hit
    # 2. Cache miss → fetch from DB
    user = await db.fetchrow("SELECT * FROM users WHERE id = $1", user_id)
    if user is None:
        return None
    # 3. Populate cache
    await redis.setex(cache_key, 300, json.dumps(dict(user)))  # TTL: 5 min
    return dict(user)  # cache miss + populate
```

Pros: Simple. Only caches what's actually requested. No startup overhead.
Cons: A cache miss adds DB latency on top of the cache round trip. The first request after a cold start is slow. A race condition is possible (two requests both miss and both write).
2. Read-Through
The cache layer itself fetches from the origin on a miss. The application only talks to the cache.
```python
class ReadThroughCache:
    def __init__(self, redis, db, ttl=300):
        self._redis = redis
        self._db = db
        self._ttl = ttl

    async def get(self, key: str, loader_fn):
        value = await self._redis.get(key)
        if value:
            return json.loads(value)
        # Cache handles the load
        value = await loader_fn()
        if value is not None:
            await self._redis.setex(key, self._ttl, json.dumps(value))
        return value

# Usage:
cache = ReadThroughCache(redis, db)
user = await cache.get(f"user:{user_id}", lambda: db.get_user(user_id))
```

3. Write-Through
Every write goes to both cache and DB synchronously. Cache is always consistent.
```python
async def update_user(user_id: int, updates: dict) -> dict:
    # Write to DB first (source of truth); fetchrow returns the RETURNING row
    user = await db.fetchrow(
        "UPDATE users SET name=$1 WHERE id=$2 RETURNING *",
        updates["name"], user_id
    )
    # Write to cache immediately (write-through)
    await redis.setex(f"user:{user_id}", 300, json.dumps(dict(user)))
    return dict(user)
```

Pros: Cache is always consistent after a write. No stale reads.
Cons: Every write pays the cache write cost. The cache fills with data that may never be read.
4. Write-Behind (Write-Back)
Writes go to cache immediately, then asynchronously flush to DB. Highest write throughput.
```python
import asyncio

class WriteBehindCache:
    def __init__(self, redis, db, flush_interval=1.0):
        self._redis = redis
        self._db = db
        self._dirty: dict[str, dict] = {}  # pending writes
        self._lock = asyncio.Lock()
        asyncio.create_task(self._flush_loop(flush_interval))

    async def set(self, key: str, value: dict):
        await self._redis.set(key, json.dumps(value))
        async with self._lock:
            self._dirty[key] = value  # queue for async flush

    async def _flush_loop(self, interval: float):
        while True:
            await asyncio.sleep(interval)
            async with self._lock:
                pending = dict(self._dirty)
                self._dirty.clear()
            if pending:
                async with self._db.transaction():
                    for key, value in pending.items():
                        await self._db.upsert("users", value)
```

Pros: Writes are fast (in-memory only). The DB batches many writes.
Cons: Data loss if the cache crashes before a flush. Complex. Risk of inconsistency.
---
Thundering Herd and Cache Stampede
When a popular key expires, many concurrent requests all see a cache miss simultaneously and fire DB queries at once: the thundering herd / cache stampede.
Solution 1: Mutex / Single-Flight
```python
import asyncio

_locks: dict[str, asyncio.Lock] = {}

async def get_with_lock(key: str, loader_fn, ttl=300):
    cached = await redis.get(key)
    if cached:
        return json.loads(cached)
    # Ensure only one coroutine fetches at a time
    lock = _locks.setdefault(key, asyncio.Lock())
    async with lock:
        # Double-check after acquiring the lock (another coroutine may have loaded)
        cached = await redis.get(key)
        if cached:
            return json.loads(cached)
        value = await loader_fn()
        await redis.setex(key, ttl, json.dumps(value))
        return value
```

Solution 2: Probabilistic Early Revalidation (PER)
Refresh the cache slightly before expiry, with a probability that increases as the remaining TTL approaches zero:

```python
import asyncio
import json
import math
import random
import time

async def get_with_per(key: str, loader_fn, ttl=300, beta=1.0):
    """
    Probabilistic Early Revalidation (PER).
    A refresh is triggered before expiry with increasing probability.
    beta controls aggressiveness (higher = earlier revalidation).
    """
    entry = await redis.get(f"{key}:per")  # stores {value, computed_at, ttl}
    if entry:
        data = json.loads(entry)
        value = data["value"]
        computed_at = data["computed_at"]
        stored_ttl = data["ttl"]
        remaining = stored_ttl - (time.time() - computed_at)
        # P(recompute) increases as the remaining TTL shrinks
        if remaining <= 0 or -beta * math.log(random.random()) >= remaining:
            # Recompute in the background; return the cached value now
            asyncio.create_task(_refresh(key, loader_fn, ttl))
        return value
    # Cache miss → must fetch
    value = await loader_fn()
    entry = {"value": value, "computed_at": time.time(), "ttl": ttl}
    await redis.setex(f"{key}:per", ttl + 60, json.dumps(entry))
    return value

async def _refresh(key, loader_fn, ttl):
    value = await loader_fn()
    entry = {"value": value, "computed_at": time.time(), "ttl": ttl}
    await redis.setex(f"{key}:per", ttl + 60, json.dumps(entry))
```

Solution 3: Bloom Filter for Negative Caching
Cache misses for non-existent keys are expensive. A Bloom filter provides an O(1), space-efficient check for "definitely not in the DB":
```python
from pybloom_live import BloomFilter

# Bloom filter: compact set-membership check
# False-positive rate: ~1% with about 10 bits per element
user_bloom = BloomFilter(capacity=1_000_000, error_rate=0.01)

# At startup, add all existing IDs to the bloom filter
for row in await db.fetchall("SELECT id FROM users"):
    user_bloom.add(row["id"])

async def get_user_with_bloom(user_id: int):
    # Quickly reject lookups for non-existent users (avoids a guaranteed DB miss)
    if user_id not in user_bloom:
        return None  # definitely doesn't exist → no DB query needed
    cached = await redis.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)
    user = await db.fetchrow("SELECT * FROM users WHERE id = $1", user_id)
    if user:
        await redis.setex(f"user:{user_id}", 300, json.dumps(dict(user)))
    return dict(user) if user else None
```

---
LRU Implementation (from scratch)
```python
from collections import OrderedDict
from threading import Lock
from typing import Any

class LRUCache:
    """
    Thread-safe LRU cache backed by OrderedDict.
    OrderedDict in CPython is a doubly linked list + dict.
    """
    def __init__(self, capacity: int):
        self.capacity = capacity
        self._cache: OrderedDict[str, Any] = OrderedDict()
        self._lock = Lock()

    def get(self, key: str):
        with self._lock:
            if key not in self._cache:
                return None
            self._cache.move_to_end(key)  # mark as recently used
            return self._cache[key]

    def put(self, key: str, value) -> None:
        with self._lock:
            if key in self._cache:
                self._cache.move_to_end(key)
            self._cache[key] = value
            if len(self._cache) > self.capacity:
                self._cache.popitem(last=False)  # evict LRU (first item)

    @property
    def stats(self) -> dict:
        with self._lock:
            return {"size": len(self._cache), "capacity": self.capacity}
```

---
Cache Invalidation Strategies
1. TTL-based (simplest)
```python
await redis.setex(f"user:{user_id}", 300, json.dumps(user))  # expires in 5 minutes
```

Consistency window: up to TTL seconds.
Best for: semi-static data (user profiles, product catalog).
Avoid for: financial data, inventory counts, anything requiring strong consistency.
2. Event-driven invalidation
```python
# On DB write, publish an invalidation event
async def update_user(user_id: int, updates: dict):
    user = await db.update_user(user_id, updates)
    # Invalidate cache immediately
    await redis.delete(f"user:{user_id}")
    # Publish an event for other cache layers (CDN, other services)
    await pubsub.publish("cache.invalidate", {"entity": "user", "id": user_id})
    return user

# Other services listen and invalidate their local caches
async def on_cache_invalidate(message):
    entity = message["entity"]
    entity_id = message["id"]
    await local_cache.delete(f"{entity}:{entity_id}")
```

3. Version-based (cache tags)
```python
async def get_user_posts(user_id: int):
    # Key includes a version number stored separately
    version = await redis.get(f"user:{user_id}:version") or "0"
    cache_key = f"user:{user_id}:posts:v{version}"
    cached = await redis.get(cache_key)
    if cached:
        return json.loads(cached)
    posts = await db.fetchall("SELECT * FROM posts WHERE user_id = $1", user_id)
    await redis.setex(cache_key, 3600, json.dumps(posts))
    return posts

async def invalidate_user_posts(user_id: int):
    # Increment version → old keys automatically become orphaned
    # (old cache entries expire naturally via TTL)
    await redis.incr(f"user:{user_id}:version")
```

---
Complexity Analysis
| Operation | LRU Cache (in-process) | Redis (network cache) | DB query |
|---|---|---|---|
| Cache hit | O(1) time, RAM latency | O(1) time, ~0.5ms | n/a |
| Cache miss + load | O(1) + O(query) | O(1) + O(query) | ~10ms typical |
| Cache eviction | O(1) | O(1) (sampled) | n/a |
| Invalidation (TTL) | O(1) | O(1) | n/a |
| Invalidation (scan) | O(N) | O(N) via SCAN | n/a |

Space complexity: O(capacity) for the LRU cache.
---
Benchmark (p50, p99, CPU, Memory)
Setup: 10,000 req/s, Redis 7 (single node), PostgreSQL 15, Redis on same host (0.3ms RTT).
┌─────────────────────┬────────┬────────┬────────────┬──────────────┐
│ Hit Rate / Config   │ p50    │ p99    │ DB calls/s │ Redis mem/GB │
├─────────────────────┼────────┼────────┼────────────┼──────────────┤
│ No cache            │ 8ms    │ 22ms   │ 10,000     │ 0            │
│ 50% hit rate        │ 4ms    │ 14ms   │ 5,000      │ 0.5          │
│ 80% hit rate        │ 2ms    │ 7ms    │ 2,000      │ 1.0          │
│ 95% hit rate        │ 0.8ms  │ 2ms    │ 500        │ 2.0          │
│ 99% hit rate        │ 0.5ms  │ 1ms    │ 100        │ 4.0          │
└─────────────────────┴────────┴────────┴────────────┴──────────────┘

CPU: Redis uses ~1 CPU core at 100k ops/sec.
PostgreSQL CPU drops proportionally with hit rate.

---
Observability
```python
import time
from functools import wraps

from prometheus_client import Counter, Histogram, Gauge

cache_ops = Counter('cache_operations_total', 'Cache ops', ['operation', 'result'])
# result: hit, miss, error, eviction
cache_latency = Histogram('cache_latency_seconds', 'Cache operation latency',
                          ['operation'], buckets=[.0001, .0005, .001, .005, .010, .025, .100])
cache_memory = Gauge('cache_memory_bytes', 'Cache memory usage')

def track_cache(operation: str):
    def decorator(fn):
        @wraps(fn)
        async def wrapper(*args, **kwargs):
            start = time.monotonic()
            try:
                result = await fn(*args, **kwargs)
                if result is None:
                    cache_ops.labels(operation, 'miss').inc()
                else:
                    cache_ops.labels(operation, 'hit').inc()
                return result
            except Exception:
                cache_ops.labels(operation, 'error').inc()
                raise
            finally:
                cache_latency.labels(operation).observe(time.monotonic() - start)
        return wrapper
    return decorator

# Derived metrics to alert on:
# hit_rate = rate(cache_operations_total{result="hit"}[5m]) / rate(cache_operations_total[5m])
# ALERT: hit_rate < 0.80 for 5 minutes
# ALERT: cache_memory_bytes / cache_max_bytes > 0.90 (approaching eviction pressure)
```

Redis built-in stats
```shell
redis-cli INFO stats | grep -E "keyspace_hits|keyspace_misses|evicted_keys|used_memory"
# keyspace_hits: 10450921
# keyspace_misses: 524051   → hit_rate = 10450921 / (10450921 + 524051) = 95.2%
# evicted_keys: 1203        → eviction happening → consider increasing maxmemory
# used_memory_human: 2.50G
```

---
Multi-language Implementation
Python: Redis with connection pool

```python
import json
from functools import wraps

import redis.asyncio as aioredis

redis_pool = aioredis.ConnectionPool.from_url(
    "redis://localhost:6379",
    max_connections=20,
    decode_responses=True
)
redis_client = aioredis.Redis(connection_pool=redis_pool)

def cached(key_fn, ttl=300):
    """Decorator for the cache-aside pattern."""
    def decorator(fn):
        @wraps(fn)
        async def wrapper(*args, **kwargs):
            key = key_fn(*args, **kwargs)
            hit = await redis_client.get(key)
            if hit:
                return json.loads(hit)
            result = await fn(*args, **kwargs)
            if result is not None:
                await redis_client.setex(key, ttl, json.dumps(result))
            return result
        return wrapper
    return decorator

@cached(key_fn=lambda user_id: f"user:{user_id}", ttl=300)
async def get_user(user_id: int) -> dict:
    return await db.fetchrow("SELECT * FROM users WHERE id = $1", user_id)
```

Go: Redis with go-redis
```go
package cache

import (
    "context"
    "encoding/json"
    "time"

    "github.com/redis/go-redis/v9"
)

type Cache struct {
    client *redis.Client
}

func New(addr string) *Cache {
    return &Cache{
        client: redis.NewClient(&redis.Options{
            Addr:         addr,
            PoolSize:     20,
            MinIdleConns: 5,
            DialTimeout:  5 * time.Second,
            ReadTimeout:  3 * time.Second,
            WriteTimeout: 3 * time.Second,
        }),
    }
}

func (c *Cache) GetOrLoad(ctx context.Context, key string, loader func() (any, error), ttl time.Duration) (any, error) {
    val, err := c.client.Get(ctx, key).Result()
    if err == nil {
        var result any
        json.Unmarshal([]byte(val), &result)
        return result, nil // cache hit
    }
    if err != redis.Nil {
        return nil, err // redis error
    }
    // Cache miss → load from source
    value, err := loader()
    if err != nil {
        return nil, err
    }
    data, _ := json.Marshal(value)
    c.client.SetEx(ctx, key, string(data), ttl)
    return value, nil
}
```

Node.js: ioredis
```javascript
const Redis = require('ioredis');

const redis = new Redis({
  host: 'localhost',
  port: 6379,
  maxRetriesPerRequest: 3,
  retryStrategy: (times) => Math.min(times * 50, 2000),
  lazyConnect: false,
});

class Cache {
  constructor(redis, defaultTTL = 300) {
    this.redis = redis;
    this.defaultTTL = defaultTTL;
  }

  async getOrLoad(key, loader, ttl = this.defaultTTL) {
    const cached = await this.redis.get(key);
    if (cached !== null) {
      return JSON.parse(cached); // cache hit
    }
    const value = await loader();
    if (value !== null && value !== undefined) {
      await this.redis.setex(key, ttl, JSON.stringify(value));
    }
    return value;
  }

  async invalidate(key) {
    await this.redis.del(key);
  }

  async invalidatePattern(pattern) {
    // Use SCAN to avoid blocking Redis with KEYS
    const pipeline = this.redis.pipeline();
    let cursor = '0';
    do {
      const [nextCursor, keys] = await this.redis.scan(cursor, 'MATCH', pattern, 'COUNT', 100);
      cursor = nextCursor;
      keys.forEach((key) => pipeline.del(key));
    } while (cursor !== '0');
    await pipeline.exec();
  }
}

const cache = new Cache(redis);
const getUser = (userId) =>
  cache.getOrLoad(`user:${userId}`, () => db.query('SELECT * FROM users WHERE id = $1', [userId]));
```

---
Trade-offs
| Strategy | Consistency | Complexity | Write Cost | Best For |
|---|---|---|---|---|
| Cache-Aside | Eventual (TTL) | Low | None | General read-heavy workloads |
| Read-Through | Eventual (TTL) | Medium | None | When the cache layer owns loading |
| Write-Through | Strong | Medium | 2× write | Data read soon after every write |
| Write-Behind | Eventual | High | Near-zero | Extreme write throughput |
Cache size vs hit rate trade-off:
Working set size estimate (for Zipf with s=1):
Cache 10% of items → ~63% hit rate
Cache 20% of items → ~74% hit rate
Cache 50% of items → ~88% hit rate

Diminishing returns: doubling cache size increases hit rate by decreasing amounts.

---
Failure Modes
1. Cold start / cache stampede:
After deploy, cache is empty. All requests hit DB simultaneously. DB CPU spikes, queries slow down, connections exhaust. Write-through or pre-warming at deploy time prevents this.
```python
async def warm_cache():
    """Pre-populate the cache with hot data before accepting traffic."""
    hot_rows = await db.fetchall(
        "SELECT id FROM users ORDER BY last_active DESC LIMIT 10000"
    )
    for row in hot_rows:
        user = await db.fetchrow("SELECT * FROM users WHERE id = $1", row["id"])
        await redis.setex(f"user:{row['id']}", 3600, json.dumps(dict(user)))
```

2. Inconsistency window:
TTL-based caching means stale reads for TTL seconds after a write. For financial data (balances, inventory), stale reads are dangerous.
Mitigation: Event-driven invalidation. On write, immediately DEL the key.
3. Memory pressure and eviction:
When Redis hits maxmemory, it begins evicting keys. If your eviction policy is allkeys-lru, hot keys may be evicted. If noeviction, new writes fail.
```shell
# Set eviction policy
redis-cli CONFIG SET maxmemory-policy allkeys-lru
redis-cli CONFIG SET maxmemory 2gb
```

4. Cache key collision:
Two different data types sharing a key namespace causes data corruption.
```python
# BAD: user:1 could be either a User or a UserPreference
cache.set("user:1", user_data)
cache.set("user:1", user_prefs)  # overwrites!

# GOOD: namespace by entity type
cache.set("user:profile:1", user_data)
cache.set("user:prefs:1", user_prefs)
```

5. Serialization mismatch:
Storing JSON-serialized objects and deserializing into the wrong type causes silent data bugs.
---
When NOT to Use
Strongly consistent data:
Bank balances, inventory counts, distributed locks ā these require the DB as the single source of truth. Caching introduces inconsistency windows that cause incorrect reads.
Highly write-heavy data:
If a key is written 100× per second and read 10× per second, caching increases write overhead with a near-zero hit rate. Cache read-heavy paths only.
Large objects:
Caching 100MB objects in Redis wastes memory. Consider Content-Addressable Storage (S3/GCS) with a CDN for large blobs.
Personal/sensitive data at rest:
Caching PII in Redis adds attack surface. Ensure Redis has auth, TLS, and network isolation. Consider not caching sensitive fields.
---
Lab
The lab for this module is embedded in the benchmarks directory:
../../benchmarks/06-cache-vs-no-cache/README.md
It demonstrates the hit rate vs latency trade-off with a simulated Zipf workload.
---
Key Takeaways
- The cache equation: effective_latency = hit_rate × cache_latency + (1 - hit_rate) × origin_latency. Hit rate is the dominant variable.
- Zipf distribution makes caches work: 20% of items get 80% of reads. Cache that 20%.
- LRU is O(1) for both get and put using doubly linked list + hash map. Redis approximates it with 5-sample random eviction.
- Cache placement: Cache-Aside for most cases. Write-Through for write+read hot data. Write-Behind for extreme write throughput.
- Thundering herd on expiry is a real production failure mode. Use single-flight mutex or probabilistic early revalidation (PER).
- Invalidation is hard. TTL is simple but eventually consistent. Event-driven is complex but correct.
- Bloom filters eliminate DB queries for non-existent keys; essential for mitigating random-key probing attacks.
- Observe: track hit rate, eviction rate, memory usage. Alert on hit_rate < 80% and memory > 90%.