3. Caching
Caching is the technique of storing copies of frequently accessed data in a fast-access storage layer so that future requests can be served faster. It is one of the most impactful performance optimizations in system design.
Why Caching?
- Reduce latency: RAM access (~100 ns) is roughly three orders of magnitude faster than an SSD read (~150 μs), and a local cache hit avoids the network round trip (~0.5 ms even within a data center) entirely.
- Reduce database load: Fewer queries hit the database.
- Reduce network calls: Avoid repeated calls to external services.
- Improve throughput: Serve more requests with the same infrastructure.
- Save cost: Fewer backend resources needed.
Latency Numbers Every Engineer Should Know
| Operation | Approximate Time |
|---|---|
| L1 cache reference | 0.5 ns |
| L2 cache reference | 7 ns |
| RAM reference | 100 ns |
| SSD random read | 150 μs |
| HDD seek | 10 ms |
| Network round trip (same DC) | 0.5 ms |
| Network round trip (cross-continent) | 150 ms |
Cache Layers in a System
┌──────────────────────────────────────────────────────┐
│  Client / Browser                                    │
│  ┌───────────────┐                                   │
│  │ Browser Cache │  ← HTTP cache headers             │
│  └───────┬───────┘                                   │
└──────────┼───────────────────────────────────────────┘
           │
    ┌──────┴──────┐
    │     CDN     │  ← Static assets cached at edge
    └──────┬──────┘
           │
    ┌──────┴──────┐
    │ API Gateway │  ← Response caching
    └──────┬──────┘
           │
    ┌──────┴──────┐
    │ App Server  │
    │ ┌────────┐  │
    │ │In-Proc │  │  ← Application-level cache
    │ │ Cache  │  │    (HashMap, Guava, Caffeine)
    │ └────────┘  │
    └──────┬──────┘
           │
    ┌──────┴──────┐
    │ Distributed │  ← Redis, Memcached
    │    Cache    │
    └──────┬──────┘
           │
    ┌──────┴──────┐
    │  Database   │
    │ ┌────────┐  │
    │ │ Query  │  │  ← Database query cache
    │ │ Cache  │  │
    │ └────────┘  │
    └─────────────┘
Caching Strategies (Read)
1. Cache-Aside (Lazy Loading)
The application is responsible for reading from and writing to the cache.
Read Path:
1. App checks cache → HIT: return cached data
2. Cache MISS: App reads from DB
3. App writes data to cache
4. App returns data
Write Path:
1. App writes to DB
2. App invalidates or updates cache
App ──→ Cache ──→ HIT? ──→ Yes: Return data
└──→ No: Read DB → Write to Cache → Return data
Pros:
- Only requested data is cached (no wasted memory).
- Cache failure doesn't break the system (fallback to DB).
- Simple to implement.
Cons:
- Cache miss incurs three round trips (cache check + DB read + cache write).
- Data can become stale (between DB write and cache invalidation).
- Cold start problem — cache is initially empty.
Best for: General-purpose read-heavy workloads.
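The read and write paths above can be sketched in a few lines of Python, with plain dicts standing in for Redis and the database (the key names and TTL are illustrative):

```python
import time

db = {"user:123": {"name": "Alice"}}   # stand-in for the database
cache = {}                             # stand-in for Redis: key -> (value, expires_at)
TTL_SECONDS = 3600

def get_user(key):
    # 1. App checks cache first.
    entry = cache.get(key)
    if entry and entry[1] > time.time():
        return entry[0]                # HIT: return cached data
    # 2. MISS: app reads from the DB.
    value = db.get(key)
    # 3. App writes data to cache for subsequent reads.
    if value is not None:
        cache[key] = (value, time.time() + TTL_SECONDS)
    # 4. App returns data.
    return value

def update_user(key, value):
    db[key] = value            # write to the DB first...
    cache.pop(key, None)       # ...then invalidate the cache entry

get_user("user:123")   # miss: loads from DB and populates the cache
get_user("user:123")   # hit: served from the cache
```

Note that the write path invalidates rather than updates the cache; the next read repopulates it, which avoids racing concurrent writers.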
2. Read-Through Cache
The cache itself is responsible for loading data from the database on a miss.
App ──→ Cache ──→ HIT? ──→ Yes: Return data
└──→ No: Cache reads DB → Stores data → Returns to App
Pros:
- Simplifies application code — the app always reads from cache.
- Consistent data loading logic.
Cons:
- Cache library/provider must support data source integration.
- First request for each key is slow.
Best for: When you want to abstract cache-loading logic away from the application.
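A minimal read-through wrapper: the cache object owns the loader function, so the application only ever talks to the cache (all names here are illustrative):

```python
class ReadThroughCache:
    def __init__(self, loader):
        self._store = {}
        self._loader = loader      # function the cache calls on a miss

    def get(self, key):
        if key not in self._store:                  # MISS
            self._store[key] = self._loader(key)    # cache loads from the data source
        return self._store[key]                     # app always reads from the cache

def load_from_db(key):
    # stand-in for a real database query
    return {"user:123": {"name": "Alice"}}.get(key)

users = ReadThroughCache(load_from_db)
users.get("user:123")   # first call invokes the loader
users.get("user:123")   # second call is served from the cache
```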
Caching Strategies (Write)
3. Write-Through Cache
Every write goes to the cache AND the database, synchronously.
App ──→ Cache ──→ Database
│
└─→ (returns after BOTH writes succeed)
Pros:
- Cache is always consistent with the database.
- Reads are always fast (data is in cache).
Cons:
- Write latency is higher (two synchronous writes).
- Caches data that may never be read (wasted memory).
Best for: Applications where consistency is critical and writes are infrequent.
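Write-through in miniature: a write returns only after both the cache and the (stand-in) database have been updated, so a read of any written key is always a hit:

```python
class WriteThroughCache:
    def __init__(self, db):
        self.cache = {}
        self.db = db

    def put(self, key, value):
        self.db[key] = value       # synchronous write to the database...
        self.cache[key] = value    # ...and to the cache; both complete before returning

    def get(self, key):
        return self.cache.get(key)  # written keys are always warm

db = {}
store = WriteThroughCache(db)
store.put("user:123", {"name": "Alice"})
```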
4. Write-Behind (Write-Back) Cache
Writes go to the cache immediately; the cache asynchronously writes to the database later.
App ──→ Cache ──→ (immediate return)
│
└─→ Async batch write to Database
Pros:
- Very low write latency.
- Batch writes to DB improve throughput.
- Can absorb write spikes.
Cons:
- Risk of data loss if cache fails before persisting to DB.
- Complex to implement correctly.
- Eventual consistency between cache and DB.
Best for: High write-throughput systems where some data loss is acceptable (e.g., analytics, metrics).
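A deterministic sketch of write-behind (a real implementation would flush from a background thread or timer; here the batch threshold triggers the flush):

```python
class WriteBehindCache:
    def __init__(self, db, batch_size=100):
        self.cache = {}
        self.db = db
        self.pending = []            # queued (key, value) writes not yet persisted
        self.batch_size = batch_size

    def put(self, key, value):
        self.cache[key] = value      # immediate, low-latency write
        self.pending.append((key, value))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        # One batched DB round trip instead of one per write.
        for key, value in self.pending:
            self.db[key] = value
        self.pending.clear()

db = {}
wb = WriteBehindCache(db, batch_size=2)
wb.put("a", 1)   # in cache, not yet in db
wb.put("b", 2)   # batch full: both writes flushed to db
```

The `pending` list is exactly the data-loss window: anything queued but unflushed disappears if the cache process dies.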
5. Write-Around Cache
Writes go directly to the database, bypassing the cache.
Write: App ──→ Database (cache not updated)
Read: App ──→ Cache ──→ Miss ──→ Database ──→ Cache
Pros:
- Avoids cache being flooded with data that isn't read.
Cons:
- Read after write causes a cache miss and is slow.
Best for: Write-heavy workloads where recently written data is rarely read immediately.
Caching Strategy Comparison
| Strategy | Read Performance | Write Performance | Consistency | Data Loss Risk |
|---|---|---|---|---|
| Cache-Aside | Good (after warm-up) | N/A (manual) | May be stale | No |
| Read-Through | Good (after warm-up) | N/A | May be stale | No |
| Write-Through | Excellent | Slow | Strong | No |
| Write-Behind | Excellent | Excellent | Eventual | Yes |
| Write-Around | Cache miss on new writes | Fast | May be stale | No |
Cache Eviction Policies
When the cache is full, something must be removed to make room.
LRU (Least Recently Used)
Evicts the item that hasn't been accessed for the longest time.
- Implementation: Doubly linked list + HashMap → O(1) get and put.
- Most commonly used in practice.
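In Python, the linked-list-plus-HashMap design is essentially what `collections.OrderedDict` provides, so an LRU cache is a few lines:

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()   # preserves recency order

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)          # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict least recently used

c = LRUCache(2)
c.put("a", 1)
c.put("b", 2)
c.get("a")        # "a" is now most recently used
c.put("c", 3)     # capacity exceeded: evicts "b"
```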
LFU (Least Frequently Used)
Evicts the item that has been accessed the fewest times.
- Better than LRU when access patterns have "hot" items.
- More complex to implement.
FIFO (First In, First Out)
Evicts the oldest item regardless of access pattern.
- Simple but often suboptimal.
TTL (Time to Live)
Items expire after a set duration, regardless of access.
- Can be combined with other policies.
- Essential for preventing stale data.
Random Replacement
Evicts a random item.
- Simple; surprisingly effective in some workloads.
MRU (Most Recently Used)
Evicts the most recently used item.
- Useful for scanning/sequential access patterns where recently accessed data is unlikely to be re-accessed.
| Policy | Best For | Complexity |
|---|---|---|
| LRU | General purpose | O(1) (linked list + HashMap) |
| LFU | Skewed access patterns | Higher (frequency tracking) |
| FIFO | Simple, ordered data | O(1) (queue) |
| TTL | Time-sensitive data | O(1) per entry |
| Random | Simple systems | O(1) |
| MRU | Sequential scan patterns | O(1) |
Cache Invalidation
One of the hardest problems in computer science:
"There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton
Strategies
- TTL-based expiration: Set a time-to-live on each cache entry.
  SET user:123 '{"name":"Alice"}' EX 3600   # Expires in 1 hour
- Event-driven invalidation: When data changes, publish an event to invalidate/update the cache.
  DB Write → Publish Event → Cache Subscriber → Delete/Update Cache Key
- Versioned keys: Include a version number in the cache key.
  user:123:v7 → data   # On update, increment version → user:123:v8
- Manual purge: Application code explicitly deletes cache entries on writes.
  UPDATE users SET name='Bob' WHERE id=123;
  DELETE FROM cache WHERE key='user:123';
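Versioned keys can be sketched like this: the current version lives in a small counter, and bumping it makes every old cached entry unreachable (the orphaned entries simply age out via TTL). All names here are illustrative:

```python
versions = {}   # e.g. a small counter per entity, could live in Redis
cache = {}

def cache_key(entity_id):
    v = versions.get(entity_id, 1)
    return f"{entity_id}:v{v}"          # e.g. "user:123:v7"

def read(entity_id):
    return cache.get(cache_key(entity_id))

def write(entity_id, value, db):
    db[entity_id] = value
    versions[entity_id] = versions.get(entity_id, 1) + 1   # old key is now orphaned
    cache[cache_key(entity_id)] = value                    # warm the new key

db = {}
write("user:123", "Alice", db)   # bumps version, warms user:123:v2
```

The appeal is that invalidation is a single counter increment, with no need to find and delete every stale entry.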
Distributed Caching
Redis
- In-memory data structure store.
- Supports strings, hashes, lists, sets, sorted sets, streams.
- Persistence options: RDB snapshots, AOF (Append Only File).
- Built-in replication, Sentinel (HA), Cluster (sharding).
- Pub/Sub, Lua scripting, transactions.
Memcached
- Simple key-value in-memory cache.
- Multi-threaded (better CPU utilization than single-threaded Redis).
- No persistence, no data structures beyond strings.
- Simple horizontal scaling via consistent hashing.
Redis vs Memcached
| Feature | Redis | Memcached |
|---|---|---|
| Data structures | Rich (strings, lists, sets, hashes, etc.) | Strings only |
| Persistence | Yes (RDB, AOF) | No |
| Replication | Yes | No |
| Clustering | Yes (Redis Cluster) | Client-side sharding |
| Pub/Sub | Yes | No |
| Lua scripting | Yes | No |
| Multi-threaded | Single-threaded (I/O threads in v6+) | Multi-threaded |
| Memory efficiency | Less efficient (overhead per key) | More efficient |
| Use case | Feature-rich caching, sessions, queues | Simple high-throughput caching |
Caching Patterns and Problems
Cache Stampede (Thundering Herd)
When a popular cache key expires, many requests simultaneously hit the database.
Key expires → 1000 concurrent requests → All miss cache → 1000 DB queries
Solutions:
- Locking: Only one request fetches from DB; others wait.
- Probabilistic early expiration: Randomly refresh before TTL expires.
- Background refresh: A background job refreshes popular keys before they expire.
- Stale-while-revalidate: Serve stale data while refreshing in the background.
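The locking approach can be sketched with a per-key lock plus a double-check, so that out of many concurrent requests only one actually hits the database:

```python
import threading

cache = {}
locks = {}
locks_guard = threading.Lock()
db_calls = 0

def expensive_db_read(key):
    global db_calls
    db_calls += 1                 # count how many requests actually reach the "DB"
    return f"value-for-{key}"

def get(key):
    if key in cache:
        return cache[key]
    with locks_guard:             # get-or-create the per-key lock
        lock = locks.setdefault(key, threading.Lock())
    with lock:
        if key in cache:          # re-check: another thread may have filled it
            return cache[key]
        cache[key] = expensive_db_read(key)
        return cache[key]

# Simulate a stampede: many concurrent requests for one expired key.
threads = [threading.Thread(target=get, args=("hot",)) for _ in range(50)]
for t in threads: t.start()
for t in threads: t.join()
```

The re-check inside the lock is essential: every waiter that lost the race finds the value already cached instead of querying the DB again.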
Cache Penetration
Requests for data that doesn't exist in the database bypass the cache every time.
Request for user:999999 → Cache miss → DB miss → Repeat forever
Solutions:
- Cache null values: Store NULL with a short TTL.
- Bloom filter: Check if a key could exist before querying.
Cache Avalanche
Many cache keys expire at the same time, causing a sudden spike in database load.
Solutions:
- Randomized TTL: Add jitter to expiration times.
  TTL = base_ttl + random(0, max_jitter)
- Staggered refresh: Pre-warm cache keys at different intervals.
- Circuit breaker: Limit concurrent DB queries.
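The jittered-TTL formula fits in one helper (the base and jitter values are illustrative):

```python
import random

def jittered_ttl(base_ttl=3600, max_jitter=300):
    # Spread expirations over [base_ttl, base_ttl + max_jitter]
    # so keys written together don't all expire together.
    return base_ttl + random.uniform(0, max_jitter)

# e.g. pass ttl=jittered_ttl() when setting each cache entry
```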
Hot Key Problem
A single cache key receives disproportionately high traffic.
Solutions:
- Local caching: Cache hot keys in application memory.
- Key replication: Store the same data under multiple keys (user:123:shard1, user:123:shard2).
- Rate limiting: Limit requests per key.
HTTP Caching
Cache-Control Headers
Cache-Control: public, max-age=3600 # Cacheable by CDN and browser for 1 hour
Cache-Control: private, max-age=600 # Only browser can cache for 10 minutes
Cache-Control: no-cache # Must revalidate with server
Cache-Control: no-store # Never cache
Cache-Control: stale-while-revalidate=60 # Serve stale for 60s while refreshing
ETag (Entity Tag)
# Response:
ETag: "abc123"
# Subsequent request:
If-None-Match: "abc123"
# Server response if unchanged:
304 Not Modified
Last-Modified
# Response:
Last-Modified: Wed, 09 Oct 2024 10:00:00 GMT
# Subsequent request:
If-Modified-Since: Wed, 09 Oct 2024 10:00:00 GMT
# Server response if unchanged:
304 Not Modified
Cache Metrics
| Metric | Formula | Target |
|---|---|---|
| Hit Rate | hits / (hits + misses) | > 90% |
| Miss Rate | misses / (hits + misses) | < 10% |
| Eviction Rate | Evictions per second | Low — may indicate undersized cache |
| Latency (cache hit) | Time to read from cache | < 1ms |
| Latency (cache miss) | Time to read from DB and populate cache | Application-dependent |
| Memory Usage | Current cache size / max size | Monitor for capacity planning |
Summary
| Concept | Key Point |
|---|---|
| Purpose | Store frequently accessed data in fast storage to reduce latency and DB load |
| Read strategies | Cache-Aside (most common), Read-Through |
| Write strategies | Write-Through (consistent), Write-Behind (fast), Write-Around |
| Eviction | LRU is the go-to default |
| Invalidation | TTL + event-driven for best results |
| Distributed cache | Redis (feature-rich) vs Memcached (simple, fast) |
| Watch out for | Stampede, penetration, avalanche, hot keys |
| HTTP caching | Use Cache-Control, ETag, Last-Modified headers |
Rule of thumb: Cache data that is read frequently, changes infrequently, and is expensive to compute or fetch.