3. Caching
Caching is the technique of storing copies of frequently accessed data in a fast-access storage layer so that future requests can be served faster. It is one of the most impactful performance optimizations in system design.
Why Caching?
- Reduce latency: RAM access (~100 ns) is roughly three orders of magnitude faster than an SSD read (~150 μs), and a local cache hit avoids the network round trip (~0.5 ms even within a data center) entirely.
- Reduce database load: Fewer queries hit the database.
- Reduce network calls: Avoid repeated calls to external services.
- Improve throughput: Serve more requests with the same infrastructure.
- Save cost: Fewer backend resources needed.
Latency Numbers Every Engineer Should Know
| Operation | Approximate Time |
|---|---|
| L1 cache reference | 0.5 ns |
| L2 cache reference | 7 ns |
| RAM reference | 100 ns |
| SSD random read | 150 μs |
| HDD seek | 10 ms |
| Network round trip (same DC) | 0.5 ms |
| Network round trip (cross-continent) | 150 ms |
Cache Layers in a System
┌──────────────────────────────────────────────────────┐
│  Client / Browser                                    │
│  ┌───────────────┐                                   │
│  │ Browser Cache │  ← HTTP cache headers             │
│  └───────┬───────┘                                   │
└──────────┼───────────────────────────────────────────┘
           │
    ┌──────┴──────┐
    │     CDN     │  ← Static assets cached at edge
    └──────┬──────┘
           │
    ┌──────┴──────┐
    │ API Gateway │  ← Response caching
    └──────┬──────┘
           │
    ┌──────┴──────┐
    │ App Server  │
    │ ┌────────┐  │
    │ │In-Proc │  │  ← Application-level cache
    │ │ Cache  │  │    (HashMap, Guava, Caffeine)
    │ └────────┘  │
    └──────┬──────┘
           │
    ┌──────┴──────┐
    │ Distributed │  ← Redis, Memcached
    │    Cache    │
    └──────┬──────┘
           │
    ┌──────┴──────┐
    │  Database   │
    │ ┌────────┐  │
    │ │ Query  │  │  ← Database query cache
    │ │ Cache  │  │
    │ └────────┘  │
    └─────────────┘
Caching Strategies (Read)
1. Cache-Aside (Lazy Loading)
The application is responsible for reading from and writing to the cache.
Read Path:
1. App checks cache → HIT: return cached data
2. Cache MISS: App reads from DB
3. App writes data to cache
4. App returns data
Write Path:
1. App writes to DB
2. App invalidates or updates cache
App ──→ Cache ──→ HIT? ──→ Yes: Return data
└──→ No: Read DB → Write to Cache → Return data
Pros:
- Only requested data is cached (no wasted memory).
- Cache failure doesn't break the system (fallback to DB).
- Simple to implement.
Cons:
- Cache miss incurs three round trips (cache check + DB read + cache write).
- Data can become stale (between DB write and cache invalidation).
- Cold start problem — cache is initially empty.
Best for: General-purpose read-heavy workloads.
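The read and write paths above can be sketched in a few lines of Python, with plain dicts standing in for Redis and the database (the key names and TTL are illustrative):

```python
import time

db = {"user:123": {"name": "Alice"}}   # stand-in for the database
cache = {}                             # stand-in for Redis: key -> (value, expires_at)
TTL_SECONDS = 3600

def get_user(key):
    # 1. App checks cache first.
    entry = cache.get(key)
    if entry and entry[1] > time.time():
        return entry[0]                # HIT: return cached data
    # 2. MISS: app reads from the DB.
    value = db.get(key)
    # 3. App writes data to cache for subsequent reads.
    if value is not None:
        cache[key] = (value, time.time() + TTL_SECONDS)
    # 4. App returns data.
    return value

def update_user(key, value):
    db[key] = value            # write to the DB first...
    cache.pop(key, None)       # ...then invalidate the cache entry

get_user("user:123")   # miss: loads from DB and populates the cache
get_user("user:123")   # hit: served from the cache
```

Note that the write path invalidates rather than updates the cache; the next read repopulates it, which avoids racing concurrent writers.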
2. Read-Through Cache
The cache itself is responsible for loading data from the database on a miss.
App ──→ Cache ──→ HIT? ──→ Yes: Return data
└──→ No: Cache reads DB → Stores data → Returns to App
Pros:
- Simplifies application code — the app always reads from cache.
- Consistent data loading logic.
Cons:
- Cache library/provider must support data source integration.
- First request for each key is slow.
Best for: When you want to abstract cache-loading logic away from the application.
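A minimal read-through wrapper: the cache object owns the loader function, so the application only ever talks to the cache (all names here are illustrative):

```python
class ReadThroughCache:
    def __init__(self, loader):
        self._store = {}
        self._loader = loader      # function the cache calls on a miss

    def get(self, key):
        if key not in self._store:                  # MISS
            self._store[key] = self._loader(key)    # cache loads from the data source
        return self._store[key]                     # app always reads from the cache

def load_from_db(key):
    # stand-in for a real database query
    return {"user:123": {"name": "Alice"}}.get(key)

users = ReadThroughCache(load_from_db)
users.get("user:123")   # first call invokes the loader
users.get("user:123")   # second call is served from the cache
```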
Caching Strategies (Write)
3. Write-Through Cache
Every write goes to the cache AND the database, synchronously.
App ──→ Cache ──→ Database
│
└─→ (returns after BOTH writes succeed)
Pros:
- Cache is always consistent with the database.
- Reads are always fast (data is in cache).
Cons:
- Write latency is higher (two synchronous writes).
- Caches data that may never be read (wasted memory).
Best for: Applications where consistency is critical and writes are infrequent.
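Write-through in miniature: a write returns only after both the cache and the (stand-in) database have been updated, so a read of any written key is always a hit:

```python
class WriteThroughCache:
    def __init__(self, db):
        self.cache = {}
        self.db = db

    def put(self, key, value):
        self.db[key] = value       # synchronous write to the database...
        self.cache[key] = value    # ...and to the cache; both complete before returning

    def get(self, key):
        return self.cache.get(key)  # written keys are always warm

db = {}
store = WriteThroughCache(db)
store.put("user:123", {"name": "Alice"})
```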
4. Write-Behind (Write-Back) Cache
Writes go to the cache immediately; the cache asynchronously writes to the database later.
App ──→ Cache ──→ (immediate return)
│
└─→ Async batch write to Database
Pros:
- Very low write latency.
- Batch writes to DB improve throughput.
- Can absorb write spikes.
Cons:
- Risk of data loss if cache fails before persisting to DB.
- Complex to implement correctly.
- Eventual consistency between cache and DB.
Best for: High write-throughput systems where some data loss is acceptable (e.g., analytics, metrics).
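A deterministic sketch of write-behind (a real implementation would flush from a background thread or timer; here the batch threshold triggers the flush):

```python
class WriteBehindCache:
    def __init__(self, db, batch_size=100):
        self.cache = {}
        self.db = db
        self.pending = []            # queued (key, value) writes not yet persisted
        self.batch_size = batch_size

    def put(self, key, value):
        self.cache[key] = value      # immediate, low-latency write
        self.pending.append((key, value))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        # One batched DB round trip instead of one per write.
        for key, value in self.pending:
            self.db[key] = value
        self.pending.clear()

db = {}
wb = WriteBehindCache(db, batch_size=2)
wb.put("a", 1)   # in cache, not yet in db
wb.put("b", 2)   # batch full: both writes flushed to db
```

The `pending` list is exactly the data-loss window: anything queued but unflushed disappears if the cache process dies.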
5. Write-Around Cache
Writes go directly to the database, bypassing the cache.
Write: App ──→ Database (cache not updated)
Read: App ──→ Cache ──→ Miss ──→ Database ──→ Cache
Pros:
- Avoids cache being flooded with data that isn't read.
Cons:
- Read after write causes a cache miss and is slow.
Best for: Write-heavy workloads where recently written data is rarely read immediately.
Caching Strategy Comparison
| Strategy | Read Performance | Write Performance | Consistency | Data Loss Risk |
|---|---|---|---|---|
| Cache-Aside | Good (after warm-up) | N/A (manual) | May be stale | No |
| Read-Through | Good (after warm-up) | N/A | May be stale | No |
| Write-Through | Excellent | Slow | Strong | No |
| Write-Behind | Excellent | Excellent | Eventual | Yes |
| Write-Around | Cache miss on new writes | Fast | May be stale | No |
Cache Eviction Policies
When the cache is full, something must be removed to make room.
LRU (Least Recently Used)
Evicts the item that hasn't been accessed for the longest time.
- Implementation: Doubly linked list + HashMap → O(1) get and put.
- Most commonly used in practice.
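In Python, the linked-list-plus-HashMap design is essentially what `collections.OrderedDict` provides, so an LRU cache is a few lines:

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()   # preserves recency order

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)          # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict least recently used

c = LRUCache(2)
c.put("a", 1)
c.put("b", 2)
c.get("a")        # "a" is now most recently used
c.put("c", 3)     # capacity exceeded: evicts "b"
```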
LFU (Least Frequently Used)
Evicts the item that has been accessed the fewest times.
- Better than LRU when access patterns have "hot" items.
- More complex to implement.
FIFO (First In, First Out)
Evicts the oldest item regardless of access pattern.
- Simple but often suboptimal.
TTL (Time to Live)
Items expire after a set duration, regardless of access.
- Can be combined with other policies.
- Essential for preventing stale data.
Random Replacement
Evicts a random item.
- Simple; surprisingly effective in some workloads.
MRU (Most Recently Used)
Evicts the most recently used item.
- Useful for scanning/sequential access patterns where recently accessed data is unlikely to be re-accessed.
| Policy | Best For | Complexity |
|---|---|---|
| LRU | General purpose | O(1) (linked list + HashMap) |
| LFU | Skewed access patterns | Higher (frequency tracking) |
| FIFO | Simple, ordered data | O(1) (queue) |
| TTL | Time-sensitive data | O(1) per entry |
| Random | Simple systems | O(1) |
| MRU | Sequential scan patterns | O(1) |
Cache Invalidation
One of the hardest problems in computer science:
"There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton
Strategies
- TTL-based expiration: Set a time-to-live on each cache entry.
  SET user:123 '{"name":"Alice"}' EX 3600   # Expires in 1 hour
- Event-driven invalidation: When data changes, publish an event to invalidate/update the cache.
  DB Write → Publish Event → Cache Subscriber → Delete/Update Cache Key
- Versioned keys: Include a version number in the cache key.
  user:123:v7 → data   # On update, increment version → user:123:v8
- Manual purge: Application code explicitly deletes cache entries on writes.
  UPDATE users SET name='Bob' WHERE id=123;
  DELETE FROM cache WHERE key='user:123';
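Versioned keys can be sketched like this: the current version lives in a small counter, and bumping it makes every old cached entry unreachable (the orphaned entries simply age out via TTL). All names here are illustrative:

```python
versions = {}   # e.g. a small counter per entity, could live in Redis
cache = {}

def cache_key(entity_id):
    v = versions.get(entity_id, 1)
    return f"{entity_id}:v{v}"          # e.g. "user:123:v7"

def read(entity_id):
    return cache.get(cache_key(entity_id))

def write(entity_id, value, db):
    db[entity_id] = value
    versions[entity_id] = versions.get(entity_id, 1) + 1   # old key is now orphaned
    cache[cache_key(entity_id)] = value                    # warm the new key

db = {}
write("user:123", "Alice", db)   # bumps version, warms user:123:v2
```

The appeal is that invalidation is a single counter increment, with no need to find and delete every stale entry.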
Distributed Caching
Redis
- In-memory data structure store.
- Supports strings, hashes, lists, sets, sorted sets, streams.
- Persistence options: RDB snapshots, AOF (Append Only File).
- Built-in replication, Sentinel (HA), Cluster (sharding).
- Pub/Sub, Lua scripting, transactions.
Memcached
- Simple key-value in-memory cache.
- Multi-threaded (better CPU utilization than single-threaded Redis).
- No persistence, no data structures beyond strings.
- Simple horizontal scaling via consistent hashing.
Redis vs Memcached
| Feature | Redis | Memcached |
|---|---|---|
| Data structures | Rich (strings, lists, sets, hashes, etc.) | Strings only |
| Persistence | Yes (RDB, AOF) | No |
| Replication | Yes | No |
| Clustering | Yes (Redis Cluster) | Client-side sharding |
| Pub/Sub | Yes | No |
| Lua scripting | Yes | No |
| Multi-threaded | Single-threaded (I/O threads in v6+) | Multi-threaded |
| Memory efficiency | Less efficient (overhead per key) | More efficient |
| Use case | Feature-rich caching, sessions, queues | Simple high-throughput caching |
Caching Patterns and Problems
Cache Stampede (Thundering Herd)
When a popular cache key expires, many requests simultaneously hit the database.
Key expires → 1000 concurrent requests → All miss cache → 1000 DB queries
Solutions:
- Locking: Only one request fetches from DB; others wait.
- Probabilistic early expiration: Randomly refresh before TTL expires.
- Background refresh: A background job refreshes popular keys before they expire.
- Stale-while-revalidate: Serve stale data while refreshing in the background.
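The locking approach can be sketched with a per-key lock plus a double-check, so that out of many concurrent requests only one actually hits the database:

```python
import threading

cache = {}
locks = {}
locks_guard = threading.Lock()
db_calls = 0

def expensive_db_read(key):
    global db_calls
    db_calls += 1                 # count how many requests actually reach the "DB"
    return f"value-for-{key}"

def get(key):
    if key in cache:
        return cache[key]
    with locks_guard:             # get-or-create the per-key lock
        lock = locks.setdefault(key, threading.Lock())
    with lock:
        if key in cache:          # re-check: another thread may have filled it
            return cache[key]
        cache[key] = expensive_db_read(key)
        return cache[key]

# Simulate a stampede: many concurrent requests for one expired key.
threads = [threading.Thread(target=get, args=("hot",)) for _ in range(50)]
for t in threads: t.start()
for t in threads: t.join()
```

The re-check inside the lock is essential: every waiter that lost the race finds the value already cached instead of querying the DB again.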
Cache Penetration
Requests for data that doesn't exist in the database bypass the cache every time.
Request for user:999999 → Cache miss → DB miss → Repeat forever
Solutions:
- Cache null values: Store NULL with a short TTL.
- Bloom filter: Check if a key could exist before querying.
Cache Avalanche
Many cache keys expire at the same time, causing a sudden spike in database load.
Solutions:
- Randomized TTL: Add jitter to expiration times.
  TTL = base_ttl + random(0, max_jitter)
- Staggered refresh: Pre-warm cache keys at different intervals.
- Circuit breaker: Limit concurrent DB queries.
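The jittered-TTL formula fits in one helper (the base and jitter values are illustrative):

```python
import random

def jittered_ttl(base_ttl=3600, max_jitter=300):
    # Spread expirations over [base_ttl, base_ttl + max_jitter]
    # so keys written together don't all expire together.
    return base_ttl + random.uniform(0, max_jitter)

# e.g. pass ttl=jittered_ttl() when setting each cache entry
```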
Hot Key Problem
A single cache key receives disproportionately high traffic.
Solutions:
- Local caching: Cache hot keys in application memory.
- Key replication: Store the same data under multiple keys (user:123:shard1, user:123:shard2).
- Rate limiting: Limit requests per key.
HTTP Caching
Cache-Control Headers
Cache-Control: public, max-age=3600 # Cacheable by CDN and browser for 1 hour
Cache-Control: private, max-age=600 # Only browser can cache for 10 minutes
Cache-Control: no-cache # Must revalidate with server
Cache-Control: no-store # Never cache
Cache-Control: stale-while-revalidate=60 # Serve stale for 60s while refreshing
ETag (Entity Tag)
# Response:
ETag: "abc123"
# Subsequent request:
If-None-Match: "abc123"
# Server response if unchanged:
304 Not Modified
Last-Modified
# Response:
Last-Modified: Wed, 09 Oct 2024 10:00:00 GMT
# Subsequent request:
If-Modified-Since: Wed, 09 Oct 2024 10:00:00 GMT
# Server response if unchanged:
304 Not Modified
Cache Metrics
| Metric | Formula | Target |
|---|---|---|
| Hit Rate | hits / (hits + misses) | > 90% |
| Miss Rate | misses / (hits + misses) | < 10% |
| Eviction Rate | Evictions per second | Low — may indicate undersized cache |
| Latency (cache hit) | Time to read from cache | < 1ms |
| Latency (cache miss) | Time to read from DB and populate cache | Application-dependent |
| Memory Usage | Current cache size / max size | Monitor for capacity planning |
Summary
| Concept | Key Point |
|---|---|
| Purpose | Store frequently accessed data in fast storage to reduce latency and DB load |
| Read strategies | Cache-Aside (most common), Read-Through |
| Write strategies | Write-Through (consistent), Write-Behind (fast), Write-Around |
| Eviction | LRU is the go-to default |
| Invalidation | TTL + event-driven for best results |
| Distributed cache | Redis (feature-rich) vs Memcached (simple, fast) |
| Watch out for | Stampede, penetration, avalanche, hot keys |
| HTTP caching | Use Cache-Control, ETag, Last-Modified headers |
Rule of thumb: Cache data that is read frequently, changes infrequently, and is expensive to compute or fetch.