3. Caching

Caching is the technique of storing copies of frequently accessed data in a fast-access storage layer so that future requests can be served faster. It is one of the most impactful performance optimizations in system design.


Why Caching?

  • Reduce latency: RAM access (~100 ns) is roughly 1000x faster than an SSD read, and a local cache avoids the network round trip entirely.
  • Reduce database load: Fewer queries hit the database.
  • Reduce network calls: Avoid repeated calls to external services.
  • Improve throughput: Serve more requests with the same infrastructure.
  • Save cost: Fewer backend resources needed.

Latency Numbers Every Engineer Should Know

Operation                              Approximate Time
L1 cache reference                     0.5 ns
L2 cache reference                     7 ns
RAM reference                          100 ns
SSD random read                        150 μs
Network round trip (same DC)           0.5 ms
HDD seek                               10 ms
Network round trip (cross-continent)   150 ms

Cache Layers in a System

┌──────────────────────────────────────────────────────┐
│  Client / Browser                                    │
│  ┌───────────────┐                                   │
│  │ Browser Cache │  ← HTTP cache headers             │
│  └───────┬───────┘                                   │
└──────────┼───────────────────────────────────────────┘
           │
    ┌──────┴──────┐
    │     CDN     │  ← Static assets cached at edge
    └──────┬──────┘
           │
    ┌──────┴──────┐
    │ API Gateway │  ← Response caching
    └──────┬──────┘
           │
    ┌──────┴──────┐
    │  App Server │
    │  ┌────────┐ │
    │  │In-Proc │ │  ← Application-level cache
    │  │ Cache  │ │    (HashMap, Guava, Caffeine)
    │  └────────┘ │
    └──────┬──────┘
           │
    ┌──────┴──────┐
    │ Distributed │  ← Redis, Memcached
    │   Cache     │
    └──────┬──────┘
           │
    ┌──────┴──────┐
    │  Database   │
    │  ┌────────┐ │
    │  │ Query  │ │  ← Database query cache
    │  │ Cache  │ │
    │  └────────┘ │
    └─────────────┘

Caching Strategies (Read)

1. Cache-Aside (Lazy Loading)

The application is responsible for reading from and writing to the cache.

Read Path:
1. App checks cache → HIT: return cached data
2. Cache MISS: App reads from DB
3. App writes data to cache
4. App returns data

Write Path:
1. App writes to DB
2. App invalidates or updates cache

App ──→ Cache ──→ HIT? ──→ Yes: Return data
                       └──→ No: Read DB → Write to Cache → Return data

Pros:

  • Only requested data is cached (no wasted memory).
  • Cache failure doesn't break the system (fallback to DB).
  • Simple to implement.

Cons:

  • Cache miss incurs three round trips (cache check + DB read + cache write).
  • Data can become stale (between DB write and cache invalidation).
  • Cold start problem — cache is initially empty.

Best for: General-purpose read-heavy workloads.
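The read and write paths above can be sketched in Python, with plain dicts standing in for the database and the cache (the names `get_user` and `update_user` are illustrative, not from any particular library):

```python
db = {"user:123": {"name": "Alice"}}   # stand-in for the database
cache = {}                             # stand-in for Redis/Memcached

def get_user(key):
    # 1. Check the cache first.
    if key in cache:
        return cache[key]              # HIT
    # 2. MISS: read from the database.
    value = db.get(key)
    # 3. Populate the cache so the next read is a hit.
    if value is not None:
        cache[key] = value
    # 4. Return the data.
    return value

def update_user(key, value):
    # Write path: update the DB, then invalidate the cache entry.
    db[key] = value
    cache.pop(key, None)
```

Note the write path deletes rather than updates the cached entry; the next read repopulates it, which avoids racing writers leaving different values in cache and DB.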


2. Read-Through Cache

The cache itself is responsible for loading data from the database on a miss.

App ──→ Cache ──→ HIT? ──→ Yes: Return data
                       └──→ No: Cache reads DB → Stores data → Returns to App

Pros:

  • Simplifies application code — the app always reads from cache.
  • Consistent data loading logic.

Cons:

  • Cache library/provider must support data source integration.
  • First request for each key is slow.

Best for: When you want to abstract cache-loading logic away from the application.
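A minimal sketch of the read-through idea: the cache object owns a loader callback, so the application only ever talks to the cache (the `ReadThroughCache` class is hypothetical, modeled on loaders in libraries like Guava/Caffeine):

```python
class ReadThroughCache:
    """A cache that loads from its data source itself on a miss."""

    def __init__(self, loader):
        self._store = {}
        self._loader = loader       # function the cache calls on a miss

    def get(self, key):
        if key not in self._store:
            # The cache, not the application, talks to the backing store.
            self._store[key] = self._loader(key)
        return self._store[key]

db = {"user:1": "Alice"}
users = ReadThroughCache(loader=db.get)
```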


Caching Strategies (Write)

3. Write-Through Cache

Every write goes to the cache AND the database, synchronously.

App ──→ Cache ──→ Database
         │
         └─→ (returns after BOTH writes succeed)

Pros:

  • Cache is always consistent with the database.
  • Reads are always fast (data is in cache).

Cons:

  • Write latency is higher (two synchronous writes).
  • Caches data that may never be read (wasted memory).

Best for: Applications where consistency is critical and writes are infrequent.
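As a sketch, with dicts standing in for the cache and the database, the key property is that both writes complete before the call returns:

```python
db = {}
cache = {}

def write_through(key, value):
    # Both writes are synchronous; the caller sees success
    # only after cache AND database are updated.
    cache[key] = value
    db[key] = value

def read(key):
    # Reads are always fast: write-through guarantees the data is in cache.
    return cache.get(key)
```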


4. Write-Behind (Write-Back) Cache

Writes go to the cache immediately; the cache asynchronously writes to the database later.

App ──→ Cache ──→ (immediate return)
         │
         └─→ Async batch write to Database

Pros:

  • Very low write latency.
  • Batch writes to DB improve throughput.
  • Can absorb write spikes.

Cons:

  • Risk of data loss if cache fails before persisting to DB.
  • Complex to implement correctly.
  • Eventual consistency between cache and DB.

Best for: High write-throughput systems where some data loss is acceptable (e.g., analytics, metrics).
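A deterministic sketch of write-behind: writes land in the cache and a dirty-key buffer, and a separate flush step persists them in a batch (in a real system the flush runs on a timer or when the buffer fills; here it is called explicitly):

```python
from collections import deque

db = {}
cache = {}
dirty = deque()    # keys waiting to be persisted

def write_behind(key, value):
    cache[key] = value     # immediate, low-latency write
    dirty.append(key)      # database write is deferred

def flush():
    # Batch-persist everything written since the last flush.
    # Data queued here is lost if the cache dies before flush runs.
    while dirty:
        key = dirty.popleft()
        db[key] = cache[key]
```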


5. Write-Around Cache

Writes go directly to the database, bypassing the cache.

Write: App ──→ Database (cache not updated)
Read:  App ──→ Cache ──→ Miss ──→ Database ──→ Cache

Pros:

  • Avoids cache being flooded with data that isn't read.

Cons:

  • Read after write causes a cache miss and is slow.

Best for: Write-heavy workloads where recently written data is rarely read immediately.
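The write-around paths above, sketched with the same dict stand-ins (the cache is only ever populated on the read path):

```python
db = {}
cache = {}

def write_around(key, value):
    db[key] = value             # cache deliberately not updated

def read(key):
    if key in cache:
        return cache[key]
    value = db.get(key)         # first read after a write always misses
    if value is not None:
        cache[key] = value      # populate on read, not on write
    return value
```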


Caching Strategy Comparison

Strategy       Read Performance           Write Performance   Consistency    Data Loss Risk
Cache-Aside    Good (after warm-up)       N/A (manual)        May be stale   No
Read-Through   Good (after warm-up)       N/A                 May be stale   No
Write-Through  Excellent                  Slow                Strong         No
Write-Behind   Excellent                  Excellent           Eventual       Yes
Write-Around   Cache miss on new writes   Fast                May be stale   No

Cache Eviction Policies

When the cache is full, something must be removed to make room.

LRU (Least Recently Used)

Evicts the item that hasn't been accessed for the longest time.

  • Implementation: Doubly linked list + HashMap → O(1) get and put.
  • Most commonly used in practice.
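A compact version of that implementation, using Python's `OrderedDict` (which is a HashMap backed by a doubly linked list) in place of hand-rolled pointers:

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self._store = OrderedDict()   # insertion order tracks recency

    def get(self, key):
        if key not in self._store:
            return None
        self._store.move_to_end(key)  # mark as most recently used
        return self._store[key]

    def put(self, key, value):
        if key in self._store:
            self._store.move_to_end(key)
        self._store[key] = value
        if len(self._store) > self.capacity:
            # Evict the least recently used entry (front of the order).
            self._store.popitem(last=False)
```

In an interview you would typically be asked to build the linked list by hand; `move_to_end` and `popitem(last=False)` are doing that work here.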

LFU (Least Frequently Used)

Evicts the item that has been accessed the fewest times.

  • Better than LRU when access patterns have "hot" items.
  • More complex to implement.

FIFO (First In, First Out)

Evicts the oldest item regardless of access pattern.

  • Simple but often suboptimal.

TTL (Time to Live)

Items expire after a set duration, regardless of access.

  • Can be combined with other policies.
  • Essential for preventing stale data.

Random Replacement

Evicts a random item.

  • Simple; surprisingly effective in some workloads.

MRU (Most Recently Used)

Evicts the most recently used item.

  • Useful for scanning/sequential access patterns where recently accessed data is unlikely to be re-accessed.

Policy   Best For                 Complexity
LRU      General purpose          Low (O(1) with HashMap + linked list)
LFU      Skewed access patterns   Medium
FIFO     Simple, ordered data     Low
TTL      Time-sensitive data      Low
Random   Simple systems           Lowest

Cache Invalidation

One of the hardest problems in computer science:

"There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton

Strategies

  1. TTL-based expiration: Set a time-to-live on each cache entry.

    SET user:123 '{"name":"Alice"}' EX 3600   # Expires in 1 hour
    
  2. Event-driven invalidation: When data changes, publish an event to invalidate/update the cache.

    DB Write → Publish Event → Cache Subscriber → Delete/Update Cache Key
    
  3. Versioned keys: Include a version number in the cache key.

    user:123:v7 → data
    # On update, increment version → user:123:v8
    
  4. Manual purge: Application code explicitly deletes cache entries on writes.

    UPDATE users SET name='Bob' WHERE id=123;
    DELETE FROM cache WHERE key='user:123';
    
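The versioned-keys strategy (#3) is easy to sketch: the real cache key embeds a version counter, so bumping the counter invalidates without touching the old entry (all names here are illustrative; the stale `v7` entry simply ages out via TTL):

```python
versions = {}   # current version per logical key
cache = {}

def _cache_key(key):
    return f"{key}:v{versions.get(key, 1)}"

def put(key, value):
    cache[_cache_key(key)] = value

def get(key):
    return cache.get(_cache_key(key))

def invalidate(key):
    # Bumping the version makes every subsequent reader miss the
    # old entry; no explicit delete of the stale key is needed.
    versions[key] = versions.get(key, 1) + 1
```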

Distributed Caching

Redis

  • In-memory data structure store.
  • Supports strings, hashes, lists, sets, sorted sets, streams.
  • Persistence options: RDB snapshots, AOF (Append Only File).
  • Built-in replication, Sentinel (HA), Cluster (sharding).
  • Pub/Sub, Lua scripting, transactions.

Memcached

  • Simple key-value in-memory cache.
  • Multi-threaded (better CPU utilization than single-threaded Redis).
  • No persistence, no data structures beyond strings.
  • Simple horizontal scaling via consistent hashing.

Redis vs Memcached

Feature             Redis                                       Memcached
Data structures     Rich (strings, lists, sets, hashes, etc.)   Strings only
Persistence         Yes (RDB, AOF)                              No
Replication         Yes                                         No
Clustering          Yes (Redis Cluster)                         Client-side sharding
Pub/Sub             Yes                                         No
Lua scripting       Yes                                         No
Threading           Single-threaded (I/O threads in v6+)        Multi-threaded
Memory efficiency   Less efficient (overhead per key)           More efficient
Use case            Feature-rich caching, sessions, queues      Simple high-throughput caching

Caching Patterns and Problems

Cache Stampede (Thundering Herd)

When a popular cache key expires, many requests simultaneously hit the database.

Key expires → 1000 concurrent requests → All miss cache → 1000 DB queries

Solutions:

  1. Locking: Only one request fetches from DB; others wait.
  2. Probabilistic early expiration: Randomly refresh before TTL expires.
  3. Background refresh: A background job refreshes popular keys before they expire.
  4. Stale-while-revalidate: Serve stale data while refreshing in the background.

Cache Penetration

Requests for data that doesn't exist in the database bypass the cache every time.

Request for user:999999 → Cache miss → DB miss → Repeat forever

Solutions:

  1. Cache null values: Store NULL with a short TTL.
  2. Bloom filter: Check if a key could exist before querying.
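Solution #1 (caching null values) needs a sentinel so a cached "not found" is distinguishable from a genuinely cached `None`; a short TTL on the sentinel (omitted here) keeps the negative cache from going stale:

```python
db = {"user:1": "Alice"}
cache = {}
MISSING = object()   # sentinel marking "confirmed absent from the DB"

def get(key):
    if key in cache:
        cached = cache[key]
        return None if cached is MISSING else cached
    value = db.get(key)
    # Cache the miss too, so repeated lookups for nonexistent
    # keys stop reaching the database.
    cache[key] = MISSING if value is None else value
    return value
```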

Cache Avalanche

Many cache keys expire at the same time, causing a sudden spike in database load.

Solutions:

  1. Randomized TTL: Add jitter to expiration times.
    TTL = base_ttl + random(0, max_jitter)
    
  2. Staggered refresh: Pre-warm cache keys at different intervals.
  3. Circuit breaker: Limit concurrent DB queries.
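The randomized-TTL formula above is a one-liner in practice:

```python
import random

def ttl_with_jitter(base_ttl, max_jitter):
    # Spreading expirations over [base_ttl, base_ttl + max_jitter]
    # prevents a large batch of keys from expiring in the same instant.
    return base_ttl + random.uniform(0, max_jitter)
```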

Hot Key Problem

A single cache key receives disproportionately high traffic.

Solutions:

  1. Local caching: Cache hot keys in application memory.
  2. Key replication: Store the same data under multiple keys (user:123:shard1, user:123:shard2).
  3. Rate limiting: Limit requests per key.
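Solution #2 (key replication) amounts to writing the value under several suffixed keys and reading a random one; the replica count of 4 below is an arbitrary illustrative choice:

```python
import random

cache = {}
REPLICAS = 4    # tunable: more replicas = more spread, more memory

def put_hot(key, value):
    for i in range(REPLICAS):           # write under every replica key
        cache[f"{key}:r{i}"] = value

def get_hot(key):
    # Reads are spread across replicas, so in a sharded cache
    # no single node absorbs all the traffic for a hot item.
    return cache.get(f"{key}:r{random.randrange(REPLICAS)}")
```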

HTTP Caching

Cache-Control Headers

Cache-Control: public, max-age=3600        # Cacheable by CDN and browser for 1 hour
Cache-Control: private, max-age=600        # Only browser can cache for 10 minutes
Cache-Control: no-cache                    # Must revalidate with server
Cache-Control: no-store                    # Never cache
Cache-Control: stale-while-revalidate=60   # Serve stale for 60s while refreshing

ETag (Entity Tag)

# Response:
ETag: "abc123"

# Subsequent request:
If-None-Match: "abc123"

# Server response if unchanged:
304 Not Modified

Last-Modified

# Response:
Last-Modified: Wed, 09 Oct 2024 10:00:00 GMT

# Subsequent request:
If-Modified-Since: Wed, 09 Oct 2024 10:00:00 GMT

# Server response if unchanged:
304 Not Modified
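Server-side, both conditional mechanisms reduce to "compute a validator, compare it to what the client sent, and skip the body on a match." A sketch for the ETag case (the hash-of-body scheme is one common choice; the ETag format is server-defined):

```python
import hashlib

def make_etag(body: bytes) -> str:
    # Derive the validator from the response body.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def respond(body: bytes, if_none_match=None):
    etag = make_etag(body)
    if if_none_match == etag:
        return 304, None, etag     # unchanged: skip sending the body
    return 200, body, etag
```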

Cache Metrics

Metric                 Formula                                   Target
Hit Rate               hits / (hits + misses)                    > 90%
Miss Rate              misses / (hits + misses)                  < 10%
Eviction Rate          Evictions per second                      Low — may indicate undersized cache
Latency (cache hit)    Time to read from cache                   < 1 ms
Latency (cache miss)   Time to read from DB and populate cache   Application-dependent
Memory Usage           Current cache size / max size             Monitor for capacity planning

Summary

Concept             Key Point
Purpose             Store frequently accessed data in fast storage to reduce latency and DB load
Read strategies     Cache-Aside (most common), Read-Through
Write strategies    Write-Through (consistent), Write-Behind (fast), Write-Around
Eviction            LRU is the go-to default
Invalidation        TTL + event-driven for best results
Distributed cache   Redis (feature-rich) vs Memcached (simple, fast)
Watch out for       Stampede, penetration, avalanche, hot keys
HTTP caching        Use Cache-Control, ETag, Last-Modified headers

Rule of thumb: Cache data that is read frequently, changes infrequently, and is expensive to compute or fetch.