13. Replication & Redundancy
Replication is the process of maintaining multiple copies of data across different machines or locations. Redundancy means having backup systems ready to take over if a primary system fails. Both are essential for high availability, fault tolerance, and performance.
Why Replicate?
| Goal | How Replication Helps |
|---|---|
| High availability | If one node fails, others serve traffic |
| Fault tolerance | Data survives hardware failures |
| Read scalability | Multiple replicas handle read queries in parallel |
| Geographic performance | Replicas close to users reduce latency |
| Disaster recovery | Data accessible even if an entire datacenter goes down |
Replication Topologies
1. Single Leader (Primary-Replica / Master-Slave)
One node (leader/primary) handles all writes. Replicas (followers) receive copies of the data and serve reads.
Writes ──→ [Primary/Leader] ──replication──→ [Replica 1] ←── Reads
──replication──→ [Replica 2] ←── Reads
──replication──→ [Replica 3] ←── Reads
Advantages:
- Simple mental model — one source of truth.
- No write conflicts.
- Read scalability by adding replicas.
Disadvantages:
- Primary is a write bottleneck.
- Primary failure requires failover (leader election).
- Replication lag → eventual consistency for reads.
Failover process:
1. Detect primary failure (timeout / health check)
2. Choose a replica to promote (most up-to-date)
3. Reconfigure other replicas to follow new primary
4. Update clients/load balancers to point to new primary
Risks during failover:
- Split-brain: Two nodes think they're the primary.
- Data loss: Unreplicated writes on the old primary.
- Downtime: Brief unavailability during transition.
Examples: PostgreSQL streaming replication, MySQL replication, MongoDB replica sets, Redis Sentinel.
2. Multi-Leader (Master-Master)
Multiple nodes accept writes simultaneously. Each leader replicates to others.
[Leader 1] ←──writes/reads
↕ replication
[Leader 2] ←──writes/reads
↕ replication
[Leader 3] ←──writes/reads
Advantages:
- Write availability — no single write bottleneck.
- Works well for multi-datacenter deployments (one leader per DC).
- Better write latency (write to local leader).
Disadvantages:
- Write conflicts: Same data modified on different leaders simultaneously.
- Complex conflict resolution.
- More complex to operate and debug.
Conflict resolution strategies:
| Strategy | Description |
|----------|-------------|
| Last-writer-wins (LWW) | Timestamp-based; simple but can lose data |
| Custom resolution | Application-specific merge logic |
| CRDTs | Data structures that automatically merge |
| Conflict avoidance | Route all writes for a key to the same leader |
Examples: CouchDB, MySQL Group Replication, Active Directory.
3. Leaderless (Peer-to-Peer)
No single leader — any node can accept reads and writes. Uses quorum-based consistency.
Client ──write──→ [Node 1] ✓
──write──→ [Node 2] ✓ (W=2 of N=3 must ACK)
──write──→ [Node 3] ✗ (this failure is tolerated)
Client ──read──→ [Node 1] → v2
──read──→ [Node 2] → v2 (R=2 of N=3 must respond)
──read──→ [Node 3] → v1 (stale, ignored — majority says v2)
Quorum condition:
Where
Advantages:
- No single point of failure — any node can serve any request.
- Tunable consistency (adjust W and R).
- High availability for writes.
Disadvantages:
- Read repair and anti-entropy needed to fix stale replicas.
- More complex client logic.
- Weaker consistency guarantees.
Anti-entropy mechanisms:
| Mechanism | Description |
|-----------|-------------|
| Read repair | On read, if a stale replica is detected, update it |
| Anti-entropy process | Background process compares replicas and syncs differences |
| Merkle trees | Efficient data structure to detect and sync differences |
Examples: Cassandra, DynamoDB, Riak, Voldemort.
Synchronous vs Asynchronous Replication
Synchronous Replication
The primary waits for replicas to confirm before acknowledging the write.
Client → Primary → Write to Replica 1 → ACK
→ Write to Replica 2 → ACK
Primary → ACK to Client (after ALL/SOME replicas confirm)
| Pros | Cons |
|---|---|
| Strong consistency | Higher write latency |
| No data loss on primary failure | Write availability depends on replicas |
| Guaranteed durable | Slower overall throughput |
Asynchronous Replication
The primary acknowledges the write immediately and replicates in the background.
Client → Primary → ACK to Client (immediate)
Primary → Replicate to Replica 1 (background)
→ Replicate to Replica 2 (background)
| Pros | Cons |
|---|---|
| Low write latency | Replication lag (stale reads) |
| High write throughput | Data loss possible if primary fails before replication |
| Write availability independent of replicas | Eventual consistency |
Semi-Synchronous
A compromise: wait for at least one replica before ACK.
Client → Primary → Write to Replica 1 → ACK ← Wait for this one
→ Write to Replica 2 ← Background (best effort)
Primary → ACK to Client
Used by default in many production setups (e.g., MySQL semi-sync, PostgreSQL synchronous_commit).
Replication Lag
The delay between a write to the primary and its appearance on replicas.
Time →
Primary: WRITE(x=5) ─────────────────────────────
Replica: ──── lag ────── WRITE(x=5) ──
↑
Replication lag (e.g., 100ms)
Problems Caused by Lag
| Problem | Description | Solution |
|---|---|---|
| Stale reads | User writes then reads, but reads from stale replica | Read-your-writes: route reads to primary after writes |
| Monotonic read violations | User sees newer data then older data on next read | Monotonic reads: route user to same replica |
| Causal ordering violations | Reply appears before the original post | Causal consistency: track dependencies |
Reading Your Own Writes
User writes profile → Primary DB
User reads profile → Replica (stale!) → Sees old profile 😤
Fix: After writing, read from primary for a short window (e.g., 10 seconds)
OR: Read from replica only if replica_lag < threshold
Replication Methods
Statement-Based Replication
Replicate the SQL statements themselves.
- Compact and simple.
- Problem: Non-deterministic functions (
NOW(),RAND()) return different values.
Write-Ahead Log (WAL) Shipping
Ship the database's WAL to replicas.
- Byte-level accurate.
- Problem: Tied to storage engine version (can't replicate across versions).
Logical (Row-Based) Replication
Replicate the logical changes (insert/update/delete rows).
- Engine-independent.
- Can replicate between different database versions.
- Used by: MySQL binlog, PostgreSQL logical replication.
Change Data Capture (CDC)
Capture changes from the database log and publish to a stream (e.g., Kafka).
Database → CDC (Debezium) → Kafka → Search Index
→ Cache
→ Analytics DB
Redundancy Patterns
Active-Passive (Hot Standby)
[Active Server] ←── All traffic
↕ heartbeat / replication
[Passive Server] ← Takes over on failure
- Passive server is idle (waste of resources) but ready to take over.
- Failover time: seconds to minutes.
Active-Active
[Server 1] ←── Traffic (50%)
[Server 2] ←── Traffic (50%)
- Both servers handle traffic simultaneously.
- Better resource utilization.
- Need conflict resolution for writes.
N+1 Redundancy
Run
Need 3 servers for peak load → Run 4 servers
If one fails → 3 healthy servers still handle peak load
N+2 Redundancy
Run
Multi-Datacenter Replication
┌──── US-East DC ─────┐ ┌──── EU-West DC ─────┐
│ [Leader] │←────→│ [Leader/Replica] │
│ [Replica] [Replica] │ │ [Replica] [Replica] │
└──────────────────────┘ └──────────────────────┘
↕ ↕
┌──── US-West DC ─────┐ ┌──── APAC DC ─────────┐
│ [Replica] │ │ [Replica] │
│ [Replica] [Replica] │ │ [Replica] [Replica] │
└──────────────────────┘ └──────────────────────┘
Considerations:
- Cross-DC latency: 50-200ms per replication message.
- Asynchronous replication preferred (synchronous too slow across DCs).
- Conflict resolution needed for multi-leader across DCs.
- Data sovereignty: Some data must stay in specific regions.
Summary
| Concept | Key Point |
|---|---|
| Single leader | Simple, no conflicts, but write bottleneck |
| Multi-leader | Multi-DC writes, but conflict resolution needed |
| Leaderless | Highly available, quorum-based consistency |
| Sync replication | Strong consistency, higher latency |
| Async replication | Low latency, eventual consistency, risk of data loss |
| Replication lag | Causes stale reads; mitigate with read-your-writes |
| Active-passive | Standby takes over on failure |
| Active-active | Both serve traffic; better utilization |
Rule of thumb: Use single-leader replication with asynchronous replication for most applications. Use multi-leader for multi-datacenter setups. Use leaderless when high write availability is critical.