# 5. CAP Theorem & Consistency Models
The CAP theorem is a fundamental principle in distributed systems that defines the trade-offs between consistency, availability, and partition tolerance.
## CAP Theorem
Proposed by Eric Brewer in 2000 and proven by Seth Gilbert and Nancy Lynch in 2002, the theorem states that in a distributed data store, it is impossible to simultaneously guarantee all three of the following:
### The Three Guarantees
| Property | Definition |
|---|---|
| Consistency (C) | Every read receives the most recent write or an error. All nodes see the same data at the same time. |
| Availability (A) | Every request receives a (non-error) response, without guarantee that it contains the most recent write. |
| Partition Tolerance (P) | The system continues to operate despite network partitions (communication breakdowns between nodes). |
```
              Consistency
                  /\
                 /  \
                /    \
               / CP   \
              /  zone  \
             /----------\
            / CA      AP \
           /  zone  zone  \
          /________________\
    Availability ──── Partition Tolerance
```
### Why "Pick 2 out of 3" is Misleading
In real-world distributed systems, network partitions are inevitable. You cannot choose to "not have" partition tolerance. Therefore, the real choice is:
During a network partition, do you prioritize Consistency or Availability?
- CP system: Refuses to respond (sacrifices availability) to guarantee consistency during partitions.
- AP system: Responds with potentially stale data (sacrifices consistency) to remain available during partitions.
- CA system: Only possible in a single-node (non-distributed) system where partitions cannot occur.
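The CP-versus-AP choice can be illustrated with a toy replica that, during a partition, either refuses reads (CP) or serves its possibly stale local copy (AP). This is a minimal sketch for illustration only; the class and its behavior are not modeled on any particular database.

```python
class Replica:
    """Toy replica illustrating CP vs AP behavior during a partition."""

    def __init__(self, mode):
        self.mode = mode            # "CP" or "AP" (illustrative labels)
        self.local_value = "stale"  # last value replicated before the partition
        self.partitioned = False

    def read(self):
        if self.partitioned and self.mode == "CP":
            # CP: refuse to answer rather than risk returning stale data.
            raise RuntimeError("unavailable: cannot reach quorum")
        # AP (or a healthy network): answer from the local copy.
        return self.local_value

cp, ap = Replica("CP"), Replica("AP")
cp.partitioned = ap.partitioned = True

print(ap.read())          # AP stays available but may return stale data
try:
    cp.read()
except RuntimeError as e:
    print(e)              # CP sacrifices availability to stay consistent
```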
## CAP in Practice

### CP Systems (Consistency + Partition Tolerance)
When a partition occurs, the system blocks or returns an error rather than serving potentially inconsistent data.
| System | Description |
|---|---|
| ZooKeeper | Leader-based coordination; unavailable during leader election |
| etcd | Raft consensus; writes blocked without quorum |
| HBase | Strong consistency; unavailable if region server is partitioned |
| MongoDB (default) | Primary handles writes; unavailable if primary is unreachable |
| Redis Cluster | Slots become unavailable when master is down |
Use cases: Banking, inventory management, leader election — where stale data is unacceptable.
### AP Systems (Availability + Partition Tolerance)
When a partition occurs, the system continues serving requests but may return stale or conflicting data.
| System | Description |
|---|---|
| Cassandra | Tunable consistency; can favor availability |
| DynamoDB | Eventually consistent reads by default |
| CouchDB | Multi-master replication, conflict resolution |
| Riak | Masterless, vector clocks for conflict resolution |
| DNS | Serves cached records even when authoritative server is unreachable |
Use cases: Social media feeds, shopping carts, DNS — where serving stale data is acceptable.
### CA Systems (Consistency + Availability)
Only possible in systems that don't need partition tolerance — essentially single-node databases.
| System | Description |
|---|---|
| Single-node PostgreSQL | ACID on one machine; no partitions possible |
| Single-node MySQL | Same as above |
| SQLite | Embedded, single-process database |
## PACELC Theorem

An extension of CAP by Daniel Abadi (2012):

If there is a Partition (P), choose between Availability (A) and Consistency (C); Else (E), when the system is running normally, choose between Latency (L) and Consistency (C).

```
PACELC: if (Partition) then {A or C} else {L or C}
```
| System | Partition? → | Else (Normal) → | Classification |
|---|---|---|---|
| DynamoDB | A (Available) | L (Low latency) | PA/EL |
| Cassandra | A (Available) | L (Low latency) | PA/EL |
| MongoDB | C (Consistent) | C (Consistent) | PC/EC |
| PNUTS (Yahoo) | C (Consistent) | L (Low latency) | PC/EL |
| VoltDB | C (Consistent) | C (Consistent) | PC/EC |
Key insight: Even when there's no partition, there's a trade-off between latency and consistency in distributed systems.
## Consistency Models

### Strong Consistency (Linearizability)
Every read returns the value of the most recent completed write. All nodes appear as a single, atomic data store.
```
Time →
Writer:   WRITE(x=1) ───────── WRITE(x=2)
Reader A:     READ(x) → 1
Reader B:                          READ(x) → 2
```
- All readers see writes in the same order.
- After a write completes, all subsequent reads return the new value.
- Implementation: Consensus protocols (Paxos, Raft), synchronous replication.
- Cost: Higher latency, lower throughput.
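The synchronous-replication approach can be sketched as a store that acknowledges a write only after every replica has applied it, so any later read from any replica returns the new value. This is a single-threaded toy (no real network, failures, or consensus protocol); names like `SyncReplicatedStore` are invented for illustration.

```python
class SyncReplicatedStore:
    """Sketch: a write completes only after ALL replicas apply it,
    so any subsequent read from any replica sees the new value."""

    def __init__(self, n_replicas=3):
        self.replicas = [dict() for _ in range(n_replicas)]

    def write(self, key, value):
        # Waiting for every replica is the source of the higher
        # latency mentioned above.
        for replica in self.replicas:
            replica[key] = value
        return "ack"

    def read(self, key, replica_id=0):
        # Any replica may serve the read and still return the latest write.
        return self.replicas[replica_id].get(key)

store = SyncReplicatedStore()
store.write("x", 2)
assert all(store.read("x", i) == 2 for i in range(3))
```

Real systems get the same guarantee under failures by running a consensus protocol such as Paxos or Raft rather than naively writing to every node.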
### Sequential Consistency
All operations appear to execute in some sequential order, and each individual process's operations appear in the order they were issued.
- Weaker than linearizability: The total order doesn't need to respect real-time ordering.
- Useful for multi-processor systems.
### Causal Consistency
Operations that are causally related are seen by all nodes in the same order. Concurrent operations may be seen in different orders.
```
A posts:   "Anyone want pizza?"   ← Event 1
B replies: "I do!"                ← Event 2 (caused by Event 1)

All nodes must see Event 1 before Event 2.
But an unrelated Event 3 by C can appear anywhere.
```
- Implementation: Vector clocks, version vectors.
- Use case: Social media (replies must appear after the post).
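One way to enforce this is causal delivery: a node buffers an incoming message until everything it causally depends on has been delivered. A minimal sketch, assuming each message carries the ids of its dependencies (the `deps` field is an invented representation, simpler than real vector-clock metadata):

```python
def causal_deliver(incoming):
    """Deliver messages only after all their causal dependencies."""
    delivered, buffered = [], list(incoming)
    progress = True
    while buffered and progress:
        progress = False
        for msg in list(buffered):
            done = {m["id"] for m in delivered}
            if all(dep in done for dep in msg["deps"]):
                delivered.append(msg)   # dependencies satisfied → deliver
                buffered.remove(msg)
                progress = True
    return [m["id"] for m in delivered]

# The reply arrives first, but is held back until the post is delivered:
events = [
    {"id": "reply", "deps": ["post"]},   # B: "I do!"
    {"id": "post", "deps": []},          # A: "Anyone want pizza?"
    {"id": "unrelated", "deps": []},     # C: independent event
]
order = causal_deliver(events)
assert order.index("post") < order.index("reply")
```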
### Eventual Consistency
If no new writes are made, all replicas will eventually converge to the same value. No guarantee on when.
```
Time →
Writer:   WRITE(x=5)
Replica1: x=5 (immediately)
Replica2: x=3 (stale) ──→ x=5 (after replication)
Replica3: x=3 (stale) ──→ x=5 (after replication)
```
- Convergence time: Typically milliseconds to seconds.
- Read-your-writes: A session guarantee where a client always sees its own writes.
- Monotonic reads: A client never sees an older value after seeing a newer one.
- Implementation: Conflict resolution (last-writer-wins, CRDTs, application-level merge).
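The simplest of these resolution strategies, last-writer-wins, can be sketched in a few lines: each replica keeps a (value, timestamp) pair, and merging always keeps the pair with the higher timestamp, so repeated pairwise merges converge every replica to the newest write.

```python
def lww_merge(a, b):
    """Last-writer-wins: keep the (value, timestamp) pair written later."""
    return a if a[1] >= b[1] else b

r1 = ("x=5", 100)   # newest write
r2 = ("x=3", 90)    # stale replica
r3 = ("x=3", 90)    # stale replica

# Anti-entropy: pairwise merges converge all replicas to the newest pair.
r2 = lww_merge(r2, r1)
r3 = lww_merge(r3, r2)
assert r1 == r2 == r3 == ("x=5", 100)
```

Note the caveat from above: if two writes carry the same timestamp region, one of them is silently discarded, which is why LWW can lose data.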
### Read-Your-Writes Consistency
A client always sees the effects of its own writes.
```
Client A: WRITE(name="Alice") → READ(name) → "Alice" ✅ (guaranteed)
Client B: READ(name) → may see old value (not guaranteed)
```
### Monotonic Read Consistency
Once a client reads a value, it will never see an older value in subsequent reads.
```
Client: READ(x) → 5 ... READ(x) → 5 or a newer value (never the older 3) ✅
```
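Both session guarantees can be provided by the same mechanism: the client remembers the highest version it has written or read, and refuses answers from replicas that have not caught up to that version. A sketch under the assumption that every replica exposes a monotonically increasing integer version (the `SessionClient` class and the dict-based replica format are invented for illustration):

```python
class SessionClient:
    """Tracks the highest version this session has seen; rejecting
    replicas behind that version yields both read-your-writes and
    monotonic reads."""

    def __init__(self):
        self.min_version = 0  # highest version written or read so far

    def read(self, replica):
        if replica["version"] < self.min_version:
            # Serving this would show an older value → violate the session
            # guarantees, so the client would retry another replica instead.
            raise LookupError("replica too stale for this session")
        self.min_version = replica["version"]
        return replica["value"]

client = SessionClient()
fresh = {"value": 5, "version": 10}
stale = {"value": 3, "version": 7}

assert client.read(fresh) == 5   # session is now pinned at version 10
try:
    client.read(stale)           # older value → refused
except LookupError:
    pass
```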
### Consistency Model Comparison
| Model | Strength | Latency | Use Case |
|---|---|---|---|
| Strong (Linearizable) | Strongest | Highest | Banking, leader election |
| Sequential | Strong | High | Multi-processor cache coherence |
| Causal | Moderate | Moderate | Social media, collaborative editing |
| Eventual | Weakest | Lowest | DNS, shopping carts, CDN |
| Read-Your-Writes | Session-level | Low | User profile updates |
```
Strongest ◄────────────────────────────────────────► Weakest
Linearizable → Sequential → Causal → Eventual
Highest Latency                        Lowest Latency
```
## Conflict Resolution
When concurrent writes occur in an eventually consistent system, conflicts must be resolved.
### Last-Writer-Wins (LWW)
The write with the latest timestamp wins. Simple but can lose data.
### Vector Clocks
Each node maintains a vector of counters to track causality.
```
Node A: {A:1, B:0} writes x=1
Node B: {A:0, B:1} writes x=2
Conflict: {A:1, B:0} and {A:0, B:1} are concurrent → resolve
```
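The comparison rule is standard: clock A happens-before clock B if every counter in A is ≤ the corresponding counter in B (and at least one is strictly less); if neither dominates, the writes are concurrent and must be resolved. A minimal sketch:

```python
def compare(vc_a, vc_b):
    """Compare two vector clocks (dicts of node → counter)."""
    nodes = set(vc_a) | set(vc_b)
    a_le_b = all(vc_a.get(n, 0) <= vc_b.get(n, 0) for n in nodes)
    b_le_a = all(vc_b.get(n, 0) <= vc_a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "a happens-before b"
    if b_le_a:
        return "b happens-before a"
    return "concurrent"  # neither dominates → conflict to resolve

# The example above: neither write dominates, so they conflict.
assert compare({"A": 1, "B": 0}, {"A": 0, "B": 1}) == "concurrent"
# If B had seen A's write first, causality would be clear instead:
assert compare({"A": 1, "B": 0}, {"A": 1, "B": 1}) == "a happens-before b"
```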
### CRDTs (Conflict-Free Replicated Data Types)
Data structures designed to automatically merge without conflicts.
| CRDT Type | Description | Example |
|---|---|---|
| G-Counter | Grow-only counter | Like count |
| PN-Counter | Positive-negative counter | Upvote/downvote |
| G-Set | Grow-only set | Tags added to a post |
| OR-Set | Observed-remove set | Shopping cart items |
| LWW-Register | Last-writer-wins register | User profile field |
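The G-Counter from the table is the simplest CRDT to sketch: each node increments only its own slot, and merge takes the element-wise maximum, so merges commute and every replica converges to the same total regardless of merge order. An illustrative toy implementation:

```python
class GCounter:
    """Grow-only counter CRDT: per-node counters, merge = element-wise max."""

    def __init__(self, node_ids):
        self.counts = {n: 0 for n in node_ids}

    def increment(self, node_id, amount=1):
        # A node only ever increments its own slot.
        self.counts[node_id] += amount

    def merge(self, other):
        # Element-wise max is commutative, associative, and idempotent,
        # which is exactly why no conflict resolution is needed.
        for n, c in other.counts.items():
            self.counts[n] = max(self.counts.get(n, 0), c)

    def value(self):
        return sum(self.counts.values())

a = GCounter(["A", "B"])
b = GCounter(["A", "B"])
a.increment("A", 3)   # e.g., 3 likes recorded on node A
b.increment("B", 2)   # 2 likes recorded on node B
a.merge(b)
b.merge(a)
assert a.value() == b.value() == 5  # both replicas converge
```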
### Application-Level Resolution
The application presents conflicting versions to the user (e.g., Amazon's shopping cart "merge all items" approach).
## Quorum Consistency

In replicated systems, consistency can be tuned using quorums. Reads are guaranteed to see the latest write when the read and write quorums overlap:

```
W + R > N
```

Where:
- N = Total number of replicas
- W = Number of replicas that must acknowledge a write
- R = Number of replicas that must respond to a read

| Configuration | Consistency | Availability |
|---|---|---|
| W = N, R = 1 | Strong consistency for reads | Writes fail if any node is down |
| W = 1, R = N | Strong consistency for writes | Reads fail if any node is down |
| W = R = majority | Balanced (majority quorum) | Tolerates minority failures |
| W = 1, R = 1 | Eventual consistency | Highest availability |
Example with N=3:
- W=2, R=2: Strong consistency (W + R = 4 > 3), tolerates 1 node failure.
- W=1, R=1: Eventual consistency, tolerates 2 node failures.
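The quorum mechanics can be sketched as follows, assuming each replica stores a (value, version) pair and a read returns the value with the highest version among the R responses. `QuorumStore` and its `reachable` parameter (which replicas the client can currently contact) are invented for illustration; real systems like Cassandra also handle versioning and repair very differently.

```python
class QuorumStore:
    """Toy quorum store: writes need W acks, reads need R responses."""

    def __init__(self, n=3, w=2, r=2):
        self.n, self.w, self.r = n, w, r
        self.replicas = [{"value": None, "version": 0} for _ in range(n)]

    def write(self, value, reachable):
        if len(reachable) < self.w:
            raise RuntimeError("write failed: quorum not reached")
        version = max(rep["version"] for rep in self.replicas) + 1
        for i in reachable[: self.w]:
            self.replicas[i] = {"value": value, "version": version}

    def read(self, reachable):
        if len(reachable) < self.r:
            raise RuntimeError("read failed: quorum not reached")
        responses = [self.replicas[i] for i in reachable[: self.r]]
        # With W + R > N, some response must hold the latest version.
        return max(responses, key=lambda rep: rep["version"])["value"]

store = QuorumStore(n=3, w=2, r=2)       # W + R = 4 > N = 3 → strong
store.write("x=1", reachable=[0, 1])     # node 2 is down during the write
assert store.read(reachable=[1, 2]) == "x=1"  # overlap guarantees freshness
```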
## Summary
| Concept | Key Point |
|---|---|
| CAP Theorem | During partitions, choose consistency OR availability |
| Real choice | CP (block on partition) vs AP (serve stale) |
| PACELC | Even without partitions: latency vs consistency trade-off |
| Strong consistency | Safest but slowest — for critical data |
| Eventual consistency | Fastest but may serve stale — for non-critical data |
| Quorums | Tunable consistency via W + R > N |
| Conflict resolution | LWW (simple), Vector Clocks (accurate), CRDTs (automatic) |
Rule of thumb: Use strong consistency for financial/critical data. Use eventual consistency for social feeds, analytics, and caches. Most systems use a mix of both.