5. CAP Theorem & Consistency Models

The CAP theorem is a fundamental principle in distributed systems that defines the trade-offs between consistency, availability, and partition tolerance.


CAP Theorem

Proposed by Eric Brewer in 2000 and proven by Seth Gilbert and Nancy Lynch in 2002.

In a distributed data store, it is impossible to simultaneously guarantee all three of the following:

The Three Guarantees

| Property | Definition |
|---|---|
| Consistency (C) | Every read receives the most recent write or an error. All nodes see the same data at the same time. |
| Availability (A) | Every request receives a (non-error) response, without guarantee that it contains the most recent write. |
| Partition Tolerance (P) | The system continues to operate despite network partitions (communication breakdowns between nodes). |
              Consistency
                 / \
                /   \
             CA/     \CP
              /       \
             /         \
            /___________\
    Availability ──AP── Partition Tolerance

Why "Pick 2 out of 3" is Misleading

In real-world distributed systems, network partitions are inevitable. You cannot choose to "not have" partition tolerance. Therefore, the real choice is:

During a network partition, do you prioritize Consistency or Availability?

  • CP system: Refuses to respond (sacrifices availability) to guarantee consistency during partitions.
  • AP system: Responds with potentially stale data (sacrifices consistency) to remain available during partitions.
  • CA system: Only possible in a single-node (non-distributed) system where partitions cannot occur.

CAP in Practice

CP Systems (Consistency + Partition Tolerance)

When a partition occurs, the system blocks or returns an error rather than serving potentially inconsistent data.

| System | Description |
|---|---|
| ZooKeeper | Leader-based coordination; unavailable during leader election |
| etcd | Raft consensus; writes blocked without quorum |
| HBase | Strong consistency; unavailable if a region server is partitioned |
| MongoDB (default) | Primary handles writes; unavailable if the primary is unreachable |
| Redis Cluster | Slots become unavailable when a master is down |

Use cases: Banking, inventory management, leader election — where stale data is unacceptable.

AP Systems (Availability + Partition Tolerance)

When a partition occurs, the system continues serving requests but may return stale or conflicting data.

| System | Description |
|---|---|
| Cassandra | Tunable consistency; can favor availability |
| DynamoDB | Eventually consistent reads by default |
| CouchDB | Multi-master replication, conflict resolution |
| Riak | Masterless, vector clocks for conflict resolution |
| DNS | Serves cached records even when the authoritative server is unreachable |

Use cases: Social media feeds, shopping carts, DNS — where serving stale data is acceptable.

CA Systems (Consistency + Availability)

Only possible in systems that don't need partition tolerance — essentially single-node databases.

| System | Description |
|---|---|
| Single-node PostgreSQL | ACID on one machine; no partitions possible |
| Single-node MySQL | Same as above |
| SQLite | Embedded, single-process database |

PACELC Theorem

An extension of CAP by Daniel Abadi (2012):

If there is a Partition (P), choose between Availability (A) and Consistency (C); Else (E), when the system is running normally, choose between Latency (L) and Consistency (C).

PACELC: if (Partition) then {A or C} else {L or C}
| System | If Partition (P) | Else (Normal Operation) | Classification |
|---|---|---|---|
| DynamoDB | A (Available) | L (Low latency) | PA/EL |
| Cassandra | A (Available) | L (Low latency) | PA/EL |
| MongoDB | C (Consistent) | C (Consistent) | PC/EC |
| PNUTS (Yahoo) | C (Consistent) | L (Low latency) | PC/EL |
| VoltDB | C (Consistent) | C (Consistent) | PC/EC |

Key insight: Even when there's no partition, there's a trade-off between latency and consistency in distributed systems.


Consistency Models

Strong Consistency (Linearizability)

Every read returns the value of the most recent completed write. All nodes appear as a single, atomic data store.

Time →
Writer:  WRITE(x=1) ───────── WRITE(x=2)
Reader A:            READ(x) → 1
Reader B:                              READ(x) → 2
  • All readers see writes in the same order.
  • After a write completes, all subsequent reads return the new value.
  • Implementation: Consensus protocols (Paxos, Raft), synchronous replication.
  • Cost: Higher latency, lower throughput.
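The strong-consistency guarantee can be sketched with synchronous (all-replica) replication: a write is acknowledged only after every replica has applied it, so any subsequent read from any replica returns the new value. This is a minimal illustrative sketch; the class and method names are hypothetical, not from any real system.

```python
class Replica:
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value

    def read(self, key):
        return self.data.get(key)

class SyncReplicatedStore:
    """Synchronous replication: strong consistency, but a write blocks
    or fails if any replica is unreachable (the CP trade-off)."""
    def __init__(self, replicas):
        self.replicas = replicas

    def write(self, key, value):
        for r in self.replicas:        # must reach ALL replicas;
            r.apply(key, value)        # in practice, an RPC that may time out
        return "ack"                   # ack only after every replica applied

    def read(self, key):
        return self.replicas[0].read(key)  # any replica is up to date

store = SyncReplicatedStore([Replica(), Replica(), Replica()])
store.write("x", 2)
print(store.read("x"))  # 2
```

The cost is visible in the write path: latency is bounded by the slowest replica, and one unreachable replica stalls all writes.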

Sequential Consistency

All operations appear to execute in some sequential order, and each individual process's operations appear in the order they were issued.

  • Weaker than linearizability: The total order doesn't need to respect real-time ordering.
  • Useful for multi-processor systems.

Causal Consistency

Operations that are causally related are seen by all nodes in the same order. Concurrent operations may be seen in different orders.

A posts: "Anyone want pizza?"      ← Event 1
B replies: "I do!"                 ← Event 2 (caused by Event 1)

All nodes must see Event 1 before Event 2.
But an unrelated Event 3 by C can appear anywhere.
  • Implementation: Vector clocks, version vectors.
  • Use case: Social media (replies must appear after the post).
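Causal delivery can be sketched as a buffer that holds an incoming event until everything it causally depends on has been delivered. This is an illustrative sketch with explicit dependency lists standing in for vector-clock metadata; the function name and event IDs are hypothetical.

```python
def causal_deliver(events):
    """events: list of (event_id, [dependency_ids]) in arrival order.
    Returns a delivery order that respects causal dependencies."""
    delivered, buffered, order = set(), list(events), []
    progress = True
    while buffered and progress:
        progress = False
        for ev in list(buffered):
            ev_id, deps = ev
            if all(d in delivered for d in deps):  # causal deps satisfied?
                delivered.add(ev_id)
                order.append(ev_id)
                buffered.remove(ev)
                progress = True
    return order

# Event 2 (the reply) depends on Event 1 (the post); Event 3 is unrelated.
order = causal_deliver([("e2", ["e1"]), ("e3", []), ("e1", [])])
print(order)  # e1 always precedes e2; e3 may appear anywhere
```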

Eventual Consistency

If no new writes are made, all replicas will eventually converge to the same value. No guarantee on when.

Time →
Writer:   WRITE(x=5)
Replica1: x=5 (immediately)
Replica2: x=3 (stale) ──→ x=5 (after replication)
Replica3: x=3 (stale) ──→ x=5 (after replication)
  • Convergence time: Typically milliseconds to seconds.
  • Read-your-writes: A session guarantee where a client always sees its own writes.
  • Monotonic reads: A client never sees an older value after seeing a newer one.
  • Implementation: Conflict resolution (last-writer-wins, CRDTs, application-level merge).

Read-Your-Writes Consistency

A client always sees the effects of its own writes.

Client A: WRITE(name="Alice") → READ(name) → "Alice"  ✅ (guaranteed)
Client B: READ(name) → may see old value  (not guaranteed)

Monotonic Read Consistency

Once a client reads a value, it will never see an older value in subsequent reads.

Client: READ(x) → 5 → READ(x) → 5 or higher (never 3)  ✅
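Both session guarantees can be implemented by having the client remember the highest version it has written or read, and rejecting reads from replicas that are older than that. A minimal sketch, with illustrative class names:

```python
class VersionedReplica:
    def __init__(self, version=0, value=None):
        self.version, self.value = version, value

class Session:
    def __init__(self):
        self.min_version = 0  # highest version this client has written or read

    def write(self, replica, value, new_version):
        replica.version, replica.value = new_version, value
        self.min_version = max(self.min_version, new_version)

    def read(self, replica):
        # Read-your-writes + monotonic reads: refuse any replica that is
        # older than something this session has already seen.
        if replica.version < self.min_version:
            raise RuntimeError("replica too stale for this session; retry elsewhere")
        self.min_version = max(self.min_version, replica.version)
        return replica.value

fresh, stale = VersionedReplica(), VersionedReplica()
s = Session()
s.write(fresh, "Alice", new_version=1)
print(s.read(fresh))   # "Alice" — read-your-writes holds
# s.read(stale) would raise: the stale replica (version 0) cannot serve this session
```

Note that the guarantee is per-session: another client with a fresh session could still read the stale replica.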

Consistency Model Comparison

| Model | Strength | Latency | Use Case |
|---|---|---|---|
| Strong (Linearizable) | Strongest | Highest | Banking, leader election |
| Sequential | Strong | High | Multi-processor cache coherence |
| Causal | Moderate | Moderate | Social media, collaborative editing |
| Eventual | Weakest | Lowest | DNS, shopping carts, CDN |
| Read-Your-Writes | Session-level | Low | User profile updates |
Strongest ◄────────────────────────────────────────► Weakest
Linearizable → Sequential → Causal → Eventual
Highest Latency                      Lowest Latency

Conflict Resolution

When concurrent writes occur in an eventually consistent system, conflicts must be resolved.

Last-Writer-Wins (LWW)

The write with the latest timestamp wins. Simple but can lose data.
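A minimal LWW merge keeps the (value, timestamp) pair with the larger timestamp; the losing write is silently discarded, which is exactly the data-loss risk:

```python
def lww_merge(a, b):
    """a, b: (value, timestamp) tuples from two replicas.
    Returns the surviving pair — the other write is lost."""
    return a if a[1] >= b[1] else b

print(lww_merge(("cart=[book]", 100), ("cart=[pen]", 105)))
# ('cart=[pen]', 105) — the write at t=100 is discarded
```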

Vector Clocks

Each node maintains a vector of counters to track causality.

Node A: {A:1, B:0}  writes x=1
Node B: {A:0, B:1}  writes x=2
Conflict: {A:1, B:0} and {A:0, B:1} are concurrent → resolve
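The comparison above can be sketched as a function: clock A "happened before" B iff every counter in A is ≤ the corresponding counter in B and the clocks differ; if neither dominates, the writes are concurrent and need resolution.

```python
def compare(a, b):
    """a, b: dicts mapping node id -> counter.
    Returns 'a->b', 'b->a', 'equal', or 'concurrent'."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "a->b"        # a happened before b
    if b_le_a:
        return "b->a"
    return "concurrent"      # conflict: must be resolved

print(compare({"A": 1, "B": 0}, {"A": 0, "B": 1}))  # concurrent
print(compare({"A": 1, "B": 0}, {"A": 1, "B": 1}))  # a->b
```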

CRDTs (Conflict-Free Replicated Data Types)

Data structures designed to automatically merge without conflicts.

| CRDT Type | Description | Example |
|---|---|---|
| G-Counter | Grow-only counter | Like count |
| PN-Counter | Positive-negative counter | Upvote/downvote |
| G-Set | Grow-only set | Tags added to a post |
| OR-Set | Observed-remove set | Shopping cart items |
| LWW-Register | Last-writer-wins register | User profile field |
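The G-Counter is the simplest example: each node increments only its own slot, and merge takes the element-wise maximum, so replicas converge regardless of the order in which they exchange state. A minimal sketch:

```python
class GCounter:
    """Grow-only counter CRDT: per-node increment slots, merge = max."""
    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}              # node id -> increments seen from it

    def increment(self, n=1):
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + n

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Element-wise max: commutative, associative, idempotent,
        # so any gossip order converges to the same state.
        for node, c in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), c)

a, b = GCounter("A"), GCounter("B")
a.increment(3)       # 3 likes recorded at node A
b.increment(2)       # 2 likes recorded at node B
a.merge(b)
b.merge(a)
print(a.value(), b.value())  # 5 5 — both replicas converge
```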

Application-Level Resolution

The application presents conflicting versions to the user (e.g., Amazon's shopping cart "merge all items" approach).


Quorum Consistency

In replicated systems, consistency can be tuned using quorums. A read is guaranteed to overlap the most recent write when W + R > N.

Where:

  • N = Total number of replicas
  • W = Number of replicas that must acknowledge a write
  • R = Number of replicas that must respond to a read

| Configuration | Consistency | Availability |
|---|---|---|
| W = N, R = 1 | Strong consistency for reads | Writes fail if any node is down |
| W = 1, R = N | Strong consistency for writes | Reads fail if any node is down |
| W = R = majority (⌈(N+1)/2⌉) | Balanced (majority quorum) | Tolerates minority failures |
| W = 1, R = 1 | Eventual consistency | Highest availability |

Example with N = 3:

  • W = 2, R = 2: Strong consistency (W + R = 4 > 3), tolerates 1 node failure.
  • W = 1, R = 1: Eventual consistency, tolerates 2 node failures.
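The quorum rule is mechanical enough to check in code: with N replicas, any read set of size R must intersect any write set of size W whenever W + R > N. A small illustrative sketch (the function name is ours, not from any database API):

```python
def quorum_consistency(n, w, r):
    """Return 'strong' if every read quorum must overlap the latest
    write quorum (W + R > N), otherwise 'eventual'."""
    return "strong" if w + r > n else "eventual"

for w, r in [(3, 1), (1, 3), (2, 2), (1, 1)]:
    print(f"N=3, W={w}, R={r} -> {quorum_consistency(3, w, r)}")
# (3,1), (1,3), (2,2) satisfy W+R > 3 -> strong; (1,1) does not -> eventual
```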

Summary

| Concept | Key Point |
|---|---|
| CAP Theorem | During partitions, choose consistency OR availability |
| Real choice | CP (block on partition) vs AP (serve stale) |
| PACELC | Even without partitions: latency vs consistency trade-off |
| Strong consistency | Safest but slowest — for critical data |
| Eventual consistency | Fastest but may serve stale — for non-critical data |
| Quorums | Tunable consistency via W + R > N |
| Conflict resolution | LWW (simple), vector clocks (accurate), CRDTs (automatic) |

Rule of thumb: Use strong consistency for financial/critical data. Use eventual consistency for social feeds, analytics, and caches. Most systems use a mix of both.