1. Scalability

Ability of a system to handle a growing amount of work by adding resources. A scalable system can maintain or improve its level of performance as demand increases.

Why Scalability?

  • User growth: Applications must handle increasing user traffic without degradation.
  • Data growth: As data accumulates, read/write performance must remain acceptable.
  • Business continuity: Revenue generating systems cannot afford downtime under load.
  • Cost efficiency: Resources should scale proportionally with demand, not wastefully.

1. Types of Scalability

1a. Vertical Scaling (Scale Up/Down)

Adding more power (CPU, RAM, disk, network) to an existing machine.

Scale Down

Scale Up

After (1 Instances)

Node 1
(8 CPU, 16GB RAM)

Before (1 Instance)

Node 1
(2 CPU, 4GB RAM)

Advantages:

  • Simple to implement. No code changes required.
  • No distributed systems complexity.
  • Easier data consistency due to single machine.

Disadvantages:

  • Hardware ceiling: There is a physical limit to how much you can scale a single machine.
  • Single point of failure: If that machine goes down, everything goes down.
  • Expensive: High-end hardware costs grow non-linearly.
  • Downtime: Often requires stopping the machine to upgrade.

When to use:

  • Early-stage applications with predictable, moderate load.
  • Databases that are hard to distribute (e.g., traditional RDBMS).
  • Quick fixes when time is limited.

1b. Horizontal Scaling (Scale In/Out)

Adding more machines to the pool of resources.

Scale In

Scale Out

After (3 Instances)

Node 1
(2 CPU, 4GB RAM)

Node 2
(2 CPU, 4GB RAM)

Node 3
(2 CPU, 4GB RAM)

Before (1 Instance)

Node 1
(2 CPU, 4GB RAM)

Advantages:

  • Near-infinite scalability: Add more commodity machines.
  • Fault tolerance: If one server fails, others take over.
  • Cost effective: Use cheaper commodity hardware.
  • No downtime: Add or remove nodes without stopping the system.

Disadvantages:

  • Increased system complexity (load balancing, data partitioning, etc.).
  • Data consistency challenges in distributed environments.
  • Network latency between nodes.
  • More complex deployment and monitoring.

When to use:

  • Large-scale web applications (Google, Netflix, Amazon).
  • Stateless services (e.g., REST APIs).
  • Systems with unpredictable or bursty load patterns.

2. Dimensions of Scalability

Dimension Description Example
Load Scalability Handle increasing request volume 1K → 1M requests/sec
Geography Scalability Serve users across regions US, EU data centers
Administration Scalability Multiple organizations can manage the system Multi-tenant SaaS
Functional Scalability Add new features without impacting others Microservices architecture

3. Key Metrics of Scalability

3a. Throughput

The number of requests a system can handle per unit time.

  • Measured in requests per second (RPS) or transactions per second (TPS).

3b. Latency

The time it takes for a single request to be processed.

  • p50 (median): 50% of requests are faster than this.
  • p95: 95% of requests are faster (tail latency).
  • p99: 99% of requests are faster.

3c. Availability

The percentage of time the system is operational.

  • 99.9% (three 9s) = ~8.76 hours downtime/year.
  • 99.99% (four 9s) = ~52.6 minutes downtime/year.
  • 99.999% (five 9s) = ~5.26 minutes downtime/year.

3d. Capacity

Maximum load a system can handle before performance degrades.


4. Scaling Strategies

4a. Stateless Architecture

  • Store no session data on application servers.
  • Use external stores (Redis, Memcached, database) for state.
  • Any server can handle any request → easy horizontal scaling.

App Services (Stateless)

Client

Load Balancer

App Server 1

App Server 2

App Server N

Shared State Store
(Redis)

4b. Database Scaling

Strategy Description
Read Replicas Multiple copies of the database handle read queries
Sharding Split data across multiple databases by a key
Denormalization Reduce joins by storing redundant data
Connection Pooling Reuse DB connections to reduce overhead

4c. Asynchronous Processing

  • Use message queues (Kafka, RabbitMQ, SQS) to decouple work.
  • Long-running tasks are processed in the background.
  • The user gets an immediate response; work completes later.
DatabaseBackground WorkerMessage QueueAPI ServerDatabaseBackground WorkerMessage QueueAPI ServerAsynchronous Processing BeginsUser1. Submit Request2. Enqueue Task3. Immediate ACK (Status: Processing)4. Deliver Task5. Process Long-Running Task6. Save Final ResultUser

4d. Caching

  • Store frequently accessed data in memory (Redis, Memcached).
  • Dramatically reduces database load and latency.
  • Can be applied at multiple layers: application, database, CDN.

5. Scaling Patterns

5a. Load Balancer Pattern

Distribute requests across multiple servers.

Web / App Servers

Clients

Load Balancer

Server 1

Server 2

Server 3

5b. Database Replication Pattern

Primary handles writes; replicas handle reads.

Writes

Reads

Reads

Reads

Async Replication

Async Replication

Async Replication

Application Server

Primary DB

Replica DB

Replica DB

Replica DB

5c. Sharding Pattern

Partition data across multiple database nodes.

Users A-J

Users K-S

Users T-Z

Application

Shard Router

S1
User Group 1

S2
User Group 2

S3
User Group 3

5d. CQRS (Command Query Responsibility Segregation)

Separate read and write models for independent scaling.

Read Model

Write Model

Command
(Create/Update)

Query
(Read/Fetch)

Event Sync / Update

Client

Write API

Read API

Write Database

Read Database

5e. Event-Driven Architecture

Components communicate through events, enabling loose coupling and independent scaling.

Independent Services

Publishes Event
'OrderPlaced'

Consumes Event

Consumes Event

Consumes Event

Order Service
(Producer)

Event Broker
Kafka / RabbitMQ

Inventory Service

Billing Service

Notification Service

6. Amdahl's Law

Defines the theoretical speedup when scaling a system:

Where:

  • = speedup with processors.
  • = fraction of the program that is parallelizable.
  • = number of processors.

Key insight: If only 50% of the work is parallelizable, adding infinite processors only gives a 2x speedup. The serial portion becomes the bottleneck.

7. Universal Scalability Law (USL)

Extends Amdahl's Law by adding coherence overhead — the cost of keeping distributed nodes consistent:

Where:

  • = contention (serialization)
  • = coherence (crosstalk between nodes)

Key insight: Beyond a certain point, adding more nodes decreases throughput due to coordination overhead.

8. Real-World Scaling Examples

Company Strategy
Netflix Microservices + auto-scaling on AWS
Twitter Fan-out on write for timelines; Redis for caching
Uber Geospatial sharding; separate services per domain
Google Custom infrastructure (Bigtable, Spanner, MapReduce)
Instagram Cassandra for feeds; PostgreSQL for relational data

9. Common Bottlenecks

  1. Single database: All reads/writes go to one DB.
  2. CPU-bound services: Heavy computation on a single node.
  3. Memory limits: Data exceeds available RAM.
  4. Disk I/O: Slow storage access under heavy write load.
  5. Network bandwidth: Saturated network links between services.
  6. Lock contention: Threads waiting on shared resources.
  7. DNS resolution: Single DNS endpoint with no failover.

Summary

Concept Vertical Scaling Horizontal Scaling
Approach Bigger machine More machines
Complexity Low High
Cost curve Exponential Linear
Fault tolerance Low High
Ceiling Hardware limits Practically unlimited
Downtime risk High Low

Rule of thumb: Start with vertical scaling for simplicity. Move to horizontal scaling when you approach the limits of a single machine or need fault tolerance.