1. Scalability

Scalability is the ability of a system to handle a growing amount of work by adding resources. A scalable system can maintain or improve its level of performance as demand increases.


Why Scalability Matters

  • User growth: Applications must handle increasing user traffic without degradation.
  • Data growth: As data accumulates, read/write performance must remain acceptable.
  • Business continuity: Revenue-generating systems cannot afford downtime under load.
  • Cost efficiency: Resources should scale proportionally — not wastefully — with demand.

Types of Scalability

1. Vertical Scaling (Scale Up)

Adding more power (CPU, RAM, disk, network) to an existing machine.

Before:  [Server: 4 CPU, 16 GB RAM]
After:   [Server: 32 CPU, 256 GB RAM]

Advantages:

  • Simple to implement — no code changes required.
  • No distributed systems complexity.
  • Easier data consistency (single machine).

Disadvantages:

  • Hardware ceiling: There is a physical limit to how much you can scale a single machine.
  • Single point of failure: If that machine goes down, everything goes down.
  • Expensive: High-end hardware costs grow non-linearly.
  • Downtime: Often requires stopping the machine to upgrade.

When to use:

  • Early-stage applications with predictable, moderate load.
  • Databases that are hard to distribute (e.g., traditional RDBMS).
  • Quick fixes when time is limited.

2. Horizontal Scaling (Scale Out)

Adding more machines to the pool of resources.

Before:  [Server A]

After:          [      Load Balancer      ]
                /       |        |        \
        [Server A] [Server B] [Server C] [Server D]

Advantages:

  • Near-infinite scalability: Add more commodity machines.
  • Fault tolerance: If one server fails, others take over.
  • Cost effective: Use cheaper commodity hardware.
  • No downtime: Add or remove nodes without stopping the system.

Disadvantages:

  • Increased system complexity (load balancing, data partitioning, etc.).
  • Data consistency challenges in distributed environments.
  • Network latency between nodes.
  • More complex deployment and monitoring.

When to use:

  • Large-scale web applications (Google, Netflix, Amazon).
  • Stateless services (e.g., REST APIs).
  • Systems with unpredictable or bursty load patterns.

Scalability Dimensions

Dimension                    Description                                        Example
---------                    -----------                                        -------
Load Scalability             Handle increasing request volume                   1K → 1M requests/sec
Geographic Scalability       Serve users across regions                         US, EU, Asia data centers
Administrative Scalability   Multiple organizations can manage the system       Multi-tenant SaaS
Functional Scalability       Add new features without impacting existing ones   Microservices architecture

Key Metrics for Scalability

Throughput

The number of requests a system can handle per unit time.

  • Measured in requests per second (RPS) or transactions per second (TPS).

Latency

The time it takes for a single request to be processed.

  • p50 (median): 50% of requests are faster than this.
  • p95: 95% of requests are faster (tail latency).
  • p99: 99% of requests are faster.
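As a rough illustration (not tied to any particular monitoring tool), these percentiles can be computed from a list of latency samples using the nearest-rank method:

```python
import math

def percentile(samples, p):
    """Return the value below which p percent of samples fall
    (nearest-rank method on a sorted copy)."""
    ordered = sorted(samples)
    k = max(0, math.ceil(len(ordered) * p / 100) - 1)
    return ordered[k]

# Hypothetical latency samples in milliseconds.
latencies_ms = [12, 15, 11, 250, 14, 13, 16, 12, 900, 14]
p50 = percentile(latencies_ms, 50)  # typical request
p95 = percentile(latencies_ms, 95)  # tail request
p99 = percentile(latencies_ms, 99)  # extreme tail
```

Note how the tail percentiles surface the slow outliers (250 ms, 900 ms) that the median completely hides, which is why p95/p99 are the usual targets for latency SLOs.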

Availability

The percentage of time the system is operational.

  • 99.9% (three 9s) = ~8.76 hours downtime/year.
  • 99.99% (four 9s) = ~52.6 minutes downtime/year.
  • 99.999% (five 9s) = ~5.26 minutes downtime/year.
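The downtime figures above follow directly from the availability fraction; a quick back-of-the-envelope check:

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours

def downtime_per_year_hours(availability):
    """Hours of permitted downtime per year at a given availability level."""
    return (1 - availability) * HOURS_PER_YEAR

three_nines_hours = downtime_per_year_hours(0.999)          # ~8.76 hours
four_nines_minutes = downtime_per_year_hours(0.9999) * 60   # ~52.6 minutes
five_nines_minutes = downtime_per_year_hours(0.99999) * 60  # ~5.26 minutes
```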

Capacity

Maximum load a system can handle before performance degrades.


Scaling Strategies

Stateless Architecture

  • Store no session data on application servers.
  • Use external stores (Redis, Memcached, database) for state.
  • Any server can handle any request → easy horizontal scaling.

Client → Load Balancer → Any App Server → Shared State Store (Redis)

Database Scaling

Strategy             Description
--------             -----------
Read Replicas        Multiple copies of the database handle read queries
Sharding             Split data across multiple databases by a key
Denormalization      Reduce joins by storing redundant data
Connection Pooling   Reuse DB connections to reduce overhead
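A minimal sketch of hash-based sharding, assuming a fixed shard count (the names here are illustrative, not from any specific library):

```python
import hashlib

NUM_SHARDS = 4

def shard_for(key: str) -> int:
    """Map a key to a shard deterministically via a stable hash.
    md5 is used here for cross-process stability, not for security."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# The same key always routes to the same shard.
assert shard_for("user:42") == shard_for("user:42")

# With many keys, traffic spreads across all shards.
shards_used = {shard_for(f"user:{i}") for i in range(1000)}
```

The trade-off with simple modulo sharding is that changing NUM_SHARDS remaps almost every key; consistent hashing is the usual mitigation when shards are added and removed frequently.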

Asynchronous Processing

  • Use message queues (Kafka, RabbitMQ, SQS) to decouple work.
  • Long-running tasks are processed in the background.
  • The user gets an immediate response; work completes later.

User Request → API Server → Message Queue → Worker → Database
                   ↓
             Immediate ACK
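The flow above can be sketched in-process with a queue and a background worker, standing in for a real broker like Kafka, RabbitMQ, or SQS:

```python
import queue
import threading

tasks = queue.Queue()
results = []

def worker():
    # Drain the queue; a None sentinel signals shutdown.
    while True:
        job = tasks.get()
        if job is None:
            break
        results.append(f"processed:{job}")  # long-running work would go here
        tasks.task_done()

t = threading.Thread(target=worker)
t.start()

# The "API server" enqueues and returns immediately (the ACK);
# the worker completes the jobs in the background.
for job_id in ("a", "b", "c"):
    tasks.put(job_id)

tasks.put(None)  # shutdown sentinel
t.join()
```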

Caching

  • Store frequently accessed data in memory (Redis, Memcached).
  • Dramatically reduces database load and latency.
  • Can be applied at multiple layers: application, database, CDN.
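A minimal read-through cache with a TTL, sketched with a plain dict (in production, Redis or Memcached plays this role):

```python
import time

_cache = {}  # key -> (value, expiry timestamp)
TTL_SECONDS = 60.0
db_reads = 0  # counts how often we fall through to the "database"

def slow_db_lookup(key):
    """Stand-in for an expensive database query."""
    global db_reads
    db_reads += 1
    return f"value-for-{key}"

def get(key):
    """Read-through: serve from cache if fresh, else load and cache."""
    now = time.monotonic()
    hit = _cache.get(key)
    if hit is not None and hit[1] > now:
        return hit[0]
    value = slow_db_lookup(key)
    _cache[key] = (value, now + TTL_SECONDS)
    return value

get("user:1")  # miss → hits the database
get("user:1")  # hit → served from memory, no DB read
```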

Scaling Patterns

1. Load Balancer Pattern

Distribute requests across multiple servers.
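Round-robin, the simplest distribution policy, can be sketched as:

```python
import itertools

class RoundRobinBalancer:
    """Cycle through a fixed pool of backends, one per request."""
    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def pick(self):
        return next(self._cycle)

lb = RoundRobinBalancer(["server-a", "server-b", "server-c"])
picks = [lb.pick() for _ in range(6)]
# cycles: a, b, c, a, b, c
```

Production load balancers layer health checks, weighting, and policies such as least-connections on top of this basic rotation.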

2. Database Replication Pattern

Primary handles writes; replicas handle reads.

3. Sharding Pattern

Partition data across multiple database nodes.

4. CQRS (Command Query Responsibility Segregation)

Separate read and write models for independent scaling.

5. Event-Driven Architecture

Components communicate through events, enabling loose coupling and independent scaling.


Amdahl's Law

Defines the theoretical speedup when scaling a system:

    S(N) = 1 / ((1 - P) + P / N)

Where:

  • S(N) = speedup with N processors
  • P = fraction of the program that is parallelizable
  • N = number of processors

Key insight: If only 50% of the work is parallelizable, adding infinite processors only gives a 2x speedup. The serial portion becomes the bottleneck.
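Plugging numbers into the formula makes the ceiling concrete:

```python
def amdahl_speedup(p, n):
    """Speedup with parallel fraction p on n processors."""
    return 1.0 / ((1.0 - p) + p / n)

# 50% parallelizable: each added processor buys less and less.
speedups = [round(amdahl_speedup(0.5, n), 2) for n in (1, 2, 8, 1000)]
# approaches the 1 / (1 - p) = 2x ceiling
```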


Universal Scalability Law (USL)

Extends Amdahl's Law by adding coherence overhead, the cost of keeping distributed nodes consistent:

    C(N) = N / (1 + α(N - 1) + βN(N - 1))

Where:

  • C(N) = relative capacity (throughput) with N nodes
  • α = contention (serialization)
  • β = coherence (crosstalk between nodes)

Key insight: Beyond a certain point, adding more nodes decreases throughput due to coordination overhead.
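The retrograde region is easy to see numerically; with illustrative coefficients α = 0.05 and β = 0.001:

```python
def usl_capacity(n, alpha=0.05, beta=0.001):
    """Relative throughput of n nodes under the Universal Scalability Law."""
    return n / (1 + alpha * (n - 1) + beta * n * (n - 1))

throughputs = [usl_capacity(n) for n in (1, 10, 32, 100, 300)]
# throughput rises, peaks, then falls as coordination costs dominate
```

With these (made-up) coefficients the peak sits near N = sqrt((1 - α) / β) ≈ 31 nodes; beyond it, adding hardware reduces total throughput.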


Real-World Scaling Examples

Company     Strategy
-------     --------
Netflix     Microservices + auto-scaling on AWS
Twitter     Fan-out on write for timelines; Redis for caching
Uber        Geospatial sharding; separate services per domain
Google      Custom infrastructure (Bigtable, Spanner, MapReduce)
Instagram   Cassandra for feeds; PostgreSQL for relational data

Common Bottlenecks

  1. Single database: All reads/writes go to one DB.
  2. CPU-bound services: Heavy computation on a single node.
  3. Memory limits: Data exceeds available RAM.
  4. Disk I/O: Slow storage access under heavy write load.
  5. Network bandwidth: Saturated network links between services.
  6. Lock contention: Threads waiting on shared resources.
  7. DNS resolution: Single DNS endpoint with no failover.

Summary

Concept           Vertical Scaling    Horizontal Scaling
-------           ----------------    ------------------
Approach          Bigger machine      More machines
Complexity        Low                 High
Cost curve        Exponential         Linear
Fault tolerance   Low                 High
Ceiling           Hardware limits     Practically unlimited
Downtime risk     High                Low

Rule of thumb: Start with vertical scaling for simplicity. Move to horizontal scaling when you approach the limits of a single machine or need fault tolerance.