1. Scalability
Ability of a system to handle a growing amount of work by adding resources. A scalable system can maintain or improve its level of performance as demand increases.
Why Scalability?
- User growth: Applications must handle increasing user traffic without degradation.
- Data growth: As data accumulates, read/write performance must remain acceptable.
- Business continuity: Revenue generating systems cannot afford downtime under load.
- Cost efficiency: Resources should scale proportionally with demand, not wastefully.
1. Types of Scalability
1a. Vertical Scaling (Scale Up/Down)
Adding more power (CPU, RAM, disk, network) to an existing machine.
Advantages:
- Simple to implement. No code changes required.
- No distributed systems complexity.
- Easier data consistency due to single machine.
Disadvantages:
- Hardware ceiling: There is a physical limit to how much you can scale a single machine.
- Single point of failure: If that machine goes down, everything goes down.
- Expensive: High-end hardware costs grow non-linearly.
- Downtime: Often requires stopping the machine to upgrade.
When to use:
- Early-stage applications with predictable, moderate load.
- Databases that are hard to distribute (e.g., traditional RDBMS).
- Quick fixes when time is limited.
1b. Horizontal Scaling (Scale In/Out)
Adding more machines to the pool of resources.
Advantages:
- Near-infinite scalability: Add more commodity machines.
- Fault tolerance: If one server fails, others take over.
- Cost effective: Use cheaper commodity hardware.
- No downtime: Add or remove nodes without stopping the system.
Disadvantages:
- Increased system complexity (load balancing, data partitioning, etc.).
- Data consistency challenges in distributed environments.
- Network latency between nodes.
- More complex deployment and monitoring.
When to use:
- Large-scale web applications (Google, Netflix, Amazon).
- Stateless services (e.g., REST APIs).
- Systems with unpredictable or bursty load patterns.
2. Dimensions of Scalability
| Dimension | Description | Example |
|---|---|---|
| Load Scalability | Handle increasing request volume | 1K → 1M requests/sec |
| Geography Scalability | Serve users across regions | US, EU data centers |
| Administration Scalability | Multiple organizations can manage the system | Multi-tenant SaaS |
| Functional Scalability | Add new features without impacting others | Microservices architecture |
3. Key Metrics of Scalability
3a. Throughput
The number of requests a system can handle per unit time.
- Measured in requests per second (RPS) or transactions per second (TPS).
3b. Latency
The time it takes for a single request to be processed.
- p50 (median): 50% of requests are faster than this.
- p95: 95% of requests are faster (tail latency).
- p99: 99% of requests are faster.
3c. Availability
The percentage of time the system is operational.
- 99.9% (three 9s) = ~8.76 hours downtime/year.
- 99.99% (four 9s) = ~52.6 minutes downtime/year.
- 99.999% (five 9s) = ~5.26 minutes downtime/year.
3d. Capacity
Maximum load a system can handle before performance degrades.
4. Scaling Strategies
4a. Stateless Architecture
- Store no session data on application servers.
- Use external stores (Redis, Memcached, database) for state.
- Any server can handle any request → easy horizontal scaling.
4b. Database Scaling
| Strategy | Description |
|---|---|
| Read Replicas | Multiple copies of the database handle read queries |
| Sharding | Split data across multiple databases by a key |
| Denormalization | Reduce joins by storing redundant data |
| Connection Pooling | Reuse DB connections to reduce overhead |
4c. Asynchronous Processing
- Use message queues (Kafka, RabbitMQ, SQS) to decouple work.
- Long-running tasks are processed in the background.
- The user gets an immediate response; work completes later.
4d. Caching
- Store frequently accessed data in memory (Redis, Memcached).
- Dramatically reduces database load and latency.
- Can be applied at multiple layers: application, database, CDN.
5. Scaling Patterns
5a. Load Balancer Pattern
Distribute requests across multiple servers.
5b. Database Replication Pattern
Primary handles writes; replicas handle reads.
5c. Sharding Pattern
Partition data across multiple database nodes.
5d. CQRS (Command Query Responsibility Segregation)
Separate read and write models for independent scaling.
5e. Event-Driven Architecture
Components communicate through events, enabling loose coupling and independent scaling.
6. Amdahl's Law
Defines the theoretical speedup when scaling a system:
Where:
= speedup with processors. = fraction of the program that is parallelizable. = number of processors.
Key insight: If only 50% of the work is parallelizable, adding infinite processors only gives a 2x speedup. The serial portion becomes the bottleneck.
7. Universal Scalability Law (USL)
Extends Amdahl's Law by adding coherence overhead — the cost of keeping distributed nodes consistent:
Where:
= contention (serialization) = coherence (crosstalk between nodes)
Key insight: Beyond a certain point, adding more nodes decreases throughput due to coordination overhead.
8. Real-World Scaling Examples
| Company | Strategy |
|---|---|
| Netflix | Microservices + auto-scaling on AWS |
| Fan-out on write for timelines; Redis for caching | |
| Uber | Geospatial sharding; separate services per domain |
| Custom infrastructure (Bigtable, Spanner, MapReduce) | |
| Cassandra for feeds; PostgreSQL for relational data |
9. Common Bottlenecks
- Single database: All reads/writes go to one DB.
- CPU-bound services: Heavy computation on a single node.
- Memory limits: Data exceeds available RAM.
- Disk I/O: Slow storage access under heavy write load.
- Network bandwidth: Saturated network links between services.
- Lock contention: Threads waiting on shared resources.
- DNS resolution: Single DNS endpoint with no failover.
Summary
| Concept | Vertical Scaling | Horizontal Scaling |
|---|---|---|
| Approach | Bigger machine | More machines |
| Complexity | Low | High |
| Cost curve | Exponential | Linear |
| Fault tolerance | Low | High |
| Ceiling | Hardware limits | Practically unlimited |
| Downtime risk | High | Low |
Rule of thumb: Start with vertical scaling for simplicity. Move to horizontal scaling when you approach the limits of a single machine or need fault tolerance.