1. Scalability
Scalability is the ability of a system to handle a growing amount of work by adding resources. A scalable system can maintain or improve its level of performance as demand increases.
Why Scalability Matters
- User growth: Applications must handle increasing user traffic without degradation.
- Data growth: As data accumulates, read/write performance must remain acceptable.
- Business continuity: Revenue-generating systems cannot afford downtime under load.
- Cost efficiency: Resources should scale proportionally — not wastefully — with demand.
Types of Scalability
1. Vertical Scaling (Scale Up)
Adding more power (CPU, RAM, disk, network) to an existing machine.
Before: [Server: 4 CPU, 16 GB RAM]
After: [Server: 32 CPU, 256 GB RAM]
Advantages:
- Simple to implement — no code changes required.
- No distributed systems complexity.
- Easier data consistency (single machine).
Disadvantages:
- Hardware ceiling: There is a physical limit to how much you can scale a single machine.
- Single point of failure: If that machine goes down, everything goes down.
- Expensive: High-end hardware costs grow non-linearly.
- Downtime: Often requires stopping the machine to upgrade.
When to use:
- Early-stage applications with predictable, moderate load.
- Databases that are hard to distribute (e.g., traditional RDBMS).
- Quick fixes when time is limited.
2. Horizontal Scaling (Scale Out)
Adding more machines to the pool of resources.
Before: [Server A]
After:  [ Load Balancer ] → [Server A] [Server B] [Server C] [Server D]
Advantages:
- Near-infinite scalability: Add more commodity machines.
- Fault tolerance: If one server fails, others take over.
- Cost effective: Use cheaper commodity hardware.
- No downtime: Add or remove nodes without stopping the system.
Disadvantages:
- Increased system complexity (load balancing, data partitioning, etc.).
- Data consistency challenges in distributed environments.
- Network latency between nodes.
- More complex deployment and monitoring.
When to use:
- Large-scale web applications (Google, Netflix, Amazon).
- Stateless services (e.g., REST APIs).
- Systems with unpredictable or bursty load patterns.
Scalability Dimensions
| Dimension | Description | Example |
|---|---|---|
| Load Scalability | Handle increasing request volume | 1K → 1M requests/sec |
| Geographic Scalability | Serve users across regions | US, EU, Asia data centers |
| Administrative Scalability | Multiple organizations can manage the system | Multi-tenant SaaS |
| Functional Scalability | Easily add new features without impacting existing ones | Microservices architecture |
Key Metrics for Scalability
Throughput
The number of requests a system can handle per unit time.
- Measured in requests per second (RPS) or transactions per second (TPS).
Latency
The time it takes for a single request to be processed.
- p50 (median): 50% of requests are faster than this.
- p95: 95% of requests are faster (tail latency).
- p99: 99% of requests are faster.
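As a rough illustration of these percentiles, here is a minimal sketch using the nearest-rank method on a sample of latencies (production systems typically use streaming histograms rather than sorting raw samples):

```python
def percentile(samples, p):
    """Nearest-rank percentile: the value such that p% of samples are <= it."""
    ordered = sorted(samples)
    # ceil(p/100 * n) gives the 1-based nearest-rank index
    rank = max(1, -(-len(ordered) * p // 100))  # ceiling division via negation
    return ordered[int(rank) - 1]

# 100 simulated request latencies in milliseconds: most fast, a slow tail
latencies = [10] * 90 + [50] * 9 + [500]

p50 = percentile(latencies, 50)   # the typical request
p95 = percentile(latencies, 95)   # tail latency
p99 = percentile(latencies, 99)   # worse tail
```

Note how the median (10 ms) hides the tail entirely: one in a hundred users waits 500 ms, which only shows up beyond p99.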
Availability
The percentage of time the system is operational.
- 99.9% (three 9s) = ~8.76 hours downtime/year.
- 99.99% (four 9s) = ~52.6 minutes downtime/year.
- 99.999% (five 9s) = ~5.26 minutes downtime/year.
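The downtime budgets above follow directly from the availability fraction; a quick check over a 365-day year:

```python
def yearly_downtime_minutes(availability):
    """Minutes of allowed downtime per 365-day year at a given availability."""
    return (1 - availability) * 365 * 24 * 60

# "three 9s" through "five 9s"
for nines in (0.999, 0.9999, 0.99999):
    print(f"{nines} -> {yearly_downtime_minutes(nines):.1f} min/year")
```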
Capacity
Maximum load a system can handle before performance degrades.
Scaling Strategies
Stateless Architecture
- Store no session data on application servers.
- Use external stores (Redis, Memcached, database) for state.
- Any server can handle any request → easy horizontal scaling.
Client → Load Balancer → Any App Server → Shared State Store (Redis)
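A minimal sketch of the stateless idea, with a plain dict standing in for an external store such as Redis (the class and method names here are illustrative, not a real client API):

```python
# Shared state store: in production this would be Redis or Memcached,
# reachable from every app server. A dict stands in for it here.
shared_sessions = {}

class AppServer:
    """A stateless app server: it keeps no session data of its own."""
    def __init__(self, name, store):
        self.name = name
        self.store = store  # every server points at the same external store

    def handle(self, session_id, request):
        # Look up (or create) the session in the shared store, never locally
        session = self.store.setdefault(session_id, {"requests": 0})
        session["requests"] += 1
        return f"{self.name} served request #{session['requests']} for {session_id}"

servers = [AppServer(f"app-{i}", shared_sessions) for i in range(3)]

# A round-robin "load balancer" can send each request to any server;
# session state survives because it lives in the shared store.
responses = [servers[i % 3].handle("sess-42", "GET /") for i in range(4)]
```

Because no server holds the session, any of them can be added, removed, or replaced without logging users out.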
Database Scaling
| Strategy | Description |
|---|---|
| Read Replicas | Multiple copies of the database handle read queries |
| Sharding | Split data across multiple databases by a key |
| Denormalization | Reduce joins by storing redundant data |
| Connection Pooling | Reuse DB connections to reduce overhead |
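Read replicas are usually hidden behind a routing layer that sends writes to the primary and spreads reads across replicas. A minimal sketch (the string-based SQL classification is a deliberate simplification; real routers parse statements properly):

```python
import itertools

class RoutingConnection:
    """Route writes to the primary and round-robin reads across replicas."""
    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)  # round-robin over reads

    def execute(self, sql):
        # Crude classification: anything that isn't a SELECT goes to the primary
        if sql.lstrip().upper().startswith("SELECT"):
            target = next(self._replicas)
        else:
            target = self.primary
        return target, sql

conn = RoutingConnection("primary", ["replica-1", "replica-2"])
```

One caveat the sketch ignores: replicas lag the primary, so a read issued immediately after a write may not see it (replication lag).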
Asynchronous Processing
- Use message queues (Kafka, RabbitMQ, SQS) to decouple work.
- Long-running tasks are processed in the background.
- The user gets an immediate response; work completes later.
User Request → API Server (immediate ACK to user) → Message Queue → Worker → Database
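The same flow can be sketched in-process with the standard library, a `queue.Queue` standing in for Kafka/RabbitMQ/SQS and a thread standing in for a worker fleet:

```python
import queue
import threading

tasks = queue.Queue()
results = []

def worker():
    # Background worker: drains the queue and does the slow work
    while True:
        job = tasks.get()
        if job is None:          # sentinel value: shut down
            break
        results.append(f"processed {job}")
        tasks.task_done()

threading.Thread(target=worker, daemon=True).start()

def handle_request(job):
    """API server: enqueue the work and acknowledge immediately."""
    tasks.put(job)
    return "202 Accepted"        # the user gets an instant response

ack = handle_request("resize-image-17")
tasks.join()                     # wait for the worker (for this demo only)
```

The caller never blocks on the slow work; scaling out is a matter of adding more workers on the consuming side of the queue.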
Caching
- Store frequently accessed data in memory (Redis, Memcached).
- Dramatically reduces database load and latency.
- Can be applied at multiple layers: application, database, CDN.
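The most common application-level variant is the cache-aside pattern; a minimal sketch with a dict as the cache and a TTL to bound staleness:

```python
import time

class CacheAside:
    """Cache-aside: check the cache first, fall back to the source on a miss."""
    def __init__(self, fetch, ttl_seconds=60):
        self.fetch = fetch              # slow source of truth (e.g. a DB query)
        self.ttl = ttl_seconds
        self._store = {}                # key -> (value, expires_at)
        self.misses = 0

    def get(self, key):
        entry = self._store.get(key)
        if entry and entry[1] > time.monotonic():
            return entry[0]             # cache hit: no database round-trip
        self.misses += 1
        value = self.fetch(key)         # cache miss: hit the database
        self._store[key] = (value, time.monotonic() + self.ttl)
        return value

cache = CacheAside(fetch=lambda key: f"row-for-{key}")
first = cache.get("user:1")   # miss: goes to the source
second = cache.get("user:1")  # hit: served from memory
```

The TTL is the knob that trades freshness for load: a longer TTL means fewer database hits but staler data.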
Scaling Patterns
1. Load Balancer Pattern
Distribute requests across multiple servers.
2. Database Replication Pattern
Primary handles writes; replicas handle reads.
3. Sharding Pattern
Partition data across multiple database nodes.
4. CQRS (Command Query Responsibility Segregation)
Separate read and write models for independent scaling.
5. Event-Driven Architecture
Components communicate through events, enabling loose coupling and independent scaling.
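Two of the patterns above can be sketched in a few lines (a toy illustration: the server names, shard count, and integer user IDs are made up for the example):

```python
import itertools

# Pattern 1: a round-robin load balancer spreading requests over servers
servers = ["app-1", "app-2", "app-3"]
rotation = itertools.cycle(servers)

def route(request):
    """Send each incoming request to the next server in the rotation."""
    return next(rotation)

# Pattern 3: sharding assigns each key to exactly one database node
NUM_SHARDS = 4

def shard_for(user_id: int) -> int:
    # Simple modulo sharding; real systems often use consistent hashing
    # instead, so that adding a shard does not reshuffle every key
    return user_id % NUM_SHARDS
```

Note the difference in routing criteria: the load balancer can pick any server because app servers are interchangeable, while the shard function must be deterministic because each key lives on exactly one node.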
Amdahl's Law
Defines the theoretical speedup when scaling a system:
S(N) = 1 / ((1 - P) + P / N)

Where:
- S(N) = speedup with N processors
- P = fraction of the program that is parallelizable
- N = number of processors
Key insight: If only 50% of the work is parallelizable, adding infinite processors only gives a 2x speedup. The serial portion becomes the bottleneck.
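The 2x ceiling is easy to verify numerically, a direct translation of the formula:

```python
def amdahl_speedup(p, n):
    """Amdahl's Law: S(N) = 1 / ((1 - p) + p / n)."""
    return 1 / ((1 - p) + p / n)

# With p = 0.5, speedup creeps toward 2x but never reaches it,
# no matter how many processors you add
print(amdahl_speedup(0.5, 4))
print(amdahl_speedup(0.5, 10**9))
```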
Universal Scalability Law (USL)
Extends Amdahl's Law by adding coherence overhead — the cost of keeping distributed nodes consistent:
C(N) = N / (1 + α(N - 1) + βN(N - 1))

Where:
- C(N) = capacity (relative throughput) with N nodes
- α = contention (serialization)
- β = coherence (crosstalk between nodes)
Key insight: Beyond a certain point, adding more nodes decreases throughput due to coordination overhead.
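The retrograde behavior shows up immediately when the formula is plotted; the α and β values below are illustrative, not measured from any real system:

```python
def usl_capacity(n, alpha=0.05, beta=0.001):
    """USL: C(N) = N / (1 + alpha*(N-1) + beta*N*(N-1))."""
    return n / (1 + alpha * (n - 1) + beta * n * (n - 1))

# Throughput rises, peaks (here around N ≈ 31, at sqrt((1-alpha)/beta)),
# then falls as the quadratic coherence term dominates
for n in (1, 8, 16, 32, 64, 128):
    print(n, round(usl_capacity(n), 2))
```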
Real-World Scaling Examples
| Company | Strategy |
|---|---|
| Netflix | Microservices + auto-scaling on AWS |
| Twitter | Fan-out on write for timelines; Redis for caching |
| Uber | Geospatial sharding; separate services per domain |
| Google | Custom infrastructure (Bigtable, Spanner, MapReduce) |
| Instagram | Cassandra for feeds; PostgreSQL for relational data |
Common Bottlenecks
- Single database: All reads/writes go to one DB.
- CPU-bound services: Heavy computation on a single node.
- Memory limits: Data exceeds available RAM.
- Disk I/O: Slow storage access under heavy write load.
- Network bandwidth: Saturated network links between services.
- Lock contention: Threads waiting on shared resources.
- DNS resolution: Single DNS endpoint with no failover.
Summary
| Concept | Vertical Scaling | Horizontal Scaling |
|---|---|---|
| Approach | Bigger machine | More machines |
| Complexity | Low | High |
| Cost curve | Exponential | Linear |
| Fault tolerance | Low | High |
| Ceiling | Hardware limits | Practically unlimited |
| Downtime risk | High | Low |
Rule of thumb: Start with vertical scaling for simplicity. Move to horizontal scaling when you approach the limits of a single machine or need fault tolerance.