2. Load Balancing
A load balancer distributes incoming network traffic across multiple servers to ensure no single server bears too much demand. It improves availability, reliability, throughput, and response time of applications.
Why Load Balancing?
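Without a load balancer: every client connects directly to one server, which becomes both a bottleneck and a single point of failure.
With a load balancer: clients connect to the load balancer, which forwards each request to one of several backend servers.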
Benefits:
- High availability: If one server fails, traffic is rerouted to healthy servers.
- Better throughput: Multiple servers handle requests in parallel.
- Lower latency: Requests go to the closest or least-loaded server.
- Flexibility: Add/remove servers without downtime.
- Security: LB can hide internal server topology from the public internet.
Where Load Balancers Sit
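Load balancers can be placed at multiple layers of a system, for example between clients and web servers, between web servers and application servers, and between application servers and databases.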
Types of Load Balancers
Layer 4 (Transport Layer) Load Balancer
Operates at the TCP/UDP level. Routes traffic based on:
- Source/destination IP address
- Source/destination port number
- Protocol type
Characteristics:
- Very fast — inspects only packet headers, not content.
- Cannot make routing decisions based on URL, cookies, or HTTP headers.
- Low resource overhead.
- Suitable for non-HTTP protocols (databases, game servers, etc.).
Layer 7 (Application Layer) Load Balancer
Operates at the HTTP/HTTPS level. Routes traffic based on:
- URL path (/api/* → API servers, /static/* → CDN)
- HTTP headers (Host, User-Agent, custom headers)
- Cookies (session affinity)
- Request body content
- HTTP method (GET, POST, etc.)
Characteristics:
- More intelligent routing — content-aware decisions.
- Can terminate SSL/TLS.
- Can modify requests/responses (add headers, rewrite URLs).
- Higher resource overhead than L4.
- Supports advanced features: A/B testing, canary deployments.
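The content-aware routing described above can be sketched as a first-match rule table. This is a minimal illustration, not how any particular load balancer is configured: the pool names, the `X-Canary` header, and the `choose_pool` function are all hypothetical.

```python
from typing import Mapping

# Hypothetical backend pools; the names are illustrative only.
ROUTES = [
    ("/api/", "api-servers"),
    ("/static/", "cdn"),
]
DEFAULT_POOL = "web-servers"


def choose_pool(path: str, headers: Mapping[str, str]) -> str:
    """Pick a backend pool from L7 information: a header first, then the URL path."""
    # Header rule: clients that opt in are sent to a canary pool.
    if headers.get("X-Canary") == "1":
        return "canary-servers"
    # Path rules: the first matching prefix wins.
    for prefix, pool in ROUTES:
        if path.startswith(prefix):
            return pool
    return DEFAULT_POOL
```

A Layer 4 balancer could not implement `choose_pool`, because neither the path nor the headers are visible at the TCP/UDP level.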
Comparison
| Feature | Layer 4 LB | Layer 7 LB |
|---|---|---|
| OSI Layer | Transport (TCP/UDP) | Application (HTTP/HTTPS) |
| Speed | Very fast | Slower (inspects content) |
| Intelligence | Basic (IP + port) | Rich (URL, headers, cookies) |
| SSL Termination | Typically no (pass-through) | Yes |
| Content Routing | No | Yes |
| Protocol Support | Any TCP/UDP | HTTP/HTTPS mainly |
| Resource Cost | Low | Higher |
| Use Case | High-throughput, simple routing | Web apps, API gateways |
Load Balancing Algorithms
1. Round Robin
Requests are distributed to servers sequentially in a circular order.
- Pros: Simple, fair distribution for homogeneous servers.
- Cons: Ignores server load; assumes all servers are equal.
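A minimal sketch of round robin, assuming a fixed list of three hypothetical backends. `itertools.cycle` provides the circular order.

```python
from itertools import cycle

servers = ["server-a", "server-b", "server-c"]  # hypothetical backend names
rotation = cycle(servers)


def next_server() -> str:
    """Return the next server in circular order."""
    return next(rotation)


picks = [next_server() for _ in range(5)]
# picks == ["server-a", "server-b", "server-c", "server-a", "server-b"]
```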
2. Weighted Round Robin
Like Round Robin, but servers get traffic proportional to their assigned weights.
Server A (weight 5): Gets 5 out of every 8 requests
Server B (weight 2): Gets 2 out of every 8 requests
Server C (weight 1): Gets 1 out of every 8 requests
- Use case: Heterogeneous server fleet (different hardware specs).
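One way to realize these proportions is the "smooth" weighted round robin scheme, sketched below. It is only one of several possible implementations, and the function name `smooth_wrr` is hypothetical. Each round, every server's running score grows by its weight, the highest score wins, and the winner's score is reduced by the total weight, which interleaves the heavy server with the lighter ones instead of sending it five requests in a row.

```python
def smooth_wrr(weights: dict[str, int], n: int) -> list[str]:
    """Return n picks whose frequencies follow the given weights."""
    current = {server: 0 for server in weights}
    total = sum(weights.values())
    picks = []
    for _ in range(n):
        # Every server earns credit proportional to its weight.
        for server, weight in weights.items():
            current[server] += weight
        # The server with the most credit is chosen and pays the total back.
        chosen = max(current, key=current.get)
        current[chosen] -= total
        picks.append(chosen)
    return picks


picks = smooth_wrr({"A": 5, "B": 2, "C": 1}, 8)
# Over 8 requests: A is picked 5 times, B twice, C once.
```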
3. Least Connections
Routes to the server with the fewest active connections.
Server A: 12 active connections
Server B: 5 active connections ← New request goes here
Server C: 8 active connections
- Pros: Adapts to varying request durations.
- Cons: Requires tracking connection count per server.
- Best for: Long-lived connections (WebSockets, database connections).
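A minimal sketch of the bookkeeping, using the connection counts from the example above. The `acquire` and `release` helpers are hypothetical names; a real balancer would call them when a connection opens and closes.

```python
active = {"server-a": 12, "server-b": 5, "server-c": 8}


def acquire(active: dict[str, int]) -> str:
    """Pick the server with the fewest active connections and count the new one."""
    server = min(active, key=active.get)
    active[server] += 1
    return server


def release(active: dict[str, int], server: str) -> None:
    """Call when a connection closes so the count stays accurate."""
    active[server] -= 1


chosen = acquire(active)  # "server-b", which now has 6 active connections
```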
4. Weighted Least Connections
Combines least connections with server weights. Each server gets a score, commonly its active connection count divided by its weight.
The server with the lowest score gets the next request.
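A sketch of the scoring step, assuming the score is active connections divided by weight. That formula is a common convention and an assumption here, and the server names and numbers are illustrative.

```python
def pick_weighted_least_connections(
    active: dict[str, int], weights: dict[str, int]
) -> str:
    """Return the server with the lowest connections-per-weight score."""
    return min(active, key=lambda server: active[server] / weights[server])


active = {"server-a": 12, "server-b": 5, "server-c": 8}
weights = {"server-a": 5, "server-b": 2, "server-c": 1}
# Scores: server-a 12/5 = 2.4, server-b 5/2 = 2.5, server-c 8/1 = 8.0
chosen = pick_weighted_least_connections(active, weights)  # "server-a"
```

Note that server-a wins despite having the most connections, because its weight says it can carry more of them.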
5. Least Response Time
Routes to the server with the best combination of low measured response time and few active connections.
- Pros: Optimizes for user experience.
- Cons: Requires continuous latency monitoring.
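How response time and connection count are combined varies between implementations. The sketch below assumes one plausible rule: keep an exponentially weighted moving average (EWMA) of each server's latency and multiply it by the number of connections the server would have after taking the request. The class name, the `alpha` value, and the scoring rule are all assumptions.

```python
class LatencyTracker:
    """Keeps a smoothed latency per server and picks the lowest-scoring one."""

    def __init__(self, alpha: float = 0.2):
        self.alpha = alpha                    # weight given to the newest sample
        self.ewma_ms: dict[str, float] = {}

    def record(self, server: str, latency_ms: float) -> None:
        """Fold one observed response time into the server's moving average."""
        previous = self.ewma_ms.get(server)
        if previous is None:
            self.ewma_ms[server] = latency_ms
        else:
            self.ewma_ms[server] = self.alpha * latency_ms + (1 - self.alpha) * previous

    def pick(self, active: dict[str, int]) -> str:
        """Score = smoothed latency * (active connections + 1); lowest wins."""
        return min(
            active,
            key=lambda server: self.ewma_ms.get(server, 0.0) * (active[server] + 1),
        )


tracker = LatencyTracker()
tracker.record("server-a", 100.0)
tracker.record("server-b", 20.0)
chosen = tracker.pick({"server-a": 1, "server-b": 3})  # scores 200 vs 80
```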
6. IP Hash
The client's IP address is hashed to determine which server receives the request.
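- Pros: Same client goes to the same server as long as the server pool does not change (session persistence).
- Cons: Uneven distribution if IP addresses are clustered, for example many clients behind one NAT gateway; most clients are remapped when a server is added or removed.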
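A minimal sketch of IP hashing, assuming a simple hash-modulo-N mapping. The function name and the choice of SHA-256 are illustrative; any stable hash works. Python's built-in `hash()` is avoided because its output for strings changes between processes.

```python
import hashlib


def pick_by_ip(client_ip: str, servers: list[str]) -> str:
    """Map a client IP to a server with a stable hash modulo the pool size."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["server-a", "server-b", "server-c"]
first = pick_by_ip("203.0.113.7", servers)
second = pick_by_ip("203.0.113.7", servers)  # same client, same server
```

Because the index depends on `len(servers)`, changing the pool size remaps most clients, which is the problem consistent hashing addresses.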
7. Consistent Hashing
A more advanced form of hashing that minimizes redistribution when servers are added or removed. (See: Consistent Hashing)
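A minimal hash-ring sketch, assuming virtual nodes and SHA-256 for placement. The class name `HashRing` and the `vnodes` default are hypothetical. Each server is hashed to many points on a ring, and a key is served by the first point at or after its own hash, wrapping around at the end.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Stable 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, servers: list[str], vnodes: int = 100):
        # Each server owns `vnodes` points so load spreads more evenly.
        self._ring = sorted(
            (_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def lookup(self, key: str) -> str:
        """Return the server owning the first ring point clockwise from the key."""
        index = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._ring[index][1]
```

Removing a server deletes only that server's points, so keys owned by the remaining servers keep their assignment; only the removed server's keys move.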
8. Random
Each request is sent to a randomly selected server.
- Pros: Simple; with large numbers, approaches even distribution.
- Cons: No guarantee of fairness in the short term.
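A short sketch of random selection. The seed is there only to make the sketch reproducible; a real balancer would not fix it. Counting a large number of picks illustrates how the distribution evens out over time.

```python
import random
from collections import Counter

servers = ["server-a", "server-b", "server-c"]  # hypothetical backend names
rng = random.Random(42)  # seeded only for reproducibility


def pick_random() -> str:
    """Return a uniformly random server."""
    return rng.choice(servers)


# Over many requests each server receives roughly one third of the traffic.
counts = Counter(pick_random() for _ in range(30_000))
```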
9. Resource-Based (Adaptive)
Servers report their current resource utilization (CPU, memory, disk). The LB routes to the server with the most available resources.
- Pros: Reflects actual server load rather than a proxy such as connection count.
- Cons: Complex; requires health reporting agents on servers.
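A sketch of the selection step, assuming each server's agent reports CPU and memory utilization as fractions between 0 and 1. Scoring a server by its most constrained resource is one possible rule and an assumption here, as are the `Report` type and function name.

```python
from dataclasses import dataclass


@dataclass
class Report:
    cpu: float     # utilization from 0.0 to 1.0
    memory: float  # utilization from 0.0 to 1.0


def pick_most_available(reports: dict[str, Report]) -> str:
    """Pick the server whose busiest resource has the most headroom."""
    return min(reports, key=lambda s: max(reports[s].cpu, reports[s].memory))


reports = {
    "server-a": Report(cpu=0.90, memory=0.40),
    "server-b": Report(cpu=0.55, memory=0.60),
    "server-c": Report(cpu=0.30, memory=0.85),
}
chosen = pick_most_available(reports)  # "server-b": its worst resource is at 0.60
```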
Health Checks
Load balancers must detect unhealthy servers to stop sending traffic to them.
Types of Health Checks
| Type | Description | Example |
|---|---|---|
| Passive | Monitors real traffic for errors | Track 5xx responses |
| Active | Sends periodic probe requests | GET /health every 10s |
| Deep (L7) | Checks application logic and dependencies | Verify DB connectivity |
| Shallow (L4) | Checks if the port is open | TCP SYN to port 80 |
Health Check Parameters
- Interval: How often to check (e.g., every 10 seconds).
- Timeout: How long to wait for a response (e.g., 5 seconds).
- Healthy threshold: Consecutive successes to mark healthy (e.g., 3).
- Unhealthy threshold: Consecutive failures to mark unhealthy (e.g., 2).
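The two thresholds can be sketched as a small state machine that counts consecutive probe results contradicting the current state. The class name `HealthTracker` is hypothetical, and the probe loop that applies the interval and timeout is omitted.

```python
class HealthTracker:
    """Tracks one backend's health using consecutive-result thresholds."""

    def __init__(self, healthy_threshold: int = 3, unhealthy_threshold: int = 2):
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy = True  # assume healthy until probes say otherwise
        self._streak = 0     # consecutive results contradicting the current state

    def record(self, probe_ok: bool) -> bool:
        """Record one probe result and return the resulting health state."""
        if probe_ok == self.healthy:
            self._streak = 0  # result agrees with the current state
            return self.healthy
        self._streak += 1
        needed = self.healthy_threshold if probe_ok else self.unhealthy_threshold
        if self._streak >= needed:
            self.healthy = probe_ok
            self._streak = 0
        return self.healthy
```

With the example values, two consecutive failed probes take a server out of rotation and three consecutive successes bring it back. Requiring more successes than failures makes the balancer quick to remove a bad server and cautious about restoring it.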
Session Persistence (Sticky Sessions)
Some applications require that a client's requests consistently go to the same backend server (e.g., shopping cart stored in server memory).
Methods:
- Cookie-based: LB inserts a cookie identifying the backend server, e.g. Set-Cookie: SERVERID=server-a; Path=/
- IP-based: Use client IP hash (fragile with NAT/proxies).
- Application-controlled: Application issues a session token; LB uses it for routing.
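The cookie-based method can be sketched as follows, using the SERVERID cookie from the example. The function name and the `fallback` callback, which stands in for whatever balancing algorithm is configured, are hypothetical.

```python
from http.cookies import SimpleCookie
from typing import Callable, Optional


def route_with_cookie(
    cookie_header: str, servers: list[str], fallback: Callable[[], str]
) -> tuple[str, Optional[str]]:
    """Return (server, Set-Cookie value or None) for one request."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("SERVERID")
    # Honor the cookie only if it names a server that is still in the pool.
    if morsel is not None and morsel.value in servers:
        return morsel.value, None
    # First visit or stale cookie: balance normally and pin the client.
    server = fallback()
    return server, f"SERVERID={server}; Path=/"


servers = ["server-a", "server-b"]
returning = route_with_cookie("SERVERID=server-b", servers, lambda: "server-a")
first_visit = route_with_cookie("", servers, lambda: "server-a")
```

Checking the cookie value against the current pool matters: a client pinned to a server that has been removed must be rebalanced rather than sent to a dead backend.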
Trade-off: Sticky sessions reduce load balancing effectiveness and can cause uneven load distribution. Prefer stateless architecture with external session stores.
Load Balancer High Availability
The load balancer itself can be a single point of failure. Solutions:
Active-Passive (Failover)
- The passive LB monitors the active LB via heartbeats.
- On failure, the passive LB takes over the Virtual IP (VIP).
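The heartbeat logic on the passive node can be sketched as below. Timestamps are passed in explicitly to keep the sketch deterministic, and the class name and timeout value are assumptions. Claiming the VIP is environment-specific and is represented only by a flag.

```python
class PassiveMonitor:
    """Runs on the passive LB and promotes it when heartbeats stop."""

    def __init__(self, timeout_s: float, now: float):
        self.timeout_s = timeout_s
        self.last_heartbeat = now
        self.active = False

    def on_heartbeat(self, now: float) -> None:
        """Called whenever a heartbeat from the active LB arrives."""
        self.last_heartbeat = now

    def tick(self, now: float) -> bool:
        """Periodic check; returns True once this node has taken over."""
        if not self.active and now - self.last_heartbeat > self.timeout_s:
            # A real deployment would claim the VIP here.
            self.active = True
        return self.active


monitor = PassiveMonitor(timeout_s=3.0, now=0.0)
monitor.on_heartbeat(now=1.0)
still_passive = monitor.tick(now=3.5)  # 2.5 s since the last heartbeat
took_over = monitor.tick(now=4.5)      # 3.5 s of silence exceeds the timeout
```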
Active-Active
- Both LBs handle traffic simultaneously.
- DNS or upstream routing distributes traffic across LBs.
- Better resource utilization than active-passive.
Software vs Hardware Load Balancers
| Aspect | Hardware LB | Software LB |
|---|---|---|
| Performance | Extremely high (ASICs) | High (general-purpose CPU) |
| Cost | Very expensive ($10K-$100K+) | Free or low cost |
| Flexibility | Limited | Highly configurable |
| Deployment | Physical appliance | VM, container, or process |
| Examples | F5 BIG-IP, Citrix ADC | Nginx, HAProxy, Envoy |
| Scaling | Buy more hardware | Add more instances |
Popular Load Balancer Technologies
| Technology | Type | Key Features |
|---|---|---|
| Nginx | Software L7 | Reverse proxy, caching, SSL termination |
| HAProxy | Software L4/L7 | High performance, TCP and HTTP |
| Envoy | Software L4/L7 | Service mesh, gRPC, observability |
| Traefik | Software L7 | Auto-discovery, Docker/K8s native |
| AWS ALB | Cloud L7 | Managed, integrates with AWS services |
| AWS NLB | Cloud L4 | Ultra-low latency, static IPs |
| AWS Classic Load Balancer (CLB) | Cloud L4/L7 | Legacy, basic load balancing |
| Google Cloud LB | Cloud L4/L7 | Global, anycast IPs |
| Azure Load Balancer | Cloud L4 | Regional, zone-redundant |
Global Server Load Balancing (GSLB)
Distributes traffic across data centers in different geographic regions.
Methods:
- GeoDNS: DNS returns different IPs based on the client's geographic location.
- Anycast: Multiple data centers advertise the same IP; BGP routing directs traffic to the nearest one.
- Latency-based routing: DNS resolves to the data center with the lowest measured latency.
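Latency-based routing can be sketched as a lookup over per-region measurements. The region names are hypothetical, the IPs come from the reserved documentation ranges, and how latency is measured is outside the scope of this sketch.

```python
# Hypothetical regions mapped to documentation-range IPs.
REGION_IPS = {
    "us-east": "203.0.113.10",
    "eu-west": "198.51.100.10",
    "ap-south": "192.0.2.10",
}


def resolve_lowest_latency(latency_ms: dict[str, float]) -> str:
    """Return the IP of the known region with the lowest measured latency."""
    candidates = {r: ms for r, ms in latency_ms.items() if r in REGION_IPS}
    best = min(candidates, key=candidates.get)
    return REGION_IPS[best]


answer = resolve_lowest_latency(
    {"us-east": 120.0, "eu-west": 35.0, "ap-south": 210.0}
)  # "198.51.100.10", the eu-west address
```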
Load Balancing Patterns in Practice
Pattern 1: API Gateway + Internal LB
Pattern 2: Service Mesh (Client-Side LB)
Instead of a centralized LB, each service has a sidecar proxy (e.g., Envoy) that handles load balancing.
Summary
| Concept | Key Point |
|---|---|
| Purpose | Distribute traffic, improve availability and performance |
| L4 vs L7 | L4 = fast, simple; L7 = smart, content-aware |
| Best algorithm | Depends on workload; Least Connections is often a good default |
| Health checks | Essential — active + passive for robustness |
| HA | Use active-passive or active-active LB pairs |
| Sticky sessions | Avoid if possible; use stateless design |
| GSLB | For multi-region deployments |
Related Notes
- Scalability — Load balancing enables horizontal scaling
- Consistent Hashing — Advanced hash-based routing algorithm
- Proxies — Reverse proxies as load balancers
- Networking Basics — L4/L7 networking fundamentals
- CDN — Global traffic distribution at the edge