2. Load Balancing

A load balancer distributes incoming network traffic across multiple servers to ensure no single server bears too much demand. It improves availability, reliability, throughput, and response time of applications.


Why Load Balancing?

Without a load balancer:

    Clients --> Single Server   (overloaded, single point of failure)

With a load balancer:

    Clients --> LB --> Server A
                   --> Server B
                   --> Server C
                   --> Server D

Benefits:

  • High availability: If one server fails, traffic is rerouted to healthy servers.
  • Better throughput: Multiple servers handle requests in parallel.
  • Lower latency: Requests go to the closest or least-loaded server.
  • Flexibility: Add/remove servers without downtime.
  • Security: LB can hide internal server topology from the public internet.

Where Load Balancers Sit

Load balancers can be placed at multiple layers of a system:

    Internet
        |
    DNS Load Balancer    (Layer 1: DNS)
        |
    Global LB            (Layer 2: Edge / CDN)
        |
    L7 / L4 LB           (Layer 3: Clients <-> Web)
        |--> Web Srv 1 | Web Srv 2 | Web Srv 3
        |
    Internal LB          (Layer 4: Web <-> App)
        |--> App Srv 1 | App Srv 2 | App Srv 3
        |
    Database LB          (Layer 5: App <-> DB)
        |--> DB Read 1 | DB Read 2 | DB Primary


Types of Load Balancers

Layer 4 (Transport Layer) Load Balancer

Operates at the TCP/UDP level. Routes traffic based on:

  • Source/destination IP address
  • Source/destination port number
  • Protocol type

Characteristics:

  • Very fast — inspects only packet headers, not content.
  • Cannot make routing decisions based on URL, cookies, or HTTP headers.
  • Low resource overhead.
  • Suitable for non-HTTP protocols (databases, game servers, etc.).

    Client --TCP SYN--> L4 LB --TCP SYN--> Selected Backend

Layer 7 (Application Layer) Load Balancer

Operates at the HTTP/HTTPS level. Routes traffic based on:

  • URL path (/api/* → API servers, /static/* → CDN)
  • HTTP headers (Host, User-Agent, custom headers)
  • Cookies (session affinity)
  • Request body content
  • HTTP method (GET, POST, etc.)

Characteristics:

  • More intelligent routing — content-aware decisions.
  • Can terminate SSL/TLS.
  • Can modify requests/responses (add headers, rewrite URLs).
  • Higher resource overhead than L4.
  • Supports advanced features: A/B testing, canary deployments.

    Client --HTTPS GET /api/users----> L7 LB --HTTP GET /api/users----> API Server Pool
    Client --HTTPS GET /images/logo--> L7 LB --HTTP GET /images/logo--> Static Server Pool

(The LB terminates HTTPS and forwards plain HTTP to the backend pools.)
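Path-based L7 routing boils down to a prefix-match table. Below is a minimal, hypothetical sketch (the route prefixes and pool names are illustrative, not from any real load balancer's configuration):

```python
# L7 path-based routing sketch: first matching prefix wins.
ROUTES = [
    ("/api/", "api-server-pool"),
    ("/images/", "static-server-pool"),
    ("/", "default-pool"),  # catch-all
]

def route(path: str) -> str:
    """Return the backend pool for a request path."""
    for prefix, pool in ROUTES:
        if path.startswith(prefix):
            return pool
    return "default-pool"

print(route("/api/users"))    # api-server-pool
print(route("/images/logo"))  # static-server-pool
```

Real L7 balancers extend the same idea with host, header, and cookie matching, and usually with longest-prefix rather than first-match semantics.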

Comparison

| Feature          | Layer 4 LB                      | Layer 7 LB                   |
|------------------|---------------------------------|------------------------------|
| OSI Layer        | Transport (TCP/UDP)             | Application (HTTP/HTTPS)     |
| Speed            | Very fast                       | Slower (inspects content)    |
| Intelligence     | Basic (IP + port)               | Rich (URL, headers, cookies) |
| SSL Termination  | No                              | Yes                          |
| Content Routing  | No                              | Yes                          |
| Protocol Support | Any TCP/UDP                     | Mainly HTTP/HTTPS            |
| Resource Cost    | Low                             | Higher                       |
| Use Case         | High-throughput, simple routing | Web apps, API gateways       |

Load Balancing Algorithms

1. Round Robin

Requests are distributed to servers sequentially in a circular order.

    Request 1 --> Server A
    Request 2 --> Server B
    Request 3 --> Server C
    Request 4 --> Server A   (cycle repeats)
  • Pros: Simple, fair distribution for homogeneous servers.
  • Cons: Ignores server load; assumes all servers are equal.
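The circular order above maps directly onto a cycling iterator. A minimal sketch:

```python
import itertools

# Round robin: hand requests to servers in a fixed circular order.
servers = ["Server A", "Server B", "Server C"]
rr = itertools.cycle(servers)

# Request 4 wraps around to Server A again.
assignments = [next(rr) for _ in range(4)]
print(assignments)  # ['Server A', 'Server B', 'Server C', 'Server A']
```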

2. Weighted Round Robin

Like Round Robin, but servers get traffic proportional to their assigned weights.

Server A (weight 5): Gets 5 out of every 8 requests
Server B (weight 2): Gets 2 out of every 8 requests
Server C (weight 1): Gets 1 out of every 8 requests
  • Use case: Heterogeneous server fleet (different hardware specs).
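A naive way to implement the weighting is to repeat each server in the rotation proportionally to its weight, using the example weights above. This is a sketch; production balancers typically use a "smooth" variant that interleaves servers instead of sending bursts to one server:

```python
import itertools

# Example weights from above: A=5, B=2, C=1 (8 requests per full cycle).
weights = {"A": 5, "B": 2, "C": 1}

# Expand each server into the rotation according to its weight.
rotation = [server for server, w in weights.items() for _ in range(w)]
wrr = itertools.cycle(rotation)

first_cycle = [next(wrr) for _ in range(8)]
print(first_cycle.count("A"), first_cycle.count("B"), first_cycle.count("C"))  # 5 2 1
```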

3. Least Connections

Routes to the server with the fewest active connections.

Server A: 12 active connections
Server B: 5 active connections    ← New request goes here
Server C: 8 active connections
  • Pros: Adapts to varying request durations.
  • Cons: Requires tracking connection count per server.
  • Best for: Long-lived connections (WebSockets, database connections).
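The selection step is just a minimum over the per-server connection counts the LB already tracks. A sketch using the counts from the example above:

```python
# Least connections: pick the server with the fewest active connections.
active = {"Server A": 12, "Server B": 5, "Server C": 8}

def pick(active_connections: dict) -> str:
    return min(active_connections, key=active_connections.get)

target = pick(active)
print(target)       # Server B
active[target] += 1  # LB increments on dispatch, decrements when the connection closes
```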

4. Weighted Least Connections

Combines least connections with server weights: each server is scored by dividing its active connection count by its weight.

    score = active_connections / weight

The server with the lowest score gets the next request.

5. Least Response Time

Routes to the server with the fastest response time and fewest connections.

  • Pros: Optimizes for user experience.
  • Cons: Requires continuous latency monitoring.

6. IP Hash

The client's IP address is hashed to determine which server receives the request.

  • Pros: Same client always goes to the same server (session persistence).
  • Cons: Uneven distribution if IP addresses are clustered.
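A sketch of the hashing step, using a stable hash rather than Python's built-in `hash()` (which is randomized per process and would break persistence across LB restarts):

```python
import hashlib

servers = ["Server A", "Server B", "Server C"]

def pick(client_ip: str) -> str:
    # Hash the client IP and map the digest onto the server list.
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

# The same client IP always maps to the same server:
print(pick("203.0.113.7") == pick("203.0.113.7"))  # True
```

Note that with plain modulo hashing, adding or removing a server remaps most clients, which is exactly the problem consistent hashing (next) addresses.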

7. Consistent Hashing

A more advanced form of hashing that minimizes redistribution when servers are added or removed. (See: Consistent Hashing)

8. Random

Each request is sent to a randomly selected server.

  • Pros: Simple; with large numbers, approaches even distribution.
  • Cons: No guarantee of fairness in the short term.

9. Resource-Based (Adaptive)

Servers report their current resource utilization (CPU, memory, disk). The LB routes to the server with the most available resources.

  • Pros: Most accurate load distribution.
  • Cons: Complex; requires health reporting agents on servers.

Health Checks

Load balancers must detect unhealthy servers to stop sending traffic to them.

Types of Health Checks

| Type         | Description                               | Example               |
|--------------|-------------------------------------------|-----------------------|
| Passive      | Monitors real traffic for errors          | Track 5xx responses   |
| Active       | Sends periodic probe requests             | GET /health every 10s |
| Deep (L7)    | Checks application logic and dependencies | Verify DB connectivity|
| Shallow (L4) | Checks if the port is open                | TCP SYN to port 80    |

Health Check Parameters

  • Interval: How often to check (e.g., every 10 seconds).
  • Timeout: How long to wait for a response (e.g., 5 seconds).
  • Healthy threshold: Consecutive successes to mark healthy (e.g., 3).
  • Unhealthy threshold: Consecutive failures to mark unhealthy (e.g., 2).

    LB --GET /health--> Server A : 200 OK ✅    -> Healthy
    LB --GET /health--> Server B : timeout ❌   -> Unhealthy after N failures
    LB --GET /health--> Server C : 503 Error ❌ -> Unhealthy
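The threshold logic is a small state machine: a server flips state only after N consecutive successes or failures, which prevents a single flaky probe from flapping the server in and out of rotation. A sketch using the example thresholds above:

```python
# Health-check state machine with hysteresis (example thresholds from above).
HEALTHY_THRESHOLD = 3    # consecutive successes to mark healthy
UNHEALTHY_THRESHOLD = 2  # consecutive failures to mark unhealthy

class BackendHealth:
    def __init__(self):
        self.healthy = True
        self.successes = 0
        self.failures = 0

    def record(self, probe_ok: bool):
        if probe_ok:
            self.successes += 1
            self.failures = 0
            if not self.healthy and self.successes >= HEALTHY_THRESHOLD:
                self.healthy = True
        else:
            self.failures += 1
            self.successes = 0
            if self.healthy and self.failures >= UNHEALTHY_THRESHOLD:
                self.healthy = False

b = BackendHealth()
b.record(False)   # 1 failure: still healthy
b.record(False)   # 2 consecutive failures: marked unhealthy
print(b.healthy)  # False
for _ in range(3):
    b.record(True)   # 3 consecutive successes: healthy again
print(b.healthy)  # True
```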


Session Persistence (Sticky Sessions)

Some applications require that a client's requests consistently go to the same backend server (e.g., shopping cart stored in server memory).

Methods:

  1. Cookie-based: LB inserts a cookie identifying the backend server.

    Set-Cookie: SERVERID=server-a; Path=/
    
  2. IP-based: Use client IP hash (fragile with NAT/proxies).

  3. Application-controlled: Application issues a session token; LB uses it for routing.

Trade-off: Sticky sessions reduce load balancing effectiveness and can cause uneven load distribution. Prefer stateless architecture with external session stores.
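The cookie-based method can be sketched as follows. The `SERVERID` cookie name matches the example above; the round-robin fallback and server names are illustrative assumptions:

```python
import itertools

# Cookie-based stickiness: balance normally on the first request,
# then pin the client to that backend via a cookie.
servers = ["server-a", "server-b", "server-c"]
rr = itertools.cycle(servers)

def route(cookies: dict):
    """Return (backend, cookies_to_set_on_response)."""
    backend = cookies.get("SERVERID")
    if backend in servers:
        return backend, {}                 # sticky: reuse the pinned backend
    backend = next(rr)                     # otherwise balance normally
    return backend, {"SERVERID": backend}  # and pin the client to it

backend, set_cookies = route({})                 # first request: backend chosen and pinned
backend2, _ = route({"SERVERID": backend})       # follow-up request: same backend
print(backend == backend2)  # True
```

If the pinned backend is later removed from `servers`, the lookup falls through and the client is simply rebalanced, which is why in-memory session state is still lost on failover.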


Load Balancer High Availability

The load balancer itself can be a single point of failure. Solutions:

Active-Passive (Failover)

    Clients --> Active LB --> Servers
                    | heartbeat
                Passive LB   (takes over on failure)

  • The passive LB monitors the active LB via heartbeats.
  • On failure, the passive LB takes over the Virtual IP (VIP).

Active-Active

    Clients --DNS RR--> LB 1 --> Servers
    Clients --DNS RR--> LB 2 --> Servers

  • Both LBs handle traffic simultaneously.
  • DNS or upstream routing distributes traffic across LBs.
  • Better resource utilization than active-passive.

Software vs Hardware Load Balancers

| Aspect      | Hardware LB                  | Software LB                |
|-------------|------------------------------|----------------------------|
| Performance | Extremely high (ASICs)       | High (general-purpose CPU) |
| Cost        | Very expensive ($10K-$100K+) | Free or low cost           |
| Flexibility | Limited                      | Highly configurable        |
| Deployment  | Physical appliance           | VM, container, or process  |
| Examples    | F5 BIG-IP, Citrix ADC        | Nginx, HAProxy, Envoy      |
| Scaling     | Buy more hardware            | Add more instances         |

| Technology          | Type            | Key Features                            |
|---------------------|-----------------|-----------------------------------------|
| Nginx               | Software, L7    | Reverse proxy, caching, SSL termination |
| HAProxy             | Software, L4/L7 | High performance, TCP and HTTP          |
| Envoy               | Software, L4/L7 | Service mesh, gRPC, observability       |
| Traefik             | Software, L7    | Auto-discovery, Docker/K8s native       |
| AWS ALB             | Cloud, L7       | Managed, integrates with AWS services   |
| AWS NLB             | Cloud, L4       | Ultra-low latency, static IPs           |
| AWS ELB (Classic)   | Cloud, L4/L7    | Legacy, basic load balancing            |
| Google Cloud LB     | Cloud, L4/L7    | Global, anycast IPs                     |
| Azure Load Balancer | Cloud, L4       | Regional, zone-redundant                |

Global Server Load Balancing (GSLB)

Distributes traffic across data centers in different geographic regions.

    User in Tokyo  --> DNS --> Asia DC    --> Asia Servers
    User in NYC    --> DNS --> US-East DC --> US Servers
    User in London --> DNS --> EU DC      --> EU Servers

Methods:

  • GeoDNS: DNS returns different IPs based on the client's geographic location.
  • Anycast: Multiple data centers advertise the same IP; BGP routing directs traffic to the nearest one.
  • Latency-based routing: DNS resolves to the data center with the lowest measured latency.

Load Balancing Patterns in Practice

Pattern 1: API Gateway + Internal LB

    Internet
        |
    API Gateway (L7 LB + auth + rate limiting)
        |--> User Svc    --> Internal LB --> DB Replicas
        |--> Order Svc   --> Internal LB --> DB Shards
        |--> Payment Svc --> Internal LB --> DB

Pattern 2: Service Mesh (Client-Side LB)

Instead of a centralized LB, each service has a sidecar proxy (e.g., Envoy) that handles load balancing.

    Service A [Envoy Proxy] --> [Envoy Proxy] Service B Instance 1
                            --> [Envoy Proxy] Service B Instance 2
                            --> [Envoy Proxy] Service B Instance 3


Summary

| Concept         | Key Point                                                      |
|-----------------|----------------------------------------------------------------|
| Purpose         | Distribute traffic, improve availability and performance       |
| L4 vs L7        | L4 = fast, simple; L7 = smart, content-aware                   |
| Best algorithm  | Depends on workload; Least Connections is often a good default |
| Health checks   | Essential; combine active and passive for robustness           |
| HA              | Use active-passive or active-active LB pairs                   |
| Sticky sessions | Avoid if possible; use stateless design                        |
| GSLB            | For multi-region deployments                                   |