2. Load Balancing

A load balancer distributes incoming network traffic across multiple servers to ensure no single server bears too much demand. It improves availability, reliability, throughput, and response time of applications.


Why Load Balancing?

Without a load balancer:

    Clients --> Single Server   (overloaded, single point of failure)

With a load balancer:

    Clients --> LB --> Server A
                   --> Server B
                   --> Server C
                   --> Server D

Benefits:

  • High availability: If one server fails, traffic is rerouted to healthy servers.
  • Better throughput: Multiple servers handle requests in parallel.
  • Lower latency: Requests go to the closest or least-loaded server.
  • Flexibility: Add/remove servers without downtime.
  • Security: LB can hide internal server topology from the public internet.

Where Load Balancers Sit

Load balancers can be placed at multiple layers of a system:

    Internet
        |
    DNS Load Balancer    (Layer 1: DNS)
        |
    Global LB            (Layer 2: Edge / CDN)
        |
    L7 / L4 LB           (Layer 3: Clients <-> Web)
        |--> Web Srv 1 | Web Srv 2 | Web Srv 3
        |
    Internal LB          (Layer 4: Web <-> App)
        |--> App Srv 1 | App Srv 2 | App Srv 3
        |
    Database LB          (Layer 5: App <-> DB)
        |--> DB Read 1 | DB Read 2 | DB Primary


Types of Load Balancers

Layer 4 (Transport Layer) Load Balancer

Operates at the TCP/UDP level. Routes traffic based on:

  • Source/destination IP address
  • Source/destination port number
  • Protocol type

Characteristics:

  • Very fast — inspects only packet headers, not content.
  • Cannot make routing decisions based on URL, cookies, or HTTP headers.
  • Low resource overhead.
  • Suitable for non-HTTP protocols (databases, game servers, etc.).

    Client --TCP SYN--> L4 LB --TCP SYN--> Selected Backend

Layer 7 (Application Layer) Load Balancer

Operates at the HTTP/HTTPS level. Routes traffic based on:

  • URL path (/api/* → API servers, /static/* → CDN)
  • HTTP headers (Host, User-Agent, custom headers)
  • Cookies (session affinity)
  • Request body content
  • HTTP method (GET, POST, etc.)

Characteristics:

  • More intelligent routing — content-aware decisions.
  • Can terminate SSL/TLS.
  • Can modify requests/responses (add headers, rewrite URLs).
  • Higher resource overhead than L4.
  • Supports advanced features: A/B testing, canary deployments.

    Client --HTTPS GET /api/users----> L7 LB --HTTP GET /api/users----> API Server Pool
    Client --HTTPS GET /images/logo--> L7 LB --HTTP GET /images/logo--> Static Server Pool

(The LB terminates HTTPS and forwards plain HTTP to the backend pools.)
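Path-based L7 routing boils down to a prefix-match table. Below is a minimal, hypothetical sketch (the route prefixes and pool names are illustrative, not from any real load balancer's configuration):

```python
# L7 path-based routing sketch: first matching prefix wins.
ROUTES = [
    ("/api/", "api-server-pool"),
    ("/images/", "static-server-pool"),
    ("/", "default-pool"),  # catch-all
]

def route(path: str) -> str:
    """Return the backend pool for a request path."""
    for prefix, pool in ROUTES:
        if path.startswith(prefix):
            return pool
    return "default-pool"

print(route("/api/users"))    # api-server-pool
print(route("/images/logo"))  # static-server-pool
```

Real L7 balancers extend the same idea with host, header, and cookie matching, and usually with longest-prefix rather than first-match semantics.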

Comparison

| Feature          | Layer 4 LB                      | Layer 7 LB                   |
|------------------|---------------------------------|------------------------------|
| OSI Layer        | Transport (TCP/UDP)             | Application (HTTP/HTTPS)     |
| Speed            | Very fast                       | Slower (inspects content)    |
| Intelligence     | Basic (IP + port)               | Rich (URL, headers, cookies) |
| SSL Termination  | No                              | Yes                          |
| Content Routing  | No                              | Yes                          |
| Protocol Support | Any TCP/UDP                     | Mainly HTTP/HTTPS            |
| Resource Cost    | Low                             | Higher                       |
| Use Case         | High-throughput, simple routing | Web apps, API gateways       |

Load Balancing Algorithms

1. Round Robin

Requests are distributed to servers sequentially in a circular order.

    Request 1 --> Server A
    Request 2 --> Server B
    Request 3 --> Server C
    Request 4 --> Server A   (cycle repeats)
  • Pros: Simple, fair distribution for homogeneous servers.
  • Cons: Ignores server load; assumes all servers are equal.
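The circular order above maps directly onto a cycling iterator. A minimal sketch:

```python
import itertools

# Round robin: hand requests to servers in a fixed circular order.
servers = ["Server A", "Server B", "Server C"]
rr = itertools.cycle(servers)

# Request 4 wraps around to Server A again.
assignments = [next(rr) for _ in range(4)]
print(assignments)  # ['Server A', 'Server B', 'Server C', 'Server A']
```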

2. Weighted Round Robin

Like Round Robin, but servers get traffic proportional to their assigned weights.

Server A (weight 5): Gets 5 out of every 8 requests
Server B (weight 2): Gets 2 out of every 8 requests
Server C (weight 1): Gets 1 out of every 8 requests
  • Use case: Heterogeneous server fleet (different hardware specs).
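A naive way to implement the weighting is to repeat each server in the rotation proportionally to its weight, using the example weights above. This is a sketch; production balancers typically use a "smooth" variant that interleaves servers instead of sending bursts to one server:

```python
import itertools

# Example weights from above: A=5, B=2, C=1 (8 requests per full cycle).
weights = {"A": 5, "B": 2, "C": 1}

# Expand each server into the rotation according to its weight.
rotation = [server for server, w in weights.items() for _ in range(w)]
wrr = itertools.cycle(rotation)

first_cycle = [next(wrr) for _ in range(8)]
print(first_cycle.count("A"), first_cycle.count("B"), first_cycle.count("C"))  # 5 2 1
```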

3. Least Connections

Routes to the server with the fewest active connections.

Server A: 12 active connections
Server B: 5 active connections    ← New request goes here
Server C: 8 active connections
  • Pros: Adapts to varying request durations.
  • Cons: Requires tracking connection count per server.
  • Best for: Long-lived connections (WebSockets, database connections).
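The selection step is just a minimum over the per-server connection counts the LB already tracks. A sketch using the counts from the example above:

```python
# Least connections: pick the server with the fewest active connections.
active = {"Server A": 12, "Server B": 5, "Server C": 8}

def pick(active_connections: dict) -> str:
    return min(active_connections, key=active_connections.get)

target = pick(active)
print(target)       # Server B
active[target] += 1  # LB increments on dispatch, decrements when the connection closes
```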

4. Weighted Least Connections

Combines least connections with server weights: each server is scored by dividing its active connection count by its weight.

    score = active_connections / weight

The server with the lowest score gets the next request.

5. Least Response Time

Routes to the server with the fastest response time and fewest connections.

  • Pros: Optimizes for user experience.
  • Cons: Requires continuous latency monitoring.

6. IP Hash

The client's IP address is hashed to determine which server receives the request.

  • Pros: Same client always goes to the same server (session persistence).
  • Cons: Uneven distribution if IP addresses are clustered.
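A sketch of the hashing step, using a stable hash rather than Python's built-in `hash()` (which is randomized per process and would break persistence across LB restarts):

```python
import hashlib

servers = ["Server A", "Server B", "Server C"]

def pick(client_ip: str) -> str:
    # Hash the client IP and map the digest onto the server list.
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

# The same client IP always maps to the same server:
print(pick("203.0.113.7") == pick("203.0.113.7"))  # True
```

Note that with plain modulo hashing, adding or removing a server remaps most clients, which is exactly the problem consistent hashing (next) addresses.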

7. Consistent Hashing

A more advanced form of hashing that minimizes redistribution when servers are added or removed. (See: Consistent Hashing)

8. Random

Each request is sent to a randomly selected server.

  • Pros: Simple; with large numbers, approaches even distribution.
  • Cons: No guarantee of fairness in the short term.

9. Resource-Based (Adaptive)

Servers report their current resource utilization (CPU, memory, disk). The LB routes to the server with the most available resources.

  • Pros: Most accurate load distribution.
  • Cons: Complex; requires health reporting agents on servers.

Health Checks

Load balancers must detect unhealthy servers to stop sending traffic to them.

Types of Health Checks

| Type         | Description                               | Example               |
|--------------|-------------------------------------------|-----------------------|
| Passive      | Monitors real traffic for errors          | Track 5xx responses   |
| Active       | Sends periodic probe requests             | GET /health every 10s |
| Deep (L7)    | Checks application logic and dependencies | Verify DB connectivity|
| Shallow (L4) | Checks if the port is open                | TCP SYN to port 80    |

Health Check Parameters

  • Interval: How often to check (e.g., every 10 seconds).
  • Timeout: How long to wait for a response (e.g., 5 seconds).
  • Healthy threshold: Consecutive successes to mark healthy (e.g., 3).
  • Unhealthy threshold: Consecutive failures to mark unhealthy (e.g., 2).

    LB --GET /health--> Server A : 200 OK ✅    -> Healthy
    LB --GET /health--> Server B : timeout ❌   -> Unhealthy after N failures
    LB --GET /health--> Server C : 503 Error ❌ -> Unhealthy
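The threshold logic is a small state machine: a server flips state only after N consecutive successes or failures, which prevents a single flaky probe from flapping the server in and out of rotation. A sketch using the example thresholds above:

```python
# Health-check state machine with hysteresis (example thresholds from above).
HEALTHY_THRESHOLD = 3    # consecutive successes to mark healthy
UNHEALTHY_THRESHOLD = 2  # consecutive failures to mark unhealthy

class BackendHealth:
    def __init__(self):
        self.healthy = True
        self.successes = 0
        self.failures = 0

    def record(self, probe_ok: bool):
        if probe_ok:
            self.successes += 1
            self.failures = 0
            if not self.healthy and self.successes >= HEALTHY_THRESHOLD:
                self.healthy = True
        else:
            self.failures += 1
            self.successes = 0
            if self.healthy and self.failures >= UNHEALTHY_THRESHOLD:
                self.healthy = False

b = BackendHealth()
b.record(False)   # 1 failure: still healthy
b.record(False)   # 2 consecutive failures: marked unhealthy
print(b.healthy)  # False
for _ in range(3):
    b.record(True)   # 3 consecutive successes: healthy again
print(b.healthy)  # True
```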


Session Persistence (Sticky Sessions)

Some applications require that a client's requests consistently go to the same backend server (e.g., shopping cart stored in server memory).

Methods:

  1. Cookie-based: LB inserts a cookie identifying the backend server.

    Set-Cookie: SERVERID=server-a; Path=/
    
  2. IP-based: Use client IP hash (fragile with NAT/proxies).

  3. Application-controlled: Application issues a session token; LB uses it for routing.

Trade-off: Sticky sessions reduce load balancing effectiveness and can cause uneven load distribution. Prefer stateless architecture with external session stores.
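The cookie-based method can be sketched as follows. The `SERVERID` cookie name matches the example above; the round-robin fallback and server names are illustrative assumptions:

```python
import itertools

# Cookie-based stickiness: balance normally on the first request,
# then pin the client to that backend via a cookie.
servers = ["server-a", "server-b", "server-c"]
rr = itertools.cycle(servers)

def route(cookies: dict):
    """Return (backend, cookies_to_set_on_response)."""
    backend = cookies.get("SERVERID")
    if backend in servers:
        return backend, {}                 # sticky: reuse the pinned backend
    backend = next(rr)                     # otherwise balance normally
    return backend, {"SERVERID": backend}  # and pin the client to it

backend, set_cookies = route({})                 # first request: backend chosen and pinned
backend2, _ = route({"SERVERID": backend})       # follow-up request: same backend
print(backend == backend2)  # True
```

If the pinned backend is later removed from `servers`, the lookup falls through and the client is simply rebalanced, which is why in-memory session state is still lost on failover.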


Load Balancer High Availability

The load balancer itself can be a single point of failure. Solutions:

Active-Passive (Failover)

    Clients --> Active LB --> Servers
                    | heartbeat
                Passive LB   (takes over on failure)

  • The passive LB monitors the active LB via heartbeats.
  • On failure, the passive LB takes over the Virtual IP (VIP).

Active-Active

    Clients --DNS RR--> LB 1 --> Servers
    Clients --DNS RR--> LB 2 --> Servers

  • Both LBs handle traffic simultaneously.
  • DNS or upstream routing distributes traffic across LBs.
  • Better resource utilization than active-passive.

Software vs Hardware Load Balancers

| Aspect      | Hardware LB                  | Software LB                |
|-------------|------------------------------|----------------------------|
| Performance | Extremely high (ASICs)       | High (general-purpose CPU) |
| Cost        | Very expensive ($10K-$100K+) | Free or low cost           |
| Flexibility | Limited                      | Highly configurable        |
| Deployment  | Physical appliance           | VM, container, or process  |
| Examples    | F5 BIG-IP, Citrix ADC        | Nginx, HAProxy, Envoy      |
| Scaling     | Buy more hardware            | Add more instances         |

| Technology          | Type            | Key Features                            |
|---------------------|-----------------|-----------------------------------------|
| Nginx               | Software, L7    | Reverse proxy, caching, SSL termination |
| HAProxy             | Software, L4/L7 | High performance, TCP and HTTP          |
| Envoy               | Software, L4/L7 | Service mesh, gRPC, observability       |
| Traefik             | Software, L7    | Auto-discovery, Docker/K8s native       |
| AWS ALB             | Cloud, L7       | Managed, integrates with AWS services   |
| AWS NLB             | Cloud, L4       | Ultra-low latency, static IPs           |
| AWS ELB (Classic)   | Cloud, L4/L7    | Legacy, basic load balancing            |
| Google Cloud LB     | Cloud, L4/L7    | Global, anycast IPs                     |
| Azure Load Balancer | Cloud, L4       | Regional, zone-redundant                |

Global Server Load Balancing (GSLB)

Distributes traffic across data centers in different geographic regions.

    User in Tokyo  --> DNS --> Asia DC    --> Asia Servers
    User in NYC    --> DNS --> US-East DC --> US Servers
    User in London --> DNS --> EU DC      --> EU Servers

Methods:

  • GeoDNS: DNS returns different IPs based on the client's geographic location.
  • Anycast: Multiple data centers advertise the same IP; BGP routing directs traffic to the nearest one.
  • Latency-based routing: DNS resolves to the data center with the lowest measured latency.

Load Balancing Patterns in Practice

Pattern 1: API Gateway + Internal LB

    Internet
        |
    API Gateway (L7 LB + auth + rate limiting)
        |--> User Svc    --> Internal LB --> DB Replicas
        |--> Order Svc   --> Internal LB --> DB Shards
        |--> Payment Svc --> Internal LB --> DB

Pattern 2: Service Mesh (Client-Side LB)

Instead of a centralized LB, each service has a sidecar proxy (e.g., Envoy) that handles load balancing.

    Service A [Envoy Proxy] --> [Envoy Proxy] Service B Instance 1
                            --> [Envoy Proxy] Service B Instance 2
                            --> [Envoy Proxy] Service B Instance 3


Summary

| Concept         | Key Point                                                      |
|-----------------|----------------------------------------------------------------|
| Purpose         | Distribute traffic, improve availability and performance       |
| L4 vs L7        | L4 = fast, simple; L7 = smart, content-aware                   |
| Best algorithm  | Depends on workload; Least Connections is often a good default |
| Health checks   | Essential; combine active and passive for robustness           |
| HA              | Use active-passive or active-active LB pairs                   |
| Sticky sessions | Avoid if possible; use stateless design                        |
| GSLB            | For multi-region deployments                                   |