
Load Balancing: Distributing Traffic Across Servers

Load balancing is the process of distributing network traffic across multiple servers to ensure no single server becomes overwhelmed, improving performance, availability, and scalability. Understanding load balancing and its relationship with IP addresses is essential for building reliable, high-performance applications. This comprehensive guide explains load balancing concepts, algorithms, and implementation.

What is Load Balancing?

Load balancing distributes incoming network traffic across multiple backend servers (also called a server pool or server farm) to optimize resource utilization, maximize throughput, minimize response time, and avoid overloading any single server.

Why Load Balancing?

Without load balancing:

All traffic → Single server
Problems:
- Single point of failure
- Limited capacity
- Poor performance under load
- No redundancy
- Downtime during maintenance

With load balancing:

Traffic → Load balancer → Multiple servers
Benefits:
- High availability
- Horizontal scaling
- Better performance
- Redundancy
- Zero-downtime deployments

Example:

Single server:
- Capacity: 1,000 requests/second
- Failure: Complete outage
- Maintenance: Downtime required

Load balanced (3 servers):
- Capacity: 3,000 requests/second
- Failure: 2 servers continue
- Maintenance: Rolling updates, no downtime

Load Balancing and IP Addresses

Virtual IP (VIP)

Concept:

Virtual IP: Single IP address for load balancer
Backend servers: Multiple private IPs
Clients: Connect to VIP
Load balancer: Distributes to backends

Example:
VIP: 203.0.113.100
Backend 1: 10.0.1.10
Backend 2: 10.0.1.11
Backend 3: 10.0.1.12

DNS configuration:

www.example.com.  A  203.0.113.100

User connects to: 203.0.113.100
Load balancer routes to: 10.0.1.10, 10.0.1.11, or 10.0.1.12
Transparent: User doesn't see backend IPs

NAT and IP Translation

Source NAT (SNAT):

Client IP: 198.51.100.50
Load balancer: Translates to its IP
Backend sees: Load balancer IP (10.0.1.1)
Response: Returns to load balancer
Load balancer: Sends to client

Destination NAT (DNAT):

Client connects: 203.0.113.100 (VIP)
Load balancer: Rewrites destination to a backend IP
Backend sees: Original client IP as source
Backend responds: Through the load balancer
Or: Directly to the client (DSR)

Direct Server Return (DSR):

Request: Client → Load balancer → Backend
Response: Backend → Client (direct)
Benefit: Load balancer not bottleneck for responses
Requirement: Backend has the VIP configured on a loopback interface (accepts traffic addressed to the VIP without answering ARP for it)

IP Hash Load Balancing

Concept:

Algorithm: Hash client IP
Result: Consistent server selection
Same client: Always same server
Benefit: Session persistence

Example:

Client IP: 198.51.100.50
Hash: hash(198.51.100.50) % 3 = 1
Server: Backend 2 (always)

Client IP: 198.51.100.51
Hash: hash(198.51.100.51) % 3 = 0
Server: Backend 1 (always)

Load Balancing Algorithms

Round Robin

How it works:

Request 1 → Server 1
Request 2 → Server 2
Request 3 → Server 3
Request 4 → Server 1 (cycle repeats)

Characteristics:

Simple: Easy to implement
Fair: Equal distribution
Stateless: No session tracking
Best for: Similar server capacity, stateless apps

Weighted Round Robin:

Server 1 (weight 3): Gets 3 requests
Server 2 (weight 2): Gets 2 requests
Server 3 (weight 1): Gets 1 request
Cycle: 1, 1, 1, 2, 2, 3, repeat

Use: Different server capacities
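The weighted cycle above can be sketched in Python (a minimal illustration of the rotation, not a production scheduler):

```python
from itertools import cycle

def weighted_round_robin(servers):
    """Yield server names in proportion to their weights.

    servers: list of (name, weight) tuples.
    """
    expanded = []
    for name, weight in servers:
        expanded.extend([name] * weight)  # repeat each server by its weight
    return cycle(expanded)                # then rotate through the expanded list

pool = weighted_round_robin([("server1", 3), ("server2", 2), ("server3", 1)])
order = [next(pool) for _ in range(6)]
# One full cycle: server1 three times, server2 twice, server3 once
```

With all weights equal to 1 this degenerates to plain round robin.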

Least Connections

How it works:

Server 1: 10 active connections
Server 2: 15 active connections
Server 3: 8 active connections
New request → Server 3 (least connections)

Characteristics:

Dynamic: Adapts to load
Fair: Balances active connections
Best for: Long-lived connections
Example: WebSockets, database connections

Weighted Least Connections:

Server 1: 10 connections, weight 2 → ratio 5
Server 2: 15 connections, weight 3 → ratio 5
Server 3: 8 connections, weight 1 → ratio 8
New request → Server 1 or 2 (lowest ratio)
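Both selection rules above reduce to a minimum over the pool, which a short Python sketch makes concrete (connection counts here are illustrative):

```python
def least_connections(servers):
    """Pick the server with the fewest active connections.

    servers: dict mapping server name -> active connection count.
    """
    return min(servers, key=servers.get)

def weighted_least_connections(servers, weights):
    """Pick the server with the lowest connections/weight ratio."""
    return min(servers, key=lambda s: servers[s] / weights[s])

conns = {"server1": 10, "server2": 15, "server3": 8}
least_connections(conns)  # server3 (8 connections)

weights = {"server1": 2, "server2": 3, "server3": 1}
# server1 and server2 tie at ratio 5; min() returns the first one seen
weighted_least_connections(conns, weights)
```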

IP Hash

How it works:

Hash client IP: hash(client_ip) % server_count
Result: Consistent server selection
Same client: Always same server

Characteristics:

Persistent: Client affinity
Stateful: Session persistence
Limitation: Uneven distribution possible
Best for: Session-based applications

Example:

import hashlib

def ip_hash(client_ip, server_count):
    # Use a stable hash: Python's built-in hash() is randomized per process
    # for strings, so it would give different results across restarts.
    digest = hashlib.md5(client_ip.encode("ascii")).hexdigest()
    return int(digest, 16) % server_count

# Client 198.51.100.50 always maps to the same server index
server = ip_hash("198.51.100.50", 3)

Least Response Time

How it works:

Server 1: 50ms average response time
Server 2: 75ms average response time
Server 3: 60ms average response time
New request → Server 1 (fastest)

Characteristics:

Performance-based: Routes to fastest server
Dynamic: Adapts to performance
Monitoring: Requires health checks
Best for: Performance-critical applications
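A minimal sketch of this idea, assuming a moving average of observed latencies per server (real implementations usually also factor in active connections and health status):

```python
class LeastResponseTime:
    """Route to the server with the lowest smoothed response time."""

    def __init__(self, servers, alpha=0.3):
        self.alpha = alpha                    # smoothing factor for the moving average
        self.avg = {s: 0.0 for s in servers}  # EWMA of observed latency in ms
                                              # (unsampled servers stay at 0.0 and
                                              # would be tried first)

    def record(self, server, latency_ms):
        prev = self.avg[server]
        self.avg[server] = latency_ms if prev == 0 else (
            self.alpha * latency_ms + (1 - self.alpha) * prev)

    def pick(self):
        return min(self.avg, key=self.avg.get)

lb = LeastResponseTime(["server1", "server2", "server3"])
lb.record("server1", 50)
lb.record("server2", 75)
lb.record("server3", 60)
lb.pick()  # server1, the fastest so far
```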

Random

How it works:

Randomly select server
No pattern
Simple implementation

Characteristics:

Simple: Minimal overhead
Fair: Over time, even distribution
Stateless: No tracking needed
Best for: Simple use cases

URL Hash

How it works:

Hash URL path: hash(url_path) % server_count
Same URL: Always same server
Benefit: Cache efficiency

Example:

/images/logo.png → Server 1 (always)
/images/banner.jpg → Server 2 (always)
/css/style.css → Server 3 (always)

Benefit: Server caches specific content
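The same hashing pattern works for URLs; a short sketch (server indices here depend on the hash, so the specific assignments above are illustrative):

```python
import hashlib

def url_hash(url_path, server_count):
    """Map a URL path to a server index; the same path always hits the same server."""
    digest = hashlib.md5(url_path.encode("utf-8")).hexdigest()
    return int(digest, 16) % server_count

# Each path maps to one fixed server, so that server's cache
# accumulates exactly the content it serves.
url_hash("/images/logo.png", 3)
url_hash("/images/logo.png", 3)  # same index every time
```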

Types of Load Balancers

Layer 4 (Transport Layer)

Characteristics:

Layer: TCP/UDP
Decision: Based on IP, port
Fast: Minimal processing
Protocol: TCP, UDP
Content: Not inspected

How it works:

Client connects: 203.0.113.100:80
Load balancer: Checks IP and port only
Routes to: Backend server
Connection: Maintained

Advantages:

Fast: Low latency
Simple: Easy configuration
Protocol agnostic: Works with any TCP/UDP
Efficient: Minimal overhead

Disadvantages:

No content awareness: Can't route by URL
Limited: Basic load balancing only
Session: Requires IP hash or sticky sessions

Use cases:

Database connections
Generic TCP services
High-performance requirements
Protocol-agnostic load balancing

Layer 7 (Application Layer)

Characteristics:

Layer: HTTP/HTTPS
Decision: Based on content (URL, headers, cookies)
Flexible: Advanced routing
Protocol: HTTP-specific
Content: Inspected

How it works:

Client requests: https://example.com/api/users
Load balancer: Inspects HTTP request
Routing decision: Based on /api/ path
Routes to: API server pool
Different path: Different pool

Routing examples:

URL-based:
/api/* → API servers
/static/* → Static content servers
/admin/* → Admin servers

Header-based:
User-Agent: Mobile → Mobile-optimized servers
Accept-Language: ja → Japanese servers

Cookie-based:
session_id → Sticky session to same server
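The URL-based rules above can be sketched in Nginx configuration (pool names and backend IPs are illustrative):

```nginx
upstream api_servers    { server 10.0.2.10:80; server 10.0.2.11:80; }
upstream static_servers { server 10.0.3.10:80; }

server {
    listen 80;

    # Route by URL path to different backend pools
    location /api/    { proxy_pass http://api_servers; }
    location /static/ { proxy_pass http://static_servers; }
    location /        { proxy_pass http://api_servers; }
}
```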

Advantages:

Content-aware: Route by URL, headers
Flexible: Advanced routing rules
SSL termination: Decrypt at load balancer
Caching: Can cache responses
WAF: Web application firewall

Disadvantages:

Slower: More processing
Complex: More configuration
HTTP-specific: Only for HTTP/HTTPS
Overhead: Higher resource usage

Use cases:

Web applications
Microservices routing
API gateways
Content-based routing
SSL termination

Load Balancer Solutions

Hardware Load Balancers

F5 BIG-IP:

Type: Hardware appliance
Performance: Very high
Features: Advanced (WAF, SSL, caching)
Cost: Expensive ($10,000+)
Use: Enterprise, high-traffic

Citrix ADC (NetScaler):

Type: Hardware/software
Performance: High
Features: Application delivery
Cost: Expensive
Use: Enterprise

Characteristics:

Performance: Dedicated hardware
Reliability: High availability
Cost: High upfront cost
Maintenance: Vendor support
Scalability: Limited by hardware

Software Load Balancers

HAProxy:

Type: Open source
Layer: 4 and 7
Performance: Very high
Cost: Free
Configuration: Text-based

Example configuration:

frontend http_front
    bind *:80
    default_backend http_back

backend http_back
    balance roundrobin
    server server1 10.0.1.10:80 check
    server server2 10.0.1.11:80 check
    server server3 10.0.1.12:80 check

Nginx:

Type: Open source
Layer: 7 (HTTP)
Performance: High
Cost: Free (Plus version paid)
Features: Reverse proxy, caching

Example configuration:

upstream backend {
    least_conn;
    server 10.0.1.10:80;
    server 10.0.1.11:80;
    server 10.0.1.12:80;
}

server {
    listen 80;
    location / {
        proxy_pass http://backend;
    }
}

Apache mod_proxy_balancer:

Type: Apache module
Layer: 7
Performance: Moderate
Cost: Free
Integration: Apache ecosystem

Cloud Load Balancers

AWS Elastic Load Balancing:

Types:
- Application Load Balancer (Layer 7)
- Network Load Balancer (Layer 4)
- Classic Load Balancer (Legacy)

Features:
- Auto-scaling integration
- Health checks
- SSL termination
- Multiple AZs

Pricing: Pay per hour + data processed

Google Cloud Load Balancing:

Types:
- HTTP(S) Load Balancing (Layer 7)
- TCP/UDP Load Balancing (Layer 4)
- Internal Load Balancing

Features:
- Global load balancing
- Anycast IP
- Auto-scaling
- CDN integration

Pricing: Pay per hour + data processed

Azure Load Balancer:

Types:
- Application Gateway (Layer 7)
- Load Balancer (Layer 4)

Features:
- Zone redundancy
- Health probes
- Auto-scaling
- Integration with Azure services

Pricing: Pay per hour + data processed

Cloudflare Load Balancing:

Type: DNS-based + Anycast
Layer: 7
Features:
- Global load balancing
- Health checks
- Geo-steering
- Failover

Pricing: $5/month per origin

Health Checks and Monitoring

Health Check Types

TCP check:

Method: Connect to port
Success: Connection established
Failure: Connection refused/timeout
Fast: Minimal overhead
Limited: Only checks port open

HTTP check:

Method: HTTP GET request
Success: 200 OK response
Failure: Non-200 or timeout
Flexible: Can check specific URL
Content: Can verify response content

Custom check:

Method: Application-specific
Success: Custom criteria
Example: Database query, API call
Thorough: Checks actual functionality

Health Check Configuration

HAProxy:

backend http_back
    option httpchk GET /health
    server server1 10.0.1.10:80 check inter 2000 rise 2 fall 3
    server server2 10.0.1.11:80 check inter 2000 rise 2 fall 3

# inter: Check interval (2 seconds)
# rise: Healthy after 2 successful checks
# fall: Unhealthy after 3 failed checks
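The rise/fall behavior configured above can be modeled as a small state machine (an illustration of the thresholds, not HAProxy's actual implementation):

```python
class HealthTracker:
    """Apply rise/fall thresholds to a stream of health check results.

    rise: consecutive successes needed to mark a server healthy again.
    fall: consecutive failures needed to mark it unhealthy.
    """

    def __init__(self, rise=2, fall=3, healthy=True):
        self.rise, self.fall = rise, fall
        self.healthy = healthy
        self.streak = 0  # consecutive results contradicting the current state

    def observe(self, success):
        if success == self.healthy:
            self.streak = 0          # result matches current state: reset streak
            return self.healthy
        self.streak += 1
        threshold = self.rise if not self.healthy else self.fall
        if self.streak >= threshold:
            self.healthy = not self.healthy
            self.streak = 0
        return self.healthy

t = HealthTracker(rise=2, fall=3)
t.observe(False)  # 1 failure: still healthy
t.observe(False)  # 2 failures: still healthy
t.observe(False)  # 3 failures = fall: now unhealthy
```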

Nginx:

upstream backend {
    server 10.0.1.10:80 max_fails=3 fail_timeout=30s;
    server 10.0.1.11:80 max_fails=3 fail_timeout=30s;
}

# max_fails: Mark unhealthy after 3 failures
# fail_timeout: Retry after 30 seconds

AWS ALB:

Health check path: /health
Interval: 30 seconds
Timeout: 5 seconds
Healthy threshold: 2
Unhealthy threshold: 2
Success codes: 200

Monitoring Metrics

Key metrics:

Request rate: Requests per second
Response time: Average latency
Error rate: 4xx, 5xx errors
Connection count: Active connections
Backend health: Healthy/unhealthy servers
Traffic distribution: Per-server traffic

Alerting:

High error rate: >5% errors
Slow response: >1 second average
Backend down: Server unhealthy
Uneven distribution: Imbalanced load
Capacity: >80% utilization

Session Persistence (Sticky Sessions)

Why Needed

Stateful applications:

Problem: Session data on specific server
Load balancer: May route to different server
Result: Session lost, user logged out
Solution: Sticky sessions

Implementation Methods

Cookie-based:

Load balancer: Sets cookie with server ID
Client: Sends cookie with requests
Load balancer: Routes to same server
Example: SERVERID=server1

IP-based:

Method: IP hash algorithm
Same IP: Always same server
Limitation: NAT, proxies

Session ID:

Application: Generates session ID
Load balancer: Routes by session ID
Consistent: Same session, same server
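The cookie-based method can be sketched in Python (a toy model where a dict stands in for parsed request cookies; the SERVERID name mirrors the example above):

```python
import itertools

class StickyBalancer:
    """Route by SERVERID cookie when present; otherwise round-robin
    and record the assignment for the client to send back."""

    def __init__(self, servers):
        self.servers = servers
        self.rr = itertools.cycle(servers)

    def route(self, cookies):
        server = cookies.get("SERVERID")
        if server in self.servers:
            return server, cookies           # existing affinity: same server
        server = next(self.rr)               # new client: next server in rotation
        cookies = {**cookies, "SERVERID": server}
        return server, cookies               # caller sets the cookie on the response

lb = StickyBalancer(["server1", "server2"])
server, cookies = lb.route({})               # first request: assigned a server
again, _ = lb.route(cookies)                 # later requests: same server
```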

Configuration Examples

HAProxy:

backend http_back
    balance roundrobin
    cookie SERVERID insert indirect nocache
    server server1 10.0.1.10:80 cookie server1 check
    server server2 10.0.1.11:80 cookie server2 check

Nginx:

upstream backend {
    ip_hash;
    server 10.0.1.10:80;
    server 10.0.1.11:80;
}

AWS ALB:

Target group settings:
Stickiness: Enabled
Duration: 1 day
Cookie name: AWSALB (automatic)

Drawbacks

Limitations:

Uneven distribution: Some servers overloaded
Scalability: Harder to add/remove servers
Failover: Session lost if server fails
Better: Stateless applications with shared session storage

Alternatives:

Shared session storage: Redis, Memcached
Database sessions: Centralized storage
JWT tokens: Stateless authentication
Sticky sessions: Last resort

Best Practices

Design

1. Use stateless applications:

Session storage: External (Redis, database)
Authentication: JWT tokens
State: Client-side or shared storage
Benefit: Any server can handle any request

2. Health checks:

Implement: /health endpoint
Check: Application functionality
Fast: <100ms response
Comprehensive: Database, dependencies

3. Graceful degradation:

Partial failure: Continue with remaining servers
Circuit breaker: Stop sending to failed servers
Retry: With exponential backoff
Fallback: Default responses
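Retry with exponential backoff is usually combined with jitter so that many clients do not retry in lockstep; a minimal sketch of one common variant ("full jitter"):

```python
import random

def backoff_delays(retries, base=0.5, cap=30.0):
    """Return the delay (seconds) to sleep before each retry attempt.

    Each attempt doubles the ceiling (capped), and the actual delay is
    drawn uniformly below it so simultaneous clients spread out.
    """
    delays = []
    for attempt in range(retries):
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(random.uniform(0, ceiling))  # full jitter
    return delays

backoff_delays(4)  # delays bounded by 0.5, 1.0, 2.0, 4.0 seconds
```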

Configuration

1. Choose right algorithm:

Stateless apps: Round robin, least connections
Stateful apps: IP hash, sticky sessions
Performance-critical: Least response time
Cache efficiency: URL hash

2. Set appropriate timeouts:

Connection timeout: 5-10 seconds
Read timeout: 30-60 seconds
Health check: 2-5 seconds
Keep-alive: 60 seconds

3. Monitor and alert:

Metrics: Request rate, errors, latency
Alerts: Threshold-based
Dashboards: Real-time visibility
Logs: Centralized logging

Security

1. SSL termination:

Load balancer: Decrypt SSL
Backend: HTTP (internal network)
Benefit: Reduced backend load
Certificate: Managed at load balancer

2. DDoS protection:

Rate limiting: Requests per IP
Connection limits: Max connections
Geo-blocking: Block countries
WAF: Web application firewall

3. Access control:

Whitelist: Allowed IPs
Blacklist: Blocked IPs
Authentication: For admin endpoints
Firewall: Restrict backend access

Troubleshooting

Common Issues

Uneven distribution:

Cause: Sticky sessions, long-lived connections
Solution: Adjust algorithm, connection draining
Monitor: Per-server traffic

Backend unavailable:

Symptom: 502/503 errors
Cause: Backend down, health check failing
Solution: Check backend health, logs
Fix: Restart backend, fix application

Slow response:

Cause: Backend overloaded, network latency
Solution: Add servers, optimize application
Monitor: Response time per server

Session loss:

Cause: No sticky sessions, server failure
Solution: Implement sticky sessions or shared storage
Better: Use stateless design

Conclusion

Load balancing is essential for building scalable, highly available applications. By distributing traffic across multiple servers, load balancers improve performance, provide redundancy, and enable horizontal scaling. Understanding load balancing algorithms, types, and best practices ensures optimal application performance and reliability.



Key takeaways:

- Load balancing: Distributes traffic across servers
- VIP: Single IP for multiple backend servers
- Algorithms: Round robin, least connections, IP hash
- Layer 4: Fast, IP/port-based
- Layer 7: Flexible, content-based routing
- Health checks: Monitor backend server health
- Sticky sessions: Route same client to same server
- Solutions: HAProxy, Nginx, cloud load balancers
- Stateless: Preferred application design
- Monitoring: Essential for reliability
- SSL termination: Offload to load balancer
- High availability: Eliminates single point of failure

Implement load balancing to distribute traffic across multiple servers, improving performance and availability. Use Layer 7 load balancers (HAProxy, Nginx, AWS ALB) for HTTP applications with content-based routing, or Layer 4 for high-performance TCP/UDP load balancing. Design stateless applications with external session storage (Redis) rather than relying on sticky sessions. Configure health checks to automatically remove failed servers, and monitor metrics to ensure even distribution and optimal performance.
