What is a load balancer? Plain-English guide to scaling traffic
When one server can't handle the traffic, you put a load balancer in front of many. Here's how load balancers work, what algorithms they use, and where they fit alongside CDNs and DNS.
Every popular website you visit is, behind the scenes, a fleet of servers. The trick that makes them look like one server with one IP and one URL is a load balancer. It sits in front of the fleet, takes incoming requests, and decides which backend server handles each one.
Without load balancers, the modern internet doesn't exist. With them, sites scale to billions of users on commodity hardware.
The two-line summary
A load balancer is a piece of software (or hardware) that distributes incoming traffic across multiple backend servers. The goals: keep any one server from being overloaded, route around servers that have failed, and let you scale by adding more servers without changing the public-facing endpoint.
That's it. The interesting bits are the algorithms, the failure modes, and how it interacts with the rest of the stack.
Why you need one
Picture a website serving 1,000 requests per second. One reasonably sized server can usually handle that. Now your traffic doubles. Then doubles again. At 4,000 requests per second you need 4 servers — but users have one URL. How do you spread requests across those 4 servers?
Three options:
- DNS round-robin. Return different server IPs in DNS responses. Simple, but DNS caching breaks the load distribution, and unhealthy servers keep getting traffic until DNS records propagate (minutes to hours).
- Client-side load balancing. Have the client pick a server. Requires every client to be aware of every server. Doesn't scale operationally.
- A load balancer. Single public IP, internally routes to backend pool. The standard.
Option 3 wins for almost everything.
How load balancers route requests
There are several routing algorithms, each with tradeoffs.
Round-robin
Simplest. Cycle through backends one at a time: request 1 → server A, request 2 → server B, request 3 → server C, request 4 → server A again. Works for stateless services where every server is identical.
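Round-robin is simple enough to sketch in a few lines. This is an illustrative toy, not a real LB — the backend names are hypothetical:

```python
from itertools import cycle

# Hypothetical backend pool for illustration.
backends = ["server-a", "server-b", "server-c"]
rotation = cycle(backends)

def pick_backend():
    """Return the next backend in rotation: A, B, C, A, B, C, ..."""
    return next(rotation)
```

Real implementations also have to handle backends joining and leaving the pool mid-rotation, which is where this toy version stops.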
Least connections
Send each new request to the backend currently handling the fewest open connections. Better when requests have variable durations.
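The selection itself is just a minimum over the pool. A minimal sketch, assuming the LB tracks an open-connection count per backend (the counts here are made up):

```python
# Hypothetical snapshot of open connections per backend.
open_conns = {"server-a": 3, "server-b": 1, "server-c": 2}

def pick_least_connections(conns):
    """Pick the backend with the fewest open connections."""
    return min(conns, key=conns.get)
```

With the counts above, a new request goes to `server-b` — even though round-robin would have sent it elsewhere.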
Weighted round-robin / weighted least connections
Same as above, but with weights. Server A is twice as powerful → it gets twice the traffic share.
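One naive way to sketch weighted round-robin is to repeat each backend in the schedule proportionally to its weight (real LBs like nginx use a smoother interleaving, but the traffic share works out the same). The weights here are hypothetical:

```python
from itertools import cycle

# Hypothetical weights: server-a is twice as powerful as server-b.
weights = {"server-a": 2, "server-b": 1}

# Expand each backend by its weight, then cycle through the schedule.
schedule = [name for name, w in weights.items() for _ in range(w)]
wrr = cycle(schedule)
```

Over any three consecutive requests, `server-a` gets two and `server-b` gets one — twice the share, as the weights say.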
IP hash / consistent hashing
Hash the client's IP and pick a backend deterministically. Same client always hits the same backend, useful for session affinity. Consistent hashing minimizes disruption when backends are added or removed.
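A minimal hash-ring sketch shows why this works: each backend gets many positions on a ring, and a key maps to the first position clockwise from its hash. Adding or removing a backend only remaps the keys nearest its positions. (The replica count and node names are arbitrary choices for illustration.)

```python
import bisect
import hashlib

class HashRing:
    """Toy consistent-hash ring: each node owns many points on the ring."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self.ring = {}          # ring position -> node
        self.sorted_keys = []   # sorted ring positions
        for node in nodes:
            self.add(node)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node):
        # Place `replicas` virtual points for this node on the ring.
        for i in range(self.replicas):
            pos = self._hash(f"{node}#{i}")
            self.ring[pos] = node
            bisect.insort(self.sorted_keys, pos)

    def get(self, key):
        # Walk clockwise from the key's hash to the next node point.
        pos = self._hash(key)
        idx = bisect.bisect(self.sorted_keys, pos) % len(self.sorted_keys)
        return self.ring[self.sorted_keys[idx]]
```

The same client IP always maps to the same backend, which is the session-affinity property the article describes.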
Layer 7 routing
Inspect the request itself — URL path, headers, cookies — and route based on that. /api/* to the API fleet, /static/* to a different cache, /admin/* to the admin servers. This is what powers microservice architectures.
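The path-based routing described above can be sketched as a longest-prefix match over a routing table. The pool names and prefixes are hypothetical:

```python
# Hypothetical routing table: URL-path prefix -> backend pool.
routes = {
    "/api/":    "api-fleet",
    "/static/": "static-cache",
    "/admin/":  "admin-servers",
}

def route(path, default="web-fleet"):
    """Return the pool for the longest matching path prefix."""
    for prefix in sorted(routes, key=len, reverse=True):
        if path.startswith(prefix):
            return routes[prefix]
    return default
```

Real L7 proxies match on host and headers too, but the core idea is the same lookup.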
Layer 4 vs Layer 7
Two architectural styles:
Layer 4 (transport-level)
Operates at the TCP/UDP level. Doesn't look inside the payload — just balances connections (or flows, for UDP). Fast, simple, protocol-agnostic. Examples: HAProxy in TCP mode, AWS Network Load Balancer, MetalLB on Kubernetes.
Layer 7 (application-level)
Speaks HTTP. Reads requests, can modify them, can route on path/host/headers. Slower than L4 (because it's doing more work per request) but enormously more flexible. Examples: nginx, HAProxy in HTTP mode, AWS Application Load Balancer, Cloudflare's load balancing, Envoy.
For most web services, L7 is the right choice. For raw TCP services (databases, message queues), L4.
What load balancers do besides routing
Modern load balancers do far more than pick a backend.
Health checking
The LB periodically probes each backend ("is /healthz returning 200?"). Backends that fail get removed from the pool. When they recover, they're re-added. Users never see traffic go to a sick server.
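Conceptually, the probe results feed a filter: only backends whose last check passed receive traffic. A minimal sketch (the probe results here are hard-coded stand-ins for real `/healthz` responses):

```python
# Hypothetical latest /healthz results per backend (True = returned 200).
probe_results = {"server-a": True, "server-b": False, "server-c": True}

def active_backends(results):
    """Return only the backends whose last health probe passed."""
    return [name for name, healthy in results.items() if healthy]
```

The routing algorithm then runs over `active_backends(...)` instead of the full pool, so a sick server simply disappears from rotation.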
TLS termination
The LB handles HTTPS: it terminates TLS and forwards plain HTTP to the backend. This centralizes certificate management, so backends don't deal with TLS at all (or traffic to the backend is re-encrypted, often with mutual TLS, for defense in depth).
Caching
Some LBs (or LB-CDN hybrids) cache responses, especially for static content. Reduces backend load, speeds up clients.
Connection draining
When you remove a backend (deployment, scale-down), the LB stops sending it new requests but lets existing connections finish. Users in mid-request don't get cut off.
Sticky sessions
Some apps need the same user to stick to the same backend (in-memory session state, in-memory cache). The LB can pin via cookie or hash.
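Hash-based pinning can be sketched in a few lines: hash a session identifier (e.g. from a cookie) to a stable backend index. The backend names and session IDs are hypothetical:

```python
import hashlib

# Hypothetical backend pool.
backends = ["server-a", "server-b", "server-c"]

def pick_sticky(session_id):
    """Pin a session to one backend by hashing its session ID."""
    digest = int(hashlib.sha256(session_id.encode()).hexdigest(), 16)
    return backends[digest % len(backends)]
```

Note the trap the article describes later: if the pool changes, `len(backends)` changes, and sessions get reshuffled — which is why cookie-based pinning or consistent hashing is used in practice.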
Rate limiting and DDoS protection
Block clients that exceed thresholds. Modern LBs increasingly include WAF capabilities.
Where load balancers fit in the stack
A typical modern web stack:
[User] → [DNS] → [CDN edge] → [Origin LB] → [Origin servers] → [Database]
Each layer can include load balancing of its own:
- DNS-level load balancing distributes globally across regions (often via anycast — see our anycast post for the details).
- CDN edge load-balances across edge servers within a data center.
- Origin LB balances across application servers in your fleet.
- Service mesh (in microservice architectures) balances internal RPCs.
- Database proxy (PgBouncer, ProxySQL) balances connections.
Big sites have load balancers at every level.
Common load balancer types
| Type | Examples | Best for |
|---|---|---|
| Hardware appliances | F5 Big-IP, Citrix NetScaler | Large enterprises with budget |
| Cloud-managed | AWS ALB/NLB, GCP Load Balancing, Azure Front Door | Cloud-native deployments |
| Software / open source | nginx, HAProxy, Caddy, Traefik | Self-hosted, custom configs |
| CDN-integrated | Cloudflare LB, Fastly LB | Multi-region with anycast routing |
| Service mesh | Envoy (in Istio/Linkerd), Consul Connect | Microservices internal traffic |
For a small site: a single nginx or Caddy in front of two backends is enough. For a large site: cloud-managed LBs handle the operational complexity.
Common failure modes
Single point of failure
The LB itself can fail. Solution: run multiple LBs in active-active or active-standby. Failover usually via floating IP, ECMP, or DNS-level health checks.
Hot spots
If a load balancer hashes by client IP and a CGNAT pool sends millions of clients from one shared IP, all of that traffic lands on the same backend. Result: imbalanced load. Solution: smarter hash keys (client IP plus URL path, for example) or connection-count-based balancing.
Health-check flapping
If the health check is too aggressive, a brief slow response marks a backend down, causing it to be removed and re-added repeatedly. Solution: thresholds and hysteresis (require N consecutive failures before marking unhealthy, M consecutive successes before marking healthy).
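The hysteresis rule above is small enough to sketch directly: a backend flips state only after N consecutive failures or M consecutive successes, so one slow response can't bounce it out of the pool. The thresholds are arbitrary illustration values:

```python
class HealthState:
    """Mark down after `fall` consecutive failures, up after `rise` successes."""

    def __init__(self, fall=3, rise=2):
        self.fall = fall
        self.rise = rise
        self.healthy = True
        self._fails = 0
        self._successes = 0

    def observe(self, probe_ok):
        """Record one probe result; return current health."""
        if probe_ok:
            self._fails = 0
            self._successes += 1
            if not self.healthy and self._successes >= self.rise:
                self.healthy = True
        else:
            self._successes = 0
            self._fails += 1
            if self.healthy and self._fails >= self.fall:
                self.healthy = False
        return self.healthy
```

This mirrors the `rise`/`fall` thresholds found in real LBs like HAProxy: a single failed probe never flips state on its own.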
Sticky session traps
If users stick to backends and you remove a backend, those users' sessions break. Solution: prefer stateless backends (sessions in shared store like Redis), or accept that scaling-down requires brief user disruption.
Quick FAQ
Does Cloudflare load-balance my site automatically? Within their network, yes — they pick which edge server handles your request. Across your origin pool, only if you configure their Load Balancing product.
What's the difference between a CDN and a load balancer? A CDN is a globally-distributed cache and edge platform. A load balancer is one piece of that platform (or a separate component). All major CDNs include load balancing; not all load balancers are CDNs.
Do I need a load balancer for a small site? If you have one server, no — there's nothing to balance. If you have two, yes — even a simple HAProxy is worth it for the health-check and zero-downtime deployment benefits.
What about Kubernetes? Kubernetes Services and Ingress controllers are essentially specialized load balancers. The same algorithms and failure modes apply.
Can a load balancer be a security risk? The LB centralizes traffic, so a compromised LB is bad. But it's also a great place to enforce security: TLS, WAF, rate limiting. The benefits usually outweigh the centralization risk.
TL;DR
- A load balancer distributes incoming requests across multiple backends.
- Modern LBs also do health checking, TLS termination, caching, and DDoS protection.
- L4 (transport) is fast and protocol-agnostic; L7 (HTTP) is more flexible.
- They're the unsung hero of every site that survives traffic.
If a website seems to never go down, there's a load balancer (probably several) doing the work.