What is a load balancer? Plain-English guide to scaling traffic
When one server can't handle the traffic, you put a load balancer in front of many. Here's how load balancers work, what algorithms they use, and where they fit alongside CDNs and DNS.
Every popular website you visit is, behind the scenes, a fleet of servers. The trick that makes them look like one server with one IP and one URL is a load balancer. It sits in front of the fleet, takes incoming requests, and decides which backend server handles each one.
Without load balancers, the modern internet doesn't exist. With them, sites scale to billions of users on commodity hardware.
The two-line summary
A load balancer is a piece of software (or hardware) that distributes incoming traffic across multiple backend servers. The goals: keep any one server from being overloaded, route around servers that have failed, and let you scale by adding more servers without changing the public-facing endpoint.
That's it. The interesting bits are the algorithms, the failure modes, and how it interacts with the rest of the stack.
Why you need one
Picture a website serving 1,000 requests per second. One reasonably sized server can usually handle that. Now your traffic doubles. Then doubles again. At 4,000 requests per second you need 4 servers — but users have one URL. How do you spread requests across those 4 servers?
Three options:
- DNS round-robin. Return different server IPs in DNS responses. Simple, but DNS caching breaks the load distribution, and unhealthy servers keep getting traffic until DNS records propagate (minutes to hours).
- Client-side load balancing. Have the client pick a server. Requires every client to be aware of every server. Doesn't scale operationally.
- A load balancer. Single public IP, internally routes to backend pool. The standard.
Option 3 wins for almost everything.
How load balancers route requests
There are several routing algorithms, each with tradeoffs.
Round-robin
Simplest. Cycle through backends one at a time: request 1 → server A, request 2 → server B, request 3 → server C, request 4 → server A again. Works for stateless services where every server is identical.
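Round-robin is simple enough to sketch in a few lines. This is an illustrative toy, not a real LB — the backend names are hypothetical:

```python
from itertools import cycle

# Hypothetical backend pool for illustration.
backends = ["server-a", "server-b", "server-c"]
rotation = cycle(backends)

def pick_backend():
    """Return the next backend in rotation: A, B, C, A, B, C, ..."""
    return next(rotation)
```

Real implementations also have to handle backends joining and leaving the pool mid-rotation, which is where this toy version stops.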
Least connections
Send each new request to the backend currently handling the fewest open connections. Better when requests have variable durations.
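The selection itself is just a minimum over the pool. A minimal sketch, assuming the LB tracks an open-connection count per backend (the counts here are made up):

```python
# Hypothetical snapshot of open connections per backend.
open_conns = {"server-a": 3, "server-b": 1, "server-c": 2}

def pick_least_connections(conns):
    """Pick the backend with the fewest open connections."""
    return min(conns, key=conns.get)
```

With the counts above, a new request goes to `server-b` — even though round-robin would have sent it elsewhere.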
Weighted round-robin / weighted least connections
Same as above, but with weights. Server A is twice as powerful → it gets twice the traffic share.
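One naive way to sketch weighted round-robin is to repeat each backend in the schedule proportionally to its weight (real LBs like nginx use a smoother interleaving, but the traffic share works out the same). The weights here are hypothetical:

```python
from itertools import cycle

# Hypothetical weights: server-a is twice as powerful as server-b.
weights = {"server-a": 2, "server-b": 1}

# Expand each backend by its weight, then cycle through the schedule.
schedule = [name for name, w in weights.items() for _ in range(w)]
wrr = cycle(schedule)
```

Over any three consecutive requests, `server-a` gets two and `server-b` gets one — twice the share, as the weights say.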
IP hash / consistent hashing
Hash the client's IP and pick a backend deterministically. Same client always hits the same backend, useful for session affinity. Consistent hashing minimizes disruption when backends are added or removed.
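A minimal hash-ring sketch shows why this works: each backend gets many positions on a ring, and a key maps to the first position clockwise from its hash. Adding or removing a backend only remaps the keys nearest its positions. (The replica count and node names are arbitrary choices for illustration.)

```python
import bisect
import hashlib

class HashRing:
    """Toy consistent-hash ring: each node owns many points on the ring."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self.ring = {}          # ring position -> node
        self.sorted_keys = []   # sorted ring positions
        for node in nodes:
            self.add(node)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node):
        # Place `replicas` virtual points for this node on the ring.
        for i in range(self.replicas):
            pos = self._hash(f"{node}#{i}")
            self.ring[pos] = node
            bisect.insort(self.sorted_keys, pos)

    def get(self, key):
        # Walk clockwise from the key's hash to the next node point.
        pos = self._hash(key)
        idx = bisect.bisect(self.sorted_keys, pos) % len(self.sorted_keys)
        return self.ring[self.sorted_keys[idx]]
```

The same client IP always maps to the same backend, which is the session-affinity property the article describes.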
Layer 7 routing
Inspect the request itself — URL path, headers, cookies — and route based on that. /api/* to the API fleet, /static/* to a different cache, /admin/* to the admin servers. This is what powers microservice architectures.
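The path-based routing described above can be sketched as a longest-prefix match over a routing table. The pool names and prefixes are hypothetical:

```python
# Hypothetical routing table: URL-path prefix -> backend pool.
routes = {
    "/api/":    "api-fleet",
    "/static/": "static-cache",
    "/admin/":  "admin-servers",
}

def route(path, default="web-fleet"):
    """Return the pool for the longest matching path prefix."""
    for prefix in sorted(routes, key=len, reverse=True):
        if path.startswith(prefix):
            return routes[prefix]
    return default
```

Real L7 proxies match on host and headers too, but the core idea is the same lookup.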
Layer 4 vs Layer 7
Two architectural styles:
Layer 4 (transport-level)
Operates at the TCP/UDP level. Doesn't look inside the payload — just balances connections (or flows, for UDP). Fast, simple, protocol-agnostic. Examples: HAProxy in TCP mode, AWS Network Load Balancer, MetalLB on Kubernetes.
Layer 7 (application-level)
Speaks HTTP. Reads requests, can modify them, can route on path/host/headers. Slower than L4 (because it's doing more work per request) but enormously more flexible. Examples: nginx, HAProxy in HTTP mode, AWS Application Load Balancer, Cloudflare's load balancing, Envoy.
For most web services, L7 is the right choice. For raw TCP services (databases, message queues), L4.
What load balancers do besides routing
Modern load balancers do far more than pick a backend.
Health checking
The LB periodically probes each backend ("is /healthz returning 200?"). Backends that fail get removed from the pool. When they recover, they're re-added. Users never see traffic go to a sick server.
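Conceptually, the probe results feed a filter: only backends whose last check passed receive traffic. A minimal sketch (the probe results here are hard-coded stand-ins for real `/healthz` responses):

```python
# Hypothetical latest /healthz results per backend (True = returned 200).
probe_results = {"server-a": True, "server-b": False, "server-c": True}

def active_backends(results):
    """Return only the backends whose last health probe passed."""
    return [name for name, healthy in results.items() if healthy]
```

The routing algorithm then runs over `active_backends(...)` instead of the full pool, so a sick server simply disappears from rotation.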
TLS termination
The LB handles HTTPS: it terminates TLS and forwards plain HTTP to the backend. This centralizes certificate management, so backends don't deal with TLS at all (or traffic to the backend is re-encrypted, often with mutual TLS, for defense in depth).
Caching
Some LBs (or LB-CDN hybrids) cache responses, especially for static content. Reduces backend load, speeds up clients.
Connection draining
When you remove a backend (deployment, scale-down), the LB stops sending it new requests but lets existing connections finish. Users in mid-request don't get cut off.
Sticky sessions
Some apps need the same user to stick to the same backend (in-memory session state, in-memory cache). The LB can pin via cookie or hash.
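Hash-based pinning can be sketched in a few lines: hash a session identifier (e.g. from a cookie) to a stable backend index. The backend names and session IDs are hypothetical:

```python
import hashlib

# Hypothetical backend pool.
backends = ["server-a", "server-b", "server-c"]

def pick_sticky(session_id):
    """Pin a session to one backend by hashing its session ID."""
    digest = int(hashlib.sha256(session_id.encode()).hexdigest(), 16)
    return backends[digest % len(backends)]
```

Note the trap the article describes later: if the pool changes, `len(backends)` changes, and sessions get reshuffled — which is why cookie-based pinning or consistent hashing is used in practice.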
Rate limiting and DDoS protection
Block clients that exceed thresholds. Modern LBs increasingly include WAF capabilities.
Where load balancers fit in the stack
A typical modern web stack:
[User] → [DNS] → [CDN edge] → [Origin LB] → [Origin servers] → [Database]
Each layer can include load balancing of its own:
- DNS-level load balancing distributes globally across regions (often via anycast — see our anycast post for the details).
- CDN edge load-balances across edge servers within a data center.
- Origin LB balances across application servers in your fleet.
- Service mesh (in microservice architectures) balances internal RPCs.
- Database proxy (PgBouncer, ProxySQL) balances connections.
Big sites have load balancers at every level.
Common load balancer types
| Type | Examples | Best for |
|---|---|---|
| Hardware appliances | F5 Big-IP, Citrix NetScaler | Large enterprises with budget |
| Cloud-managed | AWS ALB/NLB, GCP Load Balancing, Azure Front Door | Cloud-native deployments |
| Software / open source | nginx, HAProxy, Caddy, Traefik | Self-hosted, custom configs |
| CDN-integrated | Cloudflare LB, Fastly LB | Multi-region with anycast routing |
| Service mesh | Envoy (in Istio/Linkerd), Consul Connect | Microservices internal traffic |
For a small site: a single nginx or Caddy in front of two backends is enough. For a large site: cloud-managed LBs handle the operational complexity.
Common failure modes
Single point of failure
The LB itself can fail. Solution: run multiple LBs in active-active or active-standby. Failover usually via floating IP, ECMP, or DNS-level health checks.
Hot spots
If a load balancer hashes by client IP and a CGNAT pool sends millions of clients from one shared IP, all of that traffic lands on the same backend. Result: imbalanced load. Solution: smarter hash keys (client IP plus URL path, for example) or connection-count-based balancing.
Health-check flapping
If the health check is too aggressive, a brief slow response marks a backend down, causing it to be removed and re-added repeatedly. Solution: thresholds and hysteresis (require N consecutive failures before marking unhealthy, M consecutive successes before marking healthy).
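The hysteresis rule above is small enough to sketch directly: a backend flips state only after N consecutive failures or M consecutive successes, so one slow response can't bounce it out of the pool. The thresholds are arbitrary illustration values:

```python
class HealthState:
    """Mark down after `fall` consecutive failures, up after `rise` successes."""

    def __init__(self, fall=3, rise=2):
        self.fall = fall
        self.rise = rise
        self.healthy = True
        self._fails = 0
        self._successes = 0

    def observe(self, probe_ok):
        """Record one probe result; return current health."""
        if probe_ok:
            self._fails = 0
            self._successes += 1
            if not self.healthy and self._successes >= self.rise:
                self.healthy = True
        else:
            self._successes = 0
            self._fails += 1
            if self.healthy and self._fails >= self.fall:
                self.healthy = False
        return self.healthy
```

This mirrors the `rise`/`fall` thresholds found in real LBs like HAProxy: a single failed probe never flips state on its own.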
Sticky session traps
If users stick to backends and you remove a backend, those users' sessions break. Solution: prefer stateless backends (sessions in shared store like Redis), or accept that scaling-down requires brief user disruption.
Quick FAQ
Does Cloudflare load-balance my site automatically? Within their network, yes — they pick which edge server handles your request. Across your origin pool, only if you configure their Load Balancing product.
What's the difference between a CDN and a load balancer? A CDN is a globally-distributed cache and edge platform. A load balancer is one piece of that platform (or a separate component). All major CDNs include load balancing; not all load balancers are CDNs.
Do I need a load balancer for a small site? If you have one server, no — there's nothing to balance. If you have two, yes — even a simple HAProxy is worth it for the health-check and zero-downtime deployment benefits.
What about Kubernetes? Kubernetes Services and Ingress controllers are essentially specialized load balancers. The same algorithms and failure modes apply.
Can a load balancer be a security risk? The LB centralizes traffic, so a compromised LB is bad. But it's also a great place to enforce security: TLS, WAF, rate limiting. The benefits usually outweigh the centralization risk.
TL;DR
- A load balancer distributes incoming requests across multiple backends.
- Modern LBs also do health checking, TLS termination, caching, and DDoS protection.
- L4 (transport) is fast and protocol-agnostic; L7 (HTTP) is more flexible.
- They're the unsung hero of every site that survives traffic.
If a website seems to never go down, there's a load balancer (probably several) doing the work.