Hello, this is Saurabh…👋
Welcome to the 280 new subscribers who have joined us since last week.
If you aren’t subscribed yet, join 2700+ curious Software Engineers looking to expand their system design knowledge by subscribing to this newsletter.
In this edition, I cover the following topics:
🖥 System Design Concept → Load Balancers
🍔 Food For Thought → Saying No in an Interview
So, let’s dive in.
What is a Load Balancer?
When you check in to a hotel, you are usually greeted by a receptionist.
The receptionist takes care of checking your documents, making some data entries in their system and directing you to a specific room.
Load balancers do a similar job within a system.
A load balancer is a hardware or software component that distributes incoming network traffic across multiple servers.
But why do we really need load balancing?
There are two main reasons:
1 - Workload Distribution
A high-traffic application serving hundreds of thousands of concurrent requests from users cannot rely on a single machine.
Such applications usually go for horizontal scaling.
Horizontal Scaling (also known as scaling out) is a technique to increase the capacity of a service by adding more machines to do the job.
More capacity means the ability to handle a greater workload.
However, adding machines is just one part of the equation.
You also need to make sure that all the machines share the load appropriately and no single machine gets overwhelmed.
The answer is to use a load balancer.
2 - Redundancy
For a professional system, availability is an important metric.
High availability takes the concept of availability to an even higher level.
In typical terms, a system must have 99.999% uptime (roughly five minutes of downtime per year) to be considered highly available.
But, how do we achieve these crazy numbers?
By removing any single point of failure (SPOF) within the infrastructure or software layer.
For example, if you have a backend service handling requests, you run multiple instances of the service to share the workload.
But if one instance goes down for whatever reason, the service as a whole continues running because the load balancer will make sure that the requests go to the healthy instances.
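To make this concrete, here's a minimal Python sketch of the idea - the instance addresses and the is_healthy check are made-up placeholders, not any real load balancer's API:

```python
# Hypothetical pool of service instances sitting behind the load balancer
instances = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]

def is_healthy(instance: str) -> bool:
    # Placeholder for a real health check (an HTTP ping or TCP probe)
    return instance != "10.0.0.2:8080"  # pretend this one just crashed

def healthy_instances() -> list[str]:
    # The load balancer only ever considers instances that pass the health check
    return [i for i in instances if is_healthy(i)]

# Requests keep flowing even though one instance is down
print(healthy_instances())  # ['10.0.0.1:8080', '10.0.0.3:8080']
```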
Types of Load Balancers
“Will you go for Layer 4 or Layer 7 load balancer?”
This was the question that took me down a rabbit hole.
It came up many years ago during a design discussion.
Someone had just drawn a box named “LB” on the whiteboard, and another teammate asked this question.
“Layer 4 or Layer 7?”
As I went through the details, things became clearer:
1 - Layer 4 Load Balancer
As the name suggests, Layer 4 load balancers operate in the transport layer of the OSI model.
Yes, the famous OSI model from Computer Networks!
What does it mean?
It means that the load balancer makes routing decisions solely on the basis of information available at Layer 4.
Information such as IP address or port.
In other words, a Layer 4 load balancer cannot see the message in the data packet.
Here’s what it looks like:
Advantages of Layer 4 load balancers:
Simpler to run and maintain
Better performance since there’s no need to inspect the data
More secure since there’s no need to decrypt the TLS data
Only one TCP connection (between the client and the server)
Disadvantages:
No smart load balancing
No routing to different service types
No caching since the LB can’t even see the data or the message.
Examples of Layer 4 load balancers include HAProxy (in TCP mode), AWS Network Load Balancer, and Azure Load Balancer.
2 - Layer 7 Load Balancer
Layer 7 load balancers operate at layer 7 of the OSI model.
That was predictable, I suppose.
Basically, this means they deal with layer 7 protocols such as HTTP(S), WebSocket, FTP, and SMTP.
But what does that change?
The Layer 7 load balancers can see the actual data within the packet and make routing decisions based on that data.
See the below illustration:
Here’s how Layer 7 load balancers work:
Let’s say you have two microservices - one dedicated to blog posts (/blog) and the other dedicated to comments (/comments).
The client makes a request to the load balancer by establishing a TCP connection, saying that it wants to access the /blog route.
The load balancer decrypts the data and checks the configured rules.
Based on the request destination, the load balancer establishes a new TCP connection to a server instance hosting the Blog microservice.
Once the response comes back, the load balancer forwards it to the client.
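Here's a minimal Python sketch of that routing decision. The rules, service names, and ports are made up for illustration and aren't taken from any specific load balancer's configuration:

```python
# Hypothetical path-based routing rules, evaluated after the LB has
# terminated (decrypted) the client's TLS connection
routing_rules = {
    "/blog": ["blog-svc-1:8080", "blog-svc-2:8080"],
    "/comments": ["comments-svc-1:8080"],
}

def route(request_path: str) -> str:
    # Find the rule whose path prefix matches the incoming request
    for prefix, instances in routing_rules.items():
        if request_path.startswith(prefix):
            # A real LB would now open (or reuse) a second TCP connection
            # to one of these instances and forward the request
            return instances[0]
    raise ValueError(f"No rule matches {request_path}")

print(route("/blog/how-load-balancers-work"))  # -> blog-svc-1:8080
print(route("/comments/123"))                  # -> comments-svc-1:8080
```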
Advantages:
Smarter load balancing (big benefit)
LB can also cache data
Can play the role of a reverse proxy
Disadvantages:
Expensive to run and maintain
Needs to decrypt the data
Maintains 2 TCP connections - one from the client to the load balancer and another from the load balancer to the service
Examples of Layer 7 load balancers include HAProxy, Nginx, AWS Application Load Balancer and Azure Application Gateway.
Load Balancing Algorithms
By now, it should be quite clear that load balancers are great at distributing a bunch of requests among multiple servers.
But how do load balancers determine which request should go to which particular server?
It all depends on the load balancing algorithm being used.
Broadly, you can divide these algorithms into two categories:
Static
Dynamic
Static Load Balancing Algorithms
The static algorithms distribute traffic among servers based on pre-determined rules or fixed configuration.
To put things simply, these algorithms don’t adapt to changing server workloads.
A few popular static load balancing algorithms are as follows:
1 - Round Robin
Requests are distributed sequentially across a group of servers.
The main assumption is that the service is stateless because there’s no guarantee that subsequent requests from the same user will reach the same instance.
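A minimal Python sketch of the idea (the server names are placeholders):

```python
import itertools

servers = ["server-a", "server-b", "server-c"]
rotation = itertools.cycle(servers)  # endless a, b, c, a, b, c, ...

def next_server() -> str:
    return next(rotation)

# Six requests land on the servers in strict sequence
print([next_server() for _ in range(6)])
# ['server-a', 'server-b', 'server-c', 'server-a', 'server-b', 'server-c']
```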
2 - Sticky Round Robin
This is a variation of round robin where subsequent requests from the same user go to the same instance.
Depending on the use case, this can be a desirable quality for load balancing.
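Here's one simple way to sketch stickiness in Python, assuming each user can be identified by something like a session ID or cookie:

```python
import itertools

servers = ["server-a", "server-b", "server-c"]
rotation = itertools.cycle(servers)
assignments: dict[str, str] = {}  # user/session id -> assigned server

def pick_server(user_id: str) -> str:
    # New users get the next server in the rotation;
    # returning users stick to the server they were given before
    if user_id not in assignments:
        assignments[user_id] = next(rotation)
    return assignments[user_id]

# alice gets the same server on both of her requests
print(pick_server("alice"), pick_server("bob"), pick_server("alice"))
```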
3 - Weighted Round Robin
In this algorithm, each server instance gets a specific weight value.
This value determines the proportion of traffic that will be directed to the particular server.
Servers with higher weights receive a larger share of the traffic while servers with lower weights receive a smaller share.
For example, if server instance A has a weight of 0.75 and instance B has a weight of 0.25, server A will receive thrice as much traffic as instance B.
That’s a super-useful approach when different servers have different capacity levels and you want to assign traffic based on capacity.
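As a rough sketch, one way to honor weights is to repeat each server in the rotation in proportion to its weight. The 3:1 weights below are just the 0.75/0.25 split from the example expressed as integers:

```python
import itertools

# Server A is assumed to have three times the capacity of server B
weights = {"server-a": 3, "server-b": 1}

# Build a rotation where each server appears as many times as its weight
weighted_rotation = itertools.cycle(
    [server for server, weight in weights.items() for _ in range(weight)]
)

# Out of every 4 requests, 3 go to server-a and 1 goes to server-b
print([next(weighted_rotation) for _ in range(8)])
```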
4 - Hash
The hash algorithm distributes requests based on the hash of a particular key value.
Here, the key can be something like the combination of source and destination IP addresses.
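A minimal Python sketch, using the combination of source and destination IP addresses mentioned above as the key (the addresses and server names are placeholders):

```python
import hashlib

servers = ["server-a", "server-b", "server-c"]

def pick_server(source_ip: str, destination_ip: str) -> str:
    # Hash the key and map it onto one of the servers
    key = f"{source_ip}-{destination_ip}".encode()
    digest = int(hashlib.sha256(key).hexdigest(), 16)
    return servers[digest % len(servers)]

# The same key always maps to the same server
print(pick_server("203.0.113.7", "198.51.100.10"))
print(pick_server("203.0.113.7", "198.51.100.10"))
```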
Here’s what it looks like depending on the hash function:
See the animated version below to understand how the static load balancing algorithms work:
You can play around with the base diagram for the illustration on Eraser.io
Dynamic Load Balancing Algorithms
Dynamic load balancing algorithms make routing decisions based on properties that keep changing, such as the current server load or response time.
Let’s look at a couple of such algorithms.
1 - Least Connections
In this algorithm, a new request is sent to the server instance with the least number of connections.
Of course, the connection count is usually weighed against the relative computing capacity of each server.
So, server instances with more resources can support more connections than instances with fewer resources.
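A minimal Python sketch of the idea, with made-up connection counts and capacities tracked by the load balancer:

```python
# Active connections and relative capacity for each instance
stats = {
    "server-a": {"connections": 12, "capacity": 4},
    "server-b": {"connections": 4,  "capacity": 1},
    "server-c": {"connections": 9,  "capacity": 2},
}

def pick_server() -> str:
    # Fewest connections *relative to capacity* wins, so a beefier server
    # is allowed to hold more connections before it stops being preferred
    return min(stats, key=lambda s: stats[s]["connections"] / stats[s]["capacity"])

chosen = pick_server()
stats[chosen]["connections"] += 1  # the new request now counts against that server
print(chosen)  # server-a (12/4 = 3.0 is the lowest ratio here)
```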
Check out the below illustration:
2 - Least Response Time
In this case, the load balancer assigns incoming requests to the server with the lowest response time in order to minimize the overall response time of the system.
This is great for cases where response time is critical and you want to ensure that the request goes to an instance that can provide a quick response.
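A minimal Python sketch, using made-up response-time measurements and a simple moving average so the ranking adapts as servers slow down or recover:

```python
# Recently observed average response times (in milliseconds) per instance
avg_response_ms = {"server-a": 120.0, "server-b": 45.0, "server-c": 80.0}

def pick_server() -> str:
    # Send the request to whichever instance has been responding fastest
    return min(avg_response_ms, key=avg_response_ms.get)

def record_response(server: str, elapsed_ms: float, alpha: float = 0.2) -> None:
    # Exponential moving average over the latest measurement
    avg_response_ms[server] = (1 - alpha) * avg_response_ms[server] + alpha * elapsed_ms

print(pick_server())                 # server-b
record_response("server-b", 300.0)   # server-b just had a very slow response
print(pick_server())                 # server-c is now the fastest option
```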
Here’s what it looks like:
See the animated version below to understand how the dynamic load balancing algorithms work:
You can play around with the base diagram for the illustration on Eraser.io
What if the Load Balancer goes down?
This is a common question I’ve seen come up during design discussions.
Generally, it’s hand-waved away by saying that the cloud provider will take care of it.
And that’s mostly true.
Modern cloud systems have reached a point where developers need not concern themselves with the ins and outs of maintaining infrastructure pieces like the load balancer.
These infra pieces are treated as services and it’s the job of the service provider to make sure that things are working fine.
But it can be a good exercise to consider what a high availability load balancer setup looks like.
Here’s a diagram that attempts to show the big picture:
Ultimately, when you talk about a high-availability load balancing setup, what you mean is that the load balancer should not become a single point of failure (SPOF).
And how do we remove a single point of failure?
Of course, by investing in redundancy.
In the above example, we have multiple load balancers (one active and one or more passive) behind a floating IP address. This floating IP can be remapped from one server to another.
When a user accesses your website, the request goes through the floating IP address to the active load balancer.
If that load balancer fails for some reason, the failover mechanism will detect it and automatically reassign the IP address to one of the passive servers that will take over the load balancing duties.
Note that here we are only talking about HA setup for load balancer and not the other parts such as databases and application servers.
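For intuition, here's a heavily simplified Python sketch of the failover logic. The is_reachable and reassign_floating_ip functions are hypothetical placeholders for a real failure detector and for the provider-specific API call that remaps the IP:

```python
# Hypothetical setup: one active load balancer plus passive standbys,
# all sitting behind a single floating IP
FLOATING_IP = "203.0.113.10"
load_balancers = ["lb-active", "lb-passive-1", "lb-passive-2"]
current = load_balancers[0]

def is_reachable(lb: str) -> bool:
    # Placeholder for a real failure detector (heartbeats, health probes, etc.)
    return lb != "lb-active"  # simulate the active LB going down

def reassign_floating_ip(ip: str, new_owner: str) -> None:
    # Placeholder for the provider-specific call that remaps the IP
    print(f"{ip} now points at {new_owner}")

def check_and_failover() -> None:
    # If the active LB stops responding, promote the first healthy standby
    global current
    if not is_reachable(current):
        for standby in load_balancers:
            if standby != current and is_reachable(standby):
                current = standby
                reassign_floating_ip(FLOATING_IP, current)
                break

check_and_failover()  # -> 203.0.113.10 now points at lb-passive-1
```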
Is Load Balancing worth it?
Finally, having discussed so much about load balancing, it’s time to answer the money question.
“Is Load Balancing really worth it?”
For me, it’s a definite YES if you’re building a serious system where managing availability and performance is important.
A few important reasons:
You can’t have seamless horizontal scalability without a load balancer distributing the traffic between multiple instances.
You can’t support high availability in the absence of a load balancer to make sure the request is routed to the best possible instance.
Load balancers also prevent a single server instance from getting overwhelmed, thereby improving performance.
🍔 Food For Thought
👉 Saying “I don’t know” in an interview
During interviews, we usually try to demonstrate how much we know about a given topic.
After all, the idea is that the more knowledge we can demonstrate, the better our chances will be.
But sometimes, saying “No, I don’t know about this” can actually keep your chances alive in an interview.
I wrote about this on X (Twitter) and there were some interesting responses.
Here’s the link to the post:
https://x.com/ProgressiveCod2/status/1732721314266591667?s=20
That’s it for today! ☀️
Enjoyed this issue of the newsletter?
Share with your friends and colleagues
See you later with another value-packed edition — Saurabh