SDC#32 - Intro to Circuit Breaker

Service Discovery and more...

Saurabh Dashora

Mar 12, 2024

In this week’s System Design Codex, we cover two important topics related to a microservices architecture:

Circuit Breaker
Service Discovery

Circuit Breaker

This happened a couple of years ago.

It was a fine Monday morning when our mailboxes erupted with angry alerts.

One of the backend services managed by our team was going down.

Upon investigation, we discovered that one of the 3rd party APIs we integrated with had been unsubscribed from existence.

Since our service took the 3rd party response and called other services, we were creating a flood of downstream failures.

The doctor informed me that this condition was called Cascading Failure.

Very helpful!

After a few calls and escalations, the 3rd party service was brought back to life. But the failure kept re-occurring every few days.

That’s when we decided to do something about it.

We implemented the Circuit Breaker pattern.

You can play around with the diagram on Eraser.io

With this pattern, you define a threshold value for the number of failures between two services.

A proxy sits between the two services and monitors the number of failures.

If the number crosses the threshold, the proxy stops letting any more requests go through.

Also, it lets you configure a fallback response.

In our case, the fallback response was easy because the 3rd party API returned a list of offers. In the absence of a response, it was feasible to go with a basic list of offers.

So - how does the Circuit Breaker work?

It has 3 states:

👉 Closed State

This is the initial state and the services talk to each other without any issues.

The proxy keeps monitoring the number of failures within a defined period.

👉 Open State

If failures go beyond the threshold, the circuit breaker shifts to the Open state.

The communication between services is blocked and a fallback response is returned.

👉 Half-Open State

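After a configured wait period in the Open state, the circuit breaker allows a limited number of trial requests to go through.

If they are successful, the circuit breaker switches back to the Closed state. If they fail, it returns to the Open state.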
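
To make the three states concrete, here is a minimal sketch in Python. It is not the Hystrix setup we used. The class name, the parameters (failure_threshold, reset_timeout, clock) and the offer functions are all made up for illustration. It also simplifies two things: it counts consecutive failures instead of failures within a time window, and it lets a single trial request through in the Half-Open state.

```python
import time


class CircuitBreaker:
    """Minimal three-state circuit breaker (illustrative, not production-ready)."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # failures allowed before opening
        self.reset_timeout = reset_timeout          # seconds to stay Open before a trial
        self.clock = clock                          # injectable so the sketch is easy to test
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = None

    def call(self, func, fallback):
        # Open: skip the real call until the reset timeout has elapsed.
        if self.state == "OPEN":
            if self.clock() - self.opened_at < self.reset_timeout:
                return fallback()
            self.state = "HALF_OPEN"  # timeout elapsed, let a trial request through

        try:
            result = func()
        except Exception:
            self._on_failure()
            return fallback()
        self._on_success()
        return result

    def _on_failure(self):
        self.failures += 1
        # A failed trial in Half-Open, or too many failures in Closed, opens the circuit.
        if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
            self.state = "OPEN"
            self.opened_at = self.clock()

    def _on_success(self):
        # A success closes the circuit and resets the failure count.
        self.state = "CLOSED"
        self.failures = 0


def fetch_offers():
    # Hypothetical 3rd party call that is currently down.
    raise ConnectionError("offers API unavailable")


def basic_offers():
    # Fallback: a basic list of offers.
    return ["basic-offer-1", "basic-offer-2"]


breaker = CircuitBreaker(failure_threshold=2, reset_timeout=30.0)
for _ in range(3):
    offers = breaker.call(fetch_offers, basic_offers)
print(breaker.state, offers)
```

In this run, the first two calls fail and trip the breaker. The third call never reaches fetch_offers and gets the fallback list straight away. A real library adds things this sketch skips, such as thread safety, sliding time windows and metrics.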

How do you implement the Circuit Breaker pattern?

Though we implemented it using Netflix Hystrix for Spring Boot, there are libraries available in all major ecosystems.

Resilience4j for the Java ecosystem
Opossum for Node.js
Pybreaker for Python
Polly for .NET

Have you used the Circuit Breaker pattern?

Service Discovery

You can’t call a person without knowing their phone number.

The same goes for calling a service.

Traditionally, applications ran at specific, fixed locations. It wasn’t uncommon to hear conversations like this:

“Just call the service on the box with IP address 172.16.0.1”

“Ya, the one I deployed yesterday. It should work fine.”

It’s easy to keep track of things like this if your entire application is deployed on a fixed location.

But you can’t do that when your application is divided into multiple services deployed on cloud infrastructure.

Suddenly, there are no fixed network locations.

The number of service instances can go up and down depending on the auto-scaling configuration in place.

This is where the Service Discovery Pattern comes into the picture.

There are 3 parties involved in the Service Discovery Pattern:

Service Registry - a centralized server that maintains a global list of network locations of all services
Client - the entity that needs to call a service
Service - the actual service registered with the Service Registry

Here’s how the pattern works on a high level:

When a service comes up, it registers its location with the Service Registry.
The client looks up the relevant service locations in the Service Registry.
The Service Registry returns the location of the required service.
The client makes a call to the service.
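
The four steps above can be sketched with an in-memory registry in Python. This is only a toy stand-in for a real registry server. The class names, method names and addresses are made up for illustration, and the round-robin choice between instances is my own assumption, not something the pattern requires.

```python
import itertools


class ServiceRegistry:
    """In-memory stand-in for a real service registry (illustrative only)."""

    def __init__(self):
        self._instances = {}  # service name -> list of "host:port" locations

    def register(self, name, location):
        # Step 1: a service registers its location when it comes up.
        locations = self._instances.setdefault(name, [])
        if location not in locations:
            locations.append(location)

    def deregister(self, name, location):
        # Called when an instance shuts down cleanly.
        if location in self._instances.get(name, []):
            self._instances[name].remove(location)

    def lookup(self, name):
        # Step 3: return a copy so callers cannot mutate the registry's own list.
        return list(self._instances.get(name, []))


class Client:
    """Looks up a service in the registry, then picks an instance round-robin."""

    def __init__(self, registry):
        self.registry = registry
        self._counter = itertools.count()

    def resolve(self, name):
        # Step 2: ask the registry for the current locations of the service.
        locations = self.registry.lookup(name)
        if not locations:
            raise LookupError(f"no instances registered for {name!r}")
        return locations[next(self._counter) % len(locations)]


registry = ServiceRegistry()
registry.register("offers", "10.0.0.5:8080")
registry.register("offers", "10.0.0.6:8080")

client = Client(registry)
target = client.resolve("offers")  # Step 4: the client would now call this address
print(target)
```

In a real setup the registry is a separate server, so register and lookup are network calls rather than method calls.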

Of course, this is a simplified view of the process.

But it helps to give an idea.

The Service Discovery Pattern also comes with its own issues:

The frequent lookups for service destinations can increase network traffic and lead to higher latency.
The Service Registry can become a single point of failure.
If you go with multiple instances of the registry, you need to worry about the consistency of the registration data across all instances.
Dealing with failing service instances can be a problem.
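
Two of these issues have common mitigations, sketched below in Python. Instances send periodic heartbeats so the registry can drop the ones that go silent, and clients cache lookup results for a short time to cut down on registry calls. The names HeartbeatRegistry and CachingClient, the TTL values and the addresses are assumptions for illustration, not the API of any particular registry.

```python
import time


class HeartbeatRegistry:
    """Registry that forgets instances whose heartbeat is older than a TTL."""

    def __init__(self, ttl=30.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._last_seen = {}  # (service name, location) -> time of last heartbeat

    def heartbeat(self, name, location):
        # Registration and renewal are the same operation in this sketch.
        self._last_seen[(name, location)] = self.clock()

    def lookup(self, name):
        now = self.clock()
        # Drop every entry whose last heartbeat is older than the TTL.
        self._last_seen = {
            key: seen for key, seen in self._last_seen.items() if now - seen <= self.ttl
        }
        return sorted(loc for (svc, loc) in self._last_seen if svc == name)


class CachingClient:
    """Client that reuses a lookup result for cache_ttl seconds."""

    def __init__(self, registry, cache_ttl=5.0, clock=time.monotonic):
        self.registry = registry
        self.cache_ttl = cache_ttl
        self.clock = clock
        self._cache = {}  # service name -> (fetched_at, locations)

    def locations(self, name):
        cached = self._cache.get(name)
        if cached is not None and self.clock() - cached[0] < self.cache_ttl:
            return cached[1]  # still fresh, no registry call needed
        fresh = self.registry.lookup(name)
        self._cache[name] = (self.clock(), fresh)
        return fresh


registry = HeartbeatRegistry(ttl=30.0)
registry.heartbeat("offers", "10.0.0.5:8080")
client = CachingClient(registry, cache_ttl=5.0)
print(client.locations("offers"))
```

Both TTLs are trade-offs. A longer cache means fewer lookups, but the client may keep calling an instance that has already died until the cache entry expires.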

Having said that, Service Discovery has grown in usage over the years.

Some prominent Service Registry examples are as follows:

Netflix Eureka
Consul
Apache ZooKeeper
Kubernetes Service Discovery

In fact, DNS is also a form of service discovery.

So - have you used the Service Discovery pattern?

Food for Thought

Socrates was a great developer.

He said: "The secret of change is to focus all of your energy not on fighting the old, but on building the new."

This advice sounds perfect for those failing unit tests.

Shoutout

Here are a few interesting things I read this week:

Stop Planning, Start Doing by Akos: A refreshing read that exposes the fallacy of too much planning and obsession with productivity tools.
Top 3 DevOps Certifications by Amrut Patil: If you are looking for DevOps certifications, here are some good picks to consider.
Context Switching by Gregor Ojstersek: Context Switching is one of the big problems in the modern workplace. This article talks about how to deal with it.

That’s it for today! ☀️

Enjoyed this issue of the newsletter?

Share with your friends and colleagues.

See you later with another value-packed edition — Saurabh.

Ashwani Yadav

Mar 14, 2024

thanks Saurabh for writing this. huge thanks

Arvind Ramaswamy

Mar 13, 2024

Thanks Saurabh! I have used API Gateway (Ocelot) to map a standard URL API (of the gateway), which the calling systems will use, to the actual APIs residing on diff locations using a json mapping. How exactly is this different from Service Discovery? Are these 2 different or is there a real-world scenario where they can complement each other?
