When you make a transaction with your credit card, the bank sends a notification to your mobile device or your email.
The notification contains details about what has changed.
Was there a debit from your account? How much was debited? What’s the current balance?
The bank doesn’t ship you the entire transaction history, leaving it for you to figure out what’s new.
This is the essence of Change Data Capture or CDC.
Whenever something interesting happens in your database, such as an insert, update or delete, it’s appended to a transaction log. CDC is the process of pulling those change events out of the transaction log and sending them to different consumers.
Here’s what the process looks like:
In the above diagram, the CDC process is taken care of by Debezium. If you don’t know about it, Debezium is an open-source distributed platform for Change Data Capture.
With CDC, you can deliver the captured changes to multiple consumers such as another database, a data warehouse or a cache.
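To make the idea concrete, here is a minimal sketch in plain Python. It is not how Debezium works internally. The `TransactionLog`, `CdcReader` and `ChangeEvent` names are made up for illustration, and a simple list stands in for the database’s transaction log.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class ChangeEvent:
    op: str               # "insert", "update" or "delete"
    table: str
    key: int
    row: Optional[dict]   # new row state; None for deletes


class TransactionLog:
    """Append-only list standing in for the database's transaction log."""

    def __init__(self) -> None:
        self.entries: List[ChangeEvent] = []

    def append(self, event: ChangeEvent) -> None:
        self.entries.append(event)


class CdcReader:
    """Reads log entries past a remembered offset and fans them out."""

    def __init__(self, log: TransactionLog) -> None:
        self.log = log
        self.offset = 0
        self.consumers: List[Callable[[ChangeEvent], None]] = []

    def subscribe(self, consumer: Callable[[ChangeEvent], None]) -> None:
        self.consumers.append(consumer)

    def poll(self) -> int:
        # Only entries after the last processed offset are delivered.
        new_events = self.log.entries[self.offset:]
        for event in new_events:
            for consumer in self.consumers:
                consumer(event)
        self.offset += len(new_events)
        return len(new_events)


log = TransactionLog()
cache: Dict[int, dict] = {}
audit: List[str] = []


def update_cache(event: ChangeEvent) -> None:
    if event.op == "delete":
        cache.pop(event.key, None)
    else:
        cache[event.key] = event.row


reader = CdcReader(log)
reader.subscribe(update_cache)
reader.subscribe(lambda e: audit.append(f"{e.op}:{e.table}:{e.key}"))

log.append(ChangeEvent("insert", "accounts", 1, {"balance": 100}))
log.append(ChangeEvent("update", "accounts", 1, {"balance": 60}))
first = reader.poll()    # delivers the two events above
log.append(ChangeEvent("delete", "accounts", 1, None))
second = reader.poll()   # delivers only the delete
```

Just like the bank notification, each consumer sees only what changed since the last poll, never the full history.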
There are 5 main steps to get started with a CDC setup:
Create an initial snapshot and load it into the target system.
Enable CDC on the source database
Identify the tables to be replicated
Start the CDC replication process
Monitor the replication process
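The steps above can be sketched with a toy example. Everything here is an assumption made for illustration: dictionaries stand in for the source and target systems, a list stands in for the transaction log, and `take_snapshot` and `stream_changes` are hypothetical helpers, not part of any CDC tool.

```python
from typing import Dict, List, Optional, Set, Tuple

# (table, key, row) entries standing in for a transaction log; row is None for deletes.
LogEntry = Tuple[str, int, Optional[dict]]


def take_snapshot(source: Dict[str, dict], tables: Set[str], log: List[LogEntry]):
    """Copy the current rows and remember the log position they reflect."""
    snapshot = {table: dict(source[table]) for table in tables}
    return snapshot, len(log)


def stream_changes(target: Dict[str, dict], log: List[LogEntry],
                   position: int, tables: Set[str]) -> int:
    """Apply entries written after `position`, for the chosen tables only."""
    for table, key, row in log[position:]:
        if table not in tables:
            continue
        if row is None:
            target[table].pop(key, None)
        else:
            target[table][key] = row
    return len(log)


source = {"orders": {1: {"status": "new"}}, "audit_log": {}}
log: List[LogEntry] = []
tables = {"orders"}          # the tables we chose to replicate

# Initial snapshot loaded into the target.
target, position = take_snapshot(source, tables, log)

# Writes that happen on the source after the snapshot.
source["orders"][1] = {"status": "paid"}
log.append(("orders", 1, {"status": "paid"}))
source["orders"][2] = {"status": "new"}
log.append(("orders", 2, {"status": "new"}))
log.append(("audit_log", 7, {"msg": "not replicated"}))

lag_before = len(log) - position   # monitoring: entries not yet applied
position = stream_changes(target, log, position, tables)
lag_after = len(log) - position
```

The important detail is that the snapshot remembers the log position it was taken at, so streaming can pick up exactly where the snapshot left off.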
There are multiple approaches to implementing CDC.
Log-Based: Based on the transaction log
Trigger-Based: Based on specific events occurring in the database
Timestamp-Based: Based on a special timestamp column
Out of the three, the log-based approach is the most popular one and that’s what I prefer using.
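For comparison, here is a rough sketch of the timestamp-based approach using Python’s built-in `sqlite3` module. The `customers` table, the integer `updated_at` column and the `poll_changes` helper are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)"
)


def poll_changes(conn, last_seen):
    """Return rows modified after `last_seen` and the new high-water mark."""
    rows = conn.execute(
        "SELECT id, name, updated_at FROM customers "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    ).fetchall()
    new_last_seen = rows[-1][2] if rows else last_seen
    return rows, new_last_seen


conn.execute("INSERT INTO customers VALUES (1, 'Ada', 10)")
conn.execute("INSERT INTO customers VALUES (2, 'Linus', 11)")
first, watermark = poll_changes(conn, 0)

conn.execute("UPDATE customers SET name = 'Ada L.', updated_at = 12 WHERE id = 1")
conn.execute("DELETE FROM customers WHERE id = 2")
second, watermark = poll_changes(conn, watermark)
```

The second poll picks up the update but has no way to see the delete, because the deleted row no longer exists to be queried. That blind spot is one reason the log-based approach tends to be preferred.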
Now, Change Data Capture is pretty cool. You can use it for database replication and various data integration requirements.
But there are also a couple of interesting applications of CDC around microservices architecture, which can help you in your projects.
Let’s look at them.
Transactional Outbox Pattern with CDC
When I asked multiple developers about the challenges they faced with microservices, the majority mentioned data consistency issues.
But what was the cause of these issues?
The number one culprit turned out to be the problem of Dual Writes.
Dual Writes happen when you have to update two different systems for a requirement to be fulfilled.
For example,
Updating a database
Publishing an event to another system (say Kafka)
The second step could also be something else like sending an email to a customer or generating a notification for the transaction.
See the below diagram:
Since the two systems aren’t linked, we can’t update both in a transactional manner.
For example, if the database update is successful but a failure occurs after that, the event will never be published, resulting in inconsistent data in some downstream systems.
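A small simulation shows how this goes wrong. The `FlakyBroker` class and the `place_order` function are hypothetical stand-ins for a real message broker client and service method.

```python
class BrokerDown(Exception):
    pass


class FlakyBroker:
    """Stand-in for a message broker that may be unreachable."""

    def __init__(self, fail: bool = False) -> None:
        self.fail = fail
        self.published = []

    def publish(self, event: dict) -> None:
        if self.fail:
            raise BrokerDown("broker unreachable")
        self.published.append(event)


def place_order(db: dict, broker: FlakyBroker, order_id: int) -> None:
    # Write 1: the database update succeeds on its own.
    db[order_id] = {"status": "placed"}
    # Write 2: the publish can fail, and nothing undoes write 1.
    broker.publish({"type": "OrderPlaced", "order_id": order_id})


db = {}
broker = FlakyBroker(fail=True)
try:
    place_order(db, broker, 42)
except BrokerDown:
    pass

# The order exists, but no downstream system will ever hear about it.
inconsistent = 42 in db and not broker.published
```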
At this point, you might suggest using Two-Phase Commit. That’s a valid approach, but it brings its own share of complexity and isn’t supported by many databases and message brokers.
So - what’s the solution?
The Transactional Outbox Pattern.
In this pattern, we push all the transactional logic into the database. Whenever there is an update in the database, we also update an outbox table as part of the same transaction.
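Here is a minimal sketch of that single transaction using Python’s built-in `sqlite3` module. The `orders` and `outbox` table layouts and the `fail_midway` flag are assumptions made for the example.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE outbox ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " event_type TEXT NOT NULL,"
    " payload TEXT NOT NULL)"
)
conn.commit()


def place_order(conn, order_id, fail_midway=False):
    # `with conn` commits if the block succeeds and rolls back if it raises,
    # so the order row and the outbox row are written together or not at all.
    with conn:
        conn.execute(
            "INSERT INTO orders (id, status) VALUES (?, 'placed')", (order_id,)
        )
        if fail_midway:
            raise RuntimeError("simulated crash before the outbox insert")
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("OrderPlaced", json.dumps({"order_id": order_id})),
        )


place_order(conn, 1)
try:
    place_order(conn, 2, fail_midway=True)
except RuntimeError:
    pass

orders = conn.execute("SELECT id FROM orders").fetchall()
outbox = conn.execute("SELECT event_type, payload FROM outbox").fetchall()
```

Order 2 never shows up in either table, so there is no state where the business data and the pending event disagree.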
Think of the outbox table as a mailbox. As the database updates take place, the outbox table is filled with letters that have to be delivered to a post office.
From an application point of view,
Letters = Events
Post Office = Kafka
All we need now is the postman, who can carry those letters from the mailbox (outbox) to the post office (Kafka).
This could be an async process and you have multiple options on how to implement it:
A separate thread within the original microservice
A separate application
CDC process with Debezium monitoring the outbox table for changes
The third option is the most scalable and pushes the responsibility to the infrastructure rather than writing custom logic.
Here’s what it looks like:
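If you go with the first or second option instead of Debezium, the postman could be sketched roughly like this. The `relay_once` function and the in-memory outbox rows are hypothetical; a real relay would read from the outbox table and publish to Kafka.

```python
from typing import Callable, List


def relay_once(outbox: List[dict], publish: Callable[[dict], None]) -> int:
    """Publish every unsent outbox row in insertion order, then mark it sent."""
    delivered = 0
    for row in outbox:
        if row["sent"]:
            continue
        publish(row["event"])   # a crash right after this line leaves the row unsent
        row["sent"] = True
        delivered += 1
    return delivered


outbox = [
    {"event": {"type": "OrderPlaced", "order_id": 1}, "sent": False},
    {"event": {"type": "OrderPlaced", "order_id": 2}, "sent": False},
]
topic: List[dict] = []          # stands in for a Kafka topic

first_run = relay_once(outbox, topic.append)
second_run = relay_once(outbox, topic.append)   # nothing left to deliver
```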
Is the Outbox pattern perfect?
Nothing is perfect, as such.
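With this pattern, you can get duplicate messages when failures occur. For example, if the postman crashes after publishing an event but before recording that it was delivered, the same event is published again on restart. This gives you an at-least-once delivery guarantee, so you need to make sure the downstream systems can de-duplicate the messages.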
But that’s easier than dealing with dual writes.
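One common way to de-duplicate is to give every event a unique ID and have the consumer remember the IDs it has already processed. This is a sketch under that assumption; the `IdempotentConsumer` class and the `event_id` field are made up for illustration.

```python
class IdempotentConsumer:
    """Applies each event at most once by tracking processed event IDs."""

    def __init__(self) -> None:
        self.seen_ids = set()
        self.balance = 0

    def handle(self, event: dict) -> bool:
        if event["event_id"] in self.seen_ids:
            return False            # duplicate delivery, safely ignored
        self.seen_ids.add(event["event_id"])
        self.balance += event["amount"]
        return True


consumer = IdempotentConsumer()
deliveries = [
    {"event_id": "e1", "amount": 50},
    {"event_id": "e2", "amount": -20},
    {"event_id": "e1", "amount": 50},   # redelivered after a failure
]
results = [consumer.handle(event) for event in deliveries]
```

In a real system the set of seen IDs would live in durable storage, ideally updated in the same transaction as the consumer’s own state.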
Let’s move on to the next useful application of CDC.
CDC-Based Strangler Fig Pattern
I strongly believe that you should think 3.5 times before moving to a service-oriented architecture.
“Why the specific 3.5 times?” you may ask.
Because you typically go through the three stages of denial, grief and acceptance until finally doing it.
Now, I’m not against SOA as such. But SOA generally increases complexity before it provides benefits, and unless those benefits outweigh the cost of implementing SOA, it can leave a bad taste in your mouth.
However, once you’ve made the decision to move to SOA, there are few ways that are as safe as the Strangler Fig pattern.
But how do you go about it?
For one of the projects, we implemented the Strangler Fig pattern using Change Data Capture (CDC) and Kafka to smooth the migration process.
See the below diagram that shows the migration process for a specific functionality from the monolith.
Here’s how the whole thing worked for us:
Our monolithic application supported various features and stored all data in a MySQL database.
We extracted Feature A into a separate service (Service A) with a new MongoDB database.
Next, we built a CDC workflow with Debezium to move data from MySQL to MongoDB via Kafka. MongoDB was part of a specific requirement for the new service but it could be any other database as well.
We placed a proxy (Nginx) in front to route read requests for Feature A to Service A. All other requests went to the monolith, including write requests for Feature A.
Once satisfied with the read results from Service A, we migrated the writes as well.
In the end, Feature A was no longer supported by the monolith and all requests for it went to Service A.
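The routing rule at each stage can be sketched as a small function. We did this in Nginx, so treat the Python below as an illustration of the decision logic only. The `/feature-a` path prefix and the two migration flags are assumptions made for the example.

```python
def route(method: str, path: str, reads_migrated: bool, writes_migrated: bool) -> str:
    """Decide which backend should handle a request during the migration."""
    is_feature_a = path == "/feature-a" or path.startswith("/feature-a/")
    if not is_feature_a:
        return "monolith"
    if method in ("GET", "HEAD"):
        return "service-a" if reads_migrated else "monolith"
    return "service-a" if writes_migrated else "monolith"


# Stage 1: only reads for Feature A go to the new service.
stage1_read = route("GET", "/feature-a/items/7", reads_migrated=True, writes_migrated=False)
stage1_write = route("POST", "/feature-a/items", reads_migrated=True, writes_migrated=False)

# Stage 2: writes are migrated too.
stage2_write = route("POST", "/feature-a/items", reads_migrated=True, writes_migrated=True)

# Everything else stays on the monolith throughout.
other = route("GET", "/feature-b/report", reads_migrated=True, writes_migrated=True)
```

While writes still go to the monolith, the CDC pipeline is what keeps the new service’s database up to date for the reads it serves.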
A couple of points to note about the process:
👉 Why Kafka?
Kafka provided some nice benefits such as:
Keeping the monolith and new service decoupled
Ordering guarantees for the messages
Great support with Debezium
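A quick note on the ordering guarantee: Kafka preserves order within a partition, and records with the same key land in the same partition. The sketch below illustrates the idea with `zlib.crc32` as a stand-in hash. Kafka’s default partitioner uses a different hash function, and the topic size of 3 partitions is an assumption.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3   # hypothetical topic size


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a record key to a partition; the same key always maps to the same one."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


events = [
    ("order-1", "created"),
    ("order-2", "created"),
    ("order-1", "paid"),
    ("order-2", "cancelled"),
    ("order-1", "shipped"),
]

partitions = defaultdict(list)
for key, change in events:
    partitions[partition_for(key)].append((key, change))


def history(key: str) -> list:
    """Changes for one key, in the order they sit in that key's partition."""
    return [change for k, change in partitions[partition_for(key)] if k == key]
```

Keying change events by the row’s primary key means all changes to a given row are consumed in the order they happened.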
👉 Why start with reads?
This was a critical system and we didn’t want to mess up the writes.
Moving the reads gave the team experience with the new architecture. It also allowed us to run comparisons between the monolith responses and the new service.
This is part of the safety-first approach of the Strangler Fig pattern.
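The comparison itself doesn’t need to be fancy. Here is a simplified sketch, with `compare_responses` as a hypothetical helper and two dictionaries standing in for the monolith and Service A.

```python
from typing import Any, Callable, Iterable, List, Tuple


def compare_responses(
    requests: Iterable[Any],
    monolith: Callable[[Any], Any],
    service_a: Callable[[Any], Any],
) -> List[Tuple[Any, Any, Any]]:
    """Send each read to both backends and collect the ones that disagree."""
    mismatches = []
    for request in requests:
        expected = monolith(request)
        actual = service_a(request)
        if expected != actual:
            mismatches.append((request, expected, actual))
    return mismatches


monolith_data = {1: {"name": "Ada", "points": 120}, 2: {"name": "Linus", "points": 80}}
service_a_data = {1: {"name": "Ada", "points": 120}, 2: {"name": "Linus", "points": 75}}

mismatches = compare_responses([1, 2], monolith_data.get, service_a_data.get)
```

An empty list of mismatches over a long enough period is a good signal that the reads are safe to cut over.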
As you can see, CDC is a versatile tool that can help you in multiple scenarios. In case you’ve used it in the past or you can think of an application where this is useful, share your thoughts in the comments section.
Roundup
Here are five interesting posts I read this week:
5 mistakes that made my documents terrible: As a software developer, you have to write technical documents all the time. A great post on what mistakes to avoid.
Mastering Sales and Influence in Tech Careers: A super important post on how developers need to learn the art of selling, whether in full-time employment or freelancing.
I never enjoyed dealing with matrix multiplication problems. But this great post by Fernando explains 3 different methods of doing so.
5 Deployment strategies that make your life easier: A great compilation of the most useful deployment strategies.
7 questions to answer before adopting Kubernetes: Adopting Kubernetes is a big decision. Don’t do it without asking these questions.
That’s it for today! ☀️
Enjoyed this issue of the newsletter?
Share with your friends and colleagues.
See you later with another value-packed edition — Saurabh.