SDC#9 - Why Replication Lag Occurs in Databases?
How Notion Handles Concurrent Updates and More...
Hello, this is Saurabh…👋
Welcome to the 158 new subscribers who have joined us since last week.
If you aren’t subscribed yet, join 900+ curious developers looking to expand their knowledge by subscribing to this newsletter.
In this edition, I cover the following topics:
🖥 System Design Concept → Why Replication Lag Occurs in Databases?
🧰 Case Study → How Notion Handles Concurrent Updates?
🍔 Food For Thought → Concurrency vs Parallelism
So, let’s dive in.
🖥 Why Replication Lag Occurs in Databases?
Building an eventually consistent system is fun and games until you run into a little problem known as Replication Lag.
But what is Replication Lag and how does it occur?
In a typical Leader-based replication setup, all writes go through a single node. However, read-only queries can be served by any replica.
This is a huge benefit for systems that consist of mostly reads and a small percentage of write operations. Just to let you know, this is a very common pattern on the web.
In order to scale the read operations, you can create multiple followers and distribute the read requests across those followers. This reduces the load on the leader node.
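To make the routing idea concrete, here is a minimal sketch in Python. The `ReplicatedStore` class, its method names and the round-robin strategy are all assumptions made for illustration, and replication delay is deliberately ignored at this point.

```python
from itertools import cycle
from typing import Optional


class ReplicatedStore:
    """Toy leader-based setup: one leader dict and several follower dicts."""

    def __init__(self, follower_count: int) -> None:
        self.leader: dict = {}
        self.followers: list = [{} for _ in range(follower_count)]
        # Round-robin over follower indexes to spread the read load.
        self._next_follower = cycle(range(follower_count))
        self.reads_served = [0] * follower_count

    def write(self, key: str, value: str) -> None:
        # Every write goes through the single leader node.
        self.leader[key] = value
        # Simplification: followers are updated instantly (no lag yet).
        for follower in self.followers:
            follower[key] = value

    def read(self, key: str) -> Optional[str]:
        # Reads never touch the leader; they rotate across followers.
        index = next(self._next_follower)
        self.reads_served[index] += 1
        return self.followers[index].get(key)


store = ReplicatedStore(follower_count=3)
store.write("page:1", "hello")
for _ in range(6):
    store.read("page:1")
print(store.reads_served)  # [2, 2, 2]
```

Six reads end up spread evenly across three followers, and the leader serves none of them.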
However, this approach works well only in the case of asynchronous replication.
Why is that the case?
Imagine performing synchronous replication where you don’t confirm a write operation to the user until all replicas give a thumbs-up.
Even a single replica going down would make the whole system unavailable for write operations.
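Here is a rough sketch of that failure mode. The `Replica` class and the `synchronous_write` function are hypothetical names, and the up-front liveness check is an assumption that stands in for the timeouts a real database would rely on.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    online: bool = True
    data: dict = field(default_factory=dict)


def synchronous_write(leader: dict, replicas: list, key: str, value: str) -> bool:
    """Confirm a write only if every replica can acknowledge it."""
    # Assumption: an offline replica is detected up front. Real systems
    # find out through timeouts and must handle partially applied writes.
    if not all(replica.online for replica in replicas):
        return False  # one unavailable replica blocks the whole write
    leader[key] = value
    for replica in replicas:
        replica.data[key] = value
    return True


leader = {}
replicas = [Replica("replica-1"), Replica("replica-2")]

first = synchronous_write(leader, replicas, "title", "Draft")
replicas[1].online = False  # a single replica goes down
second = synchronous_write(leader, replicas, "title", "Final")
print(first, second, leader["title"])  # True False Draft
```

The second write is rejected even though the leader and one replica are perfectly healthy.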
However, asynchronous replication has its own troubles.
When your application reads data from an asynchronous replica or follower, there’s a good chance that it reads outdated information if the follower has fallen behind.
Here’s what can actually happen:
User A sends an update (write) request to the Primary or Leader node.
The Leader node sends the replication information to its replicas.
Replica 1 gets updated.
User B requests (reads) data from replica 2 and gets outdated information.
Replica 2 gets updated eventually.
In normal operations, this delay between a write happening on the leader and being reflected on a follower node is known as the Replication Lag.
The lag may only be a fraction of a second (hardly noticeable). However, if the system is operating at its limit, the lag can also increase to several seconds or minutes.
Of course, this inconsistency is just a temporary state. The followers will eventually catch up with the leader. Hence, this situation is also called eventual consistency.
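The sequence above can be reproduced with a small simulation. This is only a sketch: the `Leader` and `Follower` classes are made up for illustration, and lag is counted in unapplied log entries as a stand-in for elapsed time.

```python
class Leader:
    def __init__(self) -> None:
        self.data = {}
        self.log = []  # ordered (key, value) writes shipped to followers

    def write(self, key: str, value: str) -> None:
        self.data[key] = value
        self.log.append((key, value))


class Follower:
    def __init__(self, leader: Leader) -> None:
        self.leader = leader
        self.data = {}
        self.applied = 0  # how many log entries this follower has applied

    def catch_up(self) -> None:
        # Apply every log entry the follower has not seen yet.
        for key, value in self.leader.log[self.applied:]:
            self.data[key] = value
        self.applied = len(self.leader.log)

    def lag(self) -> int:
        return len(self.leader.log) - self.applied

    def read(self, key: str):
        return self.data.get(key)


leader = Leader()
replica_1, replica_2 = Follower(leader), Follower(leader)

leader.write("title", "Draft")
replica_1.catch_up()
replica_2.catch_up()

leader.write("title", "Final")   # User A updates via the leader
replica_1.catch_up()             # Replica 1 gets updated
stale = replica_2.read("title")  # User B reads from replica 2
lag_before = replica_2.lag()
replica_2.catch_up()             # Replica 2 gets updated eventually
fresh = replica_2.read("title")
print(stale, lag_before, fresh, replica_2.lag())  # Draft 1 Final 0
```

User B sees "Draft" while replica 2 is one entry behind, and "Final" once it has caught up.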
The trouble starts when the lag becomes too large and the inconsistencies become a real problem for applications.
There are several techniques to deal with this, but more on them in the next post.
🧰 How Notion Handles Concurrent Updates?
How do you build an application that lets you and your friend update a page together in real time?
Let’s learn from our favorite productivity tool Notion.
Notion serves millions of users across the globe and a lot of them work in collaborative teams.
It provides a concurrent interface to its users, meaning multiple users can collaborate and update a page at the same time.
Every item that you create in the Notion editor is a Block. We did speak about the incredibly flexible data model of Notion in an earlier post.
What’s interesting, however, is that every Block goes through a 3-phase lifecycle.
Creating a new Block
Saving the Block on the server
Rendering the Block on the friend’s screen
Here’s what it looks like on a high level:
However, these 3 stages break down into 11 steps in total, and they are full of interesting insights.
Here’s a super-detailed illustration of the entire process in a step-by-step manner.
Let’s understand what’s happening in the entire sequence:
👉 Stage 1 - Creating a New Block
The below steps occur in this stage:
Step 1 - The user creates a new Block in the UI.
Step 2 - The Block is saved to in-memory storage or a browser store such as IndexedDB.
Step 3 - The UI is re-rendered and the Block is shown on the user’s screen.
Step 4 - The data is also saved to the TransactionQueue.
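Steps 1 to 4 can be sketched roughly as follows. This is not Notion’s code: the `Block` and `EditorClient` classes, the field names and the shape of the queued operation are all assumptions, with plain Python containers standing in for IndexedDB, the UI and the TransactionQueue.

```python
import uuid
from collections import deque
from dataclasses import dataclass


@dataclass
class Block:
    id: str
    type: str
    text: str


class EditorClient:
    def __init__(self) -> None:
        self.local_store = {}             # stand-in for in-memory storage / IndexedDB
        self.rendered = []                # stand-in for what the UI shows
        self.transaction_queue = deque()  # changes waiting to be sent to the server

    def create_block(self, block_type: str, text: str) -> Block:
        block = Block(id=str(uuid.uuid4()), type=block_type, text=text)  # Step 1
        self.local_store[block.id] = block                               # Step 2
        self.render()                                                    # Step 3
        self.transaction_queue.append({"op": "create", "block": block})  # Step 4
        return block

    def render(self) -> None:
        # Re-render from the local store, so the UI never waits on the network.
        self.rendered = [block.text for block in self.local_store.values()]


client = EditorClient()
client.create_block("text", "Hello")
print(client.rendered, len(client.transaction_queue))  # ['Hello'] 1
```

The point of this ordering is that the user sees the new Block right away, while the save to the server happens afterwards through the queue.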
👉 Stage 2 - Saving the Block on the Server
Step 5 - Data is serialized and posted to the backend API.
Step 6 - The API does its thing and stores the data in the main database. This is the source-of-truth database.
Step 7 - The backend API also notifies the MessageStore service about the changes.
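Here is a simplified sketch of steps 5 to 7. The `BackendApi` class, the JSON payload shape and the per-block `version` counter are assumptions for illustration, and a dict plays the role of the main database.

```python
import json


class MessageStore:
    def __init__(self) -> None:
        self.notifications = []

    def publish(self, notification: dict) -> None:
        self.notifications.append(notification)


class BackendApi:
    def __init__(self, message_store: MessageStore) -> None:
        self.database = {}  # stand-in for the source-of-truth database
        self.message_store = message_store

    def save_transactions(self, payload: str) -> None:
        for operation in json.loads(payload):
            block = operation["block"]
            previous = self.database.get(block["id"], {"version": 0})
            # Assumption: every save bumps a per-block version number.
            record = {**block, "version": previous["version"] + 1}
            self.database[block["id"]] = record  # Step 6
            self.message_store.publish(          # Step 7
                {"block_id": block["id"], "version": record["version"]}
            )


message_store = MessageStore()
api = BackendApi(message_store)

queued = [{"op": "create", "block": {"id": "b1", "type": "text", "text": "Hello"}}]
payload = json.dumps(queued)  # Step 5: serialize before posting
api.save_transactions(payload)
print(api.database["b1"]["version"], message_store.notifications)
# 1 [{'block_id': 'b1', 'version': 1}]
```

Note that the notification carries only the Block id and version, not the content itself.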
👉 Stage 3 - Rendering the Block on the Friend’s Screen
Step 8 - A client WebSocket connection subscribes to the MessageStore service.
Step 9 - As part of the subscription, the MessageStore passes the notification to the WebSocket connection.
Step 10 - The client receives the version update notifications.
Step 11 - Based on the notification data, the client calls the backend API to fetch the latest records from the database and renders them in the friend’s UI.
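Steps 8 to 11 follow a subscribe, notify and fetch pattern, sketched below. The class names are hypothetical, a callback stands in for the WebSocket connection, a dict stands in for the backend API and database, and the version comparison is an assumption about how a client might skip stale notifications.

```python
class MessageStore:
    def __init__(self) -> None:
        self.subscribers = []

    def subscribe(self, callback) -> None:  # Step 8
        self.subscribers.append(callback)

    def publish(self, notification: dict) -> None:  # Step 9
        for callback in self.subscribers:
            callback(notification)


class FriendClient:
    def __init__(self, database: dict, message_store: MessageStore) -> None:
        self.database = database  # stand-in for the backend API + database
        self.known_versions = {}
        self.screen = {}          # stand-in for the friend's UI
        message_store.subscribe(self.on_notification)

    def on_notification(self, notification: dict) -> None:  # Step 10
        block_id = notification["block_id"]
        if notification["version"] <= self.known_versions.get(block_id, 0):
            return  # nothing new, so skip the fetch
        record = self.database[block_id]  # Step 11: fetch the latest record
        self.known_versions[block_id] = record["version"]
        self.screen[block_id] = record["text"]


database = {"b1": {"id": "b1", "text": "Hello", "version": 1}}
message_store = MessageStore()
friend = FriendClient(database, message_store)

message_store.publish({"block_id": "b1", "version": 1})
print(friend.screen)  # {'b1': 'Hello'}
```

Because the notification is just a pointer, the friend’s client always renders what the source-of-truth database holds.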
P.S. This post is inspired by the explanation provided on the Notion Engineering Blog. However, the diagrams have been made from scratch based on the information shared. You can find the original article over here.
🍔 Food For Thought
👉 Concurrency vs Parallelism
Concurrency and Parallelism are concepts that often get mixed up and end up confusing people.
Not anymore.
Here’s a post I wrote a few days ago on X (Twitter) where I explained the difference between the two terms.
As of this moment, the post has over 550 likes and over a hundred reposts.
Do check it out 👇
Link to the post below:
https://x.com/ProgressiveCod2/status/1706195858520261010?s=20
👉 The Importance of Command Query Responsibility Segregation (CQRS)
We all want to create modular systems that are easy to maintain.
Following the principle of Separation of Concerns is key to realizing this goal.
And CQRS is a pattern that helps us move in the right direction.
Here’s a great post by Helen explaining the CQRS pattern in detail.
https://x.com/Sunshine_Layer/status/1707299295198605820?s=20
That’s it for today! ☀️
Enjoyed this issue of the newsletter?
Share with your friends and colleagues
See you later with another value-packed edition — Saurabh.