What happens when you make an SQL query to the database?
Let’s say this is the query:
SELECT name, age FROM users WHERE city = 'New York';
Here’s a diagram that shows the process visually:
Let’s go through each step in more detail:
STEP 1
When you hit run, the query string first reaches the Transport Subsystem of the database.
As the name suggests, this subsystem manages the connection with the client.
It also performs initial authentication and authorization checks, and if satisfied, lets the query go to the next step.
STEP 2
The query now reaches the Query Processor, which has two main parts:
Query Parser
Query Optimizer
The Query Parser breaks down the SQL query into its parts (SELECT, FROM, WHERE, etc). It checks for any syntax errors and creates a parse tree.
Then, the Query Optimizer goes through the parse tree, checks for semantic errors (e.g. - if the ‘users’ table exists or not), and determines the most efficient way to execute the query.
As output, the optimizer produces an execution plan.
STEP 3
Next, the execution plan makes its way to the Execution Engine.
The plan is made up of all the steps that are needed to execute the query.
The Execution Engine takes this plan and coordinates the execution of each step by calling the Storage Engine. It also collects the results from each step and returns a unified response to the upper layer.
STEP 4
The Execution Engine sends low-level read/write requests to the Storage Engine depending on the execution plan.
All of this is facilitated by the various employees of the Storage Engine.
For example,
The transaction manager ensures that the query runs within a transaction for consistency.
The lock manager acquires necessary locks on the “users” table to prevent conflicts.
The buffer manager checks if the required data pages are already in memory. If not, it requests the data to be loaded from the disk into memory.
The recovery manager logs the operations for rollback or recovery.
👉 So - what else would you add to this journey of an SQL query?
Shoutout
Here are some interesting articles I’ve read recently:
The fallacies of distributed systems by
Horizontal Scaling Starts with Load Balancers – But Doesn’t Stop There by
The Complete Guide to Starting TDD by
The styling dilemma in React by
That’s it for today! ☀️
Enjoyed this issue of the newsletter?
Share with your friends and colleagues.
See you later with another edition — Saurabh
Simply put, Saurabh.
I'd be curious to know how query caching fits into this picture – could be another interesting layer to explore!
Thanks for the shoutout.
That's a great breakdown of what is going on behind the scenes. And thank you so much for the shoutout!