When designing a data model, one of the crucial decisions to make is whether to normalize or denormalize the data.
Normalization and denormalization are two different approaches to structuring data in a database, each with its own benefits and trade-offs.
Normalization
Normalization is a technique that organizes data into multiple related tables to reduce redundancy and ensure data integrity and consistency.
In a normalized data model, each table represents a single entity or concept, and relationships between entities are established through foreign keys.
The diagram below shows an example of normalization.
The key characteristics of normalization are:
Multiple Related Tables: Normalization splits data into multiple tables based on their logical relationships. Each table focuses on a specific entity and contains only the attributes directly related to that entity.
Reduced Redundancy: By organizing data into separate tables, normalization minimizes data redundancy. Instead of repeating data across multiple tables, it stores each piece of data once in its respective table and uses foreign keys to establish relationships. For example, an order references the customer's ID instead of repeating the customer's details.
Data Integrity and Consistency: Normalization enforces data integrity and consistency by ensuring that data is stored in a structured and standardized manner. It prevents data anomalies and inconsistencies that can arise from data duplication.
However, one potential drawback of normalization is that complex queries involving multiple tables and joins can negatively impact performance.
Retrieving data from normalized tables may require extensive join operations, which can be resource-intensive and slow down query execution. For example, generating a report about customer orders may require joining several tables to fetch details such as the customer name, product categories, and order line items.
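To make that concrete, here is a minimal sketch in Python with SQLite. The table and column names (customers, products, orders, order_items) are illustrative assumptions rather than a prescribed schema; the point is that each entity lives in its own table, relationships go through foreign keys, and the order report has to join all of them.

```python
import sqlite3

# In-memory database for illustration; the schema below is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Each table holds one entity; relationships go through foreign keys.
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE products (
        product_id  INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        category    TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        order_date  TEXT NOT NULL
    );
    CREATE TABLE order_items (
        order_id    INTEGER NOT NULL REFERENCES orders(order_id),
        product_id  INTEGER NOT NULL REFERENCES products(product_id),
        quantity    INTEGER NOT NULL
    );
""")

conn.execute("INSERT INTO customers VALUES (1, 'Alice')")
conn.execute("INSERT INTO products VALUES (10, 'Keyboard', 'Electronics')")
conn.execute("INSERT INTO orders VALUES (100, 1, '2024-05-01')")
conn.execute("INSERT INTO order_items VALUES (100, 10, 2)")

# The "customer orders" report touches every table, so it needs three joins.
report = conn.execute("""
    SELECT c.name, p.name, p.category, o.order_date, oi.quantity
    FROM orders o
    JOIN customers   c  ON c.customer_id = o.customer_id
    JOIN order_items oi ON oi.order_id   = o.order_id
    JOIN products    p  ON p.product_id  = oi.product_id
""").fetchall()
print(report)  # [('Alice', 'Keyboard', 'Electronics', '2024-05-01', 2)]
```

Even in this tiny example the report query touches four tables; on a large dataset, those joins are where the cost shows up.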
Denormalization
Denormalization, on the other hand, is an approach that combines data from multiple tables into a single table to optimize read performance.
It intentionally introduces redundancy to simplify queries and improve query efficiency.
The diagram below shows an example of denormalization.
Key characteristics of denormalization include:
Combined Tables: Denormalization merges data from multiple related tables into a single table, duplicating data as needed to avoid complex joins during query execution.
Simplified Queries: By combining data into a single table, denormalization simplifies queries. It reduces the need for joins and allows for faster data retrieval, as all the required data is readily available in one place.
Improved Read Performance: Denormalization optimizes read-heavy workloads by providing quick access to frequently queried data. It eliminates the overhead of joining multiple tables, resulting in faster query execution and improved performance.
However, denormalization comes with its own set of trade-offs.
Because of the data redundancy, denormalized tables require more storage space. Data can also become inconsistent if updates are not propagated to every copy of the duplicated data.
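For contrast, here is the same report data in a denormalized shape, again as a hypothetical sketch: a single order_report table repeats the customer name and product details on every row, so the read becomes a plain single-table SELECT, and the update at the end shows the price of that redundancy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One wide table; customer and product details are duplicated on every row.
conn.execute("""
    CREATE TABLE order_report (
        order_id         INTEGER NOT NULL,
        order_date       TEXT NOT NULL,
        customer_name    TEXT NOT NULL,    -- duplicated from the customer entity
        product_name     TEXT NOT NULL,    -- duplicated from the product entity
        product_category TEXT NOT NULL,
        quantity         INTEGER NOT NULL
    )
""")
conn.execute("INSERT INTO order_report VALUES (100, '2024-05-01', 'Alice', 'Keyboard', 'Electronics', 2)")
conn.execute("INSERT INTO order_report VALUES (101, '2024-05-02', 'Alice', 'Mouse', 'Electronics', 1)")

# The same report is now a single-table read: no joins at all.
print(conn.execute("SELECT customer_name, product_name, order_date FROM order_report").fetchall())

# The trade-off: if the customer's name changes, every duplicated row must be
# updated, or the data drifts out of sync.
conn.execute("UPDATE order_report SET customer_name = 'Alice B.' WHERE customer_name = 'Alice'")
```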
Choosing Between the Two
When deciding between normalization and denormalization, several factors should be considered:
Data Consistency and Integrity: If maintaining data consistency and integrity is a top priority, normalization is the preferred approach. Normalized data models ensure that data is stored accurately and consistently, reducing the risk of data anomalies and inconsistencies.
Read-Heavy Workloads: If the workload is predominantly read-heavy and performance is critical, denormalization can be a better choice. Denormalized data models optimize for fast data retrieval, making them suitable for scenarios where quick access to data is essential.
Hybrid Approach: In some cases, a hybrid solution that combines normalization and denormalization can be the sweet spot. Critical data that requires strict consistency and integrity can be normalized, while frequently accessed data that benefits from faster reads can be denormalized.
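One common way to implement such a hybrid, sketched below with hypothetical names, is to keep the normalized tables as the source of truth for writes and maintain a denormalized read table alongside them. Here it is refreshed with a simple bulk rebuild; in practice this could be a materialized view, a trigger, or a background job.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized source of truth: writes and consistency live here.
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders    (order_id INTEGER PRIMARY KEY,
                            customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
                            total REAL NOT NULL);

    -- Denormalized read model: rebuilt from the source tables for fast reporting.
    CREATE TABLE order_summary (order_id INTEGER PRIMARY KEY,
                                customer_name TEXT NOT NULL,
                                total REAL NOT NULL);
""")

conn.execute("INSERT INTO customers VALUES (1, 'Alice')")
conn.execute("INSERT INTO orders VALUES (100, 1, 49.90)")

def refresh_order_summary(conn):
    """Rebuild the read table from the normalized tables (a simple bulk refresh)."""
    conn.execute("DELETE FROM order_summary")
    conn.execute("""
        INSERT INTO order_summary (order_id, customer_name, total)
        SELECT o.order_id, c.name, o.total
        FROM orders o JOIN customers c ON c.customer_id = o.customer_id
    """)

refresh_order_summary(conn)
# Reads hit the flat summary table; writes still go to the normalized tables.
print(conn.execute("SELECT * FROM order_summary").fetchall())  # [(100, 'Alice', 49.9)]
```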
So - how do you approach normalization vs denormalization in your projects?
Shoutout
Here are some interesting articles I’ve read recently:
Event-Driven Architecture (EDA) is exciting until you deal with event versioning by
Coding with AI: My Personal Reality Check by
10 ways to better organize and design your React Application by
That’s it for today! ☀️
Enjoyed this issue of the newsletter?
Share with your friends and colleagues.
See you later with another edition — Saurabh
Normalization and denormalization are two key concepts in database design, each serving a specific purpose. Great write-up.
I have to admit, it's been almost two decades in web but this is the first I'm getting into a project where we're considering actually duplicating some data.
And our use case perfectly fits what you described as ideal for this - heavy read operations. We actually have just a single import that creates the graph db structure and everything else is read after that.
Before this I couldn't think of a situation where I would have duplicated data, but looks like I was working with a limited scope of apps. :) Thanks for the mention Saurabh!