Normalization is the process of reorganizing data in a database so that it meets two basic requirements:
- There is no redundancy of data (each piece of data is stored in only one place).
- Data dependencies are logical (all related data items are stored together).
Consider an online retailer that has a database of customer purchases. The database may consist of a Customers table and a Purchases table.
The Purchases table could repeat details such as the customer's name, address, and ID number. However, this would duplicate data and require extra effort to maintain: entries stored in the Customers table would have to be replicated in the Purchases table, and any change to a customer's information would need to be applied across multiple tables.
While this may not sound too bad in this small example, it can grow tedious and troublesome for larger-scale databases.
As such, normalization is the process of reducing duplication as much as possible. In this instance, we can remove the customer's name and address from the Purchases table and store only the customer's unique ID number. Then, when we want to query a customer's purchases, we can JOIN the two tables using the customer ID as the reference between them.
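The normalized design described above can be sketched in a few lines of SQL. Here is a minimal, runnable example using Python's built-in `sqlite3` module; the table names, column names, and sample data are illustrative assumptions, not from the original post.

```python
import sqlite3

# Illustrative sketch of the normalized schema: customer details live
# in one place, and purchases reference the customer only by ID.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Customer name and address are stored once, in the Customers table.
cur.execute("""
    CREATE TABLE customers (
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        address TEXT NOT NULL
    )
""")

# Purchases reference customers(id) instead of duplicating
# the name and address in every purchase row.
cur.execute("""
    CREATE TABLE purchases (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        item        TEXT NOT NULL
    )
""")

cur.execute("INSERT INTO customers VALUES (1, 'Ada Lovelace', '12 Example St')")
cur.executemany(
    "INSERT INTO purchases (customer_id, item) VALUES (?, ?)",
    [(1, "keyboard"), (1, "monitor")],
)

# JOIN the tables on the customer ID to query a customer's purchases.
rows = cur.execute("""
    SELECT c.name, p.item
    FROM purchases AS p
    JOIN customers AS c ON c.id = p.customer_id
    WHERE c.id = 1
    ORDER BY p.id
""").fetchall()
print(rows)  # each row pairs the customer's name with a purchased item
```

Note that the customer's name appears in the query result without ever being stored in the Purchases table; if the address or name changes, only one row in Customers needs updating.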
Normalization can be desirable because it reduces the required disk space: duplicated data no longer needs to be stored. However, for complex queries, joining many tables together can be expensive in terms of execution time.