Trusted answers to developer questions

Definition: Normalization

Get Started With Data Science

Learn the fundamentals of Data Science with this free course. Future-proof your career by adding Data Science skills to your toolkit — or prepare to land a job in AI, Machine Learning, or Data Analysis.

Normalization is the process of reorganizing data in a database so that it meets two basic requirements:

  1. There is no redundancy of data (all data is stored in only one place).

  2. Data dependencies are logical (all related data items are stored together).

What does this mean?

Consider an online retailer that has a database of customer purchases. The database may consist of:

  • a Customer table containing details that include customer names, addresses, account ID numbers, etc.
  • a Purchases table containing item details, order numbers, and customer details (as per the details in the Customer table)

The Purchases table could have details such as customer name, address, and ID number. However, this would be duplicating data and would require extra effort to maintain these tables. Entries stored in the Customer table would have to be replicated in the Purchases table. Also, if a change to some of the customer information was required, it would need to be maintained across multiple tables.

While this may not sound too bad in this small example, it can grow tedious and troublesome for larger-scale databases.

As such, normalization is the process of reducing as much duplication as possible. In this instance, we can take out a customer’s name and address from the Purchases table and only reference the customer’s unique ID number. This way, when we want to query for a customer’s purchases, we can JOIN the tables together using the customer ID as a reference between the two tables.

Normalization can be desirable because it reduces the required disk space since the need for holding duplicated data is eliminated. However, for complex queries, joining many tables together can be expensive in terms of the time it takes to execute a query.

Further reading

https://www.techopedia.com/definition/1221/normalization

https://www.studytonight.com/dbms/database-normalization.php

http://agiledata.org/essays/dataNormalization.html

If you enjoyed reading this post, consider reading some of my other definition posts:

Propagate

Tautology

Idempotent

Also, feel free to check out my social media accounts:

Twitter: @sdrobertsCode
LinkedIn
GitHub

RELATED TAGS

computing
jargon
normalization
data
Did you find this helpful?