Search⌘ K
AI Features

Normal Forms

Explore the principles of database normalization to organize data effectively. Learn to identify and correct violations of First, Second, and Third Normal Forms, reducing redundancy and preventing update anomalies. This lesson helps you transform disorganized tables into well-structured schemas, ensuring efficient, reliable, and maintainable databases.

Imagine we’re building an OnlineStore database from scratch and decide to place all order information into a single, massive table.

This table includes columns for the order ID, order date, customer name, customer address, product ID, product name, quantity, and product price. At first glance, this setup appears straightforward. But consider what happens if a customer changes their address—we’d have to locate and update every order record for that customer.

Or what if we want to add a new product that hasn’t been ordered yet?

We can’t, because there’s no existing order to attach it to. This messy situation is exactly what normalization in a database helps us avoid. Database normalization is the process of organizing data to reduce redundancy and maintain data integrity, making our database more efficient, reliable, and scalable.

By the end of this lesson, we will be able to:

  • Define normalization and explain its importance in database design.

  • Understand and identify violations of the First Normal Form (1NF).

  • Understand and identify violations of the Second Normal Form (2NF).

  • Understand and identify violations of the Third Normal Form (3NF).

  • Apply normalization rules to transform messy tables into a well-structured database schema.

Normalization in databases

Normalization is a systematic approach to decomposing tables to eliminate data redundancy in DBMS (repetition) and undesirable characteristics known as insertion, update, and deletion anomalies.

Think of it as a set of rules or guidelines for organizing your data. The goal is to ensure that data is stored logically, with each piece of information saved in only one place. This process involves dividing a large, problematic table into smaller, well-structured tables and defining relationships between them.

These database normalization rules are categorized into different levels, known as normal forms. We’ll be focusing on the first three, which are the most essential for practical database design.

Each of these levels represents a database normal form, which defines specific rules for structuring data.

First Normal Form (1NF)

The first step in organizing our data is to achieve First Normal Form. It’s the most basic rule and sets the foundation for all other normal forms.

1NF tackles the problem of storing multiple values in a single cell. If a column contains a list of items, it becomes incredibly difficult to query, sort, or update individual items within that list. By ensuring each cell holds a single value, we make our data clear and manageable.

A table is in 1NF if it meets two conditions:

  1. Atomicity: Each column of a table must hold a single, indivisible (or atomic) value. You can’t have a list or a set of values in one cell.

  2. Uniqueness: Each record in the table must be unique, which is typically achieved by having a primary key.

Let’s imagine we had an initial, unnormalized version of an orders table that looked like this:

OrderID

CustomerName

ProductsPurchased

101

John Doe

'Laptop, Wireless Earbuds'

102

Jane Smith

'Novel A, T-Shirt, Blender'

This table violates 1NF because the ProductsPurchased column contains multiple values. How would we find all orders that included a ‘T-Shirt’? It would require complex string searching.

To bring this table into 1NF, we need to break it down so that each product is in its own row. This leads to creating a separate table for the items within an order, which is exactly what our OnlineStore database does with the Orders and Order_Details tables.

  • Orders table: Contains information about the order itself (who placed it and when).

  • Order_Details table: Contains information about the products included in each order.

This structure ensures that each cell contains only one value, thereby satisfying 1NF.

A visual depiction of the first normal form
A visual depiction of the first normal form

Second Normal Form (2NF)

Once a table is in 1NF, we can move on to the next step: Second Normal Form. 2NF is all about ensuring that every piece of data in a row relates to the entire primary key. This rule specifically applies to tables that have a composite primary key (a primary key made up of two or more columns). The 2NF helps eliminate data redundancy by removing what are called partial dependencies. A partial dependency occurs when a non-key column depends on only one part of the composite primary key, not the whole thing. This can lead to update anomalies. For instance, if a product’s price is stored in the order details, changing the price would require updating it in every single order detail record where that product appears.

A table is in 2NF if:

  1. It is already in 1NF.

  2. It has no partial dependencies. Every non-key column must be fully dependent on the entire composite primary key.

Let’s consider a hypothetical table called OrderItems that is in 1NF but not 2NF. The primary key is a composite of (OrderID, ProductID).

Violation of 2NF in OrderItems

OrderID (PK)

ProductID (PK)

Quantity

ProductName

Price

1

1

1

Laptop

1200.00

1

3

2

Earbuds

99.99

2

1

1

Laptop

1200.00

In this table, Quantity depends on both OrderID and ProductID, which is correct. However, ProductName and Price depend only on ProductID. They have nothing to do with the OrderID. This is a partial dependency. Notice how ‘Laptop’ and its price are repeated for every order that includes it.

To fix this and achieve 2NF, we decompose the table into two separate tables:

  • Order_Details: This table keeps the information that depends on the full composite key.

    • OrderDetailID (PK), OrderID (FK), ProductID (FK), Quantity

  • Products: This table contains information that depends only on ProductID.

    • ProductID (PK), ProductName, Price, etc.

This is the exact structure we have in our OnlineStore database, which is properly designed in 2NF (and beyond!).

A visual depiction of second normal form
A visual depiction of second normal form

Third Normal Form (3NF)

After ensuring our tables are in 2NF, the final step for most database designs is to achieve Third Normal Form. 3NF addresses another type of problematic dependency known as a transitive dependency.

A transitive dependency occurs when a non-key column depends on another non-key column, which in turn depends on the primary key. This creates an indirect relationship that can cause data inconsistencies. For example, if we store the category name directly in the Products table, and we decide to rename a category (e.g., from “Books” to “Literature”), we would have to update this value for every single product in that category. This is inefficient and prone to errors.

A table is in 3NF if:

  1. It is already in 2NF.

  2. It has no transitive dependencies. Every non-key column must depend only on the primary key, not on any other non-key columns.

Let’s imagine a version of our Products table that is in 2NF but violates 3NF. The primary key here is ProductID.

ProductsWithCategory (Violates 3NF)

ProductID (PK)

ProductName

CategoryID

CategoryName

1

Laptop

1

Electronics

2

Smartphone

1

Electronics

4

Novel A

2

Books

Here, CategoryName depends on CategoryID. And CategoryID depends on the primary key, ProductID. This creates the transitive dependency: ProductIDCategoryIDCategoryName.

The solution is to decompose the table to remove this transitive relationship:

  • Products: This table relates the product to its category via an ID.

    • ProductID (PK), ProductName, CategoryID (FK)

  • Categories: This table stores the details of each category.

    • CategoryID (PK), CategoryName

Once again, this is precisely the clean and efficient structure we see in our OnlineStore database, which follows 3NF principles. By breaking down our tables this way, we ensure that each piece of information is stored only once, making our database robust and easy to maintain.

The importance of third normal form
The importance of third normal form

Quiz

Time to test our knowledge! Choose the best answer for each question.

1.

What is the primary goal of database normalization?

A.

To make the database run faster.

B.

To reduce data redundancy and improve data integrity.

C.

To increase the number of tables in a database.

D.

To store all data in a single table.


1 / 6

In this lesson, we explored why relying on a messy, single-table approach to data storage can lead to serious design problems.

Then, we examined the three fundamental normal forms—1NF, 2NF, and 3NF—to understand how each one introduces a new rule that improves database structure. By ensuring atomicity (1NF), eliminating partial dependencies (2NF), and removing transitive dependencies (3NF), we can build databases that are efficient, reliable, and resistant to common data anomalies.

Mastering normalization is a key skill for any great database designer. It may feel like a lot of rules at first, but with practice, normalization becomes a natural and intuitive part of the design process. Keep up the great work—you’re well on your way to exploring even more advanced database concepts!