Search⌘ K
AI Features

Normal Forms and Anomalies

Explore the principles of database normalization and normal forms such as 1NF, 2NF, 3NF, BCNF, and more. Understand common database anomalies like update, insertion, and deletion anomalies, and learn how normalization helps maintain data consistency and prevents redundancy.

While some rules (such as the rule of silence) can’t really apply to database modeling, most of them do so in a very direct way. Normal forms offer a practical way to enforce respect for those rules. SQL provides a clean interface to connect our data structures: the join operations.

As we’re going to see later, a database model with fewer tables isn’t a better or simpler data model. The rule of separation might be the most important on that list. Also, the rule of representation in database modeling is reflected directly in the choice of correct data types with advanced behavior and processing function availability.

To summarize all those rules and the different levels for normal forms, we believe that we need to express our intentions first. Anyone reading a database schema should instantly understand the business model.

There are several levels of normalization. These levels define the database normalization rules used to reduce redundancy and prevent anomalies.

Normal forms

In this quick introduction to database normalization, we include the definition of the normal forms:

First Normal Form (1NF)

A table (relation) is in 1NF if:

  • There are no duplicated rows in the table
  • Each cell is single-valued (no repeating groups or arrays)
  • Entries in a column (field) are of the same kind

Second Normal Form (2NF)

A table is in 2NF if it is in 1NF and if all non-key attributes are dependent on all of the keys. Since a partial dependency occurs when a non-key attribute is dependent on only a part of the composite key, the definition of 2NF is sometimes phrased as: “A table is in 2NF if it is in 1NF and if it has no partial dependencies.”

Third Normal Form (3NF)

A table is in 3NF if it is in 2NF and if it has no transitive dependencies.

Boyce-Codd Normal Form (BCNF)

A table is in BCNF if it is in 3NF and if every determinant is a candidate key.

Fourth Normal Form (4NF)

A table is in 4NF if it is in BCNF and if it has no multivalued dependencies.

Fifth Normal Form (5NF)

A table is in 5NF, also called Projection-join Normal Form (PJNF), if it is in 4NF and if every join dependency in the table is a consequence of the candidate keys of the table.

Domain-Key Normal Form (DKNF)

A table is in DKNF if every constraint on the table is a logical consequence of the definition of keys and domains.

Best practices

What all of this says is that if we want to be able to process data in our database using the relational model and SQL as our main tooling, then it’s best not to make a total mess of the information and keep it logically structured.

In practice databases, models often reach for BCNF or 4NF; going all the way to the DKNF design is only seen in specific cases.

Database anomalies

Failure to normalize our model may cause database anomalies and lead to redundancy in DBMS.

When an attempt is made to modify (update, insert into, or delete from) a relation, the following undesirable side effects may arise in relations that have not been sufficiently normalized:

  • Update anomaly

    The same information can be expressed on multiple rows; therefore, updates to the relation may result in logical inconsistencies. For example, each record in an “Employees’ Skills” relation might contain an employee ID, employee address, and skill; therefore, a change of address for a particular employee may need to be applied to multiple records (one for each skill). If the update is only partially successful—the employee’s address is updated on some records but not others—then the relation is left in an inconsistent state. Specifically, the relation provides conflicting answers to the question of what this particular employee’s address is. This phenomenon is known as an update anomaly.

  • Insertion anomaly

    There are circumstances in which certain facts cannot be recorded at all. For example, each record in a “Faculty and Their Courses” relation might contain a faculty ID, faculty name, faculty hire date, and course code. Therefore, we can record the details of any faculty member who teaches at least one course, but we cannot record a newly hired faculty member who has not yet been assigned to teach any courses, except by setting the Course Code to null. This phenomenon is known as an insertion anomaly.

  • Deletion anomaly

    Under certain circumstances, the deletion of data representing certain facts necessitates the deletion of data representing completely different facts. The “Faculty and Their Courses” relation described in the previous example suffers from this type of anomaly, for if a faculty member temporarily ceases to be assigned to any courses, we must delete the last of the records on which that faculty member appears, effectively also deleting the faculty member unless we set the course code to null. This phenomenon is known as a deletion anomaly.