Search⌘ K
AI Features

BCNF and Beyond

Explore how to enhance database schema design beyond third normal form by mastering Boyce-Codd Normal Form (BCNF) and understanding Fourth and Fifth Normal Forms. Learn to identify and resolve BCNF violations through decomposition, and grasp how higher normal forms prevent complex data anomalies, ensuring robust and efficient database structures.

Until now, we have mastered the first three normal forms, which help us design clean and efficient databases.

But what happens when even 3NF isn’t quite enough? Imagine a university database that tracks which professor teaches which course for each student. If we store this in a single table, we might encounter a tricky situation: if a professor stops teaching a course, deleting that relationship could inadvertently remove the only record of the course itself. 

This is where Boyce-Codd Normal Form (BCNF) comes into play. As databases scale, ensuring high-quality design through advanced database normalization becomes crucial for performance, integrity, and maintainability.

In this lesson, we’ll venture beyond 3NF to explore these higher levels of database purity. By the end, we will be able to:

  • Define Boyce-Codd Normal Form (BCNF) and understand why it’s a stronger version of 3NF.

  • Identify and correct violations of BCNF through decomposition.

  • Explain the purpose of Fourth Normal Form (4NF) and Fifth Normal Form (5NF).

  • Recognize the types of data anomalies that these advanced forms prevent.

Let’s dive in and learn how to make our database designs even more robust!

Boyce-Codd Normal Form (BCNF)

We’ve seen how 3NF helps us eliminate transitive dependencies, but sometimes, certain anomalies can still slip through, especially in tables with multiple overlapping candidate keys. Boyce-Codd Normal Form (BCNF) is a stricter version of 3NF designed to handle these rare but problematic cases. It ensures that our database schema is as logical and free from redundancy as possible in a database system.

The rule here is that a table is in BCNF if, for every non-trivial functional dependency X→Y, X must be a superkey.

Let’s break that down:

  • A functional dependency (X→Y) means that the value of attribute(s) X determines the value of attribute(s) Y.

  • A non-trivial dependency is one where Y is not a part of X.

  • A superkey is a set of one or more attributes that can uniquely identify a row in a table.

The key difference from 3NF is that BCNF removes the exception that allowed the right side of the dependency (Y) to be part of a candidate key. In BCNF, the determinant (X) on the left side must always be a superkey. No exceptions. This simple, strict rule closes the loopholes left by 3NF.

BCNF vs. 3NF (a practical example)

Most tables that are in 3NF are also in BCNF. The violation typically occurs in a very specific scenario: when a table has multiple candidate keys, those keys are composite (made of multiple attributes), and they overlap.

Since our OnlineStore database is already well-designed, let’s use a classic academic example to illustrate the problem. Imagine a table Student_Instructor_Course that stores student enrollments with their assigned instructors.

StudentID

CourseName

InstructorName

101

‘Database Systems’

‘Dr. Elmasri’

101

‘Algorithms’

‘Dr. Cormen’

102

‘Database Systems’

‘Dr. Elmasriv

103

‘Algorithms’

‘Dr. Cormen’

103

‘Operating Systems’

‘Dr. Tanenbaum’

Let’s assume the following business rules:

  1. A student can take multiple courses.

  2. Each instructor teaches only one course.

  3. A course can be taught by multiple instructors (though not shown in this small sample).

Based on these rules, we can identify the functional dependencies:

  • A specific student enrolled in a specific course is taught by only one instructor.

  • InstructorName → CourseName (An instructor teaches only one course.)

The candidate keys for this table are:

  • {StudentID, CourseName}

  • {StudentID, InstructorName}

Now, let’s check the normal forms:

  • Is it in 3NF? Yes. The dependency InstructorName → CourseName doesn’t violate 3NF because CourseName is part of a candidate key ({StudentID, CourseName}).

  • Is it in BCNF? No. The dependency InstructorName → CourseName violates BCNF. The determinant, InstructorName, is not a superkey. It cannot uniquely identify a row on its own.

This violation leads to anomalies. For instance, if we want to add a new instructor who hasn’t been assigned to any students yet, we can’t, because we’d have a NULL StudentID.

To fix this, we decompose the table into two, ensuring that in each new table, every determinant is a superkey:

  1. Instructor_Course (InstructorName, CourseName)

  2. Student_Instructor (StudentID, InstructorName)

This new design is in BCNF, resolves the anomalies, and maintains data integrity.

Beyond BCNF: 4NF and 5NF

For the vast majority of database designs, achieving BCNF is the ultimate goal. However, there are even higher normal forms that address more obscure types of redundancy. We’ll cover them briefly.

Fourth Normal Form (4NF)

Fourth Normal Form (4NF) addresses a problem known as multivalued dependency. This occurs when the presence of one row in a table implies the presence of other rows in the same table. Specifically, it happens when a table represents two or more independent many-to-many relationships.

A table is in 4NF if it is in BCNF and has no non-trivial multivalued dependencies.

Consider a table Product_Supplier_Shipping that tracks which suppliers provide a product and which shipping methods are available for that product.

ProductID

SupplierID

ShippingMethod

1

10

‘Standard’

1

10

‘Express’

1

20

‘Standard’

1

20

‘Express’

Here, the SupplierID and ShippingMethod are independent. Product 1 is available from suppliers 10 and 20, and it can be shipped via ‘Standard’ or ‘Express’. To represent this, we must create a row for every possible combination, which results in redundancy.

This table has two multivalued dependencies:

A product determines a set of suppliers.

A product determines a set of shipping methods.

To bring this to 4NF, we decompose it into two tables, isolating each independent many-to-many relationship:

  1. Product_Supplier (ProductID, SupplierID)

  2. Product_Shipping (ProductID, ShippingMethod)

Our OnlineStore database already does this correctly with the Product_Suppliers table, which separates the relationship between products and suppliers into its own table, thus adhering to 4NF.

Fifth Normal Form (5NF)

Fifth Normal Form (5NF), also known as Project-Join Normal Form (PJNF), addresses a very rare anomaly related to join dependencies.

It comes into play when a table can be losslessly decomposed into three or more tables, but not into just two. This situation is uncommon in practical database design. A table is in 5NF if it is in 4NF and every join dependency is implied by the candidate keys.

Unless we are working with extremely complex, interlocking relationships in your data, you are unlikely to encounter a 5NF violation in the wild. For our purposes, understanding that it exists to handle these complex, cyclical dependencies is sufficient.

A summary of the different types of normalization
A summary of the different types of normalization

Quiz

Test your knowledge with these advanced normalization concepts!

1.

What is the primary rule for a table to be in BCNF?

A.

It must be in 2NF, and all determinants must be candidate keys.

B.

For every non-trivial functional dependency X→Y, X must be a superkey.

C.

It must not have any transitive dependencies.

D.

It must not have any composite keys.


1 / 5

Congratulations on reaching the highest levels of database normalization!

We’ve learned that BCNF provides a stricter, more robust alternative to 3NF by requiring every determinant to be a superkey. We also took a look at 4NF and 5NF, which address rare but complex issues such as multivalued and join dependencies.

While 3NF or BCNF is often the practical stopping point for most designs, knowing what lies beyond helps us appreciate the theoretical completeness of the relational model.

We are developing an exceptional toolkit for designing databases that are not only functional but also efficient, scalable, and logically sound. Keep up the fantastic work, and get ready to apply these skills as we continue our journey!