ER Modeling Basics

Explore the basics of Entity-Relationship modeling to develop a solid blueprint for database design. Understand how to identify key entities, define their attributes, and assign unique identifiers. This lesson helps you create well-structured databases that simplify querying and maintain data integrity.

We'll cover the following...

Entity-Relationship model
Entities (the nouns of our database)
Attributes (describing the entities)
Identifiers—ensuring every instance is unique
Common pitfalls and how to avoid them

Imagine trying to build a house without a blueprint.

We might end up with a door that leads nowhere or a kitchen without a sink! The same principle applies to databases. We need a solid design before we write any code. In the case of building a new e-commerce system like our OnlineStore database, one of the first challenges is figuring out how to represent the real-world scenario in a structured way.

Here, we need to store information about customers, products, suppliers, and orders - and we want it to be easy to query, update, and maintain.

This is where Entity-Relationship (ER) modeling comes in. It is a core technique used in semantic data modelling to represent real-world concepts in a structured way. It's the architectural blueprint for our data, ensuring everything has a logical place and connects correctly. ER modeling helps us capture the essential elements of the system and their relationships before we write any code or create tables.

By the end of this lesson, we’ll be able to:

Understand what an Entity-Relationship (ER) model is and why it’s a critical first step.
Identify and define entities, which are the main objects in our database.
Describe the different kinds of attributes that give entities their characteristics.
Explain the role of identifiers in making sure every piece of data is unique.

Entity-Relationship model

Before writing any CREATE TABLE statements, it’s important to first understand the data you’re working with.

Jumping straight into coding without a clear plan can lead to poorly structured data, making your database difficult to use, maintain, and scale. A solid plan not only saves time but also prevents future issues. One of the most effective ways to create a plan is by using an Entity-Relationship (ER) model.

This is a high-level, visual blueprint of your database, showing your data as entities (things) and relationships (connections) between them. This approach is part of semantic data modelling, where the focus is on capturing the meaning and relationships of data.

Think of the ER model as a conceptual map. It helps everyone involved—developers, business analysts, and clients—reach a shared understanding of what the database will store and how it will be organized. The diagrams used in this model are simple and intuitive, making them easy for both technical and non-technical stakeholders to interpret.

In this section, we’ll focus on the most essential components of the ER model to build a strong foundation for designing your database.

Entities (the nouns of our database)

The first step in ER modeling is to identify the core entities, or objects, about which we want to store information. These are the main subjects of our database as they form the foundation around which everything else is built. An entity is a real-world object, concept, or event that is distinct from others and about which we collect data.

In simple terms, entities are the nouns of our database system.

For example, in our OnlineStore database, the primary things we care about are Customers, Products, Orders, and Suppliers. Each of these is an entity. An entity instance is a single occurrence of an entity. For example, ‘John Doe’ is an instance of the Customer entity, and a ‘Laptop’ is an instance of the Product entity. In an ER diagram, an entity is represented as a rectangle with its name inside.

Identifying the correct entities is the cornerstone of good database design. It helps us logically separate our data into manageable and meaningful groups.

Attributes (describing the entities)

Once we have our entities, the next logical question is: what information do we want to store about them?

A Product entity is just a concept until we define its properties, called attributes, like its name and price. These properties provide the detail and context that make our data useful. An attribute is a property or characteristic of an entity. If an entity is a noun, its attributes are the adjectives and descriptors that tell us more about it. For our Product entity, attributes would include ProductName, Price, and Stock.

For our Customer entity, we might have CustomerID, CustomerName, Email, and Address.

Attributes can be classified into different types:

Simple attribute: An attribute that cannot be broken down into smaller components. For example, the Price of a product is a single value.
Composite attribute: An attribute that can be subdivided into smaller parts. For instance, a customer’s Address could be broken down into Street, City, and PostalCode. Our OnlineStore schema keeps it simple with one Address field, but in a more complex system, we might break it down.
Single-valued attribute: An attribute that holds only one value for each entity instance. For example, a product has only one ProductID.
Multi-valued attribute: An attribute that can hold multiple values for a single entity instance. For example, a customer might have several phone numbers. Designing for this requires special techniques, which we’ll cover later.
Derived attribute: An attribute whose value is calculated from other attributes. It is not stored directly but can be computed on demand. In our Products table, the InventoryTurnoverRate is a perfect example, as it’s automatically calculated from MonthlySales and Stock.

Considering attributes helps us refine our understanding of each entity and ensures that we capture all the necessary information for our application.

Identifiers—ensuring every instance is unique

We have our entities and their attributes.

But what if we have two customers named ‘Jane Smith’? How can the system distinguish between them? This is why identifiers are non-negotiable. We need a reliable method to uniquely identify every record in our database. An identifier, or a primary key, is an attribute (or a set of attributes) whose value is unique for every instance of an entity.

It acts like a social security number or a fingerprint for our data, guaranteeing that we can always find the exact record we’re looking for. In ER diagrams, the identifier attribute is typically underlined.

For example, in our OnlineStore database:

The Customers entity is uniquely identified by CustomerID.
The Products entity is uniquely identified by ProductID.
The Orders entity is uniquely identified by OrderID.

Even if other details are identical, the identifier will always be different. When we translate our ER model into a real database, this identifier becomes the table’s PRIMARY KEY. Let’s see how the ProductID, our chosen identifier, is represented in the ER diagram.

This ensures that no two products can ever have the same ProductID, providing a foundation for data integrity.

Following is a further categorization of identifiers:

A primary key is the chosen identifier for an entity.
A candidate key is any minimal set of attributes that could serve as an identifier.
A composite key utilizes multiple attributes in combination.
Natural keys use real-world attributes (e.g., Email)
Surrogate keys are artificial (e.g., auto-increment CustomerID).

The following examples are from our database.

Customers.CustomerID (INT PRIMARY KEY AUTO_INCREMENT) is a surrogate primary key.
Products.ProductName is UNIQUE and can be a candidate key.
Categories.CategoryName is UNIQUE and can be a candidate key.
Bridge tables often use composite identifiers conceptually, e.g., (ProductID, SupplierID) in Product_Suppliers.

We often pick a surrogate primary key for simplicity, but we still validate natural candidate keys (like Email) to strengthen data quality with UNIQUE constraints where appropriate.

Common pitfalls and how to avoid them

Here are some common pitfalls and ways to avoid them.

Entity vs attribute confusion: If something does not have its own lifecycle, it is probably an attribute, not an entity. For example, DeliveryStatus is an attribute of Order, not an entity.
Overusing natural keys: Real-world values like Email can change. Keep a surrogate primary key even if a natural key looks stable; add UNIQUE constraints to enforce quality.
Ignoring optionality: Decide early which attributes are NOT NULL. Optionality is a business rule, not just a technical choice.
Forgetting derived attributes: Ensure consistent calculations are applied in generated columns or views to ensure everyone receives the same values.

Following the above guidelines will help us in designing a better database model.

Excellent work! We’ve just laid the groundwork for all database design.

Before building, it’s essential to plan. This plan includes identifying the main objects (entities), defining their characteristics (attributes), and assigning each a unique identifier. These three concepts form the foundation of the Entity-Relationship (ER) model, making every future step—such as creating relationships, mapping to tables, and writing queries—simpler and more reliable.

You’re making great progress—keep up the momentum!

1.Introduction to Databases

2.Database Models and Architecture

3.Relational Model

4.Entity-Relationship Modeling

5.SQL Fundamentals

6.Relational Algebra and Relational Calculus

7.Advanced Database Concepts

8.Schema Design and Normalization

9.Transactions and Concurrency Control

10.Indexing and Query Optimization

11.NoSQL Databases Overview

12.Conclusion

Mock Interview