What is data modeling in MongoDB?

Data modeling is the process of defining how data is stored and what relationships exist between different entities in our data.

A data model describes the relationships between the different entities in the data, often visually. It also shows how the data is organized and grouped.

Types of data modeling in MongoDB

MongoDB provides the following two ways to model our data:

  1. Embedded Documents
  2. References

Consider a scenario where you want to save data about blog posts and their comments.

Let’s take a look at how we can model this data using the methods mentioned above.

Embedded documents

One way to model blog posts and their respective comments is to embed the child document in the parent document. In our case, the blog post is the parent document and the comment is the child document.

The embedded document model is also known as the “denormalized” data model or schema.

Below is an example of a blog post document that has multiple comment documents embedded in it in the form of an array:

{
  _id: <ObjectId123>,
  title: "Data Modelling in MongoDB",
  body: "some long text...",
  comments: [
    { 
      _id: <ObjectId111>,
      comment: "some text...",
      author: "mike@email.com"     
    },
    { 
      _id: <ObjectId222>,
      comment: "some text...",
      author: "jake@email.com"    
    }
  ]
}

Advantages of embedded data model

Embedding documents leads to better performance because we can read and update related data in a single database operation.
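As a minimal sketch of this point, the snippet below uses a plain JavaScript array as an in-memory stand-in for a posts collection. It does not call MongoDB, and the helper names `findPostById` and `addComment`, the string ids, and the third comment are hypothetical. In MongoDB itself, the equivalent calls would be along the lines of `findOne` for the read and `updateOne` with the `$push` operator for the write.

```javascript
// In-memory stand-in for a "posts" collection; not a MongoDB API.
const posts = [
  {
    _id: "post123", // string ids stand in for ObjectId values
    title: "Data Modelling in MongoDB",
    body: "some long text...",
    comments: [
      { _id: "c111", comment: "some text...", author: "mike@email.com" },
      { _id: "c222", comment: "some text...", author: "jake@email.com" },
    ],
  },
];

// One lookup returns the post together with all of its comments.
function findPostById(collection, id) {
  return collection.find((doc) => doc._id === id);
}

// One update on the parent document adds a new embedded comment.
function addComment(collection, postId, comment) {
  const target = findPostById(collection, postId);
  if (!target) return false;
  target.comments.push(comment);
  return true;
}

addComment(posts, "post123", {
  _id: "c333",
  comment: "some text...",
  author: "anna@email.com", // hypothetical third commenter
});

const post = findPostById(posts, "post123");
console.log(post.title, "has", post.comments.length, "comments");
```

Because the comments live inside the post, neither the read nor the write has to touch a second collection.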

Disadvantages of embedded data model

The embedded data model has the following disadvantages:

  • It can lead to data duplication.
  • The maximum size of a MongoDB document is 16 MB, so take care that embedded documents, especially arrays that keep growing, do not push the parent document past this limit.
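To make the size limit concrete, here is a rough Node.js sketch that estimates whether embedding one more comment would keep a post under 16 MB. The helpers `approximateSizeInBytes` and `canEmbedComment` are hypothetical, and the JSON byte length is only an approximation: MongoDB measures the BSON size of a document, which differs from its JSON length.

```javascript
// 16 MB expressed in bytes (16 * 1024 * 1024).
const MAX_DOCUMENT_BYTES = 16 * 1024 * 1024;

// Rough estimate only: JSON byte length is not the same as BSON size.
function approximateSizeInBytes(doc) {
  return Buffer.byteLength(JSON.stringify(doc), "utf8");
}

// Hypothetical helper: would the post stay under the limit
// if this comment were embedded in it?
function canEmbedComment(post, comment, limit = MAX_DOCUMENT_BYTES) {
  const candidate = { ...post, comments: [...post.comments, comment] };
  return approximateSizeInBytes(candidate) <= limit;
}

const post = {
  _id: "post123", // string ids stand in for ObjectId values
  title: "Data Modelling in MongoDB",
  body: "some long text...",
  comments: [],
};

const smallComment = {
  _id: "c111",
  comment: "some text...",
  author: "mike@email.com",
};

// Artificially large comment, used only to trip the limit.
const hugeComment = {
  _id: "c222",
  comment: "x".repeat(MAX_DOCUMENT_BYTES),
  author: "jake@email.com",
};

console.log(canEmbedComment(post, smallComment)); // true
console.log(canEmbedComment(post, hugeComment)); // false
```

In practice a single comment is rarely this large. The more common risk is an embedded array that keeps growing over the lifetime of the parent document.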

When to use the embedded data model?

Use the embedded data model when:

  • Entities have a “contains” or “has a” relationship between them.

  • Entities have a one-to-many relationship between them.

References

Using this strategy, we can describe relationships between documents using references.

This is also known as the “normalized” data model or schema.

Below is an example of how we can model our blog posts and comments using this data model:

// blog post
{
  _id: <ObjectId123>,
  title: "Data Modelling in MongoDB",
  body: "some long text..."
}

// comments
{ 
  _id: <ObjectId111>,
  comment: "some text...",
  author: "mike@email.com",
  postId: <ObjectId123>    // reference to the blog post
},
{ 
  _id: <ObjectId222>,
  comment: "some text...",
  author: "jake@email.com",
  postId: <ObjectId123>    // reference to the blog post 
}

In the normalized data model, instead of embedding comment documents in the blog post document, we store the comments in a separate collection. In this collection, each comment is a separate document.

In addition to its own data, each comment document also contains a reference to the parent blog post using the id of the parent blog post document.
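The sketch below shows how such a reference can be resolved. It uses two plain JavaScript arrays as in-memory stand-ins for the posts and comments collections and does not call MongoDB. The helper names, the string ids, and the third comment (which points at a different, hypothetical post) are assumptions made for illustration.

```javascript
// In-memory stand-ins for two collections; not a MongoDB API.
const posts = [
  { _id: "post123", title: "Data Modelling in MongoDB", body: "some long text..." },
];

const comments = [
  { _id: "c111", comment: "some text...", author: "mike@email.com", postId: "post123" },
  { _id: "c222", comment: "some text...", author: "jake@email.com", postId: "post123" },
  // Hypothetical comment that belongs to a different post.
  { _id: "c333", comment: "some text...", author: "anna@email.com", postId: "post999" },
];

// First lookup: the post itself.
function findPostById(id) {
  return posts.find((doc) => doc._id === id);
}

// Second lookup: every comment whose postId references that post.
function findCommentsForPost(postId) {
  return comments.filter((doc) => doc.postId === postId);
}

const post = findPostById("post123");
const postComments = findCommentsForPost(post._id);
console.log(post.title, "has", postComments.length, "comments");
```

Note that the post document carries no information about its comments. The relationship exists only through the `postId` field on each comment.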

Advantages of a normalized data model

A normalized data model has the following advantages:

  • No data duplication
  • Can represent more complex many-to-many relationships
  • Can represent hierarchical data sets

Disadvantages of normalized data model

Because related data lives in separate documents, reading all of it takes either multiple database operations or a join across collections (for example, with the `$lookup` aggregation stage).

We also need multiple database operations to write related data in multiple documents.
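As an illustration of the write side, the sketch below removes a blog post and then the comments that reference it. The arrays are in-memory stand-ins for collections, and the helper names, the string ids, and the second post are hypothetical. With MongoDB itself, the two steps would correspond roughly to a `deleteOne` on the posts collection followed by a `deleteMany` on the comments collection.

```javascript
// In-memory stand-ins for two collections; not a MongoDB API.
const posts = [
  { _id: "post123", title: "Data Modelling in MongoDB", body: "some long text..." },
  { _id: "post456", title: "Another post", body: "some long text..." }, // hypothetical
];

const comments = [
  { _id: "c111", comment: "some text...", author: "mike@email.com", postId: "post123" },
  { _id: "c222", comment: "some text...", author: "jake@email.com", postId: "post123" },
  { _id: "c333", comment: "some text...", author: "anna@email.com", postId: "post456" },
];

// Write 1: remove the post document. Returns the number of posts removed.
function deletePost(postId) {
  const index = posts.findIndex((doc) => doc._id === postId);
  if (index === -1) return 0;
  posts.splice(index, 1);
  return 1;
}

// Write 2: remove every comment that references the post.
// Iterates backwards so that splicing does not skip elements.
function deleteCommentsForPost(postId) {
  let removed = 0;
  for (let i = comments.length - 1; i >= 0; i--) {
    if (comments[i].postId === postId) {
      comments.splice(i, 1);
      removed++;
    }
  }
  return removed;
}

const removedPosts = deletePost("post123");
const removedComments = deleteCommentsForPost("post123");
console.log(removedPosts, removedComments, comments.length); // 1 2 1
```

If the second step is skipped or fails, the comments collection is left with documents whose `postId` points at a post that no longer exists.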

When to use the normalized data model?

Use the normalized data model when:

  • You need to model hierarchical data sets.

  • You need to represent many-to-many relationships.

  • The read performance gained from embedding documents does not outweigh the cost of the resulting data duplication.