Biggest Differences with RDBMS

Understand the biggest differences between Cosmos DB and traditional relational databases, from the data structure to queries.

Cosmos DB vs. RDBMS

Most are familiar with relational databases such as SQL Server or MySQL, so it’s natural to use them as a baseline when looking at other databases. We will discuss the main differences between traditional databases and Cosmos DB.

Data structure

It’s common to see the following terminology table:

RDBMS

Cosmos DB (NoSQL)

Server

Account

Database

Database

Table

Container

Row

Document

Even though this is widely accepted, the translation doesn’t tell the whole story. Rows are not documents, and containers are not tables. For example, we need columns, their types, and constraints to define a table. Each row added to the table follows the same rules. It’s not the case with Cosmos DB; each document is independent, and we only need a name and partition key path to create a container.

Relationship

By far, the most significant difference is the lack of relationships. This necessary evil allows Cosmos DB to scale and distribute data worldwide. For this reason, containers are independent, and there are two main consequences we need to keep in mind right now:

  • Lack of consistency

  • Lack of multi-container joins

Lack of consistency

Since containers are independent, there are no foreign keys, and we can’t rely on the database to keep IDs and relationships consistent.

More practically:

  • There is no cascade delete.

  • There is no validation of the value of IDs.

The application must perform all the necessary checks.

Lack of joins

The lack of relationships means no JOIN of multiple containers during queries. This characteristic is fundamental because if we create containers like we would create tables, we will have an unpleasant time.

Note: We need to create containers based on the queries we are going to perform, which requires some planning.

Sometimes, we’ll have to duplicate the same information in multiple containers because different queries have different requirements.

The reason for this will become more evident when we talk about partitions later in the course.

Examples

Let’s say we have two entities, student and class, where a student belongs to a specific school class. In SQL Server, we can relate the two tables with the foreign key ClassId.

This relationship is not possible in Cosmos DB. For this reason, if we want the class when querying the students container, we need to include the className field in the student document.

Distribution

Cosmos DB replicates the data in multiple locations to guarantee excellent performance worldwide. In practice, some regions might have old versions of a document. On the other hand, in RDBMS, data distribution is typically achieved through the use of sharding or partitioning mechanisms, where data is divided and stored across multiple database instances or servers.

Cosmos DB has different consistency levels we can choose from. The higher the consistency level, the lower the chance of having outdated documents. However, the performance will be affected and so will the cost. Similarly, in RDBMS, consistency levels can vary depending on the isolation level chosen, with stronger isolation levels ensuring higher consistency but potentially impacting performance and concurrency.

All of the above is something to remember when designing a solution.

Press + to interact
High-level example of distribution in Cosmos DB
High-level example of distribution in Cosmos DB

Query

Both database types use SQL as the primary query language, so those familiar with it don’t need to worry too much. However, Cosmos DB supports a much smaller subset of clauses, and some might have different rules and effects from traditional implementations. For example, JOIN is not used to merge multiple containers but to join child elements of a JSON document to its parent.

Examples

In SQL Server, we can write the following query to get how many students there are in each class. To make things easier, class is of type string.

Press + to interact
SELECT COUNT(*) as count, class FROM students GROUP BY class

In Cosmos DB, the syntax is slightly different since queries run in a specific container. It doesn’t have to be specified, and the FROM clause is used only to create an alias.

Press + to interact
SELECT COUNT(1) as count, s.class FROM s GROUP BY s.class

Remember: In SQL Server, a query containing GROUP BY class ORDER BY class is valid, whereas Cosmos DB will reject it.

The differences seem small, but they set the two databases’ SQL queries quite apart when added up.