Deep Dive into the Internals of the Database/

...

What Is a Database Index?

Understand the use of a database index.

We'll cover the following...

Full table scan
Why database index?
Database index
Demonstration of index creation

Indexing is a common concept in the database world. A database index is an additional data structure that allows us to locate the required data quickly without going through the entire dataset in the database.

Full table scan

Before discussing indexing, we need to understand the concept of a full table scan and why indexing is required.

A full table scan (or sequential scan) is the process of reading the entire database sequentially from the beginning and comparing every entry in the database with the search request to find the desired data. The worst-case time complexity of finding the desired data is $O(N)$ , where $N$ is the total number of elements in the database.

The scan is the slowest database operation because the number of I/O calls to the disk is very high. After all, every disk block must be sequentially fetched from the disk and loaded into the memory.

Why database index?

Let’s consider the need for indexing with the help of a real-world example. In our example, we have a physical book on the database that is about 500 pages long. If we want to learn about a particular concept, say a transaction from the book, there are a couple of ways of doing this:

Start from page 1 of the book and read every page until we find the content on the transaction.
Jump to the book's appendix at the end, look for the word transaction, and jump to the appropriate page number.

The first approach is very laborious, time-consuming, and it might take a few days before we find the appropriate chapter. The second approach is speedy and it only takes a few seconds to locate the right chapter. The latter approach is the exact mechanic for how database indexing works.

Database index

Defined another way, a database index is an additional metadata structure that acts as a pointer to the corresponding data on disk without scanning the entire database dataset.

A database index is composed of two main components:

Search key(s) that contains attributes of the database on which the database performs the lookup.
Pointers that hold the disk block's address where that particular data is stored.

Index data structures use approximate structures that provide sublinear lookup times (logarithmic) that boost the performance of the query. Indexing data structures have some core attributes that dictate the usage pattern:

Space usage: Indicates the additional space used by the index data structure.
Access pattern: Indicates the type of query patterns supported by the index, for example, key-value lookup and range scans.
Access time: Time required to locate the search key and its corresponding data.
Insertion time: Time required to insert the search key and its related data.

These attributes dictate the effectiveness of the index data structure and which query patterns it can support. For example, a hash index supports a constant time lookup query and insertion but doesn't support range scans. Similarly, a sorted binary tree supports a logarithmic time lookup and range query.

Index data structures lead to write amplification. The database persists data both on disk and index for every write operation. Every index increases the time taken to insert and update, but a well-designed index improves read performance. Introducing multiple indexes in the database significantly impacts the write performance.

Demonstration of index creation

We'll use MySQL in this lesson to demonstrate index creation.

The terminal above installs the MySQL server. Wait until a mysql> prompt appears and follow the sequence of commands to play with the relational database:

Create database: Use the command CREATE DATABASE to create a database. For example, to create a database with the name users, we use CREATE DATABASE users;.
Use the database: Now that we have created a database, we need to use it. The command to do that is: USE users; There can be multiple databases in a given MySQL server.
Create table: Now that we are using the database, we need to create a table called users. The command CREATE TABLE users (id bigint(20) NOT NULL AUTO_INCREMENT,name VARCHAR(30) NOT NULL,age bigint(20) NOT NULL,PRIMARY KEY (id)); creates a table with three attributes:
- A big integer id, which acts as a primary key whose value is auto-incremented for each row.
- A string name, which stores the name of the user.
- A big integer age, which stores the age of the user.
Insert users: Once we create the table, we can insert rows into the table using the INSERT command:
- INSERT INTO users(name, age) VALUES ("XYZ", 10);
- INSERT INTO users(name, age) VALUES ("ABC", 20);
Create index: Now that we have inserted a few rows, let's create an index on the column age using the command ALTER TABLE users ADD INDEX age_idx(age);.
Examine index usage: Let's use the command EXPLAIN to demonstrate the usage of the index for a particular query. Run this command EXPLAIN SELECT * FROM users WHERE age = 10; .

Introduction

Taxonomy of Databases

Database Architecture

Data Structures used in Databases

Disk Layout

Database Index

Transaction

Replication

Partitioning

Concurrency Controls

Consistency Models

Consensus

Common Problems Associated with Distributed Databases

Conclusion

Database Internals Assessment

What Is a Database Index?

Full table scan

Why database index?

Database index

Demonstration of index creation