What is an inverted index?
An inverted index is a data structure that makes full-text search efficient. It is a core part of information retrieval systems and search engines, and it stores a mapping from words (or any other type of search term) to their locations in a database table or document.
Why do we need an inverted index?
I will explain this whole concept with an example.
Let’s assume we have a Quotes table in our database. Here is what the table will look like:
| quote_id | quote_text |
|---|---|
| 101 | Winter is coming |
| 102 | Chaos is a ladder |
| 103 | Are you coming, mylord |
| 104 | Winter has come |
Let’s write a SQL query to search all the quotes with the text ‘winter’ in it:
SELECT * FROM Quotes WHERE quote_text LIKE '%winter%';
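This command will look for the ‘winter’ text in all the rows, but it is inefficient: the database has to scan the text of every row, so the query gets slower as the table grows.
In this kind of scenario, where we have to do a full-text search in a database, it’s best to create an inverted index. This index allows for fast, full-text searches at the cost of extra processing and storage, because the index has to be built up front and updated whenever the data changes.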
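To make the cost of the scan concrete, here is a minimal Python sketch of what a substring search over every row amounts to. The `QUOTES` dictionary and the `scan_for` function are hypothetical names used only for illustration, and the comparison is made case-insensitive as an assumption; real databases differ in how `LIKE` treats case.

```python
# Hypothetical in-memory copy of the Quotes table: quote_id -> quote_text.
QUOTES = {
    101: "Winter is coming",
    102: "Chaos is a ladder",
    103: "Are you coming, mylord",
    104: "Winter has come",
}


def scan_for(text, quotes):
    """Return the ids of quotes containing `text`, checking every row in turn."""
    needle = text.lower()  # assumption: case-insensitive matching
    return [qid for qid, quote in quotes.items() if needle in quote.lower()]


print(scan_for("winter", QUOTES))  # [101, 104]
```

Every call touches all rows, so the work grows with the size of the table regardless of how rare the term is.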
A basic inverted index
This is how a basic inverted index will look for the Quotes table described above.
| term | quote_id |
|---|---|
| winter | 101,104 |
| is | 101,102 |
| coming | 101,103 |
| chaos | 102 |
| a | 102 |
| are | 103 |
| you | 103 |
| mylord | 103 |
| has | 104 |
| come | 104 |
Once this index is constructed, as shown in this table, we can find all quotes with the term ‘winter’ with just a quick lookup.
Improving the inverted index
While a basic inverted index can answer queries that have an exact match in the database, it may not work in all scenarios. For example:
- Users may search for a term that is not present exactly in the inverted index but is still related to an indexed term. For example, searching for snow or snowing in place of snowfall. We can address this issue through stemming, which is a technique that extracts the root form of a word by removing affixes. For example, the root form of the words eating, eats, and eaten is eat.
- Or they can search for a synonym. For example, searching glad or elated in place of happy. To solve this, the synonyms of the searched term are also looked up in the inverted index.
- Users generally search for phrases such as fastest car rather than single words like fastest or car. To support phrase searching, word-level inverted indexes also record the position of each word in the document, which improves the search results.
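The three improvements can be sketched together in Python. Everything here is illustrative: `toy_stem` is a deliberately naive suffix stripper and not a real stemming algorithm, `SYNONYMS` is a hand-written hypothetical table, and the other names are assumptions made for this example. The index stores word positions, so it can check whether the terms of a phrase appear next to each other.

```python
import re
from collections import defaultdict

# Hypothetical in-memory copy of the Quotes table: quote_id -> quote_text.
QUOTES = {
    101: "Winter is coming",
    102: "Chaos is a ladder",
    103: "Are you coming, mylord",
    104: "Winter has come",
}

# Hypothetical synonym table; a real system would use a curated thesaurus.
SYNONYMS = {"arriving": ["coming"]}


def toy_stem(word):
    """Naive suffix stripping for illustration only (not a real stemmer)."""
    for suffix in ("ing", "en", "es", "s", "e"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase, drop punctuation, and reduce each word to its toy stem."""
    return [toy_stem(word) for word in re.findall(r"[a-z]+", text.lower())]


def build_positional_index(quotes):
    """Map each stemmed term to {quote_id: [word positions]}."""
    index = defaultdict(dict)
    for quote_id, text in quotes.items():
        for position, term in enumerate(tokenize(text)):
            index[term].setdefault(quote_id, []).append(position)
    return index


def search_term(term, index):
    """Look up the stem of the term and the stems of its synonyms."""
    variants = [term.lower()] + SYNONYMS.get(term.lower(), [])
    ids = set()
    for variant in variants:
        ids.update(index.get(toy_stem(variant), {}))
    return sorted(ids)


def search_phrase(phrase, index):
    """Return ids of quotes where the phrase's terms occupy consecutive positions."""
    terms = tokenize(phrase)
    if not terms:
        return []
    matches = []
    for quote_id, starts in index.get(terms[0], {}).items():
        for start in starts:
            if all(
                start + offset in index.get(term, {}).get(quote_id, [])
                for offset, term in enumerate(terms[1:], start=1)
            ):
                matches.append(quote_id)
                break
    return sorted(matches)


INDEX = build_positional_index(QUOTES)
print(search_term("winters", INDEX))         # [101, 104]
print(search_term("arriving", INDEX))        # [101, 103, 104]
print(search_phrase("winter is", INDEX))     # [101]
print(search_phrase("winter coming", INDEX)) # []
```

In this sketch, "winters" matches through stemming, "arriving" matches through the synonym table (and also finds "come" because both "coming" and "come" reduce to the same toy stem), and "winter coming" returns nothing because the two words are not adjacent in any quote.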