A Real-World Example Using Generators

Understand how generators can be used in real-world scenarios like delta encoding and variable byte encoding.

Using coroutines to simplify iterators really shines when the examples are a bit more advanced. Using co_yield with the Generator class allows us to implement and combine small algorithms efficiently without needing boilerplate code to glue it all together. This next example will try to prove that.

The problem

We will here go through an example of how we can use our Generator class to implement a compression algorithm that can be used in search engines to compress the search index typically stored on disk.

Search engines use some variant of a data structure called an inverted index. It is like an index at the end of a book. Using the index, we can find all pages that contain the terms we are searching for.

Now imagine that we have a database full of recipes and that we build an inverted index for this database. Parts of this index might look something like this:

Press + to interact
An inverted index with three terms and their corresponding lists of document references
An inverted index with three terms and their corresponding lists of document references

Each term is associated with a sorted list of document identifiers. (For example, the term “apple” is included in the recipes with IDs 4, 9, 67, and 89.) If we want to find recipes that contain both beans and chili, we can run a merge-like algorithm to find the intersection of the lists for beans and chili:

Press + to interact
Intersection of the document lists for the terms beans and chili
Intersection of the document lists for the terms beans and chili

Now imagine that we have a big database, and we choose to represent the document identifier with a 32-bit integer. The lists of document identifiers can become very long for terms that appear in many documents, and therefore we need to compress those lists. One possible way to do that is to use delta encoding combined ...

Get hands-on with 1400+ tech skills courses.