A Real-World Example Using Generators
Understand how generators can be used in real-world scenarios like delta encoding and variable byte encoding.
Using coroutines to simplify iterators really shines when the examples are a bit more advanced. Using co_yield
with the Generator
class allows us to implement and combine small algorithms efficiently without needing boilerplate code to glue it all together. This next example will try to prove that.
The problem
We will here go through an example of how we can use our Generator
class to implement a compression algorithm that can be used in search engines to compress the search index typically stored on disk.
Search engines use some variant of a data structure called an inverted index. It is like an index at the end of a book. Using the index, we can find all pages that contain the terms we are searching for.
Now imagine that we have a database full of recipes and that we build an inverted index for this database. Parts of this index might look something like this:
Each term is associated with a sorted list of document identifiers. (For example, the term “apple” is included in the recipes with IDs 4, 9, 67, and 89.) If we want to find recipes that contain both beans and chili, we can run a merge-like algorithm to find the intersection of the lists for beans and chili:
Now imagine that we have a big database, and we choose to represent the document identifier with a 32-bit integer. The lists of document identifiers can become very long for terms that appear in many documents, and therefore we need to compress those lists. One possible way to do that is to use delta encoding combined ...
Get hands-on with 1400+ tech skills courses.