A Real-World Example Using Generators
Explore using C++ coroutines and the Generator class to implement a compression algorithm based on delta and variable byte encoding. Understand how coroutines simplify iterator logic, allow lazy evaluation, and minimize boilerplate in encoding sorted integer sequences for efficient data compression.
We'll cover the following...
Using coroutines to simplify iterators really shines when the examples are a bit more advanced. Using co_yield with the Generator class allows us to implement and combine small algorithms efficiently without needing boilerplate code to glue it all together. This next example will try to prove that.
The problem
We will here go through an example of how we can use our Generator class to implement a compression algorithm that can be used in search engines to compress the search index typically stored on disk.
Search engines use some variant of a data structure called an inverted index. It is like an index at the end of a book. Using the index, we can find all pages that contain the terms we are searching for.
Now imagine that we have a database full of recipes and that we build an inverted index for this database. Parts of this index might look something like this:
Each term is associated with a sorted list of document identifiers. (For example, the term “apple” is included in the recipes with IDs 4, 9, 67, and 89.) If we want to find recipes that contain both beans and chili, we can run a merge-like algorithm to find the intersection of the lists for beans and chili:
Now imagine that we have a big database, and we choose to represent the document identifier with a 32-bit integer. The lists of document identifiers can become very long for terms that appear in many documents, and therefore we need to compress those lists. One possible way to do that is to use delta encoding ...