Lossy Unique Count with hll
Explore how to apply the HyperLogLog (hll) data type in PostgreSQL to estimate unique counts in large datasets. Learn to use hll functions and operators to compute approximate distinct values while optimizing storage and performance. This lesson guides you through practical queries and explains the benefits of lossy counting with hll.
We'll cover the following...
We can rewrite the query using our hll data type now, even though, at this stage, it’s not going to be very useful because we still have the full logs of every visit, and we can afford to compute precise counts.
Computing precise counts
Nonetheless, our goal is to dispose of the daily entries that we anticipate will be just too large a dataset. So, the hll-based query looks like this:
In this query, we use several new functions and operators related to the hll data type:
-
The
#operator takes a single argument: it’s a unary operator, like factorial (written ...