Introduction to HyperLogLog
Explore the HyperLogLog algorithm and its PostgreSQL extension for estimating unique counts with low storage overhead. Understand how to install and use the hll extension to aggregate distinct visitors across time periods without scanning large datasets.
We'll cover the following...
If you’ve been keeping up with the newer statistics developments, you might have heard about this new state-of-the-art cardinality estimation algorithm called HyperLogLog.
This technique is now available for PostgreSQL in the extension postgresql-hll available on GitHub
and is packaged for multiple operating systems such as Debian and RHEL through the PostgreSQL community packaging efforts and resources.
HyperLogLog is a very special hash value. It aggregates enough information into a single scalar value to compute a distinct value with some precision loss.
Use case: Count unique visitors
Say we’re ...