Search⌘ K
AI Features

Lossy Unique Count with hll

Explore how to apply the HyperLogLog (hll) data type in PostgreSQL to estimate unique counts in large datasets. Learn to use hll functions and operators to compute approximate distinct values while optimizing storage and performance. This lesson guides you through practical queries and explains the benefits of lossy counting with hll.

We'll cover the following...

We can rewrite the query using our hll data type now, even though, at this stage, it’s not going to be very useful because we still have the full logs of every visit, and we can afford to compute precise counts.

Computing precise counts

Nonetheless, our goal is to dispose of the daily entries that we anticipate will be just too large a dataset. So, the hll-based query looks like this:

PostgreSQL
select messageid,
datetime::date as date,
# hll_add_agg(hll_hash_text(ipaddr::text)) as hll
from tweet.visitor
where messageid = 3
group by grouping sets((messageid),
(messageid, date))
order by messageid, date nulls first
limit 10;

In this query, we use several new functions and operators related to the hll data type:

  • The # operator takes a single argument: it’s a unary operator, like factorial (written ...