Search⌘ K

Scheduling Estimates Computations

Explore how to estimate unique visitor counts efficiently in PostgreSQL by using HyperLogLog with stored procedures. Learn to create scheduled background processes and interactive queries that update visitor estimates while avoiding double counting by combining data sets.

To calculate unique visitor estimates from the heavy insert table, we need a background process to do this from time to time.

The easiest way to do that here would be to create a new API endpoint on our back-end server and set up a cron-like utility to use that endpoint for our specified schedule. In case of emergency, though, it’s nice to be able to run this updating process interactively. A solution to have both the back-end API integration and the interactive approaches available consist of packaging our SQL query as a stored procedure.

SQL function

While stored procedures aren’t covered, it’s easy enough to write a SQL function around the statement we have already:

PostgreSQL
begin;
create function tweet.update_unique_visitors
(
in batch_size bigint default 1000,
out messageid bigint,
out date date,
out uniques bigint
)
returns setof record
language SQL
as $$
with new_visitors as
(
delete from tweet.visitor
where id = any (
select id
from tweet.visitor
order by datetime, messageid
for update
skip locked
limit update_unique_visitors.batch_size
)
returning messageid,
cast(datetime as date) as date,
hll_hash_text(ipaddr::text) as visitors
),
new_visitor_groups as
(
select messageid, date, hll_add_agg(visitors) as visitors
from new_visitors
group by messageid, date
)
insert into tweet.uniques(messageid, date, visitors)
select messageid, date, visitors
from new_visitor_groups
on conflict (messageid, date)
do update set visitors = hll_union(uniques.visitors, excluded.visitors)
where uniques.messageid = excluded.messageid
and uniques.date = excluded.date
returning messageid, date, cast(# visitors as bigint) as uniques;
$$;
commit;

And here’s an interactive session where we use the newly defined stored procedure to update our unique visitors hll table. ...