Counting Unique Tweet Visitors

Learn how to count inserted unique visitors using PostgreSQL.

As an example use case for the HyperLogLog data type, we’re going to count unique visitors to our tweets using the application we introduced in “Data Manipulation and Concurrency Control.”

The two main operations around an hll data type consist of the following:

  • Build a hash from an input value, such as an IP address.
  • Update the already known hll value with the hash.

The main idea behind hll is to keep a single hll value per granularity, here per Tweet message and per day. This means that each time we have a new visit on a tweet, we want to update our hll set to count that visitor.

Update heavy scenarios and concurrency

Concurrency is a deal breaker for UPDATE-heavy scenarios where the same row is solicited over and over again. So we’re going to work in two steps again here, first doing an INSERT per visit and then arranging a background process to transform those visits into an UPDATE to the single hll aggregate per tweet and date.

Here’s the visitor table where we can insert every single visit:

Get hands-on with 1200+ tech skills courses.