Search⌘ K

Getting the Visits into Unique Counts

Explore how to implement HyperLogLog in PostgreSQL for approximating unique visitor counts efficiently. Understand using common table expressions and batch deletion with skip locked for scalable and manageable visitor tracking in high-traffic environments.

Context and constraints

In a production setup, we would have the following context and constraints:

  • Tweets are published, and users from the internet are visiting tweets.

  • Our application inserts a new row in tweet.visitor with the visitor’s IP address each time there is a new visit to one of our Tweets. It also registers the precise timestamp of the visit.

  • As we anticipate quite some success on our little application idea, we also anticipate not being able to keep all the visitor logs and not being able to respect our quality-of-service terms when computing the unique visitors on the fly each time someone needs them.

  • Finally, as numbers are being used in a marketing context rather than in an invoicing context, we’re in a position to lose some precision over the number, and we would actually like to implement a system that is lossy if it allows us to relax our storage and processing ...