A Sampling of Countries
Explore how to apply PostgreSQL's tablesample methods to extract a random 1 percent sample from the large GeoNames dataset. Understand the difference between BERNOULLI and SYSTEM sampling, their performance impacts, and how to run sampling scripts for practical data querying.
The GeoNames dataset, at more than 11 million rows, is too large to ship with the course material, where you get a database dump or Docker image to play with. Instead, we take a random sample of 1 percent of the table’s content, and here’s how the magic is done:
In this script, we use PostgreSQL's tablesample feature to keep only a random selection of 1 percent of the rows in the table. The tablesample clause accepts several sampling methods, and you can see the PostgreSQL documentation entitled Writing A ...
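As a rough illustration of the two built-in methods, here is a minimal sketch. It is not the course's own script: the schema and table names `raw.geonames` and `sample.geonames` are assumptions made up for this example, as is the seed value passed to `repeatable`. With `bernoulli`, PostgreSQL scans the whole table and keeps each row independently with the given probability. With `system`, it picks whole storage blocks, which is usually faster but yields a more clustered sample.

```sql
-- Hypothetical names: raw.geonames (full table) and sample.geonames (the sample).
begin;

create schema if not exists sample;

-- BERNOULLI: each row is kept with a probability of 1 percent.
-- The whole table is scanned, so the row count is only approximately 1 percent.
create table sample.geonames as
  select *
    from raw.geonames tablesample bernoulli (1);

commit;

-- SYSTEM: whole blocks are selected with a probability of 1 percent.
-- The optional REPEATABLE clause takes a seed (42 is an arbitrary choice here),
-- so the same rows come back as long as the table has not changed.
select count(*)
  from raw.geonames tablesample system (1) repeatable (42);
```

Comparing the `count(*)` from each method against the full table's row count is a quick way to see that both land near 1 percent without hitting it exactly.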