Loading the Data
Explore how to import and normalize UK pub location data in PostgreSQL. Learn to use the point data type for geolocation, parse XML with a SAX API, and stream data using the COPY protocol. Understand ELT processes for effective data normalization and querying of popular pub names.
We'll cover the following...
PostgreSQL implements the point data type. Using this data type, it’s possible to register locations of points of interest on Earth by using the point values as coordinates for the longitude and latitude. The open-source project, Open Street Map, publishes geographic data that we can use, such as for pubs in the UK.
A pub names database
Using the Overpass API services and a URL like the following, we can download an XML file containing geolocated pubs in the UK:
http://www.overpass-api.de/api/xapi?*[amenity=pub][bbox=-10.5,49.78,1.78,59]
The data itself is available from OSM in some kind of XML format where they managed to handle the data in an EAV model:
<node id="262706" lat="51.0350300" lon="-0.7251785">
<tag k="amenity" v="pub"/>
<tag k="created_by" v="Potlatch 0.10f"/>
<tag k="name" v="Kings Arms"/>
</node>
In our context, we only need a very simple database schema for where to load this dataset, and the following is going to be fine for this purpose:
To load the data in a streaming fashion with the COPY ...