Introduction to Data Manipulation and Concurrency Control
Get an introduction to data manipulation and concurrency control.
Tweets dataset
We used a dataset of 200,000 USA-geolocated Tweets with a very simple data model. The data model is a direct port of the Excel sheet format, which keeps the loading process straightforward: we used the \copy command from the psql tool.
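As a rough illustration of that loading step, the sketch below creates a table that mirrors a spreadsheet layout and fills it with \copy. The column list, the table name tweet, and the file name tweets.csv are assumptions made for the example; the actual tweets.sql defines its own columns.

```sql
-- Hypothetical, simplified table: the real tweets.sql has its own column list.
-- There is deliberately no primary key or unique constraint here, as in a
-- direct port of a spreadsheet.
create table tweet
 (
   id         bigint,
   posted_on  date,
   posted_at  time,
   uname      text,
   nickname   text,
   message    text,
   latitude   double precision,
   longitude  double precision
 );

-- psql meta-command (it must fit on a single line): psql reads the file on
-- the client side and streams it to the server.
-- 'tweets.csv' is an assumed file name, exported with a header row.
\copy tweet from 'tweets.csv' with (format csv, header true)
```

Because \copy reads the file from the machine running psql, the file does not have to be accessible to the database server, which is what the server-side COPY command would require.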
Database model and normalization
The database model in tweets.sql violates several of the normal forms introduced earlier:
- There’s neither a unique constraint nor a primary key, so there is nothing preventing the insertion of duplicate entries, violating 1NF.
- Some non-key attributes are not dependent on the key because we mix data from the Twitter account posting the message and the message itself, violating ...