Search⌘ K
AI Features

Schemaless Design in PostgreSQL

Explore the concept of schemaless design in PostgreSQL by using JSON data types and GIN indexes. Understand how to manage flexible document structures, query large JSON datasets efficiently, and the trade-offs regarding data quality when using schemaless approaches. Gain practical experience using PostgreSQL to handle semistructured data with example queries and scripts.

An area where the NoSQL systems have been prominent is in breaking with the normalization rules and the hard step of modeling a database schema. Instead, most NoSQL systems will happily manage any data the application sends through. This is called the schemaless approach.

In truth, there’s no such thing as a schemaless design, actually. What it means is that the name and type of the document properties, or fields, are hard-coded into the application code.

JSON data set

A readily available JSON dataset is provided at MTGJSON that provides Magic: The GatheringTM card data in JSON format, using the CC0 license. We can load it easily given this table definition:

PostgreSQL
begin;
drop schema if exists magic cascade;
create schema magic;
create table magic.allsets(data jsonb);
commit;

Then we use a small Python script:

Python
#! /usr/bin/env python3
import psycopg2
PGCONNSTRING = "user=postgres dbname=magic"
if __name__ == '__main__':
pgconn = psycopg2.connect(PGCONNSTRING)
curs = pgconn.cursor()
allset = open('MagicAllSets.json').read()
allset = allset.replace("'", "''")
sql = "insert into magic.allsets(data) values('%s')" % allset
curs.execute(sql)
pgconn.commit()
pgconn.close()

Giant documents and PostgreSQL

The giant JSON document in a single table doesn’t represent the kind of schemaless design. It goes ...