Antipattern: Clone Tables or Columns

In the television series Star Trek“Star Trek” and related marks are trademarks of CBS Studios Inc., “tribbles” are small furry animals kept as pets. Tribbles are very appealing at first, but soon they reveal their tendency to reproduce out of control, and managing the overpopulation of tribbles becomes a serious problem.

Where does one put them? Who’s responsible for them? How long would it take to pick up every tribble? Eventually, Captain Kirk discovers that his ship and crew can’t function, and he has to order his crew to make it their top priority to remove the tribbles.

We know from experience that querying a table with a few rows is quicker than querying a table with many rows, all other things being equal. This leads to a common fallacy: we must make every table contain fewer rows, no matter what we do. This leads to two forms of antipattern:

  • Splitting a single long table into multiple smaller tables and naming each table based on the distinct data values of one of the table’s attributes.

  • Splitting a single column into multiple columns, using column names based on distinct values as another attribute.

But we cannot get something for nothing; to meet the goal of having fewer rows in every table, we have to either create tables that have too many columns or else create a greater number of tables. In both cases, we find that the number of tables or columns keep growing as new data values make us create new schema objects.

Spawning tables

To split data into separate tables, we need to define a policy about which rows belong in which tables. For example, we could split them up by the year in the date_reported column:

Get hands-on with 1200+ tech skills courses.