Planning the Physical Model

A formal way to model a database is called normalization, and it’s a dense topic full of equations, confusing terms, and mathematical proofs. Instead, we are going to outline a simpler approach that might lack the precision of theoretical computer science but is hopefully more approachable. Here’s how to go about it:

  1. Create a table for each entity in the logical model.
  2. Add columns to associate related models using foreign keys.
  3. For each attribute, decide how we will enforce its requirements and create the needed columns, constraints, and associated tables.
  4. Create indexes to enforce all uniqueness constraints.
  5. Create indexes for any queries you plan to run.

To do this, it’s immensely helpful if we understand SQL. In addition to knowing how to model our data, knowing SQL allows us to understand the runtime performance of our app, which will further help us with data modeling. Of all the programming languages we will ever learn, outside of learning SQL, the hardest part of the planning process is step 3: deciding how to enforce the requirements of each attribute.

We’ll bring together some or all of the following techniques:

  • Choosing the right column type.
  • Using database constraints.
  • Creating lookup tables.
  • Writing code in your app.

Let’s dive into each one of these.

Choosing the right column type

Each column in the database must have a type, but databases have few types to choose from. Usually, there are strings, dates, timestamps, numbers, and booleans. That said, we must familiarize ourselves with the types of our database. Unless we are writing code that has to work against any SQL database (which is rare), we should not be bound by Rails’ least-common-denominator set of types.

The type we choose should allow us to store the exact values we need. It should also make it difficult or impossible to store incorrect values. Here are some tips for each of the common types.

Strings

In the olden days, choosing the size of our string mattered. Today, this is not universally true. Consult your database’s documentation and use the largest size type you can. For example, in Postgres, we can use a TEXT field because it carries no performance or memory burden over VARCHAR. It’s important to get this right because changing column types later when we need bigger strings can be difficult.

Rational numbers

Avoid FLOAT if possible. Databases store FLOAT values using the IEEE 754 format, which does not store precise values. Either convert the rational to a base unit (for example, store money in cents as an integer) or use the DECIMAL type, which stores precise values. Note that neither type can store all rational numbers. One-third, for example, cannot be stored in either type. To store precise fractional values might require storing the numerator and denominator separately.

Booleans

Use the boolean type. Do not store, for example, “y” or “n” as a string. There’s no benefit to doing this, and it’s confusing. (Yes, people do this, but we don’t understand why.)

Dates

Remember that a date is not a timestamp. A date is a day of the month in a certain year. There is no time component. The DATE data type can store this and allow date arithmetic on it. Don’t store a timestamp set at midnight on the date in question. Time zones and daylight savings time will wreak havoc upon you.

Timestamps

As opposed to a date, a timestamp is a precise moment in time, usually a number of milliseconds from a reference timestamp. As discussed above, use TIMESTAMP WITH TIME ZONE if using Postgres. If we aren’t using Postgres, be very explicit in setting the reference timezone in all your systems. Do not rely on the operating system to provide this value. Also, do not store timestamps as numbers of seconds or milliseconds. The TIMESTAMP WITH TIME ZONE and TIMESTAMP types are there for a reason.

Enumerated types

Many databases allow us to create custom enumerated types, which are a set of allowed values for a text-based field. If the set of allowed values is stable and unlikely to change, an ENUM can be a good choice to enforce correctness. If the values might change, a lookup table might work better (we’ll talk about that below).

No matter what other techniques we use, we’ll always need to choose the appropriate column type. Next, decide how to use database constraints.

Using database constraints

All SQL databases provide the ability to prevent NULL values. In a Rails migration, this is what null: false is doing. This tells the database to prevent NULL values from being inserted. Any required value should have this set, and most of our values should be required.

Many databases provide additional constraint mechanisms, usually called check constraints. Check constraints are extremely powerful for enforcing correctness. For example, a widget’s price must be positive and less than or equal to $10,000. With a check constraint, this could enforce:

Get hands-on with 1200+ tech skills courses.