Planning the Physical Model
Learn about planning the physical model of the database for our Rails application.
A formal way to model a database is called normalization, and it’s a dense topic full of equations, confusing terms, and mathematical proofs. Instead, we are going to outline a simpler approach that might lack the precision of theoretical computer science but is hopefully more approachable. Here’s how to go about it:
- Create a table for each entity in the logical model.
- Add columns to associate related models using foreign keys.
- For each attribute, decide how we will enforce its requirements and create the needed columns, constraints, and associated tables.
- Create indexes to enforce all uniqueness constraints.
- Create indexes for any queries you plan to run.
To do this, it’s immensely helpful if we understand SQL. In addition to knowing how to model our data, knowing SQL allows us to understand the runtime performance of our app, which will further help us with data modeling. Of all the programming languages we will ever learn, outside of learning SQL, the hardest part of the planning process is step 3: deciding how to enforce the requirements of each attribute.
We’ll bring together some or all of the following techniques:
- Choosing the right column type.
- Using database constraints.
- Creating lookup tables.
- Writing code in your app.
Let’s dive into each one of these.
Choosing the right column type
Each column in the database must have a type, but databases have few types to choose from. Usually, there are strings, dates, timestamps, numbers, and booleans. That said, we must familiarize ourselves with the types of our database. Unless we are writing code that has to work against any SQL database (which is rare), we should not be bound by Rails’ least-common-denominator set of types.
The type we choose should allow us to store the exact values we need. It should also make it difficult or impossible to store incorrect values. Here are some tips for each of the common types.
Strings
In the olden days, choosing the size of our string mattered. Today, this is not universally true. Consult your database’s documentation and use the largest size type you can. For example, in Postgres, we can use a TEXT
field because it carries no performance or memory burden over VARCHAR
. It’s important to get this right because changing column types later when we need bigger strings can be difficult.
Rational numbers
Avoid FLOAT
if possible. Databases store FLOAT
values using the IEEE 754 format, which does not store precise values. Either convert the rational to a base unit (for example, store money in cents as an integer) or use the DECIMAL
type, which stores precise values. Note that neither type can store all rational numbers. One-third, for example, cannot be stored in either type. To store precise fractional values might require storing the numerator and denominator separately.
Booleans
Use the boolean
type. Do not store, for example, “y” or “n” as a string. There’s no benefit to doing this, and it’s confusing. (Yes, people do this, but we don’t understand why.)
Dates
Remember that a date is not a timestamp. A date is a day of the month in a certain year. There is no time component. The DATE
data type can store this and allow date arithmetic on it. Don’t store a timestamp set at midnight on the date in question. Time zones and daylight savings time will wreak havoc upon you.
Timestamps
As opposed to a date, a timestamp is a precise moment in time, usually a number of milliseconds from a reference timestamp. As discussed above, use TIMESTAMP WITH TIME ZONE
if using Postgres. If we aren’t using Postgres, be very explicit in setting the reference timezone in all your systems. Do not rely on the operating system to provide this value. Also, do not store timestamps as numbers of seconds or milliseconds. The TIMESTAMP WITH TIME ZONE
and TIMESTAMP
types are there for a reason.
Enumerated types
Many databases allow us to create custom enumerated types, which are a set of allowed values for a text-based field. If the set of allowed values is stable and unlikely to change, an ENUM
can be a good choice to enforce correctness. If the values might change, a lookup table might work better (we’ll talk about that below).
No matter what other techniques we use, we’ll always need to choose the appropriate column type. Next, decide how to use database constraints.
Using database constraints
All SQL databases provide the ability to prevent NULL
values. In a Rails migration, this is what null: false
is doing. This tells the database to prevent NULL
values from being inserted. Any required value should have this set, and most of our values should be required.
Many databases provide additional constraint mechanisms, usually called check constraints. Check constraints are extremely powerful for enforcing correctness. For example, a widget’s price must be positive and less than or equal to $10,000. With a check constraint, this could enforce:
Get hands-on with 1400+ tech skills courses.