Regression Testing

Learn and practice regression testing for databases in PostgreSQL.

Regression testing protects against introducing bugs when refactoring code. In SQL, we refactor queries, either because the calling application code is changed and the query must change too, or because we’re hitting problems in production and a new optimized version of the query is being checked in to replace the previous erroneous version.

Regression testing registers the expected results from our queries and then checks actual results against the expected results. Typically, we would run the regression tests each time a query is changed.

Regression test suite in lieu of “tools for testing”

The PostgreSQL project includes many SQL tests to validate its query parser, optimizer, and executor. It uses a framework named the regression tests suite based on a very simple idea:

  1. Run a SQL file containing our tests with psql
  2. Capture its output to a text file that includes the queries and their results
  3. Compare the output with the expected one that is maintained in the repository with the standard diff utility
  4. Report any difference as a failure

We can have a look at the PostgreSQL repository to see how it’s done; as an example, we could pick src/test/regress/sql/aggregates.sql, and its matching expected result file src/test/regress/expected/aggregates.out.

Implementing that kind of regression testing for our application is quite easy, as the driver is only a thin wrapper around executing standard applications such as psql and diff. The idea would be to always have a setup and a teardown step in our SQL test files, wherein the setup step builds a database model and fills it with the test data and the teardown step removes all that test data.

To automate such a setup and to go beyond the obvious, the pgTap tool is a suite of database functions that make it easy to write TAP-emitting unit tests in psql scripts or xUnit-style test functions. The TAP output is suitable for harvesting, analysis, and reporting by a TAP harness, such as those used in Perl applications.

RegreSQL

The RegreSQL tool implements that idea. It finds SQL files in our repository and allows registering plan tests against them, and then it compares the results with what’s expected.

A typical output from using RegreSQL against our cdstore application looks like the following:

$ regresql test
Connecting to 'postgres:///chinook?sslmode=disable'… ✓
TAP version 13
ok 1 - src/sql/album-by-artist.1.out
ok 2 - src/sql/album-tracks.1.out
ok 3 - src/sql/artist.1.out
ok 4 - src/sql/genre-topn.top-3.out
ok 5 - src/sql/genre-topn.top-1.out
ok 6 - src/sql/genre-tracks.out

You can see it yourself by running the tests in the “Try it yourself” section.

In the following example, we introduce a bug by changing the test plan without changing the expected result, and here’s how it looks:

$ regresql test
Connecting to 'postgres:///chinook?sslmode=disable'… ✓
TAP version 13
ok 1 - src/sql/album-by-artist.1.out
ok 2 - src/sql/album-tracks.1.out
## Query File: 'src/sql/artist.sql'
## Bindings File: 'regresql/plans/src/sql/artist.yaml'
## Bindings Name: '1'
## Query Parameters: 'map[n:2]'
## Expected Result File: 'regresql/expected/src/sql/artist.1.out'
## Actual Result File: 'regresql/out/src/sql/artist.1.out'
# 
## --- regresql/expected/src/sql/artist.1.out
## +++ regresql/out/src/sql/artist.1.out
## @@ -1,4 +1,5 @@
## -   name     | albums
## -------------+-------
## -Iron Maiden | 21
## +    name     | albums
## +-------------+-------
## +Iron Maiden  | 21
## +Led Zeppelin | 14
## 
not ok 3 - src/sql/artist.1.out
ok 4 - src/sql/genre-topn.top-3.out
ok 5 - src/sql/genre-topn.top-1.out
ok 6 - src/sql/genre-tracks.out

We can see the effects in the playground in the “Try it yourself” section.

The diagnostic output allows actions to be taken to fix the problem: either change the expected output (with regresql update) or fix the artist.yaml file in the plans folder.

A closer look

When something wrong happens in production, we need to understand it to fix it. One of the important tasks we’re confronted with is finding which part of the code is sending a specific query we can see in the monitoring, in the logs, or in the interactive activity views.

PostgreSQL implements the application_name parameter, which we can set in the connection string and with the SET command within our session. It’s then possible to have it reported in the server’s logs, and it’s also part of the system activity view pg_stat_activity.

It’s a good idea to be quite granular with this setting, going as low as the module or package level, depending on our programming language of choice. It’s one of those settings that the main application should have full control of, so usually, external (and internal) libs are not setting it.

Note: We can test it using \d+ pg_stat_activity command.

Try it yourself

All the required files are added in the playground below. Let’s perform the following tasks.

  1. First, paste the following command to update the system with the data in the expected output files:

Get hands-on with 1200+ tech skills courses.