Search⌘ K
AI Features

Antipattern: Format Comma-Separated Lists

Understand the pitfalls of storing multiple values in a single VARCHAR column separated by commas. Explore the challenges such as inefficient queries, difficulty in updating, inability to use indexes, and validation issues. Learn why this design is an antipattern and how it impacts data integrity and performance.

Returning to our example, to minimize changes to the database structure, you decide to redefine the account_id column as a VARCHAR so you can list multiple account IDs in that column, separated by commas.

MySQL
CREATE TABLE Products (
product_id SERIAL PRIMARY KEY,
product_name VARCHAR(1000),
account_id VARCHAR(30) -- comma-separated list
-- . . .
);
INSERT INTO Products (product_id, product_name, account_id)
VALUES (DEFAULT, 'Visual TurboBuilder', '12,34');

This seems like a win because you’ve created no additional tables or columns; you’ve changed only one column’s data type. However, let’s look at the performance and data integrity problems this table design suffers from.

Querying products for a specific account

Although a foreign key can handle the insert, update, and delete functions, the queries are difficult if all the foreign keys are combined into a single field. Thus, you can no longer use equality; instead, you have to use a test against some kind of pattern. For example, MySQL lets you write something like the following to find all the products for account 12:

MySQL
SELECT * FROM Products WHERE account_id REGEXP '[[:<:]]12[[:>:]]';

Note: We can also use some other queries for data retrieval. For example, you can check it for accounts 34 or 23.

Pattern-matching expressions may return false matches and can’t benefit from indexesIn the context of database-related usage, the word index refers to an ordered collection of information.. Since the pattern-matching syntax is different in each database brand, your SQL code isn’t vendor-neutral.

Querying accounts for a given product

Likewise, joining a comma-separated list to the matching rows in the referenced table is awkward and costly.

Let’s run the query in the following widget. Try to run this query after making some changes, like using a different regular expression.

We can also retrieve the available data by using SELECT * FROM Products before using the code given in the following widget to retrieve the already available data in the database.

MySQL
SELECT * FROM Products AS p JOIN Accounts AS a
ON p.account_id REGEXP '[[:<:]]' || a.account_id || '[[:>:]]'
WHERE p.product_id = 123;

Joining two tables using an expression like this one spoils any chance of using indexes. The query must scan through both tables, generate a cross product, and evaluate the regular expression for every combination of rows.

If we use regular expressions to pick out parts of a string, this could be an indication that we should store those parts separately. Otherwise, this solution would be an antipattern.

Making aggregate queries

Aggregate queries use functions like COUNT(), SUM(), and AVG(). However, these functions are designed to be used over groups of rows, not comma-separated lists. We have to resort to tricks like the following:

MySQL
SELECT product_id, LENGTH(account_id) - LENGTH(REPLACE(account_id, ',', '')) + 1
AS contacts_per_product
FROM Products;

Tricks like this can be clever but never clear. These kinds of solutions are time-consuming to develop and hard to debug. Some aggregate queries can’t be accomplished with tricks at all.

Updating accounts for a specific product

We can also add a new ID to the end of the list with string concatenation, but this may affect the sorting of the list.

MySQL
UPDATE Products
SET account_id = CONCAT(account_id, ',', 56)
WHERE product_id = 123;

Let’s check in the following widget if the data is updated successfully.

MySQL
UPDATE Products
SET account_id = CONCAT(account_id, ',', 56)
WHERE product_id = 123;
SELECT * FROM Products WHERE product_id = 123;

To remove an item from the list, we have to run two SQL queries: one to fetch the old list and a second to save the updated list.

<?php

$stmt = $pdo->query(
  "SELECT account_id FROM Products WHERE product_id = 123");
$row = $stmt->fetch();
$contact_list = $row['account_id'];

// change list in PHP code
$value_to_remove = "34";
$contact_list = split(",", $contact_list);
$key_to_remove = array_search($value_to_remove, $contact_list);
unset($contact_list[$key_to_remove]);
$contact_list = join(",", $contact_list);

$stmt = $pdo->prepare(
  "UPDATE Products SET account_id = ?
   WHERE product_id = 123");
$stmt->execute(array($contact_list));

?>

That’s quite a lot of code just to remove an entry from a list.

Validating product IDs

Since we have already seen that we can query the available data in different ways, what prevents a user from entering invalid entries like “banana”?

MySQL
INSERT INTO Products (product_id, product_name, account_id)
VALUES (789, 'Visual TurboBuilder', '12,34,banana');

Let’s try to run the following widget to see the changes made.

MySQL
INSERT INTO Products (product_id, product_name, account_id)
VALUES (789, 'Visual TurboBuilder', '12,34,banana');
SELECT * FROM Products;

Users will find a way to enter any and all variations, and the database will turn to mush. There won’t necessarily be database errors, but the data will become nonsense.

Choosing a separator character

Some list entries may contain our separator character if we store a list of string values instead of integers. Using a comma as the separator between entries may become ambiguous. We can choose a different character as the separator. Still, can we guarantee that this new separator will never appear in an entry?

For example, you use a semicolon instead of a comma. How can you be sure that this will never appear in another entry?

MySQL
SELECT * FROM Products WHERE account_id REGEXP '[[:<:]]12[[:>:]]';

List length limitations

Here comes another problem. How many list entries can we store in a VARCHAR(30) column? It depends on the length of each entry. If each entry is two characters long, then we can store ten entries (since the comma is also counted as a character). But if each entry is six characters, then we can only store four entries:

MySQL
UPDATE Products
SET account_id = '10,14,18,22,26,30,34,38,42,46'
WHERE product_id = 123;
UPDATE Products
SET account_id = '101418,222630,343842,467790'
WHERE product_id = 123;

Let’s try adding some more values in the account_id column.

Let’s add 10,14,18,22,26,30,34,38,42,46,48,37 and/or 101418,222630,343842,467790,765439 in the following playground to check if it works.

MySQL
UPDATE Products
SET account_id = '10,14,18,22,26,30,34,38,42,46'
WHERE product_id = 123;
UPDATE Products
SET account_id = '101418,222630,343842,467790'
WHERE product_id = 123;

How can we know that VARCHAR(30) supports the longest list that we will need in the future? How long is long enough?

Try explaining the reason for this length limit to a boss or to customers.