Properties

Learn about properties and how property-based tests compare to example-based tests.

Properties

As suggested by its name, property-based testing is about properties. Let’s learn about what thinking in properties looks like.

The difference

Let’s start by taking a look at the differences between traditional and property-based testing.

Traditional tests

Traditional tests are often example-based. We make a list of inputs for a given program along with the expected output for each. We may put in a couple of comments about what the code should do, but that’s about it. Such tests are good only if our examples exercise all the possible program states.

Property-based tests

Property-based tests, on the other hand, have nothing to do with listing examples by hand. Instead, we’ll want to write some meta tests. This means that we find a rule that dictates behavior that should always hold, no matter what sample input we give to our program. We then encode that rule into executable code known as a property. A special test framework will generate examples and run them against the properties to check whether the program follows the rule.

Example

Let’s say that we have a function to represent a cash register. The function has three arguments:

  1. A series of bills and coins that are already in the register,
  2. The amount of money that the customer has to pay,
  3. The money handed by the customer to the cashier.

It should then return the list of bills and coins to cover the change.

Let’s look at the two methods we can use to test the function.

Example-based way

%% Money in the cash register
Register = [{20.00, 1}, {10.00, 2}, {5.00, 4},
{1.00, 10}, {0.25, 10}, {0.01, 100}],
%% Change = cash(Register, PriceToPay, MoneyPaid),
%% Change format = [{bill/coin, count_of_bill}]
[{10.00, 1}] = cash(Register, 10.00, 20.00),
[{10.00, 1}, {0.25, 1}] = cash(Register, 9.75, 20.00),
[{10.00, 1}, {5.00, 1}, {1.00, 3}, {0.25, 2}, {0.01, 13}] = cash(Register, 1.37, 20.00).

Let’s look at each of the three cases one by one:

  • The first test at line 6 says that a customer paying for a $10 item with $20 should expect a single $10 bill back.
  • The second test at line 7 says that for a $9.75 purchase paid with $20, a $10 bill with a quarter should be returned, for a total of $10.25.
  • The final test at line 8 shows that a $1.37 item paid with a $20 bill yields $18.63 in change, with the specific cuts shown.

That’s a fairly familiar approach. Come up with a bunch of arguments with which to call the function, do some thinking, and then write down the expected result. In traditional testing, we’d try to cover the full set of rules and edge cases that describe what the code should do by listing many examples. In property-based testing, we’d have to flip that around and come up with the rules first.

Property-based way

With properties, the difficult part is figuring out how to take the abstract ideas and make them into rules expressed in the code. For our cash register example, we can have two rules:

  • The amount of change will always add up to the amount paid minus the price charged.
  • The bills and coins handed back as change will start from the biggest bill possible and work down to the smallest coin possible. This could alternatively be defined as trying to hand the customer the smallest number of individual pieces of money.
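To make the two rules more concrete, here is a minimal sketch of what the helpers behind them might look like. It rests on assumptions that are not part of the lesson: amounts are integer cents rather than floats (to sidestep floating-point rounding), and the module name cash_rules and the greedy/2 helper are hypothetical. The second rule is checked against its first formulation (largest pieces first). With unusual denominations or a nearly empty register, that can differ from the true minimum number of pieces.

```erlang
-module(cash_rules).
-export([sum/1, greedy/2, fewest_pieces_possible/2]).

%% Total value, in cents, of a list of {Denomination, Count} pairs.
sum(Money) ->
    lists:sum([Denom * Count || {Denom, Count} <- Money]).

%% Reference answer: walk the register from the largest denomination
%% to the smallest, taking as many pieces as are needed and available.
greedy(Register, Amount) ->
    Sorted = lists:reverse(lists:keysort(1, Register)),
    greedy(Sorted, Amount, []).

greedy(_, 0, Acc) -> {ok, lists:reverse(Acc)};
greedy([], _Left, _Acc) -> not_enough_change;
greedy([{Denom, Avail} | Rest], Left, Acc) ->
    case min(Avail, Left div Denom) of
        0 -> greedy(Rest, Left, Acc);
        N -> greedy(Rest, Left - N * Denom, [{Denom, N} | Acc])
    end.

%% Number of individual bills and coins in a list of pairs.
pieces(Money) ->
    lists:sum([Count || {_, Count} <- Money]).

%% Rule 2: the change uses as few pieces as the greedy reference does.
fewest_pieces_possible(Register, Change) ->
    case greedy(Register, sum(Change)) of
        {ok, Expected} -> pieces(Change) =:= pieces(Expected);
        not_enough_change -> false
    end.
```

The first rule then becomes a comparison such as cash_rules:sum(Change) =:= MoneyPaid - PriceToPay, which is exact because everything is an integer.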

Let’s assume we magically encode them into functioning Erlang code. Our test, as a property, could look something like this:

for_all(RegisterMoney, PriceToPay, MoneyPaid) ->
    Change = cash(RegisterMoney, PriceToPay, MoneyPaid),
    %% andalso rather than and: in Erlang, `and` binds tighter than `==`
    sum(Change) == MoneyPaid - PriceToPay
    andalso
    fewest_pieces_possible(RegisterMoney, Change).

Given some amount of money in the register, a price to pay, and an amount of money given by the customer, call the cash/3 function and then check the change given. Here, two rules are being tested.

  1. The cash/3 function is called initially to get the function’s output. Then the sum rule checks that the change balances out.

  2. The fewest_pieces_possible function checks that the smallest possible number of bills and coins is returned.

This property alone is useless. We need a property-based testing framework to make the property functional. The framework should figure out how to generate all the inputs required (RegisterMoney, PriceToPay, and MoneyPaid), and then it should run the property against all the inputs it has generated. If the property always remains true, the test is considered successful.
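To give a feel for what such a framework does, here is a deliberately tiny, hand-rolled sketch that uses only Erlang’s standard rand module. It is not how a real property-based testing library is implemented. The module name tiny_pbt, the function names, the integer-cents representation, and the value ranges are all assumptions made for illustration.

```erlang
-module(tiny_pbt).
-export([for_all/3, gen_inputs/0]).

%% Run Prop against N freshly generated inputs.
%% Returns ok if every run is true, or {failed, Input} at the first false.
%% Unlike a real framework, a crash inside Prop is not caught here.
for_all(_Gen, _Prop, 0) -> ok;
for_all(Gen, Prop, N) when N > 0 ->
    Input = Gen(),
    case Prop(Input) of
        true -> for_all(Gen, Prop, N - 1);
        false -> {failed, Input}
    end.

%% Generate a random {RegisterMoney, PriceToPay, MoneyPaid} triple, in cents.
gen_inputs() ->
    Denoms = [2000, 1000, 500, 100, 25, 1],
    Register = [{D, rand:uniform(11) - 1} || D <- Denoms], % 0..10 of each
    Price = rand:uniform(5000),                            % 1..5000
    Paid = Price + rand:uniform(5000) - 1,                 % never below Price
    {Register, Price, Paid}.
```

A call such as tiny_pbt:for_all(fun tiny_pbt:gen_inputs/0, Prop, 100) would then run Prop, a fun that takes the generated triple and returns a boolean, against 100 random inputs.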

If one of the test cases fails, a good property-based testing framework will modify the generated input until it finds one that still provokes the failure but is as small as possible. This process is called shrinking. For example, the test might find a mistake with a cash register that contains one billion dollars and an order for $100,000. The framework could then replicate the same failure with a cash register containing $5 and an order for $0.50. This makes debugging easier because the failing input is smaller, and therefore easier to manipulate and reason about.
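The idea behind shrinking can be sketched for the simplest possible input, a single non-negative integer. Real frameworks shrink structured data with strategies tied to their generators, so this is only an illustration. The module name tiny_shrink and the choice of candidates are assumptions.

```erlang
-module(tiny_shrink).
-export([shrink/2]).

%% Smaller values to try in place of N, in ascending order.
candidates(0) -> [];
candidates(N) when N > 0 -> lists:usort([0, N div 2, N - 1]).

%% Given a property and an input that makes it false, keep moving to a
%% smaller input that still makes it false until no candidate fails.
shrink(Prop, Failing) ->
    case [C || C <- candidates(Failing), not Prop(C)] of
        [Smaller | _] -> shrink(Prop, Smaller);
        [] -> Failing
    end.
```

For a property such as fun(N) -> N < 50 end that was first seen failing at 1000000000, tiny_shrink:shrink/2 walks the input down to 50, the smallest value that still fails.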

Example

For example, such a framework could generate inputs giving a call like cash([{1.00, 2}], 1.00, 2.00). We would expect the cash/3 function to return a single $1 bill, and the property would pass. Sooner or later, it would generate an input such as cash([{5.00, 1}], 20.00, 30.00), and then the program would crash and fail the property because there’s not enough change in the register. Paying for a $20 purchase with $30, even if the register holds only $5, is entirely possible: take $10 from the $30 and give it back to the customer. Is that specific amount possible to give back, though? In real life, yes. We do it all the time. But in our program, since the money taken from the customer does not come in as bills, coins, or any specific denomination, there is no way to use part of the input money to form the output. The interface chosen for this function is wrong, and so are our tests.

Comparing approaches

Let’s take a step back and compare example-based tests with property-based tests for this specific problem. With the former, even though all the examples we came up with looked reasonable, we found ourselves working within the confines of the code and interface we had established. We were not really testing the code; we were describing its design, making sure it conformed to expectations and demands, and making sure we don’t slip in the future. This is valuable for sure, but properties gave us something more: they highlighted a failure of imagination.

Example-based unit tests made it easy to lock down bugs we could see coming, but those we couldn’t predict were left in place and would probably have made it to production. With properties and a property-based framework, we can explore the problem space more in-depth and find bugs and design issues much earlier. This means that fewer bugs will make it to production.

To put it another way, if example-based testing helps ensure that code does what we expect, property-based testing forces us to explore the program’s behavior to see what it can or cannot do. This helps us find out whether our expectations were even right to begin with. When we test with properties, designing and growing the tests takes as much effort as designing and growing the program itself. A common pattern when a property fails will be to figure out whether it’s the system that is wrong or whether it’s our idea of what it should do that needs to change. We’ll fix bugs, but we’ll also fix our understanding of the problem space. We’ll be surprised by how often things we thought we knew turn out to be far more complex and tricky than we expected.

That’s thinking in properties.