More Bad Input

Now that the from_roman() function works properly with good input, it’s time to fit in the last piece of the puzzle: making it work properly with bad input. That means finding a way to look at a string and determine if it’s a valid Roman numeral. This is inherently more difficult than validating numeric input in the to_roman() function, but you have a powerful tool at your disposal: regular expressions. (If you’re not familiar with regular expressions, now would be a good time to read the regular expressions chapter.)

As you saw in Case Study: Roman Numerals, there are several simple rules for constructing a Roman numeral, using the letters M, D, C, L, X, V, and I. Let’s review the rules:

  • Sometimes characters are additive. I is 1, II is 2, and III is 3. VI is 6 (literally, “5 and 1”), VII is 7, and VIII is 8.
  • The tens characters (I, X, C, and M) can be repeated up to three times. At 4, you need to subtract from the next highest fives character. You can’t represent 4 as IIII; instead, it is represented as IV (“1 less than 5”). 40 is written as XL (“10 less than 50”), 41 as XLI, 42 as XLII, 43 as XLIII, and then 44 as XLIV (“10 less than 50, then 1 less than 5”).
  • Sometimes characters are… the opposite of additive. By putting certain characters before others, you subtract from the final value. For example, at 9, you need to subtract from the next-highest tens character: 8 is VIII, but 9 is IX (“1 less than 10”), not VIIII (since the I character cannot be repeated four times). 90 is XC, 900 is CM.
  • The fives characters (V, L, and D) cannot be repeated. 10 is always represented as X, never as VV. 100 is always C, never LL.
  • Roman numerals are read left to right, so the order of characters matters very much. DC is 600; CD is a completely different number (400, “100 less than 500”). CI is 101; IC is not even a valid Roman numeral (because you can’t subtract 1 directly from 100; you would need to write it as XCIX, “10 less than 100, then 1 less than 10”).
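One way to turn these rules into a check is a verbose regular expression with one group per decimal place. The following is a minimal sketch, not necessarily the final implementation: the names ROMAN_NUMERAL_PATTERN and is_valid_roman() are hypothetical, and the pattern assumes numerals no larger than MMM plus hundreds, tens, and ones.

```python
import re

# Hypothetical name; each group covers one decimal place.
ROMAN_NUMERAL_PATTERN = re.compile(
    r"""
    M{0,3}              # thousands: 0 to 3 Ms
    (CM|CD|D?C{0,3})    # hundreds: 900, 400, or an optional D plus 0 to 3 Cs
    (XC|XL|L?X{0,3})    # tens: 90, 40, or an optional L plus 0 to 3 Xs
    (IX|IV|V?I{0,3})    # ones: 9, 4, or an optional V plus 0 to 3 Is
    """,
    re.VERBOSE,
)


def is_valid_roman(s):
    """Return True if s follows the construction rules listed above.

    Hypothetical helper. The empty string matches the pattern (every part
    is optional), so it is rejected separately.
    """
    return bool(s) and ROMAN_NUMERAL_PATTERN.fullmatch(s) is not None


for candidate in ("XLIV", "XCIX", "IIII", "VV", "IC"):
    print(candidate, is_valid_roman(candidate))
```

Because fullmatch() requires the whole string to match, a fourth repeated tens character or a second fives character has nowhere to go and the match fails.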

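Thus, one useful test would be to ensure that the from_roman() function fails when you pass it a string with too many repeated numerals. How many is “too many” depends on the numeral: the tens characters (M, C, X, and I) may appear up to three times in a row, while the fives characters (D, L, and V) may appear only once.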
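Such a test might look like the sketch below. Several names here are assumptions for illustration: the exception InvalidRomanNumeralError, the MAX_REPEATS table, and the stand-in from_roman(), which checks only the repetition rule so that the test can run on its own. In practice you would import the real function and its exception instead.

```python
import unittest


class InvalidRomanNumeralError(ValueError):
    """Assumed exception name for strings that are not valid Roman numerals."""


# Longest allowed run of each numeral, per the rules above.
MAX_REPEATS = {"M": 3, "D": 1, "C": 3, "L": 1, "X": 3, "V": 1, "I": 3}

NUMERAL_VALUES = (
    ("M", 1000), ("CM", 900), ("D", 500), ("CD", 400),
    ("C", 100), ("XC", 90), ("L", 50), ("XL", 40),
    ("X", 10), ("IX", 9), ("V", 5), ("IV", 4), ("I", 1),
)


def from_roman(s):
    """Stand-in converter: rejects over-repeated numerals, then converts."""
    for numeral, limit in MAX_REPEATS.items():
        # A run one longer than the limit means too many repeats.
        if numeral * (limit + 1) in s:
            raise InvalidRomanNumeralError(f"too many repeated {numeral!r}: {s!r}")
    result, index = 0, 0
    for numeral, value in NUMERAL_VALUES:
        while s[index:index + len(numeral)] == numeral:
            result += value
            index += len(numeral)
    return result


class FromRomanBadInput(unittest.TestCase):
    def test_too_many_repeated_numerals(self):
        """from_roman should fail with too many repeated numerals"""
        for s in ("MMMM", "DD", "CCCC", "LL", "XXXX", "VV", "IIII"):
            with self.subTest(numeral=s):
                self.assertRaises(InvalidRomanNumeralError, from_roman, s)
```

Saved in a file, the test case can be run with python -m unittest. Each string in the loop is one repeat past the limit for its numeral, so every one of them should raise.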
