The Misuse and Proper Use of Metrics

Learn how metrics can be gamed and how we can design robust metrics that resist such gaming.

When software developers first looked for ways to measure their success, they measured “lines of code” (LOC) or “thousand lines of code” (kLOC) as projects got bigger. The theory was pretty straightforward—if you had a highly productive developer, they would be cranking out some phenomenal amounts of code. Given that most of the time developers write one expression per line of code (obviously complex conditionals consist of more than one expression, but then again a closing scope-block brace doesn’t have any expressions, so it sort of evens out), LOC seemed like a pretty easy way to measure which developers were actively contributing to the project’s success, and which weren’t.

Developers are a pretty analytical and creative lot, and it didn’t take long before people figured out that if they are held accountable to a metric, and only to that metric, then the best thing to do is to optimize in favor of the metric. Pay me by the kLOC? I’ll use code generators to create lots and lots of code. Pay me by the number of features shipped? I’ll churn out feature after feature and never worry about bug fixes. Pay me by the number of bugs fixed? I’ll not only focus entirely on bug fixes (and the easiest ones at that), but I’ll also populate the bug tracker with new bugs that are quickly fixed, and maybe even slip a few bugs in during development so I can find them and fix them. (The last example may seem a bit dark, but I’ve been at an organization that took the “bug bounty” approach, and I watched one of my colleagues do exactly that.)

Goodhart’s Law

In 1975, British economist Charles Goodhart said, “Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.” Later, in 1997, anthropologist Marilyn Strathern generalized Goodhart’s phrase to read, “When a measure becomes a target, it ceases to be a good measure,” which is the common form most people use when quoting “Goodhart’s Law.” A closely related principle is Campbell’s Law, which reads, “The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.” In other words, if you offer me a large enough cash bonus for fixing bugs, I’ll discover very quickly how lucrative it can be to put some bugs into the code just so I can cash in the bonus when I fix them. Monitor me by the number of GitHub commits, and I’ll commit every keystroke. Judge me by lines of code, and I’ll make sure every method is at least a page long.

Both Goodhart’s Law and Campbell’s Law highlight the same realization: if we set performance targets around a particular metric, then pressure (implicit and explicit) will build to hit the metric while ignoring the larger goal the metric was meant to serve. Setting the metric for lines of code means developers will optimize for producing lots of code, not shipping features. Setting the metric for bug counts means developers will optimize for reducing the bug count (such as by not shipping features at all!), not improving overall quality.
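To make the lines-of-code case concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: `count_loc` is a deliberately naive counter, and `CONCISE` and `PADDED` are two hypothetical implementations of the same `total` function. The only point is that the metric can be inflated without adding any value.

```python
def count_loc(source: str) -> int:
    """Naive LOC metric: count lines that are neither blank nor pure comments."""
    return sum(
        1
        for line in source.splitlines()
        if line.strip() and not line.strip().startswith("#")
    )


# Two hypothetical implementations of the same behavior.
CONCISE = '''
def total(prices):
    return sum(prices)
'''

PADDED = '''
def total(prices):
    result = 0
    index = 0
    while index < len(prices):
        price = prices[index]
        result = result + price
        index = index + 1
    return result
'''


def run(source: str, prices):
    """Execute a snippet and call the `total` function it defines."""
    namespace = {}
    exec(source, namespace)
    return namespace["total"](prices)


concise_loc = count_loc(CONCISE)
padded_loc = count_loc(PADDED)
same_behavior = run(CONCISE, [1, 2, 3]) == run(PADDED, [1, 2, 3])

print(f"concise: {concise_loc} LOC, padded: {padded_loc} LOC, "
      f"same behavior: {same_behavior}")
```

Under a pay-by-LOC scheme, the padded version "wins" even though both functions return the same result for the same input. That is the gap between the metric and the goal it was supposed to stand in for.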

On the other end of this spectrum, we have the “no metrics” crowd, the favorite of many on social media. Cruise LinkedIn for any length of time and you’ll see folks advocating the tempting idea of discarding metrics entirely in favor of a more “human” approach: Just ask the client/customer, “Are you happy?” If the customer says “yes,” we’re doing great! If the customer says “no,” we need to improve! But this approach is often only practical in small organizations, with a close-by trusted (and trustable) customer, alongside a deep intuitive understanding of where the blockers and obstacles to improvement are. Without those preconditions, this “human” approach will break down quickly, as the team will struggle to identify what to improve when the customer says “no.” (Sadly, the customer can’t always be expected to know what they want to change. Remember the quip popularly attributed to Henry Ford: “If I had asked people what they wanted, they would have said faster horses.”)

When told to improve the performance of an application, software developers will be the first to demand access to the production environment in order to put counters and instrumentation in, so as to know where the bottlenecks are. Metrics for a development team work the same way: Unless we have some kind of tracking to know where the bottlenecks are in our software development process, there’s no way for the team or its members to get better at what they do (except by luck or accident).

The business case for metrics

By their very nature, metrics are numbers, and therefore (relatively) easy to quantify and compare. One important aspect of metrics that often goes underdiscussed, owing to the sensitive nature of the topic, is that metrics can help “objectify” the discussion of performance and skill, which can be very helpful when looking to address DEI (Diversity, Equity, and Inclusion) targets and/or shortcomings at the organization. When metrics are available to help judge the quality of work put forth by an employee, and when those metrics are used equitably to gauge promotions or performance improvement plans (PIPs), the organization becomes overall a fairer place to work and grow.

Some of the benefits of intelligent metrics include:

  1. Accountability. How do you know you’re holding your employees accountable equally? You will inevitably find that you “click” better with certain members of your team. How do you know you’re not exhibiting some subtle favoritism? Now carry this through the rest of the organization: when the performance-improvement budget is tight, how do you justify that your employee deserves the raise or the promotion over some other manager’s choice? Metrics, used appropriately, can provide a degree of objectivity to the conversation that would be difficult to obtain otherwise.
  2. Comparability. How do you know where your team sits in relation to other teams within your company? Or other teams within the industry? Granted, apples-to-apples comparisons are always going to be tricky, but too often groups operate inside of an echo chamber, convincing themselves that “we’re an elite team” and never actually doing a comparison with other teams to find out just how good (or bad) they really are. Metrics can help bring a healthy dose of realism into the conversation.
  3. Improvability. How can we tell if we’re getting better? By tracking the metrics over time, we can begin to infer a sense of improvement or struggle, and use that to help guide decisions and policies. (Alternatively, if the company puts a large amount of effort into an area, and metrics don’t show any sort of effect, it suggests that either the metric is not the right metric, or perhaps something is diluting the effectiveness of the effort, or a number of other possibilities.)
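
As a rough illustration of tracking a metric over time, the sketch below fits a least-squares trend line to a series of per-sprint measurements. The metric (median cycle time in days), the numbers, and the `trend_slope` helper are all hypothetical, chosen only to show the idea. A real team would pick its own metric and would want more data than six points before drawing conclusions.

```python
from statistics import mean


def trend_slope(values):
    """Least-squares slope of values plotted against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points to estimate a trend")
    xs = list(range(n))
    x_bar = mean(xs)
    y_bar = mean(values)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


# Hypothetical data: median days from "work started" to "deployed", one value per sprint.
cycle_time_days = [9.0, 8.5, 8.0, 8.2, 7.1, 6.8]

slope = trend_slope(cycle_time_days)
direction = "improving" if slope < 0 else "flat or getting worse"
print(f"cycle time changes by {slope:.2f} days per sprint ({direction})")
```

A negative slope here suggests cycle time is shrinking, but the number is a prompt for a conversation, not a verdict. If a big process change produces no visible movement, that is exactly the signal described above: either the metric is wrong or something is diluting the effort.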
