Confirm the Integrity of Your Data

Explore methods to confirm the integrity of your data by comparing original and transformed CPI and wages datasets. Understand how different scales and transformations affect your graphs and ensure your analysis reflects real growth trends accurately.

Check the integrity of data

Before going any further, we should confirm that our plots make sense in the context of their data sources. Working with our BLS examples, let’s look at graphs comparing the CPI and wages data before and after our manipulation. That way, we can ensure that our math (particularly our fake values) didn’t skew things too severely.

Here’s what our CPI data looked like when plotted using the raw data:

It’s undoubtedly a dynamic graph, but you can see the gentle upward slope, punctuated by a handful of sudden jumps. Next, we’ll see that same data after removing three out of every four months’ data points. The same ups and downs are still visible. Given our overall goals, we can categorize our transformation as a success.
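The same check can be done numerically as well as visually. The following is a minimal sketch, assuming the reduction simply keeps one data point in four. The values and the helper names `keep_every_nth` and `direction_matches` are hypothetical, chosen only for illustration, and are not real BLS figures.

```python
def keep_every_nth(values, n):
    """Return every n-th item, starting with the first."""
    return values[::n]


def direction_matches(full, reduced):
    """True if both series move the same way from first to last point."""
    return (full[-1] - full[0]) * (reduced[-1] - reduced[0]) > 0


# Hypothetical monthly index values, for illustration only (not real BLS data).
cpi_monthly = [180.0, 180.4, 181.1, 181.0, 182.3, 183.0,
               183.2, 184.9, 185.1, 185.0, 186.2, 187.5]

# Drop three out of every four points by keeping every fourth one.
cpi_reduced = keep_every_nth(cpi_monthly, 4)

# Integrity checks: every retained value is untouched, and the overall
# direction of the reduced series agrees with the full series.
assert all(value in cpi_monthly for value in cpi_reduced)
assert direction_matches(cpi_monthly, cpi_reduced)

print(cpi_reduced)
```

If the data lives in a pandas Series instead of a list, `series.iloc[::4]` performs the same kind of positional slicing.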

The wages transformation was more intrusive because we moved from percentages to currency, so the risk of misrepresentation was greater. We’ll also need to consider how a percentage displays differently from an absolute value. Here’s the original data:

Note how there’s no consistent curve, either upwards or downwards. That’s because we’re measuring the growth rate as it took place within each quarter, not the growth itself. Now compare that with this line graph of the same wage data, converted to currency-based values:

The gentle curve you see makes sense: it’s about real growth, after all, not growth rates. It’s also possible to recognize a few spots where the curve steepens and others where it smooths out. But why are the slopes so smooth in comparison with the percentage-based data? Look at the Y-axis labels: the index graph is measured in points between 180 and 280, while the percentage graph goes from 0 to 3.5. In other words, the scale is different.
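To see why growth rates bounce around while the converted values form a smooth curve, it helps to work through the arithmetic. The sketch below assumes the conversion compounds each quarter's percentage change onto a starting value; the lesson does not spell out the exact method at this point, so treat that as an assumption. The rates, the base value of 100.0, and the function names are hypothetical.

```python
def rates_to_levels(base, pct_rates):
    """Compound percentage growth rates into a running level."""
    levels = [base]
    for rate in pct_rates:
        levels.append(levels[-1] * (1 + rate / 100))
    return levels


def levels_to_rates(levels):
    """Recover period-over-period percentage changes from levels."""
    return [(later / earlier - 1) * 100
            for earlier, later in zip(levels, levels[1:])]


# Hypothetical quarterly growth rates in percent (not real BLS data).
quarterly_rates = [0.5, 3.5, 1.0, 0.0, 2.0]
levels = rates_to_levels(100.0, quarterly_rates)

# Round-trip integrity check: converting the levels back to rates
# should reproduce the original percentages.
recovered = levels_to_rates(levels)
assert all(abs(a - b) < 1e-9 for a, b in zip(quarterly_rates, recovered))

# The rates rise and fall, but the levels never decrease here
# because none of the rates is negative.
assert levels == sorted(levels)

print([round(value, 2) for value in levels])
```

The round trip is the useful part: if the recovered rates differ from the source percentages, the conversion has distorted the data. The sorted check shows why the level plot looks smooth, since a swing from 0.5 to 3.5 percent is large on a 0 to 3.5 axis but moves the level by only a few points.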

We’ve produced a good match for our source data, but next we’ll explore this and other anomalies in our data.

1. What does CPI’s original data show?
