Data Quality Measurement

Data engineering is all about delivering high-quality data to the right people at the right time. High-quality data is essential for making accurate and reliable decisions. Poor data quality leads to poor business decisions, which in turn can mean lost revenue, lower customer satisfaction, higher costs, and a damaged reputation. So what does high-quality data mean, why is it important, and how do we evaluate and measure it?

What is bad data?

One effective approach to grasping the concept of quality data is to consider its opposite: what is bad data? Which types of data consistently get complaints from stakeholders? Let's look at a few real-life examples:

  • Data accuracy: When net revenue is computed, a specific cost type is overlooked, so the reported revenue figure is wrong.

  • Data freshness: Stakeholders run daily analyses that rely on the previous day's figures. If the numbers haven't been updated, they get frustrated and their decisions are delayed.

  • Breaking schema changes: A column or table is deleted or renamed, which breaks the scripts that data users depend on.

  • Column description: Column names or descriptions are too vague for users to understand what the columns mean.

  • Data duplication: A bug in the SQL logic produces duplicate rows.

  • Data availability: The table or the database is not reachable.
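Several of the problems in the list above can be caught with simple automated checks. The following is a minimal sketch in plain Python, not a definitive implementation: the function names, the order_id and net_revenue columns, the sample rows, and the one-day freshness threshold are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def find_duplicates(rows, key):
    """Return the set of key values that appear in more than one row."""
    seen, dupes = set(), set()
    for row in rows:
        value = row[key]
        if value in seen:
            dupes.add(value)
        seen.add(value)
    return dupes


def is_fresh(last_updated, now, max_age=timedelta(days=1)):
    """Return True if the data was updated within max_age of now."""
    return now - last_updated <= max_age


def missing_columns(expected, actual):
    """Return expected columns that are absent (deleted or renamed)."""
    return set(expected) - set(actual)


# Hypothetical sample data: order 2 was loaded twice.
rows = [
    {"order_id": 1, "net_revenue": 100.0},
    {"order_id": 2, "net_revenue": 80.0},
    {"order_id": 2, "net_revenue": 80.0},
]
now = datetime(2024, 1, 3, 9, 0)  # fixed timestamp so the example is repeatable

duplicate_ids = find_duplicates(rows, "order_id")
fresh = is_fresh(datetime(2024, 1, 2, 10, 0), now)
stale = is_fresh(datetime(2024, 1, 1, 8, 0), now)
missing = missing_columns(["order_id", "net_revenue", "currency"], rows[0].keys())
```

With this sample data, the duplication check flags order 2, the first timestamp counts as fresh while the second does not, and the schema check reports currency as missing. In practice, checks like these would typically run against the warehouse itself rather than in-memory rows.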

The list can go on and on, but we can summarize the above examples into the following dimensions of data quality:

Data quality dimensions

Data quality dimensions can be broadly grouped into two categories: business dimensions and technical dimensions.

Business dimensions
