Introduction to Data Bias

Explore the concept of data bias in machine learning and its impact on model performance. Learn to identify skewed data representation and feature selection bias, and understand how these biases increase risks and errors in ML pipelines.

Although data scientists do not directly observe or create many pre-pipeline biases, it’s still important to know where, and under what conditions, the data was sourced. In this lesson, we focus primarily on data bias.

Defined simply, data bias is a skew or tendency in the data that can lead a model to draw erroneous conclusions. In other words, it’s a property of a dataset that increases ML risk downstream in the pipeline. Data bias is a general phenomenon that doesn’t necessarily relate to discrimination, but some of the most widely reported cases involve improperly sourced datasets that produced discriminatory models.
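
One rough way to make the idea of a "skew" concrete is to compare how often each group appears in a dataset against the share we would expect it to have. The sketch below is only an illustration: the function name `representation_gap`, the toy labels, and the 50/50 reference shares are all hypothetical and are not taken from this lesson.

```python
from collections import Counter


def representation_gap(labels, reference_shares):
    """Return observed share minus expected share for each group.

    labels: one group label per row of the dataset.
    reference_shares: hypothetical expected share per group (sums to 1).
    """
    total = len(labels)
    if total == 0:
        raise ValueError("labels must not be empty")

    counts = Counter(labels)
    gaps = {}
    for group, expected in reference_shares.items():
        observed = counts.get(group, 0) / total
        # Positive gap: group is over-represented; negative: under-represented.
        gaps[group] = observed - expected
    return gaps


# Toy data (assumed for illustration): 8 rows from group "a", 2 from group "b".
labels = ["a"] * 8 + ["b"] * 2
reference = {"a": 0.5, "b": 0.5}

gaps = representation_gap(labels, reference)
for group, gap in gaps.items():
    print(f"{group}: {gap:+.2f}")
```

A large gap does not prove that a model trained on the data will be wrong, but it flags a property of the dataset that may be worth investigating before training.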

Misrepresentation in data

The most ...