Introduction to Data Bias

Explore the concept of data bias in machine learning and its impact on model performance. Learn to identify skewed data representation and feature selection bias, and understand how these biases increase risks and errors in ML pipelines.

We'll cover the following...

Misrepresentation in data
Feature selection bias

While many of the pre-pipeline biases are not directly observed or created by data scientists, it’s important to be conscious of where and under what conditions data is sourced. In this lesson, we focus primarily on data bias.

Defined simply, data bias is a skew or tendency in the data that leads a model to draw potentially erroneous conclusions. In other words, it’s a property of a dataset that greatly increases ML risk downstream in the pipeline. Data bias is a general phenomenon that doesn’t necessarily relate to discrimination, but some of the most widely reported cases of data bias come from improperly sourced datasets that lead to discriminatory models.
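To make the idea of a skew concrete, here is a minimal sketch of one possible check: comparing how often each group appears in a dataset against the share you would expect. The function name `representation_gap`, the toy labels, and the reference shares are all hypothetical and chosen only for illustration; they are not part of the lesson or of any library.

```python
from collections import Counter


def representation_gap(samples, reference_shares):
    """Return observed share minus expected share for each group.

    A positive value means the group is over-represented in `samples`
    relative to `reference_shares`; a negative value means it is
    under-represented.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    counts = Counter(samples)
    total = len(samples)
    return {
        group: counts.get(group, 0) / total - expected
        for group, expected in reference_shares.items()
    }


# Hypothetical toy data: group labels attached to 10 training records.
labels = ["a"] * 8 + ["b"] * 2

# Hypothetical reference shares, e.g. what the target population looks like.
reference = {"a": 0.5, "b": 0.5}

gaps = representation_gap(labels, reference)
for group, gap in gaps.items():
    print(f"group {group}: gap {gap:+.2f}")
```

A large gap does not prove that a model trained on the data will be wrong, but it flags where the dataset departs from the population the model is meant to serve, and so where the sourcing deserves a closer look.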

Misrepresentation in data

The most ...
