ML Bias in Models

Explore how bias arises in machine learning models and how it can be measured.

Machine learning models can unintentionally reinforce societal biases if fairness isn't measured. In this lesson, we'll quantify bias using fairness metrics and explore how they apply to important domains like finance and criminal justice. Let’s get started.

Demographic parity analysis

Suppose you have developed a machine learning classifier for loan approval with features including income, credit score, age, and race. Calculate the demographic parity difference for two protected groups (e.g., male and female applicants) if the overall approval rate is 60%, the approval rate for male applicants is 45%, and the approval rate for female applicants is 70%. Discuss how this metric reveals potential bias in the model and suggest potential mitigation strategies.

import numpy as np
from scipy.stats import chi2_contingency

def demographic_parity_analysis(
    total_approval_rate=0.60,
    male_approval_rate=0.45,
    female_approval_rate=0.70
):
    # TODO - your implementation
    return {
        "demographic_parity_difference": ...,
        "chi2_statistic": ...,
        "p_value": ...,
        "bias_threshold": ...,
        "is_biased": ...
    }

outputs = demographic_parity_analysis()
print(outputs)

Sample answer

Demographic parity difference is a fairness metric used to evaluate whether a machine learning model's predictions are equally distributed across different demographic groups. It measures the difference in the positive prediction rates (e.g., approval rates, acceptance rates) between these groups. The goal is to ensure that the model does not favor one group over another. Let's look at a code snippet to implement this:

import numpy as np
from scipy.stats import chi2_contingency

def demographic_parity_analysis(
    total_approval_rate=0.60,
    male_approval_rate=0.45,
    female_approval_rate=0.70
):
    # Demographic parity difference: the absolute gap in
    # approval rates between the two groups
    demographic_parity_diff = abs(male_approval_rate - female_approval_rate)

    # Simulate approval counts for a statistical significance test
    # (fixed seed so the result is reproducible)
    rng = np.random.default_rng(42)
    total_sample_size = 1000
    female_sample = rng.binomial(total_sample_size, female_approval_rate)
    male_sample = rng.binomial(total_sample_size, male_approval_rate)

    # Chi-square test of independence on approved/denied counts
    contingency_table = np.array([
        [female_sample, total_sample_size - female_sample],
        [male_sample, total_sample_size - male_sample]
    ])
    chi2, p_value = chi2_contingency(contingency_table)[:2]

    return {
        "demographic_parity_difference": demographic_parity_diff,
        "chi2_statistic": chi2,
        "p_value": p_value,
        "bias_threshold": 0.2,
        "is_biased": demographic_parity_diff > 0.2
    }

outputs = demographic_parity_analysis()
print(outputs)

The demographic_parity_analysis function analyzes potential bias in the model's predictions by computing the demographic parity difference and testing the statistical significance of the disparity between the groups (male vs. female).

  1. Demographic parity difference:

    1. It calculates the absolute difference in approval rates between male and female groups.

    2. It measures the disparity in outcomes between the groups. A higher value indicates a greater disparity.

  2. Simulated data:

    1. It simulates the sample data using a binomial distribution.

    2. This represents the number of approvals for each group based on their respective approval rates.

  3. Chi-square test: ...
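With the rates from the prompt, the parity check can be verified by hand: |0.45 − 0.70| = 0.25, which exceeds the 0.2 threshold, so the model is flagged as biased. The sketch below repeats the analysis with deterministic approval counts instead of binomial sampling (an assumption made here purely so the numbers are reproducible; 1,000 applicants per group matches the sample answer's simulated sample size):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rates from the prompt; group size is an illustrative assumption
male_rate, female_rate, n = 0.45, 0.70, 1000
bias_threshold = 0.2

# Demographic parity difference: absolute gap in approval rates
parity_diff = abs(male_rate - female_rate)

# Deterministic approved/denied counts per group
female_approved = round(female_rate * n)
male_approved = round(male_rate * n)
contingency = np.array([
    [female_approved, n - female_approved],
    [male_approved, n - male_approved],
])

# Chi-square test of independence on the 2x2 table
chi2, p_value = chi2_contingency(contingency)[:2]

print(f"parity difference: {parity_diff:.2f}")   # 0.25
print(f"biased: {parity_diff > bias_threshold}")  # True
print(f"p-value: {p_value:.3g}")  # well below 0.05: disparity is significant
```

A 0.25 gap on a sample of this size yields a vanishingly small p-value, so both the parity threshold and the significance test agree that the disparity is unlikely to be due to chance.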