ML Bias in Models
Explore how bias arises in machine learning models and how to measure it.
Machine learning models can unintentionally reinforce societal biases if fairness isn't measured. In this lesson, we'll quantify bias using fairness metrics and explore how they apply to important domains like finance and criminal justice. Let’s get started.
Demographic parity analysis
Suppose you have developed a machine learning classifier for loan approval that uses features including income, credit score, age, and race. The overall approval rate is 60%, the approval rate for female applicants is 45%, and the approval rate for male applicants is 70%. Calculate the demographic parity difference between these two groups, which are defined by a protected attribute (sex). Discuss how this metric reveals potential bias in the model and suggest mitigation strategies.
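The overall approval rate does not enter the demographic parity difference, which depends only on the two group rates. It is still a useful consistency check on the problem's numbers. Assuming every applicant belongs to one of the two groups, let w be the share of applicants in the group approved at 45%. By the law of total probability:

```latex
\begin{aligned}
0.60 &= 0.45\,w + 0.70\,(1 - w) \\
0.60 &= 0.70 - 0.25\,w \\
w &= \frac{0.70 - 0.60}{0.25} = 0.4
\end{aligned}
```

So the stated rates imply that 40% of applicants belong to the lower-approval group and 60% to the higher-approval group.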
```python
import numpy as np
from scipy.stats import chi2_contingency


def demographic_parity_analysis(total_approval_rate=0.60,
                                female_approval_rate=0.45,
                                male_approval_rate=0.70):
    # TODO - your implementation
    return {
        "demographic_parity_difference": ...,
        "chi2_statistic": ...,
        "p_value": ...,
        "bias_threshold": ...,
        "is_biased": ...,
    }


outputs = demographic_parity_analysis()
print(outputs)
```
Sample answer
Demographic parity difference is a fairness metric that evaluates whether a machine learning model makes positive predictions at the same rate for different demographic groups. It measures the difference in positive prediction rates (e.g., approval or acceptance rates) between these groups. The goal is to ensure that the model does not favor one group over another. Let's look at a code snippet that implements this:
```python
import numpy as np
from scipy.stats import chi2_contingency


def demographic_parity_analysis(total_approval_rate=0.60,
                                female_approval_rate=0.45,
                                male_approval_rate=0.70):
    # total_approval_rate is not needed for the parity difference,
    # which depends only on the two group rates.

    # Demographic parity difference: absolute gap between group approval rates
    demographic_parity_diff = abs(male_approval_rate - female_approval_rate)

    # Statistical significance test on simulated data:
    # number of approvals among 1,000 simulated applicants per group
    total_sample_size = 1000
    female_sample = np.random.binomial(total_sample_size, female_approval_rate)
    male_sample = np.random.binomial(total_sample_size, male_approval_rate)

    # Chi-square test for independence on a 2x2 table
    # (rows: female, male; columns: approved, rejected)
    contingency_table = np.array([
        [female_sample, total_sample_size - female_sample],
        [male_sample, total_sample_size - male_sample],
    ])
    chi2, p_value, _, _ = chi2_contingency(contingency_table)

    bias_threshold = 0.2
    return {
        "demographic_parity_difference": demographic_parity_diff,
        "chi2_statistic": chi2,
        "p_value": p_value,
        "bias_threshold": bias_threshold,
        "is_biased": demographic_parity_diff > bias_threshold,
    }


outputs = demographic_parity_analysis()
print(outputs)
```
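The sample answer starts from approval rates that are already known. With a trained classifier, those rates would normally be computed from the model's decisions. The sketch below assumes binary predictions (1 = approved) and one group label per applicant; the helper name demographic_parity_from_predictions and the synthetic data are hypothetical, chosen to reproduce the lesson's 45% and 70% rates. Unlike the sample answer, it runs the chi-square test on the observed counts instead of re-simulating them, so its output is deterministic.

```python
import numpy as np
from scipy.stats import chi2_contingency


def demographic_parity_from_predictions(y_pred, group, group_a, group_b,
                                        bias_threshold=0.2):
    """Hypothetical helper: demographic parity analysis from model decisions.

    y_pred: array of 0/1 decisions (1 = approved), one per applicant.
    group:  array of group labels, aligned with y_pred.
    """
    y_pred = np.asarray(y_pred)
    group = np.asarray(group)

    rates = {}
    table = []  # rows: groups; columns: [approved, rejected]
    for label in (group_a, group_b):
        mask = group == label
        total = int(mask.sum())
        if total == 0:
            raise ValueError(f"no applicants with group label {label!r}")
        approved = int(y_pred[mask].sum())
        rates[label] = approved / total
        table.append([approved, total - approved])

    dp_diff = abs(rates[group_a] - rates[group_b])

    # Chi-square test of independence between group and decision.
    # SciPy applies Yates' continuity correction to 2x2 tables by default.
    chi2, p_value, _, _ = chi2_contingency(np.array(table))

    return {
        "approval_rates": rates,
        "demographic_parity_difference": dp_diff,
        "chi2_statistic": float(chi2),
        "p_value": float(p_value),
        "bias_threshold": bias_threshold,
        "is_biased": dp_diff > bias_threshold,
    }


# Synthetic decisions (an assumption for illustration): 1,000 applicants
# per group, 450 female approvals and 700 male approvals.
y_pred = np.array([1] * 450 + [0] * 550 + [1] * 700 + [0] * 300)
group = np.array(["female"] * 1000 + ["male"] * 1000)

result = demographic_parity_from_predictions(y_pred, group, "female", "male")
print(result)
```

With these counts the corrected chi-square statistic is roughly 127 and the p-value is far below 0.05, so a 25-point gap at this sample size is very unlikely to be a sampling artifact.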
The function demographic_parity_analysis analyzes potential bias in model predictions by calculating the demographic parity difference and by testing whether the disparity between the groups (male vs. female) is statistically significant.
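Writing A for the protected attribute and Ŷ = 1 for an approval, the metric the function computes with its default rates (female_approval_rate=0.45, male_approval_rate=0.70) works out as:

```latex
\begin{aligned}
\Delta_{\mathrm{DP}} &= \left| P(\hat{Y}=1 \mid A=\text{male}) - P(\hat{Y}=1 \mid A=\text{female}) \right| \\
&= |0.70 - 0.45| = 0.25 > 0.2
\end{aligned}
```

Because 0.25 exceeds the 0.2 threshold used in the code, is_biased evaluates to True. In floating point the subtraction returns 0.24999999999999994 rather than exactly 0.25, which does not change the comparison.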
Demographic parity difference:
It calculates the absolute difference between the male and female approval rates.
It measures the disparity in outcomes between the groups: 0 means both groups are approved at the same rate, and higher values indicate greater disparity.
Simulated data:
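It simulates 1,000 applicants per group and draws each group's number of approvals from a binomial distribution with that group's approval rate.
Because these counts are random, the chi-square statistic and p-value change from run to run unless the random generator is seeded; the demographic parity difference itself is computed directly from the input rates.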
Chi-square test: ...