Inspecting Opaque Models with LIME and SHAP

Explore how to use LIME and SHAP to interpret complex AI models that are opaque by nature. This lesson guides you through local explanations of specific predictions with LIME and global feature importance analyses with SHAP. You'll understand how these model-agnostic tools help detect bias, ensure fairness, and support accountability in high-stakes AI applications. Gain practical experience auditing models for transparency and fairness, preparing you to document findings for AI safety cases.

Complex models like deep neural networks (DNNs) and large language models (LLMs) operate as opaque models: their decisions emerge from billions of intertwined parameters, and their internal reasoning is hidden even from the engineers who built them.

Why is model opaqueness a safety problem?

  • Debugging malfunctions: When a model makes a mistake, the inability to trace why the error occurred prevents engineers from writing a fix and improving reliability.

  • Auditing for fairness: Algorithmic bias is a form of unintentional harm (a malfunction). We can’t fix unintentional bias if we can’t find its source. If a model denies a loan, we must verify whether a sensitive feature contributed the most to the negative decision.

  • Building trust and accountability: Stakeholders (regulators, users) cannot trust a decision they do not understand. In high-stakes domains (finance, healthcare), Explainability (XAI) is essential for establishing accountability and legal compliance.

The solution: Model-agnostic explainability (XAI)

The goal is to use model-agnostic tools to treat the complex model as a true opaque model: we only feed it inputs and observe its outputs. By repeatedly probing the inputs, these tools approximate the model’s behavior and generate simple, human-understandable explanations.

We will implement the two industry-standard methods:

  1. LIME (Local Interpretable Model-agnostic Explanations): The tool for local explanations, focusing on why a single prediction was made.

  2. SHAP (SHapley Additive exPlanations): The tool for global explanations, focusing on which features matter most across the entire dataset.

LIME for local accountability

LIME is our primary tool for achieving accountability by asking: "Why did the model make this specific decision for this person?"

The core idea is to explain a model’s single prediction by building a simple, understandable linear model that approximates the complex model’s behavior only in the area immediately surrounding that one prediction.
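
To make the idea concrete, here is a toy sketch of the local-surrogate recipe. This is not the lime library's actual implementation; the function name, noise scale, and kernel width are illustrative choices. The steps are: perturb a single instance, query the opaque model on the perturbed copies, weight each copy by its proximity to the original, and fit a simple weighted linear model on that neighborhood.

import numpy as np
from sklearn.linear_model import Ridge

def toy_local_surrogate(predict_fn, instance, n_samples=1000, noise_scale=0.1, kernel_width=0.75):
    rng = np.random.default_rng(0)
    # 1. Perturb the instance with Gaussian noise (scale the noise to your feature ranges)
    perturbed = instance + rng.normal(0.0, noise_scale, size=(n_samples, instance.shape[0]))
    # 2. Query the opaque model on the perturbed neighborhood
    preds = predict_fn(perturbed)
    # 3. Weight each perturbed sample by its closeness to the original instance
    distances = np.linalg.norm(perturbed - instance, axis=1)
    weights = np.exp(-(distances ** 2) / (kernel_width ** 2))
    # 4. Fit an interpretable, locally weighted linear model
    surrogate = Ridge(alpha=1.0).fit(perturbed, preds, sample_weight=weights)
    return surrogate.coef_  # local feature attributions for this one prediction

The real LIME implementation adds sampling in an interpretable feature space and feature selection on top of this, but the weighted-local-fit intuition is the same.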

Step 1: Setup and training the opaque model

Before we can explain a model, we need a complex model to explain. We use an XGBoost Regressor trained on tabular data to simulate a high-stakes prediction scenario, such as credit risk or loan approval.

Instead of loading a large real-world dataset such as the Ames Housing Dataset (https://www.kaggle.com/datasets/shashanknecrothapa/ames-housing-dataset), we construct a small, synthetic dataset that mimics key features found in housing or loan-risk models. This keeps the code executable everywhere. The features are intentionally designed to demonstrate how a sensitive attribute can drive a decision:

  • Financial/quality drivers: OverallQual, GrLivArea, GarageArea (Factors expected to drive price).

  • Proxy sensitive attribute: CreditScore_Low (A binary feature, 1 for low score, 0 for high, designed to negatively impact the price).

  • Target (SalePrice): The price is calculated to be positively correlated with quality and area, but negatively correlated with CreditScore_Low.
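
The snippets that follow assume the usual imports. If you are running them yourself, something like the following setup should suffice (your environment's setup cell may differ):

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor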

Python 3.10.4
np.random.seed(42)
data = pd.DataFrame({
    'OverallQual': np.random.randint(4, 10, 100),
    'GrLivArea': np.random.randint(1000, 3000, 100),
    'GarageArea': np.random.randint(0, 800, 100),
    'Neighborhood_East': np.random.randint(0, 2, 100),
    'Neighborhood_West': np.random.randint(0, 2, 100),
    'CreditScore_Low': np.random.randint(0, 2, 100),
})
# Target is negatively correlated with CreditScore_Low
data['SalePrice'] = (
    50000 +
    data['OverallQual'] * 15000 +
    data['GrLivArea'] * 50 +
    data['GarageArea'] * 50 -
    data['CreditScore_Low'] * 30000
)
  • Line 1: We set the random seed to ensure reproducibility. This guarantees that every student gets the exact same random numbers, so their LIME and SHAP plots will look identical to the course examples.

  • Lines 2–9: Feature engineering. We create a dataframe with 100 synthetic houses.

    • OverallQual, GrLivArea, GarageArea: Standard value drivers (quality and size) that should increase the price.

    • CreditScore_Low: The sensitive attribute. We create a binary flag (0 or 1) representing a high-risk borrower. This will be the source of the bias we want to detect.

  • Lines 11–17: The biased formula. We manually define the SalePrice logic.

    • Positive drivers: We add value for quality (* 15000) and size (* 50).

    • The bias: We explicitly subtract $30,000 if CreditScore_Low is 1. This hard-codes a penalty into the data itself. When we train the opaque model on this data later, it will learn this bias, giving LIME and SHAP something specific to discover (see the quick check after this list).
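
As a quick, optional sanity check, you can confirm that the hard-coded penalty shows up in the synthetic data by comparing the average price of the two credit groups:

# The gap between the two group means should be roughly the $30,000 penalty,
# plus noise from the other randomly generated features.
print(data.groupby('CreditScore_Low')['SalePrice'].mean())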

We train our opaque model (XGBoost) on the above synthetic data.

features = [
    'OverallQual', 'GrLivArea', 'GarageArea',
    'Neighborhood_East', 'Neighborhood_West', 'CreditScore_Low'
]
target = 'SalePrice'
X = data[features]
y = data[target]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train_np = X_train.values
xgb_model = XGBRegressor(n_estimators=100, random_state=42, eval_metric='rmse')
xgb_model.fit(X_train_np, y_train)
Dataset splitting and model training
  • Lines 1–7: Data preparation. We define our features list (the inputs) and the target ("SalePrice"). We then separate the dataset into X (predictors) and y (the answer key).

  • Lines 9–10: Train/test split.

    • train_test_split: We hold back 20% of the data (test_size=0.2) to evaluate the model later. This simulates unseen data.

    • X_train.values: Critical step. We convert the training data from a pandas DataFrame to a standard NumPy array. LIME's tabular explainer (LimeTabularExplainer) works with plain NumPy arrays rather than DataFrames, so we keep this converted array to build the explainer.
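
With the model trained and the NumPy training array in hand, the explainer can be constructed roughly as follows. This is a minimal sketch: the parameter values and the test instance chosen here are illustrative, and the lesson's exact explainer setup may differ.

from lime.lime_tabular import LimeTabularExplainer

explainer = LimeTabularExplainer(
    training_data=X_train_np,    # NumPy array, as prepared above
    feature_names=features,
    mode='regression'
)
exp = explainer.explain_instance(
    X_test.values[0],            # one specific house / applicant
    xgb_model.predict,           # the opaque model's prediction function
    num_features=6
)
print(exp.as_list())             # (feature, weight) pairs for this single prediction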