
Counterfactual Explanations

Explore counterfactual explanations in image classification to understand how minimal changes to an image can alter AI model predictions. Learn to generate and interpret these explanations for neural networks; such explanations highlight the features that drive decisions and provide actionable insights beyond traditional saliency maps.

What are counterfactual explanations?

A counterfactual explanation reveals what would have needed to be different in an instance to observe a different outcome. Counterfactual explanations describe a particular situation using the following template: “If X had not occurred, Y would not have occurred.”

For example, consider a machine learning model that approves or rejects a loan proposal based on an applicant’s income. Let’s say the loan was approved for an applicant with an annual income of $50,000 and a good credit history.

A counterfactual explanation of this model decision would look like this: “If the applicant’s annual income had been $45,000, their loan would have been denied.” This tells us that an annual income above $45,000 is necessary for the loan to be approved (even when the applicant has a good credit history), explaining why the loan was approved.
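To make this concrete, here is a minimal sketch in Python. The model, threshold, and feature values are illustrative assumptions for the loan example above, not part of any real system:

```python
def loan_model(annual_income: float, good_credit: bool) -> str:
    """Hypothetical loan classifier: approve when the applicant has a good
    credit history and an annual income above $45,000."""
    return "approved" if good_credit and annual_income > 45_000 else "denied"

# Original instance: the applicant from the example above.
print(loan_model(50_000, good_credit=True))  # approved

# Counterfactual instance: lowering the income to $45,000 flips the outcome,
# revealing that income above $45,000 is necessary for approval.
print(loan_model(45_000, good_credit=True))  # denied
```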

Counterfactual explanations

In the case of images, a counterfactual explanation is therefore an image that closely resembles the original but receives a different prediction. Analyzing these counterfactuals helps us understand which pixels or regions of the image are important for the prediction. The figure below illustrates a counterfactual explanation for the digit 4.

Counterfactual explanation of the digit 4

Technically, a counterfactual explanation of a prediction describes the smallest change to the feature values that changes the prediction to a predefined output. In other words, given an image X and its actual prediction f(X), a counterfactual explanation X′ is an image that differs minimally from X while producing a predefined prediction f(X′) ≠ f(X).
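As a rough illustration of how such an X′ can be found in practice, the sketch below performs a gradient-based search in the spirit of Wachter et al. (2017): it optimizes a candidate image to minimize a classification loss toward a target class plus a distance penalty that keeps the candidate close to the original. It assumes a PyTorch image classifier `model` that maps a batched image tensor to class logits; the hyperparameters `lam`, `steps`, and `lr`, and the L1 distance term, are illustrative choices:

```python
import torch
import torch.nn.functional as F

def generate_counterfactual(model, x, target_class, lam=0.1, steps=500, lr=0.01):
    """Search for a counterfactual X' near x that the model assigns to
    target_class, by minimizing
        cross_entropy(f(X'), target_class) + lam * ||X' - x||_1.
    """
    x_cf = x.clone().detach().requires_grad_(True)
    target = torch.tensor([target_class])
    optimizer = torch.optim.Adam([x_cf], lr=lr)

    for _ in range(steps):
        optimizer.zero_grad()
        pred_loss = F.cross_entropy(model(x_cf), target)  # push toward target class
        dist_loss = (x_cf - x).abs().sum()                # stay close to the original
        (pred_loss + lam * dist_loss).backward()
        optimizer.step()
        with torch.no_grad():
            x_cf.clamp_(0.0, 1.0)                         # keep pixel values valid
    return x_cf.detach()

# Hypothetical usage with an MNIST-style classifier, where x has shape (1, 1, 28, 28):
# x_cf = generate_counterfactual(mnist_model, x, target_class=9)
```

Visualizing the difference X′ − X then highlights which pixels or regions drive the model’s prediction, as in the digit-4 figure above.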