Counterfactual Explanations
Learn about counterfactual explanations that generate images similar to the original image but with an altered prediction.
What are counterfactual explanations?
A counterfactual explanation reveals what would have had to be different about an instance for the model to produce a different outcome. Counterfactual explanations follow this template: “If X had not occurred, Y would not have occurred.”
For example, consider a machine learning model that approves or rejects a loan application based on the applicant’s income and credit history. Let’s say the loan was approved for an applicant with an annual income of $50,000 and a good credit history.
A counterfactual explanation for this decision would look like this: “If the applicant’s annual income had been below $45,000, their loan would have been denied.” This tells us that an annual income of at least $45,000 is necessary for the loan to be approved (even when the applicant has a good credit history), which explains why the loan was approved.
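The loan example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `approve_loan` is a hypothetical stand-in for the model (a fixed rule with the $45,000 threshold from the example), and `income_counterfactual` is a hypothetical helper that lowers income in $1,000 steps until the decision flips. Real counterfactual methods search over many features with an optimizer rather than a one-dimensional scan.

```python
from typing import Optional


def approve_loan(income: float, good_credit: bool) -> bool:
    """Hypothetical toy model: approve only if credit is good and income >= $45,000."""
    return good_credit and income >= 45_000


def income_counterfactual(
    income: float, good_credit: bool, step: float = 1_000
) -> Optional[float]:
    """Lower income step by step and return the first value that flips the decision.

    Returns None if no income between 0 and the original value changes the outcome.
    """
    original = approve_loan(income, good_credit)
    candidate = income - step
    while candidate >= 0:
        if approve_loan(candidate, good_credit) != original:
            return candidate  # closest tested income with a different outcome
        candidate -= step
    return None


if __name__ == "__main__":
    # Applicant from the example: $50,000 income, good credit history.
    print(approve_loan(50_000, True))
    print(income_counterfactual(50_000, True))
```

With the assumed rule, the scan stops at the first tested income under the threshold, which mirrors the statement "if the income had been below $45,000, the loan would have been denied." The step size controls how closely the returned value approximates the true decision boundary.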
In the case of images, counterfactual explanations are images similar to the original image but with an altered prediction. Analyzing these counterfactuals helps us understand which pixels or regions of the image are important for the prediction. The figure below illustrates a counterfactual explanation for the digit 4.
Technically, a counterfactual explanation of a prediction describes the smallest change to the feature values that changes the prediction to a predefined output. In other words, given an image