Concept Explanations

Explore Testing with Concept Activation Vectors (TCAV) to understand how neural networks use human-defined concepts for predictions. This lesson demonstrates how TCAV quantifies conceptual sensitivity, helping you interpret image classification models in a way similar to human reasoning. You will learn the implementation and evaluation of TCAV to identify important features influencing model decisions.

We'll cover the following...

- Testing with Concept Activation Vectors (TCAVs)
  - Concept Activation Vectors (CAVs)
  - Conceptual sensitivity
- Implementation
- Concept explanations vs. saliency maps

Testing with Concept Activation Vectors (TCAVs)

Testing with Concept Activation Vectors (TCAV) is an interpretability method for understanding which signals our neural network models use for prediction. It measures how important a high-level concept (e.g., color, gender, race) is to a prediction class, which mirrors the way humans explain decisions to each other.

Typical interpretability methods, such as saliency maps, CAMs, and counterfactuals, explain one particular image at a time. TCAV instead explains whether an entire class of examples is sensitive to a given human-defined concept. For example, TCAV for the class zebra measures how sensitive the zebra prediction is to the presence of stripes in the input image.

Concept Activation Vectors (CAVs)

TCAV uses directional derivatives (the derivative of a function along a particular direction) to quantify the degree to which a user-defined concept, such as stripes, is important to a classification result, such as zebra. The algorithm derives a Concept Activation Vector (CAV) by training a linear classifier to separate examples belonging to the concept from random counterexamples.
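To make the two ideas above concrete, here is a minimal sketch in plain Python. It assumes, as in the original TCAV formulation, that the linear classifier is trained on a layer's activations and that the CAV is the unit vector normal to its decision boundary. Everything else is hypothetical and only for illustration: the synthetic 4-dimensional "activations," the helper names (`train_linear_classifier`, `directional_derivative`), and the `toy_class_logit` function that stands in for the part of a real network mapping activations to a class score.

```python
import math
import random


def train_linear_classifier(pos, neg, lr=0.1, epochs=200):
    """Fit logistic regression by batch gradient descent; return (weights, bias)."""
    data = [(x, 1.0) for x in pos] + [(x, 0.0) for x in neg]
    dim = len(pos[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - y  # predicted probability minus label
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b


def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(vi * vi for vi in v))
    return [vi / norm for vi in v]


def directional_derivative(f, a, v, eps=1e-4):
    """Central finite-difference estimate of the derivative of f at a along v."""
    plus = [ai + eps * vi for ai, vi in zip(a, v)]
    minus = [ai - eps * vi for ai, vi in zip(a, v)]
    return (f(plus) - f(minus)) / (2 * eps)


rng = random.Random(0)
DIM = 4


def sample(mean_first):
    """One synthetic activation vector; only the first coordinate's mean varies."""
    return [rng.gauss(mean_first, 0.5)] + [rng.gauss(0.0, 0.5) for _ in range(DIM - 1)]


# Assumption: concept examples (e.g., striped images) shift activations along axis 0.
concept_acts = [sample(2.0) for _ in range(50)]
random_acts = [sample(0.0) for _ in range(50)]

# The CAV is the normalized weight vector of the concept-vs-random classifier.
weights, _ = train_linear_classifier(concept_acts, random_acts)
cav = unit(weights)


def toy_class_logit(a):
    """Hypothetical stand-in for the network's activation-to-logit mapping."""
    return 3.0 * a[0] - 1.0 * a[1]


# Sensitivity of the toy class score to the concept at one activation point.
sensitivity = directional_derivative(toy_class_logit, [0.5, 0.5, 0.5, 0.5], cav)
print("CAV:", [round(c, 3) for c in cav])
print("Directional derivative along CAV:", round(sensitivity, 3))
```

A positive value means that nudging the activations in the concept direction raises the class score. In a real model, we would take the activations from a chosen layer and obtain the gradient through automatic differentiation rather than finite differences.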

Mathematically, given a human-defined concept $C$, such as striped textures, TCAV receives a set of positive examples $P_C = \{X_1, X_2, \dots, X_N\}$ (e.g., photos of striped objects) and negative examples $N_C = \{X_1, X_2, \dots, X_M\}$ ...
