Concept Explanations
Learn to quantify the importance of high-level concepts via Testing with Concept Activation Vectors (TCAV).
Testing with Concept Activation Vectors (TCAV)
Testing with Concept Activation Vectors (TCAV) is an interpretability method for understanding which signals a neural network uses for prediction. It measures the importance of high-level concepts (e.g., color, gender, race) for a prediction class, which is the level at which humans naturally communicate explanations.
Typical interpretability methods, such as saliency maps, CAMs, and counterfactuals, explain one particular image at a time. TCAV instead explains whether a whole class of examples is sensitive to a given human-defined concept. For example, TCAV for the class zebra measures how sensitive the zebra prediction is to the presence of stripes in the input image.
Concept Activation Vectors (CAVs)
TCAV uses directional derivatives (derivatives taken in a particular direction) to quantify how important a user-defined concept, such as stripes, is to a classification result, such as zebra. The algorithm derives Concept Activation Vectors (CAVs) by training a linear classifier to separate examples belonging to a concept from random counterexamples.
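Mathematically, given a human-defined concept, TCAV takes a set of examples of that concept and a set of random counterexamples and records the activations each one produces at a chosen layer of the network. It then fits a linear classifier that separates the two sets of activations; the CAV is the vector normal to the resulting decision boundary, pointing toward the concept examples.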
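To make the procedure above concrete, here is a minimal sketch in plain Python. Everything in it is an illustrative assumption rather than part of the lesson: the activations are synthetic Gaussian vectors instead of real layer outputs, `class_logit` is a made-up stand-in for the part of the network that maps activations to the class score, and the directional derivative is estimated with finite differences rather than the backpropagated gradient a real TCAV implementation would use. The final score is the fraction of class examples with a positive directional derivative along the CAV.

```python
import math
import random

random.seed(0)
DIM = 4  # size of the toy "layer activation" vectors (assumption)

def sample(mean, n):
    """Draw n Gaussian vectors around `mean` (synthetic stand-in for activations)."""
    return [[random.gauss(m, 1.0) for m in mean] for _ in range(n)]

# Hypothetical data: concept examples are shifted along axis 0, random ones are not.
concept_acts = sample([3.0, 0.0, 0.0, 0.0], 200)
random_acts = sample([0.0, 0.0, 0.0, 0.0], 200)

def fit_linear_classifier(pos, neg, lr=0.1, epochs=200):
    """Logistic regression trained by batch gradient descent; returns (weights, bias)."""
    data = [(x, 1.0) for x in pos] + [(x, 0.0) for x in neg]
    w, b = [0.0] * DIM, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * DIM, 0.0
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - y  # predicted probability minus label
            for i in range(DIM):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b

# The CAV is the unit normal of the decision boundary, pointing toward the concept.
weights, _ = fit_linear_classifier(concept_acts, random_acts)
norm = math.sqrt(sum(wi * wi for wi in weights))
cav = [wi / norm for wi in weights]

def class_logit(a):
    """Toy stand-in for the network head that maps activations to a class logit."""
    return math.log1p(math.exp(a[0])) - 0.1 * a[1] ** 2

def directional_derivative(h, a, v, eps=1e-4):
    """Central finite-difference estimate of the derivative of h at a along v."""
    plus = [ai + eps * vi for ai, vi in zip(a, v)]
    minus = [ai - eps * vi for ai, vi in zip(a, v)]
    return (h(plus) - h(minus)) / (2 * eps)

# Hypothetical activations of inputs from the class being explained (e.g., zebra).
class_acts = sample([2.0, 0.0, 0.0, 0.0], 100)
sensitivities = [directional_derivative(class_logit, a, cav) for a in class_acts]

# Fraction of class examples whose logit increases when moving along the CAV.
tcav_score = sum(s > 0 for s in sensitivities) / len(sensitivities)
print(f"CAV: {[round(c, 2) for c in cav]}  TCAV score: {tcav_score:.2f}")
```

Because the toy logit grows with the first activation coordinate and the concept examples are shifted along that same coordinate, the learned CAV points mostly along axis 0 and the score comes out close to 1. With a real model, the activations would come from a chosen layer and the gradient from automatic differentiation.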