Gini Impurity

Learn the math used by CART classification trees to define purity vs. impurity.

Impurity intuition

Like all machine learning algorithms, CART classification trees use math to learn from data. Before looking at the calculations used by CART classification trees, it’s helpful to understand the mathematics intuitively.

To keep things simple, consider the Adult Census Income dataset. It poses a classification problem with two possible label values: <=50K and >50K. A problem with exactly two label values is known as binary classification.

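CART classification trees attempt to split the data into groups whose labels are as pure as possible. Purity and impurity form a spectrum: a group in which every row has the same label is perfectly pure, while a group split evenly between <=50K and >50K is as impure as a two-label group can be.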
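To put rough numbers on that spectrum, here is a minimal Python sketch that scores a few hand-made groups of labels with the standard Gini impurity formula, 1 minus the sum of squared label proportions. The function name `gini_impurity` and the example groups are made up for illustration; they are not taken from the Adult Census Income data or from any particular library.

```python
from collections import Counter


def gini_impurity(labels):
    """Return 1 - sum(p_k ** 2), where p_k is the share of each label."""
    n = len(labels)
    if n == 0:
        # Convention assumed here: an empty group counts as pure.
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


# Hypothetical groups of ten rows each, ordered from pure to impure.
groups = {
    "pure": ["<=50K"] * 10,
    "mostly pure": ["<=50K"] * 9 + [">50K"] * 1,
    "mixed": ["<=50K"] * 7 + [">50K"] * 3,
    "evenly split": ["<=50K"] * 5 + [">50K"] * 5,
}

for name, labels in groups.items():
    print(f"{name:>12}: {gini_impurity(labels):.2f}")
```

Under these assumptions the scores rise from 0.00 for the pure group, through 0.18 and 0.42, to 0.50 for the evenly split group, which is the highest value possible with two labels.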
