Engineering Features for Decision Trees

Explore how to engineer optimal features for decision tree algorithms by combining domain expertise with technical insights. Learn to avoid overfitting by managing categorical feature cardinality, to create new numeric features, and to use data visualization to vet feature quality. This lesson covers practical methods for improving decision boundaries through iterative feature engineering, enhancing your ability to build robust decision tree models.

The best features for decision trees

Feature engineering is an iterative, creative process. The best features result from combining business domain knowledge with technical knowledge of the decision tree algorithm. From the algorithmic perspective, the following are critical for engineering the best decision tree features:

  • The best categorical features produce the data’s purest splits (i.e., decision boundaries). This is especially true when multiple categorical features are used simultaneously.

  • Avoid categorical features with many categories (levels) — for example, over 30. The algorithm tends to prefer such features because they can carve the training data into many small, pure groups, which often leads to overfitting.

  • A particular class of many-level categorical features contains “unique-like” data. Examples include database ID ...
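
To make split purity and the many-level trap concrete, here is a small, library-free sketch (the data and feature names are illustrative, not from the lesson). It measures candidate categorical splits with Gini impurity: a sensible low-cardinality feature lowers impurity modestly, while a unique-like, ID-style feature drives training impurity to zero — a “perfect” split that cannot generalize.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k^2)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def weighted_gini_after_split(feature, labels):
    """Weighted average Gini impurity over the groups a categorical feature induces."""
    groups = {}
    for level, label in zip(feature, labels):
        groups.setdefault(level, []).append(label)
    n = len(labels)
    return sum(len(g) / n * gini(g) for g in groups.values())

# Toy churn-style target (illustrative only)
labels = ["yes", "yes", "no", "no", "yes", "no"]

# A low-cardinality feature: two levels, each leaning toward one class
plan = ["basic", "basic", "basic", "premium", "premium", "premium"]

# A unique-like, many-level feature (e.g., a database ID): one row per level
row_id = ["a1", "b2", "c3", "d4", "e5", "f6"]

print(gini(labels))                               # 0.5  (impurity before any split)
print(weighted_gini_after_split(plan, labels))    # ~0.444 (modest, honest improvement)
print(weighted_gini_after_split(row_id, labels))  # 0.0  (memorizes the training rows)
```

Because the tree-growing algorithm greedily minimizes impurity, it will always favor the `row_id`-style split here — which is exactly why unique-like features should be dropped before training.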