Random Forests

This lesson will focus on training random forest models in Python.

Random Forests

Random Forest, as the name implies, consists of a large number of decision trees that operate as an ensemble for classification. An ensemble is a collection of different predictive models that collectively decide the predicted output. In random forests, each individual tree gives a class as an output. The class with the most votes gets chosen as the final output of the model.

Picture courtesy of towardsdatascience.com

In the above example, six trees predict class 1 while three predict class 0. Since class 1 has more votes, the final prediction is 1.
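As a rough illustration of this voting scheme, the sketch below fits scikit-learn's RandomForestClassifier on a small synthetic dataset and compares the individual trees' votes with the forest's final prediction. The dataset, the nine-tree setup, and the variable names are assumptions made for demonstration, not part of the lesson.

# A minimal sketch of majority voting in a random forest, assuming
# scikit-learn is available and using a synthetic dataset for illustration.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Nine trees, to mirror the picture above.
forest = RandomForestClassifier(n_estimators=9, random_state=0)
forest.fit(X, y)

sample = X[:1]  # a single observation to classify

# Each fitted tree in the ensemble casts a vote for a class.
tree_votes = [int(forest.classes_[int(tree.predict(sample)[0])])
              for tree in forest.estimators_]
print("Individual tree votes:", tree_votes)

# The forest aggregates the trees. (scikit-learn averages the trees'
# class probabilities, which for fully grown trees usually matches the
# majority vote shown above.)
print("Forest prediction:", int(forest.predict(sample)[0]))

Note that scikit-learn technically performs soft voting (averaging the trees' predicted class probabilities) rather than counting hard votes, but the two typically agree.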

How do Random Forests work?

The idea behind random forests is that a large number of uncorrelated decision trees working individually will perform better as a committee than any individual tree.

There are two important ideas in the above statement.

  • Uncorrelated trees
  • Performing as a committee
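Before looking at each idea in turn, here is a quick, illustrative check of the committee claim: comparing a single decision tree against a random forest using cross-validation. The dataset and parameter values are assumptions made for demonstration only; exact scores will vary, but the forest usually comes out ahead.

# A rough comparison of one decision tree versus a random forest,
# assuming scikit-learn and a synthetic classification dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           random_state=0)

single_tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)

# 5-fold cross-validated accuracy for each model.
print("Single tree accuracy:  ", cross_val_score(single_tree, X, y, cv=5).mean())
print("Random forest accuracy:", cross_val_score(forest, X, y, cv=5).mean())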

Uncorrelated trees

For a random forest model to perform well, the individual decision trees need to have low correlation amongst themselves. Just as investments with low correlations, such as stocks and bonds, combine to form a portfolio greater than the sum of its parts, a random forest can produce predictions better than those of any of its individual trees.
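In scikit-learn, this low correlation comes from two sources of randomness: each tree is trained on a bootstrap sample of the rows, and each split considers only a random subset of the features. The sketch below shows the corresponding constructor parameters; the specific values are illustrative assumptions rather than recommendations.

# A minimal sketch of the parameters that keep the trees loosely
# correlated, assuming scikit-learn's RandomForestClassifier.
from sklearn.ensemble import RandomForestClassifier

forest = RandomForestClassifier(
    n_estimators=100,     # number of trees in the committee
    bootstrap=True,       # each tree sees a different bootstrap sample of rows
    max_features="sqrt",  # each split considers a random subset of features
    random_state=0,
)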
