Trusted answers to developer questions

What is a random forest?

Get Started With Data Science

Learn the fundamentals of Data Science with this free course. Future-proof your career by adding Data Science skills to your toolkit — or prepare to land a job in AI, Machine Learning, or Data Analysis.

Random forest, or random decision forest, is a supervised machine learning algorithm that relies on ensemble learning for classification and regression. It is a flexible, straightforward algorithm that produces excellent results most of the time (even without hyper-parameter tuning). It is also one of the most used algorithms.

Random forest is an ensemble learning method – the ‘forest’ is an ensemble of decision trees. Creating a multitude of decision trees at training time and then taking a mode (classification) or mean (regression) of the output helps to obtain better predictive performance than could be obtained from any of the decision trees alone.

Not sure what decision trees are? Check this shot.

Random forests correct the decision trees’ habit of overfitting to their training set.

Example

Suppose we have a random forest that has been trained to predict the weather.

This random forest will have multiple decision trees that each classify the input data and output a prediction. We then take a majority vote (mode) of the output, and we have our final prediction.

An example of a random forest
An example of a random forest

In the above example, the majority of decision trees predicted the weather as “rainy.” Therefore, our final prediction is “rainy.”
Through this approach, an error made by a few decision trees is covered up by other decision trees.

RELATED TAGS

machine learning
definition
random forest
Copyright ©2024 Educative, Inc. All rights reserved
Did you find this helpful?