Decision Trees
Decision Trees are versatile and easy-to-understand models in Machine Learning. A Decision Tree builds a model by learning decision rules from the underlying dataset. You will learn more in this lesson.
Decision Trees
Decision Trees are powerful and produce output that domain experts and practitioners can easily understand. They also provide the basis for many Ensemble Methods, which combine multiple models to produce a prediction for the dataset at hand.
A Decision Tree, as the name suggests, is constructed as a tree, consisting of a root node, internal nodes, and leaf nodes. Leaf nodes, also known as terminal nodes, give us the class of the instances that fall into them, and the goal is to make the terminal nodes as homogeneous as possible. The root node contains all the instances in the dataset, and each internal node partitions the instances that reach it. Once the tree is built, a new row of data can be navigated from the root, following the branch chosen by each split, until it reaches a leaf and a final prediction is made.
Consider a simple Decision Tree that distinguishes between males and females based on height and weight. It can be represented in the form of if statements as seen below.
If Height > 180 cm Then Male
If Height <= 180 cm AND Weight > 80 kg Then Male
If Height <= 180 cm AND Weight <= 80 kg Then Female
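As a sketch of how such rules could be learned in code, the snippet below fits a scikit-learn DecisionTreeClassifier on a small made-up height/weight dataset (the data and feature names are illustrative assumptions, not part of the lesson), prints the learned rules, and then navigates the tree with a new row:

from sklearn.tree import DecisionTreeClassifier, export_text

# Toy dataset: [height in cm, weight in kg] with made-up labels.
X = [[185, 90], [170, 85], [165, 60], [178, 95], [190, 88], [160, 55]]
y = ["Male", "Male", "Female", "Male", "Male", "Female"]

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

# Print the learned decision rules, analogous to the if statements above.
print(export_text(tree, feature_names=["Height", "Weight"]))

# Navigate the tree with a new row of data to obtain a prediction.
print(tree.predict([[175, 70]]))

The thresholds the tree learns depend on the toy data, so they will not exactly match the 180 cm and 80 kg cut-offs above, but the structure of the output mirrors those if statements.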
Decision Tree algorithms
ID3
ID3 stands for Iterative Dichotomiser 3. It was developed by Ross Quinlan in 1986 and is the predecessor of algorithms such as C4.5. The algorithm works greedily: at each node of the tree, it selects the categorical feature that yields the maximum information gain with respect to the categorical target. Trees are grown to their maximum size, exhausting all the features, and a pruning step is then applied to help the tree generalize well to unseen data.
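To make the information-gain criterion concrete, here is a minimal sketch (the helper names and the toy label lists are my own, not part of ID3 itself) that computes the entropy of a categorical target and the gain obtained by splitting on a categorical feature:

from collections import Counter
from math import log2

def entropy(labels):
    # Shannon entropy of a list of categorical labels.
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(labels, feature_values):
    # Entropy of the parent node minus the weighted entropy of the
    # child nodes produced by splitting on a categorical feature.
    total = len(labels)
    children = {}
    for value, label in zip(feature_values, labels):
        children.setdefault(value, []).append(label)
    weighted = sum(len(c) / total * entropy(c) for c in children.values())
    return entropy(labels) - weighted

# Toy example: gain from splitting a yes/no target on a weather feature.
target = ["no", "no", "yes", "yes", "yes", "no"]
outlook = ["sunny", "sunny", "overcast", "rain", "rain", "rain"]
print(information_gain(target, outlook))

At each node, ID3 would evaluate this gain for every remaining categorical feature and split on the one with the highest value.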
C4.5
The C4.5 algorithm is the successor to the ID3 algorithm. Unlike ID3, it removed the restriction that features must be categorical: it handles a numerical feature by dynamically partitioning its continuous values into a discrete set of intervals (in the binary case, by choosing a threshold on which to split).
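As a rough sketch of that idea (this is an illustration, not Quinlan's original implementation, and it uses plain information gain where C4.5 proper uses the gain ratio), one can turn a continuous feature into a binary split by scanning candidate thresholds and keeping the one with the highest gain:

from collections import Counter
from math import log2

def entropy(labels):
    # Shannon entropy of a list of categorical labels.
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def best_threshold(values, labels):
    # Scan midpoints between consecutive sorted feature values and keep
    # the threshold whose binary split yields the highest information gain.
    pairs = sorted(zip(values, labels))
    parent = entropy(labels)
    best = (None, 0.0)
    for (v1, _), (v2, _) in zip(pairs, pairs[1:]):
        if v1 == v2:
            continue
        t = (v1 + v2) / 2
        left = [l for v, l in pairs if v <= t]
        right = [l for v, l in pairs if v > t]
        gain = parent - (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        if gain > best[1]:
            best = (t, gain)
    return best

# Toy continuous feature (height in cm) against a binary target.
print(best_threshold([160, 165, 170, 178, 185, 190],
                     ["Female", "Female", "Male", "Male", "Male", "Male"]))

Here the best threshold falls between 165 and 170 cm, which cleanly separates the two toy classes.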