A binary search tree (BST) is a recursive data structure in which each node has at most two children, and the value of the left child is less than that of the right child according to the chosen property. Recursive means that each node in a binary search tree and its descendants form a binary search tree. A logical outcome of the above definition is that the value of the left child must be less than that of the right child of each node, according to the chosen property.
New York City has a population of 8.8 million and an area of 783 km2, Los Angeles has a population of 3.9 million and an area of 1299 km2, and Chicago has a population of 2.7 million and an area of 600 km2. If a binary search tree is constructed using city names, the tree may look like the following.
If we build the binary search tree using the city area property, it may look like the following.
It is important to note that the structure of a simple binary search tree solely depends upon the sequence in which nodes are inserted in it. If the name property is used and the insertion sequence is Chicago, Los Angeles, and New York, the resulting binary search tree will look like as shown below.
If a binary search tree is built by inserting Honolulu, Huntsville, Lakewood, Boston, Baltimore, Chicago, Newark, and Orlando in sequence, we get the following tree. The numbers next to the city names show the insertion sequence.
If we now insert Pittsburg and Quincy in the above binary search tree it looks like the following.
The above tree is indeed a binary search tree, but the right side looks more like a linked list. The search operation in such trees requires more than
AVL trees solve the problem associated with the binary search trees like the one shown above. AVL trees maintain the balance by performing certain operations on the tree, after nodes are inserted. Let's explore how AVL tree insertion and deletion works.
We define some terms which will be used in the discussion of AVL trees.
The node in a tree which does not have a parent node is called the root of that tree.
A node which does not have any child is called a leaf node.
Two nodes that share the same parent are called siblings.
All nodes from a given node to the root node of a tree are called its ancestors.
All nodes from a given node to leaf nodes, including the leaf nodes, are called its descendants.
The maximum number of edges from any node to a leaf node is called the height of the tree or subtree rooted at that node. The height of the tree shown above is 6. The height of the tree or subtree with Boston as root is 1. The height of the tree or subtree with Newark as root is 3.
The difference between the heights of the right and the left subtrees of a node is called a balance factor of the node.
In an AVL tree the balance factor of each node must be
We start by inserting Honolulu, Huntsville, Lakewood, Boston, Baltimore, Chicago, Newark, and Orlando in sequence, and track how the balance factor changes after each insertion. After inserting Honolulu, the tree looks like this.
Huntsville comes after Honolulu in lexicographic order, so it becomes the right child of Honolulu. New balance factors of both nodes are within the permissible range. Therefore, the tree is still an AVL tree.
Lakewood comes after Huntsville, hence, it becomes the the right child of Huntsville and the tree looks like shown below.
The balance factor of Honolulu is
Treating the arrow from Honolulu to Huntsville as axis of rotation and, keeping Huntsville fixed, perform a counter-clockwise rotation, making Huntsville the root of the tree and Honolulu its left child as shown below. All balance factors are 0 and the tree becomes a valid AVL tree.
Next node to insert is Boston. Boston must become the left child of Honolulu. After the insertion, the tree looks like the following. All balance factors are within the permissible range.
The next node to insert is Baltimore. Baltimore comes before Boston in the lexicographic order, hence, it must become the left child of Boston. After inserting Baltimore, the tree looks like as shown below.
The balance factor of Honolulu is outside the permissible range. So is the balance factor of Huntsville. After inserting a new node, we go up the tree, updating balance factors, and stop at the first node whose balance factor is outside the permissible range. Therefore, we stop at Honolulu and do not go to Huntsville to update its balance factor. That is why there is a question mark after
A balance factor of
Keeping Boston fixed, perform a clockwise rotation to make Honolulu the right child of Boston, and make Boston the left child of Huntsville, as shown below.
It is interesting to note that after performing the clockwise rotation, the balance factor of the node above the axis of rotation, Huntsville in this case, remains unchanged. Before inserting Baltimore, the balance factor of Huntsville was
The next node in the list is Chicago. Chicago comes after Boston but before Honolulu in the lexicographic order. Hence, it must become the left child of Honolulu. After inserting Chicago, the tree looks like as shown below.
The balance factor of
Keeping Boston fixed, we rotate Huntsville clockwise to make it the right child of Boston, which incidentally becomes the new root node. The right subtree of Boston is now orphaned as Huntsville has taken its place. That orphaned subtree naturally becomes the left subtree of Huntsville which is without a left subtree after becoming the right child of Boston. The result is shown below.
This rotation did not work! Before the clockwise rotation, Huntsville had a balance factor of
The solution is simple; make the balance factor of Boston negative. The tree, before the rotation operation, was in the state shown below.
We want to make the balance factor of Boston negative. To accomplish this, we perform a counter-clockwise rotation on the Boston-Honolulu axis as shown below.
Keeping Honolulu fixed, rotate Boston counter-clockwise to become the left child of Honolulu. Chicago is orphaned by this operation and must become the right child of Boston which does not have a right child after rotation. Here is what the resulting tree looks like.
Once the balance factors have the same sign, we perform a clockwise rotation on the Huntsville-Honolulu axis. Honolulu becomes the new root node, and Huntsville becomes its right subtree, as shown below.
The above operation is called a double rotation. A double rotation is required when the nodes on the axis of rotation have balance factors of opposite signs. If balance factors have the same sign, a single clockwise or counter-clockwise rotation is sufficient to bring the balance factor within the permissible range.
The next node to insert is Newark. It comes after Lakewood in the lexicographic order. Hence, it must become the right child of Lakewood, as shown below.
Axis of rotation is Huntsville-Lakewood. Both nodes have positive balance factors. A single counter-clockwise rotation is required. After the counter-clockwise rotation, the tree looks as shown below.
When the balance factor of a node exceeds the permissible range, one of the four possible types of rotations must be performed to bring the balance factor within the permissible range.
We first consider the case in which the balance factor of a node goes from
The balance factor of node
The balance factor of node
We now talk about the case where the new node ends up in subtree
Once again, the node that loses balance is
The axis of rotation to make the balance factor of node
Node
After performing a clockwise rotation on
Since the balance factors of both node
The balance factor of the new root node (node
In the foregoing paragraphs we discussed operations to perform when the balance factor of a node becomes
After the single clockwise rotation, node
If the balance factors have opposite signs, a counter-clockwise rotation followed by a clockwise rotation is required. In the figure shown below, the balance factors of nodes
To make the balance factor of node
We first perform a single counter-clockwise rotation to make the balance factor of node
In the figure above, node
If the balance factor of node C is
Deleting a node from a binary search tree is slightly more involved than insertion. Deleting a leaf node is simple; just update the parent node and we are done. If the node to be deleted has only one subtree, that subtree is attached to the parent of the deleted node and we are done. If, however, the node to be deleted has both left and right subtrees, we have to bring another node in its place. Consider the binary search tree shown below:
Suppose we want to delete the node 70. This node has two subtrees. We want to maintain the binary search property of the tree after the node 70 is deleted. There are only two options. We can replace the node 70 with the smallest value node (leftmost node) of its right subtree or the largest value node (rightmost node) of its left subtree.
The leftmost node of the right subtree has the smallest value in the right subtree and is greater than that of the node we want to delete. The rightmost node of the left subtree has the largest value in the left subtree and is smaller than that of the node we want to delete. The right subtree of node 70 is shown below.
The leftmost node of the above tree is 75. This node can replace the node 70 while maintaining the binary search property of the tree. Moving node 75 to the position of the deleted node 70 will orphan the right subtree of node 75. We attached the right subtree of node 75 with its parent which is node 80, as shown below.
The state of the binary search tree after deleting node 70 and bringing node 75 in its place is shown below.
Instead of replacing the node 70 with the leftmost node of its right subtree, let's replace it with the rightmost node of its left subtree. The left subtree of node 70 is shown below.
The rightmost node of the above tree is node 65. We can replace node 70 with this node and the binary search property of the tree will be maintained. We attached the left subtree of node 65 with its parent, which is node 55, as shown below.
The state of the tree after replacing node 70 with node 65 is shown below.
Let's consider the AVL tree of some three-digit prime numbers shown below. The numbers within the parentheses are the balance factors of the nodes.
When a node is deleted from an AVL tree, the heights and balance factors of all its ancestor nodes may change. If we delete the node 211, the heights and balance factors of all highlighted nodes may change, as shown below.
Before deleting node 211, the balance factor of its parent node 197 was
After the delete operation, if the balance factor of any node goes from
Delete | Balance Factor | Change in Height | Balancing Required |
0 to +1 | None | No | |
0 to -1 | None | No | |
+1 to 0 | Negative | Not at this node, may be up the tree | |
-1 to 0 | Negative | Not at this node, may be up the tree | |
+1 to +2 | Negative | Yes | |
-1 to -2 | Negative | Yes |
Continuing our discussion, let's delete the leaf node 197. The nodes affected by this operation are highlighted below, along with their balance factors. The parent of the deleted node, as a result of delete operation, has become unbalanced.
A counter-clockwise rotation operation must be performed to balance the tree. The axis of rotation is
The two nodes forming the axis of rotation have balance factors of opposite signs. Therefore, a single rotation will not fix the imbalance. A double rotation is required. The first rotation will make the balance factor of child of node 227 positive, and the second one will remove the imbalance of node 227. The axis of the first rotation is
A clockwise rotation makes node 233 the right child of node 227, and node 241 becomes the right child of node 233, as shown below, with the axis of second rotation highlighted.
After the second rotation, which is counter-clockwise rotation about
The height of the right subtree of the root node has reduced and the root node itself has become unbalanced. We have to fix the imbalance. The balance factor of the root node is
Since the balance factors of the nodes forming the axis of rotation have opposite signs, a single rotation will not restore balance. A double rotation is required; a counter-clockwise rotation about
When the counter-clockwise rotation is performed, node 157 becomes the left child of the root node, node 137 becomes the right child of node 157, and the left subtree of node 157 becomes the right subtree of node 137, as shown below.
The new left child of the root node is 157, and the new axis for the second of the two rotations is
Since both nodes of the axis of rotation have balance factors of the same sign, a clockwise rotation will fix the imbalance of the new root node, as shown below.
We have seen that deleting a leaf node caused two double rotations to restore the balance and bring in a new root node. This is how the delete operation is different from the insert operations. When a node becomes unbalanced after an inert operation, one single or one double rotation restores the balance as well as the balance factors of the nodes above that level do not change. In contrast, when a node becomes unbalanced after a delete operation, restoring the balance by performing a single or a double rotation may cause a ripple effect all the way up to the root node.
The two nodes that we deleted above, and balanced the tree afterwards, were leaf nodes. We now consider the case in which a non-leaf node is deleted. We will see the steps required to balance the tree when an internal (non-leaf) node is deleted. Let's consider the AVL tree below.
We want to delete node 227 from the AVL tree shown above. This node has both left and right subtrees. To delete this node, we have to replace it with the rightmost node of its left subtree or the leftmost node of its right subtree. We replace the node to delete with the rightmost node of its left subtree. This is node 211.
Moving node 211 to the position of the deleted node makes node 197 unbalanced, as shown below.
A balance factor of
The two nodes forming the axis of rotation have balance factors of opposite signs. A single rotation would not work. A double rotation is required; a counter-clockwise rotation on
The state of the tree after the counter-clockwise rotation is shown below.
Since both nodes have balance factors of the same sign, a single clockwise rotation will balance the nodes, as shown below.
In the above example, the root node was not affected. Let's look at an example where the root node will also be affected. Consider the AVL tree shown below.
The above AVL tree is a slightly modified version of the one discussed previously. We have added a two-digit prime number to the tree to illustrate delete operation on an internal node. Once again, we delete node 227. After deleting the node 227 and fixing the imbalance, the state of the tree is shown below.
The double rotation operation shown above has made the root node unbalanced. A clockwise rotation is required to fix the imbalance of the root node. The axis of rotation is
Now that we have discussed the insert and the delete operation with examples, it is important to point out that double rotations as discussed above were meant to make it easier to understand the balancing operation when the nodes forming the axis of rotation have balance factors of opposite signs. While implementing this operation on a computer, we only look at the signs of the balance factors and re-arrange all three nodes in one step. We may call this operation a tri-node rearrangement or a tri-node rebalance. We show the operation in the following figure.
Following are the main points of the above discussion about AVL tree insertion and deletion:
Insert operation either leaves the height of a tree and its subtrees unchanged or increases it.
If the height of a tree/subtree remains unchanged, the balance factors of all ancestor nodes remain unchanged.
If the height of a tree/subtree increases, the balance factor of its immediate ancestor changes but the height of its immediate ancestor may or may not change.
The balance factor of a node may change from
If the balance factor of a node becomes
If the balance factor of a node becomes
If the balance factor of a node becomes
If the balance factor of a node becomes
Once a single or a double rotation is performed, the balance factors of all nodes above that level remain unchanged.
The delete operation leaves the height of a tree and its subtrees unchanged or decreases it.
If the height of a tree/subtree remains unchanged, the balance factors of all ancestor nodes remain unchanged.
If the height of a tree/subtree is reduced, the balance factors of, at least some, ancestor nodes change
The balance factor of a node may change from
If the balance factor of a node becomes
If the balance factor of a node becomes
After a single or a double rotation, the height of the tree is reduced, and this may lead to more single or double rotations further up the tree.
Unlike the insert operation, the effect of the delete operation must be checked all the way up to the root node and may bring in a new node at root level.
Explore the following courses from Educative to learn more about AVL trees.
Free Resources