Search⌘ K

Optimal Binary Search Trees

Explore the construction of optimal binary search trees that minimize total search time by combining recursive backtracking with divide-and-conquer. Understand how to assign tree roots to reduce search costs based on access frequencies, and analyze the exponential time complexity of a backtracking solution before optimizing it using dynamic programming techniques.

Introduction to optimal binary search trees

Our final example combines recursive backtracking with the divide-and-conquer strategy. Recall that the running time for a successful search in a binary search tree is proportional to the number of ancestors of the target node. As a result, the worst-case search time is proportional to the depth of the tree. Thus, to minimize the worst-case search time, the height of the tree should be as small as possible; by this metric, the ideal tree is perfectly balanced.

A simple binary search tree
A simple binary search tree

In many applications of binary search trees, however, it is more important to minimize the total cost of several searches rather than the worst-case cost of a single search. If xx is a more frequent search target than yy, we can save time by building a tree where the depth of xx is smaller than the depth of yy, even if that means increasing the overall depth of the tree. A perfectly balanced tree is not the best choice if some items are significantly more popular than others. In fact, a totally unbalanced tree with depth Ω(n)Ω(n) might actually be the best choice!

Problem statement

This situation suggests the following problem. Suppose we are given a sorted array of keys A[1..n]A[1 .. n] and an array of corresponding access frequencies f[1..n]f [1 .. n]. Our task is to build the binary search tree that minimizes the total search time, assuming that there will be exactly f[i]f [i] searches for each key A[i]A[i].

Recurrence relation for optimal search tree

Before we think about how to solve this problem, we should first come up with a good recursive definition of the function we are trying to optimize! Suppose we are also given a binary search tree TT with nn nodes. Let v1,v2,...,vnv_1, v_2, . . . , v_n be the nodes of TT, indexed in sorted order, so that each node viv_i stores the corresponding key A[i]A[i]. Then ignoring constant factors, the total cost of performing all the binary searches is given by the following expression:

Cost(T,f[1..n]):=ni=1f[i]. #ancestors of vi in T.Cost(T,f[1..n]):= \underset{i=1}{\overset{n}{\sum}}f[i]. \space \text{\#ancestors of}\space v_i\space \text{in T}.

Now suppose vrv_r is the root of TT; by definition, vrv_r is an ancestor of every node in TT. If i<ri < r, then all ancestors of ...