Analysis of B-Trees

This lesson is about analysis of the B-trees.

We'll cover the following

An estimate about B-trees is that

  1. In the external memory model, the running time of find(x), add(x), and remove(x) in a B-tree is O(logBn)O(\log_Bn).
  2. In the word-RAM model, the running time of find(x) is O(logn)O(\log n) and the running time of add(x) and remove(x) is O(Blogn)O(B\log n).

The following lemma shows that, so far, we have overestimated the number of merge and split operations performed by B-trees.

Lemma 1: Starting with an empty B-tree and performing any sequence of m add(x) and remove(x) operations results in at most 3m/23m/2 splits, merges, and borrows being performed.

Proof: The proof of this has already been sketched for the special case in which B=2B = 2. The lemma can be proven using a credit scheme, in which

  1. each split(), merge(), or borrow() operation is paid for with two credits, i.e., a credit is removed each time one of these operations occurs; and
  2. at most three credits are created during any add(x) or remove(x) operation.

Since at most 3 m credits are ever created and each split, merge, and borrow is paid for with two credits, it follows that at most 3m/23m/2 splits, merges, and borrows are performed. These credits are illustrated using the ¢¢ symbol.

To keep track of these credits the proof maintains the following credit invariant: Any non-root node with B1B − 1 keys stores one credit and any node with 2B12B − 1 keys stores three credits. A node that stores at least BB keys and most 2B22B − 2 keys need not store any credits. What remains is to show that we can maintain the credit invariant and satisfy properties 11 and 22, above, during each add(x) and remove(x) operation.

Adding

The add(x) method does not perform any merges or borrows, so we need only consider split operations that occur as a result of calls to add(x).

Each split operation occurs because a key is added to a node, uu, that already contains 2B12B−1 keys. When this happens, u is split into two nodes, uu' and uu'' having B1B − 1 and BB keys, respectively. Prior to this operation, u was storing 2B12B − 1 keys, and hence three credits. Two of these credits can be used to pay for the split and the other credit can be given to uu' (which has B1B−1 keys) to maintain the credit invariant. Therefore, we can pay for the split and maintain the credit invariant during any split.

The only other modification to nodes that occur during an add(x) operation happens after all splits, if any, are complete. This modification involves adding a new key to some node uu'. If, prior to this, uu' had 2B22B − 2 children, then it now has 2B12B−1 children and must therefore receive three credits. These are the only credits given out by the add(x) method.

Removing

During a call to remove(x), zero or more merges occur and are possibly followed by a single borrow. Each merge occurs because two nodes, v and w, each of which had exactly B1B − 1 keys prior to calling remove(x) were merged into a single node with exactly 2B22B − 2 keys. Each such merge, therefore, frees up two credits that can be used to pay for the merge.

After any merges are performed, at most one borrow operation occurs, after which no further merges or borrows occur. This borrowing operation only occurs if we remove a key from a leaf, v, that has B1B − 1 keys. Therefore, the v node has one credit, and this credit goes towards the cost of the borrowing. This single credit is not enough to pay for the borrowing, so we create one credit to complete the payment.

At this point, we have created one credit and we still need to show that the credit invariant can be maintained. In the worst case, v’s sibling, w, has exactly BB keys before the borrow so that, afterward, both v and w have B1B − 1 keys. This means that v and w each should be storing a credit when the operation is complete. Therefore, in this case, we create an additional two credits to give to v and w. Since a borrow happens at most once during a remove(x) operation, this means that we create at most three credits, as required.

If the remove(x) operation does not include a borrow operation, this is because it finishes by removing a key from some node that, prior to the operation, had BB or more keys. In the worst case, this node had exactly BB keys, so that it now has B1B−1 keys and must be given one credit, which we create.

In either case—whether the removal finishes with a borrow operation or not—at most three credits need to be created during a call to remove(x) to maintain the credit invariant and pay for all borrows and merges that occur. This completes the proof of the lemma.

The purpose of Lemma 1 is to show that, in the word-RAM model the cost of splits, merges, and joins during a sequence of mm add(x) and remove(x) operations is only O(Bm)O(Bm). That is, the amortized cost per operation is only O(B)O(B), so the amortized cost of add(x) and remove(x) in the word-RAM model is O(B+logn)O(B + \log n). This is summarized by the following pair of theorems:

Theorem 1 (external memory B-trees): A BTree implements the SSet interface. In the external memory model, a BTree supports the operations add(x), remove(x), and find(x) in O(logBn)O(\log_Bn) time per operation

Theorem 2 (Word-RAM B-Trees): A BTree implements the SSet interface. In the word-RAM model, and ignoring the cost of splits, merges, and borrows, a BTree supports the operations add(x), remove(x), and find(x) in O(logn)O(\log n) time per operation. Furthermore, beginning with an empty BTree, any sequence of mm add(x) and remove(x) operations results in a total of O(Bm)O(Bm) time spent performing splits, merges, and borrows.

Create a free account to access the full course.

By signing up, you agree to Educative's Terms of Service and Privacy Policy