# Analysis of B-Trees

This lesson is about amortized analysis of B-trees.

We'll cover the following

An estimate about $B$-trees is that

1. In the external memory model, the running time of find(x), add(x), and remove(x) in a B-tree is $O(\log_Bn)$.
2. In the word-RAM model, the running time of find(x) is $O(\log n)$ and the running time of add(x) and remove(x) is $O(B\log n)$.

The following lemma shows that, so far, we have overestimated the number of merge and split operations performed by B-trees.

Lemma 1: Starting with an empty $B$-tree and performing any sequence of m add(x) and remove(x) operations results in at most $3m/2$ splits, merges, and borrows being performed.

Proof: The proof of this has already been sketched for the special case in which $B = 2$. The lemma can be proven using a credit scheme, in which

1. each split(), merge(), or borrow() operation is paid for with two credits, i.e., a credit is removed each time one of these operations occurs; and
2. at most three credits are created during any add(x) or remove(x) operation.

Since at most $3m$ credits are ever created and each split, merge, and borrow is paid for with with two credits, it follows that at most $3m/2$ splits, merges, and borrows are performed. These credits are illustrated using the $¢$ symbol.

To keep track of these credits the proof maintains the following credit invariant: Any non-root node with $B − 1$ keys stores one credit and any node with $2B − 1$ keys stores three credits. A node that stores at least $B$ keys and most $2B − 2$ keys need not store any credits. What remains is to show that we can maintain the credit invariant and satisfy properties $1$ and $2$, above, during each add(x) and remove(x) operation.

The add(x) method does not perform any merges or borrows, so we need only consider split operations that occur as a result of calls to add(x).

Each split operation occurs because a key is added to a node, $u$, that already contains $2B − 1$ keys. When this happens, u is split into two nodes, $u'$ and $u''$ having $B − 1$ and $B$ keys, respectively. Prior to this operation, u was storing 2B − 1 keys, and hence three credits. Two of these credits can be used to pay for the split and the other credit can be given to $u'$ (which has $B−1$ keys) to maintain the credit invariant. Therefore, we can pay for the split and maintain the credit invariant during any split.

The only other modification to nodes that occur during an add(x) operation happens after all splits, if any, are complete. This modification involves adding a new key to some node $u'$. If, prior to this, $u'$ had $2B − 2$ children, then it now has $2B−1$ children and must therefore receive three credits. These are the only credits given out by the add(x) method.

## Removing

During a call to remove(x), zero or more merges occur and are possibly followed by a single borrow. Each merge occurs because two nodes, v and w, each of which had exactly B − 1 keys prior to calling remove(x) were merged into a single node with exactly 2B − 2 keys. Each such merge therefore frees up two credits that can be used to pay for the merge.

After any merges are performed, at most one borrow operation occurs, after which no further merges or borrows occur. This borrow operation only occurs if we remove a key from a leaf, v, that has B − 1 keys. The node v therefore has one credit, and this credit goes towards the cost of the borrow. This single credit is not enough to pay for the borrow, so we create one credit to complete the payment.

At this point, we have created one credit and we still need to show that the credit invariant can be maintained. In the worst case, v’s sibling, w, has exactly $B$ keys before the borrow so that, afterwards, both v and $w$ have $B − 1$ keys. This means that v and w each should be storing a credit when the operation is complete. Therefore, in this case, we create an additional two credits to give to v and w. Since a borrow happens at most once during a remove(x) operation, this means that we create at most three credits, as required.

If the remove(x) operation does not include a borrow operation, this is because it finishes by removing a key from some node that, prior to the operation, had $B$ or more keys. In the worst case, this node had exactly $B$ keys, so that it now has B−1 keys and must be given one credit, which we create.

In either case—whether the removal finishes with a borrow operation or not—at most three credits need to be created during a call to remove(x) to maintain the credit invariant and pay for all borrows and merges that occur. This completes the proof of the lemma.

The purpose of Lemma 1 is to show that, in the word-RAM model the cost of splits, merges and joins during a sequence of $m$ add(x) and remove(x) operations is only $O(Bm)$. That is, the amortized cost per operation is only $O(B)$, so the amortized cost of add(x) and remove(x) in the word-RAM model is $O(B + \log n)$. This is summarized by the following pair of theorems:

Theorem 1: (External Memory $B$-Trees). A BTree implements the SSet interface. In the external memory model, a BTree supports the operations add(x), remove(x), and find(x) in $O(\log_Bn)$ time per operation.

Theorem 2: (Word-RAM $B$-Trees). A BTree implements the SSet interface. In the word-RAM model, and ignoring the cost of splits, merges, and borrows, a BTree supports the operations add(x), remove(x), and find(x) in $O(\log n)$ time per operation. Furthermore, beginning with an empty BTree, any sequence of $m$ add(x) and remove(x) operations results in a total of $O(Bm)$ time spent performing splits, merges, and borrows.