From charlesreid1

See also: Trees and Binary Search Trees

Define the goal

On the Binary Search Trees page we covered the use of a binary tree to store items in a sorted order. In the analysis of binary search trees, adding and removing from a search tree is O(h), where h is the height of the tree. In the worst case, the tree can have all nodes in the left or right subtree, and each node have one child, such that the height is n, and addition/removal is O(n).

Our goal is to design a binary search tree such that we can guarantee O(log N) insertion and deletion.

Recap of other data structures

Let's just recap some other data structures before we do so:

  • Unsorted array: this is definitely out, since this is O(N) insertion and O(N) deletion
  • Sorted array: we can find the insertion index of our new node in O(log N) time, but if we're inserting to the front of the array, it costs O(N) to move all the elements back
  • Linked list: we can perform an O(1) insertion, but actually getting to the insertion index takes O(N) time
  • Stacks/queues: provide O(1) insert/remove, but no random access
  • Priority queues: while these maintain data in a sorted order, like queues, they provide no random access
  • Hash tables: O(1) insertion and deletion, but not in sorted order
  • Sorted hash tables: now we are starting to get closer to a key-value tree, or a structure that maintains key-value pairs in sorted order

Recap of Binary Search Trees

Reviewing what we know about binary search trees:

  • Binary trees are trees with the property that each node has at most two children. (Left precedes right.)
  • A proper binary tree is a binary tree in which each node has 0 or 2 children.
  • A full binary tree is a binary tree in which each level of the tree is completely full.

Properties of binary trees:

  • Height is between log(n+1)-1 and n-1
  • Number of external nodes = number of internal nodes + 1

Performance:

  • O(1) left/right, sibling/child, number of children, parent
  • O(c) to get children(p), where c is number of children
  • O(d + 1) depth of position p (where d is depth of position p)
  • O(1) add root, add left, add right, replace, delete
  • O(N) tree traversal (in order, pre-order, post-order) (breadth-first, depth-first)

Note that binary search trees are not, by default, balanced, so when we say an operation costs O(h), where h is height of binary tree, h can potentially be as large as n or as small as log(n).

Modifying the binary search tree

To modify the binary search tree data structure to achieve guaranteed log N performance, we need a way to ensure the height of the tree is limited to a reasonable value. The way we do this is by defining a set of rebalancing operations that will not change the order of the items in the tree but that will change the relative heights of subtrees of nodes, allowing nodes to be shifted between left and right subtrees.