Notes

Goodrich Chapter 8 Trees

See Trees for more notes from this chapter.

Properties Revisited

The list of properties of binary trees is stated below but is repeated here for its importance, along with some notes as to where the identities come from.

Properties of general binary trees

There are two properties that relate the number of nodes of a binary tree, n, to the height of a binary tree, h. They both say the same thing, but in different ways:

${\displaystyle \log {(n+1)}-1\leq h\leq n-1}$

${\displaystyle h+1\leq n\leq 2^{h+1}-1}$

These relations come from the fact that, at a given level in a binary tree, the number of nodes is ${\displaystyle 2^{h}}$. Now, the maximum number of nodes that your binary tree could possibly have would be twice the number of external nodes, if you squashed down the whole tree. (Positional argument here, a bit hand-wavey, but okay.) If you were to pick pairs of nodes, and pair up the nodes one by one, you always have 1 left out. Since there are never quite twice the total number of external nodes, we have

${\displaystyle 2(2^{h})-1=2^{h+1}-1}$

which sets a maximum limit on the number of nodes. This relies on a property for complete binary trees that relates ${\displaystyle n_{E}}$ to ${\displaystyle n_{I}}$, ${\displaystyle n_{E}=n_{I}+1}$, which we shall see in a moment.

The other set limits on the number of internal and external nodes:

${\displaystyle 1\leq n_{E}\leq 2^{h}}$

and on the number of internal nodes:

${\displaystyle h\leq nI\leq 2^{h}-1}$

Here again, the lower limits correspond to the worst-case binary tree, which looks like a linked list.

Properties of complete/proper binary trees

For proper or complete binary trees, we have another set of properties that hold:

${\displaystyle \log {(n+1)}-1\leq h\leq {\dfrac {n-1}{2}}}$

or written slightly differently,

${\displaystyle 2h+1\leq n\leq 2^{h+1}-1}$

We also have a relation between the number of internal and external nodes, and limits on them:

${\displaystyle n_{E}=n_{I}+1}$

${\displaystyle h+1\leq n_{E}\leq 2^{h}}$

${\displaystyle h\leq n_{I}\leq 2^{h}-1}$

Binary Trees

A binary tree is an ordered tree that has the properties:

• Every node has max 2 children
• Each child labeled as left or right
• Left has precedence over right

Other definitions:

• Subtree rooted at left or right child of an internal node is the left subtree/right subtree
• Proper binary trees - trees in which each node has either zero or two children
• Improper binary tree - one or more nodes with a single child

Binary Tree Abstract Data Type

• tree.left(p) - returns the position that is the left child of p
• tree.right(p) - returns the position that is the right child of p
• tree.sibling(p) - return the position that represents the sibling of p

Height of a Binary Tree

Note that the height of a binary tree, like the definition of the tree itself, is recursive, with the base case being height = 0:

• If p is a leaf, then the height of p is 0
• Otherwise, the height of p is one more than the maximum of the heights of p's children

Binary Tree Properties

Let T be a non-empty binary tree, and let n, n_E, n_I, and h denote number of nodes, number of external nodes, number of internal nodes, and height of T, respectively.

Then T has the following properties:

${\displaystyle h+1\leq n\leq 2^{h+1}-1}$

${\displaystyle 1\leq n_{E}\leq 2^{h}}$

${\displaystyle h\leq n_{I}\leq 2^{h}-1}$

${\displaystyle \log {(n+1)}-1\leq h\leq n-1}$

The last one can be interpreted, intuitively, as giving us a minimum and maximum height for a binary tree. The minimum height occurs when the tree is arranged as a full or nearly-full binary tree, log(n+1)-1, while the maximum height occurs when items are added to the binary tree in sorted order and there is no rebalancing, in which case you have a string of n nodes.

Proper binary tree properties

If T is a tree that is proper, it has the following properties:

${\displaystyle 2h+1\leq n\leq 2^{h+1}-1}$

${\displaystyle h+1\leq n_{E}\leq 2^{h}}$

${\displaystyle h\leq n_{I}\leq 2^{h}-1}$

${\displaystyle \log {(n+1)}-1\leq h\leq {\dfrac {(n-1)}{2}}}$

Proper binary tree: number of external vs internal nodes

If ${\displaystyle n_{E},n_{I}}$ are the number of external and internal nodes, respectively, here is a proof by induction of the fact:

${\displaystyle n_{E}=n_{I}+1}$

• Justification by removing nodes from T and dividing into two piles, internal node pile and external node pile
• If one node v in T, remove one node and place in external node pile. External node pile has one node, internal node pile is empty.
• Otherwise, T has more than one node. We remove from T an arbitrary external node w and its parent v, which is an internal node. We place w on the external node pile, and v on the internal node pile. If v has a parent u, we reconnect u with the former sibling z of w,and thereby remove one internal node and one external node, leaving the tree as a proper binary tree.
• Eventually we are left with a final tree consisting of a single node.

the relationship above does not hold for improper trees and nonbinary trees.

Depth/Height Operations

For a given node at a given position p in the tree, the depth of the node is defined as the number of ancestors of p, excluding p itself. The height of a position p is defined as 0 if p is a leaf, otherwise it is 1 more than the maximum heights of p's children.

To determine the depth of a particular node, we can use a recursive method that increments a counter parameter for each call:

def depth(self, p):
"""Return number of layers between this node and the root"""
if is_root(p):
return 0
else:
return 1 + depth(parent(p))


The height of a position p in a tree can also be determined recursively, but we must be careful to do this efficiently. For example, if we were to make a list of every node in the tree, and determine the depth of every node separately, we would be reproducing a significant fraction of the work we are doing. So, we do it recursively:

def height(self, p):
"""Return number of layers (max) to get from this node to a leaf"""
if is_leaf(p):
return 0
else:
return 1 + max( height(c) for c in self.children(p) )


Attach/Remove Operations

Attaching and removing a node at a particular position in a binary tree is easier to do if we don't need to maintain the tree in sorted order.

Attachment:

With the attachment operation, we are passing in two trees and a position, and planting these two trees at the given node.

Removal:

When we remove a node from the tree, we have three situations: no children, one child, or two children.

If there are no children, do nothing.

If there is one child, replace the removed node with its child.

If there are two children, throw an error.

Skiena Chapter 3 Data Structures

Question 3-5

Question 3-5) Find the overhead fraction (data space/total space) for each binary tree implementation on n nodes given the following conditions:

• All nodes store data, 2 child pointers, and 1 parent pointer. Data fields are 4 bytes, pointers are 4 bytes.
• Only leaf nodes store data; internal nodes store 2 child pointers. Data field requires 4 bytes, 2 bytes per pointer.

First case:

• Binary tree with n nodes -> n-1 edges
• Child/parent ppointers means 2x edges
• 2(n-1) edges, 2(n-1) pointers
• Alternatively, here's the analysis:

n nodes x (4 bytes of data/node) = 4n bytes data

n nodes x (12 bytes of pointers/node) = 12 n bytes

Total space is 16 n bytes, so overhead fraction is 1/4, i.e., the data space to total space ratio is 1/4

Second case:

• If we have n nodes, we have ~n/2 leaves
• n nodes total x (1 leaf node / 2 nodes) ~ n/2 lleaf nodes

n/2 empty nodes x (2 pointers/1 empty node) x (2 bytes/pointer) = 2n bytes for empty nodes with pointers

n/2 data nodes x (4 bytes/1 empty node) = 2n bytes data

The overhead fraction is thus 1/2: half the bytes are for data, half the bytes are for pointers.

Question 3-6

Question 3-6) Describe how to modify any balanced tree structure such that search, insert, delete, min, max each take the expected amount of time, O(log n), but sucessor and predecessor methods now take O(1) time each. What modifications are required, and to which methods?

The methods:

• Predecessor(D,k) - returns the predecessor key (in order) to the given key k
• Successor(D,k) - returns the successor key to the given key k

Question is, how to we implement neighbor lookup in O(1) time in a balanced tree?

Two options:

• Option one - add a previous and next node pointers. This increases the complexity of the bookkeeping, and of the add/insert/remove methods, which now have to traverse the tree to fix their links.
• Option two - make a data tree that is twice as big, and only holds data in the bottom-most leaf nodes. This obviates the need for a next and previous pointer, because the tree is easy to navigate - up and over... at most log(n) operations to traverse tree.

Option one:

We would need to modify the add() and insert() method, the delete method, no need to modify search or min or max methods.

We modify them to keep track of the previous node. Finding the previous node would be somewhat tricky logic. When we add a new node, find its next and previous nodes, and link them correctly.

Option two:

It would be algorithmically easier (but more expensive and more complicated implementation wise) to use a tree structure with only data in the leaf nodes. Finding/keeping track of next or previous node then doesn't require the extra two pointers and the extra logic of traversing a binary tree, it just requires twice the number of nodes (tree of size N has N/2 leaf nodes).

This also requires implementation of the entire data structure again, and it is not so obvious how you keep a tree structure like that balanced and dynamically resized.

Implementations

Implementation of binary trees proceeds in a few directions:

Implementations of sorted binary trees

The fundamental property of a binary tree is that it hierarchically divides things into two categories, allowing O(h) insert, remove, and search algorithms, while using O(N) space.

To keep the height limited to log N, so that insertion/removal/search takes O(log N) time, we need to ensure that the left and right halves of the tree contain roughly the same number of elements. This is called balancing the binary tree.

The basic operations of a sorted binary tree:

• search (recursively or iteratively)
• insert (maintaining in-order sequence of nodes)
• delete (maintaining in-order sequence of nodes)
• traverse (in-order traversal)
• verification (is it a BST)

Basic utility functions useful in deleting node:

• find_min - find smallest node in a subtree
• replace node in parent - change the node pointers of a node's parents to point to something else

Applications:

• priority queues
• sorting

Implementations/types:

• red black trees
• search trees
• self-balancing search trees
• avl trees
• etc.