Heaps are data structures with weak sorting - that is, the item at the root node is always guaranteed to be the minimum element, but the only invariant in a heap is that any child node must be smaller than its parent.

Heaps are used in priority queues to keep items in a sorted order. But heaps are also useful for general sorting purposes. The heap sort algorithm starts by constructing a heap (tree data structure) and linking each heap node to a particular element in the original data. Once the heap is constructed, the algorithm removes items one at a time from the root node, and puts them in their final sorted position.

Top-Down Heap Construction

There are several ways to construct a heap. The "normal" or naive way is to walk through the array of elements, adding each node to the last open position in the heap.

(Note that heaps fill in nodes in a left-to-right, top-to-bottom manner.)

If we add an item to the last open position in a tree, and it is smaller than its parent, we simply swap that node with its parent.

This takes O(N log N) time.

However, we can improve the performance of this top-down construction process, by utilizing the fact that we are inserting a node into a sorted tree. This is still O(N log N), but in reality is more like O(N).

Fast, Bottom-Up Heap Construction

We can construct things faster, from O(N log N) to O(N), by utilizing a bottom-up construction process.

Imagine that we are walking through an array of data, creating nodes for each element, and adding each node to a heap.

If we follow the top-down heap construction, each item that we add to the heap will need to be downheap bubbled, and we could potentially end up downheap bubbling every single item, for a total cost of O(log N) per node, N nodes, for O(N log N). This is the top-down construction process.

To improve on that, we can limit the distance that each node will downheap bubble by working our way from the bottom up. In this way, we know that all sub-heaps of the inserted node will already be heap-ordered.

We don't want to begin with the LAST open position, because if we add a new node to the last open position, we know nothing whatsoever about the last level of the heap, and the node will definitely need to bubble up the heap - potentially the entire height of the tree.

Instead, we want to add the new node to the root position of an already-sorted subtree. This invokes the downheap bubble algorithm, but we know that when we get several steps in, we are adding a new root node to a heap that is already heap-ordered.

It is a subtle difference, but here's why this works: the maximum distance an element can possibly be moved is proportional to the distance from the insertion point to the point in the heap. This makes the runtime for a single insert O(1), because we don't have to bubble it all the way down. If runtime is proportional to the sum of the heights, then for a heap with N nodes, this is O(N).

See the "heapify" section of the Wikipedia "Heapsort" algorithm.

public static<T> void makeHeap(T[] arr, Comparator<T> comp) { 
    for(int i=arr.length/2; i>=0; i--) { 
        percolateDown(arr, comp, arr.length, i);
    }
}

public static<T> void perclateDown(T[] arr, Comparator<T> comp, int heapSize, int i) { 
    ...
}

Heap Sort Pseudocode

Flags

Heap Sort

From charlesreid1

Contents

Top-Down Heap Construction

Fast, Bottom-Up Heap Construction

Heap Sort Pseudocode

Flags