Heap Sort
From charlesreid1
Heaps are data structures with weak sorting - that is, the item at the root node is always guaranteed to be the minimum element, but the only invariant in a heap is that any child node must be smaller than its parent.
Heaps are used in priority queues to keep items in a sorted order. But heaps are also useful for general sorting purposes. The heap sort algorithm starts by constructing a heap (tree data structure) and linking each heap node to a particular element in the original data. Once the heap is constructed, the algorithm removes items one at a time from the root node, and puts them in their final sorted position.
Contents
Top-Down Heap Construction
There are several ways to construct a heap. The "normal" or naive way is to walk through the array of elements, adding each node to the last open position in the heap.
(Note that heaps fill in nodes in a left-to-right, top-to-bottom manner.)
If we add an item to the last open position in a tree, and it is smaller than its parent, we simply swap that node with its parent.
This takes O(N log N) time.
However, we can improve the performance of this top-down construction process, by utilizing the fact that we are inserting a node into a sorted tree. This is still O(N log N), but in reality is more like O(N).
Fast, Bottom-Up Heap Construction
We can construct things faster, from O(N log N) to O(N), by utilizing a bottom-up construction process.
Imagine that we are walking through an array of data, creating nodes for each element, and adding each node to a heap.
If we follow the top-down heap construction, each item that we add to the heap will need to be downheap bubbled, and we could potentially end up downheap bubbling every single item, for a total cost of O(log N) per node, N nodes, for O(N log N). This is the top-down construction process.
To improve on that, we can limit the distance that each node will downheap bubble by working our way from the bottom up. In this way, we know that all sub-heaps of the inserted node will already be heap-ordered.
We don't want to begin with the LAST open position, because if we add a new node to the last open position, we know nothing whatsoever about the last level of the heap, and the node will definitely need to bubble up the heap - potentially the entire height of the tree.
Instead, we want to add the new node to the root position of an already-sorted subtree. This invokes the downheap bubble algorithm, but we know that when we get several steps in, we are adding a new root node to a heap that is already heap-ordered.
It is a subtle difference, but here's why this works: the maximum distance an element can possibly be moved is proportional to the distance from the insertion point to the point in the heap. This makes the runtime for a single insert O(1), because we don't have to bubble it all the way down. If runtime is proportional to the sum of the heights, then for a heap with N nodes, this is O(N).
See the "heapify" section of the Wikipedia "Heapsort" algorithm.
public static<T> void makeHeap(T[] arr, Comparator<T> comp) { for(int i=arr.length/2; i>=0; i--) { percolateDown(arr, comp, arr.length, i); } } public static<T> void perclateDown(T[] arr, Comparator<T> comp, int heapSize, int i) { ... }
Heap Sort Pseudocode
Flags
Priority Queues and Heaps Part of Computer Science Notes
Series on Data Structures
Java: Priority Queues/Java · Priority Queues/ADT · Priority Queues/Sorted · Priority Queues/Unsorted Performance: Priority Queues/Timing and Performance Applications: Maximum Oriented Priority Queue · Priority Queues/Stack
Priority Queues/Heap · Priority Queues/Java · Priority Queues/Comparators
|
Trees Part of Computer Science Notes
Series on Data Structures Abstract data type: Trees/ADT Concrete implementations: Trees/LinkedTree · Trees/ArrayTree · SimpleTree
Tree Traversal Preorder traversal: Trees/Preorder Postorder traversal: Trees/Postorder In-Order traversal: Binary Trees/Inorder Breadth-First Search: BFS Breadth-First Traversal: BFT Depth-First Search: DFS Depth-First Traversal: DFT OOP Principles for Traversal: Tree Traversal/OOP · Tree Traversal/Traversal Method Template Tree operations: Trees/Operations Performance · Trees/Removal
Tree Applications Finding Minimum in Log N Time: Tree/LogN Min Search
Abstract data type: Binary Trees/ADT Concrete implementations: Binary Trees/LinkedBinTree · Binary Trees/ArrayBinTree Binary Trees/Cheat Sheet · Binary Trees/OOP · Binary Trees/Implementation Notes
|