Numerics/Sorting and Selection

The Elephant in the Room

Sorting is the most-studied problem in computer science. Seriously. More papers have been written about sorting than any other algorithmic problem. And for good reason — it's the gateway drug to algorithm design. If you understand sorting, you understand divide-and-conquer, randomized pivoting, lower-bound arguments, and the fundamental tension between worst-case guarantees and real-world performance. That's like 60% of algorithm design right there.

But here's the thing: sorting isn't just a CS 101 topic. In numerical computing, sorting shows up everywhere:

Preprocessing data for interpolation and curve fitting (sorted x-values make life way easier)
Building spatial data structures (kd-trees start with a sort)
Computing order statistics (medians, quartiles, percentiles — the backbone of robust statistics)
Sparse matrix reordering (reducing fill-in during LU decomposition)
Signal processing (sorting FFT frequency bins by magnitude)
Removing duplicates in datasets before numerical analysis

And selection — finding the k-th smallest element without fully sorting — is arguably even more important for numerics. The median is the single most robust measure of central tendency. You don't need to sort a million data points to find the median. That would be wasteful. That's what selection algorithms are for.

So let's dive in. This page covers the algorithms that actually matter for numerical work, with opinions.

The Comparison Model and the Ω(n log n) Wall

First, let's get the theory out of the way, because it's honestly kind of beautiful.

In the comparison model of sorting, the only operation allowed on keys is a pairwise comparison (≤, >, etc.). You can't peek at bits, you can't hash, you can't bucket by digit. Just compare.

The question: what's the best we can do? The answer: Ω(n log n) comparisons. No comparison-based sort can beat this lower bound.

Why? It's an information-theoretic argument. There are n! possible permutations of n distinct elements. Each comparison gives you at most 1 bit of information (it splits the remaining possibilities into two groups). To distinguish among n! possibilities, you need at least ⌈log₂(n!)⌉ comparisons. Stirling's approximation gives log₂(n!) ≈ n log₂ n − n log₂ e + O(log n). Boom — Ω(n log n).

$\log_2(n!) \approx n \log_2 n - n \log_2 e + O(\log n) = \Omega(n \log n)$

Algorithms that hit this bound exactly (like mergesort and heapsort) are asymptotically optimal. Quicksort hits it on average. The constant factors are where the real game is played.

If you can escape the comparison model — by exploiting the structure of your keys (integers in a bounded range, strings with limited length) — you can do O(n). More on that in the non-comparison section.

Quicksort: The Speed Demon

Quicksort is the algorithm everyone reaches for first, and for good reason: it's fast, it's in-place (with some care), and the average-case O(n log n) hides a very small constant factor. On real hardware, with real cache hierarchies, quicksort tends to smoke the competition.

The idea, from Tony Hoare (1960):

Pick a pivot. Partition the array so everything smaller than the pivot is on the left, everything larger is on the right. Recurse on both halves.

That's it. The entire algorithm is:

quicksort(A, lo, hi):
    if lo < hi:
        p = partition(A, lo, hi)
        quicksort(A, lo, p-1)
        quicksort(A, p+1, hi)

The partition step is where all the magic happens. Hoare's original scheme uses two pointers scanning from opposite ends; Lomuto's scheme scans from one end and is simpler but does more swaps. Both are O(n) per level of recursion.

Why Quicksort Wins in Practice

Three reasons:

Cache-friendly. The partition step does sequential scans through contiguous memory. This is exactly what modern CPU caches want. Mergesort also scans sequentially but needs extra memory; heapsort jumps around like a caffeinated kangaroo.

Small constant factor. The inner loop of quicksort is just a few comparisons and pointer increments. No function calls, no extra allocations.

It can be tuned to death. Median-of-three pivot selection, switching to insertion sort for small subarrays, three-way partitioning for duplicate-heavy data — these tricks make a real difference.

The Ugly Part: Worst-Case O(n²)

Pick a bad pivot every time (e.g., always the smallest element), and quicksort degenerates to O(n²). The recursion tree becomes a stick. For already-sorted data with naive pivot selection (always pick first or last element), this is exactly what happens. Oops.

Median-of-three pivot selection (median of first, middle, and last elements) mostly fixes this for common patterns. Randomized pivot selection makes the worst case exceedingly unlikely. And introsort (coming up) puts a hard safety net under quicksort by detecting deep recursion and switching to heapsort.

Three-Way Partitioning (Dutch National Flag)

When the data has lots of duplicate keys (think: sorting by an integer category with only 10 possible values), standard quicksort does unnecessary work swapping equal elements. The Dutch National Flag partitioning (Dijkstra's gem) splits into three regions: < pivot, = pivot, > pivot. Then you only recurse on the < and > regions.

partition3(A, lo, hi):
    lt = lo, gt = hi, i = lo
    pivot = A[lo]
    while i <= gt:
        if A[i] < pivot:      swap(A[lt++], A[i++])
        else if A[i] > pivot: swap(A[gt--], A[i])
        else:                 i++
    return lt, gt

This makes quicksort effectively O(n) for arrays with a constant number of distinct values. Beautiful.

Heapsort: The Reliable Workhorse

If quicksort is the hot-rod with occasional engine trouble, heapsort is the diesel truck that always gets you there. Guaranteed O(n log n) worst-case, in-place, no recursion depth worries.

Heapsort (J. W. J. Williams, 1964) works in two phases:

Heapify: Turn the array into a max-heap. This is O(n) — yes, building a heap is linear time, which is counterintuitive but true.
Extract: Repeatedly swap the root (maximum) with the last element of the heap, shrink the heap, and sift down. O(n log n).

heapsort(A):
    heapify(A)                    // build max-heap
    for i = n-1 down to 1:
        swap(A[0], A[i])
        sift_down(A, 0, i)       // restore heap property

Why Heapsort Doesn't Win the Drag Race

Despite the asymptotic guarantee, heapsort is usually 2-3x slower than quicksort on random data. The culprit is poor cache locality. The sift-down operation jumps across the array (parent to child, child to grandchild) in a way that thrashes the cache. The heap is a tree stored in an array, and tree traversal is not cache-friendly.

Also, heapsort does more comparisons per element than mergesort or quicksort. The heap property is weaker than full sorting, so sift-down has to compare with both children at each level.

But heapsort's guarantee makes it the perfect fallback. Which brings us to...

Introsort: Best of Both Worlds

Introsort is the practical answer to "quicksort or heapsort?" — it's both, with a heuristic to switch.

Developed by David Musser in 1997, introsort starts with quicksort (using median-of-three pivot). It tracks the recursion depth. If the depth exceeds 2⌊log₂ n⌋ — indicating quicksort is having a bad day — it switches to heapsort for the remaining subarray. This gives you:

Quicksort's average-case speed
Heapsort's worst-case guarantee
No O(n²) pathological cases

Most standard library sorts (C++ std::sort, .NET's Array.Sort) use introsort under the hood. The STL implementation is honestly a work of art — quicksort with median-of-three, switching to insertion sort when n < 16, falling back to heapsort when recursion gets deep. All in-place.

Mergesort: The Stability King

Mergesort is the algorithm you use when stability matters (more on why that matters later) or when you're sorting data that doesn't fit in memory.

The idea is pure divide-and-conquer:

mergesort(A):
    if len(A) <= 1: return
    mid = len(A) / 2
    left = mergesort(A[0:mid])
    right = mergesort(A[mid:])
    return merge(left, right)

The merge step takes two sorted arrays and interleaves them. It's O(n), and the recurrence T(n) = 2T(n/2) + O(n) resolves to O(n log n).

The Cost of Stability

Mergesort requires O(n) extra space (for the merge buffer). This is the main reason it's not the default — allocating a scratch array of size n is not free, and on memory-constrained systems it can be a dealbreaker. There are in-place merge algorithms, but they're complicated and slow (O(n log² n) typically, or O(n log n) with heroic engineering).

External Sorting

Mergesort is the foundation of external sorting — sorting data that doesn't fit in RAM. You sort chunks that fit in memory (runs), write them to disk, then do a k-way merge. This is how databases sort terabytes of data. The merge pattern maps beautifully to sequential disk I/O.

Parallel Mergesort

Mergesort parallelizes trivially — the two recursive calls are independent. On a shared-memory machine, you fork-join the two halves and merge at the end. The merge step itself can be parallelized with parallel prefix sums. This is why mergesort dominates GPU sorting and many parallel sorting benchmarks.

When n Is Small: Insertion Sort Crossover

Here's a dirty secret of practical sorting: for small n (n < ~30), insertion sort beats everything. Its constant factor is tiny — no recursion overhead, no extra allocations, just a tight loop of comparisons and shifts. On modern CPUs, insertion sort on 20 elements might complete in the time it takes quicksort to set up its first recursive call.

Every serious sorting implementation uses this. The pattern:

if n < THRESHOLD:
    insertion_sort(A)
else:
    quicksort(A)  // or mergesort, or introsort

Typical thresholds are 16–32. Below that, the O(n²) asymptotic behavior is irrelevant — the constant factor dominates and insertion sort wins.

Shellsort: The Dark Horse

Shellsort (Donald Shell, 1959) is insertion sort on steroids. Instead of comparing adjacent elements, you compare elements gap distance apart. You start with a large gap and shrink it. The final pass (gap = 1) is regular insertion sort, but by then the array is almost sorted, which is insertion sort's best case.

The key question: what gap sequence? Shell's original was n/2, n/4, ..., 1 — but that's not great. Better sequences (Hibbard: 2ᵏ−1, Sedgewick: 4ᵏ + 3·2ᵏ⁻¹ + 1) give O(n^(4/3)) or better. The exact worst-case complexity of Shellsort is still an open problem for many sequences. That's kind of amazing — we've been analyzing sorting algorithms for 60+ years and there are still open questions.

Shellsort is in-place, simple to code, and surprisingly competitive for moderate n. It's not stable, but neither is quicksort.

Non-Comparison Sorts: When Keys Are Cheap

If your keys have special structure, you can break the Ω(n log n) barrier. These algorithms are O(n) but only work on restricted key types.

Counting Sort

When keys are integers in a small range [0, k), counting sort is unbeatable:

counting_sort(A, k):
    count = array of zeros, size k
    for x in A: count[x]++
    // prefix sum to get positions
    for i = 1..k-1: count[i] += count[i-1]
    // place elements
    for x in reversed(A):
        output[--count[x]] = x

O(n + k) time, O(k) extra space. Stable. The iteration in reverse is what makes it stable — it places elements with the same key in their original relative order.

Radix Sort

Radix sort extends counting sort to larger key ranges by sorting digit by digit. For 32-bit integers, you do 4 passes with 8-bit "digits" (or 8 passes with 4-bit nibbles). Each pass is O(n). Total: O(d·n) where d is the number of digits.

The clever part: you must sort least significant digit first (LSD radix sort) for the stability of counting sort to carry through and produce a correctly sorted result. Sort by the last digit, then the second-to-last, and so on. Each pass preserves the order established by previous passes.

For floating-point radix sort, there's a neat trick: IEEE 754 floats can be sorted by interpreting their bits as integers. The sign-magnitude representation means negative numbers need special handling (flip the bits for negatives, flip sign bit for positives), but the rest works. Radix sort on floats is actually faster than comparison-based sorts for large arrays on modern hardware — GPUs love it.

Bucket Sort

If your keys are uniformly distributed in [0,1), bucket sort splits the range into n buckets, scatters elements, sorts each bucket (usually with insertion sort), and concatenates. Average case O(n), but degenerates if the distribution is skewed. This is mainly useful for floating-point data where you know the distribution.

Stability: Why You Should Care

A sort is stable if equal elements retain their relative order from the input. This matters more than people realize:

Sorting a spreadsheet by column B, then by column A: you want the B-order preserved within equal-A groups. Stable sort gives you this for free with two passes (sort by B, then stable-sort by A).
Radix sort requires a stable counting sort for each digit pass.
Numerical pipelines: if you've already grouped data and want to re-sort by a secondary key without destroying the grouping.

Stability matrix for the main algorithms:

Algorithm	Stable?	In-Place?
Quicksort	No (standard)	Yes
Heapsort	No	Yes
Mergesort	Yes	No (O(n) extra)
Introsort	No	Yes
Insertion sort	Yes	Yes
Shellsort	No	Yes
Counting sort	Yes	No (O(k) extra)
Radix sort	Yes	No (O(n) extra)
Timsort	Yes	No (O(n) extra)

Timsort (Python's default, also used in Java for objects) is a stable hybrid mergesort that exploits existing runs in real-world data. If the data is partially sorted, Timsort detects runs and merges them — adaptive to the extreme. It's named after Tim Peters, who wrote it for Python in 2002, and it's a masterpiece of pragmatic algorithm design.

Partial Sorting: You Probably Don't Need All of It

Here's something they don't tell you in CS 101: most of the time, you don't need the entire array sorted.

If you only need the top k elements, or the k smallest, or elements in a certain percentile range — do not sort the whole array. That's O(n log n) to compute something you could get in O(n + k log k) or even O(n).

Heap-Based Partial Sort

To get the top k elements: build a min-heap of the first k elements, then iterate through the rest. If the next element is larger than the heap minimum, pop and push. At the end, the heap contains the top k. This is O(n log k) — a huge win when k ≪ n.

std::nth_element (Quickselect Partitioning)

C++'s std::nth_element rearranges the array so that the element at position k is exactly what would be there if the array were sorted, everything before it is ≤ it, and everything after is ≥ it. It uses a selection algorithm (introselect — see next section). O(n) average, O(n) worst case in the standard library implementation. This is absurdly useful — for computing medians, quartiles, or trimming outliers, it's the right tool and it's O(n).

std::partial_sort

If you need the first k elements in sorted order: std::partial_sort does heap-based partial sort. O(n log k). Use this when you need both the selection AND the ordering.

Selection: Finding Order Statistics

Now we get to the really good stuff. Selection is the problem of finding the k-th smallest element in an unsorted array. Special cases: minimum (k=0), maximum (k=n−1), median (k=n/2).

The naive approach — sort, then index — is O(n log n). But we can do O(n). And the algorithms for doing so are gorgeous.

Quickselect (Hoare, 1961)

Quickselect is to quicksort what std::nth_element is to std::sort. It uses the same partition step, but only recurses on one side — the side that contains the k-th position.

quickselect(A, lo, hi, k):
    if lo == hi: return A[lo]
    p = partition(A, lo, hi)     // p is pivot's final position
    if k == p:      return A[p]
    else if k < p:  return quickselect(A, lo, p-1, k)
    else:           return quickselect(A, p+1, hi, k)

Average case: n + n/2 + n/4 + ... = O(n). This is honestly kind of beautiful — the geometric series sums to 2n.

But, like quicksort, quickselect has a worst case of O(n²) with bad pivot choices. The same median-of-three trick helps, but doesn't fully eliminate the problem.

Median of Medians (BFPRT, 1973)

This is the algorithm that proves selection can be done in guaranteed O(n) worst case. Blum, Floyd, Pratt, Rivest, and Tarjan published it in 1973. The key insight: instead of picking a random pivot, spend O(n) time picking a guaranteed good pivot.

The algorithm:

Divide the array into groups of 5.
Find the median of each group (trivial — sort 5 elements with ~7 comparisons).
Recursively find the median of these ⌈n/5⌉ medians. Call it M.
Use M as the pivot for partition.
Recurse on the appropriate side.

Why does this work? M is greater than roughly ⌈n/10⌉ medians, each of which is greater than 3 elements in its group. So M is greater than at least 3n/10 elements. Similarly, M is less than at least 3n/10 elements. The recursion can throw away at least 30% of elements each time.

The recurrence: T(n) ≤ T(⌈n/5⌉) + T(7n/10 + 6) + O(n).

This resolves to T(n) = O(n). The constant factor is large (~20-30x quickselect's average case), but it's linear.

Honestly, the fact that this exists is kind of miraculous. Five authors took on what seemed like an intractable problem and solved it with a beautifully simple (if somewhat gnarly to implement) algorithm.

Introselect

Just like introsort, introselect (Musser, 1997) combines quickselect with median-of-medians:

Start with quickselect.
Track a recursion-depth counter.
If quickselect is making insufficient progress, switch to median-of-medians.

This gives you the speed of quickselect in the average case with the O(n) guarantee in the worst case. This is what C++ std::nth_element uses under the hood (in most implementations).

Floyd-Rivest Selection (1975)

Floyd and Rivest published an algorithm that achieves roughly n + min(k, n−k) + O(√n) comparisons in the average case — which is essentially optimal. It uses recursive sampling: pick a small random sample, find bounds in the sample that bracket the target, then partition only within those bounds.

The practical performance of Floyd-Rivest is outstanding — it consistently uses fewer comparisons than quickselect. Some benchmarks show it beating quickselect by 20-30% on large arrays. The code is a bit more involved, but if you're selecting medians of billion-element arrays, it's worth the complexity.

Heap-Based Selection

For the special case of finding the k smallest (or largest) elements, heap methods are hard to beat:

k smallest: Build a max-heap of k elements. For each remaining element, if it's smaller than the heap max, pop and push. O(n log k).
k largest: Same idea with a min-heap.

When k is small (like finding the top 10 items), this is essentially O(n). When k = n/2 (median), it's O(n log n) — worse than quickselect. Know your k.

The Numerical Angle: Floating-Point Considerations

Sorting floating-point numbers has some gotchas that don't come up with integers.

NaN Handling

IEEE 754 NaN is unordered — all comparisons with NaN return false (except ≠, which returns true). This means NaN can end up anywhere in a sort, and it breaks the comparison model. Most standard sorts handle NaN by either:

Throwing it to one end (beginning or end)
Filtering it out first
Requiring the user to specify a NaN policy

In C++, std::sort with the default comparator will put NaN at the end (because a < b returns false for NaN). But if you use std::less{}, behavior is technically undefined for NaN. Fun times.

Signed Zero

IEEE 754 has both +0 and −0. They compare equal (+0 == −0 is true), but they have different bit patterns. In a stable sort, +0 and −0 retain their relative order. In an unstable sort, the result is arbitrary. This matters if you're sorting by one key and then using another key that distinguishes signed zeros.

Numerical Sensitivity of Pivot Selection

When computing median-of-three for quicksort, floating-point comparisons are exact for most values. But if you're sorting values that span many orders of magnitude, cancellation in pivot computation (e.g., averaging for approximate median) can cause issues. Don't do that. Use comparisons, not arithmetic, for pivot selection.

Partial Sorting for Robust Statistics

The median and quartiles are robust to outliers — that's their whole point. But computing them requires selection, not full sorting. If you're doing robust regression or outlier rejection, you're doing a lot of median computations. Use std::nth_element or quickselect. Your O(n) selection will dominate an O(n log n) sort every time.

Sorting for Numerical Pipeline Efficiency

A sorted array processes faster in many numerical pipelines:

Spatial locality: Sorted coordinates improve kd-tree build time and query performance.
Branch prediction: Binary search on sorted data has predictable branch patterns.
Vectorization: Processing sorted data means consecutive elements are closer in value, which can improve numerical stability of reductions (summing sorted positive numbers from smallest to largest minimizes floating-point error).

That last point deserves emphasis: summing an array of positive floats in sorted order (smallest to largest) minimizes roundoff error. The partial sums stay closer in magnitude to new terms. This is a classic numerical analysis trick that most people don't think about.

Algorithm Cheat Sheet

Algorithm	Best	Average	Worst	Space	Stable?
Quicksort	O(n log n)	O(n log n)	O(n²)	O(log n)	No
Heapsort	O(n log n)	O(n log n)	O(n log n)	O(1)	No
Mergesort	O(n log n)	O(n log n)	O(n log n)	O(n)	Yes
Introsort	O(n log n)	O(n log n)	O(n log n)	O(log n)	No
Timsort	O(n)	O(n log n)	O(n log n)	O(n)	Yes
Insertion	O(n)	O(n²)	O(n²)	O(1)	Yes
Shellsort	O(n log n)	depends on gaps	depends on gaps	O(1)	No
Counting	O(n+k)	O(n+k)	O(n+k)	O(k)	Yes
Radix (LSD)	O(d·n)	O(d·n)	O(d·n)	O(n+k)	Yes
Quickselect	O(n)	O(n)	O(n²)	O(1)	No*
BFPRT (MoM)	O(n)	O(n)	O(n)	O(log n)	No*
Floyd-Rivest	O(n)	O(n)	O(n²)	O(log n)	No*
Heap select	O(n)	O(n log k)	O(n log k)	O(k)	No

* In-place but rearranges the array

References

Hoare, C.A.R. "Quicksort." The Computer Journal, 1962.
Hoare, C.A.R. "Algorithm 65: Find." Communications of the ACM, 1961. (Quickselect)
Williams, J.W.J. "Algorithm 232: Heapsort." Communications of the ACM, 1964.
Musser, David. "Introspective Sorting and Selection Algorithms." Software: Practice and Experience, 1997.
Blum, Floyd, Pratt, Rivest, Tarjan. "Time Bounds for Selection." JCSS, 1973.
Floyd, R.W. and Rivest, R.L. "Expected Time Bounds for Selection." Communications of the ACM, 1975.
Peters, Tim. "Timsort." Python source: listsort.txt, 2002.
Sedgewick, Robert. "Analysis of Shellsort and Related Algorithms." ESA, 1996.