Notes

Knuth AOCP Volume 3: Sorting and Searching: Combinatorics

Permutations and Inversions

Knuth begins talking about sorting by talking about combinatorics and permutations of items.

Let ${\displaystyle a_{1}a_{2}a_{3}\dots a_{n}}$ be a permutation of the integers ${\displaystyle {1\dots n}}$.

If ${\displaystyle i and ${\displaystyle a_{i}>a_{j}}$, then ${\displaystyle (a_{i},a_{j})}$ is an inversion.

Inversions are out-of-sorts pairs.

Cramer (1750) introduced inversions - utilized to find determinant.

Can also construct an inversion table:

Let ${\displaystyle b_{1}b_{2}\dots b_{n}}$ denote the inversion table of ${\displaystyle a_{1}a_{2}\dots a_{n}}$. Then ${\displaystyle b_{j}}$ is the number of elements to the left of j that are greater than j.

Example of sequence and its inversion table:

5 9 1 8 2 6 4 7 3
2 3 6 4 0 2 2 1 0


${\displaystyle 0\leq b_{1}\leq n-1,0\leq b_{2}\leq n-2,\dots ,0\leq b_{n-1}\leq 1}$

Hall (1956) showed that inversion tables uniquely determine permutations - these make inversion tables alternative representations for different permutations.

Transformation technique: turn counting problems into inversion table problems

Now, suppose we want to count number of elements larger than their successor. (This is the number of j such that ${\displaystyle b_{j}=n-j}$).

Note that this idea is related to nested for loops:

for(int i = 0; i < N; i++ ) {
for(int j = i; j < N; j++) {


We have ${\displaystyle b_{j}=n-j}$

Since we know the probability that b1 equals n-1 is ${\displaystyle P(b_{1}=n-1)={\frac {1}{n}}}$,

and independently the probability that b2 equals n-2 is ${\displaystyle P(b_{2}=n-2)={\frac {1}{n-1}}}$,

and so on, then we can say:

${\displaystyle {\dfrac {1}{n}}+{\dfrac {1}{n-1}}+\dots +1=H_{n}}$

These are the harmonic numbers.

Counting Inversions with Generating Functions

Now, to analyze a sorting algorithm, we are interested in how many permutations of n elements have exactly k inversions. Number of inversions is denoted I, so this number is denoted ${\displaystyle I_{n}(k)}$

To pose this problem slightly differently: we can think of ${\displaystyle I_{n}(k)}$ as a number that is produced by some kind of generating function into which we plug our n and our k, and out pops ${\displaystyle I_{n}(k)}$.

In fact, we can do this by defining an infinite series polynomial whose kth coefficient is precisely ${\displaystyle I_{n}(k)}$.

Also note that ${\displaystyle I_{n}=\left({\binom {n}{2}}-k\right)=I_{n}(k)}$.

We define the generating function ${\displaystyle G_{n}(z)}$ for a sequence containing n elements as:

${\displaystyle G_{n}(z)=I_{n}(0)+I_{n}(1)z+I_{n}(2)z^{2}+\dots =\sum _{k\geq 0}I_{n}(k)z^{k}}$

Now, we know that when we choose a particular item b as the next element of our sequence ${\displaystyle b_{1}b_{2}b_{3}\dots b_{n}}$, that choice is independent of all other b's. Another thing we can observe is, there are two possible ways (two possible cases) for a sequence with n elements and k inversions:

• Either we take a sequence of length n-1 and k inversions, and add an item to it that does not change k;
• Or, we take a sequence of length n and k-1 inversions, and we change an item such that we add an inversion k.

From these, we can construct the recursive relationship:

${\displaystyle I_{n}(k)=I_{n}(k-1)+I_{n-1}(k)\qquad {\mbox{for }}k

Knuth then performs some magic, which he says is "not difficult to see," stating:

${\displaystyle G_{n}(z)=(1+z+\dots +z^{n-1})G_{n-1}(z)}$

and therefore he is able to simplify the generating function to:

${\displaystyle (1+z+\dots +z^{n-1})(\dots )(1+z+z^{2}+z^{3})(1+z+z^{2})(1+z)(1)={\dfrac {(1-z^{n})(\dots )(1-z^{2})(1-z)}{(1-z)^{n}}}}$

Knuth Goes To Outer Space

It is at this point that Knuth launches into a few pages of extremely difficult to follow material. Individual statements are sensible and logical, but how he gets them and where he's going with all of it is completely unclear...

Start by defining the generating function of n, divided by n factorial, as the generating function for the probability distribution of the number of inversions in a random permutation of n elements.

${\displaystyle {\dfrac {G_{n}(z)}{n!}}=g_{n}(z)}$

Further, let us define the function

${\displaystyle h_{k}(z)={\dfrac {(1+z+z^{2}+\dots +z^{k-1})}{k}}}$

This is the generating function for the uniform distribution of a random non-negative integer that is less than k.

Now, we can write g in terms of h:

${\displaystyle g_{n}(z)=h_{1}(z)h_{2}(z)\dots h_{n}(z)}$

Next, Knuth uses the following property:

${\displaystyle E(g_{n})=E(h_{1})+E(h_{2})+\dots +E(h_{n})}$

${\displaystyle Var(g_{n})=Var(h_{1})+Var(h_{2})+\dots +Var(h_{n})}$

(again, completely unclear where he gets this...)