From charlesreid1

Volume 3

Chapter 5: Combinatorics

Multiset Permutations

Suppose we have a multiset

M = \{ a, a, a, b, b, c, d, d, d, d\} = \{ 3 \cdot a, 2 \cdot b, c, 4 \cdot d \}

and we wish to determine the number of permutations of this multiset into words of the form "aaabbcdddd".

If each character were distinct, we have 10 characters, so there would be 10! configurations. But many of these are duplicated. To be more precise, for the case of a, where there are 3 a's, there will be exactly 3! configurations that are duplicates. So, we can eliminate these by dividing 10! by 3!. Next, for b, there will be exactly 2! configurations that are duplicates. And so on.

Then we can determine the number of permutations of M via:

N_{perm} = \dfrac{10!}{3! 2! 1! 4!}

In general, the number of permutations of a multiset is the multinomial coefficient defined as

\binom{n}{n_1, n_2, \dots} = \dfrac{n!}{n_1! n_2! \dots}

See AOCP/Binomial Coefficients and AOCP/Multinomial Coefficients.

Knuth notes that these were known in ancient times:

The Hebrew Book of Creation (300 AD), which was the earliest literary product of Jewish philosophical mysticism, gives the correct values of the first seven factorials, after which it says "Go on and obtain numbers which the mount cannot express and the ear cannot hear."

- Knuth, AOCP V3 Section 5.1

First appearance of permutations of multisets were in 1150, Lilavati of Bhascara Acharya.

Interesting note: he gives the formula for the sum of the permutations of this multiset as:

\dfrac{(4+8+5+5+5) \cdot 120 \cdot 11111}{5 \cdot 6} = 48555 + 45885 + \dots

Foata (1969) provided an extension called the "intercalation product" that makes it possible to extend ideas and results from sets to multisets fairly easily.

First, assume the elements of the multiset are ordered in some way. Now consider the two-line notation:

  • The top line will contain the elements of M in non-decreasing order
  • The bottom line will contain the permutation itself

The intercalation product a T b of the top and bottom lines is done by expressing a and b in the two-line notation, then sorting columns into STABLE, non-decreasing order of the top line. Stable means, if there is a tie on the top line, break the tie using the original left-to-right order of the bottom row.

Example: the intercalation of these two multisets c a d a b and b d d a d would be:

a a b c d
c a d a b

a b d d d
b d d a d

a a a b b c d d d d
c a b d d a b d a d

Complicated Multiset Example

Suppose we have a string of characters a, b, c that has exactly:

  • A occurrences of a
  • B occurrences of b
  • C occurrences of c
  • k occurrences of adjacent pair ca
  • l occurrences of adjacent pair cb
  • m occurrences of adjacent pair ba

This can be thought of as two-line arrays of the form:

A                       B            		C
a ... a                	b ... b             c ... c
\______/                \______/           \_______/
(A - k - m) a's         (m a's)             (k a's)
\_____________________________/            \______/
           (B - l) b's                     (l b's)
                    C c's

When we use this representation, we can turn each row (corresponding to each symbol a, b, c) into an umber of possible ways to permute that letter.

enumeration of a

In the second line, the a's are enumerated as the product of three terms;

The first term represents the number of ways that we can distribute A a's among (A - k - m) positions (the positions in which we are free to select what comes after a):


The second term represents our choice of B b's to place in m slots (next to m a's, for the m 'ab' combinations):


The third term comes from our choice of C c's to place in k slots (next to k a's, for the k 'ac' combinations):


This gives the total number of possibilities for A:

\binom{A}{A-k-m} \binom{B}{m} \binom{C}{k}

enumeration of b

In the third row, the b's can be placed in the remaining positions as our choice of placing B b's plus k combinations "ac" into B-l slots:


(This is a little magical - I would have used B in the top, not B+k. Not sure how we know to do that, since Knuth does not explain.)

Next, we have a term representing our choice of C-k remaining c characters to couple with b characters to form l pairs of 'bc' combinations:


This brings the enumeration of ways to arrange the bs to:

\binom{B+k}{B-l} \binom{C-k}{l}

enumeration of c

Tthe remaining positions that are still vacant must be filled by c's,


(Again, some magic happening here - why does Knuth use C, instead of (C - k - l)?

total enumerations

This gives the total number of arrangements:

\binom{A}{A-k-m} \binom{B}{m} \binom{C}{k} \binom{B+k}{B-l} \binom{C-k}{l}

Another Multiset Example

Let's look at another multiset example.

Suppose we have a multiset M = \{ A \cdot a, B \cdot b, C \cdot c \}.

Now let N(A, B, C, m) denote the number of permutations of M whose two-line representations contain NO columns of the form

a b c
a b c

and contains exactly m columns of the form


Then it follows that there are exactly A columns of the form


exactly B-m columns of the form


exactly C - B + m of the form


exactly C-A+m of the form


and exactly A + B - C - m of the form


Combining these facts, we get:

N(A,B,C,m) = \binom{A}{m} \binom{B}{C-A+m} \binom{C}{B-m}

Now, we can count these permutations in another way: since we have excluded columns where the two rows are identical, the only possible prime factors of the permutation are (a b), (a c), (b c), (a b c), (a c b). These each have one letter in common, so the factorization is unique.

Therefore, if the cycle (a b c) occurs k times in the factorization, this implies that (a b) occurs m-k times, (b c) occurs C - A + m - k times, (a c) occurs C - B + m - k times, and (a c b) occurs A + B - C - 2m + k times.

This allows us to put N(A,B,C,m) in terms of a multinomial coefficient:

N(A,B,C,m) = \sum_{k}{ \dfrac{(C+m-k)!}{ (m-k)!(C-A+m-k)!(C-B+m-k)!k!!(A+B-C-2m+k)!} }

and this simplifies to

N(A,B,C,m) = \sum_{k}{ \binom{m}{k} \binom{A}{m} \binom{A-m}{C-B+m-k} \binom{C+m-k}{A} }

From this, Knuth begins to do a bunch of manipulation of binomial coefficients. Specifically, he mentions the following identity:

\sum_{j} \binom{M-R+S}{j} \binom{N+R-S}{N-j} \binom{R+j}{M+N} = \binom{R}{M} \binom{S}{N}

with M = A+B-C-m, N = C-B+m, R = B, S = C, j = C - B + m - k.

Third Example

Exercise 13 (rated M21), Section 5.1.2 (Permutations of a Multiset), states: Prove that the number of permutations of M = \{ A \cdot a, B \cdot b, C \cdot c, D \cdot d, E \cdot e, F \cdot f \} containing no occurrences of the adjacent pairs of letters 'ca' and 'db' is:

\sum_{t} \binom{D}{A-t} \binom{A+B+E+F}{t} \binom{A+B+C+E+F-t}{B} \binom{C+D+E+F}{C,D,E,F}


AOCP/Binomial Coefficients

AOCP/Multinomial Coefficients