String Permutations: Difference between revisions
From charlesreid1
| Line 127: | Line 127: | ||
==Generating== | ==Generating== | ||
The task of actually generating all of the permutations of words that contain a certain set of characters is an extremely important one. Knuth covers this in Volume 4 Facsimile 2 of his [[Art of Computer Programming]] | The task of actually generating all of the permutations of words that contain a certain set of characters is an extremely important one. Knuth covers this in Volume 4 Facsimile 2 of his [[AOCP|Art of Computer Programming]] volumes. | ||
We can generate permutations in lexicographic order, or sorted order | We can generate permutations in lexicographic order, or sorted order | ||
Revision as of 11:07, 6 August 2017
Skiena Chapter 7
Question 14
Write a function to find all permutations of the letters in a particular string (example: HELLO).
Enumerating
Enumerating - Easy Case
Let's start by counting the total number of permutations of letters in a particular string.
Consider the simplest case, no repeated letters: ABCD
Each position can hold any of the four letters, but we can only choose a letter once. So the total number of permutations of a k-letter string with no repeated letters is $ k! $
Mathematically, we are picking 4 distinct objects, and we are picking all 4 objects at the same time. In other words, 4 pick 4.
This leads to a total number of possible strings:
$ P(4,4) = \dfrac{4!}{(4-4)!} = \dfrac{4!}{1} = 4! $
We are using pick, not choose, because order matters. Including a 4! on the bottom would indicate that order does not matter, and would lead to one possible outcome (obvious - any permutation with the letters ABCD will always be the same set of letters, but in different order, so there is only one outcome).
Enumerating - Multiset Case
Now consider the less trivial case where we have repeated letters: AAABCDD
To count the number of permutations of a multiset, we should use multinomial coefficients. (Note: see AOCP/Multisets and AOCP/Multinomial Coefficients).
Mathematically, this problem is what is known as a multiset problem, where we have a set of letters each occurring with some frequency:
$ \{ a \cdot A, b \cdot B, c \cdot C, d \cdot D \} $
In this case we can write the number of possible permutations by placing each character, one at a time.
Start by placing the A characters. The number of slots to place the A characters is (a) choose (number of slots):
$ \binom{a+b+c+d}{a} $
Once the As have been placed, we can enumerate the B characters The number of slots to place the B characters is (b) choose (number of slots, less A characters):
$ \binom{b+c+d}{b} $
Next, the Cs can be placed, where the number of slots to place the C characters is (c) choose (number of slots, less A and B characters):
$ \binom{c+d}{c} $
and finally, all of the other characters are placed, leaving the Ds with a single remaining configuration:
$ \binom{d}{d} $
Now, the total number of permutations is the product of each of these. The number of configurations of a multiset is given by the multinomial coefficient
$ \binom{n}{a, b, c, d} = \binom{n}{a} \binom{n-a}{b} \binom{n-a-b}{c} \binom{n-a-b-c}{d} = \dfrac{n!}{a! b! c! d!} $
More generally, for a multiset with k objects (denoted a) occurring a certain number of times (denoted r),
$ \{ r_1 \cdot a_1, r_2 \cdot a_2, \dots, r_k \cdot a_k \} $
then if there are n total slots $ r_1 + r_2 + \dots + r_k = n $, the total number of permutations is given by the multinomial coefficient,
$ \binom{n}{r_1, r_2, \dots, r_k} = \binom{n}{r_1} \binom{n-r_1}{r_2} \binom{n - r_1 - r_2}{r_3} \dots $
This can be expressed more concisely in terms of factorials:
$ \binom{n}{r_1, r_2, \dots, r_k} = \dfrac{n!}{r_1! r_2! \dots r_k!} $
Enumerating - Infinite Letters Case
Suppose we have an infinite pool of characters, and we wish to know the number of words of a given length that can be formed using these infinite pools of letters.
We can still treat this as a multiset problem, but this time the number of characters is infinite. This multiset is denoted:
$ \{ \infty \cdot A, \infty \cdot B, \infty \cdot C, \infty \cdot D \} $
In this case, the number of permutations of length n that can be formed from the 4 different characters is given by the simple expression,
$ n^4 $
More generally, if we have an infinite multiset with k distinct elements,
$ \{ \infty \cdot r_1, \infty \cdot r_2, \dots, \infty \cdot r_k \} $
then the number of words of length n that can be formed is given by:
$ n^k $
Enumerating - Stars and Bars
The stars and bars theorem/approach is another way to look at assembling/enumerating permutations.
If we have n objects being placed into k partitions, we can think of this as placing k-1 bars among the n objects.
In this case, the total number of slots for objects + bars is n + k - 1, and we are placing k - 1 bars, so we have the total number of outcomes given by
$ \binom{n+k-1}{k-1} $
Generating
The task of actually generating all of the permutations of words that contain a certain set of characters is an extremely important one. Knuth covers this in Volume 4 Facsimile 2 of his Art of Computer Programming volumes.
We can generate permutations in lexicographic order, or sorted order
We can generate permutations in de Bruijn cycles, where we remove an item from the front and add an item to the rear
We can generate permutations using binary reflected Gray code, where we change only a single item at a time
Excellent simple algorithm here: http://webhome.csc.uvic.ca/~haron/CoolMulti.pdf