String Permutations
From charlesreid1
Contents
Skiena Chapter 7
Question 14
Write a function to find all permutations of the letters in a particular string (example: HELLO).
Enumerating
Enumerating - Easy Case
Let's start by counting the total number of permutations of letters in a particular string.
Consider the simplest case, no repeated letters: ABCD
Each position can hold any of the four letters, but we can only choose a letter once. So the total number of permutations of a k-letter string with no repeated letters is
Mathematically, we are picking 4 distinct objects, and we are picking all 4 objects at the same time. In other words, 4 pick 4.
This leads to a total number of possible strings:
We are using pick, not choose, because order matters. Including a 4! on the bottom would indicate that order does not matter, and would lead to one possible outcome (obvious - any permutation with the letters ABCD will always be the same set of letters, but in different order, so there is only one outcome).
Enumerating - Multiset Case
Now consider the less trivial case where we have repeated letters: AAABCDD
To count the number of permutations of a multiset, we should use multinomial coefficients. (Note: see AOCP/Multisets and AOCP/Multinomial Coefficients).
Mathematically, this problem is what is known as a multiset problem, where we have a set of letters each occurring with some frequency:
In this case we can write the number of possible permutations by placing each character, one at a time.
Start by placing the A characters. The number of slots to place the A characters is (a) choose (number of slots):
Once the As have been placed, we can enumerate the B characters The number of slots to place the B characters is (b) choose (number of slots, less A characters):
Next, the Cs can be placed, where the number of slots to place the C characters is (c) choose (number of slots, less A and B characters):
and finally, all of the other characters are placed, leaving the Ds with a single remaining configuration:
Now, the total number of permutations is the product of each of these. The number of configurations of a multiset is given by the multinomial coefficient
More generally, for a multiset with k objects (denoted a) occurring a certain number of times (denoted r),
then if there are n total slots , the total number of permutations is given by the multinomial coefficient,
This can be expressed more concisely in terms of factorials:
Enumerating - Infinite Letters Case
Suppose we have an infinite pool of characters, and we wish to know the number of words of a given length that can be formed using these infinite pools of letters.
We can still treat this as a multiset problem, but this time the number of characters is infinite. This multiset is denoted:
In this case, the number of permutations of length n that can be formed from the 4 different characters is given by the simple expression,
More generally, if we have an infinite multiset with k distinct elements,
then the number of words of length n that can be formed is given by:
Enumerating - Stars and Bars
The stars and bars theorem/approach is another way to look at assembling/enumerating permutations.
If we have n objects being placed into k partitions, we can think of this as placing k-1 bars among the n objects.
In this case, the total number of slots for objects + bars is n + k - 1, and we are placing k - 1 bars, so we have the total number of outcomes given by
Generating
The task of actually generating all of the permutations of words that contain a certain set of characters is an extremely important one. (Knuth covers this exclusively in Volume 4 Facsimile 2 of his Art of Computer Programming volumes.)
We can generate permutations in several ways:
- lexicographic order, or sorted order
- de Bruijn cycles, where we remove an item from the front and add an item to the rear
- binary reflected Gray code, where we change only a single item at a time
- Cool-lexicographic order (see http://webhome.csc.uvic.ca/~haron/CoolMulti.pdf)
Cool-Lexicographic Order
This is a nice simple algorithm that implements an iterative, non-recursive, loopless permutation generator.
Algorithm:
- Visits every permutation of integer multiset E
- Call to init(E) creates singly linked list that stores elements of E in non-increasing order
- head, min, and inc point to first, second to last, and last nodes, respectively
- All variables are pointers
- phi is null
- If E = {1,1,2,4}, then first three visit(E) calls will produce the configurations:
- 4 2 1 1
- 1 4 2 1
- 4 1 2 1
Notes on variables:
- head points to first node of current permutation
- i points to ith node of current permutation
- afteri points to the (i+1)st node of current permutation
to apply the prefix shift of length k, the two pointers k and beforek are used:
- k points to the kth node of the current permutation
- beforek points to the (k-1)st node of the current permutation
Note that the visit(head) call denotes a new permutation that has been generated
- visit(head) happens when new permutation pointed to by head
- Represents passing control back to consumer requesting permutations
- Work in init(E) would also be done by consumer
When does the loop stop?
- Condition on loop ensures algorithm continues until it generates tail(E)
- Rather than generating tail(E) as separate case after oop ends, it initializes linked list to tail(E)
- This is the very first permutation visited
Here is the algorithm:
[head, i afteri] <- init(E) visit(head) while (afteri.next != null or afteri.value < head.value) do: if (afteri.next != null and i.value >= afteri.next.value): beforek <- afteri else beforek <- i end k <- beforek.next beforek.next <- k.next k.next <- head if (k.value < head.value): i <- k end afteri <- i.next head <- k visit(head) end
Flags
Combinatorics
Combinatorial Structures - Order Does Not Matter Ordinary generating functions
Labelled Structures - Order Matters Enumerating Permutations: String Permutations Generating Permutations: Cool · Algorithm M (add-one) · Algorithm G (Gray binary code)
Combinatorics Problems Longest Increasing Subsequence · Maximum Value Contiguous Subsequence · Racing Gems Cards (poker hands with a deck of 52 playing cards)
|
Algorithms Part of Computer Science Notes
Series on Algorithms
Algorithms/Sort · Algorithmic Analysis of Sort Functions · Divide and Conquer · Divide and Conquer/Master Theorem Three solid O(n log n) search algorithms: Merge Sort · Heap Sort · Quick Sort Algorithm Analysis/Merge Sort · Algorithm Analysis/Randomized Quick Sort
Algorithms/Search · Binary Search · Binary Search Modifications
Algorithms/Combinatorics · Algorithms/Combinatorics and Heuristics · Algorithms/Optimization · Divide and Conquer
Algorithms/Strings · Algorithm Analysis/Substring Pattern Matching
Algorithm complexity · Theta vs Big O Amortization · Amortization/Aggregate Method · Amortization/Accounting Method Algorithm Analysis/Matrix Multiplication
Estimation Estimation · Estimation/BitsAndBytes
Algorithm Practice and Writeups Project Euler · Five Letter Words · Letter Coverage
|