Notes

Skiena Chapter 8

Fibonacci dynamic programming example

Skiena begins the chapter with the Fibonacci example.

Worst way to do the Fibonacci calculation: naive recursion, which recomputes the same subproblems exponentially many times

More efficient way to do the Fibonacci calculation: cache (memoize) the intermediate results, so each value is computed only once

Most efficient way: specify the order of evaluation in such a way that the results that are needed are always available. No caching necessary. It's a bit trivial once you reach this point, but it is still dynamic programming - you're just specifying the order in which to evaluate the sub-problems so that you always have the information you need.
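As a minimal sketch of this bottom-up ordering (my own Python, not from the book): evaluate F(2), F(3), ..., F(n) in order, so the two values each step needs are always at hand.

def fib(n):
    # Bottom-up Fibonacci: evaluate subproblems in increasing order, no cache needed
    if n < 2:
        return n
    prev, curr = 0, 1               # F(0), F(1)
    for _ in range(2, n + 1):       # F(2) .. F(n), each built from the previous two
        prev, curr = curr, prev + curr
    return curr

print(fib(10))   # 55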

Binomial dynamic programming example

Skiena uses the binomial coefficient as an example of a problem with a single closed-form expression, n!/(k!(n-k)!), but evaluating that expression directly causes integer overflow: the intermediate factorials overflow long before the coefficient itself does.

Instead, use the addition identity (Pascal's rule):

C(n, k) = C(n-1, k-1) + C(n-1, k)

and the base cases:

C(n, 0) = 1 and C(n, n) = 1

To implement this, we can build a table/array, which will contain Pascal's triangle...

Rows are labeled by n (down the left column), columns by k (along the top row)

Now the order of operations is to fill in the edges of Pascal's triangle (the base cases) first

  • Initialize column 0 = 1
  • Initialize diagonal = 1

Now loop over each row, and over the interior columns of that row, to compute the binomial coefficients:

for i = 1 to n:
    for j = 1 to i-1:
        bc[i][j] = bc[i-1][j-1] + bc[i-1][j]
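In Python, this fill might look like the following sketch (my own code, not Skiena's C implementation; variable names are illustrative):

def binomial(n, k):
    # Compute C(n, k) by filling in Pascal's triangle row by row
    bc = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        bc[i][0] = 1    # base case: C(i, 0) = 1
        bc[i][i] = 1    # base case: C(i, i) = 1
    for i in range(1, n + 1):
        for j in range(1, i):
            bc[i][j] = bc[i - 1][j - 1] + bc[i - 1][j]
    return bc[n][k]

print(binomial(5, 2))   # 10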

Fuzzy string matching example

The next example Skiena covers is an important one: fuzzy string matching.

We are trying to pattern-match the string P in a larger text T.

To do approximate string matching, we must begin with a cost function. The cost function assigns a cost to each possible modification of our original string.

3 classes of edits:

  • Substitution (turning one letter into another)
  • Insertion
  • Deletion

To find the minimum-cost sequence of edits that turns our pattern P into something in the text T, we work backwards: from the ending point, work backwards one edit at a time.

Maintain a table:

  • D(i,j) is the distance (the number of differences) between the prefix P[1..i] and the segment of T ending at position j
  • D(i,j) is the minimum of the three possible ways to extend the current string
  • In other words, it is the minimum of the following:

Substitution (or match): D(i-1, j-1) if P[i] == T[j], otherwise D(i-1, j-1) + 1

Insert: D(i, j-1) + 1

Delete: D(i-1, j) + 1

Pseudocode for program:

# Non-Dynamic Programming Version of Fuzzy String Matching

def match_cost(a, b):
    # Cost of aligning character a with character b: free if they match, 1 to substitute
    return 0 if a == b else 1

def indel_cost(c):
    # Cost of inserting or deleting the character c
    return 1

def string_compare(s, t, i, j):

    # Base cases: one string is exhausted, so pay for every leftover character
    if i == 0:
        return j * indel_cost(' ')
    if j == 0:
        return i * indel_cost(' ')

    # Option: Match (or substitute)
    opt_match = string_compare(s, t, i-1, j-1) + match_cost(s[i-1], t[j-1])

    # Option: Insert
    opt_insert = string_compare(s, t, i, j-1) + indel_cost(t[j-1])

    # Option: Delete
    opt_delete = string_compare(s, t, i-1, j) + indel_cost(s[i-1])

    # Pick minimum of these
    return min(opt_match, opt_insert, opt_delete)

But this is horribly inefficient. Why? Because it does not memoize: it recomputes the same (i, j) subproblems over and over again, so the running time grows exponentially with the string lengths.
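One quick fix (a sketch of my own, not Skiena's code) is to memoize the recursion, for example with Python's functools.lru_cache, so each (i, j) subproblem is solved only once:

from functools import lru_cache

def string_compare_memo(s, t):
    # Same recursion as above, but each (i, j) pair is cached and solved only once
    @lru_cache(maxsize=None)
    def compare(i, j):
        if i == 0:
            return j    # j insertions remain
        if j == 0:
            return i    # i deletions remain
        opt_match = compare(i-1, j-1) + (0 if s[i-1] == t[j-1] else 1)
        opt_insert = compare(i, j-1) + 1
        opt_delete = compare(i-1, j) + 1
        return min(opt_match, opt_insert, opt_delete)

    return compare(len(s), len(t))

print(string_compare_memo("thou shalt not", "you should not"))   # 5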

Fuzzy string matching dynamic programming example

Define a C struct to hold information for cells. Each cell is a particular D(i,j) location.

Each cell holds a cost and a pointer to its parent cell. Now, in place of a recursive call, we have a table/cell lookup.

There is a fixed number of entries: the size of P times the size of T. The table has one column for each character in T, and one row for each character in P.
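A rough Python equivalent of the cell record and table (a sketch of my own; Skiena's actual code uses a C struct, and the names here are only illustrative):

from dataclasses import dataclass

# The three ways a cell's value can be reached
MATCH, INSERT, DELETE = 0, 1, 2

@dataclass
class Cell:
    cost: int = 0       # D(i, j): cost of the cheapest edit sequence for this subproblem
    parent: int = -1    # which option (MATCH, INSERT, DELETE) produced that cost

def make_table(p, t):
    # One row per character of P and one column per character of T, plus row/column 0
    return [[Cell() for _ in range(len(t) + 1)] for _ in range(len(p) + 1)]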

There are two ways to approach this problem:

  • We can get the edit distance, but not the actual sequence of edits. This is a bit easier.
  • We can get the actual sequence of edits. This is a little harder, but more useful.

To reconstruct the sequence of edits, we can use the cost/parent matrix D(i,j) and work backwards from the goal cell, one parent pointer at a time.

Here are the functions to define:

  • Initialization - this creates the "boundary conditions" or base cases for the dynamic program. This is the starting point that we build on
  • Penalty cost - this is a function that determines the actual cost of each edit (modify, insert, delete). The simplest case assigns a cost of 1, but this can tackle more general problems by making it a function.
  • Goal cell - this is a function that returns the indices of the endpoint cell. Again, the simplest case is the last cell in the table (which has a fixed size), but making this a function makes it more general.
  • Traceback - defining a "trace" action allows printing information about where in the dynamic program we are (for example, printing out the name of each operation, so we can see MMDMDIIMMD to indicate the sequence of edits being made)
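Putting these pieces together, here is a hedged Python sketch of the whole dynamic program. It is not Skiena's C implementation: the function names (edit_distance, goal_cell, reconstruct_path) mirror the roles listed above, and for brevity two parallel matrices (cost and parent) stand in for the cell struct.

MATCH, INSERT, DELETE = "M", "I", "D"

def match_cost(a, b):
    # Penalty for aligning a against b: free if they match, 1 to substitute
    return 0 if a == b else 1

def indel_cost(c):
    # Penalty for inserting or deleting a single character
    return 1

def edit_distance(p, t):
    # Fill the DP table for pattern p vs text t; return the cost and parent tables
    rows, cols = len(p) + 1, len(t) + 1
    cost = [[0] * cols for _ in range(rows)]
    parent = [[None] * cols for _ in range(rows)]

    # Initialization: the boundary conditions / base cases
    for i in range(1, rows):
        cost[i][0] = i              # i deletions to turn p[:i] into ""
        parent[i][0] = DELETE
    for j in range(1, cols):
        cost[0][j] = j              # j insertions to turn "" into t[:j]
        parent[0][j] = INSERT

    for i in range(1, rows):
        for j in range(1, cols):
            options = [
                (cost[i-1][j-1] + match_cost(p[i-1], t[j-1]), MATCH),
                (cost[i][j-1] + indel_cost(t[j-1]), INSERT),
                (cost[i-1][j] + indel_cost(p[i-1]), DELETE),
            ]
            cost[i][j], parent[i][j] = min(options)
    return cost, parent

def goal_cell(p, t):
    # Endpoint of the search: here, simply the last cell of the table
    return len(p), len(t)

def reconstruct_path(parent, i, j):
    # Trace parent pointers back from the goal cell to recover the edit operations
    ops = []
    while parent[i][j] is not None:
        op = parent[i][j]
        ops.append(op)
        if op == MATCH:
            i, j = i - 1, j - 1
        elif op == INSERT:
            j = j - 1
        else:
            i = i - 1
    return "".join(reversed(ops))

p, t = "thou shalt not", "you should not"
cost, parent = edit_distance(p, t)
i, j = goal_cell(p, t)
print(cost[i][j])                       # 5
print(reconstruct_path(parent, i, j))   # string of M/I/D operations (M covers match and substitute here)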


Dynamic programming algorithms are easy to design, but getting boundary conditions and index manipulations correct requires great care.


Why is there so much extra fluff?

  • Fuzzy string matching is solved, but the extra fluff solves other problems too
  • Fuzzy string matching in large text body (slight modifications required to row initialization and goal cell functions)
  • Longest common subsequence - finding the longest (not necessarily contiguous) sequence of characters the two strings have in common, also called the LCS problem; requires modifying the match cost function (see the sketch after this list)
  • Maximum monotone subsequence - finding the fewest number of elements to delete so that the remaining sequence is monotone - for example, 243517698 -> 23568; this can be reduced to the LCS problem (compare the sequence against a sorted copy of itself)
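For example, here is a small sketch of the LCS variant (my own code, not from the book): make substitution prohibitively expensive, so the cheapest edit sequence can only insert and delete, and the LCS length falls out of the resulting distance.

def lcs_length(s, t):
    # Edit distance with substitution effectively forbidden (infinite cost)
    rows, cols = len(s) + 1, len(t) + 1
    cost = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        cost[i][0] = i
    for j in range(1, cols):
        cost[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            # Matching is free; a mismatch must be handled by an insert or a delete
            match = cost[i-1][j-1] if s[i-1] == t[j-1] else float("inf")
            cost[i][j] = min(match, cost[i][j-1] + 1, cost[i-1][j] + 1)
    # Every character outside the LCS costs exactly one insert or delete
    return (len(s) + len(t) - cost[rows-1][cols-1]) // 2

print(lcs_length("democrat", "republican"))   # 3 ("eca" is one longest common subsequence)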

Longest Increasing Subsequence Example

Code and Example Problems

Link to some practice problems: [1]

Longest Increasing Subsequence

Link: https://git.charlesreid1.com/cs/java/src/master/dynamic-programming

Racing Gems

Link: https://git.charlesreid1.com/cs/java/src/master/dynamic-programming/racing-gems

Maximum Value Contiguous Subsequence

Link: https://git.charlesreid1.com/cs/java/src/master/dynamic-programming/max-val-seq

Letter Coverage

Link: https://git.charlesreid1.com/cs/five-letter-words/src/master/word_coverage

Resources

MIT Open Courseware 6.006

Links to Videos

Dynamic programming 1: Fibonacci, shortest paths: YouTube video link

Dynamic programming 2: Blackjack: YouTube video link

Dynamic programming 3: Parenthesization, edit distance, knapsack: YouTube video link

Dynamic programming 4: guitars, Tetris, Super Mario: YouTube video link

MIT Open Courseware 6.046

Links to Videos

The topic of dynamic programming is covered in the MIT 6.046 open courseware playlist

The course begins with the divide and conquer paradigm, covering a divide and conquer approach for finding a convex hull: YouTube video link

It continues with a divide and conquer approach for matrix algebra and Fast Fourier Transforms: YouTube video link

It then covers divide and conquer for van Emde Boas trees: YouTube video link

Recitation with more dynamic programming examples: YouTube video link

Advanced dynamic programming: YouTube video link

Dynamic programming for all-pairs shortest paths: YouTube video link