From charlesreid1

Notes

Goodrich Chapter 3 Notes

date structure - systematic way of organizing and accessing data

algorithm - step by step procedure for performing a task in a fixed amount of time

To time algorithms: distinguish between:

  • wall time (less fair, dependent on environment) time.time
  • number of cpu cycles used by algorithm (better) time.clock
  • timeit/built in profiling tool (best)

Moving beyond experimental analysis:

  • Independence from hardware/software
  • Use higher level analysis
  • Account for all possible inputs

Worse case versus average case

  • Algorithm speed changes with different inputs
  • Worst case is easiest to determine
  • Average case requires probability distribution over inputs


\overline{t} = \int_{\Omega} t(\mathbf{x}) p(\mathbf{x}) d\mathbf{x}

Seven functions for algorithm analysis:

1. Constant function

2. Logarithmic function

3. Linear function

4. n log n function

5. Quadratic function

6. Cubic/polynomial function

7. Exponential function

Example: find_max function

  • Runtime: 2 statements always run, plus the for loop
  • Probabilit/worse case/average case;
  • The worst case assumes we have to run biggest = k every time through the for loop
  • Average case analysis:
  • If there are n numbers total, and if we have just looked at j numbers so far, the probability that the jth number will be the new largest of j numbers is 1/j
  • The number of times biggest number is updated, when we run n times, is the sum of each probability,


H_{n} = \sum_{j=1}^{n} \dfrac{1}{j}

nth Harmonic number

Harmonic numbers increase as O(\log n) - sometimes this is the source of a "magic log" term that appears.

Geometric sums:

  • For any integer n>= 0 and any real number a > 0, a neq 1,


\sum_{i=1}^{n} a^i = 1 + a + a^2 + \dots + a^{n-1} + a^n = \dfrac{ a^{n+1} - 1}{a-1}

This applies to finding the largest integer representable with n bits:


1 + 2 + 4 + 8 + \dots + 2^{n-1} = 2^n - 1

Big O notation:

  • If we have two functions f(n) and g(n) mapping positive integers to positive real numbers, wthen we say f(n) is O(g(n)) if, for a real constnat,


f(n) \leq c g(n)

for n \neq n_0

"f(n) is gib-O of g(n)

Alt: "f(n) is order of g(n)"

Alt: "f(n) is O(g(n))"

Big o notation allows ignoring constant factors and all lower order terms

Big omega notatino:

  • Big O means less than or equal to
  • Big Omega means greater than or equal to
  • Take the same two functions f(n) and g(n),
  • We say that f(n) is Omega(g(n)) if


f(n) \geq c g(n)

Example: 3 n log n - 2 n is Omega(n log n)

Asymptotic analysis:

  • To compare algorithms, look at two things:
    • Growth rates of algorithms as n increases
    • Maximum problem size (n) that can be solved as t increases
  • Examples of algorithm analysis:
    • Length of list - O(1)
    • Accessing element of list - O(1)
    • Finding maximum: worst case vs aerage case
    • Prefix averages
    • Three-way disjointness
    • Element uniqueness

Prefix averages:

  • If we want to find average of sequence S of n numbers, in another sequence A, such that A[j] is average of first j numbers,


A[j] = \dfrac{ \sum_{i=0}^{j} S[i] }{j+1}

3 implementations;

  • Implementation 1: nested for loops (poor performance, O(n^2))
  • Implementation 2: uses colon indexing notation, like sum(S[0:j+1]), but no better performance, still O(n^2)
  • Implementation 3: keep a running total, and use a cumulative algorithm

Three way set disjointness:

  • Finding items that are in all 3 sets (if not, 3 sets are disjoint)
  • 2 implementations:
    • Triple nested for loop, if a==b==c (wastes time, O(n^3))
    • Double nested for loop, if a==b, then another for loop, if b==c, O(n^2)

Sorting as an algorithm tool:

  • Sort before searching. Sorted function is O(n log n)
  • Limits cost to cost ofsearch

Justifying algorithm cost:

  • Proof by example/counter-examkple
  • Proof by contradiction/contrapositive
  • use of DeMorgan's Law
  • Induction and loop invariants

induction example:

  • To prove Fibonacci number nl, F(n), is less than 2^n: First show F(1) < 2^1, F(2) < 2^2
  • Then show F(n) < 2^{n-2} + 2^{n-1}
  • Then we know F(n) < 2^n

Loop invariant:

  • Prove a statement L about a loop
  • Show L0 (initial claim) is true before the loop is run
  • Show that if L(j-1) is true before an iteration, then L(j) is true after an iteration
  • Final statement, Lk implies L is true.

Skiena Chapter 1 and 2 Notes

Analyzing algorithms

RAM model of computation - each simple operation takes exactly one time step. Loops and subroutines are not simple operations.

Algorithms can be understood and studied in a language- and machine-independent manner.

Three ways of looking at runtime:

  • worst-case complexity
  • average-case complexity
  • best-case complexity

Why average case complexity so difficult? To illustrate: casino example.


Try to project what will happen if you bring n dollars into a casino to gamble. The best case, that you walk out owning the place, is possible but so unlikely that you should not even think about it. The worst case, that you lose all n dollars, is easy to calculate and distressingly likely to happen. The average case, that the typical bettor loses 87.32% of the money that he brings into the casino, is difficult to establish and its meaning subject to debate. What exactly does average mean? Stupid people lose more than smart people, so are you smarter or stupider than the average person, and by how much? Card counters at blackjack do better on average than customers who accept three or more free drinks. We avoid all these complexities and obtain very useful result by just considering the worst case.


That's concisely expressing in language the complexity of the parameter space \mathbf{x}. The average time is expressed mathematically as:


\overline{t} = \int_{\Omega} t(\mathbf{x}) p(\mathbf{x}) d \mathbf{x}

where t(\mathbf{x}) is algorithm time as a function of the input parameters (this is the function that maps the input parameter values to the output, the computational cost. The probability distribution p(\mathbf{x}) is the probability distribution that integrates to 1 when integrated over the entire input parameter domain \Omega:


\int_{\Omega} p(\mathbf{x}) d \mathbf{x} = 1.0

Note: this comes from the definition of the average of a function, from calculus 2:


\overline{f}(x) = \dfrac{1}{b-a} \int_{a}^{b} f(p) dp

Big O, Big Omega, Big Theta

Big O:

  • f(n) = O(g(n)) means c g(n) is an upper bound on f(n). There exists a constant such that f(x) \leq c f(x) for n \geq n_0

Big Omega:

  • f(n) = \Omega(g(n)) means c g(n) is a lower bound on f(n). There exists constant c such that f(n) \geq c g(n) for all n \geq n_0

Big Theta:

  • f(n) = \Theta(g(n)) means the function is bounded by a function g(n) within two values of constants, c_1 g(n) forming a lower bound and c_2 g(n) forming an upper bound, for n \geq n_0

Example question: is 2^{n+1} = \Theta(2^n)?

Using exponent properties, and the definition of \Theta(n),

2^{n+1} = 2 \times 2^{n+1}

and there are two constants c_1 = 1 and c_2 = 3 such that


1 \times 2^n \leq f(x) \leq 3 \times 2^{n}

Another question: is (x+y)^2 = O(x^2 + y^2)?

By expanding the quantity on the left hand side, and determining the number of operations,


x^2 + 2xy + y^2 = O(x^2) + O(xy) + O(y^2)

If x \leq y then the term 2xy \leq 2 y^2 \leq 2 (x^2 + y^2)

If  y \leq x then the middle term  2xy \leq 2 x^2 \leq 2 (x^+y^2)

Heuristics

Heuristics:

Algorithms, whatever their big Oh class, take generally the same time for n=10

Any algorithm with n! runtime is impractical for n \geq 20

Algorithms with a runtime of 2^n have a greater operating range, but become impractical for n > 40

Linear time and n log n time algorithms are practical for inputs of a billion items

An O(log n) algorithm hardly breaks a sweat for any imaginable value of n

List of functions

List of functions, in increasing order of dominance (more thorough than the Goodrich book):

  • Constant functions f(n) = 1
  • Logarithmic functions f(n) = \log n
  • Linear functions f(n) = n
  • Superlinear functions f(n) = n \log n
  • Quadratic functions f(n) = n^2
  • Cubic functions f(n) = n^3
  • Exponential functions f(n) = c^n
  • Factorial functions f(n) = n!

The sum of two functions is governed by the dominant one:

O(f(n)) + O(g(n)) \rightarrow O(\max(f(n),g(n)))

The product of two functions is governed by the dominant one:


O(f(n)) \times O(g(n)) \rightarrow O(f(n) \times g(n))

Analyzing two sort functions

Algorithmic Analysis of Sort Functions

String Pattern Matching

Pattern matching is the most fundamental algorithmic operation on text strings.

Problem: Substring pattern matching.

Input: A text string t and a pattern string p.

Output: Does t contain the pattern p as a substring, and if so, where?

Algorithmic Analysis of Substring Pattern Matching

Matrix Multiplication

Algorithmic Analysis of Matrix Multiplication

Logs and Binary Search

Binary search is an O(log n) algorithm.

To find a given name in the one-million-person Manhattan phone book, you only need 20 comparisons.


2620 \sim 1000^2 \sim 1,000,000

How many lookups would it take to find one thing in 1000 items?


2^10 \sim 1000^1


2^20 \sim 1000^2 \sim 1 M


2^30 \sim 1000^3 \sim 1 B


2^40 \sim 1000^3 \sim 1 T

The height h of a rooted binary tree with n leaf nodes is log(n)

Logs and Bits

The number of bits we need to represent any one of n different possibilities (n items, or the n integers 1 to n), using bits, is log(n) bits. (Log base 2, of course.)

Logs and Multiplication

Logarithms can be used to multiply numbers (principle of the slide rule).

log(xy) = log(x) + log(y)

Now can use exponent and log functions to calculate any exponent a^b, like so:


a^b = \exp( \ln( a^b ) ) = \exp( b \ln a )

Fast exponentiation

The simplest algorithm to raise some number a to the power of n would require n-1 multiplications.

However, can also observe that n = floor(n/2) + ceil(n/2)

Then we can split up the exponent as:


a^n = (a^{\frac{n}{2}})^2

and if n is odd,


a ( a^{floor(\frac{n}{2})} )^2

This eliminates half of the total multiplications, and only introduces two new multiplication operations.

func power(a,n):
    if(n==0) return(1)
    x = power(a, floor(n/2))
    if(n is even) then return(x^2)
        else return(a*x^2)

Questions

Goodrich Chapter 3 Questions

Sum of first n even numbers

Question R-3.6: What is the sum of the first n even numbers, 0..2n, for n > 0?

Answer: Use the sum of integers formula,


\sum_{k=0}^{n} 2k = 2 \sum_{k=0}^{n} k = 2 \dfrac{n(n+1)}{2} = n(n+1) = n^2 + n

Sum of first n even numbers is n^2 + n.

Solving 3 way disjointedness problem in n log n time

Question C-3.35: Assuming it is possible to sort n numbers in O(n log n) time, show it is possible to solve 3-way set disjointedness problem in O(n log n) time.

Answer: First, we sort 3 sets. That takes 3 * n log n units of time.

Then we can iterate over each element of a, which costs us n, and inside of that loop we can perform a binary search, which costs us log n, for an overall cost of n log n.

If we find b in a, then we perform another binary search for c in a. This will cost us log n.

Is the worst case then still O(n log n)? Worst case is, every element in A also in B, so we perform two binary searches each time. 2 * log n still does not cost us that much.

finding 10 largest elements

Question C-3.36: Describe an efficient algorithm for finding the 10 largest elements of a sequence of size n. What is the runtime of this algorithm?

Answer: sort n elements in O(n log n) time.

  • Get 10 items from the back (biggest) - or if you want the smallest, from the front.

O(n) algorithm calling O(n2) algorithm

Question C-3.38: Show that \sum_{i=1}^{n} i^2 is O(n^3).

Answer: This is the notation we use when we need to say, "we are running a loop, and inside the loop calling a function whose cost is O(i^2)."

We know (via the story about Carl Gauss - 5050):

\sum_{i=1}^{n} i = \dfrac{n(n+1)}{2} \sim O(n^2)

We can also write:

\sum_{i=1}^{n} i^2 = \dfrac{n(n+1)(2n+1)}{6}

(Derivation requires some series manipulation, starting with sum over k^3 and eventually getting to k^2

This shows that when an algorithm that runs n times, O(n), calls an algorithm that runs in O(i^2) time, it results in an O(n^3) algorithm overall

Run a scaling analysis to show Python sorted function is n log n

Question P-3.57: Perform experimental analysis to test hypothesis that Python's sorted() function runs in O(n log n) time.

Answer: see https://charlesreid1.com:3000/cs/python/src/master/algorithm_complexity/test_sorted.py

Skiena Chapter 1 Questions

Proofs by Induction




Sum of integers

Question: Prove by induction that \sum_{i=1}^{n} i = \dfrac{n(n+1)}{2} for n \geq 0

Answer: start by letting n = 1, then we see that the expression holds.

Then we let n = 2, and we see that the expression holds.

Finally, we assume that the expression is true for n, and see if that implies it is true for n+1. The strategy here is, peel off the one largest term, then substitute the expression (which we assumed is true) and see what shakes out.


\sum_{i=1}^{n+1} i = (n+1) + \sum_{i=1}^{n} i = (n+1) + \dfrac{n(n+1)}{2} = \dfrac{2n+2}{2} + \dfrac{n^2+n}{2}

Continuing,


\dfrac{n^2 + 3n + 2}{2} = \dfrac{(n+1)(n+2)}{2} = \dfrac{(n+1)((n+1)+1)}{2}



Sum of squares of integers

Question: Prove by induction that \sum_{i=1}^{n} i^2 = \dfrac{n(n+1)(2n+1)}{2} for n \geq 0 by induction.

Answer: same strategy as before: verify the case is true for n = 1, then verify the case is true for n = 2.

Then, we assume the nth case is true, and check if that implies the n+1th case is true:


\sum_{i=1}^{n+1} i^2 = (n+1)^2 + \sum_{i=1}^{n} i^2 = (n+1)^2 + dfrac{n(n+1)(2n+1)}{6}

Continuing:


= \dfrac{6(n^2+2n+1)}{6} + \dfrac{(n^2+n)(2n+1)}{6} = \dfrac{(6n^2 + 12n + 6) + (2n^3 + n^2 + 2n^2 + n)}{6}

Finally:


= \dfrac{2n^3 + 9n^2 + 13n + 6}{6}

For this one, it's a little bit easier to work backwards from the given formula, plugging in (n+1) for n,


= \dfrac{(n+1)(n+2)(2n+3)}{6} = \dfrac{(n^2 + 3n + 2)(2n + 3)}{6} = \dfrac{2n^3 + 9n^2 + 13n + 6}{6}



Sum of cubes of integers

Question: Prove that \sum_{i=1}^{n} i^3 = \dfrac{n^2 (n+1)^2}{4} for n \geq 0, by induction.

Answer: Same strategy as above.



Sum of product of neighbor numbers

Question: Prove that \sum_{i=1}^{n} i(i+1)(i+2) = \dfrac{n(n+1)(n+2)(n+3)}{4}.

Let n=1, prove the case of n=1. Let n=2, prove the case of n=2.

Then, if the nth case is true, the n+1th case become:


\sum_{i=1}^{n+1} i(i+1)(i+2) = (n+1)(n+2)(n+3) + \sum_{i=1}^{n} i(i+1)(i+2) = \frac{4(n+1)(n+2)(n+3) }{4} + \dfrac{n(n+1)(n+2)(n+3)}{4}

The last two terms can be factored by grouping, to form a fourth linear term (n+4), so


= \dfrac{(n+1)(n+2)(n+3)(n+4)}{4}

which was the premise to be proved.



Sums of expression divisible by 3

Question: Prove by induction that n^3 + 2n is divisible by 3 for all n \geq 0.

Start with case of n = 1, and n = 2. Both divisible by 3.

Then we assume the nth case is true, and see if it implies the n+1th case:


(n+1)^3 + 2(n+1) = n^3 + 3n^2 + 3n + 1 + 2n + 2 = (n^3 + 2n) + (3n^2 + 3n + 3)

The first term on the right hand side is divisible by 3 by our assumption, and the last term on the right is definitely divisible by 3.

Estimation

See Estimation



Pages of books

Question 1-19 a) Do all of the combined the books you own have at least 1M pages?

Each book has O(100) pages - let's say, O(250) pages. Using either 1E2 or 2.5E2 as number of pages per book.

100 pages per book: 1 million pages is 1E6 pages x (1 book/1E2 pages) = 1E4 books = 10,000 books. Nope.

250 pages per book: 1 million pages is 1E6 pages x (1 book/2.5E2 pages) = 4E3 books = 4,000 books. Still nope.

Question 1-19 b) How many total page are in the school library?

Cascading estimations here.

1 book ~ 100 pages.

1 shelf ~ 10 books.

1 long bookcase ~ 100 shelves.

1 section ~ 10 long bookcases.

1 small library ~ 10 sections

1 big library ~ 100 sections

1 very large library ~ 1000 sections

(Note: can squeeze in additional factors of 10 in by turning one 1 into one 2.5 and one 1 into one 4.)

Number of pages in the library:

1 small library x (10 sections/1 lib) x (10 bookcases/1 section) x (100 shelves/1 bookcase) x (10 books/1 shelf) x (10 pages/1 book)

= 10 x 10 x 100 x 10 x 10

= 100 x 100 x 100

= 1 million pages for a small library (10k books)

= 10 million pages for a big library (100k books)

= 100 million pages for a very large library (1M books)

(This estimate does seem on the low side, probably additional factors of 10 missing. Plus or minus 2 orders of magnitude.)

Note: the NY Public Library is estimated to have somewhere around 3 million books. So, this seems to be in the right general ballpark.



Number of words in Skiena textbook

Question: Number of words in the Skiena textbook?

A single full page is about 15 words per line, about 40 lines per page, for a total of about 600 words per page.

The total number of pages is 700.

So the absolute upper bound, assuming every page is densely packed, is 420,000 words.

We can get a better estimate by assuming that 2/3 of the pages are full (600 words), and 1/3 of the pages are occupied by tables, figures, and white space (half full, or 300 words).

Do the half-full pages plus the full pages:

Nwords = 700 pages x (33 half-full pages/100 pages total) x ( .50(600) words / 1 half-full page) 
        + 700 pages x (66 full pages/100 pages total) x (1.0(600) words / 1 full page)

Nwords = (700)(600)(.33*.5 + .67)

Nwords = 420000(.7)

Nwords = 290,000

So the grand total is in the ballpark of 300,000 words. (Intuitively, this seems a bit on the long side.)



Number of hours in 1M seconds

How many hours are in 1M seconds? How many days? Do the math in your head.

Here we use that trick of using quarters when doing scale analysis.

3600 seconds per hour ~ 2500

24 hours per day ~ 25

1 M seconds = 1E6/2.5E3 = 4E(5-3) = 4E2 = 400 hours

1 M seconds = 400 hours/(25 hr/day) == 16 days

The actual value?

1E6 x (1/3600) x (1/24) = 11 days

Trick here is not to do just an order of magnitude estimate, but to make use of quarters as well. 1000/25, not just 1000/10. (If you don't do this, twice, you'll be off by an order of magnitude.)



Number of towns and cities in US

Question: Estimate the number of towns and cities in the United States.

Here we use the 80/20 heuristic. We assume that 80% of the population lives in large cities, the remaining 20% live in smaller cities.

Then, of that remaining 20%, we assume that 80% of them live in large cities, and 20% of them live in smaller cities.

Of that remaining 20%, etc.

Do this multiple times, splitting into groups that live in cities of 1M, 100k, 10k, 1k, 100, and 10. Then we get:

PopCenterTable.jpg

(Note: the drawing includes cities of >10M separately from 1M, but this throws the estimate off by an order of magnitude on the small side. Not enough megacities in the US.)

When we combine this with the total population of 3 M people, we get the total number of cities/towns that people live in (that is, if 80% of 300 M people live in cities of 10 M people or more, we can determine how many 10 M person cities there are; etc.)

Here's the nested expression (p = people):


300 M = 3e8 p \times \left(
  \dfrac{.80 p}{1e6 p/city} + .20 \times \left(
    \dfrac{.80 p}{1e5 p/city} + .20 \times \left(
      \dfrac{.80 p}{1e4 p/city} + .20 \times \left(
        \dfrac{.80 p}{1e3 p/city} + .20 \times \left(
          \dfrac{.80 p}{1e2 p/city} + \dfrac{.20}{1e1 p/city}
        \right)
      \right)
    \right)
  \right)
\right)

Plugging this into Wolfram Alpha gives 17,040.

This is pretty damn close to the number of "incorporated places" listed by the census: 19,354 [1].



Sorting algorithm time

A sorting algorithm takes 1 second to sort 1000 items.

How long will it take to sort 10,000 items if:

a) O(n^2) algorithm?

b) O(n log n) algorithm?

a) If O(n^2) algorithm:


N_{ops} \sim 1000^2 \sim (1e3)^2 \sim 10^6 ops

which gives us a time per operation of 1 microsecond:


t \sim \dfrac{ T }{ N_{ops} } \sim {1}{10^6} \sim 10^{-6} s/op

Now, for 10,000 items,


N_{ops} \sim (10,000)^2 \sim (10^4)^2 \sim 10^8

and this gives a total time of


T \sim N_{ops} t \sim (10^8)(10^{-6}) \sim 10^2 \sim 100 seconds

The quadratic algorithm will sort 10,000 items in 100 seconds.


For the superlinear algorithm,


N_{ops} \sim 1000 log 1000 \sim 3000 ops

This gives time per operation of


t \sim \dfrac{T}{N_{ops}} \sim \dfrac{1 s}{3000 ops} \sim 3 \times 10^-3 s/op

So for 10,000 items,


N_{ops} \sim (10,000) log (10,000) \sim 4(10,000) \sim 4 \times 10^4 ops

This gives a total time of


T \sim N_{ops} t \sim (4 \times 10^4) \dfrac{3 \times 10^{-3} s}{op} \sim 12 \times 10 \sim 120 s

The superlinear algorithm will sort 10,000 items in 120 seconds.






See also: