# Notes

## Goodrich Chapter 3 Notes

date structure - systematic way of organizing and accessing data

algorithm - step by step procedure for performing a task in a fixed amount of time

To time algorithms: distinguish between:

• wall time (less fair, dependent on environment) time.time
• number of cpu cycles used by algorithm (better) time.clock
• timeit/built in profiling tool (best)

Moving beyond experimental analysis:

• Independence from hardware/software
• Use higher level analysis
• Account for all possible inputs

Worse case versus average case

• Algorithm speed changes with different inputs
• Worst case is easiest to determine
• Average case requires probability distribution over inputs

${\displaystyle {\overline {t}}=\int _{\Omega }t(\mathbf {x} )p(\mathbf {x} )d\mathbf {x} }$

Seven functions for algorithm analysis:

1. Constant function

2. Logarithmic function

3. Linear function

4. n log n function

6. Cubic/polynomial function

7. Exponential function

Example: find_max function

• Runtime: 2 statements always run, plus the for loop
• Probabilit/worse case/average case;
• The worst case assumes we have to run biggest = k every time through the for loop
• Average case analysis:
• If there are n numbers total, and if we have just looked at j numbers so far, the probability that the jth number will be the new largest of j numbers is 1/j
• The number of times biggest number is updated, when we run n times, is the sum of each probability,

${\displaystyle H_{n}=\sum _{j=1}^{n}{\dfrac {1}{j}}}$

nth Harmonic number

Harmonic numbers increase as ${\displaystyle O(\log n)}$ - sometimes this is the source of a "magic log" term that appears.

Geometric sums:

• For any integer n>= 0 and any real number a > 0, a neq 1,

${\displaystyle \sum _{i=1}^{n}a^{i}=1+a+a^{2}+\dots +a^{n-1}+a^{n}={\dfrac {a^{n+1}-1}{a-1}}}$

This applies to finding the largest integer representable with n bits:

${\displaystyle 1+2+4+8+\dots +2^{n-1}=2^{n}-1}$

Big O notation:

• If we have two functions f(n) and g(n) mapping positive integers to positive real numbers, wthen we say f(n) is O(g(n)) if, for a real constnat,

${\displaystyle f(n)\leq cg(n)}$

for ${\displaystyle n\neq n_{0}}$

"f(n) is gib-O of g(n)

Alt: "f(n) is order of g(n)"

Alt: "f(n) is O(g(n))"

Big o notation allows ignoring constant factors and all lower order terms

Big omega notatino:

• Big O means less than or equal to
• Big Omega means greater than or equal to
• Take the same two functions f(n) and g(n),
• We say that f(n) is Omega(g(n)) if

${\displaystyle f(n)\geq cg(n)}$

Example: 3 n log n - 2 n is Omega(n log n)

Asymptotic analysis:

• To compare algorithms, look at two things:
• Growth rates of algorithms as n increases
• Maximum problem size (n) that can be solved as t increases
• Examples of algorithm analysis:
• Length of list - O(1)
• Accessing element of list - O(1)
• Finding maximum: worst case vs aerage case
• Prefix averages
• Three-way disjointness
• Element uniqueness

Prefix averages:

• If we want to find average of sequence S of n numbers, in another sequence A, such that A[j] is average of first j numbers,

${\displaystyle A[j]={\dfrac {\sum _{i=0}^{j}S[i]}{j+1}}}$

3 implementations;

• Implementation 1: nested for loops (poor performance, O(n^2))
• Implementation 2: uses colon indexing notation, like ${\displaystyle sum(S[0:j+1])}$, but no better performance, still O(n^2)
• Implementation 3: keep a running total, and use a cumulative algorithm

Three way set disjointness:

• Finding items that are in all 3 sets (if not, 3 sets are disjoint)
• 2 implementations:
• Triple nested for loop, if a==b==c (wastes time, O(n^3))
• Double nested for loop, if a==b, then another for loop, if b==c, O(n^2)

Sorting as an algorithm tool:

• Sort before searching. Sorted function is O(n log n)
• Limits cost to cost ofsearch

Justifying algorithm cost:

• Proof by example/counter-examkple
• use of DeMorgan's Law
• Induction and loop invariants

induction example:

• To prove Fibonacci number nl, F(n), is less than 2^n: First show ${\displaystyle F(1)<2^{1},F(2)<2^{2}}$
• Then show ${\displaystyle F(n)<2^{n-2}+2^{n-1}}$
• Then we know ${\displaystyle F(n)<2^{n}}$

Loop invariant:

• Prove a statement L about a loop
• Show L0 (initial claim) is true before the loop is run
• Show that if L(j-1) is true before an iteration, then L(j) is true after an iteration
• Final statement, Lk implies L is true.

## Skiena Chapter 1 and 2 Notes

### Analyzing algorithms

RAM model of computation - each simple operation takes exactly one time step. Loops and subroutines are not simple operations.

Algorithms can be understood and studied in a language- and machine-independent manner.

Three ways of looking at runtime:

• worst-case complexity
• average-case complexity
• best-case complexity

Why average case complexity so difficult? To illustrate: casino example.

Try to project what will happen if you bring n dollars into a casino to gamble. The best case, that you walk out owning the place, is possible but so unlikely that you should not even think about it. The worst case, that you lose all n dollars, is easy to calculate and distressingly likely to happen. The average case, that the typical bettor loses 87.32% of the money that he brings into the casino, is difficult to establish and its meaning subject to debate. What exactly does average mean? Stupid people lose more than smart people, so are you smarter or stupider than the average person, and by how much? Card counters at blackjack do better on average than customers who accept three or more free drinks. We avoid all these complexities and obtain very useful result by just considering the worst case.

That's concisely expressing in language the complexity of the parameter space ${\displaystyle \mathbf {x} }$. The average time is expressed mathematically as:

${\displaystyle {\overline {t}}=\int _{\Omega }t(\mathbf {x} )p(\mathbf {x} )d\mathbf {x} }$

where ${\displaystyle t(\mathbf {x} )}$ is algorithm time as a function of the input parameters (this is the function that maps the input parameter values to the output, the computational cost. The probability distribution ${\displaystyle p(\mathbf {x} )}$ is the probability distribution that integrates to 1 when integrated over the entire input parameter domain ${\displaystyle \Omega }$:

${\displaystyle \int _{\Omega }p(\mathbf {x} )d\mathbf {x} =1.0}$

Note: this comes from the definition of the average of a function, from calculus 2:

${\displaystyle {\overline {f}}(x)={\dfrac {1}{b-a}}\int _{a}^{b}f(p)dp}$

### Big O, Big Omega, Big Theta

Big O:

• ${\displaystyle f(n)=O(g(n))}$ means ${\displaystyle cg(n)}$ is an upper bound on ${\displaystyle f(n)}$. There exists a constant such that ${\displaystyle f(x)\leq cf(x)}$ for ${\displaystyle n\geq n_{0}}$

Big Omega:

• ${\displaystyle f(n)=\Omega (g(n))}$ means ${\displaystyle cg(n)}$ is a lower bound on ${\displaystyle f(n)}$. There exists constant c such that ${\displaystyle f(n)\geq cg(n)}$ for all ${\displaystyle n\geq n_{0}}$

Big Theta:

• ${\displaystyle f(n)=\Theta (g(n))}$ means the function is bounded by a function ${\displaystyle g(n)}$ within two values of constants, ${\displaystyle c_{1}g(n)}$ forming a lower bound and ${\displaystyle c_{2}g(n)}$ forming an upper bound, for ${\displaystyle n\geq n_{0}}$

Example question: is ${\displaystyle 2^{n+1}=\Theta (2^{n})}$?

Using exponent properties, and the definition of ${\displaystyle \Theta (n)}$,

${\displaystyle 2^{n+1}=2\times 2^{n+1}}$

and there are two constants ${\displaystyle c_{1}=1}$ and ${\displaystyle c_{2}=3}$ such that

${\displaystyle 1\times 2^{n}\leq f(x)\leq 3\times 2^{n}}$

Another question: is ${\displaystyle (x+y)^{2}=O(x^{2}+y^{2})}$?

By expanding the quantity on the left hand side, and determining the number of operations,

${\displaystyle x^{2}+2xy+y^{2}=O(x^{2})+O(xy)+O(y^{2})}$

If ${\displaystyle x\leq y}$ then the term ${\displaystyle 2xy\leq 2y^{2}\leq 2(x^{2}+y^{2})}$

If ${\displaystyle y\leq x}$ then the middle term ${\displaystyle 2xy\leq 2x^{2}\leq 2(x^{+}y^{2})}$

### Heuristics

Heuristics:

Algorithms, whatever their big Oh class, take generally the same time for n=10

Any algorithm with ${\displaystyle n!}$ runtime is impractical for ${\displaystyle n\geq 20}$

Algorithms with a runtime of ${\displaystyle 2^{n}}$ have a greater operating range, but become impractical for ${\displaystyle n>40}$

Linear time and n log n time algorithms are practical for inputs of a billion items

An O(log n) algorithm hardly breaks a sweat for any imaginable value of n

### List of functions

List of functions, in increasing order of dominance (more thorough than the Goodrich book):

• Constant functions ${\displaystyle f(n)=1}$
• Logarithmic functions ${\displaystyle f(n)=\log n}$
• Linear functions ${\displaystyle f(n)=n}$
• Superlinear functions ${\displaystyle f(n)=n\log n}$
• Quadratic functions ${\displaystyle f(n)=n^{2}}$
• Cubic functions ${\displaystyle f(n)=n^{3}}$
• Exponential functions ${\displaystyle f(n)=c^{n}}$
• Factorial functions ${\displaystyle f(n)=n!}$

The sum of two functions is governed by the dominant one:

${\displaystyle O(f(n))+O(g(n))\rightarrow O(\max(f(n),g(n)))}$

The product of two functions is governed by the dominant one:

${\displaystyle O(f(n))\times O(g(n))\rightarrow O(f(n)\times g(n))}$

### String Pattern Matching

Pattern matching is the most fundamental algorithmic operation on text strings.

Problem: Substring pattern matching.

Input: A text string t and a pattern string p.

Output: Does t contain the pattern p as a substring, and if so, where?

### Logs and Binary Search

Binary search is an O(log n) algorithm.

To find a given name in the one-million-person Manhattan phone book, you only need 20 comparisons.

${\displaystyle 2620\sim 1000^{2}\sim 1,000,000}$

How many lookups would it take to find one thing in 1000 items?

${\displaystyle 2^{1}0\sim 1000^{1}}$

${\displaystyle 2^{2}0\sim 1000^{2}\sim 1M}$

${\displaystyle 2^{3}0\sim 1000^{3}\sim 1B}$

${\displaystyle 2^{4}0\sim 1000^{3}\sim 1T}$

The height h of a rooted binary tree with n leaf nodes is log(n)

### Logs and Bits

The number of bits we need to represent any one of n different possibilities (n items, or the n integers 1 to n), using bits, is log(n) bits. (Log base 2, of course.)

### Logs and Multiplication

Logarithms can be used to multiply numbers (principle of the slide rule).

${\displaystyle log(xy)=log(x)+log(y)}$

Now can use exponent and log functions to calculate any exponent ${\displaystyle a^{b}}$, like so:

${\displaystyle a^{b}=\exp(\ln(a^{b}))=\exp(b\ln a)}$

### Fast exponentiation

The simplest algorithm to raise some number a to the power of n would require n-1 multiplications.

However, can also observe that n = floor(n/2) + ceil(n/2)

Then we can split up the exponent as:

${\displaystyle a^{n}=(a^{\frac {n}{2}})^{2}}$

and if n is odd,

${\displaystyle a(a^{floor({\frac {n}{2}})})^{2}}$

This eliminates half of the total multiplications, and only introduces two new multiplication operations.

func power(a,n):
if(n==0) return(1)
x = power(a, floor(n/2))
if(n is even) then return(x^2)
else return(a*x^2)


# Questions

## Goodrich Chapter 3 Questions

### Sum of first n even numbers

Question R-3.6: What is the sum of the first n even numbers, 0..2n, for n > 0?

Answer: Use the sum of integers formula,

${\displaystyle \sum _{k=0}^{n}2k=2\sum _{k=0}^{n}k=2{\dfrac {n(n+1)}{2}}=n(n+1)=n^{2}+n}$

Sum of first n even numbers is ${\displaystyle n^{2}+n}$.

### Solving 3 way disjointedness problem in n log n time

Question C-3.35: Assuming it is possible to sort n numbers in O(n log n) time, show it is possible to solve 3-way set disjointedness problem in O(n log n) time.

Answer: First, we sort 3 sets. That takes 3 * n log n units of time.

Then we can iterate over each element of a, which costs us n, and inside of that loop we can perform a binary search, which costs us log n, for an overall cost of n log n.

If we find b in a, then we perform another binary search for c in a. This will cost us log n.

Is the worst case then still O(n log n)? Worst case is, every element in A also in B, so we perform two binary searches each time. 2 * log n still does not cost us that much.

### finding 10 largest elements

Question C-3.36: Describe an efficient algorithm for finding the 10 largest elements of a sequence of size n. What is the runtime of this algorithm?

Answer: sort n elements in O(n log n) time.

• Get 10 items from the back (biggest) - or if you want the smallest, from the front.

### O(n) algorithm calling O(n2) algorithm

Question C-3.38: Show that ${\displaystyle \sum _{i=1}^{n}i^{2}}$ is O(n^3).

Answer: This is the notation we use when we need to say, "we are running a loop, and inside the loop calling a function whose cost is O(i^2)."

We know (via the story about Carl Gauss - 5050):

${\displaystyle \sum _{i=1}^{n}i={\dfrac {n(n+1)}{2}}\sim O(n^{2})}$

We can also write:

${\displaystyle \sum _{i=1}^{n}i^{2}={\dfrac {n(n+1)(2n+1)}{6}}}$

(Derivation requires some series manipulation, starting with sum over k^3 and eventually getting to k^2

This shows that when an algorithm that runs n times, O(n), calls an algorithm that runs in O(i^2) time, it results in an O(n^3) algorithm overall

### Run a scaling analysis to show Python sorted function is n log n

Question P-3.57: Perform experimental analysis to test hypothesis that Python's sorted() function runs in O(n log n) time.

## Skiena Chapter 1 Questions

### Proofs by Induction

#### Sum of integers

Question: Prove by induction that ${\displaystyle \sum _{i=1}^{n}i={\dfrac {n(n+1)}{2}}}$ for ${\displaystyle n\geq 0}$

Answer: start by letting n = 1, then we see that the expression holds.

Then we let n = 2, and we see that the expression holds.

Finally, we assume that the expression is true for n, and see if that implies it is true for n+1. The strategy here is, peel off the one largest term, then substitute the expression (which we assumed is true) and see what shakes out.

${\displaystyle \sum _{i=1}^{n+1}i=(n+1)+\sum _{i=1}^{n}i=(n+1)+{\dfrac {n(n+1)}{2}}={\dfrac {2n+2}{2}}+{\dfrac {n^{2}+n}{2}}}$

Continuing,

${\displaystyle {\dfrac {n^{2}+3n+2}{2}}={\dfrac {(n+1)(n+2)}{2}}={\dfrac {(n+1)((n+1)+1)}{2}}}$

#### Sum of squares of integers

Question: Prove by induction that ${\displaystyle \sum _{i=1}^{n}i^{2}={\dfrac {n(n+1)(2n+1)}{2}}}$ for ${\displaystyle n\geq 0}$ by induction.

Answer: same strategy as before: verify the case is true for n = 1, then verify the case is true for n = 2.

Then, we assume the nth case is true, and check if that implies the n+1th case is true:

${\displaystyle \sum _{i=1}^{n+1}i^{2}=(n+1)^{2}+\sum _{i=1}^{n}i^{2}=(n+1)^{2}+dfrac{n(n+1)(2n+1)}{6}}$

Continuing:

${\displaystyle ={\dfrac {6(n^{2}+2n+1)}{6}}+{\dfrac {(n^{2}+n)(2n+1)}{6}}={\dfrac {(6n^{2}+12n+6)+(2n^{3}+n^{2}+2n^{2}+n)}{6}}}$

Finally:

${\displaystyle ={\dfrac {2n^{3}+9n^{2}+13n+6}{6}}}$

For this one, it's a little bit easier to work backwards from the given formula, plugging in (n+1) for n,

${\displaystyle ={\dfrac {(n+1)(n+2)(2n+3)}{6}}={\dfrac {(n^{2}+3n+2)(2n+3)}{6}}={\dfrac {2n^{3}+9n^{2}+13n+6}{6}}}$

#### Sum of cubes of integers

Question: Prove that ${\displaystyle \sum _{i=1}^{n}i^{3}={\dfrac {n^{2}(n+1)^{2}}{4}}}$ for ${\displaystyle n\geq 0}$, by induction.

#### Sum of product of neighbor numbers

Question: Prove that ${\displaystyle \sum _{i=1}^{n}i(i+1)(i+2)={\dfrac {n(n+1)(n+2)(n+3)}{4}}}$.

Let n=1, prove the case of n=1. Let n=2, prove the case of n=2.

Then, if the nth case is true, the n+1th case become:

${\displaystyle \sum _{i=1}^{n+1}i(i+1)(i+2)=(n+1)(n+2)(n+3)+\sum _{i=1}^{n}i(i+1)(i+2)={\frac {4(n+1)(n+2)(n+3)}{4}}+{\dfrac {n(n+1)(n+2)(n+3)}{4}}}$

The last two terms can be factored by grouping, to form a fourth linear term (n+4), so

${\displaystyle ={\dfrac {(n+1)(n+2)(n+3)(n+4)}{4}}}$

which was the premise to be proved.

#### Sums of expression divisible by 3

Question: Prove by induction that ${\displaystyle n^{3}+2n}$ is divisible by 3 for all ${\displaystyle n\geq 0}$.

Start with case of n = 1, and n = 2. Both divisible by 3.

Then we assume the nth case is true, and see if it implies the n+1th case:

${\displaystyle (n+1)^{3}+2(n+1)=n^{3}+3n^{2}+3n+1+2n+2=(n^{3}+2n)+(3n^{2}+3n+3)}$

The first term on the right hand side is divisible by 3 by our assumption, and the last term on the right is definitely divisible by 3.

### Estimation

See Estimation

#### Pages of books

Question 1-19 a) Do all of the combined the books you own have at least 1M pages?

Each book has O(100) pages - let's say, O(250) pages. Using either 1E2 or 2.5E2 as number of pages per book.

100 pages per book: 1 million pages is 1E6 pages x (1 book/1E2 pages) = 1E4 books = 10,000 books. Nope.

250 pages per book: 1 million pages is 1E6 pages x (1 book/2.5E2 pages) = 4E3 books = 4,000 books. Still nope.

Question 1-19 b) How many total page are in the school library?

1 book ~ 100 pages.

1 shelf ~ 10 books.

1 long bookcase ~ 100 shelves.

1 section ~ 10 long bookcases.

1 small library ~ 10 sections

1 big library ~ 100 sections

1 very large library ~ 1000 sections

(Note: can squeeze in additional factors of 10 in by turning one 1 into one 2.5 and one 1 into one 4.)

Number of pages in the library:

1 small library x (10 sections/1 lib) x (10 bookcases/1 section) x (100 shelves/1 bookcase) x (10 books/1 shelf) x (10 pages/1 book)

= 10 x 10 x 100 x 10 x 10

= 100 x 100 x 100

= 1 million pages for a small library (10k books)

= 10 million pages for a big library (100k books)

= 100 million pages for a very large library (1M books)

(This estimate does seem on the low side, probably additional factors of 10 missing. Plus or minus 2 orders of magnitude.)

Note: the NY Public Library is estimated to have somewhere around 3 million books. So, this seems to be in the right general ballpark.

#### Number of words in Skiena textbook

Question: Number of words in the Skiena textbook?

A single full page is about 15 words per line, about 40 lines per page, for a total of about 600 words per page.

The total number of pages is 700.

So the absolute upper bound, assuming every page is densely packed, is 420,000 words.

We can get a better estimate by assuming that 2/3 of the pages are full (600 words), and 1/3 of the pages are occupied by tables, figures, and white space (half full, or 300 words).

Do the half-full pages plus the full pages:

Nwords = 700 pages x (33 half-full pages/100 pages total) x ( .50(600) words / 1 half-full page)
+ 700 pages x (66 full pages/100 pages total) x (1.0(600) words / 1 full page)

Nwords = (700)(600)(.33*.5 + .67)

Nwords = 420000(.7)

Nwords = 290,000


So the grand total is in the ballpark of 300,000 words. (Intuitively, this seems a bit on the long side.)

#### Number of hours in 1M seconds

How many hours are in 1M seconds? How many days? Do the math in your head.

Here we use that trick of using quarters when doing scale analysis.

3600 seconds per hour ~ 2500

24 hours per day ~ 25

1 M seconds = 1E6/2.5E3 = 4E(5-3) = 4E2 = 400 hours

1 M seconds = 400 hours/(25 hr/day) == 16 days

The actual value?

1E6 x (1/3600) x (1/24) = 11 days

Trick here is not to do just an order of magnitude estimate, but to make use of quarters as well. 1000/25, not just 1000/10. (If you don't do this, twice, you'll be off by an order of magnitude.)

#### Number of towns and cities in US

Question: Estimate the number of towns and cities in the United States.

Here we use the 80/20 heuristic. We assume that 80% of the population lives in large cities, the remaining 20% live in smaller cities.

Then, of that remaining 20%, we assume that 80% of them live in large cities, and 20% of them live in smaller cities.

Of that remaining 20%, etc.

Do this multiple times, splitting into groups that live in cities of 1M, 100k, 10k, 1k, 100, and 10. Then we get:

(Note: the drawing includes cities of >10M separately from 1M, but this throws the estimate off by an order of magnitude on the small side. Not enough megacities in the US.)

When we combine this with the total population of 3 M people, we get the total number of cities/towns that people live in (that is, if 80% of 300 M people live in cities of 10 M people or more, we can determine how many 10 M person cities there are; etc.)

Here's the nested expression (p = people):

${\displaystyle 300M=3e8p\times \left({\dfrac {.80p}{1e6p/city}}+.20\times \left({\dfrac {.80p}{1e5p/city}}+.20\times \left({\dfrac {.80p}{1e4p/city}}+.20\times \left({\dfrac {.80p}{1e3p/city}}+.20\times \left({\dfrac {.80p}{1e2p/city}}+{\dfrac {.20}{1e1p/city}}\right)\right)\right)\right)\right)}$

Plugging this into Wolfram Alpha gives 17,040.

This is pretty damn close to the number of "incorporated places" listed by the census: 19,354 [1].

#### Sorting algorithm time

A sorting algorithm takes 1 second to sort 1000 items.

How long will it take to sort 10,000 items if:

a) O(n^2) algorithm?

b) O(n log n) algorithm?

a) If O(n^2) algorithm:

${\displaystyle N_{ops}\sim 1000^{2}\sim (1e3)^{2}\sim 10^{6}ops}$

which gives us a time per operation of 1 microsecond:

${\displaystyle t\sim {\dfrac {T}{N_{ops}}}\sim {1}{10^{6}}\sim 10^{-6}s/op}$

Now, for 10,000 items,

${\displaystyle N_{ops}\sim (10,000)^{2}\sim (10^{4})^{2}\sim 10^{8}}$

and this gives a total time of

${\displaystyle T\sim N_{ops}t\sim (10^{8})(10^{-6})\sim 10^{2}\sim 100seconds}$

The quadratic algorithm will sort 10,000 items in 100 seconds.

For the superlinear algorithm,

${\displaystyle N_{ops}\sim 1000log1000\sim 3000ops}$

This gives time per operation of

${\displaystyle t\sim {\dfrac {T}{N_{ops}}}\sim {\dfrac {1s}{3000ops}}\sim 3\times 10^{-}3s/op}$

So for 10,000 items,

${\displaystyle N_{ops}\sim (10,000)log(10,000)\sim 4(10,000)\sim 4\times 10^{4}ops}$

This gives a total time of

${\displaystyle T\sim N_{ops}t\sim (4\times 10^{4}){\dfrac {3\times 10^{-3}s}{op}}\sim 12\times 10\sim 120s}$

The superlinear algorithm will sort 10,000 items in 120 seconds.