From charlesreid1

Line 148: Line 148:
===Goodrich Chapter 3===
===Goodrich Chapter 3===


Question R-3.6: ''What is the sum of the first n even numbers, 0..2n, for n > 0?''
'''Question R-3.6: What is the sum of the first n even numbers, 0..2n, for n > 0?'''


Answer: Use the sum of integers formula,
Answer: Use the sum of integers formula,
Line 158: Line 158:
Sum of first n even numbers is <math>n^2 + n</math>.
Sum of first n even numbers is <math>n^2 + n</math>.


Question C-3.35: ''Assuming it is possible to sort n numbers in O(n log n) time, show it is possible to solve 3-way set disjointedness problem in O(n log n) time.''
'''Question C-3.35: Assuming it is possible to sort n numbers in O(n log n) time, show it is possible to solve 3-way set disjointedness problem in O(n log n) time.'''


Answer: First, we sort 3 sets. That takes 3 * n log n units of time.
Answer: First, we sort 3 sets. That takes 3 * n log n units of time.
Line 168: Line 168:
Is the worst case then still O(n log n)? Worst case is, every element in A also in B, so we perform two binary searches each time. 2 * log n still does not cost us that much.
Is the worst case then still O(n log n)? Worst case is, every element in A also in B, so we perform two binary searches each time. 2 * log n still does not cost us that much.


Question C-3.36: ''Describe an efficient algorithm for finding the 10 largest elements of a sequence of size n. What is the runtime of this algorithm?''
'''Question C-3.36: Describe an efficient algorithm for finding the 10 largest elements of a sequence of size n. What is the runtime of this algorithm?'''


Answer: sort n elements in O(n log n) time.
Answer: sort n elements in O(n log n) time.
Line 185: Line 185:
This shows that when an algorithm that runs n times, O(n), calls an algorithm that runs in O(i^2) time, it results in an O(n^3) algorithm overall
This shows that when an algorithm that runs n times, O(n), calls an algorithm that runs in O(i^2) time, it results in an O(n^3) algorithm overall


Question P-3.57: ''Perform experimental analysis to test hypothesis that Python's sorted() function runs in O(n log n) time.
'''Question P-3.57: Perform experimental analysis to test hypothesis that Python's sorted() function runs in O(n log n) time.'''


Answer: see https://charlesreid1.com:3000/cs/python/src/master/algorithm_complexity/test_sorted.py
Answer: see https://charlesreid1.com:3000/cs/python/src/master/algorithm_complexity/test_sorted.py

Revision as of 23:52, 27 May 2017

Notes

Goodrich Chapter 3

date structure - systematic way of organizing and accessing data

algorithm - step by step procedure for performing a task in a fixed amount of time

To time algorithms: distinguish betwee;

  • wall time (less fair, dependent on environment) time.time
  • number of cpu cycles used by algorithm (better) time.clock
  • timeit/built in profiling tool (best)

Moving beyond experimental analysis:

  • Independence from hardware/software
  • Use higher level analysis
  • Account for all possible inputs

Worse case versus average case

  • Algorithm speed changes with different inputs
  • Worst case is easiest to determine
  • Average case requires probability distribution over inputs

$ \overline{t} = \int_{\Omega} t(\mathbf{x}) p(\mathbf{x}) d\mathbf{x} $

Seven functions for algorithm analysis: 1. Constant function 2. Logarithmic function 3. Linear function 4. n log n function 5. Quadratic function 6. Cubic/polynomial function 7. Exponential function

Example: find_max function

  • Runtime: 2 statements always run, plus the for loop
  • Probabilit/worse case/average case;
  • The worst case assumes we have to run biggest = k every time through the for loop
  • Average case analysis:
  • If there are n numbers total, and if we have just looked at j numbers so far, the probability that the jth number will be the new largest of j numbers is 1/j
  • The number of times biggest number is updated, when we run n times, is the sum of each probability,

$ H_{n} = \sum_{j=1}^{n} \dfrac{1}{j} $

nth Harmonic number

Harmonic numbers increase as O(log n)

Geometric sums:

  • For any integer n>= 0 and any real number a > 0, a neq 1,

$ \sum_{i=1}^{n} a^i = 1 + a + a^2 + \dots + a^{n-1} + a^n = \dfrac{ a^{n+1} - 1}{a-1} $

This applies to finding the largest integer representable with n bits:

$ 1 + 2 + 4 + 8 + \dots + 2^{n-1} = 2^n - 1 $

Big O notation:

  • If we have two functions f(n) and g(n) mapping positive integers to positive real numbers, wthen we say f(n) is O(g(n)) if, for a real constnat,

$ f(n) \leq c g(n) $

for $ n \neq n_0 $

"f(n) is gib-O of g(n)

Alt: "f(n) is order of g(n)"

Alt: "f(n) is O(g(n))"

Big o notation allows ignoring constant factors and all lower order terms

Big omega notatino:

  • Big O means less than or equal to
  • Big Omega means greater than or equal to
  • Take the same two functions f(n) and g(n),
  • We say that f(n) is Omega(g(n)) if

$ f(n) \geq c g(n) $

Example: 3 n log n - 2 n is Omega(n log n)

Asymptotic analysis:

  • To compare algorithms, look at two things:
    • Growth rates of algorithms as n increases
    • Maximum problem size (n) that can be solved as t increases
  • Examples of algorithm analysis:
    • Length of list - O(1)
    • Accessing element of list - O(1)
    • Finding maximum: worst case vs aerage case
    • Prefix averages
    • Three-way disjointness
    • Element uniqueness

Prefix averages:

  • If we want to find average of sequence S of n numbers, in another sequence A, such that A[j] is average of first j numbers,

$ A[j] = \dfrac{ \sum_{i=0}^{j} S[i] }{j+1} $

3 implementations;

  • Implementation 1: nested for loops (poor performance, O(n^2))
  • Implementation 2: uses colon indexing notation, like $ sum(S[0:j+1]) $, but no better performance, still O(n^2)
  • Implementation 3: keep a running total, and use a cumulative algorithm

Three way set disjointness:

  • Finding items that are in all 3 sets (if not, 3 sets are disjoint)
  • 2 implementations:
    • Triple nested for loop, if a==b==c (wastes time, O(n^3))
    • Double nested for loop, if a==b, then another for loop, if b==c, O(n^2)

Sorting as an algorithm tool:

  • Sort before searching. Sorted function is O(n log n)
  • Limits cost to cost ofsearch

Justifying algorithm cost:

  • Proof by example/counter-examkple
  • Proof by contradiction/contrapositive
  • use of DeMorgan's Law
  • Induction and loop invariants

induction example:

  • To prove Fibonacci number nl, F(n), is less than 2^n: First show $ F(1) < 2^1, F(2) < 2^2<math> * Then show <math>F(n) < 2^{n-2} + 2^{n-1} $
  • Then we know $ F(n) < 2^n $

Loop invariant:

  • Prove a statement L about a loop
  • Show L0 (initial claim) is true before the loop is run
  • Show that if L(j-1) is true before an iteration, then L(j) is true after an iteration
  • Final statement, Lk implies L is true.

Questions

Goodrich Chapter 3

Question R-3.6: What is the sum of the first n even numbers, 0..2n, for n > 0?

Answer: Use the sum of integers formula,

$ \sum_{k=0}^{n} 2k = 2 \sum_{k=0}^{n} k = 2 \dfrac{n(n+1)}{2} = n(n+1) = n^2 + n $

Sum of first n even numbers is $ n^2 + n $.

Question C-3.35: Assuming it is possible to sort n numbers in O(n log n) time, show it is possible to solve 3-way set disjointedness problem in O(n log n) time.

Answer: First, we sort 3 sets. That takes 3 * n log n units of time.

Then we can iterate over each element of a, which costs us n, and inside of that loop we can perform a binary search, which costs us log n, for an overall cost of n log n.

If we find b in a, then we perform another binary search for c in a. This will cost us log n.

Is the worst case then still O(n log n)? Worst case is, every element in A also in B, so we perform two binary searches each time. 2 * log n still does not cost us that much.

Question C-3.36: Describe an efficient algorithm for finding the 10 largest elements of a sequence of size n. What is the runtime of this algorithm?

Answer: sort n elements in O(n log n) time.

  • Get 10 items from the back (biggest) - or if you want the smallest, from the front.

Question C-3.38: Show that $ \sum_{i=1}^{n} i^2 $ is O(n^3).

Answer: This is the notation we use when we need to say, "we are running a loop, and inside the loop calling a function whose cost is O(i^2)."

We know $ \sum_{i=1}^{n} i = \dfrac{n(n+1)}{2} \sim O(n^2) $

We can write $ \sum{i=1}^{n} i^2 = \dfrac{n(n+1)(2n+1)}{6} $

(Derivation requires some series manipulation, starting with sum over k^3 and eventually getting to k^2

This shows that when an algorithm that runs n times, O(n), calls an algorithm that runs in O(i^2) time, it results in an O(n^3) algorithm overall

Question P-3.57: Perform experimental analysis to test hypothesis that Python's sorted() function runs in O(n log n) time.

Answer: see https://charlesreid1.com:3000/cs/python/src/master/algorithm_complexity/test_sorted.py