Algorithm complexity
From charlesreid1
Notes
Goodrich Chapter 3 Notes
date structure - systematic way of organizing and accessing data
algorithm - step by step procedure for performing a task in a fixed amount of time
To time algorithms: distinguish betwee;
- wall time (less fair, dependent on environment) time.time
- number of cpu cycles used by algorithm (better) time.clock
- timeit/built in profiling tool (best)
Moving beyond experimental analysis:
- Independence from hardware/software
- Use higher level analysis
- Account for all possible inputs
Worse case versus average case
- Algorithm speed changes with different inputs
- Worst case is easiest to determine
- Average case requires probability distribution over inputs
$ \overline{t} = \int_{\Omega} t(\mathbf{x}) p(\mathbf{x}) d\mathbf{x} $
Seven functions for algorithm analysis: 1. Constant function 2. Logarithmic function 3. Linear function 4. n log n function 5. Quadratic function 6. Cubic/polynomial function 7. Exponential function
Example: find_max function
- Runtime: 2 statements always run, plus the for loop
- Probabilit/worse case/average case;
- The worst case assumes we have to run biggest = k every time through the for loop
- Average case analysis:
- If there are n numbers total, and if we have just looked at j numbers so far, the probability that the jth number will be the new largest of j numbers is 1/j
- The number of times biggest number is updated, when we run n times, is the sum of each probability,
$ H_{n} = \sum_{j=1}^{n} \dfrac{1}{j} $
nth Harmonic number
Harmonic numbers increase as O(log n)
Geometric sums:
- For any integer n>= 0 and any real number a > 0, a neq 1,
$ \sum_{i=1}^{n} a^i = 1 + a + a^2 + \dots + a^{n-1} + a^n = \dfrac{ a^{n+1} - 1}{a-1} $
This applies to finding the largest integer representable with n bits:
$ 1 + 2 + 4 + 8 + \dots + 2^{n-1} = 2^n - 1 $
Big O notation:
- If we have two functions f(n) and g(n) mapping positive integers to positive real numbers, wthen we say f(n) is O(g(n)) if, for a real constnat,
$ f(n) \leq c g(n) $
for $ n \neq n_0 $
"f(n) is gib-O of g(n)
Alt: "f(n) is order of g(n)"
Alt: "f(n) is O(g(n))"
Big o notation allows ignoring constant factors and all lower order terms
Big omega notatino:
- Big O means less than or equal to
- Big Omega means greater than or equal to
- Take the same two functions f(n) and g(n),
- We say that f(n) is Omega(g(n)) if
$ f(n) \geq c g(n) $
Example: 3 n log n - 2 n is Omega(n log n)
Asymptotic analysis:
- To compare algorithms, look at two things:
- Growth rates of algorithms as n increases
- Maximum problem size (n) that can be solved as t increases
- Examples of algorithm analysis:
- Length of list - O(1)
- Accessing element of list - O(1)
- Finding maximum: worst case vs aerage case
- Prefix averages
- Three-way disjointness
- Element uniqueness
Prefix averages:
- If we want to find average of sequence S of n numbers, in another sequence A, such that A[j] is average of first j numbers,
$ A[j] = \dfrac{ \sum_{i=0}^{j} S[i] }{j+1} $
3 implementations;
- Implementation 1: nested for loops (poor performance, O(n^2))
- Implementation 2: uses colon indexing notation, like $ sum(S[0:j+1]) $, but no better performance, still O(n^2)
- Implementation 3: keep a running total, and use a cumulative algorithm
Three way set disjointness:
- Finding items that are in all 3 sets (if not, 3 sets are disjoint)
- 2 implementations:
- Triple nested for loop, if a==b==c (wastes time, O(n^3))
- Double nested for loop, if a==b, then another for loop, if b==c, O(n^2)
Sorting as an algorithm tool:
- Sort before searching. Sorted function is O(n log n)
- Limits cost to cost ofsearch
Justifying algorithm cost:
- Proof by example/counter-examkple
- Proof by contradiction/contrapositive
- use of DeMorgan's Law
- Induction and loop invariants
induction example:
- To prove Fibonacci number nl, F(n), is less than 2^n: First show $ F(1) < 2^1, F(2) < 2^2<math> * Then show <math>F(n) < 2^{n-2} + 2^{n-1} $
- Then we know $ F(n) < 2^n $
Loop invariant:
- Prove a statement L about a loop
- Show L0 (initial claim) is true before the loop is run
- Show that if L(j-1) is true before an iteration, then L(j) is true after an iteration
- Final statement, Lk implies L is true.
Questions
Goodrich Chapter 3
Question R-3.6: What is the sum of the first n even numbers, 0..2n, for n > 0?
Answer: Use the sum of integers formula,
$ \sum_{k=0}^{n} 2k = 2 \sum_{k=0}^{n} k = 2 \dfrac{n(n+1)}{2} = n(n+1) = n^2 + n $
Sum of first n even numbers is $ n^2 + n $.
Question C-3.35: Assuming it is possible to sort n numbers in O(n log n) time, show it is possible to solve 3-way set disjointedness problem in O(n log n) time.
Answer: First, we sort 3 sets. That takes 3 * n log n units of time.
Then we can iterate over each element of a, which costs us n, and inside of that loop we can perform a binary search, which costs us log n, for an overall cost of n log n.
If we find b in a, then we perform another binary search for c in a. This will cost us log n.
Is the worst case then still O(n log n)? Worst case is, every element in A also in B, so we perform two binary searches each time. 2 * log n still does not cost us that much.
Question C-3.36: Describe an efficient algorithm for finding the 10 largest elements of a sequence of size n. What is the runtime of this algorithm?
Answer: sort n elements in O(n log n) time.
- Get 10 items from the back (biggest) - or if you want the smallest, from the front.
Question C-3.38: Show that $ \sum_{i=1}^{n} i^2 $ is O(n^3).
Answer: This is the notation we use when we need to say, "we are running a loop, and inside the loop calling a function whose cost is O(i^2)."
We know $ \sum_{i=1}^{n} i = \dfrac{n(n+1)}{2} \sim O(n^2) $
We can write $ \sum{i=1}^{n} i^2 = \dfrac{n(n+1)(2n+1)}{6} $
(Derivation requires some series manipulation, starting with sum over k^3 and eventually getting to k^2
This shows that when an algorithm that runs n times, O(n), calls an algorithm that runs in O(i^2) time, it results in an O(n^3) algorithm overall
Question P-3.57: Perform experimental analysis to test hypothesis that Python's sorted() function runs in O(n log n) time.
Answer: see https://charlesreid1.com:3000/cs/python/src/master/algorithm_complexity/test_sorted.py