## Questions

### Finding one duplicate

Question C-3.17) A is an array of size n>=2 containing integers from 1 to n-1, inclusive, one of which is repeated. Describe an algorithm for finding the integer in A that is repeated.

A special case of this question is when array A is sorted, since then we know that each array element should correspond to its own index plus 1, and we can do a binary search to locate the array index that does not match the criteria. In this case, the runtime is O(log n).

### Finding multiple duplicates

Question C-3.18) Let B be an array of size n>=6 containing integers from 1 to n-5, inclusive. Five of the integers are repeated. Describe an algorithm for finding the five integers in B that are repeated.

## Solution approach

There are a few different solution approaches, depending on the problem constraints:

• Is the array sorted, or unsorted?
• What if we have a one-time O(n) cost of construction?
• Are we after a space-efficient or operationally-efficient algorithm?
• What kind of data - primitive types or objects?
• Do we have to use an array? Why not use a collections object?

(Square-bracket indexing is nice to have.)

### Single element case

The cost of finding duplicates depends on problem parameters and tradeoffs. Assuming we want the absolute fastest algorithm, we are looking for a constant-time or logarithmic algorithm. (There are no O(1) algorithms here.)

Log time algorithms:

• O(n log n) one-time cost to sort
• O(log n) lookup time in a binary tree (less memory)
• O(1) lookup time in a hash table (more memory)

In the particular case given, of the integers 1 through n corresponding to indices 0 through (n-1), we are looking for an element that meets a particular criteria: normally in binary sort the criteria is a (greater than/less than/equal to) condition. Here, the criteria is the place where the value of element i (at zero-indexed i) changes from i+1 to i. The binary search algorithm can be outfitted to find this change point, using its head, tail, and middle pointers.

Linear time algorithms:

• O(n) add each element to a set, checking membership each time. O(1) lookup for membership in a set, iterating one item at a time, leads to O(n) time.

One-time expense algorithms:

• O(1) lookup for O(n) cost of building a hash table, at a potentially high memory cost.
• O(log n) binary search lookup for O(n log n) cost of sorting into a binary tree or similar structure, more efficient memory usage.

If we are going to be moving items in and out, what's the best approach? It all depends... Let's explore.

NOTE: Remember the ghost log. The cost of doing something with frequency of 1/j for j=1 to n is the harmonic number, which is O(log n).

Java Implementation:

• Use a HashMap, with the key being the element in the array, the value being an integer counter.

### Multi element case

In the multi-element duplicate case: