From charlesreid1

Questions

Finding one duplicate

Question C-3.17) A is an array of size n>=2 containing integers from 1 to n-1, inclusive, one of which is repeated. Describe an algorithm for finding the integer in A that is repeated.

A special case of this question is when array A is sorted, since then we know that each array element should correspond to its own index plus 1, and we can do a binary search to locate the array index that does not match the criteria. In this case, the runtime is O(log n).

Finding multiple duplicates

Question C-3.18) Let B be an array of size n>=6 containing integers from 1 to n-5, inclusive. Five of the integers are repeated. Describe an algorithm for finding the five integers in B that are repeated.

Solution approach

There are a few different solution approaches, depending on the problem constraints:

  • Is the array sorted, or unsorted?
  • What if we have a one-time O(n) cost of construction?
  • Are we after a space-efficient or operationally-efficient algorithm?
  • What kind of data - primitive types or objects?
  • Do we have to use an array? Why not use a collections object?

(Square-bracket indexing is nice to have.)

Single element case

The fastest case is using a sorted structure, such as a binary search tree, to constructor (or reconstruct) the object.

If we can build a search tree filled with references, the cost of constructing the tree is O(n), with each node being composed of a fixed number of pointers (to parent/child nodes).

Underlying data can be anything - a bunch of indexes in an array, or items in a queue, or pointers to locations in memory, etc., but must be accessible in O(1) time.

Now, to check for a duplicate:

Using arrays:

Sorted array, single element duplicate:

  • O(log n) - each element's index correlates to the element's value, so use a binary search to find the location where the correlation is off

Unsorted array, single element duplicate:

  • O(n) - to iterate through each element and check for membership in a set: if not a member then add it, otherwise it's a duplicate element.

To get a slightly more expensive answer but make it reusable,

  • O(n log n) - build a binary tree or other sorted data structure. Comoparable to cost of one-time sort. Heap sort is basically constructing a binary tree and doing a binary tree sort.
  • One-time cost, it is now sorted, and can do O(log n) searches. Like binary searches.

Add entries to a counts hashtable. O(1) lookups for each element in hash table. sum them up.

  • O(n) to add each element to a hashtable and get back a counter that is incremented.

In-place sort and binary search for duplicates:

  • O(n log n) - to sort the array, O(n log n), then costs O(log n) to find duplicates, sequentially, using binary search

Multi element case

In the multi-element duplicate case:

Code